### Techniques for Design of Compact and Efficient Digital Doherty CMOS Power Amplifiers and Transmitters

A

### Dissertation

Presented to

the faculty of the School of Engineering and Applied Science University of Virginia

in partial fulfillment of the requirements for the degree

Doctor of Philosophy

by

Jay R. Sheth

December 2022

### **APPROVAL SHEET**

This Dissertation is submitted in partial fulfillment of the requirements for the degree of

### Doctor of Philosophy

### Author: Jay R. Sheth

This Dissertation has been read and approved by the examining committee:

Advisor: Steven M. Bowers

Committee Member: Robert M. Weikle, II

Committee Member: N. Scott Barker

Committee Member: Travis N. Blalock

Committee Member: Bradford Campbell

Accepted for the School of Engineering and Applied Science:

Jennifer L. West, School of Engineering and Applied Science
December 2022

### Techniques for Design of Compact and Efficient Digital Doherty CMOS Power Amplifiers and Transmitters

Jay R. Sheth

Dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in

the School of Engineering and Applied Science of the University of Virginia

Committee:

Steven M. Bowers, Advisor

Robert M. Weikle II, Chair

N. Scott Barker

Travis N. Blalock

Bradford Campbell

December 2022

Charlottesville, Virginia

Keywords: digital transmitter, digital power amplifier, asymmetric Doherty, single footprint Doherty, compact matching network, CMOS, high efficiency, back-off efficiency.

All Rights Reserved, Jay R. Sheth

### Techniques for Design of Compact and Efficient Digital Doherty CMOS Power Amplifiers and Transmitters

Jay R. Sheth

#### (ABSTRACT)

The sub-7 GHz spectrum (450 MHz to 7 GHz) is widely used for a variety of wireless communication networks. Over the years, the use of this spectrum has evolved considerably to support increased connectivity and higher data-rates. Various emerging applications, such as smart home connectivity, augmented reality, virtual reality, smart farming, inventory tracking, and industrial monitoring, have led to the creation of newer networks, such as 5G cellular, Wi-Fi 6E, and Wi-Fi HaLow. Despite the vastly different operations of these networks, all of them share a common need for cost-effective devices with long battery life. Transmitters are one of the most important components of a wireless device since they directly affect both of these attributes through their size and power dissipation. Therefore, this dissertation focuses on frequency- and output-power-scalable design techniques to realize compact and efficient wireless transmitters.

In recent years, digitally implemented transmitter architectures have garnered significant interest, as they are amenable to scaling with process nodes and have the potential to improve battery life while remaining cost-effective. In practice, however, the battery life improvement is severely limited, as these architectures suffer from poor efficiency in output power back-off. Further, the two well-known digital transmitter implementations, polar and quadrature, have their own limitations. The polar implementation suffers from wide bandwidth expansion that limits the overall data-rate of the transmitter, while the quadrature implementation exhibits degraded efficiency due to the large overlap between the in-phase and quadrature basis vectors, which further limits the improvement to battery life. This dissertation proposes and implements two techniques, a nested Doherty architecture and a transformer-within-transformer Doherty architecture, to improve efficiency in deep power back-off while still achieving a compact form-factor that enables cost-effective transmitters. In addition, a multi-phase current-mode implementation of a digital power amplifier is proposed and implemented to achieve higher data-rates while also improving the overall efficiency, further extending the battery life of the transmitter. These techniques are demonstrated using two different integrated circuit chips implemented in a general-purpose 65 nm CMOS process:

1) A four-way asymmetric digital polar power amplifier shows an implementation of a four-way nested Doherty architecture to achieve efficiency enhancement up to 9 dB in output power back-off, while maintaining a compact form-factor, for IoT-type applications around 5 GHz.

2) An asymmetric current-mode eight-phase digital Doherty transmitter using a single footprint transformer-based matching network shows an implementation of a transformer-within-transformer Doherty architecture to achieve efficiency enhancement up to 9.5 dB in output power back-off, while also realizing a compact form-factor. Additionally, it highlights the implementation of an eight-phase architecture to improve the efficiency profile compared to its quadrature counterpart, for high data-rate applications around 6.5 GHz, such as Wi-Fi 6E.
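The benefit of moving from a quadrature to an eight-phase basis can be illustrated with a purely geometric sketch. It assumes an idealized linear vector sum of unit cells (no switch or matching-network losses), with `m` cells driven by one basis phase and `n` cells by the adjacent one out of `p` total cells. The function names and the value `p = 64` are hypothetical choices for illustration and are not taken from the implemented chips.

```python
import cmath
import math


def combined_amplitude(m: int, n: int, p: int, phase_sep_deg: float) -> float:
    """Normalized magnitude of m unit vectors at 0 degrees plus n unit
    vectors at phase_sep_deg, out of p available cells (ideal linear sum)."""
    if m < 0 or n < 0 or m + n > p:
        raise ValueError("need m >= 0, n >= 0 and m + n <= p")
    second_basis = cmath.rect(1.0, math.radians(phase_sep_deg))
    return abs(m + n * second_basis) / p


def worst_case_amplitude(p: int, num_phases: int) -> float:
    """Normalized amplitude midway between two adjacent basis vectors with
    all p cells on (m = n = p/2), where the overlap penalty is largest."""
    if p % 2 != 0:
        raise ValueError("p must be even so the cells can be split equally")
    phase_sep_deg = 360.0 / num_phases
    return combined_amplitude(p // 2, p // 2, p, phase_sep_deg)


if __name__ == "__main__":
    p = 64  # hypothetical number of unit cells
    print(f"quadrature : {worst_case_amplitude(p, 4):.3f}")
    print(f"eight-phase: {worst_case_amplitude(p, 8):.3f}")
```

Under these assumptions the mid-angle amplitude is about 0.707 of the peak for the quadrature basis and about 0.924 for the eight-phase basis, since adjacent basis vectors are 90° and 45° apart, respectively. This is only an amplitude ratio from vector geometry; it is not an efficiency figure and does not model the measured transmitter.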

This work was supported in part by the National Science Foundation (NSF) under CAREER Grant CCSS-1846091, and the National Ground Intelligence Center (NGIC).

# Acknowledgments

The road to finishing this degree was certainly long. As I think about my journey, I am immeasurably thankful for everyone's support, encouragement, and blessings. In particular, I would like to extend my sincerest gratitude to:

My advisor, Steve Bowers, for inviting me to join your research group and giving me your unwavering support. I have learned a lot from you, including the art of learning efficiently. Thank you for being understanding and patient when life hampered research progress, and when everything we researched seemed to be hitting a dead end (Remember Q-meter?). I am truly grateful for your encouragement during these past seven years.

My dissertation committee members, Bobby Weikle, Scott Barker, Travis Blalock, and Brad Campbell, for your feedback and mentorship throughout the process of writing and defending this thesis.

Terry Tigner, my work mom, for your continual kindness, encouragement, and support. You were always there to lend an ear during the tough times, and to fight for me when the part vendors didn't keep their end of the deal. What would I have done without you!

Beth Eastwood-Beatty, for helping me navigate all the administrative hurdles, and always having my back when I missed deadlines.

Rob Costanzo, Jesse Moody, and Pouyan Bassirian, former members of the IECS group (or the Ginyu force, as Jesse called it) for your generous help both during our mutual time in the group and later during my job search. Our friendship made my time in this program so much more enjoyable.

Vinay Iyer and Linsheng Zhang, the original Tx group members (although Linsheng did ditch us for the dark side - WuRx!), for the late-night technical and emotional support during tapeouts. Couldn't have done it without you guys!

Divya Duvvuri, Xiaochuan Shen, Prerana Singaraju, Yaobin Zhang, Pedram Shirmohammadi, Samin Hanifi, Jinhua Wang, and Shadrach Sarpong, current members of the IECS group, and Anjana Dissanayake and Sumanth Kamineni, members of Ben Calhoun's group for helpful technical discussions and your generous support. It has been a pleasure working with y'all!

Rakesh and Parul Sheth, my parents, for your many sacrifices to support my education and to give me a bright future. From moving to America, then settling in this "tiny" town and having us over every Tuesday for dinner, you have loved and supported me throughout my whole life, and I am forever grateful to have such wonderful parents.

Pari Sheth, my sister and friend, for always encouraging me and listening to all the joys and struggles, especially during this program.

My grandparents, Arvind and Usha Sheth, my uncle and aunts, Kaushik Shah, Charu Shah, and Swati Kothari, for cheering me on throughout my whole life.

My church family and my wife's family members, Steve, Katrina, Heather, and Barbara Parshley, for always keeping me in your prayers, especially when the chips kept blowing up.

Special thanks to my wife, Sarah Sheth, for the immeasurable amount of help you provided. You patiently kept postponing all the plans you had, as my graduation kept getting delayed. You were my biggest supporter during this journey, and you were there for me whenever I needed you - whether that was making late-night dinner during tapeouts, or correcting pages and pages of my work for grammatical errors during paper submissions, or helping me practice for countless hours for all kinds of presentations. You truly made this journey easier, and I am so thankful that God brought you into my life.

Finally, and most importantly, I want to thank my Lord, my God, Jesus Christ, who gave me the strength and the courage to complete this program. Thank You for all the blessings!

# Contents

- 1 Research Overview
  - 1.1 Problem Statement and Motivation
  - 1.2 Prior Art
    - 1.2.1 Quadrature Amplitude Modulation (QAM)
    - 1.2.2 Traditional Analog Quadrature Transmitter Architecture
    - 1.2.3 Complementary Metal Oxide Semiconductor (CMOS) Digital Transmitters
    - 1.2.4 M-QAM Probability Density Function and PA Efficiency in Power Back-Off (PBO)
    - 1.2.5 Synergistic Work: Wideband Indium Phosphide (InP) PAs in F-band (90 GHz - 140 GHz)
  - 1.3 Dissertation Organization
    - 1.3.1 Thesis Statement
    - 1.3.2 Research Questions
    - 1.3.3 Research Contributions
    - 1.3.4 Dissertation Organization
  - 1.4 List of Publications
- 2 Design of a CMOS Nested Doherty Digital Power Amplifier for Low-Power Applications
  - 2.1 Introduction, Motivation, and Prior Art
  - 2.2 Nested Doherty Architecture for Low Output Power Digital PAs
    - 2.2.1 Ideal Transistor Model
    - 2.2.2 Drain-Source Voltage $(V_{ds})$ Dependent Transistor Model
  - 2.3 Design of a Four-way Nested Doherty PA
  - 2.4 Implementation and Simulations
  - 2.5 Measurements
    - 2.5.1 CW Measurements
    - 2.5.2 RF Modulated Measurements
  - 2.6 Conclusion
  - 2.7 Contributions
- 3 Design of an Asymmetric Current-Mode Multi-phase Digital Doherty Transmitter Using a Single Footprint Transformer-Based Matching Network
  - 3.1 Introduction, Motivation, and Prior Art
  - 3.2 Design Techniques for a Multi-phase Asymmetric Doherty Architecture
    - 3.2.1 Multi-phase Architecture
    - 3.2.2 Multi-phase Generation
    - 3.2.3 Transformer Matching Network for Asymmetric Doherty PA:
    - 3.2.4 Derivations Results and Design Strategy:
  - 3.3 Implementation of a Current-Mode Eight-phase Asymmetric Doherty Transmitter Using a Single Footprint Transformer
    - 3.3.1 Implementation of Wideband Eight-phase Generation around 6.5 GHz
    - 3.3.2 Implementation of Eight-phase basis vector mapper around 6.5 GHz
    - 3.3.3 Implementation of the PAs and its Unit Cells
    - 3.3.4 Implementation of Single Footprint Asymmetric Series Doherty Transformer Network
  - 3.4 Measurement
    - 3.4.1 CW Measurements
    - 3.4.2 RF Modulated Measurements
  - 3.5 Conclusion
  - 3.6 Contributions
- 4 Wideband InP PAs in F-band with Modulation Measurements
  - 4.1 Introduction
  - 4.2 Design of a Single Stage Stacked PA and an Eight-Way DAT PA
    - 4.2.1 Single Stage Stacked PA
    - 4.2.2 Eight-Way Distributed Active Transformer (DAT) Based PA
  - 4.3 CW Measurements
  - 4.4 Modulation measurements of a Single Stage Stacked PA
  - 4.5 Conclusions
  - 4.6 Contributions
- 5 Conclusions, Methodology for Designing Digital Transmitters, Future Directions, Other Work
  - 5.1 Dissertation Conclusions
  - 5.2 Considerations for Designing a Digital Transmitter
  - 5.3 Future Directions
    - 5.3.1 Wideband Transmitter Operation
    - 5.3.2 Improving Linearity
    - 5.3.3 Robustness to Load Impedance Variations
  - 5.4 Other Work

# **List of Figures**

| 1.1 | A future smart-city with wireless communication systems being used in a vari-                                                                                |
|-----|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
|     | ety of emerging applications such as smart farming, inventory tracking, vehicular                                                                            |
|     | communications, smart home connectivity, and augmented and virtual reality 2                                                                                 |
| 1.2 | (a) A constellation of 16-QAM with root raised cosine filtering and (b) its cor-                                                                             |
|     | responding time domain waveform exhibiting high peak-to-average power ratio                                                                                  |
|     | (PAPR)                                                                                                                                                       |
| 1.3 | A traditional analog quadrature transmitter using a quadrature mixer followed by                                                                             |
|     | a linear power amplifier to transmit M-QAM waveform at RF                                                                                                    |
| 1.4 | (a) A digital polar transmitter using a COordinate Rotation DIgital Computer (CORDIC)                                                                        |
|     | block to convert the in-phase (I) and quadrature (Q) components to amplitude ( $\rho$ )                                                                      |
|     | and phase ( $\phi$ ), where $\phi$ gets up-converted to RF for producing (b) a constant en-                                                                  |
|     |                                                                                                                                                              |
|     | velope phase-shift keyed (PSK) signal, and $\rho$ is fed to an amplitude decoder to                                                                          |
|     | velope phase-shift keyed (PSK) signal, and $\rho$ is fed to an amplitude decoder to control the PA unit cells, all to generate the desired (c) QAM output    |
| 1.5 | velope phase-shift keyed (PSK) signal, and $\rho$ is fed to an amplitude decoder to<br>control the PA unit cells, all to generate the desired (c) QAM output |
| 1.5 | velope phase-shift keyed (PSK) signal, and $\rho$ is fed to an amplitude decoder to<br>control the PA unit cells, all to generate the desired (c) QAM output |

| 1.6 | A digital quadrature transmitter using only quadrature (I and Q) continuous wave           |    |
|-----|--------------------------------------------------------------------------------------------|----|
|     | (CW) input tones, eliminating the need for a CORDIC block, thereby achieving               |    |
|     | higher data-rates compared to a digital polar transmitter. The desired M-QAM               |    |
|     | waveform is produced at the output through the weighted summation of I and Q               |    |
|     | components                                                                                 | 9  |
| 1.7 | Comparing the efficiency degradation as the desired output phase is varied, where          |    |
|     | the polar architecture achieves maximum efficiency for all phases, while the quadra-       |    |
|     | ture architecture achieves maximum efficiency only at the basis quadrature vectors,        |    |
|     | with worst-case performance around $45^{\circ}$ .                                          | 10 |
| 1.8 | The probability density function of a 64-QAM waveform, with a root-raised cosine           |    |
|     | filter coefficient $\alpha = 0.35$ , in power back-off (PBO), compared with the efficiency |    |
|     | of a normalized class B and a traditional Doherty PA in PBO                                | 11 |
| 2.1 | Extension of a generic (a) 2-way Doherty architecture to (b) 3-way, and to (c) n-          |    |
|     | way through nesting, which results in (d) 2, 3, and n efficiency enhancement peaks,        |    |
|     | respectively, in power back-off (PBO)                                                      | 21 |
| 2.2 | Digital implementation of a power amplifier (PA) block consists of multiple unit           |    |
|     | cells, where each unit cell is composed of a set of digital buffers and a common           |    |
|     | source output stage                                                                        | 22 |
| 2.3 | (a) A generic n-way class G Doherty architecture with $n - 2 V_{DD}$ s resulting in $n$    |    |
|     | efficiency enhancement peaks, and (b) desired current drive strengths of the main          |    |
|     | and the peaking amplifier for case $n = 3$ and $\alpha_1 = 2$                              | 24 |
| 2.4 | Comparison of nested Doherty with class G Doherty architecture in power back-              |    |
|     | off using the ideal and the $V_{ds}$ dependent transistor model. (a) Power dissipation     |    |
|     | trend in the output stage and buffers. (b) Drain efficiency of the output stage and        |    |
|     | system efficiency trend of the whole PA                                                    | 25 |
|     |                                                                                            |    |

| 2.5  | Improvement offered by nested Doherty PA over class G Doherty PA dependent           |    |
|------|--------------------------------------------------------------------------------------|----|
|      | on the relative buffer power consumption at 9 dB PBO. Losses in the OMN and          |    |
|      | the switches of class G are ignored                                                  | 26 |
| 2.6  | An example showing the effect of lowering $V_{DD}$ on a common source stage driven   |    |
|      | with a constant amplitude voltage swing. Simulation performed in a general-          |    |
|      | purpose 65nm CMOS process with optimum impedance presented at each $V_{DD}$          |    |
|      | value to obtain full swing at the output                                             | 27 |
| 2.7  | (a) Asymmetric Doherty architecture designed using (b) scaling ratios to generate    |    |
|      | the inverting network impedance and (c) current drive strengths of the main and      |    |
|      | peaking amplifier that result in (d) efficiency enhancement at an arbitrary back-off |    |
|      | level                                                                                | 29 |
| 2.8  | (a) Asymmetric 2-way Doherty architecture that enhances efficiency through 3 dB      |    |
|      | PBO, which extends to (b) a 3-way architecture through current drive strengths       |    |
|      | shown in (c) to achieve (d) efficiency enhancements at 3 dB and 6 dB PBO             | 30 |
| 2.9  | (a) Proposed four-way Doherty PA architecture to maintain optimal matching through   |    |
|      | 9 dB PBO, yielding (b) efficiency enhancements by (c) scaling RF current drives      |    |
|      | to (d) produce the desired optimal output impedances                                 | 31 |
| 2.10 | Implementation of a four-way Doherty output impedance matching network with          |    |
|      | on-chip lumped elements for transmission lines. Consolidation reduces the total      |    |
|      | number of inductors from 10 down to 4. The actual implementation is differential     | 32 |
| 2.11 | Simulated drain efficiency and passive efficiency of the implemented D4DPA show-     |    |
|      | ing the effects of loss in passive OMN                                               | 33 |
| 2.12 | (a) Implementation of a quadrature hybrid along with LC network to impedance         |    |
|      | transform and bias the RF inverters resulting in 6 inductors. (b) Consolidated       |    |
|      | quadrature hybrid and input matching network resulting in only 4 inductors. The      |    |
|      | actual implementation is differential                                                | 34 |

| 2.13 | Simulated frequency dependence comparison of voltage gain and quadrature phase       |    |
|------|--------------------------------------------------------------------------------------|----|
|      | generation for the input network before and after consolidation using ideal compo-   |    |
|      | nents.                                                                               | 35 |
| 2.14 | EM simulated frequency dependence of the realized consolidated input network for     |    |
|      | (a) quadrature gain and phase generation, and (b) mismatch in differential magni-    |    |
|      | tude and phase.                                                                      | 36 |
| 2.15 | Block-level schematic of each digital PA block along with the transistor sizes. The  |    |
|      | PA block can be turned off through " $PA_x$ Block Enable" node controlled by a 4     |    |
|      | input functional OR gate.                                                            | 37 |
| 2.16 | Frequency dependence of differential quadrature phases at the input of the CS stage  |    |
|      | of 4 PA blocks for the MSB (x8 bit)                                                  | 37 |
| 2.17 | Overall block diagram of the implemented differential four-way digital Doherty       |    |
|      | power amplifier with consolidated output and input matching network and on-chip      |    |
|      | quadrature signal generation.                                                        | 38 |
| 2.18 | Die photo of the PA realized in a 65 nm CMOS process.                                | 38 |
| 2.19 | Simulated and measured input reflection coefficient $(S_{11})$ of the PA             | 39 |
| 2.20 | Measured drain efficiency (DE) and system efficiency (SE) of the implemented PA      |    |
|      | at (a) 4.75 GHz and (b) 5.25 GHz compared to normalized class B and class A PAs.     | 39 |
| 2.21 | Measured frequency dependence of the implemented PA for (a) DE, SE, (b) gain,        |    |
|      | and output power $(P_{out})$ at peak DE, where the input power is varied to maximize |    |
|      | efficiency while still maintaining gain above 10 dB. DE performance is also shown    |    |
|      | when the PA is 1/8 turned on, and it is compared with normalized class B             | 41 |
| 2.22 | Measured AM-PM degradation of the implemented PA at 5.25 GHz for the optimal         |    |
|      | drain efficiency points.                                                             | 41 |

| 2.23 | For a 1 MSym/s QPSK waveform at 5.25 GHz, measured (a) constellation, (b) DE in PBO compared to normalized class B waveform, and (c) r.m.s. error vector                                                                                                                      |    |
|------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
|      | magnitude $(EVM_{rms})$ , and adjacent channel leakage ratio (ACLR) in PBO                                                                                                                                                                                                    | 42 |
| 2.24 | Measurement setup for outputting a 16 QAM RF signal through the implemented PA                                                                                                                                                                                                | 43 |
| 2.25 | Measured (a) constellation, $EVM_{rms}$ , and (b) ACLR for a 1 MSym/s 16 QAM waveform at 5.25 GHz.                                                                                                                                                                            | 43 |
| 2.26 | For a 1 MSym/s 16 QAM waveform at 5.25 GHz, measured (a) DE in PBO com-<br>pared to normalized class B waveform, and (b) $EVM_{rms}$ and ACLR in PBO. (c) $P_{out}$ , DE, (d) ACLR, and $EVM_{rms}$ are measured for robustness with respect to<br>symbol rate.               | 45 |
| 2.27 | Far-out spectrum for a 5 MSym/s 10× oversampled 16 QAM waveform at 5.25 GHz.                                                                                                                                                                                                  | 45 |
| 3.1  | A digital power amplifier with a total of $p$ cells, where $m$ cells are driven with $\phi_A/\overline{\phi_A}$ , $n$ cells are driven with $\phi_B/\overline{\phi_B}$ , and the remaining $[p-(m+n)]$ cells are turned off to achieve the desired output amplitude and phase | 52 |
| 3.2  | A simplified model of a digital PA using a multi-phase architecture, where the                                                                                                                                                                                                |    |
|      | difference in the two basis vectors corresponds to an increased duty-cycle $(D)$ of<br>the on-time $(t_1)$ of the switch, which leads to a boost in the voltage waveform<br>across the switch                                                                                 | 52 |
| 3.3  | difference in the two basis vectors corresponds to an increased duty-cycle $(D)$ of<br>the on-time $(t_1)$ of the switch, which leads to a boost in the voltage waveform<br>across the switch                                                                                 | 52 |

| 3.4 | For $m + n = p$ (maximum output amplitude ( $ V_{out,max} $ ) contour), a quadrature        |    |
|-----|---------------------------------------------------------------------------------------------|----|
|     | architecture (red) exhibits severe performance degradation, with a worst-case ef-           |    |
|     | ficiency of 42%, while its 8-phase counterpart (green) maintains a relatively high          |    |
|     | worst-case efficiency of 69%, highlighting a $1.64 \times$ improvement                      | 55 |
| 3.5 | Time-domain voltage waveforms at the drain of cascode cells when the PA is op-              |    |
|     | erated along the basis vectors ( $m = p, n = 0$ or $n = p, m = 0$ ) and at the middle       |    |
|     | of the two basis vectors ( $m = n = p/2$ ), for both (a) the quadrature and (b) the 8-      |    |
|     | phase architecture. The voltage peak exceeds $3 \times V_{DD}$ for the quadrature architec- |    |
|     | ture, while it is relatively tamer for the 8-phase counterpart, alleviating reliability     |    |
|     | concerns related to the breakdown of the devices.                                           | 56 |
| 3.6 | Input code word constellation for (a) a quadrature and (b) an 8-phase digital PA            |    |
|     | resulting in a more circular output constellation (c) and (d) due to PA non-linearity.      |    |
|     | The 8-phase architecture also achieves a higher resolution since the total number           |    |
|     | of code words is distributed in a smaller sector of the constellation                       | 58 |
| 3.7 | Potential solutions to generate 8 equally spaced phase-shifted signals around 6.5           |    |
|     | GHz through (a) flip-flop based frequency dividers, (b) polyphase filter (PPF), and         |    |
|     | (c) a two phase (red) or eight phase (red + pink) injection locked eight-stage ring         |    |
|     | oscillator (2P-8ILRO or 8P-8ILRO)                                                           | 60 |
| 3.8 | Error (deviation) from the desired eight equally spaced phase-shifted signals around        |    |
|     | 6.5 GHz when generated through (a) a PPF, (b) a 2P-8ILRO, and (c) an eight-phase            |    |
|     | injection locked eight-stage ring oscillator (8P-8ILRO)                                     | 62 |
| 3.9 | (a) A transformer-based asymmetric series Doherty architecture using (b) asym-              |    |
|     | metric current drive strengths of the main and the peaking amplifier, (c) to achieve        |    |
|     | the optimal impedances at 0 dB and $20.\log(\alpha)$ dB back-off, and thereby leading to    |    |
|     | (d) efficiency enhancement at $20.\log(\alpha)$ dB back-off                                 | 64 |

| 3.10 | (a) Implementation of a transformer-based asymmetric series Doherty matching                     |    |
|------|--------------------------------------------------------------------------------------------------|----|
|      | network, where $k_1$ and $k_2$ represent the magnetic coupling coefficients of the main          |    |
|      | and peaking transformers, respectively. $L_{res1}$ and $L_{res2}$ represent the inductors        |    |
|      | used to resonate the parasitic capacitors of the PAs, and the transmission line is               |    |
|      | implemented using an equivalent high-pass $T \operatorname{LC}$ network. Taking advantage of     |    |
|      | the parallel magnetic inductance and the series leakage inductance of a practical                |    |
|      | transformer, (b) the entire network can be implemented simply using two trans-                   |    |
|      | formers ( $Lp1$ - $Ls1$ and $Lp2$ - $Ls2$ ), and a capacitor ( $C_{OMN}$ ), to achieve a compact |    |
|      | design                                                                                           | 65 |
| 3.11 | A transformer-based asymmetric series Doherty network showing impedances and                     |    |
|      | current labels for helping with the derivation of the design methodology                         | 67 |
| 3.12 | Design of a transformer-based asymmetric series Doherty network at any desired                   |    |
|      | frequency and at any desired efficiency enhancement (EE) level using (a) $L_{s1}$ , (b)          |    |
|      | $L_{s2}$ , (c) $k_1$ , and (d) $k_2$                                                             | 71 |
| 3.13 | Overall block diagram of the implemented current-mode eight-phase asymmetric                     |    |
|      | digital Doherty transmitter using a single footprint transformer with on-chip multi-             |    |
|      | phase generation, along with the overall block-level schematic of the unit cells of              |    |
|      | the PAs                                                                                          | 73 |
| 3.14 | An eight-phase injected eight-stage injection locked ring oscillator (8P-8ILRO)                  |    |
|      | technique to generate eight equally spaced phase-shifted signals around 6.5 GHz                  |    |
|      | is composed of (a) a PPF to generate the eight "unclean" injection signals, (b) a                |    |
|      | bias tee to raise the DC offset of the injection signals, and (c) an eight-stage ring            |    |
|      | oscillator to output the "clean" phase-shifted signals                                           | 74 |
| 3.15 | Post parasitic extracted voltage swing attenuation from the input (single-ended) to              |    |
|      | the outputs (single-ended) of the PPF.                                                           | 75 |

| 3.16 | Post parasitic extracted error (deviation) from the desired eight equally spaced                                                                           |    |
|------|------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
|      | phase-shifted signals around 6.5 GHz shows significant error at the output of (a)                                                                          |    |
|      | the three-ring PPF, while (b) the error is substantially reduced through the 8P-8ILRO.                                                                     | 76 |
| 3.17 | Block-level schematic of the implemented eight-phase basis vector mapper                                                                                   | 77 |
| 3.18 | (a) An implementation of an asymmetric transformer-based series Doherty net-                                                                               |    |
|      | work with parasitic magnetic coupling, resulting in (b) efficiency enhancement                                                                             |    |
|      | close to 9.5 dB PBO when $k_p = 0$ , and the degraded performance in PBO as $k_p$                                                                          |    |
|      | increases.                                                                                                                                                 | 80 |
| 3.19 | (a) The implemented single footprint transformer-based series Doherty network us-                                                                          |    |
|      | ing the transformer-within-transformer technique with (b) very low parasitic cou-                                                                          |    |
|      | pling, resulting in (c) maximum passive efficiency of 70%, and (d) the desired                                                                             |    |
|      | input impedance for the Main and the Peak PA.                                                                                                              | 82 |
| 3.20 | Die photo of the transmitter realized in a 65 nm CMOS process                                                                                              | 82 |
| 3.21 | Measured phase noise of the injection locked ring oscillator for the entire locking                                                                        |    |
|      | range at a constant $V_{RO} = 1.01 \text{ V}$                                                                                                              | 85 |
| 3.22 | Measured frequency dependence of the implemented PA for (a) DE at maximum                                                                                  |    |
|      | output power and at power back-off when the peak PA is turned off, and (b) maxi-                                                                           |    |
|      | mum output power and gain, where the supply of the ring oscillator is varied with                                                                          |    |
|      | respect to frequency to achieve optimal phase noise performance                                                                                            | 85 |
| 3.23 | Measured drain efficiency (DE) and system efficiency (SE) of the implemented                                                                               |    |
|      | PA at (a) 6.25 GHz, (b) 6.50 GHz, (c) 6.75 GHz and (d) 7.00 GHz compared to                                                                                |    |
|      | normalized class B PAs                                                                                                                                     | 86 |
| 3.24 | Measured DE at maximum $P_{out}$ contour $(m + n = p)$ with respect to normalized                                                                          |    |
|      | output phase for all basis vector mapper code bits at 6.5 GHz                                                                                              | 87 |

| 3.25 | Measured AM-AM and AM-PM of the implemented transmitter for all basis vector mapper code bits at 6.5 GHz.                                                                                                                                      | 88  |
|------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 3.26 | Measured (a) 32-QAM and (b) 64-QAM constellations at maximum $P_{out}$ for increasing symbol rates at 6.5 GHz.                                                                                                                                 | 89  |
| 3.27 | Measured (a) 128-QAM constellation $EVM_{rms}$ and (b) ACLR for a 12.5 MSym/s waveform at 6 dB output power back-off at 6.5 GHz.                                                                                                               | 90  |
| 3.28 | Measured DE and $EVM_{rms}$ dependence on output power at 6.5 GHz for (a) 16-QAM, (b) 32-QAM, (c) 64-QAM constellations at 20 MSym/s, and (d) 128-QAM constellation at 12.5 MSym/s.                                                            | 91  |
| 3.29 | Far-out spectrum for a 20 MSym/s 8× oversampled 64 QAM waveform at 6.5 GHz showing the resulting zero-order hold (ZOH) sampling images                                                                                                         | 91  |
| 4.1  | Schematic of the stacked PA, showing its input and output matching networks, $R_b$ resistance for stability, and radial stubs for presenting broadband low impedance.                                                                          | 95  |
| 4.2  | Block-level schematic of a distributed active transformer (DAT) based power am-<br>plifier with the inter-stage matching network between the final stacked PA stage<br>and a differential driver stage.                                        | 97  |
| 4.3  | Die photos of (a) the single stacked PA, and (b) the DAT based PA implemented in Teledyne's 130 nm InP process.                                                                                                                                | 99  |
| 4.4  | <ul><li>(a) Small-signal measurements of the stacked PA showing a 3 dB bandwidth from</li><li>90 - 140 GHz, and (b) large signal power back-off measurements at 112.5 GHz</li><li>showing a peak 18.3% power added efficiency (PAE).</li></ul> | 99  |
| 4.5  | Large signal frequency dependence measurements of the stacked PA showing (a) more than 7% PAE, and (b) less than 3 dB ripple from 90-140 GHz.                                                                                                  | 100 |

| 4.6 | (a) Small-signal measurements of the DAT PA showing a 3 dB bandwidth from         |  |  |
|-----|-----------------------------------------------------------------------------------|--|--|
|     | 100 to 140 GHz, and (b) large signal frequency dependence showing $> 20$ dBm      |  |  |
|     | $P_{sat}$ and > 7% PAE from 115 to 130 GHz                                        |  |  |
| 4.7 | Large signal power back-off measurements at 120 GHz showing a peak 11.5%          |  |  |
|     | power added efficiency (PAE)                                                      |  |  |
| 4.8 | Multi-Gbps modulation measurement setup to study the behavior of a standalone     |  |  |
|     | D-band PA close to its saturation regime using commercial-off-the-shelf (COTS)    |  |  |
|     | equipment                                                                         |  |  |
| 4.9 | Down-converted and demodulated (a) 500 MSPS 16-QAM, and (b) 5 GSPS QPSK           |  |  |
|     | constellations, along with (c) output power and symbol-rate dependence on the     |  |  |
|     | $EVM_{rms}$ of various modulated signals through the single stage stacked PA at a |  |  |
|     | carrier frequency of 121 GHz                                                      |  |  |

# **List of Tables**

| 1.1 | Power Amplifier Class of Operation                                        | 6   |
|-----|---------------------------------------------------------------------------|-----|
| 2.1 | Comparison to CMOS PAs with PBO Efficiency Enhancement                    | 47  |
| 3.1 | Comparison with State-of-the-Art CMOS PAs Operating above 5 GHz           | 93  |
| 4.1 | Table of comparison for stand-alone PAs/Front-Ends at greater than 100 GHz | 103 |

# Chapter 1

# **Research Overview**

### **1.1 Problem Statement and Motivation**

The sub-7 GHz spectrum (450 MHz to 7 GHz) is widely used for a variety of wireless communication networks. Over the years, the use of this spectrum has evolved heavily to support increased connectivity and higher data-rates. For example, the first release of the 4G Long-Term Evolution (LTE) cellular network by the 3rd Generation Partnership Project (3GPP) in 2008 used only a limited set of frequency bands within the 700 MHz to 2.6 GHz spectrum [1]; however, since then, the LTE network has undergone numerous evolutions to include multiple frequency bands within the 450 MHz to 6 GHz spectrum [2]. Similarly, in 1997, IEEE 802.11 (precursor of Wi-Fi) was introduced to operate with relatively low data-rates around 2.4 GHz, and it was later extended by the Wi-Fi Alliance to include spectrum around 5 GHz as well [3].

To support various emerging applications such as smart home connectivity, augmented reality, and virtual reality, the 3GPP recently introduced the 5G New Radio (NR) cellular network, which includes about 1.5 GHz of spectrum around 3 GHz, through frequency re-farming [2]. Similarly, the Wi-Fi Alliance introduced Wi-Fi 6E to take advantage of the recently allocated 1200 MHz of spectrum around 6.5 GHz for unlicensed use [4]. In addition, other emerging applications such as



Figure 1.1: A future smart-city with wireless communication systems being used in a variety of emerging applications such as smart farming, inventory tracking, vehicular communications, smart home connectivity, and augmented and virtual reality.

smart farming, inventory tracking, industrial monitoring, etc. (Fig. 1.1), have an ever-increasing need for connectivity as well, which is expected to result in an exponential growth of Internet of Things (IoT) sensor nodes [5, 6]. In 2021, the Wi-Fi Alliance introduced the Wi-Fi HaLow network to operate around the 915 MHz industrial, scientific, and medical (ISM) band for such applications [7,8].

In the future, as the demand for connectivity and data-rates increases even further, it can be assumed that the sub-7 GHz spectrum will continue to be re-farmed to support those needs. However, all of these applications have always shared, and will continue to share, a common need for cost-effective devices with long battery life. For example, in the world of cellular and portable devices, the pressure to reduce the cost of integrated circuits in order to reduce the cost of products is well-known [9], and increasing battery life is crucial to increasing the usable time of the device. Similarly, in the world of IoT sensor nodes, with the projection of 30 billion devices being deployed worldwide by 2030,

the cost of each device needs to be extremely low to facilitate affordable sensor networks [10]. In addition, since it is not feasible to replace the batteries of such a large number of devices [11, 12], every device needs to operate on energy harvesting components, and its power dissipation needs to be sufficiently low to prevent its battery from being depleted.

A transmitter is one of the most important components of a wireless device since it directly affects both its cost and battery life. As explained later, transmitters typically contain multiple spiral structures that occupy appreciable amounts of area, which increases the cost of the integrated circuit. Additionally, the power dissipated in a transmitter is significantly higher compared to all other components of a wireless device, and thus determines the battery life. Therefore, this dissertation focuses on frequency scalable design techniques to realize compact and efficient wireless transmitters.

### **1.2 Prior Art**

#### **1.2.1** Quadrature Amplitude Modulation (QAM)

As discussed above, the sub-7 GHz spectrum is a limited resource. Therefore, most modern wireless communication systems aim to use spectrum efficiently by employing higher-order modulation schemes to achieve high data-rates. Quadrature amplitude modulation (QAM) is one of the most widely used modulation techniques, where data is encoded in symbols using both amplitude and phase modulation. Such a signal can be represented on an in-phase/quadrature constellation, where, at any given instant, the distance from the center represents the magnitude, while the angle with respect to the in-phase axis represents the phase. An example of a 16-QAM constellation is shown in Fig. 1.2(a), where each orange dot represents a data symbol. Since there are a total of 16 data symbols in the constellation, each symbol is the equivalent of transmitting $\log_2(16)$, i.e., four bits of data. This technique can be extended to M-QAM, where each symbol is equivalent to transmitting $\log_2(M)$ bits of information. In general, increasing the order of such a constellation



Figure 1.2: (a) A constellation of 16-QAM with root raised cosine filtering and (b) its corresponding time domain waveform exhibiting high peak-to-average power ratio (PAPR).

is desired, since it increases the number of bits represented by any given symbol, and thereby increases data-rate. Additionally, a root-raised cosine filter is applied to M-QAM constellations to allow for a smooth transition from one data symbol to the next, as this helps reduce the bandwidth of the transmitted signal and prevents corrupting the adjacent frequency channels. This smooth transition is shown using blue "spaghetti lines" in Fig. 1.2(a). The time domain equivalent of this filtered signal is shown in Fig. 1.2(b), and it illustrates that such a waveform exhibits a high peak-to-average power ratio (PAPR). It is important to note that such signals can be processed and amplified only using highly linear circuit blocks, as any non-linearity can compress the peaks of the waveform, leading to spectrum degradation, and potentially information degradation as well.
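The relationship between constellation order, bits per symbol, and peak-to-average power can be illustrated with a minimal sketch. It assumes an ideal square M-QAM grid with unit spacing and no pulse shaping; the function names are hypothetical and are not taken from this work. The values it produces are a lower bound, since the root-raised cosine filtering described above raises the PAPR of the time-domain waveform further.

```python
import math

def bits_per_symbol(m):
    """Bits carried by one symbol of an M-QAM constellation: log2(M)."""
    return int(math.log2(m))

def square_qam_points(m):
    """Ideal square M-QAM grid (assumption: M is a perfect square, e.g. 16, 64, 256)."""
    side = math.isqrt(m)
    if side * side != m:
        raise ValueError("this sketch only handles square constellations")
    levels = [2 * k - (side - 1) for k in range(side)]  # e.g. -3, -1, 1, 3 for 16-QAM
    return [complex(i, q) for i in levels for q in levels]

def constellation_papr_db(points):
    """PAPR of the symbol points alone (no pulse shaping), in dB."""
    powers = [abs(p) ** 2 for p in points]
    average = sum(powers) / len(powers)
    return 10 * math.log10(max(powers) / average)

for m in (16, 64, 256):
    papr = constellation_papr_db(square_qam_points(m))
    print(f"{m}-QAM: {bits_per_symbol(m)} bits/symbol, constellation PAPR {papr:.2f} dB")
```

Even without filtering, the corner symbols sit well above the average power, and the gap widens as the order increases.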

### 1.2.2 Traditional Analog Quadrature Transmitter Architecture

Traditionally, an analog quadrature transmitter is used to generate and amplify the M-QAM waveform at radio frequency (RF). The block-level architecture of such a transmitter is shown in Fig. 1.3(a), where a digital baseband generates the respective in-phase (I) and quadrature (Q) components, and these digital signals are then converted to their analog counterparts with a digital-to-analog converter (DAC). Next, these signals are up-converted to RF using a local oscillator (LO) and a mixer, and then they are summed to generate the desired waveform. The complex envelope S of the waveform is shown in Equation (1.1). Finally, this signal is amplified using a highly linear



Figure 1.3: A traditional analog quadrature transmitter using a quadrature mixer followed by a linear power amplifier to transmit an M-QAM waveform at RF.

power amplifier (PA) to generate the required output power  $(P_{out})$ .

$$S = I + jQ \tag{1.1}$$

Although this architecture is able to transmit the desired signal, it suffers from the following:

- **Poor efficiency:** The PA is the most crucial component of a wireless transmitter, as it is the most power-consuming block in the system. Therefore, designing PAs with high efficiency is critical, since they dictate the efficiency of the entire transmitter. The efficiency of the PA is determined by the class in which it operates, and Table 1.1 shows the theoretical maximum efficiencies of various PA classes. Since the signal needs linear amplification, higher classes of PAs are not readily compatible with the traditional analog quadrature transmitter architecture. This limits the maximum achievable efficiency, and therefore, the battery life of the device.
- **Large area:** The PA shown in Fig. 1.3(a) is typically composed of multiple gain stages and pre-amplifier stages to generate the desired output power. These amplifiers use matching networks in-between that are composed of inductors and transformers. Such a design significantly increases the area of the integrated circuit chip and increases the cost of the transmitter [9].

| Class       | Max. Efficiency       | Linearity  |
|-------------|-----------------------|------------|
| A           | 50%                   | Linear     |
| AB          | 50% to 78.5%          | Linear     |
| B           | 78.5%                 | Linear     |
| C           | > 78.5% ($P_{out} = 0$ at 100%) | Non-linear |
| D and above | 100%                  | Non-linear |

Table 1.1: Power Amplifier Class of Operation

- **Poor scalability:** Since the PA is implemented in an analog fashion and has numerous inductor and transformer structures, such a design is difficult to scale with process nodes. Also, there is typically a delay in the availability of RF models in the process design kit (PDK) compared to the digital models, which impedes the scalability of such a design [9].

### **1.2.3** Complementary Metal Oxide Semiconductor (CMOS) Digital Transmitters

In recent years, CMOS digital transmitters have garnered a significant amount of interest as they have the potential to overcome all of the above-mentioned limitations of their analog counterparts [13–19]. The key difference is replacing the traditional power amplifier with a digital power amplifier. Instead of linearly amplifying a high PAPR waveform, a digital PA is composed of multiple bit slices (or unit cells), where the output amplitude of the PA is varied by digitally controlling each bit slice on/off. This allows the drive signal for each unit cell to be a constant envelope signal that does not contain any amplitude modulation information. Since amplitude compression is no longer a concern, each unit cell can be implemented using a non-linear PA, such as Class D or  $D^{-1}$ , to improve efficiency. Further, unlike the analog PA, the inductors and transformers in pre-amplifier stages can be eliminated, since these stages can be implemented using non-linear NOT



Figure 1.4: (a) A digital polar transmitter using a COordinate Rotation DIgital Computer (CORDIC) block to convert the in-phase (I) and quadrature (Q) components to amplitude ( $\rho$ ) and phase ( $\phi$ ), where  $\phi$  gets up-converted to RF for producing (b) a constant envelope phase-shift keyed (PSK) signal, and  $\rho$  is fed to an amplitude decoder to control the PA unit cells, all to generate the desired (c) QAM output.

gates as they do not need to preserve amplitude information. And finally, such a design is amenable to scale with process nodes, since the PA in every unit cell is simply used as a non-linear switch.
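The idea of setting the output amplitude by switching unit cells on and off can be sketched as follows. This is a minimal illustration that assumes thermometer coding and an idealized PA whose output amplitude is proportional to the number of enabled cells; both are assumptions made here for clarity, the function names are hypothetical, and a practical decoder may use a different or segmented coding scheme.

```python
def thermometer_decode(code, num_cells):
    """Map an amplitude code to per-cell enable bits (first `code` cells on)."""
    if not 0 <= code <= num_cells:
        raise ValueError("code must lie between 0 and num_cells")
    return [1 if k < code else 0 for k in range(num_cells)]

def normalized_amplitude(enables):
    """Idealized output amplitude: fraction of unit cells that are switched on."""
    return sum(enables) / len(enables)

enables = thermometer_decode(3, 8)
print(enables, normalized_amplitude(enables))
```

Each enabled cell is driven by the same constant envelope signal, so the amplitude information lives entirely in the enable bits rather than in the RF drive.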

To produce the desired M-QAM waveform at the output of a digital transmitter, one of the following techniques is popularly used: a) Polar [15, 16, 20–22] or b) Quadrature [23–27].

#### **Digital Polar Transmitters**

The block-level architecture of a digital polar transmitter is shown in Fig. 1.4(a). Here, the in-phase (I) and the quadrature (Q) components are used to generate amplitude ( $\rho$ ) and phase ( $\phi$ ) information using a COordinate Rotation DIgital Computer (CORDIC) block.  $\phi$  is fed to an RF modulator to generate a constant envelope phase-shift keyed (PSK) signal, as shown in Fig. 1.4(b),



Figure 1.5: The non-linear inverse tangent operation used to generate a constant envelope phase shift keyed (PSK) RF signal leads to about  $5 \times$  bandwidth expansion compared to the desired QAM signal.

while  $\rho$  is fed to an amplitude decoder to generate the control signals for the PA bit slices. The unit cells combine the information from  $\phi$  and  $\rho$  to produce the desired M-QAM output, as shown in Fig. 1.4(c).

$$\rho = \sqrt{I^2 + Q^2} \tag{1.2}$$

$$\phi = \tan^{-1}\left(\frac{Q}{I}\right) \tag{1.3}$$

$$S = \rho e^{j\phi} \tag{1.4}$$
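Equations (1.2) to (1.4) can be illustrated with a floating-point sketch of CORDIC vectoring mode. This is only an illustration under stated assumptions: the function name, the iteration count, and the 180° pre-rotation used to obtain a four-quadrant phase are choices made here, not details of the CORDIC block used in this work, and a hardware CORDIC would use fixed-point shifts and adds instead of floating-point arithmetic.

```python
import cmath
import math

def cordic_vectoring(i, q, iterations=16):
    """Return (rho, phi) for the point (i, q) using CORDIC vectoring mode."""
    x, y, phi = float(i), float(q), 0.0
    # Pre-rotate by 180 degrees so the vector starts in the right half-plane,
    # where the micro-rotations below converge.
    if x < 0:
        phi = math.pi if y >= 0 else -math.pi
        x, y = -x, -y
    gain = 1.0
    for k in range(iterations):
        step = 2.0 ** -k
        d = -1.0 if y > 0 else 1.0  # rotate toward the in-phase axis
        x, y = x - d * y * step, y + d * x * step
        phi -= d * math.atan(step)  # accumulate the angle that was removed
        gain *= math.sqrt(1.0 + step * step)  # each micro-rotation stretches the vector
    return x / gain, phi

rho, phi = cordic_vectoring(-3.0, 1.0)
s = cmath.rect(rho, phi)  # Equation (1.4): S = rho * exp(j * phi)
print(rho, phi, s)
```

Reconstructing `s` from `rho` and `phi` returns the original I + jQ point to within the angular resolution set by the number of iterations.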

In practice, it is challenging to achieve high data-rates with a compact form-factor using the polar digital transmitter. As shown in Equation (1.3), the constant envelope PSK RF signal is generated through an inverse tangent operation. Due to the non-linear nature of this mathematical operation, the original signal (S) undergoes a massive bandwidth expansion of about  $5\times$, as illustrated in Fig. 1.5. This implies that the RF modulator circuit block, which generates the PSK signal, needs to attain a bandwidth that is  $5\times$  higher than the data bandwidth, which eventually limits the achievable bandwidth of the overall transmitter. Also, a Doherty PA, explained in Sections



Figure 1.6: A digital quadrature transmitter using only quadrature (I and Q) continuous wave (CW) input tones, eliminating the need for a CORDIC block, thereby achieving higher data-rates compared to a digital polar transmitter. The desired M-QAM waveform is produced at the output through the weighted summation of I and Q components.

1.2.4 and 2.3, requires quadrature signals to improve efficiency, and in a polar implementation, the block generating these quadrature signals also needs to be wideband. Therefore, these networks require the use of inductor- or transformer-based passive input networks that are large and increase the on-chip area.

#### **Digital Quadrature Transmitter**

The block-level architecture of a digital quadrature transmitter is shown in Fig. 1.6. Here, the digital PA is divided into two groups, one for the in-phase component, and the other for the quadrature component. The desired output M-QAM waveform is generated through the weighted summation of I and Q components that are controlled by an amplitude decoder. Unlike the digital polar transmitter, this architecture does not undergo bandwidth expansion, as it requires only continuous wave (CW) tones to be fed to the two PA groups. Therefore, it is amenable to high data-rate applications. Further, the quadrature signals can be generated using narrow-band active blocks that are



Figure 1.7: Comparing the efficiency degradation as the desired output phase is varied, where the polar architecture achieves maximum efficiency for all phases, while the quadrature architecture achieves maximum efficiency only at the basis quadrature vectors, with worst-case performance around  $45^{\circ}$ .

compact in size, since these signals are purely CW.

However, this architecture suffers from degraded efficiency when the desired output phase sweeps away from the basis I and Q components. Section 3.2.1 explains in detail that the weighted summation of I and Q components leads to degraded efficiency, with the worst-case efficiency occurring around 45°, as shown in Fig. 1.7. Therefore, while transmitting an M-QAM waveform, the overall efficiency of the transmitter degrades, which limits the improvement of battery life.
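The geometric origin of this penalty can be sketched with simple vector arithmetic. The snippet below assumes ideal unit basis vectors and uses the weights $m$ and $n$ with $m + n = p$ from the figure captions of Chapter 3; it computes only the combined output amplitude relative to a single basis vector, not the efficiency itself, which depends on circuit details covered in Section 3.2.1. The function name and the `basis_angle_deg` parameter are hypothetical.

```python
import cmath
import math

def combined_amplitude(m, n, basis_angle_deg=90.0):
    """Amplitude of m*v1 + n*v2 for unit basis vectors, normalized to m + n."""
    second_basis = cmath.rect(1.0, math.radians(basis_angle_deg))
    return abs(m + n * second_basis) / (m + n)

# Quadrature basis vectors (90 degrees apart), equal weights: output phase of 45 degrees.
print(combined_amplitude(8, 8))
# Basis vectors 45 degrees apart, as in an eight-phase arrangement.
print(combined_amplitude(8, 8, basis_angle_deg=45.0))
# Operation along a single basis vector.
print(combined_amplitude(16, 0))
```

With equal weights, quadrature combining delivers about 0.707 of the amplitude available along a basis vector, while basis vectors spaced 45° apart deliver about 0.924, which is consistent with the worst case occurring midway between the basis vectors.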

Thus, there is a need for a digital transmitter architecture that does not exhibit the massive bandwidth expansion of a polar transmitter or the degraded efficiency profile of a quadrature transmitter.



Figure 1.8: The probability density function of a 64-QAM waveform, with a root-raised cosine filter coefficient  $\alpha = 0.35$ , in power back-off (PBO), compared with the efficiency of a normalized class B and a traditional Doherty PA in PBO.

### 1.2.4 M-QAM Probability Density Function and PA Efficiency in Power Back-Off (PBO)

For transmitters producing an M-QAM waveform, one of the most important PA metrics is its efficiency in power back-off (PBO). Consider a 64-QAM constellation with a root-raised cosine filter coefficient  $\alpha = 0.35$, whose probability density function is shown in Fig. 1.8. It can be noticed that the probability peak occurs at about 12 dB PBO, illustrating the high PAPR nature of this waveform. Therefore, to improve the efficiency of the transmitter, high efficiency in PBO is crucial. The theoretical maximum efficiency, shown in Table 1.1, is only achieved when the PA is driven to its maximum  $P_{out}$  capacity. As the output power of the PA is reduced, the load impedance seen by the PA is no longer optimal, leading to drastic efficiency degradation, as presented for a normalized class B PA case in Fig. 1.8. The performance in PBO for higher class PAs such as class D or  $D^{-1}$  is similar [28]. Therefore, in practical applications, the average efficiency of the transmitter is dramatically reduced even for digital power amplifiers employing higher classes of PA unit cells.

One of the most popular techniques to improve performance in PBO is using a Doherty network; however, the classical Doherty PA offers improvement only up to 6 dB in PBO [21, 22, 29], as

shown in Fig. 1.8. With the increasing use of higher order QAMs, PAs with efficiency improvement in deep PBO (up to 9 dB to 12 dB PBO) are desired [30–32]. Power supply switching techniques, such as class G Doherty, have been demonstrated in the literature [20, 25, 33, 34] to offer improved performance in deep PBO; however, these techniques need multiple power supplies and/or a complex power management circuit with modulators capable of handling wide dynamic output currents. A nested Doherty architecture offers a potential solution, but its matching network consists of multiple inductors, which require a tremendous amount of area, and thereby increase the cost of the chip.
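The back-off behavior discussed above can be reproduced with the textbook expressions for an ideal class B PA and an ideal symmetric two-way Doherty PA. The sketch below assumes lossless, ideal devices, so it is not a model of the PAs implemented in this work; the function names are hypothetical. Here $v$ denotes the output voltage normalized to its maximum, so that PBO in dB equals $-20\log_{10}(v)$.

```python
import math

CLASS_B_PEAK = math.pi / 4  # 78.5% theoretical maximum, as listed in Table 1.1

def class_b_efficiency(pbo_db):
    """Ideal class B: efficiency falls linearly with normalized output voltage."""
    v = 10 ** (-pbo_db / 20)
    return CLASS_B_PEAK * v

def doherty_efficiency(pbo_db):
    """Ideal symmetric two-way Doherty with class B main and peaking amplifiers."""
    v = 10 ** (-pbo_db / 20)
    if v <= 0.5:
        # Below 6 dB PBO only the main amplifier conducts, into twice the load.
        return CLASS_B_PEAK * 2 * v
    # Between 6 dB PBO and peak power both amplifiers contribute.
    return (math.pi / 2) * v * v / (3 * v - 1)

for pbo in (0, 6, 9, 12):
    print(f"{pbo:>2} dB PBO: class B {class_b_efficiency(pbo):.3f}, "
          f"Doherty {doherty_efficiency(pbo):.3f}")
```

Under these assumptions the Doherty curve returns to the class B peak at 6 dB PBO, but at 12 dB PBO, where the probability density of the 64-QAM waveform peaks, both curves have already fallen well below that peak, which motivates efficiency enhancement in deeper PBO.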

Thus, there is a need for compact and high efficiency PAs with improved performance in deep PBO.

### 1.2.5 Synergistic Work: Wideband Indium Phosphide (InP) PAs in F-band (90 GHz - 140 GHz)

In addition to the work done at sub-7 GHz in digital power amplifiers and digital transmitters, this dissertation also showcases power amplifiers in F-band for 6G-type applications, such as holographic telepresence, tactile internet, and robotic surgeries [35]. The required data-rates for such applications are anticipated to be so high that they will be challenging to achieve in the sub-7 GHz spectrum and the mm-wave 5G spectrum (24 GHz to 30 GHz). One potential solution is to increase the operating frequency to around 100 GHz to exploit the vast amounts of available spectrum [36]. In 2018, the European Conference of Postal and Telecommunications Administrations (CEPT) Electronic Communications Committee (ECC) recommended the use of the 92-114.5 GHz and 130-174.8 GHz bands for back-haul and front-haul [37]. In 2019, the US Federal Communications Commission (FCC) opened 21 GHz of unlicensed spectrum above 100 GHz for 6G experimentation. Along with wireless communication, F-band is also useful for THz imaging, positioning, and sensing, all of which also rely on wireless circuit blocks [36]. Therefore, research interest in F-band has been gaining momentum over the past few years.

Although CMOS offers numerous advantages for designing digital PAs in the sub-7 GHz spectrum, it is challenging to design high efficiency and high output power PAs in F-band due to its limited  $f_t/f_{max}$ and limited breakdown voltage. Instead, the heterojunction bipolar transistors (HBTs) offered in the Indium Phosphide (InP) process have garnered interest for such applications [38–42]. At these frequencies, low-loss power combining techniques are crucial to achieve high efficiency and high  $P_{out}$  PAs. However, the techniques explored in the literature either sacrifice efficiency to achieve increased output power and bandwidth [38, 39], or sacrifice bandwidth to improve efficiency and output power [40, 41].

Another important aspect of F-band InP PAs is measuring them with modulated waveforms. It is critical to test these PAs close to their saturation region with 6G-type gigabits-per-second (Gbps) modulated signals to understand the PA behavior in practical use. However, the literature does not hold any readily implementable solutions to accomplish this measurement.

Thus, there is a need for wideband, high efficiency, and high output power InP PAs in F-band, along with a modulation measurement setup that can drive these standalone PAs close to their saturation regime using commercial-off-the-shelf (COTS) equipment.

### **1.3** Dissertation Organization

#### **1.3.1** Thesis Statement

In recent years, digitally implemented power amplifier and transmitter architectures have garnered a significant amount of interest as they are amenable to scale with process nodes, and they have the potential to achieve improved battery life while being cost-effective. However, in practical applications, the battery life improvement is severely limited, as these architectures suffer from poor efficiency in output power back-off. Further, the two well-known digital transmitter implementations, polar and quadrature, have their own limitations. The polar implementation suffers from wide bandwidth expansion that limits the overall data-rate of the transmitter, while the quadrature implementation exhibits degraded efficiency due to in-phase/quadrature basis vector combination, which further limits the improvement to battery life.

This dissertation proposes and implements two techniques, a nested Doherty architecture and a transformer-within-transformer Doherty architecture, to improve efficiency in deep power back-off, while still achieving a compact form-factor to realize cost-effective transmitters. In addition, a multi-phase current-mode implementation of a digital power amplifier is also proposed and implemented to achieve higher data-rates, while also improving overall efficiency, to improve the battery life of the transmitter even further.

The techniques mentioned above are demonstrated using two different integrated circuit chips:

1) A four-way asymmetric digital polar power amplifier showcases an implementation of a four-way nested Doherty architecture to achieve efficiency enhancement up to 9 dB PBO, while maintaining a compact form-factor, for IoT-type applications around 5 GHz.

2) An asymmetric current-mode eight-phase digital Doherty transmitter using a single footprint transformer showcases an implementation of a transformer-within-transformer Doherty architecture to achieve efficiency enhancement up to 9.5 dB PBO, while also realizing a compact form-factor. Additionally, it highlights an eight-phase architecture implementation to improve the efficiency profile compared to its quadrature counterpart, for high data-rate applications around 6.5 GHz, such as WiFi 6E.

#### **1.3.2 Research Questions**

The dissertation will focus on the following research questions:

- **Research Question 1.** Typical nested Doherty transmitter architectures require numerous inductors that increase the area of the chip. How can a nested Doherty architecture be implemented with fewer inductors to reduce chip size, thereby making it cost-effective?
- **Research Question 2.** Although a transformer-within-transformer technique has the potential to facilitate a compact form-factor, and thereby realize cost-effective transmitters, what are the implications of one transformer magnetically coupling with the other? How can this coupling be minimized?

- **Research Question 3.** Given that a multi-phase implementation of a current-mode digital transmitter improves the overall efficiency, and thereby improves battery life, how many basis vectors should be utilized in a practical design, and how do they affect the efficiency profile?
- **Research Question 4.** Typically, an even integer multiple of the operating frequency is provided to the transmitter, so it can be divided down to generate multi-phases on-chip. However, this technique becomes increasingly cumbersome as the operating frequency is increased and the number of basis vectors is increased. What circuit techniques can be utilized for a multi-phase generation with only the operating frequency being provided to the transmitter?

#### **1.3.3 Research Contributions**

The research contributions of this dissertation and a summary of the answers to the research questions mentioned above are listed below:

- Compact nested Doherty architecture: A nested Doherty architecture requires multiple quarter-wave transmission lines that are traditionally implemented using a physical transmission line model (π network with inductor-capacitor-inductor), which results in a significant number of inductors that are not practical to realize on-chip. This dissertation proposes the use of an inverted transmission line model (π network with capacitor-inductor-capacitor), which leads to a considerable number of inductors being placed in parallel that can be easily consolidated, resulting in a compact form-factor. An implementation of the proposed technique in a general-purpose 65 nm process, shown in Chapter 2, results in a measured 42% peak drain efficiency at 5.25 GHz with 1.6× improvement at 9 dB PBO compared to a normalized class B PA.

- Design strategy for a transformer-based asymmetric Doherty architecture: This dissertation introduces a design strategy for the implementation of a two-way asymmetric series Doherty network operating at any desired frequency and achieving efficiency enhancement at any desired back-off level, as shown in Section 3.2.4. This strategy is used as the starting point for designing the proposed transformer-within-transformer architecture.
- Single footprint transformer-based asymmetric Doherty architecture: A transformer-within-transformer technique is proposed to achieve a compact form-factor for an asymmetric Doherty architecture. In order to prevent unwanted magnetic coupling between the two transformers, one of the transformers is twisted into a Figure-8 shape and inserted into a non-twisted transformer. The resulting magnetic fluxes in the two "octagons" of the Figure-8 are equal in magnitude but opposite in direction; therefore, the net current induced in the non-twisted loops by the Figure-8 loops is close to zero. An implementation of the proposed technique in a general-purpose 65 nm process, shown in Chapter 3, results in a measured 33% peak drain efficiency at 6.5 GHz with 1.76× improvement at 8 dB PBO compared to a normalized class B PA.
- Multi-phase current-mode transmitter architecture: A multi-phase digital architecture is proposed to overcome the limitation of the massive bandwidth expansion associated with the polar design, while also improving the efficiency profile degradation associated with the quadrature digital architecture by $1.64 \times$, as shown in Fig. 3.4. An eight-phase implementation in a general-purpose 65 nm process, shown in Chapter 3, results in a peak drain efficiency of 33%, and a relatively high worst-case efficiency of 25% that corresponds to only a $0.24 \times$ reduction, compared to a $0.45 \times$ reduction associated with an idealized quadrature architecture.
- Wideband multi-phase generation for multi-phase transmitter: Traditionally, flip-flop based dividers are used to generate multi-phases for wideband operation. However, generating eight equally spaced phases around 6.5 GHz using this technique results in an input frequency of 26 GHz. To prevent the requirement of this 4× operating frequency, a 3-ring polyphase filter that is injection locked to an eight-stage ring oscillator is proposed in this work. The proposed design requires only the operating frequency as the input frequency. An implementation of the proposed technique in a general-purpose 65 nm process, shown in Chapter 3, results in more than 1 GHz of locking range around the operating frequency of 6.5 GHz.

#### **1.3.4** Dissertation Organization

The rest of the dissertation is organized as follows:

#### **Chapter 2: Design of a CMOS Nested Doherty Digital Power Amplifier for Low-Power Applications**

This chapter presents a rigorous analysis to show the improvement in system efficiency for a nested Doherty architecture over the popular class G Doherty architecture for low-power applications. This chapter also presents the design of an N-way current-combined asymmetric Doherty PA to improve efficiency in deep back-off power levels, and it illustrates design techniques to reduce the overall size of the PA by consolidating components in the input and output matching networks.

#### **Chapter 3: Design of an Asymmetric Current-Mode Multi-phase Digital Doherty Transmitter Using a Single Footprint Transformer-Based Matching Network**

This chapter explores the effect of increasing the number of basis vectors of a multi-phase architecture to improve the efficiency profile with respect to the output phase. It also presents a wideband multi-phase generation technique with only the operating frequency as the input to the transmitter. Further, it rigorously derives design equations for implementing an asymmetric transformer-based series Doherty matching network at any desired frequency to achieve efficiency enhancement at any desired back-off level. Finally, it illustrates a transformer-within-transformer technique, where one of the transformers is twisted and inserted into a non-twisted transformer for magnetic flux cancellation, thereby allowing for a compact matching network.

#### **Chapter 4: Wideband InP PAs in F-band with Modulation Measurements**

This chapter presents the design of a stacked PA in Teledyne's 130 nm InP process to achieve wide bandwidth and high efficiency in F-band. The stacked design is used as the basis for a distributed active transformer-based PA to achieve wide bandwidth and high $P_{out}$. Finally, it illustrates a frequency and data-rate scalable modulation measurement setup using only readily available equipment, and it demonstrates state-of-the-art data-rates on a stand-alone PA operating above 100 GHz with $P_{out}$ above 10 dBm.

#### **Chapter 5: Conclusions, Considerations for Designing Digital Transmitters, Future Directions, and Other Work**

This chapter concludes the dissertation by presenting a methodology for designing a digital transmitter. Additionally, it includes a discussion of future directions related to this work, and it also describes other work that was not part of this dissertation.

# **1.4 List of Publications**

[JS1] J. Sheth, L. Zhang, X. Shen, V. Iyer and S.M. Bowers, "An Asymmetric Current-Mode Multi-Phase Digital Doherty Transmitter Using a Single Footprint transformer-based Matching Network," to be submitted to *IEEE Open Journal of the Solid-State Circuits Society (OJ-SSCS)*.

[JS2] J. Sheth and S. M. Bowers, "A Four-Way Nested Digital Doherty Power Amplifier for Low-Power Applications," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 69, no. 6, pp. 2782-2794, June 2021, doi: 10.1109/TMTT.2021.3057895.

[JS3] J. Sheth and S. M. Bowers, "A Differential Digital 4-Way Doherty Power Amplifier with 48% Peak Drain Efficiency for Low Power Applications," *2020 IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, 2020, pp. 119-122, doi: 10.1109/RFIC49505.2020.9218395.

[JS4] L. Zhang, V. Iyer, J. Sheth, L. Xie, R. Weikle and S.M. Bowers, "An F-band DAT Based Power Amplifier in InP 130 nm HBT Technology," to be submitted to *IEEE Transactions on Tera*-

[JS5] V. Iyer, J. Sheth, L. Zhang, R. Weikle and S.M. Bowers, "A 15.3 dBm, 18.3% PAE F-band Power Amplifier in 130 nm InP HBT with Modulation Measurements," accepted in *IEEE Microwave and Wireless Technology Letters*.

[JS6] V. Iyer, J. Sheth, L. Zhang, R. M. Weikle and S.M. Bowers, "A 90-125 GHz Stacked PA in 130 nm InP HBT with 18.3 % peak PAE at 15.3 dBm Output Power," *2022 United States National Committee of URSI National Radio Science Meeting (USNC-URSI NRSM)*, 2022, pp. 224-225, doi: 10.23919/USNC-URSINRSM57467.2022.9881401.

[JS7] L. Zhang, V. Iyer, J. Sheth, L. Xie, R. M. Weikle and S. M. Bowers, "A 117.5-130 GHz 22.1 dBm 11.5% PAE DAT Based Power Amplifier in InP 130 nm HBT Technology," *2021 16th European Microwave Integrated Circuits Conference (EuMIC)*, 2022, pp. 229-232, doi: 10.23919/EuMIC50153.2022.9783740.

[JS8] D. Duvvuri, J. Sheth, S. Bhattacharya, S. Hanifi, B. H. Calhoun and S. M. Bowers, "A 5 $\mu$W -80 dBm Multi-channel Tuned RF Wake-up Receiver for SHF Band Applications," *currently being measured*.

[JS9] V. Iyer, C. Moore, J. Sheth, S.M. Bowers and R. M. Weikle, "A 600 GHz Phase Conjugation System in 250 nm InP Process," *currently being measured*.

# **Chapter 2**

# Design of a CMOS Nested Doherty Digital Power Amplifier for Low-Power Applications

## 2.1 Introduction, Motivation, and Prior Art

As mentioned in Chapter 1, emerging applications, such as agriculture monitoring, inventory tracking, and smart home connectivity require ultra low-power ad-hoc sensor networks [6]. These networks are composed of transceiver nodes that can benefit from a compact form-factor. Since antennas dominate the overall volume of such nodes, increasing the frequency of operation is desirable to reduce the antenna size. However, reducing the volume of these nodes limits the battery capacity, which in turn affects their lifetime. Since the PA is the most power-hungry component of this system, there is a need for high-efficiency PAs for low-power applications (< 10 dBm). Also, the nodes in these ad-hoc networks can form asymmetric links with the base-station, where the use of higher-order modulation schemes can reduce the energy consumption per bit by keeping the PA turned on for a shorter period of time [43]. Such nodes can benefit from PAs with high efficiency in deep PBO to further extend their lifetime. Therefore, a four-way nested Doherty PA operating around 5 GHz is proposed for such applications. However, it is important to note that the nested Doherty technique presented in this chapter is frequency and output power scalable, and not just limited to low-power devices at 5 GHz.

Figure 2.1: Extension of a generic (a) 2-way Doherty architecture to (b) 3-way, and to (c) n-way through nesting, which results in (d) 2, 3, and n efficiency enhancement peaks, respectively, in power back-off (PBO).

Current state-of-the-art low power PAs are implemented digitally [43, 44], as this facilitates the integration of the PA into the transmitter. Typically, they use advanced topologies such as class D [17, 45, 46] and class E/F<sub>2</sub> [47, 48] to improve the efficiency at maximum output power, but the efficiency deteriorates in PBO. To improve the performance of a digital PA [13, 15, 16, 49, 50] in back-off, it can be used in conjunction with the popular Doherty architecture [29, 51]. However, it offers improvement to only about 6 dB PBO [21, 22, 52]. A class G technique [34], applied to the Doherty architecture, has been shown in the literature to boost the performance of PAs in deep PBO [20, 24, 33] for high-power applications (> 25 dBm). However, it relies on switching the $V_{DD}$ of the PA to a lower value to achieve efficiency enhancement. This technique is challenging to scale down to PAs with low output power (< 10 dBm) operating at higher frequency ($\sim$ 5 GHz), as described in Section 2.2.

Figure 2.2: Digital implementation of a power amplifier (PA) block consists of multiple unit cells, where each unit cell is composed of a set of digital buffers and a common source output stage.

In the past, the use of multi-way analog Doherty architectures [30, 53] has been limited due to linearity concerns of the over-driven main amplifier. The main amplifier saturates in deep back-off when the input swing is relatively small, and when the input swing is increased to saturate the auxiliary amplifiers, the main amplifier is driven into deep saturation, causing non-linearities [54]. Recent digital implementations of multi-way combiner based Doherty architectures, such as the Nested Doherty PA illustrated in Fig. 2.1, have overcome the linearity concern related to the over-driven main amplifier since they require the input swing to remain constant, and they have been shown in the literature to offer improved efficiency in deep PBO [23, 32]. In this work, a differential four-way digital Doherty PA (D4DPA) is proposed that can theoretically maintain optimal matching for enhanced efficiency up to 9 dB PBO. It achieves a maximum $P_{out}$ of 7.3 dBm at 4.75 GHz with peak drain efficiency (DE) of 48% and system efficiency (SE) of 31%. It also achieves a 2.2× DE improvement compared to a normalized class B PA at 12.8 dB PBO at 5.25 GHz, and a DE of 34% for a 1 MSym/s 16 QAM RF waveform.

The rest of this chapter is divided into the following sections. Section 2.2 provides an in-depth analysis of the advantages offered by the nested architecture for current-mode digital PAs with low output power. The design of the D4DPA is discussed in Section 2.3 with the help of an asymmetric Doherty architecture that achieves efficiency enhancement at an arbitrary back-off level. The implementation and simulations of the PA along with the matching networks and quadrature hybrid are presented in Section 2.4. CW and modulation measurements are shown in Section 2.5, and conclusions are drawn in Section 2.6.

# 2.2 Nested Doherty Architecture for Low Output Power Digital PAs

Presenting an optimal load to the PA is crucial to achieve maximum voltage swing on the transistor's drain, and thus maximum efficiency. For a class B PA, the presented impedance is optimal only at the maximum output power. As the output power decreases, the voltage swing also decreases, leading to degradation in efficiency. The traditional Doherty architecture overcomes this challenge and improves efficiency through 6 dB PBO. This architecture is composed of a main amplifier, a peaking amplifier, and an impedance inverter. At maximum output power, the main and the peaking amplifier work collectively to present the optimal impedance. At 6 dB PBO, the peaking amplifier turns off completely, and the inverting network doubles the impedance presented to the main amplifier. This re-maximizes the voltage swing and leads to efficiency enhancement, as shown in Fig. 2.1(d). Thus, such an architecture leads to 2 efficiency enhancement peaks; however, the efficiency degrades beyond the second peak.

To improve performance in deep PBO, multiple PA blocks and multiple inverter networks can be used to form a nested Doherty architecture, as depicted in Fig. 2.1(c). As the amplifier is pushed into PBO, the PA blocks start turning off sequentially from PA<sub>n</sub> to PA<sub>2</sub>. With each PA block turning off, the efficiency of the amplifier ideally returns to its maximum value due to the optimal impedance presented by the inverter networks. Thus, an n-way architecture with n PA blocks and n - 1 inverter networks produces n efficiency enhancement peaks, as shown in Fig. 2.1(d).
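As a rough illustration of why each additional peak matters, the sketch below evaluates the textbook ideal class B drain efficiency, which falls in proportion to the output voltage amplitude, at the back-off levels where an n-way nested Doherty would ideally restore peak efficiency. The uniform peak spacing (`step_db`) and all function names are assumptions made for illustration and are not taken from the analysis in this chapter.

```python
import math

CLASS_B_PEAK = math.pi / 4  # ideal class B drain efficiency at full swing (~78.5%)

def class_b_efficiency(pbo_db, eta_peak=CLASS_B_PEAK):
    """Ideal class B drain efficiency at a power back-off of pbo_db (dB, >= 0)."""
    # The output voltage amplitude scales as 10^(-PBO/20), and the ideal
    # class B efficiency is proportional to that amplitude.
    return eta_peak * 10 ** (-pbo_db / 20)

def doherty_peak_backoffs(n, step_db):
    """Back-off levels (dB) of the n efficiency peaks, ASSUMING uniform spacing."""
    return [k * step_db for k in range(n)]

def ideal_enhancement(pbo_db):
    """Restored peak efficiency divided by class B efficiency at the same back-off."""
    return CLASS_B_PEAK / class_b_efficiency(pbo_db)

if __name__ == "__main__":
    # Hypothetical example: four peaks spaced 3 dB apart.
    for pbo in doherty_peak_backoffs(4, 3.0):
        print(f"{pbo:4.1f} dB PBO: class B {class_b_efficiency(pbo):.3f}, "
              f"ideal enhancement x{ideal_enhancement(pbo):.2f}")
```

Under these idealized assumptions, the enhancement available at a peak grows with the depth of back-off, which is why extending the last peak deeper into PBO is attractive for waveforms that spend most of their time well below maximum power.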

The nested Doherty architecture, for this work, is implemented using current-mode digital PA blocks. Current-mode digital PAs [14, 18, 49, 55], and current-mode digital Doherty PAs [21] have been demonstrated to operate at high frequencies (> 3.5 GHz) with some of the highest reported drain efficiencies at high output powers in the literature. They have the potential to offer some of the same advantages for low output power PAs as well [45, 47]. Such a digital PA consists of multiple unit cells, as shown in Fig. 2.2. Each unit cell is composed of a final output stage that is usually driven by a set of digital buffers. Ideally, the power lost in the buffers would be relatively low compared to the power lost in the output stage to achieve high system efficiency. However, as frequency increases, the power lost in these buffers also increases linearly as $CV^2f$. Additionally, for low output powers, the PAs typically employ lower $V_{DD}$s (< 1 V) to improve drain efficiency, but this limits their voltage gain, which further increases the relative power dissipated in the buffers. Therefore, the power lost in the buffers cannot be ignored for the proposed applications.

Figure 2.3: (a) A generic n-way class G Doherty architecture with $n - 2 V_{DD}$s resulting in n efficiency enhancement peaks, and (b) desired current drive strengths of the main and the peaking amplifier for case n = 3 and $\alpha_1$ = 2.
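The $CV^2f$ scaling of the buffer power can be made concrete with a few lines of arithmetic. The sketch below is only illustrative: the switched capacitance, buffer supply, and output-stage power are hypothetical placeholders chosen to show the trend, not values from this design.

```python
def buffer_dynamic_power(c_switched, v_supply, f_op):
    """Dynamic power C * V^2 * f of the digital buffers, in watts."""
    return c_switched * v_supply ** 2 * f_op

def buffer_share(p_buffer, p_output_stage):
    """Fraction of the total DC power that is dissipated in the buffers."""
    return p_buffer / (p_buffer + p_output_stage)

# All three values below are hypothetical placeholders, not design data.
C_SWITCHED = 1e-12      # total switched buffer capacitance, farads
V_BUFFER = 1.0          # buffer supply voltage, volts
P_OUTPUT_STAGE = 10e-3  # DC power drawn by the output stage, watts

if __name__ == "__main__":
    for f_op in (2.5e9, 5e9):
        p_buf = buffer_dynamic_power(C_SWITCHED, V_BUFFER, f_op)
        print(f"{f_op / 1e9:.1f} GHz: buffers {p_buf * 1e3:.2f} mW, "
              f"share {buffer_share(p_buf, P_OUTPUT_STAGE):.0%}")
```

Doubling the operating frequency doubles the buffer power, so for a fixed output-stage power the buffer share of the DC budget rises, which is the effect the paragraph above describes.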

The buffer power dissipation becomes especially crucial in deep PBO, and the use of a nested Doherty architecture can help improve system efficiency, especially in comparison to other efficient PBO architectures, including the $V_{DD}$ switching class G Doherty PA shown in Fig. 2.3. This improvement, particularly in lower output power applications such as IoT nodes, can be highlighted using two different models with increasing accuracy: 1) ideal transistor model, where switching the $V_{DD}$ does not affect the current drive of the output stage, and 2) drain-source voltage ($V_{ds}$) dependent transistor model, where the current drive is affected due to non-linearity of the transistors. Both scenarios assume ideal lossless matching networks, and they both neglect the additional power lost due to the switches of the class G Doherty architecture.

Figure 2.4: Comparison of nested Doherty with class G Doherty architecture in power back-off using the ideal and the $V_{ds}$ dependent transistor model. (a) Power dissipation trend in the output stage and buffers. (b) Drain efficiency of the output stage and system efficiency trend of the whole PA.

#### 2.2.1 Ideal Transistor Model

For the nested Doherty architecture, as each PA block is turned off, the digital buffers within that PA block can be turned off as well. Thus, the power lost in the buffers is reduced proportionally to the reduction in the output power, as illustrated in Fig. 2.4(a). Therefore, when the drain efficiency enhancement is achieved, the system efficiency also returns to its maximum value, as shown in Fig. 2.4(b).

In contrast, the system efficiency of class G digital Doherty architecture degrades in back-off. This architecture operates by switching its supply to a lower  $V_{DD}$  in deep PBO, as shown in Fig. 2.3(a).



Figure 2.5: Improvement offered by nested Doherty PA over class G Doherty PA dependent on the relative buffer power consumption at 9 dB PBO. Losses in the OMN and the switches of class G are ignored.

However, to satisfy the Doherty behavior and achieve drain efficiency enhancement, the power lost in the digital buffers is not reduced proportionally to the reduction in output power. For example, a simple implementation of a class G Doherty PA, depicted in Fig. 2.3(b), shows that when the output power is reduced by 6 dB, the main and the peaking PA require half their maximum currents. This implies that half the number of digital cells need to be "on". So even though the output power drops to a fourth, the power dissipated by the buffers is only halved, thereby increasing relative losses through the buffers. This issue can be extended to an n-way architecture with $n - 2 V_{DD}$s, and the resulting reduced system efficiency due to an increase in n is represented in Fig. 2.4(a).

The efficiencies in PBO of the two architectures are compared in Fig. 2.4(b). The jumps in the system efficiency, for the class G Doherty PA, occur when the  $V_{DD}$  switches to a lower value, and thus higher power is consumed in the buffers. The improvement offered by the nested Doherty PA over class G Doherty PA depends on the relative power lost in the buffers, which is defined as follows:

$$\% P_{buff,max} = \left. \frac{P_{buff}}{P_{buff} + P_{output \, stage}} \times 100 \, \right|_{P_{out,max}} \tag{2.1}$$

The above metric is defined at the maximum output power level to enable comparison between the two architectures. From simulations in a 65 nm CMOS process at 5 GHz and output stage $V_{DD}$ of 0.55 V, the expected power lost in the buffer accounts for about 30% to 40% of the total power dissipation. This results in a system efficiency improvement of $1.3 \times$ to $1.4 \times$ for the nested Doherty architecture at 9 dB PBO, as shown in Fig. 2.5.

Figure 2.6: An example showing the effect of lowering $V_{DD}$ on a common source stage driven with a constant amplitude voltage swing. Simulation performed in a general-purpose 65 nm CMOS process with optimum impedance presented at each $V_{DD}$ value to obtain full swing at the output.
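The dependence of the improvement on the buffer share in Equation (2.1) can be sketched with a simple bookkeeping model. The model below is an assumption made for illustration, not the simulation behind Fig. 2.5: both architectures are taken to reach the same drain efficiency at the back-off point, the nested PA's buffer power scales in proportion to the output power, and the class G PA's buffer power is larger than that by a factor `excess`. The value `excess = 2` is borrowed from the 6 dB example above (output power down to a fourth, buffer power only halved) and is assumed to hold at the back-off of interest.

```python
def buffer_fraction(p_buff, p_output_stage):
    """Equation (2.1) expressed as a fraction (0..1) rather than a percentage."""
    return p_buff / (p_buff + p_output_stage)

def nested_over_class_g(b, excess):
    """System-efficiency ratio (nested / class G) at one back-off point.

    b      : buffer share of the total DC power at maximum output power
    excess : ASSUMED factor by which the class G buffer power exceeds the
             proportionally scaled buffer power of the nested PA

    With the total DC power at maximum output normalized to 1 and the output
    power scaled by s, the nested PA draws s * ((1 - b) + b) while class G
    draws s * ((1 - b) + excess * b); the output powers are equal, so the
    efficiency ratio is the ratio of the two DC powers.
    """
    return (1 - b) + excess * b

if __name__ == "__main__":
    for b in (0.30, 0.35, 0.40):
        print(f"buffer share {b:.0%}: improvement x{nested_over_class_g(b, 2):.2f}")
```

With `excess = 2` the ratio reduces to `1 + b`, which is consistent with the $1.3 \times$ to $1.4 \times$ range quoted above for a 30% to 40% buffer share, although the text does not state that this exact model was used.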

#### 2.2.2 Drain-Source Voltage (V<sub>ds</sub>) Dependent Transistor Model

For PAs with low $V_{DD}$ (< 1 V), the current drive of the transistor is severely affected when the $V_{DD}$ is lowered even further. This current drive depends on the input voltage swing provided to the gate of the device. Since the transistors are driven by digital buffers, the amplitude of this voltage swing remains constant, regardless of whether the $V_{DD}$ of the output stage is lowered. Therefore, in deep PBO, portions of the input swing can exceed $V_{DD}$ and push the transistor into the triode region, leading to reduced RF transconductance and thereby reduced current drive. To overcome the current drive reduction and to achieve enhanced drain efficiency, more digital cells need to be turned "on", which further increases the relative power dissipated in the buffers. For an n-way class G Doherty architecture, as *n* increases, the buffers can potentially dissipate more power than the output stage in PBO, as shown in Fig. 2.4(a). This severely affects system efficiency.

In comparison, the nested digital Doherty architecture does not encounter this issue because the $V_{DD}$ always remains constant. Fig. 2.6 depicts an example simulation performed in a 65 nm CMOS process on a common source stage, to show the effect of $V_{ds}$ dependency on the current drive. Taking this simulation and Equation (2.1) into account, the improvement offered by the nested Doherty architecture is calculated. It has the potential to improve the system efficiency by approximately $1.6 \times$ to $1.9 \times$ at 9 dB PBO, shown in Fig. 2.4(b) and Fig. 2.5.

It is important to note that the efficiency of the output matching network (OMN), also known as passive efficiency, in a nested Doherty architecture degrades with PBO; however, the passive efficiency of a class G Doherty architecture returns to its maximum value when the $V_{DD}$ is switched [20]. Therefore, the OMN of the nested digital Doherty architecture needs to be designed with care to achieve the improvements described in the above sections.

# 2.3 Design of a Four-way Nested Doherty PA

In this work, a four-way nested Doherty architecture is implemented to achieve efficiency enhancement in 3 dB increments through a 9 dB back-off. The 3 dB increment also helps the PA ideally maintain its efficiency close to the maximum value through the whole enhancement range.

The design of the proposed four-way PA can be explained through an asymmetric 2-way Doherty architecture that achieves efficiency enhancement at an arbitrary back-off level. This arbitrary level is dictated by two design parameters: 1) the characteristic impedance of the inverter network, and 2) the ratio of the main and peaking amplifier's maximum current drive. These parameters are constrained as

$$\frac{Z_{Tline}}{Z_0} = \alpha_1 \tag{2.2}$$

$$\frac{\max(I_{Peak})}{\max(I_{Main})} = \alpha_1 - 1 \tag{2.3}$$

where  $\alpha_1$  denotes the back-off level on a linear scale where efficiency enhancement is achieved [56]. Consequently,  $\alpha_1$  also denotes the back-off level where the peaking amplifier turns off. Equations (2.2) and (2.3) can be intuitively explained through Fig. 2.7. Here, the current profile of the main amplifier is left constant, while that of the peaking amplifier is varied, along with the characteristic impedance of the inverter network. The resulting current profiles of the peaking amplifier and the resulting back-off efficiency curves are depicted in Fig. 2.7(c) and (d), respectively.



Figure 2.7: (a) Asymmetric Doherty architecture designed using (b) scaling ratios to generate the inverting network impedance and (c) current drive strengths of the main and peaking amplifier that result in (d) efficiency enhancement at an arbitrary back-off level.

These curves lead to efficiency enhancement from 1 dB PBO to 10 dB PBO in 1 dB steps.
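Equations (2.2) and (2.3) can be evaluated directly for any target back-off. The short sketch below does this for the 1 dB to 10 dB range mentioned above; it assumes $\alpha_1$ is the voltage (or current) ratio $10^{PBO/20}$, which agrees with $\alpha_1 = \sqrt{2}$ at 3 dB, and it normalizes $Z_0$ and the main amplifier current to 1. The function and key names are illustrative only.

```python
import math

def asymmetric_doherty(pbo_db, z0=1.0, i_main=1.0):
    """Design parameters of a 2-way asymmetric Doherty for a target back-off.

    alpha is ASSUMED to be the amplitude ratio 10^(PBO/20), so that 3 dB
    corresponds to sqrt(2) as in the text.
    """
    alpha = 10 ** (pbo_db / 20)
    return {
        "alpha": alpha,
        "z_tline": alpha * z0,               # Equation (2.2)
        "i_peak_max": (alpha - 1) * i_main,  # Equation (2.3)
    }

if __name__ == "__main__":
    for pbo_db in range(1, 11):
        d = asymmetric_doherty(pbo_db)
        print(f"{pbo_db:2d} dB: alpha = {d['alpha']:.3f}, "
              f"Z_Tline/Z0 = {d['z_tline']:.3f}, "
              f"I_peak/I_main = {d['i_peak_max']:.3f}")
```

Setting the back-off to about 6 dB gives $\alpha_1 = 2$ and equal main and peaking currents, which recovers the familiar symmetric Doherty as a special case.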

In order to achieve efficiency enhancement at 3 dB PBO, the asymmetric Doherty architecture can be designed with

$$\alpha_1 = \sqrt{2} \quad \Longrightarrow \quad Z_{Tline1} = \sqrt{2}Z_0 \tag{2.4}$$

$$\max(I_{Main}) = I \tag{2.5}$$

$$\max(I_{Peak}) = I(\sqrt{2} - 1) \tag{2.6}$$

as depicted in Fig. 2.7(b). At the maximum output power level, the two amplifiers work collectively, and the inverting network ensures that optimal impedances are presented to both amplifiers. These impedance values, mentioned in Fig. 2.8(a), lead to maximum efficiency. In back-off, the current drives of both the amplifiers decrease according to Fig. 2.7(c), and at 3 dB PBO, the peaking amplifier turns off completely, while the current drive of the main amplifier decreases by a factor of $\sqrt{2}$. However, the inverting network boosts the impedance presented to the main amplifier by the same factor of $\sqrt{2}$ to maximize the voltage swing, resulting in efficiency enhancement at 3 dB PBO, as depicted in Fig. 2.8(d).

Figure 2.8: (a) Asymmetric 2-way Doherty architecture that enhances efficiency through 3 dB PBO, which extends to (b) a 3-way architecture through current drive strengths shown in (c) to achieve (d) efficiency enhancements at 3 dB and 6 dB PBO.

To further improve the performance in back-off, this concept can be extended to a 3-way architecture through nesting. As shown in Fig. 2.8(b),  $PA_1$  and  $PA_2$  can be combined and considered as the "main" amplifier with a current drive of 2I, at 0 dB PBO. Now, the overall architecture looks very similar to the 2-way architecture, except that the "main" amplifier has twice the current drive strength. So the rest of the design parameters are scaled accordingly:

$$\alpha_2 = \sqrt{2} / 2 \implies Z_{Tline2} = (\sqrt{2}Z_0) / 2 \tag{2.7}$$

$$\max(I_{PA3}) = I(\sqrt{2} - 1) \times 2 \tag{2.8}$$



Figure 2.9: (a) Proposed four-way Doherty PA architecture to maintain optimal matching through 9 dB PBO, yielding (b) efficiency enhancements by (c) scaling RF current drives to (d) produce the desired optimal output impedances.

The resulting impedances, at 0 dB PBO, are mentioned in Fig. 2.8(b). As expected, these impedances are half of those from the 2-way architecture, since the equivalent current drive strength has been doubled. From 3 dB PBO onward,  $PA_3$  turns off, leading to the same design as that of the 2-way architecture, which results in another efficiency enhancement peak. Overall, the 3-way architecture obtains three efficiency peaks, as illustrated in Fig. 2.8(d).

This concept is further extended to a four-way nested architecture, as presented in Fig. 2.9(a), where each amplifier block is biased individually to generate the RF current drives, as shown in Fig. 2.9(c). These values for current drives are obtained by extension from the 3-way architecture, where

$$\alpha_3 = \sqrt{2} / 4 \implies Z_{Tline3} = (\sqrt{2}Z_0) / 4 \tag{2.9}$$

$$\max(I_{PA1}) = \max(I_{PA2}) = I \tag{2.10}$$

$$\max(I_{PA3}) = I \times 2 \tag{2.11}$$

$$\max(I_{PA4}) = I(\sqrt{2} - 1) \times 4 \tag{2.12}$$

Figure 2.10: Implementation of a four-way Doherty output impedance matching network with on-chip lumped elements for transmission lines. Consolidation reduces the total number of inductors from 10 down to 4. The actual implementation is differential.

The impedances seen by the amplifiers during back-off are plotted in Fig. 2.9(d), and they lead to 4 efficiency enhancement peaks through 9 dB PBO as depicted in Fig. 2.9(b). Here,  $Z_0/4$  corresponds to 50  $\Omega$  to avoid the need for additional impedance transformation at the output.
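As a worked example of the scaling, the sketch below plugs $Z_0/4 = 50\ \Omega$ into Equations (2.4), (2.7), and (2.9) and lists the maximum current drives from Equations (2.10) to (2.12). It assumes that the inverter impedances of the 2-way and 3-way steps carry over unchanged into the four-way design; the dictionary keys and function name are illustrative.

```python
import math

Z_LOAD = 50.0     # ohms; the text sets Z0/4 equal to 50 ohms
Z0 = 4 * Z_LOAD   # ohms, so Z0 = 200 ohms

def four_way_design(z0, i_unit=1.0):
    """Inverter impedances and maximum current drives of the four-way PA.

    ASSUMES Equations (2.4) and (2.7) still apply to the first two inverters
    once the architecture is extended to four ways.
    """
    r2 = math.sqrt(2)
    z_tlines = {
        "Z_Tline1": r2 * z0,      # Equation (2.4)
        "Z_Tline2": r2 * z0 / 2,  # Equation (2.7)
        "Z_Tline3": r2 * z0 / 4,  # Equation (2.9)
    }
    i_max = {
        "PA1": i_unit,                 # Equation (2.10)
        "PA2": i_unit,                 # Equation (2.10)
        "PA3": 2 * i_unit,             # Equation (2.11)
        "PA4": 4 * (r2 - 1) * i_unit,  # Equation (2.12)
    }
    return z_tlines, i_max

if __name__ == "__main__":
    z_tlines, i_max = four_way_design(Z0)
    for name, value in z_tlines.items():
        print(f"{name} = {value:6.1f} ohm")
    for name, value in i_max.items():
        print(f"max(I_{name}) = {value:.3f} x I")
```

Each nesting step halves the inverter impedance, so the last inverter sits at $\sqrt{2} \times 50\ \Omega$ and drives the $50\ \Omega$ load directly.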

# 2.4 Implementation and Simulations

The proposed D4DPA is implemented in a general-purpose 65 nm CMOS process as a proof-of-concept. As mentioned in Section 2.3, presenting the optimal impedances is crucial to achieve high efficiency, especially in back-off. To prevent the matching network from de-tuning due to the parasitics of the wirebonds, a differential architecture is implemented. This provides a virtual short on-chip for the fundamental frequency, so the matching network is not solely reliant on bypass capacitors.

Figure 2.11: Simulated drain efficiency and passive efficiency of the implemented D4DPA showing the effects of loss in passive OMN.

Although the transmission lines shown in Fig. 2.9(a) can be implemented using either a high-pass equivalent or a low-pass equivalent LC network, the high-pass equivalent offers two advantages: 1) reduced total number of inductors through consolidation, and 2) reduced inductance values. The high-pass equivalent LC networks, along with the inductors used to resonate the drain capacitance and provide biasing, lead to an OMN design with 10 inductors, as depicted in Fig. 2.10. However, several of these inductors can be consolidated; for example, $L_1$, $L_2$, and $L_{res2}$ can be reduced to a single inductor, $L_{eq2}$. Overall, consolidation reduces the total number of inductors to just 4, thereby reducing the area of the chip. Additionally, some of the inductors necessary for drain capacitance resonance are large (> 10 nH) due to the small size of the transistors, which are meant for outputting low power levels. Consolidation, enabled by the high-pass equivalent, reduces the inductance values and makes them feasible to implement on-chip around 5 GHz [57]. Note that the high-pass equivalent network produces a delay of $\angle 270^{\circ}$, as opposed to $\angle 90^{\circ}$ from the transmission lines. Therefore, the input phases of each PA block are updated, as shown in Fig. 2.10, to maintain the impedance transformation described in Section 2.3.
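The element values behind this consolidation can be estimated with textbook relations. The sketch below is illustrative only: it assumes the standard lumped quarter-wave equivalent in which every reactance has magnitude $Z_T$ at the design frequency (series capacitors and shunt inductors for the high-pass form), the usual parallel-resonance condition for the drain capacitance, and that inductors sharing a node combine in parallel. The 100 fF drain capacitance is a hypothetical value chosen for illustration, not a number from this design.

```python
import math


def highpass_quarter_wave(z_t, f0):
    """Shunt L and series C of a lumped high-pass quarter-wave equivalent.

    Each reactance has magnitude z_t at f0 (standard textbook relation).
    """
    w0 = 2 * math.pi * f0
    return z_t / w0, 1.0 / (w0 * z_t)


def resonating_inductance(c_drain, f0):
    """Shunt inductance that resonates out c_drain at f0."""
    w0 = 2 * math.pi * f0
    return 1.0 / (w0 ** 2 * c_drain)


def parallel_inductance(*inductors):
    """Equivalent of several inductors connected in parallel at one node."""
    return 1.0 / sum(1.0 / l for l in inductors)


if __name__ == "__main__":
    f0 = 5e9                       # design frequency used in this chapter
    z_t = math.sqrt(2) * 200 / 4   # Z_Tline3 from (2.9) with Z0 = 200 ohm
    l_shunt, c_series = highpass_quarter_wave(z_t, f0)
    l_res = resonating_inductance(100e-15, f0)   # hypothetical 100 fF
    l_eq = parallel_inductance(l_shunt, l_shunt, l_res)
    print(f"L_shunt = {l_shunt * 1e9:.2f} nH, C_series = {c_series * 1e12:.2f} pF")
    print(f"L_res   = {l_res * 1e9:.2f} nH")
    print(f"L_eq    = {l_eq * 1e9:.2f} nH")
```

Under these assumptions, a resonating inductor of about 10 nH merges with two shunt inductors of about 2.25 nH each into a single element of roughly 1 nH, which illustrates why consolidation lowers both the inductor count and the inductance values.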

The overall performance of the PA is dependent on the implementation of the OMN. In an ideal scenario with no loss in the passives, the PA achieves a drain efficiency around 70% throughout the whole 9 dB back-off enhancement range, as shown in Fig. 2.11. But on-chip inductors are lossy, and they typically obtain a quality factor of about 15 around 5 GHz. Further, they need to be connected to the digital PA blocks. These connections were implemented through transmission lines, and their effects were taken into account through an electromagnetic (EM) simulator using a method of moments (MoM) solver. The values of the inductors were then modified such that the desired inductance is presented after accounting for the transmission lines. The overall passive efficiency of the matching network and its effects on the performance are also depicted in Fig. 2.11. The degradation in the passive efficiency with PBO is the primary reason for the difference between the lossless and the lossy simulated drain efficiency plots. Finally, the network is also simulated to capture the effects of inductor coupling, and it results in < 1% and < 0.15 dB change in drain efficiency and output power, respectively, for the entire back-off range.

Figure 2.12: (a) Implementation of a quadrature hybrid along with LC network to impedance transform and bias the RF inverters resulting in 6 inductors. (b) Consolidated quadrature hybrid and input matching network resulting in only 4 inductors. The actual implementation is differential.

The quadrature signals at the input of the PA are generated using a differential quadrature hybrid. This structure is followed by an LC network that serves two functions: 1) bias a set of differential inverters that drive the PA blocks, and 2) impedance transform the input of the inverters and provide 50 $\Omega$ to the output of the quadrature hybrid. This LC network can be combined with the quadrature hybrid to not only reduce the total number of inductors from 6 down to 4, but also retain the ability to provide dc biasing. Fig. 2.12(a) and (b) show the detailed description of the consolidation process, where $L_B$, $C_B$, and $L_Q$ can be reduced to $L_{B2}$ and $C_{B2}$.

Figure 2.13: Simulated frequency dependence comparison of voltage gain and quadrature phase generation for the input network before and after consolidation using ideal components.

Since the input network (quadrature hybrid + input buffer's matching network) is consolidated, its output now drives a large impedance presented by the gate of the inverter. Thus, it is convenient to evaluate the performance of the consolidated input network through a voltage gain metric, rather than S-parameters. Fig. 2.13 shows the difference in the simulated performance of the input network using lossless elements before and after consolidation. While consolidation makes the input network compact and lowers the value of the inductor $L_{B2}$, making it manageable to realize on-chip, it reduces the input network's bandwidth. Fig. 2.14 shows the simulated frequency dependence of the realized input network for voltage gain and quadrature phase generation, where the simulations were performed using an EM MoM simulator. The network achieves a peak voltage gain of 2.1 at 5 GHz and maintains a reasonable quadrature phase generation from 4.5 GHz to 5.5 GHz. This input network is connected to the pads through a long transmission line, which reduces the peak voltage gain to 1.8. Although Fig. 2.14(a) exhibits a mismatch in the I and Q magnitude, the mismatch is reduced at the input of the PA's final stage, as the input network is followed by a set of digital buffers that act as limiters. Also, the quadrature phase deviation from the ideal $90^{\circ}$ across frequency degrades the drain efficiency by less than 1.5% from 4.5 GHz to 5.2 GHz, and the degradation stays below 4% as the operating frequency increases to 5.5 GHz. Finally, Fig. 2.14(b) shows that the consolidated input network preserves differential magnitude and phase for the entire simulated band.

Figure 2.14: EM simulated frequency dependence of the realized consolidated input network for (a) quadrature gain and phase generation, and (b) mismatch in differential magnitude and phase.

The D4DPA is composed of 4 PA blocks, and Fig. 2.15 shows a detailed diagram of one such block. Each PA block consists of 4-bit binary-weighted unit cells with common source (CS) output stages. Every CS stage is driven by an inverter that is designed to present low impedance at its output during its operation to alleviate the stability concern of the CS stage. The sizes of the CS stage's LSB bits are mentioned in Fig. 2.15, and the rest of the bits are sized proportional to their binary weights. Although the PA<sub>3</sub> and PA<sub>4</sub> blocks are sized the same, bits x1 and x2 for the PA<sub>4</sub> block are turned off when the D4DPA outputs maximum power. The resulting current generated by PA<sub>4</sub> is within 3.5% of the desired maximum current from Fig. 2.9(c). As seen from the lossless passives plot of Fig. 2.11, the chosen transistor ratios maintain high efficiency until 9 dB PBO.
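The 3.5% figure quoted above can be reproduced with a short calculation. This sketch assumes that the RF current of a block scales linearly with the total weight of its enabled unit cells, and that a fully enabled $PA_4$ block would deliver the same $2I$ as $PA_3$ since the two blocks are sized the same. The helper names are illustrative.

```python
import math

# 4-bit binary-weighted unit cells of one PA block
BIT_WEIGHTS = {"x1": 1, "x2": 2, "x4": 4, "x8": 8}


def block_current(enabled_bits, full_scale_current):
    """Block current, assuming it is proportional to the enabled cell weight."""
    total = sum(BIT_WEIGHTS.values())
    enabled = sum(BIT_WEIGHTS[bit] for bit in enabled_bits)
    return full_scale_current * enabled / total


I = 1.0                                    # unit current drive
i_pa3_max = 2 * I                          # (2.11), all four bits enabled
i_pa4_actual = block_current(["x4", "x8"], i_pa3_max)   # x1 and x2 turned off
i_pa4_target = 4 * (math.sqrt(2) - 1) * I  # (2.12)
error_pct = 100 * abs(i_pa4_target - i_pa4_actual) / i_pa4_target

if __name__ == "__main__":
    print(f"PA4 actual = {i_pa4_actual:.3f} I, target = {i_pa4_target:.3f} I")
    print(f"deviation  = {error_pct:.2f} %")
```

Under these assumptions, $PA_4$ delivers $1.6I$ against a target of about $1.657I$, a deviation of roughly 3.4%.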



Figure 2.15: Block-level schematic of each digital PA block along with the transistor sizes. The PA block can be turned off through "PA<sub>x</sub> Block Enable" node controlled by a 4 input functional OR gate.



Figure 2.16: Frequency dependence of differential quadrature phases at the input of the CS stage of 4 PA blocks for the MSB (x8 bit).

The baseband digital amplitude control  $(BB_{xx})$  for the PA is implemented through NAND gates. To reduce the dynamic power dissipation, the baseband digital bits are also sent to a functional 4 input OR gate that automatically disables the entire PA block through the "PA<sub>x</sub> Block Enable" control node, when all the digital inputs are set low.

Since the input network is followed by a set of digital gates which then connect to the CS stage, all the connections between them require careful layout to minimize degradation of the quadrature phases. Therefore, for reference, the post-layout frequency dependence of the quadrature phases at the input of the MSB (x8 bit) for all the PA blocks is shown in Fig. 2.16. The overall block diagram of the implemented D4DPA with the consolidated differential input and output networks is shown in Fig. 2.17. Even after consolidation, the desired value for $L_{eq1}$ is challenging to implement on-chip at 5 GHz. Therefore, additional capacitors are added in parallel to achieve the necessary inductance, as depicted in Fig. 2.17.

Figure 2.17: Overall block diagram of the implemented differential four-way digital Doherty power amplifier with consolidated output and input matching network and on-chip quadrature signal generation.

Figure 2.18: Die photo of the PA realized in a 65 nm CMOS process.



Figure 2.19: Simulated and measured input reflection coefficient  $(S_{11})$  of the PA.



Figure 2.20: Measured drain efficiency (DE) and system efficiency (SE) of the implemented PA at (a) 4.75 GHz and (b) 5.25 GHz compared to normalized class B and class A PAs.

# 2.5 Measurements

The die photo of the implemented PA is shown in Fig. 2.18. The simulated and measured input reflection coefficient ($S_{11}$) is shown in Fig. 2.19. The measured $S_{11}$ is lower than -10 dB from 4.25 GHz to 5.45 GHz.

#### 2.5.1 CW Measurements

The output CS stage is biased with a  $V_{DD}$  of 0.55 V to deliver low output power levels efficiently. A constant envelope differential signal is generated through an external balun and provided to the input of the chip. The 16 digital bits of the PA are controlled through an external pattern generator to vary the output amplitude levels. Fig. 2.20(a) and (b) show the drain efficiency and the system efficiency of the PA in power back-off at 4.75 GHz and 5.25 GHz and compare them to normalized class A and class B curves. Each of the circular and diamond points on the plot is a measured data point, and the optimal points that lead to maximum efficiency are highlighted as well. These optimal points were found through measurement and then set through a look-up table. Here, the system efficiency accounts for the power dissipated in the output stage as well as all the buffers. At 4.75 GHz, the PA achieves a peak drain efficiency of 48% with a peak output power of 7.3 dBm. The PA also obtains a DE of 37%, 35%, and 27% at 3 dB, 6 dB, and 9 dB PBO respectively, which corresponds to a  $1.1 \times$ ,  $1.5 \times$  and  $1.6 \times$  improvement compared to normalized class B. Similarly, at 5.25 GHz, the PA achieves a DE of 42% with  $P_{out}$  of 6.5 dBm. The PA also obtains a DE of 36%, 28%, 23%, and 20% at 3 dB, 6 dB, 9 dB, and 12.8 dB PBO respectively, which corresponds to a  $1.2 \times$ ,  $1.4 \times$ ,  $1.6 \times$ , and a peak  $2.2 \times$  improvement over normalized class B.
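The improvement factors quoted for 4.75 GHz can be approximated with a short calculation. This sketch assumes that "normalized class B" refers to an ideal class-B curve scaled to the same peak DE, so that its efficiency falls in proportion to the output voltage amplitude, that is, by $10^{-\mathrm{PBO}/20}$. The function names are illustrative, and the measured points need not sit exactly at the nominal back-off levels, so small deviations from the reported ratios are expected.

```python
def normalized_class_b_de(peak_de_pct, pbo_db):
    """Ideal class-B drain efficiency at a given back-off, scaled to peak_de_pct."""
    return peak_de_pct * 10 ** (-pbo_db / 20)


def improvement(measured_de_pct, peak_de_pct, pbo_db):
    """Ratio of measured DE to the normalized class-B DE at the same back-off."""
    return measured_de_pct / normalized_class_b_de(peak_de_pct, pbo_db)


# Measured DE (%) at 4.75 GHz from the text, keyed by back-off in dB
MEASURED_4P75_GHZ = {3: 37, 6: 35, 9: 27}
PEAK_DE_4P75_GHZ = 48

ratios = {
    pbo: improvement(de, PEAK_DE_4P75_GHZ, pbo)
    for pbo, de in MEASURED_4P75_GHZ.items()
}

if __name__ == "__main__":
    for pbo, ratio in ratios.items():
        print(f"{pbo} dB PBO: {ratio:.2f}x over normalized class B")
```

The computed ratios are approximately 1.09, 1.45, and 1.59, which round to the $1.1\times$, $1.5\times$, and $1.6\times$ values reported above.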

The maximum system efficiency at 4.75 GHz and 5.25 GHz is 31% and 26%, which implies that the buffers consume 35% and 38% of the total power dissipated, respectively. This emphasizes the need for a nested architecture, as explained in Fig. 2.5 under section 2.2.
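The buffer share follows directly from the two efficiency definitions. Assuming $DE = P_{out}/P_{DC,PA}$ and $SE = P_{out}/(P_{DC,PA} + P_{DC,buf})$, where $P_{DC,PA}$ and $P_{DC,buf}$ are shorthand introduced here for the dc power drawn by the output stage and by the buffers, the fraction of the total dc power consumed by the buffers is

$$\frac{P_{DC,buf}}{P_{DC,PA} + P_{DC,buf}} = 1 - \frac{SE}{DE}$$

which gives $1 - 31/48 \approx 0.35$ at 4.75 GHz and $1 - 26/42 \approx 0.38$ at 5.25 GHz.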

Fig. 2.21(a) and (b) show the frequency dependence of the PA for DE, SE, gain, and  $P_{out}$  at the maximum DE setting. The PA maintains the DE and SE above 35% and 20%, respectively, over the frequency range of 4.5 GHz to 5.25 GHz. Fig. 2.21(a) also shows the frequency dependence of DE in back-off. It is measured for the setting that gives the maximum efficiency enhancement (PA is 1/8 turned on), and it is compared with the efficiency of the normalized class B PA for reference.



Figure 2.21: Measured frequency dependence of the implemented PA for (a) DE, SE, (b) gain, and output power ( $P_{out}$ ) at peak DE, where the input power is varied to maximize efficiency while still maintaining gain above 10 dB. DE performance is also shown when the PA is 1/8 turned on, and it is compared with normalized class B.



Figure 2.22: Measured AM-PM degradation of the implemented PA at 5.25 GHz for the optimal drain efficiency points.

The use of narrow-band matching networks restricts the operating frequency, especially in PBO, to a small range. The optimum frequency of operation for the implemented PA to achieve efficiency enhancement in PBO is at 5.25 GHz. This explains the difference in the measured performance in PBO at 4.75 GHz and 5.25 GHz seen in Fig. 2.20(a) and (b). However, the PA still performs appreciably better than class B in PBO around 5.25 GHz as shown in Fig. 2.21(a). Additionally, the PA achieves a maximum gain of 15 dB at 5 GHz with a $P_{out}$ of 6.5 dBm. Since the performance of the input network of the PA is frequency dependent, as shown in Fig. 2.14, the input power provided to the PA is varied with frequency to maximize efficiency, while still maintaining gain above 10 dB. Therefore, even though the gain lowers, the output power stays relatively constant, within $\pm$ 1 dB from 4.75 to 5.35 GHz, as depicted in Fig. 2.21(b).

Figure 2.23: For a 1 MSym/s QPSK waveform at 5.25 GHz, measured (a) constellation, (b) DE in PBO compared to normalized class B waveform, and (c) r.m.s. error vector magnitude ($EVM_{rms}$), and adjacent channel leakage ratio (ACLR) in PBO.

Digitally switching the unit cells of a current-mode PA on and off changes the output capacitance that it presents to the load. This results in an amplitude-to-phase non-linearity that is captured in an AM-PM measurement, and it degrades as the PA is pushed into PBO. The static AM-PM measurement at 5.25 GHz is depicted in Fig. 2.22. The code words that correspond to the optimal drain efficiency points were used for this measurement, and it shows a maximum degradation of 33° up to 11 dB PBO. This is the primary reason that the modulation capability of the implemented PA is limited to 16 QAM. To perform higher-order modulation schemes, an AM-PM linearization would be needed (not implemented in this work).

Figure 2.24: Measurement setup for outputting a 16 QAM RF signal through the implemented PA.

Figure 2.25: Measured (a) constellation, $EVM_{rms}$, and (b) ACLR for a 1 MSym/s 16 QAM waveform at 5.25 GHz.

#### 2.5.2 RF Modulated Measurements

To perform QPSK and 16 QAM measurements, a PSK RF signal is generated through a vector signal generator (VSG) and provided to the RF input of the PA. The PSK data is oversampled at  $10 \times$  for all the measurements.

The QPSK measurement results for a 1 MSym/s waveform at 5.25 GHz are shown in Fig. 2.23. The PA achieves an average  $P_{out}$  of 4.2 dBm with a DE of 37% and SE of 21%. It also achieves an r.m.s. error vector magnitude ( $EVM_{rms}$ ) of -22 dB and adjacent channel leakage ratio (ACLR) of -23.1 dBc. Further, the digital controls of the PA are used to vary the power levels in back-off. The PA obtains a DE improvement of  $1.7 \times$  at 9.4 dB PBO and it maintains its EVM and ACLR below -22 dB and -23 dBc, respectively, over the entire measured back-off range.
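EVM is reported in dB in this chapter and in percent in Chapter 3, so a conversion helper is useful when comparing the two. The sketch below uses the standard relation $EVM_{dB} = 20\log_{10}(EVM_{\%}/100)$. The function names are illustrative.

```python
import math


def evm_db_to_percent(evm_db):
    """Convert an r.m.s. EVM in dB to percent."""
    return 100 * 10 ** (evm_db / 20)


def evm_percent_to_db(evm_percent):
    """Convert an r.m.s. EVM in percent to dB."""
    return 20 * math.log10(evm_percent / 100)


if __name__ == "__main__":
    print(f"-22 dB   -> {evm_db_to_percent(-22):.1f} %")
    print(f"-20.5 dB -> {evm_db_to_percent(-20.5):.1f} %")
    print(f"4.1 %    -> {evm_percent_to_db(4.1):.1f} dB")
```

For example, the QPSK result of -22 dB corresponds to an EVM of about 7.9%.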

The measurement setup for 16 QAM is shown in Fig. 2.24. An M8190A arbitrary waveform generator (AWG) is used to generate baseband I/Q signals at the desired symbol rate, which are then provided to an N5182B VSG for up-conversion to RF. The AWG also generates a trigger signal at the symbol rate that is used to synchronize the up-converted PSK signal with the digital control bits, so the amplitude and the phase change occur simultaneously.

For a 1 MSym/s 16 QAM measurement at 5.25 GHz, the PA achieves an average $P_{out}$ of 1.9 dBm with a DE of 34% and SE of 18%. It also achieves an $EVM_{rms}$ of -20.5 dB and ACLR of -21.4 dBc. The biggest limitation to reducing $EVM_{rms}$ is the AM-PM non-linearity that causes the inner constellation points to rotate with respect to the outer points, as illustrated in Fig. 2.25. This degradation gets worse in PBO for the code words that correspond to optimal drain efficiency, as shown in Fig. 2.22. Since no AM-PM linearization or digital pre-distortion (DPD) is applied in this work, the look-up table of code words is used to trade off some optimal drain efficiency points for the ones that exhibit improved AM-PM to lower $P_{out}$, while maintaining $EVM_{rms}$ close to -20 dB, as depicted in Fig. 2.26(a) and (b). Further, the PA is also tested at maximum output power with symbol rates from 0.5 MSym/s to 5 MSym/s for robustness, and no major changes in DE, $P_{out}$, $EVM_{rms}$, and ACLR are observed, as shown in Fig. 2.26(c) and (d). Since the targeted application for the implemented PA is low-power ad-hoc sensor network nodes, the layout of the digital lines and the baseband circuitry were designed to be able to demonstrate a modulation capability of 1 MSym/s, and they limit the modulation performance of the PA at higher bandwidths.

Figure 2.26: For a 1 MSym/s 16 QAM waveform at 5.25 GHz, measured (a) DE in PBO compared to normalized class B waveform, and (b) $EVM_{rms}$ and ACLR in PBO. (c) $P_{out}$, DE, (d) ACLR, and $EVM_{rms}$ are measured for robustness with respect to symbol rate.

Figure 2.27: Far-out spectrum for a 5 MSym/s 10× oversampled 16 QAM waveform at 5.25 GHz.

Fig. 2.27 depicts the far-out spectrum of the PA for a 5 MSym/s 16 QAM measurement at 5.25 GHz. Since the RF input of the PA is driven with a  $10 \times$  oversample rate, sampling images occur 50 MHz apart. These images are below -34 dBc.

# 2.6 Conclusion

This work demonstrates a differential digital four-way Doherty PA that achieves efficiency enhancement up to 9 dB PBO. The PA has been implemented in a 65 nm CMOS process and achieves a  $P_{out}$  of 7.3 dBm at 4.75 GHz with peak DE and SE of 48% and 31%, respectively. Table 2.1 compares the results of this work to other efficiency enhancement state-of-the-art low-power (LP) PAs ( $P_{out,max} < 10$  dBm), high-power (HP) PAs with operating frequency higher than 3.5 GHz, and four-way Doherty PAs. The implemented PA achieves competitive peak DE and SE, while performing better than all non- $V_{DD}$  switching PAs in DE PBO. To the best of the authors' knowledge, this is the first paper that demonstrates a PA with efficiency enhancement in deep PBO (> 6 dB) for sub 10 dBm output power applications.

# 2.7 Contributions

- J. Sheth and S. M. Bowers, "A Differential Digital 4-Way Doherty Power Amplifier with 48% Peak Drain Efficiency for Low Power Applications," 2020 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2020, pp. 119-122, doi: 10.1109/RFIC49505.2020.9218395.
- J. Sheth and S. M. Bowers, "A Four-Way Nested Digital Doherty Power Amplifier for Low-Power Applications," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 69, no. 6, pp. 2782-2794, June 2021, doi: 10.1109/TMTT.2021.3057895.

| | [20] | [20] | [33] | [58] | [59] | [32] | [21] | [51] | This Work | This Work |
|---|---|---|---|---|---|---|---|---|---|---|
| $V_{DD}$ Switching | Yes | Yes | Yes | No | No | No | No | No | No | No |
| High-Power / Low-Power | HP | HP | HP | HP | HP | HP | HP | LP | LP | LP |
| Architecture | Class G Curr. Doherty | Class G Curr. Doherty | Class G Vlt. Doherty | Outphasing | MGTR Doherty | Doherty | 4-way SCPA Doherty | 2-way Doherty<sup>#</sup> | 4-way Doherty | 4-way Doherty |
| CMOS Node | 65 nm | 65 nm | 45 nm SOI | 40 nm | 55 nm | 40 nm | 65 nm | 28 nm | 65 nm | 65 nm |
| Die Size (mm$^2$) | 3 | 2 | n/a | 2.5 | 6 | 0.8 | 2.1 | 0.525<sup>@</sup> | 3 | |
| Frequency (GHz) | 3.71 | 4.3 | 3.5 | 5.9 | 5.8 | 1.5 | 3.82 | 2.15 | 4.75 | 5.25 |
| $P_{out,max}$ (dBm) | 26.7 | 26.1 | 25.3 | 22.2 | 27.2 | 21.4 | 27.3 | -2\* | 7.3 | 6.5 |
| Gain at $P_{out,max}$ (dB) | 16 | n/a | n/a | n/a | 15\* | n/a | 16.8 | n/a | 14 | 14 |
| **CW Efficiency** | | | | | | | | | | |
| Peak DE / SE (%) | 40.2 / n/a | 36.2 / n/a | 30.4<sup>†</sup> | 49.2 / 34.9 | 24.5<sup>†</sup> | 31.3<sup>†</sup> | 30.2 / 27.5\* | 34\* / 23\* | 48 / 31 | 42 / 26 |
| 3 dB PBO DE / SE (%) | 34\* / n/a | 34\* / n/a | 26<sup>\*,†</sup> | 36\* / 25\* | 21<sup>\*,†</sup> | 28<sup>\*,†</sup> | 28\* / n/a | 29\* / 16\* | 37 / 22 | 36 / 21 |
| 6 dB PBO DE / SE (%) | 37 / n/a | 29.3 / n/a | 25.3<sup>†</sup> | 25\* / 19\* | 13<sup>\*,†</sup> | 27.7<sup>†</sup> | 22\* / n/a | 24\* / 11\* | 35 / 18 | 28 / 15 |
| 9 dB PBO DE / SE (%) | 32\* / n/a | 27\* / n/a | 19<sup>\*,†</sup> | 18\* / 13\* | 8<sup>\*,†</sup> | 21<sup>\*,†</sup> | 16\* / n/a | 18\* / 7\* | 27 / 12 | 23 / 12 |
| **RF Modulation** | | | | | | | | | | |
| Modulation | 1 MSym/s 16 QAM | 1 MSym/s 16 QAM | 10 MHz 256 QAM | 20 MHz 64 QAM | 40 MSym/s 64 QAM | 20 MHz 64 QAM | .5 MSym/s 16 QAM | 50 Mbps 16 QAM | 1 MSym/s 16 QAM | 1 MSym/s 16 QAM |
| Average $P_{out}$ (dBm) | 20.8 | 20.1 | 17.1 | 16.4 | 20.6 | 15.2 | 21.8 | -7.66 | 3.1 | 1.9 |
| Modulated DE (%) | 28.8 | 27.2 | 21.4<sup>†</sup> | 23.3 | 9.4<sup>†</sup> | 25.3<sup>†</sup> | 22.1 | n/a | 35 | 34 |
| ACLR (dBc) | -21<sup>\$</sup> | -26.5<sup>\$</sup> | <-45<sup>††</sup> | n/a | -33.2 | -30.4<sup>‡</sup> | -21.8 | -30.1 | -21.7 | -21.4 |
| EVM (dB) | -24<sup>\$</sup> | -30<sup>\$</sup> | -40.1<sup>††</sup> | -30\*\* | -32.8 | -32.5<sup>‡</sup> | -25 | -24.7 | -22.5 | -20.5 |

<sup>#</sup> Integrated in a transmitter. <sup>@</sup> Active area only. \* Estimated from reported Fig. <sup>†</sup> PAE. <sup>‡</sup> Look-Up Table. <sup>\$</sup> After AM-PM Linearization. <sup>††</sup> AM-AM/-PM LUT. \*\* After DPD.

Table 2.1: Comparison to CMOS PAs with PBO Efficiency Enhancement

# **Chapter 3**

# Design of an Asymmetric Current-Mode Multi-phase Digital Doherty Transmitter Using a Single Footprint Transformer-Based Matching Network

# 3.1 Introduction, Motivation, and Prior Art

Chapter 1 introduced the advantages of a digital transmitter, along with the need to improve efficiency in PBO. To address this need, chapter 2 illustrated a nested Doherty architecture to achieve efficiency enhancement in deep PBO, along with a component consolidation technique to realize a compact design. However, this design provides a differential output that limits the use of the transmitter to only a subset of antennas; a single-ended output is preferred. Although a balun can be used to convert the differential signal to a single-ended signal, it adds loss to the output network, thereby reducing efficiency, while occupying additional area. Further, the PA in chapter 2 is implemented in a polar fashion, and as Section 1.2.3 mentions, polar PAs suffer from massive bandwidth expansion that limits the data-rate of the transmitter [13–15, 18–20, 32, 52, 60, 61]. To overcome these challenges, this chapter proposes and implements an asymmetric multi-phase digital Doherty transmitter using a single footprint transformer-based matching network.

In the literature, digital quadrature architectures have been explored, since they do not suffer from the bandwidth expansion problem [62]; however, they result in a 6 dB worst-case lower output power, along with reduced efficiency. To overcome this issue, the IQ cell-sharing technique gained interest [25, 27, 63, 64]; however, it still results in a worst-case 3 dB lower output power, and thus lower efficiency. Taking advantage of PA non-linearity, a diamond-shaped input code-word profile has been shown to achieve an almost constant output power profile [26]; however, this technique still suffers from efficiency degradation due to enormous in-phase/quadrature vector overlap, as shown in Section 3.2.1. Utilizing a multi-phase architecture can reduce this overlap, and thereby improve performance [65–67]; therefore, in this work, a multi-phase architecture is proposed to achieve an almost constant output power profile, while also improving efficiency.

To implement a quadrature or a multi-phase transmitter, a set of basis vectors is needed. Almost every quadrature and multi-phase architecture in the literature uses an even integer multiple of the transmitter operating frequency along with a series of flip-flops to generate the basis vectors [23, 25, 64, 67–69]. Although this technique can achieve a wideband multi-phase generation, it is difficult to scale with increasing numbers of basis vectors. For example, the generation of eight basis vectors at 6.5 GHz would need a prohibitive 26 GHz input. In this work, injection locking a ring oscillator using the outputs of a polyphase filter is proposed to achieve a relatively wideband multi-phase generation, while still using the transmitter operating frequency as the input.
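The scaling problem can be stated compactly. Assuming, as the 26 GHz example above implies, that a flip-flop-based divider needs an input at $N/2$ times the operating frequency $f_0$ to produce $N$ evenly spaced basis vectors, the required input frequency is

$$f_{in} = \frac{N}{2} f_0$$

so that quadrature generation ($N = 4$) at 6.5 GHz would need a 13 GHz input, and eight basis vectors ($N = 8$) would need $4 \times 6.5\ \text{GHz} = 26\ \text{GHz}$.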

Finally, to achieve efficiency enhancement in output power back-off, numerous techniques such as class-G [20, 33, 34], out-phasing [58], and sub-harmonic switching [70] have been explored in the literature; however, they are challenging to implement. The current-mode class G technique needs complex power management circuits, the out-phasing technique needs complex base-band processing, and the sub-harmonic switching technique usually occupies a significant amount of area. A two-way Doherty offers a potential solution; however, it achieves efficiency enhancement only up to 6 dB back-off. Therefore, in this work, an asymmetric two-way Doherty is used to improve efficiency enhancement beyond 6 dB PBO. This Doherty network can be implemented using transformers to achieve a single-ended output without the need for an additional balun [21, 71, 72]. However, the resulting output matching network is composed of two transformers that occupy a considerable amount of on-chip area. Therefore, to achieve a compact form-factor, a transformer-within-transformer architecture that achieves efficiency enhancement up to 9.5 dB PBO is proposed in this work.

To highlight the proposed techniques, an eight-phase asymmetric digital Doherty transmitter with a compact matching network is implemented in a general-purpose 65 nm process. It attains more than 20 dBm output power and more than 31% DE from 4.5 GHz to 6.7 GHz. At 8 dB PBO, it achieves a DE of 23% and 24% at 6.5 GHz and 7.0 GHz, which corresponds to a  $1.76 \times$  and  $1.93 \times$  improvement compared to normalized class B PA, respectively. Finally, the transmitter also achieves a 21% DE and an average  $P_{out}$  of 14 dBm with an  $EVM_{rms}$  of 4.1% for a 20 MSym/s 64-QAM waveform at 6.5 GHz.

The rest of this chapter is divided into the following sections. Section 3.2 provides a discussion on the design of a multi-phase architecture, a wideband multi-phase generation technique, and an asymmetric Doherty network. This section also derives design equations for implementing an asymmetric Doherty network that achieves efficiency enhancement at any desired PBO level and at any operating frequency. The implementation of all the circuit blocks, including the eight-phase architecture, the eight-phase injection locked eight-stage oscillator, and the single footprint transformer-based matching network, is included in Section 3.3. CW and modulated measurements are shown in Section 3.4, and conclusions are drawn in Section 3.5.

# 3.2 Design Techniques for a Multi-phase Asymmetric Doherty Architecture

### 3.2.1 Multi-phase Architecture

The operation of the proposed multi-phase digital PA architecture is depicted in Fig. 3.1. At a high level, the PA consists of a total of p cells, where m cells are driven with  $\vec{\phi_A}$ , n cells are driven with  $\vec{\phi_B}$ , and the remaining [p - (m + n)] cells are turned off, such that

$$m+n \le p \tag{3.1}$$

The values of m and n are decided based on the desired output amplitude and phase, and they can be changed dynamically during the operation of the transmitter, as shown in the bottom part of Fig. 3.1. Therefore, all of the p cells can be driven by either $\vec{\phi_A}$ or $\vec{\phi_B}$, or a combination of $\vec{\phi_A}$ and $\vec{\phi_B}$, to achieve any desired output phase between $\vec{\phi_A}$ and $\vec{\phi_B}$. (In this work, $\vec{\phi_A}$ and $\vec{\phi_B}$ are assumed to be 50% duty-cycled square waves.) However, there are implications for the output amplitude ($|V_{out}|$) and efficiency as the desired output phase ($\angle V_{out}$) moves away from the basis vectors $\vec{\phi_A}$ and $\vec{\phi_B}$.

Normally, for a linear digital PA, such a multi-phase configuration would result in the output  $(V_{out})$  given by a linear vector summation:

$$V_{out} = m.\vec{\phi_A} + n.\vec{\phi_B} \tag{3.2}$$

This could significantly limit the maximum output amplitude  $(|V_{out,max}|)$  contour described by m + n = p, with the worst-case scenario happening when m = n = p/2. For example, if  $\vec{\phi_A} = 0^\circ$  and  $\vec{\phi_B} = 90^\circ$ , this contour exhibits a 3 dB reduction to achieve  $\angle V_{out} = 45^\circ$ . Such an effect is undesirable and degrades the performance of the transmitter.
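The 3 dB figure for a linear PA can be reproduced with a short numerical sketch of equation (3.2). The function name, the unit-amplitude basis vectors, and the choice p = 100 are illustrative assumptions, not part of the original design flow.

```python
import cmath
import math


def linear_contour_drop_db(phase_b_deg: float, p: int = 100) -> float:
    """Worst-case drop (in dB) of |V_out| along m + n = p for a LINEAR PA.

    Assumes unit-amplitude basis vectors with phi_A at 0 degrees and
    phi_B at phase_b_deg, summed as in equation (3.2).
    """
    phi_a = cmath.rect(1.0, 0.0)
    phi_b = cmath.rect(1.0, math.radians(phase_b_deg))
    peak = abs(p * phi_a)  # operating along a basis vector (m = p, n = 0)
    worst = min(abs(m * phi_a + (p - m) * phi_b) for m in range(p + 1))
    return 20.0 * math.log10(peak / worst)


for spacing_deg in (90.0, 45.0, 22.5):
    drop = linear_contour_drop_db(spacing_deg)
    print(f"{spacing_deg:5.1f} deg spacing: {drop:.2f} dB worst-case drop")
```

The minimum occurs at m = n = p/2, and the drop shrinks as the two basis vectors move closer together.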

Figure 3.1: A digital power amplifier with a total of p cells, where m cells are driven with $\phi_A / \overline{\phi_A}$, n cells are driven with $\phi_B / \overline{\phi_B}$, and the remaining [p - (m + n)] cells are turned off to achieve the desired output amplitude and phase.

Figure 3.2: A simplified model of a digital PA using a multi-phase architecture, where the difference in the two basis vectors corresponds to an increased duty-cycle (D) of the on-time $(t_1)$ of the switch, which leads to a boost in the voltage waveform across the switch.

However, a current-mode digital PA is non-linear, and this non-linearity is advantageous to achieve an almost constant $|V_{out,max}|$ contour. Consider the worst-case scenario of m = n = p/2 explained using an ideal switch model for the transistors, as shown in Fig. 3.2. For simplicity of explanation, the switches are assumed to exhibit zero on-resistance, the RF chokes (RFC) have been introduced to provide the needed biasing for the switches, and the resonant tank components $L_{OMN}$ and $C_{PA}$ from Fig. 3.1 have been omitted as their role is only to allow the fundamental component of current to flow through the load. The assumption of zero on-resistance also allows consolidation of the m and n cells of Fig. 3.1 into a single switch with an increased duty cycle, where the switch is closed for time $t_1$ and open for time $t_2$, and the values of $t_1$ and $t_2$ are related to the phase difference between $\vec{\phi_A}$ and $\vec{\phi_B}$. The duty cycle (D) is given by:

$$D = \frac{t_1}{t_1 + t_2} \tag{3.3}$$

In order to determine the steady-state operation of the PA, consider two snapshots in time:

Switch is closed: During this time  $t_1$ , the RFC is directly connected in series with the voltage source  $V_{DD}$ , thereby becoming magnetized, and exhibiting an increase in the build-up of flux  $(\Delta \phi_1)$ , which can be calculated as follows:

$$V_{DD} = L_{RFC}\left(\frac{\Delta I_{RFC}}{\Delta t}\right) = L_{RFC}\left(\frac{\Delta I_{RFC}}{t_1}\right),\tag{3.4}$$

and by definition,

$$\Delta \phi_1 = L_{RFC} \cdot (\Delta I_{RFC}) = L_{RFC} \cdot \left(\frac{V_{DD} \cdot t_1}{L_{RFC}}\right) = V_{DD} \cdot t_1 \tag{3.5}$$

Switch is open: During this time  $t_2$ , the voltage across the switch  $(V_{L1})$  increases due to the current  $I_{RFC}$  flowing through the load. This leads to a decrease in the build-up of flux  $(\Delta \phi_2)$  in the RFC:

$$\Delta \phi_2 = (V_{L1} - V_{DD}).t_2 \tag{3.6}$$

In steady-state, the RFC experiences no net change in its flux, implying:

$$\Delta \phi_1 = \Delta \phi_2 \tag{3.7}$$

$$V_{DD}.t_1 = (V_{L1} - V_{DD}).t_2 \tag{3.8}$$

$$V_{L1} = V_{DD} \cdot \left(\frac{t_1 + t_2}{t_2}\right) = \frac{V_{DD}}{1 - D} \tag{3.9}$$

Figure 3.3: For the scenario m = n = p/2, $\vec{\phi_A} = 0^\circ$, and $\vec{\phi_B}$ swept from $0^\circ$ to $120^\circ$ (a) the fundamental component of the output voltage swing shows less than 0.6 dB variation; however, (b) the efficiency is significantly reduced. The results from a switch model with a finite on-resistance $(r_{sw})$ are in close agreement with the switches implemented as cascode cells from a general-purpose 65 nm CMOS process.

When the PA is operating completely along one of the basis vectors, $t_1 = t_2$, resulting in D = 0.5. Equation (3.9) shows that $V_{L1} = 2 \times V_{DD}$, just as indicated in Fig. 3.2. (Note that although the above equations are derived for m = n = p/2, assuming both $\vec{\phi_A} = \vec{\phi_B} = 0^\circ$ results in the same condition as though the PA is operating along one of the basis vectors. This allows for easy comparison between the best-case and the worst-case scenarios of $|V_{out,max}|$.) However, when the PA is operating in the middle of the two vectors, the resulting D increases, leading to a boost in the voltage across the switch. For example, if $\vec{\phi_A} = 0^\circ$ and $\vec{\phi_B} = 90^\circ$, then $t_1 = 3 t_2$, resulting in D = 0.75. Equation (3.9) shows that $V_{L1} = 4 \times V_{DD}$, just as indicated in Fig. 3.2. So although the voltage difference across the load is non-zero for a shorter span of time as D increases, the resulting voltage boost compensates for this behavior. Therefore, at a high level, a non-linear digital PA does not suffer from the reduced $|V_{out,max}|$ contour exhibited by its linear counterpart. It is important to note that a two-dimensional AM-PM look-up table would be required to account for the non-linear summation, as explained in Section 3.4.1.
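The flux-balance result of equation (3.9) can be checked numerically with the minimal sketch below. The mapping from phase difference to duty cycle (the consolidated switch is closed whenever either 50% duty-cycled drive is high) is an assumption chosen to reproduce the D = 0.5 and D = 0.75 cases quoted in the text; the function names are illustrative.

```python
def duty_cycle(phase_diff_deg: float) -> float:
    """Duty cycle of the consolidated ideal switch for m = n = p/2.

    Assumed model: the switch is closed whenever either of the two
    50% duty-cycled drives is high, so the on-time grows by the
    phase difference between the two basis vectors.
    """
    if not 0.0 <= phase_diff_deg < 180.0:
        raise ValueError("phase difference must be in [0, 180) degrees")
    return 0.5 + phase_diff_deg / 360.0


def switch_peak_voltage(vdd: float, d: float) -> float:
    """Voltage across the open switch from equation (3.9)."""
    return vdd / (1.0 - d)


vdd = 1.0  # normalized supply voltage
for phase_diff in (0.0, 45.0, 90.0):
    d = duty_cycle(phase_diff)
    v_l1 = switch_peak_voltage(vdd, d)
    # Flux balance of equation (3.8), with a normalized period t1 + t2 = 1
    t1, t2 = d, 1.0 - d
    assert abs(vdd * t1 - (v_l1 - vdd) * t2) < 1e-12
    print(f"phase diff {phase_diff:5.1f} deg: D = {d:.3f}, V_L1 = {v_l1:.3f} x VDD")
```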

In practice, the switches exhibit a finite on-resistance $(r_{sw})$, which will also impact the reduction in the $|V_{out,max}|$ contour, along with the phase difference between $\vec{\phi_A}$ and $\vec{\phi_B}$ [73–75]. Therefore, in order to understand the overall practical behavior of the proposed architecture, the schematic shown in Fig. 3.1 is simulated with 100 switches (p = 100) with $r_{sw} = 90\ \Omega$ per switch, for the condition m = n = p/2, where $\vec{\phi_A} = 0^\circ$, and $\vec{\phi_B}$ is swept from $0^\circ$ to $120^\circ$. The resulting worst-case $|V_{out}|$, shown in Fig. 3.3(a), exhibits less than 0.6 dB variation for $\vec{\phi_B}$ between $0^\circ$ and $90^\circ$. This result is also confirmed in a general-purpose 65 nm CMOS process, as depicted in Fig. 3.3(a), where the switches are implemented using a cascode unit cell. This behavior highlights the advantage of the non-linear summation of $\vec{\phi_A}$ and $\vec{\phi_B}$.

Figure 3.4: For m+n = p (maximum output amplitude ($|V_{out,max}|$) contour), a quadrature architecture (red) exhibits severe performance degradation, with a worst-case efficiency of 42%, while its 8-phase counterpart (green) maintains a relatively high worst-case efficiency of 69%, highlighting a 1.64× improvement.

Although the phase difference between $\vec{\phi_A}$ and $\vec{\phi_B}$ does not significantly affect the $|V_{out,max}|$ contour, it has implications on the overall efficiency as $\angle V_{out}$ is swept away from the basis vectors. As this phase difference increases, the switches are turned on for a longer period of time ($t_1$ increases), and the power dissipated in the finite $r_{sw}$ increases, thereby reducing efficiency. The setup explained above is also used to simulate efficiency using the same switch model, and the results are shown in Fig. 3.3(b). The efficiency drops by 33%, a severe degradation, when $\vec{\phi_B} = 90^\circ$. Reducing the phase difference ameliorates this issue: the efficiency drop is only 7% for $\vec{\phi_B} = 45^\circ$, and it is almost negligible (about 1%) for $\vec{\phi_B} = 22.5^\circ$. Again, these results are in close agreement with the switches implemented using the cascode unit cell. Overall, Fig. 3.3(a) and (b) lead to the conclusion that reducing the phase difference between $\vec{\phi_A}$ and $\vec{\phi_B}$ is desirable.

Figure 3.5: Time-domain voltage waveforms at the drain of cascode cells when the PA is operated along the basis vectors (m = p, n = 0 or n = p, m = 0) and at the middle of the two basis vectors (m = n = p/2), for both (a) the quadrature and (b) the 8-phase architecture. The voltage peak exceeds $3 \times V_{DD}$ for the quadrature architecture, while it is relatively tamer for the 8-phase counterpart, alleviating reliability concerns related to the breakdown of the devices.

To build a functioning transmitter capable of handling complex modulation schemes such as QAM, $V_{out}$ needs to cover the entire constellation. Therefore, reducing the phase difference between $\vec{\phi_A}$ and $\vec{\phi_B}$ increases the total number of required vectors. The transmitter then dynamically switches $\vec{\phi_A}$ and $\vec{\phi_B}$ to the two adjacent vectors such that the desired $V_{out}$ is encompassed by them. Therefore, for $\vec{\phi_A} = 0^\circ$ and $\vec{\phi_B} = 90^\circ$, only 4 (quadrature) vectors are required, while this number increases to 8 and 16 as $\vec{\phi_B}$ decreases to 45° and 22.5°, respectively. The phase generation design, explained in Section 3.2.2, is not able to generate 16 equally spaced phase-shifted signals around 6.5 GHz due to the higher delay of the older 65 nm node; therefore, an 8-phase architecture is chosen for this work. However, depending on the application, if the operating frequency is lower, or if a newer (smaller node) CMOS process is affordable, a 16-phase architecture could be chosen.
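The selection of the two adjacent basis vectors can be sketched as follows. This is a hypothetical illustration of the sector lookup only (the function name and the degree-based interface are assumptions); it does not model the code-word mapping or the look-up table used in the actual transmitter.

```python
def adjacent_basis_vectors(angle_deg: float, num_phases: int = 8) -> tuple[float, float]:
    """Return the phases (in degrees) of the two basis vectors that
    encompass the desired output angle, for equally spaced phases."""
    spacing = 360.0 / num_phases
    angle = angle_deg % 360.0
    # The modulo guards against rounding pushing the sector index to num_phases
    sector = int(angle // spacing) % num_phases
    phi_a = sector * spacing
    phi_b = ((sector + 1) % num_phases) * spacing
    return phi_a, phi_b


for n in (4, 8, 16):
    print(f"{n:2d} phases: spacing {360.0 / n:5.1f} deg, "
          f"100 deg output uses {adjacent_basis_vectors(100.0, n)}")
```

With four phases a 100° output is synthesized from the 90° and 180° vectors, while with eight phases it is synthesized from the closer 90° and 135° vectors.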

#### Advantages of an 8-phase architecture:

This section details the advantages of an 8-phase architecture over a quadrature architecture. All the simulations shown here are performed in a general-purpose 65 nm process with 100 digital cells, where each cell is composed of switches implemented using a cascode cell (common source + common gate). The common-source device is sized at 16.64 $\mu$m and the common-gate device at 11.00 $\mu$m.

- Increased overall efficiency: As explained above, the multi-phase architecture achieves the highest efficiency when it is being operated along one of the basis vectors, and the efficiency degrades as $\angle V_{out}$ is swept away from them. Fig. 3.4 compares the efficiency of an 8-phase architecture with its quadrature counterpart for the case m + n = p (i.e. the $|V_{out,max}|$ contour). As expected, both architectures achieve an identical maximum efficiency of 76% when operated along the basis vectors (m = p, n = 0 or n = p, m = 0); however, the efficiency severely degrades for the quadrature architecture when $\angle V_{out}$ is around the middle of the two basis vectors (m = n = p/2), with a worst-case efficiency of 42%. On the other hand, the 8-phase architecture maintains a relatively high worst-case efficiency of 69%, highlighting a 1.64× improvement. This improvement results in an increased average efficiency when the PA is used to transmit complex modulation schemes such as QAM.
- Improved reliability: Again, as explained above, when $\angle V_{out}$ moves away from the basis vectors, it increases the overall duty-cycle of the switches, thereby leading to a voltage boost. Although this mechanism minimizes the variation in the $|V_{out,max}|$ contour, it could cause reliability concerns. Fig. 3.5(a) and (b) show the time-domain voltage waveforms across the switch (at the drain of a cascode cell) for both the quadrature and the 8-phase architectures when the PA is operated along the basis vectors (m = p, n = 0 or n = p, m = 0) and at the middle of the two basis vectors (m = n = p/2). Compared to Fig. 3.2, the shape of the waveforms in Fig. 3.5 is no longer a square wave due to the presence of the resonant tank. These figures show that while the voltage peak exceeds $3 \times V_{DD}$ for the quadrature architecture, it is relatively tamer for its 8-phase counterpart, alleviating reliability concerns related to the breakdown of the devices.
- Increased constellation resolution: Since the 8-phase architecture has double the number of basis vectors compared to the quadrature architecture, the adjacent basis vectors encompass a smaller sector of the constellation. Therefore, the total code words are distributed in the smaller sector, resulting in a higher resolution for the entire constellation. Fig. 3.6(a) and (b) show the input code word constellation for a quadrature and an 8-phase digital PA, respectively. The resulting output constellation depicted in Fig. 3.6(c) and (d) is more circular due to PA non-linearity, as explained above. Also, the output constellation of the 8-phase architecture shows higher resolution compared to its quadrature counterpart, although both of them have the same number of code words on their basis vectors. (The curves in the output contours of both architectures are due to the AM-PM non-linearity of the cascode cells.) This improvement in resolution translates to an improved quantization noise floor, which helps with reducing the EVM and the ACLR of the transmitter.

Figure 3.6: Input code word constellation for (a) a quadrature and (b) an 8-phase digital PA resulting in a more circular output constellation (c) and (d) due to PA non-linearity. The 8-phase architecture also achieves a higher resolution since the total number of code words is distributed in a smaller sector of the constellation.

### 3.2.2 Multi-phase Generation

The advantages of a multi-phase architecture are detailed in Section 3.2.1, and in order to implement a transmitter based on the eight-phase architecture, eight equally spaced phase-shifted signals at the operating frequency need to be generated on-chip. Given that Wi-Fi 7 covers almost 2 GHz of spectrum from 5.17 GHz to 7.125 GHz, it is desirable to use only one transmitter to cover the entire range. Therefore, the multi-phase generation circuits need to operate over a wide range of frequencies to enable frequency re-configurable transmitters. This also ensures that the input network of the transmitter does not become the bottleneck of the RF bandwidth, thereby easing the design of the output matching network of the power amplifier. Therefore, an eight-phase generation circuit with more than 2 GHz of operating bandwidth around 6.5 GHz is desired.

#### Potential solutions and their disadvantages:

- Flip-flop based frequency dividers: A well-known solution in the literature to generate quadrature signals is through the use of two flip-flops and a differential signal at 2× the operating frequency [27], as shown in Fig. 3.7(a). Here, the flip-flops are used as dividers, where the inverted output $\overline{Q}$ is connected to the input D of the flip-flop, and each of the differential signals is sent to the clock inputs. The outputs Q and $\overline{Q}$ generate quadrature signals with very low phase error for a wide range of input frequencies owing to the broadband nature of the flip-flops. This technique can also be extended to generate 8 equally spaced phase-shifted signals through the use of cascaded flip-flops and $4\times$ the operating frequency at the input [67, 68], as illustrated in Fig. 3.7(a). Although this technique provides the desired performance, an 8-phase implementation operating from 5 GHz to 7 GHz would require an input frequency from 20 GHz to 28 GHz, which is cumbersome.
- Polyphase filters (PPF): Another well-known approach to generate multiple phases is through the use of a polyphase filter [76, 77], as shown in Fig. 3.7(b). Unlike flip-flop based dividers, the PPF does not require a multiple of the operating frequency; however, this technique is relatively narrowband. Fig. 3.8(a) shows the simulated deviation from the desired phases of a three-ring PPF designed using ideal resistors and capacitors, and whose outputs are followed by a bias tee and NOT gates to generate the rail-to-rail signals needed to drive a digital transmitter. Since the output amplitudes of the PPF are not symmetric, the non-linearity of the NOT gates translates the amplitude asymmetry into phase error. Therefore, even an ideal PPF exhibits deviation from the desired phases at the center frequency, as seen in Fig. 3.8(a). This issue is exacerbated when the operating frequency is varied from the center frequency, as the PPF provides the desired phase shift only at $\omega = 1/(RC)$. For example, the outputs generating 90° / 270° signals experience about 5° error around 5 GHz, resulting in signals with 95° / 275° phase shift. Such an error is undesirable as it degrades the linearity of the transmitter, which affects EVM and ACLR. Although adding more PPF stages can improve the bandwidth and reduce this error, every additional stage adds about a 3 dB loss, which will reduce the system efficiency of the transmitter.
- Two-phase injected eight-stage ring oscillator (2P-8ILRO): An even-stage ring oscillator is another known technique for multi-phase generation. Since a free-running ring oscillator suffers from poor phase noise, it can be injection locked to a cleaner RF source, as shown in Fig. 3.7(c). Simulation of a 2P-8ILRO, shown in Fig. 3.8(b), indicates that this technique can achieve multi-phase generation with low error, as long as the injected frequency $(f_{inj})$ is close to the natural frequency $(f_{nat})$ of the oscillator. However, the error starts increasing as $f_{inj}$ moves away from $f_{nat}$, as seen in Fig. 3.8(b). Furthermore, the locking range of a 2P-8ILRO is relatively narrow, thereby requiring $V_{DD}$ tuning of the oscillator, which is also cumbersome.

Figure 3.7: Potential solutions to generate 8 equally spaced phase-shifted signals around 6.5 GHz through (a) flip-flop based frequency dividers, (b) polyphase filter (PPF), and (c) a two phase (red) or eight phase (red + pink) injection locked eight-stage ring oscillator (2P-8ILRO or 8P-8ILRO)
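The frequency requirements of the first two options can be put into numbers with the sketch below. Generalizing the divider input multiple to N/2 for N phases is an assumption consistent with the 2× (quadrature) and 4× (eight-phase) cases cited above, and the resistor value used for the PPF is purely illustrative.

```python
import math


def divider_input_frequency_ghz(f_op_ghz: float, num_phases: int) -> float:
    """Input frequency needed by a flip-flop divider chain.

    Assumes the input multiple is num_phases / 2, which matches the
    2x (quadrature) and 4x (eight-phase) cases in the text.
    """
    if num_phases < 4 or num_phases % 2 != 0:
        raise ValueError("num_phases must be an even number >= 4")
    return (num_phases // 2) * f_op_ghz


def ppf_center_frequency_hz(r_ohm: float, c_farad: float) -> float:
    """Frequency at which an RC polyphase stage gives the desired phase
    shift, from omega = 1 / (R * C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


for f_op in (5.0, 6.5, 7.0):
    print(f"8 phases at {f_op} GHz need a "
          f"{divider_input_frequency_ghz(f_op, 8)} GHz divider input")

r = 100.0                                  # hypothetical resistor value
c = 1.0 / (2.0 * math.pi * r * 6.5e9)      # capacitor centering the PPF at 6.5 GHz
print(f"R = {r} ohm, C = {c * 1e15:.1f} fF -> "
      f"{ppf_center_frequency_hz(r, c) / 1e9:.2f} GHz")
```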

#### Proposed eight-phase injected eight-stage ring oscillator (8P-8ILRO):

To overcome the challenges of multi-phase generation mentioned above, an 8P-8ILRO is proposed in this work. It has been shown in the literature that a four-stage ring oscillator with injection signals provided to all four of the quadrature nodes significantly reduces phase error, even if the injected signals themselves exhibit high phase error [78]. Further, this technique also improves the locking range of the ILRO, thereby easing the requirement of the injected frequency to be closely aligned with the natural frequency of the oscillator [79]. In this work, this technique has been extended to generate eight phases by combining the above-mentioned PPF and ILRO techniques. The outputs of the PPF are injected into all eight nodes of the 8ILRO, as opposed to only two nodes, as depicted in Fig. 3.7(c). Although the PPF still exhibits significant phase error, the multi-phase outputs of the proposed technique exhibit $< \pm 0.4^{\circ}$ error over the entire operating range, as shown in Fig. 3.8(c). Further, the locking range of the 8P-8ILRO is increased to almost 4 GHz, compared to less than 1 GHz offered by the 2P-8ILRO counterpart. Also, unlike the flip-flop based dividers, this technique does not require the input frequency to be a multiple of the operating frequency. Therefore, an 8P-8ILRO technique is proposed for the multi-phase generation in this work.

Figure 3.8: Error (deviation) from the desired eight equally spaced phase-shifted signals around 6.5 GHz when generated through (a) a PPF, (b) a 2P-8ILRO, and (c) an eight-phase injection locked eight-stage ring oscillator (8P-8ILRO)

### 3.2.3 Transformer Matching Network for Asymmetric Doherty PA

The use of switch-mode PAs and the 8-phase architecture, explained in Section 3.2.1, improves efficiency for the $|V_{out,max}|$ contour; however, the efficiency degrades drastically in output power back-off. The proposed 8-phase PA can be combined with a Doherty architecture to improve back-off efficiency; however, the classical Doherty architecture achieves efficiency enhancement only up to 6 dB PBO. In order to achieve high data-rates, while maintaining high spectral efficiency, modern communication systems use higher-order complex modulation schemes, such as 128-QAM. These signals exhibit high PAPR; therefore, PA architectures with efficiency enhancement in deep PBO are desired to improve the average efficiency of the transmitter. Further, most sub-7 GHz PAs are implemented using differential architectures to provide a virtual short on-chip at the fundamental frequency, since that prevents the matching network from de-tuning due to the parasitics of the wirebonds. However, the output of the PA is desired to be single-ended to allow the transmitter to be used with single-ended antennas. Normally, a balun would be used after the matching network to convert a differential signal to a single-ended signal; however, baluns use spiral structures that consume area on-chip.

Figure 3.9: (a) A transformer-based asymmetric series Doherty architecture using (b) asymmetric current drive strengths of the main and the peaking amplifier, (c) to achieve the optimal impedances at 0 dB and $20.\log(\alpha)$ dB back-off, and thereby leading to (d) efficiency enhancement at $20.\log(\alpha)$ dB back-off.

A transformer-based matching network has the potential to provide the desired solution, as it can achieve the functionality of both the matching network and the balun in a smaller area. This technique can be extended with the use of two transformers to even provide the behavior of a Doherty architecture. Further, with the use of an asymmetric Doherty design, the efficiency enhancement can be extended to deep PBO. Such a network can be implemented either in a parallel [20, 59] or a series [71, 72] fashion. However, the parallel transformer adds the currents from the main and the peaking PA to the output, causing the load resistance ($R_L$) to transform up, resulting in a higher than $R_L$ impedance seen by each of the PAs. Since CMOS PAs are voltage-limited, and therefore desire low impedance to output higher power, additional matching networks or multi-turn transformers are needed. Such a solution is not desirable as it increases the loss of the network and reduces efficiency. On the other hand, the series architecture adds the voltages from the main and the peaking PA to the output, and thereby naturally provides lower impedance to both the PAs. Therefore, a series Doherty architecture is implemented in this work.

Figure 3.10: (a) Implementation of a transformer-based asymmetric series Doherty matching network, where $k_1$ and $k_2$ represent the magnetic coupling coefficients of the main and peaking transformers, respectively. $L_{res1}$ and $L_{res2}$ represent the inductors used to resonate the parasitic capacitors of the PAs, and the transmission line is implemented using an equivalent high-pass T LC network. Taking advantage of the parallel magnetizing inductance and the series leakage inductance of a practical transformer, (b) the entire network can be implemented simply using two transformers ($L_{p1}$-$L_{s1}$ and $L_{p2}$-$L_{s2}$), and a capacitor ($C_{OMN}$), to achieve a compact design.

Fig. 3.9(a) shows the schematic of a transformer-based asymmetric Doherty architecture, which consists of two transformers driven by a differential main and a differential peaking PA. The voltage supply for the PAs is provided through the virtual shorts at the center-tap of the primary coil of the transformers. The secondary coils of the two transformers are connected in series to achieve a single-ended output, and a 90° transmission line with a characteristic impedance of $R_L/\alpha$ is placed in series with the secondary coils. The peaking PA is driven 90° out of phase with respect to the main PA to account for the delay in the transmission line.

To achieve efficiency enhancement in PBO, the current drive strengths of the main and peaking PAs need to satisfy:

$$\frac{\max(I_{Peak})}{\max(I_{Main})} = \alpha - 1 \tag{3.10}$$

as shown in Fig. 3.9(b), where $\alpha$ denotes the back-off level on a linear scale at which the peaking amplifier turns off and efficiency enhancement is achieved [56]. The resulting impedances seen by the main and peaking PAs are shown in Fig. 3.9(c), where $Z_{Main} = (\alpha - 1).Z_{Peak}$ at 0 dB PBO, as desired, since the peaking PA is $(\alpha - 1)$ times larger than the main PA to satisfy the current drive strength requirement. As the peaking PA turns off, the impedance seen by the main PA increases by a factor of $\alpha$, due to the nature of the 90° transmission line, leading to the desired efficiency enhancement at $20.\log(\alpha)$ dB PBO, as shown in Fig. 3.9(d).
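As a quick illustration of equation (3.10), the sketch below converts a target back-off level in dB into $\alpha$, the required peaking-to-main current ratio, and the characteristic impedance $R_L/\alpha$ of the 90° line. The function names and the 50 Ω load are assumptions made for the example.

```python
import math


def doherty_alpha(pbo_db: float) -> float:
    """Linear back-off level alpha for a target PBO of 20*log10(alpha) dB."""
    return 10.0 ** (pbo_db / 20.0)


def peaking_to_main_current_ratio(alpha: float) -> float:
    """Required max(I_Peak) / max(I_Main) from equation (3.10)."""
    return alpha - 1.0


def tline_impedance(r_load: float, alpha: float) -> float:
    """Characteristic impedance R_L / alpha of the 90-degree line."""
    return r_load / alpha


r_load = 50.0  # assumed load resistance in ohms
for pbo_db in (6.0, 9.5):
    alpha = doherty_alpha(pbo_db)
    print(f"PBO {pbo_db:4.1f} dB: alpha = {alpha:.3f}, "
          f"I_peak/I_main = {peaking_to_main_current_ratio(alpha):.3f}, "
          f"Z0 = {tline_impedance(r_load, alpha):.2f} ohm")
```

A 6 dB target gives $\alpha \approx 2$ and equally sized PAs (the classical symmetric Doherty), whereas deeper back-off targets require a larger peaking PA.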

The transmission line, shown in Fig. 3.9(a), is not practical to implement on chip for sub-7 GHz designs due to its large size and therefore needs to be implemented using lumped component approximations. Although high-pass ($\pi$ and T) and low-pass ($\pi$ and T) equivalent LC networks all provide viable solutions, the high-pass T network will result in a compact design, as shown in Fig. 3.10(a) and (b). Here, $k_1$ and $k_2$ represent the magnetic coupling coefficients of the main and peaking transformers, respectively. Inductors $L_{res1}$ and $L_{res2}$ have been added to resonate the parasitic capacitors of the two PAs, and the transmission line is implemented using an equivalent high-pass T LC network, where the value of the inductance is given by $Z_0/\omega = R_L/(\omega.\alpha)$ and the capacitance is given by $1/(Z_0.\omega) = \alpha/(\omega.R_L)$ [72]. Given that a practical transformer with a coupling coefficient k provides a parallel magnetizing inductance given by $k^2.L$, and a series leakage inductance given by $(1 - k^2).L$, they can be used to produce the necessary inductance for $L_{res}$ and $L_{Tline}$, as shown in Fig. 3.10(a). Therefore, the entire asymmetric series Doherty matching network can be implemented simply using two transformers ($L_{p1}$-$L_{s1}$ and $L_{p2}$-$L_{s2}$), and a capacitor ($C_{OMN}$), to achieve a compact design, as highlighted in Fig. 3.10(b).
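The lumped-element values quoted above can be evaluated with the following sketch. The numerical inputs (50 Ω load, $\alpha = 3$, 6.5 GHz, k = 0.7, 1 nH winding) are hypothetical and only serve to exercise the formulas; they are not the values used in the implemented design.

```python
import math


def highpass_t_elements(r_load: float, alpha: float, freq_hz: float) -> tuple[float, float]:
    """Inductance and capacitance of the high-pass T equivalent of a
    90-degree line with Z0 = R_L / alpha: L = Z0 / w, C = 1 / (Z0 * w)."""
    omega = 2.0 * math.pi * freq_hz
    z0 = r_load / alpha
    return z0 / omega, 1.0 / (z0 * omega)


def transformer_inductances(k: float, inductance: float) -> tuple[float, float]:
    """Parallel magnetizing (k^2 * L) and series leakage ((1 - k^2) * L)
    inductances of a transformer winding with coupling coefficient k."""
    return k * k * inductance, (1.0 - k * k) * inductance


l_tline, c_tline = highpass_t_elements(r_load=50.0, alpha=3.0, freq_hz=6.5e9)
print(f"L_Tline = {l_tline * 1e9:.3f} nH, C_Tline = {c_tline * 1e12:.3f} pF")

l_mag, l_leak = transformer_inductances(k=0.7, inductance=1.0e-9)
print(f"magnetizing = {l_mag * 1e9:.2f} nH, leakage = {l_leak * 1e9:.2f} nH")
```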



Figure 3.11: A transformer-based asymmetric series Doherty network showing impedances and current labels for helping with the derivation of the design methodology.

#### Derivation of equations for the design of a transformer-based series asymmetric Doherty matching network:

This section derives the equations and explains the design methodology to implement a transformer-based asymmetric series Doherty matching network with any desired back-off efficiency enhancement level, and at any desired frequency. Fig. 3.10 and 3.11 are used as references throughout the derivation, where $n_1$ and $n_2$ are defined as:

$$n_1 = \sqrt{\frac{L_{s1}}{L_{p1}}}, \qquad n_2 = \sqrt{\frac{L_{s2}}{L_{p2}}} \tag{3.11}$$

Since the peaking PA is turned off at the back-off efficiency enhancement level $\alpha$, it presents an open circuit to the transformer network, implying $Z_3 = \infty\ \Omega$. The 90° transmission line translates that open to a short, resulting in $Z_4 = 0\ \Omega$. Thus, the impedance seen by the main PA $Z_{1, \alpha PBO}$ is simply the load resistance transformed by the coupling coefficient $k_1$, and the transformer turns ratio $n_1$, given by

$$Z_{1, \alpha PBO} = \frac{R_L}{k_1^2 . n_1^2} \tag{3.12}$$

Since the impedance seen by the main PA decreases by a factor $\alpha$ at 0 dB PBO, and since the peaking PA is $(\alpha - 1)$ times the size of the main PA, we need to ensure

$$Z_{1, 0dB PBO} = \frac{Z_{1, \alpha PBO}}{\alpha} = \frac{R_L}{\alpha . k_1^2 . n_1^2} \quad \text{and} \tag{3.13}$$

$$Z_{2, 0dB PBO} = \frac{Z_{1, 0dB PBO}}{(\alpha - 1)} = \frac{R_L}{\alpha.(\alpha - 1).k_1^2.n_1^2} \tag{3.14}$$

In general, the impedance $Z_2$, and the currents $I_1$ and $I_2$, shown in Fig. 3.11 can be transformed to the secondary sides of their respective transformers as $Z_2 \cdot (k_2 \cdot n_2)^2$, $I_1 / (k_1 \cdot n_1)$, and $I_2 / (k_2 \cdot n_2)$, respectively. Knowing that power is conserved from the input to the output of a lossless transmission line, we get

$$\left(\frac{I_2}{k_2.n_2}\right)^2 \cdot \left[Z_2 \cdot (k_2.n_2)^2\right] = \left(\frac{I_1}{k_1.n_1}\right)^2 \cdot \left[Z_5\right] \tag{3.15}$$

Further, since the transmission line is designed to be  $90^{\circ}$  in electrical length (quarter wavelength),

$$Z_2.(k_2.n_2)^2 = \left(\frac{R_L}{\alpha}\right)^2.\frac{1}{Z_5} \tag{3.16}$$

Using equations (3.15) and (3.16),

$$Z_2 = \frac{R_L}{\alpha} \cdot \frac{I_1}{I_2} \cdot \frac{1}{k_1 \cdot n_1 \cdot k_2 \cdot n_2} \tag{3.17}$$

$$\Rightarrow Z_{2, 0dB PBO} = \frac{R_L}{\alpha.(\alpha - 1).k_1.n_1.k_2.n_2} \tag{3.18}$$

Therefore, equations (3.14) and (3.18) lead to the conclusion that

$$k_1 \cdot n_1 = k_2 \cdot n_2 \tag{3.19}$$

Similarly, we can show that

$$Z_4 = R_L \cdot \frac{I_2}{I_1} \cdot \frac{k_1 \cdot n_1}{k_2 \cdot n_2} \tag{3.20}$$

$$\Rightarrow Z_{4, 0dB PBO} = R_L \cdot \frac{(\alpha - 1)}{\alpha} \cdot \frac{k_1 \cdot n_1}{k_2 \cdot n_2} \tag{3.21}$$

and using KVL, we get

$$Z_{4,0dB PBO} \cdot \frac{I_1}{k_1 \cdot n_1} + k_1 \cdot n_1 \cdot V_1 = R_L \cdot \frac{I_1}{k_1 \cdot n_1} \tag{3.22}$$

Combining equations (3.21) and (3.22) at the 0 dB PBO condition, and using equation (3.19), results in

$$\frac{V_{1, 0dB PBO}}{I_{1, 0dB PBO}} = Z_{1, 0dB PBO} = \frac{R_L}{k_1 \cdot n_1} \cdot \left(\frac{1 - \alpha}{\alpha} \cdot \frac{1}{k_2 \cdot n_2} + \frac{1}{k_1 \cdot n_1}\right) \tag{3.23}$$

$$\Rightarrow Z_{1, 0dB PBO} = \frac{R_L}{\alpha . k_1^2 . n_1^2} \tag{3.24}$$

as required by equation (3.13).
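The consistency of this derivation can be checked numerically. The sketch below evaluates equations (3.18), (3.21), and (3.22) for hypothetical values of $R_L$, $\alpha$, and the products $k_1 n_1$ and $k_2 n_2$ (passed as `kn1` and `kn2`, names introduced here for illustration), and compares them against the targets of equations (3.13) and (3.14).

```python
def main_pa_impedance_full_power(r_load: float, alpha: float, kn1: float, kn2: float) -> float:
    """Z_1 at 0 dB PBO, obtained from Z_4 in (3.21) and the KVL of (3.22)."""
    z4 = r_load * (alpha - 1.0) / alpha * kn1 / kn2   # equation (3.21)
    return (r_load - z4) / kn1 ** 2                    # rearranged (3.22)


def peaking_pa_impedance_full_power(r_load: float, alpha: float, kn1: float, kn2: float) -> float:
    """Z_2 at 0 dB PBO from equation (3.18)."""
    return r_load / (alpha * (alpha - 1.0) * kn1 * kn2)


r_load, alpha = 50.0, 3.0   # hypothetical load and back-off level
kn = 1.2                    # hypothetical k*n product, equal for both transformers

z1 = main_pa_impedance_full_power(r_load, alpha, kn, kn)
z2 = peaking_pa_impedance_full_power(r_load, alpha, kn, kn)
z1_target = r_load / (alpha * kn ** 2)                    # equation (3.13)
z2_target = r_load / (alpha * (alpha - 1.0) * kn ** 2)    # equation (3.14)

print(f"Z1 = {z1:.3f} ohm (target {z1_target:.3f} ohm)")
print(f"Z2 = {z2:.3f} ohm (target {z2_target:.3f} ohm)")

# With k1*n1 != k2*n2 the targets are no longer met, as (3.19) requires
z1_mismatch = main_pa_impedance_full_power(r_load, alpha, kn, 1.5 * kn)
print(f"Z1 with mismatched k*n products = {z1_mismatch:.3f} ohm")
```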

Substituting the values of $n_1$ and $n_2$ from equation (3.11), and using equation (3.19) in equations (3.13) and (3.14), we get

$$Z_{1, 0dB PBO} = \frac{R_L . L_{p1}}{\alpha . k_1^2 . L_{s1}} = \frac{R_L . L_{res1}}{\alpha . k_1^2 . L_{s1}} \tag{3.25}$$

$$Z_{2, 0dB PBO} = \frac{R_L . L_{p2}}{\alpha.(\alpha - 1).k_2^2 . L_{s2}} = \frac{R_L . L_{res2}}{\alpha.(\alpha - 1).k_2^2 . L_{s2}} \tag{3.26}$$

Note that $L_p = L_{res}$ because the magnetizing inductance of the transformer $(k^2.L_s)$, which is used to implement $L_{res}$, is reflected to the primary side as $(k^2.L_s)/(k^2.n^2) = (k^2.L_s.L_p)/(k^2.L_s) = L_p$. Also, since $L_{res1}$ and $L_{res2}$ are used to resonate out the parasitic capacitances $C_1$ and $C_2$ of the main and the peaking PAs, respectively, $L_{res} = 1/(\omega^2.C)$, and equations (3.25) and (3.26) can be rewritten as:

$$k_1^2 L_{s1} = \frac{R_L L_{res1}}{\alpha (Z_{1, 0dB \ PBO})} = \frac{R_L}{\alpha \omega^2} \cdot \frac{1}{C_1 (Z_{1, 0dB \ PBO})}$$
(3.27)

$$k_2^2 L_{s2} = \frac{R_L L_{res2}}{\alpha . (\alpha - 1) . (Z_{2, \ 0dB \ PBO})} = \frac{R_L}{\alpha . (\alpha - 1) . \omega^2} \cdot \frac{1}{C_2 . (Z_{2, \ 0dB \ PBO})}$$
(3.28)

Now, we can make an important observation: $Z_{1, 0dB PBO}$ and $Z_{2, 0dB PBO}$ are design parameters that determine the output power of the PA, and as these values are varied, the size of the PA needs to vary inversely to achieve the desired output power and maintain high efficiency. This implies that the capacitance presented by the PA also varies inversely. Therefore, for a given PA architecture, the products $C_1 \cdot Z_{1, 0dB PBO}$ and $C_2 \cdot Z_{2, 0dB PBO}$ are constant, to the first order. Thus,

$$C_{1}(Z_{1,0dB PBO}) = C_{2}(Z_{2,0dB PBO}) = \beta, \qquad (3.29)$$

where  $\beta$  represents the value of this constant, which can be determined through simulations. In a general-purpose 65 nm CMOS process,  $\beta = 15 \ \Omega.pF$ , for a cascode cell using the nominal thin oxide transistors. Therefore, we get

$$k_1^2 L_{s1} = \frac{R_L}{\alpha . \omega^2 . \beta} \tag{3.30}$$

$$k_2^2 L_{s2} = \frac{R_L}{\alpha . (\alpha - 1) . \omega^2 . \beta}$$
(3.31)

Lastly, Fig. 3.10(a) gives two more equations; since the series inductance of the 90° transmission line is implemented using the leakage inductance of the transformer, we get

$$(1 - k_1^2).L_{s1} = \frac{R_L}{\alpha.\omega}$$
 (3.32)

$$(1 - k_2^2).L_{s2} = \frac{R_L}{\alpha.\omega}$$
(3.33)

# **3.2.4** Derivation Results and Design Strategy

Using equations (3.30), (3.31), (3.32), and (3.33), we can solve for $L_{s1}$, $L_{s2}$, $k_1$, and $k_2$:

$$L_{s1} = \frac{R_L}{\alpha.\omega} \cdot \left(1 + \frac{1}{\omega.\beta}\right) \tag{3.34}$$

$$L_{s2} = \frac{R_L}{\alpha.\omega} \cdot \left(1 + \frac{1}{(\alpha - 1).\omega.\beta}\right)$$
(3.35)

$$k_1 = \sqrt{\frac{1}{1 + \omega.\beta}} \tag{3.36}$$

$$k_{2} = \sqrt{\frac{1}{1 + (\alpha - 1).\omega.\beta}}$$
(3.37)
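The intermediate steps behind equations (3.34) and (3.36) can be made explicit. This is a short sketch using only equations (3.30) and (3.32); the peaking-side results follow identically from (3.31) and (3.33). Adding (3.30) and (3.32) eliminates $k_1$:

$$L_{s1} = k_1^2 . L_{s1} + (1 - k_1^2) . L_{s1} = \frac{R_L}{\alpha . \omega^2 . \beta} + \frac{R_L}{\alpha . \omega} = \frac{R_L}{\alpha . \omega} \cdot \left(1 + \frac{1}{\omega . \beta}\right)$$

Dividing (3.30) by this expression for $L_{s1}$ then gives the coupling coefficient:

$$k_1^2 = \frac{1/(\omega . \beta)}{1 + 1/(\omega . \beta)} = \frac{1}{1 + \omega . \beta}$$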

Figure 3.12: Design of a transformer-based asymmetric series Doherty network at any desired frequency and at any desired efficiency enhancement (EE) level using (a) $L_{s1}$, (b) $L_{s2}$, (c) $k_1$, and (d) $k_2$.

For a load resistance $(R_L)$ of 50 $\Omega$, the above equations can be plotted with respect to the operating frequency for a family of efficiency enhancement back-off levels, as shown in Fig. 3.12. These plots can be used to determine the values for $L_{s1}$, $L_{s2}$, $k_1$, and $k_2$. It is important to note that these plots assume $\beta = 15 \ \Omega$.pF, which will change if a different CMOS process is used, or if a different PA architecture is used (e.g. a cascode cell with thick gate transistors). In that scenario, the value of $\beta$ can be determined through a simple simulation and can be substituted in the above equations to generate a new set of plots. The transistor sizes of the main and the peaking PAs are determined based on the desired output power, and the values of $L_{p1}$ and $L_{p2}$ are designed to resonate out the capacitance presented by the PAs, as explained above. This design procedure automatically ensures that the impedances presented to the PAs are optimal from an output power and efficiency standpoint, since $\beta$ accounts for this. Note that we need to ensure the following to achieve the desired efficiency enhancement at $\alpha$ back-off:

$$L_{p1} = (\alpha - 1).L_{p2} \tag{3.38}$$

$$size(Peak_{PA}) = (\alpha - 1). size(Main_{PA})$$
 (3.39)
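The closed-form results in equations (3.34) to (3.37) are straightforward to evaluate numerically. The following is a minimal sketch, not the script used to generate Fig. 3.12; the function name, argument names, and dictionary keys are hypothetical, and the defaults assume $R_L = 50 \ \Omega$ and $\beta = 15 \ \Omega$.pF as stated in the text. The back-off level is taken as $20 \log_{10}(\alpha)$, which is consistent with the 9.5 dB quoted for $\alpha = 3$.

```python
import math


def doherty_transformer_design(freq_hz, alpha, beta_ohm_farad=15e-12, r_load=50.0):
    """Evaluate equations (3.34)-(3.37) for one frequency and back-off ratio.

    beta_ohm_farad is the C*Z product of (3.29); 15 ohm.pF = 15e-12 ohm.F.
    """
    omega = 2.0 * math.pi * freq_hz
    wb = omega * beta_ohm_farad                 # dimensionless product omega*beta
    leakage = r_load / (alpha * omega)          # right-hand side of (3.32)/(3.33)
    ls1 = leakage * (1.0 + 1.0 / wb)                      # (3.34)
    ls2 = leakage * (1.0 + 1.0 / ((alpha - 1.0) * wb))    # (3.35)
    k1 = math.sqrt(1.0 / (1.0 + wb))                      # (3.36)
    k2 = math.sqrt(1.0 / (1.0 + (alpha - 1.0) * wb))      # (3.37)
    return {
        "Ls1": ls1,
        "Ls2": ls2,
        "k1": k1,
        "k2": k2,
        "pbo_db": 20.0 * math.log10(alpha),     # assumed back-off level for ratio alpha
    }


design = doherty_transformer_design(6.5e9, alpha=3)
print(f"Ls1 = {design['Ls1'] * 1e9:.3f} nH, Ls2 = {design['Ls2'] * 1e9:.3f} nH")
print(f"k1 = {design['k1']:.3f}, k2 = {design['k2']:.3f}, PBO = {design['pbo_db']:.2f} dB")
```

Sweeping `freq_hz` and `alpha` with this function gives curves of the kind described for Fig. 3.12; a different process or cell topology only changes `beta_ohm_farad`.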

# **3.3 Implementation of a Current-Mode Eight-phase Asymmetric Doherty Transmitter Using a Single Footprint Transformer**

This section describes the implementation of a current-mode eight-phase digital asymmetric series Doherty transmitter using a single footprint transformer. The overall block diagram is shown in Fig. 3.13, where a CW differential RF signal is provided to the chip, and a single-ended output with the desired amplitude and phase is produced by the chip. The differential input is fed to a three-ring PPF to generate eight "unclean" phases that are used to injection lock an 8ILRO to achieve "clean" wideband eight-phase generation. These eight phases behave as the basis vectors for the digital power amplifier, as explained in Section 3.2.2. Next, based on the desired output phase, the two adjacent basis vectors, $\vec{\phi_A}$ and $\vec{\phi_B}$, and their corresponding differential basis vectors, $\overline{\vec{\phi_A}}$ and $\overline{\vec{\phi_B}}$, are selected through an eight-phase basis vector mapper and fed to the main PA. Since the peak PA is driven 90° out of phase with respect to the main PA, $\vec{\phi_C}$, $\vec{\phi_D}$, $\overline{\vec{\phi_C}}$, and $\overline{\vec{\phi_D}}$ are fed to the peak PA. Therefore, all eight basis vectors are assigned to either the main or the peak PA at any given time. The digital PA cells perform the non-linear combination of the basis vectors to produce the desired output amplitude and phase. The outputs of the PAs are combined through a single footprint transformer to achieve efficiency enhancement in deep power back-off and a single-ended output, while maintaining a compact form-factor. The details of the circuit blocks are described below.

## 3.3.1 Implementation of Wideband Eight-phase Generation around 6.5 GHz

As discussed in Section 3.2.2, an 8P-8ILRO technique is implemented on this chip for wideband eight-phase generation around 6.5 GHz, as shown in Fig. 3.14. The three-ring PPF should be composed of high-impedance resistors and capacitors, since this results in a high overall input impedance. This can be easily matched to 50 $\Omega$ for a wide range of operating frequencies



Figure 3.13: Overall block diagram of the implemented current-mode eight-phase asymmetric digital Doherty transmitter using a single footprint transformer with on-chip multi-phase generation, along with the overall block-level schematic of the unit cells of the PAs.



Figure 3.14: An eight-phase injected eight-stage injection locked ring oscillator (8P-8ILRO) technique to generate eight equally spaced phase-shifted signals around 6.5 GHz is composed of (a) a PPF to generate the eight "unclean" injection signals, (b) a bias tee to raise the DC offset of the injection signals, and (c) an eight-stage ring oscillator to output the "clean" phase-shifted signals.



Figure 3.15: Post parasitic extracted voltage swing attenuation from the input (single-ended) to the outputs (single-ended) of the PPF.

by simply placing resistors in parallel at the input. In this work, the values of all the resistors and capacitors of the PPF are chosen to be 800 $\Omega$ and 30 fF, respectively, which results in a single-ended input impedance of about 400 $\Omega \parallel$ 60 fF. 56 $\Omega$ resistors are added in parallel at the PPF input to reduce reflections, resulting in an $S_{11}$ that remains below -20 dB throughout the operating range. The resulting post-parasitic extracted voltage swing attenuation from the input (single-ended) to the outputs (single-ended) of the PPF is shown in Fig. 3.15. The attenuation increases with the frequency of operation due to parasitic load capacitance. It should be noted that using an LC matching network, instead of the parallel resistors, results in a passive voltage gain that can improve the system efficiency of the transmitter; however, this reduces the RF bandwidth of the input network due to the narrow-band nature of LC networks, and it also increases the area of the chip due to the use of additional inductors. Therefore, this approach is not implemented in this work.
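As a quick sanity check of the resistive input match, the quoted component values can be plugged into a lumped model. This is a rough sketch only: it assumes the single-ended PPF input is exactly 400 Ω in parallel with 60 fF, that the 56 Ω resistor is an ideal shunt, and that the reference impedance is 50 Ω; layout parasitics and pad capacitance are ignored, and the function and parameter names are hypothetical.

```python
import math


def s11_db(freq_hz, r_ppf=400.0, c_ppf=60e-15, r_shunt=56.0, z0=50.0):
    """|S11| in dB of (r_ppf || c_ppf || r_shunt) referenced to z0."""
    omega = 2.0 * math.pi * freq_hz
    # Total input admittance: two shunt conductances plus the capacitive susceptance.
    y_in = complex(1.0 / r_ppf + 1.0 / r_shunt, omega * c_ppf)
    y0 = 1.0 / z0
    gamma = (y0 - y_in) / (y0 + y_in)   # reflection coefficient in admittance form
    return 20.0 * math.log10(abs(gamma))


for f_ghz in (4.5, 5.5, 6.5, 7.0):
    print(f"{f_ghz:.1f} GHz: S11 = {s11_db(f_ghz * 1e9):.1f} dB")
```

Under these idealized assumptions the reflection stays below the -20 dB level stated in the text across 4.5 GHz to 7.0 GHz, with the match degrading slowly as the capacitive susceptance grows with frequency.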

Bias tee blocks, followed by NOT gates, are placed between the PPF outputs and the inputs to the 8ILRO to add a DC offset to the injected signals, as shown in Fig. 3.14(b). This ensures that the injected signal swing is centered at $V_{DD}/2$, which maximizes the gain provided by the non-linear NOT gates. The injection signals are fed only to differential NMOS devices to alleviate the DC bias interaction at the injection nodes of the ring oscillator [80, 81], and cross-coupled NOT gates are added between the differential nodes to prevent the ring oscillator from latching.



Figure 3.16: Post parasitic extracted error (deviation) from the desired eight equally spaced phase-shifted signals around 6.5 GHz shows significant error at the output of (a) the three-ring PPF, while (b) the error is substantially reduced through the 8P-8ILRO.

The transistor sizes and the values of the resistors and capacitors are shown in Fig. 3.14.

The simulated result of the implemented 8P-8ILRO after parasitic capacitance extraction is depicted in Fig. 3.16. The output of the three-ring PPF shows a significant amount of error, as high as $-17^{\circ}$; however, this error is substantially reduced to $< \pm 1.5^{\circ}$ at the output of the ring oscillator, highlighting the advantage of the implemented 8P-8ILRO technique. Note that the locking range shown in Fig. 3.16(b) represents the maximum locking range of the implemented 8P-8ILRO around 6.5 GHz. It is shown in the literature that the locking range depends on the strength of the injected signals [79], and the maximum locking range is attained when these injected signals swing from 0 to $V_{DD}$, since exceeding this swing could result in reliability concerns related to the breakdown of the devices.



Figure 3.17: Block-level schematic of the implemented eight-phase basis vector mapper.

# 3.3.2 Implementation of Eight-phase Basis Vector Mapper around 6.5 GHz

The outputs of the 8P-8ILRO structure are fed to an eight-phase basis vector mapper for assigning the desired phases to the main PA or the peak PA, as described above. Although the mapper can be implemented using a standard multiplexer tree with 3-bit control and 56 switches (seven switches per path for a total of eight paths), this results in a considerable number of cross-overs between the signals, since all eight input signals need to be routed to all eight paths. Such a design introduces parasitic coupling that increases the phase error between the different paths. A simple observation significantly simplifies the mapper design: the phase relationships between the outputs of the mapper are fixed with respect to each other; for example, $\vec{\phi_A}$ always lags $\vec{\phi_B}$ by 45°, $\vec{\phi_B}$ always lags $\vec{\phi_C}$ by 45°, and so on; only the absolute values of the phases vary. Therefore, the mapper does not need independent control of each path, as offered by the standard multiplexer tree. A simplified mapper design, with a total of 24 switches that are divided into columns of three with a 3-bit control, is shown in Fig. 3.17. Further, each input signal is routed to only two switches, which ameliorates the parasitic coupling issue.

Each switch is implemented using a 2:1 multiplexer, as depicted in Fig. 3.17, where the RF signal is fed to two NAND gates along with the control signal and its complement. This allows only one of the RF signals to propagate, while the output of the other NAND gate clamps to  $V_{DD}$ . These two signals are then combined through a final NAND gate to drive a set of NOT gate buffers. Since a standard NAND gate presents uneven load capacitance to its drive signals, it is crucial that the final NAND gate is implemented using a symmetric NAND topology to ensure its delay is identical, irrespective of the RF signal that propagates through it [27]. The sizes of all the devices are mentioned in Fig. 3.17. The output of the mapper is followed by an additional set of NOT gate buffers in a fan-out structure to drive the 2D unit-cell grid, described below.
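One way to picture a 24-switch mapper with 3-bit control is as a logarithmic (barrel) rotator: three columns of eight 2:1 multiplexers, where each column either passes the phases straight through or rotates them by 1, 2, or 4 positions. The behavioral model below is a sketch under that assumption; the actual wiring in Fig. 3.17 may differ, and the function names and the bit-to-column assignment are hypothetical.

```python
def mux2(sel, a, b):
    """2:1 multiplexer: returns a when sel is 0 and b when sel is 1."""
    return b if sel else a


def map_basis_vectors(phases, code):
    """Rotate eight input phases by `code` positions using three mux columns."""
    assert len(phases) == 8 and 0 <= code < 8
    signals = list(phases)
    for bit in range(3):                 # one column of eight muxes per control bit
        shift = 1 << bit                 # this column rotates by 1, 2, or 4 positions
        sel = (code >> bit) & 1
        # Each signal feeds exactly two muxes of the next column:
        # the straight-through one and the one `shift` positions away.
        signals = [mux2(sel, signals[i], signals[(i + shift) % 8]) for i in range(8)]
    return signals


phases_deg = [45 * n for n in range(8)]
for code in range(8):
    print(code, map_basis_vectors(phases_deg, code))
```

In this model every output keeps its 45° spacing from its neighbors for all eight codes, and only the absolute phase assignment changes, which is the property the simplified mapper relies on.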

#### **3.3.3** Implementation of the PAs and its Unit Cells

The peak PA is designed to be twice the size of the main PA for the following three advantages:

- The resulting architecture is an asymmetric Doherty network that achieves efficiency enhancement in deep PBO. Based on equation (3.39), $(\alpha - 1) = 2$, which results in enhancement up to 9.5 dB output power back-off.
- The asymmetric architecture eases the design of a transformer-within-transformer structure, since the two transformers are desired to be different in size. The classical Doherty architecture, which achieves efficiency enhancement up to 6 dB PBO, needs two identical transformers. Such a design is challenging to realize if one transformer is inserted into another, as that process inherently leads to asymmetry. A transformer-within-transformer technique is easier to achieve for an asymmetric design.
- It significantly reduces layout effort through the reuse of implemented cells. For example, the peak PA can be implemented by simply "copying and pasting" the main PA twice, and then connecting the outputs of the two pasted blocks together, as shown in Fig. 3.13. Similarly, the buffers driving the main PA can be re-used for driving the peak PA. Such a design scales well with process nodes, since only one "golden" unit cell needs to be implemented; this would not be possible had the peak PA size been a non-integer multiple of the main PA size.

It is important to note that since the peak PA is double in size compared to the main PA, the output of the eight-phase basis vector mapper needs to drive uneven loads. Therefore, a dummy buffer is added in parallel to the buffers driving the main PA to compensate for this mismatch, as shown in Fig. 3.13.

The block-level schematic of one of the main PA's unit cells is also shown in Fig. 3.13 (two of these cells are used to create a unit cell for the peak PA). Based on the desired output amplitude and phase, the output stage of each unit cell can be driven by either one of the two basis vectors, or it can be turned off. Therefore, every cell receives both of the adjacent basis vectors ( $\vec{\phi_A}$ and $\vec{\phi_B}$ ), and the control signals $\vec{\phi_A}$-enable and $\vec{\phi_B}$-enable. Similar to the 2:1 switch described above, the enable signals allow only one of the basis vectors to propagate using a symmetric NAND gate, which is followed by a set of NOT gate buffers. The output stage is implemented using a cascode topology, composed of a common-source (CS) stage and a common-gate (CG) stage, to allow for a higher voltage swing handling capability, and thereby increase output power. The sizes of all the devices are given in Fig. 3.13.

The PAs are implemented using 6-bit thermometer-coded unit-cells that are arranged in a 2D grid consisting of row and column signals. The enable signal is calculated through a simple digital logic given by  $[(row_i.column_j) + row_{i+1}]$ , where  $0 \le i, j \le 7$ , and  $row_0 = 1$ ,  $row_{7+1} = 0$ , and  $column_7 = 0$ . To reduce the number of pads on-chip, these row and column signals are derived from a 6-bit binary code that is split into two sets of 3-bit binary codes, and each of them is then converted to a 7-bit thermometer code through an on-chip decoder. Since each cell needs two enable signals, one per basis vector, the main PA and the peak PA each have a total of  $2 \times 6$ -bit control. The two enable signals are designed to traverse in a reverse order on the 2D grid to ensure



Figure 3.18: (a) An implementation of an asymmetric transformer-based series Doherty network with parasitic magnetic coupling, resulting in (b) efficiency enhancement close to 9.5 dB PBO when  $k_p = 0$ , and the degraded performance in PBO as  $k_p$  increases.

only one enable signal is activated per unit cell, at any given time. Finally, TSPC-based flip-flops are used in every unit-cell to correct the timing mismatch associated with the control signals.
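The row/column enable logic can be checked with a small behavioral model. The sketch below assumes one particular bit ordering that the text does not spell out: the upper three bits of the 6-bit code produce $row_1$ to $row_7$, the lower three bits produce $column_0$ to $column_6$, and thermometer bit $k$ is 1 when the 3-bit value exceeds $k$. The function names are hypothetical.

```python
def thermometer(value, width=7):
    """3-bit binary value -> 7-bit thermometer code; bit k is 1 when value > k."""
    return [1 if value > k else 0 for k in range(width)]


def enable_grid(code):
    """Return the 8x8 grid of enables, enable[i][j] = (row_i AND column_j) OR row_(i+1)."""
    assert 0 <= code < 64
    row = [1] + thermometer(code >> 3) + [0]     # row_0 = 1, row_1..row_7, row_8 = 0
    column = thermometer(code & 0b111) + [0]     # column_0..column_6, column_7 = 0
    return [[(row[i] & column[j]) | row[i + 1] for j in range(8)] for i in range(8)]


def cells_on(code):
    """Number of enabled unit cells for a given 6-bit code."""
    return sum(sum(r) for r in enable_grid(code))


for code in (0, 1, 8, 27, 63):
    print(code, cells_on(code))
```

With this ordering, rows below the active row are fully on (through the $row_{i+1}$ term), the active row is filled by the column code, and the number of enabled cells equals the 6-bit input code, which is the thermometer behavior the grid is meant to provide.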

# 3.3.4 Implementation of Single Footprint Asymmetric Series Doherty Transformer Network

Typically, two transformers are used for designing an asymmetric series Doherty matching network, as described in Section 3.2.3. An example of such a network operating at 6.5 GHz, for $\alpha = 3$ [size(Peak<sub>PA</sub>) = ( $\alpha - 1$ ). size(Main<sub>PA</sub>)], is developed using the design strategy described in Section 3.2.4 and is shown in Fig. 3.18(a). The transformers are assumed to be lossless, and the PA unit cells are implemented using a cascode topology, similar to the one depicted in Fig. 3.13. The values of the components used in the network are also shown in Fig. 3.18(a). This design achieves a peak drain efficiency of 76% and efficiency enhancement close to 9.5 dB PBO, as expected.

Although the design described above achieves the desired performance, it consumes a significant amount of area. Therefore, a transformer-within-transformer structure is proposed and implemented in this work. However, placing one transformer inside another leads to parasitic magnetic coupling  $(k_{p1}, k_{p2}, k_{p3}, \text{ and } k_{p4})$  between the transformers of the main PA and the peak PA, as indicated in Fig. 3.18(a), which can significantly reduce the performance of the overall power amplifier. To understand the effects of this parasitic magnetic coupling, the design described above is simulated for various values of  $k_p$ , where  $k_p = k_{p1} = k_{p2} = k_{p3} = k_{p4}$  for simulation simplicity. The results show significant degradation in back-off efficiency enhancement with increasing  $k_p$ , as illustrated in Fig. 3.18(b). Therefore, all parasitic coupling coefficients are desired to be  $\leq 0.1$  for the proposed transformer-within-transformer design.

To reduce the parasitic coupling coefficients of the transformer-within-transformer structure, the inner transformer is twisted into a Figure-8 structure, as shown in Fig. 3.19(a). The resulting magnetic fluxes in the two "octagons" of the Figure-8 are almost equal in magnitude but opposite in



Figure 3.19: (a) The implemented single footprint transformer-based series Doherty network using the transformer-within-transformer technique with (b) very low parasitic coupling, resulting in (c) maximum passive efficiency of 70%, and (d) the desired input impedance for the Main and the Peak PA.



Figure 3.20: Die photo of the transmitter realized in a 65 nm CMOS process.

direction; therefore, the net induced current in the non-twisted loops due to the Figure-8 loops is close to zero, leading to very low parasitic coupling [82]. The design of Fig. 3.18(a) is taken as the starting point, and iterative EM simulations are performed to reach the final solution that is shown in Fig. 3.19(b). Note that the coil on the outside of one of the "octagons" ends up on the inside of the other, since this allows the desired connections to be made without the need for additional vias. Reducing the number of vias results in lower loss, and thereby improves efficiency; however, this uneven nature of the Figure-8 structure results in imperfect cancellation of the magnetic flux. Therefore, the parasitic coupling coefficients are not exactly zero, but simulations show that they are below 0.1, as desired. The passive efficiency is close to 70% at maximum output power, and almost 59% when the peak PA is switched off, as shown in Fig. 3.19(c). Finally, Fig. 3.19(d) shows that the impedance seen by the main PA $(Z_{Main})$ is almost twice that seen by the peak PA $(Z_{Peak})$ at maximum output power, as desired, and $Z_{Main}$ increases by more than $2 \times$ when the peak PA is turned off, which results in efficiency enhancement in PBO. Note that ideally, $Z_{Main}$ needs to increase by $3 \times$ when the peak PA is turned off, but the impedance boost is reduced in practical implementations due to the loss of the transformers and the finite off resistance of the peak PA, preventing the efficiency enhancement peak from reaching the design value of 9.5 dB PBO.

The die photo of the implemented transmitter is shown in Fig. 3.20.

# **3.4 Measurement**

## 3.4.1 CW Measurements

For CW measurements, the 27 control bits (12 bits for Main and Peak PA each + 3 bits for basis vector mapper) are driven by an Arduino Due, with an off-chip digital level-shifter in-between to translate the digital signals from 0-3.3 V to 0-1.8 V. Additionally, on-chip digital level-shifters are also used to further reduce the swing to 0-1.2 V. An external 100 MHz clock is provided to the chip, since the TSPC flip-flops need to be clocked for preventing charge leakage. The CW RF

input is generated through Keysight's N5245A PNA-X vector network analyzer, followed by an external balun to produce a differential signal at the operating frequency, which is provided to the chip through a differential probe. The output cascode stage of the transmitter uses a 1.25 V supply, and the gate of the CG stage of the cascode structure uses 1.2 V for biasing. The NOT gate driving the CS stage of the cascode structure uses a reduced 0.85 V supply to improve the efficiency of the output stage, and the injection-locked ring oscillator supply $V_{RO}$ is set to 1.01 V to achieve the optimal phase noise performance at 6.5 GHz, as shown below; the rest of the supplies use 1.2 V. The output of the transmitter is connected to a single-ended probe, followed by an external wideband power splitter, one output of which is connected to a Rohde & Schwarz NRP-Z57 power sensor to measure the output power.

Fig. 3.21 shows that for an input power of 6 dBm (3 dBm at each of the differential RF inputs) provided to the chip, the implemented injection-locked oscillator achieves more than 1 GHz of locking range. This figure also shows the phase noise at the output of the transmitter for the entire achievable locking range at a constant $V_{RO}$, and compares it to the phase noise of the input PNA-X signal (green) provided to the chip, showing optimal performance at 6.5 GHz. Although the locking range can be improved by increasing the input power, this reduces the transmitter gain. Therefore, in this work, the input power is set to 6 dBm to achieve a gain of about 14 dB for the entire operating frequency range, as shown below.

Fig. 3.22 (a) and (b) show the frequency dependence of the transmitter for DE, $P_{out}$, and gain, where the supply of the ring oscillator is varied with respect to frequency to achieve optimal phase noise performance. The transmitter achieves a DE above 30% for the frequency range of 4.5 GHz to 7.0 GHz, with a maximum of 38% at 5.75 GHz, 34% at 6.5 GHz, and 31% at 7.0 GHz. It also achieves more than 20% DE in PBO (peak PA turned off) from 6.1 GHz to 7.0 GHz, with 24% at 6.5 GHz, and a maximum of 26% at 7.0 GHz, as shown in Fig. 3.22 (a). Further, it attains more than 20 dBm $P_{out}$ and close to 14 dB gain from 4.5 GHz to 6.7 GHz, as depicted in Fig. 3.22 (b). This wideband performance also indicates the strength of the proposed 8P-8ILRO technique for generating multiple phases over a wide operating frequency range.



Figure 3.21: Measured phase noise of the injection locked ring oscillator for the entire locking range at a constant  $V_{RO}$  = 1.01 V



Figure 3.22: Measured frequency dependence of the implemented PA for (a) DE at maximum output power and at power back-off when the peak PA is turned off, and (b) maximum output power and gain, where the supply of the ring oscillator is varied with respect to frequency to achieve optimal phase noise performance.



Figure 3.23: Measured drain efficiency (DE) and system efficiency (SE) of the implemented PA at (a) 6.25 GHz, (b) 6.50 GHz, (c) 6.75 GHz and (d) 7.00 GHz compared to normalized class B PAs.

Given that the transmitter performs optimally in PBO above 6.1 GHz, it is measured for DE and SE for the entire PBO range at 6.25 GHz, 6.5 GHz, 6.75 GHz, and 7.0 GHz, and it is compared with a normalized class B PA's performance, as shown in Fig. 3.23 (a), (b), (c), and (d), respectively. Here, the system efficiency includes the power dissipated in the output stage as well as all the other circuit blocks of the transmitter. The implemented transmitter achieves a maximum DE of 34%, 33%, 32%, and 31%, and a DE of 22%, 23%, 24%, and 24% at 8 dB PBO, at the aforementioned frequencies, respectively. This corresponds to an improvement of  $1.61 \times$ ,  $1.76 \times$ ,  $1.88 \times$ , and  $1.93 \times$  compared to a normalized class B PA, showcasing the improved performance offered by the proposed single footprint transformer-within-transformer based asymmetric series Doherty network.
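The improvement factors quoted above can be reproduced approximately from the rounded efficiencies. The sketch below assumes that a "normalized class B PA" means an ideal class-B amplifier with the same peak DE, whose efficiency falls in proportion to the output voltage amplitude, that is, by $10^{-PBO/20}$; the function name is hypothetical. Because the inputs are the rounded percentages from the text, the ratios differ from the quoted values by a few hundredths.

```python
def class_b_de(peak_de, pbo_db):
    """DE of an ideal class-B PA at a given back-off, normalized to the same peak DE."""
    return peak_de * 10.0 ** (-pbo_db / 20.0)


# frequency in GHz -> (maximum DE in %, DE in % at 8 dB PBO), values taken from the text
measured = {6.25: (34, 22), 6.50: (33, 23), 6.75: (32, 24), 7.00: (31, 24)}

for f_ghz, (de_peak, de_8db) in measured.items():
    ratio = de_8db / class_b_de(de_peak, 8.0)
    print(f"{f_ghz:.2f} GHz: class-B DE at 8 dB PBO = {class_b_de(de_peak, 8.0):.1f}%, "
          f"improvement = {ratio:.2f}x")
```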

Next, the DE at maximum  $P_{out}$  (m + n = p) contour is measured with respect to the normalized


Figure 3.24: Measured DE at maximum  $P_{out}$  contour (m + n = p) with respect to normalized output phase for all basis vector mapper code bits at 6.5 GHz.

output phase for all 3-bit combinations of the basis vector mapper code bits at 6.5 GHz, as shown in Fig. 3.24. This indicates that the worst-case efficiency at maximum $P_{out}$ is relatively high; for example, code <101> results in a maximum DE of 33%, and worst-case DE of 25%, which corresponds to a 0.24× reduction. On the other hand, an idealized quadrature architecture, described in Section 3.2.1, shows a simulated worst-case DE reduction of 0.45×. This illustrates the advantage of the proposed eight-phase architecture.
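The geometric reason for the smaller penalty can be illustrated with an idealized vector sum along the $m + n = p$ contour. The sketch below assumes linear current summing of $m$ cells on one basis vector and $n$ cells on the adjacent one, with $p = 63$ cells as a hypothetical value; it reports only the normalized output amplitude, not DE, so it is not expected to reproduce the measured 0.24× or simulated 0.45× figures.

```python
import cmath
import math


def contour_min_amplitude(spacing_deg, cells=63):
    """Smallest normalized |m + n*exp(j*theta)| over all splits with m + n = cells."""
    theta = math.radians(spacing_deg)
    amplitudes = [
        abs(m + (cells - m) * cmath.exp(1j * theta)) / cells
        for m in range(cells + 1)
    ]
    return min(amplitudes)


for name, spacing in (("quadrature", 90), ("eight-phase", 45)):
    amp = contour_min_amplitude(spacing)
    print(f"{name}: min amplitude = {amp:.3f} ({20 * math.log10(amp):.2f} dB)")
```

In this idealized picture the worst case occurs midway between two basis vectors, where the amplitude approaches $\cos(\theta/2)$ for a basis-vector spacing of $\theta$, so halving the spacing from 90° to 45° shrinks the dip considerably.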

Figure 3.25 shows the measured output voltage and output phase at 6.5 GHz over the entire back-off range, and for all the basis vector mapper control bit combinations, resulting in a total of 48,896 data points. The curves on each colored sector represent the AM-PM non-linearity caused by the variation in output capacitance of the unit cells as they are switched on and off. Given the current-mode implementation of each unit cell, this transmitter also exhibits a significant AM-AM non-linearity, which is captured in this measurement as well. As explained in Section 3.4.2, this measured data is used to create a 2D look-up table for performing modulated measurements.



Figure 3.25: Measured AM-AM and AM-PM of the implemented transmitter for all basis vector mapper code bits at 6.5 GHz.

#### 3.4.2 **RF Modulated Measurements**

The modulated measurement setup is similar to the CW measurement setup, with a couple of exceptions: The NRP-Z57 power sensor is replaced with Keysight's N9030A PXA spectrum analyzer for vector signal analysis, and the Arduino Due is replaced with Link Instruments' IO-3200 digital pattern generator. The desired oversampled baseband modulated waveform is generated in MATLAB, and it is mapped to the corresponding control bits of the transmitter using the 2D look-up table, mentioned in Section 3.4.1. The resulting digital pattern is uploaded to the pattern generator's memory, which is triggered using the same clock source that is provided to the chip for clocking the TSPC flip-flops.

The measured 32-QAM and 64-QAM constellations at maximum  $P_{out}$  with increasing symbol rates at 6.5 GHz, are shown in Fig. 3.26 (a) and (b), respectively. For 32-QAM, the transmitter achieves a DE of 21% with an average  $P_{out}$  of 14 dBm,  $EVM_{rms}$  of 3.5%, and ACLR of 30.6 dBc for a 12.5 MSym/s waveform. The  $EVM_{rms}$  increases to 5.4% and ACLR reduces to 19.5 dBc



Figure 3.26: Measured (a) 32-QAM and (b) 64-QAM constellations at maximum $P_{out}$ for increasing symbol rates at 6.5 GHz.

as the symbol-rate is increased to 40 MSym/s. Similarly, for 64-QAM, the transmitter achieves a DE of 21% with an average  $P_{out}$  of 14 dBm,  $EVM_{rms}$  of 4.0%, and ACLR of 30.9 dBc for a 12.5 MSym/s waveform. The  $EVM_{rms}$  increases to 4.5% and ACLR reduces to 24.5 dBc as the symbol-rate is increased to 25 MSym/s.

Fig. 3.27(a) shows that the transmitter can also produce a decent 128-QAM constellation at 6 dB power back-off (average  $P_{out} = 7.5$  dBm), with a DE of 13% and  $EVM_{rms}$  of 2.1%. It achieves an ACLR of 29.9 dBc, as shown in Fig. 3.27(b).



Figure 3.27: Measured (a) 128-QAM constellation  $EVM_{rms}$  and (b) ACLR for a 12.5 MSym/s waveform at 6 dB output power back-off at 6.5 GHz.

The modulation performance of the transmitter (DE and $EVM_{rms}$) is also characterized for dependence on $P_{out}$, for 16-QAM, 32-QAM, and 64-QAM constellations at 20 MSym/s, and for a 128-QAM constellation at 12.5 MSym/s, as depicted in Fig. 3.28 (a), (b), (c), and (d), respectively. As expected, the DE increases with $P_{out}$, but the $EVM_{rms}$ also degrades. For example, as the average $P_{out}$ is increased from 7 dBm to 14 dBm for the 64-QAM waveform, the DE increases from 12% to 21%; however, the resulting $EVM_{rms}$ also increases from 2.9% to 4.1%.

Finally, the far-out spectrum for a 20 MSym/s 64-QAM waveform at 6.5 GHz is shown in Fig. 3.29. Since the baseband waveform is $8 \times$ oversampled, the zero-order hold (ZOH) sampling images occur at 160 MHz offset. These can be lowered by increasing the oversampling factor or by using higher-order holds [27].
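The image location and the benefit of a higher oversampling factor follow from the standard zero-order hold response, $|\sin(\pi f/f_s)/(\pi f/f_s)|$. The sketch below is illustrative only: it treats a single baseband tone at a hypothetical 10 MHz offset as a stand-in for the signal band edge, and ignores pulse shaping and any filtering in the transmitter; the function names are hypothetical.

```python
import math


def zoh_gain(freq_hz, fs_hz):
    """Magnitude response of a zero-order hold, |sin(x)/x| with x = pi*f/fs."""
    x = math.pi * freq_hz / fs_hz
    return 1.0 if x == 0 else abs(math.sin(x) / x)


def first_image(symbol_rate_hz, oversampling, bb_offset_hz):
    """Return (image spacing in Hz, first image level relative to the tone in dB)."""
    fs = symbol_rate_hz * oversampling
    image_freq = fs - bb_offset_hz           # nearest image of a tone at bb_offset_hz
    rel = zoh_gain(image_freq, fs) / zoh_gain(bb_offset_hz, fs)
    return fs, 20.0 * math.log10(rel)


for osr in (8, 16):
    fs, rel_db = first_image(20e6, osr, bb_offset_hz=10e6)
    print(f"OSR {osr}: images around {fs / 1e6:.0f} MHz offset, "
          f"first image at {rel_db:.1f} dB relative to the tone")
```

For 20 MSym/s and $8 \times$ oversampling the images are centered at 160 MHz, matching the text, and doubling the oversampling factor pushes them out to 320 MHz while lowering their level.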

Table 3.1 compares the implemented transmitter with other state-of-the-art CMOS transmitters and PAs operating above 5 GHz that also achieve efficiency enhancement in PBO. The implemented transmitter demonstrates state-of-the-art  $P_{out,max}$  1 dB bandwidth with competitive performance



Figure 3.28: Measured DE and  $EVM_{rms}$  dependence on output power at 6.5 GHz for (a) 16-QAM, (b) 32-QAM, (c) 64-QAM constellations at 20 MSym/s, and (d) 128-QAM constellation at 12.5 MSym/s.



Figure 3.29: Far-out spectrum for a 20 MSym/s  $8 \times$  oversampled 64 QAM waveform at 6.5 GHz showing the resulting zero-order hold (ZOH) sampling images.

in power back-off, especially considering the use of an older 65 nm process, and higher operating frequency while occupying significantly lower die area compared to the other works.

## 3.5 Conclusion

This work proposes a current-mode eight-phase asymmetric digital Doherty transmitter using a single footprint transformer-based matching network, where three techniques are highlighted: 1) An eight-phase architecture technique to improve the efficiency profile compared to a quadrature architecture, while still overcoming the massive bandwidth expansion of the polar architecture, 2) An eight-phase injection locked eight-stage ring oscillator technique for producing the eight basis phase vectors to overcome the need for using an  $8 \times$  multiple of operating frequency at the RF input, 3) A transformer-within-transformer technique for realizing an asymmetric Doherty matching network to achieve compact form-factor, while also improving the efficiency in power back-off. These techniques are implemented in a general-purpose 65 nm CMOS process at 6.5 GHz. The transmitter achieves a maximum  $P_{out}$  of 20 dBm and a peak DE of 34% with 1.76× improvement at 8 dB PBO compared to a normalized class B PA. It also achieves a DE of 21% with an average  $P_{out}$  of 14 dBm and  $EVM_{rms}$  of 4.1% for a 64-QAM 20 MSym/s modulated waveform.

## 3.6 Contributions

 J. Sheth, L. Zhang, X. Shen, V. Iyer and S.M. Bowers, "An Asymmetric Current-Mode Multi-Phase Digital Doherty Transmitter Using a Single Footprint transformer-based Matching Network," to be submitted to *IEEE Open Journal of the Solid-State Circuits Society (OJ-SSCS)*.

This work was done in collaboration with Linsheng Zhang, Xiaochuan Shen, and Vinay Iyer. I would like to thank them for the following contributions:

- Linsheng Zhang for the design and layout of the eight-phase basis vector mapper, and for the layout of the three-ring poly-phase filter and the eight-stage injection locked ring oscillator, along with helpful technical discussions.
- Xiaochuan Shen for the design and layout of the binary to thermometer decoders, I/O and DC pads, and helpful technical discussions.
- Vinay Iyer for helpful technical discussions.

|                             | This Work                      |         | [61]           | [70]                   | [27]                     | [83]                | [59]                   |
|-----------------------------|--------------------------------|---------|----------------|------------------------|--------------------------|---------------------|------------------------|
| CMOS Node                   | 65 nm                          |         | 28 nm          | 65 nm                  | 40 nm                    | 65 nm               | 55 nm                  |
| Architecture                | 8-phase 2-way Doherty          |         | Polar          | Sub-harmonic switching | Quadrature 4-way Doherty | Polar 4-way Doherty | MGTR Doherty           |
| Die Size (mm<sup>2</sup>)   | **2.4** (0.86<sup>@</sup>)     |         | 4<sup>#</sup>  | 7.1                    | 3.55 (1.5<sup>@</sup>)   | 3                   | 6                      |
| CW Measurements             |                                |         |                |                        |                          |                     |                        |
| 1-dB Power BW               | 4.5 - 7.0 GHz                  |         | n/a            | 5.3 - 6.05 GHz<sup>\*</sup> | 5.0 - 5.6 GHz       | 4.75 - 5.25 GHz<sup>\*</sup> | 5.4 - 6.1 GHz<sup>\*</sup> |
| Pout,max                    | 20 dBm                         |         | 27 dBm         | 27 dBm                 | 27.4 dBm                 | 6.5 dBm             | 27.2 dBm               |
| Frequency                   | 6.5 GHz                        | 7.0 GHz | 5 GHz          | 5.4 GHz                | 5.4 GHz                  | 5.25 GHz            | 5.8 GHz                |
| Peak DE                     | 34%                            | 31%     | 37%            | 40.1%                  | 47.4%                    | 42%                 | 24.5%<sup>†</sup>      |
| 3 dB PBO DE                 | 29%                            | 27%     | 31%<sup>\*</sup> | 32%<sup>\*</sup>     | 47.68%                   | 36%                 | 22%<sup>†,\*</sup>     |
| 6 dB PBO DE                 | 25%                            | 26%     | 26%<sup>\*</sup> | 26.3%                | 43.49%                   | 28%                 | 13%<sup>†,\*</sup>     |
| 9 dB PBO DE                 | 21%                            | 21%     | 18%<sup>\*</sup> | 29.2%                | 41.06%                   | 23%                 | 8%<sup>†,\*</sup>      |
| Modulation Measurements     |                                |         |                |                        |                          |                     |                        |
| Modulation                  | 27 MHz                         |         | 160 MHz        | 80 MHz                 | 320 MHz                  | 1 MSym/s            | 80 MSym/s              |
| Type                        | 64-QAM @ 6.5 GHz               |         | MCS-11         | 64-QAM OFDM            | 4x512-QAM OFDM           | 16-QAM              | 256-QAM                |
| Avg. Pout                   | 14 dBm                         |         | 19.2 dBm       | 18 dBm                 | 18.16 dBm                | 1.9 dBm             | 17 dBm                 |
| DE                          | 21%                            |         | 21.2%          | 28.1%                  | 41.12%                   | 34%                 | 5.3%<sup>†</sup>       |
| EVM                         | -27.7 dB                       |         | -35 dB         | -30.4 dB               | -29.65 dB                | -20.5 dB            | -34.9 dB               |

Table 3.1: Comparison with State-of-the-Art CMOS PAs Operating above 5 GHz

<sup>@</sup> Core area <sup>#</sup> Includes DFE, DPLL, and Dual-band Tx <sup>\*</sup> Estimated from reported figures <sup>†</sup> PAE

# **Chapter 4**

# Wideband InP PAs in F-band with Modulation Measurements

#### 4.1 Introduction

As mentioned in Chapter 1, there is a need for efficient wideband PAs with high $P_{out}$ operating in D-band for future applications such as 6G communication, THz imaging, sensing, and positioning [35, 36, 84]. Since silicon-based processes, such as CMOS, have limited $f_t/f_{max}$ and limited breakdown voltage, HBTs in the InP process have gained interest for designing such PAs [38–41]. The current state of the art uses power combining techniques to improve $P_{out}$. A 2:1 Wilkinson-based power combining technique has been shown in the literature to achieve wide bandwidth and high efficiency [38]. To increase $P_{out}$, this technique has been extended to a 4:1 Wilkinson combiner; however, this leads to reduced efficiency [39]. 4:1 and 8:1 transmission line based power combining techniques have also been explored in the literature. Although these techniques offer high $P_{out}$ and high efficiency, they suffer from limited bandwidth. Further, these PAs have not been tested with modulation measurements, and the literature does not offer any readily implementable solutions to accomplish this. Almost all the modulation measurements at D-band are performed on complete transmitter integrated circuits that include all up-conversion modules on-chip [85–88].



Figure 4.1: Schematic of the stacked PA, showing its input and output matching networks,  $R_b$  resistance for stability, and radial stubs for presenting broadband low impedance.

The one exception that performs such measurements on a standalone InP PA uses a custom CMOS integrated circuit on a ceramic interposer to drive the InP PA wirebonded to the interposer, which is complex to use [42].

In this work, two PAs implemented in a 130 nm InP process are presented. First, a single stage stacked PA with a peak saturated output power ( $P_{sat}$ ) of 15.3 dBm and wideband operation with small-signal 3 dB bandwidth from 90 - 140 GHz is presented. Next, this single stage stacked PA is used in an eight-way power combined distributed active transformer (DAT) configuration to generate a  $P_{sat}$  of 22.1 dBm with a measured small-signal 3 dB bandwidth from 100 - 140 GHz. Finally, QPSK and 16-QAM modulation measurements are performed using COTS equipment on the single stage stacked PA to demonstrate a peak data rate of 10 Gbps at 120 GHz with the PA operating close to its  $P_{sat}$ . It is important to note that this peak data-rate is limited due to the measurement equipment.

# 4.2 Design of a Single Stage Stacked PA and an Eight-Way DAT PA

#### 4.2.1 Single Stage Stacked PA

To achieve high gain from a single stage PA, a stacked topology consisting of a common-emitter (CE) and a common-base (CB) amplifier is implemented, as shown in Fig. 4.1. Both of these amplifiers are realized using four transistors placed in parallel, each 6 $\mu$m wide, to increase $P_{out}$. The optimal load impedance for this implementation is (60 + j40) $\Omega$, which is significantly closer to 50 $\Omega$ than the (10 + j15) $\Omega$ optimal load impedance of a stand-alone CE PA. Therefore, the output matching network for the stacked PA inherently offers wideband performance compared to the stand-alone CE PA, showing the advantage of the proposed design. The output matching network is designed using an open-stub and a series transmission line, along with a series DC blocking capacitor, as shown in Fig. 4.1. The simulated passive efficiency of the output matching network is > 80% from 90 GHz to 140 GHz.
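One way to quantify how much closer the stacked PA's optimal load is to 50 $\Omega$ is to compare reflection coefficient magnitudes referenced to 50 $\Omega$. The sketch below is only a rough proxy for the required impedance transformation, under the general assumption that a smaller transformation permits a wider matching bandwidth; it is not the analysis used to design the matching network, and the function names are illustrative.

```python
def reflection_magnitude(z_load, z0=50.0):
    """Magnitude of the reflection coefficient of z_load referenced to z0."""
    return abs((z_load - z0) / (z_load + z0))


def vswr(gamma_mag):
    """Voltage standing wave ratio for a given reflection magnitude."""
    return (1 + gamma_mag) / (1 - gamma_mag)


z_stacked = complex(60, 40)  # optimal load of the stacked PA, in ohms
z_ce = complex(10, 15)       # optimal load of a stand-alone CE PA, in ohms

gamma_stacked = reflection_magnitude(z_stacked)
gamma_ce = reflection_magnitude(z_ce)

print(f"Stacked PA: |Gamma| = {gamma_stacked:.3f}, VSWR = {vswr(gamma_stacked):.2f}")
print(f"CE PA:      |Gamma| = {gamma_ce:.3f}, VSWR = {vswr(gamma_ce):.2f}")
```

The stacked PA's load gives roughly half the reflection magnitude of the CE load, which is consistent with the smaller transformation ratio described above.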

A short series transmission line of 50  $\mu$ m is used as an inter-stage matching network between the CE and the CB amplifiers to improve the gain of the stacked PA, and a capacitor of 65 fF is added at the base of the CB amplifier to prevent its collector-base junction from exceeding the breakdown limit. In order to prevent any low-frequency instability and to provide thermal ballasting, a resistor  $R_b$  of 150  $\Omega$  is added in series with the bias line. A quarter-wave transmission line is used in between the  $R_b$  and the bias supply to ensure that  $R_b$  does not affect the RF performance. Finally, radial stubs are implemented at all DC pads to present a broadband low impedance which helps reduce the effect of parasitics from packaging elements, such as wirebonds.



Figure 4.2: Block-level schematic of a distributed active transformer (DAT) based power amplifier with the inter-stage matching network between the final stacked PA stage and a differential driver stage.

#### 4.2.2 Eight-Way Distributed Active Transformer (DAT) Based PA

To achieve a higher  $P_{out}$ , eight of the single stage PAs are combined using a DAT, as shown in Fig. 4.2. The DAT provides a virtual short between the two adjacent PA stages (for example, PA1+ and PA4-), and this virtual short is used to provide the supply for each of the stacked PAs. Such a design eliminates the need for quarter-wave transmission lines at the supply, thereby achieving a compact power combining network. Additionally, it reduces the loss of the output matching network, which helps to improve the efficiency of the overall PA. The simulated loss of the entire output network is 1.2 dB at 135 GHz, and it stays lower than 1.3 dB from 120 GHz to 140 GHz.
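A first-order power budget helps relate the single stage $P_{sat}$ to what eight-way combining can deliver. The sketch below assumes ideal in-phase combining, that every stacked PA delivers its 15.3 dBm peak $P_{sat}$, and that the only loss is the simulated 1.2 dB of the output network; these assumptions make the result an optimistic estimate rather than a prediction of the measured value.

```python
import math

n_ways = 8              # number of combined single stage PAs
unit_psat_dbm = 15.3    # peak Psat of one stacked PA
network_loss_db = 1.2   # simulated output network loss at 135 GHz

# Ideal combining adds 10*log10(N) dB to the power of one unit.
ideal_combined_dbm = unit_psat_dbm + 10 * math.log10(n_ways)
estimated_psat_dbm = ideal_combined_dbm - network_loss_db

# Passive efficiency equivalent to the network loss.
passive_efficiency = 100 * 10 ** (-network_loss_db / 10)

print(f"Ideal combined power: {ideal_combined_dbm:.1f} dBm")
print(f"Estimate after network loss: {estimated_psat_dbm:.1f} dBm")
print(f"Passive efficiency of the output network: {passive_efficiency:.1f}%")
```

The estimate is about 1 dB above the 22.1 dBm $P_{sat}$ reported for the DAT PA, a gap that the idealized assumptions above would be expected to produce.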

The eight single stage stacked PAs of the DAT are driven using a total of two driver stages operated in a differential configuration. A 5 $\Omega$ resistor is added at the common-mode node of the driver stage to overcome common-mode stability concerns, and a Marchand balun is implemented on-chip to generate the differential signal for the driver stages. A transmission line based inter-stage matching network, which simultaneously achieves a 1:4 power split, is used between the driver stages and the final stacked PA stages, as shown in Fig. 4.2. Since the 130 nm InP process offers only a three-metal stack, the transmission lines are implemented as a micro-strip with metal 1 as ground and metal 3 as signal to achieve the desired impedance. The use of metal 2 at signal cross-over locations leads to a significant imbalance in the DAT, since the signal that uses metal 2 is considerably closer to the ground plane compared to the signal using metal 3. To mitigate this issue, ground cut-outs are added at the three cross-over locations in the inter-stage match, as illustrated in Fig. 4.2.

The die photos of the stacked PA and the DAT PA are shown in Fig. 4.3(a) and (b), respectively.

#### 4.3 CW Measurements

CW measurements are performed on both the stacked PA and the DAT PA.

Figure 4.3: Die photos of (a) the single stacked PA, and (b) the DAT based PA implemented in Teledyne's 130 nm InP process.

Figure 4.4: (a) Small-signal measurements of the stacked PA showing a 3 dB bandwidth from 90 - 140 GHz, and (b) large signal power back-off measurements at 112.5 GHz showing a peak 18.3% power added efficiency (PAE).

The small-signal S-parameter measurements of the single stage PA are shown in Fig. 4.4(a), highlighting a wide 3 dB bandwidth PA operation from 90 - 140 GHz. The PA also achieves a $P_{out}$ of 15 dBm with a peak power added efficiency (PAE) of 18.3% and a gain of 7.5 dB at 112.5 GHz, as shown by the large signal measurements in Fig. 4.4(b). Additionally, the peak $P_{out}$ exhibits less than 3 dB ripple and more than 7% PAE from 90 - 140 GHz, as shown in Fig. 4.5(a) and (b).

The small-signal s-parameter measurements of the eight-way combined DAT PA are shown in Fig. 4.6(a), illustrating a 3 dB bandwidth from 100 - 140 GHz. The PA obtains greater than 20 dBm  $P_{sat}$  with more than 7% PAE from 115 to 130 GHz, as shown in Fig. 4.6(b). Additionally, the PA achieves a  $P_{sat}$  of 22.1 dBm with a PAE of 11.5% and a gain of 8.6 dB at 120 GHz, as presented in Fig. 4.7.



Figure 4.5: Large signal frequency dependence measurements of the stacked PA showing (a) more than 7% PAE, and (b) less than 3 dB ripple from 90-140 GHz.



Figure 4.6: (a) Small-signal measurements of the DAT PA showing a 3 dB bandwidth from 100 to 140 GHz, and (b) large signal frequency dependence showing > 20 dBm  $P_{sat}$  and > 7% PAE from 115 to 130 GHz.



Figure 4.7: Large signal power back-off measurements at 120 GHz showing a peak 11.5% power added efficiency (PAE).



Figure 4.8: Multi-Gbps modulation measurement setup to study the behavior of a standalone D-band PA close to its saturation regime using commercial off-the-shelf (COTS) equipment.

## 4.4 Modulation measurements of a Single Stage Stacked PA

Figure 4.9: Down-converted and demodulated (a) 500 MSPS 16-QAM, and (b) 5 GSPS QPSK constellations, along with (c) output power and symbol-rate dependence on the $EVM_{rms}$ of various modulated signals through the single stage stacked PA at a carrier frequency of 121 GHz.

Modulation measurements are performed on the single stage stacked PA using readily available COTS equipment, as shown in Fig. 4.8. First, baseband multi-Gbps I and Q signals are generated using Keysight's AWG M8190A. These signals, along with an LO, are fed to a Marki IQ mixer MMIQ-0520HS to generate the modulated waveform at 10 GHz. Next, this modulated signal is further up-converted to 121 GHz using a VDI Balanced Fundamental Mixer (BAM). The LO for this mixer is generated by tripling a 37 GHz signal using a VDI Signal Generator Extender (SGX). This up-conversion also generates an image component at 101 GHz which needs to be attenuated significantly, since it falls within the bandwidth of the PA. A VDI waveguide-based bandpass filter WR6.5BPFE116-123 with about 1 dB insertion loss around 120 GHz and more than 70 dB of rejection at 101 GHz is used for this purpose. The resulting 121 GHz modulated signal needs to be amplified to drive the InP integrated circuit close to its saturation regime. This is accomplished using a VDI WR8 amplifier with an output P1dB of 13 dBm. The output power control on the WR8 amplifier is achieved using a WR8 attenuator at its input. This multi-Gbps modulated signal at 121 GHz carrier frequency is used as the input for the stacked PA. The output of the integrated circuit is sent through a WR8 coupler, and the "coupled" port is connected to a VDI PM5 to measure the output power of the PA. The output from the "through" port of the coupler is connected to a VDI spectrum analyzer extender (SAX) with an attenuator in the middle to prevent the SAX from saturating. A 20.667 GHz LO is provided to the SAX to harmonically mix the output of the PA down to 3 GHz. Finally, this signal is fed to a Keysight MXR608A wideband oscilloscope for demodulation.
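The frequency plan of this setup can be checked with a few lines of arithmetic. The sketch below uses only the frequencies stated above; the harmonic number used inside the SAX is not given in the text, so the code searches for the LO harmonic that places the PA output closest to the stated 3 GHz, and that harmonic should be read as an inference rather than a documented setting.

```python
if_ghz = 10.0          # modulated IF from the IQ mixer
sgx_input_ghz = 37.0   # signal applied to the SGX
sgx_multiplier = 3     # the SGX triples its input

# Up-conversion: LO, wanted sideband, and image.
lo_ghz = sgx_input_ghz * sgx_multiplier
rf_ghz = lo_ghz + if_ghz
image_ghz = lo_ghz - if_ghz

# Down-conversion: find the SAX LO harmonic that lands nearest 3 GHz.
sax_lo_ghz = 20.667
target_if_ghz = 3.0
best_harmonic = min(
    range(1, 11),
    key=lambda n: abs(abs(n * sax_lo_ghz - rf_ghz) - target_if_ghz),
)
down_converted_ghz = abs(best_harmonic * sax_lo_ghz - rf_ghz)

print(f"LO = {lo_ghz:.0f} GHz, RF = {rf_ghz:.0f} GHz, image = {image_ghz:.0f} GHz")
print(f"SAX harmonic {best_harmonic}: output at {down_converted_ghz:.3f} GHz")
```

The 20 GHz separation between the 121 GHz signal and the 101 GHz image is what allows the waveguide bandpass filter to reject the image while passing the wanted sideband.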

| Ref.                | This Work            |         | [89]                       | [90]         | [42] / [40]       |        |
|---------------------|----------------------|---------|----------------------------|--------------|-------------------|--------|
| Pout 1dB BW (GHz)   | 97.5-122.5           |         | 107-135\*\*                | 147-153.5\*\* | 125-145\*\*      |        |
| Technology          | 130 nm InP           |         | 130 nm SiGe                | 45 nm SOI    | 250 nm InP + CMOS |        |
| PA Topology         | Single stage stacked |         | Eight-way combined Doherty | Tx front-end | Four-way combined |        |
| Psat (dBm)          | 15.3                 |         | 22.7                       | 1.9          | 17                |        |
| Peak PAE (%)        | 18.3                 |         | 18.7                       | 11\*\*       | 20                |        |
| @ (GHz)             | 112.5                |         | 110                        | 149.9        |                   |        |
| Peak S21 (dB)       | 11.7                 |         | 21.8                       | 22.5\*\*     | 20                |        |
| Mod. Freq.          | 121 GHz              |         | 131.5 GHz                  | 150.7 GHz    | 135 GHz           |        |
| Mod. Format         | 16 QAM               | QPSK    | 16 QAM                     | 64 QAM       | 64 QAM            | 16 QAM |
| Data rate (Gbps)    | 2                    | 10      | 8                          | 10.56        | 6                 | 4      |
| EVM                 | 5.8%                 | 14.3%   | 11.6%                      | 9.5%         | 12%\*\*           | 12%\*\* |
| PAE<sub>avg</sub>   | 5.2%                 |         | 7.9%                       |              |                   |        |
| Pout<sub>avg</sub>  | 8.2 dBm              | 7.8 dBm | 13.7 dBm                   | 0.1 dBm      | 11 dBm            | 11 dBm |

Table 4.1: Table of comparison for stand-alone PAs/Front-Ends at greater than 100 GHz

\*\*Estimated from plots

The whole signal chain is equalized in software to correct linear distortions. It is important to note that no DPD was used to correct the non-linearity of the PA. The PA is measured using both QPSK and 16-QAM waveforms, and it achieves an average  $P_{out}$  of 10 dBm with an  $EVM_{rms}$  of 14.3% for a 5 GSPS (giga-symbols per second) QPSK waveform, which corresponds to a data-rate of 10 Gbps, and it achieves an average  $P_{out}$  of 9.1 dBm with an  $EVM_{rms}$  of 7.4% for a 500 MSPS (mega-symbols per second) 16-QAM waveform. The down-converted demodulated constellations for both these waveforms are shown in Fig. 4.9(a) and (b), respectively. The output power and symbol-rate dependence on the  $EVM_{rms}$  of various modulated waveforms are illustrated in Fig. 4.9(c). It is important to note that the data-rate for the QPSK measurement is limited due to the AWG's maximum sampling rate, and not due to the implemented PA. Table 4.1 compares the performance of this PA with stand-alone PAs operating beyond 100 GHz. The proposed measurement setup results in a state-of-the-art data-rate demonstration for high  $P_{out}$  (> 5 dBm) PAs.
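The data-rates quoted for the two waveforms follow directly from the symbol rate and the number of bits per symbol. The sketch below computes the raw bit rate, assuming no coding or framing overhead, and also converts the reported $EVM_{rms}$ percentages to decibels for comparison with works that report EVM in dB; the function names are illustrative.

```python
import math


def data_rate_gbps(symbol_rate_gsps, constellation_points):
    """Raw bit rate: symbol rate times bits per symbol (log2 of constellation size)."""
    return symbol_rate_gsps * math.log2(constellation_points)


def evm_percent_to_db(evm_percent):
    """Convert an rms EVM in percent to decibels."""
    return 20 * math.log10(evm_percent / 100)


# (name, constellation size, symbol rate in GSPS, reported EVM in percent)
waveforms = [
    ("QPSK", 4, 5.0, 14.3),
    ("16-QAM", 16, 0.5, 7.4),
]

for name, points, rate, evm in waveforms:
    print(
        f"{name}: {data_rate_gbps(rate, points):.0f} Gbps, "
        f"EVM = {evm}% ({evm_percent_to_db(evm):.1f} dB)"
    )
```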

#### 4.5 Conclusions

This chapter presents work at F-band that combines the design of efficient, high-$P_{out}$ PAs with a modulation setup to characterize these PAs at 6G-type multi-Gbps data-rates. First, a single stage stacked PA with a $P_{sat}$ of 15.3 dBm and wide small-signal bandwidth of 50 GHz is presented. Next, this PA is used in an eight-way power combined DAT configuration to generate a $P_{sat}$ of 22.1 dBm, with a small-signal bandwidth of 40 GHz. Finally, a modulation measurement setup is presented using readily available measurement equipment to demonstrate a state-of-the-art 10 Gbps data-rate while achieving close to 8 dBm $P_{out}$ at 121 GHz.

#### 4.6 Contributions

- L. Zhang, V. Iyer, J. Sheth, L. Xie, R. Weikle and S.M. Bowers, "An F-band DAT Based Power Amplifier in InP 130 nm HBT Technology," to be submitted to *IEEE Transactions on Terahertz Science and Technology*.
- V. Iyer, J. Sheth, L. Zhang, R. Weikle and S.M. Bowers, "A 15.3 dBm, 18.3% PAE F-band Power Amplifier in 130 nm InP HBT with Modulation Measurements," accepted in *IEEE Microwave and Wireless Technology Letters*.
- V. Iyer, J. Sheth, L. Zhang, R. M. Weikle and S. Bowers, "A 90-125 GHz Stacked PA in 130 nm InP HBT with 18.3 % peak PAE at 15.3 dBm Output Power," 2022 United States National Committee of URSI National Radio Science Meeting (USNC-URSI NRSM), 2022, pp. 224-225, doi: 10.23919/USNC-URSINRSM57467.2022.9881401.
- L. Zhang, V. Iyer, J. Sheth, L. Xie, R. M. Weikle and S. M. Bowers, "A 117.5-130 GHz 22.1 dBm 11.5% PAE DAT Based Power Amplifier in InP 130 nm HBT Technology," 2021 16th European Microwave Integrated Circuits Conference (EuMIC), 2022, pp. 229-232, doi: 10.23919/EuMIC50153.2022.9783740.

This work was done in collaboration with Vinay Iyer, Linsheng Zhang, and Linli Xie. My contributions include the following:

- Equal collaboration on analyzing the stability of the PA design to ensure small- and large-signal stability.
- Equal collaboration on the packaging of the PA for performing measurements.
- Equal collaboration on planning the measurement setup, performing the measurements, and writing the automation code for CW measurements.
- Equal collaboration on planning the measurement setup and performing the modulation measurements.

# **Chapter 5**

# **Conclusions, Methodology for Designing Digital Transmitters, Future Directions, and Other Work**

## 5.1 Dissertation Conclusions

Digitally implemented transmitters have become appealing in recent years due to their ability to scale well with process nodes, and due to their potential to realize cost-effective transmitters with improved battery life. However, in practice, most applications use higher order modulation schemes, such as quadrature amplitude modulation, to transmit data; such schemes exhibit a high peak-to-average power ratio. Without any modification, a digital transmitter suffers from poor efficiency in output power back-off, resulting in severe limitations to battery life improvement. Further, the two well-known digital transmitter implementations, polar and quadrature, have their own limitations. The polar implementation suffers from wide bandwidth expansion that limits the overall data-rate of the transmitter, while the quadrature implementation exhibits degraded efficiency due to the in-phase/quadrature basis vector combination, which further limits the improvement to battery life. This dissertation proposes and implements design techniques to achieve compact and efficient digital transmitters.

- First, a nested Doherty architecture is proposed to improve efficiency in deep output power back-off. A typical nested Doherty architecture requires multiple quarter-wave transmission lines that are traditionally implemented using a physical transmission line model ( $\pi$  network with inductor-capacitor-inductor), which results in a significant number of inductors that are not practical to realize on-chip. This dissertation proposes the use of an inverted transmission line model ( $\pi$  network with capacitor-inductor-capacitor), which leads to a considerable number of inductors being placed in parallel. These can easily be consolidated, resulting in a compact form-factor. An implementation of the proposed technique is shown in Chapter 2, which results in a measured 42% peak drain efficiency at 5.25 GHz with 1.6× improvement at 9 dB PBO compared to a normalized class B PA.
- Second, a transformer-within-transformer architecture is proposed to achieve an even more compact form-factor, while also realizing efficiency enhancement in deep output power back-off through the use of a two-way asymmetric Doherty network. To prevent undesired magnetic coupling between the two transformer structures, one of the transformers is twisted into a Figure-8 shape and inserted into a non-twisted transformer. The resulting magnetic fluxes in the two "octagons" of the Figure-8 are equal in magnitude but opposite in direction; therefore, the net induced current in the non-twisted loops due to the Figure-8 loops is close to zero. An implementation of this technique is shown in Chapter 3, which results in a measured 33% peak drain efficiency at 6.5 GHz with 1.76× improvement at 8 dB PBO compared to a normalized class B PA.
- Third, a multi-phase architecture is proposed to overcome the limitations of both the polar and the quadrature digital transmitter architectures. The use of multiple phases mitigates the efficiency profile degradation associated with the quadrature architecture, since the multi-phase basis vectors are more closely spaced compared to the in-phase/quadrature vectors. Additionally, the multi-phase architecture uses only continuous wave signals at the input and thereby overcomes the massive bandwidth expansion of the polar architecture. An implementation of this technique is shown in Chapter 3, which results in a peak drain efficiency of 33%, and a relatively high worst-case efficiency of 25% that corresponds to only $0.24 \times$ reduction, compared to a $0.45 \times$ reduction associated with an idealized quadrature architecture.

- Finally, a wideband multi-phase generation technique for a multi-phase transmitter is proposed to overcome the need to use a multiple of the operating frequency at the input. Typically, flip-flop based dividers are used to generate multiple phases for wideband operation; however, this technique is difficult to scale with frequency. For example, an eight-phase architecture operating at 6.5 GHz would need a prohibitive 26 GHz input. The proposed technique of using the output of a 3-ring polyphase filter to injection lock an eight-stage ring oscillator achieves wideband eight-phase generation, while utilizing only the operating frequency as its input. An implementation of this technique is shown in Chapter 3, which results in more than 1 GHz of locking range around the operating frequency of 6.5 GHz.
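The benefit of more closely spaced basis vectors can be illustrated geometrically. The sketch below decomposes a target phasor onto the two adjacent basis vectors of an N-phase set and reports the worst-case ratio between the summed basis amplitudes and the target amplitude. This is a simplified illustration, not the basis vector mapper or the efficiency model of this work, and the amplitude overhead it reports is only a proxy for the efficiency degradation quoted above; all function names are hypothetical.

```python
import cmath
import math


def decompose(target, n_phases):
    """Split a phasor into non-negative amplitudes on two adjacent basis vectors."""
    theta = 2 * math.pi / n_phases                 # spacing between basis vectors
    angle = cmath.phase(target) % (2 * math.pi)    # target angle in [0, 2*pi)
    k = int(angle // theta) % n_phases             # index of the lower basis vector
    phi = angle - k * theta                        # angle measured from that vector
    r = abs(target)
    a = r * math.sin(theta - phi) / math.sin(theta)  # weight on basis vector k
    b = r * math.sin(phi) / math.sin(theta)          # weight on basis vector k + 1
    return k, a, b


def reconstruct(k, a, b, n_phases):
    """Rebuild the phasor from its two basis amplitudes."""
    theta = 2 * math.pi / n_phases
    return a * cmath.exp(1j * k * theta) + b * cmath.exp(1j * (k + 1) * theta)


def worst_case_overhead(n_phases):
    """Largest (a + b) / |target|, reached midway between two basis vectors."""
    return 1 / math.cos(math.pi / n_phases)


target = cmath.rect(0.8, math.radians(100))
k, a, b = decompose(target, 8)
print(f"Sector {k}: a = {a:.3f}, b = {b:.3f}")
print(f"Reconstruction error: {abs(reconstruct(k, a, b, 8) - target):.2e}")
print(f"Worst-case overhead, quadrature: {worst_case_overhead(4):.3f}")
print(f"Worst-case overhead, eight-phase: {worst_case_overhead(8):.3f}")
```

In this simplified picture, the quadrature set needs up to about 41% more summed basis amplitude than the target amplitude, while the eight-phase set needs about 8% more.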

Additionally, a strategy for the design of a transformer-based asymmetric Doherty matching network is also discussed in Chapter 3. This methodology generates the required transformer parameters for a two-way asymmetric series Doherty network to achieve efficiency enhancement at any desired back-off level and at any desired operating frequency. This strategy provides a decent starting point for designing the matching network, and it is applicable irrespective of whether the proposed transformer-within-transformer technique is utilized.

### 5.2 Considerations for Designing a Digital Transmitter

Although this dissertation is focused on techniques for increasing compactness and efficiency, there are numerous design parameters that need careful consideration while implementing a digital transmitter for specific applications. This section provides a discussion of those considerations as a guide for the designer.

- Peak and Back-off Efficiency: As seen throughout this dissertation, the output network of the transmitter/power amplifier dictates the peak and back-off efficiency of the entire transmitter. The biggest constraint on the output network is the available area. If the area is severely limited, transformer-within-transformer architecture provides a decent solution to achieve efficiency enhancement in a small form-factor. However, due to the parasitic magnetic coupling between the transformers and the use of multiple vias, the compact form-factor could reduce the maximum achievable efficiency. If additional area is available, the designer should consider a nested Doherty architecture, since it has the potential to achieve the highest efficiency. Further, this architecture could also reduce the design time, since most design kits include inductor models that can be easily "dragged and dropped"; however, the transformer-within-transformer architecture needs time-consuming iterative EM simulations. The designer is encouraged to refer to Sections 2.3 and 3.2.4 for the design strategy for the output network.
- **Output Type:** Based on the desired application, the output of the transmitter may need to be single-ended. A transformer-based output network is appealing for such situations, as it inherently provides a single-ended output. If a nested Doherty architecture is utilized, then a balun (differential to single-ended transformer) would be required at the output, which further increases the area occupied by a nested architecture.
- **Output Power:** The output power is determined according to the impedance seen by the PA and the supply voltage at the output stage of each unit cell. To increase the output power, the impedance seen by the PA needs to be reduced, and the supply voltage needs to be increased. The impedance is determined by the output network and the number of "ways" in the nested Doherty architecture, as explained in Sections 2.3 and 3.2.4. To lower the impedance further, an additional matching network can be added at the output to transform the load impedance to a lower value; however, that comes at the cost of increased area and reduced efficiency due to increased loss. It is important to note that if a transformer-based output network or a balun is utilized to achieve single-ended output, it inherently halves the load impedance. The supply voltage can be used as a design parameter to achieve the desired output power; however, its increase is limited by the breakdown voltage of the transistor. To further increase the supply voltage, a cascode structure composed of a 1.2 V breakdown common-source device and a common-gate device could be utilized. Most design kits also include 2.5 V breakdown devices that could be utilized for common-gate or both common-source and common-gate to further increase the supply voltage; however, these devices are slow and lead to reduced efficiency (typically the gate length of 2.5 V devices is $> 4 \times$ that of 1.2 V devices).

- **Output Resolution:** Increasing the resolution of a digital transmitter is desired to improve the adjacent channel power ratio and to improve the error vector magnitude, which allows for the use of higher-order modulation schemes. A 6-bit thermometer-coded unit cell implementation can achieve 128-QAM, as shown in this work. Using pure thermometer coding is beneficial to achieve monotonically increasing output voltage with respect to code words. However, it is challenging to implement pure thermometer coding beyond 6-bit or 7-bit, since it severely increases the total number of unit cells (incrementing thermometer-coded bit resolution by 1 results in doubling the number of total unit cells). This not only results in increased area consumption, but also makes it challenging to route all the RF signals into and out of each unit cell. Therefore, to increase output resolution, a combination of binary-weighted and thermometer-coded unit cells should be considered. This could potentially result in non-monotonically increasing output voltage with respect to code words, thereby requiring correction techniques such as using a look-up table.
- **Data-rate:** In order to improve data-rate, time synchronization of the control signals is key. Typically, routing for control signals is long and uneven, which results in a timing mismatch between unit cells. Inserting a flip-flop into every unit cell to time all the control signals to actuate simultaneously can significantly improve data-rate. Next, it is difficult to generate high-speed digital signals beyond a few hundred MHz off-chip to drive the high-impedance on-chip control signal buffers. Therefore, to achieve high data-rates, SRAM memory cells can be added on-chip, where the data is transferred to the memory cells at a slower speed, and then clocked at a few GHz to drive the control signals. Finally, for the multi-phase (or quadrature) architecture, delay cells should be added to the asynchronous basis vector mapper control bits, so their actuation can be synchronized with the rest of the control signals, while for the polar architecture, delay cells should be added in the constant-envelope phase-shift keyed RF signal path to synchronize the timing between phase and amplitude data.
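The output resolution trade-off discussed above can be made concrete with a small sketch. It assumes that an n-bit thermometer code drives $2^n - 1$ identical unit cells and that each binary-weighted bit adds one scaled cell; the actual cell counts and decoder structure of the implemented transmitters are not restated here, so both functions are illustrative only.

```python
def binary_to_thermometer(code, bits):
    """Return unit-cell enables for a code: the first `code` cells are on."""
    n_cells = 2 ** bits - 1
    if not 0 <= code <= n_cells:
        raise ValueError("code out of range for the given resolution")
    return [1 if i < code else 0 for i in range(n_cells)]


def segmented_cell_count(thermometer_bits, binary_bits):
    """Cells in a segmented array: thermometer MSBs plus binary-weighted LSBs."""
    return (2 ** thermometer_bits - 1) + binary_bits


# Pure thermometer coding: the cell count roughly doubles with every added bit.
for bits in (6, 7, 8):
    print(f"{bits}-bit thermometer: {2 ** bits - 1} unit cells")

# Segmenting an 8-bit array into 6 thermometer bits and 2 binary bits.
print(f"6 + 2 segmented: {segmented_cell_count(6, 2)} cells")

# Monotonic by construction: each code step turns on exactly one more cell.
enabled = [sum(binary_to_thermometer(code, 6)) for code in range(64)]
print(f"Monotonic: {enabled == sorted(enabled)}")
```

The segmented array reaches the same nominal resolution with far fewer cells, at the cost of the monotonicity guarantee that pure thermometer coding provides.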

#### 5.3 Future Directions

This dissertation showcases progress on multiple fronts toward realizing compact and efficient digital transmitters. However, there are numerous other challenges that still need to be solved to improve the overall performance of wireless transmitters.

#### 5.3.1 Wideband Transmitter Operation

Most applications can benefit from wideband transmitters to cover much of the sub-7 GHz spectrum. As shown in Section 3.2.2, the input network can operate over a wide frequency range; however, it is challenging to design a wideband output network, since one of its objectives is to resonate the parasitic capacitance of the PA, which inherently reduces bandwidth. Additionally, current-mode Doherty architectures use a quarter-wave transmission line to achieve efficiency enhancement, which further limits the bandwidth. Reconfigurable matching networks could offer a potential solution to operate over a wide frequency range through the use of switchable capacitors [23]; however, they introduce a loss in the matching network, which reduces efficiency. Therefore, solutions are needed to achieve wideband transmitter operation.
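
The bandwidth limitation of the quarter-wave line can be sketched with the standard input impedance expression for a lossless transmission line. The characteristic impedance, load value, and frequencies below are illustrative assumptions rather than values from the designs in this dissertation.

```python
import math


def quarter_wave_input_impedance(z0: float, z_load: complex,
                                 f: float, f0: float) -> complex:
    """Input impedance of a lossless line that is a quarter wavelength
    long at f0, evaluated at frequency f (assumes a dispersion-free line)."""
    theta = (math.pi / 2) * (f / f0)   # electrical length in radians
    t = math.tan(theta)
    return z0 * (z_load + 1j * z0 * t) / (z0 + 1j * z_load * t)


if __name__ == "__main__":
    z0, z_load, f0 = 50.0, 25.0, 2.4e9   # hypothetical example values
    for ratio in (0.8, 0.9, 1.0, 1.1, 1.2):
        zin = quarter_wave_input_impedance(z0, z_load, ratio * f0, f0)
        print(f"f/f0 = {ratio:.1f}: Zin = {zin.real:.1f} {zin.imag:+.1f}j ohm")
```

At f0 the line inverts the load to z0^2 / z_load, which is the behavior the Doherty load modulation relies on. Away from f0 the input impedance acquires a reactive part and its real part shifts, so the impedance seen by the main amplifier departs from the intended value.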

#### 5.3.2 Improving Linearity

The measured data shown in Fig. 3.25 includes 48,896 data points, which are used for generating a look-up table to correct for the amplitude-to-amplitude and amplitude-to-phase non-linearity, and thus produce a modulated waveform. However, this look-up table is valid only at that particular measured frequency. If the operating frequency is varied, a new look-up table must be generated, which is a burdensome process. Additionally, in practical use, these data points need to be stored for every desired operating frequency, which takes a significant amount of memory. Such overhead is impractical, and better solutions are desired. Some works in the literature have demonstrated interpolation-based correction that occupies less memory [65]; however, before deployment, the transmitter still needs to be measured for certain data points to feed the interpolator. It is desirable to design a transmitter that is inherently linear. Techniques such as capacitive compensation [15] and non-linear unit cell segmentation [19] have been shown in the literature to achieve linear digital polar power amplifiers. However, given the limitations of the polar architecture described above, a multi-phase architecture is preferred, and solutions are still needed to achieve inherently linear multi-phase transmitters.
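
The memory overhead can be estimated with a rough sketch. Only the 48,896 points per frequency come from the measurement above; the number of operating frequencies and the bytes stored per point are hypothetical inputs chosen for illustration.

```python
POINTS_PER_FREQUENCY = 48_896  # measured points per frequency, from the text


def lut_memory_bytes(num_frequencies: int, bytes_per_point: int) -> int:
    """Storage needed when a full look-up table is kept per frequency."""
    return POINTS_PER_FREQUENCY * num_frequencies * bytes_per_point


if __name__ == "__main__":
    # Hypothetical case: 10 operating frequencies, 4 bytes per point
    # (for example, two 16-bit correction words).
    total = lut_memory_bytes(num_frequencies=10, bytes_per_point=4)
    print(f"{total} bytes (about {total / 1e6:.2f} MB)")
```

The storage grows linearly with the number of supported frequencies, which is the overhead that interpolation-based correction or an inherently linear transmitter would reduce.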

#### 5.3.3 Robustness to Load Impedance Variations

For cellular applications, a device is held by a user, and the orientation of the device in the user's hand can vary the response of the antenna, which causes variation in the load impedance seen by the transmitter. This work, along with most others in the literature, assumes a stable 50 $\Omega$ load impedance. As shown in Sections 2.3 and 3.2.4, the load impedance determines the output power, peak efficiency, and back-off efficiency. Therefore, large variations of this impedance can dramatically alter the performance of the transmitter. A self-healing architecture can provide a potential solution [91], where a small portion of the output signal is coupled to a feedback system to determine the variation in the load impedance, which then varies the bias and supply voltages to output the desired signal. However, inserting a coupler at the output of the transmitter and using supply modulators reduces efficiency. Therefore, solutions are needed to achieve robustness to load impedance variation.
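
The size of a load variation is commonly expressed through the reflection coefficient and VSWR relative to the nominal 50 Ω. The sketch below uses these standard definitions; the example load values are hypothetical. The mismatch-loss figure applies to a matched linear source, so it only indicates how far the load has moved and does not predict the output power or efficiency of a switching PA.

```python
import math


def reflection_coefficient(z_load: complex, z0: float = 50.0) -> complex:
    """Reflection coefficient of z_load referenced to z0."""
    return (z_load - z0) / (z_load + z0)


def vswr(gamma: complex) -> float:
    """Voltage standing wave ratio for a given reflection coefficient."""
    mag = abs(gamma)
    if mag >= 1.0:
        return math.inf          # total reflection
    return (1 + mag) / (1 - mag)


def mismatch_loss_db(gamma: complex) -> float:
    """Mismatch loss in dB for a matched linear source (assumes |gamma| < 1)."""
    return -10 * math.log10(1 - abs(gamma) ** 2)


if __name__ == "__main__":
    # Hypothetical antenna impedances as the device is handled.
    for z in (50 + 0j, 100 + 0j, 25 + 0j, 35 + 40j):
        g = reflection_coefficient(z)
        print(f"ZL = {z}: |gamma| = {abs(g):.3f}, VSWR = {vswr(g):.2f}, "
              f"mismatch loss = {mismatch_loss_db(g):.2f} dB")
```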

#### 5.4 Other Work

In addition to the work presented in this dissertation, contributions were made to other projects. These include designing a transformer-based matching network around 24 GHz to convert a single-ended input to a differential output, while simultaneously providing the input matching network for a low-noise amplifier used in a wake-up receiver; and helping to design and measure a 600 GHz phase-conjugation system designed in Teledyne's 250 nm InP process. The expected publications are listed below:

D. Duvvuri, J. Sheth, S. Bhattacharya, S. Hanifi, B. H. Calhoun, and S. M. Bowers, "A 5 $\mu$W -80 dBm Multi-channel Tuned RF Wake-up Receiver for SHF Band Applications," *currently being measured*.

V. Iyer, C. Moore, J. Sheth, S. M. Bowers, and R. M. Weikle, "A 600 GHz Phase Conjugation System in 250 nm InP Process," *currently being measured*.

# **Bibliography**

- [1] 3GPP, "Overview of 3GPP Release 8 V0.3.3 (2014-09)," 3rd Generation Partnership Project (3GPP), Tech. Rep., 2014. [Online]. Available: https://www.3gpp.org/ftp/Information/WORK\_PLAN/Description\_Releases/
- [2] L. Technologies. (2021) Sub-6 Cellular LTE/5G NR Frequency Band Guide. [Online]. Available: https://linxtechnologies.com/wp/wp-content/uploads/sub-6-cellular-lte-5g-nr-frequency-band-guide.pdf
- [3] S. Banerji and R. S. Chowdhury, "On IEEE 802.11: Wireless LAN Technology," *International Journal of Mobile Network Communications & Telematics (IJMNCT)*, vol. 3, no. 4, pp. 45–64, 2013.
- [4] FCC. (2020) FCC Opens 6 GHz Band to Wi-Fi and Other Unlicensed Uses. [Online]. Available: https://www.fcc.gov/document/fcc-opens-6-ghz-band-wi-fi-and-other-unlicensed-uses-0
- [5] C. D. Alwis, A. Kalla, Q.-V. Pham, P. Kumar, K. Dev, W.-J. Hwang, and M. Liyanage, "Survey on 6G Frontiers: Trends, Applications, Requirements, Technologies and Future Research," *IEEE Open Journal of the Communications Society*, vol. 2, pp. 836–886, 2021.
- [6] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, "Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications," *IEEE Commun. Surveys Tuts.*, vol. 17, no. 4, pp. 2347–2376, 4th Quart. 2015.

- [7] M. D. Nil. (2022) Wi-Fi HaLow Delivers On The Promise Of IoT Connectivity. [Online]. Available: https://www.forbes.com/sites/forbestechcouncil/2022/06/29/wi-fi-halow-delivers-on-the-promise-of-iot-connectivity/?sh=57c5a44266b4
- [8] W.-F. Alliance. (2021) Wi-Fi CERTIFIED HaLow. [Online]. Available: https://www.wi-fi.org/discover-wi-fi/wi-fi-certified-halow
- [9] R. Levinger, E. Shumaker, R. Banin, A. Ravi, and O. Degani, "The Rise of the Digital RFIC Era: An Overview of Past and Present Digital RFIC Advancements," *IEEE Microwave Magazine*, vol. 23, no. 12, pp. 71–85, 2022.
- [10] Statista. (2022) Number of Internet of Things (IoT) connected devices worldwide from 2019 to 2021, with forecasts from 2022 to 2030. [Online]. Available: https://www.statista.com/statistics/1183457/iot-connected-devices-worldwide/
- [11] P. Bassirian, "Techniques for Design of Temperature- and Interference-Robust Sub-100 nW Wakeup Receivers at Sub-GHz and Multi-GHz RF Frequencies," Ph.D. dissertation, University of Virginia, 2021.
- [12] A. Dissanayake, "Highly Reconfigurable and High Sensitivity Wake-up Receivers With Aggressive Duty-Cycling Techniques," Ph.D. dissertation, University of Virginia, 2021.
- [13] D. Chowdhury, L. Ye, E. Alon, and A. M. Niknejad, "An Efficient Mixed-Signal 2.4-GHz Polar Power Amplifier in 65-nm CMOS Technology," *IEEE J. Solid-State Circuits*, vol. 46, no. 8, pp. 1796–1809, Aug. 2011.
- [14] J. S. Park, S. Hu, Y. Wang, and H. Wang, "A Highly Linear Dual-Band Mixed-Mode Polar Power Amplifier in CMOS with An Ultra-Compact Output Network," *IEEE J. Solid-State Circuits*, vol. 51, no. 8, pp. 1756–1770, Aug. 2016.
- [15] J. S. Park, Y. Wang, S. Pellerano, C. Hull, and H. Wang, "A CMOS Wideband Current-Mode Digital Polar Power Amplifier With Built-In AM–PM Distortion Self-Compensation," *IEEE J. Solid-State Circuits*, vol. 53, no. 2, pp. 340–356, Feb. 2018.

- [16] S. Yoo, J. S. Walling, E. C. Woo, B. Jann, and D. J. Allstot, "A Switched-Capacitor RF Power Amplifier," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 2977–2987, Dec. 2011.
- [17] A. Ba, Y. Liu, J. van den Heuvel, P. Mateman, B. Büsze, J. Dijkhuis, C. Bachmann, G. Dolmans, K. Philips, and H. De Groot, "A 1.3 nJ/b IEEE 802.11ah Fully-Digital Polar Transmitter for IoT Applications," *IEEE J. Solid-State Circuits*, vol. 51, no. 12, pp. 3103–3113, Dec. 2016.
- [18] H. J. Qian, J. O. Liang, and X. Luo, "Wideband Digital Power Amplifiers With Efficiency Improvement Using 40-nm LP CMOS Technology," *IEEE Trans. Microw. Theory Techn.*, vol. 64, no. 3, pp. 675–687, Mar. 2016.
- [19] M. Hashemi, Y. Shen, M. Mehrpoo, M. S. Alavi, and L. C. N. de Vreede, "An Intrinsically Linear Wideband Polar Digital Power Amplifier," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 12, pp. 3312–3328, 2017.
- [20] S. Hu, S. Kousai, and H. Wang, "A Broadband Mixed-Signal CMOS Power Amplifier With a Hybrid Class-G Doherty Efficiency Enhancement Technique," *IEEE J. Solid-State Circuits*, vol. 51, no. 3, pp. 598–613, Mar. 2016.
- [21] S. Hu, S. Kousai, J. S. Park, O. L. Chlieh, and H. Wang, "Design of A Transformer-Based Reconfigurable Digital Polar Doherty Power Amplifier Fully Integrated in Bulk CMOS," *IEEE J. Solid-State Circuits*, vol. 50, no. 5, pp. 1094–1106, May 2015.
- [22] V. Vorapipat, C. S. Levy, and P. M. Asbeck, "Voltage Mode Doherty Power Amplifier," *IEEE J. Solid-State Circuits*, vol. 52, no. 5, pp. 1295–1304, May 2017.
- [23] H. J. Qian, B. Yang, J. Zhou, H. Xu, and X. Luo, "A Quadrature Digital Power Amplifier with Hybrid Doherty and Impedance Boosting for Efficiency Enhancement in Complex Domain," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Aug. 2020, pp. 127–130.

- [24] S. Hung, S. Yoo, and S. Yoo, "A Quadrature Class-G Complex-Domain Doherty Digital Power Amplifier," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2019, pp. 291–294.
- [25] S.-W. Yoo, S.-C. Hung, and S.-M. Yoo, "A Watt-Level Quadrature Class-G Switched-Capacitor Power Amplifier With Linearization Techniques," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 5, pp. 1274–1287, 2019.
- [26] Z. Deng, E. Lu, E. Rostami, D. Sieh, D. Papadopoulos, B. Huang, R. Chen, H. Wang, W. Hsu, C. Wu, and O. Shanaa, "9.5 A dual-band digital-WiFi 802.11a/b/g/n transmitter SoC with digital I/Q combining and diamond profile mapping for compact die area and improved efficiency in 40nm CMOS," in 2016 IEEE International Solid-State Circuits Conference (ISSCC), 2016, pp. 172–173.
- [27] M. Beikmirza, Y. Shen, L. C. N. de Vreede, and M. S. Alavi, "A Wideband Four-Way Doherty Bits-In RF-Out CMOS Transmitter," *IEEE Journal of Solid-State Circuits*, vol. 56, no. 12, pp. 3768–3783, 2021.
- [28] J. Walling, "Mixed-Mode Transceivers: A brief tutorial," *IEEE Solid-State Circuits Magazine*, vol. 14, no. 3, pp. 53–64, 2022.
- [29] W. H. Doherty, "A New High Efficiency Power Amplifier for Modulated Waves," Proc. Inst. Radio Eng., vol. 24, no. 9, pp. 1163–1182, Sep. 1936.
- [30] N. Srirattana, A. Raghavan, D. Heo, P. E. Allen, and J. Laskar, "Analysis and Design of a High-Efficiency Multistage Doherty Power Amplifier for Wireless Communications," *IEEE Trans. Microw. Theory Techn.*, vol. 53, no. 3, pp. 852–860, Mar. 2005.
- [31] D. Jung, S. Li, J.-S. Park, T.-Y. Huang, H. Zhao, and H. Wang, "A CMOS 1.2-V Hybrid Current- and Voltage-Mode Three-Way Digital Doherty PA With Built-In Phase Nonlinearity Compensation," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 3, pp. 525–535, 2020.

- [32] Y. Yin, T. Li, L. Xiong, Y. Li, H. Min, N. Yan, and H. Xu, "A Broadband Switched-Transformer Digital Power Amplifier for Deep Back-Off Efficiency Enhancement," *IEEE J. Solid-State Circuits*, vol. 55, no. 11, pp. 2997–3008, Nov. 2020.
- [33] V. Vorapipat, C. S. Levy, and P. M. Asbeck, "A Class-G Voltage-Mode Doherty Power Amplifier," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3348–3360, Dec. 2017.
- [34] S. Yoo, J. S. Walling, O. Degani, B. Jann, R. Sadhwani, J. C. Rudell, and D. J. Allstot, "A Class-G Switched-Capacitor RF Power Amplifier," *IEEE J. Solid-State Circuits*, vol. 48, no. 5, pp. 1212–1224, May 2013.
- [35] M. Akhtar, S. Hassan, R. Ghaffar, H. Jung, S. Garg, and M. Hossain, "The shift to 6G communications: vision and requirements," *Springer Open Human Centric Computing and Information Sciences.*, vol. 10, no. 53, 2020.
- [36] T. S. Rappaport, Y. Xing, O. Kanhere, S. Ju, A. Madanayake, S. Mandal, A. Alkhateeb, and G. C. Trichopoulos, "Wireless Communications and Applications Above 100 GHz: Opportunities and Challenges for 6G and Beyond," *IEEE Access*, vol. 7, pp. 78729–78757, 2019.
- [37] European Conference of Postal and Telecommunication Administration Electronic Communications Committee. (2018) ECC Report 282 - Point-to-Point Radio Links in the Frequency Ranges 92-114.25 GHz and 130-174.8 GHz. [Online]. Available: https://docdb.cept.org/download/1337
- [38] Z. Griffith, M. Urteaga, and P. Rowell, "A W-Band SSPA With 100–140-mW Pout, >20% PAE, and 26–30-dB S21 Gain Across 88–104 GHz," *IEEE Microwave and Wireless Components Letters*, vol. 30, no. 2, pp. 189–192, 2020.
- [39] Z. Griffith, M. Urteaga, and P. Rowell, "A 140-GHz 0.25-W PA and a 55-135 GHz 115-135 mW PA, High-Gain, Broadband Power Amplifier MMICs in 250-nm InP HBT," in 2019 IEEE MTT-S International Microwave Symposium (IMS), 2019, pp. 1245–1248.

- [40] A. S. H. Ahmed, M. Seo, A. A. Farid, M. Urteaga, J. F. Buckwalter, and M. J. W. Rodwell, "A 140GHz power amplifier with 20.5dBm output power and 20.8% PAE in 250-nm InP HBT technology," in 2020 IEEE/MTT-S International Microwave Symposium (IMS), 2020, pp. 492–495.
- [41] A. S. Ahmed, M. Seo, A. A. Farid, M. Urteaga, J. F. Buckwalter, and M. J. Rodwell, "A 200mW D-band Power Amplifier with 17.8% PAE in 250-nm InP HBT Technology," in 2020 15th European Microwave Integrated Circuits Conference (EuMIC), 2021, pp. 1–4.
- [42] A. A. Farid, A. S. H. Ahmed, and M. J. W. Rodwell, "A 27.5dBm EIRP D-Band Transmitter Module on a Ceramic Interposer," in 2021 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2021, pp. 43–46.
- [43] X. Liu, M. M. Izad, L. Yao, and C. Heng, "A 13 pJ/bit 900 MHz QPSK/16-QAM Band Shaped Transmitter Based on Injection Locking and Digital PA for Biomedical Applications," *IEEE J. Solid-State Circuits*, vol. 49, no. 11, pp. 2408–2421, Nov. 2014.
- [44] Yao-Hong Liu and Tsung-Hsien Lin, "A 3.5-mW 15-Mbps O-QPSK transmitter for real-time wireless medical imaging applications," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Sep. 2008, pp. 599–602.
- [45] X. Chen, J. Breiholz, F. B. Yahya, C. J. Lukas, H. Kim, B. H. Calhoun, and D. D. Wentzloff, "Analysis and Design of an Ultra-Low-Power Bluetooth Low-Energy Transmitter With Ring Oscillator-Based ADPLL and 4 × Frequency Edge Combiner," *IEEE J. Solid-State Circuits*, vol. 54, no. 5, pp. 1339–1350, May 2019.
- [46] J. Prummel, M. Papamichail, J. Willms, R. Todi, W. Aartsen, W. Kruiskamp, J. Haanstra, E. Opbroek, S. Rievers, P. Seesink, J. van Gorsel, H. Woering, and C. Smit, "A 10 mW Bluetooth Low-Energy Transceiver With On-Chip Matching," *IEEE J. Solid-State Circuits*, vol. 50, no. 12, pp. 3077–3088, Dec. 2015.
- [47] M. Babaie, F. Kuo, H. R. Chen, L. Cho, C. Jou, F. Hsueh, M. Shahmohammadi, and R. B. Staszewski, "A Fully Integrated Bluetooth Low-Energy Transmitter in 28 nm CMOS With 36% System Efficiency at 3 dBm," *IEEE J. Solid-State Circuits*, vol. 51, no. 7, pp. 1547–1565, Jul. 2016.

- [48] S. Yang, J. Yin, H. Yi, W. Yu, P. Mak, and R. P. Martins, "A 0.2-V Energy-Harvesting BLE Transmitter With a Micropower Manager Achieving 25% System Efficiency at 0-dBm Output and 5.2-nW Sleep Power in 28-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 5, pp. 1351–1362, May 2019.
- [49] B. Yang, H. J. Qian, T. Wang, and X. Luo, "1.2–3.6 GHz 32.67 dBm 4096-QAM Digital PA Using Reconfigurable Power Combining Transformer for Wireless Communication," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Aug. 2020, pp. 123–126.
- [50] P. Cruise, Chih-Ming Hung, R. B. Staszewski, O. Eliezer, S. Rezeq, K. Maggio, and D. Leipold, "A Digital-to-RF-Amplitude Converter for GSM/GPRS/EDGE in 90-nm digital CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2005, pp. 21–24.
- [51] D. Jeong, S. Lee, H. Lee, and B. Kim, "Ultra-Low Power Direct-Conversion 16 QAM Transmitter Based on Doherty Power Amplifier," *IEEE Microw. Wireless Compon. Lett.*, vol. 26, no. 7, pp. 528–530, Jul. 2016.
- [52] Y. Shen, M. Mehrpoo, M. Hashemi, M. Polushkin, L. Zhou, M. Acar, R. van Leuken, M. S. Alavi, and L. de Vreede, "A Fully-Integrated Digital-Intensive Polar Doherty Transmitter," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2017, pp. 196–199.
- [53] Youngoo Yang, Jeonghyeon Cha, Bumjae Shin, and Bumman Kim, "A Fully Matched N-way Doherty Amplifier with Optimized Linearity," *IEEE Trans. Microw. Theory Techn.*, vol. 51, no. 3, pp. 986–993, Mar. 2003.
- [54] W. C. E. Neo, J. Qureshi, M. J. Pelk, J. R. Gajadharsing, and L. C. N. de Vreede, "A Mixed-Signal Approach Towards Linear and Efficient N-Way Doherty Amplifiers," *IEEE Trans. Microw. Theory Techn.*, vol. 55, no. 5, pp. 866–879, May 2007.

- [55] N. Ginzberg, D. Regev, and E. Cohen, "Digital Evolution of the Quadrature Balanced Power Amplifier Transceiver for Full Duplex Wireless Operation," *IEEE Solid-State Circuits Lett.*, vol. 3, pp. 434–437, 2020.
- [56] J. Lee, J. Kim, J. Kim, K. Cho, and S. P. Stapleton, "A High Power Asymmetric Doherty Amplifier with Improved Linear Dynamic Range," in *IEEE MTT-S Int. Microw. Symp. Dig*, Jun. 2006, pp. 1348–1351.
- [57] H. Lee, W. Lim, J. Bae, W. Lee, H. Kang, K. C. Hwang, K. Lee, C. Park, and Y. Yang, "Highly Efficient Fully Integrated GaN-HEMT Doherty Power Amplifier Based on Compact Load Network," *IEEE Trans. Microw. Theory Techn.*, vol. 65, no. 12, pp. 5203–5211, Dec. 2017.
- [58] Z. Hu, L. C. N. de Vreede, M. S. Alavi, D. A. Calvillo-Cortes, R. B. Staszewski, and S. He, "A 5.9 GHz RFDAC-Based Outphasing Power Amplifier in 40-nm CMOS with 49.2% Efficiency and 22.2 dBm Power," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, May 2016, pp. 206–209.
- [59] D. Jung, H. Zhao, and H. Wang, "A CMOS Highly Linear Doherty Power Amplifier With Multigated Transistors," *IEEE Trans. Microw. Theory Techn.*, vol. 67, no. 5, pp. 1883–1891, May 2019.
- [60] B. Yang, H. J. Qian, T. Wang, and X. Luo, "A CMOS Wideband Watt-Level 4096-QAM Digital Power Amplifier Using Reconfigurable Power-Combining Transformer," *IEEE Journal of Solid-State Circuits*, pp. 1–14, 2022.
- [61] A. Ben-Bassat, S. Gross, A. Lane, A. Nazimov, B. Khamaisi, E. Solomon, E. Banin, E. Borokhovich, N. Kimiagorov, N. Dinur, P. Skliar, R. Cohen, R. Banin, S. Zur, S. Reinhold, S. Breuer-Bruker, T. Abuhazira, T. Livneh, T. Maimon, U. Parker, A. Ravi, and O. Degani, "A Fully Integrated 27-dBm Dual-Band All-Digital Polar Transmitter Supporting 160 MHz for Wi-Fi 6 Applications," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 12, pp. 3414–3425, 2020.

- [62] W. Yuan, V. Aparin, J. Dunworth, L. Seward, and J. S. Walling, "A Quadrature Switched Capacitor Power Amplifier," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 5, pp. 1200–1209, 2016.
- [63] Y. Li, Y. Yin, D. Zheng, X. Jia, J. Lin, F. Gao, Y. Zhu, L. Xiong, N. Yan, Y. Lu, and H. Xu, "A 15-Bit Quadrature Digital Power Amplifier With Transformer-Based Complex-Domain Efficiency Enhancement," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 6, pp. 1610–1622, 2022.
- [64] H. Jin, D. Kim, and B. Kim, "Efficient digital quadrature transmitter based on IQ cell sharing," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 5, pp. 1345–1357, 2017.
- [65] W. Yuan and J. S. Walling, "A Multiphase Switched Capacitor Power Amplifier," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 5, pp. 1320–1330, 2017.
- [66] Z. Bai, W. Yuan, A. Azam, and J. S. Walling, "4.3 A Multiphase Interpolating Digital Power Amplifier for TX Beamforming in 65nm CMOS," in 2019 IEEE International Solid-State Circuits Conference (ISSCC), 2019, pp. 78–80.
- [67] M. Beikmirza, Y. Shen, L. C. de Vreede, and M. S. Alavi, "A 1-to-4GHz Multi-Mode Digital Transmitter in 40nm CMOS Supporting 200MHz 1024-QAM OFDM signals with more than 23dBm/66% Peak Power/Drain Efficiency," in 2022 IEEE Custom Integrated Circuits Conference (CICC), 2022, pp. 01–02.
- [68] C. Hu, Y. Yin, T. Li, Y. Liu, L. Xiong, and H. Xu, "A Fully-Integrated Wideband Digital Polar Transmitter With 11-bit Digital-to-Phase Converter in 40nm CMOS," *IEEE Journal of Solid-State Circuits*, pp. 1–12, 2022.
- [69] M. Beikmirza, Y. Shen, L. C. de Vreede, and M. S. Alavi, "A Wideband Two-Way Digital Doherty Transmitter in 40nm CMOS," in 2022 IEEE/MTT-S International Microwave Symposium - IMS 2022, 2022, pp. 975–978.
- [70] A. Zhang, C. Yang, M. Ayesh, and M. S.-W. Chen, "26.6 A 5-to-6GHz Current-Mode Subharmonic Switching Digital Power Amplifier for Enhancing Power Back-Off Efficiency," in 2021 IEEE International Solid-State Circuits Conference (ISSCC), vol. 64, 2021, pp. 364–366.
- [71] E. Kaymaksut, D. Zhao, and P. Reynaert, "Transformer-Based Doherty Power Amplifiers for mm-Wave Applications in 40-nm CMOS," *IEEE Transactions on Microwave Theory and Techniques*, vol. 63, no. 4, pp. 1186–1192, 2015.
- [72] M. Mortazavi, Y. Shen, D. Mul, L. C. N. de Vreede, M. Spirito, and M. Babaie, "A Four-Way Series Doherty Digital Polar Transmitter at mm-Wave Frequencies," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 3, pp. 803–817, 2022.
- [73] M. Kazimierczuk and K. Puczko, "Exact analysis of class E tuned power amplifier at any Q and switch duty cycle," *IEEE Transactions on Circuits and Systems*, vol. 34, no. 2, pp. 149–159, 1987.
- [74] F. Raab, "Effects of circuit variations on the class E tuned power amplifier," *IEEE Journal of Solid-State Circuits*, vol. 13, no. 2, pp. 239–247, 1978.
- [75] M. Acar, A. J. Annema, and B. Nauta, "Analytical Design Equations for Class-E Power Amplifiers," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 54, no. 12, pp. 2706–2717, 2007.
- [76] J. Kaukovuori, K. Stadius, J. Ryynanen, and K. A. I. Halonen, "Analysis and Design of Passive Polyphase Filters," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 55, no. 10, pp. 3023–3037, 2008.
- [77] S. Mondal and D. A. Hall, "A 67µW Ultra-Low Power PVT-Robust MedRadio Transmitter," in 2020 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2020, pp. 327–330.

- [78] P. Kinget, R. Melville, D. Long, and V. Gopinathan, "An injection-locking scheme for precision quadrature generation," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 7, pp. 845–851, 2002.
- [79] A. Mirzaei, M. E. Heidari, R. Bagheri, and A. A. Abidi, "Multi-Phase Injection Widens Lock Range of Ring-Oscillator-Based Frequency Dividers," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 3, pp. 656–671, 2008.
- [80] K. Hu, T. Jiang, J. Wang, F. O'Mahony, and P. Y. Chiang, "A 0.6 mW/Gb/s, 6.4–7.2 Gb/s Serial Link Receiver Using Local Injection-Locked Ring Oscillators in 90 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 4, pp. 899–908, 2010.
- [81] M. Raj, S. Saeedi, and A. Emami, "A Wideband Injection Locked Quadrature Clock Generation and Distribution Technique for an Energy-Proportional 16–32 Gb/s Optical Receiver in 28 nm FDSOI CMOS," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 10, pp. 2446–2462, 2016.
- [82] P. Guan, H. Jia, W. Deng, Z. Wang, and B. Chi, "An Ultra-Compact 16-to-45 GHz Power Amplifier within A Single Inductor Footprint Using Folded Transformer Technique," in 2021 IEEE Custom Integrated Circuits Conference (CICC), 2021, pp. 1–2.
- [83] J. Sheth and S. M. Bowers, "A Four-Way Nested Digital Doherty Power Amplifier for Low-Power Applications," *IEEE Transactions on Microwave Theory and Techniques*, vol. 69, no. 6, pp. 2782–2794, 2021.
- [84] J.-B. Doré, D. Belot, E. Mercier, S. Bicaïs, G. Gougeon, Y. Corre, B. Miscopein, D. Kténas, and E. C. Strinati, "Technology Roadmap for Beyond 5G Wireless Connectivity in D-band," in 2020 2nd 6G Wireless Summit (6G SUMMIT), 2020, pp. 1–5.
- [85] A. Singh, M. Sayginer, M. J. Holyoak, J. Weiner, J. Kimionis, M. Elkhouly, Y. Baeyens, and S. Shahramian, "A D-Band Radio-on-Glass Module for Spectrally-Efficient and Low-Cost Wireless Backhaul," in 2020 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2020, pp. 99–102.

- [86] S. Carpenter, D. Nopchinda, M. Abbasi, Z. S. He, M. Bao, T. Eriksson, and H. Zirath, "A D-Band 48-Gbit/s 64-QAM/QPSK Direct-Conversion I/Q Transceiver Chipset," *IEEE Transactions on Microwave Theory and Techniques*, vol. 64, no. 4, pp. 1285–1296, 2016.
- [87] A. Simsek, A. S. H. Ahmed, A. A. Farid, U. Soylu, and M. J. W. Rodwell, "A 140GHz Two-Channel CMOS Transmitter Using Low-Cost Packaging Technologies," in 2020 IEEE Wireless Communications and Networking Conference Workshops (WCNCW), 2020, pp. 1–3.
- [88] M. Ito, T. Okawa, and T. Marumoto, "D-band Transceiver Utilizing 70-nm GaAs-mHEMT Technology for FDD System," in 2019 IEEE BiCMOS and Compound semiconductor Integrated Circuits and Technology Symposium (BCICTS), 2019, pp. 1–4.
- [89] X. Li, W. Chen, S. Li, H. Wu, X. Yi, R. Han, and Z. Feng, "A 110-to-130GHz SiGe BiCMOS Doherty Power Amplifier With Slotline-Based Power-Combining Technique Achieving >22dBm Saturated Output Power and >10% Power Back-off Efficiency," in 2022 IEEE International Solid-State Circuits Conference (ISSCC), vol. 65, 2022, pp. 316–318.
- [90] A. Hamani, A. Siligaris, F. Barrera, C. Dehos, N. Cassiau, B. Blampey, F. Chaix, M. Gary, and J. L. Gonzalez Jimenez, "A 84.48-Gb/s 64-QAM CMOS D-Band Channel-Bonding Tx Front-End With Integrated Multi-LO Frequency Generation," *IEEE Solid-State Circuits Letters*, vol. 3, pp. 346–349, 2020.
- [91] S. M. Bowers, K. Sengupta, K. Dasgupta, B. D. Parker, and A. Hajimiri, "Integrated self-healing for mm-wave power amplifiers," *IEEE Transactions on Microwave Theory and Techniques*, vol. 61, no. 3, pp. 1301–1315, 2013.