# KEEPER DESIGNS FOR WIDE FAN IN DYNAMIC LOGIC

Sarthak Bhuva<sup>1</sup> and Praneeta Kalsait<sup>2</sup>

<sup>1</sup>Electrical Engineering Dept, VJTI, H.R. Mahajani Marg, Mumbai-400019
<sup>2</sup>Electrical Engineering Dept, VJTI, H.R. Mahajani Marg, Mumbai-400019

#### Abstract:

High-performance, multifunctional modules have become essential in modern microprocessors, and dynamic gates are a popular choice for designing them. However, as device lengths shrink drastically, the increasing leakage current and decreasing noise margin of dynamic gates degrade system performance and make the system less robust. Keepers were introduced to overcome this. A weak PMOS keeper avoids most of the problems associated with contention current, but aggressive technology scaling has rendered it less effective. A large PMOS keeper, on the other hand, can drastically increase the contention current in wide fan-in dynamic logic, which results in a drop in performance. This paper reviews the issues with traditional keepers and then surveys newer keeper techniques, including the conditional keeper, the leakage current replica keeper, and adaptive keeper techniques (the rate sensing keeper and the variation tolerant keeper), and discusses the limitations of each design. These techniques help reduce contention current, thereby decreasing power dissipation and delay, with the added advantage of improved noise margins.

## **Key Words:**

Keeper, Weak PMOS, Wide Fan-in dynamic logic

#### 1. Introduction:

Wide fan-in dynamic gates are an important structure in the critical path of modern high-speed microprocessors [1]. However, with aggressive scaling trends, the effects of process variation become very significant. Process variation causes the leakage current of gates to vary between different regions of a die [2], [3].
To maintain an adequate noise margin for wide fan-in gates in this situation, a large PMOS keeper is used, but a large keeper causes strong contention between the pull-down network (PDN) and the keeper. This contention needlessly increases power dissipation and delay. Continuous efforts have been made to make designs tolerant of process variation while reducing contention, so as to achieve low power dissipation and low delay without drastically increasing area or power consumption [4].

DOI: 10.5121/ijme.2016.2101

## 2. Need of Keepers:

Although dynamic circuits are much faster than their static counterparts, they are very sensitive to crosstalk, leakage current, charge sharing, and power supply bumps, because a dynamic node cannot be recharged once its stored data is lost to the noise sources present in the circuit. To make up for this charge loss, a PMOS pull-up transistor is used to hold the dynamic node at its precharged level. Such a PMOS transistor is called a simple charge keeper. There are also various other types of keeper, such as the feedback keeper, delay charge keeper, and burn-in charge keeper [5]. Designs illustrating these types of keepers are described later in this paper.

## 2.1. Issues with Wide Fan in Dynamic Logic

The high-performance ARM Cortex<sup>TM</sup>-A series processors are used as core processors in almost all smart devices in use today, such as the iPhone, iPad, and other mobile phones [6]. In this processor, two register files are deployed in the data path, which are boxed for emphasis. The register files are used in almost every clock cycle, because executing each instruction requires data to be read from or written to the register file. Register files therefore form an important module in high-speed microprocessors.
[4]

Fig.1(a) Block diagram of a simplified register file and (b) read port implemented using 4x1 multiplexer (MUX) [1]

Fig. 1(a) shows the block diagram of such a register file, consisting of a static RAM register, a read port, and a write port [1]. These ports are implemented using a multiplexer and a demultiplexer, whose implementation is shown in Fig. 1(b). It illustrates a simple 4x1 multiplexer with 4 input lines [1]. However, an actual ARM Cortex microprocessor has a 16- or 32-bit register file and hence needs a 16- or 32-input OR gate. Moreover, with advancing technology and the growing use of 32- and 64-bit systems, such wide fan-in gates become very important. Hence, the wide fan-in OR gate is an important structure in a high-speed microprocessor.

Designing a highly robust wide fan-in dynamic OR gate is, however, a difficult task in the sub-100 nm regime [7], especially with technology reaching as low as 13 nm in recent times. Process variation and high contention current are the two major factors that make designing the wide OR gate challenging [4].

In a wide fan-in dynamic OR gate, systematic process variation causes the threshold voltage of the NMOS transistors in the pull-down network to vary. This threshold voltage variation in turn causes the leakage current through the pull-down network to vary, so it becomes very difficult to maintain a particular noise margin [7]. One way to overcome this problem is to place a large PMOS keeper at the dynamic node, which can compensate for any variation in the leakage current of the pull-down network. But this results in large contention between the pull-down network and the keeper [1], [4].

Various keeper designs have been proposed in the literature to address the process variation issue [1]. An effective variation-tolerant keeper architecture is proposed in [1].
This technique has been used in the design proposed in [4] to achieve process variation tolerance. Because a large keeper is not used, contention current is reduced compared with the conventional keeper design. However, some contention current still flows through the keeper when one of the inputs of the pull-down network goes high [4].

The next part discusses the prevalent keeper designs used to date and their limitations. We then discuss the newer keeper designs that have been proposed to largely eliminate contention current, thereby helping to increase noise margins and decrease delay.

## 3. Prevalent Keeper Techniques

Keeper designs have been improved many times. This section discusses the keeper designs that were used formerly.

In dynamic circuits, charge sharing can occur between the dynamic node and the intermediate nodes of the logic block, and it may produce an erroneous output. To prevent this error, a pMOS device is added as shown in Fig. 2. This pMOS, called a keeper, is always kept ON. The keeper pulls the dynamic node up to Vdd, compensating for the charge lost to charge sharing. The output must still go low when the pull-down network is active, so the pMOS keeper must be weaker than the nMOS pull-down network. During the evaluation phase, the nMOS pull-down block can then overpower the keeper pMOS and pull the output node to ground.

#### 3.1.1 Limitations:

The weak keeper design requires one extra PMOS device for each stage of the circuit, which increases cost. Another major limitation is that excess power is dissipated because a direct path can form from VDD to GND.
Such a situation occurs when the PMOS keeper and the NMOS network fight to pull the node up and down respectively, which results in contention current.

#### 3.2. Standard Keeper Design

To reduce the contention current of the weak keeper design, the keeper is kept active only when the dynamic node is high. For this, the gate of the keeper is connected to the output node of the inverter stage. The keeper functions as a latch, cutting off whenever the output of the inverter is high. Hence the keeper conducts only when the dynamic node is not grounded, and the power dissipation is significantly reduced.

Fig.3 Standard Keeper Design

#### 3.2.2 Limitation:

At the start of evaluation the keeper is ON, which may cause contention if the input combination turns the PDN ON. Hence, the contention current is not completely eliminated. Additionally, both the delay of the gate and its power consumption are increased.

## 4. New Techniques

The prevalent designs were simple and had many limitations. To overcome these limitations, the designs were modified. This section discusses the new design techniques in detail.

## 4.1. Conditional Keeper Design

In most dynamic circuits, the input signals to wide dynamic gates are applied before or close to the start of the evaluation phase. Hence, the output transition occupies a time window that is only a fraction of the total evaluation time, which is itself often the half period of a 50% duty-cycle clock. The outputs of dynamic gates are therefore exposed to noise and leakage for a long time. In conventional dynamic circuits, the standard keeper, PK1 in Fig. 4, is active for the entire evaluation phase. Since the standard keeper is turned on unconditionally at the start of the evaluation phase, it reduces the performance of the circuit. In contrast, in the conditional keeper technique, a large fraction of the keeper is activated conditionally.
Hence, strong keepers can be used with leaky precharged circuits without a significant impact on circuit performance. The keeper is weak during the clock-to-output transition window and strong for the rest of the evaluation time, if the dynamic node should remain high. During the transition window the weak keeper reduces contention, and for the rest of evaluation the strong keeper gives good robustness to leakage and noise.

The circuit implementation with two keepers, a fixed keeper, PK1, and a conditional keeper, PK2, is shown in Fig. 4.

Fig.4 Conditional Keeper [10]

At the beginning of the evaluation phase, when the clock goes from low to high, PK1 is the only active keeper. The effective delay before PK2 can turn on is the delay of the delay element plus the delay of the NAND gate. After this delay, PK2 is activated if the dynamic output should remain high, i.e. the output of the NAND gate goes low. The output transition is faster, and hence the delay improvement is larger, when PK2 is activated near or later than the worst-case clock-to-output transition window, $T_{max}$. The fixed keeper PK1 ensures sufficient robustness while PK2 is not yet active, which can be a small fraction of the total evaluation time. Depending on the required robustness of the actual gate, different size combinations of PK1 and PK2 can be used [10].

#### 4.1.2. Limitations:

One limitation of the conditional keeper design is that a significant amount of power is dissipated in the inverter chain and the NAND gate that are used to generate the delay. Moreover, the delay of the inverter chain is set for the worst-case fNsP corner. The shorter delay time set for the fNsP corner degrades the performance of the dynamic gate in the sNfP corner.

## 4.2. Leakage Current Replica Keeper Design

The leakage current replica (LCR) keeper [11] (Fig. 5) is a circuit that addresses the shortcomings of the conventional keeper and previously proposed enhancements.
The LCR keeper uses a conventional analog current mirror, which tracks the process corner as well as voltage and temperature. A single current mirror can be shared among many dynamic gates that have the same topology. In a dynamic gate with an LCR keeper, the keeper comprises one extra series pFET (p1) and a replica current mirror. The current mirror tracks the leakage current and copies it into the dynamic gate through p1. The overhead per gate is p1 plus a portion of the shared current mirror.

Let the dynamic gate leakage current be $I_{leak}$; the current mirror is constructed to draw $sf \cdot I_{leak}$, where $sf$ is a safety factor. The nFET $n_{rpl}$ is sized so that it replicates the worst-case leakage current. The worst case in the pull-down network occurs when $A_0, \ldots, A_n = 0$ and $B_0, \ldots, B_n = 1$. The gate of $n_{rpl}$ is connected to ground, so the nFET is off and only leakage current flows through it. The nFETs $n_0, \ldots, n_n$ and $n_{rpl}$ have the same (generally minimum) channel length. Assuming the dimensions of p1 and p3 are the same, the width of $n_{rpl}$ is set equal to the sum of the widths of $n_0, \ldots, n_n$ times the safety factor $sf$. The replica leakage current is mirrored into transistor p1 by p3. Devices p1 and p3 should have a large L to minimize channel-length modulation and to reduce threshold-voltage ($V_t$) variation. The size of p2 is not critical, but it should have minimum length to reduce output loading. It should also be large enough that its drain-to-source voltage is negligible when it is ON. It is assumed that, when on, p2 is a virtual short and the potential at the drain of p1 is the same as the potential of the dynamic node DN.

The safety factor $sf$ is set by ratioing the transistor geometries. Assuming that p1 and p3 have the same channel length and that all nFETs have the same channel length, $sf$ is given by

$$sf = \frac{W_{nrpl}}{\sum W_{ni}} \cdot \frac{W_{p1}}{W_{p3}} \tag{1}$$

where $W_{nrpl}$, $W_{p1}$ and $W_{p3}$ denote the widths of nrpl, p1 and p3 respectively, and $W_{ni}$ denotes the width of ni (i = 0, ..., n) [12].

#### 4.2.1 Limitations:

One limitation of the design is high contention, because the keeper is strongly ON at the beginning of evaluation. Moreover, the replica transistor does not track leakage due to noise and DIBL. Area overhead and power dissipation also become excessive if the circuit is designed for higher noise robustness and better tracking. Finally, the LCR keeper cannot track random on-die variation, which must still be addressed using conventional margining.

#### 4.3 Adaptive Keeper Designs

The adaptive keeper design techniques track process variations, which are considerable in the deep sub-micron region. The two adaptive keeper design techniques discussed further are:

- (i) Rate Sensing Keeper Technique
- (ii) Variation Tolerant Keeper Technique

## **4.3.1.** Rate Sensing Keeper Technique (RSK)

The rate sensing keeper exploits the difference between the rate of change of the dynamic node voltage in the ON condition and in the leakage condition. A reference rate, which is the average of the two rates, is used to control the state of the keeper. RSK achieves high speed and better tracking because the keeper is OFF at the start of the evaluation phase and the adaptive control of the keeper strength is based on the process corner. The proposed keeper technique is shown in Fig. 6 for a wide AND-OR domino gate. The circuit mainly consists of the keeper pMOS transistor (M1), the rate controller including the reference rate generator transistor (M4), the feedback transistor (M2), the shutoff feedback transistor (M5), the shutoff clock transistor (M6), and the precharge transistor (M3).

Fig.6 Rate Sensing Keeper [13]

A reference rate is generated by biasing transistor M4 with a suitable bias voltage (VBIAS).
This reference rate is then compared with the rate at the dynamic node by a rate controller, whose output controls the keeper. During the precharge phase, the sense node (VSEN) is precharged to VDD, which keeps M1 off at the start of the evaluation phase. This greatly reduces contention [13].

#### 4.3.1.1 PVT ADAPTIVE BIAS GENERATION

Fig.7 replica bias generator [13]

Process variations have three components: inter-die variation, which is common to all the gates on the chip; spatially correlated intra-die variation, which is shared by all gates within a spatially correlated region; and random intra-die variation, which is uncorrelated from transistor to transistor. The bias voltage is generated using a replica circuit, as shown in Fig. 7. The replica bias circuit mainly consists of a wide AND-OR domino gate with a rate sensing keeper, controlled by a feedback circuit. The reference rate (the half rate) is obtained by setting the leakage current and the ON current to half their original values for a constant dynamic node capacitance. The gate used has 32 legs: 16 legs are in the worst-case leakage state, one leg is sized to supply half the ON current, and the remaining 15 legs are set in a low-leakage state (i.e. both inputs are grounded). As the capacitance at the node is fixed, the reference rate [13] is given by

$$R_{ref} = \frac{\frac{I_{on}}{2}}{C_{dyn}} + \frac{\frac{I_{off}}{2}}{C_{dyn}} = \frac{R_{on} + R_{off}}{2} \tag{2}$$

The feedback loop adjusts the bias voltage of transistor M4 until the pull-down rates on both arms are equal. In every clock cycle, the output of the domino gate is sampled at the end of the evaluation phase. To obtain its average value, the output sample is fed to an op-amp integrator whose reference voltage is held at VDD/2. The integrator in the feedback loop drives the bias voltage until the sampled output of the gate has an average value of VDD/2.
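
As a rough numerical illustration of Eq. (2), the sketch below computes the reference rate and checks that it equals the average of the ON rate and the leakage rate. The current and capacitance values, the function names, and the keeper decision rule (enable the keeper when the dynamic node falls more slowly than the reference rate) are assumptions made for illustration only; they are not taken from [13].

```python
import math


def reference_rate(i_on, i_off, c_dyn):
    """Reference (half) rate of Eq. (2), in V/s."""
    return (i_on / 2) / c_dyn + (i_off / 2) / c_dyn


def keeper_should_turn_on(node_rate, r_ref):
    """Assumed decision rule: a node that falls more slowly than the
    reference rate is treated as leaking, so the keeper is enabled."""
    return node_rate < r_ref


# Hypothetical values, chosen only to keep the arithmetic easy to follow.
I_ON = 100e-6    # ON current of a conducting leg, in A
I_OFF = 2e-6     # worst-case leakage current, in A
C_DYN = 20e-15   # dynamic node capacitance, in F

r_on = I_ON / C_DYN     # discharge rate when a leg is ON
r_off = I_OFF / C_DYN   # discharge rate under leakage only
r_ref = reference_rate(I_ON, I_OFF, C_DYN)

# Eq. (2): the reference rate is the average of the two rates.
assert math.isclose(r_ref, (r_on + r_off) / 2)

print(f"R_on  = {r_on:.3e} V/s")
print(f"R_off = {r_off:.3e} V/s")
print(f"R_ref = {r_ref:.3e} V/s")
print("keeper on under leakage:", keeper_should_turn_on(r_off, r_ref))
print("keeper on during a real evaluation:", keeper_should_turn_on(r_on, r_ref))
```

With these assumed numbers the reference rate lies between the leakage rate and the ON rate, which is what allows a single threshold to separate the two conditions.
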
At this point the rates in both arms are equal and VBIAS matches the required bias voltage. Hence the replica structure successfully tracks the intra-die process variations [13].

#### 4.3.2 Variation Tolerant Keeper Technique

In this technique the circuit is made tolerant to on-die variation. The variation sensor [14] used here relies on the drain-induced barrier lowering (DIBL) effect, by which the threshold voltage of a short-channel MOSFET is modulated by its drain voltage [15]. Hence, in a short-channel device, the threshold voltage depends linearly on the drain voltage, assuming that all other parameters are constant. The advantage of this design is that the circuit simultaneously improves performance and decreases power consumption. Furthermore, the keeper is much less sensitive to process variations.

The process variation sensor is shown in Fig. 8. Here M2 is biased by the voltage source VBias together with the current source IREF. The bias condition is assumed to remain the same over the entire threshold voltage range, so the role of the bias circuitry is critical. The reference current and voltage generated by the bias circuitry are functions only of the widths of the transistors in this circuit and of the thermal voltage. Hence, they are effectively independent of channel length variations. Provided that the bias sources are designed to be insensitive to process variations, the drain voltage of M2 is a linear function of the systematic process variation.

Fig.8 Variation Tolerant Keeper [16]

The drain voltage fluctuations are approximately ten times the threshold voltage fluctuations of this sensor. Therefore, the sensor can be used to design circuits that offset the impact of systematic process variation [16].

#### 4.3.2.1 Limitations

Neither of the adaptive keeper designs is leakage tolerant. Also, the variation tolerant keeper architecture does not address the contention problem.

## 5.
Conclusion

As a result of the aggressive scaling employed to reduce channel length, leakage current has become highly significant. In addition, the dynamic and domino logic styles used to improve speed bring with them the problem of contention current. Because of the various parameter variations that take place, the leakage and contention currents cannot be completely eliminated. However, deliberate efforts are being made to keep the effects of contention current minimal, and various new techniques have been proposed in which contention current is drastically reduced.

Every solution has a trade-off, however. While the new techniques help reduce contention current, they also consume extra area, thereby increasing the size of the chip. In addition, every extra transistor increases power consumption, which decreases the battery life of devices. It is imperative that the proper trade-off be established based on the application for which the chip is being manufactured. This can only be achieved after proper analysis of chip behavior and with exact knowledge of the market requirements.

## Acknowledgements

We would like to take this opportunity to thank God and our alma mater, VJTI, for providing the necessary infrastructure for our research work. We would also like to thank Miss Akanksha Chouhan, our mentor, for always being available and helping us clear all our doubts. The contribution of our families cannot be ignored either; this would not have been possible without their motivation. We also thank our friends and colleagues for their unending support and guidance whenever we got stuck.

#### References

- [1] H. F. Dadgour and K. Banerjee, "A Novel Variation-Tolerant Keeper Architecture for High-Performance Low-Power Wide Fan-In Dynamic OR Gates," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 11, pp. 1567-1577, Nov. 2010.
- [2] K. J. Kuhn, M. D. Giles, D. Becher, P. Kolar, A. Kornfeld, R. Kotlyar, A. Maheshwari, and S. Mudanai, "Process Technology Variation," IEEE Transactions on Electron Devices, vol. 58, no. 8, pp. 2197-2208, Aug. 2011.
- [3] S. Borkar, "Designing reliable systems from unreliable components: the challenges of transistor variability and degradation," IEEE Micro, vol. 25, no. 6, pp. 10-16, Nov.-Dec. 2005.
- [4] Vikas Mahor, Akanksha Chouhan, and Manisha Pattanaik, "A Novel Process Variation Tolerant Wide Fan-In Dynamic OR Gate with Reduced Contention," 5th International Conference on Computers and Devices for Communication (CODEC), 17-19 Dec. 2012.
- [5] Ming-Bo Lin, Introduction to VLSI Systems: A Logic, Circuit, and System Perspective.
- [6] J. Koppanalil, G. Yeung, D. O'Driscoll, S. Householder, and C. Hawkins, "A 1.6 GHz dual-core ARM Cortex A9 implementation on a low power high-K metal gate 32nm process," International Symposium on VLSI Design, Automation and Test (VLSI-DAT), pp. 1-4, 25-28 April 2011.
- [7] Manisha Pattanaik, Fazal Rahim, and Muddala V D L Varaprasad, "Improvement of Noise Tolerance Analysis in Deep-submicron Dynamic CMOS logic circuits," IEEE International Conference of Electronic Devices Systems, pp. 48-53, 2010.
- [8] Rakesh Gnana David Jeyasingh, Navakanta Bhat, and Bharadwaj Amrutur, "Adaptive Keeper Design for Dynamic Logic Circuits Using Rate Sensing Technique," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 2, pp. 295-304, Feb. 2011.
- [9] Sung-Mo Kang and Yusuf Leblebici, CMOS Digital Integrated Circuits: Analysis and Design, 2nd ed., McGraw-Hill.
- [10] Atila Alvandpour, Ram K. Krishnamurthy, K. Soumyanath, and Shekhar Y. Borkar, "A Sub-130-nm Conditional Keeper Technique," IEEE Journal of Solid-State Circuits, vol. 37, no. 5, May 2002.
- [11] Y. Lih et al., "A leakage current replica keeper for dynamic circuits," in IEEE ISSCC Dig. Tech. Papers, 2006, pp. 442-443.
- [12] Yolin Lih, Nestoras Tzartzanis, and William W. Walker, "A Leakage Current Replica Keeper for Dynamic Circuits," IEEE Journal of Solid-State Circuits, vol. 42, no. 1, Jan. 2007.
- [13] Rakesh Gnana David Jeyasingh, Navakanta Bhat, and Bharadwaj Amrutur, "Adaptive Keeper Design for Dynamic Logic Circuits Using Rate Sensing Technique," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 2, Feb. 2011.
- [14] T. Kuroda, T. Fujita, S. Mita, T. Nagamatu, S. Yoshioka, F. Sano, M. Norishima, M. Murota, M. Kako, M. Kinugawa, M. Kakumu, and T. Sakurai, "A 0.9 V 150 MHz 10 mW 4 mm<sup>2</sup> 2-D discrete cosine transform core processor with variable-threshold-voltage scheme," in Proc. ISSCC, 1996, pp. 166-167.
- [15] M. M. Griffin, J. Zerbe, G. Tsang, M. Ching, and C. L. Portmann, "A process-independent, 800-MB/s, DRAM byte-wide interface featuring command interleaving and concurrent memory operation," IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1741-1751, Nov. 1998.
- [16] Hamed F. Dadgour and Kaustav Banerjee, "A Novel Variation-Tolerant Keeper Architecture for High-Performance Low-Power Wide Fan-In Dynamic OR Gates," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 11, Nov. 2010.