This appendix describes the functional blocks that are common to most of the implementations presented in this work.
All registers are implemented as arrays of flip-flops. The flip-flops are D-type, triggered on the rising clock edge, and have a SET pin, a RESET pin, or both.
The radix-2 carry-save adder is implemented as an array of full-adders. Each full-adder (FA) is implemented as depicted in Figure A.1 and can be decomposed into two half-adders (HA). Its maximum delay is that of two cascaded XOR gates, that is, of two half-adders: t_{FA} = t_{HA} + t_{HA} = 2 t_{HA}.
Figure A.1: Implementation of the full-adder.
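The decomposition of the full-adder into two half-adders, and the carry-save adder as an array of full-adders, can be sketched as a bit-level behavioral model. The helper names (half_adder, full_adder, carry_save_add) are hypothetical and gate delays are not modeled; the sketch assumes the two half-adder carries are merged by an OR gate.

```python
# Bit-level sketch: full-adder from two half-adders, and a radix-2
# carry-save adder (3:2 reduction) built as an array of full-adders.

def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) of two bits: one XOR gate and one AND gate."""
    return a ^ b, a & b


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Two cascaded half-adders; the critical path is the two XORs."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, c)
    return s, c1 | c2   # c1 and c2 are never both 1, so OR is exact


def carry_save_add(x: int, y: int, z: int, n: int) -> tuple[int, int]:
    """Reduce three n-bit words to a sum word and a carry word.

    Every bit position is an independent full-adder, so the delay is
    t_FA for any n. The carry word is returned already shifted left by
    one position, so x + y + z == s + c.
    """
    s = c = 0
    for i in range(n):
        si, ci = full_adder((x >> i) & 1, (y >> i) & 1, (z >> i) & 1)
        s |= si << i
        c |= ci << (i + 1)
    return s, c


s, c = carry_save_add(0b1011, 0b0110, 0b1101, 4)
print(s + c)   # 30 == 11 + 6 + 13
```

No carry travels between bit positions inside carry_save_add; the carries are only assimilated later, by a carry-propagate adder.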
Except for radix-512, the selection function (SEL) is usually composed of a small carry-propagate adder, needed because the residual is in carry-save form, and of a function implemented with logic gates, as depicted in Figure A.2. The SEL implementations are obtained by synthesizing the VHDL description of the selection function. SEL therefore includes both the assimilation of the carry-save representation of ŷ and the actual digit-selection function.
Figure A.2: Selection function.
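The structure of SEL (a short carry-propagate addition of the top bits, followed by a small selection function) can be illustrated with a behavioral sketch. The selection tables of the designs in this work are not reproduced here, so the sketch swaps in the textbook radix-2 SRT rule: digit set {-1, 0, 1}, divisor d in [1/2, 1), and an estimate ŷ of y = 2w with one fractional bit. The constant F and the functions estimate and select_digit are hypothetical.

```python
# Hypothetical SEL sketch: textbook radix-2 SRT selection (digit set
# {-1, 0, 1}, divisor d in [1/2, 1)), NOT the radix-4+ tables of this work.
# Fixed-point values are Python ints in units of 2**-F.

F = 12  # assumed number of fractional bits of the residual


def estimate(ys: int, yc: int) -> int:
    """Small carry-propagate addition of the truncated carry-save components.

    Each component is truncated (floored) to one fractional bit by an
    arithmetic right shift; the result y_hat is in units of 1/2.
    """
    return (ys >> (F - 1)) + (yc >> (F - 1))


def select_digit(ys: int, yc: int) -> int:
    """Select q[j+1] from the carry-save form (ys, yc) of y = 2w[j]."""
    y_hat = estimate(ys, yc)
    if y_hat >= 0:        # y_hat >= 0
        return 1
    if y_hat == -1:       # y_hat == -1/2
        return 0
    return -1             # y_hat <= -1


d = 3 << (F - 2)                         # d = 0.75
ws, wc = 5 << (F - 3), -(1 << (F - 3))   # w = 0.625 - 0.125 = 0.5
q = select_digit(2 * ws, 2 * wc)
print(q, (2 * (ws + wc) - q * d) / 2**F)   # 1 0.25
```

Each component loses less than 1/2 to truncation, so ŷ <= y < ŷ + 1; with |w| <= d this is what keeps the next residual 2w - q d within [-d, d] for every choice above.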
The multiple generator (MULT) performs the following operation for division:

Figure A.3: One bit of the multiple generator.
digit  M2  M1  P1  P2
 -2    1   0   0   0
 -1    0   1   0   0
  0    0   0   0   0
  1    0   0   1   0
  2    0   0   0   1
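The one-hot encoding of the digit (M2, M1, P1, P2, with the digit column read as -2 to 2) can be turned into a bit-level sketch of the multiple generator. We assume here, since the operation itself is not reproduced above, that the generator outputs q·d in two's complement: each bit selects d_i (P1), d_{i-1} (P2, i.e. 2d), or their inverses (M1, M2), and a carry-in of 1 for negative digits completes the two's complement. If the generator must produce -q·d instead, the M and P signals simply swap roles. ENCODE, mult_bit and multiple are hypothetical names.

```python
# Hypothetical bit-level model of the multiple generator (MULT).

ENCODE = {-2: (1, 0, 0, 0), -1: (0, 1, 0, 0), 0: (0, 0, 0, 0),
          1: (0, 0, 1, 0), 2: (0, 0, 0, 1)}   # digit -> (M2, M1, P1, P2)


def mult_bit(m2: int, m1: int, p1: int, p2: int, d_i: int, d_im1: int) -> int:
    """One bit: AND-OR selection of d_i, d_(i-1), or their inverses."""
    return (p1 & d_i) | (p2 & d_im1) | (m1 & (1 - d_i)) | (m2 & (1 - d_im1))


def multiple(q: int, d: int, n: int) -> tuple[int, int]:
    """Return (bits, carry_in) with bits + carry_in == q*d (mod 2**n)."""
    m2, m1, p1, p2 = ENCODE[q]
    bits = 0
    for i in range(n):
        d_i = (d >> i) & 1
        d_im1 = (d >> (i - 1)) & 1 if i > 0 else 0   # bit i of 2d
        bits |= mult_bit(m2, m1, p1, p2, d_i, d_im1) << i
    return bits, m1 | m2   # carry-in of 1 only for negative digits


bits, cin = multiple(-2, 0b0101, 8)
print(bits, cin)   # 245 1  ->  245 + 1 == 256 - 10, i.e. -2*5 mod 256
```

The carry-in need not be added inside MULT: it can be fed into a free carry slot of the following carry-save adder.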
To perform the rounding, it is necessary to detect the sign of the residual from its redundant representation and to determine whether the residual is zero. A network that detects both conditions (sign of the residual, and residual equal to zero) is described in [10]; we now summarize its implementation. Let w_{S} and w_{C} be the values of the (h+1)-bit carry-save representation of the last residual w. We introduce two quantities a_{S} and a_{C} such that


 (25) 



The subtraction of 2^{-h} from the carry-save representation of w is done by adding an (h+1)-bit vector of 1s. The resulting expressions for the bits of a_{S} and a_{C} are
 (26) 
The P_{i}'s of expression (A.2) are generated hierarchically with a carry-lookahead structure. For example, a 64-bit sign-and-zero detection unit using groups of 4 bits follows the scheme of Table A.2, and the two corresponding expressions for zero and sign are:



 
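The detection scheme can be sketched bit-accurately in Python. This is one formulation consistent with the description above, under our own assumptions, and not necessarily the exact expressions of [10] or Table A.2: the residual is treated as an n = h+1 bit two's-complement integer in units of its LSB (so the all-ones vector is -1 LSB, i.e. -2^{-h} when the LSB weighs 2^{-h}); a = w - 1 LSB is formed carry-free by a row of full-adders with one input tied to 1; zero is the AND of all P_i = a_{S,i} XOR a_{C,i}, taken in groups of 4 bits; and the sign is read from the MSB of a_S + a_C, whose incoming carry is obtained by group lookahead. The sign rule assumes w is not the most negative representable value. All helper names are hypothetical.

```python
# Sketch: zero and sign detection of an n = h+1 bit two's-complement
# carry-save residual w = ws + wc (mod 2**n), in integer units of the LSB.

def add_all_ones(ws: int, wc: int, n: int) -> tuple[int, int]:
    """3:2 reduction of ws + wc + (2**n - 1), i.e. a = w - 1 LSB, carry-free."""
    mask = (1 << n) - 1
    a_s = ~(ws ^ wc) & mask         # sum bit:   ws XOR wc XOR 1
    a_c = ((ws | wc) << 1) & mask   # carry bit: majority(ws, wc, 1) = ws OR wc
    return a_s, a_c


def group_gp(g: list[int], p: list[int], idx: range) -> tuple[int, int]:
    """Group generate/propagate over bit positions idx (LSB first).

    Evaluated serially here; hardware computes the same function with a
    lookahead expression such as G = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0.
    """
    G, P = 0, 1
    for i in idx:
        G = g[i] | (p[i] & G)
        P &= p[i]
    return G, P


def sign_and_zero(ws: int, wc: int, n: int, group: int = 4) -> tuple[int, int]:
    """Return (sign, zero) of w = ws + wc without a full-width addition."""
    a_s, a_c = add_all_ones(ws, wc, n)
    p = [((a_s ^ a_c) >> i) & 1 for i in range(n)]
    g = [((a_s & a_c) >> i) & 1 for i in range(n)]

    # w == 0  <=>  a == -1 (all ones)  <=>  every P_i == 1 (no carries at all).
    zero = 1
    for k in range(0, n, group):
        zero &= group_gp(g, p, range(k, min(k + group, n)))[1]

    # Carry into the MSB of a_s + a_c, by lookahead over groups of bits 0..n-2.
    carry = 0
    for k in range(0, n - 1, group):
        G, P = group_gp(g, p, range(k, min(k + group, n - 1)))
        carry = G | (P & carry)
    sign_a = p[n - 1] ^ carry

    # w < 0  <=>  a < 0 and a != -1 (assumes w is not the most negative value).
    return sign_a & (1 - zero), zero


print(sign_and_zero(0b000101, 0b111011, 6))   # (0, 1): w = 5 + (-5) = 0
```

Zero detection needs no carry propagation at all: if every P_i is 1 there is no generate anywhere, so the sum is exactly the all-ones vector, and conversely. Only the sign needs the lookahead carry.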
In this section we describe the voltage level shifter presented in [35]. Voltage level shifters are needed in circuits that operate with dual supply voltages (V_{DD}, the regular supply voltage, and V_{2}, the reduced supply voltage) wherever a portion of the circuit at voltage V_{2} drives a portion at voltage V_{DD}. As shown in Figure A.4, if the output of a circuit operating at V_{2} (C2) is connected directly to the input of a circuit operating at V_{DD} (C1), static current flows in C1 when its input is high. Since the voltage of node N1 does not rise above V_{2}, the p-transistor MP1 cannot be cut off if V_{2} < V_{DD} - |V_{threshold,p}|. Therefore, static current flows from V_{DD} to V_{SS} through MP1 and MN1. To block this static current, a voltage level shifter is inserted at node N1. No level shifting is needed in the reverse case, when the output of a V_{DD} circuit drives the input of a V_{2} circuit.
The voltage level shifter is realized as depicted in Figure A.5. Table A.3 lists the input-output delays and the energy consumption of a level shifter operating at V_{DD} = 3.3 V and V_{2} = 2.0 V, compared with an inverter of the Passport library. The values in Table A.3 were obtained by SPICE simulation.
Figure A.4: Dual voltage: C1 is not cut off.
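The cut-off condition for MP1 follows directly from its source-gate voltage. A short worked form, with the threshold magnitude |V_{threshold,p}| left symbolic because its value is not given in this appendix:

```latex
\begin{align*}
V_{SG,\mathrm{MP1}} &= V_{DD} - V_{N1} = V_{DD} - V_{2}\\
\text{MP1 conducts} &\iff V_{SG,\mathrm{MP1}} > |V_{threshold,p}|
  \iff V_{2} < V_{DD} - |V_{threshold,p}|\\
V_{DD} = 3.3\ \mathrm{V},\ V_{2} = 2.0\ \mathrm{V} &\implies V_{SG,\mathrm{MP1}} = 1.3\ \mathrm{V}
\end{align*}
```

With these supplies MP1 therefore stays on for any |V_{threshold,p}| below 1.3 V, and static current flows through MP1 and MN1 unless a level shifter is inserted at N1.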
Figure A.5: Voltage level shifter.
load   level shifter                              inverter
       t_LH   t_rise  t_HL   t_fall  E_tran       t_LH   t_rise  t_HL   t_fall  E_tran
       [ns]   [ns]    [ns]   [ns]    [pJ]         [ns]   [ns]    [ns]   [ns]    [pJ]
SL1    0.144  0.13    0.042  0.11    0.7          0.097  0.20    0.094  0.16    0.3
SL4    0.245  0.17    0.087  0.22    1.2          0.164  0.32    0.163  0.27    0.8
SL16   0.670  0.45    0.271  0.69    3.4          0.459  0.98    0.476  0.86    2.1
SL = standard load = 22 fF for the Passport library
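The overhead of the level shifter relative to the inverter can be read off Table A.3 as simple ratios. The sketch below copies the table values and computes shifter/inverter ratios per load; the ratios are our own derived quantities, and being dimensionless they do not depend on the delay or energy units. TABLE, NAMES and ratios are hypothetical names.

```python
# Shifter/inverter ratios computed from the values of Table A.3.

TABLE = {   # load: ((shifter values), (inverter values))
    "SL1":  ((0.144, 0.13, 0.042, 0.11, 0.7), (0.097, 0.20, 0.094, 0.16, 0.3)),
    "SL4":  ((0.245, 0.17, 0.087, 0.22, 1.2), (0.164, 0.32, 0.163, 0.27, 0.8)),
    "SL16": ((0.670, 0.45, 0.271, 0.69, 3.4), (0.459, 0.98, 0.476, 0.86, 2.1)),
}
NAMES = ("t_LH", "t_rise", "t_HL", "t_fall", "E_tran")


def ratios(load: str) -> dict[str, float]:
    """Level-shifter value divided by inverter value, per column."""
    shifter, inverter = TABLE[load]
    return {name: s / i for name, s, i in zip(NAMES, shifter, inverter)}


for load in TABLE:
    print(load, {k: round(v, 2) for k, v in ratios(load).items()})
```

For example, at SL1 the shifter's t_LH is about 1.48 times the inverter's while its t_HL is about 0.45 times, and its transition energy is about 2.3 times that of the inverter.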