|
Copyright 2002 IEEE. To be published in the Proceedings of 2002 IEEE International Symposium on Circuits and Systems (ISCAS), Phoenix, Arizona, USA, May 26-29, 2002.
Previous work demonstrated that FIR filters implemented in the Residue Number System (RNS) offer better performance of filters realized in the traditional binary system in terms of area and power dissipation [1]. In this work, we want to extend the findings of [1] to a generic datapath for digital signal processing and show the advantages deriving from the use of the RNS in terms of flexibility and power dissipation.
Recently, reconfigurable hardware, such as FPGAs, has gained a large portion of the market because it allows fast prototyping and tuning-up of the products which reduce design time and costs. However, usually reconfiguration of FPGAs cannot be done at run time (on-the-fly).
Reconfigurable computers (RCC) are systems that combine a reconfigurable hardware processing unit with a programmable controller. The basic blocks are arithmetic units (such as multipliers and adders) which can be configured to implement a given algorithm or a sequence of computations. Reconfigurable computers are well suited for high-throughput and data-parallel applications [2]. RCCs offer less flexibility than FPGAs, but can be configured on-the-fly by a command string. The growing interest for reconfigurable computers is also confirmed by the appearance on the market of the first commercial systems [3].
In this paper, we present the design of a RNS configurable datapath oriented to DSP applications. Our datapath should be reconfigurable, or programmable, in terms of bit-width (dynamic range), type of operations (multiplication or addition), and sequence of operations. As a first approach, we selected an architecture of the datapath suitable for convolution and pattern matching.
A Residue Number System is defined by a set of relatively prime integers
{m1, m2, ¼, mP }.
The dynamic range of the system is given by the product of all the
moduli mi:
M = m1 ·m2 ·¼·mP .
Any integer X Î { 0, 1, 2, ¼M-1} has a unique RNS
representation given by:
|
| (1) |
Therefore, the use of the RNS allows the decomposition of a given dynamic range in slices of smaller range on which the computation can be implemented in parallel [4]. The conversion RNS to binary is accomplished by the Chinese Remainder Theorem (CRT) [5].
Modular multiplication can be simplified by using the isomorphism technique.
According to [6], if m is prime there exists a primitive
radix r such that its powers modulo m
cover the set [1, m-1].
By using isomorphism, the product of the two residues is transformed
into the sum of their indices which are obtained by an isomorphic
transformation:
|
The RNS reconfigurable processor is depicted in Figure 1. The figure shows a control unit which is used to configure the datapath, according to a command string, to adapt it to different applications. The processor includes the conversions binary/RNS and RNS/binary.
![]() |
There are three main advantages in using the RNS for a configurable datapath.
Once a dynamic range M1 < M is chosen, and consequently, the number of bits required is smaller that the bit-width of the datapath (log2 M1 < log2 M) we can turn-off in the RNS datapath the paths modulo mi which are not needed to cover the dynamic M1.
For example, by choosing the following set of moduli:
{ 3, 5, 7, 11, 13, 17, 19}
a 22-bit dynamic range can be covered. If our system requires a dynamic
range of 16 bits we can use the set of moduli:
{ 3, 7, 11, 17, 19}
turning off the paths through the moduli 5 and 13.
As a consequence, for dynamic ranges smaller than the maximum range the
power dissipation is reduced.
As an alternative to Figure 2, because in RNS the dynamic range is broken down into smaller ranges of a few bits (3-6), we can implement any operation satisfying Expr. (1) with small look-up tables to be loaded during reconfiguration.
![]() |
We designed our datapath to have a maximum dynamic range of 32 bits, 64 processing elements PEs (either multipliers or adders), and a tree of adders to add 64 operands. As a first implementation, for each of the RNS paths (see Figure 3), we connected the inputs of the PEs to two élog2 miù-bit shift registers (Reg. A and B) and the output to the tree of adders. This scheme is suitable for mono and bidimensional convolution and pattern matching, as later explained. The RNS paths not used to cover a given dynamic range are deactivated by disabling the clock of registers A and B in the path.
A command string, a sort of micro-instruction, is given to the controller (Figure 1) to perform the following tasks:
For the set of applications we selected, register A holds values which are constant for the whole processing. For this reason, we can use one binary to RNS converter to load both register A (at start-up) and register B (runtime). The RNS to binary converter is programmable in terms of dynamic range as well: when some moduli are not used in the datapath, the corresponding parts in the converter are disabled by switching off the clock in the registers at the interface between the RNS paths and the converter.
![]() |
The first application we implement on the reconfigurable RNS datapath
is a a programmable 64-tap FIR filter
|
For this application, the coefficients ak are loaded in register A and the samples x(n-k) are fed into register B one per clock cycle. The PEs are all set in multiply mode.
Bidimensional convolution is used to process an image in the time domain.
Averaging, smoothing and sharpening of a digital image are done by
the convolution of the matrix of pixels representing the image
B(n×n) with a specific mask W(m×m). For example, if
m = 3 the value of the new pixel is computed as:
|
![]() |
We consider the problem of detecting from a stream of data a sequence of symbols, where each symbol is represented by a word of n bits. If, as an example, we consider a stream of ASCII characters, we want to detect the 8-character sequence ÄB*A***D", in which the wildcard (*) indicates that any character can be in that position. A simple approach to perform the task, using the reconfigurable RNS datapath, consists in the following steps:
Table 1 show an example of pattern matching using the two moduli { 11, 17}.
stream | A | B | B | A | G | O | L | D |
Reg B | 10 | 0 | 0 | 10 | 5 | 2 | 10 | 2 |
op | + | + | × | + | × | × | × | + |
Reg A | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 9 |
mod 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Reg B | 14 | 15 | 15 | 14 | 3 | 11 | 3 | 0 |
op | + | + | × | + | × | × | × | + |
Reg A | 3 | 2 | 0 | 3 | 0 | 0 | 0 | 0 |
mod 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
The implementation of the reconfigurable RNS datapath has been realized
with the AMS 0.35 mm library of standard cells.
To cover the 32-bit dynamic range in RNS we chose the following set of moduli:
|
In order to compare the performance of the reconfigurable RNS processor in terms of throughput, area and power dissipation, we implemented a programmable 64-tap FIR filter realized in the traditional two's complement system (TCS) with 32-bit dynamic range. Table 2 reports the characteristics of the TCS filter and the reconfigurable RNS datapath. The table shows that the RNS processor is about 25% faster than the traditional FIR filter and consume less power (at the same frequency).
By reducing the dynamic range, we obtain for the power dissipation, computed at 100 MHz, the results shown in Figure 5. For the TCS filter the range is reduced simply by feeding data with shorter wordlength, while for the RNS datapaths, we also disabled, by turning off the clock, the paths through the moduli not necessary to cover that dynamic range. The figure shows that the TCS filter working at 20 bit dynamic range consumes almost the same energy of the RNS working at full dynamics. From the RNS set of points (staircase shape), we can see that the power reductions are due to the different configurations of active moduli, and, once a moduli set is selected, the power dissipation is constant. For the TCS, the set of points is on a straight line because by reducing the input wordlength, the most-significant portion of the datapath get filled with sign-extension bits1, which significantly reduce switching activity.
![]() |
We presented the implementation of a RNS configurable datapath suitable for digital FIR filtering, bidimensional convolution and pattern matching. The datapath is configurable in terms of dynamic range and type of operations. We compared its performance and power dissipation with a FIR filter realized in the traditional two's complement system, and found that the RNS datapath can be clocked at a higher frequency and consumes on the average a 30% less power.
We plan to introduce an interconnection network between PEs in the datapath to improve its flexibility and be able to run more applications on it.
|
1Radix-4 Booth recoding in multipliers transforms a sequence of 1s (negative numbers) in a sequence of 0s.