## Wednesday, 5 December 2012

### Sync Counter

by Lee Chin Wei
and Andrew Long

Contents

An Overview
The purpose of the survey is to collate information on Digital Synchronous Counters. Particular emphasis was placed on the following areas:

1. Types of Synchronous Counters and How they work
2. Fast Counter Techniques
3. Fast Counters from Xilinx
4. Implementation of Counters: Dedicated Hardware and Alternative Devices
The material is presented in a manner suitable for a teaching tool. It seeks to enlighten and to spark interest in the design of counters. As R.S.S Obermann remarks, "...design of counters has, in my experience, always been an excellent proving ground for anyone who has mastered Boolean algebra..." Have fun reading!

Different types of Synchronous Counters
Binary Up Counters
A synchronous binary counter counts from 0 to 2^N - 1, where N is the number of bits/flip-flops in the counter. Each flip-flop is used to represent one bit. The flip-flop in the lowest-order position is complemented/toggled with every clock pulse, and a flip-flop in any other position is complemented on the next clock pulse provided all the bits in the lower-order positions are equal to 1.
Take for example A4 A3 A2 A1 = 0011. On the next count, A4 A3 A2 A1 = 0100. A1, the lowest-order bit, is always complemented. A2 is complemented because all the lower-order positions (A1 only in this case) are 1's. A3 is also complemented because all the lower-order positions, A2 and A1, are 1's. But A4 is not complemented because the lower-order positions, A3 A2 A1 = 011, do not give an all-1 condition.
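The toggle rule above can be checked with a short behavioural model. This is only a sketch in Python, not a description of the hardware: the function names are made up for illustration, and the state is held as a list with the lowest-order bit (A1) first.

```python
def next_up(bits):
    """Return the next state of a synchronous binary up counter.

    bits[0] is the lowest-order bit (A1 in the text).
    """
    nxt = list(bits)
    all_lower_ones = True  # vacuously true for the lowest-order bit
    for i, b in enumerate(bits):
        if all_lower_ones:
            nxt[i] = 1 - b  # toggle: every lower-order bit is 1
        all_lower_ones = all_lower_ones and b == 1
    return nxt


def as_text(bits):
    """Format a state with the highest-order bit first, as in the text."""
    return "".join(str(b) for b in reversed(bits))


state = [1, 1, 0, 0]  # A1 = 1, A2 = 1, A3 = 0, A4 = 0, i.e. 0011
print(as_text(state), "->", as_text(next_up(state)))
```

Every bit of the next state is computed from the current state only, which mirrors the fact that all flip-flops change on the same clock pulse.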
To implement a synchronous counter, we need a flip-flop for every bit and an AND gate for every bit except the first and the last bit. The diagram below shows the implementation of a 4-bit synchronous up-counter.

4-bit Synchronous Binary Up-Counter
From the diagram above, we can see that although the counter is synchronous and is supposed to change simultaneously, we have a propagation delay through the AND gates which add up to give an overall propagation delay which is proportional to the number of bits of the counter. To overcome this problem, we can feed the outputs from the flip-flops directly to a many-input AND gate as follows :

4-bit Synchronous Binary Up Counter using speedup technique
This method does overcome the problem of additive propagation delay but introduces some problems of its own. From the diagram above, we can see that the third flip-flop gets its J-K input from the output of a 2-input AND gate, the fourth flip-flop gets its input from a 3-input AND gate, and so on. If we have a counter with, for example, 16 bits, we will need:
1 * 15-input AND gate,
1 * 14-input AND gate,
. . .
. . .
. . .
1 * 3-input AND gate and
1 * 2-input AND gate.

This method obviously uses a lot more resources than the first method. Not only that, in the first method the output from each flip-flop is only used as an input to one AND gate. In the second method, the output from each flip-flop is used as an input to the gates of all the higher-order bits. If we have a 12-bit counter, the output of the first flip-flop will have to drive 10 gates (this load is called fan-out). The output from the flip-flop may not have the power to do this.
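The resource figures quoted for the second method can be reproduced with a small calculation. The sketch below assumes one AND gate per bit from the third bit upwards, each taking all lower-order outputs as inputs; the function names are illustrative only.

```python
def parallel_and_gate_inputs(n_bits):
    """Input counts of the AND gates feeding bits 3..n (one gate per bit)."""
    # Bit k (1-based) needs an AND of the k - 1 lower-order outputs;
    # bits 1 and 2 need no gate.
    return [k - 1 for k in range(3, n_bits + 1)]


def first_bit_fan_out(n_bits):
    """Number of AND gates driven by the lowest-order flip-flop output."""
    return len(parallel_and_gate_inputs(n_bits))


print(parallel_and_gate_inputs(16))  # gate sizes for a 16-bit counter
print(first_bit_fan_out(12))         # AND gates driven by the first flip-flop
```

For 16 bits this gives gates with 2 up to 15 inputs, and for 12 bits the first flip-flop drives 10 AND gates, matching the numbers in the text.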
The "solution" to this is a compromise between the two methods. Say we have a 12-bit counter; we can organise it into 3 groups of 4. Within each group of 4 we use the second method, and between the 3 groups we use the first method. This way, the overall gate propagation delay stays well below that of the first method used alone, and the maximum fan-out is 3 instead of the 10 required by the second method used alone.
There are many variations to the basic binary counter. The one described above is the binary up counter (counts upwards). Besides the up counter, there is the binary down counter, the binary up/down counter, binary-coded-decimal (BCD) counter etc. Any counter that counts in binary is called a binary counter.

Binary Down Counters
In a binary up counter, a particular bit, except for the first bit, toggles if all the lower-order bits are 1's. The opposite is true for binary down counters. That is, a particular bit toggles if all the lower-order bits are 0's and the first bit toggles on every pulse.
Taking an example, A4 A3 A2 A1 = 0100. On the next count, A4 A3 A2 A1 = 0011. A1, the lowest-order bit, is always complemented. A2 is complemented because all the lower-order positions (A1 only in this case) are 0's. A3 is also complemented because all the lower-order positions, A2 and A1, are 0's. But A4 is not complemented because the lower-order positions, A3 A2 A1 = 100, do not give an all-0 condition.

4-bit Synchronous Binary Down Counter
The implementation of a synchronous binary down counter is exactly the same as that of a synchronous binary up counter except that the inverted output from each flip-flop is used. All the methods used to improve a binary up counter can be similarly applied here.
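The down-counting rule from this section can be modelled in the same behavioural style. This is a sketch with an illustrative function name; the state is a list with the lowest-order bit (A1) first, and testing for 0 stands in for using the inverted flip-flop outputs.

```python
def next_down(bits):
    """Return the next state of a synchronous binary down counter.

    bits[0] is the lowest-order bit (A1 in the text).
    """
    nxt = list(bits)
    all_lower_zero = True  # vacuously true for the lowest-order bit
    for i, b in enumerate(bits):
        if all_lower_zero:
            nxt[i] = 1 - b  # toggle: every lower-order bit is 0
        all_lower_zero = all_lower_zero and b == 0
    return nxt


state = [0, 0, 1, 0]  # A1 = 0, A2 = 0, A3 = 1, A4 = 0, i.e. 0100
print(next_down(state))  # lowest-order bit first
```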

Binary Up/Down Counters
The similarities between the implementation of a binary up counter and a binary down counter lead to the possibility of a binary up/down counter, which is a binary up counter and a binary down counter combined into one. Since the difference is only in which output of the flip-flop to use, the normal output or the inverted one, we use two AND gates for each flip-flop to "choose" which of the outputs to use.

3-bit Synchronous Binary Up/Down Counter
From the diagram, we can see that COUNT-UP and COUNT-DOWN are used as control inputs to determine whether the normal flip-flop outputs or the inverted ones are fed into the J-K inputs of the following flip-flops. If neither is at logic level 1, the counter doesn't count and if both are at logic level 1, all the bits of the counter toggle at every clock pulse. The OR gate allows either of the two outputs which have been enabled to be fed into the next flip-flop. As with the binary up and binary down counter, the speed up techniques apply.

MOD-N/Divide-by-N Counters
A normal binary counter counts from 0 to 2^N - 1, where N is the number of bits/flip-flops in the counter. In some cases, we want it to count to numbers other than 2^N - 1. This can be done by allowing the counter to skip states that are normally part of the counting sequence. There are a few methods of doing this. One of the most common methods is to use the CLEAR input on the flip-flops.

3-bit Synchronous Binary MOD-6 Counter
In the example above, we have a MOD-6 counter. Without the NAND gate, it is a MOD-8 counter. With the NAND gate, the output of the NAND gate is connected to the asynchronous CLEAR inputs of each flip-flop. The inputs to the NAND gate are the outputs of the B and C flip-flops, so all the flip-flops will be cleared when B = C = 1 (110₂ = 6₁₀). When the counter goes from state 101 to state 110, the NAND output will immediately clear the counter to state 000. Once the flip-flops have been cleared, the B = C = 1 condition no longer exists and the NAND output goes back to high. The counter will therefore count from 000 to 101 and, for a very short period of time, be in state 110 before it is cleared. This state is called the temporary state, and the counter usually remains in it for only a few nanoseconds. We can essentially say that the counter skips 110 and 111 so that it goes through only six different states; thus, it is a MOD-6 counter. Note that the temporary state causes a spike or glitch on the output waveform of B. This glitch is very narrow and will not normally be a problem unless it is used to drive other circuitry outside the counter. The 111 state is the unused state here. In a state machine with unused states, we need to make sure that the unused states do not cause the system to hang, i.e. leave it with no way to get out of the state. We don't have to worry about this here because even if the system does go to the 111 state, it will be cleared to state 000 (a valid state), since B = C = 1 also holds in state 111.
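The behaviour of the MOD-6 counter, including its temporary state, can be sketched as follows. This is a simplified model that ignores real timing: the function name is illustrative, the state is an integer whose bits are C B A (A lowest), and the asynchronous clear is treated as instantaneous.

```python
def mod6_clock(state):
    """Apply one clock pulse; return (transient_states, settled_state)."""
    raw = (state + 1) % 8            # plain 3-bit binary up count
    c, b = (raw >> 2) & 1, (raw >> 1) & 1
    if b and c:                      # NAND output goes LOW: asynchronous CLEAR
        return [raw], 0              # raw is only a temporary state
    return [], raw


state = 0
for _ in range(7):
    transient, nxt = mod6_clock(state)
    print(format(state, "03b"), "->", format(nxt, "03b"),
          "temporary:", [format(t, "03b") for t in transient])
    state = nxt
```

Only the step out of 101 reports a temporary state (110), which is the source of the glitch on B mentioned above.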

Binary Coded Decimal (BCD) Counters
The BCD counter is just a special case of the MOD-N counter (N = 10). BCD counters are very commonly used because most human beings count in decimal. To make a digital clock which can tell the hour, minute and second for example, we need 3 BCD counters (for the second digit of the hour, minute and second), two MOD-6 counters (for the first digit of the minute and second), and one MOD-2 counter (for the first digit of the hour).

Ring Counters
Ring counters are implemented using shift registers. A ring counter is essentially a circulating shift register connected so that the last flip-flop shifts its value into the first flip-flop. There is usually only a single 1 circulating in the register, as long as clock pulses are applied.

4-bit Synchronous Ring Counter
In the diagram above, assume a starting state of Q3 = 1 and Q2 = Q1 = Q0 = 0. At the first pulse, the 1 shifts from Q3 to Q2 and the counter is in the 0100 state. The next pulse produces the 0010 state and the third, 0001. At the fourth pulse, the 1 at Q0 is transferred back to Q3, resulting in the 1000 state, which is the initial state. Subsequent pulses will cause the sequence to repeat, hence the name ring counter.
The ring counter above functions as a MOD-4 counter since it has four distinct states and each flip-flop output waveform has a frequency equal to one-fourth of the clock frequency. A ring counter can be constructed for any MOD number. A MOD-N ring counter will require N flip-flops connected in the arrangement as the diagram above.
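The circulating behaviour can be sketched in a few lines. This is a behavioural model with an illustrative function name; the state is the list [Q3, Q2, Q1, Q0].

```python
def ring_step(q):
    """Shift every bit one place along; the last flip-flop feeds the first."""
    return [q[-1]] + q[:-1]


state = [1, 0, 0, 0]  # Q3 = 1, Q2 = Q1 = Q0 = 0
for pulse in range(5):
    print(pulse, "".join(str(b) for b in state))
    state = ring_step(state)
```

After four pulses the state is back at 1000, so the model has four distinct states, as expected of a MOD-4 counter.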
A ring counter requires more flip-flops than a binary counter for the same MOD number. For example, a MOD-8 ring counter requires 8 flip-flops while a MOD-8 binary counter only requires 3 (2^3 = 8). So if a ring counter is less efficient in the use of flip-flops than a binary counter, why do we still need ring counters? One main reason is that ring counters are much easier to decode. In fact, ring counters can be decoded without the use of logic gates. The decoding signal is obtained at the output of the corresponding flip-flop.
For the ring counter to operate properly, it must start with only one flip-flop in the 1 state and all the others at 0. Since it is not possible to expect the counter to come up to this state when power is first applied to the circuit, it is necessary to preset the counter to the required starting state before the clock pulses are applied. One way to do this is to apply a pulse to the PRESET input of one of the flip-flops and the CLEAR inputs of all the others. This will place a single 1 in the ring counter.

Johnson/Twisted-Ring Counters
The Johnson counter, also known as the twisted-ring counter, is exactly the same as the ring counter except that the inverted output of the last flip-flop is connected to the input of the first flip-flop.

4-bit Synchronous Johnson Counter
The Johnson counter works in the following way : Take the initial state of the counter to be 000. On the first clock pulse, the inverse of the last flip-flop will be fed into the first flip-flop, producing the state 100. On the second clock pulse, since the last flip-flop is still at level 0, another 1 will be fed into the first flip-flop, giving the state 110. On the third clock pulse, the state 111 is produced. On the fourth clock pulse, the inverse of the last flip-flop, now a 0, will be shifted to the first flip-flop, giving the state 011. On the fifth and sixth clock pulse, using the same reasoning, we will get the states 001 and 000, which is the initial state again. Hence, this Johnson counter has six distinct states : 000, 100, 110, 111, 011 and 001, and the sequence is repeated so long as there is input pulse. Thus this is a MOD-6 Johnson counter.
The MOD number of a Johnson counter is twice the number of flip-flops. In the example above, three flip-flops were used to create the MOD-6 Johnson counter. So for a given MOD number, a Johnson counter requires only half the number of flip-flops needed for a ring counter. However, a Johnson counter requires decoding gates whereas a ring counter doesn't. As with the binary counter, one logic gate (AND gate) is required to decode each state, but with the Johnson counter each gate requires only two inputs, regardless of the number of flip-flops in the counter. Note that we are comparing with the binary counter using the speed-up technique discussed above. The reason is that for each state, two of the N flip-flops will be in a unique combination of states. In the example above, the combination Q2 = Q0 = 0 occurs only once in the counting sequence, at the count of 0, because the state 010 does not occur. Thus, an AND gate with inputs (not Q2) and (not Q0) can be used to decode this state. The same characteristic is shared by all the other states in the sequence.
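Both claims, the MOD number of 2N and the two-input decoding, can be checked with a small model. This is a sketch: the function names are illustrative, states are written with the first flip-flop on the left, and the search simply looks for any pair of bit positions whose values occur in exactly one state of the sequence.

```python
from itertools import combinations


def johnson_step(q):
    """Shift along one place; the inverted last output feeds the first."""
    return [1 - q[-1]] + q[:-1]


def johnson_sequence(n):
    """List the states visited from all zeros until the sequence repeats."""
    start = [0] * n
    state, seq = start, []
    while True:
        seq.append("".join(str(b) for b in state))
        state = johnson_step(state)
        if state == start:
            return seq


def two_input_decoders(seq):
    """For each state, find a pair of positions that identifies it uniquely."""
    decoders = {}
    for s in seq:
        for i, j in combinations(range(len(s)), 2):
            if sum(1 for t in seq if t[i] == s[i] and t[j] == s[j]) == 1:
                decoders[s] = (i, j)
                break
    return decoders


seq = johnson_sequence(3)
print(seq)
print(two_input_decoders(seq))
```

For three flip-flops the all-zero state is identified by the first and last positions, i.e. the two outer flip-flops.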
Johnson counters represent a middle ground between ring counters and binary counters. A Johnson counter requires fewer flip-flops than a ring counter but generally more than a binary counter; it has more decoding circuitry than a ring counter but less than a binary counter. Thus, it sometimes represents a logical choice for certain applications.

Presettable Counters
Many synchronous counters available as ICs are designed to be presettable. This means that they can be preset to any desired starting value. This can be done either asynchronously (independent of the clock signal) or synchronously (on the active transition of the clock signal). This presetting operation is also known as loading, hence the name loadable counter. The diagram below shows a 3-bit asynchronously presettable synchronous up counter.

3-bit Synchronous Binary Presettable Counter
In the diagram above, the J, K and CLK inputs are wired the same way as in a synchronous up counter. The asynchronous PRESET and CLEAR inputs are used to perform the asynchronous presetting. The counter is loaded by applying the desired binary number to the inputs P2, P1 and P0 and applying a LOW pulse to the PARALLEL LOAD input, not(PL). This will asynchronously transfer P2, P1 and P0 into the flip-flops. This transfer occurs independently of the J, K, and CLK inputs. As long as not(PL) remains in the LOW state, the CLK input has no effect on the flip-flops. After not(PL) returns to HIGH, the counter resumes counting, starting from the number that was loaded into the counter.
For the example above, say that P2 = 1, P1 = 0, and P0 = 1. When not(PL) is HIGH, these inputs have no effect. The counter will perform normal count-up operations if there are clock pulses. Now let's say that not(PL) goes LOW at Q2 = 0, Q1 = 1 and Q0 = 0. This will produce LOW states at the CLEAR input of Q1 and the PRESET inputs of Q2 and Q0. This will make the counter go to state 101 regardless of what is occurring at the CLK input. The counter will remain at state 101 until not(PL) goes back to HIGH. The counter will then continue counting from 101.
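The loading behaviour in this example can be approximated by a simple model. It is only a sketch: the function name and arguments are illustrative, the state is an integer Q2 Q1 Q0, and the asynchronous load is simplified to "while not(PL) is LOW the state is forced to the preset value".

```python
def presettable_step(q, preset, pl_n, n_bits=3):
    """State after one clock pulse of a presettable up counter.

    pl_n is the active-LOW parallel-load input, not(PL).
    """
    if pl_n == 0:
        return preset                 # load overrides the clock
    return (q + 1) % (1 << n_bits)    # normal count-up


q = 0b010
for pl_n in (0, 0, 1, 1):             # held LOW for two pulses, then released
    q = presettable_step(q, 0b101, pl_n)
    print("not(PL) =", pl_n, "state =", format(q, "03b"))
```

The state stays at 101 while not(PL) is LOW and continues with 110 and 111 once it is released.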

A comparison between Synchronous and Asynchronous Counters
Asynchronous counters, also known as ripple counters, are not clocked by a common pulse and hence every flip-flop in the counter changes at a different time. Each flip-flop in an asynchronous counter is usually clocked by the output pulse of the preceding flip-flop. The first flip-flop is clocked by an external event. A synchronous counter, however, has an internal clock, and the external event is used to produce a pulse which is synchronised with this internal clock. The diagram of a ripple counter is shown below.

4-bit Ripple Counter
It can be seen that a ripple counter requires less circuitry than a synchronous counter. No logic gates are used at all in the example above. Although the asynchronous counter is easier to construct, it has some major disadvantages over the synchronous counter.
First of all, the asynchronous counter is slow. In a synchronous counter, all the flip-flops will change states simultaneously while for an asynchronous counter, the propagation delays of the flip-flops add together to produce the overall delay. Hence, the more bits or number of flip-flops in an asynchronous counter, the slower it will be.
Secondly, there are certain "risks" when using an asynchronous counter. In a complex system, many state changes occur on each clock edge and some ICs respond faster than others. If an external event is allowed to affect a system whenever it occurs (unsynchronised), there is a small chance that it will occur near a clock transition, after some ICs have responded but before others have. This intermingling of transitions often causes erroneous operation. Worse still, these problems are difficult to foresee and test for because of the random time difference between the events.

Synchronous Counter Design
A synchronous counter usually consists of two parts: the memory element and the combinational element. The memory element is implemented using flip-flops while the combinational element can be implemented in a number of ways. Using logic gates is the traditional method of implementing combinational logic and has been applied for decades. Since this method often results in minimum component cost for many combinational systems, it is still a popular approach. However, there are other methods of implementing combinational logic which offer other advantages. Some of the alternative methods discussed here are: multiplexers (MUX), read-only memory (ROM) and programmable logic array (PLA).

Multiplexer
The multiplexer, also called the data selector, has n select inputs, 2^n input lines and 1 output line (and usually also a complement of the output). Each of the 2^n possible combinations of the select inputs connects one of the input lines to the output. When used as a combinational logic device, the n select inputs represent n variables and the 2^n input lines represent all the minterms of the n variables.
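As a rough illustration of a multiplexer used as combinational logic, the sketch below models a 4-to-1 MUX and wires its data lines as the truth table of a two-variable AND, the function that enables the third flip-flop of the up counter described earlier. The function name and the bit ordering (first select bit most significant) are assumptions made for this example.

```python
def mux(select_bits, data_lines):
    """Route one of 2**n data lines to the output.

    select_bits: n bits, first one most significant.
    data_lines:  2**n values, one per minterm of the select variables.
    """
    if len(data_lines) != 2 ** len(select_bits):
        raise ValueError("need 2**n data lines for n select inputs")
    index = 0
    for bit in select_bits:
        index = (index << 1) | bit
    return data_lines[index]


# Data lines hold the truth table of A2 AND A1 (minterm 11 only).
AND_TABLE = [0, 0, 0, 1]
for a2 in (0, 1):
    for a1 in (0, 1):
        print(a2, a1, "->", mux([a2, a1], AND_TABLE))
```

Changing the function only requires changing the values tied to the data lines, not the device itself.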

Read-Only Memory
The ROM is usually used as a storage unit for fixed programs in a computer. However, it can also be used to implement combinational logic. It is useful for systems requiring changeable functions. When a different function is required, a different ROM producing this function can be plugged into the circuit. No wiring change is necessary. The ROM has n input lines pointing to 2^n locations within the ROM that store words of M bits. As with the MUX, each input line is used to represent a variable and the 2^n locations represent the minterms.
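A ROM used as the combinational element of a counter amounts to a next-state lookup table. The sketch below is an assumption-laden illustration: it models a ROM with 3 address lines and 3-bit words holding the next state of a MOD-6 counter, with the two unused states mapped to 000 as a design choice made here so that the counter cannot hang.

```python
# Address = present state, stored word = next state (3 address lines, 8 words).
MOD6_ROM = [0b001, 0b010, 0b011, 0b100, 0b101, 0b000,
            0b000, 0b000]  # addresses 110 and 111 are unused states


def rom_counter(rom, start, pulses):
    """Clock a state register whose input comes from the ROM output."""
    state, trace = start, []
    for _ in range(pulses):
        trace.append(state)
        state = rom[state]
    return trace


print(rom_counter(MOD6_ROM, 0, 8))
```

Swapping in a different table gives a different counting sequence with no change to the rest of the model, which is the flexibility the text refers to.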

Programmable Logic Array
The PLA is very similar to the ROM. It can be thought of as a ROM with a large percentage of its locations deleted. A ROM with 16 input address lines must have 2^16, or 65,536, storage locations, and all the words stored in these have to be decoded. The PLA only decodes a small percentage of the minterms. The PLA is sometimes used to produce a system with a small number of chips in a minimum time.
More information on these devices is given in article 2 of cwl3.

Making Fast Counters
Where speed is a concern....
In certain applications, speed is an important factor affecting the choice of a counter. For example, counters used in communication and certain instrumentation applications are necessarily fast. We will be looking at some techniques commonly used to improve the speed of a counter. To reinforce the concepts presented, some commercial counters (by Xilinx) will be considered.

General Structure of a Synchronous Binary Counter
There are two common ways in which a synchronous binary counter is structured: the series carry synchronous counter and the parallel carry synchronous counter. These two counters are illustrated as follows:

Series Carry Synchronous Counter

Parallel Carry Synchronous Counter

Both counters depicted above are binary-up counters.
The T implies a T flip-flop. The flip-flop complements/toggles its output on the rising edge of a clock pulse provided its enable (EN) input is high.
From the diagrams, it can be seen that the least significant bit Q0 toggles on every clock pulse, and subsequent bits toggle when preceding bits are high. The important distinction between the two counters is the way the EN signals propagate from Q0 to Q3. This is illustrated by the highlighted paths. The signals are propagated serially and in parallel (to each AND gate) in the first and second case respectively.
The parallel carry scheme results in a much faster counter. This difference in speed is accounted for by the delay encountered during the propagation of the EN signals. To illustrate the worst case delay in both cases, we consider a change in Q0 from 0 to 1. (see diagrams above)
In the series carry scheme, the time to propagate the change in Q0 must take into account the propagation delays of the 3 AND gates (A, B, C). In the parallel carry scheme, only the propagation delay of 1 AND gate has to be considered. Therefore, the minimum clock period of the parallel scheme is shorter. Thus, the parallel synchronous carry counter operates at a greater maximum frequency. This structure is believed to be the fastest synchronous binary counter structure. In applications that require speed, this scheme is commonly used.
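The effect on the minimum clock period can be put into numbers. The sketch below uses a simple timing model (flip-flop delay plus chained AND-gate delays plus setup time), and the delay values are made up purely for illustration; they do not come from any datasheet.

```python
def min_clock_period_ns(t_ff_ns, t_and_ns, t_setup_ns, and_gates_in_path):
    """Flip-flop delay + chained AND-gate delays + setup time."""
    return t_ff_ns + and_gates_in_path * t_and_ns + t_setup_ns


def max_frequency_mhz(period_ns):
    return 1000.0 / period_ns


# Illustrative (assumed) delays in nanoseconds.
T_FF, T_AND, T_SETUP = 5.0, 3.0, 2.0

series = min_clock_period_ns(T_FF, T_AND, T_SETUP, and_gates_in_path=3)
parallel = min_clock_period_ns(T_FF, T_AND, T_SETUP, and_gates_in_path=1)
print("series carry:  ", series, "ns,", max_frequency_mhz(series), "MHz")
print("parallel carry:", parallel, "ns,", max_frequency_mhz(parallel), "MHz")
```

With three AND gates in the series path (A, B, C) and one in the parallel path, the parallel scheme always comes out faster under this model, whatever the individual delays are.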
This structure does have limitations. From the diagrams, it can be seen that a single flip-flop output (consider Q0) has to drive a number of subsequent AND gates. The output current of a flip-flop may not be large enough to drive that many gates, and this becomes a problem as the counter gets bigger. To overcome this, a tree of AND gates is usually used. Exactly what this tree looks like is an engineering choice that reflects the trade-off between speed requirements and the constraint mentioned above.
Although the series carry scheme is slower, it does not suffer the same drawback as the parallel carry scheme. This makes it a suitable basis for making big counters. Its speed can be improved by using some form of Prescaling. This technique will be considered in subsequent sections.

Prescaling

" The Concept "

The idea of prescaling is to provide a "prescaling" stage between the incoming clock frequency and the counting circuit. The prescaling stage is sometimes provided by a dedicated prescaling device known as the Prescaler. This device/circuit is designed primarily using Emitter Coupled Logic (ECL). ECL benefits from very fast switching capabilities. This makes it suitable for high speed counting work.
Despite its suitability to high speed counting work, it has little or no counting features since such features will only impede its operating speed. The reader does not have to concern himself or herself with the implementation of the prescaler. The reader should, however, understand the function it performs in the overall counter.
A prescaler generates a "clock" pulse after it has received a number of input pulses. This "clock" pulse is then fed to the counting circuit. For example, a divide-by-n prescaler will generate a pulse when it has received n input pulses. At present, there are prescalers that can accept frequencies ranging from a few hundred Megahertz to a few Gigahertz. The point of the prescaler is to divide an incoming clock and thereby provide a clock to a larger, slower counting circuit.
The curious reader is probably wondering how the actual (and faster) incoming clock frequency is reflected in the slower counting circuit. There are a number of ways in which a prescaler can be used, but one sophisticated setup is the "pulse swallowing" counter. The characteristic of a "pulse swallowing" counter is that it stops counting when a predetermined number of pulses has been received.
The following diagram shows a down-counting Binary Coded Decimal (BCD) counter in a simplified "pulse swallowing" setup.

BCD Pulse Swallowing Counter

In the above setup, the Tens and Units sections of a BCD counter are shown. Note that a section stops counting when zero has been reached, at which point a carry is also generated (UC and TC). Both sections are presettable via P3-P0. The outputs (Q3-Q0) reset to the preset values when Pe is high. UC is fed back to the prescaler as the Mode (M) input signal. When M is high or low, the prescaler divides by 10 or 11 respectively before generating a "clock" pulse. To demonstrate the principle of "pulse swallowing", let's consider an example.
Suppose we preset a value of 32 (0011 0010). The outputs will have values as shown below :

```
Clock pulses   Tens   Units   Mode (M)   Decimal value
------------   ----   -----   --------   -------------
           0   0011   0010       0            32
          11   0010   0001       0            21
          22   0001   0000       1            10
          32   0000   0000       1            00
```

Effectively, a "pulse swallowing" counter "swallows up" fast incoming clock pulses. This is reflected in the slower counter by simultaneously driving the Tens and Units sections. Therefore the net effect of such a combination (of prescaler and counter) is a counter operating at a much higher speed than it is capable of alone.
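The worked example for a preset of 32 can be reproduced with a small behavioural model. This is a sketch under stated assumptions: the function name is illustrative, both sections are decremented by each prescaler output pulse, a section stops at zero, and the units preset is assumed not to exceed the tens preset (otherwise this simple model would stop before all pulses are counted).

```python
def pulse_swallow_trace(tens, units):
    """Return (input pulses so far, tens, units, mode M) after each prescaler pulse."""
    if not (0 <= units <= tens <= 9):
        raise ValueError("this sketch assumes 0 <= units <= tens <= 9")
    pulses = 0
    trace = [(pulses, tens, units, 1 if units == 0 else 0)]
    while tens > 0:
        mode = 1 if units == 0 else 0
        pulses += 10 if mode else 11   # divide by 10 (M high) or 11 (M low)
        tens -= 1
        if units > 0:
            units -= 1                 # a section stops once it reaches zero
        trace.append((pulses, tens, units, 1 if units == 0 else 0))
    return trace


for row in pulse_swallow_trace(3, 2):
    print(row)
```

The total number of input pulses is 11 for every units step plus 10 for every remaining tens step, which adds up to the preset value.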

Pipelining

Pipelining is a "predict and store" technique. It "predicts" an event (usually) one clock cycle before it is to occur. Upon prediction, certain output value(s) resulting from that event are set. These new value(s) are stored/latched using flip-flops (usually D type). They appear at the outputs on the next clock pulse, when the event actually occurs. How does this actually help in speeding things up?
Let's say the detection of an event and the setting of the required outputs take 20ns. The propagation of the outputs takes another 10ns.
Consider the two situations where pipelining is used and not used. If the above actions had to be performed in one clock cycle, the minimum clock period would be 30 ns (without pipelining). If these two sets of actions were performed in two separate clock periods, the minimum clock period would be 20 ns (with pipelining), since the period only has to cover the slower of the two. With pipelining, the overall frequency/speed of the circuit is improved. This is illustrated schematically as follows:
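The arithmetic behind these two figures is simply a sum versus a maximum. A minimal sketch, with illustrative function names and register overheads ignored:

```python
def min_period_unpipelined_ns(stage_delays_ns):
    """All actions must finish within one clock period."""
    return sum(stage_delays_ns)


def min_period_pipelined_ns(stage_delays_ns):
    """Each action gets its own clock period; the slowest one sets the pace."""
    return max(stage_delays_ns)


stages = [20, 10]  # detect-and-set, then propagate (values from the text)
print(min_period_unpipelined_ns(stages), "ns without pipelining")
print(min_period_pipelined_ns(stages), "ns with pipelining")
```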

" Pipelining speeds things up!! "

Fast Counters From Xilinx
# Synchronous Presettable Counter
(Xilinx Application Notes XAPP 003.002)
Maximum Clock Frequency
8 bits : 71 MHz
16 bits : 55 MHz
This counter demonstrates the parallel carry synchronous counter structure and the pipelining technique.
Let's consider an up-counting version of this counter.

Presettable Up Counter
Q- counter bits
D- preset via these inputs

On first sight, it looks complicated but the reader may have noticed that there are many similar blocks of logic circuitry. Let's take a look at some of these blocks and see how they work.

Consider block producing Q0

Block Producing Q0 (least significant bit)

TERMINAL COUNT high

As seen below, an inverted version of Q0 is propagated through AND gate A. D0 is not propagated through AND gate B because an inverted version of TERMINAL COUNT is fed into B; the output of B is therefore low. With this setup, Q0 toggles on every rising edge of the clock pulse.

when TERMINAL COUNT high

TERMINAL COUNT low
As seen below, the inverted version of Q0 is not propagated through A. D0 is propagated through B because an inverted version of TERMINAL COUNT is fed into B. The output of the OR gate will have the value of D0. Therefore, Q0 will have the value of D0 on the next clock pulse. Note that the preset value D appears as the output Q on the next clock pulse (after terminal count). This applies to all bit stages.

when TERMINAL COUNT low

Consider block producing Q3

Block Producing Q3

TERMINAL COUNT high
As seen below, the output of the EX-OR gate C will be propagated through A. The output of C is high when either (but not both) T3 or Q3 is high. T3 is the ANDed version of all preceding outputs (Q0-Q2). (Note that in the Q1 stage, the T input is replaced by Q0.) Effectively, Q3 stays the same when T3 is low. When T3 is high, Q3 toggles. Therefore, in all the bit stages, an output bit toggles when the preceding bits are high.

when TERMINAL COUNT high

TERMINAL COUNT low
The preset value is loaded on the next clock pulse as before.

Consider the Carry connections

The Carry Connections

Let's focus our attention on the generation of the T outputs(see above). This counter uses an adapted version of the parallel carry scheme by employing an AND gate tree. The different outputs driving the AND gates are summarised schematically as follows :

AND Gate Tree Diagrams

In this setup, Q0 is fed directly to the next bit stage and in parallel to all the T(T2-T6) AND gates. This minimises the worst case delay (compare with the series carry scheme). Subsequent bits feed in parallel to the relevant AND gates. The additional gate delay introduced by Tx does not affect the critical paths from Q0 to Q7 because of the way the numbers change.

Consider the Pipelining block

Pipelining TERMINAL COUNT

When the counter output is 11111110, the NAND gate output is low. Therefore, when the value preceding terminal count is detected, the required TERMINAL COUNT value (low) is fed to the input of the flip-flop. On the next clock pulse (when it is terminal count), TERMINAL COUNT is low ("load preset value"). This propagates the preset values (D0-D7) to the inputs of the flip-flops. Note: when any other value is detected, the NAND gate output is high. Thus TERMINAL COUNT is high ("do not load preset value").
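The interplay between the pipelined TERMINAL COUNT and the preset load can be sketched behaviourally. The function name is illustrative and the model works on whole 8-bit values rather than individual gates; it assumes the counter starts at the preset value with TERMINAL COUNT high.

```python
def presettable_counter(d, cycles, n_bits=8):
    """Return the (count, terminal_count) pair seen in each clock cycle.

    terminal_count is active low: 0 means "load preset value".
    """
    mask = (1 << n_bits) - 1
    q, tc = d, 1
    out = []
    for _ in range(cycles):
        out.append((q, tc))
        next_tc = 0 if q == mask - 1 else 1   # detect 11111110 one cycle early
        q = (q + 1) & mask if tc else d       # count up, or load the preset
        tc = next_tc
    return out


for q, tc in presettable_counter(250, 8):
    print(format(q, "08b"), "TERMINAL COUNT =", tc)
```

With a preset of 250 the model cycles through 250 to 255 and reloads, and TERMINAL COUNT is low only during the cycle in which the count is 11111111.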

# High-Speed Synchronous Prescaler Counter
( Xilinx Application Notes 001.002)
Max Clock Frequency
8 bits : 200 MHz
16 bits : 115 MHz
The counter demonstrates prescaling and pipelining.
The counter can be represented in a block diagram as follows :

The counter is implemented on a Field Programmable Gate Array (FPGA). This requires it to be implemented as tri-bit blocks (TB1 and TB2) for optimal resource usage. The reader does not need to concern himself/herself with this.
The counter employs the concept of prescaling but does not use a dedicated prescaler (ECL device). Instead, the least significant (LS) tri-bit (Q0-Q2) provides the prescaling function. Each tri-bit responds (increments) to a clock pulse if its Count-Enable inputs (CE, CEP, CET) are high. The CEO of the LS tri-bit is high once in every 8 clock cycles, when all its outputs are high. The "prescaler" pulse effectively reduces the clock rate to the rest of the tri-bits by a factor of 8. Note that there is no change in the original clock rate. The 7 clock cycles when the LS CEO is low give the CEO-CET ripple chain (of subsequent tri-bits) time to settle. If this prescaling were not done, the settling time would have to be taken into account when determining the minimum clock period of the counter. This would significantly increase the minimum clock period, thereby slowing the counter down. This will become clearer when we examine the actual implementation of this counter.
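The prescaling role of the LS tri-bit can be sketched with a simplified model. It is an illustration only: the function name is made up, the external Count-Enable inputs are assumed to be permanently high, and all the higher tri-bits are lumped together into one "upper" counter.

```python
def prescaled_count(cycles, upper_bits=5):
    """Return (final count, list of LS CEO values seen in each cycle)."""
    ls, upper = 0, 0
    enables = []
    for _ in range(cycles):
        ceo = 1 if ls == 0b111 else 0          # LS tri-bit is all ones
        enables.append(ceo)
        if ceo:                                # upper bits advance only now
            upper = (upper + 1) % (1 << upper_bits)
        ls = (ls + 1) % 8                      # LS tri-bit counts every clock
    return (upper << 3) | ls, enables


value, enables = prescaled_count(100)
print(value, sum(enables))
```

The combined value still advances by one per clock, while the upper bits are enabled in only one cycle out of eight, which is the time window the ripple chain gets to settle.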

TB1

TB2

Note : all clock inputs are assumed to be driven by a common clock. Qa and Qc represent the LSB and MSB of a tri-bit respectively.

Generation of CEO in TB1(A) and TB2(B)

CEOs are high only when the CEs (or CETs) and the outputs Q (Qa-Qc) are high.

Generation of Qa in TB1(A) and TB2(B)

When the Count-Enable inputs are high, the EX-OR gate complements Qa. Therefore, the Count-Enable inputs effectively "enable" or "disable" the complementing function.

Generation of Qb in TB1(A) and TB2(B)

When both Qa and the Count-Enable inputs are high, the complementing function is enabled.
Generation of Qc
The generation of Qc is similar to the generation of Qb except that the value of Qb is also fed into the AND gate.

The Ripple Chain
The delay of this ripple chain is the sum of all the gate delays presented by the chain of AND gates. To appreciate the significance of this delay, let's consider an example. Suppose the current value of Q23-Q3 is 01111...(all ones)...110. On the next "prescaler" pulse, Q3 will become 1. This "information" about the change in Q3 has to be propagated to subsequent tri-bits before the next clock pulse. This is necessary to ensure the correct changes (on the next clock pulse) to subsequent bits. As evident from the diagrams, the worst case delay is the propagation of this "information" from Q3 through the ripple chain to the flip-flop input of Q23. This delay is a major and common problem with most binary counters. The "prescaler" accommodates this delay by allowing time for the propagation of this "information". This does not affect the effective operating frequency of the counter because the "prescaler" (LS tri-bit) still operates at the faster clock rate.
The speed of the counter can be improved further by pipelining the LS CEO signal:

Pipelining CEO

We see that when 110 is detected, the detector output goes high and is fed to the flip-flop input. This value appears as CEO on the next clock pulse (when 111 occurs).
The actual implementation of the LS tri-bit with pipelining is seen below :

Implementation of LS tri-bit with Pipeline
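
The effect of pipelining can be checked with a short simulation. This sketch assumes the LS tri-bit is always enabled and starts at 000; the function name `simulate_pipelined_ceo` is hypothetical. The state 110 is detected one cycle early and registered, and the registered value is compared with a CEO decoded directly from 111.

```python
def simulate_pipelined_ceo(n_clocks):
    """Compare a registered (pipelined) CEO with a combinational one."""
    count = 0      # LS tri-bit value, Q2 Q1 Q0
    ceo_reg = 0    # output of the pipeline flip-flop
    trace = []
    for _ in range(n_clocks):
        ceo_comb = int(count == 0b111)    # unpipelined CEO, decoded from 111
        trace.append((count, ceo_reg, ceo_comb))
        d = int(count == 0b110)           # detect 110, one state early
        # Clock edge: the flip-flop captures d while the tri-bit increments.
        ceo_reg = d
        count = (count + 1) % 8
    return trace


trace = simulate_pipelined_ceo(16)
print(all(reg == comb for _, reg, comb in trace))   # True
```

The registered CEO has the same waveform as the combinational one, but it comes straight from a flip-flop output, so the decoding gate delay is no longer part of the path that follows CEO.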

# Ultra-Fast Synchronous Counter
(Xilinx Application Notes XAPP 014.001)
Maximum Clock Frequency
8 bits : 256MHz
16 bits : 108MHz

Ultra-Fast Counter
In this counter, the LS bit Q0 acts as the "prescaler". The effective "clock" rate provided by this prescaler is 1/2 of the actual clock rate.
In the previous example, the distribution of the CEO signal (from the LS to the MS tri-bit) introduces a line transmission delay. This counter eliminates the delay by replicating Q0 for bits after Q1. This is done by the following chain/network of flip-flops :

Network To Replicate Q0

To best describe the function of such a network, let's take a look at the timing diagram depicting the output values :

The Timing Diagram
It is seen that all QX0 outputs are in sync with Q0 after the initial delays. The effect of this is that bits after Q1 appear to be driven directly by Q0 and without the line transmission delay. This improves the minimum clock period.

The Second "Prescaler"

Here Q1 and Q2 act as the second "prescaler". This additional prescaler is needed to accommodate a large counter (more bits). The effective "clock" rate provided by this prescaler to the rest of the counter is 1/8 of the actual clock rate. This second prescaling stage allows the rest of the counting circuit to employ the series carry scheme. The use of such a carry scheme allows a larger counter to be constructed.
From the diagram, CEP2 is pipelined. When the LS three bits (Q2-Q0) are 101, the output of AND gate A is high. Since Q0 is high at this point, the value of A is selected and appears at the output of multiplexer B. This value is fed to the flip-flop input. On the next clock cycle, the value appears as CEP2 (high). At this point, the LS three bits are 110. Since QY01 is low, CEP2 is selected by the multiplexer. Thus, on the next clock cycle, CEP2 is high again. This is summarised below :

```
Q2  Q1  Q0    A    D-input(flip-flop)    CEP2

1   0   0     1            0              0
1   0   1     1            1              0
1   1   0     0            1              1
1   1   1     0            0              1

```
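
The four rows above can be reproduced with a small model of the multiplexer and flip-flop. Two points are assumptions inferred from the table rather than taken from the circuit diagram: gate A is modelled as Q2 AND (NOT Q1), and the multiplexer select is modelled as Q0 (the text also names a signal QY01, treated here as a copy of Q0). The function name `simulate_cep2` is hypothetical. Each row is printed as (Q2, Q1, Q0, A, D, CEP2).

```python
def simulate_cep2(n_clocks):
    """Sketch of the pipelined CEP2 path: AND gate A, multiplexer B, flip-flop."""
    count = 0    # the LS three bits, Q2 Q1 Q0, counting on every clock
    cep2 = 0     # flip-flop output
    rows = []
    for _ in range(n_clocks):
        q0 = count & 1
        q1 = (count >> 1) & 1
        q2 = (count >> 2) & 1
        a = q2 & (1 - q1)          # assumed form of gate A: Q2 AND (NOT Q1)
        d = a if q0 else cep2      # mux B: select A when Q0 = 1, else hold CEP2
        rows.append((q2, q1, q0, a, d, cep2))
        # Clock edge: the flip-flop captures D while the three bits increment.
        cep2 = d
        count = (count + 1) % 8
    return rows


rows = simulate_cep2(16)
for row in rows[4:8]:
    print(row)
# (1, 0, 0, 1, 0, 0)
# (1, 0, 1, 1, 1, 0)
# (1, 1, 0, 0, 1, 1)
# (1, 1, 1, 0, 0, 1)
```

In this model CEP2 is high exactly while Q2 and Q1 are both 1, that is, during states 110 and 111.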

References
 1. Title: Counting and Counters Author(s): R M M Oberman Source: The Macmillan Press Ltd

 2. Title: Electronic Counters Author(s): R M M Oberman Source: The Macmillan Press Ltd

 3. Title: Logic Design Principles Author(s): Edward J. McCluskey Source: Prentice Hall International

 4. Title: Digital System Design Author(s): Barry Wilkinson with Rafic Makki Source: Prentice Hall International

 5. Title: Digital Design : Principles and Practices Author(s): John F. Wakerly Source: Prentice Hall International

 6. Title: Digital Systems : Principles and Practices Author(s): Ronald J. Tocci Source: Prentice Hall International

 7. Title: Practical Digital Design Using ICs Author(s): Joseph D. Greenfield Source: John Wiley & Sons

 8. Title: Digital Logic Design Author(s): Brian Holdsworth Source: Butterworth-Heinemann Ltd

 9. Title: Digital Design Author(s): M. Morris Mano Source: Prentice Hall International

 10. Title: Logic Design Principles Author(s): Edward J. McCluskey Source: Prentice Hall International

 11. Title: Digital Electronics Author(s): Christopher E. Strangio Source: Prentice Hall International

 12. Title: Digital Logic and State Machine Design Author(s): David J. Comer Source: Saunders College Publishing

 13. Title: Digital Circuits and Microprocessors Author(s): Herbert Taub Source: Prentice Hall International

 14. Title: The Programmable Logic Data Book Author(s): Xilinx Inc. Source: Xilinx Inc.