Lecture Notes by Anthony Zhang.

ECE124

Digital circuits and systems.

Instructor: Andrew Kennings
Website: http://sifaka.uwaterloo.ca/~akenning/courses/ece124/

5/1/15

The assignments and other course resources can be found on the course website. They will not be posted to LEARN.

Boolean Algebra

See the CS245 notes for a description of Boolean algebra.

Binary functions are defined using a truth table or a Boolean logic formula. An example of a binary function is f = f(x, y). The problem with truth tables is that they are big (2^n rows for n inputs) and hard to manipulate.

Boolean AND is represented with x \cdot y, xy, or x \wedge y. The schematic symbol for this operation is a rectangle with one side completely rounded into a semicircle, which has the output wire, and the other side flat, with the input wires.

Boolean OR is represented with x + y. The schematic symbol for this operation is a rectangle with one side completely rounded into a semicircle, which has the output wire, and the other side curved inward, with the input wires leading into it.

Boolean NOT is represented with \overline x, !x, x', or \neg x. The schematic symbol for this operation is a triangle with a circle on the pointed end, which has the output wire, and the input wire is on the other end.

There is also an order of operations for these operations. From highest precedence to lowest, the operators are NOT, AND, and OR.

7/1/15

Minterms/Maxterms

A minterm is a particular Boolean expression for a particular input to an n-input function. The minterm is special because it is easy to construct algorithmically from a truth table.

The minterm m_i of a truth table's row i is an AND expression with one operand per function input, where each operand is x_j if x_j = 1 in that row and \neg x_j if x_j = 0.

Given a truth table, we can construct an expression that is equivalent to the function it represents as follows:

  1. Create an empty OR expression m (initially the constant 0).
  2. For each row i of the truth table, with inputs x_1, \ldots, x_n and output f:
    1. Create an empty AND expression m_i (initially the constant 1).
    2. For each 1 \le j \le n:
      1. If x_j = 0 in this row, let m_i become m_i \cdot \neg x_j.
      2. If x_j = 1 in this row, let m_i become m_i \cdot x_j.
    3. If and only if f = 1, let m become m + m_i. Note that m_i is true if and only if the inputs match those in the current row of the truth table.
  3. Note that m is true if and only if f = 1, since there is an m_i term for every row of the truth table where f is 1, and none of the m_i match rows where f is 0.

Here, each m_i is a minterm. m is the sum of minterms/canonical sum of products, defined below.
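
This procedure is straightforward to express in code. The following is a minimal Python sketch (the function name and the textual output format are my own, illustrative choices):

    # Sketch: build the sum of minterms from a truth table given as a list of
    # (inputs, output) pairs, where inputs is a tuple of 0/1 values.
    def sum_of_minterms(truth_table):
        minterms = []
        for inputs, f in truth_table:
            if f == 1:  # only rows where f = 1 contribute a minterm
                # use x_j if x_j = 1 in this row, and NOT x_j if x_j = 0
                literals = ["x%d" % j if bit else "~x%d" % j
                            for j, bit in enumerate(inputs, start=1)]
                minterms.append(" & ".join(literals))  # AND of the literals
        # OR all the minterms together; an empty sum is the constant 0
        return " + ".join("(%s)" % m for m in minterms) or "0"

    # 2-input XNOR: prints (~x1 & ~x2) + (x1 & x2)
    print(sum_of_minterms([((0, 0), 1), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]))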

The canonical sum of products/sum of minterms is when we OR all the minterms (m_i) that have f = 1 for their corresponding truth table rows, in order.

The maxterms of a function are the duals of the minterms - similar to the minterms, but with OR instead of AND, and with \neg x and x where we used to have x and \neg x.

The maxterm M_i of a truth table's row i is an OR expression with one operand per function input, where each operand is \neg x_j if x_j = 1 in that row and x_j if x_j = 0.

The maxterm is true if and only if the inputs do not match the inputs in the corresponding row in the truth table. This is the opposite of a minterm, which is true if and only if the inputs do match.

The canonical product of sums/product of maxterms is when we AND all the maxterms that have f = 0 for their corresponding truth table rows, in order. In other words, it is an expression that ensures that the inputs do not match the first row resulting in 0, and do not match the second row resulting in 0, and so on.

The sum of minterms and the product of maxterms are special because they are exactly equivalent to the original function. The term canonical means that they are unique - for a given function, there is only one correct way to write each of them.

Minterms determine when we need to turn the function on. Maxterms determine when we need to turn the function off.

An n-level expression is an expression that has a maximum tree depth of n. For example, xy + z is a 2-level expression, and when we draw the circuit, it has 2 levels of gates - the depth of the circuit tree. The level of a circuit is the length of the longest path from a circuit input to an output.

For example, if a 2-input function has the following truth table:

x y f
0 0 1
0 1 0
1 0 0
1 1 1

The minterms are \neg x \neg y and x y, and the maxterms are x \neg y and \neg x y. As a result, we can write f as \neg x \neg y + x y (sum of minterms) or (x + \neg y)(\neg x + y) (product of maxterms).

Our goal is to use the simplest possible circuits. We can choose whether to use the sum of minterms or product of maxterms, but we can also use Boolean algebra to further simplify any expression.

Boolean Algebra

Axioms of Boolean algebra:

  1. Closure over operations: given x, y \in \set{0, 1}, x \cdot y, x + y, \neg x \in \set{0, 1}.
  2. Operation identities: given x \in \set{0, 1}, x + 0 = x and x \cdot 1 = x. 0 is the OR identity and 1 is the AND identity.
  3. Commutativity: given x, y \in \set{0, 1}, x + y = y + x and x \cdot y = y \cdot x.
  4. Distributivity: x + y \cdot z = (x + y) \cdot (x + z) (distributivity of OR over AND) and x \cdot (y + z) = x \cdot y + x \cdot z (distributivity of AND over OR).
  5. Negation existence: given x \in \set{0, 1}, there must exist an element \neg x such that x + \neg x = 1 and x \cdot \neg x = 0.
  6. Unique elements: there must exist x, y \in \set{0, 1} such that x \ne y.

We can now prove theorems using truth tables, or using axioms:

Additional theorems:

  1. x + x = x and x \cdot x = x
  2. x + 1 = 1 and x \cdot 0 = 0
  3. \neg \neg x = x (double negation)
  4. x + (y + z) = (x + y) + z and x \cdot (y \cdot z) = (x \cdot y) \cdot z (associativity)
  5. \neg (x + y) = \neg x \cdot \neg y and \neg (x \cdot y) = \neg x + \neg y (De Morgan's Law)
  6. x + x \cdot y = x and x \cdot (x + y) = x (absorption)

9/1/15

Simplify \overline{\overline{cd} + a} + a + cd + ab:

\displaystyle \begin{aligned} \overline{\overline{cd} + a} + a + cd + ab &= \overline{\overline{cd}} \cdot \overline{a} + a + cd + ab = cd\overline{a} + a + cd + ab \\ &= cd(\overline{a} + 1) + a + ab = cd + a + ab = cd + a(1 + b) = cd + a \end{aligned}
Note that this is still a sum of products (SOP), although it isn't necessarily unique. A product of sums (POS) is what we would get if we worked with maxterms. A sum of products is always a 2-level circuit.

We can use Boolean algebra to convert the sum of minterms or product of maxterms into a simplified expression. This allows us to convert a truth table into a simple Boolean expression.

We can also represent a sum of minterms like m_{a_1} + \ldots + m_{a_n} using the shorthand notation \sum(a_1, \ldots, a_n). For example, f = m_3 + m_5 + m_6 + m_7 can also be written as \sum(3, 5, 6, 7).

Likewise, maxterms have a shorthand as well: M_{a_1} \cdots M_{a_n} can be written as \prod(a_1, \ldots, a_n).

Each logical operation has a physical cost when we build the circuit in real life. For now, each gate costs 1 unit, each gate input costs 1 unit, and the inverters at inputs are free. For example, xy + yz has a 2-input OR, and two 2-input ANDs, so it has a cost of (1 + 2) + 2 \cdot (1 + 2) = 9.

To convert between a sum of products and a product of sums, invert it twice and apply De Morgan's laws. For example, let f = \sum(1, 4, 7):

Clearly, f = m_1 + m_4 + m_7 = \overline{\overline{f}}.
Clearly, \overline{f} = m_0 + m_2 + m_3 + m_5 + m_6, the minterms that are not part of the function.
So f = \overline{m_0 + m_2 + m_3 + m_5 + m_6} = \overline{m_0}\overline{m_2}\overline{m_3}\overline{m_5}\overline{m_6}, by De Morgan's laws.
Note that \overline{m_i} = M_i - a maxterm is the negation of its corresponding minterm.
So f = \overline{m_0}\overline{m_2}\overline{m_3}\overline{m_5}\overline{m_6} = M_0 M_2 M_3 M_5 M_6, a product of sums, as required.
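
This conversion is easy to express in code: the \prod indices are just the indices missing from the \sum shorthand. A minimal Python sketch (names are illustrative):

    # Sketch: the product-of-sums indices are the minterm indices missing from
    # the sum-of-products shorthand, exactly as in the derivation above.
    def maxterm_indices(num_inputs, minterm_idx):
        return sorted(set(range(2 ** num_inputs)) - set(minterm_idx))

    print(maxterm_indices(3, [1, 4, 7]))  # prints [0, 2, 3, 5, 6]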

12/1/15

Other Logic Gates

The NAND gate is an AND gate with an inverted output, written x \uparrow y. The NOR gate is an OR gate with an inverted output, written x \downarrow y.

When we invert an input or an output, we can just put a hollow small bubble/circle inline with the wire, conventionally touching the gate.

Note that NAND and NOR are not associative, so two NAND/NOR gates chained together is different from a three-input NAND/NOR gate.

These gates are extremely useful for circuits built using technology such as CMOS. In CMOS, the cheapest construct is the NAND gate, so we tend to make other constructs in terms of NAND gates.

However, we generally want to work with normal gates like AND and OR, and convert to NAND at the end. Since NAND and NOR are universal, any combinational circuit can be represented using just one of these gates.

To convert sums of products into NAND logic, simply negate the whole thing twice and apply De Morgan's law. For example, f = a \overline b + \overline a b + c = \overline{\overline f} = \overline{\overline{a \overline b} \cdot \overline{\overline a b} \cdot \overline c}, and the final result is all NAND gates. This can also be done recursively on subexpressions. This also works for converting products of sums into NOR logic.

Alternatively, we can also just replace each individual gate with its NAND equivalent. For reference, x + y = (x \uparrow x) \uparrow (y \uparrow y), xy = (x \uparrow y) \uparrow (x \uparrow y), and \overline x = x \uparrow x.
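
These identities are easy to check exhaustively. A quick Python sketch (names are my own):

    # Sketch: exhaustively checking the NAND identities above.
    def nand(x, y):
        return 1 - (x & y)

    for x in (0, 1):
        assert nand(x, x) == 1 - x                          # NOT
        for y in (0, 1):
            assert nand(nand(x, y), nand(x, y)) == (x & y)  # AND
            assert nand(nand(x, x), nand(y, y)) == (x | y)  # OR
    print("all NAND identities hold")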

This also works graphically. In a circuit schematic, simply insert two back-to-back inverters inline to wires, until it is possible to see NAND forms. By moving one of a pair of inserted inverters into the output for an AND gate, for example, we can replace the AND and NOT with a NAND gate.

Essentially, we first insert pairs of inverters until all gates have been converted into NAND, then remove any remaining back-to-back inverters (since they cancel each other out) and replace any remaining single inverters with their NAND forms.

13/1/15

Boolean XOR is represented with x \oplus y. The schematic symbol for this operation is an OR gate, but with a second curved line on the concave (input) side. It has the same precedence in Boolean algebra as the AND operator.

Boolean XOR is true if and only if an odd number of its operands are true. For two inputs, this is useful as an inequality operator \ne.

x \oplus y = \overline x y + x \overline y. This pattern appears quite often in practical Boolean algebra, so using XOR can often simplify formulas quite a bit. Plus, in technologies like CMOS, XOR can be implemented significantly more cheaply than its long SOP form, so we can save on cost too.

Boolean NXOR (also known as XNOR) is simply XOR inverted. For two inputs this is useful as an equality operator =.

A buffer is a special identity gate, f = x, which has the same output as it does input. It has the same symbol as the inverter, but without the circle at the output.

This is useful for amplifying signals and supplying current when there are a lot of branches. Additionally, it can act as a diode (in tri-state logic) when placed in a real circuit. Plus, sometimes we use it to slow down a signal slightly to improve timing synchronization. Mathematically, it does not serve any purpose.

A tri-state buffer is similar, but has another input called Enable coming out of one side. When the Enable wire is high, the output is the same as the input. When enable is low, the output is disconnected from the input - the value of the output is not specified.

Tri-state buffers are useful if we want to connect multiple sources to the same destination. For example, in RAM we might have one per RAM cell, and selectively connect and disconnect the RAM cell such that only one cell is actually connected to the destination at a given time. This is important, since if we have two RAM cells connected at the same time, one with 0's and one with 1's, there would be a short circuit.

Karnaugh Maps

Karnaugh maps (K-maps) are a way of describing Boolean functions with about 5 or fewer inputs (for more inputs they become impractical, since the number of table cells grows as 2^n).

K-maps are also useful because they can be used to minimise functions by graphically performing Boolean algebra.

Consider the K-map for the function f = \overline x \overline y + \overline x y + x y:

y\x 0 1
0 1 0
1 1 1

Note that each normal box (rectangle) corresponds to a row in the truth table for the function, and each 1 in a normal box corresponds to a minterm.

Note that the bottom two are both 1. That means that the value of x does not influence the bottom row, so we can draw a rectangle around the bottom two, which represents y.

Note that we can do the same for the leftmost column of two 1's, drawing a rectangle around the two representing \overline x.

If we want to minimise a function, the goal of K-mapping is to find the smallest possible number of the largest possible rectangles that cover all the 1 boxes and do not cover any 0 boxes (boxes containing don't cares don't matter). In each step, we try to draw a rectangle that encloses as many 1's as possible without enclosing any 0's.

This implies that there may be multiple optimal minimal versions of the function, with the same cost. This is represented by multiple minimal rectangle possibilities in the table, or multiple factoring possibilities in the Boolean expression.

All rectangles must have side lengths that are powers of 2.

Mathematically, when we draw a rectangle we are duplicating a term and then factoring an input out. The boxes we drew above corresponded to the operations f = \overline x \overline y + \overline x y + xy = \overline x \overline y + \overline x y + xy + \overline x y = \overline x(\overline y + y) + y(x + \overline x) = \overline x + y.

Basically, every rectangle duplicates and factors two terms, repeatedly for larger rectangles. This is the reason rectangles need to have dimensions that are powers of 2. Larger rectangles result in products with fewer factors. Fewer rectangles result in fewer products.

The above technique resulted in a sum of products. To get a product of sums, draw maximum rectangles that cover all 0's but no 1's. Rather than a product, each rectangle has a corresponding sum, made up of the inverted inputs that actually matter in making the function 0.

For example, if there is a 0 at x_1 = 0, x_2 = 0, x_3 = 1, x_4 = 1, the rectangle surrounding just that 0 is associated with x_1 + x_2 + \overline{x_3} + \overline{x_4}. The minimized product of sums is simply the product of the sum associated with each rectangle.

Also, the sides are labelled using Gray code, and the rectangles can wrap around the sides of the table.

14/1/15

When doing K-maps, we can read off the answer from the rectangles by looking at which variables do not change within the minterms in a given rectangle. If all the minterms in a rectangle have x_1 equal to 0, while all other variables are not constant within the rectangle, then the rectangle represents the term \overline{x_1}.

If we look at the table and see that no rectangles can be expanded, that means that in the Boolean expression, there is no more factoring or collapsing left to be done.

For K-maps of larger dimensions, we label the rows and columns using Gray code - a binary counting system in which only one bit changes when incrementing or decrementing a value. The Gray code is recursively defined. The 1-bit Gray code is gray_code(1) = [0, 1]. The n-bit Gray code is gray_code(n) = map(gray_code(n - 1), lambda code: 0 .. code) + map(reversed(gray_code(n - 1)), lambda code: 1 .. code).

In other words, to get the next Gray code, we prepend a 0 to every binary string in the current Gray code, prepend a 1 to every string in the reversed current Gray code, and concatenate the two lists.
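
A runnable Python version of this recursive definition (the function name is illustrative):

    # Runnable version of the recursive definition, using strings of bits.
    def gray_code(n):
        if n == 1:
            return ["0", "1"]
        prev = gray_code(n - 1)
        # prepend 0 to the previous code, then 1 to the reversed previous code
        return ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]

    print(gray_code(2))  # ['00', '01', '11', '10'] - the K-map side labels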

Every variable splits the output space of a function in half. Let f(x_1, \ldots, x_n) be a Boolean function. Let f_0 represent f(0, x_2, \ldots, x_n) and f_1 represent f(1, x_2, \ldots, x_n).

16/1/15

More Dimensions

The 5-input K-map is 3-dimensional. This is because the rectangles in 2D can only factor up to 2 variables per dimension.

As a result, it is difficult to visualize. Instead, what we can do is write f = \overline{x_5} f_0 + x_5 f_1, and then draw one K-map for f_0 and one for f_1 side by side. This represents a 4 by 4 by 2 cuboid. Now, instead of finding rectangles, we find cuboids - we can join rectangles that are adjacent within their own K-map, but also between the two K-maps:

x_3x_4\x_1x_2 (x_5 = 0) 00 01 11 10
00 1 0 0 1
01 0 1 1 0
11 0 1 1 0
10 0 1 0 0
x_3x_4\x_1x_2 (x_5 = 1) 00 01 11 10
00 1 0 0 1
01 0 1 1 0
11 0 1 1 1
10 0 1 0 1

There is a 2x2x2 cuboid in the middle, a 2x1x2 cuboid across the top corners (wrapping around the sides), a 1x2x1 cuboid at the right side of the right K-map, and a 1x2x2 cuboid in the second-from-left column, spanning the bottom two rows of both K-maps. These correspond to the terms x_2 x_4, \overline{x_2} \overline{x_3} \overline{x_4}, x_1 \overline{x_2} x_3 x_5, and \overline{x_1} x_2 x_3.

Potentially, we could also do 6 variable maps by making 4 4-input K-maps, but this is much more difficult to visualize.

Extensions

Sometimes, the output of a function doesn't matter, like the output values when we have inputs that will never occur. When this is the case, we can say that the value of the function is a don't care. In truth tables and other situations, this is represented as "x". The don't cares are part of the specifications of a circuit, not the mathematical functions.

For example, suppose we have a binary to decimal converter - a circuit with 4 inputs and 10 outputs that interprets the 4 inputs as a 4-bit binary number and turns on 1 of the 10 outputs, for input values between 0 and 9 inclusive. In this case, we can assume that the input binary value will never be above 9, since the circuit isn't meant to handle those values, and all the outputs for those inputs would be don't cares.

In a K-map, we can include don't cares in rectangles if we want to, but it's not required - we can include them in rectangles if it helps us make rectangles larger, but we can leave don't cares outside of rectangles if they don't help.

The minimized function resulting from the K-map is functionally equivalent - both implementations do what we want them to - but they might not be logically equivalent, when we use don't cares.

When we want to minimise a function with multiple outputs, we could try to minimise each output individually as functions with only one output, but this is not necessarily the lowest cost circuit, since there are a lot of logical signals that can get duplicated.

In the K-map, this translates to sometimes not expanding rectangles when it is possible to. Sometimes, with multiple sums of products, the best overall solution is one where the individual sums of products are not necessarily best.

Essentially, we can sometimes do product term sharing to lower overall cost when multiple SOPs use the same product terms. We can sometimes do this in Boolean algebra by factoring to find common products.

Finding the best overall solution is a problem known as multiple output minimization, and is an NP-hard problem.

Math Background

A product term a covers a product term b if and only if b = 1 \implies a = 1.

A product term is an implicant of a function f if and only if f = 1 for all minterms covered by the product term - all the minterms that the product term covers are those that correspond to truth table rows that are 1. In a K-map, an implicant is a rectangle of 1's and possibly don't cares.

A prime implicant is an implicant in which no variable can be removed without making it a non-implicant - it is an implicant that cannot be simplified by removing variables. In a K-map, these are the largest possible rectangles of 1's.

19/1/15

An essential prime implicant is a prime implicant that covers a minterm that is not in any other prime implicant.

Functions implemented with prime implicants tend to be cheaper, since the AND gates have as few inputs as possible, and the bigger the area an implicant covers, the fewer other 1's we need to cover separately. As a result, the minimum sum of products is an implementation of the function using as few prime implicants as possible, each one as large as possible.

Every function can be implemented as a set of prime implicants, all OR'd together. We must include all the essential prime implicants, and then choose from the remaining non-essential prime implicants such that we cover all the 1's with as few rectangles as possible.

CMOS

CMOS is one of the most popular technologies around for implementing logic gates. In CMOS, the cheapest 2-input gates are NAND and NOR, which use the fewest transistors possible. The two types of transistors in CMOS are NMOS and PMOS.

NMOS

An NMOS (N-channel MOSFET) has a source S (often connected to a lower voltage), a drain D (often connected to a higher voltage), and a gate G (which controls the current flowing from the drain to the source):

NMOS Transistor

Another common schematic symbol for an NMOS is a \pi-shape (with straight legs) with a bar on top connected to the gate, where the legs of the \pi are the source and drain.

If we set the gate voltage V_G to a voltage greater than some threshold voltage V_T (like 2V, for example), so V_G > V_T, the NMOS just acts like a short circuit between the drain and the source, so the drain is "pulled down" to the source voltage (often ground). If we set it to ground (0V), the NMOS just acts like an open circuit between the drain and the source.

Basically, an NMOS acts like a tri-state buffer that works best on the low side, with the drain being the input, the source being the output, and the gate being the enable pin (enabled when the gate is high).

PMOS

A PMOS (P-channel MOSFET) has a source (often connected to a higher voltage), a drain (often connected to a lower voltage), and a gate (which controls the current flowing from the source to the drain).

PMOS Transistor

Another common schematic symbol for a PMOS is a \pi-shape (with straight legs) with a bar on top connected to the gate by a bubble (like an inverting bubble on the input), where the legs of the \pi are the source and drain. This is the same as the alternate symbol for the NMOS, but with a bubble on the gate input.

If V_G - V_S < -V_T (gate-to-source voltage below the negative threshold), the PMOS acts like a short circuit between the source and the drain. If V_G = V_S, then the PMOS acts like an open circuit between the source and the drain.

Basically, a PMOS acts like a tri-state buffer that works best on the high side, with the source being the input, the drain being the output, and the enable being the inverted value of the gate (enabled when the gate is low).

CMOS Logic

We can implement a NOT gate very simply using NMOS and PMOS:

NOT gate

We can analyze this by considering the cases. When the input V_x is low, the PMOS is enabled and the NMOS is disabled, pulling the output high. When V_x is high, the PMOS is disabled and the NMOS is enabled, pulling the output low.

A NAND gate extends this concept slightly:

NAND gate

When either V_x or V_y is 0, at least one PMOS will be enabled and the output will be pulled up (since they are in parallel), while at least one NMOS is disabled, preventing the output from getting pulled down (since they are in series). When both V_x and V_y are 1, both PMOSs are disabled, and the NMOSs are both enabled, allowing the output to be pulled down.

A NOR gate works in a similar fashion, but with the NMOSs in parallel and the PMOSs in series. It is now easy to make other gates such as AND and OR by combining these with NOT gates on the outputs.

The following is a transmission gate:

Transmission gate

This gate happens to be very useful in making XOR gates. When E is low, the PMOS's gate is high and the NMOS's gate is low, so no current can flow. When E is high, the PMOS's gate is low and the NMOS's gate is high, so current can pass through in either direction through one transistor or the other.

To make a 3 or more input NAND gate, we can simply add more PMOSs in parallel and more NMOSs in series. Note that we can't extend these arbitrarily, since there is a voltage drop across each transistor - they aren't perfect open or closed circuits. Basically, in series the voltage will eventually drop too low to use, and in parallel, there will eventually be too much current leaking through closed transistors. As a rule of thumb, we can put no more than 4 transistors in series or parallel at a time.

21/1/15

The transmission gate has a schematic symbol as well - two overlapping triangles facing opposite horizontal directions, with a bubble on the top middle. We often put a NOT gate on the top input and connect it to the bottom input to make it act like a bidirectional tri-state buffer.

XOR Gates

Now we can implement XOR. Since a \oplus b = \overline a b + a \overline b, we can just implement it with NAND gates, with 16 transistors. However, it is possible to do better using the transmission gate:

XOR gate

Basically, A turns on the top transmission gate and turns off the bottom one when it is low, and turns on the bottom transmission gate and turns off the top one when it is high.

In other words, if A is low, the output is set to B. If A is high, the output is set to \overline B.

This is an XOR gate that only takes 8 transistors.

A CMOS circuit always follows a specific form: there is a pull-down network that tries to pull down the output for some inputs (NMOS circuitry), connected to ground, and a pull-up network that tries to pull up the output for all the other inputs, connected to power (PMOS circuitry).

The K-map for a 4-input XOR gate appears as follows:

x_3x_4\x_1x_2 00 01 11 10
00 0 1 0 1
01 1 0 1 0
11 0 1 0 1
10 1 0 1 0

If we implemented this with AND/OR/NOT logic, this would be the largest possible 4-input combinatorial circuit we can have - the worst case scenario for a K-map, where it is not possible to optimise.

With XOR, we can make the resulting circuit much simpler than with a SOP. Mathematically, we are finding expressions of the form \overline a b + a \overline b, factoring them out, and then replacing them with a \oplus b. For example, \overline{x_1} x_2 (x_3 \odot x_4) + (x_1 + \overline{x_2})(x_3 \oplus x_4) = \overline{x_1} x_2 \overline{(x_3 \oplus x_4)} + \overline{(\overline{x_1} x_2)}(x_3 \oplus x_4) = (\overline{x_1} x_2) \oplus x_3 \oplus x_4.

The telltale sign of XOR in a circuit is the 2x2 square with 1's in the top right and bottom left corner. When this appears multiple times, there is almost certainly a way to lower costs using an XOR gate.

Also, when we have a wire with a slash through it, labelled n, that wire actually represents n parallel wires. So a wire with a slash labelled 3 as the only input to an AND gate means that the AND gate has 3 inputs.

A PLA/PAL/SPLD (programmable logic array/programmable array logic) is a chip that has programmable fuses that can be programmed to represent a variety of circuits. It basically has an array of AND gates and OR gates, and can make connections from any input or inverted input to any AND gate, and from any AND gate to any OR gate, with the connections being programmable. This lets us represent a variety of functions (though not all, since the number of AND gates is usually relatively limited) using just a single chip. In practice, these can help save a lot of chip space, since these days we generally build circuits with chips instead of fabricating them from raw transistors.

23/1/15

Decomposition/Factoring

Up to this point, we have measured cost as the number of gates plus the number of gate inputs. In practice, a better cost function might count the number of transistors, or the number of actual chips we will need.

By using Boolean algebra on the two-level SOP resulting from a K-map, we can turn the 2-level circuit into a 3-level circuit that is no longer a SOP, but in the process it is possible that the circuit becomes cheaper as a whole.

If we are using something known as a k-LUT (a k-input lookup table), we have blocks that can each implement any k-input function. So a 3-LUT can implement any 3-input function, and so on. These often have fewer inputs than the functions we need to implement.

When we use these, our cost function becomes the number of k-LUTs we are using. As a result, to minimise the number of k-LUTs, for a function f we want to find a function g such that g is an input to f (replacing the inputs to f that g uses), splitting the set of inputs into those for f and those for g. We are basically imposing a structure onto the function in an attempt to make it smaller, though such a decomposition might not exist.

Suppose the split we choose for the 5-input function f = \overline e \overline b \overline a \overline d + \overline e \overline b \overline a c + \overline d c + e d \overline c + b d \overline c + a d \overline c is d, c for f and e, b, a for g.

From this, we can draw a decomposition chart:

eba/dc 00 01 10 11
000 1 1 0 1
001 0 1 1 0
010 0 1 1 0
011 0 1 1 0
100 0 1 1 0
101 0 1 1 0
110 0 1 1 0
111 0 1 1 0

Note that there are only 2 unique rows in the table. Wherever there are duplicated rows, we can have a function g that selects which unique row is being used, potentially simplifying the logic. In our case, we will let g be the function that returns 1 when the row pattern is 0110 and 0 when the row pattern is 1101. Clearly, this is g = e + b + a.

Now, the function depends only on g and d, c. We can now draw a K-map:

g/dc 00 01 11 10
0 1 1 1 0
1 0 1 0 1

The new minimized function is now f = \overline g \overline d + \overline g c + \overline d c + g d \overline c. We could also try to further minimise it using an XOR or similar. Clearly, g and f now have 3 inputs each, so we can implement the whole thing with just two 3-LUTs, which is the minimum possible since a 5-input function cannot be implemented with just one 3-LUT.

How did we choose the split that resulted in this structure? Mostly by brute force, with some heuristics. There are \sum_{i = 1}^{m - 1} {n \choose i} possibilities, where n is the number of inputs and m is the number of inputs on our k-LUTs. Basically, there is no simple way to determine which split of variables to use, so we just need to try promising candidates.

This technique is a way to find a common subfunction of a function in order to reduce the number of inputs.

The idea is that the cheapest function possible will depend on the cost function, which depends on the technology that we are using. Different functions will be cheaper for CMOS vs. K-LUTs vs. logic gates, and so on.

26/1/15

;wip: get the notes for this class, missed due to sick

;wip: apparently we talked about representing numbers in different bases, like the week 4 content

;wip: r's complement and all that

27/1/15

Addition

The half adder (HA) is a circuit that can add two one-bit numbers. It is represented by S, C_{out} = x \oplus y, x \wedge y.

In order to fully add two numbers of any size, we need to be able to add two 1-bit numbers, plus a carry in - basically, add three 1-bit numbers.

We can do this with two half adders - a full adder (FA): S, C_{out} = x \oplus y \oplus C_{in}, x \wedge y + C_{in}(x \oplus y). Basically, we connect the output of the half adder adding x and y to the input of a half adder that adds the carry in. The resulting carry out is the carry out of the two half adders OR'd together. The full adder is an example of hierarchical design.

This is only one representation - there are multiple ways to make full adders, including 2-level repesentations like S, C_{out} = x \oplus y \oplus C_{in}, x \wedge C_{in} + y \wedge C_{in} + x \wedge y, since x \wedge C_{in} + y \wedge C_{in} + x \wedge y = C_{in} (x + y) + xy = C_{in} (x \oplus y) + x \wedge y.

With this, we can chain them together to add n-bit numbers, by connecting the C_{out} of one full adder to the C_{in} of the next, with corresponding bits of the addends as inputs to each full adder. This is called an n-bit ripple adder, because the carries ripple through the circuit from the least significant bit to the most significant. This takes 5n gates (5(n - 1) + 2 if using a half adder for the first bit), which is pretty much the smallest possible.

Adding two n-bit numbers requires n adders (technically, n - 1 full adders and 1 half-adder for the least significant bit, or n full adders with the least significant bit's carry in connected to 0). Using a full adder for the least significant bit allows us to easily add 1 at any time, which is useful for certain things like subtraction.

When we add two n-bit numbers, the output is potentially n + 1 bits. In our circuits, this manifests as the carry out of the last full adder. The sum actually overflows past 2^n - 1 back to 0 for unsigned numbers, and past 2^{n - 1} - 1 to -2^{n - 1} for signed numbers.

Every time we change the input to a gate, we need to wait 1 time unit before we read the result in order to make sure that it is stable. This is the gate's propagation delay.

The longest combinational path/critical path is the longest path of logic gates and other combinational components through the circuit from one input to one output, and determines how long we need to wait before the output is stable and we can read it expecting valid results. We can represent this by drawing the full adder circuit as a box with arrows inside from each input to every output, each labelled with the longest combinational path from that input to that output. For example, the critical path from carry in to carry out is 2 gate delays for a full adder.

The ripple adder is relatively slow. Note that the last sum bit depends on every input before it - as the number of bits increases, the longest combinational path of gates through the circuit gets proportionally longer. As a result, as n increases, the time we need to wait before the output of the ripple adder finally stabilizes grows as 2n - 1 gate delays.

28/1/15

There is no such thing as a subtractor - while adding was easy because carrying always went leftward, borrowing might require us to move an arbitrary number of places left. Instead, we simply add the negation of the number using the r's complement: M + (r^n - N) = r^n + (M - N). Note the presence of the r^n term, which simply represents the carry/borrow bit past the most significant bit of the number.

If we represent signed integers using two's complement, we don't need to do anything at all to the answer - just add M to the two's complement of N. When we do this, the carry/borrow bit is 1 if M \ge N and 0 otherwise.

Since the two's complement is equivalent to a bitwise NOT followed by addition of 1, we can very quickly find the two's complement of any number. In a ripple adder, if we use a full adder in the least significant bit of the adder, we can very easily add the two's complement for an operand by inverting its bit inputs, and setting the carry in of the full adder to 1 in the least significant bit.

So to make a subtractor, we simply add a control line input to the adder that is XOR'd with every bit of the second addend and connected to the carry in of the full adder for the least significant bit. When this line is off, the bits of the second addend are XOR'd with 0 (which does nothing) and the carry in is 0 - this is addition, as before. When it is on, the bits of the second addend are XOR'd with 1 (inverting them) and the carry in is 1, adding 1 to the sum - this results in subtraction.
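
A minimal Python sketch of this adder/subtractor, modelling the full adders at the bit level (the names and the least-significant-first bit order are my own choices):

    # Sketch: ripple adder/subtractor with a sub control line.
    def add_sub(a_bits, b_bits, sub):
        result, carry = [], sub  # carry in = 1 when subtracting (the extra +1)
        for a, b in zip(a_bits, b_bits):
            b = b ^ sub  # the control line inverts b's bits when subtracting
            result.append(a ^ b ^ carry)         # full adder sum
            carry = (a & b) | (carry & (a ^ b))  # full adder carry out
        return result, carry

    # 6 - 2 = 4; the final carry out is 1 since M >= N
    print(add_sub([0, 1, 1, 0], [0, 1, 0, 0], 1))  # ([0, 0, 1, 0], 1)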

30/1/15

;wip: missed first half of this due to interview

We will represent signed arithmetic using two's complement. See the CS241 notes for reference.

For unsigned numbers, addition overflows if and only if the carry out of the full adder for the most significant bit is 1. Unsigned subtraction can never overflow past the maximum value, but when M < N the result wraps around, which shows up as a carry out of 0.

For signed numbers, addition/subtraction overflows if and only if the carry in and carry out of the full adder for the most significant bit are different. In order to detect overflows in hardware, we simply add an XOR gate connected to the carry in and carry out of the last full adder.

The ripple adder is small and simple, but slow. We want to make a better adder. There is a design that uses quadratic space, but can add in constant time with respect to the number of bits - the carry-lookahead adder (CLA).

Consider the carry line of the ripple adder. As we go left, we have more and more gates in the critical path. However, each carry-out is theoretically a function of each of the inputs of the previous gates. If we write a truth table and minimise, we can get a two-level circuit for any of the carry-outs.

2/2/15

The basic idea is that we can compute the carry in of each adder independently, without having to wait for it to propagate through all the previous adders. Since it is always possible to write any Boolean function as a 2-level SOP or POS, we know that this is possible.

Consider an n-bit ripple adder. Let p_i (propagate) and g_i (generate) be the sum and carry from the first half adder (the one with the non-carry inputs) in the ith full adder. Let c_i be the carry-in of the ith full adder.

Clearly, c_1 = g_0 + p_0 c_0, c_2 = g_1 + p_1 c_1 = g_1 + p_1 g_0 + p_1 p_0 c_0, and c_3 = g_2 + p_2 c_2 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0.

Since every c_i is in terms of only the x and y bits and c_0, we can compute all of them in parallel, using a SOP for each carry in. That means that no matter how big the adder is, we can just use 2-level circuits to get the carries, and we can get the sum after just 4 gate delays.
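
A small Python sketch of the carry computation from the generate and propagate signals (here the carries are computed in a loop for clarity; in hardware, each c_i would be flattened into its own 2-level SOP so they are all available in parallel):

    # Sketch: computing every carry from the generate/propagate signals,
    # using c_{i+1} = g_i + p_i c_i. Bits are least significant first.
    def cla_carries(x_bits, y_bits, c0=0):
        g = [x & y for x, y in zip(x_bits, y_bits)]  # generate
        p = [x ^ y for x, y in zip(x_bits, y_bits)]  # propagate
        carries = [c0]
        for gi, pi in zip(g, p):
            carries.append(gi | (pi & carries[-1]))
        return carries

    print(cla_carries([1, 1, 0, 1], [1, 0, 1, 0]))  # [0, 1, 1, 1, 1] for 11 + 5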

The tradeoff of the carry-lookahead adder is that its area increases quadratically with the number of bits - each successive carry-in expression has one more term than the previous.

In practice, the carry-lookahead adder is a little bit too bulky in terms of area. We can combine the CLA and ripple adding concepts together in a hybrid design to get the best of both worlds - a reasonably compact, fast adder.

For example, for a 16-bit number, we might have 4 carry-lookahead adders with their carry-ins and carry-outs chained together like a ripple adder. This would have much less of a delay than a ripple adder, but also have a much smaller area than an equivalent carry-lookahead adder.

Common Logical Operations

A very common operation is to compare two unsigned numbers. We want a circuit that, given n-bit numbers A = a_n \cdots a_1, B = b_n \cdots b_1, outputs whether the numbers are equal, or whether one is greater than another.

Potentially, we could just subtract the two numbers and compare the sign of the result. However, this is a bit excessive - subtraction is a much more complex operation than we actually need for this purpose.

Equality is easy - two numbers are equal if all their bits are the same. We can just XNOR corresponding bits together and AND those results all together.

Comparison is also simple - we first compare the most significant bits, and if they are equal, we compare the second most significant, and so on. Mathematically, A > B \equiv a_n \overline{b_n} + e_{n - 1} a_{n - 1} \overline{b_{n - 1}} + e_{n - 1} e_{n - 2} a_{n - 2} \overline{b_{n - 2}} + \ldots + e_{n - 1} \cdots e_1 a_1 \overline{b_1}, where e_i = \overline{a_{i + 1} \oplus b_{i + 1}} - whether bits i + 1 of the two numbers are equal, enabling the comparison of the bits below. A < B can be calculated in a similar way, or by using the fact that A < B \equiv \overline{A > B \vee A = B}.

This is analogous to a carry-lookahead adder - we are making decisions for each output bit based on all the bits of the inputs that correspond to or come before it, and it has a constant height with respect to the number of bits. We could also implement this as something like a ripple adder by writing the expression as A > B \equiv a_n \overline{b_n} + e_{n - 1} (a_{n - 1} \overline{b_{n - 1}} + e_{n - 2} (a_{n - 2} \overline{b_{n - 2}} + \ldots + e_1 a_1 \overline{b_1}) \ldots )).
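
A small Python sketch of this comparison scheme, where a running enable signal plays the role of the e_i terms (the names and most-significant-first bit order are my own choices):

    # Sketch: unsigned comparison, scanning from the most significant bit.
    def greater_than(a_bits, b_bits):
        result, enable = 0, 1
        for a, b in zip(a_bits, b_bits):
            result |= enable & a & (1 - b)  # a_i = 1, b_i = 0, higher bits equal
            enable &= 1 - (a ^ b)           # stay enabled only while bits match
        return result

    print(greater_than([1, 0, 1], [1, 0, 0]))  # 1, since 5 > 4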

4/2/15

Got #rekt by the quiz.

6/2/15

Binary multiplication is also possible by adding together the partial products of one multiplicand with the bits of the other multiplicand. Basically, we implement the following identity: a_1 \cdots a_n \times b_1 \cdots b_n = (a_1 (b_1 \cdots b_n) \text{ left shifted by } (n - 1)) + \ldots + (a_n (b_1 \cdots b_n) \text{ left shifted by } 0).

We can do this by stacking up n n-bit AND gates, each offset from the previous by 1 bit. Each one ANDs together one of the multiplicands and one bit of the other multiplicand. The outputs of these gates are known as the partial products.

When we add together all the partial products, we get the 2n-bit product.

Another way of looking at it is that if bit i of one multiplicand is 1 (counting from the least significant bit, starting at 1), then we add the other multiplicand, shifted left by i - 1, to the partial result. The resulting sum is the product.
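
A minimal Python sketch of this shift-and-add scheme (the names and least-significant-first bit order are my own choices):

    # Sketch: shift-and-add multiplication from AND-gated partial products.
    def multiply(a_bits, b_bits):
        b_value = sum(bit << i for i, bit in enumerate(b_bits))
        product = 0
        for i, a_bit in enumerate(a_bits):
            # partial product: b AND'd with bit i of a, shifted left by i
            product += (b_value << i) if a_bit else 0
        return product

    print(multiply([1, 1, 0], [0, 1, 1]))  # 3 * 6 = 18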

An n bit number has 2^n possible distinct values. Figuring out what each pattern represents is known as decoding. An n to m decoder is a device that recognizes n-bit patterns and outputs m-bit patterns in response.

Basically, a decoder has m separate functions of those n input bits that are 1 for certain minterms of n. A decoder generally also has an enable line - the decoder functions as described when this is 1, and the output is always 0 when this is 0.

An example of a decoder is a 7-segment hex decoder, which accepts a 4-bit unsigned binary number and outputs the 7-bit values for the hex digits on the display. A selector is also a decoder - it has one output for each input number, and turns one on for each recognized number.

In this course, after this point when we say decoder we refer to a selector - a device with n inputs and 2^n outputs (or fewer, if the input cannot reach certain states). One of the 2^n outputs is on for each of the 2^n possible distinct values for the n-bit input - there is exactly one output (distinct) that is 1 for each minterm. This circuit can be made with just NOT and AND gates.

Smaller decoders can be used together to make bigger decoders in decoder trees, in a tree-like structure. Some of the bits would be decoded and used to select the right second stage decoder to use, which then decodes the rest of the bits, or enables even more stages. Decoders can also implement functions, by ORing together multiple minterm outputs.

An active low decoder is enabled when and only when the enable pin is low. An active high decoder is enabled when and only when the enable pin is high.

9/2/15

An m to n encoder is the counterpart to an n to m decoder. An encoder has 2^n or fewer inputs and n outputs. The output is always a binary code corresponding to which input line is on (1 and only 1 input may be on at a time). This circuit can be made with just OR gates.

A priority encoder is an encoder that does allow multiple input lines to be on at the same time, and even none at all. The output is always the binary code corresponding the highest-indexed input that is on. There is also an output that is on if and only if at least one of the inputs are on, usually known as "valid".

An n-bit multiplexer (MUX) is a device with 2^n or fewer data inputs (of equal size) and an n-bit selection input, which outputs the data input chosen by the value of the selection input. For example, a 1-bit multiplexer outputs the first input if the selection input is 0, and the second input if the selection input is 1. Multiplexers can easily be implemented using tri-state buffers.

A 2-input multiplexer can therefore simply be written as \overline s x_1 + s x_2. A 4-input multiplexer can be written as \overline{s_1} \overline{s_2} x_1 + \overline{s_1} s_2 x_2 + s_1 \overline{s_2} x_3 + s_1 s_2 x_4. Larger multiplexers can also be built up from smaller multiplexers, in a tree structure.

Any n-input function can be implemented using an (n - 1)-bit multiplexer by looking at the truth table. This is done by using the first n - 1 bits of the input as the multiplexer's select lines, and connecting each data input to one of x, \overline x, 0, or 1, based on how the output depends on the last input bit.

Shannon decomposition is the process of decomposing functions for implementation using multiplexers. We can implement logic gates using just small multiplexers, so we can also implement functions using just small multiplexers.

Given any function f(x_1, \ldots, x_n), we can always factor it into \overline{x_1} f(0, x_2, \ldots, x_n) + x_1 f(1, x_2, \ldots, x_n). Note that these two functions (known as the cofactors) are now (n - 1)-input functions, and that they are the inputs to a 1-bit multiplexer. By applying this repeatedly we can implement the entire function using just 1-bit multiplexers.
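
A minimal Python sketch of this idea - evaluating a function through a tree of 1-bit multiplexers (the function names are illustrative):

    # Sketch: evaluating a function through a tree of 1-bit multiplexers.
    def mux(s, x0, x1):
        return ((1 - s) & x0) | (s & x1)  # 1-bit (2-input) multiplexer

    def shannon_eval(f, bits):
        if not bits:
            return f()  # every input has been bound; f is now a constant
        first, rest = bits[0], bits[1:]
        f0 = shannon_eval(lambda *r: f(0, *r), rest)  # cofactor with x = 0
        f1 = shannon_eval(lambda *r: f(1, *r), rest)  # cofactor with x = 1
        return mux(first, f0, f1)

    f = lambda x, y, z: (x & y) | z
    print(shannon_eval(f, [1, 1, 0]))  # prints 1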

A demultiplexer is the counterpart of the multiplexer. It is exactly the same thing as a decoder/selector, but with the meanings of the pins changed - the enable line is actually the data input, and the data inputs are actually the select lines. The demultiplexer sets the output selected by the select lines to the value of the data input. The data input can also be multiple bits, unlike a decoder.

Tri-state buffers are very useful for making bus-like circuits (a bus is just a bundle of wires that carry data around), when we want to drive a wire from different sources at different times (for example, the data bus on a computer). We can have all the outputs of the tri-state buffer feed into the same wire, and if we ensure that only one enable bit is on at a time, the wire will only be driven by one source at a time.

Memory

All of our circuits so far were functions of only their inputs - they were combinational, with no state, and always work the same regardless of past inputs.

We also need stateful circuits, for things like memory. The most important storage elements are latches and flip-flops.

A latch is the simplest type of storage element. The following is a SR-latch (also known as NOR latch):

;wip: NOR SR latch and circuit symbol

Note that the outputs feed back into the input - the outputs depend on themselves. We can't write a truth table for this because its value depends on what it already is. However, the output of this can be stable (non-changing) when S and R are not both true.

The S (set) input forces Q to be 1. The R (reset) input forces Q to be 0. When neither are on (hold), Q retains its last value. If both S and R are on (undesirable), Q is 0 (and so is \overline Q) and becomes unstable when we stop making both S and R true.

10/2/15

We can also make a NAND latch, which is basically the same thing but with the inputs inverted - the \overline S \overline R-latch.

We analyze circuits like this by applying inputs, seeing how the signal propagates to the output, and then checking if the output value is stable. If stable, we know that the output stays at the found value while we keep that input.

When we have a switch, there are generally mechanical contacts that will bounce (and the outputs will momentarily be floating) before settling down into their desired state. We can filter out this noise using a latch - when we connect the two ends of the switch to S and R, the latch will keep the previous state when the switch goes floating. This is called switch debouncing.

A gated latch is basically an \overline S \overline R latch, but with an additional input G (gate) to control when the S and R inputs are enabled, or just to hold:

;wip: NAND gated latch with S, R, G, Q and circuit symbol

We want to get rid of the undesirable state where both S and R are asserted. We can do this by making S and R always complements of each other, using a NOT gate. We can do this because we can still hold, set, and reset with the remaining input combinations. This is a gated D-latch:

;wip: gated D-latch with D, G, Q and circuit symbol

This latch basically has the output Q follow D (data) when G (gate) is on, and holds Q when G is off. This is basically a memory cell.

D-latches are useful, but we often want an edge triggered latch - a latch updated by the rising or falling edge of the gate signal. We can do this using a MS (master-slave) flip-flop:

;wip: MS-latch with D, CLK, Q, two D-latches, and circuit symbol

This is a falling edge-triggered D-type flip-flop, or a D-flip-flop. When CLK is high, the master latch follows D, but the slave latch is still holding at its previous value. When CLK goes low again, the slave latch is updated to the value of the master latch, which was the most recent value of D before the clock signal went low.

As a result, it only updates its output value on the falling edge of CLK. In reality, there is a short period of time before and after the clock signal goes from 1 to 0 in which D must be stable. This is because there is a setup time T_{SU} (data path from D to CLK) for the master latch to acquire its value before the transition, and a hold time T_H for the master latch to lock in its value after the transition. There is also a delay for the value of the master latch to propagate to the output - the clock-to-output time T_{CO} (data path from CLK to Q).

We can also make a rising edge-triggered MS-latch by inverting the CLK signal.

Latches are level-sensitive devices - they operate based on the logical level of signals like CLK and G. Flip-flops are edge-sensitive devices - they operate based on the transition in the logical level of signals like CLK and G.

11/2/15

Essentially, the D-flip-flop implements the characteristic Q(t + 1) = D.

The clock line on circuit elements is often represented using a triangle pointing inward from the pin. The symbol for the D-type flip-flop is the same as that for a D-latch, but with a clock input instead of a gate input.

If there is just a clock pin by itself, the device works on the rising edge. If there is a bubble on the clock pin, then the device works on the falling edge.

The D-flip-flop might also have S, R, and EN (enable) inputs. S and R work as in the SR latch, setting or resetting the device, regardless of the value of D. The output is updated on the next triggering edge if the input is synchronous, and instantly if the input is asynchronous. One of S or R will have priority over the other, so behaviour when both S and R are on is well defined. Enable controls whether the clock signal is connected.

A T-flip-flop is one with inputs T and CLK, and an output Q. When T is on, the triggering edge of CLK causes Q to toggle - to go to 1 if it was 0, and 0 if it was 1. Otherwise, it holds its value.

The T-flip-flop implements the characteristic Q(t + 1) = T \oplus Q(t). Basically, it XORs its current value with an input, which can sometimes make our circuitry simpler - calculating T is sometimes easier than calculating D.

A T-flip-flop can be implemented with a D-flip-flop where D is connected to an XOR gate with T and Q as inputs.

A JK-flip-flop is one with inputs J, K, and CLK, and an output Q. On the triggering edge of the clock signal, Q stays the same when J and K are 0, resets when K is on, sets when J is on, and flips when both are on. The JK-flip-flop can therefore act like a T-flip-flop as well.

The JK-flip-flop implements the characteristic Q(t + 1) = J\overline{Q(t)} + \overline K Q(t). The JK-flip-flop tends to have the simplest inputs to calculate in practice - it is often the case that the logic circuits we connect to J and K are much simpler than the logic circuits we might connect to D in a D-flip-flop.
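
A quick Python sketch of the three characteristic equations as next-state functions (the function names are my own):

    # Sketch: the characteristic equations as next-state functions.
    def next_d(q, d):
        return d                              # Q(t + 1) = D

    def next_t(q, t):
        return t ^ q                          # Q(t + 1) = T xor Q(t)

    def next_jk(q, j, k):
        return (j & (1 - q)) | ((1 - k) & q)  # Q(t + 1) = J ~Q(t) + ~K Q(t)

    q = 0
    for t in (1, 1, 0, 1):  # toggle, toggle, hold, toggle
        q = next_t(q, t)
    print(q)  # prints 1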

13/2/15

A register is a group of flip-flops designed for a common task, often having a common clock line. A register is commonly used for holding data, loading data, or shifting its contents by one bit (and adding a new bit at the empty space). A register that supports all these operations is a shift register.

We can implement a shift register as follows:

;wip: 4-bit shift register schematic, using 4 multiplexers with common select each connected to a D of the 4 D-FFs and the multiplexer inputs connected as appropriate, should have 4-bit data input, and lines for shift up data in and shift down data in, and support load, hold, shift up, shift down

This multiplexer-as-data-input-to-register technique is very useful for making registers perform a variety of operations. Each input to the multiplexers can perform a different function.

An n-bit binary ripple up-counter counts from 0 to 2^n - 1, then wraps around and repeats. We can implement this using T-flip-flops chained together - the inverted output of one goes to the clock line of the next, so the next bit flips whenever the current bit transitions from 1 to 0, and all the T inputs are set to 1. This works because bit i toggles once every 2^i clock cycles - each additional T-flip-flop doubles the period.

;wip: T-FF binary ripple counter

This is good because it's very compact and scales up to higher bit inputs very well. This is bad because each flip-flop has a different clock, which means that as the number of bits increases, the propagation delay increases as well.

2/3/15

The ripple counter has the same issue as the ripple adder - the propagation delay gets really long as the number of bits gets higher. For example, if we are at the maximum value of the counter and try to count again, every bit has to flip, as the signal propagates through all the flip-flops one by one.

The ripple counter is an asynchronous circuit because all the flip-flops have different clock lines. The value of the current bits depends on the changes in the previous bits.

We can also design a synchronous counter, by connecting all the clock lines on the T-FFs together, and using combinatorial logic to determine when we need to flip based on the previous value of the bits rather than the change in them.

Using a transition table for all the output bits, we find that a bit a_i toggles when all bits a_0, \ldots, a_{i - 1} are 1 - when a_0 \wedge \ldots \wedge a_{i - 1}.

;wip: synchronous counter by using T-FFs, first T is 1, second T is a_0, third is a_0 a_1, fourth is a_0 a_1 a_2, etc.

Now, the clock edge causes the change for all the T-FFs simultaneously, which means the output takes only the delay of one T-FF. Although the AND gates take time to propagate, they do not affect the output delay, only the maximum frequency at which we can run the circuit - the critical path goes through all the AND gates and includes the setup time of the last T-FF (we draw this as a line from a_0 through all the AND gates, into the T of the last T-FF and down to the CLK of the last T-FF).
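
To see the behaviour concretely, here is a minimal Python sketch that simulates one clock edge at a time (the names and least-significant-first bit order are my own choices):

    # Sketch: one clock edge of the synchronous counter; bit i toggles when
    # all lower bits are 1. Bits are least significant first.
    def tick(bits):
        all_lower_ones = 1
        toggles = []
        for bit in bits:
            toggles.append(all_lower_ones)  # the T input of this flip-flop
            all_lower_ones &= bit           # AND in this bit for the next stage
        return [b ^ t for b, t in zip(bits, toggles)]

    state = [0, 0, 0]
    for _ in range(5):
        state = tick(state)
    print(state)  # [1, 0, 1] - the counter reads 5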

The symbol for a counter is generally some variation of:

;wip: box with inputs I, LD, CLR, CNT, CLK

Here, we have some additional functionality in addition to just counting - loading a value from I (n-bit input) on clock rather than counting if LD (load value) is true, clearing the value to 0 asynchronously if CLR (clear value), and counting only when CNT (count enable) is on otherwise.

We can now make a counter that starts at 3, counts to 10 inclusive, then goes back to 3 and repeats. This can be done by using an AND gate with selectively inverted inputs such that it is 1 whenever the output is 10 or more. This AND gate would be OR'd with a "RESET" line, and the OR result is then connected to LD. The input would then be set to 3 whenever we get to 10, or when we turn on the RESET line.

4/3/15

It is possible to make counters that count in any sequence, using any type of flip-flop. For example, we will design a counter that counts the sequence 3, 4, 7, 1, 0 and then repeats.

First, we write them as binary values: 011, 100, 111, 001, 000. Clearly, there are 3 bits and we need 3 flip-flops.

Now we can write a state table, and fill in the flip-flop inputs with functions of the current output such that the next output is the desired value:

Current output (q_2 q_1 q_0) Next output d_2 d_1 d_0 t_2 t_1 t_0 j_2 k_2 j_1 k_1 j_0 k_0
011 100 100 111 1x x1 x1
100 111 111 011 x0 1x 1x
111 001 001 110 x1 x1 x0
001 000 000 001 0x 0x x1
000 011 011 011 0x 1x 1x

As it turns out, the rightmost columns form three truth tables, one for each type of flip-flop. We can make a K-map for each input variable in each column with respect to the current output (with don't cares for the missing truth table rows) to obtain a simplified expression for each flip-flop input. For example, for D-flip-flops we have d_0 = q_2 + \overline{q_0}, d_1 = \overline{q_0}, d_2 = q_2 \oplus q_1.
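
As a quick check, a small Python sketch confirming that these D inputs step through 3, 4, 7, 1, 0 and repeat (the names are my own):

    # Sketch: stepping the derived D inputs through the counting sequence.
    def step(q2, q1, q0):
        return q2 ^ q1, 1 - q0, q2 | (1 - q0)  # d_2, d_1, d_0

    state = (0, 1, 1)  # start at 3
    for _ in range(6):
        q2, q1, q0 = state
        print(q2 * 4 + q1 * 2 + q0, end=" ")   # prints 3 4 7 1 0 3
        state = step(*state)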

Now we can draw the circuit diagram for D-flip-flops: 3 D-flip-flops with a common clock line, the outputs leading into gates that calculate d_0, d_1, d_2, and d_0, d_1, d_2 connected to the corresponding inputs of the flip-flops.

We also likely want an "init" line that asynchronously sets the counter to a known state, like 3, the beginning of the sequence. This can be done by connecting a line to the asynchronous set pin of d_0, d_1, and the asynchronous reset pin of d_2, and turning this line on for a little while will reset the entire counter to 3.

;wip: circuit diagram

For T-flip-flops, we have t_0 = \overline{q_2} + \overline{q_0}, t_1 = q_1 + \overline{q_0}, t_2 = q_1. For JK-flip-flops, we have j_0 = 1, k_0 = \overline{q_2}, j_1 = \overline{q_0}, k_1 = 1, j_2 = q_1, k_2 = q_0.

In real-world circuits, JK-flip-flops tend to result in the simplest logic for the inputs. Generally, most of the flip-flop inputs will simply be outputs, inverted outputs, or constants, which makes building these circuits very simple.
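As a sanity check (a sketch, not part of the course material), all three sets of input equations can be simulated to confirm they step through 3, 4, 7, 1, 0 and repeat:

```python
# Sketch: verify that the D, T and JK input equations above all step
# their flip-flops through the sequence 3, 4, 7, 1, 0, 3, ...

def step_d(q2, q1, q0):
    # d2 = q2 XOR q1, d1 = NOT q0, d0 = q2 OR NOT q0
    return (q2 ^ q1, 1 - q0, q2 | (1 - q0))

def step_t(q2, q1, q0):
    t2, t1, t0 = q1, q1 | (1 - q0), (1 - q2) | (1 - q0)
    return (q2 ^ t2, q1 ^ t1, q0 ^ t0)

def jk(q, j, k):   # JK flip-flop: hold, reset, set, or toggle
    return {(0, 0): q, (0, 1): 0, (1, 0): 1, (1, 1): 1 - q}[(j, k)]

def step_jk(q2, q1, q0):
    return (jk(q2, q1, q0),          # j2 = q1, k2 = q0
            jk(q1, 1 - q0, 1),       # j1 = NOT q0, k1 = 1
            jk(q0, 1, 1 - q2))       # j0 = 1, k0 = NOT q2

for step in (step_d, step_t, step_jk):
    q, seq = (0, 1, 1), []           # init to 3 = 011
    for _ in range(6):
        seq.append(q[0] * 4 + q[1] * 2 + q[2])
        q = step(*q)
    print(step.__name__, seq)        # all print [3, 4, 7, 1, 0, 3]
```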

6/3/15

Synchronous circuits have their output dependent on the circuit inputs and the current flip-flop outputs - the current state. The inputs to the flip-flops are based on the next state equations, which depend on the primary inputs (circuit inputs) and the current state. The primary outputs of the circuit depend on the current state and optionally also the primary inputs (the combinatorial part of the circuit).

Synchronous circuits basically go through a sequence of states depending on the input. If the primary outputs of the circuit depend only on the current state (and not the primary inputs), the circuit is a Moore machine, otherwise a Mealy machine. Moore machines are generally nicer because their outputs don't change when the clock is low and the inputs are changing, while Mealy machines have their output always dependent on the inputs, even when the clock is low.

A Mealy machine can be converted into a Moore machine by duplicating each state that can be reached with different output values, then changing the transitions that went into the old state to each go to the duplicate with the right output. The duplicated states have different output values, but the transitions out of them are all the same.

We can describe synchronous circuits using a state diagram. This is a diagram that shows all the possible states of a system, the possible transitions between states based on current inputs and state, and outputs. We can draw this as a directed graph with nodes labelled with the state name, like "standing" or "sitting", and edges labelled with the inputs that would result in the transition that edge describes, like x = 0, y = 1. The outputs of the circuit can be written inside the nodes (for a Moore machine, since we can't reference inputs here), or on the transitions, like x = 0, y = 1 / q_0 = 1, q_1 = 0 (for Mealy machines).

Each state is represented in a circuit by a pattern of bits in the flip-flops. We would have a state in the diagram for every possible output of the flip-flops. Implicitly, if we are at a state that doesn't have a transition for the current inputs, we just stay on that state.

A Mealy machine has its outputs depend on the inputs, so if the inputs are changing, the output is also changing. A Moore machine does not - the outputs change only on the clock edge. Mealy machines are nice because they generally need fewer states, but Moore machines are nice because our timing diagram is simpler and only changes on clock edges.

A counter can also be drawn as a Moore machine with zero inputs - its outputs don't depend on any inputs and change on each clock edge.

9/3/15

A state table represents the same information as a state diagram, but in the form of a table. Rows are states labelled with letters, columns are labelled with possible input values, and cells are the state names to transition to when in that row's state with that column's inputs. There are also additional columns on the right, one per possible input value, where the cells are the output values for that row's state with that column's inputs.

Suppose we want a circuit with input x and output z. z should be true whenever the last 4 values of x are 1011 - a sequence detector. We will first create a state diagram:

S0 > 0 > S0
S0 > 1 > S1
S1 > 0 > S10
S1 > 1 > S1
S10 > 0 > S0
S10 > 1 > S101
S101 > 0 > S10
S101 > 1 > S1011
S1011 > 0 > S10
S1011 > 1 > S1
S1011: output 1
output 0 otherwise

This is basically the same thing as a DFA that matches 1011, except the accepting state changes the output.
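A quick sketch of this state diagram as a table-driven simulation (state names as above):

```python
# Sketch: the 1011 sequence detector as a table-driven Moore machine.
# State names record the useful suffix of the input seen so far.

NEXT = {
    ("S0",    0): "S0",   ("S0",    1): "S1",
    ("S1",    0): "S10",  ("S1",    1): "S1",
    ("S10",   0): "S0",   ("S10",   1): "S101",
    ("S101",  0): "S10",  ("S101",  1): "S1011",
    ("S1011", 0): "S10",  ("S1011", 1): "S1",
}

state = "S0"
for bit in [1, 0, 1, 1, 0, 1, 1]:
    state = NEXT[(state, bit)]
    z = 1 if state == "S1011" else 0    # Moore output: state only
    print(bit, state, z)
# z is 1 after the 4th and 7th bits, where the last four bits are 1011
```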

Generally, we can also leave out transitions that just loop back into their own state, so we only draw those that cause a state change. This makes the diagrams a little cleaner.

Now we could potentially draw a state table:

Current state | x = 0 | x = 1
S0 | S0, z = 0 | S1, z = 0
S1 | S10, z = 0 | S1, z = 0
S10 | S0, z = 0 | S101, z = 0
S101 | S10, z = 0 | S1011, z = 0
S1011 | S10, z = 1 | S1, z = 1

Since we have 5 states, we will need at least 3 flip-flops, since they can represent up to 8 states. For each state, we need to assign a set of flip-flop values - state assignment. For now, we will assign them directly as 0, 0, 0 up to 1, 0, 1. However, different assignments could potentially make the circuit simpler.

10/3/15

To make a pattern recognizer state diagram, we basically first add a chain of nodes that track progress through the pattern, then connect all the missing edges back to the appropriate earlier states.

Sometimes we also do state reduction, which is the process of trying to cut out unnecessary states in order to make the circuit simpler. In this example, state reduction is not possible, since 5 states is already minimal.

The above state table represents a Moore machine, since the value of z is the same in each column for any given row. If z differed between the columns of some row (that is, depended on x), we would have a Mealy machine.

This time, we will make the circuit using T flip-flops. Clearly, the next state is a function of the current state, 3 variables, and the input x, so 4 variables in total. We need to draw the state table with the binary values of the states:

q_2 q_1 q_0 | x = 0 | x = 1
000 | 000, z = 0, t_2 t_1 t_0 = 000 | 001, z = 0, t_2 t_1 t_0 = 001
001 | 010, z = 0, t_2 t_1 t_0 = 011 | 001, z = 0, t_2 t_1 t_0 = 000
010 | 000, z = 0, t_2 t_1 t_0 = 010 | 011, z = 0, t_2 t_1 t_0 = 001
011 | 010, z = 0, t_2 t_1 t_0 = 001 | 100, z = 0, t_2 t_1 t_0 = 111
100 | 010, z = 1, t_2 t_1 t_0 = 110 | 001, z = 1, t_2 t_1 t_0 = 101

Now we make 3 K-maps, one for each of t_2, t_1, t_0, with rows labelled by q_2 q_1 and columns by q_0 x:

K-map for t_2:

q_2 q_1 \ q_0 x | 00 | 01 | 11 | 10
00 | 0 | 0 | 0 | 0
01 | 0 | 0 | 1 | 0
11 | x | x | x | x
10 | 1 | 1 | x | x

K-map for t_1:

q_2 q_1 \ q_0 x | 00 | 01 | 11 | 10
00 | 0 | 0 | 0 | 1
01 | 1 | 0 | 1 | 0
11 | x | x | x | x
10 | 1 | 0 | x | x

K-map for t_0:

q_2 q_1 \ q_0 x | 00 | 01 | 11 | 10
00 | 0 | 1 | 0 | 1
01 | 0 | 1 | 1 | 1
11 | x | x | x | x
10 | 0 | 1 | x | x

So t_2 = q_2 + q_1 q_0 x, t_1 = q_1 \overline{q_0} \overline{x} + q_2 \overline{x} + q_1 q_0 x + \overline{q_2} \overline{q_1} q_0 \overline x, t_0 = \overline{q_0} x + q_0 \overline x + q_1 x.

We also need the output equations - a function that maps q_2 q_1 q_0 and x to z. Using a K-map, we get z = q_2.

Now we can construct the circuit. Basically, this is the same design as a counter, but now we have an input, and the output is a function of the flip-flop states:

;wip: circuit diagram, counter with one additional input x to the circuit, make sure to remember to add the reset pins.
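Before drawing the circuit, the equations can be sanity-checked by simulation - a small sketch (not from the course) clocking the T flip-flop equations above on an input stream:

```python
# Sketch: simulate the T flip-flop implementation of the 1011 detector.
# Since this is a Moore machine, z = q2 goes to 1 on the clock edge
# after the pattern completes.

def clock(q2, q1, q0, x):
    nq2, nq1, nq0, nx = 1 - q2, 1 - q1, 1 - q0, 1 - x
    t2 = q2 | (q1 & q0 & x)
    t1 = (q1 & nq0 & nx) | (q2 & nx) | (q1 & q0 & x) | (nq2 & nq1 & q0 & nx)
    t0 = (nq0 & x) | (q0 & nx) | (q1 & x)
    return q2 ^ t2, q1 ^ t1, q0 ^ t0

q = (0, 0, 0)                        # start in S0 = 000
for x in [1, 0, 1, 1, 0, 1, 1]:
    q = clock(*q, x)
    print(x, q, "z =", q[0])         # z = q2; z = 1 after each 1011
```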

So the steps for synchronous design are: draw a state diagram, write a state table, optionally reduce the state table, pick a flip-flop type and count, assign flip-flop states to circuit states, find equations for the flip-flop inputs and the circuit outputs, then implement the circuit.

13/3/15

Sequential Circuit Analysis

This is the opposite of design - given a circuit, derive a state diagram.

First, we need to identify the flip-flops. The number of flip-flops determines the largest number of possible states, since they are the only memory elements in the circuit. Not all of the flip-flop states are necessarily used, but we do need to know the upper bound on the number of states. We also label the flip-flops at this point.

Then, we derive the logic equations for the flip-flop inputs (to get the state transitions) and the circuit outputs (to get the circuit output values), both in terms of the circuit inputs and the flip-flop outputs (the current state).

Now we will create something like a state table, with the columns being current state, flip-flop inputs (one for each flip-flop), next state, and circuit outputs. First, we fill in the current state column with every possible combination of values for the flip-flops. For a 2 flip-flop circuit, we use 00, 01, 10, 11.

Each remaining column is filled in with its values given the current state in that row and every combination of inputs - so if there are 3 inputs, we would have 8 values per cell.

Now we basically have a state table, by taking only the current state, next state, and output columns. We can directly draw a state diagram from this, since it is basically a DFA description. Each row corresponds to a state, and each value of the next state corresponds to a transition given that combination of inputs.
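A small sketch of this procedure for a hypothetical 2-flip-flop circuit (the D input and output equations below are made up purely for illustration):

```python
# Sketch: mechanically rebuild the state table of a circuit from its
# derived equations by enumerating every (state, input) combination.

from itertools import product

d1 = lambda q1, q0, x: q0 & x             # hypothetical equations
d0 = lambda q1, q0, x: (1 - q1) & x
z  = lambda q1, q0, x: q1 & (1 - x)

for q1, q0 in product((0, 1), repeat=2):
    cells = [f"x={x}: next={d1(q1, q0, x)}{d0(q1, q0, x)}, z={z(q1, q0, x)}"
             for x in (0, 1)]
    print(f"state {q1}{q0} | " + " | ".join(cells))
```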

When we have n states, we must have at least \lceil \log_2 n \rceil flip-flops to represent those states.

State assignment is the process of assigning binary patterns to symbolic states, like in a state diagram. The binary values we give to each state, and therefore which values in the flip-flops represent which states, could potentially make the logic for the flip-flop inputs simpler or more complex, depending on how the Boolean algebra works out. This also affects the number of flip-flops we need, when we do state reduction.

There is no known efficient way of finding an optimal assignment of binary values for the states, or any technique better than just trying every possible state assignment (the n! ways of assigning the binary values 0 to n - 1 to the n states).

The way we have been using so far, where we number the states and map states to their binary numeric values, is known as binary encoding.

If we have a Moore machine, it might be beneficial to just use the desired circuit output values as state values - the circuit outputs themselves are the states. This is called output encoding. If the outputs aren't unique, we can just add another flip-flop and use it just to differentiate the non-unique values (for example, if we have two states that both output 0000, then we might add another flip-flop that is 0 for one and 1 for another, so their assignment is unique again). This makes the output logic very simple, because we can just output the flip-flop values.

16/3/15

One hot encoding has us use one flip-flop per state, where only one flip-flop is on at a time, and the one that is on corresponds to the state we are in. For example, for 6 states we need 6 flip-flops, with state assignments like 000001, 000010, 000100, 001000, 010000, 100000.

This, obviously, uses a lot of flip-flops, but makes the flip-flop input and circuit output logic very simple. As a result, these circuits tend to be faster, since the input and output logic is smaller and shallower.

This actually makes it possible to do input and output logic directly from a state diagram - a state's flip-flop becomes 1 if any of the flip-flops that represent states that transition to it are 1 and the corresponding input condition holds. For example, if there is a transition from 010 to 100 when x = 0, then the next-state equation for q_2 gets the term q_1 \overline{x}. By constructing these terms for every transition into that state and ORing them together, we can directly get the input equations.

To get the output equations, we just do an OR for each bit, where the inputs are the states for which that bit of the output is on. For example, if an output is on whenever we are in state 1, 4, and 6, we simply use an OR connected to those states.
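A small sketch of this construction on a made-up 3-state machine (the state names and transitions are illustrative only):

```python
# Sketch: one-hot next-state logic read directly off a transition list.
# A flip-flop's D input is the OR, over all transitions into its state,
# of (source flip-flop output AND input condition).

TRANSITIONS = [              # (source, input value x, destination)
    ("A", 0, "A"), ("A", 1, "B"),
    ("B", 0, "C"), ("B", 1, "B"),
    ("C", 0, "A"), ("C", 1, "C"),
]
STATES = ["A", "B", "C"]

def clock(hot, x):
    """hot maps state name -> flip-flop value; exactly one is 1."""
    nxt = dict.fromkeys(STATES, 0)
    for src, xv, dst in TRANSITIONS:
        nxt[dst] |= hot[src] & (1 if x == xv else 0)
    return nxt

hot = {"A": 1, "B": 0, "C": 0}
for x in [1, 1, 0, 0, 1]:
    hot = clock(hot, x)
    print(x, hot)            # exactly one flip-flop stays hot each step
```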

So the common techniques for state assignment are binary encoding (which tends to reduce the number of flip-flops), output encoding (which tends to simplify output logic), and one hot encoding (which tends to reduce input and output logic at the expense of flip-flop count).

State reduction is the process of reducing the number of states, in order to simplify logic and flip-flop count. Basically, we have a state table with redundancy, and we want to remove the redundancy.

Two states are equivalent if all of the following are true:

1. They produce the same output values for every possible input.
2. For every possible input, they transition to states that are equivalent.

If these conditions are met, we can combine the states together - all transitions that go to one of the states can simply go to the other.

Given a state table, we can draw an implication chart. If we have the following table:

state | next when x = 0, next when x = 1 | output when x = 0, output when x = 1
S0 | S3 S2 | 1 1
S1 | S0 S4 | 0 0
S2 | S3 S0 | 1 1
S3 | S1 S3 | 0 0
S4 | S2 S1 | 0 0

We can draw the following implication chart:

S1  X
S2  ?  X
S3  X  ?  X
S4  X  ?  X  ?
    S0 S1 S2 S3

This is a matrix where we fill in X if the state in the X axis is definitely not equivalent to the state in the Y axis, generally by property 1 - the outputs differ. Now we check property 2, by writing in the pairs of states that, all together, are necessary and sufficient to say the state on the X axis is equivalent to the one on the Y axis.

If we have don't cares as next states, that means that there is some transition for that state and input value, but we don't care what it is. Suppose that for some input, state A goes to state W, state B goes to a don't care, and state C goes to state X. We can group B with A if we pretend that the don't care value is a W, or we can group B with C if we pretend that the don't care value is an X, but we can't group all of them together since the don't care cannot be both W and X. In other words, for don't care values we pretend that there is an actual state in the state table, but we can choose which state that is as needed.

By repeating this process, we eventually manage to get exactly those states that are equivalent:

S1  X
S2  *  X
S3  X  X  X
S4  X  *  X  X
    S0 S1 S2 S3

State pairs that are definitely equivalent are marked with a *, so S0 and S2 are equivalent, and S1 and S4 are equivalent. A merger diagram is a graph where states are vertices and edges show equivalence between states. Each connected component must be a fully connected set of states, each component becomes a single state, and the number of components is the number of final states we need.

18/3/15

The merger diagram gives us sets of states that are equivalent to each other. For example, the merger diagram from the implication chart above gives the sets \set{S0, S2}, \set{S1, S4}.

The equivalent states can now be merged. For example, if S1, S3, and S5 are equivalent, then we can replace them all with a new state A that has the same output and transitions as these states, and all transitions that went to them before now transition to A instead.
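The same equivalences can also be found algorithmically by iteratively refining a partition of the states - effectively what the implication chart does by hand. A sketch, run on the 5-state table above (don't-care entries would need the case analysis described earlier and are not handled here):

```python
# Sketch: state reduction by partition refinement. States start grouped
# by their outputs; a group is split whenever its members transition to
# different groups, until nothing more splits.

TABLE = {  # state: ((next for x=0, next for x=1), (z for x=0, z for x=1))
    "S0": (("S3", "S2"), (1, 1)),
    "S1": (("S0", "S4"), (0, 0)),
    "S2": (("S3", "S0"), (1, 1)),
    "S3": (("S1", "S3"), (0, 0)),
    "S4": (("S2", "S1"), (0, 0)),
}

group = {s: TABLE[s][1] for s in TABLE}         # initial split by outputs
while True:
    # A state's signature: its own group plus its successors' groups.
    sig = {s: (group[s], tuple(group[n] for n in TABLE[s][0]))
           for s in TABLE}
    if len(set(sig.values())) == len(set(group.values())):
        break                                   # nothing was split - done
    group = sig

classes = {}
for s in TABLE:
    classes.setdefault(group[s], []).append(s)
print(list(classes.values()))   # [['S0', 'S2'], ['S1', 'S4'], ['S3']]
```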

;wip: do this example and get \set{a, c, d}, \set{b, e, f} as the equivalent sets:

state | next when xy = 00, 01, 11, 10 | output
a | c a b - | 0
b | - a b e | 1
c | c a - d | 0
d | c - b d | 0
e | f - b e | 1
f | f - - e | 1

ASM charts show the same thing as state diagrams, but in a flow chart form. They are simply flow charts with the following elements:

1. State boxes - rectangles labelled with the state name, containing the outputs asserted whenever we are in that state.
2. Decision boxes - diamonds containing a condition on the inputs, with one exit arc per possible outcome, used to pick the next state.
3. Conditional output boxes - oval/rounded boxes placed on arcs after decision boxes, containing outputs asserted only when that path is taken (Mealy-style outputs).

In ASM charts, we only write outputs if they are 1, and just avoid writing them otherwise. ASM charts are simply another way to draw a state diagram, but in a way that looks more like an algorithm.

If we have a state diagram, and we are at a state that doesn't have a transition arrow for the current inputs, then we assume that we stay at that state. In other words, missing transitions are assumed to loop back into the current state.

20/3/15

An interesting thing about one-hot encoding is that, given an ASM chart, we can translate it directly into a circuit, without all the state tables and K-maps.

State boxes are just D-flip-flops, with the entry arc being D and the exit arc being Q.

Decision boxes are just demultiplexers, with the select line being the decision variable, the entry arc as the data input, and the exit arcs as outputs. If we are deciding based on only one bit, this is equivalent to the circuit \text{exit 0} = D \wedge \overline{\text{variable}}, \text{exit 1} = D \wedge \text{variable}, where the entry arc is D, the arc labelled 0 is exit 0, and the arc labelled 1 is exit 1.

If more than one arc goes into a box, they are all OR'd together and then this is passed in as a single input.

Conditional output boxes are not directly translatable, but can still easily be drawn into the circuit. Basically, we figure out an expression for when that output is true with respect to the states and the inputs, and then add combinatorial logic to the circuit that implements the output.

Asynchronous Sequential Circuits

Asynchronous sequential circuits are those with memory, but no clock. These are characterized by having combinational feedback paths and tend to use latches - the feedback paths don't have to wait for the next clock edge to change the inputs. Since they are not clocked, asynchronous sequential circuits can be made to work as fast as their logic allows.

In general, an asynchronous sequential circuit is a circuit with an n-bit input, an m-bit output, and also a k-bit output that feeds back into another k-bit input. This feedback loop also has a hypothetical delay \Delta on it. Within the circuit, there is only combinatorial logic.

;wip: diagram

This cannot be analyzed using the techniques we used for purely combinatorial or clocked circuits. However, we observe that these circuits have memory, and therefore states that we can analyze, even though they are not clocked.

In fact, the k-bit output can be used as the next state (the excitation variables Y). The k-bit input can be used as the current state (the secondary variables y). Since Y is connected to y by a wire with a hypothetical delay, y is just a time delayed version of Y.

For a given set of circuit inputs, a circuit is stable if and only if Y = y. Otherwise, it is unstable and has changing Y values.

For example, Y = \overline{x y} is an asynchronous sequential circuit - when x = 0 it is stable only if y = 1 (since Y = 1), and when x = 1 it is unstable for either value of y (since Y = \overline y, the output keeps toggling).

Generally, we want circuits to go from stable state to stable state in response to inputs. It is often easier to design circuits if we can assume only one input will change at a time (the fundamental mode assumption, where the circuit operates in fundamental mode).

Consider the asynchronous sequential circuit Z = \overline{Y_1} Y_2, Y_1 = x Y_1 + \overline{x} Y_2, Y_2 = \overline{x} Y_2 + x \overline{Y_1}.

Note that in the circuit, only Y_1, Y_2 feed back on themselves - these are the feedback lines (the variables that appear on the right side of the formulas that also appear on the left).

To analyze this, we hypothetically cut these lines, so Y_1, Y_2 are now outputs, and y_1, y_2 are now inputs. This is basically the process in which we insert the hypothetical delay on the feedback loop, in order to separate the input from the output. Even though they are connected by wire, y_1 is distinct from Y_1 in that there is a time delay in the signal at Y_1 propagating to y_1.

Now we have Z = \overline{Y_1} Y_2, Y_1 = x y_1 + \overline{x} y_2, Y_2 = \overline{x} y_2 + x \overline{y_1}, where y_1, y_2 become Y_1, Y_2 after a short time.

Clearly, we can now use (y_1, y_2) as the current state, and (Y_1, Y_2) as the next state. This allows us to analyze the circuit using standard sequential circuit analysis.

First, we can draw a transition table. This is the same thing as a state table, but for asynchronous circuits:

current state y_2 y_1 | next state when x = 0, x = 1 | Z when x = 0, x = 1
00 | 00, 10 | 0, 1
01 | 00, 01 | 0, 0
10 | 11, 10 | 0, 1
11 | 11, 01 | 0, 0

Note that whenever the next state is equal to the current state, the circuit is stable, and whenever it is not, there is a change of state. As a result, the circuit is stable when x = 0 and the current state is 00 or 11, and when x = 1 and the current state is 01 or 10.
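A quick sketch that rebuilds this transition table from the equations and marks the stable total states:

```python
# Sketch: enumerate all (input, state) combinations of the circuit
# above; a total state is stable exactly when Y equals y.

for x in (0, 1):
    for y2 in (0, 1):
        for y1 in (0, 1):
            Y1 = (x & y1) | ((1 - x) & y2)
            Y2 = ((1 - x) & y2) | (x & (1 - y1))
            Z = (1 - Y1) & Y2
            mark = "  (stable)" if (Y1, Y2) == (y1, y2) else ""
            print(f"x={x} y2y1={y2}{y1} -> Y2Y1={Y2}{Y1} Z={Z}{mark}")
```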

24/3/15

A hazard/glitch is a momentary unwanted transient output in the circuit, caused by different paths in a combinatorial circuit having different propagation delays.

For example, suppose we have a signal directly connected to an XOR gate's input, and also connected to the gate's other input via two NOT gates in series. When we turn on the signal, even though the gate theoretically gets 1 on both inputs at the same time, the propagation delay of the NOT gates means that one input will turn on before the other. Therefore, the XOR gate will momentarily turn on even though both of its inputs ultimately come from the same signal.
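A minimal unit-delay simulation of this example (assuming, for illustration, that every gate output changes exactly one time unit after its inputs):

```python
# Sketch: signal s drives one XOR input directly and the other through
# two NOT gates; every gate has one time unit of propagation delay.

s_wave = [0] * 3 + [1] * 5             # s switches from 0 to 1 at t = 3
n1, n2, out = 1, 0, 0                  # settled gate outputs for s = 0
for t, s in enumerate(s_wave):
    print(f"t={t}  s={s}  n2={n2}  xor={out}")
    # each gate's next output is computed from its current inputs
    n1, n2, out = 1 - s, 1 - n1, s ^ n2
# the XOR output blips to 1 at t = 4 and t = 5 before settling back to 0
```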

Hazards are very important in asynchronous circuit design, because a transient value can be latched through the feedback as if it were a real state change, causing unwanted transitions to the wrong stable states.

A static-0 hazard is a situation in which the output should stay at 0, but temporarily switches to 1 in the circuit. A static-1 hazard is a situation in which the output should stay at 1, but temporarily switches to 0 in the circuit.

A dynamic hazard is when the output should switch values, like from 0 to 1 or 1 to 0, but flips between the values, possibly multiple times.

To analyze hazards for a particular circuit for a particular variable, we need to draw a timing diagram with graphs for the logic level at each node in the circuit, and basically do a timing simulation of the circuit for when that variable changes. Hazards show up in the output as blips where there should be smooth, clean transitions.

When we have a 2-level sum-of-products circuit, notice that all the hazards are static-1 hazards, and happen when we "switch between terms" - static-0 hazards can't happen because no AND gates will temporarily turn on, and dynamic hazards cannot happen because only 1 AND gate is on at a time.

Basically, when moving from one AND gate to the next, it is possible for the one we are leaving to become 0 before the new one becomes 1. On the K-map, this is when we have two rectangles that don't overlap:

q_2 \ q_1 q_0 | 00 | 01 | 11 | 10
0 | 0 | 1 | 0 | 0
1 | 0 | 1 | 1 | 1

If we have only the two 2-box rectangles, then there is a hazard when q_1 switches, since we are switching between terms. To fix this, we can add a third rectangle that includes 101 and 111, so the transitions between 101 and 111 are fully covered by another AND gate while we switch between them - this prevents the OR gate from going low.
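A sketch of this with unit gate delays, simulating the falling edge of q_1 (the direction that glitches under this simple delay model), with and without the redundant term q_2 q_0:

```python
# Sketch: f = (NOT q1) q0 + q2 q1, while q2 = q0 = 1 and q1 falls.
# The AND gate we are leaving (q2 q1) turns off one unit before the one
# we are entering turns on (its q1 goes through an inverter first), so
# the OR output glitches low unless a redundant term covers the switch.

def simulate(redundant):
    q2, q0 = 1, 1
    inv, and1, and2, and3, out = 0, 0, 1, 1, 1   # settled for q1 = 1
    trace = []
    for t in range(8):
        q1 = 1 if t < 3 else 0                   # q1 falls at t = 3
        trace.append(out)
        inv, and1, and2, and3, out = (           # unit delay per gate
            1 - q1,
            inv & q0,
            q2 & q1,
            q2 & q0,
            (and1 | and2 | and3) if redundant else (and1 | and2),
        )
    return trace

print(simulate(False))   # [1, 1, 1, 1, 1, 0, 1, 1] - static-1 glitch
print(simulate(True))    # [1, 1, 1, 1, 1, 1, 1, 1] - hazard removed
```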

If we remove all these static-1 hazards using this redundant term technique, we can guarantee that the circuit is entirely hazard-free. For products of sums, it is possible to get static-0 hazards, when "switching between factors", and likewise we can fix these using redundant factors.

For asynchronous circuits, we just treat the fed-back state variables as additional inputs, and make sure that there are redundant terms that cover transitions between 1 values.

This allows us to make hazard-free 2-level combinatorial and asynchronous circuits, but multi-level circuits are more difficult. One solution is to convert the multi-level circuit into a 2-level circuit, but this could potentially make the circuit much larger.

Another solution is to use SR latches - since we have the hold state, we can have transient 0 values (or 1 values for SR NAND latches) appearing without any issues, since that just causes the output to hold.

To use SR latches, we just determine equations for when the output should be on (the set line) and when it should be off (the reset line), using techniques like drawing a K-map. When we do this, any static-1 hazards are eliminated, since the latch will just hold its last value if this occurs. However, this adds additional complexity and results in 2-level circuits anyways.

In flow/transition tables, the output value for some states may be unspecified - generally, for unstable states. Earlier we treated these as don't cares for calculating the output combinatorial equations, but in our state machines we will be passing through these states, and we need to make sure we don't have any hazards when we do so. For example, an unstable state that outputs 1 in between stable states that output 0 is a static-0 hazard.

Therefore, to avoid hazards, whenever we have a transition between stable states that goes through an unstable state, we need to make sure we aren't introducing a hazard. If we transition from a state outputting 0 to a state outputting 0, then any unstable states in between must also output 0. If transitioning from one that outputs 1 to one that outputs 1, then any unstable states in between must also output 1. Otherwise, the source and target states output different values, so the unstable states in between just need values that don't cause extra transitions - the output should change at most once along the way.

25/3/15

A race is a potential problem in asynchronous circuit design. When an input changes, and in response requires two or more state variables to change, a race is possible. When two state variables change at once, they don't actually change exactly at once due to circuit timing, and one might change before the other.

A non-critical race is when the state machine ends up in the same state after both variables changed, regardless of which state variable changes first. If this is the case, we don't really need to care about the race, since the result will still be correct.

A critical race is when the state machine could end up in different states depending on which state variable changes first. There is a "competition" between the state variables to change the next state outputs first. This is a bad thing to have and should be avoided if possible. It might not happen in the real circuit if the delays are right, but in theory it is still incorrect.

So in the flow table or state diagram, we have to trace the logic to make sure that the state machine will go to the same stable state for every possible combination of state variable changes.

If we have the following state machine:

current state y_2 y_1 | next state when x = 0, x = 1
00 | 00, 11
01 | 00, 01
10 | 00, 11
11 | 00, 01

Assuming we start at state 00 and x = 0, if x changes to 1, both y_2 and y_1 must change. Therefore, it is possible that we first go to 01 or 10, rather than directly to 11 as we would expect, because each variable could change after a different delay.

If we go to 11 first, then the circuit will settle down into 01 as a stable state. If we go to 01 first, then the circuit will also go to 01. If we go to 10, we will still go to 01. Since we go to 01 regardless of which possible state we might go to first, we have a non-critical race.
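A small sketch that explores this race mechanically, following every order in which the racing state variables could change (with x held at 1):

```python
# Sketch: starting from state 00 just after x changes to 1, follow
# one-bit-at-a-time changes and collect every stable state reachable,
# for each possible order of the racing variables.

NEXT = {  # (y2, y1) -> (Y2, Y1) when x = 1, from the table above
    (0, 0): (1, 1),
    (0, 1): (0, 1),
    (1, 0): (1, 1),
    (1, 1): (0, 1),
}

def settle(state):
    target = NEXT[state]
    if target == state:
        return {state}                    # stable: Y = y
    results = set()
    for i in (0, 1):                      # either differing bit may win
        if state[i] != target[i]:
            inter = list(state)
            inter[i] = target[i]
            results |= settle(tuple(inter))
    return results

print(settle((0, 0)))   # {(0, 1)}: all orders settle in 01 (non-critical)
```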

Races are introduced during state assignment - if we manage to assign states such that every input transition causes only one state variable to change, then we will never have races. We can do this by taking all the state values, like 00, 01, 10, 11, and drawing a graph G where they are the nodes, and there are bidirectional edges between those nodes that differ by only 1 bit. Then, we try to find the state diagram graph as a subgraph of G. This is often done by trial and error.

Basically, in state assignment we can avoid races entirely by having the next states of each state differ from that state by only one bit. If we do this, we guarantee that there is only 1 state variable changing at a time, so races are impossible.

A cycle is when there is a set of unstable states that cycle between each other. For example, a JK-latch set to toggle is in a cycle, and will keep oscillating between the states as long as the inputs allow for the cycle to exist.

For example, if we have the following transitions: a \to b, a \to c, b \to a, b \to d, c \to b, c \to d, d \to a, d \to c, we can use a cube-like graph for the state values 000, \ldots, 111. Although we might get a race by having a state X transition to state Y if they differ by 2 bits, we can add an unstable state that always transitions to Y, and then transition to that unstable state instead.

For example, if we wanted to get from 00 to 11, and the state value 01 is unused, we can make 01 an unstable state that transitions to 11, and then just transition to 01 instead of 11 - now, we have two transitions, but each one only changes one bit. Using trial and error, we might get a state assignment a = 000, b = 001, c = 010, d = 110, t_1 = 100, t_2 = 011, t_3 = 101, t_4 = 111 using the additional transitions t_1 \to a, t_2 \to b, t_3 \to t_4, t_4 \to d.

27/3/15

Note that state transitions using one-hot encoding always change exactly two bits. To eliminate races, we can use the technique outlined before where we add unstable states that transition to our target state. For example, to transition from a state 0010 to 0100, we might transition from 0010 to a temporary, unstable state 0110, that then always transitions to 0100. We can actually add these unstable states systematically for every possible pair of states.

To support transitioning back and forth in both directions, we could simply have those unstable states transition to a different state depending on the input, rather than always transitioning to a single state. It can be proven that this will always be possible.

Basically, this is constructing a 4 dimensional space in which we can easily transition to any state using at most just a single unstable state.

Real world problems have far more states, and heuristics get a lot harder. We want to find an algorithm so the computer can do the work instead.

There is one more technique for eliminating races, which only works for state tables with 4 or fewer states. The basic idea is to make a cube where each state takes up 2 adjacent vertices, and every state is adjacent to every other state. Since every state is adjacent to every other, it is trivial to transition to any state (or its duplicate) while changing only 1 bit:

    d2 ------- c2
   /|         /|
  / |        / |
 /  d1 -----/- c1
b1 ------- b2 /
| /        | /
|/         |/
a1 ------- a2

We first duplicate every state in the state table, by writing every row twice, so now the states a, b, c, d become a_1, a_2, b_1, b_2, c_1, c_2, d_1, d_2. Now, when we want to transition to any state, we simply choose which duplicate we transition to - there is always one that differs by only 1 bit. Using the cube, we find that one possible state assignment is a_1, \ldots, d_2 = 000, 001, 100, 101, 011, 111, 010, 110.

If we have an asynchronous circuit with latches, we can just view the latches as a circuit element/black box, where the latch outputs are the next state. We proceed with analysis just as before, getting the next state/outputs as a function of the current state, but we consider the latch inputs as well as the circuit inputs. This is basically just a way to simplify asynchronous circuit analysis by using the equation for the SR latch directly.

A NOR SR latch has the equation Y = S \overline R + \overline R y. As a result, for any circuit we can just substitute the equations for S and R, and get Y in terms of the circuit inputs and feedback lines, which then allows us to write a transition table.

30/3/15

The exam is April 20, 2015, from 4pm to 6:30pm, in PAC 4, 5, and 6. There are 5 questions: 1 from each quiz, 2 on synchronous circuit design, and 1 on asynchronous circuit design.

Creative Commons License This work by Anthony Zhang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Copyright 2013-2017 Anthony Zhang.