Lecture Notes by Anthony Zhang.

CO487

Applied cryptography.

Alfred Menezes
Section 001
Email: ajmeneze@uwaterloo.ca
Office hours: Mondays 3:00pm-5:00pm, Fridays 1:00pm-3:00pm in MC 5026
Mondays/Wednesdays/Fridays 11:30pm-12:20pm

4/3/16

All course resources are on LEARN. Course has 5 assignments (summing up to 20% of final grade), midterm (worth 30%), and a final exam (worth 50%). Midterm is March 8, 2017 at 7-9pm.

Cryptography is a tool we can use to secure communications when there are malicious adversaries. Some of the fundamental goals of cyptography are:

Different applications might require different subsets of these goals.

In this course, we use Bob and Alice as placeholders to represent two parties trying to communicate securely. We use Eve or Mallory as a placeholder to represent one or more adversaries trying to attack the communications (by injecting data, eavesdropping, or other approaches, depending on the threat model)

Some famous early practical uses of cryptography was the enigma machine, which used multiple spinning rotors to encrypt/decrypt messages. In that cryptosystem, the German military leadership and the German U-boats had the role of Bob and Alice, trying to communicate confidentially, and Alan Turing and his team had the role of Eve, trying to break the confidentiality guarantees and read those communications.

However, cryptosystems like Enigma and its successor, Lorenz, are a far cry from modern cryptosystems, which have a much more mathematically sound foundation. Modern cryptography is what makes online banking, online shopping, and cellular networks possible.

For example, SSL, the protocol that makes online security possible, ensures origin authentication and confidentiality between a user and a website. SSL uses symmetric-key encryption to implement confidentiality, and a MAC scheme (HMAC) to implement origin authentication. Of course, to make this possible they both need to have the same secret key for the symmetric-key cryptography, and for the MAC. To confidentially and authentically share this secret key, we use public-key encryption. Of course, to make this possible we need a way to obtain authentic copies of the public keys of both parties - this is implemented using digital signatures, where trusted third parties known as certificate authorities keep track of public keys that are considered authentic (generally by vetting the site owner in the real world), and digitally sign them to confirm that they're authentic. The certificate authorities' public keys are pre-installed in the browser, which acts as a root of trust - if you trust that certificate authority's public key is authentic, and that the certificate authorities are correctly keeping track of websites' authentic public keys, then you can also trust that the website's presented public key is also authentic.

SSL is one of the most successful cryptosystems ever deployed, used for tons of sites on the internet. However, there are a number of possible weaknesses that might result in the guarantees being broken:

This demonstrates some useful concepts:

Cryptography is only one part of a large information security ecosystem, including other things like secure operating systems, auditing mechanisms, trusted computing, and risk analysis. In this context, cryptography provides a lot of useful and essential tools, but it's not all there is to security. Attackers will generally target the weakest part of the system, and if that link fails, it is possible that the entire system will.

This course focuses on breadth over depth. For depth, take CO 485 and try out the readings for this course as well.

6/1/17

SSL in more detail:

  1. Client makes request to the website (not the full URL, just the host).
  2. Server responds with the website's certificate, which contains the website identification info (like host and etc.).

Symmetric-key encryption

A symmetric key encryption scheme (SKES) is a definition containing:

To use one of these to implement message confidentiality, Alice and Bob agree on a particular k \in K over a secure channel (so that nobody else knows about the value of k). Then, Alice can compute c = E_k(m), where m is the message, and then sends c to Bob over an unsecure channel. Bob can read the message by then computing m = D_k(c), while anyone without k would have to figure out the correct D_k function to use. An SKES can be designed such that finding the correct D_k would be very difficult.

A substitution cipher is an SKES, under this definition: M is the set of all English messages, C is the set of all encrypted messages, K is the set of all permutations of the English alphabet, E_k maps letters onto k by index, and D_k does the inverse mapping.

A security model defines what the adversay is capable of, like what they can do to the communicating parties. For example, there are passive attacks like ciphertext attacks (attacker can obtain ciphertext, like if they're listening on the same network), known-plaintext attacks (attacker knows some of the plaintext as well the resulting ciphertext, like knowing that someone always signs off their message with their name). There are also active attacks, like chosen-plaintext attacks (attacker can choose some part of the plaintext), clandestine attacks (attacker is willing to use bribery/blackmail/etc.), and side-channel attacks (power analysis, RF emissions analysis, timing attacks).

The security model also includes the attacker's computational abilities. For example, information-theoretic security models assume the attacker has infinite computational resources, complexity-theoretic security models assume the attacker has a turing machine capable of computing any polynomial-time algorithm efficiently, and computational-theoretic security models assume the attacker simply has a lot of computers available.

The attacker's goals, in decreasing order of priority: recovering the secret key, recovering plaintext from ciphertext (without the key), or learn some characteristics of the plaintext given ciphertext (besides message length).

An SKES is secure if and only if it is resistant to a chosen-plaintext attack by a computationally bounded adversary (computational-theoretic security). A secure SKES is necessary but not sufficient to guarantee confidentiality.

Additionally, we also assume that the adversaries know everything about the cryptosystem except the plaintext and the key, including the all the algorithms and communicatoin mechanisms.

An SKES should ideally be efficient (for encryption/decryption), have small keys (but not so small that it makes brute force attacks possible), and be secure (even against the designer of the SKES).

A work factor measures how hard a task is, in terms of how many computations are needed to complete it: 2^{56} operations is easy, 2^{64} is feasible, 2^{80} operations is barely feasible, and 2^{128} operations is infeasible. This will change as our computers get faster. For example, the entire bitcoin network is doing 2^{61} hashes per second. However, we use 2^{128} operations as infeasible because the Landauer limit tells us doing that many operations on a classical computer would take a significant fraction of the world's power output for a year.

For example, a substitution cipher is not secure, because it's trivially broken by a chosen-plaintext attack. By choosing the alphabet as the message, the ciphertext is just the secret key. In fact, it's not even secure against ciphertext-only messages. Although it's infeasible to exhaustively try every permutation of the alphabet for valid-looking decrypted messages, we can simply use letter frequency analysis to try likely candidates first.

A polyalphabetic cipher uses multiple alphabet permutations for substitution, choosing between them with a particular algorithm. For example, a Vigenere cipher has a word with non-repeated letters added to the message's letters mod 26. This is nice because as the message grows longer, the letter frequency graph is a bit flatter. We can break that by using a chosen-plaintext attack with plaintext "AAAAAAAAAAAAAAAAAA", so if "A" corresponds to the numerical value 0, the ciphertext is just the key. Therefore, the Vigenere cipher is not secure.

From now on, we assume plaintext, keys, and ciphertext are all binary strings. What happens when we apply the concept of a Vignere cipher, but with a uniformly random key that's as long as the entire message?

A one time pad XORs a random key with the plaintext (the key is as long as the message). However, it's important to not reuse keys, because c_1 = m_1 \oplus k_1 and c_2 = m_2 \oplus k_2 implies that c_1 \oplus c_2 = m_1 \oplus m_2, which leaks a lot of information about the plaintext. If the key is uniformly randomly selected, every key is equally likely, so every ciphertext is also equally likely. This is information-theoretically secure - a one-time pad provably cannot be broken by a ciphertext-only attack even by attackers with infinite computational resources. While one-time pads have these nice properties, in practice they are hard to use because the key has to be as long as the ciphertext, making sharing keys a pain.

A stream cipher is like a one-time pad, but instead of a truly random key, we use a pseudorandom bit generator, where the PRBG's seed is the secret key. This is no longer perfectly secure because it depends on the quality of the randomness of the PRBG, but it's often a practical compromise since the key can be a lot shorter. Like with a one-time pad, we also shouldn't re-use keys, since that would give us some of the values of the PRBG's output by XORing the ciphertexts, which would make it easier to learn the PRBG's state. One example of a stream cipher is RC4.

9/1/17

The PRBG should satisfy two requirements:

Most random number generators built into programming languages are not cryptographically secure - they're only intended to satisfy the indistinguishability requirement. For example, rand in UNIX uses a linear congruential generator with known constants, which is straightforward to predict.

RC4 was designed by Rivest, the R in RSA. It was used in everything from SSL/TLS to Adobe Acrobat, as one of the most popular stream ciphers around. It's nice because it's very fast and has variable key sizes, but was proprietary for a long time has a lot of weaknesses, even though none of them are catastrophic. It consists of a key scheduling algorithm, and a keystream generator.

The key scheduling algorithm generates a random-looking permutation of 0, \ldots, 255. It starts by initializing S with 0, \ldots, 255, and \overline{K} to the key repeated over and over until it fills 256 entries in the array. Then, the array entries in S are swapped based on j_{i + 1} = (S[i] + \overline{K} + j_i) \mod 256. The keystream generator just applies that permutation to generate the keystream, which is then XORed with the plaintext to get the ciphertext.

Wireless networks lose the physical security of wired networks, and security for those networks is a lot harder because attackers can do it from a distance with no physical evidence. The original WiFi standard, IEEE 802.11, includes the Wireless Equivalent Privacy (WEP) protocol for protecting link-level data in transit between clients and access points. WEP was intended to provide confidentiality using RC4, data integrity using a checksum, and access control by rejecting improperly encrypted packets.

In WEP, the client first shares a 40-bit or 104-bit key with the access point (this was because the US classified cryptography as munitions, disallowing most cryptography with keys greater than 40 bits for export; and keys greater than 104 bits for domestic use). Messages are then divided into fixed-size packets, which are then each encrypted with a per-packet initialization vector (IV). The issue is that WEP didn't specify certain things, like how the key should be distributed, and how IVs should be managed.

Implementations ended up using one shared key per LAN, infrequently changed, and just generating random IVs or using consecutive integers as IVs. To send a packet, a party would select a 24-bit IV, compute the CRC-32 checksum of the message, the checksum is then appended to the plaintext, and then XORed with the RC4 keystream to get the ciphertext, where the key is the IV the shared WEP key appended to the IV. The sender then transmits the IV and the ciphertext. The receiver then, having the IV and the WEP shared key, gets gets the plaintext concatenated with the checksum, and then verifies that the checksum is correct, rejecting the packet if the checksum doesn't validate.

Turns out, WEP implements none of its goals. One problem is IV collision - since the IV is only 24-bits, the birthday paradox means that only about 2^{12} packets are needed for a collision. If two packets have the same IV, then c_1 \oplus c_2 = m_1 \oplus m_2 - the ciphertexts can be XORed to get the plaintexts, which can then be analyzed using known plaintexts or statistical analysis, so confidentiality is broken. Another problem is that the checksum is linear - we can make certain changes to the ciphertext while still ensuring that the checksum will verify correctly, so data integrity is broken. Finally, the checksum is unkeyed, so knowing the plaintext for one encrypted packet is enough to get the RC4 keystream and encrypt messages properly themselves.

Shamir and co. (the S in RSA) came up with an even better attack in 2001, based on the fact that 104-bit keys were infrequently changed, the IV is very predictable/randomly selected, and we know the first byte of the plaintext (the protocols add known headers to the plaintext before encrypting). With 5 million packets, the secret key itself can be recovered. Modern techniques can recover the key in just 40000 packets.

IEEE published updated wireless security standards. For example, WPA was a temporary replacement, and WPA2 came out afterwards, using AES instead of RC4, and designed to be much stronger.

11/1/17

Assignment 1 should be started now.

Attacks only get better over time, never worse. In WEPs case, it wasn't the security of the algorithms like RC4 that were broken, it was the use of those algorithms that was incorrect. In the future, thanks to upcoming advances related to Moore's law and quantum computing, currently secure cryptosystems could easily become broken.

A block cipher is the other common type of SKES. While a stream cipher encrypts one bit at a time, a block cipher breaks up the plaintext into fixed-length blocks, encrypting the message one block at a time. Some commonly known block ciphers are DES and AES.

DES has a 56-bit key length (which is now relatively easy to break), and a block size of 64 bits. AES (Rijndael, "rined'all") is its successor, and has a 128/192/256 bit key length. There are currently no significant known attacks on AES.

Clause Shannon gave a couple of principles for designing good block ciphers:

Also, block ciphers should be fast, so we can use them in more applications.

A Feistel cipher is a class of ciphers, including DES. Feistel ciphers are parameterized based on n (half of block size), h (number of rounds), and l (key size). It generates h subkeys from the cipher key, one for each round. Each key is used to generate a component function f_1, \ldots, f_h that "scrambles" its input.

Each block m is then broken into two halves, \tup{m_0, m_1}. Then, we perform h encryption rounds: at each round i, the right half gets moved into the left half, and the original left half is XORed with f_i applied to the original right half, and that becomes the right half. After those h rounds, the two halves are the ciphertext.

13/1/17

;wip: feistel ciphers

The New Data Seal cipher was invented by IBM as the predecessor to DES. It's a Feistel cipher, and is a relatively complicated one, but it also happens to be completely broken today to chosen-plaintext attacks. It has a block size of 64 bits, and uses 16 rounds, so n = 64, h = 16. Here's the basic idea:

  1. Let S_k: \set{0, 1}^8 \to \set{0, 1}^8 be the secret key - an arbitrary function of a byte.
  2. Let f: \set{0, 1}^{64} \to \set{0, 1}^{64}, the component function, be defined as follows:
    1. Since the input is 64-bits, let z = \tup{z^1, \ldots, z^8} be the 8 bytes of the input.
    2. For each byte z^j, let n_1^j = z^j[7:4] and n_2^j = z^j[3:0] - the two nibbles of the byte.
    3. Let t = S_k(z^1[7] \ldots z^8[7]) - the key function applied to the byte obtained by taking the first bit of each byte in z.
    4. For each byte z_j, if bit j of t is 1, let p_1^j = S_1(n_2^j) and p_2^j = S_0(n_1^j), otherwise let p_1^j = S_0(n_1^j) and p_2^j = S_1(n_2^j). S_0, S_1 are functions defined by the NDS specification (that are too long to list here), and we're swapping the nibbles if the corresponding bit of the key function S_k is true.
    5. Output P(p_1^1 p_2^1 \ldots p_1^8 p_2^8) - a scrambling function P applied to all of the scrambled and permuted nibbles concatenated together. P is also a function defined by the NDS specification (that is too long to list here).
  3. For each 16-byte block of the input z = \tup{z^1, \ldots, z^8} (each block contains 2n bits), run the Feistel cipher ladder with f as the component function:
    1. Let m_0, m_1 be the two 8-byte halves of the 16-byte block.
    2. For 16 rounds 1 \le j \le 16, set m_j to m_{j - 1} and m_{j + 1} to m_{j - 1} \oplus f(m_j) (simultaneously).
    3. Output m_{16} m_{17}.

Since S_k can be considered a random function, there are 2^8 possible values of S_k(x) for each of the 2^8 possible values of x - we might think of serializing the function as a 256-byte sequence. That means there are 256^{256} possible values of S_k, or 2^{2048}, which makes trying every possible S_k computationally infeasible.

In fact, we can recover the entire secret key from the ciphertext in a few hundred chosen plaintext attacks. The main issue is that there's no subkeys - each round used the same component function and secret key, rather than deriving unique component functions for each round - each f_1, \ldots, f_h is the same function! Let T denote the function that does one round of encryption, with regard to the secret key S_k. Basically, T(\tup{m_{i - 1}, m_i}) = \tup{m_i, m_{i - 1} \oplus f(m_i)} for some fixed function f and the two halves of the message are m_0, m_1.

Clearly, applying T(m) to the message 16 times is the whole encryption function. Let's represent this as T^{16}(m). Let F = T^{16} - this is the full encryption function, since the cipher is just applying T 16 times. Clearly, T(F(m)) = T(T^{16}(m)) = T^{17}(m) = T^{16}(T(m)) = F(T(m)) for any byte m.

Here's the attack. For every possible byte r, we want to determine S_k(r). Select u = \tup{m_0, m_1} such that the byte formed by taking the first bit of each byte in m_1 is r, and p_1^j \ne p_2^j for 1 \le j \le 8 - the scrambled nibbles aren't the same within each of the 8 bytes in m_1.

As an aside, x^* seems to mean "the first bit of each byte in x".

By the rules of the chosen plaintext attack, we can get Bob to give us \tup{a, b} = F(u), but not what F(m) or T(m) are (we know the values of the function at this point, but that's it).

However, we know that T(F(u)) = F(T(u)) = \tup{b, \cdot}. Since the value of S_k(r) is only a byte, we can guess every possible value of S_k(r), and check each guess t by computing T_t(u) and then get Bob to give us F(T_t(u)) = \tup{c, d}. Clearly, b \ne c implies that S_k(r) \ne t, and b = c implies that S_k(r) = t is very very likely (there is a tiny chance of accidentally getting it right). This is because we're assuming that F works roughly like a random permutation, so since b and c are 64 bits, F(T_t(u)) has only a \frac 1 {2^{64}} probability of b = c without S_k(r) = t, because we selected u such that p_1^j \ne p_2^j. Eventually, we get every possible value of S_k.

Also, since we check every possible byte r (256 values), and for each value of r, we guess-and-check possible bytes t (on average, 128 checks, worst case 256), we should expect 128 \times 256 = 2^{15} chosen plaintexts on average before recovering the entirety of S_k.

The entire attack can be summarized as:

  1. For each possible byte r:
    1. Find a byte u = \tup{m_0, m_1} such that m_1^* = r and p_1^j \ne p_2^j, 1 \le j \le 8.
    2. Get Bob to compute F(u) = \tup{a, b}.
    3. For each possible byte t:
      1. Compute T_t(u).
      2. Get Bob to compute F(T_t(u)) = \tup{c, d}.
      3. If b = c, we know that S_k(r) = t with overwhelming probability, and then go to the next r.
  2. Now we have S_k(r), the secret key. We can use this to decrypt the ciphertext.

Basically, we need about 2^{15} chosen plaintexts to recover the entire secret key, which is quite feasible on today's computers.

16/1/17

Chosen plaintext attacks are not always feasible, but when they are they're very powerful. For example, consider a mail forwarder that encrypts incoming emails and forwards them to another destination.

DES is one of NDS' successors. Originally with 64-bit keys, the NSA weakened the final standard to 56 bits. DES is pretty weak compared to modern ciphers, but 3-DES is still commonly in use. The design principles are classified, but it's been heavily analyzed.

DES is a Feistel cipher with 16 rounds, a 64-bit block and a 56-bit key. For each round of the cipher, a 48-bit subkey is derived from the 56-bit secret key, by selecting 48 bits from the secret key. The plaintext halves are also expanded from 32-bits to 48-bits. After each Feistel round completes, we use 8 S-box functions (which are publicly known) to scramble the XORed result and reduce 48-bit results to 32-bits. This is the source of nonlinearity in DES, and is very important for security.

With a key space of size 2^{56}, it's perfectly practical to just brute force today, even if we can't do it on our laptops just yet. A massively parallel effort broke a DES-encrypted message in just over 22 hours in 1999, in response to a challenge set by RSA security. Also, it's got a block size of 64-bits, which means that you'd expect a collision at around 2^{32} uniformly distributed blocks, thanks to the birthday paradox. So, on a busy network, if Alice and Bob are sending many blocks back and forth, Eve might observe a duplicated block, which tells us a little bit of information about the plaintext - those two corresponding blocks in the plaintext are identical (this isn't much of a problem in practice, but is still an issue).

Differential cryptoanalysis attacks can be used to recover the key in 2^{47} chosen plaintexts, though this is usually an infeasibly large number of chosen plaintexts in practice (turns out, DES was specifically designed to guard against this - the NSA was aware of the attack before it was public knowledge). Linear cryptoanalysis attacks can do the same thing in just 2^{43} chosen plaintexts, which is slightly more practical.

After these weakenings, 3-DES was invented as a more secure SKES in order to take advantage of existing DES hardware (DES was designed for hardware acceleration). Basically, it's just DES applied to DES applies to DES applied to the plaintext. Note that applying ciphers repeatedly doesn't always provide more security (consider a substitution cipher, for example), but in this case it provides reasonably good. Another variant is Double-DES, which is DES applied to DES applied to the plaintext (each DES function has its own key, so we have a 112-bit key overall).

Turns out Double-DES is vulnerable to a meet-in-the-middle attack. Basically, if E_{k_1}, E_{k_2} are the two DES encryption functions in Double-DES and c = E_{k_2}(E_{k_1}(m)), then E^{-1}(c) = E_{k_1}(m).

Suppose we have 3 known plaintext/ciphertext pairs \tup{m_1, c_1}, \ldots, \tup{m_3, c_3}. For each possible 56-bit k_2, decrypting c_1 with our guess for k_2, and store that plaintext and the key used in a table, indexed by the plaintext. Then, for each possible 56-bit k_1, try encrypting the m_1 with our guess for k_1, and see if it matches any entry in the table. If there's a match, and those guesses for k_1, k_2 also satisfy c_2 = E_{k_2}(E_{k_1}(m_2)) and c_3 = E_{k_2}(E_{k_1}(m_3)), then our guesses for k_1, k_2 are the actual secret keys.

Due to this attack, Double-DES is only slightly better than plain DES, with about 2^{57} bits of security and requiring about 2^{56}(64 + 56) bits of memory to store the table. The main hard part is getting the ~1 exabyte of storage needed to mount this attack, but it's quite within the realm of possibility. We can even do a time-memory tradeoff and do 2^{56 + s} DES operations in return for 2^{56 - s} bits of memory, which can easily bring the memory requirements down to a manageable size while keeping the number of operations feasible.

18/1/17

Why do we need 3 pairs of known plaintext/ciphertext pairs rather than just 1 or 2? Well, for each key in the key space, we can think of the encryption function is just a random function (it's not, but this is a good enough assumption for our purposes).

Let k be the actual l-bit secret key (so E_k(m) = c) and k' (also l bits) be our guess for this key. If E_{k'} is a uniformly random function (from our assumption), then it clearly has a \frac 1 {2^L} probability of satisfying E_{k'}(m) = c, where L is the number of bits in the plaintext - it might just randomly happen to output the right ciphertext given our particular plaintext. What's the probability that our guess is correct if it satisfies E_{k'}(m) = c?

Suppose E_{k'}(m) = c and k' \ne k (our guess for the key is incorrect, but gives the right ciphertext for our particular plaintext). Then the number of E_{k'} such that E_{k'}(m) = c is \frac{2^l - 1}{2^{L}} (2^l - 1 is the number of keys in the keyspace that are not the actual key, and 2^{L} is the number of possible plaintexts). This is essentially the number of incorrect keys that work for a single plaintext/ciphertext pair without actually being the correct key.

We want to use enough plaintext/ciphertext pairs such that the expected number of these incorrect keys is close to 0. Clearly, with t of these plaintext/ciphertext pairs, we have \frac{2^l - 1}{2^{Lt}} expected false keys. For Double-DES, one pair gives us \frac{2^{112} - 1}{2^{64}} expected incorrect keys, which is far too many. With 3 pairs, we get around \frac 1 {2^{16}} expected false keys, much more reasonable.

There's also triple DES, which is just applying DES three times now - c = E_{k_1}(E_{k_2}(E_{k_3}(m))), where c is the resulting ciphertext, m is the plaintext, and k_1, k_2, k_3 are the three 56-bit DES keys. We don't have any proof that it's more secure than DES alone, but a meet-in-the-middle attack takes around 2^{112} steps, around 2^{64} message/plaintext pairs, and a huge table.

Given our plaintext m = m_1 \ldots m_t, how do we encrypt it if the Block cipher modes of operation:

AES was developed as part of a public, open competition in 1997 to build a successor to DES. It needed to have 3 key sizes (128, 192, and 256 bits), have a 128-bit block size, and be efficiently implementable on hardware/software. While 128 bits is infeasible for the foreseeable future, the larger key sizes are intended to protect against potential future attacks, as well as quantum computers - Grover's algorithm can perform exhaustive key search in only 2^{\frac 1 2 l} operations, where l is the number of bits in the key.

With 15 submissions initially, Rijndael won in the end. Rijndael became AES in late 2001.

Rijndael is an iterated block cipher, not a Feistel cipher. Specifically, a substitution-permutation network. Currently, there's no known attack better than exhaustive key search on AES.

We won't cover the technical details of AES in this course, but some more details are available in the slides

20/1/17

x \in_R S means that x is randomly and independently chosen from S.

A hash function is a fixed function that transforms an input of arbitrary length and output a fixed-length string, called a digest. Some common hash functions are MD5, SHA1, and SHA512.

Formally, an n-bit hash function is H: \set{0, 1}^* \to \set{0, 1}^n. A good hash function should satisfy three examples:

A Davies-Meyer hash function is a family of hash functions that uses block ciphers. Basically, given a fixed IV H_0 and an n-bit block encryption function E_k, we break up the plaintext into n-bit blocks x_1, \ldots, x_t (after appending 1 to the end and padding with 0 to the nearest block size), then H_i = E_{x_i}(H_{i - 1}) \oplus H_{i - 1} and H_t is the hash value.

A one-way function is a pre-image resistant hash function. A cryptographic hash function is a pre-image resistant and collision-resistant hash function.

A generic attack is an attack on a hash function that treats it as a black-box/ideal hash function. As a result, a generic attack must treat the hash function as a random function H: \set{0, 1}^* \to \set{0, 1}^n (an ideal hash function is a random function). In practice, actual random functions are too complex to represent in a reasonable way.

A generic attack for finding pre-images: select arbitrary x \in \set{0, 1}^* until we find H(x) = y. Clearly, we expect about 2^n time. A generic attack for finding collisions: select arbitrary x \in \set{0, 1}^* and store them in a table indexed by H(x), until a hash is found. By birthday paradox, we expect that to take about \sqrt{2^n} time and memory.

23/1/17

Checking for collisions generically, as mentioned last class, would take around \sqrt{2^n} time and memory. However, there's actually a generic attack for finding collisions that has the same order of running time (\sqrt{\frac{\pi 2^n}{2}}), but requires far less memory.

Basically, given H: \set{0, 1}^* \to \set{0, 1}^n, we want to find x_1, x_2 \in \set{0, 1}^*, where x_1 \ne x_2 yet H(x_1) = H(x_2). How does this algorithm, known as the Van Oorschot/Weiner (VW) algorithm, work?

First, we assume that H: \set{0, 1}^n \to \set{0, 1}^n is a random function. Now, we'll search for x_1, x_2 in \set{0, 1}^n rather than \set{0, 1}^*, by defining a sequence of hash values w_0 \in_R \set{0, 1}^n, w_i = H(w_{i - 1}) - a sequence consisting of a hash value, the hash of that hash value, the hash of the hash of that hash value, and so on. This is a random sequence since we're assuming H is a random function.

Clearly, there must be a repeat (collision) in the sequence somewhere, because an element of the sequence can only have one of 2^n possible values. That means the sequence is finite and ends in a cycle. Let j be the smallest index such that x_j = x_i for some i < j - j must exist because there must exist a collision. Since , x_{j + l} = x_{i + l} for any l \ge 0.

Let N = 2^n. From the birthday paradox, we expect the value of j to be E[j] = \sqrt{\frac{\pi N}{2}}, or around \sqrt{2^n}.

Also, E[i] = E[j - i] = \frac 1 2 \sqrt{\frac{\pi N}{2}}. If we assume, without loss of generality, that i \ne 0, then H(x_{i - 1}) = H(x_{j - 1}) and x_{i - 1} \ne x_{j - 1} (with very high probability) - a collision!

How do we actually find x_{i - 1}, x_{j - 1}, without using much storage?

If we think of the sequence of hashes is like a linked list, where H(x_i) gets the next element x_{i + 1}, then this simply becomes a problem analogous to finding cycles in a linked list, which we could solve using something like the tortoise-and-hare algorithm with just constant memory. However, that means a lot of wasted time unnecessarily computing hashes, and it only tells us that there is a collision, not what the collision actually is.

A better way we could do this is to define some easily distinguishable property of the hash values that applies to around fraction \theta of all hash values, for a small value of \theta. For example, the property might be "the first 20 bits are 0" (\frac 1 {2^{20}} of hash values satisfy this property). Then, we only store the hash values in the sequence that satisfy this property (these hash values are called "distinguished points").

When we encounter a distinguished point for the second time, we've found a collision somewhere behind us in the sequence - assuming there's a distinguished point between i and j, we must eventually encounter it again when our sequence loops around to x_{j}. We can assume there's a distinguished point between i and j because our \theta is chosen to be large enough that there is a very high probability that there is a distinguished point in the cycle - for example, for a 32-bit hash we would expect the cycle to contain about \frac 1 2 2^{32} = 2^{31} entries, so we might make our distinguishing property "the first 21 bits are 0", giving an expected value of 2^{10} distinguished points in the cycle.

That does mean there's a very tiny chance the algorithm doesn't terminate, however - we can get around this by limiting the length of the path to 2^n and retrying. There are a lot of other edge cases too (what if the starting value has a collision in the sequence?), but they all happen with negligible probability and can be handled with relatively few operations.

Algorithm:

  1. Randomly select a x_0 from \set{0, 1}^n, store \tup{x_0, 0, \text{empty}} in the table.
  2. Let L = x_0 (L is the most recently stored distinguished point).
  3. For j \ge 1 do (phase one starts - finding duplicate points):
    1. Let x_j = H(x_{j - 1}).
    2. If x_j is distinguished:
      1. Store \tup{x_j, j, L} into the table and let L = x_j.
      2. If x_j was already in the table, we know x_i = x_j, and stop this loop to begin phase 2 - finding the actual collision values.
  4. Let a be the index of the distinguished point right before i, and b be the index of the distinguished point right before j. This can be looked up as the third element of tuples in the table - the values of L associated with x_i and x_j respectively.
  5. Let l_1 = i - a, l_2 = j - b. Assume without loss of generality that l_1 \ge l_2. Let R = l_1 - l_2 - how much longer the earlier chain is until the collision than the later chain.
  6. Compute x_{a + 1}, \ldots, x_{a + R}. Let p_1 = x_{a + R} and q_1 = x_b. Now the collision is an equal number of hashes away from p_1 and q_2.
  7. Let p_{k + 1} = H(p_k) and q_{k + 1} = H(q_k). Compute p_k and q_k until we find a value k such that p_k = q_k. Then, the previous value p_{k - 1} and q_{k - 1} form a collision.

So overall, step 3 takes around \sqrt{\frac{\pi N}{2}} + \frac 1 \theta time, while the steps after it take \frac 3 \theta. The memory usage is around \theta \sqrt{\frac{\pi N}{2}} \times 3n bits.

25/1/17

If we have a 128-bit hash, we might choose a \theta = \frac 1 {2^{32}}, for about 2^{96} distinguished points. With this, we would expect around 2^{64} operations and 2^{41} bits, or around 256 gibibytes - totally feasible on modern hardware. This tells us that any 128-bit hashes are simply not collision-resistant today.

How do we parallelize the VW algorithm, given m processors? One naive way is to run the algorithm with different starting values on each processor (each with its own memory), and see which one finishes first. Turns out this uses time \frac{\sqrt{\frac{\pi N}{2}}}{\sqrt{m}} + \frac 4 \theta - not very good scaling.

Instead, we can store all of the distinguished points from the m processors in a central table - at each time unit, each processor adds a new unique hash value, until one of the distinguished points collide. This takes \frac{\sqrt{\pi N}{2}}{m} + \frac 4 \theta operations, which is about as well as you can expect this algorithm to be parallelized.

How do we find meaningful collisions in a hash function? Suppose m_1 is "Alice owes Bob 1000 dollars" and m_2 is "Alice owes Bob 10 dollars". Alice wants to find values m_1', m_2' that semantically mean the same thing as m_1, m_2 (e.g., "Alice should give Bob $1000" and "Alice ought to pay Bob $10"), sign m_1, make Bob agree to it, and then later claim to have signed m_2. After Bob agrees to it, Alice can claim to have signed m_2 instead, only owing 10 dollars rather than 1000 dollars. How can Alice find m_1', m_2' using the VW algorithm?

Partition m_1 into n parts, with partitions at indices j_1, \ldots, j_n. Let g_{m_1}(r): \set{0, 1}^n \to \set{0, 1}^* be m_1 modified so that a space is inserted into m_1 at position j_i if and only if the ith bit of r is 1. Note that g_{m_1}(r) should have the same meaning as m_1, since we just added a space somewhere. Likewise, define g_{m_2}(r) for inserting spaces into m_2.

Let S_0 be the elements of \set{0, 1}^* that begin with 0, and S_1 be the elements that begin with 1. Let f: \set{0, 1}^n \to \set{0, 1}^n as f(r) = \begin{cases} H(g_{m_1}(r)) \text{ if } r \in S_0 \\ H(g_{m_2}(r)) \text{ if } r \in S_1 \end{cases} - the hash of a message semantically equivalent to m_1 if r begins with 0, and the hash of a message semantically equivalent to m_2 if r begins with 1.

If we run the VW algorithm on f to find a colliding r, we get a collision a, b \in \set{0, 1}^n with a \ne b and f(a) = f(b). With 50% probability, the first bit of a is different from the first bit of b, assuming they're drawn from a uniform distribution (we can assume this because we assume the hash function is a random function).

Suppose that the first bits are different - then a = g_{m_1}(r) and b = g_{m_2}(r), and a, b are two messages, semantically the same thing as m_1, m_2 respectively, that collide. If the first bits are not different, then we can just run the VW algorithm repeatedly with a different starting point until we get a collision that does have messages with two different first bits. Clearly, the running time is essentially the same for each iteration, and with a 50% probability of success, we can just run it a few times with different variants of g_{m_1}, g_{m_2} until we get a meaningful collision.

27/1/17

So now we've looked at the naive hash collision finding, the VW algorithm, the parallel VW algorithm, and the meaningful collisions with the parallel VW algorithm.

Now let's look at general methods for constructing hash functions. Merkle's meta method is one of these.

Given an input of n + r bits, we define a compression function f: \set{0, 1}^{n + r} \to \set{0, 1}^n. We also have an IV value in \set{0, 1}^n. Using these, we can use Merkle's meta method to define a hash function H: \set{0, 1}^* \to \set{0, 1}^n.

Let x \in \set{0, 1}^*. Let x_1 \ldots x_t be x broken into r-bit blocks (the last one should be zero-padded if needed). Let x_{t + 1} be the binary representation of the length of x in bits (technically, this limits the length of x to 2^r, but in practice 2^r is so large that we can simply pretend x is unbounded in length).

Let H_0 be the IV. For 1 \le i \le t + 1, we do H_i = f(H_{i - 1}, x_i). Then, H(x) = H_{t + 1}.

Merkle's theorem says that if f, the compression function, is collision-resistant, then so is H. This means we only have to make a fixed input size function collision-resistant, rather than an unbounded-input-size function collision-resistant. That means we should usually try to attack the compression function f rather than the hash function itself.

This theorem can be proven using the contrapositive. Assume H is not collision resistant, so we can efficiently find a collision \tup{x, x'} \in \set{0, 1}^*. Let \overline x = x_1 \ldots x_t and b be the length of x in bits and H_1, \ldots, H_{t + 1} be the iteration values for x, and let \overline x' = x_1' \ldots x_{t'}' and b' be the length of x' in bits and H_1', \ldots, H_{t' + 1}' be the iteration values for x'. Clearly, if b \ne b', then x_{t + 1} \ne x_{t' + 1}', so f(H_t, x_{t + 1}) = f(H_t', x_{t' + 1}') and x_{t + 1} \ne x_{t' + 1}' - a collision! Now suppose b = b', so x_{t + 1} = x_{t + 1}' and t = t'. If we work backwords from the last iteration toward the first iteration, we must eventually find some H_i = f(H_{i - 1}, x_i) and H_i' = f(H_{i - 1}', x_i') such that \tup{H_{i - 1}, x_i} \ne \tup{H_{i - 1}', x_i'}, since x \ne x' - they must differ in at least one message block. This is a collision for f. So, a collision is always computationally efficient to find for f. So if f is collision resistant, H must be as well.

30/1/17

MDx is a family of iterated hash functions, the most well-known of which are MD4 and MD5. MD4 was a 128-bit hash invented by Ron Rivest in 1990, but it had some serious flaws - in 2004 someone found collisions by hand. Pre-image is still infeasible, however.

MD5 is another 128-bit hash, a strengthened version of MD4. Since its invention in 1991, we've found MD5 collisions in 31 seconds in 2006 on commodity hardware, while preimages can be found in 2^{123.4} steps. However, MD5 is still in very wide use today.

SHA-1 is a 160-bit hash function invented by the NSA in 1993, and modified by the NSA again in 1994 to fix a classified weakness. In 2005, it was found that that fix increased the cost of finding collisions from 2^{39} to 2^{63}. Collisions are feasible to find today, but nobody seems to have done it yet - we still want to move away from SHA-1 though, since it's theoretically feasible to find collisions. We don't know of any pre-image/second pre-image attacks on SHA-1 that are faster than the generic attacks. In about two weeks from now, SSL will no longer consider SHA-1 hashes for server certificates valid. SHA-1 is a Merkle-style hash with n = 160, r = 512. Most of the design decisions are classified, because NSA, and the hash operations themselves are rather messy.

In 2001 the NSA proposed SHA-2, a family of hash functions with three variants, each with different output sizes - SHA-256, SHA-384, and SHA-512. Note that SHA-2 has the same security against collisions as AES-128/AES-192/AES-256. As of this writing, no attacks stronger than the generic attacks have been found, but there are concerns remaining because the design is pretty similar to SHA-1.

SHA-3 is a NIST hash function competition, much like the competition that resulted in AES. Starting with 64 candidates, Keecak was selected as the winner in 2012. Keecak isn't an iterated hash function, using a sponge construction method, and it isn't very widely used in practice due to most people being fine with the SHA-2 family.

NIST recommends we stop using SHA-1 if possible, though it's okay for HMACs/KDFs/RNGs. SHA-2 and SHA-3 are recommended.

Suppose we can find a meaningless collision in an iterated hash function \tup{x, y}. How can we exploit this in practice? Collision attacks are useful when we can modify both the original document, and the colliding document.

Wang proposed an attack on SHA-1 that gives us a two-block collision (two two-block-long messages that hash to the same value) that takes only around 2^{63} operations. Wang also proposed an attack on MD5 that gives us a one-block collision (two one-block-long messages that hash to the same value) in only 2^{39} operations. We've actually applied these and found real collisions in SHA-1 and MD5. How do we exploit this concretely?

Suppose \tup{c_1, c_2} is a one-block MD5 collision (without loss of generality, assume the blocks don't contain ( or ), so it can fit into a Postscript literal). Now consider the following two Postscript files:

%!PS-Adobe-1.0
%%BoundingBox: 0 0 612 792                   
(<c_1>)(<c_1>)eq{
<HARMLESS MESSAGE>
}{
<HARMFUL MESSAGE>
}ifelse
showpage
%!PS-Adobe-1.0
%%BoundingBox: 0 0 612 792                   
(<c_2>)(<c_1>)eq{
<HARMLESS MESSAGE>
}{
<HARMFUL MESSAGE>
}ifelse
showpage

The two files are identical except for line 3, where we have <c_1> in the first and <c_2> in the second.

Clearly, the first one prints out the harmless message, while the second one prints out the harmful message. However, note that the second line has been padded with spaces so that the second block starts exactly at either <c_1> in the first message, or <c_2> in the second message (MD5 uses a block size of 64 bytes, and the file is assumed to be using Windows-style CR-LF line endings).

Suppose x and y have the same number of blocks in an iterated hash function. Let F(I, x) = H_t where I = H_0 and H_i = f(H_{i - 1}, x_i). Clearly, if F(I, x) = F(I, y), then F(I, xz) = F(I, yz) for any string z. In other words, given a collision \tup{x, y}, \tup{xz, yz} is also a collision for any string z. This is called a length extension attack.

Also, if F(I, x) = F(I, y), then F(I, zx) = F(I, zy) for any string z that is an integer multiple of the block size long. In other words, given a collision \tup{x, y}, \tup{zx, zy} is also a collision for any blocks z.

Note that each Postscript document can be made by prepending a block to one of <c_1>/<c_2>, and then appending the rest of the document. Since <c_1>/<c_2> collide, the Postscript documents themselves must also collide!

Suppose Alice sends Bob the first Postscript file, and asks Bob for a signature. Bob opens it in a Postscript viewer, sees the harmless message, then signs the MD5 hash of the document (because signing the whole document with public key crypto would be too unwieldy), and sends the signature to Alice. However, Alice now has a signature from Bob for the second Postscript document, which has the same MD5 hash but a different, harmful message - Alice has convinced Bob to sign something Bob didn't want to!

(Menezes now demonstrates an MD5 collision on a real document, where the MD5 hash of a Postscript file that results in a PDF saying "John didn't leave his hat in my office" collides with a Postscript file that results in a PDF saying "John should be given a mark of 100 in this course".)

1/2/17

;wip: slides up to slide 192 ish, MAC schemes (a lot like the cipher block chaining scheme used for symmetric ciphers),

;wip: MACs can't provide non-repudiation, because at least two parties must have the shared key.

;wip: encrypt-and-mac is insecure because the MAC can leak info about the plaintext. suppose we have a secure MAC H_k(m). construct a new MAC H_k'(m) = H_k(m) \| m. Clearly, this is still a secure MAC, but it leaks the entire plaintext.

3/2/17

Instead, we can append the key to the message before we hash it, rather than prepending it. This ensures that the attacker can't use a length extension attack anymore because they don't know what the secret key K is, but it requires the hash function to be collision resistant in order for the resulting MAC to secure. This is called the secret suffix method.

If the hash function isn't collision resistant, then we can efficiently find a collision \tup{x_1, x_2}. Suppose that x_1 and x_2 have lengths that are multiples of the hash block length. Then H(x_1 \| K) = H(x_2 \| K), because the values being passed into the hash function start with two different but colliding values. Therefore, we've found a collision for the MAC - a collision attack!

To avoid the requirement that the hash function be collision resistant, we can use the envelope method, in which we prepend and append the key to the message before we hash it. This successfully avoids the length extension attack because the attacker doesn't know what K to append, and the collision attack because the attacker probably can't find a collision where both values start with K, without even knowing what the value of K is.

The Hash MAC (HMAC) is a commonly used MAC scheme that ;wip. H_k(x) = H(K \oplus \mathrm{opad} \| H(K \oplus \mathrm{ipad} \| x)), where \mathrm{ipad} = 0x36, \mathrm{opad} = 0x5C repeated for each byte of K. The inner hash is just the secret prefix method, but hashing for the second time ensures that the hash can't ;wip. This can be proven to be secure if the compression function used in the iterated hash function H is itself a secure MAC scheme with fixed length messages and a secret IV.

HMAC is used a lot in industry, as part of IPSec, SSL, and TLS. In practice, we generally use SHA-1 for the HMAC hash function, and it seems to be unaffected by the known weaknesses in SHA-1.

A secure symmetric encryption scheme ensures confidentiality. A secure MAC scheme like HMAC ensures authentication (data integrity and data origin authentication). If Alice needs to send Bob a message with both confidentiality and authentication, one way to do this is Alice can send Bob \tup{c, t} = E_{k_1}(m), H_{k_2}(m), where k_1, k_2 are secret keys pre-shared with Bob. This is called Encrypt-and-MAC. However, the issue here is that the MAC scheme isn't designed for confidentiality, only authentication, so theoretically, the HMAC might leak information about the plaintext. We still use this a lot in practice though, because most MAC schemes don't seem to leak any information about the plaintext.

A better way to do it would be to send \tup{c, t} = \tup{E_{k_1}(m), H_{k_2}(E_{k_1}(m))} (ciphertext, tag) instead - we tag the encrypted message using the HMAC, rather than tagging the plaintext. This can be proven to be secure if the encryption and MAC scheme are secure. This is called the Encrypt-then-MAC.

Authenticated encryption schemes try to provide both confidentiality and encryption in a more integrated package. The most common one in AES-GCM, which is faster than generic encrypt-then-MAC, as well as authentication-but-not-encrypted headers for data. It uses AES in counter mode (CTR mode - encrypting a counter using the key with AES, and then using the output of that as the CSPRNG for a stream cipher) and a custom MAC scheme, and requires the IV never be reused. One advantage of AES-GCM over something like the encrypt-then-MAC is that it's parallelizable and faster. AES-GCM outputs the IV, authenticated header, authenticated and encrypted message, and the 128-bit authentication tag.

6/2/17

AES-GCM is widely used in the real world for combined authentication and encryption, because it's fast and is proved correct by Iwata-Ohashi-Minematsu in 2012, under certain assumptions. In fact, Intel and AMD processors have special instructions to make AES-GCM implementations a lot faster.

Public Key Cryptography

The main drawback of symmetric cryptography is the key establishment problem - how do we distribute secret keys to all the parties that need it, but nobody else?

One way to solve this is to distribute keys over a secure channel point-to-point, like a face to face meeting, a trusted courier, or a key embedded in a smart card. However, it's hard to obtain a secure channel, and if we already had a secure channel, we could often just use that channel to communicate instead.

Another way is to give the key to a trusted third party, which distributes the secret key as desired. However, finding a trusted third party is quite difficult, and it's also a single point of failure.

There's also the key management problem - in a network of n users, each user needs one key for every other user, so there are O(n^2) keys needed.

Also, symmetric encryption can't achieve non-repudiation, because there are always multiple parties that could have written a given message (since the key needs to be shared with other parties in order to communicate). Theoretically we could do this by sharing a key only with a trusted third party that ensures non-repudiation, but that kind of defeats the point of using cryptography for non-repudiation in the first place.

Public key cryptography works somewhat differently. Using a public key cryptosystem, Alice and Bob first exchange authenticated (rather than secret) information, and then can communicate securely. It was first invented by Merkle, Diffie, and Hellman. It's much easier to exchange information in an authenticated way, then in a secret way. Intuitively this seems impossible, and Merkle's professor originally told him it was impossible back in 1975, but it is indeed possible.

The advantages of public key cryptography over symmetric cryptography are: only requiring an authenticated channel rather than a secured one, each user only has one keypair, a signed message can be verified by anyone, and signatures can achieve non-repudiation. The disadvantages are: private/public keys are usually a lot larger than symmetric cipher keys, making them harder to share in some cases, and public key cryptography is usually a lot slower.

Suppose Alice and Bob want to communicate securely. Alice and Bob want to establish a session key by communicating over a public, authenticated channel. We'll now describe Merkle's proof-of-concept public key cryptography idea.

Alice creates puzzles P_1, \ldots, P_N for some large N, and each puzzle takes time t to solve. The solution to a given puzzle P_i is the session key k_i and the serial number n_i, and Alice keeps track of the solutions to each generated puzzle. Alice sends all the puzzles to Bob, over the authenticated channel.

Then, Bob randomly selects a puzzle P_j, solves it in time t to obtain k_j and n_j, then sends n_j back to Alice. Since Alice knows n_j, Alice can easily look up the corresponding k_j. Now, both Alice and Bob know k_j, which they can now use as the session key. In contrast, Eve doesn't know which puzzle has serial number n_j, so Eve must try each puzzle to see if the serial number matches. That means on average Eve must solve \frac N 2 puzzles, spending \frac{Nt}{2} time overall. For large enough values of N like N = 10^9, it should become infeasible for Eve, while Bob can feasibly still solve the single puzzle.

One example of a Merkle puzzle is AES-encrypting the session key (128-bit) and the serial number (128-bit) (the serial number is appended to the session key twice, so we can check if the puzzle was solved correctly), with a randomly selected 40-bit string as the secret key. Then we can solve the puzzle by exhaustive key search in 2^{40} operations.

More generally, we want each communicating party C to generate a key pair consisting of a public key P_C and a private key S_C, such that it's hard to recover $S_AC from P_C, then publishes P_C while keeping S_C secret.

If A wants to communicate confidentially with B, then A obtains an authentic copy of P_B, then sends B the message encrypted with P_B. Bob then decrypts the message using S_B, which only B knows. Since public key cryptography is so slow, we usually implement this in practice by using public key cryptography to communicate a session key (which is a fixed-length, relatively short value), and then using symmetric cryptography to encrypt the message with that session key.

If A wants to sign a message, A just signs the message with S_A, which only A knows. To verify the signature on the message, we obtain an authentic copy of P_A and verify the signature using P_A. Since public key cryptography is so slow, we usually implement this in practice by hashing the message, and then using public key cryptography to sign the hash of the message (which is fixed-length, relatively short value).

8/2/17

Algorithmic Number Theory

Recall the fundamental theorem of arithmatic - every integer n \ge 2 has a unique prime factorization.

Just because it exists though, doesn't mean that prime factorization is efficient to find - as of this writing, we are not aware of any way to find the prime factorization in polynomial time - we need to do around O(\sqrt{n}) operations, while the length of the input is around O(\log n) (the number of bits needed to write the input down in a reasonable encoding). This is still an open problem - whether we can factor numbers in polynomial time with respect to the length of the input.

However, it's quite simple to verify whether a given prime factorization of a number is correct - just multiply them together.

In this course, by running time we mean the worst case time complexity. and by efficient we mean polynomial time. We also allow average case time complexity in some cases, especially when we look at probabilistic algorithms.

Suppose we have two integers a, b. Clearly, they have lengths O(\log a) and O(\log b) respectively. Since we usually look at length rather than the numerical value itself, we represent the length as k_a, k_b instead - the number of bits in each number.

Clearly, computing a + b takes O(k_a + k_b) - we always have to at least look at every bit of a and b to get the correct answer, and there are k_a + k_b bits total. Likewise, computing a - b is also O(k_a + k_b). We often assume without loss of generality that k_a = k_b, and just call it k

Multiplication can be done naively in O(k^2), and modern algorithms can do it in as little as O(k^{\log_2 3}) or k^{k \log k (\log \log k)} - it's still an open problem what the optimal value is.

For division, we can naively just repeatedly subtract the shifted version of the denominator from the numberator, zeroing one bit of the numerator each time. That means with the naive algorothm, we need to perform k subtractions, giving us an total time of O(k^2).

The GCD algorithm \gcd(a, 0) = 0, \gcd(a, b) = \gcd(b, a \mod b) continuously finds the remainder of dividing a number by another. Both numbers are positive and the second is strictly decreasing at each step. In fact, we can prove that the values are halved or lower after every two steps.

Since division/remainder takes O(k^2), and we're doing, in the worst case, k steps, we might expect that GCD takes O(k^3). However, GCD actually just takes O(k^2) time, because the values we're dividing are rapidly decreasing.

Let's say we're now doing modular arithmetic, so with the modulus n, all values are within [0, \ldots, n - 1]. Now we can compute a + b \pmod n by computing a + b and subtracting n if a + b \ge n, and we can compute a - b \pmod n by computing a - b and adding n if a - b < 0. That means modular addition and subtraction are both O(k).

Multiplication in modular arithmetic can be done by multiplying normally, and then performing a division by n and finding the remainder - it takes O(k^2). Same goes for finding the inverse, \frac 1 a \mod n

Turns out, exponentiation can be done in O(k^3) time - we can compute a^b \pmod n in cubic time with respect to the sizes of a and b!

10/2/17

The naive way to compute a^m \mod n is to compute a^m, then take that number mod n. This potentially gets us a really large result, and is somewhat wasteful. Another slightly improved naive method is to let result = 1, then run result = (result * a) % n m times. This solves the issue of having really large intermediate results, but isn't polynomial time and is pretty inefficient.

An improved method is to use the squaring method, which decomposes m into a sum of powers of 2, then successively squares a and multiplies the result for each power:

A = a
result = 1
while m > 0:
    # at zero-index $i$, `A` is $a^{2^i}$ and the lowest bit of `m` is the $i$th bit of the original value of $m$
    if m & 2: # $m$ written as a sum of powers of 2 has a term with exponent $i$
        result = (result * A) % n
    A = (A ** 2) % n
    m >>= 1

Since there are at most k modular squarings and k multiplications, each of which is O(k^2), the worst case time is O(k^3).

Recall Fermat's Little Theorem, which says, among other things, that if a and n are coprime, and a^{n - 1} \mod n \ne 1, then we know n is composite (if it is 1 though, we don't know if it's prime or not). This is a really useful primality test - we just need to choose the right a to prove that n is not prime.

RSA

RSA was invented by Rivest, Shamir, and Adleman in 1977.

To generate a key for a user U:

  1. U randomly selects 2 large distinct primes p, q of the same bit length k.
  2. U computes n = pq and \phi(n) = (p - 1)(q - 1).
  3. U selects an arbitrary e such that 1 < e < \phi(n) and e is coprime with \phi(n). A common value of e is 3 - we can just ensure e is coprime by ensuring it's prime.
  4. U computes d such that ed \equiv 1 \pmod{\phi(n)} and 1 < d < \phi(n).
  5. U publishes the public key \tup{n, e}, and keeps the secret key d to themselves.

To send an encrypted message from user U to user V (note that real implementations also need to handle things like side channel attacks):

  1. U obtains an authentic copy of V's public key \tup{n_V, e_V}.
  2. U represents the message as 0 \le m \le n_V - 1 such that \gcd(m, n_V) = 1 - the message is represented as a single, fixed-size number that is not one of the primes p, q.
  3. U computes c = m^e_V \mod n_V.
  4. U sends c to V.
  5. V computes m = c^d_V \mod n_V. Note that no part of this requires U's public or private keys.

Decryption is essentially computing (m^e)^d = m \mod n. Finding eth roots of a number mod n requires us to factor n, which is hard. The hardness of solving for m is the basis for RSA's security.

Suppose p \mid m. Then m \equiv 0 \pmod p, so m^{ed} \equiv 0 \pmod p, so (m^e)^d = m \pmod p.

Suppose p \nmid m. Since ed \equiv 1 \pmod{\phi(n)}, ed = 1 + k\phi(n) = 1 + k(p - 1)(q - 1) for some k. By Fermat's Little Theorem, m^{p - 1} \equiv 1 \pmod p, so taking both sides to the power of k(q - 1), we get m^{k(p - 1)(q - 1)} \equiv 1^{k(q - 1)} \pmod p. Multiplying both sides by m, we get m^{1 + k(p - 1)(q - 1)} \equiv m \pmod p, or m^{ed} \equiv m \pmod p. By a similar argument, m^{ed} \equiv 1 \pmod q. Since p and q are coprime (since they're distinct primes), m^{ed} \equiv m \pmod{pq}, so m^{ed} \equiv m \pmod n.

An implementation of the extended Euclidean algorithm:

def EEA(a, b):
    # PRECONDITIONS: `a` and `b` are natural numbers
    # POSTCONDITIONS: `r_second == a * x_second + b * y_second`
    x_first, x_second = 1, 0
    y_first, y_second = 0, 1
    remainder_first, remainder_second = a, b
    while remainder_first != 0:
        quotient = remainder_second // remainder_first
        remainder_first, remainder_second = remainder_second - quotient * remainder_first, remainder_first
        x_first, x_second = x_second - quotient * x_first, x_first
        y_first, y_second = y_second - quotient * y_first, y_first
    return remainder_second, x_second, y_second

13/2/16

The obvious attack on RSA is to factor n to get p, q, and then use the extended Euclidean algorithm to get d - we just need to compute the private key. After decades of research, however, the consensus is that it's a hard problem.

In other words, we can define the basic RSA attack as "given \tup{n, e}, find p, q by factoring". Alternatively, "given \tup{n, e}, compute d". Note that the compute-d attack is at least as hard as factor-n attack, because if we have p, q, we can easily compute d, but if we have d, it's not particularly easy to get p, q. More formally, there's a turing reduction from every compute-d problem to a corresponding factor-n problem - if we can factor n quickly, we can compute d quickly, by using the oracle to factor n to get p, q, then compute \phi(n) = (p - 1)(q - 1) and ed \equiv 1 \pmod{\phi(n)}.

Also, it turns out there's a turing reduction from every factor-n attack to a compute-d problem. Therefore, since factor-n and compute-d both turing reduce to each other, they are polytime equivalent (denoted A_1 = A_2, where A_1 is factor-n and A_2 is compute-d). Knowing this is useful because factoring is a very well studied problem, and we've spent centuries convincing ourselves that it's a difficult problem.

The RSA problem is defined as: given \tup{n, e} and 0 \le c \le n - 1, find 0 \le m \le n - 1 such that c \equiv m^e \pmod n. In other words, we're trying to find the eth root of c under mod n. It turns out that the RSA problem turing reduces to factor-n, so it's no harder than factoring. ;wip: why?

However, we'd really like the factor-n problem to turing reduce to the RSA problem, because that would mean the RSA problem is at least as hard as factoring. However, this is an open question, and nobody has managed to prove it yet. That said, we generally just assume the RSA problem is hard, because we want that to be the case and nobody'd solved it yet.

For RSA by itself, a dictionary attack is posible - if the plaintext space is small and predictable enough, the attacker can precompute every possible plaintext/ciphertext pairs. To prevent this, we can append/prepend a random salt (e.g., a random 128-bit string) to the plaintext before encrypting it, then remove it after decrypting the ciphertext (e.g., by trimming it off of the decryption result to get the plaintext). That means the plaintext space is no longer small and predictable.

For RSA by itself, a chosen plaintext attack is also possible - given c and a decryption oracle f(c') = m' for any c' \ne c (we can decrypt any ciphertext except the one we're given), we can compute m from c. How does this work?

Suppose without loss of generality that \gcd(c, n) = 1 (because if \gcd(c, n) \ne 1, then it must be a factor of n, so we could easily find p, q and compute d).

Now, select an 2 \le x \le n - 1 such that \gcd(x, n) = 1 (again, if the GCD wasn't 1 then we've factored n and are done). Let \hat c = cx^e \pmod n, so we can obtain \hat m using the decryption oracle: \hat m = f(\hat c). Since \hat m \equiv \hat c^d \pmod n and \hat c \equiv cx^e, \hat m \equiv (cx^e)^d \equiv c^d x^{ed} \equiv mx \pmod n. Therefore, m = \hat m x^{-1} \mod n = f(cx^e) x^{-1} \mod n.

To mitigate this, we can make sure decryption routines only return results if the message is formatted correctly (e.g., a hash of the rest of the message included with the message is correct). That means the decryption oracle would be highly unlikely to give an answer for any ciphertext other than one resulting from a properly encrypted message. We add in the formatting before encrypting, and then strip it out after decrypting.

15/2/17

PKCS #1 V1.5 is a set of public key cryptography standards, and has a standard for RSA. It's not really that good in practice, but it's very widely used.

Among other things, it defines the correct way to format the plaintext. Let M be the original plaintext, m be the formatted plaintext, and c = m^e \mod n be the ciphertext. The format is as follows: the bytes 0x00 and 0x02 (the standard's version number), 8 or more random non-zero bytes (the salt, to prevent dictionary attacks), the 0x00 byte (to denote the end of the salt), followed by the message. The total length of the format can be up to the bit size of n - 2048-bit RSA means we can have total length of 256 bytes, and a maximum message size of 245 bytes (since there are at least 11 bytes before the message itself in the format).

A public key cryptosystem is secure if and only if it is semantically secure against chosen-ciphertext attacks by a computationally bounded adversary.

To simulate a computationally bounded adversary, the adversary gets a target ciphertext and a decryption oracle that can decrypt anything except that particular ciphertext, and is trying to learn anything about the plaintext besides its length. The decryption oracle rejects any decryptions that aren't properly formatted.

The PKCS #1 V1.5 RSA standard's formatting of messages does allow the decryption oracle to reject attacker-generated messages in practice, but there are theoretically weaknesses in this padding scheme.

A polynomial time algorithm has worst case time O(n^c) for a constant c. An exponential time algorithm has worst case time O(2^{cn}) for a constant c. A subexponential time algorithm has worst case time 2^{o(n)}. Polynomial-time (polytime)/exponential-time/subexponential-time efficient algorithms are similar to polynomial-time/exponential-time/subexponential-time algorithms, but for expected/average time rather than worst case. Polytime is efficient, Subexponential time is inefficient, and exponential time is very inefficient.

Subexponential algorithms can often be written in the form L_n[\alpha, c] = O(\exp((c + o(1)) (\log n)^\alpha (\log \log n)^{1 - \alpha})) where 0 \le \alpha \le 1 and c > 0. Note that L_n[0, c] is polynomial time and L_n[1, c] is fully exponential.

The naive way of factoring a number n is to try dividing all the integers from 2 to \floor{\sqrt{n}}. While this is polynomial with respect to n, n means a different thing in this context - the value of the number (e.g., "5234" has value 5234), rather than how long the number is (e.g., "5234" needs at least 4 digits to store).

Besides trial divisions, there's also a number of other special-purpose factoring algorithms like the special number field seive, elliptic curve factoring algorithm, and so on, which are only efficient if n has a special form, like when n - 1 has only small factors. To make sure RSA is robust to all known factoring attacks, we choose the RSA primes p, q at random and with the same bit length.

;wip: express that RSA bit size grows faster than linear wrt security level, since factoring is subexponential

There are also new general-purpose algorithms that can factor arbitrary numbers in slightly better time. For example, the number field seive factoring algorithm can do it in L_n\left[\frac 1 3, 1.923\right], and before that, the quadratic seive factoring algorithm can do it in L_n\left[\frac 1 2, 1\right].

Because there haven't been improvements on these in several decades, most mathematicians believe that factoring is not solvable in polynomial time. That said, we've used these methods to successfully factor up to RSA-768. Today, RSA-1024 is the minimum length considered secure, while RSA-2048 and RSA-3072 is recommended. Also, Shor's algorithm can factor numbers very easily on quantum computers, though we currently don't know if large-scale quantum computers can be built at all, and depends on the details of quantum mechanics that we don't have yet.

In hand-wavy terms, 3-DES is roughly as secure as SHA-224 and RSA-2048 (112 bits of security), AES-128 is roughly as recure as SHA256 and RSA-3072 (128 bits of security), AES-192 is roughly as secure as SHA-384 and RSA-7680 (192 bits of security), and AES-256 is roughly as secure as SHA-512 and RSA-15360 (256 bits of security). Something having k bits of security means the fastest known attack for it requires around 2^k operations.

17/2/17

To send a signed message from user U to user V (note that real implementations also need to handle things like side channel attacks):

  1. U computes s = m^d \mod n, and publishes this as the signature (the signature is the eth root of the message).
  2. V obtains an authentic copy of U's public key \tup{n, e}.
  3. V checks that s^e \equiv m \pmod n to verify that the signature is valid.

The RSA problem needs to be hard for this to work, because otherwise we could efficiently solve s^e \equiv m \mod n, allowing us to easily compute s without d, the secret key.

In decreasing order of strength, the adversary is trying to perform a total break (recovering the private key or forging signatures at will), selective forgery (forging signatures for some messages), and existential forgery (forging a signature for any one message). The adversary can perform a key-only attack, known plaintext, and chosen-plaintext attack.

Much like with MAC, a signature scheme is secure if it is existentially unforgeable by a computationally bounded adversary performing chosen-plaintext attack. We simulate computational boundedness by saying the adversary has access to a signing oracle - they can use the oracle to get an arbitrary message signed, as long as it's not one of the messages we actually want to forge a signature for.

Bellare and Rogaway proved proved that if the RSA problem is intractable and H is a random function, then RSA signatures with full-domain hashes is a secure signature scheme.

A full-domain hash function is one that covers the entire RSA signature domain. SHA-256, for example, is not a full-domain hash function, because there are 2^{256} possible values, while something like RSA-4096 has n possible values, and n is much larger than 2^{256}.

If the hash function isn't full-domain (i.e., the hash size is a lot smaller than the RSA bit size), then there are theoretical attacks on a signature scheme using that hash and that RSA bitsize.

PKCS #1 V1.5 also defines a standard for RSA signatures. Invented in 1993, it's widely used today. The process looks like this:

  1. Alice computes h = H(m) where H is a hash function like SHA-1 or SHA-256.
  2. Alice formats h into a message M with length k (the RSA bit length): the bytes 0x00 and 0x01 (representing the version of the standard), the repeated bytes 0xFF (the padding, computed to be long enough to make the entire message have length exactly k), the byte 0x00 (to represent that the padding is over), the name of the hash function (e.g., "SHA-1", "SHA-256"), and finally the hash value itself.
  3. Alice signs M using s = M^d \pmod n, then sends Bob \tup{m, s}.
  4. Bob verifies the padding is valid, and checks the hash value in m

27/2/17

However, this signature scheme is actually really broken if you implement it the obvious way (forgetting to check for additional bytes after the padding), and we can even forge signatures by hand pretty easily. In 2006, Blieichenbacher demonstrated an attack that is capable of breaking the PKCS #1 V1.5 signature scheme by hand.

First, we assume:

  1. e = 3, because 3 is one of the most common values for e in practice (we could extend the attack to work for larger e values, but it would be pretty messy).
  2. n = 3072 (this works for any n though).
  3. The hash function is SHA-1 - a 160-bit hash (so it's a lot smaller than n).
  4. The signature verification routine doesn't check whether the padded hash M has garbage after the end (this actually occurs surprisingly often in practice, like in OpenSSL, Adobe Acrobat, JRE, etc.).

First, we pick an arbitrary message m, and compute h = \mathrm{SHA1}(m). Let z be the byte 0x00. Let C be the 15-byte name of SHA1 in the PKCS #1 V1.5 standard. Let D = z \| C \| h be the formatted version of our fake message (we don't use the 0xFF padding here).

Clearly, D has 288 bits, since the 0 byte has 8 bits, C has 120 bits, and h has 160. Let N = 2^{288} - D. If N isn't divisible by 3, try modifying m slightly and redo all the previous computations until we get one that's divisible by 3. On average, this will take 1.5 attempts, so it's very practical to do by hand.

Let S = 2^{1019} - \frac{2^{34}N}{3}. At this point, if we try to verify the signed message \tup{m, S} with the signature verification routine, it will pass - we just forged the signature! Why does this work?

Verification is done by computing S^3 \mod n = \left(2^{1019} - \frac{2^{34}N}{3}\right)^3 \mod n = 2^{3057} - 2^{2072}N + 2^{1087} \frac{N^2}{3} - \left(\frac{2^{34} N}{3}\right)^3 \mod n. Since 0 \le 2^{3057} - 2^{2072}N + 2^{1087} \frac{N^2}{3} - \left(\frac{2^{34} N}{3}\right)^3 \le n, the modulo is redundant, so 2^{3057} - 2^{2072}N + 2^{1087} \frac{N^2}{3} - \left(\frac{2^{34} N}{3}\right)^3 \mod n = 2^{3057} - 2^{2072}(2^{288} - D) + 2^{1087} \frac{N^2}{3} - \left(\frac{2^{34} N}{3}\right)^3.

Let g = 2^{1087} \frac{N^2}{3} - \left(\frac{2^{34} N}{3}\right)^3 (the later parts of the above expression) be our "garbage" that we append to the end. Since 0 \le g \le 2^{2072}, we can represent g with a bitstring of length 2072. So S^3 \mod n = 2^{3057} - 2^{2072}(2^{288} - D) + g = 2^{3057} + 2^{2360}(2^{697} - 1) + 2^{2072}D + g. Note that 2^{697} - 1 is 696 bits of all 1's, or 87 bytes of 0xFF.

So overall S^3 \mod n is 3072 bits, the last 2072 bits are our garbage g, the 288 bits before that is D, the 696 bits before that is all 1's due to 2^{697} - 1, and the bit before that is 1 due to 2^{3057}, and the 15 bits before that are 0.

(Discussion about SHA-1 collision found recently, and its impact on PGP, SSL, Git, and the switch toward SHA-2 and SHA-3)

1/3/17

;wip: slides about properties of ecash and paper cash

David Chaum proposed a cryptographic currency system called e-cash, based on privacy and anonymity (like paper cash, for most purposes), but the system was never widely successful. Bitcoin later attained some success, but provides much weaker guarantees of privacy/anonymity.

The e-cash system has 3 parties - the payer, payee, and the financial network. A coin is a representation of value, and coins are stored on cards. Payments support transferring coins from banks to cards (withdrawal), from cards to cards (payment), and from cards to banks (deposits).

We want payments to keep the payer anonymous, and for payments to be untraceable (so banks don't know where the coins that are being deposited are from). Additionally, we need it to be infeasible to forge coins, and it should be impossible to double-spend coins.

Also, it would be nice if it could work offline, be cheap/efficient, and for cash to be easily transfered/divided.

For Alice to withdraw $100 from the bank in a centralized model:

  1. Alice signs a withdrawal request for $100.
  2. The bank decreases Alice's account balance by $100.
  3. The bank signs a message containing a randomly selected 256-bit serial number and the value of the coin, and gives that message to Alice as a coin worth $100.

For Alice to pay Bob and Bob to deposit to the bank in a central model:

  1. Alice sends the coin to Bob, who forwards it to the bank.
  2. The bank verifies that the coin signature is correct, and that it's not already spent.
  3. The bank marks the coin as spent in its internal database.
  4. The bank increases Bob's account balance by $100, and tells Bob that the coin was valid.
  5. Bob completes the transaction with Alice.

This gives us unforgeable coins and avoids double-spending, but the bank knows all the payers and payees (because it issues serial numbers), so there's neither payment anonymity or payment untraceability.

This works because Bob checks that the coin is valid before completing the transaction. If we didn't check it before completing the transaction, Alice could spend the coin multiple times and complete multiple transactions, while only the first store to deposit will get the coin. We also can't know who was double spending, because Bob or anyone else with a copy of the coin could do the same thing.

To implement payer anonymity and untraceability in the centralized system, we can use RSA blind signatures. A blind signature essentially allows one to sign something without knowing what they're signing.

For Alice to withdraw $100 from the bank in an anonymous centralized model:

  1. Alice prepares a message M containing a randomly selected 256-bit serial number and the value of the coin they want to withdraw.
  2. Alice selects a blinding factor r, an integer coprime to n (i.e., r \ne p and r \ne q).
  3. Alice computes a blind message m' = H(m) r^e \mod n. Clearly r^e \mod n is a uniformly random number between 0 and n (since exponentiating by e is just a permutation of r). Further, since r is independent of m, we can't learn anything about M from m'.
  4. Alice requests $100 from the bank and gives the bank m'.
  5. The bank decreases Alice's account balance by $100, signs m' with s' = (m')^d \mod n.
  6. Clearly, s' = H(m)^d r^{ed} \mod n = H(m)^d r \mod n = sr \mod n, where s is a signature for H(m). So Alice can compute s = s' r^{-1} \mod n - a signature by the bank for H(M), even though the bank knows nothing about M. The signed message M is the coin \tup{M, s}.

Additionally, the bank has a different key pair for each coin value - the bank would sign a $100 coin with one key pair, a $1000 coin with a different key pair, and so on. This ensures that Alice can't prepare an M that says they want to withdraw $1000 and actually only withdraw $100, which would make Alice gain $900 if it succeeded, since if Alice tried to, the signature wouldn't validate.

The idea is that because of the blinding factor, a particular coin could have come from any withdrawal. The bank knows it's signing a $100 coin, but not what the corresponding M actually is. Therefore, the bank doesn't know who payers and payees are.

3/3/17

We still want our electronic cash system to work offline though, without a centralized network. This is difficult because we want coins to be both anonymous, and we don't want to allow double-spending of coins.

We can actually do this using a blinded signature scheme to satisfy both of these properties (see the slides at around page 269 for more details), though it's quite inefficient. There are real-world electronic cash systems that can do the same thing more efficiently, but they're a lot more complex.

This scheme doesn't need coin serial numbers because the masks and salts should be enough randomness to make each coin unique.

Though this sort of scheme isn't in wide use for currency, the ideas it uses, like blind signatures, are used in many real-world systems.

6/3/17

For symmetric cryptography, the communicating parties always need to share a secret key over a confidential channel. Public key cryptography reduced this requirement to only requiring an authenticated channel. How do we ensure each party gets an authentic copy of the other party's public key when encrypting a message or verifying a signature?

Where does Alice get Bob's public key from? How do we know it's really Bob's key? How do we revoke/update public keys? Public key infrastructure (PKI) needs to address these issues.

Some ways to distribute public keys are: point-to-point over trusted channels (voice, in-person, etc.), centralized trust stores (authentication trees), online/offline certificate authorities, identity-based cryptography (a trusted party holds all the key pairs and gives them out based on identity, like email).

PGP is one of the original public key cryptography software products. It supports algorithms like AES, 3-DES, RSA, DSA, and so on, and doesn't specify public key management itself. What ended up happening is that public keys are distributed based on the web of trust - a network of mutually trusting people who have signed each other's keys. This is based on trust being transitive (if A trusts B, and B trusts C, then A trusts C), but it doesn't work out that well in practice since trust isn't really transitive. However, it does have the advantage of not needing to trust a single third party.

Modern cryptography systems often use certificate authorities as a large part of their PKI - trusted third parties that issue signed certificates saying "the public key X belongs to Bob". Certificates contain the person's identity (e.g., email, domain name), the person's public key, and metadata like the expiry date. The CA doesn't need to be trusted with private keys, we just need to trust that it won't sign fake certificates. Some parts of this type of PKI include the certificate formats, certificate revocation protocols, certificate policy (what each certificate can be used for), and CA protocols (how do they handle the private key)?

SSL and TLS are currently the biggest implementations of this sort of PKI. SSL/TLS consists of a handshake protocol (authenticate and negotiate public keys), and a record protocol (encrypt and authenticate data). The handshake protocol checks that the server's public key has a certificate signed with one of the several hundred CAs whose public keys are preinstalled in the browser, and the server can do likewise for client certificate (although client certificates are rarely used).

(A live demonstration of what makes up a TLS certificate, using Amazon and the Security tab in Chrome's devtools)

8/3/17

QA session with the prof. Midterm is tonight at 7-9PM.

10/3/17

In SSL, signatures are used to prove that an entity has access to a particular private key. That means that the root CAs sign their own public keys, to verify that they actually have the private keys that correspond to the public keys they use.

When Alice visits Bob's website, amazon.com:

  1. Bob sends Alice a certificate, containing public keys for Bob (amazon.com), Symantec (the intermediate CA used by Amazon), and Verisign (the root CA used by Symantec).
  2. Alice checks that Verisign's public key is correctly signed by a root CA hard coded into its trust store.
  3. Alice checks that Symantec's public key is correctly signed by Verisign's public key.
  4. Alice checks that Bob's public key is correctly signed by Symantec's public key. If it is correct, then Bob's public key is authentic.
  5. Alice chooses a session key, encrypts it using Bob's public key, and sends it to Bob.
  6. Bob decrypts the session key using his private key, and Alice and Bob can start communicating using AES-GCM.

In this case, Alice and Bob depend on Symantec to correctly vouch for Bob's identity, and to very carefully guard the private keys using strong physical security measures, such as ensuring the key can only be accessed with the cooperation of 3 people and keeping the key in a special, guarded signing room. Even so, there are hundreds of root CAs trusted in the browser, and it only takes one comprimised one to issue a fake certificate. Browser vendors need to carefully choose which root CAs to include in their trust store.

For example, the DigiNotar root RA issued hundreds of fraudulent certificates, which were immediately used to phish around 300000 users in Iran with a fake GMail login page with the real GMail login page URL and a green padlock in the address bar. This appeared to be a state-sponsored attack. Something similar happened in 2016 when a Chinese government-controlled root CA WoSign issued fraudulent certificates for sites like GitHub and Alibaba. Browser vendors immediately removed DigiNotar and WoSign from their trust stores and pushed out updates.

Extended Validation (EV) certificuates are simply those belonging to CAs that meet a set of stronger requirements (e.g., a more thorough identity check process). If CAs apply for EV status, and meet those stronger requirements, they get EV status and can issue EV certificates, which generally show up in browsers as a green box rather than just a padlock.

13/3/17

Overview and discussion of the recent Wikileaks CIA documents, such as the cryptographic requirements in use.

Discrete logarithm public key cryptography schemes.

Let p be an odd prime. Let \mb{Z}_p = \set{0, \ldots, p - 1} with arithmetic performed mod p (e.g., in \mb{Z}_{17}, 14 + 11 = 8, and in \mb{Z}_{13}, 5^{-1} = 8).

Also, \mb{Z}_p^* = \mb{Z}_p \setminus \set{0}.

We can now write Fermat's little theorem as follows: if a \in \mb{Z}_p^* for some prime p, then a^{p - 1} \equiv 1 \pmod p.

If a \in \mb{Z}_p^*, then the order \ord(a) of a mod p is the smallest positive integer t \in \mb{Z}_p^* such that a^t \equiv 1 \pmod{p}. For example, given p = 13, \ord(1) = 1 since 1^1 \equiv 1 \pmod{13} and \ord{12} = 2 since 12^2 \equiv 1 \pmod{13}. Due to Fermat's little theorem, the order must exist and be between 1 and p - 1 inclusive. For now, we'll compute these orders by trying all the possible t values.

Given a prime p, a \in \mb{Z}_p^*, and t = \ord(a):

  1. Given s \in \mb{Z}, a^s \equiv 1 \pmod p if and only if t \mid s.
  2. t \mid p - 1, by Fermat's little theorem and point 1.
  3. Given x, y \in \mb{Z}, a^x \equiv a^y \pmod p if and only if x \equiv y \pmod t.

The discrete logarithm problem (DLP): let p be a large prime, and q be a prime divisor of p - 1. Let g \in \mb{Z}_p^* such that \ord(g) = q. Let h \in <g>. Determine a = \log_g h - in other words, find the integer 0 \le a \le q - 1 such that h = g^a \pmod p (a is the discrete logarithm of h in base g in \mb{Z}_p).

For example, \log_{59} 22 = 9 \pmod {67}.

Proof that g must exist:

Clearly, there always exists an \alpha \in \mb{Z}_p^* such that \ord{\alpha} = p - 1.
This value of \alpha is called a generator of \mb{Z}_p^* because \set{\alpha^k \mod p : 0 \le k \le p - 2} (we use p - 2 because \alpha^{p - 1} \equiv 1 \pmod p and \alpha^0 \equiv 1 \pmod p, so k = p - 1 would be redundant with k = 0) will give some permutation of the integers 1 to p - 1.
Let g = \alpha^{\frac{p - 1}{q}}. Clearly, g^q \equiv (\alpha^{\frac{p - 1}{q}})^q \equiv alpha^{p - 1} \equiv 1 \pmod p.
So by point 1 above, \ord(g) \mid q, so \ord(g) is either 1 or q, since q is prime.
However, if q = 1, then g \equiv \alpha^{\frac{p - 1}{q}} \equiv 1\pmod p contradicts \ord{\alpha} = p - 1, so \ord(g) = q.

15/3/17

;wip: copy in DLP discrete logarithm problem from photo

One way of solving the discrete logarithm problem is exhaustive search: Compute g^a \mod p for every 0 \le a \le q - 1, until we find the a such that g^a = h. This takes q modular multiplications, so O(p) runtime (since p and q are the same size).

Shanks's Algorithm is a faster way:

Let m = \ceil{\sqrt{q}}. Clearly, a = im + j for some 0 \le i \le m - 1 and 0 \le j \le m - 1, because 0 \le a \le q - 1. If we find i and j, we can find a.
Clearly, j = a - im, so g^j \equiv g^{a - im} \pmod p. So g^a (g^{-m})^i \equiv g^j \pmod p, which means h (g^{-m})^i \equiv g^j \pmod p.
Therefore, we can find a simply by trying all the values of Note that this is similar to the meet in the middle attack on DES.

The algorithm can be written as follows:

  1. Construct a table of values \tup{j, g^j \mod p} for all 0 \le j \le m - 1, indexed by the second entry.
  2. Compute c = (g^m)^{-1} \mod p.
  3. Compute h c^i for all 0 \le i \le m - 1 until we find an i such that h c^i \equiv g^j \mod p for any j.
  4. We get a = \log_g h = im + j.

Since m = \ceil{\sqrt{q}}, we only need O(\sqrt q) time to do the above.

There are actually a lot better ways to do this. Pollard's algorithm takes O(\sqrt q) as well, but uses negligible space. The Number Field Seive uses ideas similar to integer factorization for RSA, and runs in L_p[\frac 1 3, 1.923] - subexponential with respect to \log_2 p!

There's also Shor's algorithm, which is a polynomial-time algorithm for quantum computers that can both find a and factor n. There seems to be an important correspondence between the integer factorization problem and the discrete logarithm problem, though at this time we don't have any proof of that.

Looking at these best known algorithms, how do we get 128 bits of security? Since the number field seive is L_p[\frac 1 3, 1.923] with respect to a function of p, we need a p of about 3072 bits to get 128 bits of security. Since Pollard's algorithm is O(\sqrt q), we need a q of about 256 bits to get 128 bits of security (we need \sqrt{q} to have around 128 bits).

Diffie-Hellman key agreement was one of the earliest key exchange protocols, and is still widely in use. The goal is to allow Alice and Bob to agree on a shared secret key over an unsecured channel! Given public domain parameters p, q, g, defined as they are for the discrete logarithm problem, and a public hash function H(x):

  1. Alice chooses a random 1 \le x \le q - 1.
  2. Alice sends X = g^x \mod p to Bob (solving for x is the discrete logarithm problem).
  3. Bob does the same, choosing a random 1 \le y \le q - 1 and sending Y = g^y \mod p to Alice.
  4. Alice computes K = Y^x \mod p = (g^y)^x \mod p = g^{xy} \mod p.
  5. Bob computes K = X^y \mod p = (g^x)^y \mod p = g^{xy} \mod p.
  6. Now both Alice and Bob have g^{xy} \mod p, but nobody else does, without being able to efficiently solve the discrete logarithm problem. We can now compute k = H(K) using some hash function, and use that as the secret key.

Diffie-Hellman key exchange provides security against passive adversaries - adversaries that don't actively interfere with the connection. The Diffie-Hellman problem (DHP): given p, q, g, X, Y, find K. Clearly, DHP is at least as hard as DLP.

17/3/17

However, unauthenticated Diffie-Hellman key exchange isn't secure against active attackers, because an active attacker could just act as a man-in-the-middle (an intrude-in-the-middle attack), where the attacker pretends to be Alice to Bob and Bob to Alice, transparently forwarding messages through.

To fix this, we use authenticated Diffie-Hellman. Alice and Bob each generate certificates containing their identity (name, email, etc.), RSA public key, and certificate authority signature (the CA signature confirms that this RSA public key actually belongs to the given identity). Now, we do the same thing as for unauthenticated Diffie-Hellman, except afterwards, they exchange signed messages containing X, Y, and their identity, to

In real-world implementations of authenticated Diffie-Hellman, like SSL, the client (Alice) doesn't have their own signature, so only Bob sends the signed message with X, Y, and their identity.

Why do we use DH, even though Alice has Bob's public key and vice versa? In other words, why don't they just communicate with RSA? Well, Diffie-Hellman provides forward secrecy - even if an adversary managed to record all their encrypted traffic, and comprimise the long-term RSA keys, the session keys are still safe. To implement this, Alice and Bob both delete their copy of K, x, and y after they're done communicating. Since no other copies of those exist, and it's infeasible to recompute them from the available information X and Y, this provides forward secrecy.

Digital Signing Algorithm (DSA)

DSA is a signature scheme that's a common alternative to RSA signatures. It's based on the hardness of the discrete logarithm problem rather than integer factorization. It was designed by David Kravitz from the NSA, and standardized by NIST in 1994.

DSA has the public domain parameters p (3072-bit prime), q (256-bit prime factor of p - 1), g (;wip) and uses the SHA256 hash.

For Alice to generate keys:

  1. Select a random 1 \le a \le q - 1.
  2. Compute h = g^a \mod p - the public key is h, and the private key is a.

For Alice to generate a signature for a message M:

  1. Compute m = \text{SHA256}(M).
  2. Select a random per-message secret 1 \le k \le q - 1. It's very important that k not be reused for different messages.
  3. Compute r = (g^k \mod p) \mod q. If r = 0 (the probability of this is tiny however) then go back to step 2.
  4. Compute s = k^{-1} (m + ar) \mod q. If s = 0 (the probability of this is tiny however) then go back to step 2.
  5. The signature is \tup{r, s}.

For Bob to verify Alice's signature \tup{r, s} on a message M:

  1. Obtain an authentic copy of Alice's public key h$$.
  2. Compute m = \text{SHA256}(M).
  3. Verify that 1 \le r \le q - 1 and 1 \le s \le q - 1.
  4. Compute u_1 = s^{-1} m \mod q and u_2 = s^{-1} r \mod q.
  5. Compute v = (g^{u_1} h^{u_2} \mod p) \mod q.
  6. Accept the signature if and only if v = r.

20/3/17

Interesting properties of DSA, compared to RSA:

Elliptic Curve Cryptography

Let F = \mb{Z}_p, where p is a prime of at least 5. An elliptic curve over F is an equation E/F : Y^2 = X^3 + aX + b, where a, b \in F and 4a^3 + 27b^2 \ne 0 (this is to ensure the right hand side has no cubic roots).

Consider a certain "elliptic curve" over real numbers E/\mb{R} : Y^2 = X^3 - X = X(X - 1)(X + 1). If you graph this on a 2D cartesian plane, it resembles (but is not exactly) an ellipse.

Consider another "elliptic curve" over real numbers E/\mb{R} : Y^2 = X^3 + X. If you graph this on a 2D cartesian plane, it resembles a curve that opens toward the right.

One example of an actual elliptic curve E/\mb{Z}_{13} : Y^2 = X^3 + 2X + 9. Here, F = \mb{Z}_{13}, a = 2, b = 9, 4a^3 + 27b^2 \equiv 9 \pmod{13}.

The set of points on for an elliptic curve is the set of solutions (X, Y) to the equation itself, plus the point at infinity \infty. The point at infinity isn't really an actual solution to the elliptic curve - it's just a point we include as an additive/additive-inverse identity.

For this curve the set of points on the curve is E(F) = \set{\tup{x, y} : y^2 = x^3 + ax + b} \cup \set{\infty}. Clearly, E(F) = \set{\tup{0, 3}, \tup{0, 10}, \tup{1, 5}, \tup{1, 8}, \ldots}.

We find the set of points by trying individual X values and then for each of these, trying Y values.

Recall that we define security level of a given crypographic operation by powers of 2, where the value is the estimated number of operations we need to do to break that operation. Here are some more common security levels, including the ones we saw last time:

22/3/17

Elliptic curves have some basic operations defined.

For example, each point on the curve P has an additive inverse -P = (P_x, -P_y) (basically, we flip the point about the Y axis). The additive inverse of a point is essentially the value that you'd add to that point to get the point at infinity. As a result, the point at infinity is its own additive inverse, so \infty = -\infty:

def additive_inverse(point, modulus):
    # PRECONDITIONS: `point` is on the curve; `modulus` is prime
    if point == (): return ()
    return (point[0], -point[1])

The additive inverse operation is closed (-P is also a point on the curve if P is a point on the curve) because the elliptic curve equation uses Y^2, so Y^2 = X^3 + aX + b = (-Y)^2

We can also add points; geometrically, for two different points P, Q, we can draw a line between them, and this line is guaranteed to either intersect a third, different, point R, and P + Q = -R. For two equal points P = Q, we draw a tangent line, and this line is guaranteed to intersect another, different, point R, and again P + Q = -R:

def integer_multiplicative_inverse(a, modulus):
    # PRECONDITIONS: `modulus` is prime
    # by Fermat's Little Theorem, given a prime $p$,
    # $a^p \equiv a \pmod p$, so $a^{p - 1} \equiv 1 \pmod p$,
    # so $a a^{p - 2} \equiv 1 \pmod p$. Therefore,
    # $a^{p - 2}$ is by definition the modular multiplicative
    # inverse of $a$
    # if $p$ wasn't prime, we would use the EEA to find an inverse instead
    return pow(a, modulus - 2, modulus)

def elliptic_curve_add(point_1, point_2, curve_a, modulus):
    # PRECONDITIONS: `point_1` and `point_2` are on the curve; `modulus` is prime
    # handle one of the addends being the additive identity - the point at infinity
    if point_1 == (): return point_2
    if point_2 == (): return point_1

    x_1, y_1 = point_1
    x_2, y_2 = point_2
    if x_1 == x_2 and point_1 != point_2: return () # vertical line, the sum is the point at infinity
    if point_1 == point_2: # doubled point - use the tangent of the curve as the slope
        slope = ((3 * pow(x_1, 2, modulus) + curve_a) * integer_multiplicative_inverse(2 * y_1, modulus)) % modulus
    else: # find the slope of the line joining point_1 and point_2
        slope = (y_2 - y_1) * integer_multiplicative_inverse(x_2 - x_1, modulus)

    # intersect the line with the curve to find the third point, then mirror about X axis to get the sum
    sum_x = (pow(slope, 2, modulus) - x_1 - x_2) % modulus
    sum_y = (slope * (x_1 - sum_x) - y_1) % modulus
    return sum_x, sum_y

While it's intuitively easy to think of the solutions to the curve over real numbers, with modular arithmetic over integers the geometric intuition quickly fails. There are three main causes: adding the point at infinity to anything just gives that same point again, adding a point to itself gives the additive inverse of the new point intersected by the tangent line, and adding a point to another point gives the additive inverse of the new point intersected by the line passing through the original two points.

It turns out that the elliptic curve and the addition operation we defined forms an Abelian group - essentially, a closed, commutative, associative group containing an identity element and where every point has an inverse:

Also, \#E(F) is the cardinality of the elliptic curve - the number of solutions. Interesting cryptographic properties show up when there are a prime number of solutions.

Another useful operation is multiplication of a scalar and an elliptic curve point. Essentially, we just add the point to itself repeatedly. We can compute this efficiently using the double-and-multiply algorithm, since addition is associative:

def elliptic_curve_multiply(a, point, curve_a, modulus):
    # PRECONDITIONS: `a` is a non-negative integer; `point` is on the curve; `modulus` is prime
    if a == 0: return () # if we add the point to itself zero times, we get the point at infinity
    point_times_power_of_2 = point
    result = () # point at infinity is the initial value
    while a >= 1:
        if a & 1: # there's a 1 bit at this position, add the corresponding power of 2 times the point
            result = elliptic_curve_add(result, point_times_power_of_2, curve_a, modulus)
        a >>= 1
        point_times_power_of_2 = elliptic_curve_add(point_times_power_of_2, point_times_power_of_2, curve_a, modulus)
    return result

25/3/17

Let p be prime, and E / \mb{Z}_p with q = \#E(\mb{Z}_p) as a prime. Let P \in E(\mb{Z}_p) where P \ne \infty.

It turns out that E(\mb{Z}_p) = \set{\infty, P, 2P, 3P, \ldots, (q - 1)P} - we can repeatedly add P to itself to form all of the elements of the elliptic curve (this is because all groups of prime order are cyclic).

We'll call P a generator of E(\mb{Z}_p). For example, for E(\mb{Z}_{13}) : Y^2 = X^3 + 2X + 9, we let P = \tup{0, 3}, and get E(F) = \set{\tup{0, 3}, \tup{3, 9}, \tup{1, 8}, \tup{11, 7}, \ldots, \tup{11, 6}, \tup{1, 5}, \tup{3, 4}, \tup{0, 10}, \infty}. Because p and q are prime, and for number-theoretic reasons we won't cover in this course, every point in E(\mb{Z}_p) is a generator, excluding the point at infinity. This would not necessarily be true if p and q weren't prime.

We note that there's a pattern in the X axis coordinates of the points - it starts off as 0, 3, 1, 11, and then mirrors itself at the end with 11, 1, 3, 0. In fact, elliptic curves will always mirror themselves this way, due to the fact that Y^2 is on the left hand side of the elliptic curve equation. However, there don't seem to be any other obvious patterns.

The elliptic curve discrete logarithm problem (ECDLP) is what makes elliptic curve cryptography hard to break. It's defined as follows:

Given E p, q, P defined as above and a Q \in E(\mb{Z}_p), find 0 \le l \le q - 1 such that Q = lP (we can also denote this as l = \log_P Q).

Note that this is quite similar to the discrete logarithm problem we looked at a while ago:

The naive way of solving ECDLP instances is exhaustive search - compute P, 2P, 3P, \ldots until we find Q. This requires O(q) elliptic curve additions, or O(p) (by Hasse), which is non-polynomial. One improvement over this is to adapt Shanks' algorithm, with the group operations replaced as necessary. This gets us a running time of O(\sqrt q) = O(\sqrt p) additions and O(\sqrt q) = O(\sqrt p) space. We could also adapt Pollard's algorithm, which is also O(\sqrt q) = O(\sqrt p) time but negligible space.

It turns out that there's no analogue of the number field seive for ECDLP - ECDLP cannot be solved by any known subexponential time algorithm, unlike RSA and DSA! That means using the ECDLP is better for problem hardness than RSAP or DLP.

Just like with DSA, and unlike RSA, ECDLP only requires a 256-bit prime p to get a security level of 128 bits.

Elliptic curve cryptography can be used for many different situations that we used the DLP for. For example, in Elliptic Curve Diffie-Hellman Key Agreement (ECDH), we have the domain parameters E, p, q, P, Alice sends Bob A = aP for a randomly selected 1 \le a \le q - 1 (this can be computed with repeated doubling, analogous to repeated squaring), Bob sends Alice B = bP for a randomly selected 1 \le b \le q - 1, then Alice gets the session key with H(aB) and Bob gets the same key with H(bA). This works because aB = a(bP) = abP = baP = b(aP) = bA, and is hard to break because finding a given A = aP would require solving ECDLP.

29/3/17

;wip: slides up to the one titled Bitcoins

The big advantage of Bitcoin is that it's decentralized (so it can't be controlled by any single organization, and anyone can use it without a credit history), transactions can't be reversed, and transaction fees are pretty low.

A transaction is a transfer of coins from one user to another. Every transaction is broadcast to all users.

The Bitcoin network is organized as peer-to-peer network. Seed nodes hardcoded into most clients can ensure new clients can find peers, without already knowing any.

Every approximately 10 minutes, a block is created, containing all of the transactions that took place since the last block was created. The block gets added to a block chain - a sort of linked list of blocks where all links are cryptographically verified by hashing. Blocks can only be created by solving a cryptographic challenge that's designed to take 10 minutes on the fastest available computers - a proof of work. The software adjusts automatically to ensure that the proof of work takes around 10 minutes.

New Bitcoins are created by miners, who earn them by putting in computational power to verify transactions and help create blocks.

The main cryptographic primitives in Bitcoin are SHA256 and ECDSA, using the secp256k1 elliptic curve.

Each user picks a random 1 \le a \le q - 1 and then computes the elliptic curve point A = aP. The user's ECDSA private key is a, while the user's public key is A. Users are identified by public key - a given user can choose to generate more key pairs whenever they want.

A coin in Bitcoin is not really a particular thing, like a signed message in Chaum's electronic cash protocol. If Alice wants to give 2.5 coins to Bob, Alice generates the transaction T_{A, B} = \mathrm{sign}_A(\tup{\mathrm{hash}(T_{X, A}), A, B, 2.5}, where T_{X, A} is the transaction in which Alice obtained the coin she wants to give away, A and B are Alice and Bob's public keys, and then broadcasts the transaction to the Bitcoin network.

A coin can therefore be thought of as the chain of transactions, composed of all the transactions that involve that coin. Since transactions are public, anyone can go and verify the chain of ownership for the coin.

Satoshi Nakamoto made the first transaction in 2009, generating the first coins. This transaction is hardcoded into the software.

If an attacker wanted to do anything besides follow the exact rules of the currency, they need to modify the blockchain. This requires significant proof-of-work to make sure that ;wip.

For example, suppose Alice tries to double-spend a coin by sending the same coin to both Bob and Chris.

To form a block, some users will collect all transactions created in a certain period of time, and verify that they're correct. They'll then try to find a nonce such that hashing the the previous block, the user's public key, the nonce, and all of those transactions together will give a hash value that starts with t zeros, where t is decided by the network itself. This is hard because we need to potentially do a lot of hashes before we find a nonce that gets such a hash value.

Users will accept a block if all transactions in it are valid, the coins in those transactions aren't already spent, and the hash value starts with t zeroes. Users show that they accept the block by using it as the previous block when generating new ones.

31/3/17

It is possible for two different blocks to be mined by different people around the same time. This causes a fork in the blockchain, because both of them are, at the time of the fork, equally valid. The Bitcoin protocol requires that users take the longest current chain, and then continue growing that chain - the users trust the chain that was the hardest to generate, because there was more computational effort expended to create it. The basic idea behind taking the longest known chain, is that in order to control the blockchain, an attacker must continually generate blocks faster than the legitimate miners, or else they would be overtaken by those miners and the legitimate miners' blocks would be accepted over the attacker's. That means to control the blockchain, one must have the majority of all computational power in the network. Most Bitcoin transactions become confirmed (i.e., the other party accepts the transaction as valid) after six new blocks are created (which takes about 60 minutes), since it's widely considered to be infeasible for an attacker to create 6 blocks faster than the legitimate part of the network.

Bitcoin incentivizes miners to be legitimate by rewarding miners for computing blocks. The reward automatically halves every four years, to make sure the currency has a fixed upper bound, and the work factor (number of preceding 0's required in the hash of the block) increases every 2016 blocks, to ensure it always takes about 10 minutes to mine a block.

The current work factor is 70 0's, which means the Bitcoin network as a whole is doing around 2^{70} hash operations every 10 minutes.

Mining pools are groups of computers that work together to mine blocks, sharing the reward in some fair way when a block is mined. iMiners in mining pools can prove that they actually tried to mine the block by giving the nonce that gave the longest prefix of 0's seen so far. For example, a nonce that gives a prefix of 53 0's in the block hash proves to the mining that the miner did about 2^{53} hash operations, and should be rewarded accordingly.

Bitcoin doesn't have instant transactions - if the merchant trusts a transaction right after it's made, the customer can quite easily double-spend the coins. Bitcoin is secure as long as the total hashing pwoer of the legitimate users in the network exceeds the malicious users' hashing power. Bitcoin doesn't have payer anonymity or privacy - all transactions are public, though Bitcoin addresses don't need to be associated with real-world identities.

Bitcoin transactions are actually not just from one address to another. A transaction can be from a number of different sources to a number of different recipients - in fact, if you pay 2 BTC to someone out of an address with 6 BTC, you're actually making a transaction that's sending yourself 4 BTC and the other person 2 BTC. Another common use case for multiple recipients is to implement transaction fees - sending some BTC in your transaction to the miner, to incentivise them to include your transaction in their block to mine.

In practice, the tough part of using Bitcoin is securely storing coins - having a copy of a wallet means having access to all of the coins in it.

After Bitcoin came out, blockchain-based technology really took off. Ethereum is a blockchain-based network that supports full smart-contract functionality - Turing-complete programs that run on Ethereum, the advantage being that the code couldn't be modified or stopped unless the attacker has the majority of all computing power. Smart contracts are useful for things like lotteries, crowdfunding, and democratic autonomous organizations (DAOs).

Creative Commons License This work by Anthony Zhang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Copyright 2013-2017 Anthony Zhang.