# CS341

Algorithms.

Doug Stinson
Section 001
https://www.student.cs.uwaterloo.ca/~cs341/

# 14/9/15

Most materials will be on LEARN. Questions and answers can be posted on Piazza.

Study of the design and the analsis of algorithms - in particular, the correctness (proved via formal proofs) and efficiency (proved using time complexity analysis).

Useful metrics for algorithms are asymptotic complexity (best case/average case/worst case time/space complexity), number of specific computations used (like comparisons in sorting algorithms), whether an algorithm is the most efficient for a particular problem, etc.

Interesting things to know are lower bounds on possible algorithms to solve a particular problems, problems that cannot be solved by any algorithm (undecidability), and problems that cannot be solved efficiently by any algorithm, but can be solved (NP-hardness).

Useful design strategies we will go over are divide and conquer, greedy algorithms, dynamic programming, breadth-first and depth-first search, local search, and linear programming.

An example of an algorithm design is the maximum problem: given an array of integers A, find the maximum integer in A. A simple solution to this is as follows:

def find_max(array):
current = array[0]
for elem in array[1:]:
if elem > current: current = elem
return elem

This seems to be obviously correct, but we can also use program verification to prove it correct. In this case, we will use induction to prove the loop invariant - at the end of each iteration of the loop, current is equal to the largest element encountered so far in the array. We'd check the base case for arrays of length 1, then prove the inductive hypothesis to formally verify the program.

Clearly, this algorithm is \Theta(n), since it loops over the entire array once. Formal time complexity analysis can also be done by noting that the loop body takes \Theta(1) time, and the loop runs \Theta(n) times.

We also know that this algorithm is asymptotically optimal, at least in terms of the number of comparisons done. Since a correct solution to the Maximum problem must look at every element in the array, it must therefore do at least n - 1 comparisons. Since each comparison takes \Theta(1) time, the best possible algorithm must take \Omega(n) time. Let's formally prove this:

Suppose there was an algorithm that could determine the maximum element of an array using fewer than n - 1 comparisons.
Let G be a graph such that each vertex in G corresponds to an element in A, and each comparison done by a run of the algorithm between any two elements results in an edge between those two elements.
Clearly, there are n - 2 edges or less, and n vertices.
Therefore, there are at least 2 components in G, since the graph cannot be connected.
Clearly, the solution could be in any of the components, and can only be found by comparing them.
Therefore, the algorithm cannot exist.

An algorithm to find the minimum and the maximum element of an array can also trivially be designed, using 2n - 2 comparisons and \Theta(n) time. However, it is possible to design another algorithm that does fewer than 2n - 2 comparisons (though the time complexity might be worse).

One way to do this is to consider elements two at a time - we compare elements one pair at a time, the larger of the pair with the maximum, and the smaller of which with the minimum. That means we need 3 comparisons per pair, or \ceil{\frac 3 2 n} - 2 in total, a ~25% improvement:

def find_min_max(array):
if array[0] < array[1]: min, max = array[0], array[1]
else: max, min = array[0], array[1]
for i in range(2, len(array), 2):
if array[i] < array[i + 1]: small, big = array[i], array[i + 1]
else: big, small = array[i], array[i + 1]
if big > max: max = big
if small < min: min = small
if len(array) % 2:
if array[-1] > max: max = array[-1]
if array[-1] < min: min = array[-1]

It can actually be proven, using graph theory, that the number of comparisons for this algorithm is optimal. However, it's too lengthy and complicated to cover here.

Suppose we want to determine whether and which three elements of an array of integers sum to 0. This is pretty easy to do in O(n^3) (specifically, n \choose 3 runs of the inner loop) simply by checking all possible triples in the array. Basically, we pick two elements A[i] and A[j], then try to search for a A[k] = -(A[i] + A[j]) using linear search. However, it is actually possible to do so in O(n^2 \log n) by sorting the array first, and searching for k using binary search instead.

In fact, it's possible to do even better by sorting the array, taking each A[i], and searching from both ends of the array inward for j and k such that A[j] + A[k] = -A[i]. After each step in the search, we either move inward from the left if A[j] + A[k] < -A[i], or inward from the right if A[j] + A[k] > -A[i]

# 16/9/15

The pseudocode for the above algorithm looks like the following:

def better_3sum(array):
array = sorted(array)
result = []
for i in range(n - 2):
j, k = i + 1, n
while j < k:
triple_sum = array[i] + array[j] + array[k]
if triple_sum < 0: j += 1
elif triple_sum > 0: k -= 1
else:
result.append((i, j, k))
j += 1
k -= 1

Basically, this goes through each value of i, and does an O(n) search for two values that would make it possible to have all three sum to 0. Proving this correct is left as an exercise to the reader - prove that k - j is monotonically decreasing, and the pairs cover all possible pairs that can possibly sum up to A[i]. It's somewhat reminiscent of the array merge in merge sort.

This algorithm is O(n^2), since the search is O(n^2) and the sort is O(n \log n). There are actually even better algorithms, and the best currently known ones are O\left(n^2 \left(\frac{\log \log n}{\log n}\right)^2\right).

A problem is a computational task. A problem instance is the input for the computational task. The problem solution is the output. The size of a problem instance is a positive integer that is a measure of the size of the instance, and depends on the problem itself. The size is usually fairly intuitive, such as the size of an array for array sum, but sometimes it gets a bit more tricky.

An algorithm is a high level description of a computation. An algorithm solves a problem if it finds a valid solution for any instance in finite time. A program is an implementation of an algorithm using a computer language.

## Time Complexity Analysis

The running time of a program M for a problem instance I is T_M(I). The worst case running time for problem instances of size n for the program is T_M(n). The average case running time is T_M^{avg}(n).

The worst case time complexity of an algorithm A is f(n) such that a program M exists implementing A such that T_M(n) \in \Theta(f(n)).

Running time can only be found by running the program on a computer. Complexity is independent, but we lose information such as the constant factors.

To get the time complexity of an algorithm, we can either use \Theta(f(n)) throughout our analysis, or prove O(f(n)) and then \Omega(f(n)), which is often easier due to being able to make more assumptions for each separate proof.

The complexity of a loop in code is sum of the complexity of its body over each iteration of the loop. The complexity of consecutive operations is the sum of the complexities of the operations they are composed of.

Prove that (\ln n)^a \in o(n^b) for any a and b:

Let L = \lim_{n \to \infty} \frac{(\ln n)^a}{n^b} \lH \lim_{n \to \infty} \frac{a (\ln n)^{a - 1}}{b n^b} \lH \ldots \lH \lim_{n \to \infty} \frac{a!}{b^a n^b} = 0.
By the limit order rule, (\ln n)^a \in o(n^b).

# 21/9/15

Useful to know:

• \sum_{i = 0}^{n - 1} (a + di) = na + \frac{dn(n - 1)} 2 - arithmetic sequence
• \sum_{i = 0}^{n - 1} ar^i = \begin{cases} a\frac{r^n - 1}{r - 1} &\text{if } r \ne 1 \\ na &\text{if } r = 1 \end{cases} - geometric sequence
• \sum_{i = 0}^{n - 1} (a + di)r^i = \frac a {1 - r} - \frac{(a + (n - 1)d)r^n}{1 - r} + \frac{dr(1 - r^{n - 1})}{(1 - r)^2} -arithmetic-geometric sequence
• \sum_{i = 1}^n \frac 1 {i^2} = \frac{\pi^2} 6 = \Theta(1)
• _n = \sum_{i = 1}^n \frac 1 i \in \Theta(\log n) - harmonic sequence

The order of n! is \Theta(n^2 \sqrt n e^{-n}). This is similar to Stirling's approximation, \sqrt{2 \pi n} n^n e^{-n} (this gets more and more accurate for larger values).

For order notation, the base of logarithmic terms don't matter because \log_{10} n = \frac 1 {\log_b 10} \log_b n, and \frac 1 {\log_b 10} is a constant factor.

Polynomial growth rates are those in O(n^k), k \in \mb{R}. For example, the best known algorithm for graph isomorphism is n^{\sqrt n \log_2 n}, so it is not polynomial.

# 23/9/15

Consider the following program:

def useless(n):
for i in range(n):
j = i
while j >= 1:
j /= 2

Clearly, each iteration of the inner loop will run \Theta(\log n) times, so the overall time complexity is \sum_{i = 1}^n \Theta(\log i) = \Theta(\sum_{i = 1}^n \log i) = \Theta(\log(\prod_{i = 1}^n i)) = \Theta(\log(n!)) = \Theta(n \log n).

A recurrence relation specifies a_n in the infinite sequence of real numbers a_1, a_2, a_3, \ldots in terms of the previous terms a_1, \ldots, a_{n - 1} (a_1 must therefore be a constant - the initial value). A solution to a recurrence relation is a closed form formula for a_n - one that does not depend on the previous values of a_1, \ldots, a_{n - 1}. These are generally solved using techniques like guess and check or recursion trees, but there aren't any methods that are guaranteed to solve all recurrences, and in fact, reurrences may not necessarily even have solutions.

Recurrence relations can also be written using function notation, like T(1) = 2, T(n) = T(n - 1) + 1.

The guess and check method involves computing enough a_n instances to guess the form of the solution (without any constants). Then, we solve for the constants using the computed values, and check if our solution is correct - if it is, we must prove it is correct using induction, and if not, we start over with a different guess.

Solve a_0 = 4, a_n = a_{n - 1} + 6n - 5:

Guess and check: the first few elements of the sequence are a_0 = 4, a_1 = 5, a_2 = 12, a_3 = 25, a_4 = 44.
This seems to be quadratic growth, so we guess an^2 + bn + c, a, b, c \in \mb{R} as the solution.
Solving for a, b, c, c = 4, a + b + c = 5, 4a + 2b + c = 12, we get a = 3, b = -2, c = 4.
So we guess that a_n = 3n^2 - 2n + 4. We can verify this for 0 \le n \le 4 as our inductive base case.
Assume for some k \in \mb{N} that a_k = 3k^2 - 2k + 4.
Clearly, a_{k + 1} = 3k^2 - 2k + 4 + 6(k + 1) - 5 = 3(k + 1)^2 - 2(k + 1) + 4.
So by induction, a_n = 3n^2 - 2n + 4.

The recurrence tree method is often used for divide-and-conquer algorithms. This technique involves building a call tree for the recurrence. then summing up the results of the function on each level of the tree.

To construct a recurrence tree, we start with the timing function T(n), then add children for each recursive call. The recursive calls are then removed from the parent. This is repeated for the children, all the way until we reach the base case. Note that after each step of growing the tree, the sum of all the nodes are always equivalent to T(n).

Then, we sum the nodes at each level in the tree. The sum of all these sums is equal to T(n).

Use the recurrence tree method to find the running time of merge sort:

Clearly, T(n) = \begin{cases} 2T\left(\frac n 2\right) + cn &\text{if } n > 1 \wedge n \text{ is a power of 2} \\ d &\text{if } n = 1 \end{cases} where c, d are constants.
Construct a recursion tree: start with the root node N with value T(n), then add 2 children, N_1, N_2, both T\left(\frac n 2\right), and replace N's value with cn.
What we get is a binary tree with every internal node having 2 children, and value c\frac n {2^i} (where i is the level in the tree, where 0 is the root level), and each leaf node has value d.
Let n = 2^j. Clearly, the height of the tree must then be j + 1, with j interior levels. Clearly, at the bottom level there are 2^j leaf nodes, since each level i has 2^{j - i} nodes. The sum of the bottom level is therefore 2^j d = dn.
Clearly, at each internal level i each node has value c\frac n {2^i}. Since there are 2^i nodes, the sum of each interior level is c2^i\frac n {2^i} = cn.
So T(n) = dn + jcn = dn + cn \log_2 n = \Theta(n \log n).

The master theorem is a generalized formula for solving certain forms of recurrences. One thing it says is that given T(n) = aT(\frac n b) + \Theta(n^y) where a \ge 1, b > 1, and n is a power of b, then T(n) \in \begin{cases} \Theta(n^x) &\text{if } y < x \\ \Theta(n^x \log n) &\text{if } y = x \\ \Theta(n^y) &\text{if } y > x \end{cases} where x = \log_b a.

Also, the exact value is T(n) = da^j + cn^y \sum_{i = 0}^{j - 1} \left(\frac a {b^y}\right)^i, or more simply T(n) = dn^x + cn^y \sum_{i = 0}^{j - 1} r^i where x = \log_b a and r = \frac a {b^y} = b^{x - y}.

This can be proved by constructing the recursion tree, figuring out the geometric sequence for the tree sums, and then solving for its value for each of the three cases.

A more general version, which will not be proved here, is that given T(n) = aT(\frac n b) + f(n) where a \ge 1, b > 1, and n is a power of b, then T(n) \in \begin{cases} \Theta(n^x) &\text{if } f(n) \in O(n^{x - \epsilon}) \text{ for some } \epsilon > 0 \\ \Theta(n^x \log n) &\text{if } f(n) \in \Theta(n^x) \\ \Theta(f(n)) &\text{if } \frac{f(n)}{n^{x + \epsilon}} \text{ is an increasing function of } n \text{ for some } \epsilon > 0 \text{ after } n \text{ is greater than some } n_0 \end{cases} where x = \log_b a.

# 28/9/15

Solve T(1) = 1, T(n) = 3T(\frac n 4) + n \log n:

Let a = 3, b = 4, f(n) = n \log n, so x = \log_4 3 and f(n) / n^{x + \epsilon} is an increasing function of n for \epsilon = 1 - \log_4 3 since f(n) / n^{x + \epsilon} = \log n.
So T(n) \in \Theta(n \log n), by the master theorem.

The main task when using the master theorem to prove time complexities is choosing the right \epsilon to make one of the cases valid. For example, the master theorem cannot be used for something like T(1) = 1, T(n) = 2T(\frac n 2) + n \log n, since none of the cases apply. Instead, we can use the recurrence tree method to solve the recurrence directly:

Each level i (from the bottom) of the tree has 2^i nodes, each with value 2^{j - i}(j - i) where n = 2^j.
So at each level, the sum is 2^i 2^{j - i} (j - i) = (j - i) 2^j.
So the sum of the tree is \sum_{i = 0}^j (j - i) 2^j = j^2 2^j - 2^j \sum_{i = 0}^j i = 2^j\left(j^2 - \frac{j(j + 1)}{2}\right) = \Theta(n \log^2 n).

## Divide and Conquer

Divide and conquer is a design strategy for algorithms where we divide a problem into smaller subproblems, solve those individually, then combine them back together to get a good result. Formally:

• Divide the problem instance I into smaller subproblems I_1, \ldots, I_n.
• Solve I_1, \ldots, I_n recursively to get sollutions S_1, \ldots, S_n.
• Use S_1, \ldots, S_n to compute S - combine the results. This is usually the hardest step.

For example, mergesort splits an array into two subarrays, and then recursively sorts those. Then, the merge is performed to combine the two sorted subarrays.

The time complexity of this is T(n) = \begin{cases} T(\ceil{\frac n 2}) + T(\floor{\frac n 2}) + cn &\text{if } n > 1 \\ d &\text{if } n = 1 \end{cases} for some constants c, d. This is called the exact recurrence. If we remove the ceilings and floors, we get a sloppy recurrence T(n) = \begin{cases} 2T(\frac n 2) + cn &\text{if } n > 1 \\ d &\text{if } n = 1 \end{cases}, which only works the same when n is a power of 2, but is a lot simpler to work with.

The master theorem can prove that mergesort is \Theta(n \log n) for the sloppy recurrence, and therefore only for when n is a pwoer of 2. However, if we want to get the time complexity for all n, we need to use induction on the exact recurrence.

# 30/9/15

### Non-dominated points

Suppose we have a set of 2D coordinates/points. One point dominates another if it has a greater or equal X and Y coordinate than the other. We want to find those points that are non-dominated - the points that are not dominated by any other point. The non-dominated points form a sort of boundary below which all the dominated points lie.

Clearly, this can easily be solved simply by checking every pair of points to see which dominate which, but this would take \Theta(n^2) time.

A better solution is to sort the points by their X coordinate, and then apply divide and conquer.

Dividing larger sets of points S is also easy - we can just split the sorted points into two sets S_1, S_2, and then recursively solve the problem. It is the combining step that is the difficult step.

Given a point and the highest point on its right, it is easy to determine whether it is dominated. This is our base case

Clearly, no point in S_1 can dominate a point in S_2, since the X coordinates of points in S_1 is strictly less than points in S_2. So we find the point in S_2 with the highest Y coordinate, and all points in S_1 with Y coordinate at or below this value are guaranteed to be dominated by at least one point in S_2. In this way, we can eliminate all the points in S_1 that are dominated by points in S_2 in O(n) time - this is the combining step.

In code, this looks like:

def non_dominated(points):
points = sorted(points, key=lambda p: p.x)
if len(points) == 1: return {points[0]}
pivot = floor(len(points) / 2)
left = non_dominated(points[:pivot])
right = non_dominated(points[pivot + 1:])
i = 0
while i < len(left) and left[i].y > right[1].y:
i += 1
return left[:i] + right

Clearly, the time complexity of this algorithm is T(n) = T\left(\floor{\frac n 2}\right) + T\left(\ceil{\frac n 2}\right) = O(n \log n) (the sloppy version of this can be proven using the master theorem).

Without divide and conquer, we could also just sort the list, and then iterate though it from right to left, keeping track of the largest Y axis coordinate seen so far and eliminating points at or below this Y coordinate:

def non_dominated(points):
largest = float("-inf")
for point in reversed(sorted(points, key=lambda p: p.x)):
if point.y <= largest:
yield point
else:
largest = point.y

### Closest pair

Suppose we have a set Q of n distinct 2D points in Euclidean space. We want to find the two points that are closest together - the points u, v for which \sqrt{(u_x - v_x)^2 + (u_y - v_y)^2} is minimised.

Clearly, minimising \sqrt{(u_x - v_x)^2 + (u_y - v_y)^2} is the same as minimising (u_x - v_x)^2 + (u_y - v_y)^2. We will again sort the points in Q by X coordinate, taking \Theta(n \log n) time.

The dividing step is again trivial to get two partitions Q_1, Q_2, but the combining step takes some work.

If we are given the distance \delta between the closest pair of points seen so far in the left and right partitions, and a point p, we know that if this point is in the closest pair of points, the other point must have X coordinate within \delta of this point's X coordinate. Therefore, the other point must be in the critical strip - a section of the plane from p_x - \delta to p_x + \delta and extending infinitely both up and down.

We now want to find the closest pair between the left and right partitions, if it is less than the .

There is also a lemma that says that if a critical strip R of Q is sorted by Y coordinate, and R[j] and R[k] have distance less than \delta where j < k, then k \le j + 7. Here, \delta is the smaller of the minimum distance between points in Q_1, and the minimum distance between points in Q_2. This can be proven by dividing the strip around the point as squares of side length \frac \delta 2, and then showing that no two points can be in the same square. In other words, within a critical strip and given \delta, the point closest to a given point can be determined in 7 comparisons or less.

def closest_pair(points, start, end):
if start == end: return float("inf")
pivot = floor((left + right) / 2)
closest_left = closest_pair(points, left, pivot)
closest_right = closest_pair(points, pivot + 1, right)
closest = min(closest_left, closest_right)
candidates = select_candidates(left, right, closest, points[pivot].x) # find points in the critical strip
candidates = sorted(candidates, key=lambda p: p.y)
closest = check_candidates(candidates, closest)
return closest

The time complexity of this is O(n \log n \log n). We can improve this by eliminating the Y coordinate sort of the candidates. One of them is to sort all the points Q by Y coordinate to begin with, and then use that list when checking candidates, and then adding the necessary checks that make it work. Alternatively, we could replace the Y coordinate sort with a merge, and maintaining the candidates in sorted order.

def closest_pair(points, start, end):
if start == end: return float("inf")
pivot = floor((left + right) / 2)
closest_left = closest_pair(points, left, pivot)
closest_right = closest_pair(points, pivot + 1, right)
closest = min(closest_left, closest_right)
merge_in_place(points, start, pivot, end)
candidates = select_candidates(left, right, closest, points[pivot].x) # find points in the critical strip
closest = check_candidates(candidates, closest)
return closest

# 5/10/15

The size of an integer is the number of bits that we use to represent it, rather than the value of the integer itself - this is basically just \ceil{\log_2 n}. The bit complexity of algorithms operating on integers is therefore the complexity as a function of the sizes of those integers.

## Multiplication

### Integer multiplication

Suppose we have 2 positive binary integers, X = [X_{k - 1}, \ldots, X_0] and Y = [Y_{k - 1}, \ldots, Y_0]. We want to compute the 2k-bit product Z = [Z_{2k - 1}, \ldots, Z_0]. Naively, we can use the simple binary multiplication algorithm, to shift Y by each position 0 \le i < k and add those shifted values for which X_i = 1, in O(k^2). Note that the fastest way to add two integers is \Theta(k).

We can do better than this by using divide and conquer, by using XY = (2^{\frac k 2} X_L + X_R)(2^{\frac k 2} Y_L + Y_R) = 2^k X_L Y_L + 2^{\frac k 2}(X_L Y_R + X_R Y_L) + X_R Y_R, then solving for those four multiplications between X_L, X_R, Y_L, Y_R (multiplying the powers of 2 are simply shifts, so they're basically free):

def multiply(x, y):
if len(x) == 1: return x[0] * y[0]
x_left, x_right = x[:len(x) // 2], x[len(x) // 2:]
y_left, y_right = y[:len(y) // 2], y[len(y) // 2:]
z_top = multiply(x_left, y_left)
z_middle = add(multiply(x_left, x_right), multiply(x_right, y_left))
z_bottom = multiply(x_right, y_right)
shift_left(z_top, len(x)),
shift_left(z_middle, len(x) // 2),
z_bottom
)
return z

Using the master theorem, we can show that this is takes \Theta(n^2) time - no improvement over the naive algorithm. However, this is a good base for an improved version.

Consider (X_L + X_R)(Y_L + Y_R) = X_L Y_L + X_L Y_R + X_R Y_L + X_R Y_R. So X_L Y_R + X_R Y_L = (X_L + X_R)(Y_L + Y_R) - X_L Y_L - X_R Y_R. Since we need to calculate X_L Y_L and X_R Y_R anyways, we can compute X_L Y_R + X_R Y_L with just one multiplication by computing (X_L + X_R)(Y_L + Y_R).

So using this:

def multiply(x, y):
if len(x) == 1: return x[0] * y[0]
x_left, x_right = x[:len(x) // 2], x[len(x) // 2:]
y_left, y_right = y[:len(y) // 2], y[len(y) // 2:]
z_top = multiply(x_left, y_left)
z_bottom = multiply(x_right, y_right)
shift_left(z_top, len(x)),
shift_left(z_middle, len(x) // 2) - z_top - z_bottom,
z_bottom
)
return z

Now, we only solve 3 subproblems instead of 4. This gives us a time complexity of around \Theta(k^{\log_2 3}), or about \Theta(k^{1.59}).

Even better algorithms exist. One is Toom-Cook, which splits X and Y into 3 pieces each, and solves 5 subproblems, all in O(k^{\log_3 5}). The Stronhage-Strassen method takes O(k \log \log \log k) time, using Fourier transforms. The Furer method is the most advanced as of the present, using O(k \log k 2^{3 \log^* k}) time where \log^* k is the inverse Ackermann function.

### Matrix multiplication

We will now assume that integer multiplication can be done in constant time, to avoid having to consider bit complexity.

Suppose we have two n by n matrices A, B. Clearly, the naive algorithm for multiplying them is \Theta(n^3), since we just take the dot product of the corresponding row of A and the corresponding column of B - the dot product takes \Theta(n), and there are \Theta(n^2) cells in the result.

We can use the same technique as for integer multiplcation - we break each n by n matrix into 4 \frac n 2 by \frac n 2 matrices, multiply those, then combine those together, then again try to remove subproblem calculations where possible.

If we split A, B into 4 blocks A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, B = \begin{bmatrix} e & f \\ g & h \end{bmatrix}, then AB = \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}. If we use divide and conquer to solve these 8 multiplications in the block version, it still takes \Theta(n^3). However, the problem is now in a form that is useful for optimizing, by reducing the number of sub-problems.

Strassen's method notes that if P_1 = a(f - h), P_2 = (a + b)h, P_3 = (c + d)e, P_4 = d(g - e), P_5 = (a + d)(e + h), P_6 = (b - d)(g + h), P_7 = (a - c)(e - f), then ae + bg = P_5 + P_4 - P_2 + P_6, af + bh = P_1 + P_2, ce + dg = P_3 + P_4, cf + dh = P_5 + P_1 - P_3 - P_7.

So AB = \begin{bmatrix} P_5 + P_4 - P_2 + P_6 & P_1 + P_2 \\ P_3 + P_4 & P_5 + P_1 - P_3 - P_7 \end{bmatrix}, and since there are only 7 multiplications needed to calculate P_1, \ldots, P_7, we only have 7 sub-problems to solve. As a result, the algorithm is \Theta(n^{\log_2 7}) or around \Theta(n^{2.81}), since the recurrence is T(n) = 7T\left(\frac n 2\right) + \Theta(n^2) (solve this using the master theorem).

Research into this problem has resulted into better algorithms. The current state of the art is the Coppersmith-Winograd algorithm, taking around O(n^{2.373}) time.

### Selection

Suppose we have an array A of size n, and we want to find the kth smallest integer, where the smallest element is found when k = 1. The median algorithm is a special case of this where k = \ceil{\frac n 2}.

Naively, we could just sort the array and pick the kth element, all in \Theta(n \log n) time. Another naive approach is to use a modified selection sort (repeatedly finding the smallest element in the remaining part of the array, and putting it at the end of the sorted array), where we stop after k moves, all in \Theta(kn) time - this is linear if k is constant, but quadratic if it is not. Combining the two, we can use a modified heapsort, which can solve the problem in \Theta(n + k \log n) time.

# 7/10/15

There's a make-up lecture on November 21st or so.

Using divide and conquer, we might do something similar to quicksort, called quickselect - we pick a pivot, partition the elements, and then recursively quickselect within the partition that contains the desired element:

def quickselect(array, k):
pivot = array[0]
partition1, partition2 = [x for x in array if x < pivot], [x for x in array if x > pivot]
if k - 1 == len(partition1):
return pivot
elif k - 1 < len(partition1):
return quickselect(partition1, k)
else:
return quickselect(partition2, k - len(partition1) - 1)

Note that unlike quicksort, there's only one recursive call in each invocation. On average, the pivot will be good about 50% of the time, where a good pivot is one that is within the 25th and 75th percentile inclusive. So on average, we would expect that every two calls, we would get a good pivot. So on average, every other call would have partitions that are of size at most 75% of the array's size. Therefore, on average the size of the subproblem would be reduced by 25% on every other call, which makes this function \Theta(n) average case.

Note that in the worst case, the pivot would be at one of the extremes of the percentiles, and so the subproblem would only be reduced by a constant amount, and there are n calls. Therefore, the function is \Theta(n^2) worst case.

However, if we could always choose a good pivot, then we can ensure that the function is \Theta(n) worst case as well. Doing so is a bit more clever, but it is possible, using the median of medians algorithm. First, we assume that there are at least 15 elements, so n = 10r + 5 + \theta, where r \ge 1, 0 \le \theta \le 9.

We now split A into subarrays of exactly 5 elements: B_1, \ldots, B_{2r + 1} (we ignore the last \theta elements entirely here), and then find the median of each B_i, as m_i. Then, we use a selection algorithm to find the median of this array of medians.

We claim that y is always a good pivot - that is is guaranteed to be between the 25th and 75th percentile inclusive. We will prove this later, but first, we will integrate it into quickselect:

def median_of_medians_quickselect(array, k):
# generate a good pivot
if len(array) < 15: return sorted(array)[k]
write $n = 10r + 5 + \theta$
construct $B_1, \ldots, B_{2r + 1}$
find medians $M = [m_1, \ldots, m_{2r + 1}]$
pivot = median_of_medians_quickselect(M, r + 1)

# do the rest of the steps from quickselect
partition1, partition2 = [x for x in array if x < pivot], [x for x in array if x > pivot]
if k - 1 == len(partition1):
return pivot
elif k - 1 < len(partition1):
return quickselect(partition1, k)
else:
return quickselect(partition2, k - len(partition1) - 1)

We have two recursive calls now, since the median of medians algorithm also required us to do a selection to find pivot (y).

There are r + 1 medians at or below y in m_1, \ldots, m_{2r + 1} - let these be represented m_{i_1}, \ldots, m_{i_{r + 1}}. Clearly, in each B_{i_1}, \ldots, B_{i_{r + 1}}, there are at least 3 out of the 5 elements that are less than or equal to y (since they are less than or equal to the median). Therefore, there must be at least 3(r + 1) elements in total that are less than or equal to y. Since there are 10r + 5 elements (since we ignored \theta), the ratio of elements below y to the total number of elements is \frac{3(r + 1)}{10r + 5}, or basically \frac 3 {10}. Therefore, 30% or more of the elements must be less than or equal to y.

Using a similar argument, but with the elements at or above y rather than at or below, we can prove that 30% or more of the elements must be greater or equal to y. Therefore, y is within the 30th to 70th percentile inclusive.

Therefore, the sloppy recurrence is T(n) \le T(\frac n 5) + T(\frac{7n}{10}) + \Theta(n), which we can solve using the recurrence tree method - each node has two children, which are the node's value times \frac 1 5 and \frac 7 {10}, respectively, and the root node's value is n. When we draw the tree, we see that each level i from the root (level 0) has sum \left(\frac 9 {10}\right)^in, and the sum of all levels to infinity converges to 10n, so T(n) \in \Theta(n).

The exact recurrence for this is T(n) \le T(\floor{\frac n 5}) + T(\floor{\frac{7n + 12}{10}}) + \Theta(n). This accounts for the rounding down in the partitioning, and accounts for the last \theta values we were ignoring earlier.

## Greedy Algorithms

Greedy algorithms are often used for optimization problems, which are problems where the goal is to find a solution that both satisfies the problem constraints, and maximizes/minimizes an objective/profit/cost function.

Feasible solutions are those that satisfy the problem constraints, and given a problem instance I, are represented as \mathrm{feasible}(I). The objective function is in \mathrm{feasible}(I) \to \mb{R}.

For greedy algorithms, it is generally the proof of correctness that is the hard part. Greedy algorithms build up partial solutions, making locally optimal decisions at each step, with the goal of getting a globally optimal result.

A feasible solution should be writeable as an n-tuple \tup{x_1, \ldots, x_n}. A partial solution is an i-tuple \tup{x_1, \ldots, x_i} where i < n and none of the problem constraints are violated (note that this is not necessarily a feasible solution). A partial solution isn't always extendible to a feasible solution - some partial solutions are dead ends, and greedy algorithms can't get an optimal solution here.

A choice set for a partial solution \tup{x_1, \ldots, x_i} is the set \mathcal{X} of all elements we can append to \tup{x_1, \ldots, x_i} to make another, longer partial or feasible solution \tup{x_1, \ldots, x_i, y} - the set of all possible y.

# 14/10/15

For any y in the choice set \mathcal{X}, we have a local evaluation criterion g(y), that measures the profit/cost of including y in the current partial solution. Extension of a partial solution is the process of adding the y with the highest g(y) value to the partial solution. A greedy algorithm, in general, is repeated extension until we have a feasible solution X.

Greedy algorithms do no backtracking or lookahead - they only consider the elements of the choice set at each step. That means they are often faster than algorithms that do backtracking, but the solution is possibly not as good. When implementing greedy algorithms, it is often efficient to do a preprocessing step to compute the local evaluation criterion, like sorting the elements of an array.

For some greedy algorithms, it is possible to always obtain an optimal solution. However, these proofs are often rather difficult.

Suppose we have a set A of pairs representing intervals. What is the largest subset B of these pairs such that none of the intervals intersect?

One greedy strategy might be to repeatedly choose the interval with the earliest start time that doesn't intersect with any intervals in the current partial solution. Another might be to choose the interval with the smallest duration that doesn't intersect. A third is to choose the interval with the earliest end time that doesn't intersect.

Clearly, the first one is not valid because we can make a counterexample: an interval that starts early but is really long is chosen over multiple non-intersecting short intervals that start a bit later, which means this strategy doesn't always give the optimal solution.

It is also possible to find a counterexample for the second strategy - a short interval that intersects with two non-intersecting longer intervals is chosen over those two, which means this strategy also doesn't always give the optimal solution.

The last strategy is correct. We can implement it as follows:

def select_interval(intervals):
intervals = sorted(intervals, key=lambda interval: interval[1])
solution = [intervals[0]]
latest_finish = intervals[0][1]
for interval in intervals[1:]:
if interval[0] >= latest_finish:
solution.append(interval)
latest_finish = interval[1]
return solution

Most correctness proofs for greedy algorithms use strong induction. The proof for the greedy interval selection algorithm is relatively straightforward:

Let A be an array of intervals, sorted by their end values.
Let \mathcal{B} = \tup{A_{i_1}, \ldots, A_{i_k}} where i_1 < \ldots < i_k be the greedy solution.
Let \mathcal{O} = \tup{A_{j_1}, \ldots, A_{j_k}} where j_1 < \ldots < j_l be the optimal solution.
Clearly, k \le l since O is the optimal solution, and it is impossible for the greedy solution to be better than optimal.
We want to first prove that i_m \le j_m for all 1 \le m \le k, which means that the ends of the intervals in the greedy solution are always before the ends of the corresponding intervals in the optimal solution.
For m = 1, i_1 \le j_1 since i_1 = 1 - the greedy algorithm always initially chooses the interval that ends earliest.
Assume i_{m - 1} \le j_{m - 1}. So in the greedy solution, the m - 1th interval ends no later than the m - 1th interval in the optimal solution.
Suppose i_m > j_m. Clearly, the optimal solution's mth interval ends earlier than the greedy algorithm's mth interval, since A is sorted by end values.
However, this is not possible, since that means the greedy algorithm didn't choose the earliest end time. So i_m \le j_m.
Suppose k < l. Clearly, the optimal solution must have at least k + 1 intervals.
Clearly, i_k \le j_k. So the optimal solution's k + 1th interval must be in the choice set for the greedy algorithm at the kth step.
However, this isn't possible since the greedy algorithm didn't choose that interval, so k \ge l.
Since k \le l and k \ge l, k = l and the greedy solution is the same length as the optimal solution, and so the greedy algorithm must be optimal.

The time complexity of this algorithm is easy to prove - it's \Theta(n \log n) for the sort and \Theta(n) for the traversal, so \Theta(n \log n) overall.

Suppose we want to partition A into disjoint sets of non-intersecting intervals, minimizing the number of these sets. This is called the interval coloring problem - we want to assign a color to each interval such that all intervals with a given color do not intersect each other.

An optimal greedy strategy for this is to sort the intervals by start values ascending, and for each interval in order, attempt to insert each interval into existing sets (assign it an existing color), and create a new set if not possible (assign it a new color):

def partition_intervals(intervals):
intervals = sorted(intervals, key=lambda interval: interval[0])
partitions = [{intervals[0]}]
partition_latest_finishes = [intervals[0][1]]
for i in range(1, len(intervals)):
interval = intervals[i]
for partition_index, latest_finish in enumerate(partition_latest_finishes): # try to assign interval to existing partitions
if latest_finish <= interval[0]:
partition_latest_finishes[partition_index] = interval[1]
break
else: # no existing colors can accept the interval
partition_index = len(partition_latest_finishes)
partitions[partition_index] = {interval}
partition_latest_finishes.append(interval[1])
return partitions

The correctness proof for this algorithm is a bit unusual:

Suppose the greedy algorithm uses D colors. Since we only create new colors if existing colors can't accept the interval, for every color c < D, there exists an interval A_c that overlaps A_i.
That means that there are at least D intervals that all overlap each other, since they all overlap A_i.
Therefore there must be at least D colors, which means that having D colors is optimal.

Clearly, this algorithm is O(n^2), or more specifically O(nD) since D is O(n). We can actually do better by using a different data structure for the inner loop - the inner loop simply is trying to find the lowest end value in a partition, so if we use a heap for the partitions instead, we can have the algorithm run in O(n \log D), or more simply, O(n \log n):

import heapq
def partition_intervals(intervals):
intervals = sorted(intervals, key=lambda interval: interval[0])
partitions = [{intervals[0]}]
partition_latest_finishes_heap = [(intervals[0][1], 0)]
for i in range(1, len(intervals)):
interval = intervals[i]
latest_finish, partition_index = heapq.heappop(partition_latest_finishes_heap)
if latest_finish <= interval[0]: # try to assign interval to the existing partition that ends earliest
heapq.heappush(partition_latest_finishes_heap, (interval[1], partition_index))
else: # no existing colors can accept the interval
partition_index = len(partition_latest_finishes_heap)
partitions[partition_index] = {interval}
heapq.heappush(partition_latest_finishes_heap, (interval[1], partition_index))
return partitions

# 19/10/15

Consider the knapsack problem: given profits P = [P_1, \ldots, P_n] and weights W = [W_1, \ldots, W_n], and a capacity M, all positive integers, find the x = \tup{x_1, \ldots, x_n} that maximises \sum P_i x_i given that \sum W_i x_i \le M.

There are several variations of this problem: for the 0-1 knapsack problem, x_i is either 0 or 1 (we either take the item or don't); in the rational knapsack problem, 0 \le x_i \le 1 (we can take a fraction of the item from none to all).

As it turns out, the rational knapsack problem can be solved greedily (but not other knapsack problems; 0-1 knapsack problems are NP-complete, for example) - we simply take as much as possible of each item, considering them in decreasing order of profit divided by weight:

def greedy_rational_knapsack(profits, weights, capacity):
items = sorted(zip(profits, weights), lambda x: x[0] / x[1])
current_weight = 0
for item in items: # consider items in decreasing order of profit to weight ratio
profit, weight = item
if current_weight + weight <= capacity: # we have enough space to take all of the item, so take it
result.append(1)
current_weight += weight
else: # we can't take all of the item, but we'll take enough to fill the rest of the knapsack
result.append((capacity - current_weight) / weight)
break
return result

Proof of correctness:

Suppose the greedy algorithm is not optimal. Then for certain inputs the output must differ from optimal algorithms.
Let the result of the greedy algorithm be G = [G_1, \ldots, G_n] and a result B = [B_1, \ldots, B_n] of all optimal algorithms for this problem such that B and G agree for as many items as possible, disagreeing at index d, so B = [G_1, \ldots, G_{d - 1}, B_d, \ldots, B_n].
The greedy algorithm always chooses the item with the highest profit per weight, so the actual profit from G_1, \ldots, G_d is at least as much as the actual profit of B_1, \ldots, B_d, because we will pick the most.
Construct B' = [B_1, \ldots, B_{d - 1}, G_d, B_{d + 1}, \ldots, B_n]. Clearly, B' is at least as profitable as B, but it agrees with G for 1 more element than B, a contradiction.
Therefore, the greedy algorithm is optimal.

Consider the coin changing problem: given a list of coin denominations d_1, \ldots, d_n and a positive integer T (target sum), find A = [A_1, \ldots, A_n] such that T = \sum a_i d_i and N = \sum A_i is minimised. In other words, we are trying to find the smallest set of coins that add up to a certain value.

We can do this simply by considering the coins from largest denomination to smallest, filling up the sum as close to the target sum as possible. Although this works for a coin system like 1, 5, 10, 25, 100, 200, this isn't always optimal: the coin system 12, 5, 1 doesn't give the optimal solution for something like a target sum of 15. The study of where these algorithms are optimal is an area of active research.

Consider the stable marriage problem: suppose we have sets of n men and n women, M = [M_1, \ldots, M_n], W = [W_1, \ldots, W_n]. Define \mathrm{pref}(M_i, k) is the kth favourite w \in W for M_i, and \mathrm{pref}(W_i, k) is the kth favourite m \in M for W_i. Find the matching \set{(M_{i_1}, W_{j_1}), \ldots, (M_{i_n}, W_{j_n})} such that there does not exist any (M_u, W_v) that are not in the set, but prefer each other to their matches in the set - a stable matching.

Formally, we want to find a matching such that

The Gale-Shapley algorithm is an algorithm for this problem, for which Gale and Shapely won a Nobel prize for economics. In this algorithm, men propose to women from their most preferable to least preferable, and women accept proposals if not engaged or if a proposal is better than their currently accepted proposal (the new proposal replaces the old one), and women reject proposals otherwise.

Basically, the men propose starting from their most preferable to least preferable, and the women are proposed to starting from the least preferable to most preferable - the men always move down in their preferences, and the women always move up in their preferences:

def woman_prefers(woman, preference_function, man1, man2):
i = 1
while True:
man = preference_function(woman, i)
if man == man1: return True
if man == man2: return False
i += 1

def gale_shapley(men, women, preference_function):
engagements = {}
men_ranks = {man: 0 for man in men}
unengaged_men = set(men)
while unengaged_men:
man = next(unengaged_men.keys())
men_ranks[man] += 1 # next rank in preferences
woman = preference_function(man, men_ranks[man])
if woman not in engagements or woman_prefers(woman, preference_function, man, engagements[woman]):
unengaged_men.remove(man)
engagements[woman] = man
return set(engagements.items())

This algorithm is good for the men, but not the women - suiters can ask any of the reviewers, but reviewers can only pick from the suiters that proposed to them. For example, given 3 men and 3 women with preferences M_1: W_2, W_1, W_3; M_2: W_3, W_2, W_1; M_3: W_1, W_3, W_2; W_1: M_2, M_1, M_3; W_2: M_3, M_2, M_1; W_3: M_1, M_3, M_2, all men get their first choices, while all women get their last choice. The men get the best possible match that will accept them, while the women get the worst possible match they will accept.

This algorithm always terminates because women cannot reject offers if unengaged, there is a man for every woman, and at some point a man will ask that woman. In the worst case, the men will all have their last choise, and have to propose to all n of their preferences. Since there are n men, the algorithm takes O(n^2) proposals, specifically n^2 - n + 1 or less.

Proof of correctness:

We want to prove that there is no man that prefers a woman that also prefers that man. Suppose Alice prefers Bob over her current partner, and Bob also prefers Alice over his current partner.
If Bob prefers Alice, then he must have proposed to her before his current partner.
Alice must have rejected the proposal, since Bob is engaged to a less preferable choice.
However, Alice is with a less preferable partner, which means Alice should have accepted Bob and rejected her current partner when he proposed.
This is a contradiction, so Alice and Bob cannot exist, and the matching is stable.

# 21/10/15

To implement the Gale-Shapely algorithm more efficiently, we can represent the set of unengaged men as a linked list, represent the men's preferences using arrays/linked lists, and represent engagements using an array. We could also precompute a lookup table for M_j = \mathrm{pref}(W_i, k) to look up j given W_i and M_j.

## Dynamic Programming

Consider a naive function to compute the nth Fibonacci number, based on its recurrence definition:

def fibinacci(n):
if n <= 1: return n
return fibinacci(n - 1) + fibinacci(n - 2)

However, this is extremely inefficient. Using the recurrence tree method, we find that it takes 2f_{n + 1} total calls of the function where f_n is the nth Fibonacci number. Using the closed form for the Fibonacci number, f_n = \frac{\phi^n - (-\phi)^{-n}}{\sqrt{5}} where \phi is the golden ratio, or around 1.62. Therefore, this function takes \Omega(\phi^n) time.

This function basically is inefficient because it computes the same thing over and over again - for example, for fibonacci(100) we will call fibonacci(10) millions of times, when it gives the same result each time.

The idea behind dynamic programming is to avoid recomputing the same thing over and over again, by keeping those results around and simply looking them up when necessary. As a result dynamic programming trades off memory for time.

For example, if we were to write the above function iteratively, we might start from the base case and build upward toward the result:

def fibinacci(n):
saved = [0, 1]
for i in range(2, n + 1):
saved[i] = saved[i - 1] + saved[i - 2]
return saved[n]

However, note that at any given time, we only really use the last two elements of saved. Therefore, we can just store the last two elements:

def fibinacci(n):
if n <= 1: return n
last1, last2 = 0, 1
for i in range(2, n + 1):
current = last1 + last2
last1, last2 = current, last1
return current

This is much better - we now have \Theta(n) iterations and constant memory. Since the numbers grow exponentially, however, and addition takes \Theta(\log n) (using multiprecision arithmetic), addition will actually take \Theta(n), so the algorithm overall takes \Theta(n^2).

Dyanmic programming is used to solve optimization problems, like greedy algorithms are often used to do.

To solve a problem using dynamic programming, we examine the structure of an optimal solution - the optimal structure. Given a problem instance I:

1. Define all the relevant subproblems S(I) that allows us to compute I.
• This is just the direct subproblems, not the subproblems of the subproblems. For example, for fibonacci(n) we would have S(I) be fibonacci(n - 1) and fibonacci(n - 2).
2. Write a recurrence relation for I in terms of S(I) and the base cases.
• The references to S(I) allow us to look up the answers later in the table.
3. Compute the optimal solution to I using the recurrence relation from the bottom-up (starting from the base cases of the recurrence relation) - filling in a table of values for the solutions to each subproblem.
• Whenever we have a subproblem we've previously encountered, we can simply look up the answer in the table to avoid recomputing it.
• When all subproblems have been solved, we can often compute the solution to I very simply using values from the table.

This is somewhat similar to divide and conquer, where we have subproblems and solve them recursively, up until we actually need to combine the solutions - instead of combining recursively, we solve them from the bottom up.

While the 0-1 knapsack problem we saw previously is NP-complete, we can solve it relatively well using dynamic programming.

Suppose [x_1, \ldots, x_n] is an optimal solution for a given problem instance where P = [P_1, \ldots, P_n] and weights W = [W_1, \ldots, W_n], and capacity M. Clearly, either x_n = 0 or x_n = 1.

If x_n = 0, then [x_1, \ldots, x_{n - 1}] is an optimal solution to a smaller problem instance where P = [P_1, \ldots, P_{n - 1}] and weights W = [W_1, \ldots, W_{n - 1}], and capacity M, and the profit of [x_1, \ldots, x_n] is the same as the profit of [x_1, \ldots, x_{n - 1}].

If x_n = 1, then [x_1, \ldots, x_{n - 1}] is an optimal solution to a smaller problem instance where P = [P_1, \ldots, P_{n - 1}] and weights W = [W_1, \ldots, W_{n - 1}], and capacity M - W_n, and the profit of [x_1, \ldots, x_n] is the same as the profit of [x_1, \ldots, x_{n - 1}] plus P_n.

Clearly, the profit of [x_1, \ldots, x_n] is the larger of the two cases, where either x_n = 0 or x_n = 1.

These are our relevant subproblems - the knapsack problem instances with the same weights and profits, but only including the first 0 \le i \le n items and for different capacities 0 \le m \le M, so P = [P_1, \ldots, P_i] and weights W = [W_1, \ldots, W_i].

Now we can define a 2D array (a table) P[i, m] that defines the optimal profit of such a subproblem, considering the first i items and for a knapsack with capacity m.

The basic idea will be to compute P[i, m] for all i and m, until we get to P(n, M).

# 26/10/15

Now we'll write the recurrence relation: P[i, m] = \max\left(P[i - i, m], P[i - 1, m - W_i]\right). Here, P[i - i, m] represents the case where x_i = 0, and P[i - 1, m - W_i] represents the case where x_i = 1.

We also need some base cases: P[i, m] = \begin{cases} 0 &\text{if } i = 1 \wedge m < W_i \\ P_1 &\text{if } i = 1 \wedge m \ge W_i \\ P[i - 1, m] &\text{if } i > 1 \wedge m < W_i \\ \max\left(P[i - i, m - W_i] + P_i, P[i - 1, m]\right) &\text{otherwise} \end{cases}.

Now we fill in an n by M + 1 table of all P[i, m] values, For each row i, we fill in all the m values from smallest to largest. The final answer is therefore at the last cell of the bottommost row.

In code, it looks like this:

from collections import defaultdict
def knapsack(profits, weights, max_weight):
max_profits = defaultdict(dict)
for weight in range(max_weight):
if weight >= weights[0]:
max_profits[0][m] = profits[0]
else:
max_profits[0][m] = 0
for i in range(1, len(profits)):
for weight in range(max_weight):
if weight < weights[i]:
max_profits[i][weight] = P[i - 1][weight]
else:
max_profits[i][weight] = max(max_profits[i - 1][weight - weights[i]] + profits[i], max_profits[i - 1][weight])
return max_profits[len(profits) - 1][max_weight - 1]

One improvement we could make is that in the last row we only need to compute the single entry P[n, M], since it only relies on P[n - 1, M - w_i] and P[n - 1, M].

The above algorithm finds the maximum profit, but not the actual items chosen to get that profit. We can get the items by repeatedly finding which \max\left(P[i - i, m - W_i] + P_i, P[i - 1, m]\right) we chose at each step, and if P[i, m] = P[i - i, m - W_i] + P_i, then we include the item in the solution. then, we repeatedly do the same check for P[i - i, m - W_i] if P[i, m] = P[i - i, m - W_i] + P_i or do the same check for P[i - 1, m] if P[i, m] = P[i - 1, m], and so on:

from collections import defaultdict
def knapsack(profits, weights, max_weight):
max_profits = defaultdict(dict)
for weight in range(max_weight):
if weight >= weights[0]:
max_profits[0][m] = profits[0]
else:
max_profits[0][m] = 0
for i in range(1, len(profits)):
for weight in range(max_weight):
if weight < weights[i]:
max_profits[i][weight] = P[i - 1][weight]
else:
max_profits[i][weight] = max(max_profits[i - 1][weight - weights[i]] + profits[i], max_profits[i - 1][weight])

# figure out which items are in this max profit selection
weight = max_weight
items = [None] * len(profits)
profit = max_profits[len(profits) - 1, capacity - 1]
for i in range[len(profits) - 1, 0, -1]: # go through indices from the last row's index to the second row's index
if profit == max_profits[i - 1, weight]:
items[i] = False
else:
items[i] = True
profit -= profits[i]
weight -= weights[i]
items[0] = profit == 0

return items

Clearly, this is an \Theta(nM) algorithm, since the table has nM cells and each cell value takes constant time to compute, and finding the actual items can be done in \Theta(n) since we trace one row per iteration.

Although this seems to be a polynomial time algorithm, this is actually not the case. The size of a given problem instance is the sum of the sizes of its inputs.

At first, we might assume that this is n + M. However, the size of the M input is actually the number of bits it has, which is k = \ceil{\log_2 M} - the size of the input is n + k. Therefore, the algorithm is O(n2^k). This is consistent with the fact that the knapsack problem is NP-complete.

There is also a recursive backtracking algorithm that tries all the combinations of items and finds the largest profit, which runs in \Theta(2^n). This is useful in different in different situations - even though both are exponential, the dynamic programming one is better for small M and large n, and the backtracking one is better for small n and large M.

We can solve the coin changing problem optimally as well, in almost the same way:

Note that for d_n, 0 \le a_n \le \frac T {d_n}.
Our dynamic programming table will be n by \ceil{\frac{T}{d_1}} - each row is a number of coin denominations used, and each column is a different target sum.
Let N[i, t] be a list of coins used in the optimal solution to the coin changing problem instance with only the coin denominations d_1, \ldots, d_i and target sum t.
Then we have the recurrence relation N[i, t] = \begin{cases} j \text{ where } 0 \le j \le \floor{\frac t d_i} \text{ is the index for which } N[i - 1, t - jd_i] + j \text{ is minimized} &\text{if } i > 1 \\ t &\text{if } i = 1 \end{cases}.
The final answer is in N[n, T]. We could also save memory by, instead of having N[i, t] for solutions, having a table A[i, t] store the number of coins used in the optimal solutions, and then at the end tracing backwards like we did for the knapsack problem.

# 28/10/15

### Longest Common Subsequence

Suppose we have two sequences X = [X_1 \ldots X_m] and Y = [Y_1 \ldots Y_n] where all elements are in some finite alphabet A. What is the longest subsequence that is in both X and Y?

A subsequence of a sequence Z is a sequence that contains any number of the elements of Z in order, but not necessarily consecutively. For example, 1, 4, 8, 9 is a subsequence of 1, 2, 3, 4, 5, 6, 7, 8, 9.

We will solve this using dynamic programming. Let \mathrm{LCS}(X, Y) represent the length of the longest common subsequence.

If X_m = Y_n, then \mathrm{LCS}(X_1 \ldots X_m, Y_1 \ldots Y_n) = \mathrm{LCS}(X_1 \ldots X_{m - 1}, Y_1 \ldots Y_{n - 1}) + 1.

If X_m \ne Y_n, then \mathrm{LCS}(X_1 \ldots X_m, Y_1 \ldots Y_n) = \max\left(\mathrm{LCS}(X_1 \ldots X_{m - 1}, Y_1 \ldots Y_n), \mathrm{LCS}(X_1 \ldots X_m, Y_1 \ldots Y_{n - 1})\right). This is because there are three cases: if the LCS ends on Y_n, then it doesn't end on X_m and we have \mathrm{LCS}(X_1 \ldots X_m, Y_1 \ldots Y_n) = \mathrm{LCS}(X_1 \ldots X_{m - 1}, Y_1 \ldots Y_n); if the LCS ends on X_m, then it doesn't end on Y_n and we have \mathrm{LCS}(X_1 \ldots X_m, Y_1 \ldots Y_n) = \mathrm{LCS}(X_1 \ldots X_m, Y_1 \ldots Y_{n - 1}). The third case is if the LCS doesn't end on X_m or Y_n, but we don't need to consider it in the formula because \mathrm{LCS}(X_1 \ldots X_{m - 1}, Y_1 \ldots Y_{n - 1}) \le \mathrm{LCS}(X_1 \ldots X_{m - 1}, Y_1 \ldots Y_n) - it is already covered by the other cases.

The basic idea behind the recurrence is that we keep chopping off elements from the ends of X and Y - we are computing the LCS for all prefixes of the string. We now define our dynamic programming table, c[i, j] = \mathrm{LCS}(X_1 \ldots X_i, Y_1 \ldots Y_j) (the rows are values of i and the columns are values of j).

Clearly, we now have the recurrence c[i, j] = \begin{cases} 0 &\text{if } i = 0 \vee j = 0 \\ c[i - 1, j - 1] + 1 &\text{if } X_i = Y_j \wedge i > 0 \wedge j > 0 \\ \max(c[i - 1, j], c[i, j - 1]) &\text{if } X_i \ne Y_j \wedge i > 0 \wedge j > 0 \end{cases}. We want to compute c[m, n].

Clearly, to compute any c[i, j], we need to ensure that c[i - 1, j - 1], c[i - 1, j], c[i, j - 1] are already computed. We can do this by filling in the table from left to right, top to bottom. In code, it looks like the following:

from collections import defaultdict
def LCS(X, Y):
c = defaultdict(dict)
for i in range(len(X) + 1): c[i][0] = 0
for j in range(len(Y) + 1): c[0][j] = 0
for i in range(1, len(X) + 1):
for j in rnage(1, len(Y) + 1):
if X[i] = Y[j]:
c[i, j] = c[i - 1][j - 1]
else:
c[i, j] = max(c[i][j - 1], c[i - 1][j - 1])
return c[len(X), len(Y)]

We can also get the actual sequence by doing a traceback like for the knapsack problem. There are three cases: if we moved up and to the left in the table, then we include X_i or Y_j in the LCS and set i = i - 1, j = j - 1; if we moved left in th etable, we set j = j - 1; if we moved right in the table, we set i = i - 1. By repeatedly following this until we get a base case, we get the LCS:

from collections import defaultdict
def LCS(X, Y):
c = defaultdict(dict)
direction = defaultdict(dict)
for i in range(len(X) + 1): c[i][0] = 0
for j in range(len(Y) + 1): c[0][j] = 0
for i in range(1, len(X) + 1):
for j in rnage(1, len(Y) + 1):
if X[i] = Y[j]:
c[i, j] = c[i - 1][j - 1]
direction[i][j] = "up_left"
else:
c[i, j] = max(c[i][j - 1], c[i - 1][j - 1])
direction[i][j] = "left" if c[i, j] == c[i][j - 1] else "up"

# retrieve the LCS
i, j = len(X), len(Y)
sequence = []
while i > 0 and j > 0:
if direction[i][j] == "up_left":
sequence = X[i] + sequence # we could also use Y[j] rather than X[i] since they are identical here
i -= 1; j -= 1
elif direction[i, j] == "up":
i -= 1
else:
j -= 1
return sequence

Clearly, this algorithm takes \Theta(mn) time and memory, since computing each table entry takes constant time and there are (m + 1)(n + 1) table entries.

### Minimum Length Triangulation

Suppose we have a convex polygon denoted by point q_1, \ldots, q_n. How can we triangulate it such that the sum of the perimeters of the n - 2 triangles are minimized?

In other words, what is the trangulation such that the sum of the lengths of all the chords (lines within the polygon that divide it into triangles) is minimised? These two formulations are the same because the sum of lengths of all the chords times 2 plus the perimeter of the polygon is the sum of the perimeters of all the triangles.

As it turns out, a polygon with n vertices has C_n possible triangulations, where C_n is the nth Catalan number (C_n = \frac 1 {n + 1} {{2n} \choose n} = \Theta\left(\frac{4^n}{n^{\frac 3 2}}\right)). This is exponential, so we want to avoid computing all possible triangulations.

Clearly, q_n q_1 is in a triangle q_n q_k q_1, where 2 \le k \le n - 1. Clearly, for each k, we have the triangle q_n q_k q_1, which divides the polygon into a polygon with vertices q_1, \ldots, q_k, and a polygon with vertices q_k, \ldots, q_n.

Clearly, the smallest triangulation of the polygon that includes the triangle q_n q_k q_n is this triangle along with the triangles in the smallest triangulation of q_1, \ldots, q_k and the smallest triangulation of q_k, \ldots, q_n. If we check all possible k, then the smallest of all of these is the smallest triangulation of the polygon q_1, \ldots, q_n. These smaller polygons are our subproblems. Since all of them are polygons of the form q_1, \ldots, q_i, \ldots, q_j, \ldots, q_n, 1 \le i < j \le n, we have \Theta(n^2) subproblems in total.

Let S[i, j] denote the smallest triangulation of the polygon q_i, \ldots, q_j. Let P(q_i, q_k, q_j) be the perimeter of the triangle formed by q_i, q_k, q_j. Then S[i, j] = \min\set{P(q_i, q_k, q_j) + S[i, k] + S[k, j] \middle| k \in \mb{Z}, i < k < j}.

# 2/11/15

## Graph Algorithms

A graph is a pair \tup{V, E}, where V is the set of vertices \set{v_1, \ldots, v_n} and E is the set of edges \set{\set{u_1, v_1}, \ldots, \set{u_m, v_m}}, where u, v are edges. A digraph (directed graph) is also a pair \tup{V, E}, but E is a set of edges \set{\tup{u_1, v_1}, \ldots, \tup{u_m, v_m}} - they have arcs (directed edges) that represent direction. In directed edges, u is the tail and v is the head. We can also write \tup{u, v} and \set{u, v} as uv, which is easier to write and covers both directed and undirected edges.

In this course, we will not allow duplicate edges, so m \le n^2. We will also not allow loops, so vertices will never have edges connecting them to themselves.

On the computer, we often represent graphs with adjecency matrices and adjacency lists.

An adjacency matrix is an n by n matrix where each row represents a vertex and each column represents a vertex. The elements of this matrix are 1 if there is an edge from the vertex for its row to the vertex for its column (A_{u, v} = \begin{bmatrix} a_{1, 1} & \ldots & a_{1, u} \\ \vdots & \vdots & \vdots \\ a_{v, 1} & \ldots & a_{v, u} \end{bmatrix}). Formally, the adjacency matrix is A = \begin{bmatrix} a_{1, 1} & \ldots & a_{1, n} \\ \vdots & \vdots & a_\vdots \\ a_{n, 1} & \ldots & a_{n, n} \end{bmatrix}.

Since loops (edges that have the same vertex on both ends) aren't allowed, the diagonals a_{1, 1}, \ldots, a_{n, n} are all 0. Note that for an undirected graph, the adjacency matrix has 2\abs{E(G)} 1 entries, while for a directed graph, the adjacency matrix has \abs{E(G)} 1 entries.

An adjacency list is an array of n linked lists, represented as \adj(_1), \ldots, \adj(v_n), where \adj(v_i) is the linked list of the vertices that v_i is directly connected to. This is basically the same as the adjacency mtrix, but as an array of linked lists rather than a matrix. This is more memory efficient for sparse graphs, but make some operations more difficult since we can't test for edges in O(1) time.

We usually use breath-first search for undirected graphs, and depth-first search for directed graphs.

A clique is a subset of vertices that is fully connected. A vertex cover is a subset of vertices such that all edges have at least one end in that set.

To do a breadth-first search (BFS), we pick a vertex s, then spread out from s by keeping track of the vertices we've explored so far, and exploring the unexplored neighbors of the explored vertices. Basically, the search spreads out from s in layers. Starting from s, we explore its neighbors, then the neighbors of those, then the neighbors of those, and so on. The full form of breadth-first search appears as follows:

from collections import deque
colors, predecessors = {}, {}
for vertex in vertices:
colors[vertex] = "white"
predecessors[v] = set()
colors[start_vertex] = "gray"
q = deque([start_vertex])
while q:
vertex = q.popleft()
if colors[neighbor] == "white":
colors[neighbors] = "gray"
predecessors[neighbor] = vertex
q.append(neighbor)
colors[vertex] = "black"

The colors represent the visit state of vertices - white means a vertex is undiscovered, gray means it's discovered but its neighbors are still being processed, and black means it's discovered and fully processed. The predecessors dict maps all vertices (except the start vertex) to the the vertices they were visited from (their predecessors), and is formally represented \pi(v_i).

In breadth-first search, an edge \set{\pi(v), v} is a tree edge. All other edges are cross edges. The set of all tree edges forms a BFS forest (there is a BFS tree for each graph component), and if the graph is connected, the BFS tree is a spanning tree.

A simpler form of the algorithm might appear as follows:

from collections import deque
start_vertex = vertices[0]
visited = set()
q = deque([start_vertex])
while q:
vertex = q.popleft()
if neighbor not in visited:
q.append(neighbor)

Note that breadth first search, since it visits every vertex and could potentially visit every edge, takes \Theta(\abs{V} + \abs{E}) worst case time.

Shortest path from a starting vertex:

from collections import deque
start_vertex = vertices[0]
visited = set()
q = deque([start_vertex])
distance_to_start = {start_vertex: 0}
while q:
vertex = q.popleft()
if neighbor not in visited:
q.append(neighbor)
distance_to_start[neighbor] = distance_to_start[vertex] + 1

Proof of correctness:

Let \mathrm{dist}(v) represent distance_to_start[v] where v is a vertex, and s represent start_vertex.
We want to prove that for any vertex v, \mathrm{dist}(v) is the length of the shortest path from s to v.
Define layers as L_i = \set{v \middle| \mathrm{dist}(v) = i}.
Clearly, v \in L_i if and only if uv is an edge such that u \in L_{i - 1} and there is no u' \in L_j, j < i - 1 - it is adjacent to a vertex in a previous layer and is not adjacent to any vertices in lower layers.
Clearly, if u is discovered before v, then \mathrm{dist}(u) \le \mathrm{dist}(v).
Clearly, if uv is an edge, then \abs{\mathrm{dist}(u) - \mathrm{dist}(v)} \le 1.
Clearly, if uv is a tree edge, then \mathrm{dist}(v) = \mathrm{dist}(u) + 1.
Let \delta(v) be the length of the shortest path from s to v.
Clearly, \delta(v) \le \mathrm{dist}(v) because \mathrm{dist}(v) is the length of the sequence v, \pi(v), \pi(\pi(v)), \ldots, s.
We can easily use induction over the shortest path s, v_{i_1}, \ldots, v_{i_k}, v to prove that \delta(v) \le \mathrm{dist}(v).
Therefore, \delta(v) = \mathrm{dist}(v), and \mathrm{dist}(v) is the length of the shortest path.

For most graph algorithms, the steps will be quite intuitive, but the proofs of correctness will be rather difficult.

# 4/11/15

Determine whether a graph is bipartite:

Clearly, a graph is bipartite if and only if it contains an odd cycle (see MATH239 notes for proof of this.
We can check for odd cycles using breadth first search.
At each step, if we encounter an edge \set{u, v} where \mathrm{dist}(u) = \mathrm{dist}(v), then we have found an odd cycle containing u and v and the first common ancestor of u and v in the BFS tree.
This is because the distance from u and v to the common ancester must be equal, so if the distance is k then the cycle must have 2k + 1 edges (k for u to the ancestor, k for v to the ancestor, and 1 for u to v), which forms an odd cycle.
Otherwise, if we go through the entire graph without finding an odd cycle, there are no odd cycles. If this is the case, then we can partition the vertices graph into two partitions \set{u \in V(G) \middle| \mathrm{dist}(u) \text{ is even}} and \set{u \in V(G) \middle| \mathrm{dist}(u) \text{ is odd}}.

Depth-first search (DFS) is done in a way very similar to breadth-first search, but with a different visiting mechanism and using a stack rather than a queue.

def depth_first_search(adjacency_list):
colors, predecessors = {}, {}
discovery_time, finish_time = {}, {}
colors[vertex] = "white"
predecessors[vertex] = None
current_time = [0]
if colors[vertex] == "white":
visit(adjacency_list, vertex, colors, predecessors, current_time, discovery_time, finish_time)

def visit(adjacency_list, vertex, colors, predecessors, current_time, discovery_time, finish_time):
colors[vertex] = "gray"
current_time[0] += 1
discovery_time[vertex] = time
if colors[neighbor] = "white":
predecessors[neighbor] = vertex
visit(adjacency_list, neighbor, colors, predecessors, current_time, discovery_time, finish_time)
colors[vertex] = "black"
current_time[0] += 1
finish_time[vertex] = time

Note that we keep track of the discovery and finish times for each vertex. This information can be useful in some situations. A simplified version of DFS can be written as follows:

def depth_first_search(vertices, adjacency_list):
visited = set()
for vertex in vertices:
if vertex not in visited:

if neighbor not in visited:
depth_first_search_visit(vertices, adjacency_list, neighbor, visited)

In depth-first search, an edge \tup{\pi(v), v} is a tree edge. The set of all tree edges forms a DFS forest (there is a DFS tree for each graph component). If the graph is connected, the DFS forest is a spanning tree. For a tree edge, we discover and finish with v during u's processing - v's discovery and finish time interval is nested within u's discovery and finish time interval. Additionally, v will be white when the edge is dicovered.

An edge \tup{u, v} is a forward edge if and only if it is not a tree edge, and v is a descendant of u in the depth-first forest - an edge that connects a vertex with a descendant that is not a direct child. Like for tree edges, v's time interval is nested in v's, but v will be black when we the edge is discovered.

A back edge is the same thing as a forward edge but in the other direction. However, \tup{u, v} can't possibly be a tree edge since u is a descendant of v in the DFS tree, so we can simply say a back edge is one that connects a vertex to an ancestor. For a back edge, we discover and finish with u during v's processing - u's discovery and finish time interval is nested within v's. Also, v will always be gray.

All other edges are cross edges - generally this happens when edges connect siblings in the tree. For example, in the graph \adj(u) = \set{v, w}, \adj(v) = \set{w}, \adj(w) = \set{v}, either vw is a cross edge, or wv is, depending on the order we visit the vertices in. For a cross edge, the discovery and finish time intervals for u and v will be disjoint - they will never overlap. Also, v will always be black.

The parenthesis theorem says that the discovery and finish time intervals are either disjoint or nested.

BFS and DFS might not visit all the vertices of a graph if it is not connected - if it has multiple components. To have it visit all nodes, we can keep track of unvisited vertices, and while there are still unvisited nodes, we can repeatedly run the search algorithm, stopping when all vertices have been visited.

# 9/11/15

A directed acyclic graph (DAG) is a directed graph with no cycles. A topological ordering/topological sort of a directed graph G is an ordering < of all vertices in G such that \set{u, v} \in E(G) \implies u < v. A directed graph has a topological ordering if and only if it is a DAG - if it has no cycles.

A directed graph G is a DAG if and only if it has no back edges when we do a depth-first search on it. This is because if there is a back edge, that back edge and the tree edges connecting its ends form a directed cycle, and if it has a cycle, then one of its edges must be a back edge.

As it turns out, the negative finishing time forms a topological ordering - for all \set{u, v} \in E(G), finish_time[u] > finish_time[v], so -finish_time[u] < -finish_time[v]. In other words, ordering vertices by finishing time descending results in a topological ordering.

The indegree of a vertex in a directed graph is the number of edges going into it. The outdegree of a vertex in a directed graph is the number of edges leaving it.

Prove that a DAG must have a vertex of indegree 0:

We will prove the contrapositive - that a graph with no vertices of indegree 0 must have a cycle.
If there are any edges starting and ending at the same vertex, then there is a cycle. Assume that there aren't any.
Clearly, there is an arc from v_1 to v_2, and v_2t o v_3, and so on - every vertex must have at least one edge going into it from another vertex.
At some point, we must encounter a v_j = v_i where j > i - we run out of vertices to come from.
Therefore, there must also be a cycle with v_i and v_j. So if there are no cycles, there must be a vertex of indegree 0.

Prove that a directed graph G has a topological ordering if and only if it is a DAG:

Clearly, if G has a directed cycle v_1, \ldots, v_k, v_1, \ldots, then that cycle cannot have a topological ordering since v_k > 1 and v_1 > v_k.
So if G has a topological ordering, then it must be a DAG.
Suppose G is a DAG. Then G has a vertex v_1 of indegree 0, and it can only have outward edges.
Construct a topological ordering where v_1 is the lowest value, and then delete v_1 and all incident edges to make a new graph G'.
Clearly, G' is still a DAG, since it is just G with a vertex deleted. So it also has a vertex v_2 of indegree 0.
Let this be the second lowest value in the topological ordering, then delete v_2 and all incident edges and repeat the steps for the new graph until we have processed all vertices.
Clearly, once we have processed all vertices we have a topological ordering. Therefore, if G is a DAG then we have a topological ordering.
Therefore, G is a DAG if and only if it has a topological ordering.
Note that this proof gives an algorithm for constructing a topological ordering, but this is rather inefficient. We can use depth-first search to do this more efficiently.

Making a topological ordering is possible in \Theta(\abs{V} + \abs{E}):

def topological_ordering(vertices, adjacency_list):
visited = {}
topological_ordering = []
for vertex in vertices: visited[vertex] = False
for vertex in vertices:
if not visited[vertex]:
topological_ordering_visit(vertices, adjacency_list, vertex, colors, predecessors, current_time, discovery_time, finish_time)
return reversed(topological_ordering)

def topological_ordering_visit(vertices, adjacency_list, vertex, colors, visited, topological_ordering):
visited[vertex] = True
if not visited[neighbor]:
topological_ordering.append(vertex)

# 11/11/15

For two vertices x, y in a directed graph G, let x \sim y if and only if x = y or x \ne y and there exist directed paths from x to y and from y to x. Note that \sim as defined here is an equivalence relation - it is a relation that is reflexive (a \sim a), symmetric (a \sim b \implies b \sim a), and transitive (a \sim b \wedge b \sim c \implies a \sim c)- An equivalence relation partitions V(G) into sets of vertices such that u \sim v between any two u, v in the same partition.

A directed graph is strongly connected if and only if for any two vertices u, v there is a directed path from u to v and v to u - u \sim v. A strongly connected component is a maximal strongly connected subgraph.

The component graph of a directed graph is a directed graph that has vertices corresponding to each component in the graph, and vertices corresponding to whether any two components have an edge between them. Define the discovery time and finishing time of a component to be the earliest discovery and latest finish time of any vertex in that component.

The component graph must be a DAG. This is because if there is a cycle between strongly connected components, then there must exist a path from each component to each other component, which means each of those components must not be maximal. Instead, the whole thing is all one component.

For two components C_1, C_2 in a component graph, if there is an edge from C_1 to C_2, then C_1 has a later finishing time than C_2 (note that there can't be any cycles, so if this is the case there can't be an edges from C_2 to C_1 as well). This is because a depth-first search must go through all of C_2 before going to C_1 if it encounters C_2 first, and must go through all of C_2 before being able to finish going through C_1 if it encounters C_1 first.

A graph is strongly connected if and only if for any fixed vertex s, there is a directed path from s to any vertex t and a directed path from t to s. Knowing this, it is simple to create an algorithm to test if a graph G is strongly connected:

1. We need to test if we can reach all vertices from s.
1. Pick any vertex s and call visit(s) (from DFS) once on G.
2. If any of the vertices are not black, then there are vertices that we can't reach from s at all, so G is not a strongly connected component.
2. We need to test if we can reach s from all vertices.
1. Construct a new graph H that is the same as G but where all the edges are reversed.
2. Pick any vertex s and call visit(s) (from DFS) once on H.
3. If any of the vertices are not black, then there are vertices that can't reach s at all, so G is not a strongly connected component.
3. If both tests pass, then G is a strongly connected component.

To find the strongly connected components in G:

1. Do a depth-first search on G, storing the finishing times.
2. Construct a new graph H with all the edges reversed.
3. Do a depth-first search on H, each time calling visit starting with the unvisited vertex with the latest finishing time in the DFS from step 1.
4. The strongly connected components of G are the trees in the DFS forest from the DFS in step 3.

Proof of correctness:

Note that G and H have the same strongly connected components.
Let u be the first vertex visited in step 3 - the one with the highest finishing time from the DFS in step 1.
Let C be the strongly connected component that contains u. Let C' be any other strongly connected component.
Clearly, the finishing time of C is later than that of C'.
So there cannot be an arc from C' to C in G (it is G rather than h because the finishing times were for a DFS on G).
So we can only reach the vertices in C from u, not from C'.

Clearly, this has time complexity \Theta(\abs{V} + \abs{E}), since both tests from step 1 and 2 take \Theta(\abs{V} + \abs{E}).

# 16/11/15

The spanning tree of a connected undirected graph G is a subgraph T that is a tree containing every vertex in G. T is a spanning tree if and only if T is an acyclic subgraph that has \abs{V(G)} - 1 edges.

Given a weight function w: V(G) \to \mb{R} on a graph G, we want to find a spanning tree of G such that \sum_{e \in E(T)} w(e) is minimized - the minimum spanning tree problem. In other words, we want a connected subgraph of G such that the sum of the weights is minimised.

### Kruskal's Algorithm

Kruskal's algorithm is one way to solve this. Basically, we sort the edges by weight ascending and considering each edge in order, we add the edge to the result if it doesn't form a cycle in the result.

At any point, we have a forest of disjoint trees (no trees share any vertices). We start off with n trees, and whenever we add an edge to the forest, we merge two trees, reducing the number of trees in the forest by 1.

We can efficiently test if adding an edge would result in a cycle by keeping track of the parents of each vertex in its tree as L[v]. The root of the tree is therefore the vertex v such that L[v] = v. By repeatedly getting the parent/leader of the vertex, checking v, L[v], L[L[v]], \ldots, we eventually get the root of the tree. When we add an edge to the resulting tree, we simply add an entry L[v] = u or L[u] = v.

We now define \operatorname{find}(v) to be the root of the tree containing the vertex v, and implement it by repeatedly checking L. Testing if an edge causes a cycle is the same as checking if the edge \set{u, v} has u, v in two different trees - if \operatorname{find}(u) = \operatorname{find}(v). So we add an edge \set{u, v} if and only if \operatorname{find}(u) \ne \operatorname{find}(v).

As it turns out, if we ensure that the depth of the tree containing u is less than the depth of the tree containing v when adding the entry L[v] = u, we can ensure that \operatorname{find} is always \Theta(\log(\abs{E})).

L is a type of data structure called a union-find/merge-find data structure, because it has a find operation and a union/merge (setting the leader/parent of a vertex) operation.

def spanning_tree(vertices, edges, weights):
tree_edges = set()
for edge in sorted(edges, key=lambda edge: weights[edge]):
if vertex1_root != vertex2_root:
if vertex1_depth < vertex2_depth:
else:
return (vertices, tree_edges)

depth = 0
depth += 1
return vertex, depth

Since sorting the edges by weight takes \Theta(\abs{E} \log \abs{E}) and we run the \Theta(\log \abs{E}) union and find operations.

### Prim's Algorithm

Prim's algorithm is another way to find the minimum spanning tree of a graph.

Basically, we define a tree A, and starting from an arbitrary vertex u_0, we repeatedly add the lowest-weight edge that connects a vertex in A to a vertex not in A to A.

Let V_A be the set of vertices that are incident to edges in A. Let N[v] be a vertex u \in V_A where \set{u, v} is a minimum weight edge. Let W[v] = w(N[v], v) - the best possible choice we can make at any vertex.

This is a pretty standard greedy algorithm.

def spanning_tree(vertices, edges, weights):
tree_edges = set()
start_vertex = vertices[0]
tree_vertices = {start_vertex}
for vertex in vertices - {start_vertex}:
W[vertex] = weights[{start_vertex, vertex}]
N[vertex] = start_vertex

while len(tree_edges) < len(vertices) - 1:
min_weight_vertex = min(vertices - tree_vertices, key=lambda vertex: W[vertex])
edge_vertex = N[min_weight_vertex]

for vertex in vertices - tree_vertices: # look at the vertices in the rest of the graph
if weights[min_weight_vertex, vertex] < W[vertex]: # new edge has a lower weight than the current one
W[vertex] = weights[min_weight_vertex, vertex]
N[vertex] = min_weight_vertex
return (vertices, tree_edges)

Since we need to check edges \Theta(\abs{V}) times, and each time takes O(\abs{V}) time, this implementation of Prim's algorithm takes O(\abs{V}^2) time. Alternatively, we could use the adjacency list representation and build a heap of all the vertices in W to compute min_weight_vertex in O(\log(\abs{V})) time, giving us O(\abs{E} \log \abs{V}), which is better for sparse graphs (graphs with relatively few edges compared to vertices).

# 18/11/15

To prove the correctness of Kruskal's and Prim's algorithm, we will construct a general algorithm of which Kriskal's and Prim's algorithm are special cases of. Proving this algorithm correct will prove both Kruskal's and Prim's algorthms correct.

A cut is a paritition of V(G) into sets S, V \setminus S such that both are non-empty. A crossing edge with respect to a cut S, V \setminus S is an edge that has one end in S and another in V \setminus S. A cut S, V \setminus S respects a set of edges A \subseteq E(G) if and only if none of the edges in A are crossing edges - all edges in A have vertices that are either both in S or both in V \setminus S.

The general algorithm is as follows:

def greedy_minimum_spanning_tree(vertices, edges, weights):
spanning_tree = set()
while len(A) < len(vertices) - 1:
let S and vertices - S be a cut that respects spanning_tree
let e be the minimum weight crossing edge
return spanning_tree

Kruskal's algorithm simply has the cut be between two trees being grown in the graph. Prim's algoorithm has the cut be between the tree being grown and the rest of the graph.

Proof of correctness:

Let j = \abs{V} - 1. Let A = \set{e_1, \ldots, e_j} be the set of edges in the spanning tree where the edges are chosen in that order.
We will prove that A is a minimum spanning tree, via induction. For simplicity, we will assume that all of the edge weights are distinct.
Clearly, if j = 0 then there is only 1 vertex and the spanning tree has no edges, so A is a minimum spanning tree.
Assume for some j that \set{e_1, \ldots, e_{j - 1}} is contained in a minimum spanning tree.
So there is a cut S, V(G) \setminus S such that \set{e_1, \ldots, e_{j - 1}} respects the cut - S and V \setminus S form two trees that span all vertices.
In the algorithm, e_j is chosen to be the minimum weight crossing edge with respect to this cut.
If e_j is in a minimum spanning tree, then A is a minimum spanning tree. Assume e_j is not in the minimal spanning tree.
Clearly, There is some other crossing edge joining the two trees e_j' that has lower weight than e_j. Then the minimum spanning tree has a lower weight than A, so w(e_j') < w(e_j), a contradiction since e_j is chosen to be the one with the lowest weight. So A is a minimum spanning tree.
So by induction, the spanning tree is minimal.

## Shortest Paths

Given a directed graph G, a weight function w: E(G) \to \mb{R}^+ (non-negative), and a source/start vertex u_0, find the shortest directed path P from u_0 to any vertex v such that the total weights are minimised - the smallest possible value of \sum_{e \in P} w(e). This is called the single source shortest path problem.

Dijkstra's algorithm (pronounced "diig-stra") is one algorithm for this problem, structurally very similar to Prim's algorithm. ;wip: explain this better

Let S be the subset of vertices for which the shortest path from u_0 to all vertices in S are known. Let D[v] where v \notin S be the weight of the shortest path from u_0 to v where all the interior vertices of the path (vertices excluding u_0 and v) are in S.

Let \pi[v] be the predecessor of v on a path (vertices in this algorithm will only be part of one path at a time).

Initially S = \set{u_0} and D[v] = w(u_0, v) for all v (edges that don't exist have weight \infty). At each step, we choose a v \in V(G) \setminus S such that D[v] is minimised, add v to S, then update D and \pi to match:

def compute_paths(vertices, edges, weights, start_vertex):
visited = {start_vertex}
min_path_weights = {start_vertex: 0}
predecessors = {}
for vertex in vertices - {start_vertex}:
min_path_weights[vertex] = weights[{start_vertex, vertex}]
predecessors[vertex] = start_vertex

while len(visited) < len(vertices):
min_weight_vertex = min(vertices - visited, key=lambda vertex: min_path_weights[vertex]) # vertex in the rest of the graph with lowest current path weight
for vertex in vertices - visited: # look at the vertices in the rest of the graph
if min_path_weights[min_weight_vertex] + weights[{min_weight_vertex, vertex}] < min_weight_vertex[vertex]: # new vertex has lower path weight than the current one
min_path_weights[vertex] = min_path_weights[min_weight_vertex] + weights[{min_weight_vertex, vertex}]
predecessors[vertex] = min_weight_vertex

return predecessors

def shortest_path(start_vertex, end_vertex, predecessors):
path = [start_vertex]
vertex = end_vertex
while vertex != start_vertex:
vertex = predecessors[vertex]
path.append(vertex)
return reversed(path)

Since this is structurally so similar to Prim's algorithm, it has the same time complexity, O(\abs{V(G)}^2) for the above implementation or O(\abs{E(G)} \log \abs{V(G)}) for a heap-based version.

This works because D has the weight of the shortest path from u_0 to any vertex v_k \in S, and then we compute, step by step, the best paths from v_k to v. Basically, D keeps track of all the possible candidate paths until we get to v.

The Bellman-Ford algorithm can also solve the single-source path problem, but also supports negative weights, as long as there are no negative weight cycles (cycles whose edge weight sum is negative). This runs in O(\abs{E(G)} \abs{V(G)}).

We might think that adding a constant offset to all the negative weights (to make them positive) would allow us to find the shortest paths using the algorithms we've seen so far. However, there exist graphs with negative weights such that if we add a constant offset, the shortest path changes. Consider a graph with 2 paths P_1, P_2 from u to v, where P_1 has 5 edges each with weight -1, and P_2 has 2 edges each with weight -2, so P_1 has a lower weight. If we add 2 to the weights of each edge to make all the weights non-negative, P_2 has a lower weight.

If there are no cycles in the graph (a DAG), there is an algorithm that can find the shortest path even if there are negative weight edges, and running more efficiently than the Bellman-Ford algorithm (see slide 160 for code).

The all-pairs shortest path problem is to find the shortest path between all pairs of vertices in the graph, allowing negative weight edges but not negative weight cycles. We can solve this by running the Bellman-Ford algorithm on each vertex, but the Floyd-Warshall algorithm can do it in O(n^3), using a dynamic-programming-like technique (see slide 164 for code). Like Bellman-ford, Floyd-Warshall also supports negative edge weights as long as there are no negative weight cycles.

D_m[i, j] denotes the weight of the shortest path from i to j such that the interior vertices of the path (vertices excluding i and j) are in \set{1, \ldots, m}. At each step, we check our candidate path D_m[i, m] + D[m, j] against the old best path D_m[i, j]:

def shortest_path(vertices, edges, weights):
min_weight_path = [[[[float("inf")] * len(vertices)] for _ in vertices] for _ in vertices] # 3D array of size $n$ by $n$ by $n$:
for m in range(len(vertices)):
for i in range(len(vertices)):
for j in range(len(vertices)):
D[m][i][j] = min(D[m - 1][i][j], D[m - 1][i][m] + D[m - 1][m][j])

# 23/11/15

## Intractability and Undecidability

The complexity classes we will look at are P (polynomial), NP (non-deterministic polynomial), and NP-complete. These are sets of problems. NP is a superset of P, and NP-complete is a subset of NP.

A decision problem is a problem whose solutions are either "yes" or "no". A yes-instance/no-instance is a problem instance of a decision problem for which the answer is yes/no. An algorithm solving a decision problem is polynomial time if and only if it has a worst case time complexity of O(n^k) for some positive integer k.

The complexity class P is the set of all decision problems that have polynomial time algorithms solving them. Given a decision problem \Pi, \Pi \in P if and only if there is a polynomial time algorithm solving \Pi. For example, sorting is not in P because it is not a decision problem. For example, checking if a graph contains a cycle is in P (O(n^2)), while checking if a graph contains a Hamiltonian cycle has no known polynomial-time algorithms that solve it (and there probably isn't one).

The rational knapsack problem (where we answer yes if and only if the optimal profit is T or more) is in P, since we have a polynomial time greedy algorithm that can solve it, but the 0-1 knapsack problem (where we answer yes if and noly if the optimal profit is T or more) is NP-complete.

A certificate for a yes-instance I is extra information that allows us to verify whether I is in fact a yes-instance - enough information to generate a formal proof that the answer is correct. A certificate verification algorithm is an algorithm that correctly determines whether a problem instance I is actually a yes-instance given a certificate C, written as \operatorname{Ver}(I, C).

A valid certificate is one that does in fact verify that I is a yes-instance. Certificates can also be invalid, which means their information doesn't prove that I is a yes-instance. Note that if I is a no-instance, then no valid certificate for I can exist. \operatorname{Ver}(I, C) outputs yes if I is a yes-instance and C is a valid certificate for I, and no if C is not a valid certificate for I.

A certificate verification algorithm \operatorname{Ver}(I, C) solves a decision problem if and only if for every yes-instance I, there exists a certificate C such that \operatorname{Ver}(I, C) outputs yes, and for every no-instance I, all certificates X make \operatorname{Ver}(I, C) output no. In other words, given a certificate for a problem instance, the certificate verification algorithm outputs the correct answer to that problem instance, assuming yes-instances come with a valid certificate.

The complexity class NP is the set of all decision problems that can be solved by a polynomial-time certificate verification algorithm - the decision problems for which it is possible to verify that the result is correct in polynomial time. Note that this includes all problems in P because the certificate verification algorithm for polynomial time algorithms can just be that algorithm itself and the certificates can just be the problem instances.

For example, the problem "does a graph contain a Hamiltonian cycle?" is in NP, because the certificate can just be the Hamiltonian cycle itself (a list of vertices in the cycle), and we can verify whether that particular cycle is a Hamiltonian cycle in O(n) time (check that the cycle includes all vertices once, and neighboring vertices are connected by an edge). So P \subseteq NP.

Whether P = NP or not is currently a big open question in computer science. It is commonly believed that P \ne NP - that there are problems that can be verified in polynomial time, but not computed in polynomial time.

All problem instances for problems in NP can be solved in exponential time. This is because we could potentially generate all possible certificates (this is of exponential size because the certificate's size must be O(n), or else the certificate verification algorithm would not be polynomial time) and verify them. If any certificate verifies to a yes, then the problem instance is a yes-instance. Otherwise, it is a no-instance.

Let \mathcal{I}(\Pi) be the set of all instances of a decision problem \Pi. Let \mathcal{I}_{\text{yes}}(\Pi), \mathcal{I}_{\text{no}}(\Pi) be the set of all yes-instances and no-instances, respectively, of a decision problem \Pi.

A polynomial-time reduction/polynomial transformation from decision problem \Pi_1 to decision problem \Pi_2 (written as \Pi_1 \le_P \Pi_2) exists if and only if there exists f: \mathcal{I}(\Pi_1) \to \mathcal{I}(\Pi_2) such that f can be computed in polynomial time, I \in \mathcal{I}_{\text{yes}}(\Pi_1) \implies f(I) \in \mathcal{I}_{\text{yes}}(\Pi_2), and I \in \mathcal{I}_{\text{no}}(\Pi_1) \implies f(I) \in \mathcal{I}_{\text{no}}(\Pi_2).

In other words, if \Pi_1 \le_P \Pi_2, then \Pi_1 is no more difficult to solve than \Pi_2 - problem instances can be transformed into a different problem with the same answer in polynomial time, even without knowing what the answer is.

So if \Pi_1 \le_P \Pi_2, and \Pi_2 is in P, then \Pi_1 must also be in P.

Some NP-complete problems are checking clique sizes (determine whether a graph contains a clique of size k or greater), and checking vertex cover sizes (determine whether a graph contains a vertex cover of size k or smaller). Note that there is a polynomial transformation from instances of the clique problem to instances of the vertex cover problem. We can define f(\tup{G, k}) = \tup{H, \abs{V(G)} - k} where H is the complement of G (a graph with edges where G doesn't have edges, and no edges where G does). Since f can be computed in polynomial time, and transformed problem instances give the same result, the polynomial transformation exists.

# 25/11/15

Proof that the clique problem has a polynomial transformation to the vertex cover problem:

Let I be a yes-instance of the clique problem for size k.
So there exists a W \subseteq V(G) such that \abs{W} = k and W is a clique in G - W is a subset of size k of the largest clique. We want to prove that f(I) is also a yes-instance of the vertex cover problem.
Clearly, V(H) \setminus W is a vertex cover in H, since every edge in V(H) \setminus W has one end in V(H) \setminus W and another in V(H) (this is a theorem in graph theory).
So if f(I) is a yes-instance, I must also be a yes-instance.
Let f(I) be a yes-instance of the vertex cover problem for size n - k.
By a similar argument as above, show that there must be a clique of size n - k in G.
So if I is a yes-instance, f(I) must also be a yes-instance.
Therefore, f(I) is a yes-instance if and only if I is a yes-instance.

Note that if a problem instance I has size n, then the size of any polynomial transformation of I must have size O(n^k) for positive integer k. This is because the computation creating a problem instance of size larger than O(n^k) would take longer than O(n^k) time, so the transformation would not be able to run in polynomial time. In other words, \abs{f(I)} \in O(\abs{I}^k) for some positive integer k.

If \Pi_1 \le_P \pi_2 and \Pi_2 \in P, then \Pi_1 \in P - if a problem has a polynomial transformation to another, and the other can be solved in polynomial time, then so can the first problem. This is because we can solve an instance of \Pi_1 by transforming it into an instance of \Pi_2 in polynomial time, solve the \Pi_2 instance in polynomial time, and give that as the answer.

If \Pi_1 \le_P \Pi_2 and \Pi_2 \le_P \Pi_3, then \Pi_1 \le_P \Pi_3 - polynomial transformations can be composed. This is because any instance of the first problem can be polynomially transformed into an instance of the third by transforming it into an instance of the second, and then transforming that instance into an instance of the third problem.

The complexity class NP-complete (also known as NPC) are the set of all decision problems such that \Pi \in NP and for all \Pi' \in NP, \Pi' \le_P \Pi - the decision problems in NP such that there are polynomial transformations from all other problems in NP to them.

That means that if P \cap NPC \ne \emptyset, then P = NP - if any NP-complete problem can be solved in polynomial time, then that algorithm can be used to solve any NP-complete problem in polynomial time.

Given a Boolean formula F over Boolean variables x_1, \ldots, x_n in conjunctive normal form (the formula is a product of sums), is there a truth assignment of the n variables such that the formula evaluates to true? In other words, do there exist values of x_1, \ldots, x_n such that F is true? This is the CNF-satisfiability problem (also known as SAT).

A literal is either a variable or a negation of a variable, a or \neg a. A clause is a logical OR over literals, usually represented as a set of literals. A formula is a set of clauses.

According to the Cook-Levin theorem, the CNF-satisfiability problem is NP-complete. This is the first ever problem to be proved NP-complete.

There is a theorem that says that if \Pi_1 is NP-complete, and \Pi_1 \le_P \Pi_2, then \Pi_2 is also NP-complete, since \Pi_2 \in NP and any \Pi' must satisfy \Pi' \le_P \Pi_1 \le_P \Pi_2. So once we know one NP-complete problem, we can simply show that an NP-complete problem can be transformed in polynomial time to the problem we want to prove is NP-complete, which is a much simpler problem than proving a problem NP-complete from scratch.

For example, we can prove the 3SAT/3-CNF-satisfiability problem (same as SAT, but F is a product of sums where every sum has exactly 3 literals/variables, all different) is NP-complete in this way. To convert any SAT problem instance into 3SAT:

1. Initialize F' as an empty set.
2. For each clause C = \set{x_{i_1}, \ldots, x_{i_k}} in F (each sum in the product of sums):
1. If \abs{C} = 1, add the clauses x_{i_1} \vee a_1 \vee a_2, x_{i_1} \vee a_1 \vee \neg a_2, x_{i_1} \vee \neg a_1 \vee a_2, x_{i_1} \vee \neg a_1 \vee \neg a_2 to F', where a_1, a_2 are new, unused variables (these should not used in any other clauses).
• This is clearly true since the clauses evaluate to x_{i_1} regardless of the values of a_1, a_2.
2. If \abs{C} = 2, add the clauses x_{i_1} \vee x_{i_2} \vee c, x_{i_1} \vee x_{i_2} \vee \neg a to F', where a is a new, unused variable (this should not be used in any other clauses).
• In the same way as above, regardless of the value of the variable, the clauses are always satisfiable if and only if C is.
3. If \abs{C} = 3, add this clause to F'.
• This clause is already a valid 3SAT clause.
4. If \abs{C} > 3, add \abs{C} - 2 clauses a_1 \vee a_2 \vee x{i_1}, a_3 \vee \neg x_{i_1} \vee x_{i_2}, a_4 \vee \neg x_{i_2} \vee x_{i_3}, a_5 \vee \neg x_{i_3} \vee x_{i_4}, \ldots, a_{k - 2} \vee \neg x_{i_{k - 4}} \vee x_{i_{k - 3}}, a_{k - 1} \vee a_k \vee \neg x_{i_{k - 3}} to F' where a_1, \ldots, a_k are new, unused variables (these should not be used in any other clauses).
• We can prove this is correct by proving that C being satisfiable implies the added clauses are satisfiable (via direct proof), and if the added clauses are satifiable, C is also satisfiable (via contradiction).
3. After this is done, F' is an instance of 3SAT that is a yes-instance if and only if F is a yes-instance of F.

Since the transformation runs in polynomial time, 3SAT is in NP and \text{SAT} \le_P \text{3SAT}, so 3SAT is NP-complete as well.

We can also prove that 2SAT/2-CNF-satisfiability problem (same as SAT, but F is a problem of sums where every sum has exactly 2 variables, all different) is in P.

# 30/11/15

On computers, we can represent SAT problem instances with an m by n matrix M (where n is the number of variables and m is the number of clauses), where rows represent clauses C_1, \ldots, C_m, and columns represent variables x_1, \ldots, x_n. Each cell M_{i, j}, then represents whether x_j is in the clause C_i as a literal (represented by 1), in the clause C_i negated as a literal (represented by -1), or not in the clause at all (represented by 0).

Clearly, the size of the problem instance M is \Theta(mn), and m can grow exponentially with respect to n. The algorithm that converts instances of SAT to 3SAT runs in polynomial time with respect to the size of the instance, \Theta(mn), and so is a polynomial transformation.

We've been talking about decision problems all this time, but now we want to also look at problems in general.

Given a problem \Pi (not necessarily a decision problem), an oracle for \Pi is an (possibly hypothetical) algorithm A that solves it.

Given another problem \Pi', an algorithm A' is a Turing reduction from \Pi' to \Pi if and only if it solves \Pi' using A as a subroutine (A' depends on A to work, so A is the solver). A Turing reduction is sort of like a generalization of the concept of polynomial transformations to all problems. We can say that a Turing reduction exists by writing \Pi' \le^T \Pi.

A Turing reduction A' using A as the solver is a polynomial-time Turing reduction if and only if A' has polynomial running time with respect to the running time of A. If this is the case, we can write \Pi' \le_P^T \Pi.

Alternatively, we can use a more intuitive definition: if we have a magical box called an oracle that can solve any instance of \Pi' in polynomial time, and we have an algorithm that transforms instances of \Pi to instances \Pi' and vice versa in polynomial time, we could use the oracle to solve \Pi in polynomial time.

Basically, if there is a polynomial-time Turing reduction \Pi' \le_P^T \Pi and \Pi can be solved in polynomial time, then \Pi' can also be solved in polynomial time.

Turing reductions are interesting because it seems like we can reduce harder versions of problems into simpler versions.

A travelling salesperson problem (TSP) involves finding paths in a graph with weighted edges such that the total weight is minimized.

Given a graph G and edge weights w: E(G) \to \mb{Z}^+, what is a Hamiltonian cycle H in G that minimises \sum_{e \in H} w(e)? This is the TSP-optimisation problem.

Given a graph G and edge weights w: E(G) \to \mb{Z}^+, what is the smallest \sum_{e \in H} w(e) where H is a Hamiltonian cycle in G? This is the TSP-optimal-value problem.

Given a graph G and edge weights w: E(G) \to \mb{Z}^+ and a positive integer T, is there a Hamiltonian cycle H in G such that \sum_{e \in H} w(e) \le T? This is the TSP-decision problem.

Clearly, \text{TSP-decision} \le_P^T \text{TSP-optimal-value} \le_P^T \text{TSP-optimisation}:

def tsp_optimisation(vertices, edges, weights):
pass # suppose we are given this algorithm, and it returns a list of edges (or None if there are no Hamiltonian cycles)

def tsp_optimal_value(vertices, edges, weights):
hamiltonian_cycle = tsp_optimisation(vertices, edges, weights)
if hamiltonian_cycle is None: return None
return sum(weights[edge] for edge in hamiltonian_cycle)

def tsp_decision(vertices, edges, weights, threshold):
return tsp_optimal_value(vertices, edges, weights) <= threshold

However, it is also true that \text{TSP-optimisation} \le_P^T \text{TSP-optimal-value} \le_P^T \text{TSP-decision}! In other words, if we have a polynomial time algorithm for TSP-decision, we can solve TSP-optimal-value in polynomial time, and if we have a polynomial time algorithm for TSP-optimal-value, we can solve TSP-optimisation in polynomial time:

def tsp_decision(vertices, edges, weights, threshold):
pass # suppose we are given this algorithm, and it returns True or False

def tsp_optimal_value(vertices, edges, weights):
# basically, we do a binary search for the exact total weight of the Hamiltonian cycle
low, high = 0, sum(weights[edge] for edge in edges)
if tsp_decision(vertices, edges, weights, high): return float("inf")
while low < high:
middle = floor((low + high) / 2)
if tsp_decision(vertices, edges, weights, middle):
high = middle
else:
low = middle + 1
return high

def tsp_optimisation(vertices, edges, weights):
# repeatedly remove edges whose removal doesn't increase the optimal total weight (removing edges can never lower the optimal total weight)
lowest_total_weight = tsp_optimal_value(vertices, edges, weights)
if lowest_total_weight == float("inf"): return None
lowest_weight = lowest_total_weight
hamiltonian_cycle = set()
new_weights = dict(weights)
for edge in edges:
new_weights[edge] = float("inf") # remove edge by making it have infinite weight
if not tsp_decision(vertices, edges, new_weights, lowest_total_weight):
new_weights[edge] = weights[edge] # restore the edge, since it's part of the optimal edge
hamiltonian_cycle.add(edge)

In summary, \text{subset-sum} \le_P^T \text{vertex-cover} \le_P^T \text{clique} \le_P^T \text{3SAT} \le_P^T \text{SAT} - these are well-known NP-complete problems.

# 2/12/15

We can transform 3SAT to the clique problem as follows:

Let I be a 3SAT problem instance over variables x_1, \ldots, x_n and clauses C_1, \ldots, C_m. Let C_{i, j} represent the jth literal of the ith clause.
Let f(I) = \tup{G, m} where vertices of G are the literals in C_1, \ldots, C_m (there are exactly 3m vertices, duplicates are allowed) and the edges of G join any two literals that are not negations of each other and not in the same clause - where C_{i, j}, C_{i', j'} such that i \ne i' and C_{i, j} \ne \neg C_{i', j'}. Basically, we transform I into checking if there is a clique (fully connected subgraph) in G of size m or greater. Clearly, f(I) can be computed in polynomial time. Assume I is a yes-instance. Then there is some assignment such that all clauses are satisfied. Then every clause must have at least one true literal. Since these literals can't contradict each other if the clauses are satisfied, there are edges between all of them, so these literals form a clique of size m or more.
Assume f(I) is a yes-instance, so there is a clique with m vertices. Clearly, if there are m vertices then there is one vertex from each clause, since literals within the same clause aren't joined by edges. Since these literals all don't contradict each other as they're connected by edges, it is possible to satisfy all of these literals - all of the clauses are satisfiable and I is a yes-instance.
Therefore, f(I) is a yes-instance if and only if I is a yes-instance.

Given a list of positive integers s_1, \ldots, s_n and a target sum T, is there a subset of s_1, \ldots, s_n such that the sum of the elements is T? This is the subset-sum problem. The subset-sum problem is sort of like the knapsack problem.

We can also transform the clique problem to the subset-sum problem:

Let I be a Clique problem instance over graph G and clique size threshold k.
Let V(G) = \set{v_1, \ldots, v_n} and E(G) = \set{e_1, \ldots, e_n}. Let c_{i, j} = \begin{cases} 1 &\text{if } v_i \in e_j \\ 0 &\text{if } v_i \notin e_j \end{cases}.
Let a_i = 10^{\abs{E(G)}} + \sum_{j = 0}^{\abs{E(G)} - 1} c_{i, j} 10^j and b_j = 10^j. Let T = k 10^{\abs{E(G)}} + \sum_{j = 0}^{\abs{E(G)} - 1} 2 \times 10^j.
We now have a subset-sum problem instance with positive integers a_1, \ldots, a_{\abs{V(G)}}, b_1, \ldots, b_{\abs{E}} and target sum T.
Each sum value a_i has lower digits corresponding to every edge, that are 1 if and only if the edge is incident to vertex i. The digits above these act as a counter - adding k of a_i together makes the top tigits represent k. Each sum value b_j has digits that are 0 except for the one corresponding to edge j. The target sum requires these to be added together to make every digit corresponding to edges equal to 2, and also add exactly k of a_i to make sure the clique is of size k.
Therefore, this problem instance gives the same answer as I.

## Undecidability

A decision problem \Pi is undecidable if and only if there do not exist algorithms that solve it. For an undecidable problem and any algorithm, there is at least one problem instance such that the algorithm won't find the correct answer in finite time.

The Halting problem is the most famous example of an undecidable problem - given a computer program A and the input for the program A, does the program halt in finite time?

This is easy to prove undecidable. Suppose an algorithm solves the Halting problem. Then we can implement it in a program as a function halts(program, input), returning True if program halts when given input input, and False otherwise.

Consider the following program:

def contrarian(program):
if halts(program, program):
while True: pass # never halt
else:
return # immediately halt

contrarian(contrarian)

If contrarian halts, then halts(program, program) returns true, but then contrarian doesn't halt if this is the case, which is impossible. If contrarian doesn't halt, then halts(program, program) returns false, but then contrarian halts immediately if that is the case, which is impossible. Therefore, contrarian neither halts or doesn't halt, a contradiction. So no algorithm can exist that solves the halting problem.

We can actually use Turing reductions to prove that some other problems are undecidable - if we can Turing reduce the halting problem to another problem \Pi, then it must also be undecidable, since if it wasn't we could solve the halting problem using it.