Lecture Notes by Anthony Zhang.

CO250

Introduction to optimization.

Levent Tunçel
Section 081 (online)
Email: ltuncel@uwaterloo.ca

3/1/17

In this course, we will be studying linear programming, integer programming, the Simplex algorithm, and the concept of duality, as well as various ways to efficiently approximate answers to these sorts of problems.

An abstract optimization problem (also known as a "P") is a problem in which we are trying to maximize or minimize a certain objective function - finding the inputs to the function, subject to certain constraints, such that the value of the function is maximised. In other words, given A \subseteq \mb{R}^n (the set of all feasible inputs, also known as the feasible region) and a function f: A \to \mb{R} (the objective function), find x \in A such that f(x) is the global maximum/minimum.

Optimization problems show up in almost every industry - hardware companies might want to simplify their circuit layouts, hotels might want to fill the most rooms possible at the highest price, and a factory might want to generate the most profit given constraints on materials and labour. Also, you can use it to play Pandemic optimally.

There are three types of abstract optimization problems:

Linear programming problems (also known as LPs) are abstract optimization problems where A is representable as a set of linear equations/inequalities, and f is a linear function.
Integer programming problems (also known as IPs) are linear programming problems where A \subseteq \mb{Z}^n.
Non-linear programming problems (also known as NLPs) are abstract optimization problems where A cannot be represented as a set of linear equations/inequalities, or f is non-linear.

Abstract optimization problems are usually given as prose descriptions of the problems, which we then turn into abstract mathematical models/programs/problems. Then, we can take advantage of the many powerful computer solvers available today to automatically find optimal solutions.

Suppose a company produces either plastic carrots (which sell for $5 each and take 4 minutes to make) or rubber potatoes (which sell for $8 each and take 8 minutes to make). The company has 500 minutes available in the rest of the day, and the machine that makes either of these products can only be used up to 100 times - how many of each product should be made to maximize profit?

First, we define our decision variables - the actual values we are trying to find. Let c be the number of carrots, and p be the number of potatoes.
Profit can now be defined as a function of those decision variables: f(c, p) = 5c + 8p.
Next, we can define constraints to specify A. There are only 500 minutes left, and the time spent depends on the number of each product made, so 4c + 8p \le 500. There are only 100 products left that we can make, and this is simply the number of carrots and potatoes, so c + p \le 100. Additionally, c and p must be non-negative, since they represent the number of each product.
We now have a mathematical model of the problem: "maximize 5c + 3p subject to 4c + 3p \le 500 and c + p \le 100".

A feasible solution to an abstract optimization problem is any x \in A - any variable assignment such that the constraints of the problem are satisfied, like c = 50, p = 10. An optimal solution is one that actually solves the problem - a feasible solution that results in the global maximum/minimum of the objective function. Likewise, feasible solutions to prose descriptions of the problem are assignments to the unknowns, like "make 50 carrots and 10 potatoes".

To show that a mathematical model of a problem is correct (properly represents the prose description of the problem), we usually define a bijection between feasible solutions for the mathematical model and feasible solutions for the prose description of the problem, such that the bijection preserves cost - each x \in A for the mathematical model must have f(x) equal to the the profit in the prose description for its corresponding feasible solution.

For example, show that the model from the above example is correct:

4c + 3p \le 500 \land c + p \le 100 can be trivially bijected to "carrots take 4 minutes to make and potatoes take 8 minutes, and the machine can only be used to make 100 products".
Clearly, f(c, p) = 5c + 8p has the same value as the profit since "carrots sell for $5 each and potatoes sell for $8 each".

5/1/17

In this course we will mostly focus on affine optimization problems, since solving general optimization problems is much harder.

An affine function is one that can be written as f(\vec{x}) = \vec{a} \cdot \vec{x} + \beta where \vec{a} \in \mb{R}^n and \beta \in \mb{R}. If \beta = 0, the function is also linear.

A linear program is an abstract optimization problem of the form \min \set{f(x) : x \in \mb{R}^n \wedge \left(\forall 1 \le i \le m, g_i(x) \le b_i\right)} where f is affine, g_1, \ldots, g_m are all linear, and m is finite. We might also replace \min with \max depending on the problem.

Usually we write this as: "minimize f(x) subject to g_1 \le b_1, \ldots, g_m \le b_m", or "maximize f(x) s.t. g_1 \le b_1, \ldots, g_m \le b_m". It's also common to use \vec{x} \le \mb{0} as shorthand for x_1 \ge 0, \ldots, x_n \ge 0.

Suppose an oil company in the next 4 months needs to supply 5000L, 8000L, 9000L, and 6000L, respectively, and oil costs them 0.75/L, 0.72/L, 0.92/L, and 0.90/L, respectively. The company has an oil tank that can store up to 4000L, and starts off with 2000L. Oil is bought at the beginning of the month, is used throughout the month, and is all dumped into the tank at the end of the month. What quantity of oil should be bought in each month to minimize cost?

Clearly, the inputs are the quantity of oil bought in each month, b_1, b_2, b_3, b_4. For convenience, we will also define the tank value at the beginning of each month, t_1, t_2, t_3, t_4. Clearly, 0 \le t_1 \le 4000, 0 \le t_2 \le 4000, 0 \le t_3 \le 4000, 0 \le t_4 \le 4000, since the capacity of the tank is 4000L.
The objective function we are minimizing is f(b_1, b_2, b_3, b_4) = 0.75b_1 + 0.72b_2 + 0.92b_3 + 0.90b_4.
Clearly, the tank level in month i is t_1 = 2000, t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000, since the tank level is just the leftover oil after supplying the last month's oil.
Clearly, in order to supply enough oil, we must have t_1 + b_1 \ge 5000, t_2 + b_2 \ge 8000, t_3 + b_3 \ge 9000, t_4 + b_4 \ge 6000. However, the first three constraints are redundant with the previously defined constraints t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000 and 0 \le t_1, 0 \le t_2, 0 \le t_3, 0 \le t_4, so we only need the t_4 + b_4 \ge 6000 constraint.
These encode all of the problem constraints, so we have "minimize 0.75b_1 + 0.72b_2 + 0.92b_3 + 0.90b_4 subject to 0 \le t_1 \le 4000, 0 \le t_2 \le 4000, 0 \le t_3 \le 4000, 0 \le t_4 \le 4000 and t_1 = 2000, t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000 and t_4 \ge 6000".
If we use a linear programming solver, we get b_1 = 3000, b_2 = 12000, b_3 = 5000, b_4 = 6000, which gives us the minimum cost 20890.

Same thing, but minimise the maximum amount of oil purchased in any single month, and then show that the model is correct assuming that the above example's model was correct:

From the previous example, we have the same constraints 0 \le t_1 \le 4000, 0 \le t_2 \le 4000, 0 \le t_3 \le 4000, 0 \le t_4 \le 4000 and t_1 = 2000, t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000 and t_4 \ge 6000.
However, the objective function is different - let M be the maximum amount of oil bought in a single month, and we can minimize this value. We can define M to be the maximum amount of oil bought in a single month by adding the constraints b_1 \le M, b_2 \le M, b_3 \le M, b_4 \le M.
These encode all of the problem constraints, so we have "minimize M subject to 0 \le t_1 \le 4000, 0 \le t_2 \le 4000, 0 \le t_3 \le 4000, 0 \le t_4 \le 4000 and t_1 = 2000, t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000 and t_4 \ge 6000 and b_1 \le M, b_2 \le M, b_3 \le M, b_4 \le M".
Now to show correctness. Suppose that M is an optimal solution that has values b_1, b_2, b_3, b_4. Since M is a feasible solution, M \ge \max\set{b_1, b_2, b_3, b_4}, and since M is an optimal solution, it is the lowest such value that satisfies that, so M = \max\set{b_1, b_2, b_3, b_4}. Therefore, M is also the minimum maximum amount of oil purchased in any single month.

10/1/17

Integer programs are just linear programs plus a integrality constraint - a constraint that all of the variables must be integers. When we formulate integer programs, we just add the constraint x_1, \ldots, x_n \text{ are integral}.

An integer program is mixed if there are still non-integral variables, and pure if it only has integral variables. Integer programs are harder to solve than LPs.

The size of a linear/integer program generally refers to either the number of variables, of the number of constraints. The running time of an IP/LP solver algorithm is the number of steps it takes, as a function of the size of the IP/LP.

Efficient algorithms are polynomially. All LPs can be solved in polynomial time, though with a high exponent. Solving integer programs, however, are in NP, and so are highly unlikely to be solvable in polynomial time.

The knapsack problem is a special case of integer programs. Given a set of n items, with values v_1, \ldots, v_n and weights w_1, \ldots, w_n, how do we maximize the total value of the items we can carry in a knapsack given that the total weight cannot exceed m? From CS341, we know that if the number of items can be fractional, we can just use a greedy algorithm.

As an integer program, we have: maximize v_1 x_1 + \ldots = v_n x_n subject to w_1 x_1 + \ldots + w_n x_n \le m, x_1, \ldots, x_n \text{ are integral}.

Write the IP for a knapsack problem where we have n = 4 and either x_1 + x_2 is at least 4, or x_3 + x_4 is at least 4, and we can only send x_3 if we send at least one x_4:

We're trying to maximize v_1 x_1 + v_2 x_2 + v_3 x_3 + v_4 x_4.
Clearly, we can express x_1 + x_2 \ge 4 AND x_3 = x_4 \ge 4, but how do we express x_1 + x_2 \ge 4 OR x_3 + x_4 \ge 4?
One way to do this is by adding a binary variable y to our IP, constrained such that 0 \le y \le 1 and y \text{ is integral}.
We can now express x_1 + x_2 \ge 4 \lor x_3 + x_4 \ge 4 as x_1 + x_2 \ge 4y, x_3 + x_4 \ge 4(1 - y). Whenever y is 0, 4y is 0, so x_1 + x_2 will always be true (since x_1 and x_2 are non-negative), while 4(1 - y) is 4, so the second constraint becomes x_3 + x_4 \ge 4. Whenever y is 1, 4y is 4, so the first constraint becomes x_1 + x_2 \ge 4, while the second constraint is always true (since x_3 and x_4 are non-negative).
Now, how do we express sending x_3 only if we send at least one x_4? We want to express that x_4 = 0 \implies x_3 = 0.
One way to do this is to multiply x_4 by a large enough value so that if it's 1 or more, the resulting value will definitely be greater than x_3. For example, x_3 must be no greater than \floor{\frac m {w_3}}, so we can add the constraint x_3 \le \floor{\frac m {w_3}} x_4 to do this. When x_4 = 0, \floor{\frac m {w_3}} x_4 is also 0, so x_3 must be 0 as well. When x_4 \ge 1, \floor{\frac m {w_3}} x_4 \ge \floor{\frac m {w_3}}, so x_3 \le \floor{\frac m {w_3}} x_4 is always true.
Therefore, we can write this as "maximize v_1 x_1 + v_2 x_2 + v_3 x_3 + v_4 x_4 subject to w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 \le m, x_1 + x_2 \ge 4y, x_3 + x_4 \ge 4(1 - y), 0 \le y \le 1, y \text{ is integral}, x_3 \le \floor{\frac m {w_3}} x_4, x_1, x_2, x_3, x_4 \ge 0, x_1, x_2, x_3, x_4 \text{ is integral}".

The above example demonstrates a common trick of using a binary variable to express logical constraints involving disjunction. The binary variable lets us make constraints that apply when it has one value, and always be true for other values.

Scheduling is one of the most common optimization problems in industry. Suppose we have the coffee shop that is open on weekdays, hires workers that work for 4 consecutive weekdays per week (can wrap around weeks, and ignoring weekends), and requires 3, 5, 9, 2, and 7 workers to meet demand on Monday through Friday, respectively. How do we hire the smallest number of workers that meets this demand, and how do we schedule them to meet this demand?

First, we could define 5 variables, x_1, \ldots, x_5, for the number of workers who should start working on each weekday. Then, the objective becomes to minimize x_1 + \ldots + x_5.

What are the feasible solutions? Well, the constraints are that the number of workers starting on each day must be non-negative, and that the demand is met each day. To ensure we have enough demand for each day, we can define five constraints for each weekday. For example, for Monday demand to be met, we need the number of workers who start on Monday, Friday, Thursday, or Wednesday to be at least 3, since Monday demand is 3 and on Monday, we have workers from up to 4 days ago: x_1 + x_5 + x_4 + x_3 \ge 3.

So, the problem becomes: "minimize x_1 + \ldots + x_5 subject to x_1 + x_5 + x_4 + x_3 \ge 3, x_2 + x_1 + x_5 + x_4 \ge 5, x_3 + x_2 + x_1 + x_5 \ge 9, x_4 + x_3 + x_2 + x_1 \ge 2, x_5 + x_4 + x_3 + x_2 \ge 7, and x_1, \ldots, x_5 \ge 0".

Suppose we wanted to constrain a variable x to be one of k_1, \ldots, k_n. One way we could do this is to have binary variables y_1, \ldots, y_n, where y_i represents x = k_i - adding a constraint of the form x = k_1 y_1 + \ldots + k_n y_n. Then, we can ensure that exactly one of them is true by adding the constraint y_1 + \ldots y_n = 1.

12/1/17

Another common type of linear programming problem in the real world is graph optimization. For example, how do we find the shortest path on a map? How do we optimize a microchip layout?

For the shortest path on a map example, we can associate street intersections with vertices, and streets with edges, weighted by the distance (a non-negative real number) along its length between intersections.

An s, t-walk in a graph is a sequence of edges s v_1, v_1 v_2, \ldots, v_k t. An s, t-path is an s, t-walk such that non-consecutive edges in the sequence don't share any vertices (we never revisit the same vertex twice).

The length of an s, t-path is the sum of the weights of the edges in the path. So, given a graph G = \tup{V, E}, a weight function w: E \to \mb{R}, and two vertices s, t, we want to find the minimum length s, t-path. How do we write this as an IP?

Suppose we have tasks t = t_1, \ldots, t_n and machines m = m_1, \ldots, m_n, and c_{m_i, t_j} represents the cost of having machine m_i perform task t_j. How do we match machines to tasks to minimize total cost?

Create a graph with one vertex for each job and edges between them weighted by the corresponding cost.
We want to find the perfect (every vertex in the graph is incident to an edge in the matching) matching (subset of edges such that none share vertices) in the bipartite graph with the minimum total weight.
Let \delta(v) be a function that returns the set of all edges incident to a given vertex. Then a set of edges M is a perfect matching if and only if \forall v \in V, \magn{M \cap \delta(v)} = 1.
For our IP, we can have a binary variable x_{\text{machine}, \text{task}} for every edge \tup{\text{machine}, \text{task}}. Now we can simply minimize \sum_{m_i \in m, t_j \in t} c_{m_i, t_j} x_{m_i, t_j}.
To constrain solutions to perfect matchings only, we can add the constraints \sum_{m_i \in m} \sum_{t_j \in \delta(m_i)} x_{m_i, t_j} = 1 and \sum_{t_j \in t} \sum_{m_i \in \delta(t_j)} x_{m_i, t_j} = 1 (for each vertex, choose exactly one incident edge).

17/1/17

The shortest path problem: given a graph G = \tup{V, E}, what is the s, t-path that has the minimum total length? We want to formulate this as an IP.

Recall from MATH239 that a cut in a graph is a partitioning of V into two disjoint subsets, often written as just one of the sets. In other words, a cut is a pair \tup{S, T} such that S \subseteq V, T \subseteq V, S \cap T = \emptyset, S \cup T = V.

An s, t-cut is a set of edges rather than a pair of sets of vertices. Let \delta(S) = \set{\set{u, v} \in E : u \in S, v \notin S} for S \subseteq V. Then given s \in S and t \notin S, \delta(S) is an s, t-cut. To enumerate all s, t-cuts, we can simply find all subsets of V that contain s but not t, then apply \delta to those sets of vertices.

Clearly, if we remove the edges contained in the s, t-cut from the graph, s and t would no longer be connected. Therefore, every s, t-path must contain at least one edge from every s, t-cut.

Also, if S \subseteq E contains at least one edge from every s, t-cut, then S also is a superset of an s, t-path. This is because if S had an edge from every s, t-cut and didn't have an s, t-path, the set of all vertices R that can be reached from s is an s, t-cut (since it includes s but not t, as no s, t-path exists by assumption), so S must include an edge from \delta(R). However, S can't contain any edges from \delta(R), because if it did contain an edge \set{u, v} then both u and v should have been in R (since they're reachable from s). Therefore, S must also contain an s, t-path.

Define binary variables x_1, \ldots, x_m for each edge e_1, \ldots, e_m being or not being in the shortest path, respectively. Let w_1, \ldots, w_n be the weights of each edge e_1, \ldots, e_m, respectively.

Clearly, we want to minimize x_1 w_1 + \ldots + x_m w_m (i.e., the length of the path). What are the feasible solutions (i.e., supersets of s, t-paths)?

To be an s, t-path, the edges of the path must include at least one edge from every s, t-cut. This is a lot easier to write as constraints: we have one constraint for each s, t-cut \delta(S), and each constraint ensures that at least one of the edge variables T \subseteq \set{x_1, \ldots, x_m} corresponding to \delta(S) is true.

For example, given the graph G = \tup{\set{a, b, s, t}, \set{ab, as, at, bs, bt}} with weights w_1, \ldots, w_5, we might write the shortest path problem as "minimize x_1 w_1 + \ldots + x_5 w_5 subject to x_2 + x_4 \ge 1 (for the s, t-cut \set{sa, sb}), x_1 + x_2 + x_3 \ge 1 (for the s, t-cut \set{ab, as, at}), x_1 + x_2 + x_5 \ge 1 (for the s, t-cut \set{ab, as, bt}), x_3 + x_5 (for the s, t-cut \set{at, bt}), and x_1, \ldots, x_m are non-negative integers".

Note that we don't need an upper bound on x_1, \ldots, x_m. This is because if it was greater than zero, then it wouldn't be an optimal solution.

Nonlinear programs are problems of the form "minimize f(x) subject to g_1(x) \le 0, \ldots, g_m(x) \le 0" where f, g_1, \ldots, g_m: \mb{R}^n \to \mb{R}. This is a superset of linear programs and integer programs. Basically, we can do arbitrary constraints, math permitting.

For example, we can minimize Euclidean distance by using an objective function (x_1 - p_1)^2 + (x_2 - p_2)^2 + (x_3 - p_3)^2. Another example is constraining variables to integers: simply add the constraint \sin(\pi x) = 0 to ensure that x is an integer, or x(1 - x) = 0 to ensure that x is either 0 or 1.

It's even possible to write Fermat's last theorem as a single NLP: "minimize (a^n + b^n - c^n)^2 subject to \sin(\pi a) = 0, \sin(\pi b) = 0, \sin(\pi c) = 0, \sin(\pi n) = 0, a, b, c \ge 1, n \ge 3" - the objective function solution is 0 if and only if a^n + b^n = c^n, and the theorem is true if and only if the optimal value is greater than 0.

23/1/17

What does it mean to solve an optimization problem? What is a solution to an optimization problem?

Consider the LP "maximize 2x_1 + 3x_2 subject to x_1 + x_2 \le 1, x_1, x_2 \ge 0". Clearly, in this case the optimal solution is simply x_1 = 1, x_2 = 0. However, in general, it is more difficult to say what exactly can be considered a solution.

A feasible solution is an assignment of values to each of the variables in the LP such that all of the constraints are satisfied - regardless of the objective function value. A feasible optimization problem is one that has at least one feasible solution, and an infeasible optimization problem is one that has none. An optimal solution is a feasible solution where the objective function is maximized (for a maximization problem), or minimized (for a minimization problem).

However, an optimization problem having a feasible solution doesn't mean that it has an optimal solution - the maximum/minimum objective function value could be unbounded, like for "maximize x subject to x \ge 0". If for any feasible solution there exists another with a higher/lower objective function value (depending on whether it's a maximization/minimization problem), the optimization problem is unbounded, and otherwise, it is bounded.

Consider the optimization problem "maximize x subject to x < 1". Clearly, this is feasible (x = 0 is a solution), and bounded (x \le 1 because x < 1), yet it has no optimal solution - the objective function can always get closer to 1 for any given value of x! The same occurs for the optimization problem "minimize \frac 1 x subject to x \ge 1". Note that neither of these are LPs, because one uses a strict inequality, and the other a division.

LPs are a bit better behaved than general optimization problems. According to the Fundamental Theorem of Linear Programming, an LP either has at least one optimal solution, is infeasible, or is unbounded.

An LP solver should therefore output one of the following: an optimal solution to the input LP (plus proof that it's optimal), a proof that the LP is infeasible, or a proof that the LP is unbounded.

Consider the linear system A\vec x = \vec b, \vec x \ge \vec 0. If we can find a \vec y such that {\vec y}^T A \ge {\vec 0}^T and {\vec y}^T \vec b < {\vec 0}^T, then the system can't have any solutions. In other words, we can add/subtract/scale certain equality constraints in the linear system together to get a new constraint such that the coefficients of the constraint are all non-negative, yet the right hand side is negative. Here, \vec y simply represents the fact that we're adding/subtracting/scaling rows in A and elements in \vec b to form a new constraint.

If we have such a constraint (left side coefficients non-negative, right side negative), then the left side of that constraint must be a non-negative value (since the elements of \vec x are also non-negative) and the right side is negative (by assumption), so this constraint is not satisfiable - the LP must be infeasible.

According to Farkas' Lemma, if there's no feasible solutions, then a \vec y that satisfies those conditions must exist. Therefore, a value of \vec y such that {\vec y}^T A \ge {\vec 0}^T and {\vec y}^T \vec b < 0 is a proof of infeasibility. For an LP in SEF, a value \vec y such that {\vec y}^T A = {\vec 0}^T and {\vec y}^T \vec b \ne 0 works just as well for proof purposes - a linear combination of constraints that results in the left side being zero and the right side nonzero.

For example, consider an LP with constraints 3x_1 - 2x_2 = 6, 2x_1 - x_2 = 2. If we choose y = \begin{bmatrix} -1 \\ 2 \end{bmatrix}, then we get {\vec y}^T A = \begin{bmatrix} 1 & 0 \end{bmatrix} and {\vec y}^T \vec b = -4. This means that the two constraints, with the first one subtracted from the second one doubled, imply the constraint x_1 = -2, which is always infeasible because \vec x \ge \vec 0.

To prove that an optimal solution \vec x for an LP is actually optimal, we simply need to prove that for any feasible solution \vec x', the objective function value is less or equal to the objective function value for \vec x - basically, proving that the upper bound of the objective function occurs at \vec x. This is a proof of optimality. We will look at how to construct these proofs systematically later on, using the strong duality theorem.

Consider the LP "maximize 2x_1 + 3x_2 subject to x_1 + x_2 \le 1, x_1, x_2 \ge 0". Clearly, any feasible solution x_1, x_2 must be

To prove that an LP is unbounded, we construct a family of feasible solutions \vec x(t) for all t \ge 0, then show that as t goes to infinity, so does the value of the objective function. In other words, we need to prove that for any arbitrarily high objective function value (assuming a maximization problem), we can construct a variable assignment that is both a feasible solution, and results in that objective function value - a proof of unboundedness.

This family of feasible solutions is always a line in the linear space. Therefore, a family of feasible solutions \vec x(t) = \vec x_0 + t \vec r such that for any z, there exists a t such that \vec x(t) = z is a proof of unboundedness. In matrix form, a proof of unboundedness is simply a value of \vec x_0, \vec r such that \vec x_0 \ge \vec 0, \vec r \ge \vec 0, A \vec x_0 = \vec b, A \vec r = \vec 0, {\vec c}^T \vec r > 0. Also, if the LP is unbounded, those values of \vec x_0, \vec r must exist.

Consider the LP "maximize x subject to x \ge 0". Clearly, for any objective function value z \ge 0, we can construct an assignment x = z, which must be a feasible solution since z \ge 0. Additionally, the objective function value for the feasible solution x must be z. Therefore, the LP is unbounded, and a proof for that would be \vec x_0 = \begin{bmatrix} 0 \end{bmatrix}, \vec r = \begin{bmatrix} 1 \end{bmatrix}.

Basically, whichever outcome an LP has, there must exist a proof that that outcome is the case.

A free variable is a variable that is not constrained.

An LP is in standard equality form (SEF) if and only if all of the following are true:

The LP is a maximization problem.
Every variable is constrained to be non-negative.
All constraints are equality constraints.

In other words, an LP in SEF is a problem of the form "maximize \vec c \cdot \vec x subject to A \vec x = \vec b, \vec x \ge \vec 0", where A is an n by m coefficients matrix and \vec x, \vec b, \vec c \in \mb{R}^m.

Every LP has an "equivalent" LP that is in SEF. By equivalent, we mean the original LP is unbounded if and only if the equivalent LP is unbounded, the original LP is infeasible if and only if the equivalent LP is infeasible, and an optimal solution of one can easily be used to compute an optimal solution of another.

To convert an LP into its equivalent in SEF:

If the LP is a minimization problem, turn it into a maximization problem by inverting the objective function: "minimize f(x) subject to ..." becomes "maximize -f(x) subject to ...".
Replace constraints of the form c_1 x_1 + \ldots c_k x_k \le b with c_1 x_1 + \ldots c_k x_k + s = b where s \ge 0 is a new variable.
Replace constraints of the form c_1 x_1 + \ldots c_k x_k \ge b with c_1 x_1 + \ldots c_k x_k - s = b where s \ge 0 is a new variable.
Replace free variables (variables that don't have upper or lower bounds) with a - b where a \ge 0, b \ge 0 are new variables. Their difference can have any value, positive or negative.

Once this is done, we have an equivalent LP in SEF. Also, we can easily convert optimal solutions for the LP in SEF into optimal solutions for the original LP, simply by taking the values of the corresponding variables and ignoring the newly introduced variables.

25/1/17

We can actually solve LPs by hand, by building an intuition for what would maximise the objective function, and trying to satisfy the constraints with that in mind.

For example, suppose we have the LP "maximise 4x_1 + 3x_2 + 7 subject to 3x_1 + 2x_2 + x_3 = 2 and x_1 + x_2 + x_4 = 1 and x_1, x_2, x_3, x_4 \ge 0". Clearly, x_1 = 0, x_2 = 0, x_3 = 2, x_4 = 1 is a feasible solution, because if we set x_1 = x_2 = 0, the constraints become x_2 = 2, x_4 = 1. This has objective value 7. How do we find a feasible solution with a higher objective function value?

To get a better objective function value, we need to either increase x_1 or x_2, the two variables in the objective function. To keep things simple, we'll just arbitrarily choose to modify only one of the variables for now - maximize x_1 while keeping x_2 = 0.

Now the LP becomes "maximise 4x_1 + 7 subject to 3x_1 + x_3 = 2 and x_1 + x_4 = 1 and x_1, x_3, x_4 \ge 0". To maximize this, we just need to maximize x_1. Clearly, the LP is feasible whenever x_3 = 2 - 3x_1 and x_4 = 1 - x_1 - we have a feasible solution for every possible value of x_1 where all the non-negativity constraints are satisfied.

Since x_3 \ge 0 and x_3 = 2 - 3x_1, 2 - 3x_1 \ge 0, so x_1 \le \frac 2 3. Since x_4 \ge 0 and x_4 = 1 - x_1, 1 - x_1 \ge 0, so x_1 \le 1. Therefore, since we also know that x_1 \ge 0, 0 \le x_1 \le \frac 2 3. Since we're trying to find the largest value of x_1 that still satisfies the constraints, x_1 = \frac 2 3. This gives us an objective function value of \frac 8 3 + 7 with the solution x_1 = \frac 2 3, x_2 = 0, x_3 = 0, x_4 = \frac 1 3, which is better than our original solution.

Turns out, however, this solution is still not optimal. Additionally, we can't increase x_2 without decreasing x_1 at this point, so we can't just repeat the trick with x_2 this time instead of x_1. However, it would be possible to use the "maximize one variable while keeping others in the objective function the same" trick again if we rewrote the LP in what's known as canonical form. We will look at how to do this later.

The basic idea behind the simplex algorithm:

Find a feasible solution \vec x. If not found, give a proof of infeasibility and stop.
Rewrite the LP in canonical form.
If \vec x is optimal, give a proof of optimality and stop.
If \vec x is unbounded, give a proof of unboundedness and stop.
Find a feasible solution with a higher objective function value than \vec x.
Repeat steps 2-6 until the algorithm terminates.

The main questions are: What's canonical form and how do we rewrite LPs in that form? Why does it ensure that we can find a better feasible solution?

4/2/17

The column sub-matrix notation allows us to select a subset of columns from a matrix into a new matrix: if A = \begin{bmatrix} a_{1, 1} & \ldots & a_{1, m} \\ \vdots & \vdots & \vdots \\ a_{n, 1} & \ldots & a_{n, m} \end{bmatrix} and B = \set{b_1, \ldots, b_k}, 1 \le b_i \le m, then A_B = \begin{bmatrix} a_{1, b_1} & \ldots & a_{1, b_k} \\ \vdots & \vdots & \vdots \\ a_{n, b_1} & \ldots & a_{n, b_k} \end{bmatrix}. As a short form, we can also do A_j where 1 \le j \le m, which means "column j of the matrix A", a vector of size n.

For a vector \vec x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, we also often use the subscript to denote the row sub-matrix \vec x_B = \begin{bmatrix} x_{b_1} \\ \vdots \\ x_{b_k} \end{bmatrix}. However, this isn't that common and it's generally preferred to write it out in full to avoid confusion.

Bases and basic solutions

A basis of a matrix is a subset of column indices B \subseteq \set{1, \ldots, m} such that A_B is a square matrix and A_B is non-singular. In other words, B is a set of size n, and the columns of A_B are linearly independent (so they can't be written as linear combinations of each other). Not all matrices will have a basis; for example, a 2 by 2 zero matrix (the only value of B that could result in a square A_B is \set{1, 2}, but A_B is singular).

We often denote the inverse of the basis as \overline B = \set{1, \ldots, m} - B - the column indices for columns that are not in the basis B.

From MATH136 we learned that the number of linearly independent columns in a matrix is the same as the number of linearly independent rows. This means that the number of independent columns is between 0 and the number of independent rows. Therefore, if we have a square matrix A_B, it can only possibly be a basis if all of the rows of A are independent; if there were less than n independent rows, there would also be less than n independent columns, so A_B must be singular. Therefore, a matrix has a basis if and only if all of its rows are independent, and that basis is a set of n independent columns in the matrix (which is a maximal set of independent columns in the matrix).

Suppose we have an LP in standard equality form: "maximize {\vec c}^T \vec x subject to A \vec x = \vec b". If B is a basis of A, then a variable x_i is a basic variable for B if i \in B, or a non-basic variable for B if i \notin B (i.e., i \in \overline B). Essentially, the coefficients of each non-basic variable can be written as a linear combination of the coefficients of the basic variables.

A basic solution for an LP and a basis B is a solution \vec x for the LP such that A \vec x = \vec b and all non-basic variables are set to 0. For the basic solution for a basis B = \set{b_1, \ldots, b_k}, the LP becomes "maximize {\vec c}^T \vec x subject to A_B \begin{bmatrix} x_{b_1} \\ \vdots \\ x_{b_k} \end{bmatrix} = \vec b".

Since B is a basis, A_B must be invertible, so A_B^{-1} A_B \begin{bmatrix} x_{b_1} \\ \vdots \\ x_{b_k} \end{bmatrix} = A_B^{-1} \vec b, or \vec x_B = A_B^{-1} \vec b. Interestingly, this means that a basis B has exactly one basic solution, and that solution is a vector \vec x where {\vec x}_B = A_B^{-1} \vec b and {\vec x}_{\overline B} = \vec 0.

Note that a basic solution doesn't necessarily satisfy the non-negativity constraints \vec x \ge \vec 0. A basic solution \vec x is feasible if and only if it does satisfy this constraint \vec x \ge \vec 0, because by definition it already satisfies A \vec x = \vec b. Solutions can be any combination of feasible/not-feasible and basic/not-basic.

A basic solution \vec x for an LP (without a specified basis) is a feasible solution that is a basic solution for the LP for one or more bases for that LP. So for some set of n linearly independent columns \vec x has corresponding values that result in the constraints being satisfied, and for all other columns \vec x has the element 0.

Suppose again that we have an LP in standard equality form: "maximize {\vec c}^T \vec x subject to A \vec x = \vec b". If the rows of A are dependent (at least one row can be written as a linear combination of others), then either the LP is infeasible, or A has a redundant constraint, which we can remove. Here's why: since there are dependent rows, one of the rows could be written as a linear combination of others, so we could cancel out its coefficients by adding/subtracting/scaling constraints to get the new constraint 0x_1 + \ldots + 0x_m = k. If k \ne 0, then the constraint is clearly not satisfiable, so the LP is infeasible, and if k = 0, then that constraint is redundant because it just says 0 = 0.

Canonical form

Suppose again that we have an LP in standard equality form: "maximize {\vec c}^T \vec x subject to A \vec x = \vec b". Suppose B is a basis for A.

The LP is in canonical form if and only if A_B = I (the column sub-matrix is the identity matrix) and c_{b_1} = \ldots = c_{b_k} = 0 (the objective function only contains non-basic variables).

Since A_B = I for an LP in canonical form with respect to a basis B, A_B^{-1} = I^{-1} = I, so the basic solution for B is just \vec x such that {\vec x}_B = \vec b and {\vec x}_{\overline B} = \vec 0.

Interestingly, it's always possible to write an LP into canonical form given a basis, such that the new LP has the same feasible region and feasible solutions have the same objective function value as the original LP. How do we do this?

Suppose we have the LP in SEF "maximize \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} \vec x subject to \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix} \vec x = \begin{bmatrix} 1 \\ 2 \end{bmatrix} and \vec x \ge \vec 0". One basis for this LP is B = \set{2, 3} - it's a square matrix, and its columns are linearly independent.

To convert this LP into canonical form given B, we start by making A_B into the identity matrix. To do this, we subtract the first equality constraint from the second, and then swap them to get the equality constraints \begin{bmatrix} -1 & 1 & 0 & 3 \\ 1 & 0 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.

In general, to do this we left multiply both sides of A \vec x = \vec b by A_B^{-1} to get A_B^{-1} A \vec x = A_B^{-1} \vec b. Since A_B^{-1} A is the matrix formed by multiplying A_B^{-1} with the columns of A individually, the columns in B will end up as columns of the identity matrix, so (A_B^{-1} A)_B is indeed the identity matrix. Because A_B is non-singular, A_B^{-1} A \vec x = A_B^{-1} \vec b has the exact same solutions for \vec x as A \vec x = \vec b, too.

Now, we need to make \begin{bmatrix} c_{b_1} \\ \vdots \\ c_{b_k} \end{bmatrix} = \vec 0 - all basic variables should be removed from the objective function. To do this, we'll take a linear combination \vec y of the equality constraints, to get {\vec y}^T \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix} \vec x = {\vec y}^T \begin{bmatrix} 1 \\ 2 \end{bmatrix}, or 0 = -{\vec y}^T \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix} \vec x + {\vec y}^T \begin{bmatrix} 1 \\ 2 \end{bmatrix}.

We'll add the objective function to both sides of that, to get \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} \vec x + {\vec y}^T \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix} \vec x = \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} \vec x + {\vec y}^T \begin{bmatrix} 1 \\ 2 \end{bmatrix}. Rearranging, we get \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} \vec x = \left(\begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} - {\vec y}^T \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix}\right) \vec x + {\vec y}^T \begin{bmatrix} 1 \\ 2 \end{bmatrix}.

Clearly, the left side is the objective function, and it's equal to the right side. However, we can manipulate \vec y until the value that's being multiplied with \vec x has zero entries for our basic variables! Basically, we want to choose \vec y such that {\vec y}^T \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix}_{\set{2, 3}} = \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix}_{\set{2, 3}}. In other words, y_1 0 + y_2 1 = 0, y_1 1 + y_2 1 = 2. Solving this system, we get \vec y = \begin{bmatrix} 2 \\ 0 \end{bmatrix}.

Now \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} \vec x = \left(\begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} - {\vec y}^T \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix}\right) \vec x + {\vec y}^T \begin{bmatrix} 1 \\ 2 \end{bmatrix} becomes \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} \vec x = \left(\begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} - \begin{bmatrix} 2 & 0 & 2 & -2 \end{bmatrix}\right) \vec x + 2 = \begin{bmatrix} -2 & 0 & 0 & 6 \end{bmatrix} \vec x + 2. So for feasible solutions, the objective function is exactly equal to \begin{bmatrix} -2 & 0 & 0 & 6 \end{bmatrix} \vec x + 2.

In general, we take an arbitrary linear combination of the constraints {\vec y}^T A \vec x = {\vec y} \vec b, and then add the objective function to both sides to get {\vec c}^T \vec x + {\vec y}^T A \vec x = {\vec c}^T \vec x + {\vec y} \vec b, and rearrange to get {\vec c}^T \vec x = \left({\vec c}^T - {\vec y}^T A\right) \vec x + {\vec y} \vec b.

Then, we solve the linear system {\vec c}_B - {\vec y}^T A_B = {\vec 0}^T for \vec y. We can transpose both sides to write the formula as A_B^T \vec y = {\vec c}_B, then multiply both sides by (A_B^{-1})^T to get \vec y = (A_B^{-1})^T {\vec c}_B (this is equivalent because A_B is guaranteed to be non-singular). Then, we subsitute that value of \vec y back into {\vec c}^T \vec x = \left({\vec c}^T - {\vec y}^T A\right) \vec x + {\vec y} \vec b to get another way to write the objective function for feasible values of \vec x.

So in general, given the LP in SEF "maximize {\vec c}^T \vec x subject to A \vec x = \vec b and \vec x \ge \vec 0", the corresponding LP in canonical form for a basis B can be computed as follows:

Let A' = A_B^{-1} A and \vec b' = A_B^{-1} \vec b.
Let \vec c' = \left({\vec c}^T - \vec y^T A\right) \vec x and z' = \vec y^T \vec b where \vec y = (A_B^{-1})^T {\vec c}_B.
The new LP is "maximize {\vec c'}^T \vec x + z' subject to A' \vec x = \vec b' and \vec x \ge \vec 0".

Aside: sometimes in this course we'll use the notation A^{-T}, it represents inverse of transpose ((M^T)^{-1}) or transpose of inverse ((M^{-1})^T), which are equivalent to each other.

Simplex Algorithm

Suppose we have an LP "maximize {\vec c}^T \vec x subject to A \vec x = \vec b and \vec x \ge \vec 0" in canonical form for a basis B, and a basic feasible solution \vec x. The simplex algorithm requires us to find a better feasible solution from this, but how do we do that?

Essentially, to get a better feasible solution from an LP:

Pick a non-basic variable x_k \in \overline B with a positive coefficient in the objective function (c_k > 0).
- If there are no non-basic variables with a positive coefficient in the objective function, we've found the optimal solution! This is because the basic solution already has non-basic variables set to 0 (the smallest possible value). Since the LP is in canonical form, {\vec c}_B = \vec 0, and {\vec c}_{\overline B} \le 0, and \vec x \ge 0, the objective function \vec c \cdot \vec x + z = {\vec c}_{\overline B} \cdot {\vec x}_{\overline B} + z must be maximized at {\vec x}_{\overline B} = \vec 0.
Find the largest value of x_k that keeps \vec x within the feasible region (i.e., satisfying \vec x \ge 0), by changing x_k and the basic variables, while keeping all the non-basic variables other than x_k at 0. Let x_k' be that value of x_k and x_{b_1}', \ldots, x_{b_k}' be the values of the basic variables that allow that feasible solution.
- Since the columns in the basis form an identity matrix, A_B {\vec x}_B = {\vec x}_B. Since the other non-basic variables have 0 coefficients, A \vec x = \vec b is just {\vec x}_B + x_k A_k = \vec b, or {\vec x}_B = \vec b - x_k A_k.
- So to keep \vec x feasible, we just need to find a value such that \vec x \ge \vec 0. Since \vec x was already a solution, we only need to ensure that condition holds for the variables we changed - the basic variables and x_k: {\vec x}_B \ge 0, x_k \ge 0.
- This can be written as \vec b - x_k A_k \ge \vec 0, x_k \ge 0. If we solve for x_k, we get x_k = \min\set{\frac{b_i}{A_{i, k}} : 1 \le i \le n, A_{i, k} > 0}.
- If the maximum is infinite, then the LP is unbounded. To prove this, we let \vec x(t) be a function that outputs solutions to the LP, where {\vec x(t)}_B = \vec b - t A_k and {\vec t}_{\overline B} = \vec 0, and show that {\vec c}^t \vec x(t) \to \infty as t \to \infty. The certificate of unboundedness would then be \tup{B, A_k}.
- Also, the maximum is infinite exactly when A_k \le \vec 0, so if all of the coefficients for x_k are negative, the LP is unbounded. This is because if any element A_{i, k} was positive, eventually x_k A_{i, k} exceeds b_i for a finitely large values of x_k, which is an upper bound for x_k.
Now we have a new feasible solution \vec x' where x_i' = \begin{cases} x_k' &\text{ if } i = k \\ x_i' &\text{ if } i \in B \\ x_i &\text{ otherwise} \end{cases}. In other words, the solution where we maximized x_k and changed the basic variables starting from \vec x. Additionally, this new feasible solution has a greater or equal objective function value.
- Note that this new feasible solution is also a basic feasible solution!
- The new feasible solution is basic because when we maximized x_k, it was no longer 0, so it becomes part of the basis for the new feasible solution. At the same time, when we maximize x_k, at least one of the basic variables now has value 0 (if the basic variables ended up all positive, then x_k could have been increased more by decreasing them, so x_k wouldn't have been maximized - a contradiction!) - one of the basic variables that are now 0 are not part of the new feasible solution's basis.
- Here's why the new feasible solution's basis is in fact a basis: first, we removed one of the existing columns in favor of column k, so the number of columns hasn't changed. Second, the new column must be linearly independent of the other columns in the basis, because the basic variable that was excluded from the new basis was originally non-zero, so it was necessary to form a linear combination of columns equal to x_k's column.

So, we started with an LP in canonical form for a feasible basis, and used it to obtain a better feasible basis (or a proof of unboundedness). We can then rewrite the LP in canonical form again and repeat this process until we either find an optimal solution or prove that it's unbounded. This is the simplex algorithm.

To summarize the procedure of finding an improved feasible basis given an existing feasible basis for an LP in canonical form:

Rewrite the LP "maximize {\vec c}^T \vec x subject to A \vec x = \vec b and \vec x \ge \vec 0" in canonical form for a feasible basis B. Clearly, canonical-form LPs can be written as "maximize {\vec c}_N^T {\vec x}_N subject to {\vec x}_B A_N {\vec x}_N = \vec b and \vec x \ge \vec 0".
1. Let A' = A_B^{-1} A and \vec b' = A_B^{-1} \vec b.
2. Let \vec c' = \left({\vec c}^T - A_B^{-1} {\vec c}_B A\right) \vec x and z' = A_B^{-1} {\vec c}_B \vec b.
3. The new LP is "maximize {\vec c'}^T \vec x + z' subject to A' \vec x = \vec b' and \vec x \ge \vec 0".
Compute \vec x where {\vec x}_B = {\vec b}_B and {\vec x}_{\overline B} = \vec 0 - the basic solution for our basis B, which must be feasible since B is a feasible basis.
If {\vec c}_N \le 0, stop and output \vec x as the optimal solution and B as a proof of optimality. Note that the optimal solution output by this algorithm is always a basic feasible solution!
Choose a k \in \overline B such that c_k > 0 - a non-basic variable with a positive coefficient in the objective function.
If A_k \le 0, stop and output a certificate of unboundedness \tup{B, A_k}.
Let x_k' = \min\set{\frac{b_i}{A_{i, k}} : 1 \le i \le n, A_{i, k} > 0} and {\vec x}_B' = \vec b - x_k A_k and {\vec x}_{\overline B - \set{k}}' = \vec 0.
Choose one of the basic variables r \in B such that x_r' = 0 and c_r > 0. Let B' = (B \cup \set{k}) - \set{r} be the new basis, formed by removing the basic variable that was forced to 0 and adding the non-basic variable we chose with a positive coefficient in the objective function.
Repeat starting from step 1 until the algorithm terminates, with B' instead of B and \vec x' instead of \vec x.

This is essentially the entire simplex algorithm - it takes in an LP in SEF and a feasible basis, and outputs an optimal solution or proof that the LP is unbounded. To actually use this to solve arbitrary LPs though, we need to find a feasible basis first. This will be the next topic.

The only thing left to do is to ensure that the simplex algorithm terminates - how do we know it won't get stuck in a cycle of bases? To do this, we can use Bland's Rule, where whenever we need to choose a variable the leftmost variable in each case (the variable x_i with the smallest possible index i). This affects which variable we choose in step 4 and 7 above.

;wip: Why does Bland's rule work? proof outlined in exercise 10, page 69 of textbook

One of the interesting results we found is that the optimal solution given by the simplex algorithm is always a basic feasible solution, and that the simplex algorithm, with Bland's rule always gives us the optimal solution if there is one. LPs can have infinite feasible solutions, but the number of bases is finite, and so there are a finite number of basic feasible solutions.

This tells us that it's actually possible to find optimal solutions LPs by just enumerating every basis, finding the basic solution for each, and choosing the one that's both feasible and has the highest objective function value. However, this would be really inefficient - the number of possible bases goes up exponentially with respect to the number of constraints. That said, the simplex algorithm can also display exponential behaviour for LPs like the Klee-Minty cube, though it's a lot more efficient on real-world LPs.

11/2/17

Finding a feasible basis for an arbitrary LP

One of the first steps before the simplex algorithm is, given an LP in SEF, to find a feasible basis, or detect that there is none. How do we find a basic feasible solution for an arbitrary LP in SEF? In other words, how do we find an \vec x \ge 0 such that A \vec x = \vec b (the objective function doesn't matter since we're just trying to find a basic feasible solution)?

Now we'll construct an auxilary problem - an LP that has a known feasible basis, whose solutions can be converted into a feasible basic solution for the original LP. We can then solve the auxilary problem using the simplex algorithm (since we have the auxilary LP and a feasible basis for that auxilary LP), and then convert the optimal solution to that auxilary LP into a basic feasible solution for our original LP.

The auxilary LP for an LP in SEF "maximize {\vec c}^T \vec x subject to A \vec x = \vec b and \vec x \ge \vec 0" (where A is an n-row-m-column matrix) can be constructed as follows:

Flip the signs of every constraint that has a negative constant value (a negative value in \vec b), so that \vec b \ge 0. Let the new, sometimes-flipped constraints be given as A' \vec x = \vec b'.
- For example, x_1 + 2x_3 - 3x_5 = -4 should become -x_1 - 2x_3 + 3x_5 = 4, while x_1 + 2x_3 - 3x_5 = 8 should stay the same.
- This is necessary in order to ensure that the basic solution for the basis we'll choose later is feasible.
Let A'' = A' \| I and \vec c' be a vector such that c_i' = \begin{cases} 0 &\text{if } 1 \le i \le m \\ 1 &\text{if } m + 1 \le i \le m + n \end{cases} (i.e., \vec c' \cdot \vec x = x_{m + 1} + \ldots + x_{m + n}).
- A'' is basically A' with an n by n identity matrix appended on the right. In other words, for each constraint 1 \le i \le n of the form A_{i, 1}' x_1 + \ldots + A_{i, m}' x_m = b_i, we add a new variable x_{m + i} to it to get A_{i, 1}' x_1 + \ldots + A_{i, m}' x_m + x_{m + i} = b_i.
- We can think of the newly introduced variables x_{m + 1}, \ldots, x_{m + n} as "error" variables - extra values that are needed to satisfy the constraint. When all of these are 0, we just get the original constraints back.
- c' basically means that our objective function is the sum of those new variables.
Let the auxilary LP be "minimize {\vec c'}^T \vec x subject to A'' \vec x = \vec b' and \vec x \ge \vec 0". Alternatively, to make sure the auxilary LP is in SEF, we can write it as "maximize {-\vec c'}^T \vec x subject to A'' \vec x = \vec b' and \vec x \ge \vec 0".
- Note that this LP obviously has the basis B = \set{m + 1, \ldots, m + n} (i.e., the basis formed by the new variables introduced for each constraint). This is a basis because there are n such new variables, and because the entries form an identity matrix in A'', which means A_B'' = I, which by definition has linearly independent columns.
- The corresponding basic solution is clearly \vec x such that {\vec x}_B = \vec b' and {\vec x}_{\overline B} = \vec 0. This is always feasible because \vec b' \ge 0 and it always satisfies A'' \vec x = \vec b', since that's equivalent to A_B'' {\vec x}_B = I {\vec x}_B = {\vec x}_B = \vec b.

Now we have the auxilary LP and a feasible basis for the auxilary LP B = \set{m + 1, \ldots, m + n}, so we can find an optimal solution using the simplex algorithm. Clearly, this LP is bounded, because it's a minimization problem where the objective function has all positive coefficients. Therefore, since we know that the LP isn't infeasible (since we have a feasible solution already), and the LP isn't unbounded, it must have an optimal solution!

The optimal solution to the auxilary LP gives us a feasible solution to the original LP, or tells us the original LP is infeasible. If any of the error variables x_{m + 1}, \ldots, x_{m + n} are non-zero, that means the original LP is infeasible, since if the original LP was feasible, it could've satisfied the constraints with all of the error variables set to 0, which would result in a smaller objective function value than we got.

Likewise, if all of the error variables are zero, then we satisfied the constraints without using any of the error variables, so \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix} is a feasible solution for the original LP. Additionally, since it's the output of the simplex algorithm, it's also a feasible basic solution, so we can construct a feasible basis from \set{i : x_i > 0, 1 \le i \le n}. This isn't guaranteed to give us n columns, but we can add in other column indices from the original LP until we do get a basis, checking them for linear independence each time.

Therefore, we can solve an arbitrary LP as follows:

Convert the LP into SEF, and solve its auxilary LP using the simplex algorithm to find a feasible basic solution and a feasible basis for the LP.
Use the simplex algorithm to solve the original LP with the feasible basis found in step 1.

This is called the two-phase algorithm.

Now we can actually prove the fundamental theorem of LPs using the simplex algorithm! Since we proved without using that theorem that the simplex algorithm works, and the simplex algorithm outputs either an optimal solution, proof of unboundedness, or proof of infeasibility, we've proved that an arbitrary LP must fall into one of those three categories.

Half-spaces and convexity

In the geometric interpretation of LPs, the feasible region is the space of all possible solutions, \mb{R}^m where m is the number of variables in the LP.

A polyhedron P is a set of vectors that can be defined by P = \set{\vec x : A \vec x \le \vec b}, where A is a matrix and \vec b is a vector. Think of it as a solid n-dimensional shape - a bounded area of the n-dimensional space.

The feasible region of an LP is always a polyhedron. Given an LP "maximize {\vec c}^T \vec x subject to A \vec x = \vec b and \vec x \ge \vec 0", we can write its feasible region as \set{\vec x: \begin{bmatrix} A \\ -A \\ -I \end{bmatrix} \le \begin{bmatrix} \vec b \\ -\vec b \\ \vec 0 \end{bmatrix}}. Basically, this is just saying A \vec x \le \vec b, A \vec x \ge \vec b, and \vec x \ge 0.

A hyperplane is a set of vectors that can be defined by H = \set{\vec x : \vec a \cdot \vec x = \beta}, where \vec a is a non-zero vector and \beta \in \mb{R}. Think of it as an n-dimensional plane - an n - 1-dimensional space within the n-dimensional space. This is by definition the solution to a linear equation.

A halfspace is a set of vectors that can be defined by F = \set{\vec x : \vec a \cdot \vec x \le \beta}, much like a hyperplane. Think of it as the space on one side of a hyperplane, or half of the n-dimensional space. This is by definition the solution to a linear inequality.

A polyhedron can be thought of as an intersection of a finite number of halfspaces - each linear constraint can be thought of as a halfspace where points in the halfspace are those that satisfy that particular constraint, and points outside don't.

Recall that \vec x \cdot \vec y = \magn{x} \magn{y} \cos(\theta), where \theta is the minimum angle between \vec x and \vec y.

The hyperplane H = \set{\vec x : \vec a \cdot \vec x = 0} is the set of all vectors that are orthogonal to \vec a - the ones for which \theta = 90 \deg so \cos \theta = 0. In the same way, the halfspace F = \set{\vec x : \vec a \cdot \vec x \le 0} is the set of all vectors that are on the side of H that doesn't contain \vec a - the ones for which \theta \ge 90 \deg so \cos \theta \le 0.

The translate of a set of vectors S for some vector \vec p is the set \set{\vec x + \vec p : \vec x \in S}.

The hyperplane \set{\vec x : \vec a \cdot \vec x = \beta} is a translate of the hyperplane \set{\vec x : \vec a \cdot \vec x = 0}, by any vector \vec p such that \vec a \cdot \vec p = \beta.

The dimension of a hyperplane \set{\vec x : \vec a \cdot \vec x = \beta} is simply the dimension of \set{\vec x : \vec a \cdot \vec x = 0}, which is a vector space - its dimension is n - \mathrm{rank}(\vec a) = n - 1.

A line through two vectors \vec x, \vec y is defined as \set{\vec x + k(\vec y - \vec x), k \in \mb{R}}. A line segment is a subset of that, defined as \set{\vec x + k(\vec y - \vec x), 0 \le k \le 1}.

A set S \subseteq \mb{R}^n is convex if and only if, for all \vec x, \vec y \in S, the line segment between \vec x and \vec y is a subset of S (i.e., \forall x \in S, \forall y \in S, \set{\vec x + k(\vec y - \vec x), k \in \mb{R}} \subseteq S).

The union of two convex sets S_1 \cup S_2 is not necessarily convex - consider S_1 = \set{0}, S_2 = \set{1}, for example. However, the intersection of two convex sets S_1 \cap S_2 is convex. This is easily proved: any two points \vec x, \vec y in S_1 \cap S_2 is in both S_1 and S_2, by definition. Therefore, the line segment between them is a subset of both S_1 and S_2, and so the line segment is a subset of S_1 \cap S_2, as required.

All halfspaces are convex. This is also easy to prove:

Let F = \set{\vec x : \vec a \cdot \vec x \le \beta} be a halfspace.
Let \vec x_1, \vec x_2 \in F, and \vec v \in \set{\vec x_1 + k(\vec x_2 - \vec x_1), 0 \le k \le 1} be an arbitrary point on the line segment between \vec x_1 and \vec x_2.
So \vec a \cdot \vec v = \vec a \cdot \vec x_1 + \vec a \cdot k(\vec x_2 - \vec x_1) = (1 - k) \vec a \cdot \vec x_1 + k \vec a \cdot \vec x_2 for some 0 \le k \le 1.
Since \vec x_1 and \vec x_2 are in the halfspace, \vec a \cdot \vec x_2 \le \beta and \vec a \cdot \vec x_2 \le \beta \le \beta.
So \vec a \cdot \vec v = (1 - k) \vec a \cdot \vec x_1 + k \vec a \cdot \vec x_2 \le (1 - k) \beta + k \beta = \beta.
Therefore, \vec v \in F, and since \vec v is an arbitrary element of the line segment, the line segment must be a subset of F, which means that F is convex.

As mentioned earlier, a polyhedron is an intersection of a finite number of halfspaces. Since halfspaces are convex, and intersections of convex sets are convex, all polyhedra are convex. Since the feasible region of an LP is a polyhedron, it must also be convex - this is actually why solving LPs is relatively easy!

Extreme points

A point \vec v is properly contained in a line segment L = \set{\vec x + k(\vec y - \vec x), 0 \le k \le 1} if and only if \vec v \in L - \set{\vec x_1, \vec x_2} - when it's part of the line segment but isn't an endpoint.

Suppose we have a convex set S. A point \vec v \in S is an extreme point if and only if there is no line segment L \subseteq S that properly contains \vec v - if the point has to be one of the endpoints of any line segment in S that contains it.

Intuitively, an extreme point is one that's on a "curve" on the surface of the set. For a convex set that looks like a 2D square, the extreme points at the four corners. For a circle, every point is an extreme point - a convex set doesn't necessarily have a finite number of extreme points! A halfspace actually has no extreme points.

Suppose we have an LP with the constraints x_1 \le 3, x_2 \le 2 and x_1, x_2 \ge 0. The solution space is actually 2D for this LP, so each constraint forms a 2D halfspace (a hemiplane). If we plot the feasible region of this LP, we actually get a 3 by 2 square where the bottom left corner is the origin. Here, the extreme points are \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 3, 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \end{bmatrix}.

What's interesting is that all of these points, and only these points, satisfy two different constraints with equality at the same time. For example, for \begin{bmatrix} 3 \\ 2 \end{bmatrix}, the constraints x_1 \le 2 and x_2 \le 2 are satisfied with equality, so x_1 = 2 and x_2 = 2. In fact, this generalizes into multiple variables.

For an LP, given a constraint \vec p \cdot \vec x \le k and a solution \vec x, the constraint is tight if and only if it is satisfied with equality by \vec x - when \vec p \cdot x = k.

For an LP with constraints A \vec x \le \vec b, the tight constraints for a solution \vec x are denoted \overline A \vec x \le \overline{\vec b}, where \overline A and \overline{\vec b} are the rows of A and the entries of \vec b corresponding to constraints that are tight for \vec x.

Now we have the tools to define what an extreme point of a polyhedron is in a better way: given a point \vec x in a polyhedron \set{\vec x \in \mb{R}^m : A \vec x \le \vec b}, \vec x is an extreme point if and only if \mathrm{rank}(\overline A) = m. In other words, there's a theorem that says that an extreme point for an m-dimensional polyhedron is a point in the polyhedron that satisfies m linearly independent constraints with equality.

Now we'll prove the above:

Assume \mathrm{rank}(\overline A) = m. Suppose we have an arbitrary point \vec x in an arbitrary polyhedron P = \set{\vec x \in \mb{R}^m : A \vec x \le \vec b}.
Suppose \vec x is not an extreme point. Then \vec x is properly contained by a line segment L = \set{\vec x + k(\vec y - \vec x), 0 \le k \le 1} such that L \subseteq P, with some endpoints \vec v_1, \vec v_2. So there exists a 0 < k < 1 such that \vec x = \vec v_1 + k(\vec v_2 - \vec v_1) = k\vec v_2 + (1 - k)\vec v_1.
Clearly, the tight constraints \overline{\vec b} = \overline A \vec x = \overline A (k\vec v_2 + (1 - k)\vec v_1) = k \overline A \vec v_2 + (1 - k) \overline A \vec v_1.
Since L \subseteq P, \vec v_1 \in P and \vec v_2 \in P, \overline A \vec v_2 \le \overline{\vec b} and \overline A \vec v_1 \le \overline{\vec b}.
For any vectors \vec a, \vec b, \vec b, if \vec a = k \vec b + (1 - k) \vec c and \vec b \le \vec a and \vec c \le \vec a, then it must be that \vec a = \vec b = \vec c (if the weighted average of two numbers is greater or equal to both numbers, it must be equal to those numbers). This is easily proven by showing that \vec a = k\vec b + (1 - k)\vec c \le k\vec a + (1 - k)\vec a = \vec a, so k\vec b + (1 - k)\vec c = k\vec a + (1 - k)\vec a, so \vec b = \vec a and \vec c = \vec a.
Therefore, \overline{\vec b} = \overline A \vec v_1 = \overline A \vec v_2.
Since \mathrm{rank}(\overline A) = m, \overline A is invertible. Since \overline{\vec b} = \overline A \vec v_1 = \overline A \vec v_2, \vec v_1 = \vec v_2.
This is a contradiction, because then \vec x can't possibly be contained in L, if the line segment only has its endpoints. Therefore, if \mathrm{rank}(\overline A) = m, then \vec x must be an extreme point. Basically, we showed that if m independent constraints are satisfied by a point, then any line segment containing that point will only contain that point as both endpoints.
Assume \mathrm{rank}(\overline A) < m. So there must exist a vector \vec d such that \overline A \vec d = \vec 0.
Let \epsilon > 0. Let L be the line segment with endpoints \vec x - \epsilon \vec d and \vec x + \epsilon \vec d. Clearly, \vec x is properly contained by L, so we just need to show that L \subseteq P to prove that \vec x is not an extreme point.
Clearly, for the tight constraints \overline A, \overline A (\vec x - \epsilon \vec d) = \overline A \vec x - \epsilon \overline A \vec d. Since \overline A \vec d = \vec 0, \overline A \vec x - \epsilon \overline A \vec d = \overline A \vec x = \overline{\vec b}. Therefore, \vec x - \epsilon \vec d satisfies the tight constraints. The same argument shows that \vec x + \epsilon \vec d also satisfies the tight constraints.
Clearly, for any non-tight constraint \vec a \cdot \vec x \le \beta, \vec a \cdot (\vec x - \epsilon \vec d) = \vec a \cdot \vec x - \epsilon \vec a \cdot \vec d. Since \vec x \in P, we know that \vec a \cdot \vec x < \beta (the constraint is non-tight, so it isn't satisfied by equality, so we can use strict inequalities), but \vec a \cdot \vec d could be anything.
However, since \vec a \cdot \vec x < \beta, there must exist an \epsilon > 0 such that \vec a \cdot \vec x - \epsilon \vec a \cdot \vec d < \beta. Therefore, \vec x - \epsilon \vec d satisfies an arbitrary non-tight constraints, and by the same argument, so does \vec x + \epsilon \vec d.
Since all the tight constraints and arbitrary non-tight constraints are satisfied by \vec x - \epsilon \vec d and \vec x + \epsilon \vec d, so both are in P. Since P is convex and the endpoints are in P, L \subseteq P, so we have a line in P that properly contains \vec x.
Therefore, \vec x is not an extreme point if \mathrm{rank}(\overline A) < m.

Suppose we now have the feasible region of an LP P = \set{\vec x : \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 3 \end{bmatrix} \vec x = \begin{bmatrix} 2 \\ 4 \end{bmatrix}, \vec x \ge 0}. Recall that the feasible region of an LP is always a polyhedron. How do we check if \vec v = \begin{bmatrix} 2 \\ 4 \\ 0 \end{bmatrix} is an extreme point?

First, we can rewrite P in the form \set{\vec x : A \vec x = \vec b}: \set{\vec x : \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 3 \\ -1 & 0 & 1 \\ 0 & -1 & -3 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \vec x = \begin{bmatrix} 2 \\ 4 \\ -2 \\ -4 \\ 0 \\ 0 \\ 0 \end{bmatrix}}.

Now we take the constraints that are satified with equality by \vec v - namely, constraints 1, 2, 3, 4, and 7. So \overline A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 3 \\ -1 & 0 & 1 \\ 0 & -1 & -3 \\ 0 & 0 & -1 \end{bmatrix}. Clearly, \mathrm{rank}(\overline A) = 3, and the LP has 3 variables, so \vec v must be an extreme point.

An interesting consequence of the above theorem is that if we have the feasible region of an LP in SEF P = \set{\vec x : A \vec x = \vec b, \vec x \ge 0}, then \vec x \in P is an extreme point if and only if it's a basic feasible solution for that LP. In other words, the basic feasible solutions for an LP are the extreme points in its feasible region. This is easily proven from the previous theorem and the definition of a basic feasible solution.

Because the simplex algorithm moves between different bases until it reaches the optimal one, it's actually moving between extreme points in n-dimensional space. That means we could enumerate all the extreme points in the feasible region by enumerating all the basic feasible solutions.

1/3/17

Previously we looked at the shortest-path problem formulated as an IP - given a weighted graph G and s, t \in V(G), find the path from s to t that minimizes the total weight. Given a shortest-path problem and an s, t-path P, how do we prove if P is optimal? How do we find an optimal P, anyways?

The cardinality special case is the shortest path problem where all edge weights are 1 - we're trying to find the s, t-path P with the fewest edges. How do we show that P is an optimal solution, if it is?

Recall that any s, t-cut must contain one of the edges of P - if it didn't, then we could remove the edges in the cut and P would still connect s and t, which is a contradiction because removing the edges of an s, t-cut must disconnect s and t. As a result, if S \subseteq E(G) has an edge from every s, t-cut, then S contains an s, t-path.

If we can construct s, t-cuts \delta(U_1), \ldots, \delta(U_k) such that each one is disjoint from the other, then since an s, t-path must contain an edge from every s, t-cut, an s, t-cut must contain at least k edges. This allows us to lower bound the length of the shortest path, so if we find a path of length k, we know for sure that it is optimal. In other words, to prove that an s, t-path of length k is optimal, we find k pairwise disjoint s, t-cuts.

When we're looking at the general case (i.e., not the cardinality special case), we can do something similar. We construct a mapping between s, t-cuts and arbitrary non-negative values called cut widths: \overline y: \delta(U) \to \mb{R}_{\ge 0}. A feasible cut width assignment \overline y is such a mapping that, for every edge e, the sum of the widths of s, t-cuts containing e is less or equal to the weight of e. In other words, the cut widths summed up by edge shouldn't exceed the lengths of any edge. Formally, \forall e \in E(G), \sum_{\text{every s-t cut } U \text{ such that } e \in \delta(U)} \overline y_U \le c_e if and only if \overline y is feasible.

If we generalize the result of the cardinality special case, we find that if \overline y is a feasible cut width assignment, then every s, t-path must have length \sum_{\text{every s-t cut } U} \overline y_U or greater - the sum of the s, t-cut widths is a lower bound for the length of the shortest path! So if we find an , st path with that length, we know it's a shortest path.

Proof:

Let P be an s, t-path and \overline y be a feasible cut width assignment.
Clearly, the length of P is c(P) = \sum_{e \in P} c_e \ge \sum_{e \in P} \left(\sum_{\text{every s-t cut } \delta(U) \text{ such that } e \in \delta(U)} \overline y_U\right).
Since P is an s, t-path, P contains at least one edge from any \delta(U) (where U is an s, t-cut). So the condition for the inner sum will be satisfied, at one point or another, by every single s, t-cut.
So c(P) \ge \sum_{\text{every s-t cut } U} \overline y_U, giving us a lower bound for the length of P.

How do we find these cut width assignments, and do they always exist such that the total cut width is equal to the length of a shortest s, t-path? We will answer this using LPs.

Just as a feasible cut width assignment lower bounds the length of a shortest s, t-path, a feasible solution upper bounds the objective value of a minimization LP. Can we find a lower bound?

Suppose we have an LP "minimize {\vec c}^T \vec x subject to A \vec x \ge \vec b and \vec x \ge \vec 0". Clearly, any feasible solution \vec x must satisfy A \vec x \ge \vec b, so it must also satisfy {\vec k}^T A \vec x \ge {\vec k}^T \vec b for any \vec k \ge \vec 0 - we can construct any new constraint out of linear combinations of existing constraints, and \vec x will satisfy it.

Since {\vec c}^T \vec x \ge {\vec c}^T \vec x and (\vec k)^T A \vec x \ge (\vec k)^T \vec b, {\vec c}^T \vec x + (\vec k)^T A \vec x \ge {\vec c}^T \vec x + (\vec k)^T \vec b, so {\vec c}^T \vec x \ge (\vec k)^T \vec b + \left({\vec c}^T - (\vec k)^T A\right) \vec x.

Since we're trying to find a lower bound, we want to write the objective function value in the form z + \vec m \cdot \vec x, where \vec m \cdot \vec x \ge 0 - this means that z is a lower bound for the objective function value. To ensure (\vec k)^T \vec b + \left({\vec c}^T - (\vec k)^T A\right) \vec x is in this form, we want {\vec c}^T - (\vec k)^T A \ge \vec 0.

That means we're trying to choose \vec k such that {\vec c}^T \ge (\vec k)^T A. If we do find such a \vec k, then since \vec x \ge \vec 0, \left({\vec c}^T - (\vec k)^T A\right) \vec x \ge 0, so {\vec c}^T \vec x \ge {\vec c}^T \vec x \ge (\vec k)^T \vec b, and (\vec k)^T \vec b is a lower bound for the objective function.

We want the largest possible lower bound, to ensure that the actual minimum value isn't higher than our lower bound. To maximize this lower bound, we want to maximize (\vec k)^T \vec b subject to {\vec c}^T \ge (\vec k)^T A where \vec k \ge \vec 0, which is an LP such that the maximum objective function value is the minimum objective function value of the original LP. This new LP is the dual of the original LP. Since {\vec a}^T M = {M^T \vec a}^T for any vector \vec a and any matrix M, we can transpose both sides of the constraints to get "maximize {\vec b}^T \vec k subject to \vec c \ge A^T \vec k and \vec k \ge \vec 0".

In other words, "maximize {\vec b}^T \vec k subject to A^T \vec k \le \vec c and \vec k \ge \vec 0" is the dual LP for the primal LP "minimize {\vec c}^T \vec x subject to A \vec x \ge \vec b and \vec x \ge \vec 0", and vice versa. Basically, we transposed A, swapped \vec b and \vec c, and swapped the \ge and \le for all constraints except the non-negativity constraints.

Also, we can do the same thing to an LP in SEF to obtain its dual: "minimize {\vec b}^T \vec k subject to A^T \vec k \le \vec c" is the dual of "maximize {\vec c}^T \vec x subject to A \vec x = \vec b and \vec x \ge \vec 0".

The Weak Duality theorem says that if \vec x_1 is feasible for the primal maximization LP and \vec x_2 is feasible for the dual minimization LP, then \vec c \cdot \vec x_2 \le \vec b \cdot \vec x_1. This is because {\vec b}^T \vec x_2 = {\vec x_2}^T \vec b, and since \vec b \le A \vec x_1, {\vec b}^T \vec x_2 \le {\vec x_2}^T (A \vec x_1) = ({\vec x_2}^T A) \vec x_1 = (A^T \vec x_2)^T \vec x_1. Since A^T \vec x_2 \le \vec c, {\vec b}^T \vec x_2 \le {\vec c}^T \vec x_1, as required.

In other words, the objective function value for any feasible solution of the maximization LP is upper bounded by the objective function value for any feasible solution of the LP's dual, which is a minimizatoin problem. The main consequence of this is that if the objective function values are equal, both feasible solutions are optimal for their respective LPs.

As a result, the weak duality theorem also says that if \vec b \cdot \vec x_1 = \vec c \cdot \vec x_2 (the objective functions are exactly equal), then \vec x_1 is optimal for the primal LP and x_2 is optimal for the dual LP. This is because the dual's objective function bounds the primal's objective function, and vice versa, so if they're equal, they must be at their bounds.

The LP relaxation of an IP is just the IP with integer constraints removed.

Consider the shortest-path problem "minimize {\vec c}^T \vec x subject to A \vec x \ge \vec 1 and \vec x \ge \vec 0". The IP rows of A are s, t-cuts and columns are edges, and values of A are 1 or 0 depending on whether the corresponding edges are in the corresponding \delta(U) values for the s, t-cuts. Clearly, the solution \vec x for is a vector of integers 1 or 0, representing whether that given edge is a part of the shortest s, t-path, and we're trying to minimize the sum of weights of those edges.

Clearly, the objective function value is the length of the shortest s, t-path. Then the LP relaxation of the shortest-path IP must have an optimal value that's at most the optimal value for the IP, since removing constraints can only expand the feasible region.

Also note that the solution to the dual of the LP relaxation of the shortest-path IP is always a valid cut width assignment! This is because the dual, "maximize \vec 1 \cdot \vec y subject to A^T \vec y \le \vec c and \vec y \ge \vec 0", is maximizing the number of cuts while each constraint in A^T \vec y = \vec b ensures that cut weights summed for a given edge can't exceed the edge's weight. Each value of \vec y is a cut width, and each column of A^T is an s, t-cut. In other words, A[U, e] is 1 whenever e \in \delta(U) and 0 otherwise, where U is an s, t-cut and e is an edge.

From the Weak Duality theorem, the dual of the LP relaxation must have an optimal value that's less or equal to the optimal value for the LP relaxation. That means the cut width assignment \vec y has total width at most the length of the shortest s, t-path.

So if we find a feasible cut width assignment such that the total width is equal to the length of an s, t-path, then the s, t path must be a shortest s, t-path and the feasible cut width assignment must be maximal.

Also, the characteristic vector of an s, t-path P = sa, ab, bt is the vector \vec x where each element corresponds to an edge, and each element is 1 if the corresponding edge is in P, or 0 otherwise.

10/3/17

Week 8.

We looked at duals for the shortest path problem, but how did we find these shortest paths? How did we find the duals? In other words, how do we find an s, t-path P and a feasible dual (cut width assignment) such that the length of the path is equal to the total width of the cut width assignment?

In this course we will denote directed graphs as D = \tup{V, E}, and directed edges as \vec{uv}. A directed s, t-path might be denoted with \vec{sx}, \vec{xy}, \vec{yt}.

The slack of an edge given a feasible cut width assignment in a shortest-path problem is the difference between the edge's weight and the total width of the cuts that contain the edge. An equality edge is an edge with zero slack.

Also, an active s, t-cut in a feasible cut width assignment is an s, t-cut with a non-zero width.

Formally, the slack of an edge e for a feasible cut width assignment y is defined as \mathrm{slack}_y(e) = c_e - \sum_{\text{every st-cut } \delta(U) \text{ such that } e \in \delta(U)} y_U, where c_e is the weight of e and y_U is the width of the s, t-cut \delta(U).

There's actually a simple algorithm that will always find a shortest-path and a feasible cut width assignment of the same total width as the path's length (assuming non-negative edge weights):

Let U be the trivial s, t-cut \set{s} (cut only containing s), and y be the trivially feasible cut width assignment \vec 0 (all cuts have weight 0).
Repeat while t \notin U:
1. Let ab be the edge in \delta(U) with the smallest value for \mathrm{slack}_y(e). Essentially, this is the biggest value by which you can grow this cut's width while still keeping y a feasible cut width assignment. Without loss of generality, assume a \in U and b \notin U.
2. Let y_U = \mathrm{slack}_y(ab). Essentially, grow the cut width by the largest amount possible, as calculated in the previous step.
3. Change ab into a directed edge \vec{ab}. Keep track of each new vertex's predecessor vertex - you can use a map and insert b \to a at this step.
4. Add b to U. Essentially, grow the s, t-cut U so that it once again includes all of the vertices reachable by directed edges.
Now y_U is a feasible cut width assignment of maximal width, while a minimum length s, t-path can be computed from the predecessors map.

Clearly, this algorithm must terminate, because every step adds a new vertex to U, so eventually it must reach t and terminate, assuming the graph is connected.

How do we know the resulting cut width assignment is of maximal width, or that the resulting s, t-path is of minimal length?

First of all, P is a shortest s, t-path if all edges in P are equality edges, and every active cut contains exactly one edge of P. Proof:

Assume all edges in P are equality edges, and every active cut contains exactly one edge of P.
Clearly, the length of the path is \sum_{e \in P} c_e. Since all edges are equality edges (so their slack is 0), c_e = \sum_{\text{every st-cut } \delta(U) \text{ such that } e \in \delta(U)} y_U for any edge e \in P.
So the length of the path is \sum_{e \in P} \sum_{\text{every st-cut } \delta(U) \text{ such that } e \in \delta(U)} y_U.
Clearly, if we expand the summation, e \in \delta(U) will occur \abs{P \cap \delta(U)} times for each s, t-cut \delta(U). So \sum_{e \in P} \sum_{\text{every st-cut } \delta(U) \text{ such that } e \in \delta(U)} y_U = \sum_{\text{every st-cut } \delta(U)} \abs{P \cap \delta(U)} y_U.
However, since every active s, t-cut contains exactly one edge of P, \abs{P \cap \delta(U)} = 1 for them, so \abs{P \cap \delta(U)} y_U = y_U. For any non-active s, t-cut U, y_U = 0, so \abs{P \cap \delta(U)} y_U = y_U. Therefore, \sum_{\text{every st-cut } \delta(U)} \abs{P \cap \delta(U)} y_U = \sum_{\text{every st-cut } \delta(U)} y_U.
Since \sum_{e \in P} c_e = \sum_{\text{every st-cut } \delta(U)} y_U, by Weak Duality, P is a minimum length path and y is a maximum width cut width assignment.

Now let's use this to prove that the algorithm gives the correct result. First, we note that at the end of every iteration in step 2, the following properties hold:

y is a feasible cut width assignment.
All directed edges are equality edges (all directed edges have zero slack).
No arcs point into any s, t-cuts - there are no entering arcs (for any active s, t-cut \delta(T), no directed edges \vec{ab} exist such that a \notin T but b \in T).
For every u \in U (vertex in the current cut), there is a directed s, u-path.
Every arc has both ends contained within U.

These invariants hold when the algorithm terminates. Therefore:

By property 1 above, we know that y is feasible cut width assignment.
By property 2, all arcs are equality arcs.
Since t \in U, by property 4 above there must be a s, t-path.
We proved earlier that P must contain at least one edge from each s, t-cut. Suppose there exists an active s, t-cut \delta(U) that contains more than one edge of P. Since Then P must first exit U through \delta(U), and then enter it again, which would be impossible by property 3. Therefore, any active s, t-cut must contain exactly one edge from P.

Since all edges in P are equality edges, and every active cut contains exactly one edge of P, we can use the proposition proved above to say that P is a shortest s, t-path. Therefore, the algorithm is always correct.

Note that if all edge weights are integers, then in the algorithm output, all cut widths are also all integers.

17/3/17

Week 9.

Previously, we looked at duality for the shortest-path problem. now we'll look at duality in general.

We can write an arbitrary LP as "maximize/minimize {\vec c}^T \vec x subject to A \vec x ? b and \vec x ? \vec 0". Here A \vec x ? b represents n individual constraints \mathrm{row}_i(A) \cdot \vec x \le b_i, \vec a_i \cdot \vec x \ge b_i, or \vec a_i \cdot \vec x = b_i. Likewise, \vec x ? \vec 0 represents n variable definitions x_i \ge 0, x_i \le 0, or x_i \text{ free}.

The dual has a variable for each constraint in the primal LP, and a constraint for each variable in the primal LP.

Recall that the dual of an LP "maximize {\vec c}^T \vec x subject to A \vec x \ge b and \vec x \ge \vec 0" is "minimize {\vec b}^T \vec y subject to A^T \vec y \le c and \vec y \ge \vec 0". This only works for LPs in a certain form, but it's easy to change LPs into this form:

If the LP is a minimization problem, invert the objective function and replace "minimize" with "maximize" to turn it into a maximization LP.
Turn constraints of the form \vec a \cdot \vec x \le b into -\vec a \cdot \vec x \ge -b.
Turn constraints of the form \vec a \cdot \vec x = b into -\vec a \cdot \vec x \ge -b and \vec a \cdot \vec x \ge b.
Replace variables x \le 0 with x' \ge 0 where x = -x', or variables x \text{ free} with x' \ge 0, x'' \ge 0 where x = x' - x''.

Alternatively, a more mechanical way:

If the LP is a minimization problem, invert the objective function and replace "minimize" with "maximize" to turn it into a maximization LP.
For each constraint \mathrm{row}_i(A) \cdot \vec x = b_i, define a variable y \text{ free}; for each constraint \mathrm{row}_i(A) \cdot \vec x \ge b_i, define a variable y \ge 0; for each constraint \mathrm{row}_i(A) \cdot \vec x \le b_i, define a variable y_i \le 0.
For each variable x_i \text{ free}, define a constraint \mathrm{column}_i(A) \cdot \vec y = c_i; for each variable x_i \ge 0, define a constraint \mathrm{column}_i(A) \cdot \vec y \ge c_i; For each variable x_i \le 0, define a constraint \mathrm{column}_i(A) \cdot \vec y \le c_i.

A good way to memorize maximization LP dual conversion is "convert constraints to variables (flipping inequalities in the process), convert variables to constraints, change maximize to minimize". A good way to memorize minimization LP dual conversion is "convert constraints to variables, convert variables to constraints (flipping inequalities in the process), change minimize to maximize".

We can generalize the weak duality theorem to these arbitrary LPs: if \vec x_1 is feasible for the primal maximization LP and \vec x_2 is feasible for the dual minimization LP, then \vec c \cdot \vec x_2 \le \vec b \cdot \vec x_1. Proof:

Clearly, all inequality constraints can be converted into equality constraints using slack variables (e.g., \vec a \cdot \vec x \ge b becomes \vec a \cdot \vec b + s = b with the slack variable s \le 0, and \vec a \cdot \vec x \le b becomes \vec a \cdot \vec b + s = b with the slack variable s \ge 0).
So all constraints in the primal LP can be written as A\vec x + \vec s = \vec b, with the slack variables represented with \vec s.
If we take the dual of this, we get the constraints A^t \vec y + \vec w = \vec c in the dual LP, where \vec w represents the slack variables.
Let \overline x be a feasible solution for the primal LP, and let \overline y be a feasible solution for the dual LP.
Let \overline s = \vec b - A \overline x represent the slack variable values in the primal LP, and let \overline w = \vec c - A^t \vec y represent the slack variable values in the dual LP.
Clearly, {\overline y}^T \vec b = {\overline y}^T (A \overline x + \overline s) = ({\overline y}^T A) \overline x + {\overline y}^T \overline s, since A \overline x = \overline b.
Clearly, ({\overline y}^T A) \overline x + {\overline y}^T \overline s = (\vec c - \overline w)^T \overline x + {\overline y}^T \overline s, since {\overline y}^T A = (A^T \overline y)^T and A^T \overline y + \overline w = \vec c, so {A^T \overline y}^T = (\vec c - \overline w)^T.
Clearly, (\vec c - \overline w)^T \overline x + {\overline y}^T \overline s = {\vec c}^T \overline x - {\overline w}^T \overline x + {\overline y}^T \overline s.
We know that {\overline w}^T \overline x \le 0 and {\overline y}^T \overline s \ge 0 (we won't prove this in this course, but it follows from the way we defined slack variables initially).
Therefore, {\overline y}^T \vec b \ge {\vec c}^T \overline x, as required.

This tells us the following:

If an LP is unbounded, its dual is infeasible (if it wasn't, then the LP would have a bound).
If both an LP and its dual are feasible, they must both have optimal solutions (because they're not unbounded, and they're not infeasible, so this is true by the fundamental theorem of LPs).

The Strong Duality Theorem is similar to the weak duality, but it deals with optimal solutions rather than feasible solutions: if \vec x_1 is an optimal solution for a primal LP, then there exists a \vec x_2 that's an optimal solution for its dual LP, such that \vec c \cdot \vec x_2 = \vec b \cdot \vec x_1. Proof:

Assume our LP is in SEF, so it's of the form "maximize {\vec c}^T \vec x subject to A \vec x = \vec b and \vec x \ge \vec 0". Since any LP can be converted into SEF, this assumption is without loss of generality.
Then the dual must be "minimize {\vec b}^T \vec y subject to A^T \vec y \ge \vec c". Assume that the primal LP has an optimal solution \overline x.
Then the 2-phase simplex algorithm must terminate with an optimal basis B, so we can write the primal LP in canonical form for B and multiply both sides of the constraints equation by A^{-1}: "maximize z = {\overline y}^T \vec b - {\overline c}^T \vec x subject to \vec x_B = A_B^{-1} A_{\overline B} \vec x_{\overline B} and \vec x \ge \vec 0". Here, {\overline c}^T = {\vec c}^T - {\overline y}^T A.
So \overline x_B = A_B^{-1} \vec b and \overline x_{\overline B} = \vec 0. Note that the LP in canonical form is equivalent to the original LP in SEF - and \overline x has the same objective value in both.
Therefore, {\vec c}^T \overline x = {\overline y}^T \vec b = {\overline y}^T \vec b + {\vec c}^T \overline x.
Since \vec c has 0 for all of the basic components, {\vec c_B}^T \overline x_B = 0. Likewise, since \overline x_{\overline B} = \vec 0, {\vec c_{\overline B}}^T = 0. So {\vec c}^T \overline x = {\overline y}^T \vec b = \vec b \cdot \overline y.
Since B is an optimal basis for the , we know from the Simplex algorithm that \overline c \le \vec 0 in the canonical form objective function, so {\vec c}^T - {\overline y}^T A \le 0.
So {\vec c}^T \le {\overline y}^T A, which means \vec c \le A \overline y, which means that \overline y is feasible for the dual LP.
Since \overline y is a feasible solution whose objective function value is equal to that of a feasible solution of the primal LP, we've found an optimal solution to the dual LP that has the same objective function value, as required.

The strong duality theorem tells us that if a primal LP and its dual are both feasible, then they can neither be infeasible nor unbounded, so they must have optimal solutions that have equal value.

This gives us the slightly stronger version of the strong duality theorem: if a primal LP and its dual are both feasible, then both have optimal solutions with the same objective function value. Otherwise, one must be unbounded, and the other must be infeasible.

Recall the proof for the weak duality theorem, where we wrote out our LP as "maximize \vec c \cdot \vec x subject to A \vec x + \vec s = \vec b and \vec s \ge \vec 0", with the dual "minimize \vec b \cdot \vec y subject to A^T \vec y = \vec c and \vec y \ge 0". In the process, we showed that \overline y \cdot \vec b = \overline y \cdot (A \overline x + \overline s) = {\overline y}^T A \overline x + {\overline y}^T \overline s = (A^T \overline y)^T \overline x + {\overline y}^T \overline s = {\vec c}^T \overline x + {\overline y}^T \overline s.

According to the strong duality theorem, if \overline x, \overline y are optimal for their respective LPs, then {\vec c}^T \overline x = {\overline y}^T \vec b. So if \overline x, \overline y are optimal for their respective LPs and \overline y \cdot \vec b = {\vec c}^T \overline x + {\overline y}^T \overline s (\overline s represents the slack variables), then {\overline y}^T \overline s = 0.

Since \overline x, \overline y are feasible solutions, \overline x \ge 0 and \overline y \ge 0. Likewise, \overline s \ge 0 by definition. So if {\overline y}^T \overline s = 0, then either \overline y_i = 0 or \overline s_i = 0 for all 1 \le i \le n. In other words, either \overline y_i = 0, or constraint i in the primal LP is tight (satisfied with equality).

This gives us the complementary slackness special case: if \overline x is a feasible solution for a primal LP of the form "maximize \vec c \cdot \vec x subject to A \vec x \le \vec b" and \overline y is a feasible solution of its dual "minimize \vec b \cdot \vec y subject to A^T \vec y = \vec c and \vec y \ge 0", \overline x and \overline y are optimal for their respective LPs if and only if for each 1 \le i \le n, either \overline y_i = 0, or constraint i of the primal LP is tight.

The general complementary slackness theorem is the general version of the complementary slackness special case. It says that given a feasible solution \overline x of a primal LP and a feasible solution \overline y of its dual LP, \overline x and \overline y are optimal exactly when they satisfy the complementary slackness conditions:

For each 1 \le i \le m, either \overline x_i = 0 or constraint i of the dual LP is satisfied with equality (or both are true), and
For each 1 \le j \le n, either \overline y_j = 0 or constraint j of the primal LP is satisfied with equality (or both are true).

Geometric Interpretations of Complementary Slackness

Given \vec a_1, \ldots, \vec a_k \in \mb{R}^n, the cone generated by \vec a_1, \ldots, \vec a_k is C = \set{\lambda_1 \vec a_1 + \ldots + \lambda_k \vec a_k : \vec \lambda \ge \vec 0}. Essentially, it's the set of all points that can be generated by linear combinations of a set of vectors using non-negative coefficients.

The cone of tight constraints for an LP and a feasible solution \overline x is the cone generated by those rows of A that correspond to tight constraints for A \overline x = \vec b.

Interestingly, a feasible solution \overline x is an optimal solution if and only if \vec c is in the cone of tight constraints for \overline x. This follows from the general complementary slackness theorem, and is useful for geometrically interpreting what an optimal solution is.

Intuitively, we can think of \vec c as the "direction to optimize in". We're basically trying to move \overline x toward \vec c as far as possible. When \vec c is in the cone of tight constraints, \overline x must have been moved until it reached a polyhedron vertex, unable to move further past that vertex into the cone, which would make it optimal.

24/3/17

Integer And Linear Programs

LPs are nice because we have ways of efficiently solving huge LP instances (millions of variables/constraints). Additionally, it's easy to construct short certificates for optimality (Strong Duality), feasibility (Farka's Lemma), and unboundedness. Plus, the fundamental theorem of LPs tells us there can only be three possible outcomes for any given LP.

In contrast, IPs don't have any known efficient solver algorithms, and we don't know of any way to generate short certificates for the solver outcomes. Plus, IPs have a lot more possible outcomes. On the other hand, IPs can be used to formulate many more types of practical problems, and certain special cases can easily be solved in practice.

For example, here's an IP that is feasible, bounded, and doesn't have an optimal solution: "maximize x_1 - \sqrt 2 x_2 subject to x_1 \le \sqrt 2 x_2, x_1 \ge 1, x_2 \ge 1, and x_1, x_2 \text{ integer}". This is because \sqrt 2 is irrational, so cannot be represented as a ratio. However, the IP tries to approximate \sqrt 2 as \frac{x_1}{x_2}, and the approximation can get arbitrarily better with larger integers x_1 and x_2. Therefore, no finite values for x_1, x_2 can be optimal. Formally:

Suppose x_1, x_2 is an optimal solution to the LP.
Let x_1' = 2x_1 + 2x_2, x_2' = x_1 + 2x_2. Clearly, x_1', x_2' satisfies constraint 2 if x_1, x_2 did.
Clearly, x_1' \le \sqrt 2 x_2' \iff 2x_1 + 2x_2 \le \sqrt 2 x_1 + \sqrt 2 2x_2 \iff (2 - \sqrt 2)x_1 \le \sqrt 2 2x_2 - 2 x_2 \iff (2 - \sqrt 2)x_1 \le (2 \sqrt 2 - 2) x_2 \iff x_1 \le \frac{2 \sqrt 2 - 2}{2 - \sqrt 2} x_2 = \sqrt 2 x_2.
Since x_1 \le \sqrt 2 x_2, x_1' \le \sqrt 2 x_2', so x_1', x_2' is a feasible solution.
Clearly, x_1' - \sqrt 2 x_2' > x_1 - \sqrt 2 x_2 \iff (2x_1 + 2x_2) - \sqrt 2(x_1 + 2x_2) > x_1 - \sqrt 2 x_2 \iff x_1 < \frac{2 - \sqrt 2}{\sqrt 2 - 1} x_2 = \sqrt 2 x_2.
Clearly, x_1 < \sqrt 2 x_2, because if x_1 = \sqrt 2 x_2, then \frac{x_1}{x_2} = \sqrt 2, which is impossible since x_1, x_2 are integers and \sqrt 2 is irrational. So x_1' - \sqrt 2 x_2' > x_1 - \sqrt 2 x_2.
So x_1, x_2 can't be optimal solutions to the LP, a contradiction. So the LP has no optimal solution.

In theory, IPs can be reduced to LPs, the LPs can be solved, and the solutions can be turned back into solutions for the IPs. Though this is procedure is itself no more efficient than existing ways to solve IPs, it tells us a lot about the relationship between IPs and LPs.

The convex hull of a subset C of \mb{R}^n is the smallest convex set that contains C. Intuitively, it's the shape formed by tightly wrapping the region in cling film - all gaps and concavities are filled in to form a convex region.

Also, it turns out that every subset C of \mb{R}^n has exactly one possible convex hull. This is because if we suppose there were two or more possible convex hulls for C, their set intersection is also a convex hull for C (because both sets being intersected are convex and contain C), but it would be smaller than at least one of the convex hulls, which contradicts the earlier fact that the two convex hulls are convex sets containing C of minimum size.

Meyer's Theorem: given the polyhedron P = \set{\vec x : A \vec x \le b} where all values in A and b are rational, the convex hull of all integer points in P is a polyhedron. The values must be rational because of cases like the one we saw above, where the IP's polyhedron is P = \set{\vec x : x_1 \le \sqrt 2 x_2, x_1 \ge 1, x_2 \ge 1}. Here, the convex hull of all integer points is not a polyhedron - if you try to construct one, it'll take infinitey many halfspaces.

Meyer's theorem means that solving IPs can be reduced to solving LPs. Suppose we have an IP "maximize \vec c \cdot \vec x subject to A \vec x \le \vec b and \vec x \text{ integer}", where all values in A, \vec b are rational. Clearly, the convex hull for the feasible solutions of the IP is a polyhedron A' \vec x \le \vec b', by Meyer's theorem.

Now consider the LP "maximize \vec c \cdot \vec x subject to A' \vec x \le \vec b'". This LP is infeasible exactly when the IP is infeasible, and is unbounded exactly when the IP is unbounded. Furthermore, an optimal solution to the IP is also an optimal solution to the LP, and an optimal solution to the LP that is also an extreme point is an optimal solution to the IP.

So to solve the IP, we could find A' and \vec b', use the Simplex algorithm to find an extreme optimal solution to the LP formed by A' and \vec b', and then use those as solutions as solutions to the IP. However, this is not a practical way of solving IPs, because the size of A' and \vec b' can grow exponentially with respect to the size of A and \vec b. One common real-world technique for solving IPs is to approximate the convex hull for the integer points.

Solving IPs

First, we'll start by finding an optimal solution \overline x for the LP relaxation of the IP first (the LP relaxation is the IP with the integrality constraints removed). This is easy to do using the simplex algorithm, but the solution we get is not always an integral solution.

Now we'll add another constraint \vec \alpha \cdot \vec x \le \beta to the LP relaxation, such that the constraint is satisfied for all feasible solutions to the IP, but not for \overline x. Such a constraint is known as a cutting plane.

If we add a cutting plane constraint to our LP relaxation, and then find an optimal solution x', we note that the new optimal solution is now closer to at least one of the feasible solutions for the IP. By repeatedly adding cutting plane constraints and finding optimal solutions, we eventually run into an integer solution.

This process is called the cutting plane algorithm. Given the IP "maximize \vec c \cdot \vec x subject to A \vec x \le \vec b and x \text{ integer}":

Let P be the LP relaxation of the IP: "maximize \vec c \cdot \vec x subject to A \vec x \le \vec b".
If P is infeasible, STOP and output that the IP is also infeasible.
Let \overline x be an optimal solution to P, found using the Simplex algorithm.
If \overline x is integral, STOP and output the optimal IP solution.
Find a cutting plane \vec \alpha \cdot \vec x \le \beta.
Add the cutting plane constraint to P and go back to step 2.

It turns out the Simplex algorithm already finds cutting planes - take the final canonical form of the LP, then ;wip: 05b slides.

31/3/17

Recall that NLPs are problems of the form "maximize f(\vec x) subject to g_i(\vec x) \le 0 for all 1 \le i \le k". NLP feasible regions don't need to be convex, or even connected. This is what makes solving them hard.

We can also write this as "maximize \lambda subject to \lambda - f(\vec x) \le 0 and g_i(\vec x) \le 0 for all 1 \le i \le k". At an optimal solution, it is always true that \lambda = f(\vec x). Because \lambda is a linear function (of itself), we can always write any NLP such that it has a linear objective function. For the rest of this course, we'll assume that all NLPs we look at have linear objective functions.

NLPs generalize LPs and IPs. As we saw earlier, x_i = k can be written as x_i \le k and k \le x_i, x_i \text{ integer} can be written as \sin(\pi x_i) = 0, and 0 \le x_i \le 1, x_i \text{ integer} can be written as x_i(x_i - 1) = 0.

For our purposes, a feasible solution \vec x to a minimization problem is a local optimum if and only if there exists a \delta > 0 such that for any \vec x' such that \magn{\vec x' - \vec x} \le \delta, f(\vec x) < f(\vec x'). For a maximization problem, it's the same thing, but with f(\vec x) > f(\vec x') instead. Geometrically, local optima are feasible solutions for which there are no better solutions within some distance of it.

It turns out that if the feasible region is convex, all local optima are also global optima. Proof:

Let S be a convex feasible region for a minimization problem and \vec x \in S be local optimum. Suppose \vec x is not a global optimum. Let \vec x' \ne \vec x be a global optimum.
Since \vec x is a local optimum, for some \delta > 0, for any \vec z \in S such that \magn{\vec z - \vec x} \le \delta, \vec c \cdot \vec x \le \vec c \cdot \vec z.
Let \vec y = \lambda\vec x' + (1 - \lambda)\vec x. Since \vec x \in S and \vec x' \in S and S is convex, \vec y \in S.
Clearly, for \lambda = \delta, \magn{\vec y - \vec x} \le \delta. So \vec c \cdot \vec x \le \vec c \cdot \vec y.
However, \vec c \cdot \vec y = \vec c \cdot \left(\lambda\vec x' + (1 - \lambda)\vec x\right) = \lambda \vec c \cdot \vec x' + (1 - \lambda) \vec c \cdot \vec x.
Since \vec x' is a global optimum and \vec x isn't, \vec c \cdot \vec x' < \vec c \cdot \vec x. So \lambda \vec c \cdot \vec x' + (1 - \lambda) \vec c \cdot \vec x < \lambda \vec c \cdot \vec x + (1 - \lambda) \vec c \cdot \vec x = \vec c \cdot \vec x.
So, \vec c \cdot \vec y < \vec c \cdot \vec x and \vec c \cdot \vec x \le \vec c \cdot \vec y, a contradiction! Therefore, \vec x must be a global optimum.

Our strategy so far for finding globally optimal solutions to optimization problems has been to find a feasible solution, and then repeatedly refine it until we find one we know is locally optimal. For convex problems, locally optimal solutions are always globally optimal solutions, so this works well. However, since NLPs can have non-convex feasible regions, there can be locally optimal solutions that aren't globally optimal.

A convex function is a function f(\vec x) such that for all \vec a, \vec b, 0 \le k \le 1, f(k\vec a + (1 - k)\vec b) \le kf(\vec a) + (1 - k)f(\vec b). Basically, a convex function is one where the function's value at the interpolation of any two points is always less or equal to the interpolation of the function's value at those two points. For example, 1D convex functions like f(x) = x^2 are always flat or open upward. A common way of proving convexity is to show that a function's second derivative with respect to k is always positive.

For a convex function f(\vec x) and constant \beta, S = \set{\vec x \in \mb{R}^n : f(\vec x) \le \beta} is a convex set. Proof:

Let f(\vec x) be a convex function. Let \beta be a constant. Clearly, \vec x \in S if and only if f(\vec x) \le \beta.
Let \vec a, \vec b \in S, so f(\vec a) \le \beta and f(\vec b) \le \beta. Let \vec y = k\vec a + (1 - k)\vec b.
Since f(\vec x) is a convex function, f(\vec y) \le kf(\vec a) + (1 - k)f(\vec b). Clearly, kf(\vec a) + (1 - k)f(\vec b) \le k\beta + (1 - k)\beta = \beta.
Since \vec y \le \beta, S is a convex set.

The epigraph of a function f: \mb{R}^n \to \mb{R} is defined by \mathrm{epi}(f) = \set{\begin{bmatrix} y \\ \vec x \end{bmatrix} : y \ge f(\vec x), \vec x \in \mb{R}^n}. For 1D functions, it's the area above the curve; for 2D functions, it's the volume above the surface; and so on. The epigraph is a useful concept, because it's convex if and only if f(\vec x) is convex.

Suppose we have an NLP "maximize f(\vec x) subject to g_1(\vec x) \le 0 \land \ldots \land g_k(\vec x) \le 0". Let S_i = \set{\vec x \in \mb{R}^n : g_i(\vec x) \le 0} for all 1 \le i \le k. Clearly, the feasible region of this NLP is S_1 \cap \ldots \cap S_k - the intersection of all the values that satisfy each constraint.

If each g_i is convex, then so is each S_i, as we proved above. Since the intersection of convex sets is also convex, if each g_i is convex, so is the feasible region of the NLP.

The KKT Theorem

The feasible region of an NLP is a region of \mb{R}^n, where n is the number of variables.

Suppose we have the NLP "minimize -x_1 - x_2 subject to -x_2 + x_1^2 \le 0 and -x_1 + x_2^2 \le 0 and -x_1 + \frac 1 2 \le 0". How do we prove that \overline x = \begin{bmatrix} 1 \\ 1 \end{bmatrix} is an optimal solution?

We can get a relaxation of the NLP by arbitrariy removing some constraints: "minimize -x_1 - x_2 subject to -x_2 + x_1^2 \le 0 and -x_1 + x_2^2 \le 0".

We can further relax it by replacing the -x_2 = x_1^2 \le 0 constraint with 2x_1 - x_2 \le 1, which is always satisfied when -x_2 = x_1^2 \le 0 is satisfied, and is satisfied by equality at \overline x.

We can do the same thing by replacing -x_1 + x_2^2 \le 0 with -x_1 + 2x_2 \le 1, which is also satisfied when -x_1 + x_2^2 \le 0 and satisfied by equality at \overline x.

The NLP then becomes "minimize -x_1 - x_2 subject to 2x_1 - x_2 \le 1 and -x_1 + 2x_2 \le 1", which is just an LP. Clearly, \overline x is an optimal solution to the LP relaxation, because the objective function coefficients are in the cone of tight constraints (we could also show this using strong duality, if we could find an optimal solution to the dual).

Since \overline x is an optimal solution to the LP relaxation, and is also a feasible solution to the NLP, it must also be an optimal solution to the NLP, since the LP's feasible region is a superset of the NLP's feasible region - each NLP feasible solution is an LP relaxation feasible solution, so the best of all the LP relaxation feasible solutions is also the best NLP feasible solution, if it is an NLP feasible solution.

We can apply this technique for showing that a feasible NLP solution is optimal to any NLP.

A subgradient of a convex function f: \mb{R}^n \to \mb{R} at a vector \overline x is a vector \vec s \in \mb{R}^n such that h(\vec x) \le f(\vec x) for all \vec x \in \mb{R}^n where h(\vec x) = f(\overline x) + \vec s \cdot (\vec x - \overline x).

Given a subgradient s, h(\vec x) is a linear function of \vec x, h(\overline x) = f(\overline x), and h(\vec x) is by definition always a lower bound for f(x). In other words, h(\vec x) is a linear lower bound for a convex function f(\vec x), and \vec s is the slope of h(\vec x).

Note that the subgradient is not unique - consider the subgradients possible at the extreme points of a polyhedron.

A halfspace F = \set{\vec x : \vec s \cdot \vec x \le \beta} is supporting a convex set C if and only if C \subseteq F (the halfspace contains the convex set) and \vec s \cdot \vec x = \beta (the halfspace is touching the convex set).

Given a convex function g(\vec x) and a vector \overline x such that g(\overline x) = 0 and a subgradient s of g at \overline x, F = \set{\vec x : h(\vec x) \le 0} is supporting C = \set{\vec x : g(\vec x) \le 0} where h(x) = g(\overline x) + \vec s \cdot (\vec x - \overline x).

Proof:

Let g(\vec x) be a convex function and \overline x be a value such that g(\overline x) = 0.
Let s be a subgradient of g at \overline x.
Since g(\vec x) is a convex function, C = \set{\vec x : g(\vec x) \le 0} is a convex set.
Since h(\vec x) is a linear function of \vec x, F = \set{\vec x : h(\vec x) \le 0} is a halfspace.
Since h(\overline x) = g(\overline x) = 0 = \beta, F is on the boundary of C.
Since h(\vec x) \le g(\vec x) \le 0, any \vec x \in F, so C \subseteq F.
So C is supporting F.

The above is useful because it allows us to construct LP relaxations of NLPs. Suppose we have the NLP "maximize f(\vec x) subject to g_1(\vec x) \le 0 \land \ldots \land g_k(\vec x) \le 0". If \overline x is a feasible solution of the NLP such that g_i(\overline x) = 0, g_i is convex, and \vec s is a subgradient of g_i at \overline x, then we can replace the constraint g_i(\vec x) \le 0 with h(\vec x) \le 0 to get a linear version of that constraint.

So given an NLP for which all g_i are convex, \overline x is a feasible solution such that g_i(\overline x) = 0 for all tight constraints i \in I, and \vec s_i is a subgradient for g_i(\vec x) at \overline x, \overline x is optimal -\vec c \in \mathrm{cone} \set{s_i : i \in I} (the negative objective function is in the cone of the tight constraints).

To get an LP relaxation of the NLP for a feasible solution \overline x, we can replace g_i(\vec x) with h_i(\vec x) for all tight constraints, and then discard the rest of the constraints.

Clearly, if the gradient \nabla f(\vec x) of a convex function f(\vec x) exists at \overline x, then it's a subgradient of f(\vec x) at \overline x. We compute the gradient of a function of a vector as \nabla f(\vec x) = \begin{bmatrix} \frac{\dee f(\vec x)}{\dee x_1} \\ \vdots \\ \frac{\dee f(\vec x)}{\dee x_n} \end{bmatrix}. Finding the gradient is a common way of finding a subgradient of a convex function at a given point.

A Slater point is a feasible solution \overline x of an NLP "minimize \vec c \cdot \vec x subject to g_i(\vec x) \le 0 for all 1 \le i \le k" such that for all 1 \le i \le k, g_i(\overline x) < 0 (a feasible solution that isn't on the boundary of the feasible region).

The Karush-Kuhn-Tucker theorem (KKT theorem) says that given an NLP "minimize \vec c \cdot \vec x subject to g_i(\vec x) \le 0 for all 1 \le i \le k" such that:

g_i(\vec x) is convex for all 1 \le i \le k.
There exists a Slater point - a point within the feasible region.
There exists a feasible solution \overline x.
I is the set of indices of tight constraints - the indices for which g_i(\overline x) = 0.
For all i \in I, there exists a gradient \nabla g_i(\overline x) of g_i(\vec x) at \overline x.

If this is the case, then \overline x is optimal if and only if -\vec c \in \mathrm{cone}\set{\nabla g_i(\overline x) : i \in I}.

The p-norm of a vector \vec x is defined as \magn{x}_p = \left(\sum \abs{x_i}^p\right)^{\frac 1 p}.

The infinity-norm of a vector \vec x is defined as \magn{x}_\infty = \max \set{\abs{x_1}, \ldots, \abs{x_n}}.