Lecture Notes by Anthony Zhang.

\[ \newcommand{\set}[1]{\left\{ #1 \right\}} \newcommand{\tup}[1]{\left\langle #1 \right\rangle} \newcommand{\abs}[1]{\left\lvert #1 \right\rvert} \newcommand{\floor}[1]{\left\lfloor #1 \right\rfloor} \newcommand{\ceil}[1]{\left\lceil#1 \right\rceil} \newcommand{\mb}[1]{\mathbb{#1}} \newcommand{\rem}{\operatorname{rem}} \newcommand{\sign}{\operatorname{sign}} \newcommand{\imag}{\boldsymbol{i}} \newcommand{\dee}{\mathop{}\!\mathrm{d}} \newcommand{\lH}{\overset{\text{l'H}}{=}} \newcommand{\evalat}[1]{\left.\left(#1\right)\right|} \newcommand{\sech}{\operatorname{sech}} \newcommand{\spn}{\operatorname{Span}} \newcommand{\proj}{\operatorname{proj}} \newcommand{\prp}{\operatorname{perp}} \newcommand{\refl}{\operatorname{refl}} \newcommand{\magn}[1]{\left\lVert #1 \right\rVert} \newcommand{\rank}{\operatorname{rank}} \newcommand{\sys}[2]{\left[ #1 \mid #2\hskip2pt \right]} \newcommand{\range}{\operatorname{Range}} \newcommand{\adj}{\operatorname{adj}} \newcommand{\cof}{\operatorname{cof}} \newcommand{\diag}{\operatorname{diag}} \newcommand{\formlp}{\operatorname{Form}(\mathcal{L}^P)} \]


Introduction to optimization.

Levent Tunçel
Section 081 (online)
Email: ltuncel@uwaterloo.ca


In this course, we will be studying linear programming, integer programming, the Simplex algorithm, and the concept of duality, as well as various ways to efficiently approximate answers to these sorts of problems.

An abstract optimization problem (also known as a "P") is a problem in which we are trying to maximize or minimize a certain objective function - finding the inputs to the function, subject to certain constraints, such that the value of the function is maximised. In other words, given \(A \subseteq \mb{R}^n\) (the set of all feasible inputs) and a function \(f: A \to \mb{R}\) (the objective function), find \(x \in A\) such that \(f(x)\) is the global maximum/minimum.

Optimization problems show up in almost every industry - hardware companies might want to simplify their circuit layouts, hotels might want to fill the most rooms possible at the highest price, and a factory might want to generate the most profit given constraints on materials and labour. Also, you can use it to play Pandemic optimally.

There are three types of abstract optimization problems:

Abstract optimization problems are usually given as prose descriptions of the problems, which we then turn into abstract mathematical models/programs/problems. Then, we can take advantage of the many powerful computer solvers available today to automatically find optimal solutions.

Suppose a company produces either plastic carrots (which sell for $5 each and take 4 minutes to make) or rubber potatoes (which sell for $8 each and take 8 minutes to make). The company has 500 minutes available in the rest of the day, and the machine that makes either of these products can only be used up to 100 times - how many of each product should be made to maximize profit?

First, we define our decision variables - the actual values we are trying to find. Let \(c\) be the number of carrots, and \(p\) be the number of potatoes.
Profit can now be defined as a function of those decision variables: \(f(c, p) = 5c + 8p\).
Next, we can define constraints to specify \(A\). There are only 500 minutes left, and the time spent depends on the number of each product made, so \(4c + 8p \le 500\). There are only 100 products left that we can make, and this is simply the number of carrots and potatoes, so \(c + p \le 100\). Additionally, \(c\) and \(p\) must be non-negative, since they represent the number of each product.
We now have a mathematical model of the problem: "maximize \(5c + 3p\) subject to \(4c + 3p \le 500\) and \(c + p \le 100\)".

A feasible solution to an abstract optimization problem is any \(x \in A\) - any variable assignment such that the constraints of the problem are satisfied, like \(c = 50, p = 10\). An optimal solution is one that actually solves the problem - a feasible solution that results in the global maximum/minimum of the objective function. Likewise, feasible solutions to prose descriptions of the problem are assignments to the unknowns, like "make 50 carrots and 10 potatoes".

To show that a mathematical model of a problem is correct (properly represents the prose description of the problem), we usually define a bijection between feasible solutions for the mathematical model and feasible solutions for the prose description of the problem, such that the bijection preserves cost - each \(x \in A\) for the mathematical model must have \(f(x)\) equal to the the profit in the prose description for its corresponding feasible solution.

For example, show that the model from the above example is correct:

\(4c + 3p \le 500 \land c + p \le 100\) can be trivially bijected to "carrots take 4 minutes to make and potatoes take 8 minutes, and the machine can only be used to make 100 products".
Clearly, \(f(c, p) = 5c + 8p\) has the same value as the profit since "carrots sell for $5 each and potatoes sell for $8 each".


In this course we will mostly focus on affine optimization problems, since solving general optimization problems is much harder.

An affine function is one that can be written as \(f(\vec{x}) = \vec{a} \cdot \vec{x} + \beta\) where \(\vec{a} \in \mb{R}^n\) and \(\beta \in \mb{R}\). If \(\beta = 0\), the function is also linear.

A linear program is an abstract optimization problem of the form \(\min \set{f(x) : x \in \mb{R}^n \wedge \left(\forall 1 \le i \le m, g_i(x) \le b_i\right)}\) where \(f\) is affine, \(g_1, \ldots, g_m\) are all linear, and \(m\) is finite. We might also replace \(\min\) with \(\max\) depending on the problem.

Usually we write this as: "minimize \(f(x)\) subject to \(g_1 \le b_1, \ldots, g_m \le b_m\)", or "maximize \(f(x)\) s.t. \(g_1 \le b_1, \ldots, g_m \le b_m\)". It's also common to use \(\vec{x} \le \mb{0}\) as shorthand for \(x_1 \ge 0, \ldots, x_n \ge 0\).

Suppose an oil company in the next 4 months needs to supply 5000L, 8000L, 9000L, and 6000L, respectively, and oil costs them 0.75/L, 0.72/L, 0.92/L, and 0.90/L, respectively. The company has an oil tank that can store up to 4000L, and starts off with 2000L. Oil is bought at the beginning of the month, is used throughout the month, and is all dumped into the tank at the end of the month. What quantity of oil should be bought in each month to minimize cost?

Clearly, the inputs are the quantity of oil bought in each month, \(b_1, b_2, b_3, b_4\). For convenience, we will also define the tank value at the beginning of each month, \(t_1, t_2, t_3, t_4\). Clearly, \(0 \le t_1 \le 4000, 0 \le t_2 \le 4000, 0 \le t_3 \le 4000, 0 \le t_4 \le 4000\), since the capacity of the tank is 4000L.
The objective function we are minimizing is \(f(b_1, b_2, b_3, b_4) = 0.75b_1 + 0.72b_2 + 0.92b_3 + 0.90b_4\).
Clearly, the tank level in month \(i\) is \(t_1 = 2000, t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000\), since the tank level is just the leftover oil after supplying the last month's oil.
Clearly, in order to supply enough oil, we must have \(t_1 + b_1 \ge 5000, t_2 + b_2 \ge 8000, t_3 + b_3 \ge 9000, t_4 + b_4 \ge 6000\). However, the first three constraints are redundant with the previously defined constraints \(t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000\) and \(0 \le t_1, 0 \le t_2, 0 \le t_3, 0 \le t_4\), so we only need the \(t_4 + b_4 \ge 6000\) constraint.
These encode all of the problem constraints, so we have "minimize \(0.75b_1 + 0.72b_2 + 0.92b_3 + 0.90b_4\) subject to \(0 \le t_1 \le 4000, 0 \le t_2 \le 4000, 0 \le t_3 \le 4000, 0 \le t_4 \le 4000\) and \(t_1 = 2000, t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000\) and \(t_4 \ge 6000\)".
If we use a linear programming solver, we get \(b_1 = 3000, b_2 = 12000, b_3 = 5000, b_4 = 6000\), which gives us the minimum cost 20890.

Same thing, but minimise the maximum amount of oil purchased in any single month, and then show that the model is correct assuming that the above example's model was correct:

From the previous example, we have the same constraints \(0 \le t_1 \le 4000, 0 \le t_2 \le 4000, 0 \le t_3 \le 4000, 0 \le t_4 \le 4000\) and \(t_1 = 2000, t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000\) and \(t_4 \ge 6000\).
However, the objective function is different - let \(M\) be the maximum amount of oil bought in a single month, and we can minimize this value. We can define \(M\) to be the maximum amount of oil bought in a single month by adding the constraints \(b_1 \le M, b_2 \le M, b_3 \le M, b_4 \le M\).
These encode all of the problem constraints, so we have "minimize \(M\) subject to \(0 \le t_1 \le 4000, 0 \le t_2 \le 4000, 0 \le t_3 \le 4000, 0 \le t_4 \le 4000\) and \(t_1 = 2000, t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000\) and \(t_4 \ge 6000\) and \(b_1 \le M, b_2 \le M, b_3 \le M, b_4 \le M\)".
Now to show correctness. Suppose that \(M\) is an optimal solution that has values \(b_1, b_2, b_3, b_4\). Since \(M\) is a feasible solution, \(M \ge \max\set{b_1, b_2, b_3, b_4}\), and since \(M\) is an optimal solution, it is the lowest such value that satisfies that, so \(M = \max\set{b_1, b_2, b_3, b_4}\). Therefore, \(M\) is also the minimum maximum amount of oil purchased in any single month.


Integer programs are just linear programs plus a integrality constraint - a constraint that all of the variables must be integers. When we formulate integer programs, we just add the constraint \(x_1, \ldots, x_n \text{ are integral}\).

An integer program is mixed if there are still non-integral variables, and pure if it only has integral variables. Integer programs are harder to solve than LPs.

The size of a linear/integer program generally refers to either the number of variables, of the number of constraints. The running time of an IP/LP solver algorithm is the number of steps it takes, as a function of the size of the IP/LP.

Efficient algorithms are polynomially. All LPs can be solved in polynomial time, though with a high exponent. Solving integer programs, however, are in NP, and so are highly unlikely to be solvable in polynomial time.

The knapsack problem is a special case of integer programs. Given a set of \(n\) items, with values \(v_1, \ldots, v_n\) and weights \(w_1, \ldots, w_n\), how do we maximize the total value of the items we can carry in a knapsack given that the total weight cannot exceed \(m\)? From CS341, we know that if the number of items can be fractional, we can just use a greedy algorithm.

As an integer program, we have: maximize \(v_1 x_1 + \ldots = v_n x_n\) subject to \(w_1 x_1 + \ldots + w_n x_n \le m\), \(x_1, \ldots, x_n \text{ are integral}\).

Write the IP for a knapsack problem where we have \(n = 4\) and either \(x_1 + x_2\) is at least 4, or \(x_3 + x_4\) is at least 4, and we can only send \(x_3\) if we send at least one \(x_4\):

We're trying to maximize \(v_1 x_1 + v_2 x_2 + v_3 x_3 + v_4 x_4\).
Clearly, we can express \(x_1 + x_2 \ge 4\) AND \(x_3 = x_4 \ge 4\), but how do we express \(x_1 + x_2 \ge 4\) OR \(x_3 + x_4 \ge 4\)?
One way to do this is by adding a binary variable \(y\) to our IP, constrained such that \(0 \le y \le 1\) and \(y \text{ is integral}\).
We can now express \(x_1 + x_2 \ge 4 \lor x_3 + x_4 \ge 4\) as \(x_1 + x_2 \ge 4y, x_3 + x_4 \ge 4(1 - y)\). Whenever \(y\) is 0, \(4y\) is 0, so \(x_1 + x_2\) will always be true (since \(x_1\) and \(x_2\) are non-negative), while \(4(1 - y)\) is 4, so the second constraint becomes \(x_3 + x_4 \ge 4\). Whenever \(y\) is 1, \(4y\) is 4, so the first constraint becomes \(x_1 + x_2 \ge 4\), while the second constraint is always true (since \(x_3\) and \(x_4\) are non-negative).
Now, how do we express sending \(x_3\) only if we send at least one \(x_4\)? We want to express that \(x_4 = 0 \implies x_3 = 0\).
One way to do this is to multiply \(x_4\) by a large enough value so that if it's 1 or more, the resulting value will definitely be greater than \(x_3\). For example, \(x_3\) must be no greater than \(\floor{\frac m {w_3}}\), so we can add the constraint \(x_3 \le \floor{\frac m {w_3}} x_4\) to do this. When \(x_4 = 0\), \(\floor{\frac m {w_3}} x_4\) is also 0, so \(x_3\) must be 0 as well. When \(x_4 \ge 1\), \(\floor{\frac m {w_3}} x_4 \ge \floor{\frac m {w_3}}\), so \(x_3 \le \floor{\frac m {w_3}} x_4\) is always true.
Therefore, we can write this as "maximize \(v_1 x_1 + v_2 x_2 + v_3 x_3 + v_4 x_4\) subject to \(w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 \le m\), \(x_1 + x_2 \ge 4y\), \(x_3 + x_4 \ge 4(1 - y)\), \(0 \le y \le 1\), \(y \text{ is integral}\), \(x_3 \le \floor{\frac m {w_3}} x_4\), \(x_1, x_2, x_3, x_4 \ge 0\), \(x_1, x_2, x_3, x_4 \text{ is integral}\)".

The above example demonstrates a common trick of using a binary variable to express logical constraints involving disjunction. The binary variable lets us make constraints that apply when it has one value, and always be true for other values.

Scheduling is one of the most common optimization problems in industry. Suppose we have the coffee shop that is open on weekdays, hires workers that work for 4 consecutive weekdays per week (can wrap around weeks, and ignoring weekends), and requires 3, 5, 9, 2, and 7 workers to meet demand on Monday through Friday, respectively. How do we hire the smallest number of workers that meets this demand, and how do we schedule them to meet this demand?

First, we could define 5 variables, \(x_1, \ldots, x_5\), for the number of workers who should start working on each weekday. Then, the objective becomes to minimize \(x_1 + \ldots + x_5\).

What are the feasible solutions? Well, the constraints are that the number of workers starting on each day must be non-negative, and that the demand is met each day. To ensure we have enough demand for each day, we can define five constraints for each weekday. For example, for Monday demand to be met, we need the number of workers who start on Monday, Friday, Thursday, or Wednesday to be at least 3, since Monday demand is 3 and on Monday, we have workers from up to 4 days ago: \(x_1 + x_5 + x_4 + x_3 \ge 3\).

So, the problem becomes: "minimize \(x_1 + \ldots + x_5\) subject to \(x_1 + x_5 + x_4 + x_3 \ge 3\), \(x_2 + x_1 + x_5 + x_4 \ge 5\), \(x_3 + x_2 + x_1 + x_5 \ge 9\), \(x_4 + x_3 + x_2 + x_1 \ge 2\), \(x_5 + x_4 + x_3 + x_2 \ge 7\), and \(x_1, \ldots, x_5 \ge 0\)".

Suppose we wanted to constrain a variable \(x\) to be one of \(k_1, \ldots, k_n\). One way we could do this is to have binary variables \(y_1, \ldots, y_n\), where \(y_i\) represents \(x = k_i\) - adding a constraint of the form \(x = k_1 y_1 + \ldots + k_n y_n\). Then, we can ensure that exactly one of them is true by adding the constraint \(y_1 + \ldots y_n = 1\).


Another common type of linear programming problem in the real world is graph optimization. For example, how do we find the shortest path on a map? How do we optimize a microchip layout?

For the shortest path on a map example, we can associate street intersections with vertices, and streets with edges, weighted by the distance (a non-negative real number) along its length between intersections.

An \(s, t\)-walk in a graph is a sequence of edges \(s v_1, v_1 v_2, \ldots, v_k t\). An \(s, t\)-path is an \(s, t\)-walk such that non-consecutive edges in the sequence don't share any vertices (we never revisit the same vertex twice).

The length of an \(s, t\)-path is the sum of the weights of the edges in the path. So, given a graph \(G = \tup{V, E}\), a weight function \(w: E \to \mb{R}\), and two vertices \(s, t\), we want to find the minimum length \(s, t\)-path. How do we write this as an IP?

Suppose we have tasks \(t = t_1, \ldots, t_n\) and machines \(m = m_1, \ldots, m_n\), and \(c_{m_i, t_j}\) represents the cost of having machine \(m_i\) perform task \(t_j\). How do we match machines to tasks to minimize total cost?

Create a graph with one vertex for each job and edges between them weighted by the corresponding cost.
We want to find the perfect (every vertex in the graph is incident to an edge in the matching) matching (subset of edges such that none share vertices) in the bipartite graph with the minimum total weight.
Let \(\delta(v)\) be a function that returns the set of all edges incident to a given vertex. Then a set of edges \(M\) is a perfect matching if and only if \(\forall v \in V, \magn{M \cap \delta(v)} = 1\).
For our IP, we can have a binary variable \(x_{\text{machine}, \text{task}}\) for every edge \(\tup{\text{machine}, \text{task}}\). Now we can simply minimize \(\sum_{m_i \in m, t_j \in t} c_{m_i, t_j} x_{m_i, t_j}\).
To constrain solutions to perfect matchings only, we can add the constraints \(\sum_{m_i \in m} \sum_{t_j \in \delta(m_i)} x_{m_i, t_j} = 1\) and \(\sum_{t_j \in t} \sum_{m_i \in \delta(t_j)} x_{m_i, t_j} = 1\) (for each vertex, choose exactly one incident edge).


The shortest path problem: given a graph \(G = \tup{V, E}\), what is the \(s, t\)-path that has the minimum total length? We want to formulate this as an IP.

Recall from MATH239 that a cut in a graph is a partitioning of \(V\) into two disjoint subsets, often written as just one of the sets. In other words, a cut is a pair \(\tup{S, T}\) such that \(S \subseteq V, T \subseteq V, S \cap T = \emptyset, S \cup T = V\).

An \(s, t\)-cut is a set of edges rather than a pair of sets of vertices. Let \(\delta(S) = \set{\set{u, v} \in E : u \in S, v \notin S}\) for \(S \subseteq V\). Then given \(s \in S\) and \(t \notin S\), \(\delta(S)\) is an \(s, t\)-cut. To enumerate all \(s, t\)-cuts, we can simply find all subsets of \(V\) that contain \(s\) but not \(t\), then apply \(\delta\) to those sets of vertices.

Clearly, if we remove the edges contained in the \(s, t\)-cut from the graph, \(s\) and \(t\) would no longer be connected. Therefore, every \(s, t\)-path must contain at least one edge from every \(s, t\)-cut.

Also, if \(S \subseteq E\) contains at least one edge from every \(s, t\)-cut, then \(S\) also is a superset of an \(s, t\)-path. This is because if \(S\) had an edge from every \(s, t\)-cut and didn't have an \(s, t\)-path, the set of all vertices \(R\) that can be reached from \(s\) is an \(s, t\)-cut (since it includes \(s\) but not \(t\), as no \(s, t\)-path exists by assumption), so \(S\) must include an edge from \(\delta(R)\). However, \(S\) can't contain any edges from \(\delta(R)\), because if it did contain an edge \(\set{u, v}\) then both \(u\) and \(v\) should have been in \(R\) (since they're reachable from \(s\)). Therefore, \(S\) must also contain an \(s, t\)-path.

Define binary variables \(x_1, \ldots, x_m\) for each edge \(e_1, \ldots, e_m\) being or not being in the shortest path, respectively. Let \(w_1, \ldots, w_n\) be the weights of each edge \(e_1, \ldots, e_m\), respectively.

Clearly, we want to minimize \(x_1 w_1 + \ldots + x_m w_m\) (i.e., the length of the path). What are the feasible solutions (i.e., supersets of \(s, t\)-paths)?

To be an \(s, t\)-path, the edges of the path must include at least one edge from every \(s, t\)-cut. This is a lot easier to write as constraints: we have one constraint for each \(s, t\)-cut \(\delta(S)\), and each constraint ensures that at least one of the edge variables \(T \subseteq \set{x_1, \ldots, x_m}\) corresponding to \(\delta(S)\) is true.

For example, given the graph \(G = \tup{\set{a, b, s, t}, \set{ab, as, at, bs, bt}}\) with weights \(w_1, \ldots, w_5\), we might write the shortest path problem as "minimize \(x_1 w_1 + \ldots + x_5 w_5\) subject to \(x_2 + x_4 \ge 1\) (for the \(s, t\)-cut \(\set{sa, sb}\)), \(x_1 + x_2 + x_3 \ge 1\) (for the \(s, t\)-cut \(\set{ab, as, at}\)), \(x_1 + x_2 + x_5 \ge 1\) (for the \(s, t\)-cut \(\set{ab, as, bt}\)), \(x_3 + x_5\) (for the \(s, t\)-cut \(\set{at, bt}\)), and \(x_1, \ldots, x_m\) are non-negative integers".

Note that we don't need an upper bound on \(x_1, \ldots, x_m\). This is because if it was greater than zero, then it wouldn't be an optimal solution.

Nonlinear programs are problems of the form "minimize \(f(x)\) subject to \(g_1(x) \le 0, \ldots, g_m(x) \le 0\)" where \(f, g_1, \ldots, g_m: \mb{R}^n \to \mb{R}\). This is a superset of linear programs and integer programs. Basically, we can do arbitrary constraints, math permitting.

For example, we can minimize Euclidean distance by using an objective function \((x_1 - p_1)^2 + (x_2 - p_2)^2 + (x_3 - p_3)^2\). Another example is constraining variables to integers: simply add the constraint \(\sin(\pi x) = 0\) to ensure that \(x\) is an integer, or \(x(1 - x) = 0\) to ensure that \(x\) is either 0 or 1.

It's even possible to write Fermat's last theorem as a single NLP: "minimize \((a^n + b^n - c^n)^2\) subject to \(\sin(\pi a) = 0, \sin(\pi b) = 0, \sin(\pi c) = 0, \sin(\pi n) = 0, a, b, c \ge 1, n \ge 3\)" - the objective function solution is 0 if and only if \(a^n + b^n = c^n\), and the theorem is true if and only if the optimal value is greater than 0.


What does it mean to solve an optimization problem? What is a solution to an optimization problem?

Consider the LP "maximize \(2x_1 + 3x_2\) subject to \(x_1 + x_2 \le 1, x_1, x_2 \ge 0\)". Clearly, in this case the optimal solution is simply \(x_1 = 1, x_2 = 0\). However, in general, it is more difficult to say what exactly can be considered a solution.

A feasible solution is an assignment of values to each of the variables in the LP such that all of the constraints are satisfied - regardless of the objective function value. A feasible optimization problem is one that has at least one feasible solution, and an infeasible optimization problem is one that has none. An optimal solution is a feasible solution where the objective function is maximized (for a maximization problem), or minimized (for a minimization problem).

However, an optimization problem having a feasible solution doesn't mean that it has an optimal solution - the maximum/minimum objective function value could be unbounded, like for "maximize \(x\) subject to \(x \ge 0\)". If for any feasible solution there exists another with a higher/lower objective function value (depending on whether it's a maximization/minimization problem), the optimization problem is unbounded, and otherwise, it is bounded.

Consider the optimization problem "maximize \(x\) subject to \(x < 1\)". Clearly, this is feasible (\(x = 0\) is a solution), and bounded (\(x \le 1\) because \(x < 1\)), yet it has no optimal solution - the objective function can always get closer to 1 for any given value of \(x\)! The same occurs for the optimization problem "minimize \(\frac 1 x\) subject to \(x \ge 1\)". Note that neither of these are LPs, because one uses a strict inequality, and the other a division.

LPs are a bit better behaved than general optimization problems. According to the Fundamental Theorem of Linear Programming, an LP either has at least one optimal solution, is infeasible, or is unbounded.

An LP solver should therefore output one of the following: an optimal solution to the input LP (plus proof that it's optimal), a proof that the LP is infeasible, or a proof that the LP is unbounded.

Consider the linear system \(A\vec x = \vec b, \vec x \ge \vec 0\). Clearly, if we can find a \(\vec y\) such that \({\vec y}^T A \ge {\vec 0}^T\) and \({\vec y}^T \vec b < {\vec 0}^T\), then the system can't have any solutions - we can add/subtract/scale some of the constraints in the linear system together to make a new constraint, such that the coefficients of the constraint are all positive, yet the right hand side is negative, which isn't satisfiable.

According to Farkas' Lemma, if there's no solution, then a \(\vec y\) that satisfies those conditions must exist. Therefore, a value of \(\vec y\) such that \({\vec y}^T A \ge {\vec 0}^T\) and \({\vec y}^T \vec b < {\vec 0}^T\) is a proof of infeasibility.

For example, consider an LP with constraints \(3x_1 - 2x_2 = 6, 2x_1 - x_2 = 2\). If we choose \(y = \begin{bmatrix} -1 \\ 2 \end{bmatrix}\), then we get \({\vec y}^T A = \begin{bmatrix} 1 & 0 \end{bmatrix}\) and \({\vec y}^T \vec b = -4\). This means that the two constraints, with the first one subtracted from the second one doubled, imply the constraint \(x_1 = -2\), which is always infeasible because \(\vec x \ge \vec 0\).

To prove that an optimal solution \(\vec x\) for an LP is actually optimal, we simply need to prove that for any feasible solution \(\vec x'\), the objective function value is less or equal to the objective function value for \(\vec x\) - basically, proving that the upper bound of the objective function occurs at \(\vec x\). This is a proof of optimality.

Consider the LP "maximize \(2x_1 + 3x_2\) subject to \(x_1 + x_2 \le 1, x_1, x_2 \ge 0\)". Clearly, any feasible solution \(x_1, x_2\) must be

To prove that an LP is unbounded, we construct a family of feasible solutions \(\vec x(t)\) for all \(t \ge 0\), then show that as \(t\) goes to infinity, so does the value of the objective function. In other words, we need to prove that for any arbitrarily high objective function value (assuming a maximization problem), we can construct a variable assignment that is both a feasible solution, and results in that objective function value.

This family of feasible solutions is always a line in the linear space. Therefore, a family of feasible solutions \(\vec x(t) = \vec x_0 + t \vec r\) such that for any \(z\), there exists a \(t\) such that \(\vec x(t) = z\) is a proof of unboundedness. In matrix form, a proof of unboundedness is simply a value of \(\vec x_0, \vec r\) such that \(\vec x_0 \ge \vec 0, \vec r \ge \vec 0, A \vec x_0 = \vec b, A \vec r = \vec 0, {\vec c}^T \vec r > 0\). Also, if the LP is unbounded, those values of \(\vec x_0, \vec r\) must exist.

Consider the LP "maximize \(x\) subject to \(x \ge 0\)". Clearly, for any objective function value \(z \ge 0\), we can construct an assignment \(x = z\), which must be a feasible solution since \(z \ge 0\). Additionally, the objective funciton value for the feasible solution \(x\) must be \(z\). Therefore, the LP is unbounded, and a proof for that would be \(\begin{bmatrix} 0 \end{bmatrix}, \begin{bmatrix} 1 \end{bmatrix}\).

Basically, whichever outcome an LP has, there must exist a proof that that outcome is the case.

A free variable is a variable that is not constrained.

An LP is in standard equality form (SEF) if and only if all of the following are true:

In other words, an LP in SEF is a problem of the form "maximize \(\vec c \cdot \vec x\) subject to \(A \vec x = \vec b, \vec x \ge \vec 0\)", where \(A\) is an \(n\) by \(m\) coefficients matrix and \(\vec x, \vec b, \vec c \in \mb{R}^m\).

Every LP has an "equivalent" LP that is in SEF. By equivalent, we mean the original LP is unbounded if and only if the equivalent LP is unbounded, the original LP is infeasible if and only if the equivalent LP is infeasible, and an optimal solution of one can easily be used to compute an optimal solution of another.

To convert an LP into its equivalent in SEF:

Once this is done, we have an equivalent LP in SEF. Also, we can easily convert optimal solutions for the LP in SEF into optimal solutions for the original LP, simply by taking the values of the corresponding variables and ignoring the newly introduced variables.


The basic idea behind the simplex algorithm is for us to find a feasible solution \(x\), then repeatedly find a feasible solution better than \(x\) until we find one that is either optimal or realize that the LP is unbounded.

We can actually solve LPs by hand in this way, by building an intiution for what would maximise the objective function, and trying to satisfy the constraints with that in mind.

For example, suppose we have the LP "maximise \(4x_1 + 3x_2 + 7\) subject to \(3x_1 + 2x_2 + x_3 = 2\) and \(x_1 + x_2 + x_4 = 1\) and \(x_1, x_2, x_3, x_4 \ge 0\)". Clearly, we want to maximise \(x_1\) and \(x_2\) while still satisfying the constraints, and feasible solutions that

Creative Commons License This work by Anthony Zhang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Copyright 2013-2017 Anthony Zhang.