# Hi, I'm Anthony.

## Owing Graph Minimization

July 11, 2015

Do you have a whiteboard? Do you share it with a group of people? It probably has some variation of this:

Each line of the form $X \to Y: a$ represents a transaction - the fact that $X$ owes $Y$ exactly $a$ dollars, where $X$ and $Y$ are people and $a$ is a dollar amount.

Every time a new transaction is created, we can just add a new entry to the list. However, this causes the list to slowly grow out of control:

Our goal is to make this list smaller. Let’s kick this off by representing the above example in code. In this post I will be using Python 3:

This list of transactions is a graph, where people are the vertices, transactions are edges, and dollar values are the edge labels. Edges in this graph are directed since $X$ owing $Y$ is not the same as $Y$ owing $X$. I’m going to call this structure an owing graph.

The above, then, translates to the following:

Two owing graphs are equivalent if when everyone in either graph pays the amounts they owe everyone else, they end up with the same amounts in the end. Our goal is to find the smallest possible graph (a graph with the fewest possible edges) that is equivalent to a given owing graph.

Solving this problem optimally is NP-complete, since it requires solving the subset-sum problem repeatedly. However, we don’t care and just want to generate some minimal transactions, even if it takes exponential time to run. Afterwards, we will look at an approximate, much more efficient solution that works just as well for most inputs.

First, we will calculate the overall amount each person owes - the net balances:

This algorithm will create a graph using the star topology (all vertices have 1 transaction with a designated hub vertex), which, in a set of $n$ people, results in exactly $n - 1$ transactions taking place - the hub has one transaction with each other person.

Suppose we partition the set of people into $k$ disjoint sets of people such that no set owes any other (there are no transactions between any two sets). Then, for each set, we construct a graph with a star topology. The graph for a set of size $m$ has $m - 1$ transactions, and so $n - k$ transactions take place in total (the sum of the sizes of all the subsets in the partition is always the size of the original set). If we maximize $k$, then we minimize $n - k$, the number of transactions.

Clearly, no two sets will owe any other if and only if the sum of all the net balances in the set is 0. This is because if the sum of the net balances of a set was non-zero, money must be flowing into or out of that set of people.

Therefore, we want to partition the set of people into the largest possible number of disjoint subsets, such that the sum of each subset is 0. Afterwards, we simply construct graphs out of each set in the resulting partition, and we will have the smallest possible owing graph.

The partitioning part is actually what makes the problem NP-complete - if we can solve this problem in polynomial time, we can solve the subset-sum problem in polynomial time, and prove that $P = NP$. The number of partitions of a set of size $n$ is the $n$th Bell numbers, and according to that page, there are $O\left(\left(\frac{0.792n}{\ln(n + 1)}\right)^n\right)$ (or more loosely, $O(n^n)$) possible partitions. We will use a naive approach and simply check if all the subsets of a partition add up to 0.

One way to enumerate all partitions of a set of size $n$ is to, for each partition, allocate $n$ subsets, go through each element, and assign them to one of those subsets (afterward, we filter out empty subsets from the partition) - there is one unique element assignment per unique partition:

This is very simple and works well, but we don’t get the partitions in a nice order - ideally, we want to get them in decreasing order by number of subsets, because then we can stop looking through partitions as soon as we find one where all subsets sum to 0.

There is another way to do that, which is almost as simple. We first take one vertex $v$ from the set of vertices $V$, and compute all the partitions of $V - v$. For each partition $p$ of $V - v$, we can compute one partition of $V$ by adding $v$ as a subset, and more partitions by adding $v$ to each subset of the partition in turn:

Here, we store the set of vertices as a list in order to use indexing. The only unusual thing to note is that we use max_index to mark the end of elements rather than making copies of elements, for efficiency reasons. This generator function yields partitions in order from most subsets to least subsets.

We can now calculate the partition with the most subsets that sums to 0 pretty easily:

This is pretty self-explanatory - we go through each partition, from most subsets to least subsets, and if any partition contains only subsets where the elements sum up to 0, we’ve found the result.

To generate the minimal graphs now, we go through the subsets in our partition - the sets of vertices whose net balances add up to 0. For each set of vertices, we output edges to make a graph with a star topology:

Here, we pick a hub vertex, and for each of the remaining vertices in the subset, we create a transaction between the hub and that vertex. The resulting edge list represents a minimal owing graph!

Let’s test it out with our first example:

The above outputs:

Clearly, this is optimal - it is impossible to resolve these debts using 4 or fewer transactions.

### Some say it’s still running to this day

What if we have a larger network, say, 100 people all having various transactions with each other? Aside from needing to buy a bigger whiteboard, an $O(n^n)$ algorithm isn’t really practical when we have more than a dozen or so people - each additional person massively increases the time needed to run the algorithm.

Here, we need to use approximations. Our goal is now to create an owing graph that is equivalent to the original owing graph, with a preference for smaller graphs - we no longer require that the results be optimal, only reasonably close to optimal. In the real world, reasonable people don’t borrow tiny amounts of money from every single person they know for every purchase, or specifically try to borrow from people who indirectly owe them.

Here’s an approximation that works pretty well on that sort of owing graph: instead of finding the partition with the largest number of subsets, we simply remove vertices that have a 0 net balance, and use the components of the resulting graph as the partition.

First, we compute the net balances for each person, and remove any people who have a net balance of 0 (as well as transactions that involve these people). People with a net balance of 0 neither owe or are owed any money, in the end:

We then identify the components of the graph, using repeated breadth-first traversal:

Now we treat our components list as a partition, and generate the star-topology graphs like we did earlier:

Let’s try all this out with the same example as before:

The above outputs:

In this case, the result also happens to be optimal. This is also the case for most real-world inputs!

However, the result will not be optimal whenever the largest partition with subsets that all sum to 0 is larger than the number of components in the graph where all vertices with a 0 net balance are removed. Let’s try to construct one of these graphs:

On the top left, we have a graph with a single component, where none of the vertices have a net balance of 0 (each vertex is labelled with the amount they are owed). Running the approximation on this graph results in 3 transactions, when the optimal result, shown on the bottom right, has 2.

There are many ways to improve our approximation: for example, we could detect and break cycles in the graph. However, since the problem is NP-complete, our approximations will never be perfect in all cases without also taking just as much time.