# An Online Algorithm to Check for Bipartite Graphs

### Bipartite Graphs

A graph $$G(V, E)$$ is called bipartite if its vertices can be divided into two groups $$X$$ and $$Y$$ such that every edge connects one vertex from $$X$$ and one vertex from $$Y$$. The graph drawn below is bipartite.

Given a graph, can you determine if it is bipartite?

This problem is pretty straightforward because the vertices of a bipartite graph can be colored with two colors such that every edge connects two vertices of different colors (see above). This property is called 2-colorability. The converse is also true, that is if a graph is 2-colorable, it is also bipartite. Hence, it order to check for bipartiteness, we can run a Breadth First Search on the graph and color the vertices as we discover them. It would take $$O(V+E)$$ time and storage.

### An Online Variant

First of all, what is an online algorithm? An online algorithm is an algorithm that processes the input in a piece by piece in a sequential manner and have access to limited storage. These algorithms are useful in many realistic scenarios.

Now consider an online variant of the previous problem: suppose your algorithm gets a single pass over the edges of a graph and has access to only $$O(V)$$ storage. In other words, the edges are fed into the algorithm one at a time, but the algorithm cannot store all of them. Can you determine if this graph is bipartite?

This is a practical problem because a graph with $$|V|$$ vertices can have as many as $$|V|^2$$ edges. For instance, if a graph has $$10^{10}$$ vertices, there can be $$O(10^{20})$$ edges. Storing them in the memory might be impossible. The problem above attempts to bypass this issue by getting the edges sequentially, and never storing all of them. I’ll show that it is indeed possible to design such an algorithm with only $$O(E\log V)$$ runtime.

We’ll use the 2-colorability property once more. In the beginning, the graph is 2-colorable since all the vertices are disjoint. Let’s color all the vertices to be red. Every time a new edge is added to the graph, it connects two vertices of either different or same colors. In the former case, the graph remains bipartite. In the later case, one of the following two scenarios may occur:

1. The edge connects two vertices from the same connecting component. Now, if the edge connects two vertices of same color, the graph is no longer 2-colorable. We can output FAIL and stop.
2. The edge connects two vertices from different connected components. In this case, if the edge connects two edges of the same color, the graph is still 2-colorable. However, we need to flip colors of all the vertices in one of the connected components (see below).

A naive way to do #2 would be to flip color of every vertex in one of the connected components. This may take $$O(V)$$ time for each edge, the total runtime being $$O(EV)$$. We need a smarter way.

One potential approach is to do the flipping in a lazy manner: if #2 happens, only remember that colors in a connected components need to be flipped, instead of actually flipping them. When the algorithm checks the color of a vertex,

• go through your records to find the connected components the vertex was previously in.
• Observe how many of those connected components had their colors flipped.
• Since all vertices started as red, the number of flips determine current color of that vertex.

This reminds us of a data structure called Union-Find.

### A Quick Intro to Union-Find

This data structure is is essentially a list of mutually disjoint sets. It has three methods:

• make-set: makes a set with a unique id
• find(x): given element $$x$$, finds the id of the set $$x$$ belongs to.
• union(x, y): performs the set-union on the sets $$x$$ and $$y$$ belong to.

Union-Find is usually implemented as a forest of trees, with root of every tree being its id. Every node contains a pointer to its parent. Roots contain pointers to themselves. find(x) looks for the root of a tree. union(x, y) first does find(x) and find(y), then attaches the root of y as a child to the root of x.

In a naive implementation, the cost of the find and union functions are $$O(n)$$. However, those can be reduced to $$O(\log n)$$ via a technique called union by rank. In this technique, a shorter tree is always attached to a taller tree. You can easily show (by induction) that such a tree with height $$n$$ must contain atleast $$2^n$$ nodes. In other words, the time complexity for this data structure is $$O(\log n)$$.

### Faster Algorithm Using Union Find

In our problem, every connected component can be thought of as a disjoint set. Every node starts in its own set. When two connected components are merged, we perform set-union over their corresponding sets. Now, notice that, the root of every tree can remember if the connected component needs to flip color during the union. When we call find on an element, we’ll pass through all its ancestors, which were roots in some previous connected components. We can check each of them to see if those components had their colors flipped. Since each find costs $$O(\log V)$$ time, computing the color of each node requires $$O(\log V)$$ time, too. Now, each of the steps #1 and #2 in the previous algorithm takes $$O(\log V)$$ time. Hence, the total runtime for our modified algorithm is $$O(E\log V)$$.

### Appendix

In Union Find, there is another technique, called Path Compression, which (along with Union by Rank) reduces the data structure’s  amortized complexity to $$O(\alpha(n))$$. The previous algorithm works with path compression, but needs to be modified. This is left as an exercise to the reader.

# A Blind Robot Beside an Infinite Wall

Let’s think about the following problem:

Consider a wall that stretches to infinitely in both directions. There is a robot at position $$0$$ and a door at position $$p\in\mathbb Z$$ along the wall $$(p\neq 0)$$. The robot would like to get to the door, but it knows neither $$p$$, nor the direction to the door. Furthermore, the robot cannot sense or see the door unless it stands right next to it. Give a deterministic algorithm that minimizes the number of steps the robot needs to take to get to the door.

This problem quite famous in Bangladesh because of Dr. Mohammad Kaykobad, a renowned Bangladeshi computer scientist. He maintains a list of interesting problems like this one to attract high school kids into problem solving.

Dr. Kaykobad’s formulation of this problem is quite ambiguous. He simply asks how the robot should get to the door, without specifying whether the algorithm should be deterministic. It matters as we shall see.

In this article, I shall provide a deterministic algorithm that takes at most $$9|p|$$ steps for any $$p$$. I shall also prove that any deterministic algorithm will take at least $$8|p|$$ steps. I suspect this bound can be improved up to $$9|p|$$, but I didn’t have time to think much.

### My $$\leq 9|p|$$ Deterministic Algorithm

for i = 0, 1, 2, ...:
walk up to (-2)^i, but don't walk past p
if you find the door: break loop
else: get back to 0


Suppose $$p$$ is found on $$i$$-th round. Then, $$|p| > 2^{i-2}$$, otherwise $$p$$ would have been found on $$(i-2)$$-th round. Now, it is easy to see the robot takes $$2^1+\dots+2^{i}<2^{i+1}$$ steps prior to $$i$$-th round. It takes $$|p|$$ more steps to get to $$p$$. Therefore, the total number of steps is at most $|p|+2^{i+1}=|p|+8\cdot2^{i-2}<9|p|$ As $$|p|\to\infty$$, total number of steps converges to $$9|p|$$.

(Disclaimer: Finding a $$9|p|$$ algorithm was a homework problem in my class 6.046. The algorithm and the runtime analysis shown above was part of the official solution. My algorithm was the same, but my runtime analysis was much uglier.)

### Other Deterministic Algorithms

In this section, we shall prove that any deterministic algorithm for this problem will take at least $$8|p|$$ steps.

In any algorithm, the robot has to walk some steps to the left/right, then turn back and walk some more steps, then turn again, and so on. Without loss of generality, we may assume the optimum algorithm takes first step to the right.

Consider the the optimum deterministic algorithm $$OPT$$ with minimum number of turns $$n$$. Assume that $$\{a_i\}_{i=1}^n$$ denotes the number of steps $$OPT$$ takes during the $$i$$-th turn. Define $$A_i=\sum_{j=1}^i(-1)^{j+1}a_j$$ to be the position of the robot after $$i$$ steps. Also define $$A_0=0$$. Suppose $$OPT$$ takes fewer than $$8|p|$$ steps for all $$p$$.

Claim 1: $$a_n>\cdots a_i>\cdots a_1 > 0$$ for $$n> i > 1$$.

Proof: If the robot takes $$0$$ step at any turn, we can simply ignore that turn and merge the steps of the next turn with the steps of the previous turn. Hence, $$a_1,a_i,a_n > 0$$.

Suppose $$a_{i-1}-a_{i} \ge 0$$ for some $$i$$. This means at $$i$$-th step, the robot backtracked to somewhere between $$A_{i-1}$$ and $$A_{i-2}$$. Then, at the next step, it must go past $$A_{i-1}$$, otherwise, $$(i+1)$$-th step will revisit some previously visited vertices and it can be removed by walking to $$(a_{i-1}-a_i+a_{i+1})$$ at $$i$$-th step. This is a contradiction to the minimality of $$OPT$$. Hence, we must have $$(a_{i-1}-a_i)+a_{i+1} > a_{i-1}$$. However, in this case, we can get a better algorithm by taking $$a_{i-1}-a_i+a_{i+1}$$ steps at the $$(i-1)$$-th turn, eliminating both $$a_i$$ and $$a_{i+1}$$. This is another contradiction. Hence, $$a_{i-1}-a_{i}< 0$$.

$$a_{n}> a_{n-1}$$ must be true, otherwise the $$n$$-th step is redundant.

Claim 2: $$a_i=|A_i|+|A_{i-1}|$$ for all $$i\ge 1$$. Furthermore, $$|A_{i+2}|-|A_{i}|=a_{i+2}-a_{i+1}>0$$.

Proof: Trivial to check for odd and even $$i$$. This claim also implies $$|A_{i+2}|>0$$.

Claim 3: For $$i\ge 2$$,
$\sum_{j=1}^{i}a_j=|A_{i}|+2\sum_{j=1}^{i-1}|A_j|$

Proof: Apply claim 3 for every $$a_j$$.

Claim 4: $$|A_i| \leq 2.5|A_{i-1}|$$ for $$i\ge 6$$.

Proof: Suppose $$i$$ is odd since the other case is analogous. Consider $$p= -|A_{i-1}|- 1 < A_{i-1}$$ . The robot does not find $$p$$ until at least $$(i+1)$$-th turn, since $$|A_{i-1}|>|A_{i-3}|>\cdots>|A_2|$$ (claim 2). Thus,
$\left(\sum_{j=1}^{i}a_j\right)+a_{i}+1 < 8(|A_{i-1}|+1)$ The left hand side can be thought as follows. First, the robot walks to $$A_{i}$$. After $$a_{i}$$ more steps in the opposite direction, it reaches $$A_{i-1}$$ again. Then it takes 1 more step to reach $$p$$. Rewrite the left hand side using claim 2 and 3 to get $1+|A_{i-1}|+2\sum_{j=1}^{i}|A_j|< 8|A_{i-1}|+8$ It simplifies to $1+2|A_{i}|+2\sum_{j=1}^{i-2}|A_j|< 5|A_{i-1}|+8$ $\implies |A_{i}|< 2.5|A_{i-1}|+3.5-\sum_{j=1}^{i-2}|A_j|<2.5|A_{i-1}|$ The last inequality follows from the fact that $$i\ge 6$$ and $$|A_j| > 0$$. (Q.E.D)

Notice that claim 4 can be strengthened iteratively.

Claim 5: Suppose we have proven $$|A_i|\leq b|A_{i-1}|$$ for some $$b$$ and for all sufficiently large $$i$$. Let $$b’=(2.5-1/b-1/b^2)$$. We claim that $$|A_k|\le b’|A_{k-1}|$$ for all sufficiently large $$k$$.

Proof: Pick a sufficiently large $$i$$ such that $$|A_{i-2}|\leq b|A_{i-3}|$$ We proceed like in the previous proof to get
$|A_{i}|< 2.5|A_{i-1}|+3.5-\sum_{j=1}^{i-2}|A_j|\qquad(1)$ Notice that $|A_{i-2}|+|A_{i-3}|\ge \left( \frac{1}{b}+\frac{1}{b^2} \right)|A_{i-1}|$ Since $$i$$ is sufficiently large, the right hand side of (1) is at most $(2.5-1/b-1/b^2)|A_{i-1}|=b'|A_{i-1}|$ Hence, $$|A_{i}| < b' |A_{i-1}|$$ . (Q.E.D.)

We start with $$b=2.5$$ and repeatedly apply claim 5. We find the following values for $$b$$:

$\begin{array}{cl} \text{iter.} & b\\ 1 & 1.94\\ 2 & 1.71883\\ 3 & 1.57973\\ 4 & 1.46627\\ 5 & 1.35287\\ 6 & 1.21445\\ 7 & 0.99857 \end{array}$

We observe that $$|A_{k}|\leq .999 |A_{k-1}|$$ for all sufficiently large $$k$$. However, that’s a contradiction, since we know $$|A_{k}| > |A_{k-2}|$$. Hence, $$OPT$$ has to take at least $$8|p|$$ steps in the worst case.

### Probabilistic Algorithms

The $$8|p|$$ bound does not hold for probabilistic algorithms. In fact, it can be shown that my first algorithm takes $$7|p|$$ steps in expectation when the first step is taken in a random direction.