### Masks

*By Shel Silverstein*

She had blue skin,

And so did he.

He kept it hid

And so did she.

They searched for blue

Their whole life through,

Then passed right by-

And never knew.

*By Shel Silverstein*

She had blue skin,

And so did he.

He kept it hid

And so did she.

They searched for blue

Their whole life through,

Then passed right by-

And never knew.

*By William Shakespeare*

My mistress’ eyes are nothing like the sun;

Coral is far more red than her lips’ red;

If snow be white, why then her breasts are dun;

If hairs be wires, black wires grow on her head.

I have seen roses damask’d, red and white,

But no such roses see I in her cheeks;

And in some perfumes is there more delight

Than in the breath that from my mistress reeks.

I love to hear her speak, yet well I know

That music hath a far more pleasing sound;

I grant I never saw a goddess go;

My mistress, when she walks, treads on the ground:

And yet, by heaven, I think my love as rare

As any she belied with false compare.

*By Robert Frost*

Two roads diverged in a yellow wood,

And sorry I could not travel both

And be one traveler, long I stood

And looked down one as far as I could

To where it bent in the undergrowth;

Then took the other, as just as fair,

And having perhaps the better claim,

Because it was grassy and wanted wear;

Though as for that the passing there

Had worn them really about the same,

And both that morning equally lay

In leaves no step had trodden black.

Oh, I kept the first for another day!

Yet knowing how way leads on to way,

I doubted if I should ever come back.

I shall be telling this with a sigh

Somewhere ages and ages hence:

Two roads diverged in a wood, and I

I took the one less traveled by,

And that has made all the difference.

I love poetry. Did I ever tell you that? Not really.

I also enjoy world-building, philosophy, travelling to new places, but I never write about any of my experiences. I don’t write about myself much.

Twenty years later, when I’ll ask myself if I lived the life I dreamed for, I will need a definitive answer. The best way to do so would be to document all the interested things happen to me.

I’ll start with poetry. This week I’ll post some of my favorite poems. The first one would be *Song of the Indian Maid* by John Keats.

It’s a long poem like many of other Keats’s poems. The lines I like the most are the following:

To Sorrow

I bade good morrow,

And thought to leave her far away behind;

But cheerly, cheerly,

She loves me dearly;

She is so constant to me, and so kind:

I would deceive her

And so leave her,

But ah! she is so constant and so kind.

And also the following:

Young Stranger!

I’ve been a ranger

In search of pleasure throughout every clime;

Alas! ’tis not for me!

Bewitch’d I sure must be,

To lose in grieving all my maiden prime.Come then, Sorrow,

Sweetest Sorrow!

Like an own babe I nurse thee on my breast:

I thought to leave thee,

And deceive thee,

But now of all the world I love thee best.

A graph \(G(V, E)\) is called *bipartite* if its vertices can be divided into two groups \(X\) and \(Y\) such that every edge connects one vertex from \(X\) and one vertex from \(Y\). The graph drawn below is bipartite.

Given a graph, can you determine if it is bipartite?

This problem is pretty straightforward because the vertices of a bipartite graph can be colored with two colors such that every edge connects two vertices of different colors (see above). This property is called 2-colorability. The converse is also true, that is if a graph is 2-colorable, it is also bipartite. Hence, it order to check for bipartiteness, we can run a Breadth First Search on the graph and color the vertices as we discover them. It would take \(O(V+E)\) time and storage.

First of all, what is an online algorithm? An online algorithm is an algorithm that processes the input in a piece by piece in a sequential manner and have access to limited storage. These algorithms are useful in many realistic scenarios.

Now consider an online variant of the previous problem: suppose your algorithm gets a single pass over the edges of a graph and has access to only \(O(V)\) storage. In other words, the edges are fed into the algorithm one at a time, but the algorithm cannot store all of them. Can you determine if this graph is bipartite?

This is a practical problem because a graph with \(|V|\) vertices can have as many as \(|V|^2\) edges. For instance, if a graph has \(10^{10}\) vertices, there can be \(O(10^{20})\) edges. Storing them in the memory might be impossible. The problem above attempts to bypass this issue by getting the edges sequentially, and never storing all of them. I’ll show that it is indeed possible to design such an algorithm with only \(O(E\log V)\) runtime.

We’ll use the 2-colorability property once more. In the beginning, the graph is 2-colorable since all the vertices are disjoint. Let’s color all the vertices to be red. Every time a new edge is added to the graph, it connects two vertices of either different or same colors. In the former case, the graph remains bipartite. In the later case, one of the following two scenarios may occur:

- The edge connects two vertices from the same connecting component. Now, if the edge connects two vertices of same color, the graph is no longer 2-colorable. We can output FAIL and stop.
- The edge connects two vertices from different connected components. In this case, if the edge connects two edges of the same color, the graph is still 2-colorable. However, we need to flip colors of all the vertices in one of the connected components (see below).

A naive way to do #2 would be to flip color of every vertex in one of the connected components. This may take \(O(V)\) time for each edge, the total runtime being \(O(EV)\). We need a smarter way.

One potential approach is to do the flipping in a lazy manner: if #2 happens, only *remember* that colors in a connected components need to be flipped, instead of actually flipping them. When the algorithm checks the color of a vertex,

- go through your records to find the connected components the vertex was previously in.
- Observe how many of those connected components had their colors flipped.
- Since all vertices started as red, the number of flips determine current color of that vertex.

This reminds us of a data structure called Union-Find.

This data structure is is essentially a list of mutually disjoint sets. It has three methods:

**make-set:**makes a set with a unique id**find(x):**given element \(x\), finds the id of the set \(x\) belongs to.**union(x, y):**performs the set-union on the sets \(x\) and \(y\) belong to.

Union-Find is usually implemented as a forest of trees, with root of every tree being its id. Every node contains a pointer to its parent. Roots contain pointers to themselves. find(x) looks for the root of a tree. union(x, y) first does find(x) and find(y), then attaches the root of y as a child to the root of x.

In a naive implementation, the cost of the find and union functions are \(O(n)\). However, those can be reduced to \(O(\log n)\) via a technique called union by rank. In this technique, a shorter tree is always attached to a taller tree. You can easily show (by induction) that such a tree with height \(n\) must contain atleast \(2^n\) nodes. In other words, the time complexity for this data structure is \(O(\log n)\).

In our problem, every connected component can be thought of as a disjoint set. Every node starts in its own set. When two connected components are merged, we perform set-union over their corresponding sets. Now, notice that, the root of every tree can remember if the connected component needs to flip color during the union. When we call find on an element, we’ll pass through all its ancestors, which were roots in some previous connected components. We can check each of them to see if those components had their colors flipped. Since each find costs \(O(\log V)\) time, computing the color of each node requires \(O(\log V)\) time, too. Now, each of the steps #1 and #2 in the previous algorithm takes \(O(\log V)\) time. Hence, the total runtime for our modified algorithm is \(O(E\log V)\).

In Union Find, there is another technique, called *Path Compression*, which (along with Union by Rank) reduces the data structure’s amortized complexity to \(O(\alpha(n))\). The previous algorithm works with path compression, but needs to be modified. This is left as an exercise to the reader.

This is a very unusual post for this blog. I hope my regular readers will bear with me.

I love Game of Thrones. I have read the books and the companion novels. I have been following the series for years. I am familiar with most of the fan theories too. Tonight the series finale aired. I didn’t like how the show ended. It felt rushed and very baffling.

I understand that I don’t get to question the showrunners’ creative decisions about the show. However, as a fan I can disagree with them. This post is about how I feel the show should have ended. It was written in haste, so please forgive the typos and the repeating sentences.

Before I begin, please note that this article contains major spoilers. Read at your own risk.

In my honest opinion, the army of the living should have lost this fight. They should have retreated back to Iron Islands. However, this goes too far from the show’s ending, so I shall not pursue this thread. I shall try to modify the show’s version of the story so that it makes some sense.

The battle plan of the army of the living SUCKED. I won’t even try to justify it.

First of all, the cavalry charge was stupid. There are other articles on the internet suggesting better ways to use the Dothraki, so I’ll not get to it. Secondly, if the army of the living wanted to defend the castle, they should have also dug a wider, deeper trench around winterfell. The troops should be positioned *behind* the trench. In that way, the wights would not be able to cross the trench just by dropping some bodies. They would eventually fill in the trench, but it would require more time while they take damage. And when they do cross the trench, they would face fresh troops awaiting for them.

They should have manned the wall from the beginning, not when the wights are pursuing them.

In the show, the army of the living had catapults, but they got overrun only after throwing a couple of vollies. A more realistic solution would be to position the catapults inside the castle walls so that they may function for longer. Also, throwing jars full of wildfire, instead of rocks, might have been more effective since wights are susceptible to fire. (In fact, it was out of character for Tyrion not to suggest this strategy, after all, he had destroyed King Stanis’s fleet in the Blackwater Bay with wildfire.)

Nobody should have taken shelter in the Crypt. That’s where the Starks bury their dead. Someone should have spotted that!

**Brianne Should have died in Jamie’s arms. **Her character arc was complete.

Sansa should have died. She was useless throughout the entire story. Too bad, north needs a ruler.

In the show, the living were overwhelmed by Night King’s army. Night King was coming for Bran. At that point, out of desperation, Bran should have tried a new strategy. As we have seen in the Hodor story, warging in the past can affect the future. Inspired from this, he would try to go back thousands of years, when Night King was a human, and try to warg into him to stop the army of the dead at present. This would freeze the dead and white walkers for am moment, which is enough for Jon to rush and and stab the Night King. That would fulfil Jon’s role as the Azor Ahai.

Again, it should have been Jon who killed Night King. Otherwise, his resurrection has no importance, storywise.

As for Bran, altering past should have serious side-effects, as the previous Three Eyed Raven said,

The past is already written. The ink is dry.

He could get stuck in the past for good. Then, he could become Bran the Builder.

The overpowered Scorpion plot had more holes than Swiss cheese. Someone calculated that the bolts needed to have an initial velocity of 2000m/s to make it happen. Also, the dragons are practically invulnerable in the air according to the books.

The show should have given Euron a dragon horn instead. These horns were used in Old Valeriya and are presumably capable of controlling dragons. Dany should have spotted Iron fleet from far above. Unknowing that Euron has a dragon horn, she would proceed to destroy the Iron fleet. When she was close enough, Euron would blow the horn and take control of both dragons. However, Dany should regain control of Drogon soon, because of special bond with him. Rhaegal would go berserk, attacking Dany’s ships and pursuing Drogon. Dany would have no choice but to kill Rhaegal with Drogon. That would be a devastating blow for both Dany and Drogon. She would mourn for Rhaegal and become more revengeful and destroy Euron’s fleet with remaining strength.

Jaime should accompany Dany, not to save Cersei, but to kill her. Suspecting Dany’s plan to burn everyone in King’s Landing, Tyrion would ask Jaime to secretly go to the Red Keep and convince Cersei to surrender. Jaime would do so, while Dany would attack the City’s defenses.

After meeting Jaime, Cersei would pretend to surrender. She would ring the bell and open the gates. When Dany’s troops get inside the walls, they would be greeted by common folk. Then the folk would suddenly attack Dany’s soldiers. It would become apparent that Cersei has hidden her soldiers among common folk and using them as a meat shield to slaughter Dany’s troops. There should also be archers in the buildings. Common folk would throw rocks at Dany and Drogon showing that she was undesirable. This would finally provoke Dany and she would go on a rampage. She would also command Grey Worm to kill everyone who fought for Cersei.

On the other hand, seeing what Cersei had done, Jaime would finally choke Cersei, thus fulfilling Maggy the Frog’s prophecy.

After witnessing the atrocities committed by Dany, Grey Worm and their army, Arya will finally kill Grey Worm and steal his face. Then she would get to Dany in the throne room and kill her. Seeing his mother dead, Drogon will burn Arya, melt the Iron Throne and fly away with Dany’s corpse.

Finally, Jon should become the king, with Tyrion becoming his hand. The post credit scene should show Night King is being reborn. (Implying that somehow the timeline has changed, which would provide the premise for HBO’s spin off.)

Let’s think about the following problem:

*Consider a wall that stretches to infinitely in both directions. There is a robot at position \(0\) and a door at position \(p\in\mathbb Z\) along the wall \((p\neq 0)\). The robot would like to get to the door, but it knows neither \(p\), nor the direction to the door. Furthermore, the robot cannot sense or see the door unless it stands right next to it. Give a deterministic algorithm that minimizes the number of steps the robot needs to take to get to the door.*

This problem quite famous in Bangladesh because of Dr. Mohammad Kaykobad, a renowned Bangladeshi computer scientist. He maintains a list of interesting problems like this one to attract high school kids into problem solving.

Dr. Kaykobad’s formulation of this problem is quite ambiguous. He simply asks how the robot should get to the door, without specifying whether the algorithm should be deterministic. It matters as we shall see.

In this article, I shall provide a deterministic algorithm that takes at most \(9|p|\) steps for any \(p\). I shall also prove that any deterministic algorithm will take at least \(8|p|\) steps. I suspect this bound can be improved up to \(9|p|\), but I didn’t have time to think much.

for i = 0, 1, 2, ...: walk up to (-2)^i, but don't walk past p if you find the door: break loop else: get back to 0

Suppose \(p\) is found on \(i\)-th round. Then, \(|p| > 2^{i-2}\), otherwise \(p\) would have been found on \((i-2)\)-th round. Now, it is easy to see the robot takes \(2^1+\dots+2^{i}<2^{i+1}\) steps prior to \(i\)-th round. It takes \(|p|\) more steps to get to \(p\). Therefore, the total number of steps is at most \[|p|+2^{i+1}=|p|+8\cdot2^{i-2}<9|p|\] As \(|p|\to\infty\), total number of steps converges to \(9|p|\).

**(Disclaimer:** Finding a \(9|p|\) algorithm was a homework problem in my class 6.046. The algorithm and the runtime analysis shown above was part of the official solution. My algorithm was the same, but my runtime analysis was much uglier.)

In this section, we shall prove that any deterministic algorithm for this problem will take at least \(8|p|\) steps.

In any algorithm, the robot has to walk some steps to the left/right, then turn back and walk some more steps, then turn again, and so on. Without loss of generality, we may assume the optimum algorithm takes first step to the right.

Consider the the optimum deterministic algorithm \(OPT\) with minimum number of turns \(n\). Assume that \(\{a_i\}_{i=1}^n\) denotes the number of steps \(OPT\) takes during the \(i\)-th turn. Define \(A_i=\sum_{j=1}^i(-1)^{j+1}a_j\) to be the position of the robot after \(i\) steps. Also define \(A_0=0\). Suppose \(OPT\) takes fewer than \(8|p|\) steps for all \(p\).

**Claim 1:** \(a_n>\cdots a_i>\cdots a_1 > 0\) for \(n> i > 1\).

**Proof:** If the robot takes \(0\) step at any turn, we can simply ignore that turn and merge the steps of the next turn with the steps of the previous turn. Hence, \(a_1,a_i,a_n > 0\).

Suppose \(a_{i-1}-a_{i} \ge 0\) for some \(i\). This means at \(i\)-th step, the robot backtracked to somewhere between \(A_{i-1}\) and \(A_{i-2}\). Then, at the next step, it must go past \(A_{i-1}\), otherwise, \((i+1)\)-th step will revisit some previously visited vertices and it can be removed by walking to \((a_{i-1}-a_i+a_{i+1})\) at \(i\)-th step. This is a contradiction to the minimality of \(OPT\). Hence, we must have \((a_{i-1}-a_i)+a_{i+1} > a_{i-1}\). However, in this case, we can get a better algorithm by taking \(a_{i-1}-a_i+a_{i+1}\) steps at the \((i-1)\)-th turn, eliminating both \(a_i\) and \(a_{i+1}\). This is another contradiction. Hence, \(a_{i-1}-a_{i}< 0\).

\(a_{n}> a_{n-1}\) must be true, otherwise the \(n\)-th step is redundant.

**Claim 2:** \(a_i=|A_i|+|A_{i-1}|\) for all \(i\ge 1\). Furthermore, \(|A_{i+2}|-|A_{i}|=a_{i+2}-a_{i+1}>0\).

**Proof:** Trivial to check for odd and even \(i\). This claim also implies \(|A_{i+2}|>0\).

**Claim 3:** For \(i\ge 2\),

\[\sum_{j=1}^{i}a_j=|A_{i}|+2\sum_{j=1}^{i-1}|A_j|\]

**Proof:** Apply claim 3 for every \(a_j\).

**Claim 4:** \(|A_i| \leq 2.5|A_{i-1}|\) for \(i\ge 6\).

**Proof:** Suppose \(i\) is odd since the other case is analogous. Consider \( p= -|A_{i-1}|- 1 < A_{i-1}\) . The robot does not find \(p\) until at least \((i+1)\)-th turn, since \(|A_{i-1}|>|A_{i-3}|>\cdots>|A_2|\) (claim 2). Thus,

\[

\left(\sum_{j=1}^{i}a_j\right)+a_{i}+1 < 8(|A_{i-1}|+1)
\]
The left hand side can be thought as follows. First, the robot walks to \(A_{i}\). After \(a_{i}\) more steps in the opposite direction, it reaches \(A_{i-1}\) again. Then it takes 1 more step to reach \(p\).
Rewrite the left hand side using claim 2 and 3 to get
\[
1+|A_{i-1}|+2\sum_{j=1}^{i}|A_j|< 8|A_{i-1}|+8
\]
It simplifies to
\[
1+2|A_{i}|+2\sum_{j=1}^{i-2}|A_j|< 5|A_{i-1}|+8
\]
\[
\implies |A_{i}|< 2.5|A_{i-1}|+3.5-\sum_{j=1}^{i-2}|A_j|<2.5|A_{i-1}|
\]
The last inequality follows from the fact that \(i\ge 6\) and \(|A_j| > 0\). (Q.E.D)

Notice that claim 4 can be strengthened iteratively.

**Claim 5:** Suppose we have proven \(|A_i|\leq b|A_{i-1}|\) for some \(b\) and for all sufficiently large \(i\). Let \(b’=(2.5-1/b-1/b^2)\). We claim that \(|A_k|\le b’|A_{k-1}|\) for all sufficiently large \(k\).

**Proof:** Pick a sufficiently large \(i\) such that \(|A_{i-2}|\leq b|A_{i-3}|\) We proceed like in the previous proof to get

\[

|A_{i}|< 2.5|A_{i-1}|+3.5-\sum_{j=1}^{i-2}|A_j|\qquad(1)
\]
Notice that
\[|A_{i-2}|+|A_{i-3}|\ge \left( \frac{1}{b}+\frac{1}{b^2} \right)|A_{i-1}|
\]
Since \(i\) is sufficiently large, the right hand side of (1) is at most \[(2.5-1/b-1/b^2)|A_{i-1}|=b'|A_{i-1}|\]
Hence, \(|A_{i}| < b' |A_{i-1}|\) . (Q.E.D.)

We start with \(b=2.5\) and repeatedly apply claim 5. We find the following values for \(b\):

\[

\begin{array}{cl}

\text{iter.} & b\\

1 & 1.94\\

2 & 1.71883\\

3 & 1.57973\\

4 & 1.46627\\

5 & 1.35287\\

6 & 1.21445\\

7 & 0.99857

\end{array}

\]

We observe that \(|A_{k}|\leq .999 |A_{k-1}|\) for all sufficiently large \(k\). However, that’s a contradiction, since we know \(|A_{k}| > |A_{k-2}|\). Hence, \(OPT\) has to take at least \(8|p|\) steps in the worst case.

The \(8|p|\) bound does not hold for probabilistic algorithms. In fact, it can be shown that my first algorithm takes \(7|p|\) steps in expectation when the first step is taken in a random direction.

The distribution of primes plays a central role in number theory. The famous mathematician Gauss had conjectured that the number of primes between \(1\) and \(n\) is roughly \(n/\log n\). This estimation gets more and more accurate as \(n\to \infty\). We use \(\pi(n)\) to denote the number of primes between \(1\) and \(n\). So, mathematically, Gauss’s conjecture is equivalent to the claim

\[\lim_{n\to\infty}\frac{\pi(n)}{n/\log n}=1\]

This conjecture (currently known as the Prime Number Theorem) is notoriously difficult to prove. The first proof appeared almost a century after it was stated.

However, there are much simpler methods to bound \(\pi(n)\) above and below. Today I shall show such a bound that gives the correct order of magnitude for \(\pi(n)\) up to some constants. In particular, I shall prove that

\[(1+2\log 4)\frac{n}{\log n}\ge \pi(n)\ge \frac{\log 2}{2}\frac{n}{\log n}\]

for \(n\ge 12\). This proof technique is originally due to nineteenth century mathematician Pafnuty Chebyshev. He used a more complicated version of this technique to show that

\[1.1\frac{n}{\log n}\ge \pi(n)\ge 0.92\frac{n}{\log n}\]

for sufficiently large \(n\)s.

Suppose \(n\) is even, ie. \(n=2k\). Let us consider the prime factorization of \(\binom{2k}{k}\). All of its prime factors are between \(1\) and \(2k\). Therefore,

\[\binom{2k}{k}=p_1^{a_1}\cdots p_{\pi(2k)}^{a_{\pi(2k)}}\]

**Claim:** For every prime \(p_i\), \(p_i^{a_i}\leq 2k\)

**Proof:** We shall use this very simple identity \[\lfloor a+b\rfloor-\lfloor a\rfloor-\lfloor b\rfloor\leq1\]

It is well-known that the largest exponent of a prime dividing \(r!\) is

\[\left\lfloor\frac{r}{p_i}\right\rfloor+\left\lfloor\frac{r}{p_i^2}\right\rfloor+\cdots\]

Since \(\binom{2k}{k}=\frac{(2k)!}{(k!)^2}\), we have the following relationship

\begin{align}

a_i=&\left\lfloor\frac{2k}{p_i}\right\rfloor-2\left\lfloor\frac{k}{p_i}\right\rfloor+\left\lfloor\frac{2k}{p_i^2}\right\rfloor-2\left\lfloor\frac{k}{p_i^2}\right\rfloor+\cdots \\

&\leq 1+1+\cdots

\end{align}

Now there are as many \(1\)s as there are powers of \(p_i\leq 2k\). Hence, \(p_i^{a_i}\leq 2k\).

(Q.E.D)

So, we can conclude,

\[\binom{2k}{k}=p_1^{a_1}\cdots p_{\pi(2k)}^{a_{\pi(2k)}}\leq (2k)^{\pi(2k)}\]

It is easy to prove \(\binom{2k}{k}\ge 2^k\), hence

\[(2k)^{\pi(2k)}\ge 2^k\implies \pi(2k)\ge \frac{\log 2}{2}\cdot\frac{2k}{\log 2k}\]

The same bound can be derived for \(n=2k+1\) using \(\binom{2k+1}{k+1}\). Therefore, the lower bound is

\[\boxed{\pi(n)\ge \frac{\log 2}{2}\cdot\frac{n}{\log n}}\]

**Claim:** For all \(n\ge 2\), \[\prod_{p\leq n\\ p\ \text{prime}}p\leq 4^n\]

**Proof:** We induct on \(n\). The base case trivially holds. \(n=2k-1\to 2k\) also holds because the product on the left hand side does not increment. So, we only need to prove \(n=2k\to 2k+1\)

\begin{align}

\prod_{p \leq 2k+1\\ p\ \text{prime}}p &= \prod_{p \leq k+1\\ p\ \text{prime}}p \cdot\prod_{k+2\leq p \leq 2k+1\\ p\ \text{prime}}p\\

&\leq 4^{k+1}\cdot \binom{2k+1}{k}\quad (1)

\end{align}

The first part of the product is \(\leq 4^{k+1}\) because of the inductive hypothesis. The second part is just the product of all the primes in between \(k+2\) and \(2k+1\). It’s easy to see that they all divide \(\binom{2k+1}{k}\), hence their product is \(\leq \binom{2k+1}{k}\).

It’s easy to show \(\binom{2k+1}{k}<4^k\). Substituting this bound back to (1) gives us

\[\prod_{p \leq 2k+1\\ p\ \text{prime}}p \leq 4^{k+1}\cdot 4^k=4^{2k+1}\quad (\text{Q.E.D.})\]

Now take a number \(2\leq m\leq n\). Consider the primes that are between \(m\) and \(n\). There are \(\pi(n)-\pi(m)\) of them, each of which is \(\ge m\). Consequently,

\[m^{\pi(n)-\pi(m)}\leq \prod_{m\leq p \leq n\\ p\ \text{prime}}p\leq \prod_{1\leq p \leq n\\ p\ \text{prime}}p\leq 4^n\]

Now take log of both sides

\[(\pi(n)-\pi(m))\log m\leq n\log 4\]

Rewrite this as

\[\pi(n)\leq \pi(m)+\frac{n\log 4}{\log m}\leq m+\frac{n\log 4}{\log m}\]

Set \(m=\frac{n}{\log n}\) and simplify the expression

\[\pi(n)\leq \frac{n}{\log n}\left(1+\frac{\log 4\cdot\log n}{\log(n/\log n)}\right)\]

Easy to show \(\frac{\log n}{\log(n/\log n)}\leq 2\) for \(n\ge 12\). (In fact you can show much tighter bound for sufficiently large \(n\).) Hence, the upper bound is

\[\boxed{\pi(n)\leq \left(1+2\log 4\right)\frac{n}{\log n}}\]

Polynomial multiplication is one of the most important problems in mathematics and computer science. It can be formally defined as follows:

You are given two polynomials roughly of equal size\[A(x)=a_0+a_1x+\dots+a_{n-1}x^{n-1}\]\[B(x)=b_0+b_1x+\dots+b_{n-1}x^{n-1}\]find a polynomial \[C(x)=c_0+c_1x+\dots+c_{2n-2}x^{2n-2}\] such that \(A(x)\cdot B(x)=C(x)\). At first, this may look like an easy problem, since you already know how to multiply polynomials. You just sum the product of every two terms, right?

def multiply(A, B): """ A = list of coefficients of A(x) B = list of coefficients of B(x) """ n = len(A) - 1 # C can have 2*n-2 degree at most C = [0 for i in range(2*n-1)] for i in range(n): for j in range(n): # adding a_i*b_j to c_{i+j} C[i+j] += A[i]*B[j]

It definitely works, but has asymptotic runtime of \(O(n^2)\). That’s quite bad. Can we do it faster?

Turns out we can! We can in fact multiply two polynomials in \(O(n\log n)\) time. How is that possible? It all comes down to looking at the problem from another perspective.

So far we only saw polynomials as a list of coefficients. However, a polynomial of degree \(n\) can also be viewed as a curve in \(n+1\) dimensional space. It is quite possible define it in terms of the points it passes through. For instance, take the line \(P(x)=2x+1\) in 2d space. We only need to know two points on this line, say \((0, 1)\) and \((1, 3)\), to identify it uniquely. To determine a polynomial with degree \(n\), It has been proven that you need to know its values at \(n+1\) distinct points. This is called the Point Value Representation of Polynomials.

A polynomial \(P(x)\) in this form can be saved like \(\{0 : P(0), \dots, n : P(n)\}\), although you can evaluate \(P(x)\) at any complex point. (This will come handy later)

Note that multiplication is really cheap in this form. Given

\[A(x)=\{0:A(0),\dots (2n-2): A(2n-2)\}\] and \[B(x)=\{0:B(0),\dots (2n-2): B(2n-2)\}\] we can compute \(C(i)=A(i)\cdot B(i)\) at every point \(i\) from \(0\) to \(2n-2\). For instance, \(C(0)=A(0)\cdot B(0)\).This costs \(O(n)\) time. So, we are happy.

Wait, are we? We do not know the coefficients of \(C(x)\)! Evaluation at arbitrary points is very costly in point value form. So, we could not, for instance, raise \(C(x)\) to some arbitrary power. What if we could find a way to go back and forth between coefficient representation and point value representation? We could get the best of both worlds.

This is what Fast Fourier Transform allows us to do.

Before I move on, please note a couple of things.

- Prior to multiplication, we must evaluate both \(A(x)\) and \(B(x)\) at at least \(2n-1\) points even though each of them have degree \(n\). This is because \(C(x)\) has degree \(2n-2\).
- It is not necessary to evaluate the polynomials exactly at \(1,2,\dots,2n-2\). There is nothing special about these numbers. However, it is necessary that we evaluate both \(A(x)\) and \(B(x)\) at the same set of points.

The only way to convert a polynomial with degree \(n\) to point value representation is to evaluate it at \(n+1\) distinct points. However, polynomial evaluation at a single point stills costs \(O(n)\) in coefficient representation. For instance, you can evaluate \(A(p)\) at any point \(p\) like this: \[A(p)=a_0+p(a_1+p(a_2+\cdots+pa_{n-1}))\] Then for \(n+1\) points, we would end up with \(O(n^2)\) runtime. However, you can save a lot of this cost by algebraic manipulation. Let me give you an example.

Suppose we want to compute \(P(x)=2+3x-4x^2+2x^3+5x^4\) at \(1, 2, -1, -2\). We can evaluate the odd powers and even powers separately since even powers of negative numbers is same as that of positive numbers. To be more precise, define \(Q(x)=2-4x+5x^2\) and \(R(x)=3x+2x^2\).

Note that we only kept the order of the coefficients from \(P(x)\), but not the associated powers \(x\). This gives an algebraic relationship as follows:

\[P(x)=Q(x^2)+xR(x^2)\] Now, note that \(Q((-1)^2)=Q(1^2), Q((-2)^2)=Q(2^2)\) and so on. Therefore, we will only need to compute \(Q(1), Q(4), R(1), R(4)\) in order to evaluate \(P(x)\) at four points. Both \(Q(x)\) and \(R(x)\) has half the degree as \(P(x)\), hence we’re saving almost half the work.

In order for trick to work, we must select our list of points very carefully. If the polynomial were much bigger, we would have to do that odd-even split multiple times. Every time we would like to have numbers with opposite signs, so that when squared, they give the same value. You can easily see that no initial set of real numbers can satisfy such requirements. This is why we choose roots of unity. In particular, we choose \(m=2^k\)th roots of unity such that \(2n\leq m\leq 4n\). (This is essentially the smallest power of two that’s greater than \(2n\)). They have this magic property that if \(w\) is a \(2^k\)th root, then so is \(-w\) regardless of which \(k\) you choose.

If you know some linear algebra, you might notice that evaluating a polynomial at \(n+1\) point essentially defines a linear transformation, which can be represented by a matrix. Due to special nature of roots of unity, the inverse matrix consists of conjugates of previously used roots scaled by a factor of \(1/n\). I won’t show it here in details, but if you are interested you should check out this blogpost.

In English, the equations tell us that inverse of Fast Fourier Transform is another Fast Fourier Transform where every point of polynomial evaluation must be replaced by its conjugate, and the final result has to be scaled by a factor of \(1/n\).

Fast Fourier Transform is essentially a divide-and-conquer algorithm. The divide step corresponds to odd-even coefficient split. Each of the resulting subproblems has cost \(T(n/2)\). The combining step has cost \(O(n)\). (There are \(2n-2\) points and you have to compute the algebraic identity at each point.) So, the recurrence relationship should be

\[T(n)=2T\left(\frac{n}{2}\right)+O(n)\] Using master theorem, \(T(n)=O(n\log(n))\). Since FFT is the most expensive step in this approach, runtime of the entire process is dominated by \(O(n\log n)\). Therefore, polynomial multiplication also costs \(O(n\log n)\).

- Given \(A(x)\) and \(B(x\) in coefficient representation, convert them to point value representation with Fast Fourier Transform. (\(O(n\log n)\)))
- Compute \(C(x)=A(x)\cdot B(x)\). (\(O(n)\))
- Go back to coefficients representation with Inverse Fast Fourier Transform. (\(O(n\log n)\)))

No form of beauty is as flawless, as abstract, or as perennial as the mathematical beauty that derives from unassailable logical perfectness, for a mathematical proof is not earthly science, but a poetry written in logic.

[ –Adib Hasan 😛 ]