# A Probabilistic Model for COVID-19 Outbreak

As I am writing this blogpost, MIT is about to close until next fall and the undergrads are essentially getting evicted from the dorms.

Last two days were some of the most stressful days I have ever had. I have tons of work to do, but I cannot concentrate because I need to make sure my visa status will be okay, I need to pack my belongings and hopefully I’ll find a place to go.

COVID-19 is spreading rapidly across the US. This was expected. While MIT’s take on this is understandable, was it really necessary?

Let me try to answer that.

Let’s start with a small campus with $$N$$ students where a small number of students $$\eta$$ have been tested COVID-19 positive. I’ll also assume that the campus is in a state of lockdown — no student leaves the campus and no person from outside enters the campus.

As you might know, it takes some time before an infected person shows symptoms of the disease. During that time, he/she might spread the virus without knowing. So, I’ll assume that after infecting a person, the virus takes $$a$$ days to be capable of transmitting elsewhere, and it takes $$b$$ days $$(b > a)$$ for the symptoms to surface. Hence, a patient is likely to spread the disease in this window of $$b-a$$ days.

I also expect every person to seek medical care once symptoms can be observed. In practice, this is not always true.

For simplicity, I’m assuming that once a person gets infected, he/she remains infected for the rest of the time, however, spreads the disease only for $$b-a$$ days. Since we are interested in the number of new cases per day, this does not really matter.

Lastly, I’ll assume that the virus is transmitted only in between a person and his/her immediate neighbors in the network graph with probability distribution $$\psi$$. For person $$X_n$$, we shall denote this set by $$\mathcal N(X_n)$$ and the distribution $$\psi$$ is typically some beta distribution.

Let $$X_n^{(t)}$$ be the indicator variable such that person $$X_n$$ contacts the virus on day $$t$$ since the outbreak. Let $$Y_n^{(t)}$$ be the indicator such that the person is a vector for COVID-19 on day $$t$$.

Now we shall find recursive relationships for $$P(X_{n}^{(t+1)})$$ and $$P(Y_{n}^{(t+1)})$$.

At a first glance, $$X_{n}^{(t+1)}=0$$ if and only if a person is not infected by the virus on both day $$t+1$$ and $$t$$. Since only the closest neighbors can pass on the virus, we get

$P\left(X_n^{(t+1)}\right)=1-\left(1-P\left(X_n^{(t)}\right)\right)\prod_{\mathcal N(X_n)}\left(1-\psi P\left(Y_m^{(t+1)}\right)\right)$

Next, a person is an active vector of the virus on day $$t+1$$ if he/she has been infected in between day $$(t-a)$$ and $$(t-b)$$. This is because any person infected in between day $$(t-a+1)$$ and day $$t$$ cannot transmit the virus by assumption. Furthermore, once the symptoms surface, I expect the person to take medical care and not to infect any other person. Hence,

$P\left(Y_n^{(t+1)}\right) = P\left(X_n^{(t-a)}\right)+\left(1-P\left(X_n^{(t-a)}\right)\right)\left(P\left(X_n^{(t-a-1)}\right)+\cdots\right)$

This can be re-written as

$P\left(Y_n^{(t+1)}\right)=\sum_{i=a}^{b}P\left(X_n^{(t-i)}\right)\prod_{j=a}^{i-1}\left(1-P\left(X_n^{(t-j)}\right)\right)$

Since the recurrence relation for $$X_n^{(t+1)}$$ relies on $$Y_m^{(t+1)}$$, we only need to provide some base cases for $$X_n$$. We do so by just setting $$P(X_n^{(0)})=1$$ for $$\eta$$ number of initial students at random.

Here is the python code in case anyone wants to take a look:

import numpy as np
import matplotlib.pyplot as plt

def create_graph(N, connect_prob=0.3):
"""Create an undirected graph such that each node is connected
with every other node with prob = connect_prob
"""
G = np.random.choice([0, 1], size=(N, N), p=[1-connect_prob, connect_prob])
G = ((G + G.T) > 0).astype('int')

# remove self-loops
for i in range(N):
G[i, i] = 0
return G

"""
load an undirected graph with N nodes
in the edge list format from filepath
"""
G = np.zeros((N, N))
with open(filepath, 'r') as f:
u, v = line.split()
u, v = int(u), int(v)
G[u, v] = 1
G[v, u] = 1
return G

def update_x(G, P_X, P_Y, N, p_transmit):
P_Xtp1 = 1-(1 - P_X[:1, :]) * np.prod(1 - p_transmit(N)*np.repeat(P_Y, N, axis=0),
where=G > 0, axis = 1, keepdims=True).T
P_X = np.vstack((P_Xtp1, P_X[:-1, :]))
return P_X

def update_y(P_X, P_Y, N, p_transmit, start, end):
P_Y = np.zeros(P_Y.shape)
P_Y += P_X[start:start+1, :]
for i in range(start+1, end):
P_Y += P_X[i:i+1, :] * np.prod(1-P_X[start:i, :], axis = 0, keepdims=True)
return P_Y

def simulate(G, N, start, end, p_transmit, initX, T):
X = np.zeros((N,))
Y = np.zeros((N,))

P_X = np.zeros((end, N))
ind_i = np.random.choice(range(N), initX)
P_X[0, ind_i] = 1. # these people infected initially
P_X = np.repeat(P_X, end, axis=0)
P_Y = np.zeros((1, N))

infected = []
sol = 0

for t in range(T):
P_X = update_x(G, P_X, P_Y, N, p_transmit)
P_Y = update_y(P_X, P_Y, N, p_transmit, start, end)
infected.append(np.sum(P_X[:1, :] > 0.7) - sol)
sol = np.sum(P_X[:1, :] > 0.7)
if t % 3 == 0:
print(f"t={t}, total infected={sol}")
return infected

# N = 10
# G = create_graph(N, 0.0005)

N = 4039
start = 3
end = 10
p_transmit = lambda n : np.random.beta(0.6, 1, size = (1, n))
initX = 3
T = 50

ts = list(range(T))
infected = simulate(G, N, start, end, p_transmit, initX, T)
line1, = plt.plot(ts, infected, marker='o', linestyle='solid', linewidth=2, markersize=6, label='unchecked spread')

end = 6
p_transmit = lambda n : np.random.beta(0.1, 1, size = (1, n))
infected = simulate(G, N, start, end, p_transmit, initX, T)
line2, = plt.plot(ts, infected, marker='o', linestyle='solid', linewidth=2, markersize=6, label='suppressed spread')

plt.xlabel('# days', fontsize=12)
plt.ylabel('# new cases detected', fontsize=12)

plt.xticks(fontsize=12)
plt.yticks(fontsize=12)

plt.legend(handles=[line1, line2], fontsize=12)
plt.show()
# np.save("graph", G)


Now I need to test it on some graphs. I used Stanford’s Social Circles dataset. It has 4039 nodes and about 80k edges.

To test the unchecked spread of the virus, I set $$\eta=3$$, $$\psi=\beta(0.6, 1)$$, and $$a=3$$, $$b=10$$. I ran the simulation for 50 days, but the network was overwhelmed way before that, which is expected. The height of the top peak was around 1200.

Then I simulated the conditions where preventive measures, such as reducing transmission probability $$\psi$$, early diagnosis to shorten the period $$(b-a)$$ etc. have been undertaken. To simulate that, I set $$\psi=\beta(0.1, 1)$$ and $$b=6$$.  It seems like the outbreak was pushed back, it also became flatter (observe the peak height difference). I also found that $$X^{(t)}$$ is very sensitive to $$\psi$$. So, if we had reduced it enough, for instance, by virtualizing classes, not having social gatherings, etc, I assume that MIT Medical could handle this outbreak. Here is an alternate plot with for $$\psi=\beta(2, 5)$$ for unchecked spread and $$\psi=\beta(2, 30)$$ for suppressed spread. At very least, the situation was not so dire that the students have to leave in such a short notice.