These past weeks, I was working on things related to mutations and whatnot. In the course of exploratory work, I had to revisit some things I last read about during my PhD: how do we decide when a mutation will happen, and who will mutate? This post is an attempt at organizing some notes I took about the Gillespie algorithm.

The @Gil77 algorithm is a way to calculate how long in the future ($\delta_t$) a
new event will take place, given the number of possible events ($k$), and the
rate at which the event we are looking for happens ($\mu$). With the example of
mutations, this becomes relatively straightforward: $k$ is the number of
offsprings, and $\mu$ is the probability that a given offspring has a mutation
in the trait of interest. The product of $k$ and $\mu$ is the *rate* ($r$). If
there are several types of events, each with their own rate, then $r = \sum_i
k_i \mu_i$.

In the Gillespie algorithm, the next timestep $\delta_t$ is drawn at random from
an exponential distribution. We want the value of $\delta_t$ to become shorter
when $k$ or $\mu$ increase (*i.e.* more potential events or a higher chance of
each event happening). One property of the exponential distribution is that
$\text{E}[X] = 1/\lambda$, where $\lambda$ is the parameter of the distribution.

If we assume that each offspring will have a 20% chance of mutation ($k=1$, $\mu =
0.2$, $r=0.2$), then we can expect to have a mutation every five generations
($E[X]=1/r=5$). If we double the number of offsprings ($k=2$, $r=0.4$), then we
should expect to have a mutation every 2.5 generations: $E[X]=1/r=2.5$. In
short, when we know the rate $r$, we draw $\delta_t$ in an exponential
distribution of parameter $\lambda = r$. Depending the software used, the
exponential distribution function will use either the actual parameter
$\lambda$, or the *scale* ($1/\lambda$). Be careful.

Finding $\delta_t$ is the first step. When there are multiple populations
involved, the next question is to decide *which* will mutate. The Gillespie
algorithm assumes that all events are completely random and completely
independant, and so all we need to do is to draw an event to happen this
timestep. Assuming we have several populations with their own $k_i$ (and the
same mutation probability $\mu$), the probability of every population mutating
is

We can sample a population $i$ to mutate, by applying the weights $p_i$.

**I guess it’s time for an illustration!** Because setting up the code for a
proper example with mutations is a little bit overwhelming, I will use
something simpler: the growth of two populations of bacteria. The *event* we
want to see happen is a cellular division, and it happens at a rate $\mu$
roughly equal to the number of divisions per hour (now is a good time to
mention that the values of *everything* in the Gillespie algorithm are
dependent on the units in which the model is expressed). This example is not
the most biologically appropriate (bacterial division happens after the cell
has grown enough, and so events are not random), it will do for an overview.

If a bacteria doubles on average every 20 minutes, then its rate of doubling is
$3 \text{hr}^{-1}$. Starting from a single cell at $t_0=0.0$, we expect to have
a division (and therefore two cells) at $t_1 = t_0+\delta_t \approx 20
\text{min}$. The next division, at which point we will have three cells, will
take place at $t_2 = t_1 + \delta_t \approx 30 \text{min}$ – at $t_1$, we have
$k=2$, $\mu=3$, and therefore $r=6$, which means that $\delta_t$ is $1/r=1/6$
hours. The event at $t_1$ is the division of the initial cell. The event at
$t_2$ is the division of *either* daughter cells. We then expect to have a
second doubling of the population when there are four cells, which happens at
$t_2+\delta_t$, where $\delta_t$ is $1/9$ – this is approximately 7 minutes.
The population will double first after 20 minutes, then after another 17 minutes
have passed.

Let’s write this up (in `Julia`

). We will need a few packages:

```
using Distributions
using StatsBase
using StatPlots
```

Next, let’s write a function to calculate $\delta_t$:

```
function gillespie(k, μ)
r = sum(k.*μ)
return rand(Exponential(1/r))
end
```

And a function to simulate the growth of populations:

```
function simulate(n, g; nmax=1000)
# Will store the time of events
t = [0.0]
# Will store the number of individuals
N = [copy(n)]
while (sum(last(N)) < nmax)
# When is the next step?
δt = gillespie(n, g)
# Which population should have a division?
double = sample(1:length(n), weights(n.*g))
# We increase the size of the dividing population, and push!
n[double] += 1
push!(N, copy(n))
push!(t, last(t)+δt)
end
return (t, N)
end
```

Let’s test our little prediction that the population will have doubled twice ($n=4$) after 37 minutes (even with the awfuly non-optimal code, this all runs within a second):

```
time_until_4 = [last(simulate([1], [3], nmax=4)[1])*60 for i in 1:200000]
histogram(time_until_4)
```

The vertical line is 37 minutes (the average time in this run was 36.9999… minutes).

Let’s summarize – the Gillespie algorithm works by using the number of possible events, and the rate at which events happens, to randomly generate the time until the next event. Events are picked with a probability relative to their rate, and are assumed to be independant and entirely random.

One of the issues I ran into when using this algorithm is that the rate of
events must be expressed in a unit that is *right* with regard to both the
population sizes *and* the time. In a surprising number of ecological models,
both of these quantities are dimensionless. For example, in the case of
mutations, the $\mu$ parameter is the rate of mutation per offspring; but in
(for example), the prey-predator Lotka-Volterra model, the number of offspring
of a predator $y$ consuming a prey $x$ at rate $\delta$ is *not* simply
$\delta\times x\times y$ (or rather it is, but this is not a quantity expressed
in number of individuals). This can become problematic when we need to be
relatively confident in the fact that the tempo of evolutionary changes matches
the tempo of population dynamics. But this is a story for another
day…