Notes on the Gillespie algorithm - part 2

In the [previous post][gillespie-1], I collected some notes on the Gillespie algorithm, and highlighted one difficulty: it is not always clear how to setup the reaction rate, because some ecological models are setup in a way where the population sizes are not expressed in units that make a lot of sense. In this post, I will go through one solution I found, and show an example of a model with adaptation.

[gillespie-1]: {% link _posts/ %}

Let’s start with a simple model of logistic growth with competition:

$$\frac{\dot N}{N} = r\left(1-\frac{N}{K}\right)$$

To implement mutation in this model, we may want to count the offspring, which is not obvious since this model does not really have a component associated to mortality. We will just simplify and assume that $rN$ offsprings are produced (since any decrease in growth rate is due to mortality through competition). Then the time until the next mutation is drawn from the exponential distribution of parameter $\lambda = r\times N\times \mu$. If we call the shape of this distribution $\beta = 1/\lambda$, the expected $\delta_t$ is $\beta$.

So far so good, but because $N$ is not expressed in individuals, it is unclear (i) what the value of $\delta_t$ means, and (ii) what the value of $\mu$ relative to other parameters should be. There are a number of ways to decide on the value of $\mu$ (one of which is to ), but here is the one I picked. First, let’s see how many timesteps we need to reach the $N^\star = K$ equilibrium.

We can write this up using the [DifferentialEquations package][diffeq] for julia:

[diffeq]: {% link _posts/ %}

r, K = 1.1, 10.0
f(t, u) = u*r*(1-u/K)
t = (0.0, 20.0)
logistic_p = ODEProblem(f, 0.02, t)
logistic_s = solve(logistic_p, callback=PositiveDomain())

The carrying capacity is reached after about 12 timesteps. So assuming we are still dealing with bacteria, I can assume that one timestep is close to 2 hours (we know that bacteria reach carrying capacity in the lab after exactly 24 hours, which allows the field of experimental evolution to exist).

One question we can ask at this point is, how long would I like to wait until the next mutation, and work backward from here. Remember that the expected $\delta_t = 1 / \lambda$, and $\lambda = r\times N \times \mu$. Let’s say that at equilibrium, I would like to have a mutation within the next 20 minutes: if a timestep is 2 hours (120 minutes), then 20 minutes is about 0.16. We can re-organize

$$\delta_t = \frac{1}{r\times N \times \mu}$$

to have

$$r\times N\times \mu = \frac{1}{\delta_t} ,,$$

and finally

$$\mu = \frac{1}{\delta_t\times r \times N} ,.$$

Since we want to find out what this will be at equilibrium, we can can substitute $N$ by $N^\star = K$, so that we end up with

$$\mu = \left(\delta_trK\right)^{-1},.$$

Using the values of $r$ and $\alpha$ in the code snippet above, this turns out to $\mu \approx 0.56$. The unit of this quantity is something like mutation per unit of population per timestep, but this does not really matter – by deciding how long we want to wait, we can pick a value of $\mu$ that depends on the parameters of population dynamics.

But there is another issue with this approach – if we assume that the equilibrium density is higher than the starting density, then we will have to wait a lot longer for the first mutation to appear. So instead of using $N^\star$, we can use $N_0$ to calculate $\mu$. If we want to have to wait 0.16 timesteps to have the first mutation, with an initial population size of $N_0 = 0.01$, then we need to use a value of $\mu \approx 2.9 \times 10^{2}$. Of course, this will result in a very short $\delta_t$ very soon, so finding a balance between the number of mutations and the run time is going to matter a lot.

This is also the time when we need to decide on the relative tempo of ecological vs. evolutionary dynamics. Do we want to see change within a few hours, or within a few days? Unless there are good empirical evidence to calibrate it, $\mu$ is a parameter that is poorly constrained.

Let’s wrap this up using code. But first, we will re-write a multi-strain model, where (i) bacteria grow faster when their trait $x_i$ is optimal, and (ii) bacteria compete with strains that have similar traits values:

$$\frac{\dot N_i}{N_i} = \omega(x_i, 0, \xi_g)\times r \times \left(1-\frac{\sum_j\omega(x_i, x_j, \xi_c) N_j}{K}\right),.$$

As in previous work on this type of models @WeitHart05, the $\omega$ function is defined as

$$\omega(x, y, z) = \text{exp}\left(-\frac{(x-y)^2}{2z^2}\right),.$$

We can write both of these as (and loading the same packages as in the [previous post][gillespie-1]):

function fitness(x,y,breadth)
    dist = (x.-y').^2
    return exp.(-dist./(2*breadth^2))

function offspring(t, u)
    gr = fitness(x, 0.0, ξ)*r
    return gr

function f(t,u)
    competition = (1.0-vec(sum(fitness(x, x, adj*ξ).*u, 1))/K)
    return u.*offspring(t,u).*competition

After a little bit of setting things up, we have a workable code. First, the parameters:

# Initial trait value
x = [-0.07]
# Initial population size
n = [0.02]

# Parameters
r = 1.1
K = 10.0
ξ = 0.05 # Tolerance for being away from optimum
adj=0.95 # Strength of competition vs. being close to optimum
μ = 0.05

# Time
t₀ = 0.0
tₙ = 12000.0
δₜ = gillespie(n.*offspring(t₀, n), μ)

And second, the main loop (minus the code to keep track of values of traits and population sizes):

while t₀ + δₜ <= tₙ
  # Time for integration
  t = (t₀, t₀ + δₜ)
  if t[2] > tₙ
    t = (t₀, tₙ)

  # We define the integration problem
  p = ODEProblem(f, n, t)
  s = solve(p, callback=PositiveDomain())

  # Update the next starting time + population size
  t₀ = last(s.t)
  n = last(s)

  # Find next mutation
  δₜ = gillespie(n.*offspring(t₀, n), μ)

  # Pick a mutant
  to_mutate = sample(eachindex(x), weights(n.*offspring(t₀, n)))
  push!(x, rand(Normal(x[to_mutate], 0.005)))
  push!(n, eps())

  # Remove populations below a threshold
  not_extinct = find(n .>= eps())
  n = n[not_extinct]
  x = x[not_extinct]

Running this can take a little while (because there can be up to a thousand strains, and because there will be many new mutations, and also of course because this is proof of concept code). But in the end, we will see the evolutionary trajectory of the population:

Traitgram{: .center-image }

There is a first separation in two quasi-species after 2500 timesteps (that’s about 200 days assuming the starting population reaches $K$ in 24 hours), then one of these clusters further divides at 6500 timesteps. We can look at the distribution of trait values at the final timestep - it shows a main cluster of traits close to the optimum, and two clusters away from the optimum that presumably offset the loss of performance by being less exposed to competition.

Histogram{: .center-image }

This is not breaking any new grounds in adaptive dynamics, but at least the code works!

To summarize, picking the right value of $\mu$ is important, because this will regulate the relative rythm of ecological and evolutionary processes. When we are lucky enough to be able to exactly represent birth and death, there are very fast versions of this algorithm available [@MathHast12].

Some comon models for population dynamics have the strange tendency of not being expressed in units of individuals, so some tweaks may be necessary to find a suitable value of $\mu$. Without revisiting the old discussion about the ability to calibrate models on empirical data (which @GetzMars17 covered extremely well in a recent paper), I think this is a nice illustration of the issues that come from of ecologists being a little carefree about the units in which our models are expressed.