The minimalist beauty of the canonical equation of adaptive dynamics
Perfection is when there is nothing to remove
The canonical equation of adaptive dynamics is absolutely beautiful. It purports to describe the evolutionary change in the value of a trait over time, and does so with a surprisingly small number of parameters. In this entry, I will go through the terms it uses, and show an illustration of how it all works.
The canonical equation of adaptive dynamics is
$$ \frac{\text{d}}{\text{d}t}x = \frac{1}{2}\mu \sigma^2 N^\star(x) \left(\frac{\partial}{\partial x’}s_x(x’)\right)\biggr|_{x’=x} $$
That’s it. Under the usual assumptions of adaptive dynamics, this is all we need to represent the change in a trait $x$ over time. This equation has three components, and a half (literally, there’s a $1/2$ in there, and it has a very simple explanation).
What is actually going on in this equation?
The first component is the creation of diversity. This is represented by $\mu\sigma^2$, which is the per capita mutation rate $\mu$, multiplied by the variance in the value of $x$ resulting from a mutation ($\sigma^2$). This equation therefore speaks the language of the effect of mutations: this effect increases with $\sigma$, and happens more commonly with increases in $\mu$.
But these mutations are expressed per capita, which is to say that we need to know the quantity of individuals in which these mutations can originate. This is $N^\star(x)$, specifically the population size of a resident with trait value $x$, at its demographic attractor. This is the second component, the size of the population.
This is not the effective population size (although there are links between the two notions), in part because we assume here that $N^\star(x)$ is large, in the specific sense that we do not have to consider the effects of stochasticity on population size or on selection.
The last component is maybe the least (or most?) intuitive: the partial derivative of the invasion fitness of the mutant, evaluated at the strategy held by the resident. The invasion fitness is the per capita growth rate of an initially rare mutant $x’$, and so its partial derivative with respect to $x’$ is a measure of the movement of the trait. When the absolute value of this derivative gets larger, the trait is evolving faster (i.e. selection is stronger). When the partial derivative is positive, the trait value is expected to increase, and when it is negative it is expected to decrease. This last term is a measure of how fast we turn the cranks of the evolutionary process, and in which direction.
But why divide it by two?
There is a very confusing bit of notation in this equation. We are used to seeing $\mu$ as the mean and $\sigma^2$ as the variance, but this is not the case here. We can still think of mutation as a process that produces normally distributed effects, specifically given by $\mathcal{N}(x,\sigma^2)$. On average, we expect that mutants are going to have the same trait value as their ancestor, and so this equation is, rather than mechanistic, phenomenological: we do not know how a trait value of $x$ will have an impact on fitness, only that it does, and that mutations act as small perturbations on the value of $x$.
And this is precisely where $1/2$ comes from. Remember that the (partial) derivative of the invasion fitness is telling us something about the direction in which the trait is expected to change, i.e. move away from $x$. But the mutation process is generating mutants that have, on average, a value of $x$. Assuming that the mutation effects come from $\mathcal{N}(x,\sigma^2)$, how many “relevant” mutants (with trait values that are moving in the “right” direction) do we expect? This is the same thing as asking for $P(x’ \le x)$ (assuming the sign of the derivative of the invasion fitness is negative), which in a normal distribution with mean $x$ is $(1/2)\times [1+\text{erf}(0)]$ (we know this because we know the cumulative distribution function!), which is exactly $1/2$.
So the canonical equation is only true from a certain point of view; namely, the point of view that the distribution of the effect of mutations is normal, and that trait values can be expressed by putting them on the number line (the traits are in $\mathbb{R}$, and can move at will in this space). If we assumed different properties for the distribution of mutation effects, we would need another equation.
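To make the $1/2$ concrete, here is a minimal sketch that evaluates the normal cumulative distribution function at the resident trait value. The resident value and the mutation standard deviation are made-up numbers; the point is that the answer does not depend on them.

```python
import math

def normal_cdf(value, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((value - mean) / (sd * math.sqrt(2.0))))

# Hypothetical values, chosen only for illustration
resident = 1.3   # resident trait value x
sigma = 0.05     # standard deviation of mutation effects

# Proportion of mutants with a trait value at or below the resident
below = normal_cdf(resident, mean=resident, sd=sigma)
# Proportion of mutants with a trait value above the resident
above = 1.0 - below

print(below, above)
```

Whatever the values of `resident` and `sigma`, the argument passed to `erf` is zero, so both proportions are one half.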
For example, if we assumed that traits were cyclical (having a very high value and a very low value are similar in terms of population size and selection), and are represented on any interval of length $2\pi$, we would be potentially very wrong in using the assumption of the normal distribution. Wrapped distributions (wrapped Normal, von Mises, …) would be good choices. Disregard the fact that the cumulative distribution functions for these distributions are often, uhh, interesting. And yet, it’s not too difficult to think of a situation where a circular distribution would make sense. If we can imagine a trait describing time of day, then it would lend itself to being represented on a circle!
Similarly, if we have reasons to believe that a trait can be represented as something on the unit interval (for example: investment in activity $a$ v. activity $b = 1-a$), then again, we would need to use a distribution that reflects this constraint, like a Beta. We could still express the variance, and we could still express the coefficient representing the proportion of mutants with the “right” trait values, but it would be a different formula.
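As a rough illustration of why the coefficient would change, here is a sketch that assumes mutants are drawn from a Beta distribution whose mean is the resident trait value. The parameters (a resident at $0.2$, mutants from a Beta with shape parameters 2 and 8) are arbitrary choices of mine, and I restrict the shape parameters to integers so that the cumulative distribution function reduces to a binomial sum.

```python
from math import comb

def beta_cdf_integer(value, alpha, beta):
    """CDF of Beta(alpha, beta) when alpha and beta are positive integers.

    For integer shape parameters, the regularized incomplete beta function
    equals the probability that a Binomial(alpha + beta - 1, value) variable
    is at least alpha.
    """
    n = alpha + beta - 1
    return sum(
        comb(n, j) * value**j * (1.0 - value) ** (n - j)
        for j in range(alpha, n + 1)
    )

# Hypothetical setup: the mean alpha / (alpha + beta) equals the resident value
resident = 0.2
alpha, beta = 2, 8

# Proportion of mutants with a trait value above the resident
fraction_above = 1.0 - beta_cdf_integer(resident, alpha, beta)
print(fraction_above)
```

With these numbers, about 44% of the mutants fall above the resident, not one half: the distribution is skewed, so its mean and median do not coincide, and the $1/2$ would have to be replaced by a term that depends on $x$.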
How does it work in practice?
In practice, the assumption of normally distributed mutations on real-valued traits is fine. So let us assume it works, and see what the canonical equation looks like for what ought to be the best known example of adaptive dynamics: the logistic growth model with a cost from Brännström et al.’s “Hitchhiker’s Guide to Adaptive Dynamics”.
In this model, the growth of a population with trait $x$ is defined as
$$\frac{\text{d}}{\text{d}t}N_x = N_x\times (x - c(x) - d N_x)$$
with $x$ as the birth rate, $c(x)$ as the cost of high birth rate ($ae^x$ in the paper, with $a$ being a scaling parameter for the exponential cost of higher birth rates), and $d$ the intra-specific competition rate. To represent the evolution of $x(t)$, we need to figure out $N^\star(x)$, and $s_x(x’)$.
The equilibrium population size that is not $N_x = 0$ is given by the solution to $x - c(x) - d N_x = 0$, i.e. $N^\star(x) = [x - c(x)]/d$. The invasion fitness is, as usual, defined as the per capita growth rate of an initially rare mutant $x’$ in a population of a resident $x$ at its demographic attractor (that’s a mouthful I know), which is
$$s_x(x’) = x’ - c(x’) - d N^\star(x)$$
or in other words,
$$s_x(x’) = x’ - c(x’) - x + c(x)$$
If we now replace $c(x)$ by its value ($ae^x$), we get
$$s_x(x’) = x’ - a \times \mathrm{exp}(x’) - x + a \times \mathrm{exp}(x)$$
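The two ingredients so far can be checked numerically. The sketch below uses parameter values I picked for convenience (they are not taken from the paper), integrates the ecological dynamics with a simple forward Euler scheme to confirm the expression for $N^\star(x)$, and evaluates the invasion fitness of a resident and of a nearby mutant.

```python
import math

A = 0.1    # cost scaling a (assumed value)
D = 0.01   # intra-specific competition rate d (assumed value)

def cost(x):
    """Cost of a birth rate x, c(x) = a * exp(x)."""
    return A * math.exp(x)

def n_star(x):
    """Non-trivial equilibrium population size of a resident with trait x."""
    return (x - cost(x)) / D

def invasion_fitness(mutant, resident):
    """Per capita growth rate of a rare mutant in a resident at equilibrium."""
    return mutant - cost(mutant) - D * n_star(resident)

def simulate_density(x, n0=1.0, dt=0.01, steps=20_000):
    """Forward Euler integration of dN/dt = N * (x - c(x) - d * N)."""
    n = n0
    for _ in range(steps):
        n += dt * n * (x - cost(x) - D * n)
    return n

resident = 1.0
print(simulate_density(resident), n_star(resident))
print(invasion_fitness(resident, resident))
print(invasion_fitness(1.1, resident))
```

The simulated density settles on $N^\star(x)$, the resident has an invasion fitness of zero in its own population (up to floating point error), and with these values a mutant with a slightly higher birth rate has a positive invasion fitness.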
What we want at this point is
$$\left(\frac{\partial}{\partial x’}s_x(x’)\right)\biggr|_{x’=x}$$
which ends up being $1-a\times \text{exp}(x)$.
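If you do not trust the calculus, a central finite difference gives the same answer. This is only a sanity check, with a value of $a$ that I chose arbitrarily.

```python
import math

A = 0.1  # cost scaling a (assumed value)

def invasion_fitness(mutant, resident):
    """s_x(x') for the logistic model with an exponential cost."""
    return mutant - A * math.exp(mutant) - resident + A * math.exp(resident)

def selection_gradient(resident, h=1e-6):
    """Central finite difference of s_x(x') with respect to x', at x' = x."""
    up = invasion_fitness(resident + h, resident)
    down = invasion_fitness(resident - h, resident)
    return (up - down) / (2.0 * h)

def analytic_gradient(resident):
    """Closed-form derivative, 1 - a * exp(x)."""
    return 1.0 - A * math.exp(resident)

for x in (0.5, 1.0, 3.0):
    print(x, selection_gradient(x), analytic_gradient(x))
```

The gradient is positive for the two lower values of $x$ and negative for the highest one, which already hints at an equilibrium somewhere in between.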
If we now replace everything in our equation (after making the required gathering of terms that should be together), we get the dynamics of $x(t)$:
$$\frac{\text{d}}{\text{d}t}x = \frac{1}{2}\mu \sigma^2 \frac{(x - a\times e^x)(1-a\times e^x)}{d}$$
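This equation is simple enough to integrate directly. Here is a minimal sketch using a forward Euler scheme; every parameter value is an assumption of mine, and the time step and number of steps are simply large enough for the trait to settle.

```python
import math

A = 0.1       # cost scaling a (assumed value)
D = 0.01      # competition rate d (assumed value)
MU = 1e-3     # per capita mutation rate (assumed value)
SIGMA = 0.1   # standard deviation of mutation effects (assumed value)

def trait_rate(x):
    """Right-hand side of the canonical equation for the logistic model."""
    cost = A * math.exp(x)
    return 0.5 * MU * SIGMA**2 * (x - cost) * (1.0 - cost) / D

def simulate_trait(x0, dt=1.0, steps=50_000):
    """Integrate dx/dt with forward Euler and return the final trait value."""
    x = x0
    for _ in range(steps):
        x += dt * trait_rate(x)
    return x

expected = math.log(1.0 / A)
from_below = simulate_trait(1.0)   # start below the expected equilibrium
from_above = simulate_trait(3.0)   # start above it
print(from_below, from_above, expected)
```

Starting from either side, as long as the initial population is viable, the trait converges to the same value, $\mathrm{ln}(a^{-1})$.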
What is interesting here is that we can look for values of $x$ so that this derivative goes to zero, which we will call $x^\star$. These values correspond to possible equilibria of the trait values, at which the usual scenarios of adaptive dynamics can occur. Such values come from two sources here: the solutions to $x - a\times e^x = 0$ and to $1-a\times e^x = 0$.
Note that one set of solutions actually comes from the expression of $N^\star(x)$; these are the solutions to $x = ae^x$, which does not really have a neat solution other than $x^\star = -\text{W}(-a)$ (the Lambert W function has two real branches when $0 < a < 1/e$, so this gives two solutions, the lowest one being the minimum viable value of $x$). The other solution, from $1-a\times e^x = 0$, corresponds to the expected equilibrium value of $x(t)$, and is $x^\star = \mathrm{ln}(a^{-1})$.
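For a given $a$, these values are easy to get numerically without touching the Lambert W function. The sketch below uses a hand-rolled bisection on $x - ae^x$, with an arbitrary $a = 0.1$ and brackets that I chose by looking at the sign of the function.

```python
import math

A = 0.1  # cost scaling a (assumed value)

def viability(x):
    """x - a * exp(x), proportional to the equilibrium population size."""
    return x - A * math.exp(x)

def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f must change sign between lo and hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

x_low = bisect(viability, 0.0, 1.0)    # lowest viable birth rate
x_high = bisect(viability, 1.0, 5.0)   # highest viable birth rate
x_evo = math.log(1.0 / A)              # equilibrium of the trait dynamics

print(x_low, x_evo, x_high)
```

With this value of $a$, the viable range runs from roughly $0.11$ to roughly $3.58$, and the evolutionary equilibrium $\mathrm{ln}(a^{-1}) \approx 2.30$ sits inside it.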
What have we learned?
Adaptive dynamics is not always the most intuitive (or even flexible) framework. In fact, there is a very recent proposal to expand the approach, using oligomorphic dynamics, which is clearly something I will have to spend more time with (but it works with multimodal trait distributions!).
Nevertheless, the canonical equation of adaptive dynamics is beautiful, because it only brings what is required to predict how the trait will change through time. What proportion of the mutations will fall on the right side of the resident? $1/2$. How fast do mutations accumulate? $\mu$. How much variance do we expect to see as a result of the mutation process? $\sigma^2$. How many times is this process repeated? $N^{\star}$. And finally, in what direction and how far is the trait “moving”? $[s_x(x’)]’|_{x’=x}$.
Nothing more, nothing less.