# The minimalist beauty of the canonical equation of adaptive dynamics

Perfection is when there is nothing to remove

The canonical equation of adaptive dynamics is *absolutely beautiful*.
It purports to describe the evolutionary change in the value of a trait over
time, and does so with a surprisingly small number of parameters. In this entry,
I will go through the terms it uses, and show an illustration of how it all
works.

The canonical equation of adaptive dynamics is

$$ \frac{\text{d}}{\text{d}t}x = \frac{1}{2}\mu \sigma^2 N^\star(x) \left(\frac{\partial}{\partial x'}s_x(x')\right)\biggr|_{x'=x} $$

That’s it. This is (*under the usual assumptions of adaptive dynamics*) all
we need to represent the change in a trait $x$ over time. This equation has
three components, and a half (literally, there’s a $1/2$ in here, and it has a
very simple explanation).

## What is actually going on in this equation?

The first component is the **creation of diversity**. This is represented by
$\mu\sigma^2$, which is the *per capita* mutation rate $\mu$, multiplied by the
variance in the value of $x$ resulting from a mutation ($\sigma^2$). This
equation therefore speaks the language of the *effect* of mutations: this effect
increases with $\sigma$, and happens more commonly with increases in $\mu$.

But these mutations are expressed *per capita*, which is to say that we need to
know the quantity of individuals in which these mutations can originate. This is
$N^\star(x)$, specifically the population size of a resident with trait value
$x$, at its demographic attractor. This is the second component, **the size of
the population**.

This is *not* the effective population size (although there are links between
the two notions), in part because we assume here that $N^\star(x)$ is
*large*, in the specific sense that we do not have to consider the effects of
stochasticity on population size *or* on selection.

The last component is maybe the least (or most?) intuitive: the partial
derivative of the *invasion fitness* of the mutant, evaluated at the strategy
held by the resident. The invasion fitness is the *per capita* growth of an
initially rare mutant $x'$, and so the partial derivative w.r.t. $x'$ is a
measure of the **movement of the trait**. When the absolute value of this
derivative gets larger, the trait is evolving faster (*i.e.* selection is
stronger). When the partial derivative is positive, the trait value is expected
to *increase*, and when it is negative it will *decrease*. This last term
is a measure of how fast we turn the cranks of the evolutionary process, and in
which direction.

## But why divide it by two?

**There is a very confusing bit of notation in this equation**. We are used to
seeing $\mu$ as the mean and $\sigma^2$ as the variance, but this is not the
case here. We can still think of mutation as a process that produces normally
distributed *effects*, specifically given by $\mathcal{N}(x,\sigma^2)$. On
average, we expect that mutants are going to have the same trait value as their
ancestor, and so this equation is, rather than mechanistic, phenomenological: we
do not know *how* a trait value of $x$ will have an impact on fitness, only that
it does, and that mutations act as small perturbations on the value of $x$.

And this is precisely where $1/2$ comes from. Remember that the (partial)
derivative of the invasion fitness is telling us something about the direction
in which the trait is expected to change, *i.e.* move away from $x$. But the
mutation process is generating mutants that have, on average, a value of $x$.
Assuming that the mutation effects come from $\mathcal{N}(x,\sigma^2)$, how
many “relevant” mutants (with trait values that are moving in the “right”
direction) do we expect? This is the same thing as asking for $P(x' \le x)$
(assuming the sign of the derivative of the invasion fitness is negative), which
in a normal distribution with mean $x$ is $(1/2)\times [1+\text{erf}(0)]$ (we
know this because we know the cumulative distribution function!), which is
exactly $1/2$.
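
As a quick sanity check on that value, here is a minimal sketch using only the Python standard library. The helper `normal_cdf` and the values of `x` and `sigma` are illustrative assumptions, not part of the derivation; the point is that the answer does not depend on either of them.

```python
import math

def normal_cdf(value, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((value - mean) / (sd * math.sqrt(2.0))))

# Illustrative values (assumptions): resident trait and mutation standard deviation
x = 1.3
sigma = 0.05

# Probability that a mutant drawn from N(x, sigma^2) falls at or below the resident
p_below = normal_cdf(x, mean=x, sd=sigma)
print(p_below)  # 0.5
```

Changing `sigma` only changes how far mutants land from the resident, not the share that lands on either side.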

So the canonical equation is only true *from a certain point of view*; namely,
the point of view that the distribution of the effect of mutations is normal,
and that trait values can be expressed by putting them on the number line (the
traits are in $\mathbb{R}$, and can move at will in this space). If we assumed
different properties for the distribution of mutation effects, **we would need
another equation**.

For example, if we assumed that traits were cyclical (having a very high value
and a very low value are similar in terms of population size and selection), and
are represented on any interval of length $2\pi$, we would be potentially *very*
wrong in using the assumption of the normal distribution. Wrapped distributions
(wrapped Normal, von Mises, …) would be good choices. Disregard the fact that
the cumulative distribution functions for these distributions are often, uhh,
interesting. And yet, it’s not too difficult to think of a situation where a
circular distribution would make sense. If we can imagine a trait describing
time of day, then it would lend itself to being represented on a circle!

Similarly, if we have reasons to believe that a trait can be represented as
something on the unit interval (for example: investment in activity $a$ *v.*
activity $b = 1-a$), then again, we would need to use a distribution that
reflects this constraint, like a Beta. We could still express the variance, and
we could still express the coefficient representing the proportion of mutants
with the “right” trait values, but it would be a different formula.
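
To see how the coefficient changes, here is a rough Monte Carlo sketch. It assumes (and this is only one possible choice) that mutants are drawn from a Beta distribution whose mean is the resident trait $x$, parameterised with a hypothetical `concentration` value that controls how tightly mutants cluster around the resident.

```python
import random

def fraction_below_resident(x, concentration, n_draws=100_000, seed=42):
    """Monte Carlo estimate of the share of mutants with trait value <= x.

    Assumption: mutants are drawn from a Beta distribution with mean x,
    parameterised as Beta(x * concentration, (1 - x) * concentration).
    """
    rng = random.Random(seed)
    alpha = x * concentration
    beta = (1.0 - x) * concentration
    below = sum(rng.betavariate(alpha, beta) <= x for _ in range(n_draws))
    return below / n_draws

# Resident in the middle of the unit interval: the kernel is symmetric
print(fraction_below_resident(0.5, concentration=10))
# Resident close to the lower bound: the kernel is skewed
print(fraction_below_resident(0.1, concentration=10))
```

With the resident at $0.5$ the estimate stays close to $1/2$, but with the resident at $0.1$ the kernel is a Beta(1, 9), whose cumulative distribution function at $0.1$ is $1 - 0.9^9 \approx 0.61$. The share of mutants on each side of the resident now depends on where the resident sits.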

## How does it work in practice?

In practice, the assumption of normally distributed mutations on real-valued
traits is fine. So let us assume it works, and see what the canonical equation
looks like for what ought to be the best known example of adaptive dynamics: the
logistic growth model with a cost from Brännström *et al.*’s “Hitchhiker’s
Guide to Adaptive Dynamics”.

In this model, the growth of a population with trait $x$ is defined as

$$\frac{\text{d}}{\text{d}t}N_x = N_x\times (x - c(x) - d N_x)$$

with $x$ as the birth rate, $c(x)$ as the cost of high birth rate ($ae^x$ in the paper, with $a$ being a scaling parameter for the exponential cost of higher birth rates), and $d$ the intra-specific competition rate. To represent the evolution of $x(t)$, we need to figure out $N^\star(x)$, and $s_x(x')$.

The equilibrium population size that is not $N_x = 0$ is given by the solution
to $x - c(x) - d N_x = 0$, *i.e.* $N^\star(x) = [x - c(x)]/d$. The invasion
fitness is, as usual, defined as the *per capita* growth rate of an initially
rare mutant $x'$ in a population of a resident $x$ at its demographic attractor
(that’s a mouthful I know), which is

$$s_x(x') = x' - c(x') - d N^\star(x)$$

or in other words,

$$s_x(x') = x' - c(x') - x + c(x)$$

If we now replace $c(x)$ by its value ($ae^x$), we get

$$s_x(x') = x' - a \times \mathrm{exp}(x') - x + a \times \mathrm{exp}(x)$$

What we want at this point is

$$\left(\frac{\partial}{\partial x'}s_x(x')\right)\biggr|_{x'=x}$$

which ends up being $1-a\times \text{exp}(x)$.
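
If you would rather not trust the algebra, this result can be checked numerically. The sketch below codes the model as written above and compares the analytical gradient to a central finite difference. The values of `a`, `d`, `x`, and the step `h` are arbitrary assumptions picked for illustration, and the function names are mine.

```python
import math

# Illustrative parameter values (assumptions, not taken from the paper)
a = 0.1    # scaling of the exponential cost c(x) = a * exp(x)
d = 0.001  # intra-specific competition rate

def cost(x):
    return a * math.exp(x)

def n_star(x):
    """Non-trivial equilibrium population size of a resident with trait x."""
    return (x - cost(x)) / d

def invasion_fitness(x_mut, x_res):
    """Per capita growth rate of a rare mutant in a resident at equilibrium."""
    return x_mut - cost(x_mut) - d * n_star(x_res)

def selection_gradient(x):
    """Analytical derivative of the invasion fitness w.r.t. the mutant trait, at x."""
    return 1.0 - a * math.exp(x)

x = 1.0
h = 1e-6
# Central finite difference in the mutant trait, with the resident held fixed
numerical = (invasion_fitness(x + h, x) - invasion_fitness(x - h, x)) / (2 * h)
print(invasion_fitness(x, x))            # close to zero: a resident cannot invade itself
print(numerical, selection_gradient(x))  # the two values should agree closely
```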

If we now replace everything in our equation (after making the required gathering of terms that should be together), we get the dynamics of $x(t)$:

$$\frac{\text{d}}{\text{d}t}x = \frac{1}{2}\mu \sigma^2 \frac{(x - a\times e^x)(1-a\times e^x)}{d}$$
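
To get a feel for what this equation does, we can integrate it with a forward Euler scheme. This is a minimal sketch: every parameter value, the starting trait `x0`, the step `dt`, and the number of steps are assumptions chosen so that the trajectory has time to settle, and Euler is used only because it is the simplest scheme to write down.

```python
import math

# Illustrative parameter values (assumptions, not taken from the paper)
mu = 1e-3      # per capita mutation rate
sigma = 0.05   # standard deviation of mutation effects
a = 0.1        # scaling of the exponential cost
d = 0.001      # intra-specific competition rate

def dx_dt(x):
    """Right-hand side of the canonical equation for the logistic model with cost."""
    n_star = (x - a * math.exp(x)) / d
    gradient = 1.0 - a * math.exp(x)
    return 0.5 * mu * sigma**2 * n_star * gradient

def simulate(x0, dt=10.0, steps=50_000):
    """Forward Euler integration; returns the trait value after each step."""
    trajectory = [x0]
    for _ in range(steps):
        trajectory.append(trajectory[-1] + dt * dx_dt(trajectory[-1]))
    return trajectory

trajectory = simulate(x0=0.5)
# Final trait value, next to the value at which the selection gradient vanishes
print(trajectory[-1], math.log(1.0 / a))
```

Starting from a viable trait value below the point where the selection gradient vanishes, the trait increases and then stalls once $1 - a e^x$ reaches zero.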

What is interesting here is that we can look for values of $x$ at which this derivative is zero, which we will call $x^\star$. These values correspond to possible equilibria of the trait values, at which the usual scenarios of adaptive dynamics can occur. They come from the two factors in the numerator, as the solutions to $x - a\times e^x = 0$ and $1-a\times e^x = 0$.

Note that one solution actually comes from the expression of $N^\star(x)$; this is the solution to $x = ae^x$, which does not really have a neat solution other than $x^\star = -\text{W}(-a)$ (which, through numerical approximation, gives two solutions, the lowest one being the minimum viable value of $x$). The other solution, from $(1-a\times e^x)$, corresponds to the expected equilibrium value of $x(t)$, and is $x^\star = \mathrm{ln}(a^{-1})$.
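
For the numerically inclined, here is a small sketch that finds these values without a Lambert W implementation, using plain bisection on $x - ae^x$. The value of `a` and the bracketing intervals are assumptions that work for this particular choice; with a different `a` the brackets would need to be picked again.

```python
import math

a = 0.1  # illustrative cost scaling (assumption)

def viability(x):
    """Numerator of N*(x): positive when the resident population can persist."""
    return x - a * math.exp(x)

def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi] by bisection, assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_star = math.log(1.0 / a)            # root of 1 - a * exp(x)
x_min = bisect(viability, 0.0, 1.0)   # lower root of x = a * exp(x)
x_max = bisect(viability, 1.0, 5.0)   # upper root of x = a * exp(x)
print(x_min, x_star, x_max)
```

Because $1 - ae^x$ is the derivative of $x - ae^x$, the value $\mathrm{ln}(a^{-1})$ is also where the equilibrium population size peaks, so it sits between the two viability bounds.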

## What have we learned?

Adaptive dynamics is not always the most intuitive (or even flexible)
framework. In fact, there is a *very* recent proposal to expand the approach,
using oligomorphic dynamics, which is clearly something I will have to
spend more time with (but it works with multimodal trait distributions!).

Nevertheless. The canonical equation of adaptive dynamics *is* beautiful,
because it only brings what is required to predict how the trait will change
through time. What proportion of the mutations will fall on the right side of
the resident? $1/2$. How fast do mutations accumulate? $\mu$. How much variance
do we expect to see as a result of the mutation process? $\sigma^2$. How many
times is this process repeated? $N^{\star}$. And finally, in what direction and
how far is the trait “moving”? $[s_x(x')]'|_{x'=x}$.

Nothing more, nothing less.