# Scale, productivity, biodiversity, and curve-fitting

Revisiting an ecological classic

As part of the Advances in Community Ecology class I’m auditing (because I am on sabbatical, and so I can do things like that!), we re-read the classic Chase & Leibold (2002) paper. To summarize: by surveying the diversity of producers and animals in ponds, they show that the relationship between productivity and species richness is quadratic at the scale of individual ponds, but linear across aggregates of ponds in a watershed. This was a momentous paper: it was one of the first large-scale “natural experiments”, it reinforced the idea that scale can change the qualitative nature of the relationship, and it laid out some interesting hypotheses about the role of compositional dissimilarity on productivity gradients.

**Very** importantly, nothing in what follows changes (essentially) anything
about the conclusion of the original paper. All it does is give me an excuse to
be very pedantic about intercepts, and give a little walkthrough of how I would
address the problem of fitting a curve to some data in a way that is both
ecologically and statistically satisfactory, with the advantage of a 19-year
head start on the original paper. This is not a *criticism* (in the sense of
finding flaws) of the original paper; it is a *critique* (in the sense of
engaging with the material, even if it’s 19 years later).

## So what is the problem?

There is something in this paper that always bugged me: look at the relationship between the productivity and the richness of animals at the regional scale:

We can definitely fit a line through these points! In fact, we can do this with
ordinary least-squares curve fitting. If we eyeball the figure, we can guess
that the slope is about a half, and the intercept is small-ish, which we can use
to get initial values and bounds. I am using the `LsqFit` package for Julia,
which is really fast.
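For readers who want to try this step without Julia, here is a minimal sketch in plain Python. It is not the `LsqFit` code behind the numbers in this post: it uses the closed-form ordinary least-squares solution, and the `productivity` and `richness` values are made up to have roughly the shape of the figure, so the estimates will not match the ones reported below.

```python
# Made-up data with a saturating shape (NOT the values from Chase & Leibold 2002)
productivity = [5, 10, 20, 40, 60, 80, 100]
richness = [5, 7, 15, 21, 27, 29, 32]


def fit_line(x, y):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-products of deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(productivity, richness)
print(f"richness ~ {slope:.2f} x productivity + {intercept:.2f}")
```

Even on these invented numbers the fitted intercept comes out well above zero, which is the feature the rest of the post worries about.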

This gives an equation of

$$\text{richness} \approx 0.31\times \text{productivity} + 9.31,$$

and now it is time for my favorite thing to do with a model: thinking about the units! Richness is expressed in the unit of “species”, and productivity is measured as $\text{biomass} \times \text{surface}^{-1} \times \text{time}^{-1}$. We know that the result has unit “species”, so we can guess that the slope is expressed as $\text{species} / (\text{biomass} \times \text{surface}^{-1} \times \text{time}^{-1})$, and the intercept is expressed in species.

What does it mean?

Well, it means that in a watershed with no productivity, *i.e.* one where (in
the terms of the experiment), algae do not receive enough light to grow on a
surface, we expect to find 9.31 ± 2.7 species of
animals. You may recognize this as a statement that, although statistically
correct, makes little trophic sense: animals need to get their biomass from
somewhere.

## Oh, really?

Let’s have a look at the residuals.

At both low and high productivity, the linear model is *over*estimating species
richness. The RMSE for this fit is 4.14, which
is a useful baseline for what comes next.
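The residuals and the RMSE are easy to compute by hand. The sketch below is plain Python on made-up data (not the data from the paper), and the `slope` and `intercept` values are assumptions: they are the rounded least-squares line for those made-up points, not the 0.31 and 9.31 reported above.

```python
import math

# Made-up data with a saturating shape (NOT the values from Chase & Leibold 2002)
productivity = [5, 10, 20, 40, 60, 80, 100]
richness = [5, 7, 15, 21, 27, 29, 32]

# Assumed coefficients: the (rounded) least-squares line for the made-up data
slope, intercept = 0.284, 6.66


def residuals(observed, predicted):
    """Observed minus predicted: a negative value means the model overestimates."""
    return [o - p for o, p in zip(observed, predicted)]


def rmse(observed, predicted):
    """Root mean squared error between observations and predictions."""
    res = residuals(observed, predicted)
    return math.sqrt(sum(r * r for r in res) / len(res))


predicted = [slope * x + intercept for x in productivity]
for x, r in zip(productivity, residuals(richness, predicted)):
    print(f"productivity {x:>3}: residual {r:+.2f}")
print(f"RMSE: {rmse(richness, predicted):.2f}")
```

With these invented points the residuals are negative at both ends of the gradient and positive in the middle, the same pattern as in the figure: a straight line forced through a saturating cloud of points.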

The problem here is two-fold: we would ideally like to have a model that predicts “just about 0” species in a watershed with 0 productivity, and we would definitely like a more balanced distribution of the residuals.

We can solve the first issue by assuming that the relationship is linear, and fitting the model through the origin, which is simply $y = aX + 0$.

Let’s see how this compares:

The RMSE for the constrained fit is 6.5, which is worse than the unconstrained solution; it is also fairly obvious that the residuals are even more poorly distributed than in the previous case.
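The through-origin model has a one-line least-squares solution: minimizing $\sum (y - aX)^2$ gives $a = \sum Xy / \sum X^2$. Here is a minimal sketch in plain Python, again on made-up data rather than the data from the paper, so the RMSE will differ from the 6.5 reported above.

```python
import math

# Made-up data with a saturating shape (NOT the values from Chase & Leibold 2002)
productivity = [5, 10, 20, 40, 60, 80, 100]
richness = [5, 7, 15, 21, 27, 29, 32]


def fit_through_origin(x, y):
    """Least-squares slope for the intercept-free model y = a * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


a = fit_through_origin(productivity, richness)
squared_errors = [(y - a * x) ** 2 for x, y in zip(productivity, richness)]
rmse_origin = math.sqrt(sum(squared_errors) / len(squared_errors))
print(f"richness ~ {a:.2f} x productivity, RMSE = {rmse_origin:.2f}")
```

Forcing the line through zero can only increase the squared error relative to the unconstrained line, since the unconstrained fit is free to choose an intercept of zero if that were best.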

So by attempting to solve one of our problems (there shouldn’t be animals in an unproductive pond), we made the other one (the distribution of residuals doesn’t look like what we would like under a linear process) worse.

## So what?

Everything so far is done under the assumption that *the relationship between
productivity and biodiversity is linear*, and this got us nowhere; it’s time to
relax it. Luckily, two things behave almost exactly like lines: lines, and most
non-linear functions when given the right parameters and observed over the right
range of inputs. After having exhausted the linear approach, we can start
thinking about another model.

Two models come to mind: a quadratic model ($y = aX^2 + bX + c$), and the
Michaelis-Menten model ($y = (SX)/(K+X)$). Of these two, note that
Michaelis-Menten is guaranteed to go through the origin, and the quadratic one
*should* pass close to it as long as $c$ is small. For the record, I would be
happy with a non-zero $c$ as long as 0 is somewhere within the margin of error
for the estimate.

We can guesstimate the parameters for Michaelis-Menten, with $S$ being on the order of the maximum species richness, and $K$ being the productivity at which richness reaches half of $S$ (at $X = K$, $y = S/2$), which is probably about a productivity of 50. Let’s see how this fits.
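A Michaelis-Menten fit can be sketched without any fitting library. The version below is plain Python on made-up data, and it is not the Levenberg-Marquardt routine that `LsqFit` uses: it scans a grid of candidate $K$ values (the grid `range(1, 201)` is an arbitrary assumption) and, for each one, computes the best $S$ in closed form.

```python
import math

# Made-up data with a saturating shape (NOT the values from Chase & Leibold 2002)
productivity = [5, 10, 20, 40, 60, 80, 100]
richness = [5, 7, 15, 21, 27, 29, 32]


def michaelis_menten(x, S, K):
    """Saturating curve: 0 at x = 0, S / 2 at x = K, approaches S for large x."""
    return S * x / (K + x)


def fit_michaelis_menten(x, y, k_grid):
    """Grid search over K; for a fixed K the least-squares S has a closed form."""
    best = None
    for K in k_grid:
        f = [xi / (K + xi) for xi in x]  # model shape for S = 1
        S = sum(fi * yi for fi, yi in zip(f, y)) / sum(fi * fi for fi in f)
        sse = sum((yi - S * fi) ** 2 for fi, yi in zip(f, y))
        if best is None or sse < best[0]:
            best = (sse, S, K)
    sse, S, K = best
    return S, K, math.sqrt(sse / len(x))


S_hat, K_hat, rmse_mm = fit_michaelis_menten(productivity, richness, range(1, 201))
print(f"S = {S_hat:.1f}, K = {K_hat}, RMSE = {rmse_mm:.2f}")
print(f"predicted richness at zero productivity: {michaelis_menten(0, S_hat, K_hat)}")
```

The trick is that the model is linear in $S$ once $K$ is fixed, so only one parameter needs searching. A proper optimizer would give a finer estimate of $K$ and standard errors, which this sketch does not attempt.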

Better! This fit has an RMSE of 2.37, a bit more than half that of the linear fit (for the same number of parameters!). We can repeat the same process with a quadratic fit:

The RMSE for the quadratic model is 2.43, which is slightly worse than the Michaelis-Menten model (and costs one more parameter). The quadratic model predicts 0.9 ± 2.9 species in an unproductive watershed, which is fine because it includes 0, but let’s get rid of this model for now.
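The quadratic model is linear in its three coefficients, so it can be fitted by solving the normal equations directly. The sketch below does this in plain Python with a small Gaussian elimination, on made-up data rather than the data from the paper. It returns point estimates only, so it cannot reproduce the ± margin quoted above.

```python
# Made-up data with a saturating shape (NOT the values from Chase & Leibold 2002)
productivity = [5, 10, 20, 40, 60, 80, 100]
richness = [5, 7, 15, 21, 27, 29, 32]


def fit_quadratic(x, y):
    """Least squares for y = a*x^2 + b*x + c via the 3x3 normal equations."""
    rows = [[xi * xi, xi, 1.0] for xi in x]
    # Augmented matrix [A^T A | A^T y]
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(3)
    ]
    # Forward elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    beta = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        known = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - known) / m[i][i]
    return tuple(beta)


a, b, c = fit_quadratic(productivity, richness)
print(f"richness ~ {a:.4f} x^2 + {b:.3f} x + {c:.2f}")
```

The value of `c` is the predicted richness at zero productivity, which is the quantity to compare against the intercept of the straight-line fit.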

## What have we learned?

The relationship between productivity and biodiversity may not be exactly linear. If I had to pick, I would pick a Michaelis-Menten model, which in this case yields a maximum number of species of 48.0, which is reasonable given the reported maximal number of species (about 32, I think).

In concrete terms, it means that the relationship (at the regional scale) between productivity and biodiversity is definitely increasing, possibly monotonic, but unlikely to be linear. The great tragedy here is that the range of productivities measured does not allow for a clear answer: we can’t see whether the quadratic curve would be supported (by a more productive and less diverse watershed). You might notice that I am not invoking any ecological mechanisms here, because it is not really the point.

**But wait**! The residuals!

Their distribution is a little bit better. Letting go of the “linear” assumption solved our trophic problem (no productivity means no species), and made our statistical problem less problematic.