Scale, productivity, biodiversity, and curve-fitting
Revisiting an ecological classic
As part of the Advances in Community Ecology class I’m auditing (because I am on sabbatical, and so I can do things like that!), we re-read the classic Chase & Leibold (2002) paper. To summarize: by surveying the diversity of producers and animals in ponds, they show that the relationship between productivity and species richness is quadratic within a pond, but linear across aggregates of ponds in a watershed. This was a momentous paper: it was one of the first large-scale “natural experiments”, it reinforced the idea that scale can change the qualitative nature of the relationship, and it laid out some interesting hypotheses about the role of compositional dissimilarity on productivity gradients.
Very importantly, nothing in what follows changes (essentially) anything about the conclusion of the original paper. All it does is give me an excuse to be very pedantic about intercepts, and give a little walkthrough of how I would address the problem of fitting a curve to some data in a way that is both ecologically and statistically satisfactory, with the advantage of a 19-year head start on the original paper. This is not a criticism (in the sense of finding flaws) of the original paper; it is a critique (in the sense of engaging with the material, even if it’s 19 years later).
So what is the problem?
There is something in this paper that always bugged me: look at the relationship between the productivity and the richness of animals at the regional scale:
We can definitely fit a line through these points, using ordinary least-squares curve fitting. If we eyeball the figure, we can guess that the slope is about a half, and the intercept is small-ish, which we can use to get initial values and bounds. I am using the LsqFit package for Julia, which is really fast.
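As a minimal sketch, this is roughly what such a fit looks like with LsqFit. The data vectors are invented placeholders standing in for the points read off the figure (they are not the published values), and the names `linear`, `productivity`, and `richness`, as well as the specific bounds, are assumptions made for illustration.

```julia
using LsqFit

# Placeholder data: invented values, NOT the points from the original figure.
productivity = [20.0, 40.0, 60.0, 80.0, 100.0, 120.0]
richness = [14.0, 22.0, 29.0, 33.0, 36.0, 38.0]

# Linear model: p[1] is the slope, p[2] is the intercept.
linear(x, p) = p[1] .* x .+ p[2]

# Eyeballed starting point and (assumed) bounds: slope near 0.5, small intercept.
p0 = [0.5, 5.0]
lower = [0.0, 0.0]
upper = [1.0, 20.0]

fit = curve_fit(linear, productivity, richness, p0; lower=lower, upper=upper)

slope, intercept = fit.param
println("richness ≈ $(round(slope; digits=2)) × productivity + $(round(intercept; digits=2))")
```

The bounds only matter if the optimizer tries to wander off; for a well-behaved linear problem like this one, the result should match plain ordinary least squares.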
This gives an equation of
$$\text{richness} \approx 0.31\times \text{productivity} + 9.31,$$
and now it is time for my favorite thing to do with a model: thinking about the units! Richness is expressed in the unit of “species”, and productivity is measured as $\text{biomass} \times \text{surface}^{-1} \times \text{time}^{-1}$. We know that the result has unit “species”, so we can guess that the slope is expressed as $\text{species} / (\text{biomass} \times \text{surface}^{-1} \times \text{time}^{-1})$, and the intercept is expressed in species.
What does it mean?
Well, it means that in a watershed with no productivity, i.e. one where (in the terms of the experiment), algae do not receive enough light to grow on a surface, we expect to find 9.31 ± 2.7 species of animals. You may recognize this as a statement that, although statistically correct, makes little trophic sense: animals need to get their biomass from somewhere.
Oh, really?
Let’s have a look at the residuals.
At both low and high productivity, the linear model is overestimating species richness. The RMSE for this fit is 4.14, which is a useful baseline for what comes next.
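For reference, here is a small sketch of how residuals and the RMSE can be computed, assuming the usual definition (square root of the mean squared residual over all observations). The helper names `rmse` and `residuals` and the numbers are made up for illustration.

```julia
# Root-mean-square error between observations and model predictions.
rmse(observed, predicted) = sqrt(sum((observed .- predicted) .^ 2) / length(observed))

# Residuals as observed minus predicted: negative values mean the model overestimates.
residuals(observed, predicted) = observed .- predicted

# Placeholder values (invented, for illustration only).
observed = [14.0, 22.0, 29.0, 33.0]
predicted = [16.0, 21.0, 27.0, 35.0]

println(residuals(observed, predicted))
println(rmse(observed, predicted))
```

With this sign convention, a run of negative residuals at both ends of the productivity range is exactly the "overestimating at low and high productivity" pattern.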
The problem here is two-fold: we would ideally like to have a model that predicts “just about 0” species in a watershed with 0 productivity, and we would definitely like a more balanced distribution of the residuals.
We can solve the first issue by assuming that the relationship is linear, and fitting the model through the origin, which is simply $y = aX + 0$.
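For a line through the origin, the least-squares slope has a closed form, $a = \sum x_i y_i / \sum x_i^2$, so no optimizer is strictly needed. A short sketch with placeholder data (invented values; the function name `slope_through_origin` is an assumption of mine):

```julia
# Least-squares slope for y = a*x (no intercept): a = Σxy / Σx².
slope_through_origin(x, y) = sum(x .* y) / sum(x .^ 2)

# Placeholder data: invented values, not the points from the original figure.
productivity = [20.0, 40.0, 60.0, 80.0, 100.0, 120.0]
richness = [14.0, 22.0, 29.0, 33.0, 36.0, 38.0]

a = slope_through_origin(productivity, richness)
predicted = a .* productivity

# By construction, the prediction at zero productivity is exactly zero.
println("slope = ", round(a; digits=3))
println("residuals = ", round.(richness .- predicted; digits=2))
```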
Let’s see how this compares:
The RMSE for the constrained fit is 6.5, which is worse than the unconstrained solution; it is also fairly obvious that the residuals are even more poorly distributed than in the previous case.
So by attempting to solve one of our problems (there shouldn’t be animals in an unproductive pond), we made the other one (the distribution of residuals doesn’t look like what we would like under a linear process) worse.
So what?
Everything so far is done under the assumption that the relationship between productivity and biodiversity is linear, and this got us nowhere; it’s time to relax it. Luckily, two things behave almost exactly like lines: lines, and most non-linear functions when given the right parameters and observed over the right range of inputs. After having exhausted the linear approach, we can start thinking about another model.
Two models come to mind: a quadratic model ($y = aX^2 + bX + c$), and the Michaelis-Menten model ($y = (SX)/(K+X)$). Of these two, note that Michaelis-Menten is guaranteed to go through the origin, and the quadratic one should as long as $c$ is small. For the record, I would be happy with a non-zero $c$ as long as 0 is somewhere within the margin of error for the estimate.
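To make the behaviour at the origin explicit, here is what each model predicts at zero productivity, together with two properties of Michaelis-Menten that help when guessing parameters. These follow directly from the two formulas, assuming $K > 0$:

$$\begin{aligned}
\text{quadratic:}\quad & y(0) = c \\
\text{Michaelis-Menten:}\quad & y(0) = \frac{S \times 0}{K + 0} = 0, \qquad y(K) = \frac{SK}{2K} = \frac{S}{2}, \qquad \lim_{X \to \infty} y(X) = S
\end{aligned}$$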
We can guesstimate the parameters for Michaelis-Menten, with $S$ being on the order of the maximum species richness, and $K$ being the productivity at which richness reaches half of $S$, which is probably about a productivity of 50. Let’s see how this fits.
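A sketch of the Michaelis-Menten fit with LsqFit could look like the following. To keep it checkable, the data here are synthetic and noise-free, generated from known parameters (S = 45, K = 60, both chosen arbitrarily); they are not the observations from the paper, and the function and variable names are assumptions.

```julia
using LsqFit

# Michaelis-Menten: p[1] is S (asymptotic richness), p[2] is K (half-saturation).
michaelis_menten(x, p) = (p[1] .* x) ./ (p[2] .+ x)

# Synthetic, noise-free data from known (arbitrary) parameters, standing in
# for the real observations so that the fit can be checked.
true_params = [45.0, 60.0]
productivity = collect(10.0:10.0:150.0)
richness = michaelis_menten(productivity, true_params)

# Initial guesses: S near the largest observed richness, K around 50 (eyeballed).
p0 = [maximum(richness), 50.0]

fit = curve_fit(michaelis_menten, productivity, richness, p0)

S, K = fit.param
println("S ≈ $(round(S; digits=2)), K ≈ $(round(K; digits=2))")
```

Note that the largest observed richness underestimates S whenever the data stop short of the plateau, which is fine for a starting value but is worth remembering when reading the fitted asymptote.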
Better! This fit has an RMSE of 2.37, which is a little over half that of the unconstrained linear fit (for the same number of parameters!). We can repeat the same process with a quadratic fit:
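A sketch of the quadratic version, again with invented placeholder data rather than the published values. It assumes a version of LsqFit that provides `stderror` for fit results; whether the "±" values quoted in this post are standard errors or confidence half-widths is not something this sketch settles.

```julia
using LsqFit

# Placeholder data: invented values, NOT the points from the original figure.
productivity = [20.0, 40.0, 60.0, 80.0, 100.0, 120.0]
richness = [14.0, 22.0, 29.0, 33.0, 36.0, 38.0]

# Quadratic model: p[1] = a (curvature), p[2] = b (slope), p[3] = c (intercept).
quadratic(x, p) = p[1] .* x .^ 2 .+ p[2] .* x .+ p[3]

# Start from a straight line with slope 0.5 through the origin.
p0 = [0.0, 0.5, 0.0]

fit = curve_fit(quadratic, productivity, richness, p0)

a, b, c = fit.param
se = stderror(fit)  # standard error of each parameter

# The intercept c is the predicted richness at zero productivity.
println("c = $(round(c; digits=2)) (standard error $(round(se[3]; digits=2)))")
```

Because the quadratic model is linear in its parameters, the same estimates could also be obtained directly with a design matrix and the backslash operator.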
The RMSE for the quadratic model is 2.43, which is slightly worse than the Michaelis-Menten model (and costs one more parameter). The quadratic model predicts 0.9 ± 2.9 species in an unproductive watershed, which is fine because it includes 0, but let’s get rid of this model for now.
What have we learned?
The relationship between productivity and biodiversity may not be exactly linear. If I had to pick, I would pick a Michaelis-Menten model, which in this case yields a maximum number of species of 48.0, which is reasonable given the reported maximal number of species (about 32, I think).
In concrete terms, it means that the relationship (at the regional scale) between productivity and biodiversity is definitely increasing, possibly monotonic, but unlikely to be linear. The great tragedy here is that the range of productivities measured does not really allow for a clear answer: we can’t see whether the quadratic curve would be supported (by a more productive and less diverse watershed). You might notice that I am not invoking any ecological mechanisms here, because it is not really the point.
But wait! The residuals!
Their distribution is a little bit better. Letting go of the “linear” assumption solved our trophic problem (no productivity means no species), and made our statistical problem less problematic.