Approximate Bayesian Computation and tiny data in ecology

Every time I hear about Big Data in ecology, I cringe a little bit. Some of us may be lucky enough to have genuinely big data, but I believe this is the exception rather than the norm. And this is a good thing, because tiny data are extremely exciting: they offer the challenge of isolating a little bit of signal in a lot of noise, which is a perfect excuse to apply some really fun tools. One of my favorite approaches for really small data is ABC, Approximate Bayesian Computation. Let’s dig in!

It’s approximate, Bayesian, and computational. What’s not to like?

In a nutshell, ABC is a method to infer model parameters when the likelihood of the model is difficult to calculate (or, more honestly, when you cannot be bothered to calculate it). The Wikipedia page on ABC is well worth reading; it was adapted from a PLOS Computational Biology topic page. One of the most famous applications of ABC has to do with counting how many socks Karl Broman has, because dealing with quantitative techniques all day changes your brain in surprising ways.
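The sock example is easy to sketch with rejection ABC. This is a minimal reconstruction under stated assumptions, not the original analysis. As I remember it, the observation was eleven socks pulled from the laundry with no matching pair among them. The summary statistic here is simply the number of distinct socks among those drawn. The uniform prior on the total number of socks (11 to 100) and the Beta(15, 2) prior on the proportion of socks that come in pairs are assumptions chosen for illustration. Because the summary is discrete, a simulation is accepted only if it matches the observation exactly (zero tolerance).

```python
import random
import statistics

N_PICKED = 11  # socks pulled from the laundry (assumed observation: all distinct)

def distinct_socks_picked(n_socks, prop_pairs, rng):
    """Simulate one laundry and return how many distinct socks appear among N_PICKED."""
    n_pairs = round((n_socks // 2) * prop_pairs)
    n_singletons = n_socks - 2 * n_pairs
    # Paired socks share a label; each singleton sock gets its own label.
    socks = [i for i in range(n_pairs) for _ in range(2)]
    socks += list(range(n_pairs, n_pairs + n_singletons))
    return len(set(rng.sample(socks, N_PICKED)))

def abc_socks(n_draws=10000, seed=1):
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        n_socks = rng.randint(N_PICKED, 100)   # assumed uniform prior on total socks
        prop_pairs = rng.betavariate(15, 2)    # assumed prior: most socks come in pairs
        # Keep the draw only if the simulated summary equals the observed one.
        if distinct_socks_picked(n_socks, prop_pairs, rng) == N_PICKED:
            accepted.append(n_socks)
    return accepted

accepted = abc_socks()
print(len(accepted), statistics.median(accepted))
```

A larger drawer makes eleven distinct socks more likely, so the accepted values are pulled toward the upper end of the prior. How far they move depends heavily on the priors, which is worth keeping in mind with data this small.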

ABC works by comparing the output of a model to empirical data through summary statistics. Instead of comparing the raw data and the raw model output directly, both are first summarized, and these summaries are compared. Parameter values are drawn from a prior, the model is run with them, and if the simulated summaries are close enough to the empirical ones (within a chosen tolerance), those parameter values are kept. The collection of kept values approximates the posterior distribution of the parameters. It seems simple, because it actually is. The interesting feature here is that summary statistics are used, so a wide range of models can be compared to empirical data: as long as you can apply the same summary statistics to the model output, the model can be used in the context of ABC.
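A minimal rejection-ABC sketch of the procedure just described. The model (stochastic exponential growth of an abundance index), the two summary statistics (mean and standard deviation of yearly log growth rates), the uniform priors, the tolerance `epsilon` and the Euclidean distance are all assumptions chosen for illustration. The "empirical" series is simulated from known parameter values so that the result can be checked.

```python
import math
import random
import statistics

def simulate_abundance(r, sigma, n0=50.0, n_years=10, rng=random):
    """Hypothetical toy model: exponential growth with lognormal environmental noise."""
    series = [n0]
    for _ in range(n_years):
        series.append(series[-1] * math.exp(r + sigma * rng.gauss(0.0, 1.0)))
    return series

def summarize(series):
    """Summary statistics: mean and standard deviation of yearly log growth rates."""
    growth = [math.log(b / a) for a, b in zip(series, series[1:])]
    return (statistics.fmean(growth), statistics.stdev(growth))

def abc_rejection(observed, n_draws=20000, epsilon=0.05, seed=42):
    rng = random.Random(seed)
    s_obs = summarize(observed)
    accepted = []
    for _ in range(n_draws):
        r = rng.uniform(-0.5, 0.5)          # assumed prior on mean log growth rate
        sigma = rng.uniform(0.01, 0.5)      # assumed prior on noise standard deviation
        s_sim = summarize(simulate_abundance(r, sigma, rng=rng))
        if math.dist(s_obs, s_sim) < epsilon:   # "close enough" in summary space
            accepted.append((r, sigma))
    return accepted

# Synthetic "field" data generated with r = 0.1 and sigma = 0.2.
observed = simulate_abundance(0.1, 0.2, rng=random.Random(0))
posterior = abc_rejection(observed)
print(len(posterior), statistics.fmean(r for r, _ in posterior))
```

In practice the tolerance is often set as a quantile of the simulated distances rather than fixed in advance. A smaller tolerance gives a closer approximation to the posterior at the cost of fewer accepted draws.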

In ecology, this is a very important feature. It lets us use models that can be quite arbitrary, and because the method fits in the Bayesian framework, it estimates the distribution of the parameters of these models rather than single point values. In practice, because the amount of data going into the estimation can be small, ABC also allows inference based on tiny data. For example, ABC SMC (sequential Monte Carlo) schemes have been used to estimate parameters in age-structured population models, which would be very difficult to do if the likelihood had to be written down explicitly.
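For readers who have not met ABC SMC, here is a hedged sketch of one common formulation. It keeps a population of parameter values ("particles") and refines it over a sequence of shrinking tolerances. Each new population is built by resampling the previous one according to its weights, perturbing with a Gaussian kernel, and reweighting by prior density over the kernel mixture. The two-stage juvenile/adult model, the fixed adult survival `S_ADULT`, the observation error, the tolerance schedule and the priors are all hypothetical choices for illustration. The survey data are simulated with fecundity 2.2 and juvenile survival 0.3, so the output can be checked.

```python
import math
import random

S_ADULT = 0.7                          # adult survival, assumed known (hypothetical)
OBS_SD = 0.1                           # lognormal observation error (assumed)
N_YEARS = 6                            # a tiny survey: six years of stage counts
PRIOR = [(0.1, 3.0), (0.05, 1.0)]      # uniform bounds: fecundity, juvenile survival

def simulate(fecundity, s_juv, rng):
    """Two-stage juvenile/adult projection with noisy observations of both stages."""
    juv, adult = 20.0, 20.0
    observed = []
    for _ in range(N_YEARS):
        juv, adult = fecundity * adult, s_juv * juv + S_ADULT * adult
        observed.append((juv * math.exp(rng.gauss(0.0, OBS_SD)),
                         adult * math.exp(rng.gauss(0.0, OBS_SD))))
    return observed

def distance(a, b):
    """RMS difference in log abundance; with data this small the log series is the summary."""
    diffs = [math.log(x) - math.log(y)
             for year_a, year_b in zip(a, b) for x, y in zip(year_a, year_b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def in_prior(theta):
    return all(lo <= x <= hi for x, (lo, hi) in zip(theta, PRIOR))

def abc_smc(observed, epsilons=(1.0, 0.6, 0.35, 0.2), n_particles=300, seed=7):
    rng = random.Random(seed)
    particles, weights, scales = [], [], []
    for t, eps in enumerate(epsilons):
        if t > 0:
            # Kernel variance per parameter: twice the weighted variance of the last population.
            scales = []
            for d in range(len(PRIOR)):
                mean = sum(w * p[d] for p, w in zip(particles, weights))
                var = sum(w * (p[d] - mean) ** 2 for p, w in zip(particles, weights))
                scales.append(max(math.sqrt(2.0 * var), 1e-6))
        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            if t == 0:
                theta = [rng.uniform(lo, hi) for lo, hi in PRIOR]        # sample the prior
            else:
                parent = rng.choices(particles, weights=weights)[0]      # resample...
                theta = [rng.gauss(x, s) for x, s in zip(parent, scales)]  # ...and perturb
                if not in_prior(theta):
                    continue
            if distance(simulate(*theta, rng), observed) >= eps:
                continue
            if t == 0:
                weight = 1.0
            else:
                # Prior density (constant here) over the kernel mixture; the Gaussian
                # normalising constant is shared by every particle, so it cancels.
                mixture = sum(
                    wj * math.exp(-0.5 * sum(((a - b) / s) ** 2
                                             for a, b, s in zip(theta, pj, scales)))
                    for pj, wj in zip(particles, weights))
                weight = 1.0 / mixture
            new_particles.append(theta)
            new_weights.append(weight)
        total = sum(new_weights)
        particles, weights = new_particles, [w / total for w in new_weights]
    return particles, weights

observed = simulate(2.2, 0.3, random.Random(123))   # synthetic survey data
particles, weights = abc_smc(observed)
for d, name in enumerate(["fecundity", "juvenile survival"]):
    print(name, round(sum(w * p[d] for p, w in zip(particles, weights)), 2))
```

Shrinking the tolerance gradually lets each population concentrate the search where the previous one found good parameters, which is why SMC schemes usually need fewer simulations than plain rejection at the same final tolerance.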

One of the issues we have in ecology (and probably in other fields) is that models do not always adequately capture all of the subtlety of empirical data. ABC lets us work with a model that may be very phenomenological (remember, there is no need to produce a directly comparable dataset, as long as you can compute the same summary statistics on its output), and still get a “good enough” idea of the values and distributions of the underlying parameters.

What I like about this approach, and where I think its inferential power lies, is that we can build toy models and compare them to small amounts of data. Investigating the posterior distributions of the various parameters can then give insight into what is important, what is not, and how the values of different parameters relate to one another. At the core of computational thinking is the loop of problem formulation (empirical data), solution expression (conceptual model), and solution evaluation (ABC), and this way of working suits ABC very naturally.
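To make the point about relationships between parameters concrete, here is a hypothetical toy example that is not from the post: repeated counts of nests, where each nest is detected independently with some probability. With only five surveys and a single summary statistic (the mean count), the data mostly constrain the product of abundance and detection probability. The accepted values should therefore fall along a ridge and be strongly negatively correlated. The true values (60 nests, detection 0.3), the priors and the tolerance are assumptions, and the "field" counts are simulated.

```python
import random
import statistics

def simulate_surveys(n_nests, p_detect, n_surveys, rng):
    """Each survey detects each nest independently with probability p_detect."""
    return [sum(rng.random() < p_detect for _ in range(n_nests)) for _ in range(n_surveys)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def abc_posterior(observed, n_draws=10000, tolerance=1.0, seed=3):
    rng = random.Random(seed)
    target = statistics.fmean(observed)        # summary statistic: mean count
    accepted = []
    for _ in range(n_draws):
        n_nests = rng.randint(10, 100)         # assumed prior on abundance
        p_detect = rng.uniform(0.05, 1.0)      # assumed prior on detection probability
        sim = simulate_surveys(n_nests, p_detect, len(observed), rng)
        if abs(statistics.fmean(sim) - target) < tolerance:
            accepted.append((n_nests, p_detect))
    return accepted

observed = simulate_surveys(60, 0.3, 5, random.Random(11))   # synthetic "field" counts
posterior = abc_posterior(observed)
ns = [n for n, _ in posterior]
ps = [p for _, p in posterior]
print(len(posterior), round(pearson(ns, ps), 2))
```

A strong negative correlation here is a sign that these data cannot separate abundance from detection. Adding a summary that carries independent information (for instance the variance of the counts, or more surveys) would be one way to probe whether the ridge can be broken.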

I think this method can deliver some interesting results in ecology (as it already has in other fields). The links in this post should give you an idea of where to start. This is a surprisingly simple, yet astoundingly powerful method, and I definitely think it will gain more traction in ecology in the short term.