Every time I hear about Big Data in ecology, I cringe a little bit. Some of us may be lucky enough to have genuinely big data, but I believe this is the exception rather than the norm. And this is a good thing, because tiny data are extremely exciting – in short, they offer the challenge of isolating a little bit of signal in a lot of noise, and this is a perfect excuse to apply some really fun tools. And one of my favorite approaches for really small data is ABC, Approximate Bayesian Computation. Let’s dig in!
Don’t we love patterns? This is, after all, one of the purposes of ecology as a science: to describe and document what it is, exactly, that species and individuals and communities and ecosystems do. And in order to do so, we look at them, and transcribe what we see. And so if you happen to find a pattern, this is noteworthy ecology that should be published, whereas the lack of a pattern means the opposite. And now, please, this madness has to stop.
There are very few domains of ecology for which we know any general laws. This is particularly true in the “mess” that is community ecology. Even the most recent attempts at conceptual unification rarely go beyond the fact that community ecology is driven by selection, drift, speciation, and dispersal. But so is everything else, as far as living organisms are concerned. Not that we should abandon community ecology, but we need to recognize how little we are able to generalize any of the things we know. Can we replicate ecological results?
In a few weeks, I will be giving a talk at the Association Francophone pour le Savoir annual meeting at McGill University, about how advanced research computing (aka high performance computing) can accelerate discoveries in biodiversity sciences and ecology. Collecting data on any ecosystem, no matter how small, is painstaking. It is slow. It is expensive. And as a result, we have a relatively small amount of data. So what could advanced research computing possibly deliver?
I remember the first time I was surprised by a model. I was working on the conditions under which a mutualist can protect its host from a pathogen, and in particular whether the mutualist can persist or will be displaced by the pathogen (unless there are multiple populations connected by dispersal, the answer is no). What surprised me was how, in the end, the answer to this question depended on the relative values of three parameters. Of course, nothing in modeling should be surprising, because the model encompasses the entirety of its own rules, and so of course the answer is in there, waiting to be found. But where do the models come from?