Reproducing modelling papers is important and instructive.
These past few days, I have been re-reading a paper that I found very stimulating during my PhD, and I caught myself wanting to dig a little deeper into the mechanisms behind one particular result. Since the paper was published in the 1990s, I never even attempted to look for the code, and started writing my own implementation. It made me realize a few things along the way.
I never really understood this paper. Sure, I got the general message, and how the mechanisms interact to generate the main result. But I did not really grok it. Now that I am done with the re-implementation, I have a much deeper understanding of the results (and therefore, of the ecology behind them). In a way, this is because when talking about a model, words are not as efficient as equations, and when talking about a model with stochasticity and heuristics, words are not as efficient as code.
Reproducing the paper was also a good experience in that, for once, I knew what I should get. Not only because of the figures in the original paper (they may be wrong, after all), but because subsequent papers had confirmed some of the results. But this introduced a number of practical considerations, which turned the whole experience into fun detective work. Starting from the text, I wrote what I thought was the correct heuristic, then checked whether it matched the results (using unit tests, after extracting the values from the original plots). Most of the time, it didn’t. So I was left tweaking my implementation until it matched (in the end, it did).
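To make the detective work concrete, here is a minimal sketch of what such a test can look like, written in Python in a pytest style. Everything in it is hypothetical: the model function and the digitized numbers stand in for whatever quantity the original figures actually report, and the loose tolerance reflects the limited precision of reading values off a plot.

```python
# Minimal sketch of testing a re-implementation against values digitized
# from the original figures. The model function and the numbers below are
# hypothetical placeholders, not the paper's actual results.
import math

def equilibrium_occupancy(colonization: float, extinction: float) -> float:
    """Equilibrium patch occupancy for a Levins-type metapopulation, p* = 1 - e/c."""
    return max(0.0, 1.0 - extinction / colonization)

# (colonization, extinction, occupancy read off the plot with a digitizer)
DIGITIZED = [
    (0.5, 0.1, 0.80),
    (0.5, 0.2, 0.60),
    (0.5, 0.4, 0.20),
]

def test_matches_digitized_figure_values():
    for c, e, expected in DIGITIZED:
        # Loose tolerance: the figure, not the implementation, limits precision.
        assert math.isclose(equilibrium_occupancy(c, e), expected, abs_tol=0.05)
```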
This is a teachable moment, for two reasons. First, models are extremely sensitive to choices in implementation. It is never just a matter of writing the equations, then translating them into code. We make a series of decisions along the way, and this makes modelling papers no more (or no less) objective than any other form of inquiry. Second, what does replication even mean? Inferring the code from the text is not the best target, because nuances of the implementation are easily lost (or blurred, or forgotten) when writing. In a way, replicability becomes a game of finding the right way to come up with the same figures.
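As a hypothetical illustration of that first point (not the rule from the paper I was reproducing), take a sentence like “each individual produces λ offspring on average”. Two faithful-sounding readings of it already give two different models:

```python
# Two plausible translations of "each individual produces lambda offspring
# on average"; the text alone under-determines which one the authors meant.
import numpy as np

rng = np.random.default_rng(42)

def offspring_rounded(n_individuals: int, lam: float) -> int:
    # Reading 1: deterministic, round the expected total number of offspring.
    return round(n_individuals * lam)

def offspring_poisson(n_individuals: int, lam: float) -> int:
    # Reading 2: stochastic, draw each individual's offspring from a Poisson.
    return int(rng.poisson(lam, size=n_individuals).sum())

print(offspring_rounded(5, 1.4))   # always 7
print(offspring_poisson(5, 1.4))   # varies around 7; with small n, can even be 0
```

With small populations, the two readings diverge noticeably, which is enough to change the behaviour of a stochastic simulation without either one being an unfaithful translation of the text.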
What makes the exercise even more worthwhile is that these projects are now publishable units. ReScience, for example, is a whole journal dedicated to publishing replications and reference implementations of published papers. As we wrote in a recent paper describing a reference implementation:
We argue that providing the community with reference implementations of common models is an important task. First, implementing complex models can be difficult, and programming mistakes will bias the output of the simulations, and therefore the ecological interpretations we draw from them. Second, reference implementations facilitate the comparison of studies. Currently, comparing studies means not only comparing results but also comparing implementations – because not all code is public, a difference in results cannot be properly explained as an error in either study, and this eventually generates more uncertainty than it does answers. Finally, having a reference implementation eases reproducibility substantially. Specifically, it becomes enough to specify which version of the package was used, and to publish the script used to run the simulations (as we do in this manuscript). We fervently believe that more effort should be invested in providing the community with reference implementations of the models that represent cornerstones of our ecological understanding.
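For what that last point looks like in practice, here is a hedged sketch (with a made-up package name) of the kind of run script the quoted passage has in mind: record the exact version of the reference implementation and the random seed next to the output, so that naming the package version and publishing the script is enough to rerun the simulations.

```python
# Sketch of a run script that records provenance alongside simulation output.
# "reference_model" is a hypothetical package name, not a real library.
import json
import importlib.metadata

PACKAGE = "reference_model"  # hypothetical reference implementation
SEED = 20240101

metadata = {
    "package": PACKAGE,
    "version": importlib.metadata.version(PACKAGE),
    "seed": SEED,
}

# ... run the simulations here, seeding the package's RNG with SEED ...

with open("simulation-metadata.json", "w") as handle:
    json.dump(metadata, handle, indent=2)
```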