Computational ecology: currencies for collaboration

In this follow-up to the previous part of the manuscript on computational ecology, I explore some of the ways to facilitate collaborations between data users and data producers. You can read the first part to get up to speed, and then feel free to comment and give feedback.

An important step in furthering the integration of computational approaches into the workflow of ecological research is to establish currencies for collaborations. At the scale of individual researchers, research groups, and larger research communities alike, it is important to understand what each can contribute to the research effort. As ecological research is expected to be increasingly predictive and policy-relevant, and as fundamental research tends to tackle increasingly refined and complex questions, research problems will likely become more difficult to resolve. This is an incentive for collaborations that build on the skills specific to different approaches.

In an editorial in the New England Journal of Medicine, Longo and Drazen (2016) characterized scientists using previously published data as “research parasites” (backlash from a large part of the scientific community caused one of the authors to later retract the statement; see Drazen 2016). Although community ecologists would have realized anyway that the presence of parasites indicates a healthy ecosystem (Marcogliese 2005; Hudson, Dobson, and Lafferty 2006), this feeling that data re-analysis confers an unfair benefit, which is also expressed by ecologists (Mills et al. 2015), has to be addressed. It has no empirical support: Evans (2016) shows that the rate of data re-use in ecology is low and has a large delay, and he found no instances of re-analysing the same data for the same (or similar) purpose. There is a necessary delay between the moment data are available and the moment they are re-used (especially considering that data are, at the earliest, published at the same time as the paper). This delay is introduced by the need to understand the data, see how they can be combined, develop a research hypothesis, and so on.


On the other hand, there are multiple instances of combining datasets collected at different scales to address an entirely different question (see GBIF 2016 for an excellent showcase); it is more likely that data re-use is done with the intent of exploring different questions. It is also worth remembering that ecology already benefits immensely from data re-use: data collected by citizen scientists are used to generate estimates of biodiversity distribution, but also to set and refine conservation targets (Devictor, Whittaker, and Beltrame 2010). An overwhelming majority of our knowledge of bird richness and distribution comes from the eBird project (Sullivan et al. 2014; Sullivan et al. 2009), which is essentially fed by the unpaid work of citizen scientists.

With this in mind, there is no tip-toeing around the fact that computational ecologists will be data consumers, and these data will have to come from ecologists with active field programs (in addition to government, industry, and citizens). Recognizing that computational ecology needs these data as a condition for its continued existence and relevance should motivate the establishment of a way to credit and recognize the role of data producers (which is discussed in Poisot et al. 2016, in particular in the context of massive dataset aggregation). Data re-users must be extremely pro-active in the establishment of crediting mechanisms for data producers; as the availability of these data is crucial to computational approaches, and as we do not share any of the cost of collecting them, it behooves us to make sure that our research practices do not impose a cost on our colleagues with field or lab programs. Research funders could develop financial incentives for these collaborations, specifically by dedicating a part of the money to developing and implementing sound data archival and re-use strategies, or by encouraging researchers to re-use existing data when suitable data are available.