Expensive & Open Source Science

free as in 'holy cow that is a lot of money'

I have been playing around with the scc utility, which, in addition to counting lines of code and measuring complexity within a project, also estimates its development cost. And so, of course, I am going to use this as an opportunity to be a little provocative about data sharing (you’ll see why in a minute).
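The cost figures are not magic. As far as I understand it, scc derives them from the basic COCOMO model: lines of code are turned into person-months of effort, and effort is turned into dollars. The sketch below reproduces that calculation under stated assumptions: the “organic” project coefficients (2.4 and 1.05), and what I believe are scc’s default average wage (56,286 USD per year) and overhead multiplier (2.4). Check `scc --help` for the values your version actually uses. The function names are hypothetical, and scc may round intermediate values slightly differently.

```python
"""A sketch of a basic COCOMO "organic" cost estimate, similar to what scc reports.

Assumed parameters (check `scc --help` for your installed version):
a = 2.4, b = 1.05, average annual wage of 56,286 USD, overhead multiplier of 2.4.
"""

# Assumed basic COCOMO coefficients for an "organic" project
A_ORGANIC = 2.4
B_ORGANIC = 1.05


def effort_person_months(sloc: int, eaf: float = 1.0) -> float:
    """Effort in person-months: a * (KLOC ** b) * effort adjustment factor."""
    return A_ORGANIC * (sloc / 1000) ** B_ORGANIC * eaf


def estimated_cost(sloc: int, avg_wage: int = 56_286, overhead: float = 2.4) -> float:
    """Cost in USD: effort (person-months) * monthly wage * overhead multiplier."""
    monthly_wage = avg_wage / 12
    return effort_person_months(sloc) * monthly_wage * overhead


def sloc_for_cost(cost: float, avg_wage: int = 56_286, overhead: float = 2.4) -> float:
    """Invert the model: how many lines of code correspond to a given cost?"""
    effort = cost / ((avg_wage / 12) * overhead)
    return 1000 * (effort / A_ORGANIC) ** (1 / B_ORGANIC)


if __name__ == "__main__":
    print(f"{sloc_for_cost(100_000):,.0f} lines of code for a 100k USD estimate")
```

For instance, `sloc_for_cost(100_000)` returns roughly 3,500: under these assumptions, a 100k USD estimate corresponds to only a few thousand lines of code, which is a small package by most standards.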

I ran scc on the main Julia packages to which the lab contributed actively over the last five years; the results (in USD) are in the table below:

Package                     Development cost (USD)
BioEnergeticFoodWebs                       153,377
GBIF                                        35,783
EcologicalNetworks                         422,329
EcologicalNetworksPlots                     31,084
Mangal                                      44,163
NCBITaxonomy                                35,036
SimpleSDMLayers                            103,895
NeutralLandscapes                           29,829
Total                                      855,496

That’s correct! The package development activities of the lab are worth about five hundred thousand large double doubles at Tim Hortons, or about 1% of Canada’s daily Timmies intake.
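For the curious, here is the arithmetic behind the coffee comparison. The post does not state a price per coffee, so this sketch only sums the table and computes the price per double double implied by “about five hundred thousand” of them; it also ignores the USD to CAD conversion.

```python
# Development cost estimates (USD) from the table above
costs = {
    "BioEnergeticFoodWebs": 153_377,
    "GBIF": 35_783,
    "EcologicalNetworks": 422_329,
    "EcologicalNetworksPlots": 31_084,
    "Mangal": 44_163,
    "NCBITaxonomy": 35_036,
    "SimpleSDMLayers": 103_895,
    "NeutralLandscapes": 29_829,
}

total = sum(costs.values())

# Price per coffee implied by "about five hundred thousand" double doubles
implied_price = total / 500_000

print(f"Total: {total:,} USD")  # Total: 855,496 USD
print(f"Implied price per double double: {implied_price:.2f} USD")  # 1.71 USD
```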

For the last few years, I have been trying to make the point that treating data and code differently when it comes to sharing is hypocritical, and reveals an idea of ecology where the labor of field work is privatized, but the labor of tool building is distributed (and indeed, expected to be). The usual argument I hear in support of this position is: data collection is expensive.

This is correct.

But as the table above shows, so is software development: a moderately complex piece of code can represent a 100k USD gift to the community.

The differences between data sharing and software releases are two-fold.

First, software development has far less funding support. Looking back at old grant applications (and a number of these packages were developed without any grant at all), I could only track 33k USD that was earmarked for the work listed in this table, or about 4% of its estimated cost. I understand that it’s not possible to collect data for free, but the costs of software development are so well hidden that we expect the tools to materialize on their own and be accessible to all.

Second, software development enables a lot of research. None of the entries in the table are code that was written for a single project, never to be re-used. They may originate in a project, but they solve general problems. This is, indeed, the underlying philosophy of developing packages: we want them to be freely accessible, so our colleagues do not need to reinvent the wheel.

This brings me to my final point: software costs money. The FOSS that people use to analyze data they will never make public is only free in the sense that no one had to pay to use it. But a little valuation exercise like this one (which I do not think is especially relevant, as “cost” is different from “value” or “worth”) shows that, free as it may be, it is not cheap.

And so maybe, just maybe, we (as a community) could stop under-valuing its development so much.