Bipartite networks are not sampled well.
There is a whole sub-genre of the ecological network literature devoted to elucidating “the structure” of bipartite networks (parasite/host, pollinator/plant, …). I am, of course, guilty of contributing a few papers to this genre. The premise is that, by putting together enough data from different places, we may be able to infer some of the general mechanisms that shape different aspects of this structure.
After following a few conversations on Twitter, notably one by Emilio Bruna on the deficit of data from the South, and one by Katherine Crocker on the importance of territorial issues in science, I wanted to look at how well the currently available networks sample the latitudinal gradient (because this is the low-hanging fruit for these questions). Sampling the gradient well is important, because we can only describe well what we sample well; if our sample is biased towards certain regions, then our “general” trends implicitly assume that the unsampled regions do not matter.
How do bipartite ecological networks fare?
Poorly.
The map on the left shows the data currently in mangal.io (v2 coming soon). Looking at the raw data, I was surprised to see how rare latitudes below 0 were. Our knowledge of these systems is heavily biased towards the northern hemisphere. In fact, the distribution of latitudes for a subset of these data makes this a little clearer: not only do we sample more, overall, at higher latitudes, but the range of positive latitudes that is covered is also larger (0 to ~90, versus 0 to -60).
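For anyone who wants to run the same kind of check on their own collection of networks, here is a minimal sketch of the summary described above: how many sites fall in each hemisphere, how far north and south the sampling extends, and where the median site sits. It assumes the site latitudes have already been extracted as decimal degrees (positive north, negative south); the function name and the example latitudes are hypothetical and are not the mangal.io data.

```python
from statistics import median


def summarize_latitudes(latitudes):
    """Summarize how sampling effort is split between hemispheres.

    Latitudes are decimal degrees: positive north of the equator,
    negative south of it. Sites exactly on the equator (0) are counted
    as northern here, which is an arbitrary choice.
    """
    north = [lat for lat in latitudes if lat >= 0]
    south = [lat for lat in latitudes if lat < 0]
    return {
        "n_north": len(north),
        "n_south": len(south),
        # Extent of the sampled gradient in each direction.
        "northernmost": max(latitudes),
        "southernmost": min(latitudes),
        # A median well above 0 signals a northern bias in effort.
        "median_latitude": median(latitudes),
    }


# Hypothetical site latitudes, for illustration only.
example = [45.5, 52.1, 60.3, 38.7, 70.2, 12.4, -3.1, -22.9, -41.5]
print(summarize_latitudes(example))
```

A histogram of the same latitudes would show the shape of the distribution; the counts and extents are simply the quickest way to see the asymmetry.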
Part of this is because land mass is not distributed uniformly between the hemispheres, but this cannot be the only explanation. There are almost no data from Africa, and only a handful from South America. This is in stark contrast with the situation in Europe or North America.
This may be a problem.
As I mentioned earlier, our “general” inferences about networks are only really robust in the areas we describe well. At this point, although we can compare the structure of networks across space, any general discussion of latitudinal trends has to come with the caveat that some areas are virtually unsampled.
And we have some clues that the situation at lower latitudes is not simply a mirror of the situation at higher ones. When we reconstructed food webs from distribution and interaction data, we found no latitudinal gradient in connectance (and because network measures are collinear, this means that a latitudinal gradient in other measures is unlikely).
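Connectance is the fraction of possible interactions that are actually realized. For a bipartite network with T species in one level and B in the other, it is commonly computed as L / (T × B), where L is the number of observed links (for a unipartite food web with S species, the analogous quantity is usually L / S²). The sketch below assumes the network is stored as a 0/1 incidence matrix; the function name and the small example network are hypothetical.

```python
def bipartite_connectance(interactions):
    """Connectance of a bipartite network given as a 0/1 incidence matrix.

    Rows are species of one level (e.g. hosts), columns are species of
    the other (e.g. parasites); a 1 means the interaction was observed.
    Returns realized links divided by possible links, L / (T * B).
    """
    n_rows = len(interactions)
    n_cols = len(interactions[0])
    # Count every non-zero cell as one realized interaction.
    links = sum(sum(1 for x in row if x) for row in interactions)
    return links / (n_rows * n_cols)


# Hypothetical 3 x 4 network, for illustration only.
web = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
print(bipartite_connectance(web))  # 6 links / 12 possible = 0.5
```

Looking for a latitudinal gradient then amounts to relating this value to the latitude of each network, which is only as informative as the spread of latitudes in the sample.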
Of course, I firmly believe in aggregating data from different sources to infer more general rules. But we (myself very much included) need to be careful about the implicit assumptions: “sampled” does not mean “general”, “unsampled” does not mean “unimportant”, and “existing” does not mean “sufficient to make inference”.
By the way, if you were expecting a discussion of the sampling of one ecological network, then Pedro Jordano wrote one of the most insightful papers on the topic last year. Reading up on how difficult it is to sample a single network should put the problem of sampling all the networks everywhere in perspective…