Brain dump on drawing random networks — part 1

I’m back from a long hiatus, with some nerdery about ecological networks. There is an interesting idea in the analysis of ecological networks. By generating random networks under some null hypotheses, we can test whether the observed property can be explained by chance alone. If there is a deviation, then chance alone is not sufficient to explain the structure of the network, and therefore there must be some ecology involved. It’s a great idea on paper, but putting things in silico is a bit more complicated.

Let’s start with the basic idea: we have a bipartite network, and we want to know whether its observed nestedness can be explained by its connectance, its degree distribution, or the degree distribution of either sides. We reviewed the whole idea in @DelmBess18, and there are references to the original papers there.

I will be using a currently unreleased version of the EcologicalNetwork package. But the code is not really the important part. We will use a network from @HadfKras13.

using StatPlots
using EcologicalNetwork
N = convert(BinaryNetwork, web_of_life("A_HP_004"))

The measured nestedness of this network (measured with η) is

ERROR: MethodError: no method matching getindex(::Float64, ::Symbol)
. In a “real” analysis, we woul generate probability templates for all four null models; I will limit things to a single one (based on degree distribution):

P = null2(N)

Following @PoisCirt16, we have an expression for the value of η in a probabilistic network: we expect that under the null model, the nestedness of this network will be 0.54. This is lower than the observed value (but the expected connectance is equal to the observed connectance).

The interesting point here is that this value of nestedness on the probabilistic network is a good benchmark for any routine to generate a distribution of networks. So let’s start with this:

R = rand(P, 10_000) # See this beautiful syntax?

Once we have this sample, we can measure the nestedness of these 10000 networks, and plot the output:

nestedness(n) = η(n)
density(nestedness.(R), c=:skyblue, fill=(:skyblue, 0.4, 0), frame=:box, lab="", legend=:topleft)
vline!([nestedness(N)], c=:black, lw=3, lab="Obs.")
vline!([nestedness(P)], c="#cf6c25", lw=2, ls=:dot, lab="Prob.")
xaxis!("Nestedness", (0,1))

Something is not right. We know that the connectance of the probabilistic and observed networks are equal, so let’s see what is hapenning.

density(connectance.(R), c="#347e3c", fill=("#347e3c", 0.4, 0), frame=:box, lab="", legend=:topleft)
vline!([connectance(N)], c=:black, lw=3, lab="Obs.")
vline!([connectance(P)], c="#cf6c25", lw=2, ls=:dot, lab="Prob.")
xaxis!("Connectance", (0,1))

Weird. The randomly generated networks have the correct connectance.

One thing to which nestedness is particularly sensitive is the number of species, and more specifically its interaction with the degree distribution. So if we have some degenerate networks, with species that become unconnected, it may throw the values off. We can remove these networks:

R′ = filter(x -> !isdegenerate(x), R)

We are left with 3859 networks (that’s a lot of networks that were not suitable for the analysis). Let’s revise our distribution of nestedness now:

density(nestedness.(R′), c=:skyblue, fill=(:skyblue, 0.4, 0), frame=:box, lab="", legend=:topleft)
density!(nestedness.(R), c=:skyblue, lab="", ls=:dash)
vline!([nestedness(N)], c=:black, lw=3, lab="Obs.")
vline!([nestedness(P)], c="#cf6c25", lw=2, ls=:dot, lab="Prob.")
xaxis!("Nestedness", (0,1))

There isn’t much difference. The distribution is still off compared to the expected value from the probabilistic network, and the overall distribution hasn’t changed from the previous step (see the dashed blue line).

So let’s scale this up. We can take a bunch of networks and apply the same overall approach: measure the observed value, the probabilitic value, and generate a bunch of replicates under a null model (and average their values, for good measure):

hp = getfield.(filter(x -> contains(x.ID, "_HP_"), web_of_life()), :ID)
Ns = convert.(BinaryNetwork, web_of_life.(hp))
function randomnest(n)
  Rs = rand(null2(n), 5_000)
  # Keep the non-degenerate networks
  filter!(x -> !isdegenerate(x), Rs)
  # Only networks with the same richness
  filter!(x -> richness(x) == richness(n), Rs)
  # Return the average
  return mean(nestedness.(Rs))
randomnest (generic function with 1 method)

We can generate the three values for each network – this is the sort of thing that really should be ran in parallel, but hey.

obs = nestedness.(Ns)
per = nestedness.(null2.(Ns))
rnd = randomnest.(Ns)

Once this is done, we can plot the results:

p1 = plot([0,1],[0,1], frame=:box, c=:grey, ls=:dash, lab="", legend=:topleft)
scatter!(p1, obs, per, lab="Prob.", c="#d7752e", msw=0, ms=5)
scatter!(p1, obs, rnd, lab="Perm.", c="#249977", msw=0, ms=5)
xaxis!(p1, "Observed value", (0,1))
yaxis!(p1, "Expected value", (0,1))

p2 = plot([0,1],[0,1], frame=:box, c=:grey, ls=:dash, lab="", legend=:topleft)
scatter!(p2, per, rnd, msw=0, ms=5, c="#945daf", lab="")
xaxis!(p2, "Probabilistic value", (0,1))
yaxis!(p2, "Randomized value", (0,1))


In the first panel, we can see that the values of nestedness predicted from the probabilistic network and the randomized sample are lower than the observed value except for a very small number of them, and this only exacerbates when the observed nestedness increases. Eyeballing a linear relationship, we would guesstimate that unless the observed nestedness is below 0.25, it will always be significantly larger than expected by chance under this null model. In an interesting twist, @Theb13 identified the same threshold for the significance of modularity. This is interesting, in that it suggests that unless the property of interest is basically zero, using this type of null model to test its significance is not really useful.

The second panel is a tad more interesting: the average nestedness value of the randomized networks is always larger than the probabilistic expectation – and a quick eyeballing of the figure suggests that this relationship has a slope of 1, but a non-zero elevation: the bias accrued by drawing random networks is not stronger for high vs. low nestedness.

I will leave it at that for now, and revisit this in a follow-up looking at other randomization routines. Getting this whole process right matters, because it is used to determine the significance of measured network properties. Ideally we want this to have the appropriate statistical power to draw ecological inference based on the output.