# Brain dump on drawing random networks — part 1

I’m back from a long hiatus, with some nerdery about ecological networks. There
is an interesting idea in the analysis of ecological networks. By generating
random networks under some null hypotheses, we can test whether the observed
property can be explained by chance alone. If there is a deviation, then chance
alone is not sufficient to explain the structure of the network, and therefore
there must be some ecology involved. It’s a great idea on paper, but putting
things *in silico* is a bit more complicated.

Let’s start with the basic idea: we have a bipartite network, and we want to know whether its observed nestedness can be explained by its connectance, its degree distribution, or the degree distribution of either sides. We reviewed the whole idea in @DelmBess18, and there are references to the original papers there.

I will be using a *currently unreleased* version of the `EcologicalNetwork`

package. But the code is not really the important part. We will use a network
from @HadfKras13.

```
using StatPlots
using EcologicalNetwork
N = convert(BinaryNetwork, web_of_life("A_HP_004"))
```

The measured nestedness of this network (measured with η) is

ERROR: MethodError: no method matching getindex(::Float64, ::Symbol). In a “real” analysis, we woul generate probability templates for all four null models; I will limit things to a single one (based on degree distribution):

```
P = null2(N)
```

Following @PoisCirt16, we have an expression for the value of η in a probabilistic network: we expect that under the null model, the nestedness of this network will be 0.54. This is lower than the observed value (but the expected connectance is equal to the observed connectance).

The interesting point here is that this value of nestedness on the probabilistic network is a good benchmark for any routine to generate a distribution of networks. So let’s start with this:

```
R = rand(P, 10_000) # See this beautiful syntax?
```

Once we have this sample, we can measure the nestedness of these 10000 networks, and plot the output:

```
nestedness(n) = η(n)
density(nestedness.(R), c=:skyblue, fill=(:skyblue, 0.4, 0), frame=:box, lab="", legend=:topleft)
vline!([nestedness(N)], c=:black, lw=3, lab="Obs.")
vline!([nestedness(P)], c="#cf6c25", lw=2, ls=:dot, lab="Prob.")
xaxis!("Nestedness", (0,1))
yaxis!((0,8))
```

Something is not right. We know that the connectance of the probabilistic and observed networks are equal, so let’s see what is hapenning.

```
density(connectance.(R), c="#347e3c", fill=("#347e3c", 0.4, 0), frame=:box, lab="", legend=:topleft)
vline!([connectance(N)], c=:black, lw=3, lab="Obs.")
vline!([connectance(P)], c="#cf6c25", lw=2, ls=:dot, lab="Prob.")
xaxis!("Connectance", (0,1))
yaxis!((0,10))
```

**Weird**. The randomly generated networks have the correct connectance.

One thing to which nestedness is particularly sensitive is the number of species, and more specifically its interaction with the degree distribution. So if we have some degenerate networks, with species that become unconnected, it may throw the values off. We can remove these networks:

```
R′ = filter(x -> !isdegenerate(x), R)
```

We are left with
3859 networks (that’s *a lot* of networks that were
not suitable for the analysis). Let’s revise our distribution of nestedness now:

```
density(nestedness.(R′), c=:skyblue, fill=(:skyblue, 0.4, 0), frame=:box, lab="", legend=:topleft)
density!(nestedness.(R), c=:skyblue, lab="", ls=:dash)
vline!([nestedness(N)], c=:black, lw=3, lab="Obs.")
vline!([nestedness(P)], c="#cf6c25", lw=2, ls=:dot, lab="Prob.")
xaxis!("Nestedness", (0,1))
yaxis!((0,8))
```

There isn’t much difference. The distribution is still off compared to the expected value from the probabilistic network, and the overall distribution hasn’t changed from the previous step (see the dashed blue line).

So let’s scale this up. We can take a bunch of networks and apply the same overall approach: measure the observed value, the probabilitic value, and generate a bunch of replicates under a null model (and average their values, for good measure):

```
hp = getfield.(filter(x -> contains(x.ID, "_HP_"), web_of_life()), :ID)
Ns = convert.(BinaryNetwork, web_of_life.(hp))
function randomnest(n)
Rs = rand(null2(n), 5_000)
# Keep the non-degenerate networks
filter!(x -> !isdegenerate(x), Rs)
# Only networks with the same richness
filter!(x -> richness(x) == richness(n), Rs)
# Return the average
return mean(nestedness.(Rs))
end
```

```
randomnest (generic function with 1 method)
```

We can generate the three values for each network – this is the sort of thing that really should be ran in parallel, but hey.

```
obs = nestedness.(Ns)
per = nestedness.(null2.(Ns))
rnd = randomnest.(Ns)
```

Once this is done, we can plot the results:

```
p1 = plot([0,1],[0,1], frame=:box, c=:grey, ls=:dash, lab="", legend=:topleft)
scatter!(p1, obs, per, lab="Prob.", c="#d7752e", msw=0, ms=5)
scatter!(p1, obs, rnd, lab="Perm.", c="#249977", msw=0, ms=5)
xaxis!(p1, "Observed value", (0,1))
yaxis!(p1, "Expected value", (0,1))
p2 = plot([0,1],[0,1], frame=:box, c=:grey, ls=:dash, lab="", legend=:topleft)
scatter!(p2, per, rnd, msw=0, ms=5, c="#945daf", lab="")
xaxis!(p2, "Probabilistic value", (0,1))
yaxis!(p2, "Randomized value", (0,1))
plot(p1,p2)
```

In the first panel, we can see that the values of nestedness predicted from the
probabilistic network *and* the randomized sample are lower than the observed
value except for a very small number of them, and this only exacerbates when the
observed nestedness increases. Eyeballing a linear relationship, we would
guesstimate that unless the observed nestedness is below 0.25, it will *always*
be significantly larger than expected by chance under this null model. In an
interesting twist, @Theb13 identified the same threshold for the significance of
modularity. This is interesting, in that it suggests that unless the property of
interest is basically zero, using this type of null model to test its
significance is not really useful.

The second panel is a tad more interesting: the average nestedness value of the randomized networks is always larger than the probabilistic expectation – and a quick eyeballing of the figure suggests that this relationship has a slope of 1, but a non-zero elevation: the bias accrued by drawing random networks is not stronger for high vs. low nestedness.

I will leave it at that for now, and revisit this in a follow-up looking at other randomization routines. Getting this whole process right matters, because it is used to determine the significance of measured network properties. Ideally we want this to have the appropriate statistical power to draw ecological inference based on the output.