# Partitioning ecological network dissimilarity

## Criticism is easy, and art is difficult

The recent issue of *EcoSphere* had a deeply questionable article,
arguing essentially that my 2012 work on ecological network beta
diversity is wrong, and should not be used. The issue with this article is
that, although the arguments it presents seem reasonable if you read them with
a superficial knowledge of the topic, they are essentially misguided, because
they rely both on a flawed definition of what networks are, and on flawed
assumptions of what measures should do.

## Restating the problem

All the way back in 2012, we suggested that ecological networks need a β diversity measure. This was motivated by the fact that we have an α diversity for them (any local network we measure), and a γ diversity as well (the metaweb), but the piece in between was missing. Networks are a little more complex than species composition because not only do interactions vary across networks, species composition does as well.

Ultimately, we ended up being able to measure two things: the total dissimilarity of interactions ($\beta_{wn}$), and the dissimilarity of interactions between shared species ($\beta_{os}$). The basic idea was to replicate the approach to β-diversity by Koleff and colleagues, wherein dissimilarity is based on functions operating on the cardinality of sets: $A$ is the number of shared elements between the pair of objects to compare, and $B$ and $C$ are the number of unique elements to each of these objects.

Not all of the methods censused by Koleff and colleagues have desirable properties, but a few of the usual measures (like Sørensen’s) do.

One of the core messages of our paper was that $\beta_{wn} \ge \beta_{os}$, and that therefore we can extract a component of dissimilarity due to the fact that not all species are shared, which we call $\beta_{st}$ (for species turnover).

## Can we calculate the effect of turnover?

Absolutely.

The core idea is that $\beta_{wn} = \beta_{os} + \beta_{st}$, but because we cannot calculate $\beta_{st}$ directly, we get it from $\beta_{st} = \beta_{wn} - \beta_{os}$. At this point, an example *really* helps. Let us examine the case of the Sørensen dissimilarity (using lowercase letters to indicate the components without specifying what goes into them):

$$\frac{b + c}{2a + b + c}$$

Now, we can start decomposing the number of links in the two networks: network 1 has $L_1 = A + S_1 + U_1$ links (where $S_1$ and $U_1$ are respectively the number of unique links between species shared with network 2, and of unique links involving species unique to network 1), and network 2 has $L_2 = A + S_2 + U_2$ links. With this notation, we can write

$$\beta_{os} = \frac{S_1 + S_2}{2A + (S_1 + S_2)}$$

and

$$\beta_{wn} = \frac{U_1 + U_2 + S_1 + S_2}{2A + (S_1 + S_2) + (U_1 + U_2)}$$

We can simplify this notation with $S = (S_1 + S_2)$, and $U = (U_1 + U_2)$. Let’s rewrite things one more time, just to clarify:

$$\beta_{os} = \frac{S}{2A+S}$$

and

$$\beta_{wn} = \frac{S+U}{2A+S+U}$$
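The two expressions above are simple enough to evaluate by hand, but a minimal sketch in Python may help to experiment with them. The function names and the counts used here are illustrative assumptions, not part of any package, and the sketch assumes that $2A + S > 0$ (the shared species have at least one interaction).

```python
from fractions import Fraction

def beta_os(A, S):
    """Sorensen dissimilarity of interactions between shared species: S / (2A + S)."""
    return Fraction(S, 2 * A + S)

def beta_wn(A, S, U):
    """Sorensen dissimilarity of all interactions: (S + U) / (2A + S + U)."""
    return Fraction(S + U, 2 * A + S + U)

# Hypothetical counts: 10 shared links, 3 unique links among shared species,
# 5 unique links involving at least one non-shared species.
A, S, U = 10, 3, 5
os_value = beta_os(A, S)
wn_value = beta_wn(A, S, U)
print(os_value, wn_value)  # 3/23 2/7
```

Exact fractions are used instead of floats so that the values can be compared to the algebra without rounding.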

Now, the comparison between these two shows that $\beta_{os}$ is of the form $a/b$, and $\beta_{wn}$ of the form $(a+x)/(b+x)$. As long as $a \le b$ (true here, since $b - a = 2A \ge 0$) and $x \ge 0$ (always the case, since $x = U$ is a count of the unique links that include at least one non-shared species), we know that $(a+x)/(b+x) \ge a/b$.
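For completeness, the inequality can be checked in one line, assuming $b > 0$ and writing $a = S$, $b = 2A + S$, and $x = U$:

$$\frac{a+x}{b+x} - \frac{a}{b} = \frac{b(a+x) - a(b+x)}{b(b+x)} = \frac{x(b-a)}{b(b+x)} \ge 0$$

The numerator $x(b-a) = 2AU$ is a product of counts, so it can never be negative.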

In other words, $\beta_{wn} \ge \beta_{os}$, and as $0 \le \beta_{wn} \le 1$, there is a $\beta_{st}$ value that is also between 0 and 1. You can think of this value as a “residual”, if you want, or an additive constant, that bridges the total network distance to the overlapping subgraph dissimilarity.

With this established, we can get the expression of $\beta_{st}$. It does not really serve a purpose (yet), but let’s do this regardless:

$$\beta_{st} = \frac{S+U}{2A+S+U} - \frac{S}{2A+S}$$

This can be expressed as

$$\beta_{st} = \frac{(S+U)(2A+S) - S(2A+S+U)}{(2A+S+U)(2A+S)}$$

Well that’s a mouthful, so let’s simplify the numerator:

$$\beta_{st} = \frac{2AS - 2AS + S^2 - S^2 + SU - SU + 2AU}{(2A+S+U)(2A+S)}$$

Notice that all of the terms in the numerator cancel out except $2AU$, and we are left with

$$\beta_{st} = \frac{2AU}{(2A+S)(2A+S+U)}$$
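As a sanity check on the algebra, here is a short sketch that compares the difference $\beta_{wn} - \beta_{os}$ to the closed form over a grid of counts. The function names and the grid bounds are arbitrary choices made for illustration; $A$ starts at 1 so that no denominator is zero.

```python
from fractions import Fraction
from itertools import product

def beta_st_difference(A, S, U):
    """beta_st obtained as beta_wn - beta_os."""
    return Fraction(S + U, 2 * A + S + U) - Fraction(S, 2 * A + S)

def beta_st_closed_form(A, S, U):
    """beta_st obtained as 2AU / ((2A + S)(2A + S + U))."""
    return Fraction(2 * A * U, (2 * A + S) * (2 * A + S + U))

# Exhaustive comparison over a small grid of counts.
all_match = all(
    beta_st_difference(A, S, U) == beta_st_closed_form(A, S, U)
    for A, S, U in product(range(1, 15), range(0, 15), range(0, 15))
)
print(all_match)  # True
```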

This value is *not* a dissimilarity component; it is a measure of how much
dissimilarity in $\beta_{wn}$ is unexplained by $\beta_{os}$.

## What are the objections against this method?

I will focus on the two leading criticisms against what I outlined above, and explain why they are misguided.

### The denominators for all components should be the same

> However, this assumes that dissimilarities are additive, which is typically not the case due to their normalization (scaling) to values between zero and one. Common dissimilarity indices (Sørensen, Jaccard, Bray-Curtis, and others) consist of a numerator that sums up the differences and a denominator that achieves the normalization.

This hinges on a very narrow view of additivity. In short, the objection against such a partition is that the denominators are not the same when calculating $\beta_{wn}$ and $\beta_{os}$; this is a deeply flawed argument, and explaining why will need to go against the way ecologists think about networks.

Networks are *not* matrices. We usually represent them this way because this is
an easy shortcut we understand, and because a lot of algorithms work well on
matrices. But networks are *pairs*.

Specifically, a network is a pair $G = (V,E)$, where $V$ is the set of vertices
(species), and $E$ (the edges/interactions) is a set of *ordered pairs*, whose
elements are taken from $V$. If we want, we can partition this in the context of
network pairwise comparison as $G_1 = (V_a \cup V_1, E_a \cup E_{s1} \cup
E_{u1})$ and $G_2 = (V_a \cup V_2, E_a \cup E_{s2} \cup E_{u2})$, where the
$_{si}$ and $_{ui}$ indices indicate respectively the unique edges involving
only shared species, or involving at least one unique species (*i.e.* involving
species from $V_i$).

A central point of network dissimilarity is that $V$ is embedded in $E$ - if you go through all the pairs in $E$ and collect their unique elements, this is $V$. I realize that this removes species without any interactions, but this discussion is over and done with, and they overwhelmingly should not be included in ecological network analysis.
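The embedding of $V$ in $E$ can be made concrete with a few lines of Python. This is only a sketch: the species names are made up, and representing each interaction as a tuple in a `set` is an assumption made for the example.

```python
# A hypothetical network stored only as its set of ordered pairs E.
edges = {("fox", "rabbit"), ("fox", "vole"), ("owl", "vole")}

def vertices_from_edges(E):
    """Collect the unique elements found in the ordered pairs of E."""
    return {species for pair in E for species in pair}

V = vertices_from_edges(edges)
print(sorted(V))  # ['fox', 'owl', 'rabbit', 'vole']
```

A species without any interaction never appears in a pair, so it is absent from the recovered set of vertices.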

The question of the *proper* denominator to use is trivially answered by looking
at what remains when we remove the unique species: $g_1 = (V_a, E_a \cup
E_{s1})$, and $g_2 = (V_a, E_a \cup E_{s2})$. The denominator for the Sørensen
measure is:

$$2|E_a| + |E_{s1}| + |E_{s2}|$$

The alternative suggested in the article would be to use the *wrong*
denominator, *i.e.* counting the contribution of interactions whose species *do
not* exist in the subgraphs made of shared species. This makes neither
ecological nor mathematical sense, and this suggested normalisation is wrong.
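To make the partition of edges tangible, here is a minimal sketch that takes two networks stored as sets of ordered pairs, counts $A$, $S$, and $U$, and uses $2A + S$ as the denominator for $\beta_{os}$. The helper names and the two toy networks are assumptions made for the example.

```python
from fractions import Fraction

def vertices(E):
    """Species recovered from a set of (species, species) edges."""
    return {sp for edge in E for sp in edge}

def partition_counts(E1, E2):
    """Return the counts (A, S, U) for a pair of edge sets."""
    shared_species = vertices(E1) & vertices(E2)  # V_a

    def among_shared(edge):
        return all(sp in shared_species for sp in edge)

    A = len(E1 & E2)        # |E_a|, edges found in both networks
    unique_edges = E1 ^ E2  # edges found in exactly one network
    S = sum(1 for e in unique_edges if among_shared(e))  # |E_s1| + |E_s2|
    U = len(unique_edges) - S                            # |E_u1| + |E_u2|
    return A, S, U

# Two hypothetical networks: species c and z are unique to the first,
# species d is unique to the second.
E1 = {("a", "x"), ("a", "y"), ("b", "x"), ("c", "z")}
E2 = {("a", "x"), ("b", "y"), ("b", "x"), ("d", "x")}

A, S, U = partition_counts(E1, E2)
beta_os = Fraction(S, 2 * A + S)          # denominator counts only the subgraphs g1, g2
beta_wn = Fraction(S + U, 2 * A + S + U)  # denominator counts every edge
print(A, S, U, beta_os, beta_wn)  # 2 2 2 1/3 1/2
```

The edges `("c", "z")` and `("d", "x")` do not exist in the subgraphs of shared species, which is why they are left out of the denominator of $\beta_{os}$.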

### The turnover component should covary with species turnover but not connectance

The most baffling argument against our early partitioning is found in the caption to one of the figures (with R² that I do not think should be interpreted at all):

> An ideal measure of the rewiring component of total network dissimilarity (in an additive partitioning framework) should increase with the proportion of shared species, decrease with fidelity, and be independent of connectance; network size should only influence via the proportion of shared species, but not by total number of species.

Thinking this way removes what makes $\beta_{st}$ important in the first place: we can use it to figure out whether the difference comes from changes in connectance, or changes in species turnover.

But let’s take a look back at the formula for $\beta_{os}$:

$$\beta_{os} = \frac{S}{2A + S}$$

There is something here, very important in its absence: neither species
richness, nor the number of links in either network, appears. We can definitely
get the latter, but there is no point at which the former intervenes. This is a
desirable property of the framework! The fact that there are many shared species
does not tell us anything about whether they interact in the same way.
This was, in fact, a major result of our initial paper: $\beta_s$ (the species
dissimilarity) is *not* informative about $\beta_{os}$. It is possible to have a
lot of species in common that interact in different ways. Removing this
property defeats the entire purpose of measuring interaction β diversity!
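An extreme toy case illustrates this point. In the sketch below (a made-up example, with networks stored as sets of ordered pairs), the two networks have exactly the same species but no interaction in common. Because every species is shared, the subgraphs of shared species are the networks themselves, and $\beta_{os}$ is simply the Sørensen dissimilarity of the two edge sets.

```python
def sorensen(X, Y):
    """Sorensen dissimilarity (b + c) / (2a + b + c) between two sets."""
    a = len(X & Y)         # shared elements
    b_plus_c = len(X ^ Y)  # elements unique to either set
    return b_plus_c / (2 * a + b_plus_c)

def vertices(E):
    """Species recovered from a set of (species, species) edges."""
    return {sp for edge in E for sp in edge}

# Same four species, completely different interactions.
E1 = {("a", "x"), ("b", "y")}
E2 = {("a", "y"), ("b", "x")}

beta_s = sorensen(vertices(E1), vertices(E2))  # species dissimilarity
beta_os = sorensen(E1, E2)                     # valid here: all species are shared
print(beta_s, beta_os)  # 0.0 1.0
```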

This is where the misunderstandings in the recent papers are most obvious:

> an increasing proportion of shared species should theoretically increase the contribution of interactions in the shared species subweb (and thus of rewiring) to total network dissimilarity

This depends on the connectance (note that there is nothing to support the “theoretically” here). And this is precisely the effect that the author attempts to remove next. What actually matters is the connectance in the different blocks of the network, and specifically the connectance of each network we are comparing.

In brief, the fact that $\beta_{os}$ can be explained by connectance and species turnover is why we need to pay attention to $\beta_{st}$. The fact that at no point does species richness enter the calculation for the network dissimilarity components is a major strength, as it ensures that we can compare the results to measures building on richness without fear of circularity.

## Is the impact of shared species dissimilarity overestimated?

One of the arguments in the article is that in small networks (two nodes and four species), adding a unique edge with a unique species, and a unique edge between shared species, gives a result where $\beta_{os}$ is slightly larger than $\beta_{st}$.

Let’s run the numbers: if we have two networks with $n$ edges each, that are shared (remember, the species richness doesn’t matter here), we can write $A = n, U = 0, S = 0$. Let’s first add a link with a unique species, bringing us to $A = n, U = 1, S = 0$. You can check that the formula for $\beta_{os}$ will not be affected as it does not involve $U$. If we now add a link between shared species, we end up with $A = n, U = 1, S = 1$; note that we can check that the sum of links in the two networks is $2(n+1)$, and we can also observe that it doesn’t really matter which network receives which link, as Sørensen is a symmetrical measure.

Let’s see the effect of increasing $n$:

For very low values of $n$, there is a difference wherein $\beta_{os}$ is larger than $\beta_{st}$ – but this is an expected consequence of using the proper denominator. Note also that the effect goes away when the networks get realistically large (by about 12 interactions, there is no effect anymore).
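The numbers behind this comparison are easy to regenerate. The sketch below is an illustration of the calculation rather than the code used for the original figure, and the chosen values of $n$ are arbitrary; it fixes $S = 1$ and $U = 1$ and lets $A = n$ grow.

```python
from fractions import Fraction

def components(n):
    """Return (beta_os, beta_st) for A = n, S = 1, U = 1."""
    A, S, U = n, 1, 1
    beta_os = Fraction(S, 2 * A + S)
    beta_wn = Fraction(S + U, 2 * A + S + U)
    return beta_os, beta_wn - beta_os

def gap(n):
    """Difference beta_os - beta_st for a given number of shared links."""
    beta_os, beta_st = components(n)
    return beta_os - beta_st

for n in (1, 2, 4, 8, 12, 20):
    beta_os, beta_st = components(n)
    print(f"n={n:2d}  beta_os={float(beta_os):.4f}  "
          f"beta_st={float(beta_st):.4f}  gap={float(gap(n)):.4f}")
```

With these counts the gap works out to $1/((2n+1)(n+1))$, which is $1/6$ for $n = 1$ and already down to $1/325$ for $n = 12$.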

## Why do we care?

There are a few reasons.

First, the point of this new paper is to introduce another function in
`bipartite`, which is used by many network ecologists; this new function relies
on a flawed normalization, and poses a significant risk of “contaminating” the
literature with a sub-par method. This type of gung-ho addition to `bipartite`
is why it’s not in use in the lab; reference packages should implement methods
that work.

Second, we care because the endgame of measures of network dissimilarity is to compare them to measures of community structure - if they are corrected in such a way that we already know how they will covary with species turnover, or prevent them from changing in response to connectance, this essentially renders them meaningless.