# Quantitative dissimilarity of ecological networks

## And how (not) to measure it

One of the limitations of comparing ecological networks is that, at the moment, our framework for doing so is really good with binary interactions (presence/absence), but not necessarily adapted to quantitative networks in which interactions have an explicit strength associated with them. In this post, inspired by some discussions I had over email in the last few weeks, I will try to explore some solutions that would allow us to move from the current situation to a more general expression of beta-diversity that would account for interaction strengths.

Before I do so, an important clarification: when I talk about “ecological networks dissimilarity”, I am talking exclusively about our re-formulation of beta-diversity measures for ecological networks, as in @PoisCana12a. This means that I am interested in (i) comparing interactions between pairs of networks ($\mathcal{G}_1$ and $\mathcal{G}_2$), (ii) partitioning dissimilarity between the overall variation and the variation of interactions between common species, and (iii) expecting that any new development for quantitative information has to be “backwards compatible” with the previous approach.

To be explicit about the last point: binary interactions are a special
case of quantitative interactions, in which the interaction strength can be
either 0, or 1. So comparing two networks that have true/false values, or
comparing these networks where false becomes 0, and true becomes 1, should
yield the same result. This is very important, because this ensures that
the results are also compatible with the *probabilistic* networks, as laid
out in @PoisCirt16: replacing false by “a probability of 0” and true with
“a probability of 1” gives the exact same result. For all of these reasons,
I am *not* interested in approaches relying on matrix correlations. For one
thing, these have their own statistical issues to contend with; for another,
using them would make the two approaches very difficult to compare.

First things first, **how does the binary version work**? The short version is,
it relies on distributing interactions across three sets, called $a$, $b$,
and $c$. As in @KoleGast03, $a$ has the elements that are shared between
the two networks, $b$ has elements unique to $\mathcal{G}_2$, and $c$ has
elements unique to $\mathcal{G}_1$. The dissimilarity is given by applying
any function $f(a,b,c)$, for example

$$ \beta_{\text{W}}(a,b,c) = 2\times\frac{a+b+c}{2a+b+c}-1 $$

for Whittaker’s measure. There are many others, but this one will do.
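For concreteness, this measure is only one line of code. Here is a quick sketch in Python (the function name `beta_whittaker` is mine):

```python
def beta_whittaker(a, b, c):
    """Whittaker's beta-diversity from the three partition components."""
    return 2 * (a + b + c) / (2 * a + b + c) - 1

beta_whittaker(3, 0, 0)  # identical networks: 0.0
beta_whittaker(0, 1, 2)  # nothing shared: 1.0
```

As expected, the measure is 0 when everything is shared, and 1 when nothing is.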

The *twist*, so to speak, is that we can decide what to use to measure the
values of $a$, $b$, and $c$. We can use the two graphs directly, which results
in the overall dissimilarity ($wn$), or we can use the two induced subgraphs
restricted to the species common to both, which gives $os$, the dissimilarity
between shared species. Unless the measure of dissimilarity applied to $a$,
$b$, and $c$ has undesirable properties, we can show that $0 \le os \le wn \le
1$, and so we finally define $st = wn - os$, which is the part of dissimilarity
that happens because some species are unique to either network.

So far, so good. One important property of this entire thing is that,
because it is based on measuring the cardinality of sets, we have $a+b+c$
equal to the count of whatever we are counting in the union of $\mathcal{G}_1$
and $\mathcal{G}_2$. This is a *desirable property*, as it ensures that we
can aggregate networks through multiple applications of the union function.

This is (plus or minus a few subtleties) the core of the method to compare two ecological networks. I had a very specific point in mind for talking about unions in the previous paragraph. If we want to measure the overall dissimilarity ($wn$) between two networks, and if $E$ is a function returning the edges of a graph, then we can write

$$a = |E(\mathcal{G}_1\cap\mathcal{G}_2)|$$ $$b = |E(\mathcal{G}_2\setminus\mathcal{G}_1)|$$ $$c = |E(\mathcal{G}_1\setminus\mathcal{G}_2)|$$

In the same vein, computing $os$ requires

$$a = |E(g_1\cap g_2)|$$ $$b = |E(g_2\setminus g_1)|$$ $$c = |E(g_1\setminus g_2)|$$

where $g_1 = \mathcal{G}_1[V(\mathcal{G}_1)\cap V(\mathcal{G}_2)]$ (and likewise for $g_2$), in which $V$ is a function returning the vertices of a graph.
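Both computations are easy to sketch in code, representing each network as a set of (source, target) edge tuples. This toy representation, and the names, are mine; the networks are the ones used in the worked example later in this post:

```python
# Networks as sets of directed edges -- a minimal toy representation.
G1 = {("A", "i"), ("B", "i"), ("B", "j")}
G2 = {("A", "i"), ("B", "j"), ("C", "j")}

def abc(e1, e2):
    """Partition two edge sets into shared (a) and unique (b, c) counts."""
    return len(e1 & e2), len(e2 - e1), len(e1 - e2)

# Overall dissimilarity components (wn):
a, b, c = abc(G1, G2)  # (2, 1, 1)

# For os, keep only edges whose species occur in both networks:
V1 = {s for e in G1 for s in e}
V2 = {s for e in G2 for s in e}
shared = V1 & V2
g1 = {e for e in G1 if set(e) <= shared}
g2 = {e for e in G2 if set(e) <= shared}
a_os, b_os, c_os = abc(g1, g2)  # (2, 0, 1)
```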

So how do we make this quantitative? Let us zoom in to the level of a *pair*
of interactions, $x$ and $y$, where $x$ and $y$ involve the *same* species,
but in different networks. Because we are moving into a quantitative question,
we will assume that $x$ and $y$ represent the *strength* of these interactions,
with $x \ge 0$ and $y \ge 0$. **How much do they have in common**? It cannot
be more than the weakest interaction, so it would be tempting to define
(I will prefix every quantitative thing with a $q$ or a $Q$)

$$qa = \text{min}(x, y)$$

And so by analogy with the binary version, this results in

$$qb = y - qa$$ $$qc = x - qa$$

Whichever interaction is the weakest contributes 0 to either $qb$ or $qc$
(the difference goes to the other one). What happens to $qa + qb + qc$? It
is equal to $qa + y - qa + x - qa$, which simplifies to $x + y - qa$. We
*could* decide to have this sum be $x + y$ instead, which simply requires
setting $qa = 2\times \text{min}(x,y)$ and having $qb$ and $qc$ use $qa/2$,
but this is exactly as furiously *ad hoc* as it seems, and doing so would
require careful evaluation of the consequences.
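The pair-level bookkeeping is only a few lines (a sketch; the name `q_components` is mine):

```python
def q_components(x, y):
    """Pair-level components: x is the strength in G1, y the strength in G2."""
    qa = min(x, y)
    return qa, y - qa, x - qa  # qa, qb, qc

q_components(2, 1)  # (1, 0, 1)
q_components(1, 1)  # (1, 0, 0) -- the binary case falls out unchanged
```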

This works at the scale of one pair of interactions, but we can also make it scale up to the entire network. Let’s first define $\mathcal{M} = \mathcal{G}_1 \cup \mathcal{G}_2$; summing over every interaction in $\mathcal{M}$, we can write the $a$, $b$, and $c$ components for the entire network pair as $Qa = \sum qa$, $Qb = \sum qb$, and $Qc = \sum qc$.
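A minimal sketch of the network-level version, assuming each network is stored as a dictionary mapping an edge to its strength (the representation, and the name `Q_components`, are mine); interactions absent from a network are treated as having strength 0:

```python
def Q_components(w1, w2):
    """Network-level Qa, Qb, Qc, summed over the union of the edges."""
    Qa = Qb = Qc = 0
    for edge in set(w1) | set(w2):
        x, y = w1.get(edge, 0), w2.get(edge, 0)  # strength 0 if absent
        qa = min(x, y)
        Qa, Qb, Qc = Qa + qa, Qb + (y - qa), Qc + (x - qa)
    return Qa, Qb, Qc

# Using the pair of networks from the worked example in this post:
G1 = {("A", "i"): 2, ("B", "i"): 4, ("B", "j"): 6}
G2 = {("A", "i"): 1, ("B", "j"): 3, ("C", "j"): 5}
Q_components(G1, G2)  # (4, 5, 8)
```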

Remember that we want this approach to give the same result when we use a binary network, so let’s start by checking this. Let’s assume we have

$$\mathcal{G}_1 = \{(A,i,2), (B,i,4), (B,j,6)\}$$

and

$$\mathcal{G}_2 = \{(A,i,1), (B,j,3), (C,j,5)\}$$

This pair of networks has both species and interaction turnover, as well as different interaction strengths. If we count the number of interactions in common in the binary networks, we will find

$$a = |\{(A,i), (B,j)\}| = 2$$ $$b = |\{(C,j)\}| = 1$$ $$c = |\{(B,i)\}| = 1$$

If we remove any interactions involving unique species (here, $(C,j)$), we can measure the dissimilarity of interactions between shared species with

$$a = |\{(A,i), (B,j)\}| = 2$$ $$b = |\emptyset| = 0$$ $$c = |\{(B,i)\}| = 1$$

To ensure that the quantitative measure is comparable, all we need to do is
calculate $Qa$, $Qb$, and $Qc$, *but* with all interaction strengths set to
1. This is easier to visualize in a table:

Interaction | $\mathcal{G}_1$ | $\mathcal{G}_2$ | $qa$ | $qb$ | $qc$
---|---|---|---|---|---
$(A, i)$ | 1 | 1 | 1 | 0 | 0
$(B, i)$ | 1 | 0 | 0 | 0 | 1
$(B, j)$ | 1 | 1 | 1 | 0 | 0
$(C, j)$ | 0 | 1 | 0 | 1 | 0

It is simple to confirm (by summing the columns) that, if we account for $(C,
j)$, $Qa = 2$, $Qb = 1$, and $Qc = 1$, and if we do not, $Qa = 2$, $Qb =
0$, and $Qc = 1$. **The quantitative version works**.

We can do the same thing with the actual interaction strengths:

Interaction | $\mathcal{G}_1$ | $\mathcal{G}_2$ | $qa$ | $qb$ | $qc$
---|---|---|---|---|---
$(A, i)$ | 2 | 1 | 1 | 0 | 1
$(B, i)$ | 4 | 0 | 0 | 0 | 4
$(B, j)$ | 6 | 3 | 3 | 0 | 3
$(C, j)$ | 0 | 5 | 0 | 5 | 0

We can do the same thing as above (summing the columns), to have $Qa = 4$, $Qb = 5$, and $Qc = 8$ for the whole network comparison, and $Qa = 4$, $Qb = 0$, and $Qc = 8$ for the shared species comparison.

Let’s plug these numbers in the beta diversity function which I mentioned about 500 words ago:

version | $wn$ | $os$ | $st$
---|---|---|---
binary | 0.33 | 0.2 | 0.13
quantitative | 0.62 | 0.5 | 0.12
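These values are easy to reproduce, rounding to two decimals (the function name `beta_w` is mine):

```python
def beta_w(a, b, c):
    """Whittaker's dissimilarity, as defined earlier in the post."""
    return 2 * (a + b + c) / (2 * a + b + c) - 1

wn_bin, os_bin = beta_w(2, 1, 1), beta_w(2, 0, 1)
wn_q, os_q = beta_w(4, 5, 8), beta_w(4, 0, 8)

print(round(wn_bin, 2), round(os_bin, 2), round(wn_bin - os_bin, 2))  # 0.33 0.2 0.13
print(round(wn_q, 2), round(os_q, 2), round(wn_q - os_q, 2))          # 0.62 0.5 0.12
```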

**Let us take a few steps back**. This describes a way of partitioning the
dissimilarity of quantitative networks, using the exact same approach as the
binary framework, without needing to resort to matrix correlation tests, and
in a way that gives the same result when quantitative information is removed.

**Success**? **Not quite**.

To be honest, I would use this if I had to (after performing an
actual validation of this method on simulated and empirical data, *etc*),
but there is something I do not quite like about the overall approach: it
assumes that the strength of an interaction is the same thing as, and can be
reduced to, the existence of the interaction. When we first started thinking
more deeply about the causes of network dissimilarity [@PoisStou15], not by
looking at numbers but by relating the technique to the ecological processes,
it became clear that there are processes involved in *making interactions
happen*, and other processes involved in *making interactions strong*.

To make a parallel with species distributions and species abundances, this
problem deserves to be considered as a variant of the problem outlined by
@BoulGrav12: first, consider if the interaction/occurrence happens; second,
infer its strength/density. If we do this, the quantitative dissimilarity needs
to be partitioned further, because some interactions will have dissimilarity
owing to the fact that they *do not* occur in one of the networks. As a direct
consequence, part of the quantitative dissimilarity is explained by the binary
dissimilarity, and the solution presented here is masking some of the details.

Another issue with this approach is that the definition of the metaweb
becomes less clear. In the binary version, the metaweb (*i.e.* the list of
all interactions across multiple measures of local networks) is simple to
define, as it is the union of all networks in the set. In the quantitative
version, what to do is not quite obvious. Should we take the maximum value
of every interaction? If so, the quantitative metaweb will be unreasonably
strongly connected. The minimum value? It will converge to 0. The minimum of
non-zero values? It would make the quantitative metaweb unreasonably weakly
connected. This is not a trivial decision *at all*, because a big part of our
network beta-diversity framework is that it allows measuring how far a single
realization departs from the metaweb whence it came. This is essentially
a way of answering the question, “how different is this locality from what we
expect knowing the rest of the region?”. This is, possibly, a question we would
like to answer with quantitative data as well; so this requires coming up
with a rule for the quantitative metaweb that makes sense within this context.

**In short**, the solution in this blog post is *correct* (I think). But it is
*incomplete*, in that it does not separate the components of dissimilarity
into what comes from the binary dissimilarity and what comes from actual
differences in interaction strength. It also does not allow us to define a
metaweb (it is definitely possible, but this blog post is already long enough
as it is) in a way that can be justified as anything other than *ad hoc*
improvisation. There is clearly
something in here worth exploring (and we are, in fact, exploring it at the
moment); but it is easy to make grave mistakes when applying tools in a way
that is not what they were designed for. Sometimes, the best practice is to
not do anything until we have adapted the tools properly.