Julia in ecology: why multiple dispatch is good

In what is going to be the most technical note so far, I will try to reflect on a few years of using the Julia programming language for computational ecology projects. In particular, I will discuss how multiple dispatch changed my life (for the better), and how it can be used to make ecological analyses streamlined. I will most likely add a few entries to this series during the fall, leading up to a class I will give in the winter.

But what is multiple dispatch?

Imagine a recipe that calls for onions, and all you have left in the cupboard is shallots. You know that shallots are little delicate bundles of gustatory pleasure, and so you cook them differently (butter and half a teaspoon of sugar), extra gently. And when it’s done, you add them to the rest of the ingredients. This is multiple dispatch.

In computer terms now, we can express this recipe as the following pseudocode:

function cook(x::Onion)
    return fry(x, butter)
end

function cook(x::Shallot)
    return roast(x, butter, sugar)
end

If x is an onion, then we fry it. If it is a shallot, we roast it. The important point is that the interface is the same: no matter what x is, we can cook it.
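
As a minimal, runnable sketch of the recipe above: the types Allium, Onion, and Shallot are hypothetical and exist only for this illustration, and strings stand in for the actual cooking steps.

```julia
# Hypothetical ingredient types, introduced only for this illustration.
abstract type Allium end
struct Onion <: Allium end
struct Shallot <: Allium end

# One method per ingredient type; the strings stand in for real cooking steps.
cook(x::Onion) = "fried in butter"
cook(x::Shallot) = "roasted with butter and sugar"

# The call site never inspects the type: Julia selects the matching method.
for ingredient in (Onion(), Shallot())
    println(typeof(ingredient), ": ", cook(ingredient))
end
```

The loop calls cook the same way for both ingredients, which is the "same interface" point made above.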

And where is the ecology in that?

Let’s talk about species interaction networks! One of the things that has been bugging me for a while is that we have no good, common interface to analyze them. There are a variety of packages that are either specific to some types of networks, or specific to some measures, or (worse) both. This is because there are many different types of ecological networks.

Or are there? In EcologicalNetwork.jl, I reduced them to a combination of two factors: are they bipartite or unipartite, and are they quantitative, probabilistic, or deterministic?

In Julia, this can be expressed by a number of types and unions of types, and this hierarchy allows us to create a number of functions that have the same name but behave in the correct way based on their input. For example, the number of species in a network is calculated differently depending on whether it is bipartite or unipartite:

function richness(N::Bipartite)
    return sum(size(N.A))
end

function richness(N::Unipartite)
    return size(N.A, 1)
end
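
To make this concrete, here is a self-contained sketch of what such a hierarchy could look like. The names AbstractNetwork, BipartiteNet, and UnipartiteNet are hypothetical, and the actual types in EcologicalNetwork.jl may well be organised differently. I am also assuming that the element type of the matrix could carry the second factor (Bool for deterministic, a floating point number for probabilistic or quantitative).

```julia
# Hypothetical type layout; the real EcologicalNetwork.jl types may differ.
abstract type AbstractNetwork end

struct BipartiteNet{T} <: AbstractNetwork
    A::Matrix{T}   # rows and columns hold two different sets of species
end

struct UnipartiteNet{T} <: AbstractNetwork
    A::Matrix{T}   # square matrix: the same species on rows and columns
end

# Bipartite: species are split across the two margins, so add both dimensions.
richness(N::BipartiteNet) = sum(size(N.A))
# Unipartite: every species appears on both margins, so count the rows only.
richness(N::UnipartiteNet) = size(N.A, 1)

B = BipartiteNet(Bool[1 0 1; 0 1 1])           # 2 rows + 3 columns
U = UnipartiteNet(Bool[0 1 0; 0 0 1; 1 0 0])   # 3 species in total
println(richness(B), " ", richness(U))
```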

This becomes more interesting when we start chaining functions. For example, we can take an empirical network, generate the probabilistic version for a null model, then generate replicates, and finally measure the nestedness on every replicate:

using EcologicalNetwork
ollerton() |> null2 |> nullmodel .|> (x) -> nodf(x)[1]

This line takes advantage of the fact that each function will make the “right” decision based on the type of its input. Specifically, it goes this way: the empirical network is bipartite and deterministic. The null2 function generates a probabilistic network, which is also bipartite. This is passed to nullmodel, which will generate a number of bipartite deterministic networks. All of them are then passed through the nodf function to measure their nestedness.

And the resulting pipeline is also clear to read, and expresses what we want to do (how we do it is determined based on the types). As a consequence, we can have a much more general package for network analysis.
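
The same pipeline shape can be reproduced without the package, which may help to see what the types are doing. Everything in this sketch is a stand-in: DetNet, ProbNet, toy_null, toy_draws, and connectance are hypothetical names, the null model is a simple degree-based average that I am assuming for illustration (it is not claimed to match null2), and connectance replaces nestedness to keep things short.

```julia
using Random

# Hypothetical stand-ins for the package's network types.
struct DetNet
    A::Matrix{Bool}      # deterministic bipartite network
end
struct ProbNet
    A::Matrix{Float64}   # probabilistic bipartite network
end

# Assumed null model: average of the row and column fill proportions.
function toy_null(N::DetNet)
    rows = sum(N.A; dims=2) ./ size(N.A, 2)
    cols = sum(N.A; dims=1) ./ size(N.A, 1)
    return ProbNet((rows .+ cols) ./ 2)
end

# Draw n deterministic replicates from a probabilistic network.
function toy_draws(P::ProbNet; n=10, rng=Random.default_rng())
    return [DetNet(Matrix{Bool}(rand(rng, size(P.A)...) .< P.A)) for _ in 1:n]
end

# Connectance (proportion of realised links), a stand-in for nestedness.
connectance(N::DetNet) = sum(N.A) / length(N.A)

empirical = DetNet(Bool[1 1 1; 1 1 0; 1 0 0])
scores = empirical |> toy_null |> toy_draws .|> connectance
println(scores)
```

Each step accepts only the type that the previous step returns, so feeding a probabilistic network to connectance would raise a MethodError instead of silently computing something meaningless.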

But why does this matter?

Because, in short, it lets us express what we want to do (and yes, there are other paradigms that let us do the same thing). A good example would be measuring the diversity of an ecological community. Let’s say we have a site-by-species matrix, and this matrix has presence/absence data. We can measure diversity as the number of species at each site, which is the sum of each row:

function diversity(x::Array{Bool,2})
    # Row sums; the dims keyword is the Julia 1.x spelling of sum(x, 2)
    return sum(x; dims=2)
end

But if we have quantitative information, then we may want to apply Pielou’s measure on each row instead:

function diversity(x::Array{<:Number,2})
    # <:Number is needed: Array{Float64,2} is not a subtype of Array{Number,2}
    return mapslices(pielou, x; dims=2)
end

In the case where we have a phylogenetic tree, then what about using PD?

function diversity(x::Array{<:Number,2}, t::PhyloTree)
    return mapslices(n -> pd(n, t), x; dims=2)
end

And so on and so forth. In all of these situations, we know that the same concept (diversity) means different things as a function of the context – and for this reason, we want to do different things.
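
Here is a self-contained sketch of the first two cases. The pielou function below is my own minimal version of Pielou's evenness (Shannon entropy divided by the log of the number of species present), with an arbitrary convention of returning 0 when a site has fewer than two species. The phylogenetic method is left out because it would need a tree type.

```julia
# Pielou's evenness: J = H / log(S), with H the Shannon entropy and S the
# number of species present at the site.
function pielou(n::AbstractVector{<:Real})
    p = n[n .> 0] ./ sum(n)
    length(p) <= 1 && return 0.0   # convention chosen here for 0 or 1 species
    return -sum(p .* log.(p)) / log(length(p))
end

# Presence/absence data: richness per site (row sums).
diversity(x::Matrix{Bool}) = sum(x; dims=2)
# Any other real-valued matrix: evenness per site.
diversity(x::Matrix{<:Real}) = mapslices(pielou, x; dims=2)

presence = Bool[1 0 1; 1 1 1]
counts = [9 1 0; 5 5 5]

println(diversity(presence))   # richness of each site
println(diversity(counts))     # evenness of each site
```

Bool is itself a subtype of Real, so a Matrix{Bool} matches both methods. Julia picks the more specific one, which is why the presence/absence matrix gets row sums and not evenness.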

I like this approach because it lets me focus on the intent of what I want to do. The (still young) EcoJulia project led by Michael Krabbe Borregaard is an attempt to use some of the niftiest features of Julia to develop general interfaces to some types of ecological data. This is something I am really excited to see happen.