SDM with bioclim in three lines of code
Or why API mattersIn this post, we will walk through the process of implementing the bioclim species distribution model in Julia, and use it to make a simple prediction based on bioclimatic data. Most of the code is based on SimpleSDMLayers.jl, an in-development package.
Getting the data
To start with, let’s get some data about amphibians in Europe:
# Prepare a folder to store the data
ispath("data") || mkdir("data")
data_folder = joinpath("data", "bioclim-sdm")
ispath(data_folder) || mkdir(data_folder)
# Download the data
_project_root = "https://raw.githubusercontent.com/EcoJulia/SpatialEcology.jl/master/"
amph_file = joinpath(data_folder, "amphibians.csv")
isfile(amph_file) || download(_project_root*"data/amph_Europe.csv", amph_file)
Our next step will be to read these data, and get the occurrences for the first
species. We can do the next step with readlines
and split
and a few
convert
s, but since this is a csv file, let’s load it properly using CSV
and
DataFrames
:
import CSV
using DataFrames
amphibians = CSV.read(amph_file);
We will focus on the first species (Salamandra salamandra), and we specifically want to get the latitudes and longitudes at which it has been observed:
occupied = amphibians[:Salamandra_salamandra] .> 0.0
salamandra = amphibians[occupied,[:Lat, :Long]]
rename!(salamandra, (:Lat => :latitude), (:Long => :longitude))
salamandra[1:10,:]
latitude | longitude | |
---|---|---|
Float64 | Float64 | |
1 | 46.5 | 17.5 |
2 | 47.5 | 17.5 |
3 | 37.5 | 24.5 |
4 | 38.5 | 24.5 |
5 | 38.5 | 23.5 |
6 | 38.5 | 22.5 |
7 | 39.5 | 23.5 |
8 | 43.5 | 16.5 |
9 | 44.5 | 14.5 |
10 | 45.5 | 14.5 |
Note that in this example, we would have data about absences, but let’s
disregard this for this illustration. Now, we will need to download some
bioclimatic data, so let’s get started. As an illustration, we will get only the
temperature and precipitation from the worldclim 2.0 database, which can get
downloaded directly using SimpleSDMLayers
:
using SimpleSDMLayers
using Plots
assets_path = joinpath("data", "assets")
ispath(assets_path) || mkdir(assets_path)
temperature, precipitation = worldclim([1,12]; path=assets_path)
heatmap(temperature, c=:Oranges, frame=:box, size=(800,400), dpi=120)

Map of temperatures from the worldclim data, with a resolution of 10 arc minutes.
You might notice that this map is way too large compared to our data, so let’s define a bounding box.
adj = 2.0
bbox = (
(minimum(salamandra.longitude)-adj,maximum(salamandra.longitude)+adj),
(minimum(salamandra.latitude)-adj,maximum(salamandra.latitude)+adj)
)
((-11.5, 29.5), (34.5, 55.5))
And now let’s clip our data again:
temperature_europe = temperature[bbox...]
precipitation_europe = precipitation[bbox...]
heatmap(precipitation_europe, c=:Blues, frame=:box, size=(800,400), dpi=120)

And just like this, we have all we need to start making predictions. So let us move to…
Building the model
The bioclim model is really, really straightforward. We need to extract the value of a variable where a species occur, and then transform the quantile into a score, so that the score is between 0 and 1, and a species occuring at the median value will get a score of one. This is done for all variables, and the results are reduced by taking the minimum score across all variables, so that the occurrence score is limited by the most limiting factor.
Let us write a method to extract the values from a DataFrame
and a
SimpleSDMLayer
. To facilitate the work, we will split this into a method for
the DataFrame
, which will also ensure that NaN
are removed from the result,
and a method for the DataFrameRow
, which we really should modify to check for
the presence of the required columns :latitude
and :longitude
, for example:
function Base.getindex(layer::T, df::D) where {T <: SimpleSDMLayer, D <: DataFrame}
return filter(!isnan, [layer[row] for row in eachrow(df)])
end
function Base.getindex(layer::T, row::R) where {T <: SimpleSDMLayer, R <: DataFrameRow}
return layer[row.longitude, row.latitude]
end
histogram(temperature_europe[salamandra], size=(800,400), dpi=120, leg=false)

Distribution of the temperatures in which Salamandra salamandra has been observed in Europe.
From this, we need to get the quantile function, and transform the quantile into a score.
using StatsBase
temperature_qf = ecdf(temperature_europe[salamandra])
score(x) = isnan(x) ? NaN : (x > 0.5 ? 1-x : 2x)
score (generic function with 1 method)
Based on this, we can transform the map of temperatures into a map of prediction:
temperature_prediction = similar(temperature_europe);
temperature_prediction.grid = score.(temperature_qf.(temperature_europe.grid));
One thing we need to do is to replace the positions with NaN
originally by
NaN
, because ecdf
does not play nicely with these (for now):
for idx in findall(isnan, temperature_europe.grid)
temperature_prediction.grid[idx] = NaN
end
heatmap(temperature_prediction, c=:viridis, clim=(0,1), frame=:box, size=(800,400), dpi=120)

Predicted occurrence based on temperature.
And here is a map. We can now write a function to make a prediction based on a layer, which will look a lot like this:
function bioclimpred(layer::T, df::D) where {T <: SimpleSDMLayer, D <: DataFrame}
pred = similar(layer)
qf = ecdf(layer[df])
pred.grid = score.(qf.(layer.grid))
for idx in findall(isnan, layer.grid)
pred.grid[idx] = NaN
end
return pred
end
bioclimpred (generic function with 1 method)
Note how most of the lines are either to prepare the objects, or to fix the
behavior of ecdf
. Pretty neat! This is bioclim, in about three lines of
relevant code.
Bringing it all together
The final step is to take these predictions over multiple layers, and take the
minimum. To make our lives simpler, we will write a min
method for pairs of
layers:
function Base.min(x::TX, y::TY) where {TX <: SimpleSDMLayer, TY <: SimpleSDMLayer}
SimpleSDMLayers.are_compatible(x,y)
m = similar(x)
for i in eachindex(m)
if !isnan(m[i])
m[i] = min(x[i], y[i])
end
end
return m
end
We can now formulate the predictions for temperature and precipitation, and take their minimum:
pred = minimum(
bioclimpred(temperature_europe, salamandra),
bioclimpred(precipitation_europe, salamandra)
)
plot(pred, c=:plasma, frame=:box, clims=(0,1), size=(800,400), dpi=120)

Predicted occurrences based on two bioclimatic variables.
Oh, and it’s rather fast – generating a prediction for the entire world is under 0.2 second, before we even thought about possible optimizations.
Good enough. Now, we will need to download some bioclimatic data, so let’s get started. As an illustration, we will get only the temperature and precipitation from the worldclim 2.0 database:
@time bioclimpred(temperature, salamandra);
0.159803 seconds (2.52 k allocations: 77.434 MiB, 8.71% gc time)
What have we learned?
Sometimes writing a little bit of glue code (all our Base.xxx
methods) pays
off immensely when creating more complex pipelines. One of my favorite part of
Julia is that it allows, and even encourages, this approach, where it is
possible to make any new structure mimic the syntax we would have to use with
the core of the language. It just flows. This improves the readability of the
code, and also the potential for re-use.