A successful failure at identifying camera trap pictures using deep learning

Over the past few months, Andrew MacDonald and I spent quite a lot of time figuring out a way to automate the identification of animals in pictures from a large camera trap survey. Without going into too much detail, we have been “partly successful”, and there are some important lessons to be learned here. I will not say much about the project itself, as this was contract work.

The set of images came from a region in which our approach (using a ConvNet to classify camera trap images) had never been attempted. This proved challenging for two reasons. First, the taxonomic coverage of the pre-trained model only partially overlapped with the species we knew were likely to be present. Second, the background of the images was different from that of other datasets (including striking seasonal differences). We were very much attempting to extrapolate beyond the bounds of the model.

In fact, the landscape in which we worked was different both from the images behind the pre-trained biodiversity-specific models, and from the images in general-purpose databases like ImageNet. This created problems ranging from the amusing (frozen trees classified as carousels or swings) to the baffling but interesting (woods classified as black bears, regardless of whether a black bear was actually in the picture).

The problems introduced by the different backgrounds, coupled with partial taxonomic coverage, were compounded by the fact that prevalence (i.e. the fraction of pictures in which there actually were animals) was really low. During a pilot using two cameras, out of 500 pictures, 6 had animals (and 12 had undergraduates roaming the woods during the forest ecology field class). Working on a dataset with a few hundred thousand pictures, the low prevalence became a more complex issue (the cost of misclassifying an image rose, because there needs to be human validation in the end).
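
To make the prevalence problem concrete, here is a minimal Python sketch. The pilot counts (6 animal pictures out of 500) come from the paragraph above; the sensitivity and specificity values are purely hypothetical placeholders, not measurements from our models. It shows how, at such a low prevalence, even a fairly good classifier would flag mostly empty pictures, and all of those would still land on a human's desk.

```python
def prevalence(n_positive: int, n_total: int) -> float:
    """Fraction of pictures that actually contain an animal."""
    return n_positive / n_total


def positive_predictive_value(prev: float, sensitivity: float, specificity: float) -> float:
    """Probability that a picture flagged as 'animal' really contains one (Bayes' rule)."""
    true_pos = sensitivity * prev
    false_pos = (1.0 - specificity) * (1.0 - prev)
    return true_pos / (true_pos + false_pos)


# Pilot numbers from the text: 6 pictures with animals out of 500.
pilot_prevalence = prevalence(6, 500)

# ASSUMPTION: hypothetical classifier performance, chosen only for illustration.
assumed_sensitivity = 0.90
assumed_specificity = 0.95

ppv = positive_predictive_value(pilot_prevalence, assumed_sensitivity, assumed_specificity)
print(f"prevalence: {pilot_prevalence:.3f}")
print(f"share of flagged pictures that contain an animal: {ppv:.3f}")
```

Under these made-up numbers, fewer than one in five flagged pictures would actually show an animal, which is why low prevalence turns every bit of misclassification into extra validation work.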

So why was this failure a success? Well, we (and by we I mean mostly Andrew) managed to classify a good proportion of pictures with a confidence over 80%, and this winnowed the dataset from 10⁵ total images to a little under 6000 pictures of interest (which, although not fun, is human-doable). But the most successful part is that this experience gave us a lot of insight into what to do next. Quite obviously, the answer to almost all our problems is to develop a specific training set of images. This was what we were afraid of in the beginning, but it seems that, for neural networks to make real headway on this problem, there is no choice other than training them de novo.
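
For readers who want a picture of what winnowing by confidence can look like, here is a rough Python sketch. It is not the code we used: the `Prediction` record, the `"empty"` label, the file names, and the three-way split are all assumptions made for illustration. Only the 80% threshold comes from the text above.

```python
from typing import List, NamedTuple, Tuple


class Prediction(NamedTuple):
    filename: str      # hypothetical image file name
    label: str         # top predicted class
    confidence: float  # classifier confidence for that class, in [0, 1]


def triage(
    predictions: List[Prediction],
    threshold: float = 0.8,       # the 80% confidence mentioned in the text
    empty_label: str = "empty",   # ASSUMPTION: the model has an explicit 'no animal' class
) -> Tuple[List[Prediction], List[Prediction], List[Prediction]]:
    """Split predictions into (of_interest, set_aside, uncertain)."""
    of_interest, set_aside, uncertain = [], [], []
    for p in predictions:
        if p.confidence <= threshold:
            uncertain.append(p)    # not confident enough either way
        elif p.label == empty_label:
            set_aside.append(p)    # confidently empty, lowest priority
        else:
            of_interest.append(p)  # confidently an animal, goes to human validation
    return of_interest, set_aside, uncertain


# Made-up predictions, only to show the call.
example = [
    Prediction("img_0001.jpg", "empty", 0.97),
    Prediction("img_0002.jpg", "black bear", 0.91),
    Prediction("img_0003.jpg", "black bear", 0.42),
    Prediction("img_0004.jpg", "empty", 0.65),
]
of_interest, set_aside, uncertain = triage(example)
print(len(of_interest), len(set_aside), len(uncertain))
```

What to do with the uncertain bin is a judgment call: with low prevalence, sending all of it to human review may still be cheaper than missing the rare pictures that do contain animals.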

This was also a success because we now have this nice cautionary tale about over-reliance on machine learning and machine-learning-adjacent techniques in biodiversity. For most “novel” applications, I do not think the tools are consumer-ready yet. Instead, they will require some adaptation to the specific problems, and this calls for biologists with a specific profile, able to build bridges between data science and domain knowledge.