Data training: community over institution

A few months ago, I was invited to an event organized by the Mozilla Science Labs, and I gave a short presentation on How to train data-driven ecologists. One of the key points in my presentation was the importance of community over institution, or broadly speaking, how to leverage existing training material that has been developed by the community.

I think most ecologists will agree that (i) we need to improve the data literacy of students (graduates and undergraduates), and (ii) it takes time away from other classes. Part of the solution is to build in more data management skills in all classes, but this will only solve part of the problem. Data literacy is so important that, past a given point (in my opinion, the moment you decide to go to grad school), it requires a class of its own.

By data literacy, I mean the entire family of skills, knowledges, and practices allowing students to manage, input, manipulate, and evaluate data; either their own, or data from external sources. Good science relies on good data, and good data relies on good data management.

Developping a new class is time consuming. And very few faculty have received any formal training in data literacy, which places us in the delicate situation where no one teaches the teachers, with the predictible result that data literacy is often confined to footnotes in statistics classes.

Wait. No one? Not quite.

There are a lot of initiatives, driven by the community, dedicated to providing data training. Data Carpentry, or DataONE, for example, provide free and libre content that can be used to build classes (in the interest of full disclosure, I am involved in both projects). The material for both projects is written by multiple community members, peer reviewed (read more about the DataONE peer review experience here), and frequently updated based on instructors and learners feedback.

A large part of my teaching material re-uses elements from both projects. Not because it saves me time (it really doesn’t, since I have to make the material fit the entire class), but because I trust this material to represent a consensus of good practices in the community. For a lot of topics, it makes sense to have classes specific to a single institution (or rather, to an instructor). But for methodological elements that underpin the entire research activity?

Trust the crowd.