The many flavors of (research) computing

What do you call using a computer to do science?

This sounds easy, but it’s not: there are many, many ways to use a computer in the pursuit of science, and it is probably a good idea to find the right name for what one does.

I frequently style myself as a “computational scientist”, by which I mean specifically that I try to link data and models through the use of code and profanity. But there are times when I work primarily on data (which is a different culture entirely), or do data-free simulations. These different approaches should, ideally, have different names.

My personal preference is to use the term research computing for almost everything. It parallels advanced research computing, which is the new high-performance computing, which itself is the new supercomputing; they’re all basically the same, in that they mean you use a machine worth more than the building it is hosted in. A nice property of the term research computing is that it allows for work that is not advanced (so the “advanced” qualifier neatly sets apart applications requiring many tasks), and that it can be done on any hardware you like.

As I said, I use research computing for almost everything, as long as it relies on custom-written code. Using ready-made software, even with terabytes of data, even on a supercomputer, is a different exercise entirely, and should be called something else. Research computing (and everything that falls under this umbrella) is a developer thing. We already have a name for researchers who are users of software (the word, of course, is researchers). For example, accruing thousands of core-hours aligning genomes is not research computing if you inherit an existing pipeline (modulo a few configuration files). And this is not a judgment about the worth of the activity!

So what, then, is computational science?

In @PoisLaBr19, we define computational science (within the relatively specific context of ecological research) as a family of practices where algorithms, software, data management practices, and advanced research computing are put in interaction with the explicit goal of solving “complex” problems. The definition we build in this article is inspired by @Pape96 more than by recent considerations, but alternative definitions usually emphasize the interaction between tools, and the reliance on algorithms to solve problems. Ideally, computational science solves an entire family of problems at once, through cleverly designed (or just plain lucky) abstractions.

There is a clear programming/hardware component to computational science, but it comes after careful conceptualization of the problem. This may be a testament to my programming skills more than anything else, but I am convinced that some of my best work of this type is done on paper. A good example (of other people’s work) is the use of symbolic regression by @ChenAngu16 to reveal ecological dynamics. The mathematical side of this article is far more impressive than the computation I suspect it took to carry it out. And the resulting method is relatively general, making it a shining example of computational science.

Ultimately, these distinctions only really matter when trying to position yourself in the research landscape; they are not scientific identities. In any given week, I will alternate between being a software user and a practitioner of different flavors of research computing (and sometimes of data science, which is a discipline in its own right). But they can help to delineate broad communities of practice, with different toolkits, interests, and cultures. In short, they can guide you when deciding who to talk to, and how to approach the conversation.