
Python for Scientists

This is a guest post by Brian David Hall from codenhance.com. He develops tutorials and resources to help scientists learn to code, including his latest Kickstarter project, Learn the Command Line … for Science!

Spoiler alert: in this post, I will argue that Python is the best programming language for scientists to learn. Forget Perl, Java, FORTRAN, IDL, or whatever else they were pushing when you got your degree. This type of discussion is usually emotionally charged – many a flame war has been fought over programming languages. Feelings will get hurt. Heads will roll. But first …

Do scientists even need to learn to program?

No. Not necessarily. As counseled by Jeff Atwood in “Please Don’t Learn to Code”, it’s better to focus on solutions rather than methods. A scientist who already has a love-hate-but-mostly-hate relationship with computers would do better to leave the programming to the programmers. This way, the code gets written, the research gets completed, and the scientist herself never has to debug a single syntax error.

That said, scientists who are prepared and motivated to add programming to their skillset couldn’t ask for a better language than Python. This post will outline some of the advantages of learning Python and some resources for getting started. But first …

Does it even matter what language you learn?

No. Not necessarily. Most experienced programmers are proficient in several languages, and are able to become productive in a new language over the course of a long weekend. Obviously there’s more to learning programming than memorizing the syntax of a single language. That said, Python offers a number of distinct advantages to the beginner.

The universe wants you to learn Python

One of the greatest strengths of Python is the vibrant community and wealth of high quality training materials available. Want to try Python out without installing anything, just to get a feel for it? Codecademy has you covered. Would you rather watch video lectures and follow a semester format? Coursera has a free course that lets you do just that.

Suppose you’re more of a textbook learner. You’ve got a ton of options, including:

Even better, all of these texts are free to read online.

The universe wants you to be productive in Python

It’s not just the learning materials that make Python such a great choice for scientists learning to code. There’s also a wealth of free packages, tools, and documentation for every field of science you can imagine.

For astronomers, there’s Astropy, which contains modules for astronomical constants, coordinate systems, model fitting, and more. As an example, here’s a script that downloads a FITS file containing an image of the Horsehead Nebula and plots it. Five imports and five lines of code are all it takes:

import numpy as np
import matplotlib
import matplotlib.pyplot as plt

from astropy.utils.data import download_file
from astropy.io import fits

image_file = download_file(
    'http://data.astropy.org/tutorials/FITS-images/HorseHead.fits',
    cache=True)
image_data = fits.getdata(image_file)

plt.imshow(image_data, cmap='gray')
plt.colorbar()
plt.show()

Here’s what the image looks like:

[Image: Horsehead Nebula]

The full tutorial is here if you want to do more!

For biologists, there’s Biopython. Want to read genetic sequence data? Parse BLAST output? Read from a protein database? Produce a random genome? Biopython’s got you covered.
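To make the "random genome" idea concrete, here is a minimal sketch in plain Python, using only the standard library so it runs without installing anything. The function names (`random_genome`, `reverse_complement`, `gc_content`) are made up for illustration and are not Biopython's API. Biopython's `Seq` objects provide their own `reverse_complement()` method, along with a great deal more, once you outgrow a toy like this.

```python
import random

# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def random_genome(length, seed=None):
    """Return a random DNA string of the given length."""
    rng = random.Random(seed)  # passing a seed makes the result repeatable
    return "".join(rng.choice("ACGT") for _ in range(length))


def reverse_complement(sequence):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def gc_content(sequence):
    """Return the fraction of bases that are G or C (0.0 for an empty string)."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


if __name__ == "__main__":
    genome = random_genome(60, seed=42)
    print(genome)
    print(reverse_complement(genome))
    print(f"GC content: {gc_content(genome):.2f}")
```

Seeding the generator is a small nod to reproducibility: anyone who runs the script with the same seed and the same Python version should get the same "genome".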

There are also a number of tools that are useful for all scientists, regardless of their specific field. Machine learning touches nearly every data-driven scientific discipline, and of course you can do it with Python – just take a look at the beautiful documentation for the scikit-learn package. Need to produce publication-quality visualizations? You’ll very likely find a walkthrough for the chart you need in the thoroughly documented Bokeh library.

The universe will thank you for using Python

Python is an open source language*, and so are the libraries mentioned above. As a scientist, this should matter to you. Using open source tools that are widely available is a great way to ensure that your experiments are easily reproducible.

Compare this with the experience of a researcher who would like to extend your experiment, but must first purchase an expensive license in order to recreate your programming environment. This is exactly the case in much of astronomy, with many legacy projects locked into the proprietary IDL language. Working astronomers are largely migrating to Python in order to facilitate collaboration. It’s the future!
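Open tools only help reproducibility if your collaborators know which versions you used. One small habit that can help is saving a record of your environment alongside your results. The sketch below is one way to do that with the standard library alone; the function name `environment_report` and the package list are my own choices for illustration, so swap in whatever your project actually imports.

```python
import json
import platform
from importlib import metadata  # standard library since Python 3.8


def environment_report(packages):
    """Collect the Python version and the installed version of each package."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            report["packages"][name] = None
    return report


if __name__ == "__main__":
    # Example package list; adjust it to match your own project.
    report = environment_report(["numpy", "matplotlib", "astropy"])
    print(json.dumps(report, indent=2))
```

Saving this output next to your figures or data files gives a future reader a fighting chance of rebuilding the same setup for free.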

Everyone else is doing it

Python’s already in use at Google, NASA, the National Weather Service, even the Los Alamos National Laboratory (LANL) Theoretical Physics Division (longer list here). Not to mention nearly a million projects on GitHub. This means that the strengths mentioned above – the high quality training materials, useful libraries, and active community – will only grow with time. By embracing Python, a scientist enters into a movement toward openness and simplicity in computational science that is already changing the world around us.

If you’re a scientist looking to learn how to code, I’d love to hear from you in the comments. What’s the nature of your research? What will learning Python allow you to do? How can I help you get started?

*Technically an open source implementation of a language.
