Getting started with Python for data analysis

A few days ago, another friend asked me to recommend reading materials for getting started with Python. Yesterday, I saw this tweet:

"When you’ve written the same code 3 times, write a function. When you’ve given the same in-person advice 3 times, write a blog post" - David Robinson

So here I am, writing this blog post.

A short disclaimer: I like learning from books, but I know that it doesn't work for everybody. So my recommendations are, unsurprisingly, for books. Don't follow them if you don't like learning by reading.

If you have some coding experience and want to dive straight into doing data analysis with Python, you can skip this paragraph. If you are not so confident in your programming skills and want to take it slowly, I highly recommend The Quick Python Book, by Naomi Ceder. My first steps with Python were with the 2nd edition of the same book, and here are my thoughts on it. In general, it's a bit wordy, but the explanations are very clear and to the point. In fact, a bit of verbosity is probably what you want anyway if coding is new to you. Note that a new edition is about to come out, and an ebook version is already available. If you are looking for a shorter and faster introduction to Python, take a look at Learn X in Y minutes. It is especially good for those who are already familiar with coding but are new to Python.

To get familiar with the Python scientific stack, I highly recommend going over the first section of the SciPy Lecture Notes. Their short intro to the language (chapter 1.2) is also great. With this resource you can learn NumPy and matplotlib relatively fast. The last crucial building block of the Python scientific stack that you must learn is pandas. The canonical book for pandas is Python for Data Analysis by Wes McKinney, the creator of the package. Personally, I didn't like this book so much. On the other hand, I really like everything coming from Jake Vanderplas, and his book, the Python Data Science Handbook, contains a chapter about pandas. I haven't read it, but based on my familiarity with his writing and on the online comments I've seen, I think I can stand behind this recommendation.
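To give a taste of what these libraries look like in practice, here is a minimal sketch that uses NumPy and pandas together. The cities and temperatures are made up purely for illustration, and the variable names are my own. It is not taken from any of the books above.

```python
import numpy as np
import pandas as pd

# Made-up measurements, purely for illustration.
temps_c = np.array([20.5, 22.0, 19.5, 25.0, 27.0, 26.0])
cities = ["Oslo", "Oslo", "Oslo", "Rome", "Rome", "Rome"]

# NumPy: arithmetic applies to the whole array at once, no loop needed.
temps_f = temps_c * 9 / 5 + 32

# pandas: put the arrays into a labeled table and summarize per group.
df = pd.DataFrame({"city": cities, "temp_c": temps_c, "temp_f": temps_f})
mean_by_city = df.groupby("city")["temp_c"].mean()

print(mean_by_city)
```

If matplotlib is installed, calling `mean_by_city.plot(kind="bar")` should draw the result as a bar chart, which is roughly where the third library enters the picture.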

After that, stop reading and start solving some real-world problems!