I've noticed a desire (sometimes a need) among researchers in the social sciences to analyze data that doesn't fit neatly into an Excel spreadsheet or a Stata table. Sometimes that data is spread across thousands of text documents; other times it is strewn across the Web. What started as answering a few questions here and there about regular expressions and web scraping turned into more and more emails and requests for example code, so I eventually decided it would be useful to collect many of the strategies we had been discussing in one place. I hope this information will continue to be useful to others.
The first half of the book is intended to make the reader comfortable with the basics of Python programming; the second half covers more advanced applications in web scraping and text processing. The book assumes no prior knowledge, though where appropriate it draws connections to other scripting languages readers may already know, such as Stata.
Click below to read the booklet. Here are the files for the example exercises.