Collecting Data from the Internet and Government Documents

This is a short course on collecting data from the internet with Python, including accessing APIs and web scraping. Although the course was taught in person, the materials are designed for self-paced, independent learning by social scientists with no background in Python. Each lesson includes exercises that can be completed within the notebook, along with answers.

Directions (starting from scratch):

  1. Read the setup notebook online. Follow the directions for installing Python.
  2. Download this repository by clicking on the green “Clone or Download⌄” button above. You may need to unzip the folder, depending on your operating system.
  3. Using the instructions in the setup file, start the Anaconda Navigator program, launch a Jupyter notebook, and navigate to the “Notebooks” folder that you downloaded in Step 2.
  4. Start with the first two notebooks (2_Python.ipynb and 3_Data.ipynb), which provide an introduction to working with Python.
  5. Continue with the other numbered notebooks, which contain the materials that were covered in class.
  6. The Bonus notebooks detail some additional techniques for data collection.

The lessons can also be completed entirely online without installing anything on your computer.

Neal Caren
Associate Professor of Sociology

My research interests include social movements, protest events, web scraping, and text analysis.