Course Description

With a vast amount of information now collected on our online and offline actions — from what we buy, to where we travel, to who we interact with — we have an unprecedented opportunity to study complex social systems. This opportunity, however, comes with scientific, engineering, and ethical challenges. In this hands-on course, we develop ideas from computer science and statistics to address problems in sociology, economics, political science, and beyond. We cover techniques for collecting and parsing data, methods for large-scale machine learning, and principles for effectively communicating results. To see how these techniques are applied in practice, we discuss recent research findings in a variety of areas.

Prerequisites: An introductory course in applied statistics, and experience coding in R or Python. The class is currently oversubscribed. Please complete this short course application by Monday, Sep 23, 11:59pm. Decisions will be announced before the first lecture on Tuesday.

There is a $25 course materials fee for running experiments on Mechanical Turk.

Sharad Goel
Scott Jespersen (TA) (email)
Zhiyuan “Jerry” Lin (TA) (email)
Class: Tuesdays & Thursdays @ 3:00 - 4:20pm in Thornton 110
Lab Section: Thursdays @ 4:30 - 5:50pm in Thornton 110

Office Hours
Mondays 3 - 5pm in Huang B007 (Jerry)
Tuesdays 4:30 - 6:30pm in Huang 251 (Sharad)
Wednesdays 10am - 12pm in Huang B016 (Scott)

During the first week of school, Scott and Jerry will hold office hours by appointment only.

The first two lab sections (on Sep. 26 and Oct. 3) will run for two hours, from 4:30 to 6:30pm, and will be crash courses in Python and R. They are optional but highly recommended. The sessions are interactive, so please bring your computer with RStudio (including R), Python 3.7, and JupyterLab installed. Instructions for installing JupyterLab (with the R kernel) are here.

We use Piazza to manage course questions and discussion, and Canvas to submit assignments. Code examples are posted on GitHub.

Computing Environment

A Unix-like setup is required (e.g., Linux, OS X, or Cygwin). We primarily use R (RStudio is recommended) and Python 3.7 (JupyterLab is recommended). We use the Tidyverse suite of packages in R for data manipulation and visualization. We also use Vowpal Wabbit (a fast online learning system) and Amazon Elastic MapReduce (a web service for distributed computing).

Grading

4 assignments (60%)
Project proposal (10%)
Final project (25%)
Participation (5%)