Course Description

With a vast amount of information now collected on our online and offline actions — from what we buy, to where we travel, to who we interact with — we have an unprecedented opportunity to study complex social systems. This opportunity, however, comes with scientific, engineering, and ethical challenges. In this hands-on course, we develop ideas from computer science and statistics to address problems in sociology, economics, political science, and beyond. We cover techniques for collecting and parsing data, methods for large-scale machine learning, and principles for effectively communicating results. To see how these techniques are applied in practice, we discuss recent research findings in a variety of areas. This course was previously listed as MS&E 331.

Prerequisites: An introductory course in applied statistics, and experience coding in R or Python.

There is a $25 course materials fee for running experiments on Mechanical Turk.

Sharad Goel
Imanol Arrieta Ibarra (TA)
Jongbin Jung (TA)
Class: Mondays & Wednesdays @ 1:30 - 2:50pm in Thornton 110
Lab Section: Wednesdays @ 3:00 - 4:20pm in Thornton 110

Office Hours
Mondays 3 - 5pm in Huang 356 (Sharad)
Tuesdays 10am - 12pm in Huang B011 (Jongbin)
Wednesdays 4:30 - 6:30pm in Huang 203 (Imanol)
Thursdays 10 - 11am in Shriram 102 (Imanol)
Fridays 10 - 11am in Y2E2 105 (Jongbin)

During the first week of school, there is no lab section and office hours are by appointment only.

On Sunday, Oct. 1, we will hold optional (but highly recommended) crash courses on R (10am - 12pm) and Python (1 - 3pm) in Thornton 110. These are interactive sessions, so please bring your computer with RStudio (including R) and Python 3.6 installed.

We use Piazza to manage course questions and discussion, and Canvas to submit assignments. Code examples are posted on GitHub.

Computing Environment

A Unix-like setup is required (e.g., Linux, OS X, or Cygwin on Windows). We primarily use R (RStudio is recommended) and Python 3.6 (the Anaconda distribution is recommended). In R, we use the Tidyverse suite of packages for data manipulation and visualization. We also use Vowpal Wabbit (a fast online learning system) and Amazon Elastic MapReduce (a web service for distributed computing).
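
One quick way to confirm a setup like the one described above is a short Python check. This is only a sketch and not an official course script: the function name `check_environment` is made up for illustration, and it assumes that R is available on the PATH as `Rscript` and Vowpal Wabbit as `vw`, which may differ on your machine.

```python
import shutil
import sys

# Assumed executable names for the tools listed above; adjust if yours differ.
REQUIRED_TOOLS = {"R": "Rscript", "Vowpal Wabbit": "vw"}


def check_environment(min_python=(3, 6), tools=REQUIRED_TOOLS):
    """Return a dict mapping each requirement to True (found) or False."""
    # Compare only (major, minor) of the running interpreter to the minimum.
    label = "Python >= %d.%d" % min_python
    report = {label: sys.version_info[:2] >= min_python}
    # shutil.which returns None when an executable is not on the PATH.
    for name, executable in tools.items():
        report[name] = shutil.which(executable) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print("%-20s %s" % (requirement, "found" if ok else "MISSING"))
```

Anything reported as MISSING may simply be installed under a different name or outside the PATH, so treat the result as a hint, not a verdict.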

Grading
4 assignments (60%)
Project proposal (10%)
Final project (25%)
Participation (5%)