Course Description

An increasing amount of data is now generated in a variety of disciplines, ranging from finance and economics to the natural and social sciences. Making use of this information requires both statistical tools and an understanding of how substantive scientific questions should drive the analysis. In this hands-on course, we learn to explore and analyze real-world datasets. We cover techniques for summarizing and describing data, methods for statistical inference, and principles for effectively communicating results.

Prerequisites: MS&E 120 or equivalent, and CS 106A or equivalent

Please take note of the following two course policies.

  1. On-time attendance at lectures is required, and attendance at discussion sections is encouraged. Our aim is to create a collaborative and supportive learning environment. One of the best ways to learn the course material is to engage with the lectures by asking questions. If you need to miss a class (e.g., for an illness or sporting event) or will be late, please notify Sharad before the lecture. In-class attendance checks may be carried out periodically throughout the quarter.

  2. Please do not use electronics (laptops, tablets, phones) during lectures. However, please bring laptops to the Thursday discussion sections, as there will be in-class coding and analysis. See here and here for the reasoning behind this policy. (We're happy to make exceptions in special circumstances.)

We encourage you to attend our crash course on R during the discussion section on Thursday, January 9. You can view the R course materials here.

Sharad Goel
Linjia Wu (TA)
Madison Coots (TA)
Yanlin Qu (TA)
Class: Tuesdays & Thursdays @ 1:30 PM - 2:50 PM in 300-300
Discussion Section: Thursdays @ 3:00 PM - 4:20 PM in Shriram 104

We use Piazza to manage course questions and discussion. Please sign up here.

Office Hours
Mondays @ 5 PM - 7 PM in Y2E2 253 (Linjia)
Tuesdays @ 3 PM - 5 PM in Huang 251 (Sharad)
Wednesdays @ 10 AM - 12 PM in Huang B016 (Madison)
Thursdays @ 4:30 PM - 6:30 PM in Y2E2 382 (Yanlin)

There are no regular office hours during the first week of class, but feel free to schedule an appointment if you would like to meet.

[Optional] Textbooks
All of Statistics by Larry Wasserman (available online)
R for Data Science by Garrett Grolemund and Hadley Wickham
Statistics by David Freedman, Robert Pisani, and Roger Purves
Natural Experiments in the Social Sciences by Thad Dunning

Computing Environment
We primarily use R, including the tidyverse suite of packages; RStudio is the recommended interface.

Grading
8 homework assignments (50%)
Final exam (25%)
Final project (20%)
Attendance and participation (5%)