Course Description

An increasing amount of data is now generated in a variety of disciplines, ranging from finance and economics to the natural and social sciences. Making use of this information requires both statistical tools and an understanding of how the substantive scientific questions should drive the analysis. In this hands-on course, we learn to explore and analyze real-world datasets. We cover techniques for summarizing and describing data, methods for statistical inference, and principles for effectively communicating results.

Prerequisites: MS&E 120 or equivalent, and CS 106A or equivalent

Sharad Goel
Alex Chohlas-Wood (TA)
Josh Grossman (TA)
Jerry Lin (TA)
Class: Tuesdays & Thursdays @ 10:30 AM - 11:50 AM PT (online)
Discussion Section: Thursdays @ 12:30 PM - 1:50 PM PT (online)
[ Zoom links posted on Canvas. ]

Office Hours
Mondays @ 4 PM - 6 PM PT (Alex)
Tuesdays @ 4:30 PM - 6:30 PM PT (Sharad)
Wednesdays @ 10 AM - 12 PM PT (Jerry)
Wednesdays @ 6:30 PM - 8:30 PM PT (Josh)

Lectures and discussion sections will be recorded to ensure all students have access to the materials, regardless of timezone and internet connectivity. But please make every effort to attend lecture (and, ideally, discussion section) if you are able. Both the lectures and the discussion sections are interactive, so the learning experience is best when you attend, and regular attendance also helps foster a sense of community. If you cannot attend the lectures live due to an extenuating circumstance (e.g., timezone constraints or personal obligations), please submit a short explanation.

If you would like to request some music to play at the beginning of class, please fill out this form!

Office hours are a great opportunity to discuss not only topics directly related to the course, but also anything else that's on your mind, including, for example, questions about career trajectories and research opportunities in MS&E and in the Computational Policy Lab. Please note that there are no regular office hours during the first week of class, but feel free to schedule an appointment if you would like to meet.


We use Piazza to manage course questions and discussion, and you can sign up here.

It is our intent that students from all backgrounds and perspectives be well served by this course, that students' learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength, and benefit. We aim to present materials and conduct activities in ways that are respectful of this diversity. Your suggestions are encouraged and appreciated. Please let us know ways to improve the effectiveness of the course for you personally or for other students or student groups.

You may use our (anonymous) comment box to let us know which aspects of the class are going well and which could be improved.

We encourage you to work together in groups to solidify your understanding of the course material. If you would like assistance forming a study group, please complete this form by Thursday, January 14 at 9 PM PT. Our goal is to form the study groups the following day, so students can begin discussing the first homework assignment.

[ Optional ] Textbooks
All of Statistics by Larry Wasserman (available online)
R for Data Science by Garrett Grolemund and Hadley Wickham
Statistics by David Freedman, Robert Pisani, and Roger Purves
Natural Experiments in the Social Sciences by Thad Dunning

All of the key resources for this class are available online, free of charge. However, please note that the department has created a new Opportunity Fund through which students may request financial assistance to purchase any necessary course materials.

Computing Environment
We primarily use R (RStudio is the recommended interface), including the suite of tidyverse packages.

Grading
6 homework assignments (50%)
2 quizzes (25%)
Project proposal + final project (20%)
Attendance and participation (5%)