Social Science Curriculum for Data Science

My colleague Matt Blackwell has organized a Bok Exploratory Seminar on “Improving the Social Science Curriculum for Data Science” for this week. He poses several interesting questions. But, before turning to them, we must first decide how many methods courses we are going to require for Government concentrators. There are three options:

  • Zero Courses: This is not unreasonable. Many concentrations at Harvard — English, History, Computer Science — have no such requirements, even though research in those fields sometimes (often?) makes use of quantitative techniques. Political science majors at Williams and Yale are not required to take a methods course.

  • One Course: As previously discussed, the Department currently requires students to take one class, chosen from STAT 100, STAT 104 and GOV 50. This is similar to the requirements in other social sciences at Harvard, including psychology and sociology.

  • Two Courses: I would wager that, within 20 years, the Government Department will require its concentrators to take two methods courses, just as Harvard Economics does today. If that is a reasonable forecast, then the best decision would be to move there now.

With this as background, here are my initial thoughts on two of Matt’s questions:

How can we integrate data science into the overall Government concentration?

We could start by integrating/requiring more data science in our required classes. The Harvard Economics Department does more than require two methods courses. Each student must take a sophomore tutorial, a class which requires STAT 104 (or the equivalent) and for which the primary requirement is an empirical research paper. Part of the Sophomore Tutorial is a mini-tutorial titled “Introduction to Stata and R for Economists.” The Government Department currently has nothing like this.

TL;DR: The best way to make data science a bigger/better part of the Government concentration would be to require two methodology courses in the concentration and to encourage/require empirical research as a part of other required courses like Sophomore Tutorial.

What set of classes should we offer?

Simple Options

If we stick with the current single-course requirement, then STAT 104 or GOV 50 does a reasonable job.

If we switch to a two-course requirement, then we should probably follow Economics and have a second class which builds on STAT 104. Indeed, delegating the teaching of intro stats to the Statistics Department while maintaining control of the second course has a lot of appeal. We could even mimic Economics further. After STAT 104, they allow students to choose between two courses: ECON 1123 and 1126, the latter much more technical. We could follow a similar path, requiring STAT 104 and then allowing students to choose between a new (more in-depth) version of GOV 50 and GOV 51, a more mathematical version which would set the stage for study in our graduate courses.

I think that our current series of methodology courses at the graduate level — 2000/2001/2002/2003 — is excellent and unlikely to change, regardless of what new emphasis we place on data science.

More Aggressive Options

  • We could combine undergraduate and graduate classes. Consider the core sequence for the data science track in the undergraduate statistics concentration and the core sequence in the master's in data science. They are the same courses! (This is not completely true. CS 109A/STAT 121A is required in the first, but the vast majority of undergraduate students on this track also take CS 109B/STAT 121B. In the master's program, these same courses are called AC 209A and AC 209B. They have the same lectures, but the graduate courses have harder problem sets.)

The Government Department could do the same, although the details might be trickier. We already do this to some extent, allowing undergraduates to register for some of the classes in the graduate sequence under different course numbers. We would have GOV 1000/2000, GOV 1001/2001, and so on, all sharing the same lectures and the same empirical questions in problem sets, but with the graduate versions requiring more of a math background and employing extra questions on problem sets and exams involving proofs and other mathematics.

Every student (except for the political philosophers?) would be required to take 1000/1001 and 2000/2001. Students who took all four methodology courses would be granted the data science certificate.

  • Rethinking the Foundations of the Data Science Program

The current Data Science Program is (excessively?) ecumenical when it comes to the meaning of the “Foundations of Data Science.” Any one of three very different courses (GOV 2000, STAT 121 or STAT 139) is allowed to fulfill it. Although there is some overlap among these courses, their differences are much more important than their similarities. For example, students in STAT 121 do extensive empirical work in Python with a little associated theory. Students in STAT 139 do extensive theoretical work, with a little associated empirical work in R. Both are excellent courses! But, given that they are offered by the same department, it is somewhat surprising that they are so different. Can they both serve as the “foundation” for our Data Science Program?

One fix is to rethink GOV 2000 and make it a true foundation course, serving both as the core component of the undergraduate data science program and as the required graduate course in methodology. GOV 2000 might already fulfill this role. Perhaps the option of STAT 121/139 is merely meant to make it easy for one-time statistics concentrators to move over to Government.

I hope to have further thoughts once the seminar is complete.

David Kane
Preceptor in Statistical Methods and Mathematics