Fall 2018

GOV 1005: Data
M/W 1:30 – 2:45


Data matters. How much money is spent on US political campaigns? How does the Chinese government use social media? How many seats will the Democrats control in the Senate after the next election? How does Harvard College decide whom to admit? We need data to answer these questions.

This course will teach you how to work with data, how to gather information from a variety of sources and in various formats, how to import that information into a project, how to tidy and transform the variables and observations, how to visualize and model the data for both analysis and prediction, and how to communicate your findings in a sophisticated fashion. Each student will complete a final project, the first entry in their professional portfolio. Our main focus is data associated with political science, but we will also use examples from education, economics, public health, sociology, sports, finance, climate and any other subject area which students find interesting.

We use the R programming language, RStudio, GitHub and DataCamp. Although we will learn how to program, this is not a course in computer science. Although we will learn how to find patterns in data, this is not a course in statistics. We focus on practice, not theory. We perform empirical analysis rather than write mathematical derivations. We make stuff.

Spring 2018

Harvard Ec 970: Sophomore Tutorial

Syllabus and student evaluations

Elite Education: Rhetoric and Empirics

This course reviews the economic literature on elite colleges in the United States while studying the theory and practice of persuasion: using words, statistics and graphics to convince others, and yourself, of some claim. You will learn how to write persuasive essays, how to pick out the flaws in your opponent’s argument, how to shift the terms of a debate to your advantage and how to marshal statistics and graphics to your side. We will study topics like: Which students choose to apply to colleges like Harvard? Which are admitted? What role do athletics, legacy, race and wealth play in admissions? How do students perform while at college? How much of a problem is grade inflation and what might be done about it? Do elite colleges increase or decrease economic mobility? How generous are alumni after graduation? Yet beyond the empirical questions concerning how the world works today, we are also interested in discussing the normative questions surrounding how elite colleges ought to function tomorrow. Natural Philosophers, the classical name for Economists, have wrestled with questions of Rhetoric since Plato confronted the Sophists two thousand years ago. Our class will continue that conversation in the context of contemporary debates about elite education in the United States.

Spring 2016

Middlebury INTD 0318: Quantitative Finance

Syllabus and student evaluations

This class will introduce students to applied quantitative equity finance. First, we will develop the technical skills needed to do serious research, the most important of which is proficiency with R and RStudio. Second, we will briefly review the history and approach of academic research in equity pricing via selected readings. Students will work as teams to replicate the results of a published academic paper and then extend those results in a nontrivial manner. This course is designed for two types of students: first, those interested in applied financial research, and second, those curious about how that research is used and evaluated by finance professionals.

Recent Posts


Let’s continue our discussion about Richard Kerby’s data on race/gender diversity in venture capital from Sunday. I did a touch more cleaning of the data (exact details of which are left as an exercise for the reader), leaving us with a nice tibble.

    x <- read_rds(url("https://www.davidkane.info/files/blog_files/vc.rds"))
    x$title <- as.factor(x$title)

I have assigned this object to x, which is my preferred name for whatever the main object of analysis is within an R session.


Richard Kerby wrote about diversity within venture capital. Interesting stuff. Even better, Kerby made his data public. Sadly, the data is a fairly non-tidy Excel file. The purpose of this post is to process it into something a little nicer.

    raw <- read_csv("https://www.davidkane.info/files/blog_files/kerby.csv",
      # Column names are a mess. So, after running the simple read_csv()
      # command on the raw file, I use spec(x) to get the default column
      # types and then use the col_types argument to set them by hand.
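The workflow described in that excerpt (read once with defaults, inspect the guessed types with spec(), then set col_types by hand) might look roughly like the sketch below. It is only an illustration: the inline CSV text and the column names firm, title and year_joined are made up, since the actual columns of kerby.csv are not shown here, and it assumes readr 2.0 or later so that I() can mark a string as literal data.

```r
library(readr)

# Hypothetical stand-in for a messy CSV; not the real Kerby data.
csv_text <- "Firm Name,Title,Year Joined\nAcme Ventures,Partner,2015\nBeta Capital,Principal,2017\n"

# Step 1: read with defaults and let readr guess the column types.
first_pass <- read_csv(I(csv_text), show_col_types = FALSE)

# Step 2: print the guessed specification, which can be copied and edited.
print(spec(first_pass))

# Step 3: re-read, skipping the messy header row and supplying clean
# names and hand-chosen types through col_names and col_types.
raw <- read_csv(
  I(csv_text),
  skip = 1,
  col_names = c("firm", "title", "year_joined"),
  col_types = cols(
    firm = col_character(),
    title = col_factor(),
    year_joined = col_integer()
  )
)
```

Setting the types explicitly, rather than relying on the guesses, means a later change in the file that breaks a type shows up as a parsing problem instead of silently changing a column.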


I made this SEC comment 12 years ago. Alas, no one cared then and no one cares now. Still think it is a great idea! The SEC should pass a regulation requiring that all publicly traded companies allow their shareholders to vote on the following (binding) resolution each year. “The total compensation of both the CEO and the CFO shall not exceed $1 million in the coming fiscal year.” Those who dislike government meddling in business have little to complain of here since the government isn’t telling any business how to set salaries.


11 years ago, R made its first appearance at Boston’s Fenway Park.


I discussed “Patient–physician gender concordance and increased mortality among female heart attack patients” by Brad N. Greenwood, Seth Carnahan, and Laura Huang (henceforth GCH) the other day. Now that their Supplementary Information pdf is available, it is easy to see that their claims about “quasirandom” assignment are nonsense. To review, GCH claim that female physicians are better at treating female heart-attack victims than male physicians are. Their evidence is that female patients have higher survival rates when treated by female physicians.