In the Harvard American Politics summer reading group, we discussed “Legislative staff and representation in Congress,” by Alexander Hertel-Fernandez, Matto Mildenberger, and Leah C. Stokes, American Political Science Review 113.1 (2019): 1-18 (pdf).
Footnote 27 was the cause of much merriment:
Two-tailed test: p = 0.098, one-tailed test: p = 0.049; clustered by office. That we find a significant effect at all given such a small sample size is striking; while our confidence intervals are necessarily wide given this small sample, we also note that, ultimately, it is the direction of the effect that matters most for our argument that interactions with campaign contributors may skew staffer perceptions of public preferences.
If you are uninterested in statistical nitpickery, read no further.
The context is this sentence:
This indicates that 45% of staffers agreed they had developed a new perspective about a policy after speaking with a group that provided campaign contributions to their Member.
Before diving into the statistical nonsense, it is worth understanding the authors’ ideological views, which they make fairly clear in their New York Times op-ed. Key sentences:
We found two key factors that explain why members of Congress are so ignorant of public preferences: their staffs’ own beliefs and congressional offices’ relationships with interest groups.
And if [Congressional] offices hear from only deep-pocketed interest groups, they are likely to miss out on the opinions of ordinary Americans.
The authors have serious doubts about how well US democracy functions, at least partly because campaign donors have too much influence. (So do I!) But . . .
First, why show us 3 decimal places in reporting those p-values? Obviously, to highlight that the results are (just barely!) significant at everyone’s favorite 5% and 10% cut-offs. In my experience, there is nothing that raises the suspicions of experienced readers more than p = 0.049 . . .
Second, it is suspect to report a one-tailed test result. (I would be sympathetic if that decision had been made as part of a pre-registration, but I don’t think that any aspect of the analysis was pre-registered. That, however, is a complaint for a different day.) I would wager that, if the two-tailed result had come out significant at the 5% level, they wouldn’t be bothering us with the one-tailed result. And recall the claim that “it is the direction of the effect that matters most.” If that is true, then they have no business using a one-tailed test, which, by definition, assumes the very direction they are trying to establish.
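For the record, the halving is purely mechanical: whenever the estimate falls in the hypothesized direction, the one-tailed p-value is exactly half the two-tailed one (0.098 / 2 = 0.049). No new evidence is involved. A quick sketch with scipy, using made-up data:

```python
import numpy as np
from scipy import stats

# Made-up data: a small, noisy sample (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(0.4, 1.0, size=25)

two_sided = stats.ttest_1samp(x, 0.0)

# Pick the one-sided alternative that matches the observed direction --
# which is exactly what a post-hoc one-tailed test amounts to
alt = "greater" if x.mean() > 0 else "less"
one_sided = stats.ttest_1samp(x, 0.0, alternative=alt)

# The one-tailed p-value is exactly half the two-tailed one
print(two_sided.pvalue, one_sided.pvalue)
```

Choosing the tail after seeing the sign of the estimate buys a factor-of-two discount on the p-value for free, which is why the move looks so convenient when the two-tailed p lands just above 0.05.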
Third is the claim “That we find a significant effect at all given such a small sample size is striking . . .” Andrew Gelman calls this the “What does not kill my statistical significance makes it stronger” fallacy. See his Science article (pdf) with Eric Loken.
Our concern is that researchers are sometimes tempted to use the “iron law” reasoning to defend or justify surprisingly large statistically significant effects from small studies. If it really were true that effect sizes were always attenuated by measurement error, then it would be all the more impressive to have achieved significance. But to acknowledge that there may have been a substantial amount of uncontrolled variation is to acknowledge that the study contained less information than was initially thought. If researchers focus on getting statistically significant estimates of small effects, using noisy measurements and small samples, then it is likely that the additional sources of variance are already making the t-test look strong. Measurement error and selection bias thus can combine to exacerbate the replication crisis.
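The selection effect behind this fallacy is easy to simulate (all numbers below are hypothetical, not from the paper): posit a small true effect, measure it noisily in many small studies, and look only at the studies that clear p < 0.05. Power is low, so significance is rare, and the estimates that do survive are necessarily large overstatements of the truth:

```python
import numpy as np
from scipy import stats

# Illustrative numbers, not from the paper: a small true effect,
# measured noisily across many small studies
rng = np.random.default_rng(42)
true_effect, n, sims = 0.1, 30, 10_000

# Each row is one simulated study of n observations
x = rng.normal(true_effect, 1.0, size=(sims, n))
est = x.mean(axis=1)
se = x.std(axis=1, ddof=1) / np.sqrt(n)

# Two-sided one-sample t-test p-values for each study
pvals = 2 * stats.t.sf(np.abs(est / se), df=n - 1)

sig = pvals < 0.05
print(f"share reaching p < 0.05 (power): {sig.mean():.2f}")
print(f"avg |estimate| among significant results: "
      f"{np.abs(est[sig]).mean():.2f} (true effect: {true_effect})")
```

Conditioning on significance under low power selects the lucky draws: in this sketch the surviving estimates average several times the true effect. That is why a “striking” significant result from a small noisy sample should lower, not raise, our confidence in the estimated magnitude.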
Instead of this nonsense, the authors should just admit that, although they have some suggestive evidence that staffers “had developed a new perspective about a policy” after meeting with donors, standard measures of statistical significance have not been met. (And, by the way, just what is so bad about learning a “new perspective”? I learn new perspectives from smart people all the time!)
They really wanted to find that campaign donors have a big influence on staffers. Having a small sample with noisy estimates made that easier to do, not harder.
A respect for an implicit Chatham House Rule suggests that I should not attribute these points to the specific speakers who made them, although readers are welcome to claim credit for themselves in the comments.