Problem Sets and Exams

This post provides details on problem sets and exams in Gov 1005.


All problem set and exam solutions are submitted via GitHub Classroom. For problem sets, you will create at least two new files: ps_N.Rmd and ps_N.html, where N is replaced by the number of the problem set. You must use exactly these names. Replace “ps” with “exam” for exam submissions. Your projects may also contain other files, either distributed by us as part of the assignment or added by you.

  • The two documents you are submitting are very different.
    • The Rmd file is a technical document, an accurate record of your work which allows you (and us!) to reproduce your html easily. It should be well-organized, nicely formatted and clean. Non-technical readers will not understand it, but that is OK.
    • The html file is a presentation document, designed for non-technical readers. No R code or weird warnings or obscure messages mar its pleasing appearance.
  • It must be possible for us to replicate your work. That is, we will clone your repo and open your Rmd file. When we knit it, we should produce your html. If we can’t, we will take off points. (The Course Staff will be happy to test your work before you submit it. Visit them during Study Hall!)
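One way to keep R code out of the html while leaving it in the Rmd is a setup chunk near the top of the file. The lines below are a minimal sketch of what such a chunk might contain, not a course requirement; the chunk name "setup" and the decision to set `echo = FALSE` globally are assumptions.

```r
# Contents of a first chunk, for example one named "setup" with
# include = FALSE in its chunk options.

library(knitr)

# echo = FALSE keeps R code out of the knitted html. The code itself stays in
# the Rmd, where technical readers can still find it.

opts_chunk$set(echo = FALSE)
```

Setting the option once here avoids repeating it in every chunk. Any other packages the document needs would get their own library() calls in the same chunk.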

Question Types

There are three main types of questions on the problem sets: Mad Libs, tables, and graphics. For these, you do not write any prose. Some exam questions require a paragraph or so of explanation.

Mad Libs

A Mad Libs style question provides a sentence with an X which you must replace with the correct answer. For example, the question might state:

The state with the most rows is X. (format state like Massachusetts, not MA)

You copy/paste that sentence as your answer, but replace the X with inline R code that determines the correct replacement for X dynamically. Do not include the words in the parentheses. They are there for explanation. Do not simply copy/paste the correct answer. In your Rmd, you might write:

The state with the most rows is `r x %>% count(state) %>% arrange(desc(n)) %>% slice(1) %>% pull(state)`.

When you knit your Rmd file, this will turn into:

The state with the most rows is Massachusetts.

This is (you hope!) the answer that we are looking for.

Obviously, x needs to be a tibble which you have already created and which has state as a variable name. Sometimes, so much code is needed to answer the Mad Lib that it is placed in its own code chunk, with the answer saved as an object.

y <- x %>% 
  count(state) %>% 
  arrange(desc(n)) %>% 
  slice(1) %>% 
  pull(state)

But that object is still placed in the inline code:

The state with the most rows is `r y`.

This second method, doing all the calculations in a separate code chunk and then just referencing the final answer inline, is almost always best, not least because it makes the code much more readable.
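To make the separate-chunk approach concrete, here is a small self-contained sketch. The tibble `x` is hypothetical toy data invented for illustration; in a real problem set it would come from the assignment. The sketch calls `count(state)` directly because the result is ungrouped, so `slice(1)` keeps a single row rather than one row per state.

```r
library(dplyr)

# Hypothetical data standing in for x. Any tibble with a state column works.

x <- tibble(state = c("Massachusetts", "Massachusetts", "Maine", "Ohio"))

# Count rows per state, sort so the largest count comes first, keep that row
# and extract the state name as a character vector of length one.

y <- x %>%
  count(state) %>%
  arrange(desc(n)) %>%
  slice(1) %>%
  pull(state)
```

With this chunk in place, the inline reference to `y` knits to Massachusetts for the toy data.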

Late Days

You may use your late days on the problem sets, with a maximum of 1 late day per problem set. When using GitHub, there is no “submission” button. Rather, we download the latest commit you’ve pushed as of 11:55 PM on Wednesday and grade that. If you want to use a late day for a problem set, email Alice before the due date. Otherwise, we will grade your latest commit as of the deadline.

Late days may not be used on exams. We grade whatever is in your GitHub repo as of the deadline.


Always list, at the very end of the problem set, the names of any students with whom you worked on the problem set. If there were none, write None. We define “worked with” very broadly. It would certainly include anyone you sat next to or across from at a Study Hall, even if you only exchanged a few words.

Grading Rubrics

In addition to the above directions, the grading rubrics below apply to problem sets and to the final project in Gov 1005.

  • Make sure you follow all of our instructions.

  • Follow the Tidyverse Style Guide. Here are some portions of the Guide that have tripped up students in the past: strive to limit your code to 80 characters per line; each line of a comment should begin with the comment symbol and a single space: #; %>% should always have a space before it, and should usually be followed by a new line. But those are just the most common mistakes. The section on spacing is also important.

  • Avoid the hacks of message=FALSE and warning=FALSE in your R code chunk options. Figure out what the problem is and make it go away. Don’t close your eyes (metaphorically) and pretend the problem doesn’t exist. If you use these hacks, you must make a code comment directly below them, explaining their purpose.

  • Ensure that your repo is clean (no unnecessary files).

  • Make at least 5 commits with sensible commit messages, i.e., not “stuff” or “update.”

  • Once we download your repo, can we replicate your work easily? When we knit your .Rmd file, does your code throw an error (for example, by referencing a file you have locally but which you didn’t push to GitHub)? (It is OK if you use a library which we need to download, but your Rmd must include all the necessary library() commands.)

  • Make your code readable. Formatting matters.

  • Include comments in your code. Rough guideline: You should have as many lines of comments as you have lines of code.

  • Make your comments meaningful. They should not be a simple description of what your code does. The best comments are descriptions about why you did what you did and which other approaches you tried or considered. (The code already tells us what you did.) Good comments often have a “Dear Diary” quality: “I did this. Then I tried that. I finally chose this other thing because of reasons X, Y and Z. If I work on this again, I should look into this other approach.” Because of this, the structure is often several lines of comments followed by several lines of code. But this is not required.

  • Code comments must be separated from code by one empty line on both sides.

  • Format your code comments neatly. Cmd-Shift-/ is the easiest way to do that.

  • Name your R code chunks.

  • Spelling and punctuation matter.

  • Use captions, titles, axis labels and so on to make it clear what your tables and graphics mean.

  • Anytime you make a graphic without a title (explaining what the graphic is), a subtitle (highlighting a key conclusion to draw), a caption (with some information about the source of the data) and axis labels (with information about your variables), you should justify that decision in a code comment. I (try to) always include all four but there are situations in which that makes less sense. Ultimately, these decisions are yours, but we need to understand your reasoning.

  • Use your best judgment. For example, sometimes axis labels are unnecessary. Read Data Visualization: A practical introduction by Kieran Healy for guidance on making high quality graphics.
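The labeling rubric above can be illustrated with a short ggplot2 sketch. It uses the `mpg` data set that ships with ggplot2 purely as a stand-in for your own data; the wording of the labels is an assumption chosen for illustration.

```r
library(ggplot2)

# mpg ships with ggplot2 and stands in for your own data here. labs() sets
# the title, subtitle, caption and both axis labels in one place, which makes
# it easy to see at a glance whether any of them is missing.

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(
    title = "Engine Size and Highway Fuel Economy",
    subtitle = "Cars with larger engines tend to get fewer miles per gallon",
    caption = "Source: mpg data set from the ggplot2 package",
    x = "Engine displacement (liters)",
    y = "Highway miles per gallon"
  )

p
```

If you drop one of these elements, the code comment above the plot is the natural place to explain why.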

Grading Rubrics for the Final Project

These are in addition to the usual rubrics above.

  • It is not enough to simply use an already-assembled data set. Instead, you must combine data from two or more different sources. Looking at your data-munging code will confirm for us that you have made an actual contribution to human knowledge. Imagine that your roommate also cares about soccer/wine/politics/whatever. You are building something that would interest her, something that will make her say, “That is cool! Let’s spend 30 minutes poking around with your data.” Projects without at least 10,000 data points are unlikely to be interesting enough, but feel free to convince us otherwise. Projects must feature some statistical modelling, most commonly a regression.

  • You should have a one-sentence summary and a four-sentence summary of your project memorized for Demo Day. The one-sentence summary is for listeners who want the briefest possible pitch. The four-sentence summary is for those who want more details. Both should be smooth and persuasive. People are busy. Why should they pay attention to you?

  • Your repo must be public.

  • Your repo must contain a professional README.

  • We will look at (and grade) your code in conjunction with the Demo Day evaluation.

  • Give your repo and Shiny App a descriptive name. “Vaccine-Explorer” or “syrian_civil_war” is good. “Gov_1005_Final_Project” or “project_test” is not.

  • The typical Shiny App will include three tabs. The “About” tab will provide background information about the project and your data. The second tab will display some data/graphics, and allow the user to make some choices. The display must change depending on user input. The third tab will be a detailed explanation of your model and what it means.

  • Apps should all have an “Info” or “About” tab which includes your name, contact information, GitHub repo and data source information. Include other background information as you see fit.

  • Apps should “open” on an interesting tab, which will usually not be the “About” tab.

  • Apps should have at least one tab in which the user can select something and see a change.

  • Apps often have “story” tabs which, although they do not allow for user selections, do highlight specific aspects of the data which are interesting, and which users are unlikely to discover by themselves. They are not labelled with the word “story.”

  • All projects must include a 1 to 2 minute video in which you explain what you have found. This may be a video of you talking or a screen-based tour of some aspects of your app.

  • Fill out the Final Project Spreadsheet accurately. Failure to do so will cost you two points.
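The three-tab layout described above might look roughly like the following sketch. Everything in it is hypothetical: the two tiny tibbles stand in for data from two real sources, and the app title, tab names and input id are invented for illustration. It is one possible arrangement, not a required template.

```r
library(shiny)
library(dplyr)
library(ggplot2)

# Two tiny hypothetical sources, joined on a shared key. A real project would
# read and clean actual files here.

scores <- tibble(team = c("A", "B", "C", "D"), points = c(10, 14, 9, 20))
budgets <- tibble(team = c("A", "B", "C", "D"), budget = c(1.0, 1.5, 0.8, 2.2))
teams <- left_join(scores, budgets, by = "team")

# A simple regression of points on budget, fit once at startup.

fit <- lm(points ~ budget, data = teams)

# selected = "Explore" makes the app open on the interactive tab rather than
# on "About", even though "About" is listed first.

ui <- navbarPage(
  "Team Budget Explorer",
  selected = "Explore",
  tabPanel(
    "About",
    p("Background, data sources, your name, contact details and repo link.")
  ),
  tabPanel(
    "Explore",
    selectInput("team", "Highlight a team:", choices = teams$team),
    plotOutput("scatter")
  ),
  tabPanel("Model", verbatimTextOutput("model"))
)

server <- function(input, output) {

  # The plot re-renders whenever the user picks a different team.

  output$scatter <- renderPlot({
    ggplot(teams, aes(x = budget, y = points)) +
      geom_point(aes(color = team == input$team), size = 3) +
      labs(
        title = "Points Versus Budget",
        x = "Budget (hypothetical units)",
        y = "Points",
        color = "Selected"
      )
  })

  output$model <- renderPrint(summary(fit))
}

# In an interactive session, runApp(app) launches the app.

app <- shinyApp(ui, server)
```

Fitting the model outside the server function means it is computed once, not on every user interaction. That is a reasonable choice when the model does not depend on user input.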

David Kane
Preceptor in Statistical Methods and Mathematics