Problem Sets and Exams

This post details grading standards for problem sets and exams in Gov 1005 and Gov 1006.


All problem set and exam solutions are submitted via GitHub Classroom. For problem sets, you will create at least two new files: ps_N.Rmd and ps_N.html, where N is the number of the problem set. You must use exactly these names. Replace “ps” with “exam” for exam submissions. Your projects may also contain other files, either distributed by us as part of the assignment or added by you.

  • The two documents you are submitting are very different.
    • The Rmd file is a technical document, an accurate record of your work which allows you (and us!) to reproduce your html easily. It should be well-organized, nicely formatted and clean. Non-technical readers will not understand it, but that is OK.
    • The html file is a presentation document, designed for non-technical readers. No R code, messages, warnings, errors or obscure nonsense mar its pleasing appearance.
  • It must be possible for us to replicate your work. That is, we will clone your repo and open your Rmd file. When we knit it, we should produce your html. If we can’t, we will take off points. (The Course Staff will be happy to test your work before you submit it. Visit them during Study Hall!) The most common cause of a failure to knit is forgetting to include or push necessary files to your repo. The second most common cause is hard-coded file paths which only work on your computer.
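To make the point about hard-coded file paths concrete, here is a minimal sketch. The folder and file names are made up for illustration, and it assumes the default behavior in which a knitted Rmd resolves relative paths from the folder that holds the Rmd.

```r
# Hypothetical folder and file names. These lines only create a stand-in data
# file so that the sketch runs on its own. In a real problem set, the file
# would already be committed to your repo.

dir.create("raw-data", showWarnings = FALSE)
write.csv(data.frame(state = c("Massachusetts", "Ohio")),
          "raw-data/example.csv", row.names = FALSE)

# Fragile: an absolute path like this one exists on a single computer only.
# x <- read.csv("/Users/yourname/Desktop/ps_1/raw-data/example.csv")

# Portable: a relative path works in any clone of the repo, as long as the
# file itself has been pushed.

x <- read.csv("raw-data/example.csv")
```

If the relative version knits on your machine and the file is in the repo, it should knit on ours as well.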

Question Types

There are three main types of questions on the problem sets: Mad Libs, tables, and graphics. For these, you do not write any prose. Other questions may require a paragraph or so of explanation.

Mad Libs

A Mad Libs style question provides a sentence with an X which you must replace with the correct answer. For example, the question might state:

The state with the most rows is X. (format state like Massachusetts, not MA)

You copy/paste that sentence as your answer, but replace the X with inline R code that determines the correct replacement for X dynamically. Do not include the words in the parentheses. They are there for explanation. Do not simply copy/paste the correct answer. In your Rmd, you might write:

The state with the most rows is `r x %>% group_by(state) %>% count() %>% ungroup() %>% arrange(desc(n)) %>% slice(1) %>% pull(state)`.

When you knit your Rmd file, this will turn into:

The state with the most rows is Massachusetts.

This is (you hope!) the answer that we are looking for.

Obviously, x needs to be a tibble which you have already created and which has state as a variable name. Sometimes, so much code is needed to answer the Mad Lib that it is placed in its own code chunk, with the answer saved as an object.

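# count() keeps the grouping by state, so ungroup() is needed before slice(1)
# picks the single top row rather than one row per state.

y <- x %>%
  group_by(state) %>%
  count() %>%
  ungroup() %>%
  arrange(desc(n)) %>%
  slice(1) %>%
  pull(state)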

But that object is still placed in the inline code:

The state with the most rows is `r y`.

This second method — doing all the calculations in a separate code chunk and then just referencing the final answer inline — is almost always best, not least because it makes the code much more readable.
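Here is a small, self-contained sketch of the separate-chunk approach that you can run to see what the answer object looks like. The tibble x and its contents are invented for illustration. The sketch uses count(state), which is one way to get an ungrouped result so that slice(1) returns a single row.

```r
library(dplyr)

# Hypothetical stand-in for the tibble x. In a real problem set, x would come
# from the assignment data.

x <- tibble(state = c("Massachusetts", "Ohio", "Massachusetts",
                      "Maine", "Ohio", "Massachusetts"))

# Count rows per state, sort from most to fewest, keep the top row and pull
# out the state name as a plain character value.

y <- x %>%
  count(state) %>%
  arrange(desc(n)) %>%
  slice(1) %>%
  pull(state)
```

Because y is a single character value, referencing it inline drops just the state name into the sentence.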

Late Days

You may use your late days on the problem sets, with a maximum of 1 late day per problem set. The official submission time of your problem set is when you submit the html on Canvas. If that is after 11:55 PM, you are charged a late day. When using GitHub, there is no “submission” button. Rather, we download the latest commit you’ve pushed as of 11:55 PM on whatever day you submitted your html. Failure to submit your problem set within one day means you get a zero for that problem set. However, you still have to submit it. (All work in this class must be done, even if you are taking the class pass/fail.) You are charged late days until you submit.

Late days may not be used on exams. We grade whatever is in your GitHub repo as of the deadline.


Always list, at the very end of the problem set, the names of any students with whom you worked on the problem set. If there were none, write None. We define “worked with” very broadly. It would certainly include anyone you sat next to or across from at a Study Hall, even if you only exchanged a few words.

Grading Rubrics

In addition to the above directions, the grading rubrics below apply.

  • Make sure you follow all of our instructions.

  • Follow the Tidyverse Style Guide, unless it is contradicted by the instructions below. Here are some portions of the Guide that have tripped up students in the past: strive to limit your code to 80 characters per line; each line of a comment should begin with the comment symbol, #, followed by a single space; %>% should always have a space before it, and should usually be followed by a new line. But those are just the most common mistakes. The section on spacing is also important.

  • Students often have trouble turning off R messages/warnings/errors. Recall that these must not appear in your html. The right way to deal with these issues is to find out their cause and then fix the underlying problem. Students sometimes use “hacks” to make these messages/warnings/errors disappear. The most common hacks involve using code chunk options like message=FALSE, warning=FALSE, results="hide", include=FALSE and others. Don’t do this, in general. A message/warning/error is worth understanding and then fixing. Don’t close your eyes (metaphorically) and pretend that the problem doesn’t exist. There are some situations, however, in which, no matter what you try, you can’t fix the problem. In those few cases, you can use one of these hacks, but you must make a code comment directly below it, explaining the situation. The only exception is the “setup” chunk (included by default in every new Rmd) which comes with include=FALSE. In that chunk, no explanation is necessary, by convention.

  • Ensure that your repo is clean (no unnecessary files). In general, we recommend including your .gitignore, but that is not required. You should not include your .Rproj file unless you have a specific reason to and have documented that reason in a code comment somewhere.

  • Make at least 5 commits with sensible commit messages, i.e., not “stuff” or “update.” If, for some reason, you don’t have 5 commits, you should document/explain that fact in a code comment.

  • Once we download your repo, can we replicate your work easily? When we knit your .Rmd file, does your code throw an error (for example, by referencing a file you have locally but which you didn’t push to GitHub)? (It is OK if you use a library which we need to download, but your Rmd had better include all the necessary library() commands.)

  • Make your code readable. Formatting matters.

  • Include comments in your code. It is OK if easy-to-understand chunks of code (like a simple Mad Lib) have no comments. The code is self-explanatory. But other code will merit many, many lines of comments, more lines than the code itself. For the problem set or exam as a whole, you should have about as many lines of comments as you have lines of code.

  • Make your comments meaningful. They should not be a simple description of what your code does. The best comments are descriptions about why you did what you did and which other approaches you tried or considered. (The code already tells us what you did.) Good comments often have a “Dear Diary” quality: “I did this. Then I tried that. I finally chose this other thing because of reasons X, Y and Z. If I work on this again, I should look into this other approach.” Because of this, the structure is often several lines of comments followed by several lines of code. But this is not required. Look closely at our code and our comments.

  • Code comments must be separated from code by one empty line on both sides.

  • Format your code comments neatly. Ctrl-Shift-/ is the easiest way to do that.

  • Name your R code chunks.

  • Spelling and punctuation matter.

  • Use captions, titles, axis labels and so on to make it clear what your tables and graphics mean.

  • Anytime you make a graphic without a title (explaining what the graphic is), a subtitle (highlighting a key conclusion to draw), a caption (with some information about the source of the data) and axis labels (with information about your variables), you should justify that decision in a code comment. We (try to) always include all four but there are situations in which that makes less sense. Ultimately, these decisions are yours, but we need to understand your reasoning.

  • Use your best judgment. For example, sometimes axis labels are unnecessary. Read Data Visualization: A practical introduction by Kieran Healy for guidance on making high quality graphics.
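As a rough illustration of the graphics rubric above, here is a sketch of a plot that carries all four labelling elements. The data frame, its values and the label text are invented for this example, so treat it as one possible pattern rather than a required template.

```r
library(ggplot2)

# Hypothetical data, made up purely to have something to plot.

state_counts <- data.frame(
  state = c("Massachusetts", "Ohio", "Maine"),
  n = c(3, 2, 1)
)

# labs() sets the title (what the graphic is), the subtitle (a key
# conclusion), the caption (the data source) and both axis labels in one
# place. I chose a bar chart because there is one count per state.

p <- ggplot(state_counts, aes(x = state, y = n)) +
  geom_col() +
  labs(
    title = "Number of Rows by State",
    subtitle = "Massachusetts has the most rows",
    caption = "Source: made-up data for illustration",
    x = "State",
    y = "Number of rows"
  )
```

Printing p in a code chunk draws the plot. If you leave out any of the four elements, say why in a code comment.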

David Kane
Data Scientist