Harvard Men's Basketball Data

Let’s use the ncaahoopR package to gather some data about Harvard Men’s Basketball from the 2017-2018 season. Unfortunately, either the ncaahoopR package is not very flexible or I don’t understand blogdown well enough, because I found no way to suppress the progress messages that the package prints. Apologies!

knitr::opts_chunk$set(echo = TRUE)
suppressMessages(library(tidyverse))
suppressMessages(library(ncaahoopR))
# One tricky aspect of creating posts that have non-trivial run times and rely
# on outside data sources is that you don't want to re-run the analysis each
# time you save a draft of the post, which is what happens with blogdown.
# (Presumably, this could be changed.) Also, you don't want things to fail if
# the source website goes down.

# So, I tried to handle this by creating a single "download" code chunk for
# downloading the necessary data. The plan was to a) cache this data so that it
# is always available and b) set eval=FALSE for this chunk so that it does not
# run again. In other words, I first compile the post without eval=FALSE (but
# with cache=TRUE), thus saving the data permanently. I then set eval=FALSE. BUT
# I COULD NOT GET THIS TO WORK. Why?

# Anyway, this takes a few minutes to download 31 games from Harvard's 2017-2018
# season. The program issues a somewhat cryptic message indicating that Harvard
# played one game in the NIT for which, alas, play-by-play data is not
# available.

x <- get_pbp("Harvard")
## [1] "Getting Game IDs: Harvard"
## [1] "Getting Harvard Game: 1/32"
## [1] "Getting Harvard Game: 2/32"
## [1] "Getting Harvard Game: 3/32"
## [1] "Getting Harvard Game: 4/32"
## [1] "Getting Harvard Game: 5/32"
## [1] "Getting Harvard Game: 6/32"
## [1] "Getting Harvard Game: 7/32"
## [1] "Getting Harvard Game: 8/32"
## [1] "Getting Harvard Game: 9/32"
## [1] "Getting Harvard Game: 10/32"
## [1] "Getting Harvard Game: 11/32"
## [1] "Getting Harvard Game: 12/32"
## [1] "Getting Harvard Game: 13/32"
## [1] "Getting Harvard Game: 14/32"
## [1] "Getting Harvard Game: 15/32"
## [1] "Getting Harvard Game: 16/32"
## [1] "Getting Harvard Game: 17/32"
## [1] "Getting Harvard Game: 18/32"
## [1] "Getting Harvard Game: 19/32"
## [1] "Getting Harvard Game: 20/32"
## [1] "Getting Harvard Game: 21/32"
## [1] "Getting Harvard Game: 22/32"
## [1] "Getting Harvard Game: 23/32"
## [1] "Getting Harvard Game: 24/32"
## [1] "Getting Harvard Game: 25/32"
## [1] "Getting Harvard Game: 26/32"
## [1] "Getting Harvard Game: 27/32"
## [1] "Getting Harvard Game: 28/32"
## [1] "Getting Harvard Game: 29/32"
## [1] "Getting Harvard Game: 30/32"
## [1] "Getting Harvard Game: 31/32"
## [1] "NIT Game--Play by Play Data Not Available at this time"
x <- as_tibble(x)
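
The caching idea described in the comments above can also be done by hand, without relying on chunk options. What follows is only a sketch: `cached_download()` and the file name `harvard_pbp.rds` are hypothetical names made up for illustration, not part of ncaahoopR or knitr. The helper saves the result to an `.rds` file the first time it runs and reads that file back on every later run.

```r
# Hypothetical helper (not part of any package): call `download()` only when
# no cached copy exists at `path`; otherwise read the cached copy from disk.
cached_download <- function(path, download) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- download()
  saveRDS(result, path)
  result
}

# Intended use in this post (not run here, since it needs ncaahoopR and a
# network connection; the file name is an assumption):
# x <- cached_download("harvard_pbp.rds", function() get_pbp("Harvard"))
```

With something like this, the slow download should only happen when the file is missing, and the post can still be rebuilt if the source website goes down.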

# This package does not have a nice, tidy design. I should be able to download
# this data and then create my own chart. As best I can tell, I have no choice
# but to use their chart.

chart1 <- wp_chart("400991656", home_col = "royalblue4", away_col = "red")
## [1] "Game: 1/1"

The play-by-play data is interesting. We have 9720 individual plays over 31 games. The variables we can use are:

names(x)
##  [1] "play_id"             "half"                "time_remaining_half"
##  [4] "secs_remaining"      "description"         "home_score"         
##  [7] "away_score"          "away"                "home"               
## [10] "home_favored_by"     "game_id"             "date"
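
The play and game counts quoted above could be checked with a couple of dplyr verbs. This is a sketch on a toy tibble rather than the real download: the `game_id` and `play_id` column names come from the output above, but the values (and the names `toy`, `totals`, and `per_game`) are made up for illustration.

```r
library(dplyr)
library(tibble)

# Toy stand-in for `x`; only the column names match the real data.
toy <- tibble(
  game_id = c("A", "A", "A", "B", "B"),
  play_id = c(1L, 2L, 3L, 1L, 2L)
)

# Total number of plays and number of distinct games.
totals <- toy %>%
  summarise(plays = n(), games = n_distinct(game_id))

# Number of plays in each game.
per_game <- toy %>%
  count(game_id, name = "plays")
```

Replacing `toy` with `x` should give the corresponding totals for the Harvard data.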

I find the descriptions to be particularly interesting.

x %>% 
  select(play_id, description) %>% 
  sample_n(10)
## # A tibble: 10 x 2
##    play_id description                                                 
##      <int> <fct>                                                       
##  1     275 Chris Lewis missed Jumper.                                  
##  2     276 Seth Towns made Three Point Jumper. Assisted by Chris Lewis.
##  3     235 AJ Brodeur Defensive Rebound.                               
##  4      78 Taylor Johnson Defensive Rebound.                           
##  5     101 Josh Warren made Layup.                                     
##  6      25 Evan Fitzner Offensive Rebound.                             
##  7     163 Ryan Betley Defensive Rebound.                              
##  8     310 Shavar Newkirk missed Free Throw.                           
##  9      53 Chris Lewis Block.                                          
## 10     140 Will Emery made Free Throw.
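
As one example of what the descriptions allow, here is a rough sketch that pulls the player, the outcome, and the shot type out of the shot descriptions. The strings are copied from the sample above; the regular expressions are my guess at the pattern based on those few rows, so they may well miss other kinds of plays. Note that `description` is a factor in the real data; stringr should coerce it, but wrapping it in `as.character()` is the safer choice.

```r
library(dplyr)
library(stringr)
library(tibble)

# A few descriptions copied from the sample output above.
plays <- tibble(
  description = c(
    "Chris Lewis missed Jumper.",
    "Seth Towns made Three Point Jumper. Assisted by Chris Lewis.",
    "AJ Brodeur Defensive Rebound.",
    "Shavar Newkirk missed Free Throw.",
    "Will Emery made Free Throw."
  )
)

shots <- plays %>%
  # Keep only plays that record a made or missed shot.
  filter(str_detect(description, " (made|missed) ")) %>%
  mutate(
    # Everything before "made" or "missed" is taken to be the player's name.
    player = str_extract(description, "^.*?(?= (made|missed) )"),
    made = str_detect(description, " made "),
    # Text between "made"/"missed" and the first period is the shot type.
    shot_type = str_match(description, " (?:made|missed) ([^.]+)\\.")[, 2]
  )
```

From here, something like `shots %>% group_by(player) %>% summarise(pct = mean(made))` would give a crude shooting percentage per player.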

Lots of interesting stuff to play with!
