Let’s use the ncaahoopR package to gather some data about Harvard Men’s Basketball from the 2017-2018 season. Unfortunately, the package is not very flexible (or I don’t understand blogdown well enough to control its output), so I could not find a way to suppress the progress messages it prints below. Apologies!
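The progress lines further down carry a `[1]` prefix, which is what print() produces, so they are probably printed output rather than messages. That would explain why suppressMessages() has no effect on them. Here is a minimal sketch of one way to silence that kind of output, assuming the package really does use print(). The helper quietly() is hypothetical and not part of ncaahoopR.

```r
# Hypothetical helper: evaluate an expression, discard anything it prints to
# standard output, and return its value.
quietly <- function(expr) {
  # capture.output() collects printed output instead of displaying it;
  # the assignment keeps the value of the expression for us.
  utils::capture.output(result <- expr)
  result
}

# Usage (not run, needs ncaahoopR and a network connection):
# x <- quietly(get_pbp("Harvard"))
```

The knitr chunk option results = "hide" is another way to hide printed output, applied to a whole chunk at once.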
knitr::opts_chunk$set(echo = TRUE)
suppressMessages(library(tidyverse))
suppressMessages(library(ncaahoopR))
# One tricky aspect of writing posts that have non-trivial run times and rely
# on outside data sources is that you don't want to re-run the analysis each
# time you save a draft of the post, which is what happens with blogdown.
# (Presumably, this could be changed.) Also, you don't want things to fail if
# the source website goes down.
# So, I tried to handle this by creating a single "download" code chunk for
# the necessary data. The plan was to a) compile the post once with cache=TRUE
# and without eval=FALSE, so that the data is saved permanently, and then
# b) set eval=FALSE so that the chunk does not run again. BUT I COULD NOT GET
# THIS TO WORK. Why?
# Anyway, it takes a few minutes to download the 31 available games from the
# Harvard 2017-2018 season. The package also printed a somewhat cryptic
# message indicating that Harvard played one game in the NIT for which, alas,
# play-by-play data is not available.
x <- get_pbp("Harvard")
## [1] "Getting Game IDs: Harvard"
## [1] "Getting Harvard Game: 1/32"
## [1] "Getting Harvard Game: 2/32"
## [1] "Getting Harvard Game: 3/32"
## [1] "Getting Harvard Game: 4/32"
## [1] "Getting Harvard Game: 5/32"
## [1] "Getting Harvard Game: 6/32"
## [1] "Getting Harvard Game: 7/32"
## [1] "Getting Harvard Game: 8/32"
## [1] "Getting Harvard Game: 9/32"
## [1] "Getting Harvard Game: 10/32"
## [1] "Getting Harvard Game: 11/32"
## [1] "Getting Harvard Game: 12/32"
## [1] "Getting Harvard Game: 13/32"
## [1] "Getting Harvard Game: 14/32"
## [1] "Getting Harvard Game: 15/32"
## [1] "Getting Harvard Game: 16/32"
## [1] "Getting Harvard Game: 17/32"
## [1] "Getting Harvard Game: 18/32"
## [1] "Getting Harvard Game: 19/32"
## [1] "Getting Harvard Game: 20/32"
## [1] "Getting Harvard Game: 21/32"
## [1] "Getting Harvard Game: 22/32"
## [1] "Getting Harvard Game: 23/32"
## [1] "Getting Harvard Game: 24/32"
## [1] "Getting Harvard Game: 25/32"
## [1] "Getting Harvard Game: 26/32"
## [1] "Getting Harvard Game: 27/32"
## [1] "Getting Harvard Game: 28/32"
## [1] "Getting Harvard Game: 29/32"
## [1] "Getting Harvard Game: 30/32"
## [1] "Getting Harvard Game: 31/32"
## [1] "NIT Game--Play by Play Data Not Available at this time"
x <- as_tibble(x)
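One way to sidestep knitr's cache entirely is to save the downloaded data to a file and only contact the website when that file is missing. knitr's cache is, as far as its documentation suggests, invalidated when a chunk's options change, which may be part of why flipping eval did not behave as hoped. A minimal sketch follows. The helper cache_rds() and the file name harvard_pbp.rds are hypothetical and not part of ncaahoopR or blogdown.

```r
# Hypothetical helper: return the object stored at `path` if the file exists;
# otherwise call `fetch()`, store its result at `path`, and return it.
cache_rds <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch()
  saveRDS(result, path)
  result
}

# Usage (not run, needs ncaahoopR and a network connection the first time):
# x <- cache_rds("harvard_pbp.rds", function() get_pbp("Harvard"))
```

Deleting the file forces a fresh download the next time the post is compiled.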
# This package does not have a nice, tidy design. I should be able to download
# this data and then create my own chart. As best I can tell, I have no choice
# but to use their chart.
chart1 <- wp_chart("400991656", home_col = "royalblue4", away_col = "red")
## [1] "Game: 1/1"
The play-by-play data is interesting. We have 9720 individual plays over 31 games. The variables we can use are:
names(x)
## [1] "play_id" "half" "time_remaining_half"
## [4] "secs_remaining" "description" "home_score"
## [7] "away_score" "away" "home"
## [10] "home_favored_by" "game_id" "date"
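These columns are enough to draw a simple home-made chart, although of the score margin rather than the win probability that wp_chart() shows. A minimal sketch in base R follows. The helpers score_margin() and plot_margin() are hypothetical, and the sketch assumes that secs_remaining counts down to zero over the course of a game.

```r
# Hypothetical helper: pull one game out of the play-by-play data and compute
# the home team's lead after each play, in play order.
score_margin <- function(pbp, id) {
  game <- pbp[pbp$game_id == id, ]
  game <- game[order(game$play_id), ]
  game$margin <- game$home_score - game$away_score
  game
}

# Hypothetical helper: step plot of the home lead against time remaining.
# Assumes secs_remaining counts down, so the x axis is reversed.
plot_margin <- function(game) {
  plot(game$secs_remaining, game$margin, type = "s",
       xlim = rev(range(game$secs_remaining)),
       xlab = "Seconds remaining", ylab = "Home lead (points)")
  abline(h = 0, lty = 2)
}

# Usage (not run; assumes this game is among the ones downloaded above):
# plot_margin(score_margin(x, "400991656"))
```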
I find the descriptions to be particularly interesting.
x %>%
  select(play_id, description) %>%
  sample_n(10)
## # A tibble: 10 x 2
##    play_id description
##      <int> <fct>
##  1     275 Chris Lewis missed Jumper.
##  2     276 Seth Towns made Three Point Jumper. Assisted by Chris Lewis.
##  3     235 AJ Brodeur Defensive Rebound.
##  4      78 Taylor Johnson Defensive Rebound.
##  5     101 Josh Warren made Layup.
##  6      25 Evan Fitzner Offensive Rebound.
##  7     163 Ryan Betley Defensive Rebound.
##  8     310 Shavar Newkirk missed Free Throw.
##  9      53 Chris Lewis Block.
## 10     140 Will Emery made Free Throw.
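The shot descriptions in this sample appear to follow a regular "player made/missed shot type." pattern, so they can be split into columns with a regular expression. A minimal sketch in base R, using the ten descriptions sampled above as input, follows. The pattern is an assumption based on this sample only and may not cover every description in the data.

```r
# The ten sampled descriptions, copied from the output above. In the real data
# the column is a factor, so it would need as.character() first.
descriptions <- c(
  "Chris Lewis missed Jumper.",
  "Seth Towns made Three Point Jumper. Assisted by Chris Lewis.",
  "AJ Brodeur Defensive Rebound.",
  "Taylor Johnson Defensive Rebound.",
  "Josh Warren made Layup.",
  "Evan Fitzner Offensive Rebound.",
  "Ryan Betley Defensive Rebound.",
  "Shavar Newkirk missed Free Throw.",
  "Chris Lewis Block.",
  "Will Emery made Free Throw."
)

# Assumed pattern: "<player> made|missed <shot type>." at the start of the
# text, optionally followed by more (for example, an assist).
shot_pattern <- "^(.+?) (made|missed) ([^.]+)\\..*$"
is_shot <- grepl(shot_pattern, descriptions, perl = TRUE)

# One row per shot attempt, with the three captured pieces as columns.
shots <- data.frame(
  player    = sub(shot_pattern, "\\1", descriptions[is_shot], perl = TRUE),
  result    = sub(shot_pattern, "\\2", descriptions[is_shot], perl = TRUE),
  shot_type = sub(shot_pattern, "\\3", descriptions[is_shot], perl = TRUE),
  stringsAsFactors = FALSE
)
```

Rebounds and blocks do not match the pattern and are simply left out of shots.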
Lots of interesting stuff to play with!