Do Not Force Your Students to Work in the Cloud

Nick Horton at Teach Data Science argues that instructors should “have students use a dedicated server to access R” for the entire course. I disagree.

There is nothing wrong with using a setup like RStudio Cloud for the first day or two of class. (I do this.) All of Nick’s thoughts on that topic, especially on getting students excited, are correct. Recall Mine Cetinkaya-Rundel’s advice to “Let Them Eat Cake (First).”

The disagreement lies in whether or not students should use R on their local machines in week two and thereafter.

1) Although the technical hassles in getting R/RStudio to work are real, they aren’t that bad. I have done it for the last two semesters, with scores of students, without any real problems. (I also require students to use Git/GitHub, which can be more challenging. Thank goodness for Happy Git and GitHub for the useR!) Moreover, using R/RStudio on a local machine gets easier every year, especially as older Windows machines disappear.

2) Nick implies that installing R and RStudio, installing packages like the tidyverse, loading packages, and so on are not things that students should do. On the contrary, if students want to be able to do data science after the course is over, they will need all these skills. I concede that they don’t need to do these things the first day, but they need to learn them sometime. The best way to teach them is to require students to run their own R installations.

3) What happens when the class is over, when students are taking other classes, working at summer internships, or starting their first job after graduation? Answer: the handy R environment that Nick has arranged (and paid for?) disappears. Students are on their own. My students are prepared for that day because, after week 1, they do everything themselves (with help from me and the TFs).

4) What happens to student work when the class is over? My wager is that the vast majority of classes that use the in-the-cloud approach make it difficult, if not impossible, for students to use, revisit, improve, or display their work after the class is over. Consider the final projects from my class. Students have access to this work even though the class is over, even though some have graduated and no longer have the use of Harvard systems. (I admit that this concern is somewhat orthogonal to the class-in-the-cloud issue, since it is possible to have students save and share their work via GitHub even if you use the cloud.)

Nick writes:

“The results are transformative, particularly for the fraction of students who are easily intimidated by computing and/or who have obsolete equipment, outdated operating systems, or minimal free space. Being able to jump into R from day one with an ability to do interesting things makes a huge difference in motivating students.”

a) Just how common, in 2019, are things like “outdated operating systems”? Honestly curious. I taught 110 students in 2018-2019 and not a single one used a machine that could not run R/RStudio.

b) If you are trying to give students the skills to do data science once your class is over, you need to teach them how to handle (simple!) computer issues like “minimal free space.” You may not like teaching them this material, but, if you don’t teach them, how will they do data science on their own?

c) Nothing helps those “easily intimidated by computing” more than showing them how to set up an environment on their own machines. My students are empowered. Nick’s students are forced to rely on a Wizard behind the curtain to set things up for them. My students, especially those new to data/computing, will be much less “intimidated” at the end of the class than Nick’s will be.

David Kane
Preceptor in Statistical Methods and Mathematics