Is the Earth warming? Let’s take a look at some satellite temperature measurements from the University of Alabama in Huntsville. Roy Spencer provides monthly updates — e.g., here for June 2018 — using the UAH data. The purpose of this post is to replicate/improve his main image.
Global area-averaged lower tropospheric temperature anomalies (departures from 30-year calendar monthly means, 1981-2010). The 13-month centered average is meant to give an indication of the lower frequency variations in the data; the choice of 13 months is somewhat arbitrary… an odd number of months allows centered plotting on months with no time lag between the two plotted time series. The inclusion of two of the same calendar months on the ends of the 13 month averaging period causes no issues with interpretation because the seasonal temperature cycle has been removed, and so has the distinction between calendar months.
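To make the caption's point about an odd window concrete, here is a minimal base-R sketch of a centered 13-month average. The toy series and the helper name `centered_mean` are illustrative assumptions, not part of Spencer's or UAH's processing. With an odd window, each average sits exactly on a month, with six neighbours on either side, and the first and last six months have no value.

```r
# Toy monthly series of 24 values, purely illustrative (not UAH data).
x <- as.numeric(1:24)

# Centered moving average with an odd window k: each output averages the
# point itself plus (k - 1) / 2 neighbours on either side.
# `centered_mean` is a hypothetical helper name.
centered_mean <- function(x, k = 13) {
  stopifnot(k %% 2 == 1)
  # stats::filter is named explicitly because dplyr masks filter().
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

m <- centered_mean(x)

# The first and last six positions have no full window, so they are NA.
which(is.na(m))
```

Because the toy series is a straight line, every non-missing average equals the value at the center of its window, which is an easy way to check that the window really is centered.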
    suppressMessages(library(tidyverse))
    suppressMessages(library(lubridate))
    library(RcppRoll)

    # The raw ascii file is a mess. First, it has a leading white space in every
    # line. Second, the column names feature many duplicates. Third, the column
    # names are separated by different numbers of white spaces. (Fortunately,
    # skipping the column names and using read_table2() handles these issues.)
    # Fourth, there is a bunch of junk at the bottom of the file, so we need to
    # determine the number of rows with data by hand. (There is probably a better
    # way to do this.)

    df <- read_table2(file = "https://github.com/davidkane9/public_data/raw/master/uahncdc_lt_6.0.txt",
                      col_names = FALSE, skip = 1, n_max = 475) %>%

      # Only want first three columns, which we need to name by hand.

      select(1:3) %>%
      rename(year = X1, month = X2, temp = X3) %>%

      # The data is actually through the end of the year-month we are given. This
      # gives us the appropriate date. Is there a better way?

      mutate(date = make_date(year, month, 1),
             date = ceiling_date(date, "month") - days(1)) %>%

      # Data is sorted by date already, but it never hurts to be sure, especially
      # since the rolling mean calculation assumes correct ordering.

      arrange(date) %>%

      # Rolling means are (still!?) not built into dplyr by default, so we need the
      # RcppRoll package. The window is centered, so the first and last six months
      # are NA.

      mutate(temp_mean = roll_mean(temp, 13, fill = NA, align = "center")) %>%
      select(date, temp, temp_mean)
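The code comments above ask whether there is a better way to count the data rows and to compute end-of-month dates. One possible approach, sketched here in base R, is to keep only lines that begin with a four-digit year and a month, then derive the last day of each month from the first day of the next. The sample lines below are made-up placeholders that imitate the layout described in the comments; they are not real UAH values, and the column names are assumptions.

```r
# Made-up lines imitating the described layout: a header, data rows with
# leading white space, then trailing non-data lines. Placeholder values only.
raw <- c(
  " Year Mo Globe  Land Ocean",
  " 1978 12 -0.10 -0.20 -0.05",
  " 1979  1 -0.15 -0.25 -0.10",
  " 1979  2  0.05  0.00  0.10",
  " Year Mo Globe  Land Ocean",
  " Trend    0.13  0.18  0.11",
  " NOTE: trailing text"
)

# A data row starts with optional white space, a 4-digit year, and a month.
is_data <- grepl("^[[:space:]]*[0-9]{4}[[:space:]]+[0-9]{1,2}[[:space:]]", raw)
n_data <- sum(is_data)  # replaces a hand-counted n_max

dat <- read.table(text = raw[is_data],
                  col.names = c("year", "month", "globe", "land", "ocean"))

# Last day of the month = first day of the following month, minus one day.
next_year <- dat$year + (dat$month == 12L)
next_month <- dat$month %% 12L + 1L
dat$date <- as.Date(sprintf("%d-%02d-01", next_year, next_month)) - 1

dat[, c("date", "globe")]
```

With the real file, `raw` would come from `readLines()` on the URL used above. The pattern assumes that no junk line happens to start with a year and a month, which is worth checking against the actual file.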
Given this nicely organized data, creating a similar image is mostly a matter of fussing with the caption/labels/title.
    ggplot(data = df, aes(x = date)) +
      geom_point(aes(y = temp), size = 0.5) +
      geom_line(aes(y = temp_mean), na.rm = TRUE, color = "red") +
      ggtitle("UAH Satellite Data Shows Global Warming Over the Last 40 Years") +
      xlab("Date: December 1978 through June 2018") +
      ylab(expression("Global Temperature Anomaly in " ( degree*C)))
Let’s leave aside the issue of whether or not a 13-month centered mean is the best summary measure to include in this graphic. I prefer my image over the original because:
- The x-axis tick mark labels are much nicer looking. There is no point in showing every single year.
- There is no reason for the x-axis label to be in ALL-CAPS like “YEAR”. Also, since the data is actually monthly, “Date” is a better description.
- There is no need to connect the individual points with a line. This is just jagged ugliness.
- My caption (unlike Spencer’s) is concise and includes a link to the raw data.
- My y-axis label is more concise and descriptive.
- The original’s text notations are, first, too large and, second, unnecessary.
- I use the correct scientific symbol for degrees Celsius: °C.
There are several ways to extend this analysis:
- I am currently using the summary data as calculated by UAH. It would be interesting to work directly with the raw data.
- Instead of just plotting the global mean, look at other portions of the world, which are already present in this data.
- Compare this with other data sources (such as NASA's). The NASA data seems to show a much more dramatic increase in the last 20 years.
- Forecast future temperatures, including an associated confidence interval.
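
As a starting point for the last item, here is a minimal sketch of a forecast with an uncertainty band, using a plain linear trend fitted with `lm()`. The data are simulated under an assumed trend and noise level; they are not UAH values, and the variable names are hypothetical. For future observations, the relevant band is a prediction interval, which is wider than a confidence interval for the trend line itself.

```r
set.seed(1)

# Simulated monthly anomalies: an ASSUMED linear trend plus noise.
# These are not UAH values; swap in the real date/temp columns to use this.
n <- 120
sim <- data.frame(t = seq_len(n) / 12)  # time in years
sim$temp <- 0.02 * sim$t + rnorm(n, sd = 0.1)

fit <- lm(temp ~ t, data = sim)

# Forecast the next 24 months with a 95% prediction interval.
future <- data.frame(t = (n + 1:24) / 12)
fc <- predict(fit, newdata = future, interval = "prediction", level = 0.95)

# Columns are the point forecast (fit) and the interval bounds (lwr, upr).
head(fc, 3)
```

A linear model treats the monthly residuals as independent, which is unlikely to hold for temperature anomalies, so these intervals are probably too narrow. A time series model that accounts for autocorrelation would be a more defensible next step.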