Post #33. A “timely” plot—Visualizing time series data in ggplot

2024

Let’s create an awesome time series chart in ggplot!

Gen-Chang Hsu
2024-02-21

Introduction

Welcome to the first post of 2024! It’s been two months since my post last year, and I feel excited to be back in the saddle and write something about ggplots. Hope we’ll have another year of posts full of useful and fun topics!

I’ll kick things off here with a time series chart, which is essentially a type of line chart with “time” on the x-axis. ggplot offers a family of functions that work with time, and we’re going to explore one of those functions in this post and create a nice time series chart.

The data

Most of the time, I use built-in R datasets for my posts. But this time it’s a bit different: I’m using the data I collected myself!

I had been recording the number of black kites (Milvus migrans) along the Xindian river in Taipei, Taiwan pretty much every day (except for a few times when I was out of town) from November 2022 to August 2023. This dataset consists of the daily black kite counts, the sites of the observations, and the weather conditions.

Let’s take a quick look at the dataset:

library(tidyverse)

### Read the data
black_kite_raw <- read_csv("https://raw.githubusercontent.com/GenChangHSU/ggGallery/master/_posts/2024-02-21-post-33-a-timely-plotvisualizing-time-series-in-ggplots/Black_Kite_Records.csv")

### Organize the data
black_kite_clean <- black_kite_raw %>% 
  mutate(Date = ymd(Date)) %>% 
  select(Date, Site, Number, Weather)

head(black_kite_clean)
# A tibble: 6 × 4
  Date       Site            Number Weather         
  <date>     <chr>            <dbl> <chr>           
1 2022-11-17 Huajiang_bridge      0 Sunny_with_cloud
2 2022-11-18 Huajiang_bridge      0 Sunny_with_cloud
3 2022-11-19 Huajiang_bridge      7 Sunny           
4 2022-11-20 Huajiang_bridge      1 Cloudy          
5 2022-11-21 Huajiang_bridge      2 Cloudy          
6 2022-11-22 Huajiang_bridge      0 Sunny_with_cloud

Most observations came from the site “Huajiang Bridge”, so I’ll focus on this particular site from now on. Below is a map of Huajiang Bridge created via ggmap:

library(ggmap)

### Need to register an API key for Stadia Maps and for Google Maps
register_stadiamaps(key = "your key")
register_google(key = "your key")

### Get the long-lat coordinates of the site
huajiang_bridge_lon_lat <- geocode("Huajiang Bridge, Wanhua District")

### Create a map
map_data <- get_stadiamap(bbox = c(left = huajiang_bridge_lon_lat$lon - 0.05, 
                                   bottom = huajiang_bridge_lon_lat$lat - 0.04, 
                                   right = huajiang_bridge_lon_lat$lon + 0.05, 
                                   top = huajiang_bridge_lon_lat$lat + 0.04), 
                          zoom = 14, 
                          maptype = "stamen_terrain")

ggmap(map_data) + 
  annotate(geom = "rect", 
           xmin = huajiang_bridge_lon_lat$lon - 0.01,
           ymin = huajiang_bridge_lon_lat$lat - 0.01,
           xmax = huajiang_bridge_lon_lat$lon + 0.01,
           ymax = huajiang_bridge_lon_lat$lat + 0.01,
           fill = NA,
           color = "red",
           linewidth = 1.5) + 
  theme_void()


The time series chart

Now comes the main topic of the post: creating a time series chart. I’ll break down the process into four steps and walk through them one by one.

(1) Summarize the data by week

Instead of plotting the daily kite counts, I decided to plot the weekly counts to better show the overall trend. To summarize the data by week, I first rounded the dates to the nearest weeks and summed the kite counts in each week. I also assigned a score to each of the four daily weather conditions and calculated the mean score for each week to represent the average weekly weather condition.

At the end of the code, I added a week ID column by computing the day differences from the first week and dividing the differences by 7. This week ID column is used for fitting a LOESS curve at step (2). Finally, I converted the rounded dates (object “Date”) to datetimes (object “POSIXct”) for plotting purposes at step (3) and (4).

### Summarize the data by week
black_kite_clean_week <- black_kite_clean %>% 
  filter(Site == "Huajiang_bridge") %>%
  mutate(Date_week = round_date(Date, "week")) %>%  # round the dates to the nearest weeks
  mutate(Weather_score = case_when(Weather == "Sunny" ~ 3,  # assign a score to each weather condition
                                   Weather == "Sunny_with_cloud" ~ 2,
                                   Weather == "Cloudy" ~ 1,
                                   Weather == "Rainy" ~ 0)) %>% 
  group_by(Date_week) %>% 
  summarise(Number_week = sum(Number),  # weekly kite counts
            Weather_week = mean(Weather_score)) %>%  # average weekly weather condition
  mutate(Week_id = as.double(Date_week - min(Date_week))/7) %>%   # week ID
  mutate(Date_week = as_datetime(Date_week))  # convert dates to datetimes

head(black_kite_clean_week)
# A tibble: 6 × 4
  Date_week           Number_week Weather_week Week_id
  <dttm>                    <dbl>        <dbl>   <dbl>
1 2022-11-20 00:00:00          10        1.57        0
2 2022-11-27 00:00:00          11        1.43        1
3 2022-12-04 00:00:00          19        1           2
4 2022-12-11 00:00:00          11        0.167       3
5 2022-12-18 00:00:00          18        1.2         4
6 2022-12-25 00:00:00           8        3           5

(2) Fit a LOESS curve to the weekly black kite counts

To show the general trend over the observation period, I fit a LOESS curve to the weekly black kite counts, with a smaller span of 0.3 to capture more details. I then created a sequence of datetimes with an interval of six hours (for the sake of model predictions), computed the “week” differences of these datetimes from the first datetime, and made the LOESS predictions based on these week differences.

### Fit a LOESS curve to the weekly black kite counts
loess_model <- loess(Number_week ~ Week_id, data = black_kite_clean_week, span = 0.3)

### Make predictions for the counts
library(modelr)  # for the function "add_predictions()"

loess_predictions <- tibble(Date_seq = seq(min(black_kite_clean_week$Date_week),
                                           max(black_kite_clean_week$Date_week), 
                                           by = "6 hour"),  # a sequence of datetimes with an interval of six hours,
                            Week_id = as.numeric((Date_seq - min(black_kite_clean_week$Date_week))/86400)/7) %>%  # week differences from the first datetime
  add_predictions(loess_model)  # make the LOESS predictions

head(loess_predictions)
# A tibble: 6 × 3
  Date_seq            Week_id  pred
  <dttm>                <dbl> <dbl>
1 2022-11-20 00:00:00  0       10.1
2 2022-11-20 06:00:00  0.0357  10.2
3 2022-11-20 12:00:00  0.0714  10.3
4 2022-11-20 18:00:00  0.107   10.4
5 2022-11-21 00:00:00  0.143   10.5
6 2022-11-21 06:00:00  0.179   10.6

(3) Create the time series chart

With the required data at hand, it’s time to create the time series chart! This chart contains points for the weekly kite counts, line segments joining the points, and the fitted LOESS curve from step (2). I went a bit fancy here by making the points blurry using the function geom_point_blur() from the package ggblur. I also drew a gradient fill background using the function geom_rect_pattern from the package ggpattern. (Note that I added the geom layers in a reversed order to have the points on the top, the lines in the middle, and the background at the very bottom.)

Remember I mentioned in the introduction that we would be using one of the functions that ggplot offers to work with time. That function is scale_x_datetime(), which allows for customizing the appearance of x-axis using date/time specifications. In the code below, I specified the x-axis limits using two datetimes, placed the x-axis breaks at an one-month interval (” date_breaks = “1 month” “), and labeled the x-axis using the R date/time format (” date_labels = “%b ’%y” “; %b represents the abbreviated month name and %y represents the two-digit year; see this documentation for more details).

library(ggpattern)  # for the function "geom_rect_pattern()"
library(ggblur)  # for the function "geom_point_blur()"

p_black_kite <- ggplot(black_kite_clean_week) + 
  geom_rect_pattern(aes(xmin = floor_date(min(black_kite_clean_week$Date_week), "month"), 
                        xmax = ceiling_date(max(black_kite_clean_week$Date_week), "month"), 
                        ymin = -Inf, 
                        ymax = 23),
                    pattern_fill = "#99d6ff", pattern_fill2 = "#ccebff", pattern = "gradient") + 
  geom_line(aes(x = Date_week, y = Number_week), color = "black", alpha = 0.05) + 
  geom_line(data = loess_predictions, aes(x = Date_seq, y = pred),
            color = "grey60", linewidth = 2, lineend = "round") + 
  geom_point_blur(aes(x = Date_week, y = Number_week, color = Weather_week), 
                  size = 3, blur_size = 10) + 
  scale_x_datetime(limits = c(floor_date(min(black_kite_clean_week$Date_week), "month"),
                              ceiling_date(max(black_kite_clean_week$Date_week), "month")), 
                   expand = c(0, 3), date_breaks = "1 month", date_labels = "%b '%y") +
  scale_y_continuous(limits = c(-1, 23), breaks = seq(0, 20, 10), 
                     labels = c("0 ", "10", "20"), expand = c(0, 0)) + 
  scale_color_gradient(low = "#cce6ff", high = "#FFBA00") +
  theme_classic()

p_black_kite

(4) Polish the chart

Let’s finish up by polishing the chart to make it more visually appealing:

library(ggtext)
library(showtext)

### Download and use the Google Fonts "Quintessential"
font_add_google(name = "Quintessential", family = "Quintessential")
showtext_auto()

### The HTML tag for the title and the black kite image
title_text <- "Weekly Black Kite <span> </span> <span> </span> Counts at Huajiang Bridge"
title_image <- "<img src='https://raw.githubusercontent.com/GenChangHSU/ggGallery/master/_posts/2024-02-21-post-33-a-timely-plotvisualizing-time-series-in-ggplots/Black_Kite.png' width='50'/>"

### The final polished chart
p_black_kite_final <- p_black_kite + 
  geom_richtext(data = NULL, aes(x = as_datetime("2023-05-15"), y = 20, label = title_text),
                fill = NA, label.color = NA, size = 10, family = "Quintessential") + 
  geom_richtext(data = NULL, aes(x = as_datetime("2023-05-04"), y = 20.4, label = title_image),
                fill = NA, label.color = NA) +
  geom_hline(aes(yintercept = 23), color = "white", linewidth = 1) + 
  labs(x = NULL, y = NULL) +
  guides(color = guide_colorbar(ticks.colour = "transparent")) + 
  theme(# panel margin
        plot.margin = margin(t = 10, b = 20, l = 35, r = 25),
        
        # axis appearance and text
        axis.line = element_blank(),
        axis.text.x = element_text(size = 18, family = "Quintessential"),
        axis.text.y = element_text(size = 20, margin = margin(r = -20), family = "Quintessential"),
        axis.ticks.x = element_blank(),
        axis.ticks.y = element_line(color = "grey60"),
        axis.ticks.length.y = unit(-0.2, "cm"),
        
        # legend
        legend.position = c(0.8, 0.7),
        legend.direction = "horizontal",
        legend.background = element_blank(),
        legend.title = element_blank(),
        legend.text = element_blank()
        ) + 
  annotate(geom = "text", x = as_datetime("2023-06-05"), y = 16.2, 
           label = "Rainy", family = "Quintessential", size = 5.5) + 
  annotate(geom = "text", x = as_datetime("2023-08-01"), y = 16.2, 
           label = "Sunny", family = "Quintessential", size = 5.5)

p_black_kite_final

We made it. Hooray!

As we can see from the chart, the black kites seemed to be more active in the winter (November to February) and less so in the spring (March to May). There was a small peak in June, and the counts increased again in August. Unfortunately, I didn’t have data from September to November and therefore we can’t really see the full-year dynamics.

Another thing to note is that the weather conditions didn’t seem to be strongly associated with the weekly kite counts: the kites were out on both sunny and rainy days. However, this is just a visual guess and we certainly need a formal analysis to verify it.

Summary

To recap what we did in this post, we first summarized the daily black kite counts and weather conditions by week. We then fit a LOESS curve to the weekly counts and made predictions over the observation period. After that, we created a time series chart of the weekly counts with the fitted LOESS curve to show the general trend. Finally, we polished the chart by adding a title and a black kite image as well as modifying the font, text, and legend.

Hope you learn something useful from this post and don’t forget to leave your comments and suggestions below if you have any!

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.