Let’s create an awesome time series chart in ggplot!
Welcome to the first post of 2024! It’s been two months since my post last year, and I feel excited to be back in the saddle and write something about ggplots. Hope we’ll have another year of posts full of useful and fun topics!
I’ll kick things off here with a time series chart, which is essentially a type of line chart with “time” on the x-axis. ggplot offers a family of functions that work with time, and we’re going to explore one of those functions in this post and create a nice time series chart.
Most of the time, I use built-in R datasets for my posts. But this time it’s a bit different: I’m using the data I collected myself!
I had been recording the number of black kites (Milvus migrans) along the Xindian river in Taipei, Taiwan pretty much every day (except for a few times when I was out of town) from November 2022 to August 2023. This dataset consists of the daily black kite counts, the sites of the observations, and the weather conditions.
Let’s take a quick look at the dataset:
library(tidyverse)
### Read the data
black_kite_raw <- read_csv("https://raw.githubusercontent.com/GenChangHSU/ggGallery/master/_posts/2024-02-21-post-33-a-timely-plotvisualizing-time-series-in-ggplots/Black_Kite_Records.csv")
### Organize the data
black_kite_clean <- black_kite_raw %>%
mutate(Date = ymd(Date)) %>%
select(Date, Site, Number, Weather)
head(black_kite_clean)
# A tibble: 6 × 4
Date Site Number Weather
<date> <chr> <dbl> <chr>
1 2022-11-17 Huajiang_bridge 0 Sunny_with_cloud
2 2022-11-18 Huajiang_bridge 0 Sunny_with_cloud
3 2022-11-19 Huajiang_bridge 7 Sunny
4 2022-11-20 Huajiang_bridge 1 Cloudy
5 2022-11-21 Huajiang_bridge 2 Cloudy
6 2022-11-22 Huajiang_bridge 0 Sunny_with_cloud
Most observations came from the site “Huajiang Bridge”, so I’ll focus on this particular site from now on. Below is a map of Huajiang Bridge created via ggmap:
library(ggmap)
### Need to register an API key for Stadia Maps and for Google Maps
register_stadiamaps(key = "your key")
register_google(key = "your key")
### Get the long-lat coordinates of the site
huajiang_bridge_lon_lat <- geocode("Huajiang Bridge, Wanhua District")
### Create a map
map_data <- get_stadiamap(bbox = c(left = huajiang_bridge_lon_lat$lon - 0.05,
bottom = huajiang_bridge_lon_lat$lat - 0.04,
right = huajiang_bridge_lon_lat$lon + 0.05,
top = huajiang_bridge_lon_lat$lat + 0.04),
zoom = 14,
maptype = "stamen_terrain")
ggmap(map_data) +
annotate(geom = "rect",
xmin = huajiang_bridge_lon_lat$lon - 0.01,
ymin = huajiang_bridge_lon_lat$lat - 0.01,
xmax = huajiang_bridge_lon_lat$lon + 0.01,
ymax = huajiang_bridge_lon_lat$lat + 0.01,
fill = NA,
color = "red",
linewidth = 1.5) +
theme_void()
Now comes the main topic of the post: creating a time series chart. I’ll break down the process into four steps and walk through them one by one.
Instead of plotting the daily kite counts, I decided to plot the weekly counts to better show the overall trend. To summarize the data by week, I first rounded the dates to the nearest weeks and summed the kite counts in each week. I also assigned a score to each of the four daily weather conditions and calculated the mean score for each week to represent the average weekly weather condition.
At the end of the code, I added a week ID column by computing the day differences from the first week and dividing the differences by 7. This week ID column is used for fitting a LOESS curve at step (2). Finally, I converted the rounded dates (object “Date”) to datetimes (object “POSIXct”) for plotting purposes at step (3) and (4).
### Summarize the data by week
black_kite_clean_week <- black_kite_clean %>%
filter(Site == "Huajiang_bridge") %>%
mutate(Date_week = round_date(Date, "week")) %>% # round the dates to the nearest weeks
mutate(Weather_score = case_when(Weather == "Sunny" ~ 3, # assign a score to each weather condition
Weather == "Sunny_with_cloud" ~ 2,
Weather == "Cloudy" ~ 1,
Weather == "Rainy" ~ 0)) %>%
group_by(Date_week) %>%
summarise(Number_week = sum(Number), # weekly kite counts
Weather_week = mean(Weather_score)) %>% # average weekly weather condition
mutate(Week_id = as.double(Date_week - min(Date_week))/7) %>% # week ID
mutate(Date_week = as_datetime(Date_week)) # convert dates to datetimes
head(black_kite_clean_week)
# A tibble: 6 × 4
Date_week Number_week Weather_week Week_id
<dttm> <dbl> <dbl> <dbl>
1 2022-11-20 00:00:00 10 1.57 0
2 2022-11-27 00:00:00 11 1.43 1
3 2022-12-04 00:00:00 19 1 2
4 2022-12-11 00:00:00 11 0.167 3
5 2022-12-18 00:00:00 18 1.2 4
6 2022-12-25 00:00:00 8 3 5
To show the general trend over the observation period, I fit a LOESS curve to the weekly black kite counts, with a smaller span of 0.3 to capture more details. I then created a sequence of datetimes with an interval of six hours (for the sake of model predictions), computed the “week” differences of these datetimes from the first datetime, and made the LOESS predictions based on these week differences.
### Fit a LOESS curve to the weekly black kite counts
loess_model <- loess(Number_week ~ Week_id, data = black_kite_clean_week, span = 0.3)
### Make predictions for the counts
library(modelr) # for the function "add_predictions()"
loess_predictions <- tibble(Date_seq = seq(min(black_kite_clean_week$Date_week),
max(black_kite_clean_week$Date_week),
by = "6 hour"), # a sequence of datetimes with an interval of six hours,
Week_id = as.numeric((Date_seq - min(black_kite_clean_week$Date_week))/86400)/7) %>% # week differences from the first datetime
add_predictions(loess_model) # make the LOESS predictions
head(loess_predictions)
# A tibble: 6 × 3
Date_seq Week_id pred
<dttm> <dbl> <dbl>
1 2022-11-20 00:00:00 0 10.1
2 2022-11-20 06:00:00 0.0357 10.2
3 2022-11-20 12:00:00 0.0714 10.3
4 2022-11-20 18:00:00 0.107 10.4
5 2022-11-21 00:00:00 0.143 10.5
6 2022-11-21 06:00:00 0.179 10.6
With the required data at hand, it’s time to create the time series chart! This chart contains points for the weekly kite counts, line segments joining the points, and the fitted LOESS curve from step (2). I went a bit fancy here by making the points blurry using the function geom_point_blur()
from the package ggblur
. I also drew a gradient fill background using the function geom_rect_pattern
from the package ggpattern
. (Note that I added the geom layers in a reversed order to have the points on the top, the lines in the middle, and the background at the very bottom.)
Remember I mentioned in the introduction that we would be using one of the functions that ggplot offers to work with time. That function is scale_x_datetime()
, which allows for customizing the appearance of x-axis using date/time specifications. In the code below, I specified the x-axis limits using two datetimes, placed the x-axis breaks at an one-month interval (” date_breaks = “1 month” “), and labeled the x-axis using the R date/time format (” date_labels = “%b ’%y” “; %b represents the abbreviated month name and %y represents the two-digit year; see this documentation for more details).
library(ggpattern) # for the function "geom_rect_pattern()"
library(ggblur) # for the function "geom_point_blur()"
p_black_kite <- ggplot(black_kite_clean_week) +
geom_rect_pattern(aes(xmin = floor_date(min(black_kite_clean_week$Date_week), "month"),
xmax = ceiling_date(max(black_kite_clean_week$Date_week), "month"),
ymin = -Inf,
ymax = 23),
pattern_fill = "#99d6ff", pattern_fill2 = "#ccebff", pattern = "gradient") +
geom_line(aes(x = Date_week, y = Number_week), color = "black", alpha = 0.05) +
geom_line(data = loess_predictions, aes(x = Date_seq, y = pred),
color = "grey60", linewidth = 2, lineend = "round") +
geom_point_blur(aes(x = Date_week, y = Number_week, color = Weather_week),
size = 3, blur_size = 10) +
scale_x_datetime(limits = c(floor_date(min(black_kite_clean_week$Date_week), "month"),
ceiling_date(max(black_kite_clean_week$Date_week), "month")),
expand = c(0, 3), date_breaks = "1 month", date_labels = "%b '%y") +
scale_y_continuous(limits = c(-1, 23), breaks = seq(0, 20, 10),
labels = c("0 ", "10", "20"), expand = c(0, 0)) +
scale_color_gradient(low = "#cce6ff", high = "#FFBA00") +
theme_classic()
p_black_kite
Let’s finish up by polishing the chart to make it more visually appealing:
I added a title and an image of a black kite via the HTML syntax using the function geom_richtext()
from the package ggtext
.
I downloaded the font “Quintessential” from Google Fonts using the function font_add_google()
, and applied it to the numbers and text in the chart using the function showtext_auto()
. Both functions come from the package showtext
.
I removed the axis titles and flipped the y-axis ticks and text to the inner side of the panel.
I changed the legend position and direction, and added the labels manually.
library(ggtext)
library(showtext)
### Download and use the Google Fonts "Quintessential"
font_add_google(name = "Quintessential", family = "Quintessential")
showtext_auto()
### The HTML tag for the title and the black kite image
title_text <- "Weekly Black Kite <span> </span> <span> </span> Counts at Huajiang Bridge"
title_image <- "<img src='https://raw.githubusercontent.com/GenChangHSU/ggGallery/master/_posts/2024-02-21-post-33-a-timely-plotvisualizing-time-series-in-ggplots/Black_Kite.png' width='50'/>"
### The final polished chart
p_black_kite_final <- p_black_kite +
geom_richtext(data = NULL, aes(x = as_datetime("2023-05-15"), y = 20, label = title_text),
fill = NA, label.color = NA, size = 10, family = "Quintessential") +
geom_richtext(data = NULL, aes(x = as_datetime("2023-05-04"), y = 20.4, label = title_image),
fill = NA, label.color = NA) +
geom_hline(aes(yintercept = 23), color = "white", linewidth = 1) +
labs(x = NULL, y = NULL) +
guides(color = guide_colorbar(ticks.colour = "transparent")) +
theme(# panel margin
plot.margin = margin(t = 10, b = 20, l = 35, r = 25),
# axis appearance and text
axis.line = element_blank(),
axis.text.x = element_text(size = 18, family = "Quintessential"),
axis.text.y = element_text(size = 20, margin = margin(r = -20), family = "Quintessential"),
axis.ticks.x = element_blank(),
axis.ticks.y = element_line(color = "grey60"),
axis.ticks.length.y = unit(-0.2, "cm"),
# legend
legend.position = c(0.8, 0.7),
legend.direction = "horizontal",
legend.background = element_blank(),
legend.title = element_blank(),
legend.text = element_blank()
) +
annotate(geom = "text", x = as_datetime("2023-06-05"), y = 16.2,
label = "Rainy", family = "Quintessential", size = 5.5) +
annotate(geom = "text", x = as_datetime("2023-08-01"), y = 16.2,
label = "Sunny", family = "Quintessential", size = 5.5)
p_black_kite_final
We made it. Hooray!
As we can see from the chart, the black kites seemed to be more active in the winter (November to February) and less so in the spring (March to May). There was a small peak in June, and the counts increased again in August. Unfortunately, I didn’t have data from September to November and therefore we can’t really see the full-year dynamics.
Another thing to note is that the weather conditions didn’t seem to be strongly associated with the weekly kite counts: the kites were out on both sunny and rainy days. However, this is just a visual guess and we certainly need a formal analysis to verify it.
To recap what we did in this post, we first summarized the daily black kite counts and weather conditions by week. We then fit a LOESS curve to the weekly counts and made predictions over the observation period. After that, we created a time series chart of the weekly counts with the fitted LOESS curve to show the general trend. Finally, we polished the chart by adding a title and a black kite image as well as modifying the font, text, and legend.
Hope you learn something useful from this post and don’t forget to leave your comments and suggestions below if you have any!
If you see mistakes or want to suggest changes, please create an issue on the source repository.