Post #29. Bad weather today—Creating raincloud plots in ggplots

2023

Come and learn how to create raincloud plots in ggplots to make your data “shine”!

Gen-Chang Hsu
2023-09-09

Introduction

Rainclound plot, what a cool name! You might, or might not have heard of it before, but as you’ll see soon, this type of chart is awesome for visualizing data distribution.

A rainclound plot is actually not a single plot, but instead it consists of three different plots: a boxplot, a dot plot, and a density plot. With these components, a rainclound plot can capture various aspects of the data and convey more information than the individual constituent plots alone.

This post is actually inspired by two previous posts (post 1 and post 2) I stumbled upon the other day. Both posts give a neat tutorial of creating rainclound plots using the extension package ggdist, so I will not repeat that again here. Instead, what I’m going to do is to show you how we can create a raincloud plot in ggplots using basic ggplot functions. Should be fun!

Create the plot

We’ll use the penguins dataset in the package palmerpenguins, which contains size measurements of three penguin species (Adélie, Chinstrap, and Gentoo) on the islands in Palmer Archipelago. In the following example, we’ll create a raincloud plot to visualize the distribution of body mass of the three penguins.

Step 1. The box plot

The first step is to draw a box plot. This is quite straightforward: simply use the function geom_boxplot(). We’ll also reduce the width of the boxes to save space for the two upcoming plots.

library(tidyverse)
library(palmerpenguins)

P_box <- penguins %>% 
  ggplot() + 
  geom_boxplot(aes(x = species, fill = species, y = body_mass_g), 
               width = 0.15,
               show.legend = F)

P_box

Step 2. The dot plot

The second step is to draw the dot plot, which resembles a histogram but uses stacked dots instead of bars to represent the counts. To do so, we’ll use the function geom_dotplot() and specify “binaxis =”y”” to place the stacked dots along the y-axis (not the default x-axis). We’ll also use “stackratio = -1” to flip the dots to the left of the boxes and “position_nudge(x = -0.15)” to shift the dots slightly away from the boxes so that they don’t overlap with each other.

P_box_dot <- P_box +
  geom_dotplot(aes(x = species, fill = species, y = body_mass_g),
               binaxis = "y",
               stackratio = -1,
               position = position_nudge(x = -0.15),
               dotsize = 0.5,
               show.legend = F)

P_box_dot

Step 3. The density plot

The third step is to draw the density plot. Here, we’ll use geom_ribbon(stat = "density") instead of geom_density() because geom_ribbon() allows us to create a density curve for each group (species). The mapping specification in aes() is a bit complicated, so let me explain it:

The key lies in the two arguments “xmin = after_stat(group)” and “xmax = after_stat(group + ndensity*0.2)“.”xmin” represents the lower boundary of the area (i.e., the base), and “xmax” represents the upper boundary (i.e., the density curve). As for after_stat(), it evaluates the new variables created by the geom after the transformation of the original data (e.g., computing the kernel density estimates). In this example, “group” is a new variable created for each species (because we map species to color and fill) and coded as integers; “ndensity” is another new variable created for the kernel density estimates computed from the variable “body_mass_g”. Now, with these two new variables, we can map “xmin” and “xmax” to them to draw the density curves. Also note that “ndensity” is multiplied by 0.2 to adjust the height of the density curves. We’ll also specify “trim = T” to truncate the tails of the density curves that extend across the entire y-axis range.

P_box_dot_density <- P_box_dot + 
  geom_ribbon(aes(xmin = after_stat(group),
                  xmax = after_stat(group + ndensity*0.2),
                  y = body_mass_g,
                  colour = species, fill = after_scale(alpha(colour, 0.3))),
              stat = "density", 
              outline.type = "upper",
              position = position_nudge(x = 0.1),  # shift the density curves to the right of the boxes
              trim = T,  # truncate the long tails
              show.legend = F)

P_box_dot_density 

Step 4. Flip the axes

We’re almost there! The clouds and raindrops are in the vertical direction, so the final step is to flip the axes to make them horizontal. Simply use coord_flip() to achieve this.

P_raincloud <- P_box_dot_density + 
  labs(x = "Body mass (g)", y = "Species") +
  coord_flip(x = c(0.5, 3), y = c(2500, 6500)) + 
  theme_classic(base_size = 12)

P_raincloud 

We made it!

Summary

To recap, we learned how to create a raincloud plot in ggplots step by step: first drawing the boxplot, next the dot plots, third the density plot, and finally flipping the axes. Depending on the data, you may wish to adjust the thickness of the clouds or the size of the raindrops to prevent the items from overlapping with each other.

Hope you learn something useful from this post and don’t forget to leave your comments and suggestions below if you have any!

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.