Post #26. ggplot Map Series No.2: Creating simple feature ggplot maps from shapefiles

2023 Map Series

Come and take a look at the nuts and bolts of creating simple feature maps from external shapefiles in ggplots!

Gen-Chang Hsu
2023-05-24

Introduction

In the first post of this Map Series, we looked at how to create polygon maps in ggplots. But this is not the only way to create ggplot maps; there are indeed many more ways out there! And in this post, I’m going to show you another method: creating maps from shapefiles.

I believe you might have heard of shapefiles (or even used them before!). So what exactly are they? A shapefile (.shp) is a common geospatial vector data format for storing the locations, shapes, and attributes of geographic features. It comes with a set of related files (.shx, .prj, .dbf, etc.) that together describe vector features such as points, lines, and polygons, representing all kinds of spatial objects like mountain peaks, country boundaries, roads, rivers, and lakes.

Because of the commonness and popularity of shapefiles, I think it would be useful to have some basic ideas of how to handle them in R. Hopefully, after reaching the very end of this post, you’ll be able to make good use of shapefiles to create your own cool ggplot maps!

Creat a simple feature map

In this post, we’ll be creating a map of the number of visitors in the national parks (units managed by the National Park Service) in California in 2022. It’s called a “simple feature” map because this is the type of R spatial object we’ll use for data processing and plotting. Briefly, a simple feature object (sf) is a special data frame that contains several columns of feature attributes (the metadata like country names) and a list column of feature geometry (the coordinates representing the points/lines/polygons), with an associated coordinate reference system (crs). The R package sf is dedicated to sf objects and allows users to do all sorts of manipulation and computation.

Retrieve the data

Let’s start by preparing the data we need for making the map. The data files we have on hand are:

  1. A shapefile of California state boundary, retrieved from Natural Earth
  2. A shapefile of California national park boundaries, retrieved from National Park Service Open Data Portal
  3. A CSV file of the number of visitors in each park in 2022, retrieved from National Park Service Stats Report Viewer

We’ll download the zip files from the GitHub repository, unzip them, and read the shapefiles into R as “sf” objects using the function st_read() from the package sf.

library(tidyverse)
library(sf)

### Download the zip files from the GitHub repository and unzip them
options(timeout = 500)  # need to increase the download time

url_CA_boundary <- "https://github.com/GenChangHSU/ggGallery/blob/master/_posts/2023-05-24-post-26-ggplot-map-series-no2-simple-feature-maps/NE_State_Boundaries.zip?raw=TRUE"
download.file(url_CA_boundary, "NE_State_Boundaries.zip", mode = "wb")
unzip("NE_State_Boundaries.zip")

url_Park_boundary <- "https://github.com/GenChangHSU/ggGallery/blob/master/_posts/2023-05-24-post-26-ggplot-map-series-no2-simple-feature-maps/CA_Park_Boundaries.zip?raw=TRUE"
download.file(url_Park_boundary, "CA_Park_Boundaries.zip", mode = "wb")
unzip("CA_Park_Boundaries.zip")

### Read the shapefile of state boundaries
CA_boundary <- st_read("NE_State_Boundaries/ne_10m_admin_1_states_provinces.shp", quiet = T) %>% 
  filter(admin == "United States of America") %>%  # US state boundaries
  filter(gn_name == "California")  # California boundary

### Read the shapefile of park boundaries
Park_boundary <- st_read("CA_Park_Boundaries/National_Park_Service_Land_Resources_Division_Tract_and_Boundary_Service.shp", quiet = T) %>% 
    filter(STATE == "CA")  # California park boundaries

### Read the CSV of park visitors in 2022
Park_visitors <- read_csv("https://raw.githubusercontent.com/GenChangHSU/ggGallery/master/_posts/2023-05-24-post-26-ggplot-map-series-no2-simple-feature-maps/CA_Park_Visitation.csv") %>% 
  select(Park = Field1, Visitors = Field2) %>%  # select the park and visitor columns
  mutate(Park_name = str_remove(Park, pattern = " N[A-Z]{1,4}"))  # remove the park type abbreviations

Process the data

Now we have the data loaded in R, the next step is to process them for plotting. There are two things to do here. First, we’ll join the park boundary and park visitor data so that later we can map the numbers of visitors in the parks to different colors. Second, we’ll check the crs of the two data layers (state boundary and park boundaries) to ensure that they can be drawn on the same map.

### Join the park boundary and park visitor data
Park_boundary_visitor <- Park_boundary %>% 
  mutate(PARKNAME = case_match(PARKNAME, "Rosie the Riveter WWII Home Front" ~ "Rosie The Riveter WWII Home Front", 
                               .default = PARKNAME)) %>%  # adjust the park name 
  left_join(Park_visitors, by = join_by("PARKNAME" == "Park_name"))  # join by park names

### Check the crs of the two data layers
st_crs(CA_boundary)$epsg
[1] 4326
st_crs(Park_boundary_visitor)$epsg
[1] 3857
### Transform the crs of park boundary data
Park_boundary_visitor <- st_transform(Park_boundary_visitor, crs = 4326) 
st_crs(Park_boundary_visitor)$epsg
[1] 4326

In this example, the two layers have different crs (EPSG code 4326 vs. 3857), so we used st_transform() to convert the crs of the park boundary data into the one used by the state boundary data.

Create the map

It’s time to make the map. Simply pass the sf objects to the “data” argument in geom_sf() and it will take care of everything! We also cut the number of visitors into discrete categories for color mapping.

### Cut the number of visitors into discrete categories
Park_boundary_visitor <- Park_boundary_visitor %>% 
  mutate(Visitors_cat = case_when(Visitors < 50000 ~ "< 50,000",
                                  Visitors > 50000 & Visitors < 500000 ~ "50,000– \n500,000",
                                  Visitors > 500000 & Visitors < 1000000 ~ "500,000– \n1,000,000",
                                  Visitors > 1000000 & Visitors < 10000000 ~ "1,000,000– \n10,000,000",
                                  Visitors > 10000000 ~ "> 10,000,000")) %>% 
  mutate(Visitors_cat = factor(Visitors_cat, levels = c("< 50,000", "50,000– \n500,000", "500,000– \n1,000,000", "1,000,000– \n10,000,000", "> 10,000,000"))) %>% 
  drop_na() %>% 
  filter(OBJECTID != 14)

### Create the map
CA_park_map <- ggplot() + 
  geom_sf(data = CA_boundary) + 
  geom_sf(data = Park_boundary_visitor, aes(fill = Visitors_cat))

CA_park_map 

Polish the map

The map doesn’t really look pleasant. Let’s polish it now! We’re going to (1) change the background color of the map, (2) pick a new color palette for the visitor categories, (3) modify the legend appearance, (4) label the park names, and (5) add a title, text annotations, and a logo to the map.

# devtools::install_github("yutannihilation/ggsflabel")  # install the package if you haven't
library(ggtext)
library(ggsflabel)  # for the function "geom_sf_text_repel()"
library(showtext)  # for the function "font_add_google()" and "showtext_auto()"

### Import the new font  
font_add_google("Lora")
showtext_auto()

### Map
CA_park_map <- ggplot() + 
  
  # California state 
  geom_sf(data = CA_boundary, color = "transparent", fill = "grey80") + 
  
  # national parks
  geom_sf(data = Park_boundary_visitor, aes(color = Visitors_cat, fill = Visitors_cat)) +
  
  # national park labels
  geom_sf_text_repel(data = Park_boundary_visitor, aes(label = str_wrap(Park, width = 25)), 
                     color = "white", size = 2.75, family = "sans", fontface = "bold", 
                     segment.size = 0.2, lineheight = 0.5, vjust = 0.5, hjust = 0.5, 
                     max.overlaps = Inf, force = 3) +
  
  # title
  geom_richtext(aes(x = -115.5, y = 41.5, label = "What Are the Most Visited <br> National Parks in California?"), 
                color = "white", size = 6.5, family = "Lora", fontface = "bold.italic",
                lineheight = 0.75, fill = NA, label.color = NA, label.padding = unit(rep(0, 4), "pt")) +
  
  # text annotations
  geom_textbox(aes(x = -114.75, y = 40, label = "This is a map of the number of 
                   visitors in the national parks in California in 2022. 
                   The top three most visited parks are **Golden Gate National
                   Recreation Area**, **Yosemite National Park**, and **Joshua
                   Tree National Park**."), 
               color = "white", size = 3.75, family = "Lora", lineheight = 0.6,
               width = unit(1.8, "in"), fill = NA, box.color = NA) + 
  
  # NPS logo
  geom_richtext(aes(x = -118.75, y = 40, label = "<img src='NPS_logo.PNG' width='27.5'/>"), 
                fill = NA, label.color = NA, label.padding = unit(rep(0, 4), "pt")) + 

  # color pallete
  scale_color_manual(values = c("#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15")) + 
  scale_fill_manual(values = c("#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15")) + 
  
  # map limits
  coord_sf(xlim = c(-125, -112), ylim = c(31.5, 42.2)) +
  
  # legend label position
  guides(color = guide_legend(title = NULL, label.position = "bottom"),
         fill = guide_legend(title = NULL, label.position = "bottom")) + 
  
  # plot background and legend appearance
  theme_void() + 
  theme(plot.background = element_rect(fill = "grey40", color = "grey40"),
        legend.position = c(0.5, 0.05),
        legend.direction = "horizontal",
        legend.key.width = unit(0.5, "in"),
        legend.key.height = unit(0.05, "in"),
        legend.text = element_text(color = "white", size = 9, family = "Lora", 
                                   lineheight = 0.5, margin = margin(t = -2)),
        legend.spacing.x = unit(0, "in"))

CA_park_map

This is our final product. Awesome!

Summary

To recap what we did in this post, we made a ggplot map from external shapefiles. The key is to read the shapefiles into R as sf objects using the function st_read(), make sure their coordinate reference systems are compatible, and plot the sf objects using the function geom_sf(). Finally, a bit of polishing will bring the map to the next level!

Hope you learn something useful from this post and don’t forget to leave your comments and suggestions below if you have any!

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.