Data Visualization with ggplot2

Author
Affiliation

Shelby Golden, M.S.

Yale School of Public Health Data Science and Data Equity (DSDE)

Introduction

In this workshop we delve deeper into the domain specific language of statistical graphics that underpins the tidyverse ggplot2 package syntax: the “Grammar of Graphics”. We will explore each discrete grammar layer using laboratory-confirmed RSV hospitalizations data collected by the CDC’s Respiratory Virus Hospitalization Surveillance Network (RESP-NET) surveillance program.

With a better understanding of the syntax fundamentals, we will then get introduced to some advanced uses of ggplot2 that are commonly used in public health:

  • Making plots interactive with plotly
  • Projecting data to a map

We will close the workshop by asking Yale’s Clarity Platform to reproduce our code from the plot image alone to exhibit how AI can be used to support data visualization work. Clarity is an AI chatbot that offers similar functions to OpenAI’s ChatGPT and Microsoft Copilot with additional data protection. Find out more about Clarity’s security guidelines on “AI at Yale”.

The cleaned and harmonized version of the RSV-NET dataset was compiled as part of the YSPH’s very own PopHIVE project. Special thanks to Professor Daniel Weinberger for allow us to adopt his plot code in this workshop.

Accessing the Materials

Slides, Handouts, and Other Materials

Download the complete slide deck with annotations and the in-person workshop handout. Comments were saved in the bottom left of each slide, and references for this webpage are in its Appendix.

Codespaces

In this workshop you will need to access the R code we have prepared for the in-workshop discussion and after-workshop challenge questions. This assumes that you have downloaded R and RStudio to your local device. After you download the code, you will need to:

  1. Move the unzipped directory to the file location you wish to house the project.

  2. Open Data-Visualization-with-ggplot2.Rproj.

  3. Open Discussion and Challenge Questions.R.

  4. Initialize the environment in the R console by running:

    renv::init()               # initialize the project
    renv::restore()            # download packages and their version saved in the lockfile.

NOTE: If you are asked to update packages, say no. The renv() is intended to recreate the same environment under which the project was created, making it reproducible. You are ready to proceed when running renv::restore() gives the output:

- The library is already synchronized with the lockfile.

You can read more about renv() in their vignette. In this workshop you will need to access the R code we have prepared for the worked through example and challenge questions. If you have not already, you will need to download R to your local device, and we suggest using the integrated development environment (IDE) software RStudio. Accessing the code for this workshop requires that you have git installed on your local device, a GitHub account, and you have configured the two. If you have not done this, go through Accounts and Configurations first.

This workshop was generated using R (v 4.4.3) in the RStudio IDE (v 2024.12.1+563). renv() is included to reproduce the same coding enviroment, storing all the relevant packages and package versions needed in the code. If you experience trouble running the scripts, you might want to check that the environment was initialized and that you are using the same version of R and RStudio.

Initializing the Environment

After cloning the codespace to your local device, you will need to initialize the environment using renv(). This will install all packages and versions used in the workshop, thus creating a reproducible coding environemnt.

  1. In the command-line application (i.e. Terminal for Macs and Windows Terminal for windows) navigate to the file location you want to store the repository.

    cd "/file_location/"
  2. Launch the project by opening the *.Rproj in RStudio.

  3. In the R console, activate the enviroment by runing the following lines of code:

    renv::init()          # initialize the project
    renv::restore()       # download packages and their version saved in the lockfile.
Note

If you are asked to update packages, say no. The renv() is intended to recreate the same environment under which the project was created, making it reproducible. You are ready to proceed when running renv::restore() gives the output:

- The library is already synchronized with the lockfile.

If you experience any trouble with this step, you might want to confirm that you are using R (v 4.4.3) in the RStudio IDE (v 2024.12.1+563). You can also read more about renv() in their vignette.