Getting Started with Git and GitHub

Authors
Affiliation

Shelby Golden, M.S.

Data Science and Data Equity at the Yale School of Public Health

Howard Baik, M.S.

Introduction

Git and GitHub are two of the most popular and widely used development tools for project version control. Regardless of their popularity, they remain challenging for new users to get started with. Many resources cover siloed or non-generalizable topics about one or the other. Few offer a one-stop-shop introduction that quickly takes users through initial set-up and basic use cases for individual or collaborative projects.

In this workshop, we aim to merge the worlds of Git and GitHub in an approachable but comprehensive format that is accessible to people who have never used Git or GitHub before or those interested in improving their existing workflows. Maybe you are like I was, flying blind using the bare minimum of what Git and GitHub have to offer!

Over the course of the workshop chapter, we will take you through:

  • Git and GitHub account set-up with configurations by either SSH keys or HTTPS urls.
  • A detailed walk-through of the standard Git version control workflow and interactions with the remote repository stored in GitHub.
  • Another detailed walk-through of collaborating on projects with a team through GitHub.

Real-world examples with questions/answers and challenge questions/solutions are provided for:

  • Adding a local, Git-initialized project to GitHub for the first time.
  • Cloning an existing repository as a “clean-break” copy for the first time.
  • Collaborations with a shared GitHub repository (invite friends to have the full experience!)

Get Your Configurations

In order to participate in the entire workshop, you will need to have:

  • Git installed locally and configured with a user and email.
  • GitHub account.
  • A Git and GitHub transfer protocol configured and linked. Either an SSH key or an HTTPS url is acceptable, but we recommend using an SSH key.
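As a rough illustration of the first requirement, the sketch below sets a user name and email in the global Git configuration and reads them back. The name and email are placeholders, not values from this workshop; substitute your own. The commented-out `ssh -T` line is an optional check that only works with network access and an SSH key already registered with GitHub.

```shell
# Placeholder identity; replace with your own name and email.
git config --global user.name "Example User"
git config --global user.email "example.user@example.com"

# Read the values back to confirm they were stored.
configured_name="$(git config --global --get user.name)"
configured_email="$(git config --global --get user.email)"
echo "name:  $configured_name"
echo "email: $configured_email"

# Optional: test the SSH connection to GitHub (needs network and a registered key).
# ssh -T git@github.com
```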

Please watch the following video to learn how you can get set up. Soon, we will release a webpage that covers the same steps.

Accessing the Materials

Slides, Handouts, and Other Materials

Download the complete slide deck and the in-person workshop handout. For now, references for this webpage can be found in the slide deck Appendix.

Workshop Part #2

The following content is from the second and final part of our workshop, where we covered the typical Git workflow and went through three worked-through examples.

Codespaces

In this workshop you will need to access the R code we have prepared for the worked-through examples and challenge questions. If you have not already, you will need to download R to your local device, and we suggest using the RStudio integrated development environment (IDE). Accessing the code for this workshop requires that you have Git installed on your local device, have a GitHub account, and have configured the two. If you have not done this, go through Accounts and Configurations first.

This workshop was generated using R (v 4.4.3) in the RStudio IDE (v 2024.12.1+563). The renv package is included to reproduce the same coding environment, storing all the relevant packages and package versions needed in the code. If you experience trouble running the scripts, you might want to check that the environment was initialized and that you are using the same version of R and RStudio.

Two GitHub repositories have been created to practice using git and GitHub:

In order to practice your skills with git and GitHub using our codespaces, you will need to create a “clean-break” copy of both repositories. This will fully decouple the codespace connections from the ysph-dsde GitHub account, and allow you full access to its contents. After you have copied the repository to your personal GitHub, you will need to clone the codespace to your local device and initialize the environment.

Below we have detailed how to do all three steps. Notice that there are two methods to do this: the GitHub Importer tool or your command-line application (e.g. Terminal for Macs and Windows Terminal for Windows). We suggest you attempt the “GitHub Importer” tool option first and, if that fails, follow the command-line steps. Please note that the importer tool will sometimes take a few minutes to fully transfer over the files.

Making a Clean-Break Copy

METHOD 1: Copying Using GitHub Importer

Note

This method is not a Fork. You can learn more about GitHub Importer here.

  1. Log in to your personal GitHub account.

  2. In the top-right of the page navigation bar, select the dropdown menu and click Import repository.

  3. Fill out the following sections:

    1. Your source repository details: Paste the HTTPS url of one of the repositories listed above. No credentials are required for this action.

    2. Your new repository details: Adjust the GitHub account owner as needed and create the name for the new repository. It is good practice to initially set the repository to “Private”.

  4. Click the Begin import button to copy the codespace.

  5. After a few minutes, the newly created GitHub repository webpage will open up.

If this method is successful, then proceed to the Cloning the Copied Repository section. If it is not successful, you can try the command-line application method detailed in Method 2.

METHOD 2: Copying Using The Command-Line Application

These directions follow GitHub’s duplicating a repository page.

  1. Log in to your personal GitHub account.

  2. Navigate to the ysph-dsde GitHub repository you want to copy by either searching for it by name or opening the url provided above.

  3. Near the right side of the page there will be a Code button to click. In its drop-down menu under the “Local” tab you will see options to copy the SSH key or HTTPS url to the repository.

    For example, if the repository name is “ORIGINAL-REPOSITORY” they will look like:

    Command-Line Application
    # SSH
    git@github.com:ysph-dsde/ORIGINAL-REPOSITORY.git
    
    # HTTPS
    https://github.com/ysph-dsde/ORIGINAL-REPOSITORY.git

    Depending on your Git/GitHub configurations, you will copy one of these for the remainder of the steps.

Important

SSH and HTTPS are the transfer protocols used to pass information between your local Git-configured directory and the remote GitHub repository. Each Git/GitHub connection uses only one of these protocols at a time.
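To see which protocol a given connection uses, you can inspect the remote url stored in the local repository. The sketch below is a minimal illustration on a throwaway repository: it registers a remote with an HTTPS url, lists it, and then switches the same remote to its SSH form. EXAMPLE-USER and NEW-REPOSITORY are placeholder names, and no network access is needed because nothing is fetched or pushed.

```shell
set -e
repo="$(mktemp -d)"            # throwaway directory for the demonstration
git init -q "$repo"
cd "$repo"

# Register a remote using the HTTPS form of the url.
git remote add origin https://github.com/EXAMPLE-USER/NEW-REPOSITORY.git
git remote -v                  # lists the fetch and push urls for origin

# Switch the same remote to the SSH form.
git remote set-url origin git@github.com:EXAMPLE-USER/NEW-REPOSITORY.git
current_url="$(git remote get-url origin)"
echo "origin now uses: $current_url"
```

A url that starts with `https://` means the connection uses HTTPS, while one that starts with `git@github.com:` means it uses SSH.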

  1. Open the command-line application (e.g. Terminal for Macs and Windows Terminal for Windows) and navigate to the file location where you want to temporarily store the repository copy.

    Command-Line Application
    cd "/file_location/"
  2. Clone a bare copy of the original repository using its SSH key or HTTPS url:

    Command-Line Application
    # SSH
    git clone --bare git@github.com:ysph-dsde/ORIGINAL-REPOSITORY.git
    
    # HTTPS
    git clone --bare https://github.com/ysph-dsde/ORIGINAL-REPOSITORY.git
  3. Move into the directory of the bare clone.

    Command-Line Application
    cd "ORIGINAL-REPOSITORY.git"
  4. Back in GitHub, in the top-right of the page navigation bar select the dropdown menu and click New repository.

  5. Fill out the following sections:

    1. Adjust the GitHub account owner as needed and create the name for the new repository.

    2. It is good practice to initially set the repository to “Private”.

    3. Do NOT use a template or include a description, README.md, .gitignore, or license.

  6. In the newly created GitHub repository under “Quick setup” you will find the repository’s SSH key or HTTPS url. Copy this.

  7. Back in the command-line application, push a mirror of the bare clone to your newly created GitHub repository using its SSH key or HTTPS url:

    Command-Line Application
    # SSH
    git push --mirror git@github.com:EXAMPLE-USER/NEW-REPOSITORY.git
    
    # HTTPS
    git push --mirror https://github.com/EXAMPLE-USER/NEW-REPOSITORY.git

    Refresh the new GitHub repository webpage to confirm the push was successful.

  8. Delete the bare clone used to create the new remote repository.

    Command-Line Application
    cd ..                                   # Go back one file location
    rm -rf ORIGINAL-REPOSITORY.git          # Delete the bare clone
  9. This completes creating a clean-break copy of the ysph-dsde repository codespace. Proceed with cloning the newly made repository to your local device in the following section.
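If you would like to rehearse the bare-clone and mirror-push sequence before touching GitHub, the steps above can be sketched offline. In this hypothetical rehearsal, two local directories stand in for the original and the newly created GitHub repositories, so the paths, the commit message, and the user identity are all assumptions made for illustration.

```shell
set -e
work="$(mktemp -d)"            # scratch directory for the rehearsal
cd "$work"

# Stand-in for the original repository, with a single commit.
git init -q original
git -C original -c user.name="Example User" -c user.email="user@example.com" \
    commit -q --allow-empty -m "First commit"

# Stand-in for the new, empty repository you would create on GitHub.
git init -q --bare new-repository.git

# Clone a bare copy of the original, then push a mirror of it to the new repository.
git clone -q --bare original ORIGINAL-REPOSITORY.git
cd ORIGINAL-REPOSITORY.git
git push -q --mirror "$work/new-repository.git"

# Delete the temporary bare clone.
cd ..
rm -rf ORIGINAL-REPOSITORY.git

# The new repository should now contain the original history.
copied="$(git -C new-repository.git log --all --format=%s)"
echo "history in the new repository: $copied"
```

With real repositories, the two local paths would be replaced by the SSH key or HTTPS url of the original and of your new GitHub repository.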

Cloning the Copied Repository

Now that you have copied this repository into your own GitHub, you are ready to proceed with a standard clone to your local device.

  1. Copy the SSH key or HTTPS url of the newly created repository in your GitHub account, found under the Code button.

    Command-Line Application
    # SSH
    git@github.com:EXAMPLE-USER/NEW-REPOSITORY.git
    
    # HTTPS
    https://github.com/EXAMPLE-USER/NEW-REPOSITORY.git
  2. In the command-line application (e.g. Terminal for Macs and Windows Terminal for Windows) navigate to the file location where you want to store the repository.

    Command-Line Application
    cd "/file_location/"
  3. Clone the repository.

    Command-Line Application
    # using SSH
    git clone git@github.com:EXAMPLE-USER/NEW-REPOSITORY.git
    
    # or using HTTPS
    git clone https://github.com/EXAMPLE-USER/NEW-REPOSITORY.git
  4. OPTIONAL: You can reset the repository history, which will clear the previous commits, by running the following block of code from inside the cloned repository (Source: StackExchange by Zeelot).

    Command-Line Application
    git checkout --orphan tempBranch         # Create a temporary branch
    git add -A                               # Add all files and commit them
    git commit -m "Reset the repo"
    git branch -D main                       # Deletes the main branch
    git branch -m main                       # Rename the current branch to main
    git push -f origin main                  # Force push main branch to GitHub
    git gc --aggressive --prune=all          # Remove the old files
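Because the optional reset rewrites history and force pushes, it can be worth trying on a throwaway repository first. The sketch below is one way to do that, assuming a local bare repository as a stand-in for GitHub and a hypothetical file named notes.txt. It builds two commits, runs the same reset sequence, and counts the commits before and after.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Local stand-in for the GitHub remote, plus a working repository with two commits.
git init -q --bare remote.git
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main      # make sure the branch is named main
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/remote.git"
echo "one" > notes.txt
git add -A
git commit -q -m "First commit"
echo "two" >> notes.txt
git commit -q -am "Second commit"
git push -q origin main
before="$(git rev-list --count main)"

# The reset sequence from the step above.
git checkout -q --orphan tempBranch        # Create a temporary branch with no history
git add -A                                 # Add all files and commit them
git commit -q -m "Reset the repo"
git branch -q -D main                      # Delete the main branch
git branch -m main                         # Rename the current branch to main
git push -q -f origin main                 # Force push main to the stand-in remote
git gc -q --aggressive --prune=all         # Remove the old files

after="$(git rev-list --count main)"
remote_after="$(git -C "$work/remote.git" rev-list --count main)"
echo "commits before: $before, after: $after, on remote: $remote_after"
```

The files themselves are unchanged by the reset; only the commit history leading up to them is discarded.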

Initializing the Environment

After cloning the codespace to your local device, you will need to initialize the environment using the renv package. This will install all packages and versions used in the workshop, thus creating a reproducible coding environment.

  1. In the command-line application (e.g. Terminal for Macs and Windows Terminal for Windows) navigate to the file location where you stored the cloned repository.

    Command-Line Application
    cd "/file_location/"
  2. Launch the project by opening the *.Rproj in RStudio.

  3. In the R console, activate the environment by running the following lines of code:

    RStudio Console
    renv::init()          # Initialize the project
    renv::restore()       # Download packages and their version saved in the lockfile.
Note

If you are asked to update packages, say no. The renv package is intended to recreate the same environment under which the project was created, making it reproducible. You are ready to proceed when running renv::restore() gives the output:

RStudio Output
- The library is already synchronized with the lockfile.

If you experience any trouble with this step, you might want to confirm that you are using R (v 4.4.3) in the RStudio IDE (v 2024.12.1+563). You can also read more about renv in its vignette.

About the Data

The Johns Hopkins Coronavirus Resource Center (JHU CRC) tracked and compiled global COVID-19 pandemic data from January 22, 2020 to March 10, 2023. These data are publicly available through their two GitHub repositories. We imported two datasets for this workshop content:

  • Cumulative vaccination counts for the U.S. from their GovEX/COVID-19 GitHub repository. The raw data used in the analysis script can be found in the data_tables/vaccine_data/us_data/time_series subdirectory (original source).
  • Cumulative case and death counts for the U.S. from their CSSE GitHub. The raw data for these two datasets used in the analysis can be found in the csse_covid_19_data/csse_covid_19_time_series subdirectory (original source). Both time_series_covid19_confirmed_US.csv and time_series_covid19_deaths_US.csv were used.

The data dictionaries provided by JHU CRC can be found here: Vaccinations Dataset Data Dictionary and Cases and Deaths Datasets Data Dictionary. For our purposes, we conducted some data cleaning, harmonization, and smoothing using an isotonic regression. This included harmonizing the U.S. Census Bureau’s 2010 to 2019 population projections with 2020 to 2023 vintages.

Details about these steps can be found in the Git-and-GitHub/R directory of this workshop’s GitHub repository (link to code). The cleaned datasets used in this workshop can be found in the Git-and-GitHub/Data directory of this workshop’s GitHub repository (link to data).