Getting Started with Git and GitHub

This material is presented over two workshops. Part 1 was authored by Shelby Golden, M.S., and Part 2 is a collaborative effort by Shelby Golden, M.S., and Howard Baik, M.S. You can learn more about the instructors here.

Introduction

Git and GitHub are among the most popular and widely used tools for code-based project version control 1. Despite their prominence, they can be challenging for new users to get started with. Many resources focus on specific or non-generalizable aspects of Git or GitHub, rather than providing a comprehensive overview. Few resources offer a unified introduction that guides users from initial setup to basic use cases for both individual and collaborative projects.

One common limitation of many introductory resources is that they exclusively teach using third-party Graphical User Interfaces (GUIs) or integrations of Git in Integrated Development Environments (IDEs) like RStudio 2,3. However, Git is inherently designed as a Command-Line Interface (CLI) tool to be executed in environments like Mac’s Terminal. Using the Git CLI allows users to directly execute Git operations, offering fine-grained control that is often not possible with third-party GUIs or IDEs. Consequently, relying solely on GUIs or IDEs can limit a learner’s growth and understanding of the full capabilities of Git and its interaction with GitHub.

Workshop Learning Goals

In this workshop, we aim to bridge the gap between Git and GitHub in an accessible and thorough format using the Git CLI. This content is designed for those who are new to Git and GitHub, as well as those looking to enhance their existing workflows. Our goal is to help you confidently navigate and utilize the full range of capabilities that Git and GitHub have to offer.

Over the course of the workshop, we will take you through the following:

Part 1 Learning Goals:

  • Understand the purpose and value of Git and GitHub in managing coding projects.
  • Learn how Git manages files for version control locally and distributes them through GitHub.
  • Set up and configure Git on your local device and connect it to your GitHub account using either HTTPS or SSH keys.

Part 2 Learning Goals:

  • Get hands-on experience using Git and GitHub for solo projects through a worked-through example showing common workflows.
  • Learn how to use GitHub to support collaboration and teamwork on group projects. Invite friends to have the full experience!

We have prepared real-world examples for the hands-on portion of this workshop. To fully engage with these materials, please create a clean-break copy of the two GitHub repositories that have been specifically prepared for this purpose. Detailed instructions for this process can be found below under Accessing the Codespaces.

Slides and Handouts

Download the complete slide decks with annotations and the in-person workshop handouts. Comments have been saved in the bottom left corner of each slide, and references for this webpage are located in the Appendix.

Part 1 Materials:

Part 2 Materials:

Coming Soon!

Below are some helpful cheat sheets that summarize commands for navigating Git, Bash, and the command-line text editor Vim. These will be the primary tools and command sets used in this module.

Accessing the Codespaces

In this workshop, you will need to access the R code we have prepared for the worked-through examples and challenge questions. If you haven’t already, please download R to your local device. We also recommend using the IDE software RStudio. To access the code for this workshop, you will need Git installed on your local device, a GitHub account, and both configured. If you have not set this up yet, please follow the instructions in Configurations and Credentials first.
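A quick way to confirm the Git prerequisite is to ask Git for its version and configured identity. The following is a minimal sketch, assuming a Bash-compatible shell such as Terminal or Git Bash; it only reads settings and changes nothing. If either value shows as not set, the Configurations and Credentials instructions cover how to set it.

```shell
# Report the installed Git version (this fails if Git is not on the PATH).
git_version="$(git --version)"
echo "$git_version"

# Read the globally configured identity; print a notice when a value is unset.
git_user_name="$(git config --global user.name || echo "(not set)")"
git_user_email="$(git config --global user.email || echo "(not set)")"
echo "user.name:  $git_user_name"
echo "user.email: $git_user_email"
```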

Two GitHub repositories have been created to practice using Git and GitHub:

To practice your Git and GitHub skills using our codespaces, you need to create independent copies of both repositories, referred to as a “clean-break” copy. This will decouple the GitHub account connections and give you full access to their contents. Once copied to your personal GitHub account, you must clone the codespace to your local device and initialize the environment.

Detailed instructions for these steps are provided below. There are two methods you can use: the GitHub Importer tool or your command-line application (i.e., Terminal for Macs or Git Bash for Windows). We recommend trying the GitHub Importer tool first; if it fails, proceed with the command-line steps. Please note that the importer tool may take a few minutes to transfer the files.

Attribution and Ownership

Please note that all materials provided in this workshop, including any code added to your personal repository, belong to DSDE. When using or referencing this material, please cite it correctly to give proper credit to the original authors.

This workshop was created using R (v 4.4.3) in the RStudio IDE (v 2024.12.1+563). The renv package is included to reproduce the same coding environment, ensuring all relevant packages and package versions are stored. If you encounter issues running the scripts, please check that the environment is initialized and that you are using the same versions of R and RStudio.

Making a Clean-Break Copy

METHOD 1: Copying Using GitHub Importer 8

The method described here will not create a Fork. You can learn more about the GitHub Importer here.

  1. Log in to your personal GitHub account.

  2. In the top-right of the page navigation bar, select the dropdown menu and click Import repository 9.

  3. Fill out the following sections:

    1. Your source repository details: Paste the HTTPS URL of one of the repositories listed above. No credentials are required for this action.

    2. Your new repository details: Adjust the GitHub account owner as needed and create the name for the new repository. It is good practice to initially set the repository to “Private”.

  4. Click the Begin import button to copy the codespace.

  5. After a few minutes, the newly created GitHub repository webpage will open up.

If this method is successful, proceed to the Cloning the Copied Repository section. If it is not successful, you can try the command-line application method detailed in Method 2.

METHOD 2: Copying Using The Command-Line Application

These directions follow GitHub’s duplicating a repository page.

  1. Log in to your personal GitHub account.

  2. Navigate to the ysph-dsde GitHub repository you want to copy by either searching for it by name or opening the URL provided above.

  3. Near the right side of the page there will be a Code button to click. In its dropdown menu, under the “Local” tab, you will see options to copy the SSH or HTTPS URL of the repository.

    For example, if the repository name is “ORIGINAL-REPOSITORY” they will look like:

    Command-Line Application
    # SSH
    git@github.com:ysph-dsde/ORIGINAL-REPOSITORY.git
    
    # HTTPS
    https://github.com/ysph-dsde/ORIGINAL-REPOSITORY.git

    Depending on your Git/GitHub configurations, you will copy one of these for the remainder of the steps.

Important

SSH and HTTPS are the transfer protocols used to pass information between your local Git-configured directory and the remote GitHub repository. Only one protocol can be set up for a single Git/GitHub connection at a given time. More details can be found in the Transfer Protocols section of Configurations and Credentials.
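To see which protocol a given connection uses, you can inspect the URL stored for its remote. The sketch below is illustrative only: it works in a throwaway directory, and the EXAMPLE-USER/NEW-REPOSITORY URLs are placeholders that are never contacted. It registers a remote over HTTPS and then switches the same remote to SSH, showing that the new URL replaces the old one.

```shell
# Work in a throwaway directory so no real project is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q protocol-demo
cd protocol-demo

# Register the remote over HTTPS (placeholder URL; nothing is contacted).
git remote add origin https://github.com/EXAMPLE-USER/NEW-REPOSITORY.git
url_https="$(git remote get-url origin)"
echo "Before: $url_https"

# Switch the same remote to SSH; the HTTPS URL is replaced, not kept alongside.
git remote set-url origin git@github.com:EXAMPLE-USER/NEW-REPOSITORY.git
url_ssh="$(git remote get-url origin)"
echo "After:  $url_ssh"
```

In an existing project, running `git remote -v` from inside the repository lists the same information for every configured remote.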

  1. Open the command-line application (i.e., Terminal for Macs or Git Bash for Windows) and navigate to the location where you want to temporarily store the repository copy.

    Command-Line Application
    cd "/file_path/"
  2. Clone a bare copy of the original repository using its SSH or HTTPS URL 10:

    Command-Line Application
    # SSH
    git clone --bare git@github.com:ysph-dsde/ORIGINAL-REPOSITORY.git
    
    # HTTPS
    git clone --bare https://github.com/ysph-dsde/ORIGINAL-REPOSITORY.git
  3. Move into the directory of the bare clone.

    Command-Line Application
    cd "ORIGINAL-REPOSITORY.git"
  4. Back in GitHub, in the top-right of the page navigation bar select the dropdown menu and click New repository.

  5. Fill out the following sections:

    1. Adjust the GitHub account owner as needed and create the name for the new repository.

    2. It is good practice to initially set the repository to “Private”.

    3. Do NOT use a template or include a description, README.md, .gitignore, or license.

  6. In the newly created GitHub repository, under “Quick setup”, you will find the repository’s SSH or HTTPS URL. Copy this.

  7. Back in the command-line application, push a mirror of the bare clone to your newly created GitHub repository using its SSH or HTTPS URL:

    Command-Line Application
    # SSH
    git push --mirror git@github.com:EXAMPLE-USER/NEW-REPOSITORY.git
    
    # HTTPS
    git push --mirror https://github.com/EXAMPLE-USER/NEW-REPOSITORY.git

    Refresh the new GitHub repository webpage to confirm the push was successful.

  8. Delete the bare clone that was used to create the new remote repository.

    Command-Line Application
    cd ..                                   # Go up one directory level
    rm -rf ORIGINAL-REPOSITORY.git          # Delete the bare clone

This completes creating a clean-break copy of the ysph-dsde repository codespace. Proceed with cloning the newly made repository to your local device in the following section.
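The bare-clone and mirror-push sequence in Method 2 can be rehearsed offline before trying it against GitHub. The sketch below is an illustration under stated assumptions: local directories stand in for both the original repository and the new, empty GitHub repository, and the identity and file names are hypothetical. No network access or credentials are involved.

```shell
# Everything happens inside a throwaway directory.
sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for the original repository, with one commit on main.
git init -q original-work
cd original-work
git symbolic-ref HEAD refs/heads/main          # name the initial branch main
git config user.name "Example User"            # hypothetical identity
git config user.email "example@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"
cd ..

# Stand-in for the new, empty repository created on GitHub.
git init -q --bare NEW-REPOSITORY.git

# Clone a bare copy of the original.
git clone -q --bare original-work ORIGINAL-REPOSITORY.git

# Move into the bare clone and push a mirror to the new repository.
cd ORIGINAL-REPOSITORY.git
git push -q --mirror "$sandbox/NEW-REPOSITORY.git"

# Delete the temporary bare clone.
cd ..
rm -rf ORIGINAL-REPOSITORY.git

# Confirm that the commit history arrived in the copy.
copied_commits="$(git --git-dir="$sandbox/NEW-REPOSITORY.git" rev-list --count --all)"
echo "Commits in the copy: $copied_commits"
```

Against GitHub, the only differences are the URLs: the clone source is the ysph-dsde repository URL and the push target is the URL of your own new repository.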

Cloning the Copied Repository

Now that you have copied this repository into your own GitHub, you are ready to proceed with a standard clone to your local device.

  1. Copy the SSH or HTTPS URL from your GitHub repository by clicking the Code button.

    # SSH
    git@github.com:EXAMPLE-USER/NEW-REPOSITORY.git
    
    # HTTPS
    https://github.com/EXAMPLE-USER/NEW-REPOSITORY.git
  2. In the command-line application (i.e., Terminal for Macs or Git Bash for Windows) navigate to the location where you want to store the repository.

    Command-Line Application
    cd "/file_path/"
  3. Clone the repository.

    Command-Line Application
    # using SSH
    git clone git@github.com:EXAMPLE-USER/NEW-REPOSITORY.git
    
    # or using HTTPS
    git clone https://github.com/EXAMPLE-USER/NEW-REPOSITORY.git
  4. OPTIONAL: You can reset the repository history, which will clear the previous commits, by running the following block of code from inside the cloned repository (source: StackExchange answer by Zeelot 11).

    Command-Line Application
    git checkout --orphan tempBranch         # Create a temporary branch
    git add -A                               # Add all files and commit them
    git commit -m "Reset the repo"
    git branch -D main                       # Deletes the main branch
    git branch -m main                       # Rename the current branch to main
    git push -f origin main                  # Force push main branch to GitHub
    git gc --aggressive --prune=all          # Remove the old files
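Because the optional reset ends with a force push, it permanently replaces the history on the remote, so it can be worth rehearsing first. The following is a minimal sketch, assuming a throwaway local repository with a local bare repository standing in for GitHub; the identity, file name, and commit messages are hypothetical. It runs the same reset sequence and counts commits before and after.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"

# Local bare repository standing in for the GitHub remote.
git init -q --bare remote.git

# Working repository with two commits of history on main.
git init -q work
cd work
git symbolic-ref HEAD refs/heads/main          # name the initial branch main
git config user.name "Example User"            # hypothetical identity
git config user.email "example@example.com"
git remote add origin "$sandbox/remote.git"
echo "v1" > notes.txt && git add -A && git commit -q -m "First commit"
echo "v2" > notes.txt && git add -A && git commit -q -m "Second commit"
git push -q origin main
commits_before="$(git rev-list --count HEAD)"

# The reset sequence from the optional step.
git checkout -q --orphan tempBranch      # New branch with no parent commits
git add -A                               # Stage all current files
git commit -q -m "Reset the repo"
git branch -q -D main                    # Delete the old main branch
git branch -m main                       # Rename the current branch to main
git push -q -f origin main               # Overwrite main on the remote
git gc -q --aggressive --prune=all       # Clean up unreferenced objects

commits_after="$(git rev-list --count HEAD)"
remote_commits="$(git --git-dir="$sandbox/remote.git" rev-list --count main)"
echo "Before: $commits_before  After: $commits_after  Remote: $remote_commits"
```

The current files are preserved in the single new commit; only the earlier commits are dropped from the branch.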

Initializing the Environment

After cloning the codespace to your local device, you will need to initialize the environment using the renv package. This step will install all the packages and versions used in the workshop, ensuring a reproducible coding environment.

  1. In the command-line application (i.e., Terminal for Macs or Git Bash for Windows) navigate to the location of the cloned repository.

    Command-Line Application
    cd "/file_path/"
  2. Launch the project by opening the *.Rproj file in RStudio.

  3. In the R console, activate the environment by running the following lines of code:

    RStudio Console
    renv::init()          # Initialize the project.
    renv::restore()       # Download packages and their version saved in the lockfile.
Note

If you are asked to update packages, say no. The renv package is intended to recreate the same environment under which the project was created, making it reproducible. You are ready to proceed when running renv::restore() gives the output:

RStudio Output
- The library is already synchronized with the lockfile.

If you experience any trouble with this step, you might want to confirm that you are using R (v 4.4.3) in the RStudio IDE (v 2024.12.1+563). You can also read more about renv in its vignette 12.
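One way to check the R version a project expects is to read it from the renv lockfile, which records it under a top-level "R" entry. The sketch below is a rough illustration, not a full JSON parser: it writes a small, hypothetical stand-in for renv.lock to a temporary directory and extracts the version with awk. In a real project you would point it at the renv.lock file in the repository root instead.

```shell
# Hypothetical stand-in for the project's renv.lock (only the "R" header is shown).
workdir="$(mktemp -d)"
cat > "$workdir/renv.lock" <<'EOF'
{
  "R": {
    "Version": "4.4.3",
    "Repositories": []
  },
  "Packages": {}
}
EOF

# Print the first "Version" value that follows the top-level "R" entry.
lock_r_version="$(awk '
  /"R"[[:space:]]*:/  { in_r = 1 }
  in_r && /"Version"/ { gsub(/[^0-9.]/, "", $0); print; exit }
' "$workdir/renv.lock")"
echo "R version recorded in lockfile: $lock_r_version"

# If R is installed, print its version for comparison.
if command -v Rscript >/dev/null 2>&1; then
  installed_r_version="$(Rscript -e 'cat(as.character(getRversion()))')"
  echo "R version installed:            $installed_r_version"
fi
```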

About the Data

The Johns Hopkins Coronavirus Resource Center (JHU CRC) tracked and compiled global COVID-19 pandemic data from January 22, 2020, to March 10, 2023 13. This data is publicly available through their two GitHub repositories. For this workshop content, we imported two datasets:

  • Cumulative vaccination counts for the U.S. from their GovEX/COVID-19 GitHub repository. The raw data used in the analysis script can be found in the data_tables/vaccine_data/us_data/time_series subdirectory (original source) 14,15.
  • Cumulative case and death counts for the U.S. from their CSSEGISandData GitHub repository. The raw data for these two datasets used in the analysis can be found in the csse_covid_19_data/csse_covid_19_time_series subdirectory (original source). Both time_series_covid19_confirmed_US.csv and time_series_covid19_deaths_US.csv were used 16,17.

The data dictionaries provided by JHU CRC can be found here: Vaccinations Dataset Data Dictionary and Cases and Deaths Datasets Data Dictionary 18,19. For our purposes, we conducted data cleaning, harmonization, and smoothing using isotonic regression. This included harmonizing the U.S. Census Bureau’s 2010 to 2019 population projections with the 2020 to 2023 vintages.

Details about these steps can be found in the Git-and-GitHub/R directory of this workshop’s GitHub repository (link to code). The cleaned datasets used in this workshop can be found in the Git-and-GitHub/Data directory of this workshop’s GitHub repository (link to data).

Section Glossary

Command-Line Interface (CLI) A text-based application that directly interacts with the computer’s operating system, manages files, and can run programs. It typically lacks a GUI 31.
Graphical User Interface (GUI) An interface that allows users to interact with computers through visual elements like buttons and menus 2.
Integrated Development Environment (IDE) A software application that combines tools for editing, building, testing, and debugging code into a single, user-friendly interface 3.
Version Control A system for managing, organizing, and tracking different versions of files. It identifies differences between versions and allows reverting to older versions 1.

References

1. Atlassian. What is version control. Atlassian Tutorials.
2.
3.
4. Atlassian. Git cheat sheet. Atlassian Tutorial.
5. Torruellas, R. Vim cheat sheet. Vim (2013).
6. Balasundaram, M. Bash keyboard shortcuts. GitHub Gist (2014).
7.
8.
9. GitHub. Creating a new repository. GitHub Docs.
10. GitHub. Duplicating a repository. GitHub Docs.
11.
12. Ushey, K., Wickham, H. & Posit. Introduction to renv.
13. Moss, Dr. B. et al. Johns Hopkins coronavirus resource center. (2020).
14. Center, J. H. U. C. R. Time series COVID-19 vaccine US. GovEX GitHub (2020).
15. Center, J. H. U. C. R. GovEX. GovEX GitHub (2020).
16. Center, J. H. U. C. R. Time series COVID-19 cases and deaths US. Center for Systems Science and Engineering (CSSE) (2023).
17. Center, J. H. U. C. R. CSSEGISandData. Center for Systems Science and Engineering (CSSE) (2020).
18. Center, J. H. U. C. R. GovEX data dictionary. GovEX GitHub (2020).
19. Center, J. H. U. C. R. CSSEGISandData data dictionary. Center for Systems Science and Engineering (CSSE) (2020).
20. Ph.D., K. N. Version control with git. Yale Center for Research Computing (YCRC) (2021).
21. DeMayo, J. Gitdemo. Harvey Cushing/John Hay Whitney Medical Library (2024).
22. W3Schools. Git tutorial. W3Schools.
23. GitHub. Introduction to GitHub. GitHub Skills.
24. Ph.D., J. B. & Hester, J. Let’s git started. Happy Git and GitHub for the useR.
25. Developers, G. GitHub docs. GitHub.
26. Developers, G. Git - reference. Git.
27.
28. contributors, V. Git guides. Graphite.
29. cbeams. How to write a git commit message. cbeams (2014).
30. Git-SCM. Git - GUI clients. Git-SCM.
31.