Getting Started with Git and GitHub
This material is presented over two workshops. Part 1 was authored by Shelby Golden, M.S., and Part 2 is a collaborative effort by Shelby Golden, M.S., and Howard Baik, M.S. You can learn more about the instructors here.
Introduction
Git and GitHub are among the most popular and widely used tools for code-based project version control 1. Despite their prominence, they can be challenging for new users to get started with. Many resources focus on specific or non-generalizable aspects of Git or GitHub, rather than providing a comprehensive overview. Few resources offer a unified introduction that guides users from initial setup to basic use cases for both individual and collaborative projects.
One common limitation of many introductory resources is that they exclusively teach using third-party Graphical User Interfaces (GUIs) or integrations of Git in Integrated Development Environments (IDEs) like RStudio 2,3. However, Git is inherently designed as a Command-Line Interface (CLI) tool to be executed in environments like Mac’s Terminal. Using the Git CLI allows users to directly execute Git operations, offering fine-grained control that is often not possible with third-party GUIs or IDEs. Consequently, relying solely on GUIs or IDEs can limit a learner’s growth and understanding of the full capabilities of Git and its interaction with GitHub.
Workshop Learning Goals
In this workshop, we aim to bridge the gap between Git and GitHub in an accessible and thorough format using the Git CLI. This content is designed for those who are new to Git and GitHub, as well as those looking to enhance their existing workflows. Our goal is to help you confidently navigate and utilize the full range of capabilities that Git and GitHub have to offer.
Over the course of this workshop, we will take you through:
Part 1 Learning Goals:
- Understand the purpose and value of Git and GitHub in managing coding projects.
- Learn how Git manages files for version control locally and distributes them through GitHub.
- Set up and configure your local Git and GitHub accounts using either HTTPS or SSH Keys.
Part 2 Learning Goals:
- Get hands-on experience using Git and GitHub for solo projects through a worked-through example showing common workflows.
- Learn how to use GitHub to support collaboration and teamwork on group projects. Invite friends to have the full experience!
We have prepared real-world examples for the hands-on portion of this workshop. To fully engage with these materials, please create a clean-break copy of the two GitHub repositories that have been specifically prepared for this purpose. Detailed instructions for this process can be found below under Accessing the Codespaces.
Slides and Handouts
Download the complete slide decks with annotations and the in-person workshop handouts. Comments have been saved in the bottom left corner of each slide, and references for this webpage are located in the Appendix.
Part 1 Materials:
Part 2 Materials:
Below are some helpful cheat sheets that summarize commands for navigating Git, Bash, and the command-line application text editor vim. These will be the primary languages and command series used in this module.
- Git Cheat Sheet by Atlassian 4
- Vim Cheat Sheet by Richard Torruellas 5
- Bash Shortcuts by Mohankumar Balasundaram 6
- Command Line Cheat Sheet by Tobias Günther 7
Accessing the Codespaces
In this workshop, you will need to access the R code we have prepared for the worked-through examples and challenge questions. If you haven’t already, please download R to your local device. We also recommend using the IDE software RStudio. To access the code for this workshop, you will need Git installed on your local device, a GitHub account, and both configured. If you have not set this up yet, please follow the instructions in Configurations and Credentials first.
Two GitHub repositories have been created to practice using Git and GitHub:
- Solo projects: ysph-dsde/JHU-CRC-Vaccinations
- Collaborative projects: ysph-dsde/JHU-CRC-Cases-and-Deaths
To practice your Git and GitHub skills using our codespaces, you need to create independent copies of both repositories, referred to as a “clean-break” copy. This will decouple the GitHub account connections and give you full access to their contents. Once copied to your personal GitHub account, you must clone the codespace to your local device and initialize the environment.
Detailed instructions for these steps are provided below. There are two methods you can use: the GitHub Importer tool or your command-line application (i.e., Terminal for Macs or Git Bash for Windows). We recommend trying the GitHub Importer tool first; if it fails, proceed with the command-line steps. Please note that the importer tool may take a few minutes to transfer the files.
Please note that all materials provided in this workshop, including any code added to your personal repository, belong to DSDE. When using or referencing this material, please cite it correctly to give proper credit to the original authors.
This workshop was created using R (v 4.4.3) in the RStudio IDE (v 2024.12.1+563). The renv package is included to reproduce the same coding environment, ensuring all relevant packages and package versions are stored. If you encounter issues running the scripts, please check that the environment is initialized and that you are using the same versions of R and RStudio.
Making a Clean-Break Copy
METHOD 1: Copying Using GitHub Importer 8
The method described here will not create a Fork. You can learn more about the GitHub Importer here.
Log in to your personal GitHub account.
In the top-right of the page navigation bar, select the dropdown menu and click Import repository 9. Fill out the following sections:
- Your source repository details: Paste the HTTPS URL of one of the repositories listed above. No credentials are required for this action.
- Your new repository details: Adjust the GitHub account owner as needed and create the name for the new repository. It is good practice to initially set the repository to “Private”.
Click the Begin import button to copy the codespace.
After a few minutes, the newly created GitHub repository webpage will open up.
If this method is successful, proceed to the Cloning the Copied Repository section. If it is not successful, you can try the command-line application method detailed in Method 2.
METHOD 2: Copying Using The Command-Line Application
These directions follow GitHub’s duplicating a repository page.
Log in to your personal GitHub account.
Navigate to the ysph-dsde GitHub repository you want to copy by either searching for it by name or opening the URL provided above.
Near the right side of the page there will be a Code button to click. In its dropdown menu under the “Local” tab you will see options to copy the SSH or HTTPS URL of the repository. For example, if the repository name is “ORIGINAL-REPOSITORY”, they will look like:
Command-Line Application
    # SSH
    git@github.com:ysph-dsde/ORIGINAL-REPOSITORY.git
    # HTTPS
    https://github.com/ysph-dsde/ORIGINAL-REPOSITORY.git
Depending on your Git/GitHub configurations, you will copy one of these for the remainder of the steps.
SSH and HTTPS are transfer protocols used to pass information between your local Git-configured directory and the remote GitHub repository. Only one protocol can be set up for a single Git/GitHub connection at a given time. More details can be found in the Transfer Protocols section of Configurations and Credentials.
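As a rough illustration of how the two URL forms differ, the sketch below classifies a remote URL by the protocol it implies. The function name `remote_protocol` is a hypothetical helper introduced here for illustration; it is not a Git command, and it only inspects the shape of the string.

```bash
#!/usr/bin/env bash
# Classify a remote URL by the transfer protocol its format implies.
# remote_protocol is a hypothetical helper name, not part of Git.
remote_protocol() {
  case "$1" in
    git@*:*|ssh://*) echo "ssh" ;;      # e.g. git@github.com:OWNER/REPO.git
    https://*)       echo "https" ;;    # e.g. https://github.com/OWNER/REPO.git
    *)               echo "unknown" ;;
  esac
}

remote_protocol "git@github.com:ysph-dsde/ORIGINAL-REPOSITORY.git"      # prints "ssh"
remote_protocol "https://github.com/ysph-dsde/ORIGINAL-REPOSITORY.git"  # prints "https"
```

Inside an already cloned repository, the same check could be applied to the configured remote with `remote_protocol "$(git remote get-url origin)"`.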
Open the command-line application (i.e., Terminal for Macs or Git Bash for Windows) and navigate to the file location where you want to temporarily store the repository copy.
Command-Line Application
    cd "/file_path/"
Clone a bare copy of the original repository using its SSH or HTTPS URL 10:
Command-Line Application
    # Using SSH
    git clone --bare git@github.com:ysph-dsde/ORIGINAL-REPOSITORY.git
    # Or using HTTPS
    git clone --bare https://github.com/ysph-dsde/ORIGINAL-REPOSITORY.git
Open the project file.
Command-Line Application
    cd "ORIGINAL-REPOSITORY.git"
Back in GitHub, in the top-right of the page navigation bar select the dropdown menu and click New repository. Fill out the following sections:
- Adjust the GitHub account owner as needed and create the name for the new repository.
- It is good practice to initially set the repository to “Private”.
- Do NOT use a template or include a description, README.md, .gitignore, or license.
In the newly created GitHub repository under “Quick setup” you will find the repository’s SSH or HTTPS URL. Copy this.
Back in the command-line application, push a mirror of the cloned Git file to your newly created GitHub repository using its SSH or HTTPS URL:
Command-Line Application
    # Using SSH
    git push --mirror git@github.com:EXAMPLE-USER/NEW-REPOSITORY.git
    # Or using HTTPS
    git push --mirror https://github.com/EXAMPLE-USER/NEW-REPOSITORY.git
Refresh the new GitHub repository webpage to confirm the push was successful.
Delete the bare cloned file used to create a new remote repository.
Command-Line Application
    cd ..                              # Go back one file location
    rm -rf ORIGINAL-REPOSITORY.git     # Delete the bare clone
This completes creating a clean-break copy of the ysph-dsde repository codespace. Proceed with cloning the newly made repository to your local device in the following section.
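To see the Method 2 steps end to end without touching GitHub, the sketch below rehearses them on your own machine. It assumes Git is installed, and it uses two throwaway local directories (`original` and `new-repository.git`, both hypothetical names) as stand-ins for the ysph-dsde repository URL and your newly created empty repository. With real repositories you would substitute the SSH or HTTPS URLs shown above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing else on disk is touched.
work="$(mktemp -d)"
cd "$work"

# Stand-in for the original repository (a local path, not a real URL).
git init -q original
git -C original -c user.name="Example" -c user.email="example@example.com" \
  commit -q --allow-empty -m "First commit"

# Stand-in for the empty repository created under "New repository".
git init -q --bare new-repository.git

# The clean-break steps: bare clone, mirror push, delete the bare clone.
git clone -q --bare original ORIGINAL-REPOSITORY.git
cd ORIGINAL-REPOSITORY.git
git push -q --mirror "$work/new-repository.git"
cd ..
rm -rf ORIGINAL-REPOSITORY.git

# The new repository now carries the original history.
git -C new-repository.git log --all --oneline
```

The final `git log` should list the single commit made in the stand-in original, which mirrors the check of refreshing the new GitHub repository page after the push.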
Cloning the Copied Repository
Now that you have copied this repository into your own GitHub, you are ready to proceed with a standard clone to your local device.
Copy the SSH or HTTPS URL from your GitHub repository by clicking the Code button.
    # SSH
    git@github.com:EXAMPLE-USER/NEW-REPOSITORY.git
    # HTTPS
    https://github.com/EXAMPLE-USER/NEW-REPOSITORY.git
In the command-line application (i.e., Terminal for Macs or Git Bash for Windows), navigate to the file location where you want to store the repository.
Command-Line Application
    cd "/file_path/"
Clone the repository.
Command-Line Application
    # Using SSH
    git clone git@github.com:EXAMPLE-USER/NEW-REPOSITORY.git
    # Or using HTTPS
    git clone https://github.com/EXAMPLE-USER/NEW-REPOSITORY.git
OPTIONAL: You can reset the repository history, which will clear the previous commits, by running the following block of code (Source: StackExchange by Zeelot) 11.
Command-Line Application
    git checkout --orphan tempBranch   # Create a temporary branch
    git add -A                         # Add all files and commit them
    git commit -m "Reset the repo"
    git branch -D main                 # Delete the main branch
    git branch -m main                 # Rename the current branch to main
    git push -f origin main            # Force push main branch to GitHub
    git gc --aggressive --prune=all    # Remove the old files
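Because the history reset is destructive, it may help to rehearse it somewhere safe first. The sketch below, which assumes Git is installed, builds a throwaway local repository with three commits and then applies the same reset steps, leaving out the force push and garbage collection since there is no remote. The file name `notes.txt` and the placeholder identity are illustrative assumptions only.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build a throwaway repository so no real project is affected.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main          # name the first branch "main"
git config user.name "Example"                 # placeholder identity for this repo only
git config user.email "example@example.com"

# Create a short history of three commits.
for i in 1 2 3; do
  echo "version $i" > notes.txt
  git add -A
  git commit -q -m "Commit $i"
done
before="$(git rev-list --count HEAD)"

# The reset steps, minus the force push (there is no remote here).
git checkout -q --orphan tempBranch            # new branch with no history
git add -A
git commit -q -m "Reset the repo"
git branch -q -D main                          # delete the old main branch
git branch -m main                             # rename tempBranch to main

after="$(git rev-list --count HEAD)"
echo "commits before: $before, after: $after"  # prints "commits before: 3, after: 1"
```

The files keep their latest contents; only the commit history is collapsed into a single new commit.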
Initializing the Environment
After cloning the codespace to your local device, you will need to initialize the environment using the renv package. This step will install all the packages and versions used in the workshop, ensuring a reproducible coding environment.
In the command-line application (i.e., Terminal for Macs or Git Bash for Windows), navigate to the file location where you stored the repository.
Command-Line Application
    cd "/file_path/"
Launch the project by opening the *.Rproj file in RStudio.
In the R console, activate the environment by running the following lines of code:
RStudio Console
    renv::init()       # Initialize the project.
    renv::restore()    # Download the packages and versions saved in the lockfile.
If you are asked to update packages, say no. The renv package is intended to recreate the same environment under which the project was created, making it reproducible. You are ready to proceed when running renv::restore() gives the output:
RStudio Output
    - The library is already synchronized with the lockfile.
If you experience any trouble with this step, confirm that you are using R (v 4.4.3) in the RStudio IDE (v 2024.12.1+563). You can also read more about renv in its vignette 12.
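When troubleshooting a version mismatch, it can help to see which R version the lockfile itself records. The sketch below is one rough way to read it from the command line. The helper name `lockfile_r_version` is hypothetical, the sample lockfile (including its renv version number) is made up for illustration, and the approach assumes the "R" entry with its "Version" field appears before "Packages" in the file.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the first "Version" value found in a lockfile.
# lockfile_r_version is a hypothetical helper; it assumes the "R" entry
# comes before "Packages", so the first "Version" is the R version.
lockfile_r_version() {
  sed -n '/"Version"/{s/.*"Version": *"\([^"]*\)".*/\1/p;q;}' "$1"
}

# Sample lockfile for illustration only; in the project, pass renv.lock instead.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
{
  "R": {
    "Version": "4.4.3"
  },
  "Packages": {
    "renv": {
      "Package": "renv",
      "Version": "1.0.0"
    }
  }
}
EOF

lockfile_r_version "$sample"   # prints "4.4.3"
```

In the cloned project directory this would be called as `lockfile_r_version renv.lock`, and the result compared with the version reported by `R --version`.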
About the Data
The Johns Hopkins Coronavirus Resource Center (JHU CRC) tracked and compiled global COVID-19 pandemic data from January 22, 2020, to March 10, 2023 13. This data is publicly available through their two GitHub repositories. For this workshop content, we imported two datasets:
- Cumulative vaccination counts for the U.S. from their GovEX/COVID-19 GitHub repository. The raw data used in the analysis script can be found in the data_tables/vaccine_data/us_data/time_series subdirectory (original source) 14,15.
- Cumulative case and death counts for the U.S. from their CSSEGISandData GitHub repository. The raw data for these two datasets used in the analysis can be found in the csse_covid_19_data/csse_covid_19_time_series subdirectory (original source). Both time_series_covid19_confirmed_US.csv and time_series_covid19_deaths_US.csv were used 16,17.
The data dictionaries provided by JHU CRC can be found here: Vaccinations Dataset Data Dictionary and Cases and Deaths Datasets Data Dictionary 18,19. For our purposes, we conducted data cleaning, harmonization, and smoothing using isotonic regression. This included harmonizing the U.S. Census Bureau’s 2010 to 2019 population projections with the 2020 to 2023 vintages.
Details about these steps can be found in the Git-and-GitHub/R directory of this workshop’s GitHub repository (link to code). The cleaned datasets used in this workshop can be found in the Git-and-GitHub/Data directory of this workshop’s GitHub repository (link to data).
Other Recommended Resources
Although there are numerous resources available on Git and GitHub, we have curated a selection of additional sites and tutorials that complement and expand upon our content. Some of these resources will reinforce the fundamentals covered in this workshop, while others offer advanced materials and guides to help you further develop your skills as you gain experience.
Fundamentals
- Yale’s Center for Research Computing workshop “Version Control by Git” by Kaylea Nelson, Ph.D. 20
- Yale’s Harvey Cushing/John Hay Whitney Medical Library workshop “Git & GitHub: An Introduction To Version Control” by Justin DeMayo 21
- “Getting Git Right” by Atlassian
- Git and GitHub Tutorial by W3Schools 22
- “Introduction to GitHub” by GitHub 23
- Happy Git and GitHub for the useR by Professor Jenny Bryan (and Yale alum!) and Jim Hester 24
Beyond Basic Git and GitHub
- Reviewing the developer documentation: git-scm.com/docs and docs.github.com 25,26
- “What is Git commit, push, pull, log, aliases, fetch, config & clone” by Amit Prajapati 27
- “Git Guides” by various Graphite contributors 28
- “How to Write a Git Commit Message” by cbeams 29
- Git Graphical User Interface (GUI) Clients by various contributors 30
Section Glossary
| Command-Line Interface (CLI) | A text-based application that directly interacts with the computer’s operating system, manages files, and can run programs. It typically lacks a GUI 31. |
| Graphical User Interface (GUI) | An interface that allows users to interact with computers through visual elements like buttons and menus 2. |
| Integrated Development Environment (IDE) | A software application that combines tools for editing, building, testing, and debugging code into a single, user-friendly interface 3. |
| Version Control | The practice of managing, organizing, and tracking different versions of files. It identifies differences between versions and allows reverting to older versions 1. |