Managing Version Control Locally

When I had to take Organic Chemistry in undergrad, my professor said that a successful student will learn why things are happening, as opposed to rote memorization. This very much applies for users of Git as well. Memorizing specific scenarios you copy-and-paste code will only get you so far with Git, as it will inevitably become a frustrating chore when things go wrong. And we’re talking about tech, we know things will go wrong at some point!

Unfortunately, we only have enough time to introduce you to the basic structures and workflows of Git, today. I encourage you to read further about the different Git commands and especially how information is moving through the Git framework. You can start with some of the sources we’ve provided earlier in the workshop (Other Recommended Resources) or some of the additional sources linked to below.

If you are beginner with Git, be patient with yourself because this part takes time and practice, but as us Colorado skiers say “If there’s no pain 🎿, there’s no Jane 🗻!” Let’s get to it!

Basics of Git/GitHub Interactions

In an established Git version-controlled environment, there are three primary domains: the Working Tree, Staged Edits, and Committed Edits. The figure below exhibits how information moves through Git’s version control and tracking framework on your local device. Notice the git * bash commands that drive the transitions between the three domains. For most scenarios, these are the Git commands you will be using.

Git workflow
Figure inspired by "What is Git commit, push, pull, log, aliases, fetch, config & clone" by MindOrks Amit Prajapati.

Let’s look at each version control domain our project files can be in:

  1. The Working Tree is the area where the developer makes changes to files in the project. In this domain, Git is not actively tracking changes that are made; changes get recycled back into the same file with no record of interim changes.

  2. When you are ready for Git to follow changes, you need to add them to the staged environment with git add. This prompts Git to identify differences between Staged Edits and already committed versions that are stored in the .git directory.

Note

You will use the same commands even if there was no previous version of a file stored in the .git directory. When you check the version control status of your project with git status, Git will automatically identify untracked files and prompt you to add them for tracking.

Git provides a robust framework to track changes within different files, including: storing comments explaining each change, tracing the file history, allowing recovery to older versions, and providing a line-by-line summary of changes from recent versions. Impressive!

Yet with all of this, Git doesn’t directly track file name changes. (What?!) Instead, file name changes are detected with a heuristic comparing the similarity of files committed in .git to renamed files that have been staged. This is an important distinction to be aware of because sometimes Git associates changes incorrectly.

Important

Moving a file within the Git initiated environment is interpreted by Git as a name change.

When committing a file name change, Git deletes the file listed under the old name and replaces it with a new addition of the same file under the new name. The change history of a file leading up to a name change is always retained, but the transition between name changes are not explicitly tracked as such. It is therefore best practice to commit name changes with a message clearly stating this.

The git mv, for move or rename, command supports clearly recording file name transitions. Example from How to rename a file in Git by Kenny DuMez:

Command-Line Application
git mv oldfilename.txt newfilename.txt
git commit -m "Rename oldfilename.txt to newfilename.txt"
  1. Once you have reviewed the detected changes and approve saving and sharing them, you use git commit to promote the staged version to become the most recent copy reflected in the .git directory. It is the most recent Committed Edits that get synced over the GitHub server through a “peer-to-peer” patch.

A configured local Git initiated project directory does not passively surveilling the remote repository’s contents. There are three core commands used to prompt your local Git to share information with the remote repository.

  • git clone: establish a Git initiated project locally by downloading the entire codebase stored in a GitHub repository. With a git clone, you set the transfer protocol (SSH key or HTML url) through which information is shared and assign the “origin” alias to the remote location.
  • git pull: download all of the information contained in GitHub and compare it with the last checked copy of the remote repository, integrating any new changes that are not yet reflected.
  • git push: upload the most current version of your project to GitHub and integrate the new changes to the remote copy. This command is used throughout the project lifetime and is used to establish a new remote repository in GitHub from a local, Git initiated project.

Detailed Walk Through

Identify and Save Local Changes

Hopefully the explanations provided thus far are eliciting an aha moment for you. But if not, and you’re otherwise ready to pretend it makes perfect sense for a little longer, let’s go through an example. Later we are going to discuss version branches of your project. For now, all you need to know is we are in a Git initialized project directory and on a version controlled branch called main.

xkcd Painbow comic
xkcd comic. Accessed from "Flowing Data"
April 22nd, 2025.

Imagine it. You have come up with what might be (and we’re scientists here so let’s not overstate) one of the best color schemes for your data ever. You generate this sweet looking, publication ready plot on the right and save the image as a file called earth_shattering_color_scheme.png. You’re now ready to record this version of your work and share it with the team through the projects GitHub (maybe even the world?), but the changes have not yet been recorded by Git.

RECALL: The same steps are taken if this is your first image file for the plot or if you have tracked previous versions in the .git directory. For our example, it will be the former case.

It is always best practice to start by checking the current status of your version controlled project directory. Say that you generated this image using R in a script called making_cool_plots.R. Running git status in the command-line application will give you something like this:

Command-Line Application
git status
Command-Line Output

On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
(use "git add/rm ..." to update what will be committed)
(use "git restore ..." to discard changes in working directory)

modified:    making_cool_plots.R

Untracked files:
(use "git add ..." to include in what will be committed)

earth_shattering_color_scheme.png

These outputs can be a little intimidating, so let’s break them down section-by-section.

  1. The first statement is straight forward. Git is letting us know that we are interacting with it in a version controlled branch called main.

  2. origin/main refers to the remote GitHub repository version status of the same branch. “Origin” is an alias for the remote repository location. Equivalently, we can say something like: git@github.com:ysph-dsde/PROJECT-REPOSITORY.git/main.

    The output is telling us that our local copy of main is current with the last-checked status of the remote copy in GitHub. This means we don’t expect that there are local commits not yet reflected in the remote copy.

Note

Your local mirror of origin/main is not guaranteed to be a reflection of the remote repository contents in GitHub. Maybe you share the remote repository with a team who is actively pushing changes to the server, or maybe you changed a README.md, or uploaded a file directly in GitHub since you last synced your local copy. Your local Git is not passively checking the status of the remote repository, so these changes will go undetected until we tell Git to give GitHub a ring for a quick check-in.

It is therefore important to stipulate that origin/main in this context is the last checked status of the remote repository. Sometimes you might run into problems syncing locally committed changes with the remote repository because your local version of origin/main is not up to date.

It is best practice to commit file versions you want to retain to the local .git directory prior to updating your local mirror of origin/main. Later we will discuss the different commands Git uses to pass information from the remote repository to your local mirrored copy.

  1. The next section lists version tracked files stored in the .git directory that have been changed. Git conveniently provides a few suggestions pointing to next steps: git add/rm/restore. Above, we see the color pallet changes we made in the already version controlled R script have been detected.

  2. The “untracked files” section is where files not previously tracked by Git are listed. This means that no previously committed file versions have sufficient similarity with this file to be considered an updated version. Again, Git provides us with a suggestion on the next steps: git add.

    As expected, this is where our newly created image file is listed.

Up until this point, file iterations and changes are not recorded in Git’s version history. Before we can store a snapshot of these edits, we need to first add them to the Staged Edits domain. This is done by using the Git command git add in one of two ways:

  1. Explicitly list out each file you want to add to Staged Edits OR

  2. Use the “wild card”, ., to add every file listed under the git status output.

Command-Line Application
# OPTION #1: List each file
git add "making_cool_plots.R" "earth_shattering_color_scheme.png"

# OPTION #2: Use the wild card "." to add all files
git add .

# View the results of git add.
git status
Command-Line Output

On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
(use "git restore --staged ..." to unstage)

modified:    making_cool_plots.R
new file:    earth_shattering_color_scheme.png

The files have now been moved from the Working Tree to the Staged Edits domain. Notice that this domain is not applying any changes to your project directory. Merely, the stage is a buffer between pending changes and the codified version history of the projects contents. All files promoted for saving get packaged with a message explaining the reason for the version update.

Code Like a Pro

Astute students will notice that staging edits allows the opportunity to group like changes to one commit action. This is a highly advantageous feature that allows you to better control how version changes are recorded in .git. Doing so improves bug troubleshooting and minimizes collateral changes a revert might cause on unrelated parts of the project.

In our example, the two files we have staged are related and the changes we made are objectively flawless, so let’s proceed with committing the changes using git commit. Notice that a message is required with every commit action. It is easiest to add the message inline:

git commit -m "Revelatory message elucidating the hidden secrets of git."

Because the message is a mandatory element of a commit action, if you forget to include one Git will always be there to forcibly remind you by opening an editor where you can include one. Thank you so much, Git. If you have not specified the editor Git will use in such circumstances, then it will open up the default vi editor.

To show you how to navigate this outcome, we’re going to forget to include our message.

Command-Line Application
git commit
Command-Line Output


# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit."
#
# On branch main
# Your branch is ahead of 'origin/main' by 1 commit.
# (use "git push" to publish your local commits)
#
# Changes to be committed:
# modified:    making_cool_plots.R
# new file:    earth_shattering_color_scheme.png

After entering git commit without a message, Git automatically opens a vi window inside the command-line application. There are specific commands that you will need to use inside this application. You can reference the Vim Cheat Sheet” by Richard Torruellas for help navigating this space.

To add a message you need to do the following steps:

  1. When the vi window first opens it is in “command” mode. Hit either a or i on your keyboard to enter “insert” mode. When you do this, you might see -- INSERT -- at the bottom of the application window.

  2. Add in your message in the line at the top of the document. Do NOT include a # in front, as this will cause your message to be ignored.

  3. Hit Esc to exit “insert” mode and reenter “command” mode.

  4. Save progress typing :w and hitting enter. If you are ready to exit vi then you save and exit using :wq instead. NOTE: :w/:wq will show up at the bottom of the application window.

For our example, after I entered “insert” mode by typing a on the keyboard, I add the message “Outstanding progress on color schemes for density plot fill scaling.” I then save and exit vi by hitting Esc then typing :wq and hitting enter.

Command-Line Output

Outstanding progress on color schemes for density plot fill scaling.
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit."
#
# On branch main
# Your branch is ahead of 'origin/main' by 1 commit.
# (use "git push" to publish your local commits)
#
# Changes to be committed:
# modified:    making_cool_plots.R
# new file:    earth_shattering_color_scheme.png
:wq

Command-Line Output

[main f9b4cf2] Outstanding progress on color schemes for density plot
fill scaling.
2 files changed, 11 insertions (+), 2 deletions (-)
create mode 100644 earth_shattering_color_scheme.png

Successfully committing our staged edits to the .git directory gives the second output shown. Here, Git is affirming that we have associated the message “Outstanding progress on color schemes for density plot fill scaling.” to the commit hash ID f9b4cf2 in the branch main version history.

It then summarizes the total number of files changed, the total number of line insertions and deletions tallied, and finally it will summarize the file mode. In our example, we have edited two files which have a total of 11 line insertions and deletions between them. The newly created file is associated with directory object type, 100, and attributes the writeable permissions, 644. Together, mode 100644 means earth_shattering_color_scheme.png will be a “regular” file with “non-executable group-writeable” permissions (StackExchange reply by Go Dan).

Code Like a Pro

The commit ID is randomly assigned to any newly created version that has been successfully added to the .git directory. It is the same ID that will show up on GitHub to identify distinct versions and their parents, and it constitutes the first alphanumeric characters used in the 40 character SHA-1 checksum associated with the commit. Executing a git log will show the version history progression for the currently checkout branch, and will provide the entire SHA-1 checksum.

Read more in Atlassian’s Advanced Git log.

We are now well on our way to sharing our updates with the remote repository.

Preamble to Sharing Changes with GitHub

In the series of detailed command-line outputs presented above, there were a few key status cues that indicate the projects version status relative to the remote repository. These cues are summarized in the flow diagram below. Recall that origin/main denote the remote copy of the project branch main housed in GitHub. More accurately, it is the last checked state of origin/main.

summary workflow
Indicators of the projects local mirror's status relative to the copy housed in the GitHub remote repository.

Specifically take notice of the cue provided in the second box. Here, Git is telling us that our local copy of the branch main is ahead of the remote copy by one commit. Remember that changes made to files in the Working Tree go unnoticed by Git, and the committed changes are what get evaluated for version control management.

What this prompt means is we have taken one step away from the cohesive version where Git recognized our local copy of the project to be the same as the remote copy. Subsequently committed versions will automatically inherit the history of the parent version along with the newly included modifications in a linear fashion. If we had instead changed three different aspects of the project, and committed these three like changes together in separate commit actions, then this message will say that we are three commits ahead.

Pull Before You Push

Before we update the remote copy of our branch main to reflect our changes with git push, it is good practice to review updates that might have occurred in the remote repository since we last prompted Git to access it. This is done with an action called git pull. There are two possible situations you can run into after executing a git pull at this stage of the process:

  1. No changes have been made to origin/main, justifying no change to your local copy with git pull.

  2. Changes have been made to origin/main, requiring reconciliation of the divergent versions.

The first case is straightforward, and so we will only discuss two methods available to reconciling the second case. Consider the diagram below showing an example divergent commit history of the local branch main from its remote copy.

Atlassian's Git Tutorials
All three diagrams used here are from Atlassian's "Git Tutorials". Downloaded April 24th, 2025.

In purple, we have the version of origin/main retained on our local device from which we have been working. This is the version of origin/main that our local Git uses to reference changes made in the same branch, represented in blue with the commits E-F-G. In green are additional changes made directly in GitHub or from another collaborator that have been integrated into origin/main with the commits A-B-C.

There are a great deal of commands associated with Git’s version control framework. It is most helpful to slowly expand your Git vocabulary and tool box by exploring related commands, or commands that specifically address your workflow needs. Below are a couple of good places to start advancing your basic understanding of Git:

  • git log facilitates navigating the version history of your project. This command is most often used to understand version differences prior to merging or rebasing, such as following a git fetch. Read more from Atlassian’s Git Tutorial: “Advanced Git log”.
  • Atlassian’s comment on: Merging vs. Rebasing, specifically the section about the golden rule for rebasing.

In actuality, git pull is two Git actions wrapped into one command: git fetch and one of two integration protocols, git merge or git rebase. git fetch will download the files and version history currently reflected in the remote copy of origin/main as a temporary branch by the same name on your local device (read more from Atlassian’s Git Tutorial: Git fetch). In the diagrams below, we see the different ways git merge and git rebase will interpret a coalescence of the two.

Merge

Atlassian's Git Tutorials

Rebase

Atlassian's Git Tutorials

git merge conducts a three way commit: two inherited from the heads of each branch (versions C and G) and a third for the merge action that creates a new version that integrates changes relfected in C and G, here called H. git rebase will realign the commit histories of the two branches relative to one another by pasting the series of one branch at the head of the other. Doing this necessitates redefining the parent commit history for the newly appended commit series, from which we get the name “rebasing”. The figure above shows rebasing the main branch commits with origin/main A-B-C, creating the rebased versions of those commits, E’-F’-G’.

Command-Line Application
# -----------
# Pull with one command.

# OPTION #1: Integrate the fetched copy of "origin/main" into "main" with merge.
git pull                        # Assuming the default protocol is a merge

# OPTION #2: Rebases "main" with the new parent history reflected in "origin/main".
git pull --rebase               # Override the default merge to do a rebase
Command-Line Application
# -----------
# Two-step pull.

# Download branch main from the remote repository, origin.
git fetch origin main

# If needed, return to the local copy of main, not the fetched branch.
git checkout main

# OPTION #1: Integrate the fetched copy of "origin/main" into "main" with merge.
git merge FETCH_HEAD

# OPTION #2: Rebases "main" with the new parent history reflected in "origin/main".
git rebase FETCH_HEAD

# If needed, remove the fetched copy of "origin/main" saved as a branch.
git branch -d FETCH_HEAD

If you are new to Git and uncomfortable discerning when to use one option over the other, it is best to employ a git merge. Fortunately, this is the default protocol used by git pull! Regardless of your comfort level with Git, it is important to understand the different protocols and that there are times when one is better to use over the other.

Important

The complete and current copy of your project is typically stored in the remote repository in the branch origin/main, where the development team or the public can access it. In development workflows, the publicly facing copy is treated as the current baseline on which further developments needs to based one.

If you decide to try git rebase, it is best practice to never rebase the remote copy of origin/main for this reason. Instead, only rebase your local branches with the remote version.

In general, use git merge when:

  • you are new to Git or not comfortable using a rebase.
  • you have longer periods over which you develop features that rarely get merged into the main workflow.
  • you use git merge all the time for no reason, but also happen to be a wiz at using git log (and can possibly visualize in the fourth dimension), making sorting through the pile of commits a trivial task.

Use git rebase when:

  • you want to avoid over complicating the version history by keeping it linear.
  • you want to update your base with the teams current base reference in GitHub.
  • you are not adding many version tracked edits to a branch before coalescing them again.
  • you are a meticulous and neat Git user, frequently committing your work in well messaged packages, limiting the possibility of wreaking havoc on the commit history with a rebase.
Note

Notwithstanding conflicts between the branches, git merge is considered to be a non-destructive, but non-linear, way of integrating two version histories. git rebase, on the other hand, maintains a linear version history but can be destructive to the commit history or create complicated divergences that are harder to bring together if used improperly.

When choosing your coalescing method, consider the position of your branch relative to the completely integrated copy of the project and the number of version tracked commits that differ between the branches.

Time to Share with the World

Whew! We’ve covered quite a bit of ground. To summarize, we have grouped similar project edits in one commit with a message explaining our work, advancing our projects .git directory one step forward. After committing all the changes in our repository, we prompted Git to check the status of origin/main stored in GitHub, and discussed two different coalescing strategies offered by git pull: git merge and git rebase.

We are now prepared to update the remote repository with git push. Recall the figure above that summarizes key status cues:

summary workflow
Indicators of the projects local mirror's status relative to the copy housed in the GitHub remote repository.

Pushing our changes to GitHub will cause the commit HEAD of our local copy of branch main to merge with the same branch stored in the remote repository. HEAD is a term describing the latest commits available in the .git directory for that checked-out branch. Sometimes, you might see it referred to as main --> main instead.

Command-Line Application
git push origin main
Command-Line Output

Enumerating objects: 14, done.
Counting objects: 100% (14/14), done.
Delta compression using up to 8 threads
Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 90.13 KiB | 22.53 MiB/s, done.
Total 9 (delta 5), reused 0 (delta 0), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (5/5), completed with 5 local objects.
To github.com:ysph-dsde/PROJECT-REPOSITORY.git
7d1b339..f9b4cf2 main -> main

The git push output will look something like this. It indicates that our upload was completed successfully, and supplies some details about the update. For example, it gives the previously committed versions ID, which is now updated from 7d1b339 to f9b4cf2, the remote repository location, and the size of the objects we wrote to the remote repository.

Congratulations! You have completed the in-depth walk through exhibiting standard Git mediated project version control and Git/GitHub interactions!