Content from Motivation


Last updated on 2024-07-25

Overview

Questions

  • Why version control?
  • Why Git?

Objectives

  • Understand the benefits of an automated version control system.
  • Understand the basics of how automated version control systems work.

Why do we need version control?


We’ll start by exploring how version control can be used to keep track of what one person did and when. Even if you aren’t collaborating with other people, automated version control is much better than this situation:

Piled Higher and Deeper by Jorge Cham, http://www.phdcomics.com/comics/archive_print.php?comicid=1531

We’ve all been in this situation before: it seems unnecessary to have multiple nearly-identical versions of the same document. Some word processors let us deal with this a little better, such as Microsoft Word’s Track Changes, Google Docs’ version history, or LibreOffice’s Recording and Displaying Changes.

Version control systems start with a base version of the document and then record changes you make each step of the way. You can think of it as a recording of your progress: you can rewind to start at the base document and play back each change you made, eventually arriving at your more recent version.

Changes Are Saved Sequentially

Once you think of changes as separate from the document itself, you can then think about “playing back” different sets of changes on the base document, ultimately resulting in different versions of that document. For example, two users can make independent sets of changes on the same document.

Different Versions Can be Saved

Unless multiple users make changes to the same section of the document - a conflict - you can incorporate two sets of changes into the same base document.

Multiple Versions Can be Merged

A version control system is a tool that keeps track of these changes for us, effectively creating different versions of our files. It allows us to decide which changes will be made to the next version (each record of these changes is called a commit), and keeps useful metadata about them. The complete history of commits for a particular project and their metadata make up a repository. Repositories can be kept in sync across different computers, facilitating collaboration among different people.

Have you ever said or heard…

  • “I will just finish my work and then you can start with your changes.”
  • “Can you please send me the latest version?”
  • “Where is the latest version?”
  • “Which version are you using?”
  • “Which version have the authors used in the paper I am trying to reproduce?”

Then version control is for you.

Discussion : Paper Writing

  • Imagine you drafted an excellent paragraph for a paper you are writing, but later ruin it. How would you retrieve the excellent version of that paragraph? Is it even possible?

  • Imagine you have 5 co-authors. How would you manage the changes and comments they make to your paper? If you use LibreOffice Writer or Microsoft Word, what happens if you accept changes made using the Track Changes option? Do you have a history of those changes?

What is version control?


There are lots of different tools that implement version control (generally referred to as Version Control Systems, VCS). They all have common features, including:

  • A system which records snapshots of a project
  • Implementation of branching:
    • you can work on several feature branches and switch between them
    • different people can work on the same code/project in parallel without interfering
    • you can experiment with an idea and discard it if it turns out to be a bad idea
  • Implementation of merging:
    • tool to merge different versions of a file

Code becomes a disaster without version control


Roll-back functionality

  • Mistakes happen - without recorded snapshots you cannot easily undo mistakes and go back to a working version.

Branching

  • Often you want to experiment with an idea, or work on different approaches in one file. Without branching, this quickly becomes messy and confusing.
  • You could simulate branching by copying an entire project to multiple places, but keeping those copies in sync by hand is error-prone.

Reproducibility

  • How do you indicate which version of your code you have used in your paper?
  • When you find a bug, how do you know when precisely this bug was introduced (are published results affected? do you need to inform collaborators or users of your code?).

Storing and sharing code

  • Online repositories, for safely storing snapshots and version history, are a cornerstone of version control.
  • Can act as a file backup, a collaboration tool, a download source, a citable reference, etc.

We will use Git as our VCS to record snapshots of our work - why Git?


  • Easy to set up: you can use it even by yourself, with no server needed.
  • Very popular: if contributing to somebody else’s code, chances are it’s tracked with Git.
  • Distributed: good backup, no single point of failure, you can track and clean up changes offline, and the collaboration model for open-source projects is simpler.
  • Important platforms such as GitHub, GitLab, and Bitbucket build on top of Git.
  • Sharing software and data is increasingly popular, and often required, in research, and sites like GitHub are popular platforms for sharing software.

Key Points

  • Version control is like an unlimited ‘undo’.
  • Version control also allows many people to work in parallel.

Content from In-browser session


Last updated on 2024-07-29

Overview

Questions

  • Where are we heading?

Objectives

  • See an existing repository in action.
  • Browse the history.
  • See the big picture first before we dive into details.

In-browser session

  • We will explore and visualize an existing Git repository on Bitbucket.
  • The goal of this episode is not to teach Bitbucket, but rather to get a glimpse of the wider picture before going into the details.

Why?

  • Often our first contact with Git is an existing repository.
  • It’s good to see the social aspect to know what our end goal is.
  • Don’t worry about the details of the steps here, just investigate what we can learn about an existing repository.

Bitbucket demo

We’ll start with a very simple existing repository that contains a brief history from a couple of different people.

Bitbucket example screen

  • History
  • Reproducibility
  • Collaboration
    • You can refer to code portions (so much simpler to send a link rather than describe which file to open and where to scroll to).
    • Note the contributors.
  • As a file source
    • We can create a local copy of all files in a repository through a clone
    • git clone https://bitbucket.csiro.au/scm/dat/programmatic-data-example.git

These features are all based on the core underlying Git system, which works independently of Bitbucket and is shared by other sites such as GitHub and GitLab.

GitHub demo

These lessons are themselves tracked through Git, but stored on GitHub (making use of the automatic webpage generation feature GitHub Pages).
Note that GitHub looks very different from Bitbucket, but all of the same information and features can be found by clicking around.

This version of Git Intro lessons is a modified “fork” of an older version, which was itself a “fork” of other older versions. A fork is a full copy of a repository into a new repository, which retains full history but allows it to branch in its own separate direction, under separate control. We’ll discuss forks further later.

The bulk of these lessons are thanks to the Code Refinery and Software Carpentry Git lessons, also in their own Git repositories.

Key Points

  • There are multiple online repositories for storing projects that all use the underlying Git framework.
  • Bitbucket is one such service, using Git to provide functionality to collaborate with other people.
  • We can browse the history of the contents of repositories and see who made which changes.
  • Other platforms like GitHub look different but employ the same fundamentals.

Content from Configuring Git


Last updated on 2024-07-29

Overview

Questions

  • How do we configure git?
  • What are our options for text editors?

Objectives

  • Learn how to configure the most useful git options.

Configuring Git


All the configuration we enter here with the --global option will be stored in the file ~/.gitconfig.

When we use Git on a new computer for the first time, we need to configure a few things. Below are a few examples of configurations we will set as we get started with Git:

  • our name and email address,
  • what our preferred text editor is,
  • and that we want to use these settings globally (i.e. for every project).

Let’s work through it together in Git Bash.

First, the following commands will set your user name and email address:

BASH

$ git config --global user.name "Your Name"
$ git config --global user.email yourname@example.com

The name and contact email will be recorded together with the code changes when we run git commit.
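
To confirm that the settings were recorded, they can be read back with git config. The following is a minimal sketch, not part of the original lesson. It points HOME at a throwaway directory, an assumption made here only so that your real ~/.gitconfig is left untouched; on your own machine you would skip that line.

```bash
# Assumption for this sketch only: use a temporary HOME so that the real
# ~/.gitconfig is not modified. Skip this line to configure your own account.
export HOME="$(mktemp -d)"

git config --global user.name "Your Name"
git config --global user.email yourname@example.com

# Read the two values back
git config --global user.name     # prints: Your Name
git config --global user.email    # prints: yourname@example.com

# List every global setting that is currently stored
git config --global --list
```

If a value is wrong, running the same git config --global command again with the corrected value overwrites it.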

Private email

If you’re using GitHub, and if you’d like to keep your personal email address private, you can use a GitHub-provided no-reply email address as your commit email address. See here for further details.

Line Endings

As with other keys, when you hit Return on your keyboard, your computer encodes this input as a character. Different operating systems use different character(s) to represent the end of a line. (You may also hear these referred to as newlines or line breaks.) Because Git uses these characters to compare files, it may cause unexpected issues when editing a file on different machines. Though it is beyond the scope of this lesson, you can read more about this issue on this GitHub page.

You can change the way Git recognizes and encodes line endings using the core.autocrlf setting of git config. The following settings are recommended:

On macOS and Linux:

BASH

$ git config --global core.autocrlf input

And on Windows:

BASH

$ git config --global core.autocrlf true

Setting a default text editor


When you work with Git, you often need to make small text files to describe a ‘snapshot’. When this is necessary, Git will open whatever default text editor you have set. This means it’s often useful to choose which text editor you prefer, and set it as the default. On your local machine, you can set it to be whatever you like, but if you’re working on a remote system, you will only have access to editors that are available there.

Below is a list of commands to set the default editor to a list of common tools. If you don’t have any of these available, you might want to install Sublime Text, which is a great option that you can download from https://www.sublimetext.com/3, or VS Code, which is also great and in addition is free.

Editor | Configuration command
Atom | $ git config --global core.editor "atom --wait"
BBEdit (Mac, with command line tools) | $ git config --global core.editor "bbedit -w"
Emacs | $ git config --global core.editor "emacs"
Gedit (Linux) | $ git config --global core.editor "gedit --wait --new-window"
Kate (Linux) | $ git config --global core.editor "kate"
nano | $ git config --global core.editor "nano -w"
Notepad++ (Win, 32-bit install) | $ git config --global core.editor "'c:/program files (x86)/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"
Notepad++ (Win, 64-bit install) | $ git config --global core.editor "'c:/program files/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"
Scratch (Linux) | $ git config --global core.editor "scratch-text-editor"
Sublime Text (Mac) | $ git config --global core.editor "/Applications/Sublime\ Text.app/Contents/SharedSupport/bin/subl -n -w"
Sublime Text (Win, 32-bit install) | $ git config --global core.editor "'c:/program files (x86)/sublime text 3/sublime_text.exe' -w"
Sublime Text (Win, 64-bit install) | $ git config --global core.editor "'c:/program files/sublime text 3/sublime_text.exe' -w"
Vim | $ git config --global core.editor "vim"
VS Code | $ git config --global core.editor "code --wait"

Git Help and Manual

Always remember that if you forget the options of a git command, you can list them with -h and open the Git manual page for that command with --help:

BASH

$ git config -h
$ git config --help

Optional: Git GUI


You might find it easier to know what is going on if you install a Graphical User Interface.

There are many options here. Depending on your installation of git you might have a built-in basic GUI called gitk or QGit (https://github.com/tibirna/qgit#readme). These are free. Alternatively you might try a commercial git GUI. Here are some popular ones:

Sourcetree is available for CSIRO staff within the Software Center.

None of these are needed for this introductory tutorial, but they can be helpful to build understanding.

Key Points

  • Global Git configuration is stored in ~/.gitconfig.
  • The config is specific to each computer you use.

Content from Our first repo


Last updated on 2024-07-29

Overview

Questions

  • What is a repository?
  • How does Git operate?
  • How do I make commits?
  • How do I select what to commit?

Objectives

  • Learn to create Git repositories and make commits.
  • Get a grasp of the structure of a repository.
  • Learn how to inspect the project history.
  • Learn how to write useful commit log messages.

Tracking a guacamole recipe with Git


We will learn how to initialize a Git repository, how to track changes, and how to make delicious guacamole!

This example is inspired by Byron Smith; for the original reference, see this thread. The motivation for taking a cooking recipe instead of a program is that everybody can relate to cooking but not everybody may be able to relate to a program written in e.g. Python or another language.

Let’s start.

Make a new directory for this lesson. We’ll store the Git repositories we make inside this directory.

One of the basic principles of Git is that it is easy to create repositories:

From inside your new directory:

BASH

$ mkdir recipe
$ cd recipe
$ git init

That’s it! We have now created an empty Git repository.

If we use ls to show the directory’s contents, it appears that nothing has changed:

BASH

$ ls

But if we add the -a flag to show everything, we can see that Git has created a hidden directory within recipe called .git:

BASH

$ ls -a 

OUTPUT

. ..  .git 

Git uses this special sub-directory to store all the information about the project, including all files and sub-directories located within the project’s directory. If we ever delete the .git sub-directory, we will lose the project’s history.

We will use git status a lot to check what is going on with the repository:

BASH

$ git status

OUTPUT

On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

We will make sense of this information during this lesson.

So what exactly is a Git repository?


  • Remember Git is a version control system: it records snapshots and tracks the content of a folder as it changes over time.
  • Every time we commit, Git records a snapshot of the entire project, saves it, and assigns it a version (the commit hash).
  • It does this efficiently: content that has not changed since the previous snapshot is not stored again. The changes between two snapshots are called the diff.
  • These snapshots are kept inside the .git sub-folder.
  • If we remove .git, we remove the repository and history (but keep the working directory!).
  • .git uses relative paths - you can move the whole thing somewhere else and it will still work
  • Git doesn’t do anything unless you ask it to (it does not record anything automatically).
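
Two of the points above (the repository lives in .git, and the project directory can be moved freely) can be checked with a short experiment. This is a sketch under stated assumptions: the directory names recipe-demo and moved-demo and the demo identity are made up for illustration, and everything happens in a temporary directory.

```bash
# Work in a throwaway directory (all names below are hypothetical)
work="$(mktemp -d)"
cd "$work"

git init -q recipe-demo
cd recipe-demo
git config user.name "Demo User"            # repository-local demo identity
git config user.email "demo@example.com"

echo "* 2 avocados" > ingredients.txt
git add ingredients.txt
git commit -q -m "add avocados"

# Move the whole project directory (including .git) somewhere else
cd "$work"
mv recipe-demo moved-demo
cd moved-demo
git log --oneline                                  # the commit is still there
commits_after_move="$(git rev-list --count HEAD)"  # number of commits in the history

# Removing .git removes the history but keeps the working files
rm -rf .git
ls                                                 # ingredients.txt is still present
```

After the last step, git status in moved-demo would report that the directory is no longer a Git repository, which is exactly why .git should not be deleted casually.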

Recording a snapshot with Git


  • Git takes snapshots only if we request it.
  • We will always record changes in two steps (we will later explain why this is a recommended practice):

BASH

$ git add somefile.txt
$ git commit

$ git add file.txt anotherfile.txt
$ git commit

  • We first focus (git add, we “stage” the change), then shoot (git commit):

git staging and committing

Discussion

What do you think will be the outcome if you stage a file and then edit it and stage it again, do this several times and at the end perform a commit? (think of focusing several scenes and pressing the shoot button only at the end)
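
One way to explore this question is to try it in a scratch repository. The sketch below is only an illustration: the file name notes.txt, its contents, and the demo identity are made up, and the work happens in a temporary directory.

```bash
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # repository-local demo identity
git config user.email "demo@example.com"

echo "first draft" > notes.txt
git add notes.txt                 # stage the first version
echo "second draft" > notes.txt
git add notes.txt                 # staging again replaces the earlier staged copy
echo "third draft" > notes.txt    # edited once more, but not staged

git commit -q -m "record notes"

git show HEAD:notes.txt           # the content that was committed
cat notes.txt                     # the content in the working directory
git status --short                # notes.txt is reported as modified
```

The commit contains “second draft”: a commit records whatever was staged last, not every intermediate staging step and not later unstaged edits.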


So that’s the concept - let’s do it for real.

Let’s create two files.

One file is called instructions.txt and contains:

* chop avocados
* chop onion
* squeeze lime
* add salt
* and mix well

The second file is called ingredients.txt and contains:

* 2 avocados
* 1 lime
* 2 tsp salt
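
You can create these files with any text editor. If you prefer to stay in the shell, one possible way is sketched below. The first line changes into a scratch directory only so that the sketch is safe to run anywhere (an assumption of this example); in the lesson you would run the remaining commands from inside the recipe directory.

```bash
# Assumption for this sketch: work in a scratch directory.
# In the lesson, run the rest from inside your recipe directory instead.
cd "$(mktemp -d)"

# Write instructions.txt (the quoted 'EOF' keeps the text literal)
cat > instructions.txt << 'EOF'
* chop avocados
* chop onion
* squeeze lime
* add salt
* and mix well
EOF

# Write ingredients.txt
cat > ingredients.txt << 'EOF'
* 2 avocados
* 1 lime
* 2 tsp salt
EOF

# Check the result
cat instructions.txt
cat ingredients.txt
```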

As mentioned above, in Git you can always check the status of files in your repository using git status. It is always a safe command to run and in general a good idea to do when you are trying to figure out what to do next:

BASH

$ git status

OUTPUT

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	ingredients.txt
	instructions.txt

nothing added to commit but untracked files present (use "git add" to track)

The two files are untracked in the repository (directory). Going back to the photography analogy, you want to add the files (focus the camera) to the list of files tracked by Git. Git does not track any files automatically and you need to make a conscious decision to add a file. Let’s do what Git hints at and add the files:

BASH

$ git add ingredients.txt
$ git add instructions.txt
$ git status

OUTPUT

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   ingredients.txt
	new file:   instructions.txt

Now this change is staged and ready to be committed (the camera is focused and we’re ready to take the snapshot).

Let’s now commit the change to the repository:

BASH

$ git commit -m "adding ingredients and instructions"

OUTPUT

[main (root-commit) aa243ea] adding ingredients and instructions
 2 files changed, 8 insertions(+)
 create mode 100644 ingredients.txt
 create mode 100644 instructions.txt

Right after, we query the status again to get this useful command into our muscle memory:

BASH

$ git status

Looking at the history

Now try git log to see the information that git has stored about your snapshot:

BASH

$ git log

OUTPUT

commit 787611f02dd6fc862c87359b804859caa5d2fdbd
Author: Alex Whan <alexwhan@gmail.com>
Date:   Wed Mar 13 17:07:44 2019 +1100

    adding ingredients and instructions

  • We can browse the development and access each state that we have committed.
  • The long hashes uniquely label a state of the code.
  • They are not just integers counting 1, 2, 3, 4, … (why?).
  • We will use them when comparing versions and when going back in time.
  • git log --oneline only shows the first 7 characters of the commit hash and is good to get an overview.
  • If the first characters of the hash are unique it is not necessary to type the entire hash.
  • git log --stat is nice to show which files have been modified.
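
The log variants mentioned above can be tried in a scratch repository. This is a minimal sketch with a made-up demo identity and a single commit; the hashes you see will differ from those in the lesson.

```bash
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # repository-local demo identity
git config user.email "demo@example.com"

printf '* 2 avocados\n* 1 lime\n* 2 tsp salt\n' > ingredients.txt
git add ingredients.txt
git commit -q -m "adding ingredients"

git log --oneline           # one line per commit: abbreviated hash and summary
git log --stat              # also lists the files touched by each commit

full="$(git rev-parse HEAD)"            # the full hash of the latest commit
short="$(git rev-parse --short HEAD)"   # an abbreviation Git considers unambiguous
git show --stat "$short"                # the abbreviation identifies the same commit
```

The variables full and short are only there to show that an abbreviated hash and the full hash refer to the same commit.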

Challenge 1

Add 1/2 onion to ingredients.txt and also the instruction to “enjoy!” to instructions.txt. Do not stage the changes yet.

When you are done editing the files, run git diff:

BASH

$ git diff

What does the output tell you?

OUTPUT

diff --git a/ingredients.txt b/ingredients.txt
index 2607525..ec0abc6 100644
--- a/ingredients.txt
+++ b/ingredients.txt
@@ -1,3 +1,4 @@
 * 2 avocados
 * 1 lime
 * 2 tsp salt
+* 1/2 onion
diff --git a/instructions.txt b/instructions.txt
index 6a8b2af..f7dd63a 100644
--- a/instructions.txt
+++ b/instructions.txt
@@ -3,3 +3,4 @@
 * squeeze lime
 * add salt
 * and mix well
+* enjoy!

  • The output shows which files are being compared - the “before” and “after” versions of the same file.
  • The new lines added are prefixed with a + sign to show that they are new.

Challenge 2

Stage and commit each change separately. For the second commit, don’t use the -m flag.

What are the steps to run?

What happens if you don’t use -m?

A possible example:

BASH

$ git add ingredients.txt
$ git commit -m "add half an onion"
$ git add instructions.txt
$ git commit                   

When you leave out the -m flag, Git should open an editor where you can edit your commit message. This message will be associated and stored with the changes you made. This message is your chance to explain what you’ve done and convince others (and your future self) that the changes you made were justified.

Using a text editor (instead of -m) can be useful because you can include much longer commit messages.

Writing useful commit messages

Using git log --oneline we understand that the first line of the commit message is very important.

Good example:

increase threshold alpha to 2.0

the motivation for this change is
to enable ...
...

Convention: one line summarizing the commit, then one empty line, then paragraph(s) with more details in free form, if necessary.
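
This convention can also be followed without opening an editor, because each -m option adds one paragraph to the message. The sketch below is an illustration only: the file settings.txt, the body text, and the demo identity are hypothetical.

```bash
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # repository-local demo identity
git config user.email "demo@example.com"

echo "alpha = 2.0" > settings.txt         # hypothetical file for this sketch
git add settings.txt

# The first -m is the summary line; the second -m becomes a separate paragraph.
# Git puts an empty line between the two.
git commit -q -m "increase threshold alpha to 2.0" \
              -m "Hypothetical body text: explain the motivation for the change here."

git log --oneline              # shows only the summary line
git log -1 --format=%B         # shows the full message, summary and body
```

For longer explanations an editor is usually more comfortable, but the resulting message has the same shape either way.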

  • Bad commit messages: “fix”, “oops”, “save work”, “foobar”, “toto”, “qppjdfjd”, “”.
  • For your amusement: http://whatthecommit.com
  • Write commit messages in English that will be understood 15 years from now by someone other than you.

Ignoring files and paths with .gitignore


Some files should not be tracked in a Git repository. This includes files that:

  • are specific to a particular computer
  • contain sensitive information
  • are large binary files
  • are compiled files

Discussion

What could be the problems raised by committing the above files to a repo?

For this we use .gitignore files. Example:

# ignore R binary files
*.RData
# ignore .exe files
*.exe

Challenge 3

Make a new file called my-personal-notes.txt. Add some content to the file that describes your feelings about Git so far…

Since you might not want these comments seen by collaborators, make sure the file is ignored by Git.

By adding the path my-personal-notes.txt to the .gitignore file, your personal thoughts about Git won’t be added to any snapshots.

You can have .gitignore files in lower level directories and they affect the paths relatively.

.gitignore should be part of the repository (why?).
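
To see ignore rules in action, here is a small sketch in a scratch repository. The patterns are the ones from the example above plus the notes file from Challenge 3; the file contents are made up for illustration.

```bash
cd "$(mktemp -d)"
git init -q

# Write the ignore rules
cat > .gitignore << 'EOF'
# ignore R binary files
*.RData
# ignore .exe files
*.exe
# ignore personal notes
my-personal-notes.txt
EOF

echo "Git is growing on me" > my-personal-notes.txt   # should be ignored
echo "* 2 avocados" > ingredients.txt                 # should be visible to Git

git status --short                            # lists .gitignore and ingredients.txt only
git check-ignore -v my-personal-notes.txt     # reports which rule matched the file
```

git check-ignore is handy when a file is unexpectedly ignored (or not ignored): with -v it names the .gitignore file, line, and pattern responsible.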

Keep your repo clean

  • Use git status a lot.
  • Use .gitignore.
  • If you don’t want to track a file, it should be listed in .gitignore.
  • All files should be either tracked or ignored.

GUI tools


It is also possible to work from within a Git graphical user interface (GUI):


Summary


Now we know how to save snapshots (commits):

BASH

$ git add <file(s)>
$ git commit

And this is what we do as we program.

Every state is then saved and later we will learn how to go back to these “checkpoints” and how to undo things.

BASH

$ git init    # initialize new repository
$ git add     # add files or stage file(s)
$ git commit  # commit staged file(s)
$ git status  # see what is going on
$ git log     # see history
$ git diff    # show unstaged/uncommitted modifications
$ git show    # show the change for a specific commit
$ git mv      # move tracked files
$ git rm      # remove tracked files

Key Points

  • Initializing a Git repository is simple: git init
  • Commits should be used to tell a story.
  • Git uses the .git folder to store the snapshots.

Content from Undoing things


Last updated on 2024-07-29

Overview

Questions

  • How can I undo things?

Objectives

  • Learn to undo changes safely
  • See when changes are permanently deleted and when they can be retrieved

Undoing things


The whole point of version control is that you can see and retrieve previous snapshots of your work.

With Git, if you have committed (made a snapshot) of a file, you can get it back, even if it gets deleted or modified in the future.

Some commands will modify the history of a Git repository. This is generally a very bad idea, and you should only do it if you’re really confident you know what you’re doing.

If changes are uncommitted, they are not safe, and if they are deleted, they are gone.


Reverting commits

  • Imagine we made a few commits.
  • We realize that the latest commit was a mistake and we wish to undo it:

BASH

$ git log --oneline

OUTPUT

f960dd3 (HEAD -> main) not sure this is a good idea
40fbb90 draft a readme
dd4472c we should not forget to enjoy
2bb9bb4 add half an onion
2d79e7e adding ingredients and instructions

A safe way to undo the commit is to revert the commit with git revert:

BASH

$ git revert f960dd3

This creates a new commit that does the opposite of the reverted commit. The old commit remains in the history:

BASH

$ git log --oneline

OUTPUT

d62ad3e (HEAD -> main) Revert "not sure this is a good idea"
f960dd3 not sure this is a good idea
40fbb90 draft a readme
dd4472c we should not forget to enjoy
2bb9bb4 add half an onion
2d79e7e adding ingredients and instructions
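
The same sequence can be reproduced in a scratch repository. This is a sketch with made-up content and a demo identity; --no-edit is used so that Git accepts the default revert message instead of opening an editor.

```bash
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # repository-local demo identity
git config user.email "demo@example.com"

echo "* 2 avocados" > ingredients.txt
git add ingredients.txt
git commit -q -m "adding ingredients"

echo "* 1 cup ketchup" >> ingredients.txt     # a questionable change
git add ingredients.txt
git commit -q -m "not sure this is a good idea"

# Revert the latest commit; a new commit with the opposite change is created
git revert --no-edit HEAD

git log --oneline          # three commits: the original two plus the revert
cat ingredients.txt        # the ketchup line is gone again
```

Nothing is removed from the history here, which is what makes git revert safe to use on commits that other people may already have.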

Challenge 1

Make a new commit in your guacamole repo (it can be whatever you like)

Inspect the history, and revert the commit.


Undo unstaged/uncommitted changes

git restore is used to restore a file, or all files, back to a previously staged or committed state.

Note: Older versions of git used git checkout <file> instead of restore.

DANGER!!

This command permanently deletes any changes that haven’t been staged/committed!

Modify before staging

  • Make a silly change to a file in the repo, do not stage it or commit it.
  • Inspect the change with git status and git diff.
  • Now undo the change with git restore <file>.
  • Verify that the change is gone with git status and git diff.

Modify after staging

  • Make a reasonable change to a project, stage it.
  • Make a silly change after you have staged the reasonable change.
  • Inspect the situation with git status, git diff, git diff --staged, and git diff HEAD.
  • Now undo the silly change with git restore <file>.
  • Inspect the new situation with git status, git diff, git diff --staged, and git diff HEAD.
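
The “modify after staging” steps can be sketched as follows. The file contents and the demo identity are made up, and git restore needs Git 2.23 or newer.

```bash
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # repository-local demo identity
git config user.email "demo@example.com"

echo "* 2 avocados" > ingredients.txt
git add ingredients.txt
git commit -q -m "adding ingredients"

echo "* 1 lime" >> ingredients.txt          # reasonable change...
git add ingredients.txt                     # ...which we stage
echo "* 5 kg glitter" >> ingredients.txt    # silly change, not staged

git diff            # silly change only (working directory vs. staging area)
git diff --staged   # reasonable change only (staging area vs. last commit)
git diff HEAD       # both changes (working directory vs. last commit)

git restore ingredients.txt   # discard the unstaged silly change
cat ingredients.txt           # avocados and lime remain
git status --short            # the reasonable change is still staged
```

Note that git restore <file> brings the file back to what is in the staging area, so the staged lime line survives while the glitter line is permanently gone.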

Challenge 2

How much do you trust Git…?

Delete one (or more) of your committed files using any method you like.

Can you get it/them back?

As long as a file was committed, you can get it back with git restore <file>, but Git isn’t magic - you won’t have any changes that weren’t committed.

Recovering previous versions


Because git stores the complete history of whatever snapshots you have recorded, you can step back to any one of them at different levels of detail, from the complete working directory, to single files, even to single changes within files.

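To get back to a previous state for the whole working directory, you can use git restore -s <commit> . (the trailing dot means “everything under the current directory”). This is useful for having a look at older files; note that it changes the files in the working directory but leaves HEAD where it is. If you instead jump to an older commit with git switch --detach <commit> (or git checkout <commit> in older versions), Git will tell you that you are in a detached HEAD state (more discussion later).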

Recovering previous versions of single files

Often it’s useful to be able to access a previous version of a particular file. When the restore command is given a file path (along with a reference to a commit with “-s” or “--source=”) it will update that path to the previous state.

Let’s see that with our recipe. If we wanted to get ingredients.txt back to its state before the addition of the onion, we could run

BASH

$ git restore -s 2d79e7e ingredients.txt

OUTPUT

Updated 1 path from 2d79e7e

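If you run git status you’ll see ingredients.txt listed as modified: the file in the working directory is back in its previous state. With git restore this change is not staged automatically (the older git checkout <commit> <file> did stage it), so run git add ingredients.txt when you want to commit it.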
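
A complete round trip might look like the sketch below, run in a scratch repository with a made-up demo identity. The variable first stands in for the hash 2d79e7e used above, since hashes differ between repositories. The explicit git add before the final commit makes sure the restored file is staged.

```bash
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # repository-local demo identity
git config user.email "demo@example.com"

printf '* 2 avocados\n* 1 lime\n* 2 tsp salt\n' > ingredients.txt
git add ingredients.txt
git commit -q -m "adding ingredients and instructions"
first="$(git rev-parse --short HEAD)"     # plays the role of 2d79e7e in the lesson

echo "* 1/2 onion" >> ingredients.txt
git add ingredients.txt
git commit -q -m "add half an onion"

# Bring ingredients.txt back to its state in the first commit (Git 2.23 or newer)
git restore -s "$first" ingredients.txt
cat ingredients.txt                       # the onion line is gone
git status --short                        # the file shows up as changed

# Record the restored state as a new commit
git add ingredients.txt
git commit -q -m "remove the onion again"
git log --oneline
```

The history now has three commits; the onion commit is still there, so the onion can be brought back the same way.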

Key Points

  • Commits can be reverted without rewriting the existing history
  • Once changes are committed they are safe
  • Changes that are not committed can be deleted

Content from Branching and merging


Last updated on 2024-07-29

Overview

Questions

  • How can I or my team work on multiple features in parallel?
  • How to combine the changes of parallel tracks of work?
  • How can I permanently reference a point in history, like a software version?

Objectives

  • Be able to create and merge branches.
  • Know the difference between a branch and a tag.

Motivation for branches


In the previous section we tracked a guacamole recipe with Git.

Up until now our repository had only one branch with one commit coming after the other:

Linear git repository

  • Commits are depicted here as little boxes with abbreviated hashes.
  • Here the branch main points to a commit.
  • “HEAD” is the current position (remember the recording head of tape recorders?).
  • When we talk about branches, we often mean all parent commits, not only the commit pointed to.

Now we want to do this:

Merging branches
(Source: https://twitter.com/jay_gee/status/703360688618536960)


Software development is often not linear:

  • We typically need at least one version of the code to “work” (to compile, to give expected results, …).
  • At the same time we work on new features, often several features concurrently. Often they are unfinished.
  • We need to be able to separate different lines of work really well.

The strength of version control is that it permits the researcher to isolate different tracks of work, which can later be merged to create a composite version that contains all changes:

Git collaborative graph

  • We see branching points and merging points.
  • Main line development is often called main (or master in older conventions).
  • Other than this convention there is nothing special about main, it is just a branch.
  • Commits form a directed acyclic graph (we have left out the arrows to avoid confusion about the time arrow).

A group of commits that creates a single narrative is called a branch. There are different branching strategies, but it is useful to think that a branch tells the story of a feature, e.g. “fast sequence extraction” or “Python interface” or “fixing bug in matrix inversion algorithm”.


A useful alias

We will now define an alias in Git, to be able to nicely visualize branch structure in the terminal without having to remember a long Git command:

BASH

$ git config --global alias.graph "log --all --graph --decorate --oneline"

Let us inspect the project history using the git graph alias:

BASH

$ git graph

OUTPUT

* dd4472c (HEAD -> main) we should not forget to enjoy
* 2bb9bb4 add half an onion
* 2d79e7e adding ingredients and instructions

  • We have three commits and only one development line (branch) and this branch is called main.
  • Commits are states characterized by a 40-character hash (checksum).
  • git graph prints abbreviations of these checksums.
  • Branches are pointers that point to a commit.
  • Branch main points to commit dd4472c8093b7bbcdaa15e3066da6ca77fcabadd.
  • HEAD is another pointer, it points to where we are right now (currently main)

On which branch are we?

To see where we are (where HEAD points to) use git branch:

BASH

$ git branch

OUTPUT

* main

  • This command shows where we are, it does not create a branch.
  • There is only main and we are on main (star represents the HEAD).

In the following we will learn how to create branches, how to switch between them, how to merge branches, and how to remove them afterwards.


Creating and working with branches


Let’s create a branch called experiment where we add cilantro to ingredients.txt.

BASH

$ git branch experiment main   # create branch called "experiment" from main,
                               # pointing to the present commit
$ git switch experiment        # switch to branch "experiment"
$ git branch                   # list all local branches and show on which branch we are

  • Verify that you are on the experiment branch (note that git graph also makes it clear what branch you are on: HEAD -> branchname):

BASH

$ git branch

OUTPUT

* experiment
  main

  • Then add 2 tbsp cilantro at the top of ingredients.txt:

* 2 tbsp cilantro
* 2 avocados
* 1 lime
* 2 tsp salt
* 1/2 onion

  • Stage this and commit it with the message “let us try with some cilantro”.
  • Then reduce the amount of cilantro to 1 tbsp, stage and commit again with “maybe little bit less cilantro”.

We have created two new commits:

BASH

$ git graph

OUTPUT

* 6feb49d (HEAD -> experiment) maybe little bit less cilantro
* 7cf6d8c let us try with some cilantro
* dd4472c (main) we should not forget to enjoy
* 2bb9bb4 add half an onion
* 2d79e7e adding ingredients and instructions
  • The branch experiment is two commits ahead of main.
  • We commit our changes to this branch.

Interlude: The multipurpose “checkout” command

Older versions of git used git checkout for the actions now handled by both restore and switch. git checkout can still be found in a lot of documentation, Git tools, and scripts. Depending on the context git checkout can do very different actions:

  1. Switch to a branch:

BASH

$ git checkout <branchname>
  2. Bring the working tree to a specific state (commit):

BASH

$ git checkout <hash>
  3. Set a file/path to a specific state (throws away all unstaged/uncommitted changes):

BASH

$ git checkout <path/file>

This is unfortunate from the user’s point of view but the way Git is implemented it makes sense. Picture git checkout as an operation that brings the working tree to a specific state. The state can be a commit or a branch (pointing to a commit).

In Git 2.23 (2019-08-16) and later this is much nicer:

BASH

$ git switch <branchname>  # switch to a different branch
$ git restore <path/file>  # discard changes in working directory

Exercise: create and commit to branches

In this exercise, you will create another branch and make a new commit on it. We will use these branches in the next section to practice merging.

  • Change to the branch main.
  • Create another branch called less-salt
    • Note! Make sure you are on the main branch when you create the less-salt branch. A safer way would be to explicitly specify that you want to branch from the main branch, e.g.:
      git branch less-salt main
  • On this new branch reduce the amount of salt in your recipe.
  • Commit your changes to this less-salt branch.

Use the same commands as we used above.

We now have three branches (in this case HEAD points to less-salt):

BASH

$ git branch

OUTPUT

  experiment
* less-salt
  main

BASH

$ git graph

OUTPUT

* bf59be6 (HEAD -> less-salt) reduce amount of salt
| * 6feb49d (experiment) maybe little bit less cilantro
| * 7cf6d8c let us try with some cilantro
|/
* dd4472c (main) we should not forget to enjoy
* 2bb9bb4 add half an onion
* 2d79e7e adding ingredients and instructions

Here is a graphical representation of what we have created:

  • Now switch to main.
  • Add and commit the following README.md to main:

MARKDOWN

# Guacamole recipe

Used in teaching Git.

Now you should have this situation:

BASH

$ git graph

OUTPUT

* 40fbb90 (HEAD -> main) draft a readme
| * bf59be6 (less-salt) reduce amount of salt
|/
| * 6feb49d (experiment) maybe little bit less cilantro
| * 7cf6d8c let us try with some cilantro
|/
* dd4472c we should not forget to enjoy
* 2bb9bb4 add half an onion
* 2d79e7e adding ingredients and instructions

Merging branches


It turned out that our experiment with cilantro was a good idea. Our goal now is to merge experiment into main.

First we make sure we are on the branch we wish to merge into:

BASH

$ git branch

OUTPUT

  experiment
  less-salt
* main

Then we merge experiment into main:

BASH

$ git merge experiment

We can verify the result in the terminal:

BASH

$ git graph

OUTPUT

*   c43b24c (HEAD -> main) Merge branch 'experiment'
|\
| * 6feb49d (experiment) maybe little bit less cilantro
| * 7cf6d8c let us try with some cilantro
* | 40fbb90 draft a readme
|/
| * bf59be6 (less-salt) reduce amount of salt
|/
* dd4472c we should not forget to enjoy
* 2bb9bb4 add half an onion
* 2d79e7e adding ingredients and instructions

What happens internally when you merge two branches is that Git creates a new commit, attempts to incorporate changes from both branches and records the state of all files in the new commit. While a regular commit has one parent, a merge commit has two (or more) parents.
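The difference between a regular commit and a merge commit can be checked by counting parents. This is a minimal sketch in a throwaway repository; the identity and file contents are made up, and the %P format placeholder of git log prints the parent hashes of a commit.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # made-up identity for the demo commits
git config user.email "user@example.org"
echo "* 2 avocados" > ingredients.txt
git add ingredients.txt
git commit -q -m "adding ingredients"

git switch -q -c experiment                # new commit on the branch experiment
echo "* 1 tbsp cilantro" >> ingredients.txt
git commit -q -a -m "let us try with some cilantro"

git switch -q main                         # new commit on main, so the branches diverge
echo "# Guacamole recipe" > README.md
git add README.md
git commit -q -m "draft a readme"

git merge -q --no-edit experiment          # creates a merge commit on main

# Count the parent hashes of the merge commit and of its first parent
merge_parents=$(git log -1 --format=%P HEAD | wc -w | tr -d ' ')
regular_parents=$(git log -1 --format=%P HEAD^1 | wc -w | tr -d ' ')
echo "merge commit: $merge_parents parents, regular commit: $regular_parents parent"
```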

To view the branches that are merged into the current branch we can use the command:

BASH

$ git branch --merged

OUTPUT

  experiment
* main

We are also happy with the work on the less-salt branch. Let us merge that one, too, into main:

BASH

$ git branch  # make sure you are on main
$ git merge less-salt

We can verify the result in the terminal:

BASH

$ git graph

OUTPUT

*   4f00317 (HEAD -> main) Merge branch 'less-salt'
|\
| * bf59be6 (less-salt) reduce amount of salt
* |   c43b24c Merge branch 'experiment'
|\ \
| * | 6feb49d (experiment) maybe little bit less cilantro
| * | 7cf6d8c let us try with some cilantro
| |/
* | 40fbb90 draft a readme
|/
* dd4472c we should not forget to enjoy
* 2bb9bb4 add half an onion
* 2d79e7e adding ingredients and instructions

Observe how Git nicely merged the changed amount of salt and the new ingredient in the same file without us merging it manually:

BASH

$ cat ingredients.txt

OUTPUT

* 1 tbsp cilantro
* 2 avocados
* 1 lime
* 1 tsp salt
* 1/2 onion

If the same file is changed in both branches, Git attempts to incorporate both changes into the merged file. If the changes overlap then the user has to manually settle merge conflicts (we will do that later).


Deleting branches safely


Both feature branches are merged:

BASH

$ git branch --merged

OUTPUT

  experiment
  less-salt
* main

This means we can delete the branches:

BASH

$ git branch -d experiment less-salt

OUTPUT

Deleted branch experiment (was 6feb49d).
Deleted branch less-salt (was bf59be6).

This is the result:

Compare in the terminal:

BASH

$ git graph

OUTPUT

*   4f00317 (HEAD -> main) Merge branch 'less-salt'
|\
| * bf59be6 reduce amount of salt
* |   c43b24c Merge branch 'experiment'
|\ \
| * | 6feb49d maybe little bit less cilantro
| * | 7cf6d8c let us try with some cilantro
| |/
* | 40fbb90 draft a readme
|/
* dd4472c we should not forget to enjoy
* 2bb9bb4 add half an onion
* 2d79e7e adding ingredients and instructions

As you see only the pointers disappeared, not the commits.

Git will not let you delete a branch which has not been reintegrated unless you insist using git branch -D. Even then your commits will not be lost but you may have a hard time finding them as there is no branch pointing to them.
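The behaviour of -d and -D can be tried out safely in a scratch repository. The sketch below assumes a made-up branch name (wild-idea) and made-up file contents; it shows that -d refuses, -D deletes, and the commit itself still exists afterwards.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # made-up identity for the demo commits
git config user.email "user@example.org"
echo "* 2 avocados" > ingredients.txt
git add ingredients.txt
git commit -q -m "adding ingredients"

git switch -q -c wild-idea                 # a branch that we never merge
echo "* 1 pineapple" >> ingredients.txt
git commit -q -a -m "a wild idea"
lost=$(git rev-parse HEAD)                 # remember the hash of the unmerged commit
git switch -q main

# -d refuses because wild-idea contains a commit that main does not
if git branch -d wild-idea 2>/dev/null; then
    safe_delete="allowed"
else
    safe_delete="refused"
fi

git branch -D wild-idea > /dev/null        # insist: only the pointer is removed

still_there=$(git cat-file -t "$lost")     # the commit object still exists
reflog_hits=$(git reflog | grep -c "a wild idea")
echo "safe delete: $safe_delete, object type: $still_there, reflog entries: $reflog_hits"
```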


Exercise: encounter a fast-forward merge

  1. Create a new branch from main and switch to it.
  2. Create a couple of commits on the new branch (for instance edit README.md).
  3. Now switch to main.
  4. Merge the new branch to main.
  5. Examine the result with git graph.
  6. Did you expect the result? Discuss what you see.

The following exercises are advanced; it is absolutely no problem to postpone them for a few months. If you give them a go, keep in mind that you might run into conflicts, which we will learn to resolve in the next section.

(Optional) Exercise: Moving commits to another branch

Sometimes it happens that we commit to the wrong branch, e.g. to main instead of a feature branch. This can easily be fixed:
1. Make a couple of commits to main, then realize these should have been on a new feature branch.
2. Create a new branch from main, and rewind main back using git reset --hard <hash>.
3. Inspect the situation with git graph. Problem solved!

(Optional) Exercise: Rebasing

As an alternative to merging branches, one can also rebase branches. Rebasing means that the new commits are replayed on top of another branch (instead of creating an explicit merge commit).
Note that rebasing changes history and should not be done on public commits!
1. Create a new branch, and make a couple of commits on it.
2. Switch back to main, and make a couple of commits on it.
3. Inspect the situation with git graph.
4. Now rebase the new branch on top of main by first switching to the new branch, and then git rebase main.
5. Inspect again the situation with git graph. Notice that the commit hashes have changed - think about why!

(Optional) Exercise: Squashing commits

Sometimes you may want to squash incomplete commits, particularly before merging or rebasing with another branch (typically main) to get a cleaner history.
Note that squashing changes history and should not be done on public commits!
1. Create two small but related commits on a new feature branch, and inspect with git graph.
2. Do a soft reset with git reset --soft HEAD~2. This rewinds the current branch by two commits, but keeps all changes and stages them.
3. Inspect the situation with git graph, git status and git diff --staged.
4. Commit again with a commit message describing the changes.
5. What do you think happens if you instead do git reset --soft <hash>?


Summary


Let us pause for a moment and recapitulate what we have just learned:

BASH

$ git branch               # see where we are
$ git branch <name>        # create branch <name>
$ git switch <name>        # switch to branch <name>
$ git merge <name>         # merge branch <name> (to current branch)
$ git branch -d <name>     # delete merged branch <name>
$ git branch -D <name>     # delete unmerged branch <name>

Since the following command combo is so frequent:

BASH

$ git branch <name>        # create branch <name>
$ git switch <name>      # switch to branch <name>

There is a shortcut for it:

BASH

$ git switch -c <name>   # Create branch <name> and switch to it

Typical workflows

With this there are two typical workflows:

BASH

$ git switch -c new-feature  # create branch, switch to it
$ git commit                   # work, work, work, ...
                               # test
                               # feature is ready
$ git switch main          # switch to main
$ git merge new-feature        # merge work to main
$ git branch -d new-feature    # remove branch

Sometimes you have a wild idea which does not work. Or you want some throw-away branch for debugging:

BASH

$ git switch -c wild-idea
                               # work, work, work, ...
                               # realize it was a bad idea
$ git switch main
$ git branch -D wild-idea      # it is gone, off to a new idea
                               # -D because we never merged back

No problem: we worked on a branch, branch is deleted, main is clean.


(Optional) Tags

  • A tag is a pointer to a commit but in contrast to a branch it does not move.
  • We use tags to record particular states or milestones of a project at a given point in time, like for instance versions (have a look at semantic versioning, v1.0.3 is easier to understand and remember than 64441c1934def7d91ff0b66af0795749d5f1954a).
  • There are two basic types of tags: annotated and lightweight.
  • Use annotated tags, since they record the author, are timestamped, carry a message, and can be cryptographically signed using GPG.

Let’s add an annotated tag to our current state of the guacamole recipe:

BASH

$ git tag -a nobel-2020 -m "recipe I made for the 2020 Nobel banquet"

As you may have found out already, git show is a very versatile command. Try this:

BASH

$ git show nobel-2020

For more information about tags see for example the Pro Git book chapter on the subject.
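To see the difference between the two tag types, one can ask Git what kind of object each tag name refers to. This is a minimal sketch in a throwaway repository; the tag names v1.0.0 and draft-1 and the identity are made up for the illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # made-up identity for the demo commit
git config user.email "user@example.org"
echo "* 2 avocados" > ingredients.txt
git add ingredients.txt
git commit -q -m "adding ingredients"

git tag -a v1.0.0 -m "first version of the recipe"   # annotated tag: its own object with a message
git tag draft-1                                      # lightweight tag: just a name for the commit

annotated_type=$(git cat-file -t v1.0.0)     # type of the object the annotated tag names
lightweight_type=$(git cat-file -t draft-1)  # type of the object the lightweight tag names

# Both names still lead to the same commit
annotated_commit=$(git rev-parse "v1.0.0^{commit}")
lightweight_commit=$(git rev-parse "draft-1^{commit}")
echo "annotated: $annotated_type, lightweight: $lightweight_type"
```

The annotated tag is stored as a separate tag object, which is where the author, date, and message live; the lightweight tag points straight at the commit.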


Test your understanding

  1. Which of the following combos (one or more) creates a new branch and makes a commit to it?
    1. $ git branch new-branch
       $ git add file.txt
       $ git commit
    2. $ git add file.txt
       $ git branch new-branch
       $ git switch new-branch
       $ git commit
    3. $ git switch -c new-branch
       $ git add file.txt
       $ git commit
    4. $ git switch new-branch
       $ git add file.txt
       $ git commit
  2. What is a detached HEAD?
  3. What are orphaned commits?
  1. Both 2 and 3 would do the job. Note that in 2 we first stage the file, and then create the branch and commit to it. In 1 we create the branch but do not switch to it, while in 4 we don’t give the -c flag to git switch to create the new branch.
  2. When you check out a branch name, HEAD will point to the most recent commit of that branch. You can however check out a particular hash. This will bring your working directory back in time to that commit, and your HEAD will be pointing to that commit but it will not be attached to any branch. If you want to make commits in that state, you should instead create a new branch: git switch -c test-branch <hash>.
  3. An orphaned commit is a commit that is not reachable from any branch or tag. It still has its parent commits, but nothing points to it or to any of its descendants. This could happen if you make a commit in a detached HEAD state. Commits rarely vanish in Git, and you could still find the orphaned commit using git reflog.
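A detached HEAD can be produced and repaired in a scratch repository. The sketch below is only an illustration: the identity and file contents are made up, and the branch name test-branch is taken from the answer above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # made-up identity for the demo commits
git config user.email "user@example.org"
echo "* 2 avocados" > ingredients.txt
git add ingredients.txt
git commit -q -m "adding ingredients"
echo "* 1 lime" >> ingredients.txt
git commit -q -a -m "add a lime"

first=$(git rev-parse HEAD~1)              # hash of the first commit
git switch -q --detach "$first"            # go back in time: HEAD now points to a commit, not a branch

# git symbolic-ref fails when HEAD is not attached to a branch
if git symbolic-ref -q HEAD > /dev/null; then
    state="attached"
else
    state="detached"
fi

git switch -q -c test-branch               # attach HEAD again by creating a branch here
state_after=$(git symbolic-ref --short HEAD)
here=$(git rev-parse HEAD)
echo "before: $state, after: on $state_after"
```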

Key Points

  • A branch is a separate line of work, to be merged with other lines of work.
  • A tag is a pointer to a moment in the history of a project.

Content from Conflict resolution


Last updated on 2024-07-31

Overview

Questions

  • How can we resolve conflicts?
  • How can we avoid conflicts?

Objectives

  • Understand merge conflicts sufficiently well to be able to fix them.

Conflict resolution


In most cases a git merge runs smoothly and automatically. A merge commit then appears (unless it is a fast-forward) without you even noticing.

Git is very good at resolving modifications when merging branches.

But sometimes the same line or portion of the code/text is modified on two branches and Git issues a conflict. Then you need to tell Git which version to keep (resolve it).

There are several ways to do that as we will see.

Please remember:

  • Conflicts look scary, but are not that bad after a little bit of practice. Also they are luckily rare.
  • Don’t be afraid of Git because of conflicts. You may not meet some conflicts using other systems because you simply can’t do the kinds of things you do in Git.
  • You can take human measures to reduce them.

Type-along: create a conflict


We will make two branches, make two conflicting changes (one increases and the other decreases the amount of cilantro), and then try to merge them together. Git won’t decide for you which change to take, so it presents both to you to decide. We then pick a version and commit again to resolve the conflict.

  • Create two branches from main: one called like-cilantro, one called dislike-cilantro:

BASH

$ git graph

OUTPUT

*   4b3e3cc (HEAD -> main, like-cilantro, dislike-cilantro) Merge branch 'less-salt'
|\
| * bf59be6 reduce amount of salt
* |   80351a9 Merge branch 'experiment'
|\ \
| * | 6feb49d maybe little bit less cilantro
| * | 7cf6d8c let us try with some cilantro
| |/
* | 40fbb90 draft a readme
|/
* dd4472c we should not forget to enjoy
* 2bb9bb4 add half an onion
* 2d79e7e adding ingredients and instructions
  • On the two branches make different modifications to the amount of the same ingredient:

BASH

$ git graph

OUTPUT

* eee4b85 (dislike-cilantro) reduce cilantro to 0.5
| * 55d1ce2 (like-cilantro) please more cilantro
|/
*   4b3e3cc (HEAD -> main) Merge branch 'less-salt'
|\
| * bf59be6 reduce amount of salt
* |   80351a9 Merge branch 'experiment'
|\ \
| * | 6feb49d maybe little bit less cilantro
| * | 7cf6d8c let us try with some cilantro
| |/
* | 40fbb90 draft a readme
|/
* dd4472c we should not forget to enjoy
* 2bb9bb4 add half an onion
* 2d79e7e adding ingredients and instructions

On the branch like-cilantro we have the following change:

BASH

$ git diff main like-cilantro

OUTPUT

diff --git a/ingredients.txt b/ingredients.txt
index a83af39..83f2f94 100644
--- a/ingredients.txt
+++ b/ingredients.txt
@@ -1,4 +1,4 @@
-* 1 tbsp cilantro
+* 2 tbsp cilantro
 * 2 avocados
 * 1 lime
 * 1 tsp salt

And on the branch dislike-cilantro we have the following change:

BASH

$ git diff main dislike-cilantro

OUTPUT

diff --git a/ingredients.txt b/ingredients.txt
index a83af39..2f60e23 100644
--- a/ingredients.txt
+++ b/ingredients.txt
@@ -1,4 +1,4 @@
-* 1 tbsp cilantro
+* 0.5 tbsp cilantro
 * 2 avocados
 * 1 lime
 * 1 tsp salt

What do you expect will happen when we try to merge these two branches into main?

The first merge will work:

BASH

$ git switch main
$ git status
$ git merge like-cilantro

OUTPUT

Updating 4b3e3cc..55d1ce2
Fast-forward
 ingredients.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

But the second will fail:

BASH

$ git merge dislike-cilantro

OUTPUT

Auto-merging ingredients.txt
CONFLICT (content): Merge conflict in ingredients.txt
Automatic merge failed; fix conflicts and then commit the result.

Without conflict Git would have automatically created a merge commit, but since there is a conflict, Git did not commit:

BASH

$ git status

OUTPUT

On branch main
You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)

Unmerged paths:
  (use "git add <file>..." to mark resolution)

	both modified:   ingredients.txt

no changes added to commit (use "git add" and/or "git commit -a")

Observe how Git gives us clear instructions on how to move forward.

Let us inspect the conflicting file:

BASH

$ cat ingredients.txt

OUTPUT

<<<<<<< HEAD
* 2 tbsp cilantro
=======
* 0.5 tbsp cilantro
>>>>>>> dislike-cilantro
* 2 avocados
* 1 lime
* 1 tsp salt
* 1/2 onion

Git inserted resolution markers (the <<<<<<<, >>>>>>>, and =======).

Try also git diff:

BASH

$ git diff

OUTPUT

diff --cc ingredients.txt
index 83f2f94,2f60e23..0000000
--- a/ingredients.txt
+++ b/ingredients.txt
@@@ -1,4 -1,4 +1,8 @@@
++<<<<<<< HEAD
 +* 2 tbsp cilantro
++=======
+ * 0.5 tbsp cilantro
++>>>>>>> dislike-cilantro
  * 2 avocados
  * 1 lime
  * 1 tsp salt

git diff now only shows the conflicting part, nothing else.

We have to resolve the conflict. We will discuss 3 different ways to do this.


Manual resolution


<<<<<<< HEAD
* 2 tbsp cilantro
=======
* 0.5 tbsp cilantro
>>>>>>> dislike-cilantro

We have to edit the code/text between the resolution markers. You only have to care about what Git shows you: Git stages all files without conflicts and leaves the files with conflicts unstaged.

Simple steps:

  • Check status with git status and git diff.
  • Decide what you keep (the one, the other, or both or something else). Edit the file to do this.
    • Remove the resolution markers, if not already done.
    • The file(s) should now look exactly how you want them.
  • Check status with git status and git diff.
  • Tell Git that you have resolved the conflict with git add ingredients.txt (if you use the Emacs editor with a certain plugin the editor may stage the change for you after you have removed the conflict markers).
  • Verify the result with git status.
  • Finally commit the merge with just git commit - everything is pre-filled.
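The steps above can be sketched end to end in a throwaway repository. This is an illustration under stated assumptions: the identity is made up, the "edit" is simulated by overwriting the file with the version we decide to keep (1.5 tbsp is an arbitrary compromise), and --no-edit accepts the pre-filled merge message without opening an editor.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # made-up identity for the demo commits
git config user.email "user@example.org"
printf '%s\n' "* 1 tbsp cilantro" "* 2 avocados" > ingredients.txt
git add ingredients.txt
git commit -q -m "adding ingredients"
git branch like-cilantro
git branch dislike-cilantro

git switch -q like-cilantro
printf '%s\n' "* 2 tbsp cilantro" "* 2 avocados" > ingredients.txt
git commit -q -a -m "please more cilantro"

git switch -q dislike-cilantro
printf '%s\n' "* 0.5 tbsp cilantro" "* 2 avocados" > ingredients.txt
git commit -q -a -m "reduce cilantro to 0.5"

git switch -q main
git merge -q like-cilantro                 # first merge: fast-forward, no conflict
if git merge dislike-cilantro > /dev/null 2>&1; then
    conflict="no"
else
    conflict="yes"                         # second merge stops with a conflict
fi

# "Edit" the file: write the version we want to keep, without any markers
printf '%s\n' "* 1.5 tbsp cilantro" "* 2 avocados" > ingredients.txt
git add ingredients.txt                    # tell Git that the conflict is resolved
git commit -q --no-edit                    # commit the merge with the pre-filled message

first_line=$(head -n 1 ingredients.txt)
parents=$(git log -1 --format=%P HEAD | wc -w | tr -d ' ')
echo "conflict: $conflict, resolved as: $first_line, parents: $parents"
```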

Exercise: Create another conflict and resolve

In this exercise, we repeat almost exactly what we did above with a different ingredient.

  1. After you have merged like-cilantro and dislike-cilantro create again two branches.
  2. Again modify some ingredient on both branches.
  3. Merge one, merge the other and observe a conflict, resolve the conflict and commit the merge.
  4. What happens if you apply the same modification on both branches?

(Optional) Exercise: Conflicts and rebase

  1. Create two branches where you anticipate a conflict.
  2. Try to merge them and observe that indeed they conflict.
  3. Abort the merge.
  4. What do you expect will happen if you rebase one branch on top of the other? Do you anticipate a conflict? Try it out.

(Optional) Resolution using mergetool

  • Again create a conflict (for instance disagree on the number of avocados).
  • Stop at this stage:
Auto-merging ingredients.txt
CONFLICT (content): Merge conflict in ingredients.txt
Automatic merge failed; fix conflicts and then commit the result.
  • Instead of resolving the conflict manually, use a visual tool (requires installing one of the visual diff tools):

BASH

$ git mergetool
  • Your current branch is on the left, the branch you merge is on the right, and the result is in the middle.
  • After you are done, close the tool and commit. git add is not needed when using git mergetool.

Unless you have configured Git not to create backups, git mergetool leaves additional backup files behind to be on the safe side. To remove those you can do a git clean after the merging.

To view what will be removed:

BASH

$ git clean -n

To remove:

BASH

$ git clean -f

To configure Git to avoid creating backups at all:

BASH

$ git config --global mergetool.keepBackup false

Using “ours” or “theirs” strategy


  • Sometimes you know that you want to keep the “ours” version (the version on this branch) or the “theirs” version (the version on the merged branch).
  • Then you do not have to resolve conflicts manually.
  • See merge strategies.

Example:

BASH

$ git merge -s recursive -Xours less-avocados  
# merge and in doubt take the changes from current branch

Or:

BASH

$ git merge -s recursive -Xtheirs less-avocados  
# merge and in doubt take the changes from less-avocados branch
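The effect of -Xours can be checked in a scratch repository. This is a minimal sketch with made-up file contents and identity; it leaves out -s recursive and relies on the -X option also being accepted by Git's default merge strategy. The option only decides conflicting hunks; changes from the other branch that do not conflict are still merged.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # made-up identity for the demo commits
git config user.email "user@example.org"
printf '%s\n' "* 2 avocados" "* 1 lime" > ingredients.txt
git add ingredients.txt
git commit -q -m "adding ingredients"

git switch -q -c less-avocados             # one branch wants fewer avocados
printf '%s\n' "* 1 avocado" "* 1 lime" > ingredients.txt
git commit -q -a -m "fewer avocados"

git switch -q main                         # main wants more, on the same line
printf '%s\n' "* 3 avocados" "* 1 lime" > ingredients.txt
git commit -q -a -m "more avocados"

# In doubt, take the version from the current branch (main)
git merge -q --no-edit -Xours less-avocados

kept=$(head -n 1 ingredients.txt)
parents=$(git log -1 --format=%P HEAD | wc -w | tr -d ' ')
echo "kept: $kept"
```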

Aborting a conflicting merge


  • Imagine it is Friday evening, you try to merge but have conflicts all over the place.
  • You do not feel like resolving it now and want to undo the half-finished merge.
  • Or it is a conflict that you cannot resolve and only your colleague knows which version is the one to keep.

What to do?

  • There is no reason to delete the whole repository.
  • You can undo the broken merge by resetting the repository to HEAD (last committed state).

BASH

$ git merge --abort

The repository then looks exactly as it did before the merge.
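That claim can be verified by comparing the state before and after an aborted merge. The following is a small sketch in a throwaway repository; the branch name dislike-cilantro follows the example above, while the identity and file contents are made up.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # made-up identity for the demo commits
git config user.email "user@example.org"
echo "* 1 tbsp cilantro" > ingredients.txt
git add ingredients.txt
git commit -q -m "adding ingredients"

git switch -q -c dislike-cilantro
echo "* 0.5 tbsp cilantro" > ingredients.txt
git commit -q -a -m "reduce cilantro to 0.5"

git switch -q main
echo "* 2 tbsp cilantro" > ingredients.txt
git commit -q -a -m "please more cilantro"

before=$(git rev-parse HEAD)               # where main was before the merge attempt
if git merge dislike-cilantro > /dev/null 2>&1; then
    conflict="no"
else
    conflict="yes"
fi

git merge --abort                          # give up on the half-finished merge

after=$(git rev-parse HEAD)
pending=$(git status --porcelain)          # empty when nothing is modified or unmerged
content=$(cat ingredients.txt)
echo "conflict: $conflict, HEAD unchanged: $before = $after"
```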


Avoiding conflicts


  • Human measures
    • Think and plan which branch you will commit to.
    • A few branches that contain many unrelated changes maximize the risk of conflicts.
    • Use one branch for one task only.
  • Collaboration measures
    • Conflicts can be avoided if you think and talk with your colleagues before committing.
    • Semantic conflicts (changes that merge cleanly but don’t work together) show the importance of talking!
  • Project layout measures
    • Modifying global data often causes conflicts.
    • Monolithic entangled code maximizes risk of conflicts.
    • Modular programming minimizes risk of conflicts.
  • Technical measures
    • Push early and often - this is one of the happy, rare circumstances when everyone doing the selfish thing (pushing as early as practical) results in best case for everyone!
    • Pull/rebase often to keep up to date with upstream.
    • Resolve conflicts early.

Discuss how Git handles conflicts compared to Google Drive.

Key Points

  • Conflicts usually appear because of insufficient communication or a suboptimal branching strategy.

Content from Sharing repositories online


Last updated on 2024-07-31

Overview

Questions

  • How can I set up a public repository online?
  • How can I clone a public repository to my computer?

Objectives

  • To get a feeling for remote repositories.
  • Learn how to publish a repository on the web.
  • Learn how to fetch and track a repository from the web.

In this episode, we will publish our guacamole recipe on the web. Don’t worry, you will be able to remove it afterwards.

From our laptops to the web


We have seen that creating Git repositories and moving them around is simple and that is great.

So far everything was local and all snapshots are saved under .git.

If we remove .git, we remove all Git history of a project.

But…

  • What if the hard disk fails?
  • What if somebody steals my laptop?
  • How can we collaborate with others across the web?


Remotes


To store your git data on another computer, you use remotes. A remote is like making another copy of your repository, but you can choose to only push the changes you want to the remote and pull the changes you need from the remote.

You might use remotes to:

  • Back up your own work.
  • Collaborate with other people.

There are different types of remotes:

  • GitHub is a popular, closed-source commercial site.
  • GitLab is a popular, open-core commercial site. Many universities have their own private GitLab servers set up.
  • Bitbucket is yet another popular commercial site.


Bitbucket


CSIRO has an enterprise Bitbucket server, bitbucket.csiro.au, available for staff use. It offers a nice HTML user interface to browse the repositories and handles many things very nicely. Accounts use your CSIRO credentials.


Create a new repository on Bitbucket


  1. Login at bitbucket.csiro.au
  2. Click on your user profile icon, in the top-right corner, then “View Profile”
  3. Click “Create Repository”

On this page choose a repository name and description.

Create new repo on bitbucket

After you then click “Create repository”, you will see a page similar to:

Bitbucket repo setup page

Note that this screen is telling us exactly what to do to get started depending on different scenarios:

  1. If creating the Bitbucket repository was the very first thing we’d done, before starting work (great forward planning!), then we could clone the empty repository and start working in it.
  2. If we’d started working on files, but had never run git init and started performing local Git operations, then it tells us how to start tracking those files now. It also tells us how to link to this online repository, adding its URL as the “remote origin”, through git remote add. This is where we are.
  3. The final scenario is less common. It’s only for when a repository has already been linked to a remote but you’d now like to change it to point to this new one instead.

Creating the repository on Bitbucket effectively did the equivalent to this on the Bitbucket servers:

BASH

$ mkdir recipe 
$ cd recipe
$ git init

Linking our local repository to Bitbucket


To be able to send our local changes to Bitbucket, we need to tell the local repository that the one we just created on Bitbucket’s servers exists. To do this, we add a ‘remote’. Git repositories can have any number of remotes, although it is by far the most common to only use one. Each git remote is given a name so that it can be referred to easily. The default remote name is origin.

As noted above, Bitbucket has provided us the instructions for how to do this under the scenario:

My code is ready to be pushed

  1. Go back to your guacamole repository on your computer.
  2. Check that you are in the right place with git status.
  3. We’ll copy and paste the instructed commands from Bitbucket, however, we’ve already run git init and already committed files, so we can ignore the first several steps, so…
  4. Copy and paste just the last two lines to the terminal and execute those, in my case (you need to replace the “user” part and possibly also the repository name if you gave it a different one):

BASH

$ git remote add origin https://bitbucket.csiro.au/scm/<user>/recipe.git
$ git push -u origin HEAD:main

You should now see something similar to:

OUTPUT

Counting objects: 4, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 259.80 KiB | 0 bytes/s, done.
Total 4 (delta 0), reused 0 (delta 0)
To https://bitbucket.csiro.au/scm/user/recipe.git
 * [new branch]      main -> main
branch 'main' set up to track 'origin/main'.

Refresh your Bitbucket project website and - taa-daa - your commits should now be online!

What just happened? Think of publishing a repository as uploading the .git part online.

When those two lines of code were run, two commands were given. The first was to add a reference to the Bitbucket repository, and call it origin.

The second was to push our local changes to that remote. Since we were on main, the HEAD:main form we ran is equivalent to:

git push -u origin main

This is in the format:

git push -u <remote-name> <branch-name>

Once the upstream has been set with -u, and you’ve got a simple repository with only one remote and one branch, you can simply run git push.
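The same two commands can be practised without any server. The sketch below is an assumption-laden stand-in: a bare repository in a temporary directory plays the role of the empty Bitbucket repository, and the paths, identity, and file contents are made up.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/recipe.git"      # stands in for the empty repository on the server

git init -q "$work/recipe"                 # our local repository
cd "$work/recipe"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # made-up identity for the demo commits
git config user.email "user@example.org"
echo "* 2 avocados" > ingredients.txt
git add ingredients.txt
git commit -q -m "adding ingredients"

git remote add origin "$work/recipe.git"   # add a reference to the remote and call it origin
git push -q -u origin main                 # push main and remember origin/main as its upstream

upstream=$(git rev-parse --abbrev-ref --symbolic-full-name "@{u}")

echo "* 1 lime" >> ingredients.txt         # a later change
git commit -q -a -m "add a lime"
git push -q                                # the upstream is known, so plain git push is enough

local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$work/recipe.git" rev-parse refs/heads/main)
echo "upstream: $upstream"
```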

Challenge 1

Make a change to your project and commit the changes locally.

Push the changes to your Bitbucket remote.

What information can you access about the commit you just made?

Getting changes from the remote


Of course we don’t want information to only go one way - if the remote has changes to the project from a collaborator we need to get those onto our local machine. To do this, we’re doing the opposite of a push, so it’s helpfully called a pull.

Challenge 2

Make a change to your repository using the Bitbucket web interface:

  1. Click on a file and then click ‘Edit’ (top right).
  2. Write something new and then click ‘Commit’ (bottom left).
  3. Fill in a commit message (as if you were doing a commit -m "something") and click ‘Commit’ again.

Once you’ve made a change, use git pull in your terminal to get the changes onto your local machine.

Inspect the history with git log.

Similar to git push, if you have multiple remotes and branches, you need to specify which you are referring to by using the format git pull <remote-name> <branch-name>, but for our purposes git pull is sufficient.
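Pulling can be rehearsed locally as well. In this sketch a bare repository in a temporary directory stands in for Bitbucket, and a second clone (called elsewhere, a made-up name) stands in for the edit made through the web interface; identities and file contents are also made up.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/recipe.git"                       # stand-in for the remote
git --git-dir="$work/recipe.git" symbolic-ref HEAD refs/heads/main

git init -q "$work/laptop"                                  # our local repository
cd "$work/laptop"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.org"
echo "# Guacamole recipe" > README.md
git add README.md
git commit -q -m "draft a readme"
git remote add origin "$work/recipe.git"
git push -q -u origin main

git clone -q "$work/recipe.git" "$work/elsewhere"           # the change made somewhere else
cd "$work/elsewhere"
git config user.name "Another User"
git config user.email "another@example.org"
echo "Used in teaching Git." >> README.md
git commit -q -a -m "extend the readme"
git push -q

cd "$work/laptop"
git pull -q                                                 # fetch the new commit and update main

local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$work/recipe.git" rev-parse refs/heads/main)
last_line=$(tail -n 1 README.md)
git log --oneline
```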

Key Points

  • A repository can have one or multiple remotes.
  • A remote serves as a full backup of your work.
  • git push sends local changes to the remote.
  • git pull gets remote changes onto your local machine.
  • A remote allows other people to collaborate with you.

Content from Collaborating with git repositories


Last updated on 2024-07-31

Overview

Questions

  • How can I contribute to a shared repository?
  • How can I contribute to a public repository without write access?

Objectives

  • Understand how to share a repository collaboratively.
  • Learn how to contribute a pull request.

Collaborating with git remotes


One of the major advantages of version control systems is the ability to collaborate, without having to email each other files or bother about shared drives. We’ll consider two scenarios for collaboration:

  • A remote repository where collaborators each have write access
  • A remote repository where you do not have write access

A remote with write access


Remember that a git remote is simply a copy of the .git directory. It contains the instructions for how to recreate any state in the history that has been captured. If more than one person is contributing to a collaborative remote, there will be a shared history. That is, the copy on the remote will be a combination of the history between the collaborators.

Discussion

What do you think will happen if two collaborators make their own sequence of commits on main and try to push them to the same remote?

To make sure that you don’t end up with a mess of conflicting commits, it’s essential to have an agreed strategy for how to manage your contributions.

Different models can work, and which one is appropriate depends on the complexity of the situation.

There’s a good discussion of different models, including git-flow here.

To be absolutely sure your local work won’t conflict with someone else’s, always work on your own branch. Don’t commit directly to main, but only merge your branch into main after discussion with your collaborators, or through a pull request (discussed below).

A remote without write access


Lots of open source projects welcome contributions from the community, but clearly don’t want to give write access to just anyone. Instead, a very commonly used approach is to accept pull requests from forked versions of the repository.

Forking a repo

Forking a repository is making your own copy of a remote. For example, a Data School example repo is hosted at bitbucket.csiro.au/scm/dat/programmatic-data-example. By forking that repository, you can have your own copy, retaining the complete history of the project, at bitbucket.csiro.au/scm/<your_username>/programmatic-data-example.

To fork a repository on Bitbucket, click on the Create Fork button in the lefthand menu of a repository’s page.

Bitbucket repo fork

Submitting a pull request

To contribute a change to a repository that you don’t have write access to, you first of all need to make your own copy (fork the repo) which you do have write access to. You can then make your changes to the repo, and push them to your own fork.

To get them into the original repo (if that’s what you want), you need to ask the maintainers of that repository to accept them, through a pull request. You are requesting that the repository “pull” in your changes.

Pull requests as a collaborative framework

Pull requests can also be useful on a repository that you do have write permissions on, as a collaborative, organisational, and record-keeping tool. A common working pattern, when using Git in a team, is to complete a body of work on a separate branch and then, rather than doing a git merge, instead create a pull request. In doing so, you can:

  • Invite collaborators to review your changes
  • Create discussion around the changes (with discussions saved for posterity)
  • Continue to make further edits to your changes before finally merging
  • Save a formalised record of these steps having taken place

An open pull request may continue to receive further commits, by pushing changes to the same branch. This allows a pull request to act as a draft step, under review, until finally approved to ‘merge’.


Pull requests on Bitbucket


The option to create a Pull request on Bitbucket may be found in the lefthand menu of a repository’s page.

bitbucket pull request menu

You’ll then be asked to select a source branch (the branch with new work) and a target or destination branch (the branch to merge into).

bitbucket pull request menu

Next you’ll be able to write a description of what the pull request is about, and request specific teammates as “reviewers” of the request, before confirming the pull request.

With the pull request open, options include looking at the commits and file changes involved, writing discussion comments, starting an official review, making edits, etc. The final goal would usually be the ‘Merge’ button at the top right, although other outcomes may be to decline or delete the pull request.


Challenge

Form teams of 2-3 people. One person will start.

Person 1:

1. One person from each team should create a new Bitbucket repository named ‘favourite-things’.
2. Copy the supplied git clone command to create a local copy.
3. Locally, create a file named README.md and list a few of your favourite things within it.
4. Use git add, git commit and git push to move your new file back to the remote.
5. In the Bitbucket repository, click ‘Repository Settings’ in the lefthand menu, followed by ‘Repository permissions’. Use the form to give “User access” with “Write” permissions to your team member(s).
   bitbucket pull request menu
6. Share the repository link with your team member(s).

After the above, other team member(s) then:

1. Use git clone to create a local copy of the ‘favourite-things’ repository.
2. Create and switch to a new branch (with a meaningful name).
3. Edit the README.md file to add a few of your own favourite things to it.
4. Use git add, git commit and git push to get your changes to the remote (still on your new branch).
5. On Bitbucket create a Pull Request that would merge your new branch into the original.

Finally, together, explore the created Pull Request(s) on Bitbucket and then “merge” them.

Bonus discussion: Why was the suggested filename ‘README.md’ specifically?

Key Points

  • Sharing a repository needs good communication.
  • Working on branches, rather than committing directly to main, keeps collaborators’ work from colliding.
  • Pull requests enable sensible merging of changes between branches and across repositories.

Content from Making git citable


Last updated on 2024-07-31

Overview

Questions

  • How can we make a particular git commit citable?
  • How can version control help reproducible science?

Objectives

  • Learn how to tag a particular git commit
  • Consider how version control contributes to open science

Citing code


One of the major benefits of using code for data analysis is reproducibility: it makes the way data was manipulated and analysed transparent. Using a version control system like git takes that to another level by enabling the sharing of the history of the code, as well as allowing collaboration on code.

But because code by its very nature may change over time, we need to be able to point to a particular moment in its history when we want to cite it, such as in a research paper.

Discussion

The old way of sharing a particular version of code was to upload a text file as supplementary material.
- What are some of the shortcomings of this approach?
- With your new expertise with git, what would be a better way?

Referring to particular commits


To refer accurately to a particular commit, we need a label for it. The hash strings that git generates could work, but they are unwieldy. Branch pointers don’t work because they move to the tip of their branch every time a new commit is added.

Instead, git provides tags.

Tags are pointers that refer to a specific commit, and then never move.

You can give a tag any label you’d like (short, meaningful, no spaces). A classic example of a tag may be “v1.0” to denote a finalised “version 1” release of something.

Challenge

  • In the visualise git environment, have a go at creating tags.
  • Use git tag [tagname]
  • Create a tag, then add some commits. What happens to the different pointers?
  • Make some more tags, then practice doing git checkout [tag]

Making tags static with GitHub and Zenodo


While tags are really useful for pointing to a particular commit, they don’t provide a single point of access: they exist in every copy of the repository, and there doesn’t have to be a publicly accessible copy. For a citation, that’s not much good. If you are working on GitHub, there is an option to publish a Digital Object Identifier (DOI) that permanently links to a particular tag, via Zenodo: https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content
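One practical detail worth knowing before involving any hosted service: a plain git push does not send tags, so a tag created locally has to be pushed explicitly before the remote copy can see it. A minimal sketch, with a bare repository in a temporary folder standing in for the hosted remote and all names made up:

```shell
# Sketch only: a local bare repository stands in for the hosted remote.
# Assumes git >= 2.28 (for --initial-branch).
set -e
tmp=$(mktemp -d)
git init --quiet --bare --initial-branch=main "$tmp/remote.git"
git init --quiet --initial-branch=main "$tmp/paper-code"
cd "$tmp/paper-code"
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"
git remote add origin "$tmp/remote.git"

echo "analysis steps" > analysis.txt
git add analysis.txt
git commit --quiet -m "Add analysis"
git push --quiet -u origin main

git tag v1.0
git ls-remote --tags origin                  # prints nothing: tag is local only

git push --quiet origin v1.0                 # publish this one tag
git ls-remote --tags origin                  # now lists refs/tags/v1.0
```

Once the tag exists on the hosted copy, it can serve as the basis for a release and, from there, for an archived and citable snapshot.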


Open Science


Free sharing of information might be the ideal in science, but the reality is often more complicated. Common practice today looks something like this:

  • A scientist collects some data and stores it on a machine that is occasionally backed up by their department.
  • They then write or modify a few small programs (which also reside on the machine) to analyze that data.
  • Once they have some results, they write them up and submit a paper. The scientist might include their data – a growing number of journals require this – but they probably don’t include the code.
  • Time passes.
  • The journal sends the scientist reviews written anonymously by a handful of other people in their field.
    The scientist revises the paper to satisfy the reviewers, during which time they might also modify the scripts they wrote earlier, and resubmits.
  • More time passes.
  • The paper is eventually published. It might include a link to an online copy of the data, but the paper itself will be behind a paywall: only people who have personal or institutional access will be able to read it.

For a growing number of scientists, though, the process looks like this:

  • The data that the scientist collects is stored in an open access repository like CSIRO’s Data Access Portal, possibly as soon as it’s collected, and given its own
    Digital Object Identifier (DOI).
  • The scientist creates a new repository on Bitbucket to hold their work.
  • During analysis, they push changes to their scripts (and possibly some output files) to that repository. The scientist also uses the repository for their paper; that repository is then the hub for collaboration with colleagues.
  • When they are happy with the state of the paper, the scientist posts a version to arXiv or some other preprint server to invite feedback from peers.
  • Based on that feedback, they may post several revisions before finally submitting the paper to a journal.
  • The published paper includes links to the preprint and to the code and data repositories, which makes it much easier for other scientists to use their work as a starting point for their own research.

This open model accelerates discovery: the more open work is, the more widely it is cited and re-used. However, people who want to work this way need to make some decisions about what exactly “open” means and how to do it. You can find more on the different aspects of Open Science in this book.

This is one of the (many) reasons we teach version control. When used diligently, it answers the “how” question by acting as a shareable electronic lab notebook for computational work:

  • The conceptual stages of your work are documented, including who did what and when. Every step is stamped with an identifier (the commit ID) that is for most intents and purposes unique.
  • You can tie documentation of rationale, ideas, and other intellectual work directly to the changes that spring from them.
  • You can refer to what you used in your research to obtain your computational results in a way that is unique and recoverable.
  • With a version control system such as Git, the entire history of the repository is easy to archive in perpetuity.
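One way to make results traceable to the exact code that produced them is to write the commit ID alongside the output. The sketch below is only an illustration: the script, the tag, and the file name results-provenance.txt are hypothetical, and it runs in a throwaway repository in a temporary folder.

```shell
# Sketch only: record which commit produced a set of results.
# Assumes git >= 2.28 (for --initial-branch).
set -e
tmp=$(mktemp -d)
git init --quiet --initial-branch=main "$tmp/analysis"
cd "$tmp/analysis"
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

echo 'print("result")' > analysis.py         # hypothetical analysis script
git add analysis.py
git commit --quiet -m "Add analysis script"
git tag v1.0

# The full commit ID, plus a friendlier description based on the nearest tag.
# --always falls back to a short ID if there is no tag;
# --dirty appends "-dirty" if tracked files have uncommitted changes.
commit_id=$(git rev-parse HEAD)
version=$(git describe --tags --always --dirty)

echo "Results generated from commit $commit_id ($version)" > results-provenance.txt
cat results-provenance.txt
```

Anyone holding the repository can later run git checkout with that ID to recover the code exactly as it was when the results were produced.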

With tools like R Markdown and Jupyter Notebooks, documentation may be mixed directly with code to generate graphs and images for the same documentation, and all stored in version control!


Licenses and citations


A final note to keep in mind if sharing a Git repository publicly: it may be important to include some licensing information (often in a LICENSE.md file) stating the conditions under which others are welcome to use or modify your work. Example IM&T may help with this if necessary.

Similarly, a citation file may be useful to include, with a request for how you’d like your work to be cited. A special format has been proposed for this purpose: a CITATION.cff file, containing a standardised set of information that is both human and machine readable. For example:

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Druskat
    given-names: Stephan
    orcid: https://orcid.org/1234-5678-9101-1121
title: "My Research Software"
version: 2.0.4
doi: 10.5281/zenodo.1234
date-released: 2021-08-11

More information is available here: citation-file-format.github.io

Key Points

  • Git tags may be generated to note a particular version in history.
  • Consider use of git early in scientific workflows, for robust documentation.
  • Open scientific work is more useful and more highly cited than closed.

Content from What to not add to Git


Last updated on 2024-07-31

Overview

Questions

  • What should be included in Git repositories?

Objectives

  • Consider the dos and don’ts of Git usage

What to not add to Git


A final note on what should and shouldn’t be included in Git repositories.

In general, Git can describe individual line changes for any sort of file that may be opened within a text editor. This includes many forms of code, documentation, HTML, simple data files, etc., but excludes binary formats like:
- Microsoft Office documents and other rich text documents
- PDF files
- JPEG/PNG images
- Zip files
- Proprietary data files
- Etc.

Git may still store these files, but any version history would be less meaningful, without clear ‘diffs’.

More importantly though, Git repositories should not be used to store:
- Personal information, especially usernames, passwords, keys, secrets
- Details that are specific to an individual computer or system (e.g. use relative paths rather than full system paths)
- Large volumes of data / input files (make use of services like the DAP for these)
- Compiled executables, software libraries, and other files that may be regenerated by scripts when needed

Remember to make use of .gitignore files to ignore and exclude files as necessary.
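As a small illustration, the sketch below sets up a .gitignore covering a few of the categories above and then asks Git which files it would skip. The patterns and file names are examples only and should be adapted to your own project.

```shell
# Sketch only: example ignore rules in a throwaway repository.
set -e
tmp=$(mktemp -d)
git init --quiet "$tmp/ignore-demo"
cd "$tmp/ignore-demo"

# Hypothetical rules; adjust the patterns to suit your project
cat > .gitignore <<'EOF'
# Secrets and credentials
.env
*.key
# Large input data and files that can be regenerated
data/raw/
*.o
__pycache__/
EOF

mkdir -p data/raw
echo "API_TOKEN=not-a-real-token" > .env
echo "1,2,3" > data/raw/big.csv
echo "print('hello')" > analysis.py

# Show which rule causes each file to be ignored
git check-ignore -v .env data/raw/big.csv

# Only .gitignore and analysis.py are offered for tracking
git status --short
```

Note that .gitignore only affects files that are not yet tracked; a file that has already been committed stays in the history even if a matching rule is added later.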

Key Points

  • Make use of .gitignore to list files to exclude.
  • Don’t commit sensitive, personal or machine-specific information.
  • Don’t commit large data files.
  • Git works best with text files, for which it can track individual line changes.