Rebasing illustrated

Posted on Wed 18 November 2015 in Commit grooming

Once a commit has been created, it is set in stone. Due to git's nature, it is impossible to change existing commits without changing their identifier. However, this does not mean that you cannot alter history; it merely means that such edits will be noticeable.
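A minimal sketch of that claim, run in a throwaway repository (the file name, identity and commit messages are made up for illustration): rewording a commit produces a new identifier, while the old commit stays readable for a while.

```shell
set -eu
# Hypothetical scratch repository; nothing here touches an existing project.
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo hello > file.txt
git add file.txt
git commit -q -m "Initial commit"
before=$(git rev-parse HEAD)

# "Changing" the commit really creates a new one with a new identifier.
git commit -q --amend -m "Initial commit, reworded"
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"
```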

There are various ways of editing history, and one of the more common ones is the git rebase command, which can take commits and copy them to another part of your history. Its name is confusing, its usage can be complicated, and it's often misunderstood. So let's try to at least clear up the last part of that.

Changing published history can be problematic

While it's perfectly safe to rebase commits you have never pushed, or to rebase to clean up or alter pull requests that have not yet been merged, things become more complicated when you change, for example, the master branch of a popular project after pushing it to a central repository, as others may have based new work on it.

If you change published history that other people have based their work on, they also need to alter their histories. Please be aware of this when doing a rebase of such code.

Simple example

Let's take a simple example: a branch that was branched off of master a few commits ago, which you now want to rebase to bring all of master's changes in.

When you type git rebase master, git will take your commits and put them on top of master. Sort of. What the graph above shows is that your old commits are still there, but two new commits were created ahead of master, and your mybranch ref is changed to point at the new commits.

You also see that the old commits are still there. Git does not immediately delete such dangling commits, but leaves them around for a little while. This means you can still access all the data in there, in case you need to see a diff, or want to rescue some older version of a file.
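The behaviour described above can be reproduced in a scratch repository. This is only a sketch: the commit helper, the file names and the commit messages are invented for the example, and only the mybranch and master names come from the text.

```shell
set -eu
# Hypothetical scratch repository for illustration.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"

# Illustrative helper: one new file per commit, so rebases never conflict.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b mybranch
commit feature-1
commit feature-2
git checkout -q master
commit master-1

old_tip=$(git rev-parse mybranch)
git checkout -q mybranch
git rebase -q master
new_tip=$(git rev-parse mybranch)

# mybranch now points at two new commits on top of master ...
git log --oneline --graph master mybranch
# ... while the old tip can still be inspected by its identifier.
git show --no-patch --oneline "$old_tip"
```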

Other refs

You also saw that your rebase only affected the mybranch branch. No other branches or tags were touched, including the master branch, which you rebased onto.

If there are refs that point to commits you are rebasing, they will also not change. Take the following example, that has a tag pointing to a commit on a branch that is about to be rebased.

When you git rebase master in this repository, you see that the mytag tag does not move. If you wish the tag to move as well, you will need to do so manually (but beware that moved tags are not automatically fetched by clients).
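A sketch of that situation, again in a scratch repository with invented file names and commit messages. It assumes the tag sits on the first of two commits on mybranch, so after the rebase the matching new commit is mybranch~1.

```shell
set -eu
# Hypothetical scratch repository for illustration.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Illustrative helper: one new file per commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b mybranch
commit feature-1
git tag mytag                 # tag the first commit on the branch
commit feature-2
git checkout -q master
commit master-1

git checkout -q mybranch
tag_before=$(git rev-parse mytag)
git rebase -q master
tag_after=$(git rev-parse mytag)   # still the old, pre-rebase commit

# Move the tag by hand to the rewritten copy of the tagged commit.
git tag -f mytag mybranch~1
```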

Three-argument rebase

Git rebase can also transplant arbitrary commits to anywhere in the commit tree. In the example below, we have a master branch from which a develop branch has been split. From that develop branch, a feature branch has been split, but that feature branch really should have been based on master. We can tell git rebase to take all commits on feature that are not on develop, copy them on top of master, and point the feature branch at the copies.

The command to run to do this is git rebase --onto master develop feature, meaning "copy all commits between develop and feature to what master points at". One thing to note about this type of rebase is that the second argument (develop) doesn't need to be a strict ancestor of the third argument; git will find a common ancestor between the two and use that as the starting point.

As you can see, a side effect of this command is that now the feature branch is checked out instead of the develop branch. Other than that it's a normal rebase, and it just copied three commits and moved a ref.
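The three-argument form can be tried out as follows. The branch names match the text; the commit helper, file names and commit messages are assumptions made for this sketch.

```shell
set -eu
# Hypothetical scratch repository for illustration.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Illustrative helper: one new file per commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b develop
commit develop-1
git checkout -q -b feature    # split off develop, although it belongs on master
commit feature-1
commit feature-2
commit feature-3
git checkout -q master
commit master-1
git checkout -q develop

# Copy the commits in develop..feature onto master and move the feature ref.
git rebase -q --onto master develop feature

git rev-parse --abbrev-ref HEAD          # feature is checked out now
git log --oneline --graph master develop feature
```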

Rebasing more than one branch

Let's make it a bit more difficult: let's rebase everything onto master, while keeping the layout intact. Let's do the easy one first: with the develop branch checked out, we run git rebase master. Now we do git checkout feature, but we cannot just rebase it, as that would duplicate all commits that were common between develop and feature. So we need to carefully rebase just the commits we want, the last three, and we need to attach them to the parent of the new develop. This turns into git rebase --onto develop~1 HEAD~3.

Lastly, we want to move the sometag tag to the grandparent of the new develop branch. This can be done with git tag -f sometag develop~2.
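Put together, the steps might look like this. The layout is an assumption reconstructed from the commands in the text: develop has three commits, sometag sits on the first of them, and feature splits off from the second with three commits of its own. File names and commit messages are invented.

```shell
set -eu
# Hypothetical scratch repository for illustration.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Illustrative helper: one new file per commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b develop
commit develop-1
git tag sometag               # tag on the first develop commit
commit develop-2
git checkout -q -b feature    # feature splits off at develop-2
commit feature-1
commit feature-2
commit feature-3
git checkout -q develop
commit develop-3
git checkout -q master
commit master-1

# Step 1: the easy one, develop goes on top of master.
git checkout -q develop
git rebase -q master

# Step 2: only the last three feature commits, attached to the new develop~1.
git checkout -q feature
git rebase -q --onto develop~1 HEAD~3

# Step 3: move the tag to the grandparent of the new develop.
git tag -f sometag develop~2

git log --oneline --graph master develop feature
```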

Recovering from an upstream rebase

As I warned above, changing published history causes work for people who based new work on that history. But how much work? Let's take a simple example. As you can see in the graph below, the local origin/master ref points to a commit that no longer exists remotely: remote has changed history, possibly by rebasing.

So let's see what we need to do. First we git fetch the commits into our repository. As you can see, we now have an extra commit (currently pointed to by master) that we should drop. This means we need some surgery on the master and develop branches.

We must fix branches right-to-left, or topologically newest to oldest, so we can use a three-argument rebase as above. So first we transplant the develop branch on top of the new origin/master: git rebase --onto origin/master master develop.

And then we git checkout master so we can fix it. Since we didn't have local changes to master, we can just git reset --hard origin/master to move the ref without copying any commits.
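The whole recovery can be rehearsed with three local repositories. Everything about the setup is an assumption for this sketch: a bare repository stands in for the central one, a hypothetical "upstream" clone rewrites its last commit and force-pushes, and "local" is our own clone with a develop branch.

```shell
set -eu
# Hypothetical scratch repositories for illustration.
root=$(mktemp -d)
identity() { git config user.name "Example"; git config user.email "example@example.com"; }
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

# The central repository.
git init -q --bare "$root/remote.git"
git --git-dir="$root/remote.git" symbolic-ref HEAD refs/heads/master

# Someone publishes two commits on master.
git init -q "$root/upstream"
cd "$root/upstream"
git symbolic-ref HEAD refs/heads/master
identity
commit base
commit published
git remote add origin "$root/remote.git"
git push -q origin master

# We clone and base a develop branch on the published history.
git clone -q "$root/remote.git" "$root/local"
cd "$root/local"
identity
git checkout -q -b develop
commit local-work-1
commit local-work-2

# Upstream rewrites its last commit and force-pushes.
cd "$root/upstream"
git commit -q --amend -m "published (rewritten)"
git push -q --force origin master

# Recovery, newest branch first.
cd "$root/local"
git fetch -q origin
git rebase -q --onto origin/master master develop
git checkout -q master
git reset -q --hard origin/master

git log --oneline --graph master develop origin/master
```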

That seems simple enough, right? Well... this example is almost trivial (one branch, two commits) and still takes manual inspection and careful use of git to fix. Imagine having a few dozen branches and commits based on work that was rebased: it can be quite a bit of work. Moreover, because any rebase can cause conflicts, the work may not be limited to just moving commits; some may need to be modified or entirely rewritten. So on behalf of all your collaborators, think twice before rebasing history somebody may have based their work on.

Interactive rebase

All the examples so far show git's default non-interactive rebase. The only time you need to do more than a rebase invocation is when you have a conflict. This all works fine if all you do is move commits around, but since you're rewriting history anyway, it is also a good time to groom your history.

Together with git add -p, git rebase -i makes a very clean commit history possible even if your way of working isn't quite as clean. For example, I often work on a few things at the same time, creating many small commits and fix-ups with git add -p as I go. When time comes to publish my work, I use git rebase -i to combine smaller commits into logical units, reorder commits to make sense and maybe even create multiple branches for multiple pull requests.

The example below is one of these cleanup sessions. It's what you can see when you do git rebase -i origin/master to groom all your unpushed commits. In this case I wasn't too messy, but I still wanted to combine the third and fourth commit and fix a typo in commit #5's commit message:

pick f351964 Python 3.4 compatibility
pick b40853f Ignore docs builddir
pick 6c4f061 Make non-redirected commands work under windows
pick 6098e98 Stray os.pipe() leads to fd leakage
pick 9b0611f tests: don't relyh on non-coreutils tools
pick d3097d1 Enable travis tests

# Rebase f351964..d3097d1 onto ea5ee38 (6 command(s))
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# d, drop = remove commit
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

Git opens a file with this text in your favourite text editor, in my case Vim with syntax highlighting. Following the instructions in the file I changed a pick to a squash, and another pick to an edit. When exiting the editor, the normal rebase process starts. The only difference is that for each squash or edit, you will now be prompted to make the changes you want.

pick f351964 Python 3.4 compatibility
pick b40853f Ignore docs builddir
pick 6c4f061 Make non-redirected commands work under windows
squash 6098e98 Stray os.pipe() leads to fd leakage
edit 9b0611f tests: don't relyh on non-coreutils tools
pick d3097d1 Enable travis tests

After making those changes and doing git rebase --continue, git will have rewritten the history to your liking. It will now look like this:

$ git log --oneline origin/master..
8d22fc7 (HEAD -> master) Enable travis tests
4e118d5 tests: don't rely on non-coreutils tools
fbdf972 Make non-redirected commands work under windows
4e5ee71 Ignore docs builddir
944d627 Python 3.4 compatibility

All commit SHA-1s have of course changed; it's a rebase after all. But the history makes more sense now, and it can be pushed!
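To experiment with squashing without an editor popping up, the todo list can be edited by a command instead. This is a sketch and not how the session above was done: it assumes a sed that accepts -i.bak, uses git's GIT_SEQUENCE_EDITOR and GIT_EDITOR environment variables to stand in for the editor, and works on a scratch repository with invented commits.

```shell
set -eu
# Hypothetical scratch repository for illustration.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Illustrative helper: one new file per commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b work
commit feature-a
commit feature-b
commit feature-b-fixup
commit feature-c

# Change "pick" to "squash" on the third todo line (feature-b-fixup),
# and accept the combined commit message unchanged.
GIT_SEQUENCE_EDITOR='sed -i.bak 3s/^pick/squash/' GIT_EDITOR=true \
    git rebase -q -i master

# Three commits remain; the second one now carries both feature-b files.
git log --oneline master..work
```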