Undoing all kinds of mistakes

Posted on Sun 20 March 2016 in Effective git usage

The most commonly asked questions in #git are variants of "I did something with my repository I should not have done, how do I fix this". Often the asker uses words like revert or undo very ambiguously and it takes a while to figure out what the desired result is. Here we discuss various ways of fixing mistakes and when to use them.

Make sure you understand these commands completely before running them

The git commands in this article are all about making things disappear, so they can be quite dangerous when used incorrectly. Make sure you understand what these commands do before running them, and do a dry run first where possible (several commands, such as git clean, git add and git rm, accept a -n flag for that).

The recipes in the sections about undoing committed changes all work best if you have a clean worktree. So no staged or unstaged changes. Stash or commit them before doing cleanups.

Let's start with some definitions where git is very precise, but people are not. When asking questions about fixing your repository, it really helps if you use these terms correctly.

working directory

The working directory or worktree is the directory that holds both your .git directory and the checked out files.

commit

A commit is a full snapshot of all files in your repository. Git does not store diffs or patches, only full snapshots of files. How git manages not to require a very large amount of disk space for your repository is a subject for another article. For now it's just important to remember that all versions of all files ever committed can be retrieved by git.

branch, tag and ref

Usually when people refer to a branch, they mean a set of commits that follow each other. This is not wrong per se, but in the world of git, the meaning of branch is more subtle. A branch label, say 'master', is really nothing more than a name that points to a commit (called the tip of that branch). Branch names are not recorded in commits, and once merged they can be deleted without losing commits.

Such a label is called a ref (from reference), and git has many of them. The most important ones are heads (branches), which are refs that move, and tags, which are refs that don't move.

reflog

For refs that move (branches and HEAD), git keeps a log of when they moved and why. So for every commit, reset, merge and all other actions that move heads around, git tracks before and after states. Even when you change history and commits become unreachable, the reflog has your back. And because git's garbage collection also does not delete things that are still in the reflog, you can even undo things that would be very destructive otherwise. Git really doesn't like losing committed data.

commit-ish

Many git commands take a commit identifier as argument. While each commit has its own unique identifier, the sha1, a commit can usually be referred to in many ways: any ref that points to it, the commit tree walking tricks with ^ and ~ etc. 'commit-ish' means any of the ways you can refer to a commit.
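To make that concrete, here is a minimal sketch in a throwaway repository. The file name, tag name and commit messages are made up for illustration. It names the same commit in four different ways and lets git rev-parse resolve each of them to the full sha1.

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "first"
git tag v0.1                            # a tag pointing at the first commit
echo two > file.txt && git commit -q -a -m "second"

# Four commit-ishes that all name the first commit.
by_tag=$(git rev-parse 'v0.1^{commit}')                      # via a ref
by_caret=$(git rev-parse 'HEAD^')                            # first parent of HEAD
by_tilde=$(git rev-parse 'HEAD~1')                           # one step back from HEAD
by_short=$(git rev-parse "$(git rev-parse --short v0.1)")    # abbreviated sha1
echo "$by_tag" "$by_caret" "$by_tilde" "$by_short"
```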

index

The index or staging area is a feature that is unique to git and is part of what makes git so powerful at commit grooming and refining. The staging area, as its name implies, is where you prepare the next commit. It is in essence a simple list of (filename, sha1) pairs that tell git which data objects should be part of the next commit.

When you git add a file, git actually already adds the file to the object database and adds the sha1 of the file to the index. This is what makes git add -p possible, but also why you have to git add the file after every change.
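You can watch this happen with git ls-files -s, which prints the index entries including their sha1. The sketch below uses a made-up file name in a throwaway repository and shows that editing a file after git add leaves the index pointing at the older content until you add again.

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "version 1" > notes.txt
git add notes.txt
# Index entries look like "<mode> <sha1> <stage><TAB><file>"; field 2 is the sha1.
staged=$(git ls-files -s notes.txt | cut -d' ' -f2)

echo "version 2" > notes.txt            # edit after staging
worktree=$(git hash-object notes.txt)   # sha1 the edited file would get
git cat-file -p "$staged"               # the index still holds the older content

git add notes.txt                       # only now does the index pick up the edit
restaged=$(git ls-files -s notes.txt | cut -d' ' -f2)
```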

revert

As a noun, it means a commit that is the inverse of another commit, effectively undoing the changes of that commit. As a verb it means to create such a commit. This is the most misused word when talking about undoing changes, so please only use revert if you actually mean either of these two things.

reset

Reset can affect the working tree, the index and the commit graph. So it can mean three things, or a combination of two or three of those things. Have I told you yet that git can be confusing?

  • When talking about the commit graph, to reset a branch means to point a branch label to another commit, in the context of undoing changes usually an older commit. This makes git forget that commits newer than the commit you reset to have ever been part of that branch.
  • Reset can also manipulate the index (reset --hard, reset --mixed, reset -p). This does the inverse of git add, making the index resemble the last commit and not the worktree.
  • And finally, reset can undo changes in the worktree (reset --hard).

As a verb, unfortunately it can also mean all of these things. So when talking about a reset, it's vital to say exactly which command you mean.

checkout

To check out something means to update the index and working tree with contents from a commit and update the HEAD pointer. The usual invocation of git checkout branchname makes the index and worktree match the tip of the branch and also updates HEAD to point to that tip.

Checkout can also be used to grab only parts of the contents of a commit. In this mode it does not update HEAD. And finally, because git users are lazy, git checkout can also be used to create new branches and check them out at the same time. This is what the -b option does.

merge

A merge commit is a commit with more than one parent. Nothing more, nothing less.

To merge means to create a merge commit, merging two or more branches into one. When merging, you will often need to resolve conflicts between these branches.

rebase

Rebasing commits copies them to another place in the commit graph. See the rebasing illustrated article for more info on rebase.

Fixing up uncommitted changes

By far the easiest undo to accomplish is undoing uncommitted changes. But even here git is surprisingly flexible, allowing you to decide which local changes to keep (and where to keep them) and which not to.

Getting rid of all local uncommitted changes

Sometimes you just really want to say 'damn it, I did it all wrong, let's get rid of this mess' and undo all uncommitted changes and hang your head in shame. Your friend in this case is git reset --hard which resets the index and the worktree to the state of the last commit.

And if you also want to get rid of untracked files, git clean -di (or its more destructive options, -f and -x) will help you clean up even more.
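A minimal sketch of the full tableflip, with made-up file names in a throwaway repository. It uses git clean -n for a dry run instead of the interactive -i, so that it can run unattended.

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "good" > tracked.txt && git add tracked.txt && git commit -q -m "initial"

echo "bad edit" > tracked.txt                            # unstaged change
echo "staged junk" > staged.txt && git add staged.txt    # staged new file
echo "scratch" > untracked.txt                           # untracked file

git reset -q --hard                     # index and worktree back to the last commit
after_reset=$(git status --porcelain)   # only the untracked file is left over

git clean -n -d                         # dry run: lists what would be removed
git clean -q -f -d                      # really remove untracked files and directories
```

Note that the hard reset also deletes staged.txt: it was in the index but not in the last commit.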

Undoing selected local changes

While it's fun to tableflip all your changes away, usually you only want to undo some of your local changes while preserving the rest. If you've already git added the changes, first do a git reset -- path/to/file for the files you want to change, to make git forget that you added those changes to the index.

If you want to undo all changes to a certain file, you can simply check the file out again: git checkout -- path/to/file. This also works to 'undelete' a tracked file that you deleted.

To only undo some changes to a file, you can still use checkout, but now with the -p flag: git checkout -p -- path/to/file. Like git add -p, it will show each change and ask you what to do with it.

Undoing staged changes

If you've git added a change, or an entire new file, you can simply git reset filename to undo the adding, without touching any history or your worktree. If you don't want to undo the adding, but want to add more changes to the same file, simply git add them and git will update the index.
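Put together, unstaging and selectively undoing could look like the sketch below. The file names are hypothetical and the repository is a throwaway one.

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo a > keep.txt && echo b > discard.txt
git add . && git commit -q -m "initial"

echo "wanted change" >> keep.txt
echo "unwanted change" >> discard.txt
git add keep.txt discard.txt            # oops, both are staged now

git reset -q -- discard.txt             # unstage it; the worktree is not touched
git checkout -- discard.txt             # now drop the edit from the worktree too
git status --short                      # only keep.txt is still staged
```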

Moving changes to a different branch

Another common issue is finding out you're on the wrong branch and wanting to move your changes to that other branch. If you're lucky, you can simply check out that branch (git checkout branchname if the branch already exists, git checkout -b branchname for a new one). However, if your changes conflict with that branch, you can first git stash your changes, do the checkout and git stash apply, followed by the normal conflict resolution.

I don't like git stash though, so I take a different approach. Which is not actually that different from what git stash does, except with a whole lot less magic and no abuse of the reflog.

First I tag where I'm currently at so I can easily go back: git tag backup. Then I git commit my changes in one or more commits. If there are also changes I do not want to commit, I'll reset them out of the way. Once that's done, I'll git checkout develop to go to the other branch. I then git cherry-pick backup..master to cherry-pick the new commits onto that branch, solving any merge conflicts that may arise. Then I git checkout master and git reset --hard backup to point master to where it should be. Now we can git tag -d backup and everything is squeaky clean again.
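Here is that whole dance as a sketch in a throwaway repository. The file names are made up, and instead of hardcoding master the script asks git for the name of the current branch, since a fresh git init may call it something else.

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "base"
wrong=$(git symbolic-ref --short HEAD)  # the branch we are on by mistake
git branch develop                      # the branch the work belongs on

echo "new feature" > feature.txt        # work done on the wrong branch
git tag backup                          # remember where this branch should end up
git add feature.txt && git commit -q -m "Add feature"

git checkout -q develop
git cherry-pick backup.."$wrong" >/dev/null   # copy the new commit(s) onto develop
git checkout -q "$wrong"
git reset -q --hard backup              # put the wrong branch back where it was
git tag -d backup >/dev/null            # clean up the temporary tag
```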

Recovering uncommitted files after reset

After working with git for a while, most people know that once a file has been committed, git will not easily lose it. What many people do not know is that even just git add is enough to make git remember the version of the file you are adding, even when you make more changes and do another git add. And even when you git reset --hard before committing!

The trick is that git add actually already creates a git object for you and puts its sha1 in the index. When you add again, or when you reset, that blob becomes a so-called dangling blob and git gc will eventually clean it up. But until it has done so, git fsck will find it and tell you the sha1's of all dangling blobs. You can then use git show to recover them, or use git fsck --lost-found to recover them all at once.
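A minimal sketch of such a rescue, assuming a throwaway repository in which there is exactly one dangling blob (in a real repository you will have to inspect several candidates). The file name and contents are made up.

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "base"

echo "precious draft" > draft.txt
git add draft.txt                       # the content is now in the object database
git reset -q --hard                     # oops: draft.txt is gone from index and worktree

# git fsck prints lines like "dangling blob <sha1>"; pick out the sha1.
blob=$(git fsck 2>/dev/null | awk '$1 == "dangling" && $2 == "blob" { print $3 }')
git show "$blob" > draft.txt            # write the recovered content back to disk
```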

Fixing up committed changes without rewriting history

Have a clean worktree

The recipes in this section assume you have a clean worktree. Some may cause you to lose uncommitted changes, so make sure you commit or stash any work in progress.

Once a change has been committed, there are two general ways of undoing the change: rewriting history, making it look like the change never happened, or creating changes that invert your change. While it's perfectly safe to change history you have never pushed, or to clean up/alter history that has not yet been merged in main branches, things become more complicated when changing for example the master branch of a popular project after pushing it to a central repository, as others may have based new work on it.

If you change published history that other people have based their work on, they also need to alter their histories. Please be aware of this when altering such history. To help those people, we start with fixes that do not require any modification of the commit history.

Undoing an older commit

To make a commit that inverts all the changes of another commit, you use git revert. For example, to revert the second to last commit in the graph above, you could do git revert HEAD^.

And since a revert is just a simple commit, it can also be reverted, making the changes appear again. This can be useful if you only had to revert changes temporarily while preparing for them to work. In the graph above, git revert HEAD would do the trick.

Reverting many commits

You can revert many commits in a single command. For example, should you decide that everything between version 0.1 and 0.2 was actually a big mistake, you can git revert v0.1..v0.2. If you want to make only one commit containing all of the reverts, add the -n flag: git revert -n v0.1..v0.2. This will revert the changes in the index and the worktree but it will not create a new commit. That way you can do final tweaking and commit the results yourself.
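The sketch below tries this out in a throwaway repository with made-up files and tags. It reverts a range into a single commit and then reverts that revert to bring the changes back.

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo 1 > a.txt && git add a.txt && git commit -q -m "add a"
git tag v0.1
echo 2 > b.txt && git add b.txt && git commit -q -m "add b"
echo 3 > c.txt && git add c.txt && git commit -q -m "add c"
git tag v0.2

# Undo everything after v0.1 up to and including v0.2, as one commit.
git revert -n v0.1..v0.2
git commit -q -m "Revert everything between v0.1 and v0.2"
files_after_revert=$(git ls-files)

# A revert is an ordinary commit, so it can be reverted as well.
git revert --no-edit HEAD >/dev/null
```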

Reverting to a specific commit

If you wish to make the next commit look exactly like another commit, you can of course revert until you reach its state. But that may be tricky, or even impossible if that commit is not a direct ancestor of the current HEAD.

But fear not, git is here to help you out. Remember that git does not track changes, instead each commit is a full snapshot of your files. So let's not try to undo changes made, but just git checkout commit-ish -- . (the trailing dot is a path: the current directory). Your tree now looks exactly like the commit you specified, and you can commit it.

There's just one caveat: if there are files in your current commit that are not in the other commit, they will be kept in their current state. So a more complete version of this recipe is: git rm -rf :/ && git checkout commit-ish -- :/
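A sketch of the complete recipe, with hypothetical file names in a throwaway repository. It runs from the top of the repository, where . and :/ mean the same thing, so it uses the plain dot.

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo old > kept.txt && git add kept.txt && git commit -q -m "old state"
target=$(git rev-parse HEAD)            # the commit we want to look like
echo new > kept.txt && echo extra > added-later.txt
git add . && git commit -q -m "newer state"

git rm -q -rf .                         # empty the index and the worktree
git checkout "$target" -- .             # fill both from the target commit
git commit -q -m "Back to the state of $target"
```

The new commit has exactly the same tree as the target commit, and added-later.txt is gone.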

If the commit you want to use as the source of truth is on another branch (and if it isn't, you can simply create a branch), you can also trick git merge into doing this. By using the 'ours' merge strategy, it will make a merge commit that has multiple parents, but instead of merging the contents of those commits and their merge base, it simply discards the contents of the other commits and keeps the contents of the current branch.

So if you want to make master look exactly like develop, that would look as follows:

$ git checkout develop
$ git merge -s ours master
$ git checkout master
$ git merge --ff-only develop

If the commit you want to revert to is not at the tip of a branch, you can simply create a temporary branch:

$ git checkout -b temp-branch 03406c86
$ git merge -s ours master
$ git checkout master
$ git merge --ff-only temp-branch

Reverting a single file

The above recipes are all very useful if you want to revert entire commits. But what if you just want to revert parts of it? To revert the edits to a single file, you can use a combination of diff and apply, applying the patch in reverse: git diff commit-ish^..commit-ish -- file | git apply -R.

And if you want to make a file look the way it looked in another commit, you can simply check the file out: git checkout commit-ish -- file. Use checkout -p to decide hunk-by-hunk whether to retain your current version or use the other version.
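As a sketch, with made-up file names in a throwaway repository: a commit changes two files, and we undo its effect on just one of them by applying that file's patch in reverse (-R).

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo a1 > a.txt && echo b1 > b.txt
git add . && git commit -q -m "initial"
echo a2 > a.txt && echo b2 > b.txt
git commit -q -a -m "change both files"
bad=$(git rev-parse HEAD)               # the commit we partly regret

# Take what $bad did to a.txt and apply that patch backwards to the worktree.
git diff "$bad^" "$bad" -- a.txt | git apply -R

# Alternative: take the whole file as it was before $bad.
# git checkout "$bad^" -- a.txt
```

The change ends up in your worktree only, so you can still review it before you git add and commit.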

Stuck in a conflict

Commands that can result in conflicts, such as merge, cherry-pick and rebase, all use the same strategy for solving the conflicts: use the built-in algorithms to automatically resolve them, and if those fail the user gets to pick up the pieces and solve the conflict manually.

The commands keep some state around when they do so, which is incredibly useful if you don't want to or cannot resolve the conflict. You can then simply do git merge --abort (or a similar incantation for rebase and cherry-pick) and git brings you back to the state where you were before attempting whatever you did that caused the conflict.
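A small sketch of bailing out of a conflicted merge. The branch and file names are made up, and the repository is a throwaway one.

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo base > file.txt && git add file.txt && git commit -q -m "base"
trunk=$(git symbolic-ref --short HEAD)  # whatever the initial branch is called

git checkout -q -b topic
echo topic > file.txt && git commit -q -a -m "topic change"
git checkout -q "$trunk"
echo trunk > file.txt && git commit -q -a -m "trunk change"
before=$(git rev-parse HEAD)

# Both branches changed the same line, so this merge stops with a conflict.
git merge topic >/dev/null 2>&1 || echo "merge stopped with conflicts"
git merge --abort                       # back to the state before the merge attempt
```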

Reverting a merge

Every commit can be reverted, even a merge commit. But reverting a merge commit has one really big downside, which I will illustrate with the graph below. There are 2 branches: master and develop, and develop got merged into master. After this both master and develop have received new commits.

When you git revert -m 1 HEAD^ (reverting a merge commit requires -m to tell git which parent is the mainline), git does not undo the merge, but only its effect. So all changes from the develop branch disappear. If you now git merge develop again, they also do not come back, only the changes from the last two commits on the develop branch are applied!

Why is this? Well, when git does a merge, it does a 3-way merge of the content of the current branch, the branch merged in and their common ancestor. For the second merge, the grandparent of the tip of the 'develop' branch is now that common ancestor. So all git sees is that in the current branch a bunch of changes were made, it does not see that these are undoing older commits. It also does not see those older commits, as it does not look further back than the merge base.

So all in all, reverting a merge is not always a good idea. If you still really want to make that merge go away and do not mind rewriting history, there is another recipe for you further below.

Stop tracking a file

If you want a file to no longer exist, you git rm it. This deletes it from disk and adds the deletion to the index, ready for the next commit. But if you do want to keep it on disk, just not in the repository, you can git rm --cached it. This only stages the deletion but leaves the file untouched.

There is one big caveat here though: if you commit this deletion, and then pull that change into another repository, for instance, to deploy your changes, the file will be deleted from disk there!

To get the file back on local disk, you can use git log to find the last copy of it, and git show to get it back.

$ file=myfile.txt
$ commit=$(git log -1 --format=%H -- "$file")
$ git show "$commit^:$file" > "$file"

Rewriting history to make mistakes disappear

Have a clean worktree

The recipes in this section assume you have a clean worktree. Some may cause you to lose uncommitted changes, so make sure you commit or stash any work in progress.

While some people consider it a thoughtcrime to even think about changing history, sometimes you really need to be Winston and make sure things have never happened. Whether you've committed passwords or simply want to clean up before merging, git has you covered.

Before you go all minitrue (ok... that's enough 1984 puns), please do think about the people you are collaborating with in the repository you are manipulating. While it's perfectly safe to alter history you have never pushed, or to clean up/modify pull requests that have not yet been merged, things become more complicated when changing for example the master branch of a popular project after pushing it to a central repository, as others may have based new work on it.

If you change published history that other people have based their work on, they also need to alter their histories. This can be a complicated, error-prone task and you should really avoid forcing others to do so.

Don't use git push -f

After changing history that you have already pushed, you will notice that git push now fails. This is a failsafe mechanism to avoid losing data. But since in this case you want to lose data, you will need to tell git to accept this. The common way is to do git push -f, but that overwrites whatever is on the remote branch, including commits somebody else pushed in the meantime. A safer alternative is git push --force-with-lease, which refuses to push if the remote branch is no longer where your remote-tracking branch says it is, so nobody else's new commits get thrown away silently. And just to avoid typing all those characters, you can git config alias.force-push 'push --force-with-lease' and then simply use git force-push.
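To try this without a real server, the sketch below uses a local bare repository as a stand-in for the central repository. The paths and commit messages are made up.

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/central.git"   # stand-in for the central repository
git init -q "$tmp/work" && cd "$tmp/work"
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$tmp/central.git"
git config alias.force-push 'push --force-with-lease'

echo v1 > file.txt && git add file.txt && git commit -q -m "first attempt"
git push -q -u origin HEAD
git commit -q --amend -m "second attempt"    # rewrites the commit we just pushed

# A plain push is refused because it would lose the old commit on the remote.
git push -q origin HEAD 2>/dev/null || echo "plain push rejected"
git force-push -q origin HEAD                # the alias: push --force-with-lease
```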

Changing the latest commit

The latest commit is the easiest to change. Just make more changes and git commit --amend. Of course this doesn't actually change the commit, but creates a new one and moves the refs for HEAD and the current branch there.
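You can see the "new commit" part for yourself in this sketch. It uses a throwaway repository with made-up file names, and --no-edit to keep the existing commit message.

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "base"
echo "fetaure" > feature.txt && git add feature.txt && git commit -q -m "Add feature"
old=$(git rev-parse HEAD)

echo "feature" > feature.txt            # fix the typo
git add feature.txt
git commit -q --amend --no-edit         # replace the latest commit
new=$(git rev-parse HEAD)

echo "before: $old"
echo "after:  $new"                     # a different sha1 with the same parent
```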

Changing an older commit

It is only slightly harder to change an older commit. Since you cannot amend the commit directly, you will need to make a new commit for your changes. You can remove files you mistakenly added, fix typos, even remove changes you don't want after all. Just make your changes and create a new commit from them.

You can then use the interactive rebase tool to squash these changes into the existing commit. To do this interactive rebase, first use git log to find the sha1 of the commit you wish to change. If we assume for now that that is commit 1f6a83a, you would run git rebase -i 1f6a83a^ and you will be presented with a text editor with contents that look somewhat like this:

pick 1f6a83a Awesome new feature: sine waves
pick fa3b29e Fix bug in noise generator
pick 6995214 Export to soundcloud
pick 0341984 This should be in the sine waves commit

This is called the worksheet and with it, you can tell rebase exactly what to do. In this case we want to squash the last commit into the first one, so we move the commit and change pick to squash.

pick 1f6a83a Awesome new feature: sine waves
squash 0341984 This should be in the sine waves commit
pick fa3b29e Fix bug in noise generator
pick 6995214 Export to soundcloud

Save the worksheet, close your editor and git rebase will do its magic. If you do git log -p, you will see that your commit is now gone, its effects having been moved to the commit where they should have been.
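If you'd rather not shuffle lines by hand, git can prepare the worksheet for you. This is a slightly different technique from the manual edit described above: git commit --fixup marks the new commit as belonging to an older one, and git rebase -i --autosquash moves it into place and uses fixup instead of squash, which drops the extra commit message. The sketch below uses a throwaway repository, and GIT_SEQUENCE_EDITOR=true accepts the prepared worksheet unchanged so that no editor opens.

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "base"
echo "sine" > sine.txt && git add sine.txt
git commit -q -m "Awesome new feature: sine waves"
target=$(git rev-parse HEAD)            # the commit we want to change
echo "noise fix" > noise.txt && git add noise.txt
git commit -q -m "Fix bug in noise generator"

echo "more sine" >> sine.txt && git add sine.txt
git commit -q --fixup="$target"         # "fixup! Awesome new feature: sine waves"

# Rebase from the parent of the target; the worksheet is accepted as generated.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$target^"
git log --format=%s
```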

Making the latest commit or commits disappear

Making commits disappear is easy. git reset --hard HEAD^ makes the last commit go away. git reset --hard HEAD~5 does the same for the last 5 commits. Both also make the changes disappear from your index and worktree. If you do want to keep the changes in your worktree, for instance because you like the changes but the commits were all messy and you want to redo them using git add -p, don't use a hard reset but a plain git reset (which is the same as git reset --mixed). That keeps the changes in your worktree, unstaged. git reset --soft also keeps them, but leaves them staged in the index, so git add -p would have nothing left to offer.
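The difference between the reset flavours is easiest to see in git status. A minimal sketch in a throwaway repository, with a made-up file name:

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
    echo "change $n" >> log.txt
    git add log.txt && git commit -q -m "commit $n"
done

git reset -q --soft HEAD^               # drop commit 3, keep its changes staged
soft=$(git status --porcelain)
git reset -q --mixed HEAD^              # drop commit 2, keep all changes unstaged
mixed=$(git status --porcelain)
git reset -q --hard                     # throw the leftover changes away as well

echo "after --soft:  '$soft'"           # M in the first column: staged
echo "after --mixed: '$mixed'"          # M in the second column: not staged
```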

Hard resets also work really well to undo merges that shouldn't have happened. If you git pull and notice that it does a merge you did not expect, you can do a hard reset to make the merge disappear (and then think about how to actually integrate your changes).

Making an older commit disappear

As was the case for changing a commit, making older commits go away is slightly trickier, but not much. Again do a git rebase -i to the parent of the commit you want to eradicate. In the worksheet, you simply delete the lines corresponding to commits that should go away, and git will make it happen.
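For a single commit there is also a non-interactive way to get the same result, which is handy in a script. It is not the worksheet edit described above but git rebase --onto: it replays everything after the unwanted commit onto that commit's parent. A sketch in a throwaway repository with made-up file names:

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo a > a.txt && git add a.txt && git commit -q -m "keep: a"
echo oops > secret.txt && git add secret.txt && git commit -q -m "drop: secret"
bad=$(git rev-parse HEAD)               # the commit that has to go
echo b > b.txt && git add b.txt && git commit -q -m "keep: b"

# Replay the commits after $bad onto the parent of $bad, leaving $bad out.
git rebase -q --onto "$bad^" "$bad"
git log --format=%s
```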

Moving changes to a different branch

As we saw earlier in this article, git doesn't frown upon wanting to move changes to a different branch. We saw how to do this for uncommitted changes, but for committed changes it is really not that much different.

Start with checking out the branch that the changes should have been on. Then cherry-pick the commits that you want to have on this branch. Now go back to the branch they should not have been on and use the recipes above to make the commits disappear. Either a hard reset or an interactive rebase, depending on where the commits are in your history.

Making (parts of) files disappear from all of history

The recipes above work great for removing or changing single commits, but what if you want to remove a file from all of history? Or what if you committed a password 20 commits ago and want to eradicate it? There are two ways of doing this: git filter-branch, which is black magic on steroids that deserves its own article, or the BFG repo cleaner, which is kinda black magic but much more usable.

The BFG also deserves its own article, and already has one! Go read that article for more information about this kind of scrubbing.

Undoing a rebase, reset or other rewriting

All this rebasing and resetting lets you fix up a lot of things. But what if you mess up while doing so? How do you go back to history that has been deleted? Once again, git has got you covered. As explained early on in this article, git keeps a log of everything you do to refs that change, and this includes rewriting history. So even after a rebase, git reflog knows what you were up to and can help you recover from even more mistakes. As long as a commit is in the reflog, or reachable from a commit in the reflog, git will not delete it during garbage collection and you have yet another safety net in case of mistakes.
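A minimal sketch of such a rescue, in a throwaway repository with a made-up file: a hard reset throws a commit away, and the reflog entry HEAD@{1} (where HEAD was one move ago) brings it back.

```shell
set -e
cd "$(mktemp -d)"                       # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo 1 > f.txt && git add f.txt && git commit -q -m "one"
echo 2 > f.txt && git commit -q -a -m "two"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD^               # oops: "two" is no longer on the branch
git reflog -n 3                         # the reflog still lists the lost commit
git reset -q --hard 'HEAD@{1}'          # back to where HEAD was before the reset
```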