Undoing all kinds of mistakes
Posted on Sun 20 March 2016 in Effective git usage
The most commonly asked questions in #git are variants of "I did something with my repository I should not have done, how do I fix this". Often the asker uses words like revert or undo very ambiguously and it takes a while to figure out what the desired result is. Here we discuss various ways of fixing mistakes and when to use them.
Make sure you understand these commands completely before running them
The git commands in this article are all about making things disappear, so they can be quite dangerous when used incorrectly. Make sure you understand what these commands do before running them, and run them in dry-run mode first if possible (many commands accept -n or --dry-run for that, but check the manual page, as -n means something else for some commands).
The recipes in the sections about undoing committed changes all work best if you have a clean worktree, meaning no staged or unstaged changes. Stash or commit them before doing cleanups.
Let's start with some definitions where git is very precise, but people are not. When asking questions about fixing your repository, it really helps if you use these terms correctly.
working directory
The working directory or worktree is the directory that holds both your .git directory and the checked out files.
commit
A commit is a full snapshot of all files in your repository. Git does not store diffs or patches, only full snapshots of files. How git manages not to require a very large amount of disk space for your repository is a subject for another article. For now it's just important to remember that all versions of all files ever committed can be retrieved by git.
branch, tag and ref
Usually when people refer to a branch, they mean a set of commits that follow each other. This is not wrong per se, but in the world of git, the meaning of branch is more subtle. A branch label, say 'master', is really nothing more than a name that points to a commit (called the tip of that branch). Branch names are not recorded in commits, and once merged they can be deleted without losing commits.
Such a label is called a ref (from reference), and git has many of them. The most important ones are heads (branches), which are refs that move, and tags, which are refs that don't move.
reflog
For refs that move (branches and HEAD), git keeps a log of when they moved and why. So for every commit, reset, merge and all other actions that move heads around, git tracks before and after states. Even when you change history and commits become unreachable, the reflog has your back. And because git's garbage collection also does not delete things that are still in the reflog, you can even undo things that would be very destructive otherwise. Git really doesn't like losing committed data.
commit-ish
Many git commands take a commit identifier as argument. While each commit has its own unique identifier, the sha1, a commit can usually be referred to in many ways: any ref that points to it, the commit tree walking tricks with ^ and ~ etc. 'commit-ish' means any of the ways you can refer to a commit.
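As a small illustration of what commit-ish means, the sketch below builds a throwaway repository (the directory, file name, tag name and commit messages are made up for this example) and asks git rev-parse to resolve several spellings of the same commit.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
for n in 1 2 3; do
  echo "$n" > file.txt
  git add file.txt
  git commit -q -m "commit $n"
done
git tag v0.1 HEAD~1              # a tag pointing at the second commit

# All of these resolve to the same (second) commit:
git rev-parse HEAD^              # first parent of HEAD
git rev-parse HEAD~1             # one step back along first parents
git rev-parse v0.1               # via the tag
git rev-parse ":/commit 2"       # youngest commit whose message matches
```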
index
The index or staging area is a feature that is unique to git and is part of what makes git so powerful at commit grooming and refining. As its name implies, the staging area is where you prepare the next commit. It is in essence a simple list of (filename, sha1) pairs that tell git which data objects should be part of the next commit.
When you git add a file, git actually already adds the file to the object database and adds the sha1 of the file to the index. This is what makes git add -p possible, but also why you have to git add the file after every change.
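To see this in action, here is a minimal sketch in a scratch repository (the file name and contents are invented for the example). It shows that the index records the object id of the version you added, and that a later edit does not change that entry until you add again.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
echo "first version" > notes.txt
git add notes.txt
# The second field of `git ls-files -s` is the object id stored in the index.
staged=$(git ls-files -s notes.txt | cut -d' ' -f2)
git cat-file -p "$staged"           # prints: first version

echo "second version" > notes.txt   # edit again, but do not add
git ls-files -s notes.txt           # the index still lists the first blob
git hash-object notes.txt           # the worktree file now hashes differently
```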
revert
As a noun, it means a commit that is the inverse of another commit, effectively undoing the changes of that commit. As a verb it means to create such a commit. This is the most misused word when talking about undoing changes, so please only use revert if you actually mean either of these two things.
reset
Reset can affect the working tree, the index and the commit graph. So it can mean three things, or a combination of two or three of those things. Have I told you yet that git can be confusing?
- When talking about the commit graph, to reset a branch means to point a branch label to another commit, in the context of undoing changes usually an older commit. This makes git forget that commits newer than the commit you reset to have ever been part of that branch.
- Reset can also manipulate the index (reset --hard, reset --mixed, reset -p). This does the inverse of git add, making the index resemble the last commit and not the worktree.
- And finally, reset can undo changes in the worktree (reset --hard).
As a verb, unfortunately it can also mean all of these things. So when talking about a reset, it's vital to say exactly which command you mean.
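The three meanings can be told apart by walking through the reset modes one at a time. The following is a sketch in a scratch repository with made-up file contents, not something to run in a real project.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo one > file.txt && git add file.txt && git commit -q -m "one"
echo two > file.txt && git add file.txt && git commit -q -m "two"

git reset -q --soft HEAD^     # commit graph only: the branch moves back
git status --short            # "M  file.txt": the change is still staged

git reset -q --mixed          # index as well: it now matches HEAD
git status --short            # " M file.txt": the change is only in the worktree

git reset -q --hard           # worktree too: everything matches HEAD
cat file.txt                  # one
```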
checkout
To check out something means to update the index and working tree with contents from a commit and update the HEAD pointer. The usual invocation of git checkout branchname makes the index and worktree match the tip of the branch and also updates HEAD to point to that tip.
Checkout can also be used to grab only parts of the contents of a commit. In this mode it does not update HEAD. And finally, because git users are lazy, git checkout can also be used to create new branches and check them out at the same time. This is what the -b option does.
merge
A merge commit is a commit with more than one parent. Nothing more, nothing less.
To merge means to create a merge commit, merging two or more branches into one. When merging, you will often need to resolve conflicts between these branches.
rebase
Rebasing commits copies them to another place in the commit graph. See the rebasing illustrated article for more info on rebase.
Fixing up uncommitted changes
By far the easiest undo to accomplish is undoing uncommitted changes. But even here git is surprisingly flexible, allowing you to decide which local changes to keep (and where to keep them) and which not to.
Getting rid of all local uncommitted changes
Sometimes you just really want to say 'damn it, I did it all wrong, let's get rid of this mess' and undo all uncommitted changes and hang your head in shame. Your friend in this case is git reset --hard which resets the index and the worktree to the state of the last commit.
And if you also want to get rid of untracked files, git clean -di (or its more destructive options, -f and -x) will help you clean up even more.
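A minimal sketch of that cleanup, using a scratch repository with invented file names. It uses the non-interactive -f instead of -i so it can run unattended, and previews the deletion with -n first.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo committed > tracked.txt && git add tracked.txt && git commit -q -m "initial"

echo "bad edit" > tracked.txt     # unstaged change to a tracked file
echo scratch > untracked.txt      # a file git does not know about

git reset -q --hard               # tracked.txt is back to its committed state
git clean -n -d                   # dry run: only lists what would be removed
git clean -f -d                   # actually remove untracked files and directories
```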
Undoing selected local changes
While it's fun to tableflip all your changes away, usually you only want to undo some of your local changes while preserving the rest. If you've already git added the changes, first do a git reset -- path/to/file for the files you want to change, to make git forget that you added some changes to the index.
If you want to undo all changes to a certain file, you can simply check the file out again: git checkout -- path/to/file. This also works to 'undelete' a tracked file that you deleted.
To only undo some changes to a file, you can still use checkout, but now with the -p flag: git checkout -p -- path/to/file. Like git add -p, it will show each change and ask you what to do with it.
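For the whole-file case, a short sketch with two hypothetical files in a scratch repository: one is edited, one is deleted, and a single checkout restores both.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo good > a.txt && echo good > b.txt
git add a.txt b.txt && git commit -q -m "initial"

echo oops > a.txt                 # unwanted edit
rm b.txt                          # accidental deletion
git checkout -- a.txt b.txt       # restore both files from the index
```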
Undoing staged changes
If you've git added a change, or an entire new file, you can simply git reset filename to undo the adding, without touching any history or your worktree. If you don't want to undo the adding, but want to add more changes to the same file, simply git add them and git will update the index.
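The following sketch (scratch repository, invented file name) shows that unstaging leaves the worktree alone: the edit survives, it is just no longer part of the next commit.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo v1 > file.txt && git add file.txt && git commit -q -m "v1"

echo v2 > file.txt && git add file.txt   # stage a change
git reset -q -- file.txt                 # unstage it again
git status --short                       # " M file.txt": modified, not staged
cat file.txt                             # v2
```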
Moving changes to a different branch
Another common issue is finding out you're on the wrong branch and wanting to move your changes to that other branch. If you're lucky, you can simply check out that branch (git checkout branchname if the branch already exists, git checkout -b branchname for a new one). However, if your changes conflict with that branch, you can first git stash your changes, do the checkout and git stash apply, followed by the normal conflict resolution.
I don't like git stash though, so I take a different approach. Which is not actually that different from what git stash does, except with a whole lot less magic and no abuse of the reflog.
First I tag where I'm currently at so I can easily go back: git tag backup. Then I git commit my changes in one or more commits. If there are also changes I do not want to commit, I'll reset them out of the way. Once that's done, I'll git checkout develop to go to the other branch. I then git cherry-pick backup..master to cherry-pick the new commits onto that branch, solving any merge conflicts that may arise. Then I git checkout master and git reset --hard backup to point master to where it should be. Now we can git tag -d backup and everything is squeaky clean again.
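Put together, the tag-based recipe could look like the sketch below. It runs in a scratch repository; the file names and commit messages are invented, and the branch names master and develop are simply the ones used in the text.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name for this demo
git config user.name "Example" && git config user.email "example@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "base"
git branch develop

echo feature > feature.txt          # work started on the wrong branch
git tag backup                      # remember where master should stay
git add feature.txt && git commit -q -m "feature"

git checkout -q develop
git cherry-pick backup..master      # copy the new commit(s) onto develop
git checkout -q master
git reset -q --hard backup          # master is back where it was
git tag -d backup
```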
Recovering uncommitted files after reset
After working with git for a while, most people know that once a file has been committed, git will not easily lose it. What many people do not know is that even just git add is enough to make git remember the version of the file you are adding, even when you make more changes and do another git add. And even when you git reset --hard before committing!
The trick is that git add actually already creates a git object for you and puts its sha1 in the index. When you add again, or when you reset, that blob becomes a so-called dangling blob and git gc will eventually clean it up. But until it has done so, git fsck will find it and tell you the sha1s of all dangling blobs. You can then use git show to recover them, or use git fsck --lost-found to recover them all at once.
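A sketch of such a rescue in a scratch repository (file name and contents are made up). After the hard reset the added version is gone from the index and worktree, but git fsck --lost-found writes the contents of each dangling blob to a file under .git/lost-found/other/.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo base > file.txt && git add file.txt && git commit -q -m "base"

echo "precious work" > file.txt
git add file.txt                    # the blob is now in the object database
git reset -q --hard                 # index and worktree forget about it

git fsck --lost-found               # reports a line like: dangling blob <sha1>
for blob in .git/lost-found/other/*; do
  cat "$blob"                       # inspect each recovered blob
done
```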
Fixing up committed changes without rewriting history
Have a clean worktree
The recipes in this section assume you have a clean worktree. Some may cause you to lose uncommitted changes, so make sure you commit or stash any work in progress.
Once a change has been committed, there are two general ways of undoing the change: rewriting history, making it look like the change never happened, or creating changes that invert your change. While it's perfectly safe to change history you have never pushed, or to clean up/alter history that has not yet been merged in main branches, things become more complicated when changing for example the master branch of a popular project after pushing it to a central repository, as others may have based new work on it.
If you change published history that other people have based their work on, they also need to alter their histories. Please be aware of this when altering such history. To help those people, we start with fixes that do not require any modification of the commit history.
Undoing an older commit
To make a commit that inverts all the changes of another commit, you use git revert. For example, to revert the second to last commit in the graph above, you could do git revert HEAD^.
And since a revert is just a simple commit, it can also be reverted, making the changes appear again. This can be useful if you only had to revert changes temporarily while preparing for them to work. In the graph above, git revert HEAD would do the trick.
Reverting many commits
You can revert many commits in a single command. For example, should you decide that everything between version 0.1 and 0.2 was actually a big mistake, you can git revert v0.1..v0.2. If you want to make only one commit containing all of the reverts, you can git revert -n. This will revert the changes in the worktree but it will not create a new commit. That way you can do final tweaking and commit the results yourself.
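The single-commit variant might look like this sketch. The scratch repository, file contents and commit messages are invented; the tag names are the ones from the text.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo a > file.txt && git add file.txt && git commit -q -m "a"
git tag v0.1
echo b >> file.txt && git commit -q -am "b"
echo c >> file.txt && git commit -q -am "c"
git tag v0.2

git revert -n v0.1..v0.2      # undo "b" and "c" in index and worktree only
git commit -q -m "Revert everything between v0.1 and v0.2"
git log --format=%s           # the old commits are still in history
```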
Reverting to a specific commit
If you wish to make the next commit look exactly like another commit, you can of course revert until you reach its state. But that may be tricky, or even impossible if that commit is not a direct ancestor of the current HEAD.
But fear not, git is here to help you out. Remember that git does not track changes, instead each commit is a full snapshot of your files. So let's not try to undo changes made, but just git checkout commit-ish -- . and your tree now looks exactly like the commit you specified, and you can commit it.
There's just one caveat: if there are files in your current commit that are not in the other commit, they will be kept in their current state. So a more complete version of this recipe is: git rm -rf :/ && git checkout commit-ish -- :/
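Here is a sketch of the complete version in a scratch repository. The tag name target, the file names and the commit messages are hypothetical; the point is that a file added after the target commit does not survive.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo old > kept.txt && git add kept.txt && git commit -q -m "old state"
git tag target
echo new > kept.txt && echo extra > added-later.txt
git add kept.txt added-later.txt && git commit -q -m "newer state"

git rm -q -rf :/                # stage removal of everything currently tracked
git checkout target -- :/       # bring back exactly what the target commit had
git commit -q -m "Return to the state of target"
```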
If the commit you want to use as the source of truth is on another branch (and if it isn't, you can simply create a branch), you can also trick git merge into doing this. By using the 'ours' merge strategy, it will make a merge commit that has multiple parents, but instead of merging the contents of those commits and their merge base, it simply discards the contents of the other commits and keeps the contents of the current branch.
So if you want to make master look exactly like develop, that would look as follows:
$ git checkout develop
$ git merge -s ours master
$ git checkout master
$ git merge --ff-only develop
If the commit you want to revert to is not at the tip of a branch, you can simply create a temporary branch:
$ git checkout -b temp-branch 03406c86
$ git merge -s ours master
$ git checkout master
$ git merge --ff-only temp-branch
Reverting a single file
The above recipes are all very useful if you want to revert entire commits. But what if you just want to revert parts of it? To revert the edits to a single file, you can use a combination of diff and apply, applying the patch in reverse with -R: git diff commit-ish^..commit-ish -- file | git apply -R
And if you want to make a file look the way it looked in another commit, you can simply check the file out: git checkout commit-ish -- file. Use checkout -p to decide hunk-by-hunk whether to retain your current version or use the other version.
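A sketch of the diff and apply combination, in a scratch repository with two invented files. The commit touches both files, and only the edit to one of them is undone. Note the -R, which makes git apply run the patch backwards; without it the patch would be applied forwards a second time.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
printf 'a\n' > one.txt && printf 'x\n' > two.txt
git add one.txt two.txt && git commit -q -m "base"
printf 'a\nb\n' > one.txt && printf 'x\ny\n' > two.txt
git commit -q -am "edit both files"

# Undo that commit's edit to one.txt only; two.txt keeps its change.
git diff HEAD^..HEAD -- one.txt | git apply -R
git status --short              # " M one.txt"
```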
Stuck in a conflict
Commands that can result in conflicts, such as merge, cherry-pick and rebase, all use the same strategy for solving the conflicts: use the built-in algorithms to automatically resolve them, and if those fail the user gets to pick up the pieces and solve the conflict manually.
The commands keep some state around when they do so, which is incredibly useful if you don't want to or cannot resolve the conflict. You can then simply do git merge --abort (or a similar incantation for rebase and cherry-pick) and git brings you back to the state where you were before attempting whatever you did that caused the conflict.
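To illustrate, this sketch provokes a conflict on purpose in a scratch repository (branch contents are invented) and then backs out of it.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name for this demo
git config user.name "Example" && git config user.email "example@example.com"
echo base > file.txt && git add file.txt && git commit -q -m "base"
git checkout -q -b develop
echo develop > file.txt && git commit -q -am "develop edit"
git checkout -q master
echo master > file.txt && git commit -q -am "master edit"
before=$(git rev-parse HEAD)

git merge develop || true     # conflict: both sides changed the same line
git status --short            # "UU file.txt"
git merge --abort             # back to the state before the merge
```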
Reverting a merge
Every commit can be reverted, even a merge commit. But reverting a merge commit has one really big downside, which I will illustrate with the graph below. There are 2 branches: master and develop, and develop got merged into master. After this both master and develop have received new commits.
When you git revert -m 1 HEAD^ (for a merge commit, git needs -m to know which parent to treat as the mainline), git does not undo the merge, but only its effect. So all changes from the develop branch disappear. If you now git merge develop again, they do not come back either: only the changes from the last two commits on the develop branch are applied!
Why is this? Well, when git does a merge, it does a 3-way merge of the content of the current branch, the branch merged in and their common ancestor. For the second merge, the grandparent of the tip of the 'develop' branch is now that common ancestor. So all git sees is that in the current branch a bunch of changes were made, it does not see that these are undoing older commits. It also does not see those older commits, as it does not look further back than the merge base.
So all in all, reverting a merge is not always a good idea. If you still really want to make that merge go away and do not mind rewriting history, there is another recipe for you further below.
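The effect can be reproduced in a few lines. This is a simplified sketch in a scratch repository: unlike the situation described above, neither branch gets new commits after the merge, so the second merge has nothing to do at all. File names and messages are invented.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name for this demo
git config user.name "Example" && git config user.email "example@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "base"
git checkout -q -b develop
echo feature > feature.txt && git add feature.txt && git commit -q -m "feature"
git checkout -q master
git merge -q --no-ff -m "Merge develop" develop

git revert --no-edit -m 1 HEAD    # undo the effect of the merge
git merge develop                 # nothing to merge: develop is already in history
ls                                # feature.txt is still gone
```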
Stop tracking a file
If you want a file to no longer exist, you git rm it. This deletes it from disk and adds the deletion to the index, ready for the next commit. But if you do want to keep it on disk, just not in the repository, you can git rm --cached it. This only stages the deletion but leaves the file untouched.
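A short sketch of the --cached variant in a scratch repository (the file name local.cfg is made up):

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo secret > local.cfg && git add local.cfg && git commit -q -m "oops"

git rm -q --cached local.cfg    # stage the deletion, keep the file on disk
git status --short              # "D  local.cfg" and "?? local.cfg"
git commit -q -m "Stop tracking local.cfg"
```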
There is one big caveat here though: if you commit this deletion, and then pull that change into another repository, for instance, to deploy your changes, the file will be deleted from disk there!
To get the file back on local disk, you can use git log to find the last copy of it, and git show to get it back.
$ file=myfile.txt
$ commit=$(git log -1 --format=%H -- "$file")
$ git show "$commit^:$file" > "$file"
Rewriting history to make mistakes disappear
Have a clean worktree
The recipes in this section assume you have a clean worktree. Some may cause you to lose uncommitted changes, so make sure you commit or stash any work in progress.
While some people consider it a thoughtcrime to even think about changing history, sometimes you really need to be Winston and make sure things have never happened. Whether you've committed passwords or simply want to clean up before merging, git has you covered.
Before you go all minitrue (ok... that's enough 1984 puns), please do think about the people you are collaborating with in the repository you are manipulating. While it's perfectly safe to alter history you have never pushed, or to clean up/modify pull requests that have not yet been merged, things become more complicated when changing for example the master branch of a popular project after pushing it to a central repository, as others may have based new work on it.
If you change published history that other people have based their work on, they also need to alter their histories. This can be a complicated, error-prone task and you should really avoid forcing others to do so.
Don't use git push -f
After changing history that you have already pushed, you will notice that git push now fails. This is a failsafe mechanism to avoid losing data. But since in this case you want to lose data, you will need to tell git to accept this. The common way is to do git push -f, but that's actually quite bad, as it overwrites whatever is on the remote, including commits others pushed in the meantime. A safer alternative is git push --force-with-lease, which makes sure nobody else added commits on top of what you altered. And just to avoid typing all those characters, you can git config alias.force-push 'push --force-with-lease' and then simply use git force-push.
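The sketch below shows the lease doing its job. Everything in it is invented for the demonstration: a local bare repository stands in for the central server, and two clones named alice and bob play the collaborators. Alice rewrites a pushed commit after Bob has pushed new work, so her forced push is refused.

```shell
# Scratch setup, purely illustrative.
work=$(mktemp -d) && cd "$work"
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/master
git clone -q remote.git alice 2>/dev/null && cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice" && git config user.email "alice@example.com"
echo 1 > file.txt && git add file.txt && git commit -q -m "first"
git push -q -u origin master

git clone -q ../remote.git ../bob
git -C ../bob config user.name "Bob"
git -C ../bob config user.email "bob@example.com"
(cd ../bob && echo 2 >> file.txt && git commit -q -am "bob's work" && git push -q)

git commit -q --amend -m "first, reworded"   # Alice rewrites pushed history
if git push --force-with-lease origin master 2>/dev/null; then
  lease=accepted
else
  lease=rejected          # the remote moved since Alice last saw it
fi
echo "$lease"
```

A plain git push -f at the same point would have silently discarded Bob's commit.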
Changing the latest commit
The latest commit is the easiest to change. Just make more changes and git commit --amend. Of course this doesn't actually change the commit, but creates a new one and moves the refs for HEAD and the current branch there.
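That an amend creates a new commit rather than editing the old one is easy to check. A sketch in a scratch repository, with invented contents:

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo hello > file.txt && git add file.txt && git commit -q -m "Add greeting"
old=$(git rev-parse HEAD)

echo "hello world" > file.txt
git commit -q -a --amend --no-edit   # fold the fix into the latest commit
new=$(git rev-parse HEAD)

echo "$old -> $new"                  # two different commit ids
git cat-file -t "$old"               # the old commit object still exists
```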
Changing an older commit
It is only slightly harder to change an older commit. Since you cannot amend the commit directly, you will need to make a new commit for your changes. You can remove files you mistakenly added, fix typos, even remove changes you don't want after all. Just make your changes and create a new commit from them.
You can then use the interactive rebase tool to squash these changes into the existing commit. To do this interactive rebase, first use git log to find the sha1 of the commit you wish to change. If we assume for now that that is commit 1f6a83a, you would run git rebase -i 1f6a83a^ and you will be presented with a text editor with contents that look somewhat like this:
pick 1f6a83a Awesome new feature: sine waves
pick fa3b29e Fix bug in noise generator
pick 6995214 Export to soundcloud
pick 0341984 This should be in the sine waves commit
This is called the worksheet and with it, you can tell rebase exactly what to do. In this case we want to squash the last commit into the first one, so we move the commit and change pick to squash.
pick 1f6a83a Awesome new feature: sine waves
squash 0341984 This should be in the sine waves commit
pick fa3b29e Fix bug in noise generator
pick 6995214 Export to soundcloud
Save the worksheet, close your editor and git rebase will do its magic. If you do git log -p, you will see that your commit is now gone, its effects having been moved to the commit where they should have been.
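Editing the worksheet by hand cannot be shown in a script, so the sketch below swaps in a related technique: git commit --fixup marks the new commit as belonging to an older one, and git rebase -i --autosquash arranges the worksheet accordingly. Setting GIT_SEQUENCE_EDITOR=true accepts that worksheet unchanged. Unlike squash, a fixup discards the message of the folded-in commit. The repository and file names are invented; the commit messages echo the ones above.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "Initial commit"
echo sine > sine.txt && git add sine.txt
git commit -q -m "Awesome new feature: sine waves"
target=$(git rev-parse HEAD)
echo noise > noise.txt && git add noise.txt
git commit -q -m "Fix bug in noise generator"

echo "more sine" >> sine.txt
git commit -q -a --fixup "$target"   # subject starts with "fixup! "

# Accept the pre-arranged worksheet instead of opening an editor.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$target"^
git log --format=%s
```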
Making the latest commit or commits disappear
Making commits disappear is easy. git reset --hard HEAD^ makes the last commit go away. git reset --hard HEAD~5 does the same for the last 5 commits. Both also make the changes disappear from your index and worktree. If you do want to keep the changes in your worktree, for instance because you like the changes but the commits were all messy and you want to redo them using git add -p, don't use a hard reset but a mixed one (git reset HEAD~5, --mixed being the default), which leaves the changes unstaged in your worktree. git reset --soft goes one step less far and also keeps the changes staged in the index.
Hard resets also work really well to undo merges that shouldn't have happened. If you git pull and notice that it does a merge you did not expect, you can do a hard reset to make the merge disappear (and then think about how to actually integrate your changes).
Making an older commit disappear
As was the case for changing a commit, making older commits go away is slightly trickier, but not much. Again do a git rebase -i to the parent of the commit you want to eradicate. In the worksheet, you simply delete the lines corresponding to commits that should go away, and git will make it happen.
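For a single unwanted commit there is a non-interactive way to get the same result, shown here as an alternative to editing the worksheet: git rebase --onto replays everything after the bad commit onto its parent. The sketch uses a scratch repository with invented commits.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
for msg in one two three; do
  echo "$msg" > "$msg.txt"
  git add "$msg.txt"
  git commit -q -m "$msg"
done

bad=$(git rev-parse HEAD~1)            # the commit "two" has to go
git rebase -q --onto "$bad"^ "$bad"    # replay what follows it onto its parent
git log --format=%s                    # three, one
```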
Moving changes to a different branch
As we saw earlier in this article, git doesn't frown upon wanting to move changes to a different branch. We saw how to do this for uncommitted changes, but for committed changes it is really not that much different.
Start with checking out the branch that the changes should have been on. Then cherry-pick the commits that you want to have on this branch. Now go back to the branch they should not have been on and use the recipes above to make the commits disappear. Either a hard reset or an interactive rebase, depending on where the commits are in your history.
Making (parts of) files disappear from all of history
The recipes above work great for removing or changing single commits, but what if you want to remove a file from all of history? Or committed a password 20 commits ago and want to eradicate it? There are two ways of doing this: git filter-branch, which is black magic on steroids that deserves its own article, or the BFG repo cleaner, which is kinda black magic but much more usable.
The BFG also deserves its own article, and already has one! Go read that article for more information about this kind of scrubbing.
Undoing a rebase, reset or other rewriting
All this rebasing and resetting lets you fix up a lot of things. But what if you mess up while doing so? How do you go back to history that has been deleted?
Once again, git has got you covered. As explained early on in this article, git keeps a log of everything you do to refs that change, and this includes rewriting the history. So even after a rebase, git reflog knows what you were up to and can help you recover from even more mistakes. As long as a commit is in the reflog, or reachable from a commit in the reflog, git will not delete it during garbage collection and you have yet another safety net in case of mistakes.
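As a closing sketch, here is a hard reset being undone with the help of the reflog. The scratch repository and its commits are invented for the example.

```shell
# Scratch repository, purely illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo one > file.txt && git add file.txt && git commit -q -m "one"
echo two > file.txt && git commit -q -am "two"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD^         # "two" is no longer on the branch
git reflog                        # HEAD@{1} still records where HEAD was
git reset -q --hard 'HEAD@{1}'    # move the branch back there
cat file.txt                      # two
```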