As you may have heard by now, git stores its commits and other data in a directed acyclic graph of objects. What this means is that each commit is recorded as a piece of data containing an identifier (a sha1), a pointer to a tree object (another sha1), a log message, author and committer info, and most importantly for this article: information about its parents. A commit can have zero parents (root commit), one parent (regular old commit), or more than one parent (a merge commit).
Using this information about parents, you can describe each commit in relation to its ancestors, and with some of git's plumbing commands you can put those relationships to many uses.
Many git commands, such as git show and git checkout, accept a "commit-ish" (something that looks like a commit) as an argument to specify the commit to act on. A commit-ish can be the sha1 of a commit, a branch, tag or other ref that points to that commit, or one of the expressions described in this article.
When experimenting with these things, the git rev-parse command is incredibly useful, as it can tell you whether you actually have a commit-ish or not:
$ git rev-parse --verify HEAD
6dd661125d2715b36ca2ea2b32e3c5b7838eff58
$ git rev-parse --verify nonsense
fatal: Needed a single revision
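In a script, the same check can be wrapped in a small helper. What follows is only a minimal sketch, run in a throwaway repository; the function name is_commitish is made up for this example. The ^{commit} suffix asks git to peel the name to a commit, so a name that resolves to something else (a blob, say) is rejected as well.

```shell
set -e
# Throwaway repository with a single empty commit (illustration only)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "first commit"

# is_commitish is a hypothetical helper name, not a git command.
# --quiet suppresses the "fatal: Needed a single revision" message,
# so only the exit status tells us what happened.
is_commitish() {
    git rev-parse --verify --quiet "$1^{commit}" >/dev/null
}

if is_commitish HEAD; then head_ok=yes; else head_ok=no; fi
if is_commitish nonsense; then nonsense_ok=yes; else nonsense_ok=no; fi
echo "HEAD: $head_ok, nonsense: $nonsense_ok"
```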
Commit graph walking
One way of specifying a commit is taking another commit and telling git to walk the history graph in a specific way. For example, if you want to do an interactive rebase of the last 5 commits, you can tell git that in a concise way:
$ git rebase -i HEAD~5
The tilde and number tell git: "using the first parent of each commit, walk 5 commits backwards. I mean that commit."
Notice that I said "using the first parent". If you want to use a different parent, you can use the caret to tell git to take a side street:
$ git show HEAD~3^2~2
This one means: "walk 3 commits back using the first parent of each commit. That is a merge commit, so go to its second parent (the top of the branch you merged in) and walk back two more commits using the first parent of each".
Let's have an illustration to make it all a bit clearer.
[Illustration: the path that master~3^2~2 takes through the commit graph]
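The tilde and caret are easy to try out in a scratch repository. The sketch below is an assumption-laden toy: it builds a small history out of empty commits (A, B, a merge M and C on master; F1 and F2 on a branch called feature) and then resolves two expressions so you can compare them with the commits they should reach.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "A"
git checkout -q -b feature
git commit -q --allow-empty -m "F1"
f1=$(git rev-parse HEAD)
git commit -q --allow-empty -m "F2"
git checkout -q master
git commit -q --allow-empty -m "B"
b=$(git rev-parse HEAD)
git merge -q --no-ff -m "M: merge feature" feature
git commit -q --allow-empty -m "C"

# HEAD~1 is the merge M; ^2 takes its second parent (F2); ~1 steps back to F1
side=$(git rev-parse HEAD~1^2~1)
# Staying on first parents all the way leads to B instead
main=$(git rev-parse HEAD~2)

git log --graph --oneline
echo "side street ends at: $side"
echo "main road ends at:   $main"
```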
These exact paths through the commit graph are incredibly useful, but can be a bit unwieldy. Sometimes you just need a general indication of how big the 'distance' between two commits is. A prime example of this is in build systems that use git information to create version numbers.
Even git itself uses this. If you build git from a git checkout, the version number is based on the git-describe output.
$ git clone https://github.com/git/git
Cloning into 'git'...
[...]
$ cd git
$ make git
GIT_VERSION = 2.8.0.rc3.12.g047057b
    * new build flags
    * new prefix flags
[...]
    AR xdiff/lib.a
    LINK git
$ git describe
v2.8.0-rc3-12-g047057b
$ git rev-list --count v2.8.0-rc3..
12
$ git rev-parse --short HEAD
047057b
$ ./git --version
git version 2.8.0.rc3.12.g047057b
So what does git describe do? It walks the commit history backwards to find the nearest annotated tag; in the case above that is v2.8.0-rc3. It then appends the number of commits that have been added since that tag and, prefixed with a g, an abbreviation of the sha1 of the commit you're looking at. That way you still uniquely identify the commit, but also put it in relation to the latest released version. And you can even feed the output back into git:
$ git rev-parse v2.8.0-rc3-12-g047057b
047057bb4159533b3323003f89160588c9e61fbd
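To see how the pieces of the git describe output fit together, you can rebuild the string by hand. This is a rough sketch in a throwaway repository, not how git implements it; the tag name v1.0 and the commit messages are invented, and the abbreviation length is pinned to 7 on both sides so the two strings are comparable.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "release"
git tag -a v1.0 -m "version 1.0"    # annotated tag, the kind describe uses by default
git commit -q --allow-empty -m "fix 1"
git commit -q --allow-empty -m "fix 2"

described=$(git describe --abbrev=7)

# Rebuild the same string from its three parts
tag=$(git describe --abbrev=0)                # nearest tag only
count=$(git rev-list --count "$tag"..HEAD)    # commits added since that tag
short=$(git rev-parse --short=7 HEAD)         # abbreviated sha1 of HEAD
rebuilt="$tag-$count-g$short"

echo "git describe: $described"
echo "rebuilt:      $rebuilt"
```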
Because git only stores pointers to parent commits, and not child commits, you can't easily answer questions like "what are my child commits" without walking the history graph.
Git can help you with this history walking in a few ways.
rev-list is one of git's plumbing subcommands. It exposes the revision listing algorithm used by e.g. git log, so that you can use it in scripts. It can also tell you who your children are:
$ git rev-list --children HEAD~10..
This walks the history graph 10 commits deep and reports all commits and their children. Its output is not pretty, as it is generally meant to be fed to scripts.
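Each line of that output is a commit followed by its children, which makes it easy to pick apart with awk. Here is a minimal sketch that looks up the children of one commit in a small linear scratch history; the commit messages and variable names are just for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "first"
first=$(git rev-parse HEAD)
git commit -q --allow-empty -m "second"
second=$(git rev-parse HEAD)
git commit -q --allow-empty -m "third"

# Each output line looks like "<commit> <child> <child> ...".
# For the commit we ask about, print every field after the first.
children=$(git rev-list --children HEAD |
    awk -v commit="$first" '$1 == commit { for (i = 2; i <= NF; i++) print $i }')

echo "children of $first:"
echo "$children"
```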
Generally, direct children aren't the most useful information about commits. More common are questions like "in which release did this feature appear?" For example, git at some point learned to chdir into a directory before doing anything if you run git -C /some/path .... A quick git blame shows that this was added by commit 44e1e4d6. And according to git tag --contains 44e1e4d6, this commit first appeared in git 1.8.5 rc0.
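If you only want the earliest release, you can sort the tags as version numbers and take the first one. The sketch below does that in a scratch repository with three made-up tags (v1.0, v1.1 and v1.2); treat it as a starting point rather than a complete answer.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "initial"
git tag v1.0
git commit -q --allow-empty -m "add feature"
feature=$(git rev-parse HEAD)
git tag v1.1
git commit -q --allow-empty -m "more work"
git tag v1.2

# All tags whose history includes the commit, sorted as version numbers
containing=$(git tag --sort=version:refname --contains "$feature")
first_release=$(printf '%s\n' "$containing" | head -n 1)

echo "first release with the feature: $first_release"
```

One caveat: by default, version sorting places pre-release tags such as an -rc0 after the final release of the same version, so on a real project you may want to look at the versionsort.suffix configuration before trusting the first line.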
So far we've talked about relationships between two commits where one is a descendant of the other, but what to do if that is not the case? Is it even useful to talk about relationships between commits on diverging branches?
What would be useful things to say about the relationship between master and feature here? How about the fork point?
$ git merge-base master feature
d05ef5de47773d03e9d09641209121591a6b37c8
When git merges multiple commits, it only looks at the commits being merged and their merge base to determine how to merge the content. The merge base is also used by git rebase to guess what to rebase if you don't give it specific parameters.
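Once you have the merge base, counting how far each branch has moved away from it is a one-liner. The following sketch sets up two diverging branches in a scratch repository (the empty commits and their messages are invented) and reports the fork point plus the number of commits on each side.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "common base"
base=$(git rev-parse HEAD)
git checkout -q -b feature
git commit -q --allow-empty -m "feature work"
git checkout -q master
git commit -q --allow-empty -m "master work"

fork=$(git merge-base master feature)
# Commits each branch has added since the fork point
on_feature=$(git rev-list --count "$fork"..feature)
on_master=$(git rev-list --count "$fork"..master)

echo "fork point: $fork"
echo "feature: $on_feature commit(s), master: $on_master commit(s) since then"
```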
But you don't always need to know the exact merge base; sometimes you just want to know whether one commit is an ancestor of another or not. For example, in perl.git, we allow people to push -f to overwrite personal branches, but not the 'blead' branch (our equivalent of master). So we cannot set receive.denyNonFastForwards and have to solve this in an update hook instead, based on the example hook shipped with git. The key part of that hook is:
case "$refname","$newrev_type" in
    # Personal branches (the user's login name is a path component): anything goes
    refs/heads/*/$USER/*,commit|refs/heads/$USER/*,commit)
        ;;
    # Any other branch: reject the push unless the old tip is an ancestor of the new one
    refs/heads/*,commit)
        if [ "$oldrev" != "0000000000000000000000000000000000000000" ] && ! git merge-base --is-ancestor "$oldrev" "$newrev"; then
            echo "*** Non-fast-forward push to $refname rejected, you should pull first" >&2
            exit 1
        fi
        ;;
esac
This will refuse non-fast-forward pushes to all branches that do not have the user's login name as a path component.
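The heart of that check is the exit status of git merge-base --is-ancestor, which you can try outside of a hook as well. Below is a small sketch in a scratch repository; is_fast_forward is a hypothetical helper name, not something that ships with git.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "one"
old=$(git rev-parse HEAD)
git commit -q --allow-empty -m "two"
new=$(git rev-parse HEAD)

# Hypothetical helper: succeeds when moving a ref from $1 to $2
# would be a fast-forward, i.e. when $1 is an ancestor of $2.
is_fast_forward() {
    git merge-base --is-ancestor "$1" "$2"
}

if is_fast_forward "$old" "$new"; then forward=yes; else forward=no; fi
if is_fast_forward "$new" "$old"; then backward=yes; else backward=no; fi
echo "old -> new fast-forward: $forward"
echo "new -> old fast-forward: $backward"
```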
Another useful thing to know about these diverging branches is whether they have any changes in common, which can happen if commits get cherry-picked from one branch to another. The cherry-picked commits will have different commit IDs, even if the patch text and log message are the same, because they have different parents.
Git can also calculate a 'patch id' based on just the patch content. This is used by the git cherry command to show you which commits have not yet been cherry-picked or otherwise applied to another branch.
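To watch git cherry at work, here is a minimal sketch in a scratch repository. The file names and commit messages are invented: the feature branch gets two commits, one of which is cherry-picked onto master, and git cherry then marks that one with a minus sign and the other with a plus sign.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b feature
echo one > one.txt
git add one.txt
git commit -q -m "add one"
picked=$(git rev-parse HEAD)
echo two > two.txt
git add two.txt
git commit -q -m "add two"
unpicked=$(git rev-parse HEAD)

git checkout -q master
echo other > other.txt
git add other.txt
git commit -q -m "unrelated master work"
git cherry-pick "$picked" >/dev/null

# "-" marks feature commits whose patch id already exists on master,
# "+" marks the ones that have not been applied there yet
report=$(git cherry master feature)
echo "$report"
```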