Describing the relationship between commits

Posted on Sun 20 March 2016 in Effective git usage

As you may have heard by now, git stores its commits and other data in a directed acyclic graph of objects. What this means is that each commit is recorded as an object that is identified by a sha1 and contains a pointer to a tree object (another sha1), a log message, author and committer info, and most importantly for this article: information about its parents. A commit can have zero parents (a root commit), one parent (a regular old commit), or more than one parent (a merge commit).
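To see these fields for yourself, git cat-file -p prints the raw commit object. The following is a minimal sketch, not part of the original article: it builds a throwaway repository (the identity, branch names and commit messages are made up for illustration) and counts the "parent" header lines of a root commit, a regular commit and a merge commit.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the identity below is a placeholder for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "root"
git commit -q --allow-empty -m "second"
git checkout -q -b side HEAD~1
git commit -q --allow-empty -m "side work"
git checkout -q master
git merge -q --no-ff -m "merge side" side

# Count the "parent" lines in the header of a raw commit object.
# sed stops at the first blank line so the log message is never scanned.
count_parents() {
    git cat-file -p "$1" | sed '/^$/q' | grep -c '^parent ' || true
}

root_parents=$(count_parents master~2)
regular_parents=$(count_parents master~1)
merge_parents=$(count_parents master)
echo "root: $root_parents, regular: $regular_parents, merge: $merge_parents"
# prints "root: 0, regular: 1, merge: 2"
```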

Using this information about parents, you can describe each commit in relation to its parents, and with some of git's plumbing commands you can put that information to many uses.

Commit-ish and git rev-parse

Many git commands, such as git show and git checkout, accept a "commit-ish" (something that looks like a commit) as an argument to specify the commit to act on. A commit-ish can be the sha1 of a commit, a branch, tag or other ref that points to that commit, or any of the expressions described in this article.

When experimenting with these things, the git rev-parse command is incredibly useful, as it can tell you whether you actually have a commit-ish or just some random string:

$ git rev-parse --verify HEAD
6dd661125d2715b36ca2ea2b32e3c5b7838eff58
$ git rev-parse --verify nonsense
fatal: Needed a single revision
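In scripts you usually want the exit status rather than the error message. The sketch below wraps git rev-parse in a hypothetical helper, is_commitish (a name made up for this example), that appends ^{commit} so that objects which exist but are not commits, such as trees, are rejected as well. The repository is a throwaway one created just for the demonstration.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the identity below is a placeholder for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "first"

# Hypothetical helper: succeeds only if the argument names a commit
# or something that can be peeled to one (a branch, a tag, ...).
is_commitish() {
    git rev-parse --verify --quiet "$1^{commit}" >/dev/null 2>&1
}

# A tree is a valid object, but not a commit-ish.
tree=$(git rev-parse 'HEAD^{tree}')

for candidate in HEAD master nonsense "$tree"; do
    if is_commitish "$candidate"; then
        echo "$candidate: commit-ish"
    else
        echo "$candidate: not a commit-ish"
    fi
done
```

HEAD and master are reported as commit-ish; the random string and the tree id are not.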

Commit graph walking

One way of specifying a commit is taking another commit and telling git to walk the history graph in a specific way. For example, if you want to do an interactive rebase of the last 5 commits, you can tell git that in a concise way:

$ git rebase -i HEAD~5

The tilde and number tell git: "using the first parent of each commit, walk 5 commits backwards. I mean that commit."

Notice that I said "using the first parent". If you want to use a different parent, you can use the caret to tell git to take a side street:

$ git show HEAD~3^2~2

This one means: "walk 3 commits back using the first parent of each commit. That is a merge commit, so go to its second parent (the top of the branch you merged in) and walk back two more commits using first parents".

Let's have an illustration to make it all a bit clearer.

Check out master~3, master~4, master~3^2, and master~3^2~2 to see the meaning of all these things illustrated.
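If you want to try this without a real project at hand, here is a small sketch that builds its own history. The graph is made up for this example (it is not the one from the illustration): master contains A, B, a merge M and C, and the merged branch "side" contains S1 and S2 on top of A. The subject helper is a hypothetical convenience that prints a commit's log subject.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the identity below is a placeholder for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# History:  A - B ------- M - C   (master)
#            \           /
#             S1 ----- S2         (side)
git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
git checkout -q -b side master~1
git commit -q --allow-empty -m "S1"
git commit -q --allow-empty -m "S2"
git checkout -q master
git merge -q --no-ff -m "M" side
git commit -q --allow-empty -m "C"

# Hypothetical helper: print the subject line of one commit.
subject() { git log -1 --format=%s "$1"; }

merge_commit=$(subject 'master~1')       # one step back from C
first_parent=$(subject 'master~2')       # keeps following first parents
side_tip=$(subject 'master~1^2')         # second parent of the merge
side_walk=$(subject 'master~1^2~1')      # one step back on the side branch

echo "master~1:     $merge_commit"   # M
echo "master~2:     $first_parent"   # B
echo "master~1^2:   $side_tip"       # S2
echo "master~1^2~1: $side_walk"      # S1
```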

git describe

These exact paths through the commit graph are incredibly useful, but can be a bit unwieldy. Sometimes you just need a general indication of how big the 'distance' between two commits is. A prime example of this is in build systems that use git information to create version numbers.

Even git itself uses this. If you build git from a git checkout, the version number is based on the git-describe output.

$ git clone https://github.com/git/git
Cloning into 'git'...
[...]
$ cd git
$ make git
GIT_VERSION = 2.8.0.rc3.12.g047057b
    * new build flags
    * new prefix flags
[...]
    AR xdiff/lib.a
    LINK git
$ git describe
v2.8.0-rc3-12-g047057b
$ git rev-list --count v2.8.0-rc3..
12
$ git rev-parse --short HEAD
047057b
$ ./git --version
git version 2.8.0.rc3.12.g047057b

So what does git describe do? It walks the commit history backwards to find the nearest annotated tag; in the case above that is v2.8.0-rc3. It then appends the number of commits that have been added since that tag and an abbreviation of the sha1 of the commit you're looking at. That way you uniquely identify the commit, but still put it in relation to the latest tagged version. And you can even feed the output back into git:

$ git rev-parse v2.8.0-rc3-12-g047057b
047057bb4159533b3323003f89160588c9e61fbd
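A build script can use the same idea. The sketch below is only an illustration in the spirit of what git's own build does, not a copy of it: in a throwaway repository with a made-up v1.0 tag it turns the git describe output into a dotted version string by dropping the leading "v" and replacing dashes with dots.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the identity below is a placeholder for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "release commit"
git tag -a v1.0 -m "version 1.0"          # annotated tag, as describe wants
git commit -q --allow-empty -m "fix 1"
git commit -q --allow-empty -m "fix 2"
git commit -q --allow-empty -m "fix 3"

described=$(git describe)                  # v1.0-3-g<abbreviated sha1>
count=$(git rev-list --count v1.0..HEAD)   # commits since the tag
version=$(printf '%s\n' "$described" | sed -e 's/^v//' -e 's/-/./g')
resolved=$(git rev-parse "$described")     # describe output is a commit-ish
head=$(git rev-parse HEAD)

echo "describe: $described"
echo "commits since tag: $count"
echo "version: $version"
```

The abbreviated sha1 differs on every run because the commits are created fresh, but the count is always 3 and the describe output always resolves back to HEAD.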

Reverse relationships

Because git only stores pointers to parent commits, and not child commits, you can't easily answer questions like "what are my child commits" without walking the history graph.

Git can help you with this history walking in a few ways.

rev-list

Like the rev-parse command, rev-list is one of git's plumbing subcommands. It exposes the revision listing algorithm used by, for example, git log, so that scripts can use it too. It can also tell you who your children are:

$ git rev-list --children HEAD~10..

This lists the commits that are reachable from HEAD but not from HEAD~10, each followed by the ids of its children. The output is not pretty; it is meant to be fed to scripts.
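As an example of such a script, the sketch below filters the rev-list --children output with awk to print the direct children of one commit. The children_of helper and the tiny demo history (a commit A with one child on master and one on a branch called side) are assumptions made for this illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the identity below is a placeholder for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A has two children: B on master and S on side.
git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
git checkout -q -b side master~1
git commit -q --allow-empty -m "S"
git checkout -q master

# Hypothetical helper: each output line of rev-list --children is
# "<commit> <child> <child> ...", so print fields 2..NF of the matching line.
children_of() {
    git rev-list --children --all |
        awk -v c="$1" '$1 == c { for (i = 2; i <= NF; i++) print $i }'
}

parent=$(git rev-parse master~1)
child_subjects=$(
    for child in $(children_of "$parent"); do
        git log -1 --format=%s "$child"
    done | sort | paste -s -d ' ' -
)
echo "children of A: $child_subjects"
# prints "children of A: B S"
```

Only children reachable from the refs you pass to rev-list are found; here --all covers every ref in the repository.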

tag --contains

Generally, direct children aren't the most useful information about commits. More common are questions like "in which release did this feature appear?" For example, git at some point learned to chdir into a directory before doing anything if you do git -C /some/path .... A quick git blame on git.c shows that this was added by commit 44e1e4d6. And according to git tag --contains 44e1e4d6, this commit first appeared in git v1.8.5-rc0.
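git tag --contains lists every tag that contains the commit, sorted by name by default, so a release candidate can end up listed after the final release. The sketch below, on a made-up history with fixed tag dates, asks for the oldest tag instead by sorting on creation date. It assumes a git version that supports the creatordate sort key for git tag.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the identity below is a placeholder for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Tag dates are pinned (unix timestamp + timezone) so the order is stable.
git commit -q --allow-empty -m "old release"
GIT_COMMITTER_DATE='1448928000 +0000' git tag -a v0.9 -m "0.9"

git commit -q --allow-empty -m "add feature"
feature=$(git rev-parse HEAD)

git commit -q --allow-empty -m "prepare release candidate"
GIT_COMMITTER_DATE='1451606400 +0000' git tag -a v1.0-rc0 -m "1.0 rc0"

git commit -q --allow-empty -m "final fixes"
GIT_COMMITTER_DATE='1454284800 +0000' git tag -a v1.0 -m "1.0"

# All tags containing the feature commit, oldest tag first.
containing=$(git tag --sort=creatordate --contains "$feature" | paste -s -d ' ' -)
first_tag=$(git tag --sort=creatordate --contains "$feature" | head -n 1)

echo "tags containing the feature: $containing"
echo "first appeared in: $first_tag"
# prints "first appeared in: v1.0-rc0"
```

v0.9 is not listed at all, because it was tagged before the feature commit existed.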

Other relationships

So far we've talked about relationships between two commits where one is a descendant of the other, but what if neither is? Is it even useful to talk about relationships between commits on diverging branches?

What would be useful things to say about the relationship between master and feature here? How about the fork point?

$ git merge-base master feature
d05ef5de47773d03e9d09641209121591a6b37c8

When git does a merge of multiple commits, it only looks at the commits being merged and their merge base to determine how to merge the content. The merge base is also used by git rebase to guess what to rebase if you don't give it specific parameters.
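The merge base also gives you a way to say how far two branches have diverged. This sketch, on a made-up history where master and feature fork from a single commit, prints the merge base and uses git rev-list --left-right --count with the three-dot syntax to count the commits that are on only one side.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the identity below is a placeholder for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "fork point"
fork=$(git rev-parse HEAD)

git checkout -q -b feature
git commit -q --allow-empty -m "feature 1"
git commit -q --allow-empty -m "feature 2"
git commit -q --allow-empty -m "feature 3"

git checkout -q master
git commit -q --allow-empty -m "master 1"
git commit -q --allow-empty -m "master 2"

base=$(git merge-base master feature)

# master...feature selects commits reachable from exactly one of the two;
# --left-right --count reports "<only in master> <only in feature>".
set -- $(git rev-list --left-right --count master...feature)
only_master=$1
only_feature=$2

echo "merge base: $base"
echo "master has $only_master commits of its own, feature has $only_feature"
# prints "master has 2 commits of its own, feature has 3"
```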

But you don't always need to know the exact merge base; sometimes you just want to know whether one commit is an ancestor of another. For example, in perl.git, we allow people to push -f to overwrite personal branches, but not the 'blead' branch (our equivalent of master). So we cannot set receive.denyNonFastForwards and have to solve this in an update hook, based on the example hook shipped with git. The key part of that hook is:

case "$refname","$newrev_type" in
refs/heads/*/$USER/*,commit|refs/heads/$USER/*,commit)
    ;;
refs/heads/*,commit)
    if [ "$oldrev" != "0000000000000000000000000000000000000000" ] && ! git merge-base --is-ancestor "$oldrev" "$newrev"; then
        echo "*** Non-fast-forward push to $refname rejected, you should pull first" >&2
        exit 1
    fi
    ;;
esac

This will refuse non-fast-forward pushes to all branches that do not have the user's login name as a path component.
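The test at the heart of that hook can be tried outside a hook as well. The sketch below is an illustration only: is_fast_forward is a hypothetical helper, and the throwaway repository simulates one ordinary update of a branch and one rewritten history.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the identity below is a placeholder for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: an update from $1 to $2 is a fast-forward
# exactly when the old commit is an ancestor of the new one.
is_fast_forward() {
    git merge-base --is-ancestor "$1" "$2"
}

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
old=$(git rev-parse HEAD)

# Ordinary update: C is added on top of B.
git commit -q --allow-empty -m "C"
ff_new=$(git rev-parse HEAD)

# Rewritten history: B is thrown away and replaced by D on top of A.
git checkout -q -b rewritten "$old~1"
git commit -q --allow-empty -m "D"
forced_new=$(git rev-parse HEAD)

if is_fast_forward "$old" "$ff_new"; then ff_result=accepted; else ff_result=rejected; fi
if is_fast_forward "$old" "$forced_new"; then forced_result=accepted; else forced_result=rejected; fi

echo "B -> C: $ff_result"       # accepted
echo "B -> D: $forced_result"   # rejected
```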

Missing commits

Another useful thing to know about these diverging branches is whether they have any changes in common, which can happen if commits get cherry-picked from one branch to another. Such commits will have different commit IDs, even if the patch text and log message are the same, because they'll have different parents.

Git can also calculate a 'patch id' based on just the patch content. This is used by the git cherry command to show you which commits have not yet been cherry-picked or otherwise applied to another branch.
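Here is a small sketch of git cherry at work, on a made-up history: feature has two commits, the first of which has been cherry-picked onto master. In the git cherry output a leading "-" marks a commit whose patch already exists upstream and a "+" marks one that is still missing.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the identity below is a placeholder for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo one > one.txt
git add one.txt
git commit -q -m "F1"
echo two > two.txt
git add two.txt
git commit -q -m "F2"

# master moves on, then picks up F1 only.
git checkout -q master
echo other > other.txt
git add other.txt
git commit -q -m "master work"
git cherry-pick feature~1 >/dev/null

picked=$(git rev-parse master)
original=$(git rev-parse feature~1)

# Compare feature against master by patch id.
report=$(git cherry -v master feature)
applied=$(printf '%s\n' "$report" | grep -c -e '^-' || true)
missing=$(printf '%s\n' "$report" | grep -c -e '^+' || true)

printf '%s\n' "$report"
echo "already applied: $applied, still missing: $missing"
# prints "already applied: 1, still missing: 1"
```

The cherry-picked copy of F1 on master has a different commit id than the original on feature, yet git cherry still recognises it as the same change.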