The meaning of refs and refspecs

Posted on Sat 07 November 2015 in Background information

Git's simple objects-and-refs model allows you to be very creative in naming your things and putting them in groups. For instance, all branches have names starting with refs/heads/, all tags start with refs/tags/, and so on.

So how many types of refs are there and how do they differ? Turns out, quite a few, and with quite a few differences!

Let's start with the one thing that's the same for all of them: you can use them anywhere git expects something that looks like a commit. So git show BISECT_HEAD or git log v2.6.2 are valid commands. But that's really the only thing that's the same for all refs.

Symbolic refs and tags

Most refs are a pointer from a name to a commit. There are three exceptions to this rule:

  • HEAD (discussed next) is usually a symbolic ref pointing to a local branch
  • refs/remotes/remote-name/HEAD are symbolic refs that point to remote-tracking branches
  • Tags usually point to tag objects or commits, but it's not unheard of for tags to point to blobs or trees.

Symbolic refs do not point to an object, but to another ref, similar to how symlinks work (and initially they were actually implemented as symlinks). They cannot be fetched or pushed. The only exception is that a remote can advertise what its HEAD points to, which git can use when cloning to create refs/remotes/remote-name/HEAD. For this reason their usefulness is limited and they don't see much use, except of course for the HEAD ref.
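To see that one exception in action, here is a minimal sketch that clones a throwaway local repository and inspects the symbolic ref the clone created. The directory names (work, upstream, clone) and the identity variables are made up for the example, and it assumes git is on your PATH.

```shell
set -e
# Throwaway identity so commits work in any environment (hypothetical values).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m "first"

# Cloning uses the advertised HEAD to create refs/remotes/origin/HEAD.
git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"

# symbolic-ref prints the ref that is pointed to, rev-parse follows it to a commit.
remote_head=$(git symbolic-ref refs/remotes/origin/HEAD)
via_symref=$(git rev-parse refs/remotes/origin/HEAD)
via_branch=$(git rev-parse refs/remotes/origin/master)
echo "$remote_head -> $via_symref"
```

Both rev-parse calls end up at the same commit, because the symbolic ref is only a pointer to the remote-tracking branch.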

There is also special fetch and push behaviour for tags, which is explained below.

But every ref can point anywhere!

Yes, with git update-ref and git symbolic-ref you can do whatever you want, but don't. Predictability is a good thing and you should not ruin it by being overly creative unless you have a good reason to do so.

HEAD

HEAD is a very special ref: it is the currently checked out commit. It can be either a symbolic ref that points to a branch, or a direct pointer to a commit. Unless you're manually updating the HEAD ref with git update-ref or git symbolic-ref, it can have no other values.

And if you do update HEAD manually, you'll see something you might not expect: a lot of files in your worktree will be dirty. This is because updating the ref does not change the file contents, so git now thinks a different commit is checked out than the one the files represent, and considers them all dirty. Take this for example, using git.git:

$ git status
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean
$ git update-ref HEAD HEAD^
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    new file:   Documentation/RelNotes/2.6.3.txt
    modified:   Documentation/git.txt

Another special thing about HEAD is that it is not prefixed with refs/ like most refs, but its name is just HEAD.
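The two states HEAD can be in are easy to tell apart in a script. The sketch below, run in a scratch repository, uses the exit status of git symbolic-ref -q to decide whether HEAD is attached to a branch; describe_head is a hypothetical helper name.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"

# Hypothetical helper: report whether HEAD is symbolic or a direct pointer.
describe_head() {
    if target=$(git symbolic-ref -q HEAD); then
        echo "on $target"
    else
        echo "detached at $(git rev-parse HEAD)"
    fi
}

attached=$(describe_head)
git checkout -q --detach        # HEAD now points straight at the commit
detached=$(describe_head)
echo "$attached"
echo "$detached"
```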

Ref location is an implementation detail

The fact that HEAD lives in .git/HEAD, branches live in .git/refs/heads etc. is an implementation detail you must not rely on. To read or update the value of any ref, you must use the git rev-parse, git update-ref and git symbolic-ref commands, not read or write those files directly.
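A quick way to convince yourself is to let git move the refs around. In this sketch (scratch repository, hypothetical branch name backup) git pack-refs rewrites how the refs are stored, yet the plumbing commands keep giving the same answers.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"

before=$(git rev-parse refs/heads/master)

# With the classic files backend this typically moves the branch out of
# .git/refs/heads/ and into .git/packed-refs.
git pack-refs --all

after=$(git rev-parse refs/heads/master)

# Writing goes through git as well, never through the files.
git update-ref refs/heads/backup "$after"
backup=$(git rev-parse refs/heads/backup)
echo "$before $after $backup"
```

A script that had read .git/refs/heads/master directly would likely have broken after the pack-refs step.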

More HEADs

There are a few more HEAD-like refs that don't live in the refs/ hierarchy. Unlike HEAD, these don't have a reflog, and are mostly used by a single tool only.

  • ORIG_HEAD is sometimes created by tools that update HEAD in a drastic way
  • CHERRY_PICK_HEAD points to the commit you are currently cherry-picking
  • BISECT_HEAD is used by git-bisect in some cases
  • SVN2GIT_HEAD is created by git-svnimport

And there are two refs that are really special in that they can point to multiple objects:

  • FETCH_HEAD contains a list of all refs you last fetched, with the first one in the list marked as usable for merging.
  • MERGE_HEAD contains all the heads you are currently merging into the current branch, which could be more than one.

The existence of the merge, bisect and cherry-pick heads can be used as an indication that such an operation is in progress. git status does this, as does the git extension for the bash prompt.
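As a rough sketch of that kind of detection, the script below provokes a merge conflict in a scratch repository and then asks git whether MERGE_HEAD resolves. The file and branch names are invented for the example; real tools such as the bash prompt do more elaborate checks.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "base" > file
git add file
git commit -q -m "base"

git checkout -q -b topic
echo "change on topic" > file
git commit -q -am "topic"
git checkout -q master
echo "a different change on master" > file
git commit -q -am "master"

# The merge conflicts and stops, leaving MERGE_HEAD behind.
git merge topic >/dev/null 2>&1 || true

if git rev-parse -q --verify MERGE_HEAD >/dev/null; then in_merge=yes; else in_merge=no; fi
merge_head=$(git rev-parse MERGE_HEAD)
topic_tip=$(git rev-parse topic)

# Aborting the merge removes the marker again.
git merge --abort
if git rev-parse -q --verify MERGE_HEAD >/dev/null; then after_abort=yes; else after_abort=no; fi
echo "during: $in_merge, after abort: $after_abort"
```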

Tags

All tags live in refs/tags, both the ones you created locally and the ones you fetched from others. There are two types of tags: lightweight tags which point directly to a commit, tree or blob, and annotated tags which point to a tag object. A tag object contains a tag message (for example "Version 1.0"), a pointer to a commit, tree or blob, and possibly a GPG signature.

Annotated tags should be used for tags you want to share, such as releases. Lightweight tags can be used for simple local bookmarks. It is of course not mandatory to stick to this, but there are tools that have this rule built in, such as git describe, which by default only looks at annotated tags, or the push.followTags configuration variable, which also ignores lightweight tags.
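The difference between the two kinds of tag is visible with a couple of plumbing commands. This is a small sketch in a scratch repository; the tag names v1.0 and bookmark are just examples.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m "release"
git tag -a -m "Version 1.0" v1.0      # annotated: creates a tag object
git commit -q --allow-empty -m "work in progress"
git tag bookmark                      # lightweight: just a ref to the commit

annotated_type=$(git cat-file -t v1.0)
lightweight_type=$(git cat-file -t bookmark)

# By default describe skips the lightweight tag sitting on HEAD.
default_describe=$(git describe)
tags_describe=$(git describe --tags)
echo "$annotated_type $lightweight_type $default_describe $tags_describe"
```

Here git describe falls back to the older annotated tag and reports a name starting with v1.0-1-g, while --tags is needed before the lightweight bookmark is considered.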

Tags should also never change. While branches are used to show progress, and branch heads show the current state, tags are meant to mark a specific point in history. In fact, git fetch by default will not fetch any tags that already exist locally, even if the values differ. Tags are also not kept in a per-remote tree inside the refs/ hierarchy; instead all tags are fetched right into refs/tags.

Local branches

Local branches are the place where you add commits. By default git creates a branch named master when you initialize a repository, and most projects stick to that name for their default branch. This is of course not mandatory. For example, perl.git doesn't have a master branch; their main branch is called blead, because that's what that branch was called before they moved to git.

Branch names of course point to different commits all the time. Every commit, merge and reset can make a branch point somewhere else. The information about when a branch pointed where is not stored anywhere in the git history, mostly because it's irrelevant in the big picture, but also because it can be seen as private data. It can however be very useful when you are troubleshooting broken repositories, or recovering discarded commits that turn out to be wanted after all. For this, git stores this historical information in the reflog, a special log per ref that by default is kept for branches, remote-tracking branches, notes refs and the HEAD ref. Here's an example reflog of one of my personal projects.

$ git reflog
7c3c37b (HEAD -> master, tag: v2.5.1, origin/master, origin/HEAD) HEAD@{0}: checkout: moving from ea5ee3825b114ddab7513c2ae03afe8161f96608 to master
ea5ee38 (tag: 2.0, seveastest-test/master) HEAD@{1}: checkout: moving from master to 2.0
7c3c37b (HEAD -> master, tag: v2.5.1, origin/master, origin/HEAD) HEAD@{2}: checkout: moving from 5c7759821b9a52b63a6201488319abace9cfca09 to master
5c77598 HEAD@{3}: commit: foo
d3097d1 HEAD@{4}: checkout: moving from master to HEAD^
7c3c37b (HEAD -> master, tag: v2.5.1, origin/master, origin/HEAD) HEAD@{5}: checkout: moving from temp-branch to master
d3097d1 HEAD@{6}: checkout: moving from d3097d1d8311d14471261ca303e9a4fd27e696c8 to temp-branch
d3097d1 HEAD@{7}: checkout: moving from master to HEAD^
7c3c37b (HEAD -> master, tag: v2.5.1, origin/master, origin/HEAD) HEAD@{8}: checkout: moving from ba64e26e9d712ec87fb7eb8d8b916b57fc7096cc to master
ba64e26 (tag: 2.3) HEAD@{9}: checkout: moving from master to ba64e26e9d712ec87fb7eb8d8b916b57fc7096cc
7c3c37b (HEAD -> master, tag: v2.5.1, origin/master, origin/HEAD) HEAD@{10}: checkout: moving from master to master
7c3c37b (HEAD -> master, tag: v2.5.1, origin/master, origin/HEAD) HEAD@{11}: checkout: moving from master to master
7c3c37b (HEAD -> master, tag: v2.5.1, origin/master, origin/HEAD) HEAD@{12}: checkout: moving from master to master
7c3c37b (HEAD -> master, tag: v2.5.1, origin/master, origin/HEAD) HEAD@{13}: checkout: moving from d3097d1d8311d14471261ca303e9a4fd27e696c8 to master
d3097d1 HEAD@{14}: checkout: moving from master to HEAD^
7c3c37b (HEAD -> master, tag: v2.5.1, origin/master, origin/HEAD) HEAD@{15}: checkout: moving from debian to master
aa78569 (origin/debian, debian) HEAD@{16}: commit: New debian package
6f77ad1 HEAD@{17}: merge v2.5.1: Merge made by the 'recursive' strategy.
61b5083 HEAD@{18}: checkout: moving from master to debian
7c3c37b (HEAD -> master, tag: v2.5.1, origin/master, origin/HEAD) HEAD@{19}: checkout: moving from rpm to master
864c345 (origin/rpm, rpm) HEAD@{20}: commit (amend): Add specfile
d0908d9 HEAD@{21}: commit (amend): Add specfile
8a185f5 HEAD@{22}: commit (amend): Add specfile
2040df2 HEAD@{23}: rebase finished: returning to refs/heads/rpm
2040df2 HEAD@{24}: rebase: Add specfile
7c3c37b (HEAD -> master, tag: v2.5.1, origin/master, origin/HEAD) HEAD@{25}: rebase: checkout master
daf76a5 HEAD@{26}: checkout: moving from master to rpm
7c3c37b (HEAD -> master, tag: v2.5.1, origin/master, origin/HEAD) HEAD@{27}: commit: Release and copyright administrativa
d3097d1 HEAD@{28}: commit (amend): Enable travis tests
9401127 HEAD@{29}: commit (amend): Enable travis tests
973d756 HEAD@{30}: commit (amend): Enable travis tests
6c8d749 HEAD@{31}: commit (amend): Enable travis tests
a35343d HEAD@{32}: commit (amend): Enable travis tests
bd637ef HEAD@{33}: commit: Enable travis tests
9b0611f HEAD@{34}: commit: tests: don't relyh on non-coreutils tools
34555e0 (tag: 2.5) HEAD@{35}: checkout: moving from rpm to master
daf76a5 HEAD@{36}: commit: Add specfile
34555e0 (tag: 2.5) HEAD@{37}: checkout: moving from master to rpm
34555e0 (tag: 2.5) HEAD@{38}: reset: moving to HEAD^
3d85c97 HEAD@{39}: commit (amend): Enable travis tests
0699af1 HEAD@{40}: commit (amend): Enable travis tests
3c611df HEAD@{41}: commit (amend): Enable travis tests
a53ac2f HEAD@{42}: commit (amend): Enable travis tests
a0d7dc6 HEAD@{43}: commit (amend): Enable travis tests
5c50baa HEAD@{44}: commit (amend): Enable travis tests
c7674c3 HEAD@{45}: commit: Enable travis tests
34555e0 (tag: 2.5) HEAD@{46}: checkout: moving from debian to master
61b5083 HEAD@{47}: commit: New debian package
b6c48d6 HEAD@{48}: merge 2.5: Merge made by the 'recursive' strategy.
502e236 HEAD@{49}: checkout: moving from master to debian
34555e0 (tag: 2.5) HEAD@{50}: commit: Release 2.5: fix missing import and declare python 3 compatibility
2755035 (tag: 2.4, test/master) HEAD@{51}: reset: moving to origin/master
813a6b6 HEAD@{52}: commit: moo
2755035 (tag: 2.4, test/master) HEAD@{53}: checkout: moving from debian to master
502e236 HEAD@{54}: commit: New debian package
e6a269b HEAD@{55}: merge 2.4: Merge made by the 'recursive' strategy.
187383b HEAD@{56}: checkout: moving from master to debian
2755035 (tag: 2.4, test/master) HEAD@{57}: commit: Version 2.4
6c4f061 HEAD@{58}: commit: Make non-redirected commands work under windows
6098e98 HEAD@{59}: checkout: moving from debian to master
187383b HEAD@{60}: reset: moving to HEAD^
ddc337a HEAD@{61}: checkout: moving from master to debian
6098e98 HEAD@{62}: cherry-pick: Stray os.pipe() leads to fd leakage
ba64e26 (tag: 2.3) HEAD@{63}: checkout: moving from debian to master
ddc337a HEAD@{64}: commit: Stray os.pipe() leads to fd leakage
187383b HEAD@{65}: commit: New debian package
66e007c HEAD@{66}: merge 2.3: Merge made by the 'recursive' strategy.
fe3af8c HEAD@{67}: checkout: moving from master to debian
ba64e26 (tag: 2.3) HEAD@{68}: commit: Version 2.3
b40853f HEAD@{69}: commit: Ignore docs builddir
f351964 HEAD@{70}: commit: Python 3.4 compatibility
f43fe40 (tag: 2.2.1) HEAD@{71}: reset: moving to HEAD^
2e3b3fb HEAD@{72}: commit: Empty commit to trigger docs rebuild
f43fe40 (tag: 2.2.1) HEAD@{73}: checkout: moving from debian to master
fe3af8c HEAD@{74}: commit: New deban package
e80ac76 HEAD@{75}: merge 2.2.1: Merge made by the 'recursive' strategy.
c157a5d HEAD@{76}: checkout: moving from master to debian
f43fe40 (tag: 2.2.1) HEAD@{77}: commit (amend): README was renamed, BPB release
a995e2b HEAD@{78}: commit: README was renamed
e974196 (tag: 2.2) HEAD@{79}: checkout: moving from debian to master
c157a5d HEAD@{80}: commit: New debian package
951e051 HEAD@{81}: merge 2.2: Merge made by the 'recursive' strategy.
3b2cb36 HEAD@{82}: checkout: moving from master to debian
e974196 (tag: 2.2) HEAD@{83}: commit: Version 2.2
f45e4d3 HEAD@{84}: commit: Add some missing tests
c154086 HEAD@{85}: commit: PUN-inspired documentation additions
b37dfd9 HEAD@{86}: commit (amend): Add PUN presentation
7804b96 HEAD@{87}: rebase -i (finish): returning to refs/heads/master
7804b96 HEAD@{88}: rebase -i (pick): Add PUN presentation
b1d0802 HEAD@{89}: rebase -i (pick): Add Result.__nonzero__, reflecting the exit status
35dd9e8 HEAD@{90}: rebase -i (start): checkout origin/master
9336eb8 HEAD@{91}: commit (amend): Add Result.__nonzero__, reflecting the exit status
55982f7 HEAD@{92}: commit: Add Result.__nonzero__, reflecting the exit status
a1af0c0 HEAD@{93}: commit (amend): Add PUN presentation
a575e2c HEAD@{94}: commit: Add PUN presentation
35dd9e8 HEAD@{95}: reset: moving to HEAD^
d76710e HEAD@{96}: cherry-pick: Corrected example command
35dd9e8 HEAD@{97}: commit (amend): Using double colons for the last three examples
907bca3 HEAD@{98}: cherry-pick: Using double colons for the last three examples
23bdfee HEAD@{99}: commit (amend): Better support for window
83c732b HEAD@{100}: pull --upload-pack git upload-pack root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:whelk: Fast-forward
ffccd08 (tag: 2.1) HEAD@{101}: pull: Fast-forward
ea5ee38 (tag: 2.0, seveastest-test/master) HEAD@{102}: clone: from git@github.com:seveas/whelk.git

As you can see every action that changes where the ref points to is stored. You can use this to recover original commits that you accidentally amended, undo rebases, see resets and whatnot. It's a great forensic tool.
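Recovering an accidentally amended commit is probably the most common use. A minimal sketch, in a scratch repository with made-up commit messages:

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m "original message"
original=$(git rev-parse HEAD)

# Oops: the amend replaces the commit with a new one.
git commit -q --allow-empty --amend -m "amended message"
amended=$(git rev-parse HEAD)

# The reflog still knows where HEAD pointed one step ago.
previous=$(git rev-parse 'HEAD@{1}')
git reset -q --hard "$previous"
restored_subject=$(git log -1 --format=%s)
echo "$restored_subject"
```

The same HEAD@{n} lookup works for undoing a rebase or a reset, as long as the reflog entry has not expired yet.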

But reflogs are not the only thing that sets branches apart from other refs. To help git pull, git push and git merge decide what you mean when you use them without arguments, branches can be configured to know what they should merge with and where they should push to by default. As a more concrete example, when you clone a repository, the default branch is checked out and configured to merge from origin/master and push to origin.

$ git clone https://git.example.com/example.git
Cloning into 'example'...
remote: Counting objects: 589, done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 589 (delta 8), reused 0 (delta 0), pack-reused 554
Receiving objects: 100% (589/589), 847.50 KiB | 829.00 KiB/s, done.
Resolving deltas: 100% (303/303), done.
Checking connectivity... done.
$ git -C example config --get-regexp branch.master.*
branch.master.remote origin
branch.master.merge refs/heads/master

This configuration means that git pull will fetch from the remote named origin and merge what its refs/heads/master points to, that git push will push the branch to the origin remote, and that master@{upstream} can be used to refer to refs/remotes/origin/master.

When creating a branch based on a remote branch (for example: git checkout -b develop origin/develop), a similar configuration is set up for the new branch. Other ways to configure this for a branch are using git branch -u or git push -u.
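The following sketch walks through both cases with a throwaway local "remote": the configuration a clone sets up for master, and the configuration created when a branch is based on a remote-tracking branch. The directory names are hypothetical and the upstream repository stands in for git.example.com.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m "first"
git -C "$work/upstream" branch develop

git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"

# What the clone configured for the default branch.
remote=$(git config branch.master.remote)
merge=$(git config branch.master.merge)
master_upstream=$(git rev-parse --symbolic-full-name 'master@{upstream}')

# Basing a new branch on a remote-tracking branch sets up the same thing.
git checkout -q -b develop origin/develop
develop_upstream=$(git rev-parse --abbrev-ref 'develop@{upstream}')
echo "$remote $merge $master_upstream $develop_upstream"
```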

And that brings us to the last thing that's different about branches: there are more ways to specify a commit relative to a branch than for other refs. If you look at the gitrevisions manpage, you'll see there are many ways to specify a commit relative to another one, such as HEAD~2 for the leftmost grandparent of HEAD. For most other refs, such as tags, you can only use commit tree walking tricks, such as refs/tags/v2.0~4^2~3 (take the v2.0 tag, walk 4 parents back using the first parents, then take the second parent of that merge commit, and walk 3 more parents back from there). But for branches you can say things like master@{upstream} to refer to the branch it would merge from, or master@{8.hours.ago}, which uses the reflog to tell you where master pointed to 8 hours ago.
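The tree walking syntax is easier to follow on a tiny history. This sketch builds one merge in a scratch repository, with single-letter commit messages so the results are readable; the branch name side is invented for the example.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# History:  A - B - M   (master)
#            \     /
#             S1 - S2   (side)
git commit -q --allow-empty -m "A"
git checkout -q -b side
git commit -q --allow-empty -m "S1"
git commit -q --allow-empty -m "S2"
git checkout -q master
git commit -q --allow-empty -m "B"
git merge -q --no-ff -m "M" side

first_parent=$(git log -1 --format=%s 'HEAD^1')    # first parent of the merge
second_parent=$(git log -1 --format=%s 'HEAD^2')   # the merged-in branch tip
walked=$(git log -1 --format=%s 'HEAD^2~1')        # one step back on that side
grandparent=$(git log -1 --format=%s 'HEAD~2')     # two first-parent steps
echo "$first_parent $second_parent $walked $grandparent"
```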

Remote-tracking branches

So far we've only talked about local refs, and technically all refs are local. However, some are less local than others. The refs under refs/remotes are all copied from your remote repositories when you clone, fetch or push. Git even configures your repository in such a way that any update to those refs is accepted from the remote, even updates that rewrite the history of those branches.

There is one exception to this rule, and it sometimes causes confusion: branches deleted on the remote are not automatically deleted locally. And because refs are currently still stored as files, this can cause file/directory conflicts for certain ref updates.

$ git fetch
error: cannot lock ref 'refs/remotes/origin/test/dennis': 'refs/remotes/origin/test' exists; cannot create 'refs/remotes/origin/test/dennis'
From https://git.example.com/example.git
 ! [new branch]      test/dennis -> origin/test/dennis  (unable to update local ref)
error: some local refs could not be updated; try running
 'git remote prune origin' to remove any old, conflicting branches

This is a side effect of the current implementation, where refs are stored as files and you thus cannot have a foo and foo/bar ref at the same time. Git tells you what you can do to resolve this conflict, but you can also tell it to always prune old branches when fetching:

$ git config fetch.prune true
$ git fetch
From https://git.example.com/example.git
 x [deleted]         (none)     -> origin/test
 * [new branch]      test/dennis -> origin/test/dennis

To create a local branch based on a remote-tracking branch, you used to have to do two steps:

$ git branch develop refs/remotes/origin/develop
$ git checkout develop

Which could be shortened to

$ git checkout -b develop origin/develop

But more recent git versions allow you to simply say

$ git checkout develop

And if there is no local branch with that name, and exactly one remote that has a branch by that name, git will interpret that as git checkout -b develop some-remote/develop. Git is built for and by lazy people, which leads us nicely into the next section.

DWIM (Do What I Mean)

We're all lazy and we don't like typing refs/heads or refs/tags all the time. So git allows you to use only the relevant parts of the ref and tries to guess what you mean. When you use the word 'tortoise' as a ref, git will try to find it in the following locations, in this order, and stop at the first match:

  • A file in .git, which is really only useful for the HEAD variants which live there.
  • The ref refs/tortoise, i.e. the name taken relative to refs/
  • The tag refs/tags/tortoise
  • The branch refs/heads/tortoise
  • The remote refs/remotes/tortoise, which means that the remote-tracking branch refs/remotes/tortoise/shell can be specified as tortoise/shell
  • The remote-tracking symbolic ref refs/remotes/tortoise/HEAD

Any other ref, such as the ones mentioned below, has to be spelled out at least relative to refs/, for example pull/42/head, or in full as refs/pull/42/head.
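The lookup order matters most when a name is ambiguous. In this sketch a tag and a branch in a scratch repository deliberately share the name tortoise, which is something you would normally avoid.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m "one"
git tag tortoise                       # refs/tags/tortoise at the first commit
git commit -q --allow-empty -m "two"
git branch tortoise 2>/dev/null        # refs/heads/tortoise at the second commit

tag_commit=$(git rev-parse refs/tags/tortoise)
branch_commit=$(git rev-parse refs/heads/tortoise)

# The short name resolves to the tag; git warns about the ambiguity on stderr.
resolved=$(git rev-parse tortoise 2>/dev/null)
echo "$resolved"
```

Spelling out refs/heads/tortoise is the only unambiguous way to reach the branch here.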

Specialty refs

The refs discussed so far are all pretty common. But there are quite a few more refs that are more special cases.

Stash

git stash uses the refs/stash ref and its reflog to keep track of your stashes. The (ab)use of the reflog is why you refer to stashes as stash@{1} etc.
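You can watch this happen in a scratch repository. The sketch below creates two stashes and then looks at refs/stash and its reflog; the file name and contents are made up.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "committed" > file
git add file
git commit -q -m "first"

echo "first experiment" > file
git stash -q
echo "second, longer experiment" > file
git stash -q

# refs/stash points at the newest stash, older ones only live in its reflog.
newest=$(git rev-parse 'stash@{0}')
stash_ref=$(git rev-parse refs/stash)
older_content=$(git show 'stash@{1}:file')
count=$(git reflog show refs/stash | wc -l)
echo "$count stashes, older one contains: $older_content"
```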

Multiple worktrees

With git worktree, you can create multiple worktrees for the same repository that are all aware enough of each other to avoid stepping on each other's toes. For instance, you cannot have the same branch checked out in two worktrees. The HEAD refs of all these worktrees can be found as worktrees/*/HEAD in the main repository. And despite them not being under refs/, they can still be used as refs.

Bisecting

We already saw the BISECT_HEAD ref, but git bisect stores more files in .git, including BISECT_START, which can also be used as a ref, and the refs/bisect refs....

Notes

The git notes subsystem, which allows you to attach arbitrary notes to objects without modifying those objects, stores its data under refs/notes, by default in refs/notes/commits.
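A short sketch in a scratch repository shows where a note ends up and that the annotated commit itself stays untouched. The note text is arbitrary.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"

head_before=$(git rev-parse HEAD)
git notes add -m "reviewed" HEAD
head_after=$(git rev-parse HEAD)

# The note lives in its own ref, not in the commit.
notes_ref=$(git for-each-ref --format='%(refname)' refs/notes)
note=$(git notes show HEAD)
echo "$notes_ref: $note"
```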

Replacement refs

Because you cannot change data in the past without rewriting all commits that come after it, there is a way to indicate that parts of the history are wrong and should be looked at differently. With git replace you tell git 'this object should really be replaced by this other object', and commands like git log will honor that. These replacements are stored as refs under refs/replace.

Should you decide that you want to make these replacements permanent by rewriting history, git filter-branch can be used to do so. It honors replacement refs when reading, but writes out a history that no longer needs them.
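As a small illustration, this sketch replaces a commit that has a typo in its message with a corrected copy, without rewriting anything. It runs in a scratch repository, and building the corrected commit with git commit-tree is just one possible way to produce a replacement object.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "tpyo"
bad=$(git rev-parse HEAD)

# Build a corrected commit with the same tree, then tell git to use it instead.
good=$(git commit-tree 'HEAD^{tree}' -m "typo fixed")
git replace "$bad" "$good"

seen=$(git log -1 --format=%s "$bad")                        # replacement honored
raw=$(git --no-replace-objects log -1 --format=%s "$bad")    # original object
replace_ref=$(git for-each-ref --format='%(refname)' refs/replace)
echo "$seen / $raw / $replace_ref"
```

The ref under refs/replace is named after the object being replaced and points at its stand-in.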

Namespaces

Within a repo, git already has some space-saving tricks: identical objects are never stored twice and packfiles store nearly-identical objects using delta compression for further space saving. But if you need to serve multiple copies of the same repo (for example, all forks of git.git on GitHub), git has another space saving trick.

Using ref namespaces, you can store all these copies as a single repo, but each copy sees only its own refs. Such refs are stored under refs/namespaces and require applications that access the repo (the webserver and/or git daemon, not the git client) to specify the ref namespace.

Backups

We already know ORIG_HEAD, which gets created by commands that drastically move HEAD, but there are also the refs/original refs which are created by git filter-branch as a backup in case your rewrite goes all horribly wrong.

git-svn

Git svn stores its refs in refs/remotes/git-svn/*. It used to store them directly in refs/remotes/*, but this isn't a very good idea so that practice was discontinued.

Third party refs

All the refs so far are created by tools built into git or shipped with git. But git itself is not the only tool that creates refs. Various third party tools also store their information this way. If you know of any that are not listed below, please comment and I'll add them to the list!

GitHub

GitHub stores pull requests as two refs: one for the tip of the branch behind the request, and one that is a merge between that tip and the branch it should be pulled into. Here is an example of a pull request with four commits.

$ git log --oneline --graph --decorate refs/pull/30/merge
*   4f6a13a (refs/pull/30/merge) Merge 91fc0a9cddd8e28b91e9c87edb439e17f61fad1c into 546a215f53ee449159f2e653061f484e91ecc4d5
|\  
| * 91fc0a9 (refs/pull/30/head) Change distutils to setuptools
| * b76a8d1 Add long_description to setup.py
| * 1e319d1 Make valid rst-syntax
| * 5a5b4dd Add .rst suffix to README
|/  
* 546a215 git hub issues: When running outside a repo, display all open issues of all repos
* 1e33e5a (tag: 1.20) Release 1.20

Gerrit

The gerrit code review tool has two special sets of refs:

  • To create a change to be reviewed, you push to refs/for/$branchname instead of refs/heads/$branchname
  • Changesets can be fetched from gerrit under the refs/changes namespace

The refs/for/* refs don't actually ever get created, despite gerrit telling you it has done so. It's just gerrit's way of specifying which branch a changeset is for.

Refspecs

Now that we know all about refs, there's one last trick to know: the refspec. With refspecs you tell git what to push/fetch where and how to map local refs to remote refs and vice versa.

When you clone a repository, git sets up the default refspec. You can see it in .git/config in the repository:

[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = https://git.example.com/example.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
    remote = origin
    merge = refs/heads/master

The fetch refspec is +refs/heads/*:refs/remotes/origin/*, which means 'fetch all refs under refs/heads, and map them to refs/remotes/origin'. The leading + means that git will accept any update, not just updates that are descendants of the current values of the local refs, thus allowing history rewriting.
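Additional fetch refspecs follow the same pattern. The sketch below fakes a GitHub-style pull request ref in a throwaway local "remote" and maps it into the clone; the destination refs/remotes/origin/pr/* is an arbitrary choice for this example, not something git sets up for you.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m "first"
# Stand-in for a pull request ref as a hosting service would create it.
git -C "$work/upstream" update-ref refs/pull/1/head HEAD

git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"
default_refspec=$(git config --get-all remote.origin.fetch)

# The default refspec only covers refs/heads, so the pull ref is not here yet.
if git rev-parse -q --verify refs/remotes/origin/pr/1 >/dev/null; then before=present; else before=absent; fi

# Add a second refspec (hypothetical destination) and fetch again.
git config --add remote.origin.fetch '+refs/pull/*/head:refs/remotes/origin/pr/*'
git fetch -q origin

pr=$(git rev-parse refs/remotes/origin/pr/1)
upstream_pr=$(git -C "$work/upstream" rev-parse refs/pull/1/head)
echo "$default_refspec"
```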

Like refs, git's handling of refspecs is very DWIM-heavy. For instance, git push origin master actually maps to git push origin refs/heads/master:refs/heads/master@{upstream}, first mapping master to refs/heads/master and then looking up in the config what it should be pushed to. And if it cannot be found in the config, then it actually maps to git push origin refs/heads/master:refs/heads/master. When pushing, you can of course specify a full refspec yourself, pushing any arbitrary local commit to any arbitrary remote ref. For example, if you want to submit all but the last 5 commits to gerrit as a changeset, you can do git push gerrit HEAD~5:refs/for/master.

When fetching you can also do similar tricks, with one exception: for security reasons you can only fetch refs, not arbitrary commits. So while refs/heads/next:refs/heads/test-branch is a valid fetch refspec, refs/heads/next~3:refs/heads/test-branch is not.

One last thing to mention about refspecs is that pushing an empty source will cause the destination ref to be deleted, which means that git push origin :test will delete the test branch remotely.
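Both push tricks, an arbitrary commit as source and an empty source for deletion, can be tried safely against a local bare repository. A minimal sketch with hypothetical paths:

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
git remote add origin "$work/remote.git"

# Push everything except the last commit to a branch named test.
git push -q origin 'HEAD~1:refs/heads/test'
pushed=$(git -C "$work/remote.git" rev-parse refs/heads/test)
expected=$(git rev-parse 'HEAD~1')

# An empty source deletes the destination ref on the remote.
git push -q origin :test
if git -C "$work/remote.git" rev-parse -q --verify refs/heads/test >/dev/null; then deleted=no; else deleted=yes; fi
echo "pushed $pushed, deleted: $deleted"
```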