Removing unwanted data from git repositories

Posted on Thu 05 November 2015 in Commit grooming

A very common mistake is to commit sensitive data, build products, or other data that should never end up in a git repository. Git has an extremely powerful command to deal with this: git filter-branch. This command is not only extremely powerful, it's also nigh incomprehensible.

We'll cover filter-branch and its capabilities in later articles, but for today we'll stick with a much simpler tool that's been created just for cleaning up unwanted files and sensitive data (think passwords).

The BFG repo cleaner

The BFG repo cleaner cannot do everything git filter-branch can, but what it does, it does extremely well: it is fast and easy to use. So easy that I'm going to be lazy and just copy some examples from the BFG website:

Removing SSH private keys:

$ java -jar bfg.jar --delete-files 'id_{rsa,dsa}' example.git

Removing huge files:

$ java -jar bfg.jar --strip-blobs-bigger-than 50M example.git

Removing passwords:

$ java -jar bfg.jar --replace-text passwords.txt example.git
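
The passwords.txt file lists one expression per line. As a rough sketch of the format the BFG documentation describes: a plain line is matched literally and replaced with ***REMOVED***, `==>` supplies a custom replacement, and a `regex:` prefix marks a regular expression. The secrets below are made-up placeholders, not values from any real repository.

```shell
# Write a replacement file for --replace-text, one expression per line.
# All three entries are hypothetical examples.
cat > passwords.txt <<'EOF'
hunter2
s3cr3t-api-key==>REDACTED
regex:password=\w+==>password=
EOF
```

The resulting file is what gets passed to the --replace-text command above.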

It can also replace text based on regular expressions, delete entire folders and convert huge blobs to GitHub's Git LFS.
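
The commands above all operate on example.git, a bare clone of the repository. Here is a minimal sketch of the surrounding steps, following the workflow on the BFG website: clone a mirror, run BFG, then expire and prune the old objects. The helper name clean_repo, the URL, and the directory name are placeholders for illustration, and bfg.jar is assumed to sit in the current directory.

```shell
# Hypothetical helper: clean a fresh mirror clone with BFG.
# $1 = repository URL, $2 = local directory for the mirror (both placeholders).
clean_repo() {
    url=$1
    dir=$2

    # Work on a bare mirror so all branches and tags are available to rewrite.
    git clone --mirror "$url" "$dir" || return 1

    # Any of the BFG invocations shown earlier fits here.
    java -jar bfg.jar --strip-blobs-bigger-than 50M "$dir" || return 1

    # BFG rewrites commits but leaves the old objects behind; expire and prune them.
    (
        cd "$dir" || exit 1
        git reflog expire --expire=now --all && git gc --prune=now --aggressive
    )
}

# Example call, commented out because the URL is a placeholder:
# clean_repo git://example.com/some-repo.git some-repo.git
```

The sketch stops short of pushing on purpose. Once the rewritten history looks right, running git push inside the mirror directory publishes it, and everyone else should then work from a fresh clone.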

Download the BFG repo cleaner here