Removing unwanted data from git repositories
Posted on Thu 05 November 2015 in Commit grooming
A very common mistake is to commit sensitive data, build products or other
data that should never end up in a git repository. Git has an extremely
powerful command to deal with this: git filter-branch. This command is not
only extremely powerful, it's also nigh incomprehensible.
We'll cover filter-branch and its capabilities in later articles, but for today we'll stick with a much simpler tool that's been created just for cleaning up unwanted files and sensitive data (think passwords).
The BFG repo cleaner
The BFG repo cleaner cannot do everything git filter-branch can, but what it does, it does extremely well, and it is fast and easy to use. So easy that I'm going to be lazy and just copy some examples from the BFG website:
Removing SSH private keys:
$ java -jar bfg.jar --delete-files 'id_{rsa,dsa}' example.git
Removing huge files:
$ java -jar bfg.jar --strip-blobs-bigger-than 50M example.git
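Before picking a threshold for --strip-blobs-bigger-than, it can help to see which blobs are actually large. The sketch below uses only plain git plumbing (git rev-list and git cat-file), not the BFG; the helper name largest_blobs and the throwaway demo repository are made up for illustration.

```shell
# Hypothetical helper: list the largest blobs anywhere in the current
# repository's history as "<size-in-bytes> <object-id> <path>", biggest first.
largest_blobs() {
  local count="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    sed -n 's/^blob //p' |   # keep only blobs and drop the type column
    sort -k1,1 -n -r |       # sort numerically by size, largest first
    head -n "$count"
}

# Usage: a throwaway repository with one small and one 200000-byte file.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
echo "hello" > small.txt
head -c 200000 /dev/zero > big.bin
git add small.txt big.bin
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Add files"
largest_blobs 5
```

Because git rev-list --all walks every ref, this also finds blobs that were deleted in later commits, which are exactly the ones a history rewrite is meant to remove.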
Removing passwords:
$ java -jar bfg.jar --replace-text passwords.txt example.git
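The passwords.txt file is a plain-text list of expressions, one per line. The sketch below follows the format described in the BFG's own --help output as I understand it (a bare string is replaced with ***REMOVED***, "old==>new" picks the replacement, and a regex: prefix turns the left-hand side into a regular expression); the secrets themselves are made-up placeholders, and it's worth checking the format against the BFG version you actually run.

```shell
# Write a hypothetical expressions file for --replace-text.
# The quoted 'EOF' stops the shell from touching backslashes like \w.
cat > passwords.txt <<'EOF'
hunter2
s3cr3t-api-token==>REDACTED
regex:password=\w+==>password=
EOF
```

The first line would become ***REMOVED*** wherever it appears, the second would become REDACTED, and the third would strip the value from any password=... assignment.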
It can also replace text based on regular expressions, delete entire folders and convert huge blobs to GitHub's Git LFS.
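In practice the commands above are one step of a slightly longer routine: the BFG is run on a fresh mirror clone, and afterwards the reflog is expired and git gc is run so the old objects are really discarded. By default the BFG also leaves your latest commit untouched, so the offending file has to be removed in a normal commit first. The script below is a minimal sketch of that routine against a throwaway local repository; it assumes bfg.jar is in the current directory (override with the hypothetical BFG_JAR variable) and simply skips the BFG step if the jar is missing.

```shell
#!/usr/bin/env bash
# Sketch of the usual BFG routine on a throwaway repository.
# Assumption: bfg.jar lives in the current directory unless BFG_JAR says otherwise.
set -euo pipefail

BFG_JAR="$(pwd)/${BFG_JAR:-bfg.jar}"
workdir="$(mktemp -d)"
cd "$workdir"

# 1. Build a small repo that accidentally committed a private key.
git init -q source
cd source
git config user.email "demo@example.com"
git config user.name "Demo"
echo "not a real key" > id_rsa
echo "hello" > README
git add id_rsa README
git commit -q -m "Initial commit (oops, includes id_rsa)"

# 2. The BFG protects the latest commit, so delete the file there first.
git rm -q id_rsa
git commit -q -m "Remove id_rsa"
cd ..

# 3. Work on a fresh bare mirror clone of the repository.
git clone -q --mirror source example.git

# 4. Rewrite history to drop id_rsa / id_dsa from the older commits.
#    The glob is quoted so bash does not brace-expand it.
if [ -f "$BFG_JAR" ]; then
  java -jar "$BFG_JAR" --delete-files 'id_{rsa,dsa}' example.git
else
  echo "skipping BFG step: $BFG_JAR not found" >&2
fi

# 5. Expire the reflog and garbage-collect so unreferenced blobs are pruned.
cd example.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive --quiet

# 6. Against a real remote you would now publish the rewritten history.
#    It is left commented out here because "source" is a non-bare demo repo.
# git push
```

Keep in mind that rewriting history changes commit IDs, so anyone else with a clone will need to re-clone, and a leaked password should be rotated regardless of whether it has been scrubbed from the repository.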