This article describes a process for selectively removing large files from the history of your Git repository. It is most often needed when large binary files (movies, images, etc.) have been committed to the repository and need to be removed.
Get ready
Start by cloning a "mirror" copy of your code repository to your local machine with the commands below. This gives you a "bare" repository (one with no working tree) that contains the entire history of every ref, including all branches and tags. That is everything we'll need to create a backup of your repository, and to clean and replace the existing repository:
git clone --mirror git-user@git-url.example.com:my-repo-name.git
git --git-dir=my-repo-name.git remote update
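Before going further, it can be worth confirming that the clone really is a bare mirror. The sketch below is one way to check. `is_mirror_clone` is a hypothetical helper name, not part of Git, and it assumes the remote is called `origin` (the default name that `git clone` uses).

```shell
# Hypothetical helper: succeeds only if the directory looks like a mirror clone.
is_mirror_clone() {
    repo="$1"

    # A mirror clone is always bare (it has no working tree).
    [ "$(git --git-dir="$repo" rev-parse --is-bare-repository 2>/dev/null)" = "true" ] || return 1

    # "git clone --mirror" records remote.origin.mirror=true in the repo config.
    [ "$(git --git-dir="$repo" config --get remote.origin.mirror 2>/dev/null)" = "true" ] || return 1

    return 0
}
```

For example, `is_mirror_clone my-repo-name.git && echo "mirror clone OK"`.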
Back up your repository
Next we'll create an archive of your repo's current state using git-bundle (included with git). In the event that you need to revert the cleanup steps you're about to take, you can use this archive to do just that.
To create the archive, run the following command:
git --git-dir=my-repo-name.git bundle create my-repo-name.bdl --all
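To verify the archive, ensuring that it contains a complete history of your repository, run the following command. git bundle verify checks the bundle against a repository, so point it at the mirror clone:
git --git-dir=my-repo-name.git bundle verify my-repo-name.bdl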
To restore the archive to a directory called my-repo-name in the event of an emergency, run the following command:
git clone my-repo-name.bdl my-repo-name
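As an extra sanity check on the backup, you could confirm that every ref in the mirror clone appears in the bundle with the same commit hash. This is only a sketch: `bundle_covers_all_refs` is a hypothetical helper name, and it relies on `git bundle list-heads` and `git for-each-ref` both printing one `<hash> <refname>` pair per line.

```shell
# Hypothetical helper: succeeds if every ref in the repo is stored in the bundle
# at the same object hash. Usage: bundle_covers_all_refs <repo.git> <bundle-file>
bundle_covers_all_refs() {
    repo="$1"
    bundle="$2"

    # Refs recorded in the bundle, one "<hash> <refname>" per line.
    heads=$(git --git-dir="$repo" bundle list-heads "$bundle") || return 1

    # Collect any repo ref whose exact "<hash> <refname>" line is not in the bundle.
    missing=$(
        git --git-dir="$repo" for-each-ref --format='%(objectname) %(refname)' |
        while read -r line; do
            printf '%s\n' "$heads" | grep -qxF "$line" || printf '%s\n' "$line"
        done
    )

    if [ -n "$missing" ]; then
        printf 'refs missing from bundle:\n%s\n' "$missing" >&2
        return 1
    fi
    return 0
}
```

For example, `bundle_covers_all_refs my-repo-name.git my-repo-name.bdl && echo "bundle is complete"`.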
Now that you've got your backup bundle, store that in a safe place, and let's move on to cleaning up your repo.
Shrink your repository with BFG repo cleaner
The BFG repo cleaner is a Java app designed to intelligently clean large files from your repository's history. It's a relatively fast and powerful tool, and it is well documented on its project site. You can follow a process like the one described below to selectively remove large files from your Git history using this tool:
- Download the cleaner app by clicking the "Download" button on the BFG project site, or download it directly from its project page on GitHub.
- Run the BFG repo cleaner tool on your repository. This command can be run with many options that prune files based on various criteria. For example, the command below analyzes and cleans files larger than 50 MB from your repository:
java -jar /path/to/bfg.jar --strip-blobs-bigger-than 50M my-repo-name.git
- BFG rewrites the history but leaves the old objects on disk. Expire the reflog and run garbage collection so that Git physically deletes the data that is no longer referenced:
git --git-dir=my-repo-name.git reflog expire --expire=now --all && git --git-dir=my-repo-name.git gc --prune=now --aggressive
- You can repeat the BFG and garbage-collection commands above multiple times, running the BFG cleaner with different options to whittle down the size of the repository. Use the following command to check its size as you go:
du -d 1 -h my-repo-name.git
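To pick a sensible size threshold for BFG, it helps to see which files in the history are actually the largest. The sketch below lists the biggest blobs using only standard Git plumbing commands. `largest_blobs` is a hypothetical helper name, and the default of 10 results is an arbitrary choice.

```shell
# Hypothetical helper: print the N largest blobs in a repo's history as
# "<size-in-bytes> <path>", largest first.
# Usage: largest_blobs <repo.git> [count]
largest_blobs() {
    repo="$1"
    count="${2:-10}"

    # List every object reachable from any ref, together with its path.
    git --git-dir="$repo" rev-list --objects --all |
        # Annotate each object with its type and size; %(rest) carries the path through.
        git --git-dir="$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        # Keep only blobs (file contents) and drop the leading "blob " prefix.
        awk '$1 == "blob" { print substr($0, 6) }' |
        # Sort numerically by size, largest first.
        sort -rn -k1,1 |
        head -n "$count"
}
```

For example, `largest_blobs my-repo-name.git 20` prints the twenty largest files ever committed. Sizes are uncompressed byte counts, so they will not match the on-disk figures reported by `du`.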
Push and propagate the cleaned repository
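Finally, when you're satisfied with the size of the repository, push it back up to the remote server with the following command. Because the repository was cloned with --mirror, a plain push behaves like git push --mirror: it force-updates every ref on the remote to match your cleaned copy: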
git --git-dir=my-repo-name.git push
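After pushing, you may want to confirm that the remote now holds exactly the refs in your cleaned mirror. The sketch below compares the two ref lists. `remote_matches_local` is a hypothetical helper name, and it assumes the remote is called `origin`. Some hosting services keep extra read-only refs that cannot be overwritten by a push, in which case a mismatch on those refs alone is expected.

```shell
# Hypothetical helper: succeeds if the remote's refs are identical to the local ones.
# Usage: remote_matches_local <repo.git>
remote_matches_local() {
    repo="$1"

    # Refs on the remote as "<hash><TAB><refname>"; --refs omits HEAD and peeled tags.
    remote_refs=$(git --git-dir="$repo" ls-remote --refs origin) || return 1

    # Local refs in the same "<hash><TAB><refname>" layout (%09 is a tab).
    local_refs=$(git --git-dir="$repo" for-each-ref --format='%(objectname)%09%(refname)')

    # Compare after sorting so that listing order does not matter.
    [ "$(printf '%s\n' "$local_refs" | sort)" = "$(printf '%s\n' "$remote_refs" | sort)" ]
}
```

For example, `remote_matches_local my-repo-name.git || echo "remote differs from local mirror"`.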
Once this is done, everyone working on your project should delete their local clones and clone the updated repository afresh. Otherwise, a push from an old clone could reintroduce the removed "dirty" history into your repository.