Find the large items
Git periodically compresses objects into packfiles, storing many of them as incremental changes (deltas) against similar objects. If you have large packfiles, these commands can help you find the large objects inside them:
Show the 20 largest items in a pack:
git verify-pack -v .git/objects/pack/pack-{hash}.idx | sort -k 3 -n | tail -n 20
Example (which displays only the 5 largest items):
git verify-pack -v .git/objects/pack/pack-e2ea360d8ed326227bdce049fa4b99c11576073e.idx | sort -k 3 -n | tail -n 5
43489f27ad4e9ec5dc258253a53460f190e9ed8b blob 1607511 176766 17289540
ba77b5da2484b0d2c6175d8b44dfd7415159f900 blob 4157568 4095137 64358027
93822a3c3099997d0c26aa8104a37ad7c93e295a blob 6214972 6173121 82337786
15edef5357a7fe0ef052c7c2b7131cee98e56048 blob 6272218 5625941 76711845
99fcbf756dd5a6263e86b4fe4ff6fef41a1783a8 blob 9182543 8258681 68453164
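The columns in this output are the object ID, the object type, the uncompressed size in bytes, the size inside the packfile, and the offset in the packfile. As a minimal sketch, the listing can be made easier to read by converting the size column to MiB. The function name largest_objects is hypothetical and not part of Git; it only reformats whatever git verify-pack -v prints.

```shell
# Hypothetical helper (not part of Git): reads `git verify-pack -v` output
# on stdin and prints the N largest objects with column 3 shown in MiB.
largest_objects() {
  count="${1:-20}"
  # Keep only object lines (numeric size in column 3); this drops summary
  # lines such as "non delta: ..." and "chain length = ...".
  awk 'NF >= 5 && $3 ~ /^[0-9]+$/' |
    sort -k 3 -n |
    tail -n "$count" |
    awk '{ printf "%s %s %.2f MiB\n", $1, $2, $3 / 1048576 }'
}
```

It could be used as: git verify-pack -v .git/objects/pack/pack-{hash}.idx | largest_objects 5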
Once you have the hash of a large object, you can look up the file path it corresponds to:
git rev-list --objects --all | grep {hash}
Example:
git rev-list --objects --all | grep ba77b5da2484b0d2c6175d8b44dfd7415159f900
ba77b5da2484b0d2c6175d8b44dfd7415159f900 docroot/themes/custom/theme2/image.jpg
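The two lookups above can also be combined into a single pass that lists the largest blobs together with their paths. This is a sketch of one common approach rather than the article's own method; it relies on git cat-file --batch-check with a custom format, and the function name largest_blobs_with_paths is hypothetical.

```shell
# Hypothetical helper: list the N largest blobs reachable from any ref,
# printed as "blob <hash> <size-in-bytes> <path>", smallest first.
largest_blobs_with_paths() {
  count="${1:-20}"
  # rev-list prints "<hash> <path>"; cat-file keeps the path in %(rest).
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -k 3 -n |
    tail -n "$count"
}
```

Unlike git verify-pack, this reports uncompressed sizes for every reachable blob, whether or not it has been packed yet.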
Branch filtering
The simplest method of removing files uses git filter-branch. This command rewrites your history so that particular files no longer appear in it. It is especially useful if only a small number of files need to be removed.
git filter-branch --index-filter 'git rm --cached --ignore-unmatch ./path/to/resource/*.ext' --tag-name-filter cat -- --all
You can change this command to use the file name or extension of your choice.
When you are satisfied with the changes that you've made, you will need to force-push them to the remote repository:
git push origin --force --all
To remove the purged information from your tagged releases as well, you can run the following command:
git push origin --force --tags
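Rewriting history does not shrink your local clone by itself, because git filter-branch keeps backups of the old refs under refs/original/ and the reflog still points at the old commits. The following sketch follows the cleanup steps described in the git filter-branch documentation; the function name cleanup_after_rewrite is hypothetical, and it assumes you no longer need the pre-rewrite history.

```shell
# Hypothetical helper: discard the pre-rewrite objects in the local clone.
# Run it only when you are sure the rewritten history is what you want.
cleanup_after_rewrite() {
  # Delete the backup refs that git filter-branch stored under refs/original/.
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
      git update-ref -d "$ref"
    done
  # Drop reflog entries that still reference the old commits.
  git reflog expire --expire=now --all
  # Remove every object that is no longer reachable.
  git gc --prune=now --quiet
}
```

Other clones of the repository keep the old objects until they are re-cloned or cleaned in the same way.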
Cherry-picking
If the files were added in one commit or a small group of commits, and there haven't been many commits since, you can create a new branch from an older commit and cherry-pick the later commits onto it.
Assuming that you are on a single branch called obese, and you know the hash of the last commit before the files were added, you can enter the following in the command line:
git checkout -b slim {hash-of-last-good-commit}
git cherry-pick {hash}
Run git cherry-pick for each commit made after that starting point, excluding the commit that bloated the repository.
- git push origin slim
- git branch -D obese
- git push origin :obese
- git prune origin
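When there are more than a handful of commits to replay, the cherry-pick step can be scripted. This is a minimal sketch that assumes a linear history and a single bloating commit; the function name cherry_pick_except and its three arguments are hypothetical.

```shell
# Hypothetical helper: replay every commit in <base>..<source-branch> onto
# the current branch, oldest first, skipping one commit.
# Usage: cherry_pick_except <base> <source-branch> <commit-to-skip>
cherry_pick_except() {
  base="$1"
  source_branch="$2"
  # Resolve the commit to skip to a full hash so the comparison is exact.
  skip_id=$(git rev-parse --verify "$3^{commit}") || return 1
  commits=$(git rev-list --reverse --no-merges "$base..$source_branch" |
    grep -v "^$skip_id\$" || true)
  for commit in $commits; do
    # Stop at the first conflict so it can be resolved by hand.
    git cherry-pick "$commit" || return 1
  done
}
```

Run it from the new slim branch. If a later commit modifies the large files, its cherry-pick stops with a conflict that you need to resolve manually.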
If the obese branch is your master branch, this process is more complex. Git won't let you delete the HEAD branch. In this instance, you will need to manipulate the HEAD file directly on the bare Git repository so that it temporarily points to a branch other than master, which lets you delete the master branch. You will need to replace it with the slim branch afterwards.
Perform the following steps:
- Manipulate HEAD in the bare Git repository on the Git server to point to slim.
git branch -D master
git push origin :master
git checkout -b master origin/slim
git push origin master
- Return the HEAD file in Git to point at master again.
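Rather than editing the HEAD file by hand, the two HEAD changes in these steps can be made with git symbolic-ref. This sketch assumes you have shell access to the server that hosts the bare repository, which may not be the case on a managed hosting platform; the function name set_bare_head and the repository path in the usage line are hypothetical.

```shell
# Hypothetical helper: point HEAD of a bare repository at the given branch.
# Assumes shell access to the Git server and that <repo> is a bare repository.
# Usage: set_bare_head <path-to-bare-repo> <branch>
set_bare_head() {
  repo="$1"
  branch="$2"
  git --git-dir="$repo" symbolic-ref HEAD "refs/heads/$branch"
}
```

For example, set_bare_head /path/to/repo.git slim before deleting master, and set_bare_head /path/to/repo.git master once the new master branch has been pushed.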
Using a third-party tool
Use BFG Repo-Cleaner, a third-party tool, to remove large items or sensitive items like passwords. The BFG is a simpler, faster alternative to git-filter-branch for cleansing bad data out of your Git repository history.
- BFG Repo-Cleaner https://rtyley.github.io/bfg-repo-cleaner/
Wiping and restarting
If your packfile is out of control, your best option is to wipe the repository and restart. If your repository doesn't shrink to its expected size even after you've performed the maintenance described above and run git gc, the packfile may be causing the issue. To check the size of each packfile, you can use a command similar to the following one from within the repo:
find . -type f -name "*.pack" | xargs du -h
This is the last resort. If this is necessary, you may find the article Reset a Git repository on Acquia Cloud useful.
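To compare the packfile size before and after maintenance, a single total is often easier to track than a per-file listing. The following is a minimal sketch; the function name total_pack_bytes is hypothetical, and it simply adds up the byte counts of every file ending in .pack under the given directory.

```shell
# Hypothetical helper: print the combined size, in bytes, of all packfiles
# under a directory (defaults to the current directory).
total_pack_bytes() {
  repo="${1:-.}"
  # wc -c prints "<bytes> <path>" for each file; awk sums the first column.
  find "$repo" -type f -name '*.pack' -exec wc -c {} \; |
    awk '{ total += $1 } END { print total + 0 }'
}
```

Git also reports a comparable figure itself: git count-objects -vH prints the packed size as size-pack.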