Troubleshooting Performance Issues Caused by Large Files in Git

Large files hurt Git in a way that is easy to miss until the repository is already painful to use. A single video, archive, database dump, or design file may not look dangerous when someone adds it. The trouble starts when that file changes several times. Git keeps every version in history, binary content often compresses poorly against earlier versions, and every full clone has to download all of it.

The symptom is usually vague at first. git clone takes longer than it should. git fetch feels slow on hotel Wi-Fi. CI jobs spend too much time checking out the repository before they even build. Developers start using old local clones because a fresh clone is annoying. That is the moment to inspect the repository instead of telling everyone to be patient.

Confirm that large files are the problem

Start with simple checks:

du -sh .git
git count-objects -vH

du -sh .git shows how much disk space the local repository data uses. git count-objects -vH reports the count and size of loose objects and the total size of the packfiles (size-pack). If the pack size is large compared with the checked-out source tree, history is likely carrying old payloads.
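
One way to put rough numbers on that comparison is sketched below. It is an illustration, not a standard tool: the function names pack_size_kib, tree_size_kib, and report_history_weight are made up for this example, and comparing compressed pack size with uncompressed file sizes only gives an approximate signal.

```shell
#!/bin/sh
# Rough sketch: compare packed history size with the size of files at HEAD.
# Function names are hypothetical; run report_history_weight inside a repository.

# Extract size-pack (in KiB) from `git count-objects -v` output on stdin.
pack_size_kib() {
    awk '$1 == "size-pack:" { print $2 }'
}

# Sum blob sizes (bytes) from `git ls-tree -r -l HEAD` output on stdin
# and print the total in KiB. Submodule entries (type "commit") are skipped.
tree_size_kib() {
    awk '$2 == "blob" { sum += $4 } END { printf "%d\n", sum / 1024 }'
}

report_history_weight() {
    pack=$(git count-objects -v | pack_size_kib)
    tree=$(git ls-tree -r -l HEAD | tree_size_kib)
    echo "packed history: ${pack} KiB, files at HEAD: ${tree} KiB"
}
```

If the first number is many times the second, old versions of large files are a likely cause and the history checks that follow are worth running.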

To find large files in the current checkout:

find . -path ./.git -prune -o -type f -size +10M -print

That only shows what exists now. A repository can be slow because of a file that was deleted months ago. To inspect history, Git LFS provides a useful report even before you migrate anything:

git lfs migrate info --everything --above=10MB

If Git LFS is not installed, you can still investigate with Git plumbing, but the command above is often the most direct view for this specific problem.
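
For reference, the plumbing route can be sketched as follows. It combines git rev-list --objects --all with git cat-file --batch-check to list every blob in history with its size. The function names and the 10 MiB default are assumptions chosen for this example, and the sizes reported are uncompressed object sizes, not on-disk pack sizes.

```shell
#!/bin/sh
# Sketch: list large blobs anywhere in history without Git LFS installed.
# filter_large_blobs and list_large_blobs are hypothetical helper names.

# Read "<type> <sha> <size> <path>" lines on stdin and print
# "<size>\t<sha>\t<path>" for blobs at or above the threshold, largest first.
filter_large_blobs() {
    awk -v min="$1" '
        $1 == "blob" && $3 + 0 >= min + 0 {
            # Everything after the third field is the path (may contain spaces).
            path = substr($0, length($1) + length($2) + length($3) + 4)
            print $3 "\t" $2 "\t" path
        }' | sort -rn
}

# Walk every object reachable from any ref and keep the large blobs.
# The default threshold is 10 MiB (10485760 bytes).
list_large_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        filter_large_blobs "${1:-10485760}"
}
```

Running list_large_blobs inside the repository prints one line per large blob, including files that were deleted from the current checkout long ago.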

Decide what belongs in Git

Not every large file is a mistake. A small set of stable binary assets may be fine. What matters is whether the file belongs to the project: a repository for infrastructure code should not contain VM images, database backups, customer exports, or build artifacts, while a game repository may legitimately contain art and audio assets, although those files usually need Git LFS or a separate asset system.

A practical rule is this: Git is excellent for source text and small supporting files. Git is poor for frequently changing binary blobs. If the file cannot be meaningfully reviewed in a diff and it changes often, it probably should not live as a normal Git object.

Common candidates for Git LFS include:

*.psd
*.ai
*.mp4
*.mov
*.wav
*.zip
*.uasset
*.fbx
*.blend

Be careful with broad image patterns. Tracking every *.png in LFS can be helpful for a design-heavy repository, but it can be annoying for a web app with many tiny icons. Patterns should match the files that actually cause pain.

Use Git LFS for future large files

Git LFS stores a small pointer file in Git and keeps the large content in LFS storage. The normal Git history stays lighter, while users still get the real file in the working tree when LFS downloads it.

Install the Git LFS client with your platform's package manager or installer, then initialize it for your user account:

git lfs install

Track the file patterns you actually need:

git lfs track "*.psd"
git lfs track "*.mp4"
git add .gitattributes
git commit -m "Track large design and video files with Git LFS"

The .gitattributes file is important. Commit it so everyone uses the same LFS rules.
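
To confirm that a given path is covered by the committed rules, git check-attr can report its filter attribute. A minimal sketch, with is_lfs_tracked as a hypothetical helper name and the file paths purely illustrative:

```shell
#!/bin/sh
# Sketch: check whether .gitattributes routes a path through the LFS filter.
# `git lfs track "*.psd"` writes a line like this to .gitattributes:
#   *.psd filter=lfs diff=lfs merge=lfs -text

# Succeeds when the path's "filter" attribute is "lfs".
# The path does not need to exist yet; only the attribute rules are consulted.
is_lfs_tracked() {
    [ "$(git check-attr filter -- "$1")" = "$1: filter: lfs" ]
}

# Example use inside a repository (hypothetical paths):
#   is_lfs_tracked design/hero.psd && echo "handled by LFS"
#   is_lfs_tracked src/main.c || echo "stored as a normal Git object"
```

This only reads attribute rules, so it works even on a machine where the Git LFS client is not installed.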

After that, add files normally:

git add demo.mp4
git commit -m "Add product demo video"
git push origin main

A collaborator should install Git LFS before working with the repository. If they clone without LFS support, they may see pointer files instead of real assets until they install LFS and run:

git lfs pull
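
A pointer file is a small text file whose first line names the LFS pointer spec, so unresolved pointers are easy to spot. The sketch below is one way to find them in a working tree. The function names are assumptions for this example, and the size check relies on the LFS spec's rule that pointer files stay under 1024 bytes.

```shell
#!/bin/sh
# Sketch: detect files that are still LFS pointers instead of real content.
# is_lfs_pointer and list_pointer_files are hypothetical helper names.

# Succeeds when the file is small and starts with the LFS pointer header.
is_lfs_pointer() {
    [ -f "$1" ] || return 1
    size=$(wc -c < "$1")
    [ "$((size))" -lt 1024 ] || return 1
    [ "$(head -n 1 "$1")" = "version https://git-lfs.github.com/spec/v1" ]
}

# Print every pointer file under a directory (default: current directory),
# skipping the .git directory itself.
list_pointer_files() {
    find "${1:-.}" -path '*/.git' -prune -o -type f -print |
        while IFS= read -r f; do
            if is_lfs_pointer "$f"; then
                printf '%s\n' "$f"
            fi
        done
}
```

If list_pointer_files prints anything after a clone, the LFS content has not been downloaded yet for those paths.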

Also check storage and bandwidth policy on your Git host. Git LFS solves Git object bloat, but it does not make large assets free to store or transfer.

Migrating existing history

Enabling LFS today does not automatically fix yesterday's commits. If a 700 MB archive was committed and later deleted, it can still live in history. Cleaning that requires rewriting history.

History rewriting changes commit IDs. Anyone with an existing clone must resync carefully, and open pull requests may need to be rebased or recreated. Do this in a maintenance window, and make a mirror backup first:

git clone --mirror git@HOST:ORG/REPO.git repo-backup.git

Then work in a fresh clone. Make sure the working tree is clean:

git status

Inspect what would be migrated:

git lfs migrate info --everything --above=10MB

Migrate by pattern when possible:

git lfs migrate import --everything --include="*.psd,*.mp4,*.zip"

Or migrate files above a threshold if the repository has many unknown large files:

git lfs migrate import --everything --above=10MB

Review the result before pushing:

git log --oneline --decorate -5
git lfs ls-files
git status
git lfs migrate info --everything --above=10MB

If the migration did what you expected, push rewritten branches and tags deliberately:

git push --force-with-lease origin main
git push --force-with-lease origin --tags

For a repository with many active branches, decide which branches matter. You may not need to rewrite every abandoned branch, but any branch that still contains the large objects can keep the remote repository heavy.
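
To see which refs still reach a particular large object, one option is to walk each ref and look for the blob ID. This is a slow but simple sketch. refs_containing_blob is a hypothetical name, and the blob ID is assumed to be a full object ID, for example one taken from an earlier history report.

```shell
#!/bin/sh
# Sketch: list branches, remote-tracking refs, and tags whose history
# still contains a given blob. Slow on big repositories because it
# walks every ref separately.
refs_containing_blob() {
    blob=$1
    git for-each-ref --format='%(refname)' refs/heads refs/remotes refs/tags |
        while IFS= read -r ref; do
            # rev-list --objects prints "<object-id> <path>" for each blob.
            if git rev-list --objects "$ref" | grep -q "^$blob"; then
                printf '%s\n' "$ref"
            fi
        done
}
```

Any ref printed here keeps the object reachable, so it needs to be rewritten or deleted before the remote can drop the data.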

After a history rewrite

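Tell teammates exactly what changed. The cleanest instruction is often to reclone. If people have local work, they should save it first and then replay only their own commits onto the rewritten branch. A plain rebase onto origin/main can try to replay old pre-migration commits, large files included. The --fork-point option asks Git to use the origin/main reflog to find where the local work actually started, which works as long as the clone still has that reflog:

git status
git branch my-work-before-lfs-migration
git fetch origin
git rebase --fork-point origin/main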

For messy local clones, recloning is less risky than trying to surgically repair old history.

Remote storage may not shrink instantly. Hosting providers keep unreachable objects for a while, and some require support or repository maintenance before storage numbers update. Locally, you can prune old objects after you are sure the migration is good:

git reflog expire --expire=now --all
git gc --prune=now --aggressive

Do not run cleanup commands as a substitute for review. They make old local objects harder to recover.

Preventing the same problem again

Add a pre-commit or pre-receive check if large accidental files keep appearing. A local pre-commit hook can warn developers before they commit a large artifact. A server-side rule is stronger because it protects the shared repository even when someone skips local hooks.

A simple local check might reject files over a chosen size unless they are already tracked by LFS. The exact threshold depends on the project. A documentation site and a game project should not use the same limit.
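
A minimal sketch of such a hook follows. It is an assumption-laden example rather than a drop-in standard: MAX_BYTES, its 5 MiB default, and the function names are all chosen for illustration. It checks the size of the staged blob, so a file handled by LFS appears as its small pointer and passes without a special case.

```shell
#!/bin/sh
# Sketch of a pre-commit size check. Save as .git/hooks/pre-commit and make it
# executable. MAX_BYTES and the function names are assumptions for this example.
MAX_BYTES=${MAX_BYTES:-5242880}   # 5 MiB, an arbitrary default

# Print "size<TAB>path" for each added or modified file in the index.
# Limitation: paths that Git quotes (tabs, newlines) are not handled.
staged_sizes() {
    git -c core.quotePath=false diff --cached --name-only --diff-filter=AM |
        while IFS= read -r path; do
            printf '%s\t%s\n' "$(git cat-file -s ":$path")" "$path"
        done
}

# Read "size<TAB>path" lines; list entries over the limit, fail if any exist.
report_oversized() {
    awk -F '\t' -v max="$1" '
        $1 + 0 > max + 0 { printf "%s (%d bytes)\n", $2, $1; bad = 1 }
        END { exit bad }'
}

check_staged_files() {
    if ! staged_sizes | report_oversized "$MAX_BYTES" >&2; then
        echo "Commit blocked: files above ${MAX_BYTES} bytes." >&2
        echo "Track them with Git LFS or keep them out of the repository." >&2
        return 1
    fi
}

# Run the check only when this file is executed as the hook itself.
case "$0" in
    */pre-commit) check_staged_files || exit 1 ;;
esac
```

A developer can still bypass a local hook with git commit --no-verify, which is why the server-side rule remains the stronger control.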

Also fix the source of the files. If CI creates dist/, target/, coverage reports, archives, or screenshots inside the repository, add the right entries to .gitignore:

dist/
target/
coverage/
*.log
*.zip

Do not ignore files blindly. Make sure the ignored paths are generated outputs, not source inputs.
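
Two built-in commands help audit ignore rules: git ls-files can list files that are tracked even though a rule now matches them, and git check-ignore can show which rule matches a path. A small sketch, with hypothetical wrapper names:

```shell
#!/bin/sh
# Sketch for auditing ignore rules; the function names are hypothetical.

# Tracked files that a current ignore rule matches. Ignoring a path does not
# untrack it, so these stay in the repository until removed with
# `git rm --cached <path>`.
tracked_but_ignored() {
    git ls-files --cached --ignored --exclude-standard
}

# Show the ignore file, line number, and pattern that match each given path.
# By default this reports untracked paths only.
why_ignored() {
    git check-ignore -v -- "$@"
}
```

If tracked_but_ignored lists source inputs rather than generated outputs, the pattern is too broad and should be narrowed before anyone relies on it.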

When LFS is not the answer

Git LFS is not a universal artifact store. Build outputs usually belong in a package registry, object storage, release asset, or CI artifact store. Database dumps belong in backup storage. Large datasets may need a data versioning tool or a separate storage workflow.

The goal is not to hide every big file from Git. The goal is to keep the repository fast enough that people can clone, branch, fetch, and review without fighting the tool.

A good cleanup leaves three things behind: clear .gitattributes rules for files that belong in LFS, .gitignore rules for files that should never be committed, and a short team note explaining how existing clones should resync. That is what keeps the cleanup from becoming a chore you repeat next quarter.