Coding with Titans

so breaking things happens constantly, but never on purpose

Git Cheat Sheet – Split my repository

The Internet is full of tutorials about git usage, so here is mine too. But instead of showing the basics, I want to present a solution to an advanced problem that I personally run into from time to time. I hope it proves useful and saves your day too!

The Problem.

My project’s repository became so big that I noticed some components that could be turned into separate libraries and used elsewhere as well. I would like to split this repo into several smaller ones, apply a new folder layout inside the new repos, and finally bind them all together with submodules or subtrees. Most importantly, I must keep the full change history of the files and transfer it to all those new repos.

What does the Internet say about it?

Browse stackoverflow.com and you will find lots of suggestions for all versions of git. I recommend reading this answer, as it combines a really comprehensive guide with a walkthrough. The only thing it doesn’t explain is how to move files around nicely.

My Solution.

Here is my proposal for solving the problem:

  1. Split the repository
git subtree split -P <path_to_extract> -b <branch_name>
Notice:

This extracts the specified folder across the project’s whole history and places the result on the given branch. It helps if the whole component already lives in that one folder and is not spread across the repository. Additionally, on Windows, always use the forward slash (‘/’) to separate path segments.
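To see what the split produces without touching a real project, here is a minimal sandbox sketch. The repository name `big-repo`, the folder `libs/core` and the branch `split-core` are hypothetical placeholders, and the script assumes that `git subtree` is available in your git installation (some distributions ship it separately).

```shell
#!/bin/sh
# Throwaway demo of step 1. All names here (big-repo, libs/core, split-core)
# are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

SPLIT_DEMO=$(mktemp -d)
git init -q "$SPLIT_DEMO/big-repo"
cd "$SPLIT_DEMO/big-repo"

# Commit 1 touches both the component and the rest of the project.
mkdir -p libs/core app
echo 'class A {}' > libs/core/A.cs
echo 'class App {}' > app/App.cs
git add .
git commit -q -m "Add core and app"

# Commit 2 touches only the component.
echo 'class B {}' > libs/core/B.cs
git add .
git commit -q -m "Add B to core"

# Commit 3 touches only the rest of the project.
echo '// changed' >> app/App.cs
git commit -q -a -m "Touch app only"

# The command from step 1, with forward slashes in the path.
git subtree split -P libs/core -b split-core

# Inspect the result: the history and the root-level files of the new branch.
git log --oneline split-core
git ls-tree --name-only split-core
```

In this sandbox the new branch should carry only the commits that touched `libs/core`, with the component's files sitting directly at the root of the branch.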
  1. Create new repository
mkdir <component_name>
cd <component_name>
git init
  1. Import files with history from old repository into new one
git pull <path_to_source_repository> <branch_name>
  1. Patch new repository to move files to respective folders

    This matters mostly because the source files extracted in step 1 will be placed directly at the root of the new repository. I try to keep a ‘predefined’ repository structure with ‘art’, ‘bin’, ‘ext’ and ‘src’ folders.

git filter-branch --tree-filter 'mkdir -p src/libX/core/; mv *.cs src/libX/core/;' HEAD
or move whole folders:
git filter-branch -f --tree-filter 'if [ -e Model ]; then mkdir -p src/libX/core/; mv Model src/libX/core/; fi' HEAD
Repeat the last command for all folders that need to be moved.

Notice:

The `-f` parameter forces filter-branch to overwrite the backup of the original refs that the first filter-branch call created. This backup can be used to restore the previous state if something went wrong with the history rewrite. It’s stored inside the "_.git/refs/original_" folder and could be deleted by hand, but why do it manually? Without `-f`, you would see an error similar to:
Cannot create a new backup.
A previous backup already exists in refs/original/
Secondly, there is an `if` statement. It is needed when moving files or folders that were not yet present in the first commit of the new repository. Without it, the filter tries to move something that does not exist in that commit and fails with an error like:

`Rewrite 85700a9a54c203d49de11d3fbb15a37f4f5637E9 (1/18)mv: cannot stat 'Model': No such file or directory`
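The two passes from step 4 can be tried in a sandbox first. This is only a sketch: the repository `component` and the files `A.cs` and `Model/User.cs` are made up for the demo. The guard uses the POSIX `[ ... ]` test, on the assumption that the filter may be executed by a plain `sh` rather than bash, and `FILTER_BRANCH_SQUELCH_WARNING` only silences the warning that recent git versions print before filter-branch starts.

```shell
#!/bin/sh
# Throwaway demo of step 4. Repository and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

MOVE_DEMO=$(mktemp -d)
git init -q "$MOVE_DEMO/component"
cd "$MOVE_DEMO/component"

# Commit 1: only a source file at the root, no Model folder yet.
echo 'class A {}' > A.cs
git add .
git commit -q -m "Add A"

# Commit 2: the Model folder appears.
mkdir Model
echo 'class User {}' > Model/User.cs
git add .
git commit -q -m "Add Model"

# First pass: move the root-level *.cs files in every commit.
git filter-branch --tree-filter 'mkdir -p src/libX/core/; mv *.cs src/libX/core/;' HEAD

# Second pass: move the folder. -f overwrites the refs/original backup left by
# the first pass; the guard skips commit 1, where Model does not exist.
git filter-branch -f --tree-filter 'if [ -e Model ]; then mkdir -p src/libX/core/; mv Model src/libX/core/; fi' HEAD

# List the tracked files after both rewrites.
git ls-files
```

The number of commits stays the same; only the paths inside each commit change, so every rewritten commit gets a new hash.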
  1. Add a remote, where the new repo will be pushed
git remote add <component_name> <repo_url>
git push --set-upstream <component_name> master
  1. Remove original source
git rm -rf <path_to_extract>
git commit -a -m "Removed component"
  1. Add submodules or subtree to the old source repo

    Here is the official guide on how to manage submodules.
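For the submodule variant of the last step, a rough sandbox sketch could look like this. Everything in it is an assumption made for illustration: a local folder stands in for `<repo_url>`, and `ext/libX` is just one possible place to mount the component. The `protocol.file.allow=always` setting is there only because the stand-in remote is a local path, which recent git versions refuse to clone as a submodule by default; with a real remote URL it should not be needed.

```shell
#!/bin/sh
# Throwaway demo of step 7 with a submodule. All names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

SUB_DEMO=$(mktemp -d)

# Stand-in for the component repository that was pushed in step 5.
git init -q "$SUB_DEMO/component"
cd "$SUB_DEMO/component"
echo 'class A {}' > A.cs
git add .
git commit -q -m "Component"

# Stand-in for the old source repository after the component was removed.
git init -q "$SUB_DEMO/big-repo"
cd "$SUB_DEMO/big-repo"
echo 'class App {}' > App.cs
git add .
git commit -q -m "App"

# Mount the component under ext/libX. The -c option is needed only because
# the "remote" in this demo is a local path.
git -c protocol.file.allow=always submodule add "$SUB_DEMO/component" ext/libX
git commit -q -m "Add libX as a submodule"

# Show what was recorded.
git submodule status
cat .gitmodules
```

The old repository now stores only a pointer to one specific commit of the component, plus the `.gitmodules` entry that says where to fetch it from.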

Final thoughts.

The source repository could be optimized further by removing (pruning) the whole history of the extracted component. There is only one catch: it requires rewriting history on an already published repository. If you fully control the environment where the repository is used, that should be OK. But if it’s a public project, I would strongly advise against it. The procedure requires deleting the repository and creating it again. Since the published hashes change, keeping only the latest ones becomes a nightmare: if someone with the old version pushes everything, there will be plenty of duplicates.
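One possible way to do such pruning, shown here only on a throwaway repository, is another filter-branch pass with an index filter. The names `big-repo` and `libs/core` are hypothetical, and this is a sketch of the idea rather than a tested recipe for a published project. It also makes the catch visible: the hash of HEAD is different after the rewrite.

```shell
#!/bin/sh
# Throwaway demo of pruning a component from history. Names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

PRUNE_DEMO=$(mktemp -d)
git init -q "$PRUNE_DEMO/big-repo"
cd "$PRUNE_DEMO/big-repo"

mkdir -p libs/core app
echo 'class A {}' > libs/core/A.cs
echo 'class App {}' > app/App.cs
git add .
git commit -q -m "Add core and app"

echo 'class B {}' > libs/core/B.cs
git add .
git commit -q -m "Add B to core"

echo '// changed' >> app/App.cs
git commit -q -a -m "Touch app only"

git rm -r -q libs/core
git commit -q -m "Removed component"

OLD_HEAD=$(git rev-parse HEAD)

# Drop libs/core from every commit; commits that become empty are discarded.
git filter-branch -f --prune-empty --index-filter \
  'git rm -r -q --cached --ignore-unmatch libs/core' HEAD

NEW_HEAD=$(git rev-parse HEAD)
echo "before: $OLD_HEAD"
echo "after:  $NEW_HEAD"

# History of the component on the rewritten branch.
git log --oneline -- libs/core
```

The old commits remain reachable through `refs/original` and the reflog until those are cleaned up, so the repository does not actually shrink right after the rewrite.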