blog.philz.dev

Creating a monorepo out of a multirepo

Inspired by Julia Evans' posts on git, I'm jotting down an obscure trick to combine repos.

Sometimes you have multiple repos, and you want to create a monorepo out of them. Perhaps you have a distribution of many components, and it's convenient to git grep across all of them together. The following annotated snippet creates two repos and joins them together. The key insight is that a commit in git is made up of (roughly) a message, pointers to parent commits, and a "tree," the latter of which can be made synthetically by using git plumbing commands.
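
To make the "message, parents, tree" anatomy concrete, here is a minimal sketch that builds a throwaway repo in a temp directory and prints a raw commit object and the tree it points to. The `demo` identity and the file name are placeholders of mine, not part of the walkthrough below.

```shell
#!/usr/bin/env bash
set -eu

# Placeholder identity so the commit works on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q
echo hello > README.md
git add README.md
git commit -q -m "First commit"

# Raw commit object: a "tree" line, zero or more "parent" lines,
# author/committer lines, a blank line, then the message.
commit_text=$(git cat-file -p HEAD)
echo "$commit_text"

# The tree itself is just a listing of (mode, type, hash, name) entries,
# the same format that git mktree accepts as input.
tree_text=$(git ls-tree 'HEAD^{tree}')
echo "$tree_text"
```

A root commit like this one has no `parent` lines; a merge commit has one per parent.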

This approach is probably overkill for a one-time merge of two repos. In that case, create a commit in your second repo that moves everything to a subdirectory, add the second repo as a remote, and merge it in with --allow-unrelated-histories.
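
For comparison, that simpler one-time route might look roughly like the following. It is a sketch under assumptions: the repo names `first` and `second`, the `second-subdir` directory, and the `demo` identity are all made up for illustration, and everything happens in a temp directory.

```shell
#!/usr/bin/env bash
set -eu

# Placeholder identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Two throwaway repos, each with a single file on a branch named main.
for r in first second; do
  git init -q "$r"
  git -C "$r" symbolic-ref HEAD refs/heads/main
  echo "$r" > "$r/$r.txt"
  git -C "$r" add .
  git -C "$r" commit -q -m "Initial $r commit"
done

# In the second repo, move everything into a subdirectory and commit.
(
  cd second
  mkdir second-subdir
  git mv second.txt second-subdir/
  git commit -q -m "Move everything under second-subdir/"
)

# In the first repo, add the second as a remote and merge its history.
cd first
git remote add second ../second
git fetch -q second
git merge -q --allow-unrelated-histories -m "Merge second repo" second/main

merged_files=$(git ls-files)
echo "$merged_files"
```

The trade-off is that the second repo gains a "move everything" commit, whereas the plumbing approach below leaves the source repos untouched.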

  1. Create two repos, a-repo and b-repo, and initialize them with a file and some commits.
$ (mkdir a-repo; cd a-repo; git init; echo a > a-README.md;
   git add *; git commit -a -m "First A Commit";
   date >> a-README.md; git commit -a -m "Second A Commit");
  (mkdir b-repo; cd b-repo; git init; echo b > b-README.md;
   git add *; git commit -a -m "First B Commit";
   date >> b-README.md; git commit -a -m "Second B Commit")
Initialized empty Git repository in /private/tmp/z/a-repo/.git/
[main (root-commit) 5cc777f] First A Commit
 1 file changed, 1 insertion(+)
 create mode 100644 a-README.md
[main ba1272b] Second A Commit
 1 file changed, 1 insertion(+)
Initialized empty Git repository in /private/tmp/z/b-repo/.git/
[main (root-commit) a2f815a] First B Commit
 1 file changed, 1 insertion(+)
 create mode 100644 b-README.md
[main f00ac44] Second B Commit
 1 file changed, 1 insertion(+)
  1. Initialize the monorepo
$ mkdir mono; cd mono; git init; touch README.md; git add README.md; git commit -a -m "First mono commit"
Initialized empty Git repository in /private/tmp/z/mono/.git/
[main (root-commit) c4ff496] First mono commit
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 README.md
  1. Add both subrepos as remotes and fetch them.
$ for r in a b; do git remote add ${r} ../${r}-repo; git fetch ${r}; done
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (6/6), 409 bytes | 136.00 KiB/s, done.
From ../a-repo
 * [new branch]      main       -> a/main
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (6/6), 406 bytes | 203.00 KiB/s, done.
From ../b-repo
 * [new branch]      main       -> b/main
  1. Synthesize a tree listing and create it.
$ TREE=$( (git ls-tree HEAD; for r in $(git remote); do
    printf "040000 tree $(git rev-parse refs/remotes/${r}/main^{tree})\t${r}\n"; done ) | git mktree )
$ git ls-tree $TREE
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    README.md
040000 tree d1afc680e7436c969788ca8eeca0f78029cf6e1c    a
040000 tree 139482e594d1ab8b06ad718012da5a80fb4d4657    b
  1. Synthesize a commit with all the parents
$ COMMIT=$(git commit-tree $TREE -m "Monorepo merge" -p HEAD \
    $(for r in $(git remote); do printf " -p refs/remotes/${r}/main"; done))
$ git show $COMMIT
commit 5d60a6e98039a510923c3b530f5e1b094ebeec98
Merge: c4ff496 ba1272b f00ac44
Date:   Fri Jan 5 16:39:41 2024 -0800

    Monorepo merge

diff --cc a/a-README.md
index 0000000,0d54894,0000000..0d54894
mode 000000,100644,000000..100644
--- a/a-README.md
+++ a/a-README.md
diff --cc b/b-README.md
index 0000000,0000000,9191ef0..9191ef0
mode 000000,000000,100644..100644
--- b/b-README.md
+++ b/b-README.md
  1. Update our working copy
$ git update-ref HEAD $COMMIT
$ git reset --hard HEAD
HEAD is now at 5d60a6e Monorepo merge
  1. Voila
$ tree
.
├── README.md
├── a
│   └── a-README.md
└── b
    └── b-README.md
  1. You can see how the merge commit preserves the subrepo histories
$ git log --decorate --pretty --graph --oneline
*-.   5d60a6e (HEAD -> main) Monorepo merge
|\ \
| | * f00ac44 (b/main) Second B Commit
| | * a2f815a First B Commit
| * ba1272b (a/main) Second A Commit
| * 5cc777f First A Commit
* c4ff496 First mono commit

  1. Now let's make a change in the A repo
$ (cd ../a-repo; date > a-another-file.txt; git add a-another-file.txt; git commit -a -m 'Third A Commit')
[main a362cd1] Third A Commit
 1 file changed, 1 insertion(+)
 create mode 100644 a-another-file.txt

$ git fetch a
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), 306 bytes | 306.00 KiB/s, done.
From ../a-repo
   ba1272b..a362cd1  main       -> a/main
  1. Now we have to redo the merge, using the same trick with one change. We want to keep the monorepo's own README.md, but git ls-tree HEAD now also lists the a/ and b/ trees from the previous merge, and those have to be filtered out before the fresh ones are added. Here we abuse grep for that: the tr and sed pipeline joins the remote names into a|b.
$ echo grep -E -v '\t('$(git remote | tr '\n' '|' | sed 's,|$,,')')$'
grep -E -v \t(a|b)$
$ TREE=$( (git ls-tree HEAD | grep -E -v '\t('$(git remote | tr '\n' '|' | sed 's,|$,,')')$'; for r in $(git remote); do printf "040000 tree $(git rev-parse refs/remotes/${r}/main^{tree})\t${r}\n"; done ) | git mktree )
$ diff -u0 <(git ls-tree HEAD) <(git ls-tree $TREE)
--- /dev/fd/63  2024-01-06 11:04:02
+++ /dev/fd/62  2024-01-06 11:04:02
@@ -2 +2 @@
-040000 tree d1afc680e7436c969788ca8eeca0f78029cf6e1c   a
+040000 tree 7a6a0c35bccc4a0196c5b605a549ab930e02d269   a
$ COMMIT=$(git commit-tree $TREE -m "Monorepo merge 2" -p HEAD $(for r in $(git remote); do printf " -p refs/remotes/${r}/main"; done))
$ git update-ref HEAD $COMMIT
$ git reset --hard HEAD
HEAD is now at 681d927 Monorepo merge 2
  1. And, sure enough, voila!
$ git log --decorate --pretty --graph --oneline
*-.   681d927 (HEAD -> main) Monorepo merge 2
|\ \
| * | a362cd1 (a/main) Third A Commit
| | |
|  \ \
*-. \ \   5d60a6e Monorepo merge
|\ \ \ \
| | |/ /
| |/| /
| | |/
| | * f00ac44 (b/main) Second B Commit
| | * a2f815a First B Commit
| * ba1272b Second A Commit
| * 5cc777f First A Commit
* c4ff496 First mono commit

Rendered another way (with git-graph from cargo and questionable abuse of ansi-to-html):

681d927 (HEAD -> main) Monorepo merge 2
a362cd1 (a/main) Third A Commit
5d60a6e Monorepo merge
c4ff496 First mono commit
f00ac44 (b/main) Second B Commit
a2f815a First B Commit
ba1272b Second A Commit
5cc777f First A Commit

If you're doing this for real, note that the above will fail spectacularly if you have spaces (and their ilk) in your names. Use Python or something.
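
If you would rather stay in bash, the following is one possible sketch of a more defensive version, not a drop-in replacement. The helper name `monorepo_merge` is hypothetical, and it assumes, as the walkthrough does, that every remote has a branch called main. It reads tree entries NUL-delimited (`git ls-tree -z`, `git mktree -z`), keeps names in arrays instead of relying on word splitting, and compares names exactly instead of building a regex. That last part also avoids depending on grep treating `\t` as a tab, which some grep implementations do not. The demo at the bottom rebuilds the a/b/mono layout in a temp directory with a placeholder identity.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: rebuild HEAD's top-level tree with each remote's main
# tree grafted in under the remote's name, then commit it with HEAD and every
# remote's main as parents, and sync the working copy.
monorepo_merge() {
  local message=$1 r entry name keep tree commit
  local -a remotes=() parents=(-p HEAD)
  while IFS= read -r r; do remotes+=("$r"); done < <(git remote)

  tree=$(
    {
      # Keep every existing top-level entry not named after a remote.
      git ls-tree -z HEAD | while IFS= read -r -d '' entry; do
        name=${entry#*$'\t'}   # entry is "<mode> <type> <hash>\t<name>"
        keep=1
        for r in "${remotes[@]}"; do
          if [[ $name == "$r" ]]; then keep=0; fi
        done
        if (( keep )); then printf '%s\0' "$entry"; fi
      done
      # Add one subtree entry per remote, pointing at its current main tree.
      for r in "${remotes[@]}"; do
        printf '040000 tree %s\t%s\0' \
          "$(git rev-parse "refs/remotes/$r/main^{tree}")" "$r"
      done
    } | git mktree -z
  )
  for r in "${remotes[@]}"; do parents+=(-p "refs/remotes/$r/main"); done
  commit=$(git commit-tree "$tree" -m "$message" "${parents[@]}")
  git update-ref HEAD "$commit"
  git reset -q --hard HEAD
}

# Demo: placeholder identity and throwaway repos mirroring the walkthrough.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
for r in a b; do
  git init -q "$r-repo"
  git -C "$r-repo" symbolic-ref HEAD refs/heads/main
  echo "$r" > "$r-repo/$r-README.md"
  git -C "$r-repo" add .
  git -C "$r-repo" commit -q -m "First $r commit"
done
git init -q mono
cd mono
git symbolic-ref HEAD refs/heads/main
touch README.md
git add README.md
git commit -q -m "First mono commit"
for r in a b; do git remote add "$r" "../$r-repo"; git fetch -q "$r"; done

monorepo_merge "Monorepo merge"

# A second run after an upstream change exercises the a/ and b/ filtering.
date > ../a-repo/a-another-file.txt
git -C ../a-repo add .
git -C ../a-repo commit -q -m "Second a commit"
git fetch -q a
monorepo_merge "Monorepo merge 2"

merged_files=$(git ls-files)
echo "$merged_files"
```

The same function serves both the first merge and every later re-merge, since filtering out entries named after remotes is a no-op when none exist yet.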