Dave's Free Press: Journal

violence, pornography, and rude words for the web generation


Sat, 23 Jul 2011

Splitting a git repository

For several years I've kept all my Perl source code under version control. This is good. However, I was keeping all my distributions - all 40-odd of them - in a single repository. This is bad. It means that anyone who wants to check out the code has to check out 40 distributions, some of them very big, that they're not interested in as well as the one they are interested in.

So I've split the repository up into lots of separate ones, and I've uploaded them to Github instead of keeping them on my own machine. Normally I'm dead set against uploading my data to Teh Clowd, because you lose control over it and it's hard to make backups. Git and Github are an exception to this. My own checkouts - on my laptop and elsewhere - are complete copies of the entire repository, so if Github were to go out of business overnight, I'd not lose a damned thing, I'd just need to find somewhere else to act as the public front-end for my repositories. And it's all stuff that I want to be public anyway, so I really don't care if they lose a copy!

Splitting a git repository while still keeping all the history is a bit tricky, but the lovely Paul Johnson gave me a recipe, which I reproduce here with a few minor changes. Assuming that your monolithic repository contains a bunch of directories, each of which is to become a separate repository ...

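  mkdir split-repo
  cd split-repo
  # one new repository per top-level directory of the monolithic repository
  for i in $(cd ../monolithic-repo; ls); do
      git clone --no-hardlinks ADDRESS_OF_REPOSITORY "$i"
      cd "$i"
      # keep only the history of this directory, and make it the new root
      git filter-branch --subdirectory-filter "$i" HEAD -- --all
      # throw away the backup refs and anything that is now unreachable
      rm -r .git/refs/original
      git reflog expire --expire=now --all
      git gc --aggressive
      git prune
      # point the new repository at its own home on Github
      git remote rm origin
      git remote add origin git@github.com:YOUR_USERNAME/$i.git
      git push origin master
      cd -
  done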

This leaves the original repository unchanged, so if anything goes wrong you need not worry. I did get some warnings and errors from 'git gc' and 'git prune' about it being out of memory when trying to compress files, but that's because my repository has some very big files. These errors were in fact harmless and just meant that the new copies of the repositories on my laptop were wasting lots of disk space. Once I'd uploaded them to Github, deleted the local copy, and then re-downloaded from Github, that was fixed.
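
If you'd like to convince yourself that the history really does come along before trying this on your own code, here is a small sketch that builds a throwaway monolithic repository, splits one directory out of it the same way, and compares commit counts. It isn't part of Paul's recipe. The directory names Foo and Bar, the file names and the demo identity are all made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING only matters on newer versions of git, which warn about filter-branch and pause before running it.

```shell
#!/bin/sh
# Toy check: split one directory out of a throwaway repository and confirm
# that its commits survived. Foo, Bar and the demo identity are invented.
set -e

# Identity for the demo commits, so this runs without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Newer gits print a warning about filter-branch and pause; silence that.
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
cd "$work"

# A monolithic repository: two "distributions", three commits in total.
git init -q monolithic-repo
cd monolithic-repo
mkdir Foo Bar
echo one > Foo/a.txt;    git add .; git commit -q -m "Foo: first"
echo two > Bar/b.txt;    git add .; git commit -q -m "Bar: first"
echo three >> Foo/a.txt; git add .; git commit -q -m "Foo: second"
cd ..

# Split out Foo, following the same steps as the recipe.
git clone -q --no-hardlinks monolithic-repo Foo
cd Foo
git filter-branch --subdirectory-filter Foo -- --all >/dev/null 2>&1

# Commits that touched Foo in the original, and commits in the split copy.
expected=$(git -C ../monolithic-repo rev-list --count HEAD -- Foo)
actual=$(git rev-list --count HEAD)
# After the split, the contents of Foo should sit at the top level.
files=$(ls)

echo "expected $expected commits, got $actual; files: $files"
```

On this toy repository the two counts agree. On a real repository with merges in its history they may not match exactly, because git simplifies history when it filters by directory, so treat a small difference as something to look into rather than proof that anything was lost.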

Posted at 15:56 by David Cantrell
keywords: geeky