Archive for August 2005

Git Epiphany

August 24, 2005

For several months I have been reading about the wonders of distributed version control. I’ve followed it ever since Linux Weekly News started explaining BitKeeper. I read about how many of the Linux developers realized huge productivity gains using BitKeeper. I could visualize their peer-to-peer workflow, and I could even picture a diagram of the relationships in my head. What I could not do, until yesterday, was picture exactly what the developers did to push bits to and pull bits from each other.

Yesterday I had the epiphany. In terms of git, which is Linus’s current tool:

A link between two git repositories has four parameters:

  1. URL of the remote repository
  2. Push or pull
  3. Name of the branch in the remote repository
  4. Name of the branch in the local repository
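To make the four parameters concrete, here is a minimal Python sketch that bundles them into one record. The `Link` and `Direction` names and the example URL are hypothetical, invented for illustration; git itself does not define such a type.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    PUSH = "push"
    PULL = "pull"


@dataclass(frozen=True)
class Link:
    """One relationship between my repository and a remote one (hypothetical type)."""
    url: str              # 1. URL of the remote repository
    direction: Direction  # 2. push or pull
    remote_branch: str    # 3. branch name in the remote repository
    local_branch: str     # 4. branch name in the local repository

    def describe(self) -> str:
        if self.direction is Direction.PULL:
            return f"pull {self.url} {self.remote_branch} -> local {self.local_branch}"
        return f"push local {self.local_branch} -> {self.url} {self.remote_branch}"


# Hypothetical URL and branch names, for illustration only.
link = Link("git://example.org/project.git", Direction.PULL, "master", "upstream-master")
print(link.describe())  # pull git://example.org/project.git master -> local upstream-master
```

The point of the record is simply that leaving out any one of the four fields makes the link ambiguous.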

A savvy git user could point out that it’s possible to do some useful work without all of this information. That user should be ignored for now, because I’m papering over a lot of flexibility on purpose.

With that information in hand, let’s take a look at git’s notion of what a branch is. A branch is just a commit, any commit. A commit records the state of the whole tree, so you can think of it as a single tree-wide changeset relative to its parent. Commits have one or, occasionally, more parent commits. Starting with a commit, I can recreate its full history by walking its parents, the parents’ parents, and so on. So a git branch is usually just a commit object that I can refer to by a particular name.
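The history walk described above can be sketched in a few lines of Python. This is a toy model, not git’s object format: the `Commit` class, the short ids, and the `history` function are hypothetical stand-ins that only capture “a commit points at its parents”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    id: str
    message: str
    parents: tuple = ()  # usually one parent, occasionally more (a merge)


def history(start: Commit) -> list:
    """Return the ids of start and all of its ancestors, each visited once."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit.id in seen:
            continue  # reachable through more than one parent; already listed
        seen.add(commit.id)
        order.append(commit.id)
        # Push parents reversed so the first parent is walked first.
        stack.extend(reversed(commit.parents))
    return order


root = Commit("a1", "initial import")
fix = Commit("b2", "fix typo", (root,))
feature = Commit("c3", "add feature", (root,))
merge = Commit("d4", "merge feature", (fix, feature))

branches = {"master": merge}  # a branch is just a name for a commit
print(history(branches["master"]))  # ['d4', 'b2', 'a1', 'c3']
```

Note that the branch name appears only in the `branches` mapping; everything else is reachable from the one commit it names.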

Perhaps an example will help:

I have a branch named “master” which refers to a particular commit id. I do some work and commit again to the master branch. Master now refers to my new commit id. The tricky part is that the parent of the new master commit is the old master commit.
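Here is the same example as a small Python sketch, assuming a toy repository held in two dictionaries. The ids are SHA-1 hashes of a made-up payload, not of a real git commit object, and the `commit` function is hypothetical; the point is only that the old branch tip becomes the parent of the new commit.

```python
import hashlib

objects = {}   # commit id -> (parent id or None, message)
branches = {}  # branch name -> commit id


def commit(branch: str, message: str) -> str:
    """Record a new commit on branch and move the branch to it (toy model)."""
    parent = branches.get(branch)  # old tip, or None for the very first commit
    payload = f"parent {parent}\n\n{message}".encode()
    new_id = hashlib.sha1(payload).hexdigest()  # toy id; real git hashes the whole commit object
    objects[new_id] = (parent, message)
    branches[branch] = new_id  # "master" now refers to the new commit
    return new_id


first = commit("master", "initial import")
second = commit("master", "more work")
print(objects[second][0] == first)  # True: the old master is the new commit's parent
```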

And it really is that simple. Sometimes a branch might be referred to as a “head”. This name makes sense too, since the branch refers to the commit at the head of a long, linked tail of commits.

The physical representation of a branch is a single file. The file’s name is the branch name, and the file contains a single commit id. The file just gets rewritten every time I commit to its branch. That file lives in a directory (.git/refs/heads), and I can have as many of these branch files in that directory as I like. (Git vets will notice I’m papering over a lot of flexibility again. And again, they should restrain themselves for the moment, for the benefit of the rest of us n00bs.)
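A minimal Python sketch of reading those branch files, assuming the loose-file layout under refs/heads described above. It deliberately ignores packed-refs, which newer git versions may use instead, and it only reads; to change a real branch you would use git’s own tools (such as git update-ref) rather than writing the file by hand. The `read_branches` helper and the dummy ids are hypothetical.

```python
import tempfile
from pathlib import Path


def read_branches(git_dir: str) -> dict:
    """Map each branch name to the commit id stored in its file.

    Sketch only: reads loose files under refs/heads and ignores packed-refs.
    """
    heads = Path(git_dir) / "refs" / "heads"
    branches = {}
    for ref_file in sorted(heads.rglob("*")):
        if ref_file.is_file():
            # Names like "topic/foo" are stored in subdirectories.
            name = ref_file.relative_to(heads).as_posix()
            branches[name] = ref_file.read_text().strip()
    return branches


# Demo against a throwaway directory with dummy ids, not a real repository.
with tempfile.TemporaryDirectory() as git_dir:
    heads = Path(git_dir) / "refs" / "heads"
    heads.mkdir(parents=True)
    (heads / "master").write_text("1" * 40 + "\n")
    (heads / "junio-branch").write_text("2" * 40 + "\n")
    demo = read_branches(git_dir)

print(demo)
```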

So a git repository can have many branches. And you can see how a link between two git repositories has to involve more than just the two repositories’ URLs. We have to note which branches we mean.

The last complicating factor is that the remote repository and my repository might have different ideas about what to name a given branch. For instance, in git land it is common to name your main work branch “master”. There is a good chance, then, that the branch I am interested in on the remote repository is named master, and a good chance that the local branch I intend to work on is called master too. This is a problem because, in order to merge, git wants to bring the history of that remote branch to my local machine and store it as a local branch. Once I have both branch histories in my local repository, git can do a merge. (A little more on that later.) But within a single git repository, the branches must have different names.
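The naming clash can be shown with a toy Python model in which each repository’s branches are just a dictionary from name to commit id. The `fetch` function and the short ids are hypothetical; real git also copies the missing commit objects, which this sketch leaves out.

```python
def fetch(remote_refs: dict, local_refs: dict,
          remote_branch: str, local_branch: str, current_branch: str) -> None:
    """Store a remote branch tip under a local branch name (toy model)."""
    if local_branch == current_branch:
        # Storing remote "master" as my local "master" would clobber my own work.
        raise ValueError(f"refusing to overwrite the branch I work on: {local_branch}")
    local_refs[local_branch] = remote_refs[remote_branch]


remote_refs = {"master": "c3"}  # the remote repository's branches
local_refs = {"master": "b2"}   # my repository, where I work on master
fetch(remote_refs, local_refs, "master", "junio-branch", current_branch="master")
print(local_refs)  # {'master': 'b2', 'junio-branch': 'c3'}
```

With both tips stored under different local names, a merge has everything it needs in one repository.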

Here is an example of real link parameters. This would set up a pull link from the master branch of the mainline git repository into a local branch named junio-branch:

  2. pull
  3. master
  4. junio-branch

Now let’s look at a real file. In the git remotes directory, this link looks like this (filename .git/remotes/junio):

Pull: +master:junio-branch
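The value after “Pull:” is a source:destination pair. For a Pull line the source is the remote branch and the destination is the local branch, and the leading “+” tells git to update the local branch even when the update is not a fast-forward. A minimal Python parser for one such pair, as a sketch: the `Refspec` type and `parse_refspec` function are hypothetical, and real git accepts more forms than this.

```python
from typing import NamedTuple


class Refspec(NamedTuple):
    force: bool  # leading "+": update dst even when it is not a fast-forward
    src: str     # on a Pull: line, the branch in the remote repository
    dst: str     # on a Pull: line, the branch in the local repository


def parse_refspec(spec: str) -> Refspec:
    force = spec.startswith("+")
    if force:
        spec = spec[1:]
    src, sep, dst = spec.partition(":")
    if not (sep and src and dst):
        # This sketch insists on both names; real git accepts more forms.
        raise ValueError(f"expected [+]src:dst, got {spec!r}")
    return Refspec(force, src, dst)


line = "Pull: +master:junio-branch"
key, _, value = line.partition(":")
spec = parse_refspec(value.strip())
print(key, spec)  # Pull Refspec(force=True, src='master', dst='junio-branch')
```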

Now I can reveal a little flexibility. A .git/remotes file can contain multiple pull and push links. Just separate the branch relationships with spaces on the “Pull:” line, and add a “Push:” line with more relationships as needed. The local branch names all need to be unique. (OK, maybe they don’t _have_ to, but the consequences of not doing so are a little hard to comprehend.)
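A sketch of reading a whole remotes file in Python, assuming the “URL:”, “Pull:” and “Push:” line layout described here. The URL and the junio-pu branch in the example are hypothetical, and the uniqueness check only covers the local names on Pull lines, which is where two remote branches could collide.

```python
from typing import NamedTuple


class Refspec(NamedTuple):
    force: bool  # leading "+": allow non-fast-forward updates
    src: str
    dst: str


def parse_refspec(spec: str) -> Refspec:
    force = spec.startswith("+")
    if force:
        spec = spec[1:]
    src, _, dst = spec.partition(":")
    return Refspec(force, src, dst)


def parse_remotes_file(text: str):
    """Return (url, pull_specs, push_specs) from a remotes file (sketch)."""
    url, pulls, pushes = None, [], []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines in this sketch
        if key == "URL":
            url = value.strip()
        elif key == "Pull":
            # Several relationships on one line are separated by spaces.
            pulls.extend(parse_refspec(s) for s in value.split())
        elif key == "Push":
            pushes.extend(parse_refspec(s) for s in value.split())
    # Each pulled branch lands in its own local branch, so names must not repeat.
    local_names = [spec.dst for spec in pulls]
    if len(local_names) != len(set(local_names)):
        raise ValueError(f"duplicate local branch names in {local_names}")
    return url, pulls, pushes


remotes_text = """\
URL: git://example.org/git.git
Pull: +master:junio-branch +pu:junio-pu
Push: master:master
"""
url, pulls, pushes = parse_remotes_file(remotes_text)
print(url, pulls, pushes)
```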

I think it was seeing an example of this git remotes file that helped me over the mental barrier to understanding how git’s horizontal repo relationships work.

This is still being fleshed out somewhat on the git list, so that file format might change a bit before the relevant patches hit the mainline. Currently they are in Junio’s proposed updates (pu) branch. I, for one, hope they land soon! Update: They are in the mainline now. Yay!

In PDK I’m working on exposing this functionality via publish and subscribe commands. Publish will be equivalent to a single push, while subscribe will set up the initial remote pull, and an update command will do subsequent pulls. So after subscribing, updating should be roughly like CVS.

Hope this helps any other merge-challenged git folks out there.

A quick note about actually merging branches: most of this has described how to note relationships with remote branches. Once those relationships are defined properly, git can simply push to or pull from the remote branches and they will be available for merging where needed. Actually doing those merges is something I won’t cover in too much detail here.

Note: These next examples aren’t the best. From here on out, you could probably just use git pull, and git-read-tree -m would be handled properly behind the scenes. It would certainly be less typing. I’m not experienced enough with the pull command to write it up yet.

Suffice it to say that the manpage for git-read-tree has good information; see the -m option. To use the three-argument form of git-read-tree -m, you will want the git-merge-base command to find the common ancestor. Once you have performed the merge and updated your git index with the changed files, you will want to pass one -p option per parent to git-commit-tree, including HEAD and all the local branches involved in the merge. That will preserve the history of the complex merge and allow for intelligent repeated merging in the future.
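The idea behind git-merge-base can be sketched in Python over a toy commit graph: collect the ancestors of both tips, then keep only the common ancestors that are not themselves ancestors of another common ancestor. This is a naive illustration with hypothetical one-letter ids, not git’s actual algorithm, which uses a much more efficient walk. The result is the base tree you would hand to the three-argument git-read-tree -m as its first argument.

```python
def ancestors(graph: dict, commit: str) -> set:
    """Return commit plus everything reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def merge_bases(graph: dict, a: str, b: str) -> list:
    """Best common ancestors of a and b (naive sketch)."""
    common = ancestors(graph, a) & ancestors(graph, b)
    # Drop any common ancestor that is an ancestor of another common ancestor.
    best = {c for c in common
            if not any(c != other and c in ancestors(graph, other) for other in common)}
    return sorted(best)


# commit id -> list of parent ids
graph = {
    "A": [],
    "B": ["A"],
    "C": ["B"],  # tip of master
    "D": ["B"],
    "E": ["D"],  # tip of junio-branch
}
print(merge_bases(graph, "C", "E"))  # ['B']
```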

Just in case this turns out to be useful:

I release this article into the public domain. – Darrin Thompson

Update: Thanks to Glen and Tim at Progeny for some post release proofreading. :-)

Update: Changed the branch example to point at mainline git and corrected the pull syntax slightly, per Junio Hamano. (Thanks for reading!) Junio pointed out that git-read-tree -m is a bad example for showing how to merge. Unfortunately, I’m a little new at the whole git merging thing, so I’m letting it stand until my knowledge improves or someone can give me a better example.