Archive for June 2006

Subversion Update Command Considered Harmful

June 19, 2006

This goes for CVS too.

Version control is the tool most programming workgroups won’t do without. It provides a kind of de facto backup mechanism, and lets us look at history to occasionally wonder about a particular change or line of code.

If multiple people are going to work on the same set of files, there will occasionally be two people wanting to work on the same thing. CVS pioneered or at least popularized the update comand as it works today. It was hailed as a major breakthrough for programming workgroups. Before CVS, if someone else was working on a file, you couldn’t touch it until they were done and had “unlocked” it, maybe days later. At that point you could pull their version of the file and make your change.

With CVS or SVN update you can mostly forget about the problem, as CVS will seamlessly merge others’ changes into your work before you commit yours, informing you only when it detects a problem, and leaving you to resolve it. These conflicts turn out to be quite rare in practice. Once you convince yourself that it works, it saves a lot of hassle.

I’d like to posit that both the pessimistic “locking” model and the optimistic “CVS update” model are badly broken. CVS made badly broken much more convenient, but it only accelerated a serious problem.

If I could change one thing about how I initially ran the pdk project I would have posted all changes to the list in patch form.

Early in pdk’s history I needed a small feature in git. I needed to expand on their already existing http support to include https with self signed certificates and http basic auth. The feature turned out to be small enough that I could add it myself. I didn’t want to maintain it so I contributed it upstream, and it was accepted. The feature is still in git today.

To get the patch accepted took some work and learning about the Linux development process. I fell in love with the culture, and in the process, the tool they built to support the culture.

Acceptance of the patch started with me posting it to the development mailing list.

Writing code for publication is different from writing code to get another feature done. The temptation to take a shortcut is virtually eliminated. If I do paper over an important case I’m going to have to point it out in a comment and/or in the email describing the patch. My patch and my description of it are going to end up in the inbox of hundreds, maybe thousands of people. That makes me want to get it right.

So while my patch was small, I was careful to make sure it was correct and the change was described clearly.

A few folks, including Junio Hamano, were interested in making git work over “dumb http.” Linus Torvalds, original author and maintainer of git at the time, didn’t give a rip about dumb http, preferring smart protocols. Despite his apathy over the feature, Linus trusted the judgement of Junio and took patches which improved the dumb http support over time. Ultimately Junio Hamano put my patch in with a series of his own http related patches and mine went into the official Linus tree as part of Junio’s set.

Today, Junio, not Linus, is the official maintainer of git. Linus still participates as a regular developer/user in the project. Even today Junio posts most of his work, maybe all of it, as patches and sets of patches to the mailing list before incorporating them into a release. Because he is careful about this, everyone’s work tends to be reviewed on list also. This isn’t so much a hard rule as it is a principle. Why would I want to disturb a complex system without help? Could someone look at this? Have I missed anything obvious?

Git is simply a flexible tool that can, among other things, support the Linux development model better than any.

The Linux development model stands in stark contrast to the more common CVS inspired development practice in closed and open source projects.

CVS and SVN update command encourages, nay insists, that developers skip the review step and trust one another blindly. How does this work in the normal project storing their code in SVN? Multiple times in a development cycle you incorporate into your code patches which have only ever been seen by their author. Could we imagine a more dangerous act? The times when you run cvs update are never convenient points for reviewing the incoming horde of changes. If you were to find a bug, you can’t reject the incoming change. You can add a review process to your dev group, but when the tool assumes your neighbor’s changes are good, your review is always second priority. The evidence of years of my own experience is that the implied trust truly is blind most of the time, this causes compounding trouble and we pay a heavy price.

You can’t even use SVN branches to keep separate lines of active development because you get zero help doing repeated merges.

The Linux culture promotes patch acceptance between developers as an explicit act. It implies a slight distrust and review follows naturally. Making a merge between developers is also always an explicit and transparent act. The only time there is a mass update (see git-rebase) is when someone at the tip of the project’s food chain (Linus, Junio) makes a particular line of _reviewed_ history the new official branch. Again I emphasize, these mass rebases, similar to the frequent cvs updates in normal land, are composed of mostly reviewed patches, so the whole thing is a lot safer. Furthermore, these rebases are never a barrier to commit a changeset.

When the history books are written, it will be known that Greg Hudson was dead wrong about distributed version control. His assertion that single integrators are a choke point throttling throughput has been thoroughly debunked by the last few years of Linux kernel development. Granted, some of the problems he pointed out did exist during the 2.4/2.5 Linux Kernel release series, but those problems cannot be attributed to the causes he proposes.

When the history books are written, it will be known that the Linux Kernel is, as of 2006, the largest agile project on the planet, having solved the “agile over TCP/IP” problem in a novel way. Git, and the torrent of emailed patches, are large part of what makes it possible.

Advertisements