Subversion Update Command Considered Harmful

This goes for CVS too.

Version control is the tool most programming workgroups won’t do without. It provides a kind of de facto backup mechanism, and lets us look at history to occasionally wonder about a particular change or line of code.

If multiple people are going to work on the same set of files, there will occasionally be two people wanting to work on the same thing. CVS pioneered or at least popularized the update command as it works today. It was hailed as a major breakthrough for programming workgroups. Before CVS, if someone else was working on a file, you couldn’t touch it until they were done and had “unlocked” it, maybe days later. At that point you could pull their version of the file and make your change.

With CVS or SVN update you can mostly forget about the problem, as CVS will seamlessly merge others’ changes into your work before you commit yours, informing you only when it detects a problem, and leaving you to resolve it. These conflicts turn out to be quite rare in practice. Once you convince yourself that it works, it saves a lot of hassle.
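To make the merge step concrete, here is a toy sketch of the rule behind an optimistic update. It is not what CVS or SVN actually run; it assumes all three versions of a file have the same number of lines, so only in-place edits are handled, and the function name `three_way_merge` is made up for illustration.

```python
from typing import List, Tuple


def three_way_merge(
    base: List[str], mine: List[str], theirs: List[str]
) -> Tuple[List[str], List[int]]:
    """Toy line-by-line three-way merge.

    base   -- the version both developers started from
    mine   -- my working copy
    theirs -- what the other developer committed

    Returns the merged lines plus the 1-based numbers of conflicting lines.
    Assumption: all three versions have the same number of lines.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this toy merge only handles in-place line edits")

    merged: List[str] = []
    conflicts: List[int] = []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:
            # Nobody changed the line, or both made the same change.
            merged.append(m)
        elif m == b:
            # Only the other developer changed it: take their line silently.
            merged.append(t)
        elif t == b:
            # Only I changed it: keep my line.
            merged.append(m)
        else:
            # Both changed it differently: keep mine and flag a conflict.
            merged.append(m)
            conflicts.append(number)
    return merged, conflicts
```

Note the second branch: the other developer's line goes straight into my working copy without anyone but its author having looked at it. Only the last branch ever asks a human for attention.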

I’d like to posit that both the pessimistic “locking” model and the optimistic “CVS update” model are badly broken. CVS made badly broken much more convenient, but it only accelerated a serious problem.

If I could change one thing about how I initially ran the pdk project, it would be to have posted all changes to the list in patch form.

Early in pdk’s history I needed a small feature in git. I needed to expand on their already existing http support to include https with self-signed certificates and http basic auth. The feature turned out to be small enough that I could add it myself. I didn’t want to maintain it, so I contributed it upstream, and it was accepted. The feature is still in git today.

To get the patch accepted took some work and learning about the Linux development process. I fell in love with the culture, and in the process, the tool they built to support the culture.

Acceptance of the patch started with me posting it to the development mailing list.

Writing code for publication is different from writing code to get another feature done. The temptation to take a shortcut is virtually eliminated. If I do paper over an important case I’m going to have to point it out in a comment and/or in the email describing the patch. My patch and my description of it are going to end up in the inbox of hundreds, maybe thousands of people. That makes me want to get it right.

So while my patch was small, I was careful to make sure it was correct and the change was described clearly.

A few folks, including Junio Hamano, were interested in making git work over “dumb http.” Linus Torvalds, original author and maintainer of git at the time, didn’t give a rip about dumb http, preferring smart protocols. Despite his apathy toward the feature, Linus trusted the judgement of Junio and took patches which improved the dumb http support over time. Ultimately Junio Hamano put my patch in with a series of his own http-related patches, and mine went into the official Linus tree as part of Junio’s set.

Today, Junio, not Linus, is the official maintainer of git. Linus still participates as a regular developer/user in the project. Even today Junio posts most of his work, maybe all of it, as patches and sets of patches to the mailing list before incorporating them into a release. Because he is careful about this, everyone’s work tends to be reviewed on list also. This isn’t so much a hard rule as it is a principle. Why would I want to disturb a complex system without help? Could someone look at this? Have I missed anything obvious?

Git is simply a flexible tool that can, among other things, support the Linux development model better than any other tool.

The Linux development model stands in stark contrast to the more common CVS-inspired development practice in closed and open source projects.

The CVS and SVN update command encourages, nay insists, that developers skip the review step and trust one another blindly. How does this work in the normal project storing its code in SVN? Multiple times in a development cycle you incorporate into your code patches which have only ever been seen by their author. Could we imagine a more dangerous act? The times when you run cvs update are never convenient points for reviewing the incoming horde of changes. If you were to find a bug, you can’t reject the incoming change. You can add a review process to your dev group, but when the tool assumes your neighbor’s changes are good, your review is always second priority. The evidence of years of my own experience is that the implied trust truly is blind most of the time. This causes compounding trouble, and we pay a heavy price.

You can’t even use SVN branches to keep separate lines of active development because you get zero help doing repeated merges.

The Linux culture promotes patch acceptance between developers as an explicit act. It implies a slight distrust, and review follows naturally. Making a merge between developers is also always an explicit and transparent act. The only time there is a mass update (see git-rebase) is when someone at the tip of the project’s food chain (Linus, Junio) makes a particular line of _reviewed_ history the new official branch. Again I emphasize, these mass rebases, similar to the frequent cvs updates in normal land, are composed of mostly reviewed patches, so the whole thing is a lot safer. Furthermore, these rebases are never a barrier to committing a changeset.
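As a rough sketch of what “explicit” looks like at the keyboard, the helpers below build the git commands for each side of the exchange. This assumes a current git command line (`format-patch`, `apply --check`, `am`); the helper names, the `origin/master` branch and the `outgoing` directory are illustrative only and do not come from the git or pdk projects.

```python
import subprocess
from typing import List


def export_patches_cmd(since: str, out_dir: str) -> List[str]:
    """Author side: write one patch file per commit made after `since`."""
    return ["git", "format-patch", "-o", out_dir, since]


def check_patch_cmd(patch_file: str) -> List[str]:
    """Reviewer side: test whether a patch applies, without touching any files."""
    return ["git", "apply", "--check", patch_file]


def apply_patches_cmd(patch_files: List[str]) -> List[str]:
    """Reviewer side: apply the emailed patches as commits, keeping authorship."""
    return ["git", "am", *patch_files]


def run(cmd: List[str]) -> None:
    """Run one of the commands above inside a git working tree."""
    subprocess.run(cmd, check=True)
```

Something like `run(export_patches_cmd("origin/master", "outgoing"))` would produce the files to mail out. Nothing lands in anyone else’s tree until they choose to run the `git am` step themselves, which is the natural moment to read the patch.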

When the history books are written, it will be known that Greg Hudson was dead wrong about distributed version control. His assertion that single integrators are a choke point throttling throughput has been thoroughly debunked by the last few years of Linux kernel development. Granted, some of the problems he pointed out did exist during the 2.4/2.5 Linux Kernel release series, but those problems cannot be attributed to the causes he proposes.

When the history books are written, it will be known that the Linux Kernel is, as of 2006, the largest agile project on the planet, having solved the “agile over TCP/IP” problem in a novel way. Git, and the torrent of emailed patches, are a large part of what makes it possible.


7 Comments on “Subversion Update Command Considered Harmful”

  1. Rune Says:

    While your observations may apply to distributed development efforts like the Linux kernel, they completely ignore the realities of making software in a team of peers, such as you would find in software companies.

    In my office I trust my peers. I pretty much have to. If we didn’t trust each other we would spend all of our time reviewing each other’s code and never get anything done. So I trust my peers, and their code. If anyone breaks the tree, they go back and fix it, because frankly, breaking the tree is kinda embarrassing.

    And please, can we have an end to the whole “foo considered harmful” thing that’s been going on the last year or so? Sure, Dijkstra can get away with it, but most of the time it’s just populist posturing.

  2. David Says:

    I generally agree with you. I want to mention how we solved this particular problem with CVS at the startup I used to work for. I wrote a build script that would not only try to build our code, but would do a cvs diff from the last correctly built version and then send it out on a team email list. This happened often, up to 10 times a work day.

    This had the same effect that you mentioned: it encouraged people to commit the best possible changes, as someone on the list would generally catch you if you tried to cut corners. You’d then be asked some hard questions justifying your change. This ended up raising the quality of our code significantly and generally making the whole process very transparent, something that we definitely admired from the open source world.

    -David

  3. James Says:

    This is why we have comprehensive unit test suites. If somebody commits a change that breaks the HEAD, and you have good test coverage, the regression will be picked up. Sure beats the hell out of analysing each change on a large team before you include it in ‘your’ build.

  4. vuclan Says:

    Rune is 100% right. When talking about something you should always put things into context. It seems you have done so in your article. So why is Rune so skeptical about your points? It’s because you have an absolute statement and zero context in your article title :) .

    Every seasoned developer knows you should always have the right tool for the job. So the question should be: how to use SVN Update/Commit for distributed open source development?

    OK, the answer to that would need a book, so it seems I am already rambling out here…

  5. Kevin Spaeth Says:

    Tortoise SVN has a cool feature (and maybe Subversion has it as well) called “check for modifications”. You can see exactly what changes are coming in the next update. If you don’t like the change, merge the file back to the previous version. You can do this on a file by file basis if you want.

    I think SVN is flexible enough where if you want to do a controlled update, you have that option, or if you want to update without reviewing the code, you can do that too.

  6. BE Says:

    vuclan’s point about development environment is key. I use Subversion as a single academic developer working on multiple machines to keep my code always in sync and to have a history that I can look back on when publication time comes or to find where and how a bug came to be introduced.

    In this environment, the streamlining is wonderful. Consider the following scenario. I make changes to the code on machine-1. Occasionally (twice in the last year) I forget to commit them to the repository. Then, the next day, I work on the code on machine-2 and commit it. Finally I return to machine-1. Update merges in the changes from machine-2. No work or time is lost.

  7. Michael Says:

    I don’t agree with the statement “Subversion Update Command Considered Harmful”.

    And I side 100% with the comments from Rune, David, Kevin, BE, etc.

    It’s all about the practice … users must know what they should do … the tool is only there to make document control flexible

