Archive for the ‘Linux’ category

pdk 0.0.20

November 18, 2005

At work I just put out pdk 0.0.20. While it adds only one major new feature, that feature delivers a significant piece of pdk’s original vision.

The ability to publish your workspace history over anonymous http means that it is really easy for another person to pick up where you left off and make a few changes. They can either pull your work into their workspace as an add-on, or make some changes to your work and publish them back to you. Installing pdk on your web server will make this easy and safe, but it isn’t absolutely required. A simple rsync-to-public-server will get the job done with some minor race conditions.

We’ll have to rework some tutorials in the coming weeks. We can make them simpler now. Basically now you can get started customizing a distro in a couple of easy steps. See the mailing list for details, but it boils down to:

  1. Pull Ian’s (or somebody else’s) workspace.
  2. Add your stuff.

The only major headaches left in the distro development process are making installable iso images and package maintenance.

Jeff has been putting a lot of work into pickaxe, and now he is working on integrating it directly into pdk. This will make the new user experience really easy all the way out to making new installable iso images.

I’ve got package build-farming across multiple architectures and some really jaw-dropping maintenance reporting stuff on my mental todo list.

I’m getting excited about all this progress. One user has already proven that the pull functionality works in the field.

Very cool.

Less is More, Less is Clever

September 28, 2005

From 43Folders comes an amazing display of less is clever. What a cool note taking device. By not doing a zillion things it gains 700 hours of battery life.

I’ve tried to make our PDK project exemplify this. The underlying metaphor of PDK is source code. Can I manage an entire Linux distribution as easily as I could manage the source code to an ordinary project? Can I collaborate with others just as easily? Can the distro build process just build quietly and tell me if there are errors? Can I change (refactor) the structure of my Linux distribution to make it more succinct? Can I refactor components of my distribution so that they can be shared among multiple distributions?

Certainly there is a lot of complexity in the problem space left to attack. But by using the source code metaphor directly, adding a simple cache mechanism for holding packages, we get the foundation for version control for distributions. Furthermore, source code and version control have a relaxed data consistency model that is well understood by nearly every programmer and manager in our target market. Now instead of dealing with data management issues, I’m focusing my time on refactoring mechanisms, sane default behaviors, etc. These are much more salient problem spaces, so I deliver higher value sooner.

I’ve tried to use the “less is clever” pattern, and hopefully I’ve succeeded well enough. Time will tell. So far, even with its warts, PDK has addicted at least one user.

Git Epiphany

August 24, 2005

For several months I read about the wonders of distributed version control. I’ve followed it ever since Linux Weekly News started explaining Bitkeeper. I read about how many of the Linux developers realized huge productivity gains using Bitkeeper. I could visualize their peer to peer workflow, and I could even imagine a diagram of the relationships in my head. What I could not do until yesterday was picture exactly what the developers did to pull and push bits from each other.

Yesterday I had the epiphany. Using git, which is the current Linus tool:

A link between two git repositories has four parameters:

  1. URL of remote repository
  2. Push or pull
  3. Name of branch in remote repository.
  4. Name of branch in local repository.

Any savvy git user could point out that it’s possible to do some useful work without all of this information. But that user should be ignored as I’m papering over a lot of flexibility on purpose.

With that information in hand, let’s take a look at git’s notion of what a branch is. A branch is just a commit, any commit, and a commit represents a single tree-wide changeset. Commits have one or occasionally more parent commits. Starting with a commit I can recreate its full history by walking its parents and the parents’ parents, and so on. So a git branch is usually just a commit object which I can refer to by a particular name.

Perhaps an example will help:

I have a branch named “master” which refers to a particular commit id. I do some work and commit again to the master branch. Master now refers to my new commit id. The tricky part is that the parent of the new master commit is the old master commit.

And it really is that simple. Sometimes a branch might be referred to as a “head”. This name makes sense too as the branch is referring to a commit representing a significant head on a long linked tail of commits.
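The example above can be reproduced with a few commands. This is just a sketch (the repository name and file contents are invented, and the `git branch -M master` line only guards against newer git versions that default to a different branch name):

```shell
set -e
git init -q demo && cd demo
git config user.email "you@example.com"
git config user.name "You"

echo one > file.txt
git add file.txt
git commit -qm "first commit"
git branch -M master                 # make sure the branch is named "master"
first=$(git rev-parse master)        # the commit id "master" refers to

echo two > file.txt
git commit -qam "second commit"
second=$(git rev-parse master)       # "master" now refers to the new commit

git rev-parse master^                # ...whose parent is the old master commit
```

The last command prints the same id as `$first`: the branch moved, and the old tip became the parent of the new one.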

The physical representation of a branch is a single file. The file’s name is the branch name. The file contains a single commit id. The file just gets rewritten every time I commit to its branch. That file lives in a directory. I can have as many of these branch files in that directory as I like. (Git vets will notice I’m papering over a lot of flexibility again. And again, they should restrain themselves for the benefit of the rest of us n00bs for the moment.)
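You can see that file directly, assuming the default loose-ref storage (again, the repository and file names here are made up):

```shell
set -e
git init -q repo && cd repo
git config user.email "you@example.com"
git config user.name "You"
echo hello > hello.txt
git add hello.txt
git commit -qm "initial commit"
git branch -M master                 # name the branch "master"
# the branch really is one file containing one commit id
cat .git/refs/heads/master
git rev-parse master                 # prints the same id
```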

So a git repository can have many branches. And you can see how a link between two git repositories is going to have to involve more than just the two URLs of the repositories. We have to note which branches we mean.

The last complicating factor is that the remote repository and my repository might have different ideas about what to name a given branch. For instance, in git land, it is common to name your main work branch “master”. There is a good chance then, that the git branch I am interested in on the remote repository is named master. There is also a good chance that the local branch I intend to work on is called master too. This is a problem because, in order to merge, git wants to bring the history of that remote branch to my local machine and store it as a local branch. Once I have both branch histories in my local repository git can do a merge. A little more on that later. But in a single git repository the branches must have different names.

Here is an example of real link parameters. This would set up a pull link from the “master” branch in the mainline git repository, tracked locally as “junio-branch”.

  1. URL of the mainline git repository
  2. pull
  3. master
  4. junio-branch

Now let’s look at a real file. In a git remotes directory this link will look like so (filename .git/remotes/junio):

Pull: +master:junio-branch

Now I can reveal a little flexibility. A .git/remotes file can contain multiple pull and push links. Just space separate the branch relationships in the “Pull:” line and add a “Push:” line with more relationships as needed. The local branch names all need to be unique. (Ok maybe they don’t _have_ to, but the consequences of not doing so are a little hard to comprehend.)
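Putting it together, here is a sketch of constructing such a remotes file by hand. The URL line is a placeholder, not a real repository, and the file is only created and displayed, never fetched from:

```shell
set -e
git init -q work && cd work
mkdir -p .git/remotes
# legacy remotes-file form of the link described above;
# the URL here is a stand-in for the real mainline repository
cat > .git/remotes/junio <<'EOF'
URL: git://example.org/git/git.git
Pull: +master:junio-branch
EOF
cat .git/remotes/junio
```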

I think it was seeing an example of this git remotes file that helped me over the mental barrier to understanding how git’s horizontal repo relationships work.

This is still being fleshed out somewhat on the git list, so that file format might change a bit before the relevant patches hit the mainline. Currently they are in Junio’s proposed updates (pu) branch. I, for one, hope they land soon! Update: They are in the mainline now. Yay!

In PDK I’m working on exposing this functionality via publish and subscribe commands. Publish will be equivalent to a single push, while subscribe will be used for setting up the initial remote pull, and an update command will do subsequent pulls. So after subscribing, updating should feel roughly like CVS.

Hope this helps any other git merge challenged folks out there.

A quick note about actually merging branches: most of this has described how to note relationships with remote branches. Once those relationships are defined properly, git can simply push to or pull from the remote branches and they will be available for merging where needed. Actually doing those merges is something I won’t cover in too much detail here.

Note: These next examples aren’t the best. From here on out you could probably just use git pull, and the git-read-tree -m step would be handled properly under the covers. It would certainly be less typing. I’m not experienced enough with the pull command to write it up yet.

Suffice it to say that the manpage for git-read-tree has good information; see the -m option. To use the three-argument form of git-read-tree -m you will want the git-merge-base command. Once you have performed the merge and updated your git index with the changed files, you will want to provide a number of -p options to git commit, including HEAD and all the local branches involved in the merge. That will preserve the history of the complex merge and allow for intelligent repeated merging in the future.
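That plumbing sequence can be sketched end to end. The repository name, branch layout, and file contents below are invented for illustration, and git-commit-tree (the plumbing underneath the commit step) is used here to record both parents:

```shell
set -e
git init -q merge-demo && cd merge-demo
git config user.email "you@example.com"
git config user.name "You"
echo base > base.txt && git add base.txt && git commit -qm "base"
git branch -M master
git checkout -q -b junio-branch
echo theirs > theirs.txt && git add theirs.txt && git commit -qm "their change"
git checkout -q master
echo ours > ours.txt && git add ours.txt && git commit -qm "our change"

# three-way merge with plumbing: find the common ancestor,
# merge into the index and working tree, then commit with two parents
base=$(git merge-base HEAD junio-branch)
git read-tree -m -u "$base" HEAD junio-branch
tree=$(git write-tree)
merge=$(echo "merge junio-branch" |
        git commit-tree "$tree" -p HEAD -p junio-branch)
git update-ref HEAD "$merge"
git log --oneline                    # history now includes both sides
```

Because both parents were recorded, a later merge of junio-branch would know this point was already merged.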

Just in case this turns out to be useful:

I release this article into the public domain. – Darrin Thompson

Update: Thanks to Glen and Tim at Progeny for some post release proofreading. :-)

Update: Changed the branch example to point at mainline git and corrected the pull syntax slightly per Junio Hamano. (Thanks for reading!) Junio pointed out that git-read-tree -m is a bad example for showing how to merge. Unfortunately I’m a little new at the whole git merging thing, so I’m letting it stand until my knowledge improves or someone can give me a better example.

smartpm internals

January 28, 2005

Jeff Johnson explains how smartpm does its amazing depsolving tricks.

Smart Package Manager

January 12, 2005

Spent some time today with Conectiva’s Smart Package Manager. My interest in it was not so much as a package manager for my workstation or other systems, but rather the Python access it offers to the internal algorithm.

To test this beast I went ahead and gave it a whirl on my workstation, running unstable. I let it do an upgrade. Unfortunately it incorrectly determined that I needed to have both libreadline5-dev and libreadline4-dev installed, which dpkg considered a conflict. (I think this was due to the Provides.)

However, using the evil gui interface I could just manually turn off the libreadline5-dev installation and everything went fine after that.

I do have one huge complaint though about the download mechanism. When my first attempts to upgrade failed midstream, I had to redownload all the not yet installed packages. That has to get fixed!

Also, the download seems to make five simultaneous connections to the repository host when fetching packages. This should probably be replaced with proper pipelining. Bombing the server with simultaneous connections is probably bad netiquette, especially when pipelining should be just as effective, maybe more so, at reducing latency and getting downloads done as quickly as possible.

Overall, I like the idea of a single known-good depsolver and I hope the Conectiva folks reach their goal.

The Fedora Build System Explained

January 7, 2005

Another Gafton Gem…

Look at the comps.xml file. You can depend on having in the buildroot the packages from the @base and @development-tools groups. Everything else will have to be BuildRequires:’d in the src.rpm.

The “broad for some tastes” nature of the @development-tools group explains pretty clearly why build requirements in spec files often seem to be lacking, at least by Debian standards.

Why Fedora Chose CVS

January 7, 2005

Cristian Gafton justifies Fedora’s decision to go with CVS and not some other SCM. Interesting read. This had to be a quote of the week:

It all boiled down back to CVS because . . . it had a unique quality: everybody knows what is wrong with it.

Odds are, he’s right. The rest of the post is slightly more positive about other SCMs. I’m no CVS fan so I’d like to see a big deployment of pretty much anything else that is free.