Oh no! Not again, Pacers.

December 10, 2007 by Darrin Thompson

Jamaal Tinsley of the Indiana Pacers was in the news last night for being involved in some kind of shooting. Not more distractions people! This was finally shaping up to be a fun season! I’m a fan, and this year they’ve improved a lot. So I hit the news site hoping for an update, hoping against hope that this will blow over quickly.

I was treated to a rather interesting headline. Maybe management is tired of all the drama too?

indystar-oops2.png

We won’t call you. We won’t email you. We won’t bother you. Period.

October 2, 2007 by Darrin Thompson

Anyone ever been duped while trying to buy insurance or refinance your home on the Internet? One day you were quietly minding your business, curious, then for a couple of weeks you’re getting calls multiple times per day. The commercial said I’d “win” but I’m just getting a bunch of calls and they’re all offering the same lousy rates.

What happens is that salespeople who are new or having a lag in sales buy “leads” from these web sites. They might have paid $50 for your name and phone number. You, dear friend, are now a “warm lead,” which overrides your status on the do not call list, and now you’re fair game for a while. You’re fair game and they’re cash out of pocket to find out about you. Expect some calls, maybe some hard sell.

I want to do something different. I want to take this whole dirty game and turn it on its head. I’ll explain below. For those in a hurry, I’m building this now. The site is called “House Chat Monkeys.” You can search for Indianapolis Real Estate now. Once I’ve built this site up and get regular traffic I’m moving on to building the chat. After that it’s on to a bigger town and then a rapid expansion into other markets.

So follow my thinking out loud here. What if I put up a real estate search engine of my own. Mine will be cool and all, but what if on my search engine my super charismatic talented agent wife could talk with you directly?

Ok, that sounds less cool. You’re getting hassled. That’s not going to work.

See because half the problem is that when you ask a question and give your email address out, someone is going to follow up. So you’re going to have to go to all kinds of trouble telling them you were just curious. No, you are not thinking about moving right now. No, you would not like a free report on the current value of your home. No, get lost. Better to keep your question to yourself, no?

So what if you can ask your question, get an answer from a licensed agent, but they can’t hassle you. They can answer your question, and for as long as you like they can help you search, answer your questions, but they can’t know who you are or your email address or your phone number. Now we’re getting into some cool uncharted territory.

What if my search engine had two paths. Path 1, search for homes, ask questions, get answers, meet real estate agents, they can’t bother you. Path 2, describe the home you’d like, a real estate agent starts searching for you and showing you homes. Agent not getting it? Try a new one. They’ll never know who you are.

Our motto would be: “We won’t call you. We won’t email you. We won’t bother you. Period.” Too many words. I need a better motto.

Are you a real estate agent? How about this motto: “We get paid when you get paid.”

I’m interested in technical help and capital. (317) 753-8526

Maybe later we can sell some cars.

My New Day Job: Data Backup Made Perfect

October 2, 2007 by Darrin Thompson

My new day job since I left the now defunct Progeny is working on delivering the most wicked cool data backup system I have ever seen.

It isn’t out yet, but it’s complete enough that we are mostly shaking out bugs now.

One of the main problems with backup to teh intarwebs is that everyone can see your data, and if there was ever a compromise of the remote server, someone would have your data forever.

So first of all, all your data is encrypted from the moment it leaves your system, and stays that way until it comes back to your system, to the point that if you lose your password, we can’t recover it. The advantage there is that there’s just no way for us or someone else to steal your data, even if they had physical access to one of our machines.

Backups should be incremental, so if only 5 things have changed since your last backup, only 5 things should have to go over the wire and be stored. That’s tricky when we can’t actually see your data, so our client is smart and figures all that out before sending us more data to store. The backup client is quite sophisticated about finding redundancy, so it saves you money and keeps our storage requirements under control, all while keeping your data under encrypted wraps.
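
To make the incremental idea concrete, here is a minimal sketch of how a client could work out what is new before anything goes over the wire. It is not a description of the real client: the fixed chunk size, the `plan_upload` name, and the use of SHA-256 digests as chunk identifiers are all assumptions made for illustration, and the encryption step is left out entirely.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def chunk_ids(data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size chunks and return (digest, chunk) pairs."""
    pairs = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        pairs.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return pairs


def plan_upload(data, already_stored):
    """Return only the chunks whose digests are not already stored.

    already_stored is a set of digests remembered from earlier backups.
    """
    to_send = {}
    for digest, chunk in chunk_ids(data):
        if digest not in already_stored:
            to_send[digest] = chunk
    return to_send


# First backup: nothing is stored yet, so every chunk has to be sent.
stored = set()
first = plan_upload(b"aaaabbbbcccc", stored)
stored.update(first)

# Second backup: only the last chunk changed, so only that chunk is sent.
second = plan_upload(b"aaaabbbbdddd", stored)
```

In this toy the decision is made entirely on the client side, which is the property that matters when the server can't see the data. A real system would also have to keep the identifiers themselves from leaking information about the contents.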

I’ve grown a lot from working on this project. This is the closest I’ve been to shrink wrap software. Our client software has to run on the user’s machine and therefore has to be cross platform: Windows, Universal OS X binaries and 2 Linux architectures. While I’ll agree that Python has been helpful in this regard, there’s still a LOT of work to be done beyond out of the box Python to get here.

And we need to be able to push updates out to all those platforms. I’ve come to appreciate the completeness of the Linux desktop platform. Between apt and dbus you can easily package up your software and have it restart itself when updates happen. On OS X and Windows we need to download, verify, and update our software ourselves.

All in all this is a fun project to work on, an excellent day job.

In Praise of Git, the Greatest System Of Its Kind

May 18, 2007 by Darrin Thompson

I’ve been putting off my second attempt at this essay for a while. But recently I came across a YouTube video of Linus Torvalds explaining his Git version control system at a Google Tech Talk.

It’s kinda funny because Linus likes to use a lot of tongue-in-cheek hyperbole. He called CVS the devil, Subversion developers morons, and had some fun banter with Google engineers who use the centralized model in Perforce, which he insisted can never work as well as a distributed model.

I’ve tried and failed to explain to the world what is special about Git. I was encouraged to see Linus himself have some trouble also. It’s not that I’m a failure, this thing is just hard.

But it isn’t so different from a lot of other hard to explain things.

Say you just learned Ruby on Rails, you made your first little Rails website in record time, and you’ve just got to tell your co-workers about this fantastic Ruby thing.

While you wait for Eclipse to start (it looks a lot less shiny today), you head over to the break room and attack your first target coworker. And as you launch into your first explanation to the incredulous, words fail you. How do you explain to the newly hired former J2EE consultant that not having config files and hard coding everything using a special convention is more flexible? Words fail because you sort of have to have been there and consumed enough kool-aid to see where these nut jobs were coming from. And they were right! Dead right!

How do you explain enlightenment?

Maybe you just learned Erlang and it’s your new hammer. Ready to convince your boss to let you try it in anger?

It’s hard to explain to somebody what a fun party it was because, well, if they weren’t there, they weren’t there. And if they weren’t there, they didn’t experience first hand all that fun. No enlightenment for them.

But I’ll try again. My first attempt failed, but I think I’ve learned better this time.

First I want to talk about SpongeBob.

There’s this episode of SpongeBob where his boss makes him and Squidward work 24 hour shifts at the fast food joint. SpongeBob can’t get enough of being a fry cook and pesters Squidward mercilessly for the whole night. “Hey Squidward, I’m cleaning the bathroom… at night!” “Hey Squidward, I’m polishing the tables… at night!” Or whatever.

And how many times have we seen some stupid blog meme like this: “Hey Internet! It’s a Ruby program for… the Y combinator!” “Hey Internet! It’s a Perl program for… the Y combinator!”

Or maybe it’s a language: “Hey everyone, it’s a wiki… in Haskell! It’s a weblog… in Haskell! It’s a stupid blog post… about Haskell!”

Or the much maligned: “Hey Internet! It’s a Scheme implementation of… FizzBuzz!” “Hey Internet! It’s a full J2EE stack implementation of… FizzBuzz!”

All these little toys flying around the Internet, and here’s where I try to get back on track. I’ve heard some folks cursing the fact that so much energy is being wasted on mere toys, when we could try and get some real work done.

I think something was lost with CVS and Subversion. Torvalds points out that in his not so humble opinion, CVS and Subversion are inferior to just passing around patches and tarballs. And here he isn’t being tongue-in-cheek. He really means that.

How could that be? When all you have is a mailing list and patches, even the core developers of a project have to communicate their code on a mailing list, and in little mail sized bits.

What’s outrageously cool about the whole thing, is real work degenerates into little memes.

“Hey look at this patch, it’s a new filesystem that’s twice as fast as the old one!” (see series of 6 patches)

“Hey, I increased your performance by 10% with this modification.” (Patch sent in reply to patch 2)

“Hey, that’s cool, but you missed a case and could corrupt unless you handle this.” (Patch sent in reply to patch 5.)

What I’m getting at is it’s that silly SpongeBob, “Hey Squidward!” behavior, but it gets real work done. Where CVS and Subversion go wrong is that they make it convenient for core developers to skip that step.

The core developers lose out on code review, and more importantly, on that stupid code banter. Stupid code banter is a fundamental human need.

Why would sane people post FizzBuzz implementations on the intarweb? Because code banter is required for human sanity. Once CVS and Subversion got popular everyone forgot the natural way to bat code around, and had to turn to artificial drugs, Y and FizzBuzz.

Git comes from this guy and his friends who, in their isolated kernel world, never forgot about the fun of batting code around, and how you can harness it to get real work done. They saw early on that it’s 10 times more fun to review code before the flaws are petrified in a Subversion trunk. That’s the real lesson.

… in my not-so humble opinion.

Barely sane people deprived of natural cleansing code banter.

What one example of real code banter looks like.

Another one, showing you don’t have to always mean it. Notice it’s “untested.”

Now, doesn’t that look more fun than fizzbuzz?

Erlang in Python

November 26, 2006 by Darrin Thompson

Ok, not even almost.

But I’ve been thinking. What would it take to implement Erlang’s actors concurrency model using Linux facilities directly?

So I messed around with the idea today and came up with a working message queue and rough pattern matcher in Python.

Basically the Erlang “process” is mapped to an OS process or thread.

Each thread has a public Unix domain datagram socket (kinda like a file, but works like a reliable UDP socket) somewhere in the filesystem.

To send a message to a particular process or thread, just send a datagram to the appropriate domain socket.

Since we aren’t picking any particular messaging format, the message queues just treat every message as a blob, and assume the queue reader knows how to parse all the messages.
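
As a quick illustration of the socket part on its own, here is a minimal sketch of one "mailbox" receiving two messages. The socket path and the temporary directory are hypothetical names picked for the example, and both ends live in one process only to keep it self-contained.

```python
import os
import socket
import tempfile

# Hypothetical socket path inside a throwaway directory.
workdir = tempfile.mkdtemp()
address = os.path.join(workdir, "actor.sock")

# The receiving side owns a datagram socket bound to a filesystem path.
receiver = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
receiver.bind(address)

# Any sender that knows the path can deliver a message; no connection needed.
sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
sender.sendto(b"hello", address)
sender.sendto(b"world", address)

# Each recv() returns one whole datagram, so message boundaries are preserved.
messages = [receiver.recv(4096), receiver.recv(4096)]

# Clean up the sockets and the files they left behind.
sender.close()
receiver.close()
os.unlink(address)
os.rmdir(workdir)
```

The payloads are opaque bytes here, which matches the idea that the queue reader is the only one who knows how to parse them.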

Erlang is more than an actor model. To the message queue it adds pattern matching, marshaling messages between nodes (OS processes or hosts), and process and node monitoring. I think to get that stuff you’d need a pretty smart supervisor process which would somehow notice when linked or monitored OS processes exit.

But anyway, here is my queue code.


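import errno
import os
import socket
import struct

class NoMatch(Exception):
    """Raised by a pattern function when a message does not match."""
    pass

class MessageQueue(object):
    def __init__(self, socketname, queuename, recvsize):
        self.socketname = socketname
        self.queuename = queuename
        self.recvsize = recvsize
        # Remove a stale socket file from a previous run so bind() succeeds.
        if os.path.exists(self.socketname):
            os.unlink(self.socketname)
        self.socket = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        self.socket.bind(self.socketname)
        # Binary mode: the file holds packed headers and raw payloads.
        self.queue = open(self.queuename, 'w+b')

    def receive(self, pattern):
        # First look for a match among messages saved by earlier calls.
        for place, payload in self.iter_queue():
            try:
                interpreted = pattern(payload)
                self.markused(place)
                return interpreted
            except NoMatch:
                continue

        # Nothing saved matches, so block on the socket until something
        # does, saving every non-matching message for later.
        for payload in self.iter_socket(True):
            try:
                return pattern(payload)
            except NoMatch:
                self.enqueue(payload)

        # Only reached if the blocking loop ever ends: save whatever is
        # still pending on the socket without blocking.
        for payload in self.iter_socket(False):
            self.enqueue(payload)

    def iter_queue(self):
        # Each record is a header of two ints (used flag, payload size)
        # followed by the payload itself.
        self.queue.seek(0)
        while True:
            place = self.queue.tell()
            sizebuffer = self.queue.read(8)
            if len(sizebuffer) != 8:
                break
            used, size = struct.unpack('ii', sizebuffer)
            if used:
                # Skip over the payload of an already consumed message.
                self.queue.seek(size, 1)
                continue
            payloadbuffer = self.queue.read(size)
            if len(payloadbuffer) != size:
                break
            yield place, payloadbuffer

    def iter_socket(self, blocking):
        self.socket.setblocking(blocking)
        while True:
            try:
                payloadbuffer = self.socket.recv(self.recvsize)
            except socket.error as e:
                # A non-blocking socket with nothing to read is not an error.
                if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                    break
                else:
                    raise
            yield payloadbuffer

    def markused(self, place):
        # Overwrite only the used flag at the start of the record.
        previous = self.queue.tell()
        self.queue.seek(place)
        self.queue.write(struct.pack('i', 1))
        self.queue.seek(previous)

    def enqueue(self, payload):
        # Append a new unused record to the end of the queue file.
        self.queue.seek(0, 2)
        self.queue.write(struct.pack('ii', 0, len(payload)))
        self.queue.write(payload)

def main():
    q = MessageQueue('tests', 'testq', 4096)

    def patterna(buf):
        if buf.startswith(b'a'):
            return buf
        else:
            raise NoMatch(buf)

    def patternb(buf):
        if buf.startswith(b'b'):
            return buf
        else:
            raise NoMatch(buf)

    while True:
        print('match!', q.receive(patterna))
        print('match!', q.receive(patternb))

if __name__ == '__main__':
    main()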

And a script to drive it. (The messages are sent out of the expected order.)

import socket

s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
for p in ('a', 'b'):
    for i in range(3):
        # Datagram payloads are bytes, so encode the text before sending.
        s.sendto(('%s%d' % (p, i)).encode(), 'tests')

Why No One Likes Your Bug Tracker

September 1, 2006 by Darrin Thompson

This is how not to build communities, how not to get participation. I tried to report a bug in Inkscape and was confronted with this:

Sourceforge Bug Submission Hurdle
I can’t think of a new password I’m willing to hand to SourceForge that I can remember.

Fortunately I found I could submit it anonymously. Fortunately I discovered this before I gave up.

14 Things You Can Say In A Fantasy Draft That You Can’t Say In a Real Draft

July 19, 2006 by Darrin Thompson

It’s NFL fantasy draft season, and I’m participating for the first time. I brought fresh ears to my league’s draft party, and a few things struck me as humorous, so much so that I kept a list.

In a real pro sports draft there are national TV cameras, ESPN commentators analyzing everything into the ground to fill time, and announcements of picks with great fanfare. There must be an army of folks scurrying around behind the scenes finalizing decisions.

At a fantasy draft there are a bunch of folks at a table, some attacking the problem at hand with as much gravity as the executives and scouts at the real draft. They have notebooks, or notebook computers filled with research.

Then there are a few who just came for the food. Armed only with a printed out internet draft guide, thank you Fox Sports, they endure seventeen rounds of picks, bored out of their skulls by the ninth.

Sometimes what both kinds of folks say lacks the research depth of what is said at the real draft. Or maybe at the fantasy draft we can relax in the just a game atmosphere. (Yeah right…)

Whatever the reason, I enjoyed picturing an executive coming to a podium and announcing:

  • “Did somebody already pick Larry Johnson?”
  • “How many wide receivers do I need?”
  • “The Patriots select Larry Johnson.” “Dude, he’s taken.”
  • “What team am I?”
  • Responding to a press attack later: “We picked Larry for his bye week.”
  • “Can I phone a friend?”
  • “The Patriots select Larry Johnson.” “Dude, did you know he’s injured?” “Oh, wait…”
  • “Dude, you can’t take it back after you let go of the piece.”
  • “Computer, remove two bad players.”
  • “Just give me a player, doesn’t matter who.”
  • “The Patriots select Larry Johnson.” “Dude, you already have him.”
  • “Just pick already!”
  • This really happened to us after our draft: League commissioner: “We’ll need everyone to email in their picks. My computer lost power before I saved.”

Our draft was all men, and later, the wives of the veteran managers, tired of being fantasy football widows, formed their own league. They had a few announcements of their own:

  • “What’s a tight end?” “I think it means he has a nice touche.”

And lastly, here are my picks:

  • 01.02 Larry Johnson
  • 02.11 Marvin Harrison (sentiment driven)
  • 03.02 Larry Fitzgerald
  • 04.11 Tatum Bell
  • 05.02 Daunte Culpepper
  • 06.11 Terry Glenn
  • 07.02 Chris Brown
  • 08.11 L.J. Smith
  • 09.02 Indianapolis Defense (sentimental influence)
  • 10.11 Roddy White
  • 11.02 Neil Rackers
  • 12.11 Brett Favre
  • 13.02 Ben Troupe
  • 14.11 Marty Booker
  • 15.02 Chad Morton (because he was the last name in the draft guide)
  • 16.11 Ryan Longwell
  • 17.02 Santonio Holmes

Subversion Update Command Considered Harmful

June 19, 2006 by Darrin Thompson

This goes for CVS too.

Version control is the tool most programming workgroups won’t do without. It provides a kind of de facto backup mechanism, and lets us look at history to occasionally wonder about a particular change or line of code.

If multiple people are going to work on the same set of files, there will occasionally be two people wanting to work on the same thing. CVS pioneered or at least popularized the update command as it works today. It was hailed as a major breakthrough for programming workgroups. Before CVS, if someone else was working on a file, you couldn’t touch it until they were done and had “unlocked” it, maybe days later. At that point you could pull their version of the file and make your change.

With CVS or SVN update you can mostly forget about the problem, as CVS will seamlessly merge others’ changes into your work before you commit yours, informing you only when it detects a problem, and leaving you to resolve it. These conflicts turn out to be quite rare in practice. Once you convince yourself that it works, it saves a lot of hassle.

I’d like to posit that both the pessimistic “locking” model and the optimistic “CVS update” model are badly broken. CVS made badly broken much more convenient, but it only accelerated a serious problem.

If I could change one thing about how I initially ran the pdk project I would have posted all changes to the list in patch form.

Early in pdk’s history I needed a small feature in git. I needed to expand on their already existing http support to include https with self signed certificates and http basic auth. The feature turned out to be small enough that I could add it myself. I didn’t want to maintain it so I contributed it upstream, and it was accepted. The feature is still in git today.

To get the patch accepted took some work and learning about the Linux development process. I fell in love with the culture, and in the process, the tool they built to support the culture.

Acceptance of the patch started with me posting it to the development mailing list.

Writing code for publication is different from writing code to get another feature done. The temptation to take a shortcut is virtually eliminated. If I do paper over an important case I’m going to have to point it out in a comment and/or in the email describing the patch. My patch and my description of it are going to end up in the inbox of hundreds, maybe thousands of people. That makes me want to get it right.

So while my patch was small, I was careful to make sure it was correct and the change was described clearly.

A few folks, including Junio Hamano, were interested in making git work over “dumb http.” Linus Torvalds, original author and maintainer of git at the time, didn’t give a rip about dumb http, preferring smart protocols. Despite his apathy over the feature, Linus trusted the judgement of Junio and took patches which improved the dumb http support over time. Ultimately Junio Hamano put my patch in with a series of his own http related patches and mine went into the official Linus tree as part of Junio’s set.

Today, Junio, not Linus, is the official maintainer of git. Linus still participates as a regular developer/user in the project. Even today Junio posts most of his work, maybe all of it, as patches and sets of patches to the mailing list before incorporating them into a release. Because he is careful about this, everyone’s work tends to be reviewed on list also. This isn’t so much a hard rule as it is a principle. Why would I want to disturb a complex system without help? Could someone look at this? Have I missed anything obvious?

Git is simply a flexible tool that can, among other things, support the Linux development model better than any.

The Linux development model stands in stark contrast to the more common CVS inspired development practice in closed and open source projects.

The CVS and SVN update command encourages, nay insists, that developers skip the review step and trust one another blindly. How does this work in a normal project storing its code in SVN? Multiple times in a development cycle you incorporate into your code patches which have only ever been seen by their author. Could we imagine a more dangerous act? The times when you run cvs update are never convenient points for reviewing the incoming horde of changes. If you were to find a bug, you can’t reject the incoming change. You can add a review process to your dev group, but when the tool assumes your neighbor’s changes are good, your review is always second priority. The evidence of years of my own experience is that the implied trust truly is blind most of the time; this causes compounding trouble, and we pay a heavy price.

You can’t even use SVN branches to keep separate lines of active development because you get zero help doing repeated merges.

The Linux culture promotes patch acceptance between developers as an explicit act. It implies a slight distrust and review follows naturally. Making a merge between developers is also always an explicit and transparent act. The only time there is a mass update (see git-rebase) is when someone at the tip of the project’s food chain (Linus, Junio) makes a particular line of _reviewed_ history the new official branch. Again I emphasize, these mass rebases, similar to the frequent cvs updates in normal land, are composed of mostly reviewed patches, so the whole thing is a lot safer. Furthermore, these rebases are never a barrier to commit a changeset.

When the history books are written, it will be known that Greg Hudson was dead wrong about distributed version control. His assertion that single integrators are a choke point throttling throughput has been thoroughly debunked by the last few years of Linux kernel development. Granted, some of the problems he pointed out did exist during the 2.4/2.5 Linux Kernel release series, but those problems cannot be attributed to the causes he proposes.

When the history books are written, it will be known that the Linux Kernel is, as of 2006, the largest agile project on the planet, having solved the “agile over TCP/IP” problem in a novel way. Git, and the torrent of emailed patches, are a large part of what makes it possible.

Spammer Blows It

February 23, 2006 by Darrin Thompson

I’ve never seen a spammer fall so flat before. Check out this spam:

From: Theresa Polk
To: [email omitted]
Subject: Cecelia Kerr
Date: Thu, 23 Feb 2006 09:44:58 -0600 (10:44 EST)

Myrna,

%CUSTOM_LINK

Theresa Polk

How’s that for misconfiguring your software?

For the uninitiated, the %WHATEVER words should have been replaced with something useful, but they blew it. Reading this one was weirdly satisfying. Apparently spam which cannot possibly accomplish anything can escape my local spam filters. That’s weirdly comforting.

Rant: Python Plugin Systems

February 20, 2006 by Darrin Thompson

A plugin system should allow plugin authors to install their plugins using distutils.

The One True Way(tm) for writing a plugin system is this:

Have a configuration file. In the configuration file name Python modules. The system exposing the plugin interface then dynamically imports the named modules. The modules then have the opportunity to plug themselves into provided hooks.
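
Roughly, that could look like the sketch below. Everything named here is an assumption for illustration: the `HOOKS` registry, the `[plugins]` section with its `modules` key, and the convention that each plugin module offers a `register(hooks)` function. The plugin file is written into a temporary directory only to stand in for something a normal install would have put on the path.

```python
import configparser
import importlib
import os
import sys
import tempfile

# Hypothetical hook registry the host application exposes to plugins.
HOOKS = {"on_start": []}

# Stand-in for a plugin module that was installed somewhere on sys.path.
plugin_dir = tempfile.mkdtemp()
with open(os.path.join(plugin_dir, "hello_plugin.py"), "w") as f:
    f.write(
        "def register(hooks):\n"
        "    hooks['on_start'].append(lambda: 'hello from plugin')\n"
    )
sys.path.insert(0, plugin_dir)
importlib.invalidate_caches()

# The configuration file names modules; it never names files or directories.
config = configparser.ConfigParser()
config.read_string("[plugins]\nmodules = hello_plugin\n")


def load_plugins(config, hooks):
    """Import each configured module and let it plug itself into the hooks."""
    for name in config.get("plugins", "modules").split():
        module = importlib.import_module(name)
        module.register(hooks)


load_plugins(config, HOOKS)

# The host later fires the hook without knowing where the plugin lives.
results = [callback() for callback in HOOKS["on_start"]]
```

Because the host only ever sees module names, a plugin can be installed like any other Python package and needs no custom installation code.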

There is no other way.

If you are scanning for Python files in a directory, you are doing it wrong. Why? Plugin authors have to write their own system installation code. Distutils won’t do it.

If you are scanning for modules in a package directory, you are even more wrong. Why? See previous.

If you disagree, you are wrongest. Why? Because. You are wrong.

/rant.