Think you can Trust Python's stdlib? Think again.

It's been a while since I blogged about Ken Thompson's Reflections on Trusting Trust. And this week I was bitten hard by its moral:

The moral is obvious. You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code. In demonstrating the possibility of this kind of attack, I picked on the C compiler. I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode. As the level of program gets lower, these bugs will be harder and harder to detect. A well installed microcode bug will be almost impossible to detect.

The task seemed simple enough. We had been passing around links between clones in a URL-like format of the type ${host}:${port}/${path}, with a small custom parser (an ugly hack) for parsing and unparsing these things. As we adapted the code to support IPv6, it turned out that in many cases (i.e. unless the nodename field was configured) raw IPv6 addresses would be passed around, and the parser would of course choke on them. Fair enough, I thought, time to use the established standards and

import urlparse 

Now this is supposed to split the URI into parts corresponding to scheme, host, path etc. like so

>>> urlparse.urlparse("") 
('http', '', '/bar', '', '', '')

Of course, most nodes still had the old clone links lying around, and I was surprised to find the parse for these entries:

>>> urlparse.urlparse("") 
('', '', '6221/bar', '', '', '')

Hmm. OK. Let's look at the internals of that parser, and vi

    def urlsplit(url, scheme='', allow_fragments=1):
        """Parse a URL into 5 components:
        <scheme>://<netloc>/<path>?<query>#<fragment>
        [...]
        (e.g. netloc is a single string) and we don't expand % escapes."""
        key = url, scheme, allow_fragments
        cached = _parse_cache.get(key, None)
        if cached:
            return cached
        if len(_parse_cache) >= MAX_CACHE_SIZE: # avoid runaway growth
            [...]
        netloc = query = fragment = ''
        i = url.find(':')
        if i > 0:
            if url[:i] == 'http': # optimize the common case
                scheme = url[:i].lower()
                url = url[i+1:]
                if url[:2] == '//':
                    netloc, url = _splitnetloc(url, 2)
                [...]
            [...]
                scheme, url = url[:i].lower(), url[i+1:]
        [...]
        return tuple

(Why do blogs always _INSIST_ on fucking up source code? But we're kind of on topic, so maybe this fits). Anyhow, we have a fancy caching scheme, but the parser itself consists of a bunch of if and uri.split() statements. Talk about premature optimization. More than that, one should think that language implementors know a thing or two about parsers...

Consider: the parser is written in such a way that the result is predictable if and only if the input string represents a valid URL. But how do you find out if a string is indeed a URL? The answer is easy: you use a parser. In other words, the urlparse module is in most cases useless, because unless you have sufficient control over the input (unlikely for networking apps), the parse result is essentially undefined.

However the urlparse module is not only "useless", it is in fact dangerous, since by using it for untrusted input, the behaviour of your app is by implication also essentially undefined (how do you handle an undefined result?). Now consider the following quick google code search. I don't suppose that any of the following names rings a bell with you: Zope, Plone, twisted, Turbogears, mailman, django, chandler, bittorrent. Surely all of these software packages have carefully reviewed all of their uses of urlparse, and properly identify and handle all cases where an arbitrary result may be returned... Script kiddies, REJOICE!
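For what it's worth, here is a minimal sketch of the kind of strict check that refuses to guess. It makes a few assumptions: Python 3, where the urlparse module lives on as urllib.parse; raw IPv6 addresses written in brackets, as in [2001:db8::1]:6221/bar; and a helper name, parse_clone_link, that is made up for illustration and is not part of the angel-app code.

```python
from urllib.parse import urlsplit


def parse_clone_link(link):
    """Parse a 'host:port/path' clone link or raise ValueError.

    Hypothetical helper: it accepts only the old clone-link shape and
    rejects everything else instead of returning a best guess.
    """
    # Prefixing '//' tells urlsplit that what follows is a network location,
    # so it never has to guess whether 'host:' might be a scheme.
    parts = urlsplit("//" + link)
    netloc = parts.netloc

    # Assumption: an IPv6 literal must be bracketed. More than one colon
    # without a leading '[' is ambiguous (which colon starts the port?).
    if netloc.count(":") > 1 and not netloc.startswith("["):
        raise ValueError("raw IPv6 address must be bracketed: %r" % link)

    try:
        port = parts.port  # ValueError if the port is not a valid number
    except ValueError:
        raise ValueError("bad port in clone link: %r" % link)

    host = parts.hostname
    if not host or port is None or not parts.path.startswith("/"):
        raise ValueError("not a host:port/path clone link: %r" % link)
    return host, port, parts.path


print(parse_clone_link("localhost:6221/bar"))
print(parse_clone_link("[2001:db8::1]:6221/bar"))
```

The point is not this particular function but the contract: every field is checked after parsing, and anything that does not match the expected shape is an error rather than a tuple of leftovers.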


CALL for testing ANGEL APPLICATION release candidate 0.2.0rc1

Dear all,

the ANGEL APPLICATION source code has reached a point which we think is good for creating a new public release for m221e ANGELS.

To make sure things go well, we kindly ask that each etoy.AGENT running MAC OS X or a Unix-ish operating system downloads the RELEASE CANDIDATE of the software, which is available at

All we ask is that you start it and check the following things:

- does it crash?
- does the "p2p process" run continuously?
- do all the icons and images show up correctly?

If you encounter problems, you can do the following:

- purge the repository via the new File menu command and see if the problem persists
- remove all previous data like so:

    rm -rf ~/.angel-app
    rm ~/.angelrc

and see if the problem persists
- report the operating system version
- for mac users, consider copy pasting output from (it shows the logging of angel-app)

For a list of changes, I suggest looking at agent Vincent's blog post at:

It would be nice to get feedback (also positive ;-) ) during the weekend.

thank you!


next generation etoy.TERMINALS

cheaper, faster, lighter, less fragile, easier to set up, batteries included: eeepc. comments welcome.

ANGEL APPLICATION - approaching beta

We're highly pleased with the progress we have been making lately: The next release of the ANGEL APPLICATION is to be expected for one of the coming weekends (obviously, it's ready when it's ready, we're largely debian nerds after all). The obligatory screenie (looks haven't changed much, tho'):

Major changes include:

  • a completely revamped security model: we have abandoned our previously mixed pull/push model in favor of a purely pull model. This greatly simplifies the code, and increases security by disallowing any (with one tiny, optional, exception) modification of data on the clients by remote agents. However, this required
  • NAT traversal support. This we implemented by adding optional support for NAT traversal via teredo/miredo. This in turn required
  • (optional) support for IPv6 in the twisted matrix library, our primary infrastructure library. The extension is available as a (limited, but self-contained) add-on module from our subversion repository.
  • To support transparent addressing in the face of a schizophrenic internet infrastructure, agent.POL has implemented a dynamic DNS service that supports IPv6 (note e.g. the clone located at, IPv6 required). He's currently offering that as a free service on We plan to integrate it more tightly into the angel-app as time and resources permit.
  • A revamped configuration subsystem.
  • Improved GUI support.
  • An extensive code cleanup, resulting in a reasonably clean object model and a rather thorough unit test harness, while actually reducing the size of the code base.

I'm currently in the process of stress-testing the system by letting POL's home machine back up my holiday pictures (again, IPv6 support required). Things are looking good so far ;-) Stay tuned, or grab the latest snapshot from svn.


Supporting IPv6 in twisted

It turns out that adding IPv6 support to the twisted library is rather straightforward. Originally, I just hacked up a few changes to get my prototype running with teredo, resulting in a few lines of changes to the twisted networking code (patch available). As a bonus, the resulting code seems fully backwards-compatible with IPv4. Unintended, but highly welcome ;-) YAY (IPv6 required)!

Anyhow -- if you're a hacker, give teredo a try. Turn your laptop into a server in a few minutes. It's a sexy piece of technology which I think will greatly change the way we think about and work with the internet.

One thing to keep in mind is security: teredo provides you with a globally visible IP address, meaning you're directly addressable worldwide. NATs and many in-between firewalls are tunnelled through. If you're using a mac, add something like the following ipfw firewall ruleset (thanks POL) to your miredo startup command (see /etc/miredo.conf) to protect you from unsolicited and possibly dangerous traffic:


exec > "$LOGFILE" 2>&1

echo "Starting miredo hook for setting up firewall rules...."
echo "$0: $$"
echo "uid: $UID"
killall -HUP lookupd DirectoryService

/sbin/ip6fw add 1000 allow log tcp from any to any 6221
/sbin/ip6fw add 1001 deny log tcp from any to any

ANGEL APPLICATION getting ready for IPv6?

After a beautiful afternoon hack, we got the angel-app to work with miredo/teredo and IPv6.

screenshot with IPv6 address


What's the meaning of this, you might ask? Well, it means the potential for true p2p networking, which has so far been a real pain in the ANGEL APPLICATION's backside.

Among other things, agent.POL has been able to access my ANGEL APPLICATION instance running on my laptop at home -- behind 2 layers of NAT, no less.

And if you have a teredo-enabled host (perhaps even any IPv6-enabled host, we're not sure yet), you can try it yourself for the time being (no guarantees):


This means that a highly secure pull-only model is (in principle) within reach, greatly simplifying and stabilizing the ANGEL APPLICATION.

Stay tuned.


fascinating collection of old usenet threads

since we're in the digital history business: i've run across a fascinating chronological collection of old usenet threads.

Highlights include the announcement of GNU, and the announcement of the www.


angel-app test system activated

we've been playing around a lot with our prototype (download available) lately. today, we noticed that the first data sperms have been injected into the googleverse:

note that in the meantime, we've switched ports for the public interface of the angel-app (from 9999 to 6221). port number 9999 seems to be reserved for worms, rootkits, and other evil TCP payload.


FON in zurich

The collaborative, worldwide FON wifi system is now all over the place, even in Zurich... time to become a Linus or a Bill Gates?

ah, btw, this is from a post on joi ito's blog, where he mentions that FON wifi routers in the US can now be had for free.


make way for the stumbling, flailing giant,

lest you be squashed as it falls:
For legacy reasons, an implementation using the 1900 date base system shall treat 1900 as though it was a leap year. [Note: That is, serial value 59 corresponds to February 28, and serial value 61 corresponds to March 1, the next day, allowing the (nonexistent) date February 29 to have the serial value 60. end note] A consequence of this is that for dates between January 1 and February 28, WEEKDAY shall return a value for the day immediately prior to the correct day, so that the (nonexistent) date February 29 has a day-of-the-week that immediately follows that of February 28, and immediately precedes that of March 1.
aahh, microsoft bashing.... how we will miss it once it's gone.
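To see what the quoted passage actually demands, here is a small sketch of the arithmetic. The function names are made up for illustration, and weekdays use Python's own numbering (Monday = 0), which is an assumption of this sketch: the quote does not say how WEEKDAY numbers its days.

```python
from datetime import date, timedelta


def serial_to_date(serial):
    """Map a 1900-date-base serial value to a real calendar date.

    Returns None for serial 60, the nonexistent February 29, 1900.
    """
    if serial < 1:
        raise ValueError("serial values start at 1")
    if serial == 60:
        return None
    # Serial 1 is January 1, 1900. Past the phantom day, the serial
    # runs one ahead of the real day count.
    offset = serial - 1 if serial < 60 else serial - 2
    return date(1900, 1, 1) + timedelta(days=offset)


def legacy_weekday(serial):
    """Weekday (Monday = 0) as the quoted rule requires.

    Correct from March 1, 1900 (serial 61) onwards; counting backwards
    through the phantom day shifts everything before it by one day.
    """
    return (date(1900, 3, 1).weekday() + (serial - 61)) % 7


for s in (1, 59, 60, 61):
    print(s, serial_to_date(s), legacy_weekday(s))
```

For serials 1 to 59 the legacy weekday is one day behind the real one, which is exactly the consequence the note spells out.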