Summer of Code 2006 summary by Ethan Fremen

(originally posted August 20, 2006)

The Summer of Code project has been highly educational, but sadly not as functional as I had hoped. I believe a good foundation exists at this point to deliver a much improved interface for mailman 2.3, and I will continue to work towards this goal.

Key Features

First, I'll go over the good things that have come out of this, and then I'll talk a good deal about all the things that didn't work.

WSGI App - as a "full" WSGI app, the new mailman UI will be embeddable in any framework that can handle WSGI, which appears to be the majority of them.
Pieces for non-python apps - The new UI will allow people using any non-python web environment to embed chunks of the mailman UI into their web sites.
Extreem accessability - I've tried to make sure that the mailman UI can be used by any user agent there is. At this point it should work fine on just about any device capable of parsing XHTML.
i18n - While I'm not quite there on building .po files and the like, the approach taken is I think one of the best ways to ensure internationalization. All pages are utf-8, and there are zero strings inside the python code that require translation; if the templates are translated, the application is translated.

Requirements

I spent a large portion of the middle of the project trying to make an approach work that would allow this UI to be included in the mailman 2.2 release, without success. First I attempted to implement a mod_python only approach, but after being bitten for the umpteenth time with mod_python's peculiarities about the import statement I decided to try other approaches.

Python 2.3

Nobody seems to be interested in supporting python 2.3 in web framework land. I tried to use paste; the currently used servelet implementation, Wareweb, uses decorators. I spent a day or so trying to steal everything-but-the-decorators from Wareweb, but after running into problems I decided it wasn't the best use of time. Between this failure and the mod_python problems, I decided 2.3 just wasn't happening and thus I wasn't going to make it into the 2.3 release.

No User Model

This was one of my biggest challenges, but at this point I also think it is a strength. Mailman currently has no particular notion of users. An email address associated with a list has settings and perhaps even a password. Higher level privileges, of which currently there are "list admin" and "moderation" - require a username-less password.

This doesn't jibe with the classical notion of users, and for a while I struggeled with that. Now I think it's perfectly appropriate. Most humans-that-use-mailing-lists have multiple email addresses, and most people that deploy mailman mailing lists are doing so in the context of other web services that have their own sense of what a user is.

Currently under implementation is an approach where I consider an 'agent' to be a set of identifying strings - mostly email addresses, but also usernames and things like session-cookie tokens and even 'passwords'. Usernames, if any, are provided not by mailman, but by whatever site mailman runs under, in the form of a REMOTE_USER token or potentially other tokens as people let me know how they'd like to tell mailman who users are.

As soon as this is vaguely working, I also hope to support OpenID tokens, as a step towards mailman allowing "mailman wide" authentication; i.e. once you've signed up for any mailman list, new lists you subscribe to on other sites can automatically fetch information about you from other the other mailmen.

When mailman is provided sufficient user-tokens, it applies UI changes automatically; when it has been provided enough tokens to accept a user provisionally, it emails the address-of-record with a summary of the changes and an "apply" link. The common case of "provisional" authentication is a user-agent arriving with a visitation cookie that has been associated with an email previously, and the UA provides that email again.

Frameworks

For the Mailman UI, I wanted to leverage the work that has been done to make web programming easier. In pursuit of this, I investigated a number of web frameworks. The current situation in python-land is that all of the frameworks that provide support for kid are half-baked, and I ran into several problems using them.

Turbogears - Being the only framework with kid as its default templating language, this is the framework I spent the most time on. Unfortunately turbogears breaks kid in some pretty fundamental ways; the most irksome of which is that the import statement doesn't work. I tried working around this a variety of ways before punting.
Pylons - Pylons got support for other templating languages in the middle of the SoC project, so I gave that a try. Routes is definitely a nice URL mapping strategy, but it turned out the support for kid was provided through a Buffet-like interface which meant there were similar kinds of breakage.
Python Paste + colubrid + kid - as a frameworkless framework, colubrid seemed like an ok middle ground after my failure to reinvent Wareweb in python 2.3. I confess I can't even recall what broke on this approach, but I moved off of it for some reason.
Python Paste + Wareweb - this is the current contender; it's what's in CVS and what seems to be working.

Templating

Kid seems very nice on first blush, and Ryan Tomyako deserves mad props for inventing it, but it has some unfortunate implementation choices and feature limitations that made it painful enough to work with that eventually I moved on

Templates as python modules

I'm beginning to feel that one of the "problems" with python is that the ease of overriding default behaviour leads people to do so when custom behaviour would be more appropriate. One of the ways this shows up in Kid is that every template is also a python module.

This means that there are (at least) three ways to get a kid template:

Using the import statement, if the import-statement-hack has been turned on.
Using the kid import function
Using a lower-level template-loading function.

None of the frameworks I investigated were happy with willy-nilly imports. Pylons horked on even the kid import function, apparently because it clashed with the pre-existing Myghty re-implementation of import. Neither Kid nor Pylons would let me use the low-level template loading function because it wasn't exposed in the Buffet wrapper.

Even if you ignore this problem, the templates-as-modules approach had other problems. I may be incorrect in this assessment, but part of the reason kid couldn't provide meaningful error messages is because the compiled module had 'line numbers' that differed from those in the templates.

Markup to the rescue

Edgewall's Markup proved to be the solution to my dilemma. Largely a reimplementation of kid, Markup corrects the limitations that had me pulling my hair out with kid.

XInclude instead of py:extends - Markup implements a subset of the XInclude standard in a way that lets me dynamically specify the templates to be included in another template, as opposed to the hardwired py:extends / py:layout commands in kid.
XPath for matching - Matching in kid and Markup is basically specifying an element that is to be overriden by the markup in your match block. Kid provides a matching algorithm that relies on ElementTree's interface. It's not bad, but it's way more verbose and code-heavy than XPath. XPath is a nice, compact, well-known way to specify which portion of a document you're interested in, and is much better for my purposes. I'm still having trouble with doing dynamic matching, but variable substitution is coming in the next release of Markup, so this should be addressed soon.
Templates aren't modules - Since the templates aren't compiled to bytecode, and because Markup does much more work to preserve information about where one is in a template, error messages actually tell you what line and file is causing a problem, making debugging templates much easier.

Other fun

During this project, I had my production server crash hard enough to require the raid array to be rebuilt, key partitions (e.g. /var) to be lossily recovered, and a complete reinstall of the os to recover the files lost in the partition recovery.

I purchased a new server at the beginning of the summer, because I had seen this particular nightmare coming. I've been trying to use it as the development server, but unfortunately I have been experiencing intermittent segfaults and memory corruption. I have finally (fingers crossed) tracked the problem down to the kernel version I was using, so gods willing my new server should be functional now.

I also discovered that the "macs-don't-need-defragmentation" line is really only true if you don't use up all your disk; once you do, the fact that the swap file is dynamically allocated to the largest available contiguous block means you can rapidly find yourself out of swap, which translates to many delightful crashes due to illegal memory accesses. This has meant that even my workstation has been prone to dropping my IDE on the floor when it feels like it.

Current Status

Because I had so many problems with my development environment, I tried to implement all the templates in a vaccuum. This means that most of the templates exist in the repository, but they don't actually function yet because they haven't been running in an environment. I expect there to be a fair amount of effort left in getting them all plugged in.

I've got a light schedule for classes this upcoming year, and I'm still very enthusiastic about this project, so I intend to continue working on this throughout the year and beyond.

MailmanWiki: DEV/Google_Summer_of_Code_2006/Summary (last edited 2015-03-04 04:37:27 by msapiro)