Pickles Begone

For a while now I've been working on a branch of the main Mailman svn trunk. The primary goal of this branch was to see if we could save all list data in a SQLAlchemy backed database instead of all the little individual list-specific pickles that has been a signature of Mailman's "database" almost since its inception. My experiments were rather modest – I wanted to see how little code I had to change in order to get rid of the pickles altogether.

It turns out that really, not much code had to be changed. I think this is a testiment to SQLAlchemy's approach. I had made several attempts at switching to an ORM before, once with SQLObject and once with SQLAlchemy. In fact, during the last couple of PyCons in Washington DC, a number of us sprinting on this very thing. Our goals were much more ambitious though; we tried to layout an elaborate schema for the ultimate Mailman 3 database. In retrospect, this was a good exercise, but because the changes were so radical, the work was never completed.

My previous attempt using SQLAlchemy failed because I was still trying to keep the list-centric approach. IOW, I was trying to keep one database per list, but this didn't work for several reasons. This time around I decided to simplify things by having one database contain all data for all mailing lists. I've had enough success with this that I'm ready to merge my branch back into the trunk.

Compromises

While the branch seems to work pretty well – you can certainly make changes in the web u/i and they persist, and you can send mail through the system – it hasn't been thorougly tested. My opinion is that it won't get more testing in the branch and it's stable enough, so it's time to merge.

I've made some compromises in the schema design in order to get things working. For example, any MailList attribute that isn't of a simple type (e.g. not a string or number) is defined in the SQLAlchemy schema as a PickleType column. Okay, so pickles aren't totally eradicated! Still, this was the quickest path to getting everything working.

For example, the list of email address of the list's owners is stored as a Python list of strings on the attribute. Rather than DTRT now and represent those as its own table, I just make the MailList.owner attribute a PickleType and SQLAlchemy does the rest. My plan after merging to the trunk is to take these pickle columns one-by-one and split them out into their own tables.

Still to do

I think the very first thing to do after the merge is to build functional import/export scripts. Actually, we already have an export script for the trunk, but this must be ported to the MM2.1 branch. The way existing lists are going to be migrated to the trunk is by exporting an XML file representing one or more mailing lists, then taking those XML files over to a MM2.2 (or 3.0???) site and doing an import. ISTM there's no other way.

Once I have a working import/export, I plan on rewriting the pending database to be SQLAlchemy'd instead of the pickle it currently is. After that, I will work on individual PickleType columns, to de-picklify them too. My top priority is to implement the rosters idea I've had for a long time. This will be more disruptive to the code base but it will also have the greatest impact on usability I think.

Comments

Barry Warsaw

r8127 includes a rudimentary import script, so you can now dump your MM2.1 lists uses the branch's head's export.py script, then take the resulting XML and import it into MM2.2. Still to do: importing header_filters and topics (both list defined and user selected).

Joshua Kugler

A technical question: can you elaborate a little bit on why you chose SQLAlchemy over SQLObject? Not flaming...just wondering.

Barry Warsaw

I can't really point to x, y, and z technical reasons. Several Pycons ago a few of us sprinted on SO support in Mailman and got pretty far. Ultimately though I had trouble getting it to do what I wanted, as easily as I wanted it to be. When I heard about SA, I pulled it down and started playing with it and it just seemed so much easier to build it into Mailman that I just stuck with it. What I really like is the brevity and yet power, but more importantly the minimal impact it forced on Mailman's current architecture. What this means for me is that I can move Mailman over to SA incrementally, allowing me to have more confidence things are working correctly, along with better milestones along the way. So SA seems quite nice and I haven't (yet?) hit any roadblocks.