The Pipermail archiver inside Mailman is over 10 years old at this point, and was modelled on an even older program, Hypermail. Expectations have changed, and the archiver could use some modernizing.

Requirements

Greater customization through templates.
Must protect e-mail addresses from being crawled by spammers.
Must survive abusive crawlers that ignore robots.txt or crawl very quickly.
- May provide a feature to automatically detect and block such crawlers.
Must be able to import existing mbox-format mailboxes.
Should be able to return a URL or ID for a message very quickly; this would let us include a link to the archived copy in the message footer.
Access through CGI or script. (Generating static files may continue to be an option.)
Easier to edit archives (deleting posts, moving discussions to another list or archive).
Easier to use an external archiver program.
Make archives more useful through different access methods:
- NNTP
- RSS feeds
- Per-topic archives
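
As an illustration of the address-protection requirement above, here is a minimal sketch of display-time masking. The helper name obscure_addresses and the regular expression are assumptions made for this example, not existing Mailman code, and the pattern is deliberately loose rather than a full RFC 5322 parser.

```python
import re

# Loose pattern: adequate for masking addresses in rendered pages,
# not a complete RFC 5322 address parser.
ADDRESS_RE = re.compile(r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def obscure_addresses(text, replacement=" at "):
    """Return text with user@host rewritten as 'user at host'.

    Hypothetical helper: a template filter could call this on headers
    and bodies just before they are written into a page.
    """
    return ADDRESS_RE.sub(
        lambda m: m.group(1) + replacement + m.group(2), text
    )
```

Masking like this only deters simple harvesters; anything a browser can display, a determined crawler can decode, so it would complement crawler throttling rather than replace it.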

Vague notions

Store message data in SQL database instead of mailboxes?

Pro: may offer fast database text search features; moving a list will only require copying the SQL database; will have transaction support where possible.

Con: can't use grep or other Unix-type tools on the archive; the database will probably consume a lot more disk space than plain mailboxes.
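
To make the trade-off concrete, here is a rough sketch of what SQL-backed storage could look like, using Python's built-in sqlite3 module. The table layout and the function names (open_archive, add_messages, search) are invented for this example; a real design would need more columns (dates, threading, raw message text) and probably a proper full-text index rather than LIKE.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) a hypothetical archive database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " message_id TEXT PRIMARY KEY,"
        " list_name TEXT NOT NULL,"
        " subject TEXT,"
        " body TEXT)"
    )
    return conn


def add_messages(conn, rows):
    """Insert (message_id, list_name, subject, body) tuples atomically."""
    # 'with conn' commits if the block succeeds and rolls back if it
    # raises, so a failed batch leaves the archive unchanged.
    with conn:
        conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", rows)


def search(conn, term):
    """Return the Message-IDs of messages whose body contains term."""
    cur = conn.execute(
        "SELECT message_id FROM messages WHERE body LIKE ? "
        "ORDER BY message_id",
        ("%" + term + "%",),
    )
    return [row[0] for row in cur]
```

With a file-backed SQLite database, moving a list really is a matter of copying one file; a server database would need a dump and restore instead.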

Tasks

Templating probably needs to be added to Mailman before the archiver can be developed.

Look at Ethan Fremen's soc2006-webui. Use it? Update it? Ignore it?

Comments

Barry Warsaw

Good stuff Andrew, here are a couple of other thoughts:

imap access to the archives, readonly for most people, write access for admins
"send me this message" link so that if you're reading an article in the archive, you can click this link to get a real copy sent to you. may not be as important w/nntp or imap access. perhaps integrate with an email command (i.e. 'send message-id')
we talked about perhaps having a tinyurl-like alternative access to the message (though this probably isn't the definitive url)
integration w/digester so we can have new kind of "summary-with-links" digest
if we use the original message's message-id, we need to have a policy for dealing w/collisions
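
One possible shape for the tinyurl-like identifier and a collision policy, sketched under assumptions: the token is a prefix of the base32-encoded SHA-1 of the Message-ID, and a clash between two different Message-IDs is resolved by lengthening the prefix. The names short_id and assign_id and the dict-based registry are hypothetical. Two posts that carry the same Message-ID are treated here as the same message, which is only one of the policies that could be chosen.

```python
import base64
import hashlib


def short_id(message_id, length=8):
    """Derive a URL-safe token from a Message-ID header value."""
    digest = hashlib.sha1(message_id.strip().encode("utf-8")).digest()
    # A 20-byte SHA-1 digest encodes to exactly 32 base32 characters.
    return base64.b32encode(digest).decode("ascii")[:length].lower()


def assign_id(message_id, taken, start=8):
    """Return the shortest unused token for message_id.

    'taken' maps tokens already handed out to their Message-IDs
    (a plain dict here; a real archiver would persist it).
    """
    full = short_id(message_id, length=32)
    for n in range(start, len(full) + 1):
        candidate = full[:n]
        owner = taken.get(candidate)
        if owner is None or owner == message_id:
            taken[candidate] = message_id
            return candidate
    raise ValueError("no free token for %r" % message_id)
```

Because the token depends only on the Message-ID, it can be computed before the message is archived, which is what putting a link in the message footer would require.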

Ethan Fremen

I appear to have almost closed (knock on wood) a contract to finish my webui stuff and to improve archiving, so I would love to help on this.

Julian Mehnle

You should check out Lurker as a 3rd-party archiver. Integration of Lurker with Mailman could be somewhat improved (especially the UI), but it works quite well already.

Generally, I don't think Mailman should try to create its very own archiver. It's just not Mailman's core competency.