The Pipermail archiver inside Mailman is over 10 years old at this point, and was modelled on an even older program, Hypermail. Expectations have changed, and the archiver could use some modernizing.
Requirements
- Greater customization through templates.
- Must protect e-mail addresses from being crawled by spammers.
- Must survive vicious crawlers that don't obey robots.txt or crawl very quickly.
- May provide a feature to automatically detect and disable crawlers.
- Must be able to import existing mbox-format mailboxes.
- Should be able to return a URL or ID for a message very quickly; this would let us include a link to the archived copy in the message footer.
- Access through CGI or script. (Generating static files may continue to be an option.)
- Easier to edit archives (deleting posts, moving discussions to another list or archive).
- Easier to use an external archiver program.
- Make archives more useful through different access methods:
  - NNTP
  - RSS feeds
  - Per-topic archives
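
A few of the requirements above (importing mbox files, returning an ID quickly, hiding addresses) can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the function names `archive_id`, `obscure_addresses` and `import_mbox` are made up for this example, and hashing the Message-ID with SHA-1 is one possible way to get an ID without touching the archive, not something Mailman or Pipermail defines. Rewriting "@" as " at " is weak protection at best.

```python
import base64
import hashlib
import mailbox
import re


def archive_id(message_id: str) -> str:
    """Derive a short, stable ID from a Message-ID header value.

    Assumption: SHA-1 plus base32 is an illustrative choice only.
    The ID depends on nothing but the header, so it can be computed
    before the message has been archived.
    """
    normalized = message_id.strip().strip("<>")
    digest = hashlib.sha1(normalized.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")


def obscure_addresses(text: str) -> str:
    """Rewrite user@example.com as 'user at example.com'."""
    return re.sub(r"([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)", r"\1 at \2", text)


def import_mbox(path: str) -> dict:
    """Map archive IDs to (subject, obscured sender) for an mbox file."""
    index = {}
    for msg in mailbox.mbox(path):
        message_id = msg.get("Message-ID")
        if message_id is None:
            # A real importer would have to synthesize an ID here.
            continue
        index[archive_id(str(message_id))] = (
            str(msg.get("Subject", "")),
            obscure_addresses(str(msg.get("From", ""))),
        )
    return index
```

Because the ID is a pure function of the Message-ID, the list software could compute it when the message is posted and put a link in the footer without waiting for the archiver to run.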
Vague notions
- Store message data in an SQL database instead of mailboxes?
  - Pro: may offer fast database text search features; moving a list will only require copying the SQL database; will have transaction support where the database provides it.
  - Con: can't use grep or other Unix-type tools; the database will consume a lot more disk space.
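
To make the SQL idea a little more concrete, here is a minimal sketch using SQLite from the Python standard library. Everything in it is an assumption for illustration: the `messages` table, its columns, and the helper names are invented here, and SQLite merely stands in for whichever database might be chosen. The search uses a plain `LIKE` match, which is not the fast full-text search mentioned above.

```python
import sqlite3


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical archive database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               message_id TEXT PRIMARY KEY,
               list_name  TEXT NOT NULL,
               subject    TEXT NOT NULL,
               body       TEXT NOT NULL
           )"""
    )
    return conn


def add_message(conn, message_id, list_name, subject, body):
    """Insert one message; the transaction rolls back on failure."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO messages VALUES (?, ?, ?, ?)",
            (message_id, list_name, subject, body),
        )


def move_thread(conn, subject, new_list) -> int:
    """Move every message with this exact subject to another list."""
    with conn:
        cur = conn.execute(
            "UPDATE messages SET list_name = ? WHERE subject = ?",
            (new_list, subject),
        )
    return cur.rowcount


def search(conn, term):
    """Return the Message-IDs whose subject or body contains term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT message_id FROM messages "
        "WHERE subject LIKE ? OR body LIKE ? ORDER BY message_id",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

With storage like this, moving a discussion to another list becomes a single `UPDATE` inside a transaction instead of an edit to an mbox file, which is the kind of archive editing the requirements ask for.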
Tasks
- Templating probably needs to be added to Mailman before the archiver can be developed.
  - Look at Ethan Fremen's soc2006-webui. Use it? Update it? Ignore it?
Comments
I appear to have almost closed (knock on wood) a contract to finish my webui stuff and to improve archiving, so I would love to help on this.
You should check out Lurker as a 3rd-party archiver. Integration of Lurker with Mailman could be somewhat improved (especially the UI), but it works quite well already.
Generally, I don't think Mailman should try to create its very own archiver. It's just not Mailman's core competency.
Good stuff Andrew, here are a couple of other thoughts: