Warning: This is obsolete!

HyperKitty is the new archiver for the Mailman 3 suite.

The Pipermail archiver inside Mailman is over 10 years old at this point, and was modelled on an even older program, Hypermail.  Expectations have changed, and the archiver could use some modernizing.

Requirements

  • Greater customization through templates. At the very least, modernized HTML/CSS so themes can be all CSS.
  • Must provide stable URLs, i.e. URLs that survive archive regeneration even if messages are deleted. See Stable URLs for more info.

  • Must protect e-mail addresses from being crawled by spammers.

  • Must survive vicious crawlers that don't obey robots.txt or crawl very quickly.

  • May provide a feature to automatically detect & disable crawlers.

  • Must be able to import existing mbox-format mailboxes.

  • Should be able to return a URL or ID for a message very quickly; this would let us include a link to the archived copy in the message footer.

  • Should provide a forwarding mechanism so that existing Pipermail URLs can be redirected to the new archiver's URLs.

  • Access through CGI or script. (Generating static files may continue to be an option.)
  • Easier to edit archives (deleting posts, moving discussions to another list or archive).
  • Easier to use an external archiver program.
  • Make archives more useful through different access methods:
    • NNTP
    • RSS feeds
    • Per-topic archives
    • possible integration with bulletin board/forum software
  • Should provide search capabilities, especially for private archives.
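
The stable-URL and fast-ID requirements above can be illustrated by deriving the identifier from the Message-ID header alone, so that it never depends on a message's position in the archive. This is a minimal sketch, not a description of any shipped archiver: the function names, the URL layout, and the choice of SHA-1 with Base32 encoding are all assumptions made for the example.

```python
import base64
import hashlib


def stable_message_id(message_id: str) -> str:
    """Derive a URL-safe identifier from a Message-ID header value.

    Hypothetical helper: the hash and encoding are illustrative choices.
    """
    cleaned = message_id.strip()
    # Drop the surrounding angle brackets so "<x@y>" and "x@y" agree.
    if cleaned.startswith("<") and cleaned.endswith(">"):
        cleaned = cleaned[1:-1]
    digest = hashlib.sha1(cleaned.encode("utf-8")).digest()
    # Base32 output is case-insensitive and contains no URL-hostile characters.
    return base64.b32encode(digest).decode("ascii")


def stable_url(base_url: str, list_name: str, message_id: str) -> str:
    """Build an archive URL (hypothetical layout) that ignores message order."""
    ident = stable_message_id(message_id)
    return f"{base_url.rstrip('/')}/{list_name}/message/{ident}/"
```

Because the identifier is computed from the message itself, it can be known before the message is archived, which is what a footer link needs. Two messages that share a Message-ID would map to the same URL, so a collision policy is still required.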

The following list is taken from the Mailman 2.2 page.

  • Reconsider using a 3rd-party archiver
  • Perhaps URLs to messages should be based on message-ids instead of message numbers so that regenerating archives can't break links. This must include backward-compatible links.

  • Ditch direct access and vend all archive messages through CGI so that we can do address obfuscation, and message deletion, etc. on the fly (with caching of course, but have to worry about web crawlers).
  • Add RSS feed

  • Allow for admins to remove or edit messages through the web.
  • Move archive threads into another list?
  • Put archives in the list/mylist directory.

  • Add a search option
  • Make the archives' default template look and feel similar to the Web UI (whatever it looks like after the Summer of Code project is done)

  • Make archives templatable (at least by changing CSS) so they can match people's existing site look-and-feel
  • MUAs usually make URLs clickable. A new archiver could be consulted when posts are distributed, so that each message carries a footer link to its whole thread in the archive.
  • Present all messages in a thread at once, and offer plaintext download of the whole thread
  • Put messages into a database and/or move away from mbox as the canonical storage format.
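
Vending messages through a script makes it possible to obfuscate addresses at render time rather than in the stored copy. The following is a minimal sketch under stated assumptions: the regular expression is deliberately simple and will not match every valid address, and the "user at domain" replacement is only one possible scheme.

```python
import re

# Simplified pattern (an assumption for this sketch); real address
# syntax is considerably more permissive than this.
EMAIL_RE = re.compile(
    r"\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)\b"
)


def obfuscate_addresses(text: str) -> str:
    """Rewrite 'user@example.com' as 'user at example.com' when rendering."""
    return EMAIL_RE.sub(lambda m: f"{m.group(1)} at {m.group(2)}", text)
```

Since the stored message is left untouched, the obfuscation scheme can be changed later without regenerating the archive.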

Vague notions

Store message data in SQL database instead of mailboxes?

Pro: may offer fast database text search features; moving a list will only require copying the SQL database; will have transaction support where possible.

Con: can't use grep or other Unix-type tools; the database will consume a lot more disk space.
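
To make the trade-off concrete, here is a minimal sketch of SQL-backed storage using Python's built-in sqlite3 module. The table layout and the function names are assumptions made for this example, and the substring search with LIKE stands in for whatever full-text search a real database would offer.

```python
import sqlite3
from email.message import EmailMessage


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) an archive database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               message_id TEXT PRIMARY KEY,
               list_name  TEXT NOT NULL,
               subject    TEXT,
               body       TEXT
           )"""
    )
    return conn


def store_message(conn: sqlite3.Connection, list_name: str, msg: EmailMessage) -> None:
    """Insert one plain-text message inside a transaction."""
    # "with conn" commits on success and rolls back if the insert fails.
    with conn:
        conn.execute(
            "INSERT INTO messages (message_id, list_name, subject, body) "
            "VALUES (?, ?, ?, ?)",
            (str(msg["Message-ID"]), list_name, str(msg["Subject"]), msg.get_content()),
        )


def search(conn: sqlite3.Connection, term: str) -> list:
    """Return the message-ids whose subject or body contains the term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT message_id FROM messages WHERE subject LIKE ? OR body LIKE ?",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

Using the message-id as the primary key means a duplicate is rejected by the database itself, which forces the collision policy to be decided explicitly.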

Tasks

Templating probably needs to be added to Mailman before the archiver can be developed.

  • Look at Ethan Fremen's soc2006-webui. Use it? Update it? Ignore it?

Other sources of information/ideas

Discussion on improved archives and stable URLs
Máirín Duffy's rich interface for mailing lists mockup


Comments

Barry Warsaw

Good stuff Andrew, here are a couple of other thoughts:

  • imap access to the archives, readonly for most people, write access for admins
  • "send me this message" link so that if you're reading an article in the archive, you can click this link to get a real copy sent to you. may not be as important w/nntp or imap access. perhaps integrate with an email command (e.g. 'send message-id')
  • we talked about perhaps having a tinyurl-like alternative access to the message (though this probably isn't the definitive url)
  • integration w/digester so we can have new kind of "summary-with-links" digest
  • if we use the original message's message-id, we need to have a policy for dealing w/collisions

Ethan Fremen

I appear to have almost closed (knock on wood) a contract to finish my webui stuff and to improve archiving, so I would love to help on this.

Julian Mehnle

You should check out Lurker as a 3rd-party archiver. Integration of Lurker with Mailman could be somewhat improved (especially the UI), but it works quite well already.

Generally, I don't think Mailman should try to create its very own archiver. It's just not Mailman's core competency.

MailmanWiki: DEV/ModernArchiving (last edited 2015-03-04 02:12:51 by msapiro)