4.8. How can I add Namazu as a search engine for my Mailman list archives?

See also the message from Tom Morrison (Fri, 25 Jun 2004 09:45:23 -0500) in the mailman-users archive at http://mail.python.org/pipermail/mailman-users/2004-June/037584.html, and another approach at http://bakacsin.ki.iif.hu/~kissg/project/mailman+namazu/.

Subject: Re: Mailman-Users Mailman for maillist, namazu for search

  From: Philip S Tellis
  Date: Fri, 30 Nov 2001 14:13:40 +0530 (IST)
  To: <mailman-users@python.org>

  Sometime on Nov 29, Jon Carnes assembled some asciibets to say:

  > Looks cool.  How did you do it, and what are the current problems
  > with the way it's setup/working?

How I did it:

Namazu

There's this product called namazu (http://www.namazu.org/). It's a search engine that uses a full-text index. It can index local paths, but not remote ones.

I put the namazu CGI in the mailman/cgi-bin/ directory and called it search. I also had to hack namazu a bit to get it to pick the index to read from the PATH_INFO part of the URL (/linuxers/, /life/, etc.).
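
The PATH_INFO lookup described above can be illustrated with a small sketch. This is not the actual namazu patch (namazu's CGI is written in C and the original change is not shown here); it only shows the idea of mapping a trailing path such as /linuxers/ to a per-list index directory. The INDEX_ROOT location and the function name are assumptions made for illustration.

```python
import os
import re

# Hypothetical location of the per-list Namazu indexes (an assumption,
# not taken from the original message).
INDEX_ROOT = "/var/lib/namazu/index"

# Accept only simple list names so that PATH_INFO cannot escape INDEX_ROOT.
_LIST_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._+-]*")


def index_dir_from_path_info(path_info, index_root=INDEX_ROOT):
    """Map a PATH_INFO such as '/linuxers/' to an index directory.

    Returns None when PATH_INFO is empty or does not look like a list name.
    """
    name = (path_info or "").strip("/")
    if not _LIST_NAME.fullmatch(name):
        return None
    return os.path.join(index_root, name)
```

In a CGI script the value would typically come from os.environ.get("PATH_INFO"). Rejecting anything that is not a plain list name keeps a request like /../etc/ from selecting a directory outside the index root.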

I also changed the namazu templates to look like mailman's listinfo pages.

The last thing was to index the pages.
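
As a rough sketch of the indexing step, the following builds and runs an mknmz command (namazu's indexer) for one list. The message does not say which directories were used, so archive_root, index_root and the helper names are hypothetical; the only namazu feature assumed is mknmz's -O option for choosing the output directory.

```python
import os
import subprocess


def mknmz_command(list_name, archive_root, index_root):
    """Build the mknmz command line that indexes one list's archive pages.

    archive_root and index_root are assumed locations, e.g. the directory
    holding the pipermail HTML and a directory for per-list indexes.
    """
    target = os.path.join(archive_root, list_name)
    output = os.path.join(index_root, list_name)
    return ["mknmz", "-O", output, target]


def reindex(list_name, archive_root, index_root):
    """Create the index directory if needed and run mknmz for one list."""
    cmd = mknmz_command(list_name, archive_root, index_root)
    os.makedirs(cmd[2], exist_ok=True)
    # check=True raises CalledProcessError if mknmz exits with an error.
    subprocess.run(cmd, check=True)
```

Keeping one index directory per list matches the per-list lookup by PATH_INFO described earlier.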

So far, no changes were required to mailman, but the products aren't fully integrated yet.

The only thing that isn't done is automatic reindexing every time a new message is added. I thought it might make sense to do the reindexing at the same time that a new archive is created (i.e., weekly or monthly), but I couldn't find the place to put that.

The next best idea I had was to run the reindexing as a cron job. The problem here is that the reindexer might run while a new message is only partially written to disk.
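
One possible way to reduce (not eliminate) the partial-write risk in a cron-driven reindexer is to skip the run whenever any archive file was modified very recently. The sketch below is an assumption-laden illustration, not something from the original setup: the function name and the 120-second quiet period are made up, and a message written just after the check would still be missed until the next run.

```python
import os
import time


def archive_is_quiet(archive_dir, quiet_seconds=120, now=None):
    """Return True if no file under archive_dir changed in the last quiet_seconds.

    A cron job could call this first and postpone reindexing when it
    returns False, on the theory that the archiver may still be writing.
    """
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(archive_dir):
        for filename in filenames:
            try:
                mtime = os.path.getmtime(os.path.join(dirpath, filename))
            except OSError:
                # The file vanished between listing and stat; ignore it.
                continue
            if now - mtime < quiet_seconds:
                return False
    return True
```

A stricter approach would have the archiver and the reindexer share a lock, but that requires changes on the Mailman side, which the setup described here avoids.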

Mailman

I also had to add a link to the search page on the listinfo page, and either a search box or a link on the archive pages. While this isn't up on the URLs yet, it is done by hacking Mailman/Archiver/HyperArch.py.

The real problem is getting the link to show up only on pages that have actually been indexed.
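
One way to approach this, sketched here under assumptions, is to test whether a namazu index actually exists for the list before emitting the link. The check below looks for the NMZ.i file that mknmz writes into an index directory; the function name and the idea of one index directory per list are assumptions, not part of the original setup.

```python
import os


def list_is_indexed(index_dir):
    """Return True if index_dir appears to contain a namazu index.

    Assumes mknmz has written its NMZ.* files there; NMZ.i is used as
    the marker file.
    """
    return os.path.isfile(os.path.join(index_dir, "NMZ.i"))
```

This only says that a list has been indexed at some point; it does not tell whether a particular, very recent archive page is already covered by the index.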

How this can be fully integrated

A config variable could state whether the list needs to be indexed for searching, and based on that, these links could be written to the archive pages. The reindexer can be run through mailman or through cron, depending on what others think is a better solution.
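
To make the proposal concrete, here is a minimal sketch of how such a per-list setting might gate the search link. No such option exists in Mailman as described here: the indexed_for_search flag, the function name and the /mailman/search base URL are all hypothetical, and the code is written as standalone Python 3 rather than as a patch to HyperArch.py.

```python
from html import escape
from urllib.parse import quote


def search_link_html(list_name, indexed_for_search, search_base="/mailman/search"):
    """Return an HTML search link for the list, or '' if it is not indexed.

    indexed_for_search stands in for the proposed per-list config variable.
    """
    if not indexed_for_search:
        return ""
    href = "%s/%s/" % (search_base.rstrip("/"), quote(list_name, safe=""))
    return '<a href="%s">Search this archive</a>' % escape(href, quote=True)
```

For example, search_link_html("linuxers", True) yields a link to /mailman/search/linuxers/, while a list with the flag turned off contributes nothing to the page.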

Philip

Converted from the Mailman FAQ Wizard