3.3. How can I remove a post from the list archive / remove an entire archive?

One way to do this is to edit the raw archive "mbox" directly, and then regenerate the archive.

NOTE: Regenerating the archive from the mbox can have unwanted consequences. See the CAVEAT below for alternatives.

In any case, this requires login access to the command-line prompt on the host where Mailman is running; you'll either need to be in the "mailman" group, or be the "mailman" user. (You could of course do this as root, but that's not recommended – don't do anything as root that you don't absolutely need to do as root.)

If you don't know how to use the command-line prompt on the host where Mailman is installed, or you don't have access to the command-line prompt, you will need to talk to your service provider or your support services personnel in order to perform these functions.

Note that if you are using a cPanel-hosted service (see Mailman and CPanel), you almost certainly will not be allowed to log into the server.

Anyway, assuming you do have the necessary access and knowledge of how to use the command-line system on your server, here is a detailed example. Let's say you want to remove a post from the archive for the "test" list.

First, you need to find the archive. There are actually two archives. The HTML archive is in $var-prefix/archives/private/test/* and the cumulative mailbox containing all posts in mbox format is $var-prefix/archives/private/test.mbox/test.mbox. If the archive is public, there will be symlinks $var-prefix/archives/public/test and possibly $var-prefix/archives/public/test.mbox pointing to the corresponding directories in $var-prefix/archives/private/.

$var-prefix is the path to the directory where Mailman stores its mutable data. This may be /var/lib/mailman or elsewhere depending on how Mailman was configured.
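To make the layout concrete, here is a minimal Python sketch that builds these paths for a given list and checks whether the public symlink exists. The function names are hypothetical, and /var/lib/mailman in the usage note is only the example value of $var-prefix mentioned above; your installation may differ.

```python
import os

def archive_paths(var_prefix, listname):
    """Return the archive locations for one list. Nothing on disk is touched."""
    private = os.path.join(var_prefix, "archives", "private")
    public = os.path.join(var_prefix, "archives", "public")
    return {
        # Directory holding the HTML archive.
        "html_dir": os.path.join(private, listname),
        # Cumulative mailbox with all posts in mbox format.
        "mbox": os.path.join(private, listname + ".mbox", listname + ".mbox"),
        # Symlink that exists only when the archive is public.
        "public_link": os.path.join(public, listname),
    }

def is_public(var_prefix, listname):
    """True if the public symlink for the list exists."""
    return os.path.islink(archive_paths(var_prefix, listname)["public_link"])
```

For example, archive_paths("/var/lib/mailman", "test")["mbox"] gives the path of the file to edit in the next step.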

Now, to edit the raw archive, point your editor at the mbox:

  cd $var-prefix/archives/private/
  xemacs test.mbox/test.mbox

(You can use whatever text editor you like, but be sure that it 1) doesn't try to wrap lines or change anything implicitly, and 2) can handle large files, if this is a big list archive.)

Find the message you want to remove and delete it. Each message starts with a line that looks like:

  From joe@example.com Wed Sep 26 16:39:08 2001

where, obviously, the sender address and delivery date will vary. To remove an entire message, delete everything from the "From " line that starts your target message up to, but not including, the next "From " line, i.e. the start of the next message. You MUST have a blank line between the end of the previous message and the start of the next message. For example, after deleting message N, you should have something like this:

   [...last line of text from message N-1...]
   [blank line]
   From jane@example.net Wed Sep 26 17:53:02 2001
   [...headers from message N+1...]
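If you would rather script the deletion than do it in an editor, the same rule can be sketched in Python. This is a minimal illustration, not a Mailman tool: the function names are hypothetical, the message index is 0-based in mbox order, and a message is assumed to start at a "From " line that is either the first line of the file or follows a blank line. It works on raw bytes so that nothing else in the file is rewrapped or re-encoded.

```python
def split_mbox(data):
    """Split raw mbox bytes into chunks, one per message.

    Each chunk keeps its own trailing blank line, so removing a chunk
    leaves the blank line before the next "From " line in place.
    """
    chunks, current = [], []
    prev_blank = True  # the start of the file counts as "after a blank line"
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From ") and prev_blank and current:
            chunks.append(b"".join(current))
            current = []
        current.append(line)
        prev_blank = line.strip() == b""
    if current:
        chunks.append(b"".join(current))
    return chunks

def delete_message(data, index):
    """Return the mbox bytes with message number `index` (0-based) removed."""
    chunks = split_mbox(data)
    del chunks[index]
    return b"".join(chunks)
```

Read the mbox in binary mode, write the result to a new file, and compare the two before replacing the original. Note that an unescaped "From " line inside a message body will be treated as a message start by this sketch.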

NOTE: Instead of editing the .mbox file with an editor, you can use a mail client. At least one user has reported: "I've had good luck using a command-line email program (such as mutt) to delete messages from the archive mbox. I've found this is especially useful for archives that contain many attachments and/or encoded text sections." However, if you use an MUA for this, be sure it doesn't reorder the messages in the mbox.

Removing an entire message has side effects. See the CAVEAT below. You should consider just replacing the body with "Content removed." or similar and perhaps deleting or modifying some headers. Leave headers like From:, Subject:, Date: and Message-Id: in place, although you can edit the contents of From: and Subject: if you wish.
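As a rough illustration of replacing the body while keeping the headers, here is a minimal Python sketch. It takes the raw bytes of ONE message (starting at its "From " line) and returns the same headers followed by a short notice. The function name is hypothetical. Dropping Content-Type: and Content-Transfer-Encoding: is a choice made for this sketch, so that the plain-text notice is not interpreted as a multipart or base64 body; the text above only says you may delete or modify some headers.

```python
# MIME headers that describe the old body and would misdescribe the notice.
DROP = (b"content-type:", b"content-transfer-encoding:")

def blank_body(message, notice=b"Content removed."):
    """Return `message` with its body replaced by `notice`.

    The "From " line and all other headers are kept byte-for-byte.
    """
    kept = []
    dropping = False
    for line in message.splitlines(keepends=True):
        if line.strip() == b"":
            break  # the first blank line ends the headers
        if line[:1] in (b" ", b"\t"):
            # Continuation line: it shares the fate of the header it continues.
            if not dropping:
                kept.append(line)
            continue
        dropping = line.lower().startswith(DROP)
        if not dropping:
            kept.append(line)
    if kept and not kept[-1].endswith(b"\n"):
        kept[-1] += b"\n"
    # Blank line, the notice, then the blank line required before the next message.
    return b"".join(kept) + b"\n" + notice + b"\n\n"
```

Because the message stays in the mbox with its From:, Subject:, Date: and Message-Id: intact, its position in the mbox is unchanged.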

Now, if you are going to rebuild the archive visible from the web, first move the existing archive directory (test, not test.mbox) out of the way:

  mv test test.save

and then go rebuild the archive:

  cd ~mailman
  ./bin/arch test

NOTE: you must move the directory out of the way, instead of simply copying it. If you don't do this, the arch command will add additional copies of previous messages to the existing archives and you will end up with duplicates.

Alternatively, you can use the --wipe option with ~mailman/bin/arch to initialize the archive rather than add to the existing one. See ~mailman/bin/arch --help for details.
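The two ways of rebuilding can be sketched as a small Python wrapper. This is only an illustration under stated assumptions: the function names are hypothetical, mailman_home stands for whatever ~mailman is on your host, and the only things taken from the text above are bin/arch, its --wipe option and the ".save" suffix. Refusing to overwrite an existing backup is a precaution added for this sketch.

```python
import os
import shutil
import subprocess

def move_aside(private_dir, listname, suffix=".save"):
    """Rename <listname> to <listname>.save, refusing to overwrite an old backup."""
    src = os.path.join(private_dir, listname)
    dst = src + suffix
    if os.path.exists(dst):
        raise FileExistsError(dst + " already exists; remove or rename it first")
    shutil.move(src, dst)
    return dst

def arch_command(mailman_home, listname, wipe=False):
    """Build the bin/arch command line as an argument list."""
    cmd = [os.path.join(mailman_home, "bin", "arch")]
    if wipe:
        cmd.append("--wipe")
    cmd.append(listname)
    return cmd

def rebuild(mailman_home, private_dir, listname, wipe=False):
    """Either move the HTML archive aside or pass --wipe, then run bin/arch."""
    if not wipe:
        move_aside(private_dir, listname)
    subprocess.run(arch_command(mailman_home, listname, wipe), check=True)
```

Run this as the "mailman" user or a member of the "mailman" group, as with the manual commands.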

To make sure it worked, visit the list archive with your web browser. Once you're sure it's fine, you can delete the "test.save" directory.

CAVEAT: If you delete entire messages from the archive, side effects occur:

  1. Threading may be broken - if C is In-Reply-To: B which is In-Reply-To: A, and B is deleted, C will no longer be threaded with A.
  2. Messages will be renumbered - this may be important if there are saved links to archive messages. Since the message number is part of the URI, the saved link will no longer work or will retrieve the wrong message.
  3. In some cases of older lists (begun before 2.1.x), messages may be renumbered even if you only edit and don't delete any messages.
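The first two side effects can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not the archiver's actual code: it assumes messages are numbered sequentially in mbox order on a rebuild, and that threading follows In-Reply-To: only. The names A, B and C are the messages from item 1.

```python
def number_messages(message_ids):
    """Assign sequence numbers in mbox order (assumed numbering scheme)."""
    return {mid: n for n, mid in enumerate(message_ids)}

def thread_root(msgid, parents):
    """Follow In-Reply-To links until the parent is not in the archive.

    `parents` maps each archived message to the message it replies to,
    or to None if it starts a thread.
    """
    seen = {msgid}
    while True:
        parent = parents.get(msgid)
        if parent is None or parent not in parents or parent in seen:
            return msgid
        seen.add(parent)
        msgid = parent

# Before deletion: C replies to B, which replies to A.
before = {"A": None, "B": "A", "C": "B"}
# After deleting B: C still names B as its parent, but B is gone.
after = {"A": None, "C": "B"}

print(thread_root("C", before))   # C is threaded under A
print(thread_root("C", after))    # C is now the root of its own thread
print(number_messages(["A", "B", "C"])["C"], number_messages(["A", "C"])["C"])
```

In this model C moves from number 2 to number 1 once B is deleted, so a saved link to C's old number no longer points at C.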

To avoid these problems, instead of deleting the entire message, leave the headers intact and replace the body with "Message deleted" or some other meaningful text, but renumbering may still occur with older lists.

To avoid the potential issues with rebuilding, you can just edit the HTML and .txt files (see below) in the same way as the mbox. You should still edit the mbox even if you won't rebuild, so that a future rebuild won't resurrect old messages.

NOTE ON MBOX ANOMALIES:

A few anomalies can exist in the cumulative mbox. Older archives may have messages with unescaped 'From ' lines in message bodies. bin/arch interprets these as message separators, which results in one message being split at that point. The first part, up to the 'From ' line, will be in the rebuilt archive, and there will be a second message consisting of the remainder of the original, with no headers, archived with the current date.

Also, if the mbox has ever been opened in an MUA and saved, messages might have been reordered. Both this and unescaped 'From ' lines will certainly result in messages being numbered differently which will invalidate prior archive URLs.

To assist with finding and fixing these problems, there are two scripts. Mailman's bin/cleanarch script can find and optionally fix unescaped 'From ' lines. The script at https://www.msapiro.net/scripts/check_arch (mirrored at http://fog.ccsf.edu/~msapiro/scripts/check_arch) checks for unescaped 'From ' lines and for missing or invalid Date: headers. Optionally, it can also check whether there is a message with the appropriate sequence number in the appropriate period of the existing HTML archive and whether the Subject in the HTML archive matches the mbox Subject. This check can give some assurance that a rebuild will not renumber messages.
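To show what an unescaped 'From ' line looks like to a program, here is a rough Python heuristic. It is not the algorithm used by bin/cleanarch or check_arch, and the function name and regular expression are assumptions made for this sketch: a line starting with "From " is flagged if it is not preceded by a blank line, or if it does not look like the "From address Day Mon DD HH:MM:SS YYYY" separator shown earlier. Genuine separators in other formats will be reported as false positives, so treat the output as a list of places to inspect by hand.

```python
import re

# Shape of the separator line used in the example earlier in this FAQ.
ENVELOPE = re.compile(
    rb"^From \S+ +(Mon|Tue|Wed|Thu|Fri|Sat|Sun) +"
    rb"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +"
    rb"\d{1,2} +\d{2}:\d{2}:\d{2} +\d{4}\s*$"
)

def suspicious_from_lines(data):
    """Return 1-based line numbers of "From " lines that may be body text."""
    hits = []
    prev_blank = True  # the start of the file counts as "after a blank line"
    for number, line in enumerate(data.splitlines(), start=1):
        if line.startswith(b"From "):
            if not prev_blank or not ENVELOPE.match(line):
                hits.append(number)
        prev_blank = line.strip() == b""
    return hits
```

Read the mbox in binary mode and pass the bytes to suspicious_from_lines(). For actual repairs, use bin/cleanarch.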

NOTE ON HOW TO DELETE AN ENTIRE ARCHIVE:

  1. Browse to the list-administration panel, turn off archiving (option archive) and switch archive from public to private if it's public (option archive_private).
  2. Follow the procedure described above; when editing the raw archive (mbox), just delete all entries.

If you want to re-enable archiving later, just turn on archiving in the list-administration panel (and set it to public if you wish).

NOTE ON PRUNING OLD MESSAGES:

There is a script at http://www.msapiro.net/scripts/prune_arch (mirrored at http://fog.ccsf.cc.ca.us/~msapiro/scripts/prune_arch) which can prune messages older than a specified number of days from the archives of one or more lists.
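For illustration only, the "older than a specified number of days" test can be sketched in Python as below. This is not taken from prune_arch; the function name is hypothetical, the age is judged from the Date: header, and treating a missing or unparseable date as "keep the message" is a choice made for this sketch.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def is_older_than(date_header, days, now=None):
    """True if the Date: header value is more than `days` days before `now`."""
    now = now or datetime.now(timezone.utc)
    try:
        when = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False  # missing or unparseable date: keep the message
    if when.tzinfo is None:
        # A date without a usable zone is treated as UTC in this sketch.
        when = when.replace(tzinfo=timezone.utc)
    return now - when > timedelta(days=days)
```

Remember that removing messages from the mbox and rebuilding is subject to the renumbering CAVEAT above.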

HOW TO DELETE A MESSAGE FROM THE HTML ARCHIVE ONLY

If you prefer, you can delete a message from the HTML archive only.

If the URL of the message is something like http://www.example.com/mailman/private/list/2008-February/024419.html (for a private archive) or http://www.example.com/pipermail/list/2008-February/024419.html (for a public archive), the archived message is the file ~mailman/archives/private/list/2008-February/024419.html.
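The mapping from URL to file can be written down as a short Python sketch. The function name is hypothetical, and the /mailman/private/ and /pipermail/ URL prefixes are the ones from the example above; a site may be configured with different ones. "~mailman" is kept as a literal prefix here, so expand it (for example with os.path.expanduser) or substitute your own installation path before using the result.

```python
import posixpath
from urllib.parse import urlparse

def archived_file(url, mailman_home="~mailman"):
    """Map a private or public archive URL to the HTML file on disk."""
    parts = urlparse(url).path.strip("/").split("/")
    for marker in (["mailman", "private"], ["pipermail"]):
        n = len(marker)
        if parts[:n] == marker:
            rest = parts[n:]  # list name, period, message file
            break
    else:
        raise ValueError("not a recognised archive URL: " + url)
    # Public archives are symlinks into private/, so both map to the same file.
    return posixpath.join(mailman_home, "archives", "private", *rest)
```

Both example URLs above map to ~mailman/archives/private/list/2008-February/024419.html.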

You can simply remove that file, but that will leave broken links in the index pages and the 'next' and 'previous' links of neighboring messages. A better approach is to edit that file to remove what you don't want. Note, however, that unless you also edit the .mbox file, the original message will return if you ever rebuild the archive.

Also note that if the Subject: or From: contains something you don't want in the archive, you may need to edit that content in the 'next' and 'previous' links of neighboring messages. That content appears in the index files as well, but editing them is problematic as they are rebuilt with every post. There is a script at https://www.msapiro.net/scripts/hdfix (mirrored at https://fog.ccsf.edu/~msapiro/scripts/hdfix) that can edit the underlying database to fix this.

Also, the message will remain in the web accessible periodic .txt and .txt.gz files unless you edit the .txt file too. If you edit the .txt file, the corresponding .txt.gz file will be rebuilt the next time cron/nightly_gzip runs as long as GZIP_ARCHIVE_TXT_FILES is not set to Yes in mm_cfg.py.

Converted from the Mailman FAQ Wizard

MailmanWiki: DOC/How can I remove a post from the list archive or remove an entire archive? (last edited 2016-10-20 18:04:33 by msapiro)