Improved Logging
The next major release of Mailman uses Python's logging module for all log writing. Logging will also be easily configurable using a LOG_CONFIG_FILE, which will let you customize log levels for the various log files Mailman writes, as well as log formats and such. However, Mailman currently doesn't log any more or less information than it did in Mailman 2.1.
This page will collect thoughts on how to improve Mailman's logging so that site administrators, list owners, and users can better utilize this useful information.
Separate loggers
Mailman 2.1 uses a number of separate logging streams to separate out such information as postings, smtp successes and failures, lock debugging, errors (tracebacks), etc. Do people like this, and should we keep this for Mailman 3.0? OT1H it's nice to be able to just view the log you care about without all that information being conflated, but OTOH, it means more files to inspect. I'd like to hear your thoughts on that.
Oh, I should mention that it ought to be possible via the LOG_CONFIG_FILE to specify that any combination of the separate loggers output to the same log file. That might be enough to address this for all use cases.
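As a rough illustration of sending several loggers to one file, here is a minimal sketch using only Python's standard logging module. The logger names mailman.post and mailman.smtp are hypothetical, an in-memory stream stands in for the shared log file, and nothing here reflects the actual LOG_CONFIG_FILE format.

```python
import io
import logging

# Assumption: an in-memory stream stands in for one shared log file.
# A real setup would more likely use logging.FileHandler with a path.
shared_stream = io.StringIO()
shared_handler = logging.StreamHandler(shared_stream)

# Including %(name)s tags each line with the logger that produced it,
# so one combined file can still be filtered per logger.
shared_handler.setFormatter(
    logging.Formatter("%(name)s %(levelname)s %(message)s")
)

# Hypothetical logger names; Mailman's real logger names may differ.
for name in ("mailman.post", "mailman.smtp"):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(shared_handler)
    logger.propagate = False  # keep output out of the root logger

logging.getLogger("mailman.post").info("post accepted")
logging.getLogger("mailman.smtp").info("delivery succeeded")

combined = shared_stream.getvalue().splitlines()
for line in combined:
    print(line)
```

Because both loggers share one handler, their records land in the same destination in the order they were emitted, and the logger name at the start of each line keeps them distinguishable.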
Information to log
What does Mailman currently log that you have no need for, or that clutters up your automated processing of the log files? How could Mailman's log formats be improved for tools such as grep or other system administration tools?
What information does Mailman currently not log that would be useful to log? You system administrators out there, what are you missing that would make your lives easier?
Access to log data
Currently, only site administrators have access to Mailman's log information via the file system. However, some of this information (or new information we start logging) could be very useful to list administrators and users. For example, list owners might like to get some metrics on their list's traffic. How many posts per day, week, month do they see? What is the distribution of user activity? How big on average is their held queue?
Users might find some interesting information in some of the log data too. Which lists are they most active on and by how much? Which lists get the most and least traffic? Before they subscribe, a user might want to get a sense of how much email they will get.
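To make the "posts per day" metric concrete, here is a minimal sketch. It assumes posting records are already available as (timestamp, list name) pairs; the sample records, the list names, and the timestamp layout are all hypothetical and do not reflect what Mailman actually writes.

```python
from collections import Counter
from datetime import datetime

# Hypothetical posting records: (timestamp, list name).
POSTS = [
    ("2007-06-01 09:15:00", "list-a"),
    ("2007-06-01 17:40:12", "list-a"),
    ("2007-06-02 08:03:55", "list-a"),
    ("2007-06-02 11:20:00", "list-b"),
]


def posts_per_day(records, list_name):
    """Count posts per calendar day for one list."""
    counts = Counter()
    for stamp, name in records:
        if name == list_name:
            day = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date()
            counts[day.isoformat()] += 1
    return dict(counts)


daily = posts_per_day(POSTS, "list-a")
print(daily)
```

Weekly or monthly totals could be derived the same way by grouping on a coarser key than the calendar day.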
Your comments
Please post your comments below. I'd like to get a sense for what people want to see and then we can modify these pages to include a more formal specification for improving Mailman's logging.
Separate Loggers: It would be nice if it were possible to get all logs in 1 place, and in a format that was consistent enough to aid writing a good log parser to generate traffic reports from.
Access to log data: I don't think list owners need access to the logs themselves, but it would be great to be able to offer traffic stats of some kind. Hopefully the output from that system would use the same templating system being discussed for Mailman itself.
– acrosman
Comments
- I agree that separate logs really don't work very well; it's very common to find that you have to look in several logs to determine what happened to a specific post. Sure, it's occasionally useful that errors or smtp-failure will show a recent date in ls -l indicating "we have problems", but that's just as easy to accomplish (given a unified log format) with grep and tail. With an appropriate tag for each logger, grepping will give you only that information, and it would be easy to produce more sophisticated tools.
- I put the unified format high on the totem pole; increasing log data is not hard--just add a call to the logger--it's presentation, and especially filtering, to produce information from the raw data that requires thought.
- For the unified format, I think it should be made very easy to track a particular post through the system. This is sufficiently important that I've thought of a format that goes "UID PHASE TIMESTAMP STATUS freeform" where UID would be a fairly unique but short (6-8 ASCII characters) ID such as MTAs generate, PHASE would be in the Linux .rc format (00-RECEIVED, 01-PARSED, ...) for sorting. This may be a dumb idea for various reasons, haven't thought carefully about it, but you can see how it would make it easy to see a message's progress with a simple line sort. "line sort" is not essential, but it should be easy for a tool to produce such output.
- It's rather important to log receipt, as a couple of recent threads on Mailman-Users show.
- Log data through the web is probably the single biggest thing we can do to comfort cPanel and Plesk users.
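The "UID PHASE TIMESTAMP STATUS freeform" idea from the list above can be sketched as follows. The sample lines, the 02-DELIVERED phase name, and the space-free timestamp layout are assumptions made for illustration; only 00-RECEIVED and 01-PARSED come from the proposal itself.

```python
# Hypothetical log lines in "UID PHASE TIMESTAMP STATUS freeform" order,
# deliberately interleaved the way a busy system might write them.
LOG_LINES = [
    "a1b2c3 01-PARSED 2007-06-01T21:00:02 OK headers parsed",
    "x9y8z7 00-RECEIVED 2007-06-01T21:00:03 OK from smtp",
    "a1b2c3 00-RECEIVED 2007-06-01T21:00:01 OK from smtp",
    "a1b2c3 02-DELIVERED 2007-06-01T21:00:05 OK 3 recipients",
]


def parse(line):
    """Split one line into its five fields; freeform may contain spaces."""
    uid, phase, timestamp, status, freeform = line.split(" ", 4)
    return {
        "uid": uid,
        "phase": phase,
        "timestamp": timestamp,
        "status": status,
        "freeform": freeform,
    }


def track(lines, uid):
    """Return the lines for one message, ordered by phase.

    UID is the same on every selected line, so a plain string sort
    orders them by the zero-padded phase prefix.
    """
    return sorted(line for line in lines if line.split(" ", 1)[0] == uid)


for line in track(LOG_LINES, "a1b2c3"):
    print(line)
```

A plain sort of the whole file groups lines by UID and then by phase, which is the "simple line sort" property the proposal is after. It relies on the phase numbers being zero-padded to a fixed width.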
I'd say that getting more log data out of the system is one of the biggest issues. We need to be able to track every single message coming into the system, passing through the various modules, and then finally passing out. We could tie message-ids to a queue-id, and then log queue-ids throughout (as is done with sendmail & postfix), but whatever we do needs to be consistent.
- An optional unified log file format (a la apache) would be nice, as it would allow you to keep everything in one place. But I'd place this lower on the totem pole than getting more log data out of the system.
- Don't fall into the trap of trying to make things too short or cryptic, which might increase the chance of collision or confusion. Better to be more verbose and eliminate all doubt.
- More importantly, making all log data available via the web interface to list admins would be great. The more we can do that doesn't require command-line access to the server, the better. This includes integration with log processing tools like Splunk (which we should use on python.org, btw).
I speak only for myself, but with experience in tracking down problems on posts to various lists on python.org, as well as writing or maintaining various log processing tools.
[ Decided to move this stuff down into a "real" comment. Sorry about the edit, folks! ]
Almost always, when I'm trying to figure out problems, I'm trying to do it by viewing the transactions for a particular user/email address.
It would be super, super helpful to be able to click on the user in the web interface and have:
1) Complete subscription history
2) Complete bounce history (including bounce messages)
SUBSCRIBE HISTORY
Various verified sender programs (like postmaster.aol.com) require us to know exactly when and how somebody was added to the list. The existing subscribe log doesn't do this as well as we'd like.
People forget they signed up for a list and scream "GET ME OFF THIS SPAM"; it is nice to be able to show them the record of their sign-up.
The log format might be:
timestamp : event : Name : email : comment
The comment field would either be populated from the web invite/subscribe form or with useful info from the headers of a message automatically processed.
2007-06-01 21:00:01 : subscribed web interface : John Miller : jmiller@thecsl.org : paper sign-up sheet open house 07/05/29
2007-06-01 21:00:01 : invited web interface : Bob Smith : bsmith@thecsl.org : swapped business cards 07/05/28
2007-06-01 21:00:01 : received confirmation email from 192.168.1.1 : Bob Smith : bsmith@thecsl.org : invite ID = 383847575, IP = 192.168.1.1, x-mailer = MS Outlook 2010
2007-06-01 21:00:11 : received subscribe email : Jane Doe : jdoe@thecsl.org : IP = 10.0.0.2, x-mailer = elm
2007-06-01 21:00:01 : sent automatic confirmation request email : Jane Doe : jdoe@thecsl.org : CONFIRMATION ID = 3838383875 logged so we know it was sent
2007-06-01 21:00:01 : received confirmation email : jdoe@thecsl.org : IP = 10.0.0.2, x-mailer = elm, CONFIRMATION ID = 3838383875
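A per-user subscription history could be pulled out of lines like these with a small parser. This is only a sketch: it assumes the five fields are always separated by exactly " : " (space, colon, space) so that the colons inside the timestamp are left alone, and it rejects lines that do not have all five fields. The names SubscriptionEvent, parse_event, and history_for are made up for the example.

```python
from typing import NamedTuple


class SubscriptionEvent(NamedTuple):
    timestamp: str
    event: str
    name: str
    email: str
    comment: str


# Assumption: fields are separated by space-colon-space, which keeps the
# colons inside "21:00:01" from being treated as separators.
SEPARATOR = " : "


def parse_event(line: str) -> SubscriptionEvent:
    """Parse one 'timestamp : event : Name : email : comment' line."""
    parts = line.strip().split(SEPARATOR, 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return SubscriptionEvent(*(part.strip() for part in parts))


def history_for(lines, email):
    """Return all parsed events for one email address, in file order."""
    events = (parse_event(line) for line in lines)
    return [event for event in events if event.email.lower() == email.lower()]


SAMPLE = [
    "2007-06-01 21:00:01 : subscribed web interface : John Miller : "
    "jmiller@thecsl.org : paper sign-up sheet open house 07/05/29",
    "2007-06-01 21:00:01 : invited web interface : Bob Smith : "
    "bsmith@thecsl.org : swapped business cards 07/05/28",
]

for event in history_for(SAMPLE, "bsmith@thecsl.org"):
    print(event.timestamp, event.event, event.comment)
```

Limiting the split to four separators means a comment that itself contains " : " stays intact in the last field.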
BOUNCE HISTORY
Usually we want to know why a message has bounced. This is especially true if somebody has been unsubscribed due to excessive bounces.
It would help a lot to have the text of the bounce message. "No such account" is different from "We think you are spam, your message is rejected, click here if you are a human".
Right now, I edit /etc/aliases to send the bounce messages also to me, which is a pain.
Disc space is so cheap these days even with web hotels...
Sorry if this has been discussed before; I'm new. Logging of administrative changes would be helpful. We have many lists with many administrators, and when something goes "bump", we have to figure out who-dun-it.