4.12. What about VERP?
Here's one example of how this works in practice, summarizing various messages in the thread at <http://mail.python.org/pipermail/mailman-developers/2004-July/017015.html>:
Enabling VERP can be a serious performance hit for large lists. One example list with over 150,000 recipients saw throughput drop by a factor of 25 to 75 when VERP was enabled:
    May 27 16:43:46 2004 (440) <20040527135407.74A5F368143 at alan.rezo.net> smtp for 151942 recips, completed in 1231.438 seconds
    Jun 11 19:05:45 2004 (440) <20040611163245.ED9CB3680AE at alan.rezo.net> smtp for 152333 recips, completed in 649.634 seconds
    Jun 30 15:39:26 2004 (435) <20040630132741.F1A0836811C at alan.rezo.net> smtp for 152717 recips, completed in 428.891 seconds
    Jul 13 02:05:22 2004 (435) <20040712150834.DCA153680BD at alan.rezo.net> smtp for 152991 recips, completed in 31782.241 seconds
    428.891/152717 = 0.0028084 seconds per recipient (average)
    1231.438/151942 = 0.0081046 seconds per recipient (average)
    31782.241/152991 = 0.2077392 seconds per recipient (average)
This may not sound like much, but keep in mind that 31782.241 seconds is 529.704 minutes, or 8.828 hours, and during that time the mailing-list system was pretty much stuck (other mail, which did not need to pass through Mailman, was OK).
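The per-recipient averages above can be reproduced with a few lines of Python. This is only a sketch: the recipient counts and timings are copied from the log excerpt, the names `RUNS`, `VERP_RUN` and `seconds_per_recipient` are made up for illustration, and it assumes (as the surrounding text implies) that the Jul 13 entry is the VERP-enabled run.

```python
# (date, recipients, seconds) copied from the smtp log excerpt above.
RUNS = [
    ("May 27", 151942, 1231.438),
    ("Jun 11", 152333, 649.634),
    ("Jun 30", 152717, 428.891),
    ("Jul 13", 152991, 31782.241),
]
VERP_RUN = "Jul 13"  # assumption: this is the run with VERP enabled


def seconds_per_recipient(recipients, seconds):
    """Average delivery time per recipient for one run."""
    return seconds / recipients


per_rcpt = {date: seconds_per_recipient(n, s) for date, n, s in RUNS}
verp = per_rcpt[VERP_RUN]

for date, value in per_rcpt.items():
    line = f"{date}: {value:.7f} seconds per recipient"
    if date != VERP_RUN:
        # How many times slower the VERP run was than this run.
        line += f" (VERP run is {verp / value:.1f}x slower)"
    print(line)
```

The ratios against the fastest and slowest non-VERP runs are where the "factor of 25 to 75" range comes from.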
If you can modify your Mailman configuration to let the MTA do the VERPing, you won't get the personalization benefits, but you will get the improved bounce management benefits. Both Exim and Postfix are capable of doing this. For Postfix, see <http://www.postfix.org/VERP_README.html#smtp>, although this will currently require that you modify your Mailman source code to use the "XVERP" option.
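For readers who have not met the mechanism before: whether Mailman or the MTA does the VERPing, the idea is to encode each recipient's address into the envelope sender, so that a bounce identifies the failing subscriber by itself. The sketch below assumes the common `bounces+mailbox=host@listdomain` layout with `+` and `=` as delimiters. The function names and example addresses are hypothetical, and a real deployment should follow whatever format its Mailman or MTA configuration actually uses.

```python
def verp_encode(bounce_addr, recipient):
    """Build a per-recipient envelope sender.

    Assumed layout: <bounce local part>+<mailbox>=<host>@<list domain>.
    """
    local, domain = bounce_addr.rsplit("@", 1)
    mailbox, host = recipient.rsplit("@", 1)
    return f"{local}+{mailbox}={host}@{domain}"


def verp_decode(envelope_sender):
    """Recover the subscriber address from a VERP envelope sender."""
    local, _domain = envelope_sender.rsplit("@", 1)
    # Everything after the first "+" is the encoded recipient.
    _bounces, encoded = local.split("+", 1)
    # The last "=" separates the mailbox from the host.
    mailbox, host = encoded.rsplit("=", 1)
    return f"{mailbox}@{host}"


sender = verp_encode("mylist-bounces@lists.example.org", "alice@example.com")
print(sender)
print(verp_decode(sender))
```

Because every recipient gets a distinct envelope sender, each message needs its own SMTP transaction, which is the source of the cost discussed in this entry.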
Of course, in this example the machine may not have been tuned for maximum performance, so your results might be better if you pay more attention to tuning your systems. Your results could also be worse.
So much for one example of the practice.
Now, here's the theory, from a message by Chuq von Rospach at <http://mail.python.org/pipermail/mailman-developers/2001-June/008928.html>:
1000 subscribers -- no digest subscribers to simplify this. Assume just individual messages.

The message size is 10K, including header.

The bandwidth needed to generate a connection to send a message is 1K (which is pretty close).

The bandwidth needed to add an address to an existing message is about 1/10 of a K (also pretty close).

The practical limit to the number of messages you can piggyback is 100, since this is specified in RFC2821 as the smallest number a site is REQUIRED to take. In practice, due to non-conformant sites, you have to be careful setting it beyond 50 these days, because sites set this number down because they think it slows down the spammers (I'm yet to be convinced it makes a damn bit of difference, especially since MTAs like postfix recognize the 452 and auto-adjust now. This is another place where sendmail seems behind the technology curve, FWIW).

How much bandwidth is used depends on these factors:

    what your piggyback value is (in mailman, it's SMTP_MAX_RCPTS)
    how many domains have > 1 subscriber

Here's how plaidworks breaks down: 3101 subscribers across 1287 domains. That's an average of 2.3 subscribers per domain, but the numbers skew wildly, so averages are meaningless. Here's how my site breaks down:

    # of subscribers    # of domains/# of users
    ----------------    -----------------------
      1                 263/263
      2                 142/284
      3                 40/120
      4                 19/76
      5                 16/80
      6                 10/60
      7                 7/49
      8                 3/24
      9                 6/54
     10                 2/20
     11                 2/22
     12                 2/24
     13                 1/
     14                 1/
     16                 1/
     17                 1/ (worldnet.att.net)
     22                 1/ (juno.com)
     29                 1 (mindspring.com)
     30                 1 (pacbell.net)
     35                 1 (plaidworks.com)
     43                 1 (sympatico.ca)
     53                 1 (earthlink.net)
    150                 1 (home.com)
    173                 1 (yahoo.com)
    228                 1 (hotmail.com)
    441                 1 (aol.com)

If you're scoring at home, 37% of subscribers come from those last 4 domains: 5% for home and yahoo, 7% for hotmail, and 14% for aol. Those are your 500 pound gorillas (AOL is 800 pounds), and piss them off at your own risk. At the other end, 8% of your users are the only subscriber from a domain. 16% are 1 or 2 per domain.
26% are on sites with 5 or fewer subscribers.

Time for some numbers. Back to the 1000 member list for simplicity. The subscriber list breaks down to:

    85 - 1/85
    45 - 2/90
    12 - 3/36
    6 - 4/24
    [...]
    48 - 1
    55 - 1
    73 - 1
    142 - 1

That's 553, or 55% of the subscribers, wedged tightly on both ends of the curve. We can extrapolate what they'll do to bandwidth from the end cases if we need to.

Extreme case: SMTP_MAX_RCPTS = 1.

    1000 subscribers * (10K message size + 1K overhead) = 11,000K bytes bandwidth.

Extreme case: SMTP_MAX_RCPTS = 100. These get sent down the line this way:

    85 * 11K
    45 * (1 * 11K + 1 * .1K)
    12 * (1 * 11K + 2 * .1K)
    6 * (1 * 11K + 3 * .1K)
    [...]
    1 * 11K + 47 * .1K
    1 * 11K + 54 * .1K
    1 * 11K + 72 * .1K
    2 * 11K + 140 * .1K

Do you see how I got these numbers? In the case of the 12 domains with three subscribers, you have to make an 11K connection for the first message, and piggyback on the other two addresses at .1K each. You don't really see huge savings until the big domains, and you'll see AOL goes over the 100 address limit so gets split into two different messages.

For this 55%, the SMTP=1 is 6050K. For 100, it's 1711K bytes. That's 28% of the first number, so we're cutting 72% of the bandwidth by chunking at 100.

The tradeoff is performance, though -- it takes a lot longer to deliver those AOL addresses, because if you split it into two batches, you can't parallelize the delivery. Package up 100 AOL addresses in one batch, none of them get delivered until all 100 addresses are sent to AOL and accepted. It's much faster to send them as ten batches of ten in parallel -- but that's the trade off here. Cut network bandwidth but slow delivery to the larger domains.

Okay, let's look at a case in the middle. SMTP_MAX = 5. The ones with less than 5 don't change, but the big domains do:

    85 * 11K
    45 * (1 * 11K + 1 * .1K)
    12 * (1 * 11K + 2 * .1K)
    6 * (1 * 11K + 3 * .1K)
    [...]
    1 * (10 * 11K + 38 * .1K)
    1 * (11 * 11K + 44 * .1K)
    1 * (15 * 11K + 58 * .1K)
    1 * (29 * 11K + 113 * .1K)

That works out to (trust me) about 2378K, or about a 60% reduction.

Let's try SMTP_MAX = 2.

    85 * 11K
    45 * (1 * 11K + 1 * .1K)
    12 * (2 * 11K + 1 * .1K)
    6 * (2 * 11K + 2 * .1K)
    [...]
    1 * (10 * 11K + 38 * .1K)
    1 * (11 * 11K + 44 * .1K)
    1 * (15 * 11K + 58 * .1K)
    1 * (29 * 11K + 113 * .1K)

That works out to 2575K, or about a 57% cut.

By a rough look at those domains in the middle, I'd say these numbers are good +-10%.

What's this mean? Here's the executive summary:

The network penalty between SMTP_MAX = 1 (effectively VERP) and any kind of batching (SMTP > 1) is roughly 50%. To get VERP or customized footers or customized anything, you double your network bandwidth.

There is very little advantage to setting SMTP_MAX > 5, UNLESS your subscriber base is heavily stratified onto very few sites. If you have really large groups of subscribers on AOL or Hotmail, it can help cut network bandwidth, but at best, it seems to be about a 10% improvement.

If you plot the numbers I did on a curve, you can see just how little advantage you get by increasing the number. You get almost all of the advantage by going to 2, and the line past 5 is very flat....

Interesting -- I honestly didn't expect to see THIS big a difference -- I was expecting more like 25-30% increase in bandwidth for a VERP-type delivery.

My thoughts on what this means to future directions:

Customized messages (VERPing, or encoded unsub URLs, or all of that...) should definitely be an option in Mailman 2.1. I would set Mailman's 2.1 default to have this turned ON, giving us the customized unsub links and etc, but to document this for users so they know to turn it off on slow networks.
If users turn it off, I recommend that SMTP_MAX be set by default to 5, and that we document that it makes little sense to change it unless a site is horribly network limited, because even setting to the max only gains them another 10% (and if they're THAT network limited, they're seriously asking for trouble anyway), and only if their subscriber base fits a profile that lends itself to the compression. Setting it large also leaves them open to spamblocking by systems that don't necessarily follow the standards or act right, too.

We should ALSO note here that some MTAs (postfix, for instance) might override SMTP_MAX anyway -- you could set it to 100, but postfix might be configured smaller, so they have to be aware of those potential interactions. You then get into the issues of tuning all this, with few delivery threads with lots of addresses vs many threads in parallel.. and all that fun -- I guess I'm trying to say that you can't tune mailman in isolation from the MTA (and down that road lies a huge rathole of attempting to document this stuff...)

But from these numbers, any 2.0.x version of mailman should set SMTP_MAX to between 2 and 5, unless they're horribly network limited. It makes no sense to be larger than 5, and it makes no sense to be 1 unless you've done some kind of VERPing patch.

For 2.1, we want to implement these customizations and default them on, but with a 50% network hit, we definitely want to make it clear what's going on and make it possible for them to turn it off and return to a generic URL and non-customized e-mail.

Barry's mileage may vary on his preferences for default, of course, and it's his show. I think the advantage of the customized URL/email capability is a huge one and most sites will benefit from it -- but the network hit might kill some sites, so we have to give them an easy ability to turn the feature off.

What do y'all think?
I've included mailman-developers on this reply, since while this started on mm-users, it really ought to be discussed on the developers list...
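The per-domain arithmetic in the quoted message can be sketched in Python. This is a rough model under the message's own stated assumptions (10K per message, 1K per connection, 0.1K per extra address) and uses only the 553 subscribers whose domain sizes are actually listed. The names `DOMAINS` and `bandwidth_k` are invented for illustration. It works in tenths of a K so the sums stay exact.

```python
# Costs from the quoted message, in tenths of a K to keep the sums exact.
MESSAGE = 100     # 10K message, including header
CONNECTION = 10   # 1K to set up a connection and send a message
EXTRA_RCPT = 1    # 0.1K for each additional address on a message

# (subscribers per domain, number of such domains) for the 553 subscribers
# that the quoted message lists explicitly; the elided middle is left out.
DOMAINS = [(1, 85), (2, 45), (3, 12), (4, 6), (48, 1), (55, 1), (73, 1), (142, 1)]


def bandwidth_k(max_rcpts, domains=DOMAINS):
    """Estimated bandwidth in K for a given recipients-per-message limit."""
    total = 0
    for subscribers, count in domains:
        # Number of separate messages needed for one domain (ceiling division).
        batches = (subscribers + max_rcpts - 1) // max_rcpts
        # Each batch pays for one full message; every other address costs 0.1K.
        per_domain = batches * (MESSAGE + CONNECTION) + (subscribers - batches) * EXTRA_RCPT
        total += count * per_domain
    return total / 10


baseline = bandwidth_k(1)
for limit in (1, 2, 5, 100):
    used = bandwidth_k(limit)
    saved = 100 * (1 - used / baseline)
    print(f"limit {limit:>3}: {used:7.1f}K ({saved:.0f}% less than a limit of 1)")
```

Under this model the limits of 1, 100 and 5 come out at 6083K, 1723K and 2377K, close to the quoted 6050K, 1711K and "about 2378K". For a limit of 2 it gives roughly 3609K rather than the quoted 2575K: the sketch recomputes the number of batches for the four largest domains, whereas the list in the message appears to reuse the limit-5 terms for them. On these numbers the step from 2 to 5 is therefore larger than the message suggests, while the conclusion that going beyond 5 gains little still holds.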
Converted from the Mailman FAQ Wizard