4.56. What's with all these Mailman/python/qrunner processes? Why are they eating up all the memory on my server? (performance-tuning tips)
From the mailman-users mailing list (see http://mail.python.org/pipermail/mailman-users/2004-November/040809.html):
Mailman 2.1.x will have eight or nine qrunner processes constantly in memory, but they shouldn't be running unless they have actual work to do.
Depending on your OS, the system may try to keep as much in memory as it can, so as to make maximum use of what is available. Anything that is not currently running is liable to be paged out in favour of other processes, filesystem/disk caching, etc.
On many systems I'm familiar with, it is not at all uncommon to see what appears to be just a few KB "free", but on closer inspection you discover that most of the memory that is "used" is actually "cache" or "inactive", and therefore available for immediate page-out and re-use by other processes.
Here's the ps output for the Mailman processes on a FreeBSD 5.2.1 system I help administer:
USER      PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED       TIME COMMAND
mailman 53524  0.0  0.0  7928   12  ??  Ss   Wed02PM    0:00.46 mailmanctl
mailman 54142  0.0  1.5  8544 3828  ??  S    Wed02PM    0:57.26 VirginRunner
mailman 54143  0.0  0.7  7892 1844  ??  S    Wed02PM    0:55.32 CommandRunner
mailman 54144  0.0  1.7  8592 4252  ??  S    Wed02PM    1:03.93 IncomingRunner
mailman 54145  0.0  0.4  7892 1064  ??  S    Wed02PM    0:00.69 RetryRunner
mailman 54146  0.0  0.7  8328 1888  ??  S    Wed02PM    0:57.06 NewsRunner
mailman 54147  0.0  0.8  8512 2036  ??  S    Wed02PM    0:59.44 BounceRunner
mailman 54148  0.0  1.1 10180 2784  ??  S    Wed02PM    1:18.16 ArchRunner
mailman 54149  0.0  1.7  8940 4332  ??  S    Wed02PM    1:47.38 OutgoingRunner
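As a rough check on how much memory the qrunners actually hold, the sketch below sums the RSS (resident set size, in KB) column for the mailman user from ps output like the FreeBSD listing above. The helper name rss_total_kb is hypothetical, and it assumes the header row contains USER and RSS columns, as it does here. Treat the result as an upper bound: pages shared between processes, such as the Python interpreter and shared libraries, are counted once for every process that maps them.

```python
# Sample: the FreeBSD ps listing from this FAQ.
FREEBSD_PS = """\
USER      PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED       TIME COMMAND
mailman 53524  0.0  0.0  7928   12  ??  Ss   Wed02PM    0:00.46 mailmanctl
mailman 54142  0.0  1.5  8544 3828  ??  S    Wed02PM    0:57.26 VirginRunner
mailman 54143  0.0  0.7  7892 1844  ??  S    Wed02PM    0:55.32 CommandRunner
mailman 54144  0.0  1.7  8592 4252  ??  S    Wed02PM    1:03.93 IncomingRunner
mailman 54145  0.0  0.4  7892 1064  ??  S    Wed02PM    0:00.69 RetryRunner
mailman 54146  0.0  0.7  8328 1888  ??  S    Wed02PM    0:57.06 NewsRunner
mailman 54147  0.0  0.8  8512 2036  ??  S    Wed02PM    0:59.44 BounceRunner
mailman 54148  0.0  1.1 10180 2784  ??  S    Wed02PM    1:18.16 ArchRunner
mailman 54149  0.0  1.7  8940 4332  ??  S    Wed02PM    1:47.38 OutgoingRunner
"""


def rss_total_kb(ps_output, user="mailman"):
    """Hypothetical helper: sum the RSS column (KB) of every process owned by `user`.

    Shared pages are counted once per process, so this over-estimates the
    real combined footprint.
    """
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    user_col, rss_col = header.index("USER"), header.index("RSS")
    total = 0
    for line in lines[1:]:
        fields = line.split()
        if fields[user_col] == user:
            total += int(fields[rss_col])
    return total


print(rss_total_kb(FREEBSD_PS), "KB resident (upper bound)")
```

For the FreeBSD listing this gives 22040 KB, a little over 21 MB for mailmanctl and all eight qrunners together.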
On this machine, the first few lines of "top" show:
last pid: 75984; load averages: 0.00, 0.00, 0.00 up 10+08:30:45 02:13:03
79 processes: 3 running, 76 sleeping
CPU states: 14.3% user, 0.0% nice, 23.8% system, 0.0% interrupt, 61.9% idle
Mem: 132M Active, 20M Inact, 60M Wired, 6280K Cache, 34M Buf, 25M Free
Swap: 513M Total, 73M Used, 440M Free, 14% Inuse
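To make the "used but reusable" point concrete, the sketch below parses the FreeBSD top "Mem:" line and adds up the categories described earlier as available for re-use (Inact, Cache and Free). Treating exactly those three as reclaimable is a simplifying assumption (inactive pages may first have to be written out), and the helper names are hypothetical.

```python
import re

# Unit suffixes used by FreeBSD top, converted to kilobytes.
_UNITS = {"K": 1, "M": 1024, "G": 1024 * 1024}


def parse_top_mem(line):
    """Parse a FreeBSD top 'Mem:' line into {category: kilobytes}."""
    body = line.split(":", 1)[1]
    return {name: int(value) * _UNITS[unit]
            for value, unit, name in re.findall(r"(\d+)([KMG])\s+(\w+)", body)}


def reclaimable_kb(mem):
    """Assumption: Inact + Cache + Free can be handed out without evicting active pages."""
    return sum(mem.get(k, 0) for k in ("Inact", "Cache", "Free"))


mem = parse_top_mem("Mem: 132M Active, 20M Inact, 60M Wired, 6280K Cache, 34M Buf, 25M Free")
print(mem)
print(reclaimable_kb(mem), "K reclaimable")
```

On this machine that is 52360K, about 51MB, even though top lists only 25M as Free.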
And here are the Mailman processes on a Debian Linux (kernel 2.4.26) machine I help administer:
USER      PID %CPU %MEM   VSZ   RSS TTY STAT START    TIME COMMAND
mailman  5130  0.0  0.1  5828  2088 ?   S    Jul06    0:00 mailmanctl
mailman  5131  2.0  1.6 54028 34896 ?   S    Jul06 3807:00 ArchRunner
mailman  5132  0.3  0.7 25740 15252 ?   S    Jul06  606:43 BounceRunner
mailman  5133  0.0  0.7 19328 15608 ?   S    Jul06   73:38 CommandRunner
mailman  5134  0.1  0.7 18696 16040 ?   S    Jul06  305:05 IncomingRunner
mailman  5135  0.0  0.3  9212  6840 ?   S    Jul06   43:38 NewsRunner
mailman  5136  2.4  1.1 25316 22816 ?   S    Jul06 4528:36 OutgoingRunner
mailman  5137  0.1  0.7 16828 14500 ?   S    Jul06  307:12 VirginRunner
mailman  5138  0.0  0.0  9624  1848 ?   S    Jul06    0:03 RetryRunner
mailman 19970  0.0  0.0 10184  1592 ?   S    Aug21    0:00 gate_news
Top shows:
11:16:51 up 129 days, 14:07, 1 user, load average: 0.08, 0.18, 0.22
145 processes: 142 sleeping, 2 running, 1 zombie, 0 stopped
CPU states: 71.4% user, 33.0% system, 0.9% nice, 2333.9% idle
Mem: 2069316K total, 2020500K used, 48816K free, 53956K buffers
Swap: 1951888K total, 191316K used, 1760572K free, 935268K cached
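The same idea applies to the Linux top output: buffers and page cache count as "used", but the kernel can reclaim most of them on demand. This sketch subtracts them out, much as the "-/+ buffers/cache" line of the Linux free command does. The helper name is hypothetical, and it assumes the 2.4-era top layout shown above, in which the "cached" figure is printed on the Swap line.

```python
import re


def kb_fields(line):
    """Parse '2069316K total, 2020500K used, ...' into {'total': 2069316, 'used': ...}."""
    return {name: int(value) for value, name in re.findall(r"(\d+)K\s+(\w+)", line)}


mem = kb_fields("Mem: 2069316K total, 2020500K used, 48816K free, 53956K buffers")
swap = kb_fields("Swap: 1951888K total, 191316K used, 1760572K free, 935268K cached")

# In this version of top the page-cache size ("cached") appears on the Swap line.
cached = swap["cached"]
in_use = mem["used"] - mem["buffers"] - cached    # held by processes and the kernel
reusable = mem["free"] + mem["buffers"] + cached  # free, or cache that is mostly reclaimable
print(f"in use: {in_use}K, free or reclaimable: {reusable}K")
```

Here roughly 1GB (1038040K) is either free or buffer/cache memory, even though top reports only 48816K free.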
Both of these machines are effectively idle at the moment (2333% !?!), and yet at first glance neither of them appears to have much free memory. If you really want to find out whether you are short of memory that is actively in use, with the system thrashing as it keeps trying to free memory from processes that are fighting over the same resources, you need other tools to investigate. Two good ones are "iostat" and "vmstat".
Looking at that FreeBSD machine again, vmstat shows:
% vmstat 1 20
procs      memory       page                    disks      faults       cpu
r b w     avm    fre  flt  re  pi  po  fr  sr da0 da1   in   sy   cs us sy id
1 0 0  500148  49132   58   1   0   0  36  95   0   0  363    0  317  2  1 97
0 0 0  500148  49132    5   0   0   0   5   0   0   0  365    0  304  0  4 96
0 0 0  500148  49132    0   0   0   0   1   0   0   0  357    0  291  0  2 98
0 0 0  500148  49132    0   0   0   0   0   0   0   0  358    0  284  0  2 98
0 0 0  500148  49132    0   0   0   0   0   0   0   0  357    0  284  1  2 98
0 0 0  500148  49132    0   0   0   0   0   0   0   0  361    0  298  1  2 97
0 0 0  500148  49132    0   0   0   0   0   0   0   0  369    0  312  0  2 98
0 0 0  500148  49132    0   0   0   0   0   0   3   0  369    0  341  0  2 98
0 0 0  500148  49132    0   0   0   0   0   0   0   0  364    0  296  0  2 98
0 0 0  500148  49132    0   0   0   0   0   0   0   0  357    0  287  0  2 98
0 0 0  500148  49132    0   0   0   0   0   0   0   0  361    0  301  2  2 97
0 0 0  500148  49132    0   0   0   0   4   0   9   0  386    0  344  1  2 98
0 0 0  500148  49132    0   0   0   0   0   0   0   0  367    0  301  1  4 95
0 0 0  500148  49132    0   0   0   0   0   0   0   0  360    0  288  1  3 96
0 0 0  500148  49132    0   0   0   0   0   0   0   0  357    0  286  0  2 98
0 0 0  500148  49132    0   0   0   0   0   0   0   0  361    0  301  0  2 98
0 0 0  500148  49132    0   0   0   0   0   0   2   0  368    0  312  1  2 98
2 0 0  500148  49132    0   0   0   0   1   0   0   0  358    0  290  2  2 96
2 0 0  500148  49132    0   0   0   0   0   0   0   0  357    0  285  1  3 96
2 0 0  500148  49132    0   0   0   0   0   0   0   0  358    0  289  0  3 97
Looking at the Linux box, vmstat shows:
% vmstat 1 20
procs                 memory               swap       io          system        cpu
r b w   swpd   free   buff   cache  si  so   bi    bo   in   cs us sy  id
0 0 0 191316  51020  54252  936864   0   0   14    15    9   19  2 10  18
0 0 0 191316  50940  54280  936880   0   0    4   256  251  329  0  0  99
0 0 0 191316  50828  54284  936900   0   0    8     0  263  422  0  0  99
0 0 0 191316  50780  54288  936916   0   0    8     0  241  414  0  1  99
0 0 0 191316  50016  54292  936932   0   0    4     0  216  345  0  1  99
0 0 0 191316  49244  54292  936940   0   0    0     0  195  248  0  0 100
0 0 0 191316  50588  54316  936956   0   0    0  1160  300  644  3  2  95
0 0 0 191316  50516  54328  936968   0   0    8    64  222  252  1  0  99
0 0 0 191316  50360  54344  936996   0   0   24     0  236  324  0  1  99
0 0 0 191316  49708  54352  936980   0   0   16     0  246  433  1  0  98
0 0 0 191316  50364  54360  936992   0   0   12     0  315  466  0  1  99
0 0 0 191316  48820  54376  937004   0   0    0   272  220  314  0  1  99
0 0 0 191316  49112  54380  936936   0   0    4     0  225  343  1  0  99
0 0 0 191316  50836  54388  936920   0   0    4     0  206  304  3  1  96
0 0 0 191316  50772  54388  936932   0   0    0     0  174  171  0  0 100
0 1 0 191316  50728  54392  936944   0   0   12     0  237  435  0  0 100
0 0 0 191316  50696  54412  936956   0   0    0   220  193  221  0  1  99
0 0 0 191316  50640  54416  936968   0   0    8     0  187  120  0  0 100
0 0 0 191316  49928  54416  936984   0   0    0     0  214  362  1  0  99
0 0 0 191316  50576  54416  936992   0   0    0     0  215  344  0  0  99
For the FreeBSD box, look at the columns for "pi" (pages paged in) and "po" (pages paged out). This machine isn't doing any paging at all, which means that there is no memory pressure. It may appear to be short on memory, but that's only because the system is keeping everything in memory that it can, and it hasn't needed to page out anything it currently has loaded. You can also look at the columns for "fr" (pages freed per second) and "sr" (pages scanned by the clock algorithm per second). Both show that this system is doing very little of either, which confirms the conclusion drawn from the pi/po columns. On the Linux box the corresponding columns are "si" (memory swapped in from disk) and "so" (memory swapped out to disk), and they stay at zero throughout.
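That check can be automated. This sketch parses vmstat text and totals chosen columns; the function names are hypothetical, and it assumes the column-header line is the one whose first field is "r", which holds for both the FreeBSD and the Linux output in this FAQ. It skips the first sample, because vmstat's first line reports averages since boot rather than current activity.

```python
# First three samples of the FreeBSD "vmstat 1 20" output from this FAQ.
FREEBSD_VMSTAT = """\
procs      memory       page                    disks      faults       cpu
r b w     avm    fre  flt  re  pi  po  fr  sr da0 da1   in   sy   cs us sy id
1 0 0  500148  49132   58   1   0   0  36  95   0   0  363    0  317  2  1 97
0 0 0  500148  49132    5   0   0   0   5   0   0   0  365    0  304  0  4 96
0 0 0  500148  49132    0   0   0   0   1   0   0   0  357    0  291  0  2 98
"""


def parse_vmstat(text):
    """Return (column names, integer sample rows) from vmstat output.

    Lines before the "r ..." header (the command, group titles) are ignored,
    as are later lines whose field count does not match the header.
    """
    lines = [line.split() for line in text.strip().splitlines()]
    start = next(i for i, fields in enumerate(lines) if fields and fields[0] == "r")
    header = lines[start]
    rows = [[int(v) for v in fields]
            for fields in lines[start + 1:] if len(fields) == len(header)]
    return header, rows


def column_totals(header, rows, columns):
    """Sum the named columns over all samples except the first (since-boot) one."""
    return {c: sum(row[header.index(c)] for row in rows[1:]) for c in columns}


header, rows = parse_vmstat(FREEBSD_VMSTAT)
print(column_totals(header, rows, ["pi", "po", "fr", "sr"]))
```

The same functions work on the Linux output; pass ["si", "so"] to total the swap-in and swap-out columns instead.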
Comparing the same Linux box at a later time, first with plain vmstat and then with some Linux-specific command-line options, we see the following:
% vmstat 1 20
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b   swpd   free   buff  cache  si  so   bi   bo   in   cs us sy id wa
0 2 975916  32988 140552 920632   0   0    1    1    0    1 22 12 66  0
0 0 975916  32792 140560 920752   0   0  144  556  647 1015  4  1 95  0
0 0 975916  31928 140560 920820   0   0   64    0 1150  724  3  2 94  0
1 0 975916  31764 140568 920832   0   0   16    0  694  523  2  1 97  0
0 0 975916  32508 140572 920852   0   0    8  672 1243 1068  5  3 92  0
0 0 975916  31192 140584 920844   0   0    0  400  550  834  7  3 90  0
0 0 975916  31064 140588 920852   0   0    8    0  403  593  3  1 96  0
0 0 975916  30892 140588 920856   0   0    0    0  447  594  3  1 96  0
0 0 975916  30868 140588 920860   0   0    0    0  463  779  3  1 95  0
0 0 975916  30656 140592 920956   0   0   92    0  417  503  3  1 96  0
0 0 975916  30192 140616 920980   0   0   24  416  370  518  4  1 95  0
0 0 975916  30176 140616 920992   0   0    0  256  364  570  3  1 96  0
1 0 975916  30128 140620 920992   0   0    4    0  292  375  1  1 97  0
0 0 975916  30120 140620 920996   0   0    0    0  350  665  1  1 98  0
0 0 975916  30072 140620 921000   0   0    4    0  282  439  2  2 96  0
0 0 975916  30004 140636 921020   0   0   16  780  237  494  4  2 94  0
0 0 975916  29892 140636 921024   0   0    4    0  235  325  3  0 97  0
0 0 975916  30012 140640 921040   0   0   20    0  322  497  3  2 95  0
0 0 975916  29984 140648 921056   0   0   20    0  360  666  3  1 96  0
0 0 975916  30036 140656 921128   0   0   80  116  410  791  3  1 95  0
And here's what vmstat looks like when given the "-a" argument, which replaces the "buff" and "cache" columns with "inact" (inactive) and "active" memory:
% vmstat -a 1 20
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b   swpd   free   inact active  si  so   bi   bo   in   cs us sy id wa
0 0 975916  30164 1082752 229380   0   0    1    1    0    1 22 12 66  0
0 0 975916  30068 1082836 229384   0   0   80    0  333  533  3  1 96  0
0 0 975916  30060 1082856 229388   0   0   12  220  293  507  3  1 97  0
0 0 975916  30080 1082860 229392   0   0    4    0  220  312  3  1 96  0
0 0 975916  30220 1082700 229392   0   0    0    0  198  294  3  1 96  0
1 0 975916  29376 1083532 229396   0   0   44    0  338  654  1  0 99  0
0 0 975916  30336 1082604 229396   0   0    0    0  285  507  4  2 94  0
0 0 975916  30300 1082640 229404   4   0   12  572  275  445  3  2 95  0
0 0 975916  30432 1082496 229408   0   0   12    0  272  440  3  1 96  0
0 0 975916  30396 1082044 229896   0   0    4    0  612  356  3  1 96  0
0 0 975916  31292 1080800 230252   0   0  140    0  781  682  3  2 95  0
0 4 975916  31208 1080892 230260   0   0   76  444  356  572  3  0 97  0
0 0 975916  31352 1080748 230264   0   0   16   40  227  305  3  0 97  0
0 0 975916  31324 1080764 230264   0   0    0    0  337  721  4  2 95  0
0 0 975916  31520 1080584 230264   0   0    0    0  266  442  4  0 96  0
1 0 975916  31520 1080592 230264   0   0    8    0  308  653  2  0 98  0
0 0 975916  31660 1080452 230268   0   0    8  544  269  371  2  1 98  0
0 0 975916  31660 1080452 230284   0   0    4    0  242  366  3  1 97  0
0 0 975916  31840 1080276 230284   0   0    0    0  187  224  3  0 96  0
0 0 975916  31820 1080260 230316   0   0   16    0  289  429  3  1 95  0
In particular, the "inact" and "active" columns show that this machine has no memory pressure: the bulk of the memory in use is inactive rather than active (roughly 1GB versus 225MB). Measured against the machine's 2GB of RAM (the "total memory" figure reported by top or vmstat -s), about half of all memory is inactive and therefore available for re-use.
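The proportions can be worked out from a single "vmstat -a" sample and the machine's total RAM (2069316K, the "total memory" figure reported by top and vmstat -s). The helper name is hypothetical, and labelling the remainder as kernel memory is an assumption: it is simply whatever is not on the free, inactive or active lists.

```python
# One sample from the "vmstat -a" output in this FAQ (memory columns in KB).
HEADER = "r b swpd free inact active si so bi bo in cs us sy id wa".split()
SAMPLE = "0 0 975916 30068 1082836 229384 0 0 80 0 333 533 3 1 96 0"
TOTAL_KB = 2069316  # "total memory" on this box


def memory_shares(header, row_text, total_kb):
    """Return free/inact/active as fractions of total RAM for one vmstat -a sample."""
    sample = dict(zip(header, map(int, row_text.split())))
    shares = {k: sample[k] / total_kb for k in ("free", "inact", "active")}
    # Assumption: the remainder is memory on none of these lists
    # (presumably kernel allocations such as slab caches).
    shares["other"] = 1 - sum(shares.values())
    return shares


for name, share in memory_shares(HEADER, SAMPLE, TOTAL_KB).items():
    print(f"{name:>6}: {share:5.1%}")
```

For this sample that works out to roughly 52% inactive, 11% active, under 2% free, and about 35% on none of the three lists.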
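Some additional Linux-specific options to vmstat show very detailed information: "-m" reports kernel slab cache usage, and "-s" prints a summary of memory statistics and event counters: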
% vmstat -m
Cache                  Num   Total   Size  Pages
kmem_cache              80      80    244      5
ip_conntrack          1963    6513    288    382
tcp_tw_bucket          710    1020    128     34
tcp_bind_bucket        388     678     32      6
tcp_open_request       720     720     96     18
inet_peer_cache         59      59     64      1
ip_fib_hash              9     226     32      2
ip_dst_cache          1344    2352    160     93
arp_cache                2      30    128      1
blkdev_requests       4096    4160     96    104
journal_head           730    2028     48     20
revoke_table             3     253     12      1
revoke_record          226     226     32      2
dnotify_cache            0       0     20      0
file_lock_cache        455     520     96     13
fasync_cache             0       0     16      0
uid_cache               18     452     32      4
skbuff_head_cache      756     888    160     37
sock                   682     864    960    215
sigqueue               522     522    132     18
kiobuf                   0       0     64      0
Cache                  Num   Total   Size  Pages
cdev_cache             973    1062     64     18
bdev_cache               4     177     64      3
mnt_cache               14     177     64      3
inode_cache         833119  833119    512 119017
dentry_cache       1289340 1289340    128  42978
filp                 12297   12360    128    412
names_cache             64      64   4096     64
buffer_head         267637  325280     96   8132
mm_struct              666     720    160     30
vm_area_struct        7463   11720     96    292
fs_cache               661     767     64     13
files_cache            344     441    416     49
signal_act             306     306   1312    102
size-131072(DMA)         0       0 131072      0
size-131072              0       0 131072      0
size-65536(DMA)          0       0  65536      0
size-65536               0       0  65536      0
size-32768(DMA)          0       0  32768      0
size-32768               1       2  32768      1
size-16384(DMA)          0       0  16384      0
size-16384               0       1  16384      0
Cache                  Num   Total   Size  Pages
size-8192(DMA)           0       0   8192      0
size-8192                2       6   8192      2
size-4096(DMA)           0       0   4096      0
size-4096              179     179   4096    179
size-2048(DMA)           0       0   2048      0
size-2048              218     338   2048    130
size-1024(DMA)           0       0   1024      0
size-1024              454     516   1024    129
size-512(DMA)            0       0    512      0
size-512               560     560    512     70
size-256(DMA)            0       0    256      0
size-256               540     540    256     36
size-128(DMA)            0       0    128      0
size-128               961    1230    128     41
size-64(DMA)             0       0     64      0
size-64             150332  150332     64   2548
size-32(DMA)             0       0     32      0
size-32             170140  179218     32   1586
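To see which kernel caches dominate, the vmstat -m lines can be ranked by Total x Size. As the procps vmstat man page describes them, Num is the number of active objects, Total the number of allocated objects and Size the object size in bytes, so Total x Size is a lower bound on the memory each cache occupies (slab overhead is ignored). The sample is a subset of the output above, and the function name is hypothetical.

```python
# A few lines from the "vmstat -m" output in this FAQ.
VMSTAT_M = """\
Cache                  Num   Total   Size  Pages
inode_cache         833119  833119    512 119017
dentry_cache       1289340 1289340    128  42978
buffer_head         267637  325280     96   8132
size-32             170140  179218     32   1586
sock                   682     864    960    215
"""


def largest_caches(text, top=3):
    """Rank slab caches by Total objects x object Size (bytes), largest first."""
    usage = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the (repeated) "Cache Num Total Size Pages" headers and other noise.
        if len(fields) != 5 or not fields[2].isdigit():
            continue
        name, total, size = fields[0], int(fields[2]), int(fields[3])
        usage.append((total * size, name))
    usage.sort(reverse=True)
    return usage[:top]


for nbytes, name in largest_caches(VMSTAT_M):
    print(f"{name:<14} {nbytes / 2**20:8.1f} MiB")
```

For this output, inode_cache and dentry_cache alone come to more than 550 MiB (about 407 and 157 MiB respectively), which may account for much of the memory that does not appear in the free, inact and active columns.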
% vmstat -s
     2069316 total memory
     2038880 used memory
      232384 active memory
     1080640 inactive memory
       30436 free memory
      142724 buffer memory
      937524 swap cache
     1951888 total swap
      975916 used swap
      975972 free swap
   826138426 non-nice user cpu ticks
    28477042 nice user cpu ticks
   466997502 system cpu ticks
  2583888858 idle cpu ticks
           0 IO-wait cpu ticks
           0 IRQ cpu ticks
           0 softirq cpu ticks
  1453923144 pages paged in
  1620774295 pages paged out
      317133 pages swapped in
      445086 pages swapped out
   131794970 interrupts
   245776829 CPU context switches
  1130916810 boot time
   115549581 forks
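The "vmstat -s" memory figures (in KB) support the same arithmetic as top's Mem line: subtracting buffer memory and the page cache from used memory leaves what processes and the kernel actually hold. The sketch assumes that the line labelled "swap cache" is the page cache, which is suggested by how closely it tracks top's "cached" figure on this box; the parser name is hypothetical.

```python
# The memory lines from the "vmstat -s" output in this FAQ.
VMSTAT_S = """\
     2069316 total memory
     2038880 used memory
      232384 active memory
     1080640 inactive memory
       30436 free memory
      142724 buffer memory
      937524 swap cache
"""


def parse_vmstat_s(text):
    """Map each 'vmstat -s' label to its integer value."""
    stats = {}
    for line in text.strip().splitlines():
        value, _, label = line.strip().partition(" ")
        if value.isdigit():
            stats[label.strip()] = int(value)
    return stats


stats = parse_vmstat_s(VMSTAT_S)
# Assumption: "swap cache" here is the page cache (it tracks top's "cached" figure).
process_kb = stats["used memory"] - stats["buffer memory"] - stats["swap cache"]
print(process_kb, "KB used by processes and the kernel, excluding buffers and cache")
```

Here that comes to 958632K, a bit under 1GB of the 2GB installed.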
To go any further into this topic, you really have to know more about your OS and how to do proper performance monitoring, analysis, and tuning for it. Of course, that is really beyond the scope of this mailing list.
Converted from the Mailman FAQ Wizard