Out of memory (OOM killer) - what is causing my memory issue?
Hi,
I would be happy to get some help with this, as I can't find the cause. I am trying to work out why my server runs out of memory every few weeks and invokes the OOM killer. Memory usage seems stable for about two weeks, then rises gradually for two weeks. Then there is a big spike that ends in an OOM call. http://i.stack.imgur.com/PNIO5.jpg The memory usage just before the spike: Code:
Wed Jun 3 08:50:01 EDT 2015 Code:
Wed Jun 3 09:10:02 EDT 2015 System calling OOM: Code:
1 Time(s): /usr/sbin/spamd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 After a reboot, memory usage drops and the system runs well for a few weeks. Code:
COMMAND %MEM Thanks a lot! |
The problem looks to be with SpamAssassin. I would first suggest checking that you are running Perl with CPAN: just type the command "cpan" and see if it responds; you can exit by typing "exit". If that checks out, verify that the spamd modules are up to date. Inside cpan, type "install Mail::SpamAssassin::Conf" (without the quotes). It will either say the module is up to date or start the install. If the install fails, run "reports Mail::SpamAssassin::Conf" to hopefully glean more information. Whether it fails or not, run a forced update on spamd. If the cpan install did fail, ask for further support on that specific issue; it is beyond what I would want to detail here.
|
Thanks!
That is what I got: Code:
cpan[3]> install Mail::SpamAssassin::Conf Code:
# spamassassin -V Code:
COMMAND %MEM |
Looks like you're trying to do a fair amount of work with limited resources. While it looks like spamd has a memory leak, or simply needs more memory as time goes on, the problem could go away with just a little more RAM.
What do the OOM messages in dmesg or /var/log/messages look like? |
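For anyone following along, the full OOM report can usually be pulled out of the logs with something like the sketch below. The log file paths vary by distro, and reading dmesg may need root on hardened systems, so treat this as a starting point rather than a definitive recipe:

```shell
#!/bin/sh
# Pull the most recent OOM-killer report out of the kernel ring buffer;
# fall back to the syslog files if dmesg is restricted for this user.
dmesg 2>/dev/null | grep -i -B1 -A25 'invoked oom-killer' \
  || grep -i -A25 'invoked oom-killer' /var/log/messages /var/log/syslog 2>/dev/null \
  || echo "no OOM report found in dmesg or syslog"
```

The 25 lines of context after the match typically cover the memory-zone dump and the per-process table that the kernel prints with every OOM event.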
The problem is not (necessarily) with spamassassin. Please post the full output of the oom log message. The last line looks like
"kill process <No> (<name>) - score <oom score> - or sacrifice child". The message you posted, "spamd invoked oom-killer", means the system was out of memory at the exact moment spamd tried to allocate a new chunk of memory. It does not mean that spamd is the process with the memory leak. You wrote "the process TOR was killed". In my experience, if exactly one process has a big memory leak, the OOM killer is good at identifying and killing the correct process. So maybe tor has the memory leak, but as said, please post the full message. |
^^ This is exactly where I was going.
The OOM message should tell us what order of memory is lacking, and by how much. You can also run 'cat /proc/buddyinfo' during one of these events; details like the following will help us as well:
# cat /proc/buddyinfo
Node 0, zone      DMA      1     1     1     0     2     1     1     0     1     1     3
Node 0, zone    DMA32   5759  6204  1281   334   139    99    26     3     2     4    18
Node 0, zone   Normal  80447 90383 12328  1965   852    65    29    21     9     8    25 |
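To have those buddyinfo snapshots on hand when the next event hits, something like this run from cron every minute would do. The log path and script name below are just assumptions; pick any writable location:

```shell
#!/bin/sh
# Append a timestamped snapshot of the per-zone free-page counts
# (one column per allocation order) so the state around an OOM
# event can be reconstructed afterwards.
LOG=/var/log/buddyinfo.log
{
  date
  cat /proc/buddyinfo
  echo
} >> "$LOG"
```

A crontab entry such as `* * * * * /usr/local/bin/log-buddyinfo.sh` (hypothetical path) would then build a minute-by-minute history of zone fragmentation.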
I think you have far too little swap space configured to allow reasonable diagnosis of the real problem.
If you had more swap space, the OOM killer would not be called; instead, performance would degrade slowly as the problem grows, and the cause would become much more obvious BEFORE the OOM killer starts destroying evidence. |
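As a side note, inspecting the current swap configuration is harmless; actually adding a swap file needs root, so those steps are only sketched in the comments below rather than run directly:

```shell
#!/bin/sh
# Show configured swap devices and how much of each is in use.
swapon --show 2>/dev/null
free -m | grep -i -E 'total|swap'

# To add a temporary 2 GB swap file for diagnosis (as root), roughly:
#   fallocate -l 2G /swapfile
#   chmod 600 /swapfile
#   mkswap /swapfile && swapon /swapfile
# and to remove it again afterwards:
#   swapoff /swapfile && rm /swapfile
```

With extra swap in place, the leaking process would show up as steadily growing swap usage rather than a sudden OOM kill.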
My favourite OOM analogy - here
|
Quote:
|
I agree that spamd triggered the OOM killer call, but I don't think it's the root of the problem. I ran a script every 10 minutes to track the memory usage of the top ten processes, and all of them were more or less stable.
I also don't think that low resources are my problem. I normally had around 80MB swapped on average, and that was very stable, with no peaks. Nor do I think the TOR process is the cause, as its memory usage is also stable in my top-ten statistics. It is normal for the OOM killer to kill TOR in this case: it is at the top, and I would rather lose that process than apache or mysql. Here is the kernel output during the OOM call; maybe someone can see something useful in it: http://pastebin.com/rBvbFcyt I don't want to increase my swap space for troubleshooting. To get more information, I now run a script every minute that appends memory statistics for all running processes (not only the top 10) to a file. I hope that will give more hints when it happens next time. |
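A minimal version of such a per-process logging script might look like the sketch below. The output path and crontab line are assumptions; the ps options are standard procps:

```shell
#!/bin/sh
# Append a timestamped snapshot of every process, sorted by resident
# memory, so per-process growth can be traced back after an incident.
LOG=/var/log/mem-usage.log
{
  date
  ps -eo pid,rss,%mem,comm --sort=-rss
  echo
} >> "$LOG"
```

Scheduled with `* * * * * /usr/local/bin/memlog.sh` (hypothetical path), a later `grep spamd /var/log/mem-usage.log` would show one process's memory over time and make a slow leak easy to spot.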
gombi: when you have another incident, it will be good to see /proc/buddyinfo. Sometimes OOM events have nothing to do with total RAM, but with one of the memory zones instead.
|
Quote:
|
Try adding the following line to /etc/sysctl.conf:
vm.min_free_kbytes = 65536
then run "sysctl -p", or reboot your machine. |
Quote:
I have it already set to: Code:
# sysctl vm.min_free_kbytes |
Just to add my thoughts to the conversation: I somewhat agree with the other posts about doing proper diagnostics first, but at the same time, the Perl modules needing a manual update is a known issue. I therefore tend to rule out known issues first, to see whether that fixes the problem, before committing time to deeper investigation. A habit from work environments that are strict about productivity levels.
|