LinuxQuestions.org
10-11-2023, 09:01 AM   #1
LinuxRSA
Member
 
Registered: Apr 2015
Location: South Africa
Posts: 71

Rep: Reputation: Disabled
Unable to Identify Process Causing Server Failure


Hi All

We run various versions of Ubuntu and often experience Linux servers freezing and going into a hung state. After a hard reset the server comes back online. The issue we are having is identifying which processes or applications consumed the system resources. Our monitoring tool is limited when it comes to identifying process IDs; it only provides overall CPU, memory and load usage via graphs.

I need to develop a bash script that records the process IDs breaching a certain threshold and writes the output to /home/monuser/perfstats.txt, so that we can identify the PIDs after a server crash.

CPU is 95% +
MEM is 95% +
Load is 25 +


The below command is easy and provides the output we need.

Code:
monuser@monitor01:~$ ps -ax -o pid,pcpu,pmem,user --sort -pcpu --no-headers | head
 2033 31.1  3.7 process-name
 4147 21.1  1.0 root
11512 16.0  3.5 process-name
 2481  9.2  0.5 root
 1718  6.7  0.5 root
13285  3.1 29.7 process-name
 1910  0.9  0.2 root
10869  0.8  0.2 root
 3452  0.7  0.1 root
 3185  0.6  0.2 process-name
monuser@monitor01:~$
We know the 2nd column is CPU and the 3rd is MEM.

The script below runs continuously and gathers the information throughout the day; however, we don't need the entries that fall below the thresholds above (95% for CPU and memory, 25 for load).

Code:
#!/bin/bash
# This script monitors CPU, memory and load.

while :
do
  # Capture the top 10 processes by CPU (PID, %CPU, %MEM, user) and the load averages.
  cpuandmemUsage=$(ps -ax -o pid,pcpu,pmem,user --sort -pcpu --no-headers | head)
  loadUsage=$(cat /proc/loadavg)

  # Append the usage to the log (%CPU and %MEM are percentages; load has no unit)
  echo "CPU and Mem Usage: $cpuandmemUsage" >> /home/monuser/perfstats.txt
  echo "SysLoad Usage: $loadUsage" >> /home/monuser/perfstats.txt

  # Sleep for 1 second
  sleep 1
done
1. Is there a mechanism or script we can configure so that it only starts recording to the file when the value in either of those 2 columns exceeds 95%?

Is this possible?

TIA

Last edited by LinuxRSA; 10-11-2023 at 09:04 AM.
 
10-12-2023, 03:08 AM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153

Rep: Reputation: 4125
Quote:
Originally Posted by LinuxRSA
Is there a mechanism or script we can configure so that it only starts recording to the file when the value in either of those 2 columns exceeds 95%?
Why do you care? The disk space used will be trivial.

Bash has arithmetic operators, but they are integer only IIRC. Too hard. Just record the lot and parse it later with something like awk. If you stick the date in the filename it will even create a new file for you each midnight. Any daylight saving issues are your own.

There are other (better) options, but if this is your chosen path, make it as easy as possible.
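
A minimal sketch of the record-everything-and-parse-later approach suggested above, not a tested production script. The `LOG_DIR` variable, the `perfstats-YYYY-MM-DD.txt` file naming, the `PROC`/`LOAD` line prefixes and both function names are assumptions made for illustration; only the ps command and /proc/loadavg come from this thread.

```shell
#!/bin/bash
# Hypothetical log directory; override by exporting LOG_DIR.
LOG_DIR="${LOG_DIR:-/home/monuser}"

# Append one timestamped snapshot. Putting the date in the file name
# means a new file is started automatically after midnight.
log_snapshot() {
  local stamp load
  stamp=$(date '+%F %T')
  read -r load < /proc/loadavg
  {
    echo "LOAD $stamp $load"
    ps -ax -o pid,pcpu,pmem,user --sort -pcpu --no-headers | head |
      sed "s/^/PROC $stamp /"
  } >> "$LOG_DIR/perfstats-$(date +%F).txt"
}

# Later, from a saved log on stdin: keep only PROC lines whose %CPU or
# %MEM is at or above the limit given as $1 (default 95).
# Fields: PROC date time pid pcpu pmem user
filter_high() {
  awk -v limit="${1:-95}" '$1 == "PROC" && ($5 >= limit || $6 >= limit)'
}
```

Calling `log_snapshot` from the existing `while :` loop keeps the full history, and something like `filter_high 95 < /home/monuser/perfstats-2023-10-11.txt` pulls out the interesting lines afterwards. Because awk compares the fields as numbers, no integer truncation is needed.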
 
10-12-2023, 10:04 AM   #3
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,832

Rep: Reputation: 1219
Code:
while :
do
  # Check the current PID and usage of CPU and memory and Load.
  cpuandmemUsage=$(ps -ax -o pid,pcpu,pmem,user --sort -pcpu --no-headers | head)
  # Read the top line (the busiest process by CPU) and split it
  read -r pid pcpu pmem user <<< "$cpuandmemUsage"
  # Get the integer part of %CPU and compare
  if [ "${pcpu%.*}" -ge 95 ]
  then
    # Print the usage (%CPU and %MEM are percentages; load has no unit)
    read -r loadUsage < /proc/loadavg
    echo "CPU and Mem Usage: $cpuandmemUsage"
    echo "SysLoad Usage: $loadUsage"
  fi >> /home/monuser/perfstats.txt

  # Sleep for 1 second
  sleep 1
done
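
The loop above only tests the %CPU of the busiest process. A possible extension that also covers the memory and load thresholds from the first post could look like the sketch below. The function names and the `CPU_LIMIT`, `MEM_LIMIT`, `LOAD_LIMIT` and `LOG` variables are made up for illustration, and the limits are still applied per process, as in the script above; system-wide memory would need another source such as /proc/meminfo.

```shell
#!/bin/bash
# Thresholds from the first post; LOG is the path used in this thread.
CPU_LIMIT=95 MEM_LIMIT=95 LOAD_LIMIT=25
LOG=/home/monuser/perfstats.txt

# Succeeds if the integer part of $1 is at or above $2.
at_or_above() {
  local int=${1%.*}
  [ "${int:-0}" -ge "$2" ]
}

# Succeeds if any "pid pcpu pmem user" line on stdin breaches a limit.
any_proc_breach() {
  local pid pcpu pmem user
  while read -r pid pcpu pmem user; do
    if at_or_above "$pcpu" "$CPU_LIMIT" || at_or_above "$pmem" "$MEM_LIMIT"; then
      return 0
    fi
  done
  return 1
}

# One pass: log a timestamp, the load averages and the top ten
# processes by CPU, but only when some threshold is breached.
check_once() {
  local procs load1 rest
  procs=$(ps -ax -o pid,pcpu,pmem,user --sort -pcpu --no-headers)
  read -r load1 rest < /proc/loadavg
  if any_proc_breach <<< "$procs" || at_or_above "$load1" "$LOAD_LIMIT"; then
    {
      date '+%F %T'
      echo "Load: $load1 $rest"
      head <<< "$procs"
    } >> "$LOG"
  fi
}
```

Calling `check_once` followed by `sleep 1` inside the `while :` loop keeps the one-second cadence. Note that the log still shows the top ten by CPU, so a memory-heavy but idle process can trigger an entry without appearing in it; adding a second listing sorted with `--sort -pmem` would cover that case.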
 
10-12-2023, 10:39 AM   #4
LinuxRSA
Member
 
Registered: Apr 2015
Location: South Africa
Posts: 71

Original Poster
Rep: Reputation: Disabled
Thanks, will try this.

I have a secondary plan to use my monitoring tool to read the output of a command.

The problem is I can't get the output to display the way I need it.

This is the output:

Quote:
monuser@monitor01:~$ ps -ax -o pid,pcpu,pmem,user --sort -pcpu --no-headers | head
2033 31.1 3.7 process-name
4147 21.1 1.0 root
11512 16.0 3.5 process-name
2481 9.2 0.5 root
1718 6.7 0.5 root
13285 3.1 29.7 process-name
1910 0.9 0.2 root
10869 0.8 0.2 root
3452 0.7 0.1 root
3185 0.6 0.2 process-name
monuser@monitor01:~$
How can I get ps -ax -o pid,pcpu,pmem,user --sort -pcpu --no-headers | head to display only the PID and USER, in one column?

Can we combine the PID and user in the same column?

Something like this:

Code:
2033
process-name
4147 
root
11512
process-name
Thanks
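
One way to get that layout is to let awk print the first and fourth fields on separate lines. This is only a sketch: the function name `pid_and_user` is invented here, and it assumes the four-column `pid pcpu pmem user` output of the command above.

```shell
#!/bin/bash
# Print the PID and the user of each "pid pcpu pmem user" line
# on two separate lines.
pid_and_user() {
  awk '{ print $1; print $4 }'
}

# Top ten processes by CPU, reduced to PID and user
ps -ax -o pid,pcpu,pmem,user --sort -pcpu --no-headers | head | pid_and_user
```

Keeping the original ps command untouched and reshaping its output afterwards means the sort order by CPU is preserved.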
 
10-12-2023, 06:36 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153

Rep: Reputation: 4125
I wonder how long it will be before you regret throwing all the historical data away, so you now have no way of doing what-if queries around the time of any event.
 
10-12-2023, 08:30 PM   #6
uteck
Senior Member
 
Registered: Oct 2003
Location: Elgin,IL,USA
Distribution: Ubuntu based stuff for the most part
Posts: 1,177

Rep: Reputation: 501
Have you looked at the sar command? It does a lot of this and more.
https://www.baeldung.com/linux/sar-system-statistics
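
As a rough illustration of what sar can do after a crash, the sketch below prints the CPU (`-u`), memory (`-r`) and load/run-queue (`-q`) history for a time window from a saved daily data file. It assumes the sysstat package is installed with data collection enabled, and that the daily files live under /var/log/sysstat as on Debian/Ubuntu packages; the function name `sar_window` is made up. sar reports system-wide figures, so it complements rather than replaces the per-process ps logging.

```shell
#!/bin/bash
# Show CPU, memory and load history from a sysstat daily data file.
# $1: day of month, two digits; $2 and $3: optional start and end times.
sar_window() {
  local file="/var/log/sysstat/sa$1" start=${2:-00:00:00} end=${3:-23:59:59}
  local report
  if ! command -v sar >/dev/null || [ ! -r "$file" ]; then
    echo "sar or $file not available" >&2
    return 1
  fi
  for report in -u -r -q; do
    sar "$report" -f "$file" -s "$start" -e "$end"
  done
}
```

For example, `sar_window 11 08:30:00 09:05:00` would show the half hour leading up to a freeze at around 09:00 on the 11th, provided that day's file still exists.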
 
All times are GMT -5. The time now is 01:25 PM.