LinuxQuestions.org - Email notification if remote CPU is down?

All:

I am looking for a Linux application (command line preferred) that periodically checks whether a remote Linux PC is still running and online (i.e., powered on, not crashed, WAN up) and emails me (to my cellphone) if any of these checks fails for a set length of time (minutes, hours, etc.) or after a certain number of retries.

For my purposes, simply a response to a ping command would likely be fine, as long as an email is only sent upon a "change of state" (PC/WAN down, and PC/WAN up).

I am running RH Linux 9 (command line; no GUI) on all PCs. I could run the "Master" application on the PC here, which is currently configured as a firewall, and communicate with maybe 2-3 remote PCs that sit at other locations behind typical Linksys routers/firewalls. Each of the PCs has a dynamic IP but is configured to use a dynamic DNS service (dyndns.org), so each has a fixed hostname.

I am relatively new to Linux (3 yrs), but I can install RPMs, "hack" script files, and write some if necessary. But writing something like this is beyond me...

Is there something built into RH Linux 9 (SNMP???) that could be used?

Thanks!

intermod

Looks like you're a prime candidate for snmp monitoring.

There are basically two ways you can do it. Either have each box send SNMP traps to a central "listener" that expects certain information from each box at a set interval (and alerts if an expected notification wasn't received), or have the central box "poll" each client for information and generate an alert if it can't poll a particular machine.
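For what it's worth, the listener side of the trap approach boils down to remembering when each box last reported and flagging the ones that have gone quiet. Here is a rough sketch in Python of just that bookkeeping. It uses no SNMP library at all, receiving the actual traps is left out, and `HeartbeatMonitor`, the host names and the 600-second timeout are made up for illustration:

```python
import time


class HeartbeatMonitor:
    """Track the last time each host reported in (hypothetical helper)."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}  # host name -> timestamp of last report

    def record(self, host, now=None):
        """Call this whenever a notification arrives from `host`."""
        self.last_seen[host] = time.time() if now is None else now

    def overdue(self, now=None):
        """Return the hosts whose last report is older than the timeout."""
        now = time.time() if now is None else now
        return sorted(
            host for host, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


# Example with explicit timestamps so the result does not depend on the clock.
monitor = HeartbeatMonitor(timeout_seconds=600)
monitor.record("site-a", now=0)
monitor.record("site-b", now=500)
print(monitor.overdue(now=601))  # only site-a has been silent for over 600 s
```

A box that never reports at all would not show up here, so in practice you would probably call `record` once for every expected host at startup.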

Since SNMP can also be used to issue remote commands, there are some very real security risks in allowing a box to be polled. It's probably a better idea to simply send traps at a regular interval, unless there are specific things you need to do that require polling. If you go the polling route, make sure to read up on the security implications.

It takes some doing to get it set up, so read up and test it first before rolling it out.

You could use SNMP or you could write a very small Perl script that will ping and mail. Something like this, maybe:

Code:

#!/usr/bin/perl -w
# Ping each host every five minutes; page by mail (and restart the VM) if one fails.

use strict;
use Mail::Sender;
use Net::Ping;

# Detach from the terminal: the parent exits, the child keeps running.
my $pid = fork;
die "Couldn't fork: $!" unless defined($pid);
exit if $pid;

our ($p, $host, $location, $sender, @date);
our $my_addr = "your.IP.address.here";
our $sig = 1;  #Never fully implemented this.

# Host name => directory holding that VM's start/stop script.
our %host_hash = (
    "server1" => "/vmware/S1",
    "server2" => "/vmware/S2",
    "server3" => "/vmware/S3"
);

do {
    @date = localtime;
    print "$date[2]:$date[1]:$date[0]\n";
    sleep (300);

    Ping();

} while ($sig < 1000000);  # $sig never changes, so this loops forever

#####  Define subroutines here:  #####

# Check every host in %host_hash; page (and restart) any that do not answer.
sub Ping {
    # Default protocol is TCP; Net::Ping->new("icmp") needs root.
    $p = Net::Ping->new();
    while ( ($host, $location) = each %host_hash ) {
        if ($p->ping($host)) {
            print "$host is reachable!\n";
        }
        else {
            Died();
            Restart(); #<- Comment this line out if you only want the page.
        }
    }
    $p->close();
}

# Mail a short page about the host currently held in $host.
sub Died {
    print "Arghhh! $host died!\n";
    $sender = Mail::Sender->new({
        smtp => 'mail.host.org',
        from => 'account@machine.host.org',
        on_errors => undef,
    })
        or die "Can't create the Mail::Sender object: $Mail::Sender::Error\n";
    $sender->Open({
        to => '1234@my.cell.phone.com',
        subject => "$host"
    })
        or die "Can't open the message: $sender->{'error_msg'}\n";
    $sender->SendLineEnc("$host is not responding.  I will attempt to restart it.  You will be paged again if it does not restart.");
    $sender->Close()
        or die "Failed to send the message: $sender->{'error_msg'}\n";
    sleep(5);
}

# Stop and start the VM through the script at $location/$host.
sub Restart {
    system "$location/$host stop";
    sleep (45);
    system "$location/$host start";
    sleep (180);
}

I know, cheesy code, but what do you expect for a 3 AM quickie script so I could get some sleep? I did this to monitor VMs running on the box, page me if they died, and then restart them. You can just add IP addresses or host names to the hash, drop the second part of each entry, and comment out the call to the restart section.
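One caveat: as written, the script pages on every pass while a host stays down, whereas the original post asked for mail only on a change of state and after a number of retries. A minimal sketch of that logic, in Python rather than Perl, might look like the following. `StateTracker`, `is_reachable` and `run_once` are hypothetical names, the retry count of 3 is arbitrary, the `ping -c 1 -W 2` call assumes the usual Linux ping, and the mailing itself is left out:

```python
import subprocess
from dataclasses import dataclass


@dataclass
class HostState:
    up: bool = True     # assumption: hosts start out reachable
    failures: int = 0   # consecutive failed checks


class StateTracker:
    """Report a host only when it changes between up and down."""

    def __init__(self, retries=3):
        self.retries = retries
        self.hosts = {}

    def update(self, host, reachable):
        """Record one check result; return "DOWN", "UP", or None."""
        state = self.hosts.setdefault(host, HostState())
        if reachable:
            state.failures = 0
            if not state.up:
                state.up = True
                return "UP"
            return None
        state.failures += 1
        if state.up and state.failures >= self.retries:
            state.up = False
            return "DOWN"
        return None


def is_reachable(host):
    """Send one ping with a 2-second wait (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_once(tracker, hosts, check=is_reachable):
    """Check every host once; return a list of (host, event) state changes."""
    events = []
    for host in hosts:
        event = tracker.update(host, check(host))
        if event:
            events.append((host, event))
    return events
```

Calling `run_once` from cron or from a sleep loop, and mailing once per returned event, would give one message when a host goes down and one when it comes back.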

Thanks! I will try the scripts for the first fix, and start looking at SNMP as a long-term fix.

Thanks for the quick response!

Greg

Nagios might be another option to consider, although it's probably a little bit more than you need at this point. The basic check-host-alive test is a simple ping test. If you decide to start monitoring services at a later date, then you've already got a tool in place.