Nagios/nrpe: SSL Issues
I have set up two machines running Fedora Core 2. Nagios is installed and set up on both, with WebMIN and NagMIN. I am setting up a failover monitoring system, so if one Nagios machine goes down, the other one will find out and start monitoring the network.
In the Nagios documentation, it suggests using NRPE (Nagios Remote Plugin Executor). A cron job will call check_nrpe to tell the other Nagios machine to check the disk space or something through port 5666, where the host's NRPE daemon is listening. When it returns with the data, the backup machine knows everything is fine, so it waits about 5 minutes and does it again, until it receives no response. At that time, the backup Nagios system will take over the monitoring.
Where I am running into a problem is with SSL (I think). I have not written the cron job; I've just been running check_nrpe from the terminal. From the backup machine, I get the message:
Connection refused
This message appears immediately, which at first led me to believe the network wasn't working. But the network is working fine. When I run the exact same check_nrpe line from the terminal of the primary machine, it says:
CHECK_NRPE: Error - Could not complete SSL handshake
I checked out the FAQ on the Nagios website (nagios.org), and they had this to say:
===================================
Solution: This error message could be due to several problems:
1. Different versions. Make sure you are using the same version of the check_nrpe plugin and the NRPE daemon. Newer versions of NRPE are usually not backward compatible with older versions.
2. SSL is disabled. Make sure both the NRPE daemon and the check_nrpe plugin were compiled with SSL support and that neither are being run without SSL support (using command line switches).
3. Incorrect file permissions. Make sure the NRPE config file (nrpe.cfg) is readable by the user (i.e. nagios) that executes the NRPE binary from inetd/xinetd.
4. Pseudo-random device files are not readable. Greg Haygood noted the following... "After wringing my hair out and digging around with truss, I figured out the problem on my Solaris 8 boxen. The files /devices/pseudo/random* (linked through /dev/*random, and provided by Sun patch 111238) were not readable by the nagios user I use to launch NRPE. Making the character devices world-readable solved it." Dave van Nierop added that "Fortunately, for HPUX 11.i (11.11) and later Nagios users, HP now supports /dev/random and /dev/urandom via a kernel loadable module. Prior to running the NRPE 2.0 configure script, you will need to download this program from [HAD TO REMOVE URL FOR POST] Installation does require a server reboot. For detailed information, consult [REMOVED URL TO POST]
===================================
Now, I am pretty much a newbie. Though I think I have a good grasp of the problem, I am not aware of the possible solutions. I have checked the permissions, and they are fine. The NRPE daemon is the same version as the check_nrpe plugin. I would assume SSL would be set up by following the install instructions that come with it, since SSL is required to use it. And when I downloaded the HPUX program, it was a .depot file, which I don't know how to open.
I'm very frustrated, and I cannot figure out how to progress. Any insight would be greatly appreciated, or at least tutorials or FAQs that would point me in the right direction. |
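The failover check described in this post could be sketched roughly as below, to be run from cron on the backup machine. This is only a sketch under assumptions: the address 192.0.2.10, the plugin path and the function names are placeholders that do not come from the post, and it assumes that check_nrpe called without -c simply asks the daemon for its version and exits with status 0 when the daemon answers.

```shell
#!/bin/sh
# Sketch of a failover probe for the backup Nagios machine.
# Assumptions (not from the post): the primary's address and the plugin path.
PRIMARY_HOST="${PRIMARY_HOST:-192.0.2.10}"
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# Succeeds only if the NRPE daemon on the primary answers on port 5666.
# No -c is given, so the result does not depend on any plugin's status
# (a full disk on the primary is not mistaken for a dead primary).
primary_is_alive() {
    "$CHECK_NRPE" -H "$PRIMARY_HOST" -p 5666 >/dev/null 2>&1
}

# Prints the action the backup machine should take.
failover_action() {
    if primary_is_alive; then
        echo "standby"
    else
        echo "takeover"
    fi
}
```

A cron entry that runs every 5 minutes could call failover_action and start Nagios on the backup machine whenever it prints "takeover".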
Killbot_5000,
I have no knowledge of Nagios itself, but:
Quote:
From the backup machine, I get the message Connection refused.
This probably means that the NRPE daemon is either not running, or not listening on the IP address/port. Run "netstat -lp" on the primary machine and look for the daemon and the port. If you don't see it, make sure all the programs are being started. Another possibility is that the firewall is blocking this.
Quote:
2. SSL is disabled. Make sure both the NRPE daemon and the check_nrpe plugin were compiled with SSL support and that neither are being run without SSL support (using command line switches).
I'm assuming that you installed Nagios by rpm. You can run "ldd <your daemon>" to see if it links against libssl. You should also check the init script /etc/init.d/<something like nagios> (I'm not sure of the exact name, or if it even has an init script) or your inetd configuration. Reread your configuration file to make sure SSL isn't being disabled.
You don't need the HPUX binary. It's for another operating system.
Good Luck, chris |
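The ldd check suggested above can be wrapped in a small helper. This is a minimal sketch; the function names are made up for illustration, and the path to the nrpe binary has to be supplied by whoever runs it.

```shell
#!/bin/sh
# Reads "ldd" output on stdin; succeeds if any line mentions libssl.
links_libssl() {
    grep -q 'libssl'
}

# Prints a one-line verdict for the binary given as $1.
# A statically linked binary shows no libssl line even if SSL support
# was compiled in, so a negative result is a hint, not proof.
report_ssl_linkage() {
    if ldd "$1" 2>/dev/null | links_libssl; then
        echo "$1: links against libssl"
    else
        echo "$1: no libssl in ldd output"
    fi
}
```

For example, report_ssl_linkage ./nrpe run from the directory that holds the daemon.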
Is the mod_ssl package installed in your Fedora Core 2?
|
I did what you said
It does not make a reference to libssl when I ldd nrpe. Nrpe comes with README.SSL, which states:
Quote:
If anyone wants to download and install NRPE, which is only like 50k, I would greatly appreciate it. I don't believe Nagios has to be running for it to work, only if you want it to be useful. I appreciate any help I can get. I'm stressing about this now, because I can't set up my distributed monitoring, so I had to set up a failover monitoring system using a shell script that pings the nrpe box to see if it's live. Of course, that won't do any good if Nagios crashes and the computer doesn't go down, or if it's turned off. |
mod_ssl?
How do I check that? I have SSL installed, but I don't know if mod_ssl is something I could have overlooked. I have Fedora Core 2 installed, with SSL installed from the OS installation. I also downloaded the latest OpenSSL to confirm that my current package was up to date.
So my answer to that is: I have OpenSSL installed. If I didn't have to do anything special to install mod_ssl, then it's installed. If I would know that I had installed it (because I would have had to do, well, something special), then I definitely don't have it. I hope I didn't confuse anyone with that little smidgen of half-mindless yammering. |
OK,
You don't need mod_ssl, as it is the Apache SSL module. You don't have to worry about this until you secure the Nagios web interface. I don't think nrpe uses Apache.
A "Connection refused" error would happen earlier in the process than an "SSL handshake" error. Unfortunately I don't know how check_nrpe would report the error, as they can be related; see https://sourceforge.net/mailarchive/...msg_id=9046720 for a reverse problem.
Now on to your "SSL handshake" problem. If nrpe does not link against anything related to SSL, then it probably doesn't use SSL (although it could be statically linked). Try turning off SSL for check_nrpe to see if that works.
Your problem of "Connection refused" means nrpe is not listening on the expected IP:port combination. How is nrpe started, from inetd? Run netstat -l to see what addresses it is listening on. You should see one with either your server IP address or 0.0.0.0.
Good Luck, chris
edit: turned off smilies, added postscript
PS: I believe anon-DH is susceptible to a man-in-the-middle attack, so make sure to secure your media. But so is a default ssh configuration, whatta ya goin' to do? |
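To make the netstat check less error-prone, the output can be filtered for the NRPE port. A minimal sketch, assuming the usual Linux "netstat -ln" column layout (local address in column 4, state in column 6); the function name is hypothetical.

```shell
#!/bin/sh
# Reads "netstat -ln" (or "netstat -lnp") output on stdin and prints the
# local address of every TCP socket listening on the port given as $1.
listening_on_port() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")      # the port is the last ":" field
            if (parts[n] == port) print $4
        }'
}
```

Usage would be netstat -ln | listening_on_port 5666. No output means nothing is listening on that port, which matches the "Connection refused" case; 0.0.0.0:5666 or the server's own address means the daemon is reachable.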
Although I followed the instructions that came with NRPE, I think it may not be running.
When I do a chkconfig --list, it shows under Xinetd as being ON. When I ps aux | grep nrpe, it doesn't show up. When I try to start it manually, it does not produce any message, but I cannot grep the process either. When I netstat -l, it doesn't show on the list. Quote:
I have followed these instructions exactly. Anything you guys can think of that would stop it from running? Any help is greatly appreciated, as I feel I am making headway due to these suggestions. |
Ok, this is what I have found:
If NRPE is running (?) under Xinetd, I receive the message "Connection refused by host" and nothing is listening on port 5666 under netstat -lp. If I chkconfig nrpe off, I can then start it as a standalone daemon (./nrpe -c nrpe.cfg -d). If I then netstat -lp, nrpe is listening on 5666. If I run ./check_nrpe now, I get "Could not complete SSL handshake". So I have two separate issues. I want NRPE to run under Xinetd, and I want to fix this SSL error. |
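Since the two errors point at different layers, it may help to keep them apart explicitly. The sketch below maps each message to the causes raised so far in this thread and in the quoted FAQ; the function name and the wording of the hints are illustrative only.

```shell
#!/bin/sh
# Maps a check_nrpe error message ($1) to the causes discussed in this thread.
explain_nrpe_error() {
    case "$1" in
        *"Connection refused"*)
            echo "nothing is listening on the port, or a firewall is blocking it" ;;
        *"Could not complete SSL handshake"*)
            echo "daemon is reachable; check SSL support, versions and nrpe.cfg permissions" ;;
        *)
            echo "not one of the two errors covered here" ;;
    esac
}
```

For example, explain_nrpe_error "$(./check_nrpe -H <host> 2>&1)" with the host filled in.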
If "ldd nrpe" does not show libssl, then the nrpe binary does not support SSL connections (99% of the time). To verify, run the nrpe and check_nrpe programs w/o any arguments. This should print out something like:
$ ./nrpe
NRPE - Nagios Remote Plugin Executor
Copyright (c) 1999-2003 Ethan Galstad (nagios@nagios.org)
Version: 2.0
Last Modified: 09-08-2003
License: GPL with exemptions (-l for more info)
SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required
Notice the "SSL/TLS Available:" line, as they should both have this. If it isn't there, then you will need to rebuild the program. A default build of nrpe seems to find the SSL libs/headers just fine. Is this part of Fedora or downloaded separately?
For the xinetd problem (not starting nrpe), you should post your entry in xinetd (/etc/xinetd.d/nrpe). Is xinetd running? Try /etc/init.d/xinetd status or /etc/init.d/xinetd restart
Good Luck, chris |
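The banner check can be done the same way for both programs. A minimal sketch, assuming the banner contains the "SSL/TLS Available" text shown above when SSL support is compiled in; the function names are hypothetical.

```shell
#!/bin/sh
# Reads a program's banner on stdin; succeeds if it advertises SSL support.
banner_has_ssl() {
    grep -q 'SSL/TLS Available'
}

# Runs the binary given as $1 with no arguments and reports on its banner.
report_ssl_banner() {
    if "$1" 2>&1 | banner_has_ssl; then
        echo "$1: built with SSL"
    else
        echo "$1: no SSL line in banner, rebuild may be needed"
    fi
}
```

Running report_ssl_banner ./nrpe and report_ssl_banner ./check_nrpe should give the same answer for both; if they differ, that mismatch is a likely source of the handshake error.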
Hi,
I have set up Nagios. If you just want to monitor your Nagios server, there is no need for nrpe. The only time that you might want to use nrpe is if you are monitoring services on other servers.
What I did:
Download the following packages:
nagios-nrpe_2.0.orig.tar.gz (SuSE has these on its CDs)
nagios-plugins-1.3.1.tar.gz
When installing nagios-nrpe:
#cd nrpe.20
#./configure --enable-command-args
**** There are security issues with this option, but I'm using authentication and firewalls, so it's OK for me.
#make all
#mkdir /etc/nagios
#cp src/nrpe /etc/nagios
#cp nrpe.cfg /etc/nagios
#vi /etc/nagios/nrpe.cfg
Make sure that the following is there:
server_port=5666
allowed_hosts=127.0.0.1,<nagios-central-servers-ip>
Uncomment all of the following; it is right at the bottom of the file:
command[check_users]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
command[check_procs]=/usr/local/nagios/libexec/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
MAKE SURE THAT MOST SSL PACKAGES ARE INSTALLED.
Now start nrpe:
#/etc/nagios/nrpe -d -c /etc/nagios/nrpe.cfg
Now for the central server.
TESTING: on the Linux server PC, tail the log file! Then go to the central Nagios server and do the following:
#telnet <linux-server-ip> 5666
If you get:
SUSE9:~ # telnet 192.96.150.10 5666
Trying 192.96.150.10...
Connected to 192.96.150.10.
Escape character is '^]'.
then you know that the remote server is listening. Then try the following:
SUSE9:~ # /usr/lib/nagios/plugins/check_nrpe -H 192.96.150.10 -p 5666 -c "check_procs" -a 100 130 RSZDT
OK - 74 processes running with STATE = RSZDT
POSSIBLE ERRORS: when I run the same check from one of my other Nagios servers, this is what I get:
g4t3d:~ # /usr/lib/nagios/plugins/check_nrpe -H 192.96.150.11 -p 5666 -c "check_procs" -a 100 130 RSZDT
CHECK_NRPE: Error - Could not complete SSL handshake.
proxy:/var/log # tail -f messages
Sep 2 10:33:22 proxy nrpe[25690]: Host 192.96.150.214 is not allowed to talk to us!
The reason is obvious: in the nrpe.cfg file, allowed_hosts doesn't include my second Nagios server's IP address.
I hope that this is enough help. If there are any other questions regarding this issue, I should be able to answer them. |
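The allowed_hosts mistake at the end of this post is easy to test for before running check_nrpe. A minimal sketch, assuming allowed_hosts is a comma-separated list of plain IP addresses as in the example above; the function name is hypothetical.

```shell
#!/bin/sh
# Succeeds if the IP address $2 appears in the allowed_hosts line of the
# NRPE config file $1. Commented-out lines are ignored.
host_is_allowed() {
    awk -F= -v ip="$2" '
        $1 ~ /^[ \t]*allowed_hosts[ \t]*$/ {
            n = split($2, hosts, ",")
            for (i = 1; i <= n; i++) {
                gsub(/[ \t]/, "", hosts[i])   # tolerate spaces around commas
                if (hosts[i] == ip) found = 1
            }
        }
        END { exit(found ? 0 : 1) }' "$1"
}
```

For example, host_is_allowed /etc/nagios/nrpe.cfg 192.96.150.214 would have failed on the server above, which is consistent with the "is not allowed to talk to us" log line.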
Why don't you just run nrpe as a daemon?
|
I got nrpe working before the latest post. Thanks for all the help provided to me. I do have a question to nsi-f34r:
Do you have problems with the permissions on the external command file? Unless I set the permissions to 777, I can't do anything in Nagios that requires use of the external command file. I have changed the permissions on this file repeatedly. I have set the owner/group to root, nagios, and nagadmin (the web login for Nagios), and every time it tells me that the permissions are wrong and to go back from whence I came. Of course, if I set the permissions wide open, it works, and even though it's not on a public IP of any sort, I still don't feel comfortable leaving it wide open. I know this issue is COMPLETELY unrelated, so if you want to respond to it, go to this post on the squareBOX Nagios forum: http://alpha.square-box.com/index.ph...art=0#msg_6856 And thanks for the help! |
nrpe
killbot, what did you do to get it working? I followed nsi-f34r's post and I am now getting the SSL handshake error. I was getting connection refused, and when I checked, nrpe was not listening. It is listening now, and I get the SSL error.
|
I got nrpe working by recompiling it without SSL support. See the SSL readme file that came with it.
That's honestly all I did differently from his post, because I was getting the same exact thing. Also, make sure the port you are using is open and the ip address is authorized in your nrpe.cfg file. If you need more info on it, then I'll have to post again tomorrow, when I am back at the office so I can look at my notes. |
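If the daemon was rebuilt without SSL, the plugin has to be told not to use SSL either, which check_nrpe does with its -n switch. The sketch below tries both modes to find out which one a given daemon accepts. The plugin path and the function name are assumptions, and it calls check_nrpe without -c so that only the connection itself is tested.

```shell
#!/bin/sh
# Assumed plugin location; adjust to wherever check_nrpe was installed.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# Prints "ssl", "no-ssl" or "unreachable" for the host given as $1.
probe_nrpe_mode() {
    if "$CHECK_NRPE" -H "$1" -p 5666 >/dev/null 2>&1; then
        echo "ssl"
    elif "$CHECK_NRPE" -n -H "$1" -p 5666 >/dev/null 2>&1; then
        echo "no-ssl"       # daemon answers only when SSL is switched off
    else
        echo "unreachable"
    fi
}
```

A result of "no-ssl" would mean every check_nrpe call against that host needs -n, including the check command definitions on the Nagios side.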
Hi! I installed nrpe on one machine, but when I try to install it on another one with the same steps, it does not work. I read /var/log/messages, but it doesn't show anything.
I read /var/log/daemon.log, and it contains this message: "Unable to open config file '//nrpe.cfg' for reading. Config file '//nrpe.cfg' contained errors, bailing out..." I checked the permissions and there is no problem with them. When I try to run the command, it returns "CHECK_NRPE: Error - Could not complete SSL handshake." What can I do? |
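The doubled slash in '//nrpe.cfg' may indicate (this is an assumption, not something confirmed in the thread) that nrpe was given a relative config file name and resolved it against the root directory. A quick sanity check on whatever path is passed to -c could look like this; the function name is hypothetical.

```shell
#!/bin/sh
# Prints "ok" if the config path $1 is absolute and readable by the
# current user; otherwise prints the reason and returns non-zero.
check_config_path() {
    case "$1" in
        /*) ;;
        *)  echo "relative path: pass an absolute path to -c"
            return 1 ;;
    esac
    if [ ! -r "$1" ]; then
        echo "not readable by $(id -un)"
        return 1
    fi
    echo "ok"
}
```

It is most informative when run as the user that launches nrpe, for example check_config_path /etc/nagios/nrpe.cfg, where the path is whatever follows -c in the xinetd entry or start command.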