LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

08-12-2009, 12:54 AM #1
apatil
LQ Newbie
 
Registered: Aug 2009
Posts: 4

Nagios escalating prematurely


We are currently using Nagios 1.3. The issue we are facing is this: when an alert is in the WARNING state and then moves from WARNING to CRITICAL, it is escalated directly to the L2, L3 and L4 escalations. Nagios seems to treat the time the alert spent in WARNING as unacknowledged time (even when it has been acknowledged), and it follows the L2, L3 escalation path according to the times we have defined for the escalations.

Is this a bug or a feature of Nagios, and is there a way to fix it? Escalation to top management is a major issue for us because it directly impacts our SLAs.
 
08-12-2009, 01:01 AM #2
EricTRA
LQ Guru
 
Registered: May 2009
Location: Gibraltar, Gibraltar
Distribution: Fedora 20 with Awesome WM
Posts: 6,805
Blog Entries: 1

Hello,

The first thing I'd consider is upgrading to the current stable version, which is 3.1.2. A LOT of bugs and problems have been fixed between version 1.3 and the current version, so if this is a major issue in your environment, I'd advise you to upgrade as soon as possible.

Kind regards,

Eric
 
08-12-2009, 01:33 AM #3
apatil
LQ Newbie
 
Registered: Aug 2009
Posts: 4

Original Poster
Thanks for your reply, but we have also seen this issue in version 2.9.
If this were a bug, more users would have reported it, but it seems only a handful of users have this problem. So I was wondering whether there are any configuration changes that need to be made.
 
08-12-2009, 01:46 AM #4
EricTRA
LQ Guru
 
Registered: May 2009
Location: Gibraltar, Gibraltar
Distribution: Fedora 20 with Awesome WM
Posts: 6,805
Blog Entries: 1

Hello,

I've never worked with the version you have installed, so I'm not sure whether what I say applies to that 'blast from the past', but have you checked your hostescalation and serviceescalation settings along with the contact commands? I had a 'double notification' issue some time ago that I traced back to a faulty (duplicated) configuration in the escalation trees, so that might be a good starting point.
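If the escalation definitions have already been parsed into simple records, duplicated or overlapping ranges for the same host/service are easy to spot. Here is a minimal Python sketch, assuming that parsing step exists (it is not shown); the Escalation class, overlapping_pairs() and the host, service and group names in the example are all hypothetical. Overlap is not automatically an error, since Nagios notifies the contact groups of every escalation whose range matches, but it is worth reviewing when you see double notifications.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Escalation:
    """One parsed serviceescalation definition (hypothetical record type)."""
    host_name: str
    service_description: str
    first_notification: int
    last_notification: int  # 0 means "applies to all later notifications"
    contact_groups: Tuple[str, ...]


def _upper_bound(e: Escalation) -> float:
    # last_notification 0 is open-ended, so treat it as infinity.
    return float("inf") if e.last_notification == 0 else e.last_notification


def overlapping_pairs(escalations: List[Escalation]) -> List[Tuple[Escalation, Escalation]]:
    """Return pairs of escalations for the same host/service whose
    first..last notification ranges overlap."""
    pairs = []
    for a, b in combinations(escalations, 2):
        if (a.host_name, a.service_description) != (b.host_name, b.service_description):
            continue
        # Two closed ranges overlap if each one starts before the other ends.
        if a.first_notification <= _upper_bound(b) and b.first_notification <= _upper_bound(a):
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Hypothetical example data.
    example = [
        Escalation("web01", "HTTP", 2, 3, ("oncall",)),
        Escalation("web01", "HTTP", 3, 0, ("managers",)),
        Escalation("web01", "HTTP", 5, 6, ("oncall",)),
    ]
    for a, b in overlapping_pairs(example):
        print(a.contact_groups, "overlaps", b.contact_groups)
```

In the example, the 2-3 range overlaps the open-ended 3+ range, and the 3+ range overlaps 5-6, while 2-3 and 5-6 do not touch.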

Kind regards,

Eric
 
08-12-2009, 01:54 AM #5
apatil
LQ Newbie
 
Registered: Aug 2009
Posts: 4

Original Poster
The problem is that when an alert is in the WARNING state and has also been acknowledged, the escalation to the top level is still triggered when it moves to CRITICAL.
When the alert goes directly to CRITICAL, Nagios follows the correct escalation path.

The problem only occurs when the alert moves from WARNING to CRITICAL.
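One possible explanation, which I haven't verified against the 1.3 or 2.9 code: serviceescalation entries are matched on the notification number (first_notification/last_notification), not on elapsed time, and as far as I know that counter is only reset when the service recovers to OK. A non-sticky acknowledgement is also cleared when the state changes, so the first CRITICAL notification would pick up the count where the WARNING notifications left off. The Python sketch below is a simplified model of that behaviour under those assumptions; simulate() and its event names are hypothetical, not Nagios code.

```python
def simulate(events, sticky_ack=False):
    """Return (state, notification_number) for every notification sent.

    Simplified model, not Nagios source code. Assumptions:
      * "tick" = a notification is due; it is sent only if the service is
        non-OK and not acknowledged;
      * the counter increments only when a notification is actually sent;
      * the counter resets only on recovery to OK;
      * a non-sticky acknowledgement is cleared on any state change,
        a sticky one only on recovery.
    Any other event string is treated as the new state name.
    """
    state, number, acked = "OK", 0, False
    sent = []
    for event in events:
        if event == "ack":
            if state != "OK":
                acked = True
        elif event == "tick":
            if state != "OK" and not acked:
                number += 1
                sent.append((state, number))
        elif event != state:
            if event == "OK":
                number, acked = 0, False  # recovery resets counter and ack
            elif not sticky_ack:
                acked = False  # non-sticky ack is dropped on state change
            state = event
    return sent


# WARNING notified three times, then acknowledged, then goes CRITICAL.
events = ["WARNING", "tick", "tick", "tick", "ack", "tick", "tick", "CRITICAL", "tick"]
print(simulate(events))
print(simulate(events, sticky_ack=True))
```

In this model the first CRITICAL notification is number 4, so with escalations starting at notifications 2, 3, 4 and 5 it would already match the higher levels. A direct CRITICAL starts at 1, which matches the "correct path" you see. With sticky_ack=True the acknowledgement survives the state change and nothing more is sent until recovery.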
 
08-12-2009, 02:14 AM #6
EricTRA
LQ Guru
 
Registered: May 2009
Location: Gibraltar, Gibraltar
Distribution: Fedora 20 with Awesome WM
Posts: 6,805
Blog Entries: 1

So if I understand you correctly, when the WARNING state is triggered the right contacts get notified, but when the same issue 'evolves' into CRITICAL the wrong people get notified?

How have you defined your contacts and groups and how are they referenced in your escalation trees?

Since Nagios has a lot of options spread over various config files, it's very easy to overlook an option that's out of place. Notifications can be triggered by a contact, a contact group, an escalation (host and/or service), an escalation tree (host/service), the service level, the host level, or even directly by a command.

Acknowledgement handling isn't quite working in the version I'm currently using either, so it might be that things get mixed up with that combination. In my version, if I acknowledge a problem it still sends out notifications even though I've configured it not to. I just checked and it's still wrong.
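Acknowledgements can also be submitted through the external command file (external commands have to be enabled with check_external_commands=1 in nagios.cfg), which makes it easy to set the sticky flag explicitly. A small Python sketch that formats an ACKNOWLEDGE_SVC_PROBLEM line using the field order from the Nagios 3.x external command documentation; the helper names, the author/comment values and the command file path are assumptions, and the sticky value may differ on older releases.

```python
import time


def ack_svc_problem(host, service, author, comment,
                    sticky=2, notify=1, persistent=1, now=None):
    """Format an ACKNOWLEDGE_SVC_PROBLEM external command line.

    Field order follows the Nagios 3.x external command documentation:
    host;service;sticky;notify;persistent;author;comment.
    sticky=2 is the 3.x value for "keep the ack until the service is OK";
    older releases may differ, so check the docs for your version.
    """
    timestamp = int(time.time()) if now is None else int(now)
    fields = [host, service, str(sticky), str(notify), str(persistent), author, comment]
    return "[%d] ACKNOWLEDGE_SVC_PROBLEM;%s\n" % (timestamp, ";".join(fields))


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write a command line to the Nagios command file (a named pipe).

    The default path is an assumption; use the command_file value from
    your own nagios.cfg.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)


# Build (but don't submit) an example acknowledgement.
print(ack_svc_problem("test", "PING", "eric", "Looking into it", now=1250000000), end="")
```

A sticky acknowledgement is the one that should keep suppressing notifications across a WARNING to CRITICAL change, so it may be worth testing that combination specifically.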

But I'd first check and compare your contacts, escalations and related settings to make sure there is no duplicated configuration.

Kind regards,

Eric
 
08-12-2009, 03:05 AM #7
apatil
LQ Newbie
 
Registered: Aug 2009
Posts: 4

Original Poster
Following are the configuration files

contacts.cfg

define contact{
contact_name alerts
alias Alerts Inbox
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email alerts@XXX.com
}

define contact{
contact_name admin
alias admin
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email,notify-by-epager
host_notification_commands host-notify-by-email,host-notify-by-epager
email admin@XXX.com
}



contactgroup

define contactgroup{
contactgroup_name all-admins
alias All Administrators
members alerts,admin
}

define contactgroup{
contactgroup_name L1-Escalation
alias First Escalation
members admin-pager,alert
}

define contactgroup{
contactgroup_name L2-Escalation
alias Second Escalation
members admin-pager
}

define contactgroup{
contactgroup_name L3-Escalation
alias Third Escalation
members manager
}

define contactgroup{
contactgroup_name Mgmt-Escalation
alias Third Escalation
members Sysadmin
}
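A quick cross-check of group members against defined contacts may help here. Only the alerts and admin contacts appear in the excerpt above, so admin-pager, manager and Sysadmin may simply be defined in another file, but "alert" in L1-Escalation does not match the "alerts" contact shown, which looks worth checking. A small Python sketch using the names from the posted config (undefined_members() is a hypothetical helper; running nagios -v against your main config file remains the authoritative check):

```python
def undefined_members(contacts, contactgroups):
    """Map each contact group to the members that are not defined contacts."""
    known = set(contacts)
    missing = {}
    for group, members in contactgroups.items():
        unknown = [m for m in members if m not in known]
        if unknown:
            missing[group] = unknown
    return missing


# Names as they appear in the posted excerpt (other config files not included).
contacts = ["alerts", "admin"]
contactgroups = {
    "all-admins": ["alerts", "admin"],
    "L1-Escalation": ["admin-pager", "alert"],
    "L2-Escalation": ["admin-pager"],
    "L3-Escalation": ["manager"],
    "Mgmt-Escalation": ["Sysadmin"],
}

for group, names in undefined_members(contacts, contactgroups).items():
    print(group, "->", ", ".join(names))
```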



Escalation

define serviceescalation{
host_name test
service_description PING
first_notification 2
last_notification 0
notification_interval 5
contact_groups L1-Escalation
}

define serviceescalation{
host_name test
service_description PING
first_notification 3
last_notification 0
notification_interval 5
contact_groups L2-Escalation
}

define serviceescalation{
host_name test
service_description PING
first_notification 4
last_notification 0
notification_interval 5
contact_groups L3-Escalation
}

define serviceescalation{
host_name test
service_description PING
first_notification 5
last_notification 0
notification_interval 5
contact_groups Mgmt-Escalation
}
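Because every escalation here has last_notification 0, the ranges overlap, and per the Nagios documentation on overlapping escalations the contact groups of all matching escalations are notified, while the service's own contact_groups are only used when no escalation matches. A Python sketch of that selection rule applied to the four escalations above; it models only the notification-number matching (no time periods or state restrictions), and recipients() is a hypothetical name.

```python
# Escalations as posted for host "test", service "PING":
# (first_notification, last_notification, contact_group); 0 = open-ended.
ESCALATIONS = (
    (2, 0, "L1-Escalation"),
    (3, 0, "L2-Escalation"),
    (4, 0, "L3-Escalation"),
    (5, 0, "Mgmt-Escalation"),
)
DEFAULT_GROUPS = ("all-admins",)  # contact_groups from the service definition


def recipients(n, escalations=ESCALATIONS, default=DEFAULT_GROUPS):
    """Contact groups notified for notification number n (simplified model:
    ignores time periods and state-based escalation options)."""
    matched = [group for first, last, group in escalations
               if n >= first and (last == 0 or n <= last)]
    # No matching escalation: the service's own contact groups are used.
    # Otherwise the groups of every matching escalation are notified.
    return matched if matched else list(default)


for n in range(1, 6):
    print(n, recipients(n))
```

So the 4th notification for this service, whatever state triggered it, would go to L1, L2 and L3 together, and from the 5th onwards Mgmt-Escalation is added.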


Service

define service{
use generic-service ; Name of service template to use
host_name test
service_description PING
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 3
retry_check_interval 1
contact_groups all-admins
notification_interval 5
notification_period 24x7
notification_options w,u,c,r
check_command check_ping!100.0,20%!500.0,60%
}
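To translate these notification numbers into rough times: assuming the default interval_length of 60 (so intervals are in minutes), that the problem stays unacknowledged without changing state, and ignoring check execution time and latency, the service goes HARD after (max_check_attempts - 1) * retry_check_interval minutes and then notifies every notification_interval minutes. A Python sketch of that arithmetic with the values above (minutes_until_notification() is a hypothetical name):

```python
def minutes_until_notification(n, notification_interval=5,
                               max_check_attempts=3, retry_check_interval=1):
    """Approximate minutes from the first failed check to notification n.

    Assumes interval_length=60 (intervals in minutes), no acknowledgement,
    no recovery, and ignores check execution time and scheduling latency.
    """
    if n < 1:
        raise ValueError("notification numbers start at 1")
    # Soft-state retries before the problem turns HARD and first notifies.
    to_hard = (max_check_attempts - 1) * retry_check_interval
    return to_hard + (n - 1) * notification_interval


for n, level in [(2, "L1"), (3, "L2"), (4, "L3"), (5, "Mgmt")]:
    print(level, "after about", minutes_until_notification(n), "minutes")
```

With these values L1 is reached after about 7 minutes and Mgmt-Escalation after about 22, which is why a few unacknowledged WARNING notifications can eat most of that budget before a CRITICAL arrives.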
 
  


All times are GMT -5.
