LinuxQuestions.org
07-12-2022, 09:17 AM   #1
msutic
LQ Newbie
 
Registered: Jul 2022
Posts: 5

Rep: Reputation: 0
How to verify whether a node reboot was triggered by the software watchdog?


Hello,

I loaded the software watchdog module with this command:
Code:
$ sudo modprobe softdog soft_margin=60
The kernel log then showed this message:
Code:
[    3.757002] softdog: Software Watchdog Timer: 0.08 initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0)
When I trigger a node reboot with the command below, the node reboots, but the log contains no record that softdog triggered the reboot:
Code:
echo a | sudo tee /dev/watchdog
If the software watchdog is loaded with the soft_noboot=1 option, to verify softdog without an actual reboot, then the event is logged:
Code:
softdog: Triggered - Reboot ignored

Based on the softdog implementation, a message should be logged before the reboot:
https://github.com/spacex/kernel-cen...hdog/softdog.c

Code:
static void watchdog_fire(unsigned long data)
{
	if (soft_noboot)
		pr_crit("Triggered - Reboot ignored\n");
	else if (soft_panic) {
		pr_crit("Initiating panic\n");
		panic("Software Watchdog Timer expired");
	} else {
		pr_crit("Initiating system reboot\n");
		emergency_restart();
		pr_crit("Reboot didn't ?????\n");
	}
}

OS: CentOS Linux release 7.9.2009 (Core)
Linux test1 3.10.0-1160.62.1.el7.x86_64 #1 SMP Tue Apr 5 16:57:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

It looks like the node was rebooted before the log message was written to disk.
Is there some way to verify that the node was rebooted by the software watchdog?

Thank you
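
For reference, the test command above arms the timer rather than firing it immediately: opening /dev/watchdog starts the countdown, and closing it without first writing the magic character 'V' leaves it running, so the reboot follows roughly soft_margin seconds later. A minimal sketch of arming and cleanly disarming the device, assuming the driver supports magic close and nowayout=0 (as in the init message above). The function names and the optional device-path argument are hypothetical, not part of any existing tool:

```shell
#!/bin/sh
# Hypothetical helpers; the device path argument defaults to /dev/watchdog.

# Arm the watchdog: opening the device starts the timer, and closing it
# without the magic character leaves the timer running.
wd_arm() {
    printf 'a' > "${1:-/dev/watchdog}"
}

# Disarm the watchdog: writing 'V' right before close asks the driver
# to stop the timer (only honoured when nowayout=0).
wd_disarm() {
    printf 'V' > "${1:-/dev/watchdog}"
}

# Real use (as root):
#   wd_arm       # reboot expected after soft_margin seconds
#   wd_disarm    # cancel the pending reboot
```

Passing a scratch file as the argument exercises the helpers without touching the real device.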

Last edited by msutic; 07-22-2022 at 01:54 AM.
 
07-20-2022, 08:02 PM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,160

Rep: Reputation: 1266
Also check what the "last" command reports as the reboot reason.
 
07-21-2022, 01:18 AM   #3
msutic
LQ Newbie
 
Registered: Jul 2022
Posts: 5

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by smallpond
also check what is reported by the "last" command for reboot reason.
With the "last" command I can find when the reboot happened and whether it was graceful, but not the reason.

I have inspected the audit logs and configured a persistent systemd journal (read with journalctl), but there is still no record that watchdog/softdog triggered the reboot.
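
With a persistent journal, the kernel messages of the previous boot can be searched directly. A minimal sketch, assuming journalctl supports -k (kernel messages) and -b -1 (previous boot); the filter function name is made up for illustration. If the message never reached disk before the restart, the search simply comes back empty, which matches the behaviour described here.

```shell
#!/bin/sh
# Hypothetical filter: keep only softdog/watchdog lines read from stdin.
wd_log_filter() {
    grep -iE 'softdog|watchdog'
}

# Real use, kernel log of the previous boot:
#   journalctl -k -b -1 --no-pager | wd_log_filter
```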
 
07-21-2022, 02:29 AM   #4
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,041

Rep: Reputation: 7348
First I would check whether a deliberately triggered reboot creates a log event anywhere. Then you will know whether that last reboot was really triggered by the watchdog or whether something else happened.
 
07-22-2022, 01:50 AM   #5
msutic
LQ Newbie
 
Registered: Jul 2022
Posts: 5

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by pan64
At first I would try to check if the triggered reboot will create any log event anywhere. And you will know if that last reboot was really triggered by it or something else happened.
When I trigger the reboot manually by intentionally writing something to the "/dev/watchdog" device, I see the message "watchdog: watchdog0: watchdog did not stop!".
But when the watchdog triggers a restart because the node is overloaded, or when it is used with a Patroni cluster (timer not updated), nothing is logged at all.

This looks like a bug to me. If "Initiating system reboot" is supposed to be logged before "emergency_restart()", then I would expect to see it, at least when the node is idle.
Otherwise, what is the purpose of this log message?!
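
One possible way to get a persistent record is to let the watchdog panic instead of calling emergency_restart(), since the module exposes soft_panic (see the init message in post #1). This is only a sketch under assumptions: kdump is configured and saves vmcore-dmesg.txt under /var/crash (the usual CentOS 7 location, as far as I know), and kernel.panic is set as a fallback so the node still reboots if kdump is not loaded. The helper name and its directory argument are hypothetical.

```shell
#!/bin/sh
# Setup (as root), shown as comments because it changes system state:
#   modprobe -r softdog
#   modprobe softdog soft_margin=60 soft_panic=1
#   sysctl -w kernel.panic=10     # fallback: reboot 10 s after a panic

# Hypothetical helper: list crash dumps whose saved dmesg contains the
# panic string used by watchdog_fire().
wd_find_panics() {
    grep -l 'Software Watchdog Timer expired' \
        "${1:-/var/crash}"/*/vmcore-dmesg.txt 2>/dev/null
}

# Real use after the node comes back up:
#   wd_find_panics
```

Any file it prints belongs to a crash caused by the softdog panic path, which would separate watchdog reboots from other restarts.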
 
  


All times are GMT -5.
