Watchdogd currently disables the watchdog when it exits, such as during
rc.shutdown processing. That leaves the system vulnerable to getting hung
or deadlocked during the shutdown part of a reboot. For embedded systems
it's especially important that the hardware watchdog always be active. It
can also be useful for servers that are administered remotely.
The new -x <seconds> option tells watchdogd to program the watchdog with the
given timeout just before exiting. The -x value can be longer or shorter
than the -t normal time value, to allow for various exceptional conditions
at shutdown such as allowing extra time for buffer flushing.
The exit value is also used internally in the "failsafe" handling (which
used to just disable the watchdog), on the theory that if you're using this
option, "safe" means having the watchdog always running, not disabled.
The default is still to disable the watchdog on exit if -x is not specified.
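The exit behaviour described above can be sketched as a small decision
function. This is only an illustration, not the code from the commit: the
EX_ constants and the function name are hypothetical stand-ins for whatever
<sys/watchdog.h> and watchdogd actually use.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real constants live in <sys/watchdog.h>. */
#define EX_WD_TO_NEVER	0u		/* assumed "disable the watchdog" value */
#define EX_WD_ACTIVE	0x8000000u	/* assumed "arm the watchdog" flag */

/*
 * Pick the value to program into the watchdog as the daemon exits.
 * have_exit_timeout says whether -x was given; exit_timeout is the
 * already-encoded -x value.
 */
static unsigned int
ex_exit_watchdog_value(bool have_exit_timeout, unsigned int exit_timeout)
{
	if (!have_exit_timeout)
		return (EX_WD_TO_NEVER);	/* default: disable on exit */
	return (EX_WD_ACTIVE | exit_timeout);	/* -x: leave it running */
}
```

Under the same assumption, the "failsafe" path would call the same helper,
so that it only disables the watchdog when -x is absent.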
Differential Revision: https://reviews.freebsd.org/D2556 (timed out)
Previously, we had a nap interval of 1 second with a default timeout of
128 seconds, which could be overkill, and for some hardware the patting
action may be expensive.
Note that the choice of nap interval is still arbitrary. We preferred
a safe value where even when the system is very heavily loaded, the
watchdog should not shoot the system down if it's not really hung.
According to the manual page of Linux's watchdog daemon, the nap interval
time of theirs is 10 seconds, which seems to be a reasonable value --
according to Intel documentation AP-725 (Document Number: 292273-001),
ICH5's maximum timeout is about 37.5 seconds, which the ichwd(4) driver
would set when we requested 128 seconds (although it should probably
report this as an error and not set the timeout). Since that's
the shortest maximum value, 10 seconds seems to be the right choice for
us too.
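To make the margin concrete, here is a rough sketch that counts how many
pat attempts fit before a watchdog expires for a given nap interval. The
function name is hypothetical, and milliseconds are used only so that
37.5 seconds can be written exactly.

```c
/*
 * Number of scheduled pats that land strictly before the watchdog expires,
 * counting from the last successful pat. Both arguments are milliseconds.
 * Returns 0 when the nap interval is not shorter than the timeout.
 */
static long
ex_pat_opportunities(long timeout_ms, long nap_ms)
{
	if (timeout_ms <= 0 || nap_ms <= 0)
		return (0);
	/* Pats are due at nap, 2*nap, ...; keep those before the expiry. */
	return ((timeout_ms - 1) / nap_ms);
}
```

With a 10 second nap and the 37.5 second ICH5 limit this gives three
attempts, so up to two of them can be delayed by load before the watchdog
fires.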
Discussed with: alfred
MFC after: 1 month
The original API takes timeouts as pow2ns (2^N nanoseconds), whereas the
new APIs from Linux take seconds.
We need to be able to convert between 2^N ns and seconds in both
userland and the kernel to fix this and to compare units properly.
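The unit conversion can be sketched as follows. These helpers are
illustrative only; their names are hypothetical and they are not the
functions added by the commit. Seconds are rounded up to the next power of
two, and the reverse direction rounds down to whole seconds.

```c
#include <stdint.h>

#define EX_NS_PER_SEC	1000000000ULL

/*
 * Smallest N such that 2^N nanoseconds is at least `seconds` seconds.
 * Assumes seconds * 10^9 fits in 64 bits.
 */
static int
ex_sec_to_pow2ns(uint64_t seconds)
{
	uint64_t ns = seconds * EX_NS_PER_SEC;
	int n = 0;

	while (n < 63 && (1ULL << n) < ns)
		n++;
	return (n);
}

/* Whole seconds covered by a 2^N nanosecond timeout, rounded down. */
static uint64_t
ex_pow2ns_to_sec(int n)
{
	if (n < 0 || n > 63)
		return (0);
	return ((1ULL << n) / EX_NS_PER_SEC);
}
```

Because the encoding is a power of two, a round trip is lossy: 128 seconds
maps to N = 37, and 2^37 ns is about 137 seconds.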
Don't carp about the watchdog command taking too long until after the
watchdog has been patted, and don't carp via warnx(3) unless -S is set
since syslog(3) already logs to standard error otherwise.
Discussed with: alfred
Reviewed by: alfred
Approved by: emaste (co-mentor)
The following support was added to watchdog(4):
- Support to query the outstanding timeout.
- Support to set a software pre-timeout function watchdog with an 'action'
- Support to set a software only watchdog with a configurable 'action'
'action' can be a mask specifying a single operation or a combination of:
log(9), printf(9), panic(9) and/or kdb_enter(9).
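As an illustration of how such a mask might be assembled, the sketch below
parses a comma-separated list into flag bits. The EX_ACT_ values, the
keyword spellings and the function names are all hypothetical; the real
definitions are in <sys/watchdog.h>.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values, one bit per action. */
#define EX_ACT_PANIC	0x01	/* panic(9) */
#define EX_ACT_DDB	0x02	/* kdb_enter(9) */
#define EX_ACT_LOG	0x04	/* log(9) */
#define EX_ACT_PRINTF	0x08	/* printf(9) */

/* Map one keyword of the given length to its flag; 0 if unknown. */
static int
ex_action_flag(const char *name, size_t len)
{
	static const struct { const char *name; int flag; } table[] = {
		{ "panic", EX_ACT_PANIC }, { "ddb", EX_ACT_DDB },
		{ "log", EX_ACT_LOG }, { "printf", EX_ACT_PRINTF },
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strlen(table[i].name) == len &&
		    strncmp(table[i].name, name, len) == 0)
			return (table[i].flag);
	return (0);
}

/* Parse "log,panic" style input into a mask; -1 on an unknown keyword. */
static int
ex_parse_actions(const char *spec)
{
	int mask = 0;

	while (*spec != '\0') {
		size_t len = strcspn(spec, ",");
		int flag = ex_action_flag(spec, len);

		if (flag == 0)
			return (-1);
		mask |= flag;
		spec += len;
		if (*spec == ',')
			spec++;
	}
	return (mask);
}
```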
Support the following in watchdogd(8):
- Support to utilize the new additions to watchdog(4).
- Support to warn if a watchdog script runs for too long.
- Support for "dry run" where we do not actually arm the watchdog,
but only report on our timing.
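The "runs for too long" check amounts to comparing the elapsed time of the
script against a limit. A minimal sketch, with hypothetical helper names,
assuming the caller takes the two timestamps around the script (for example
with clock_gettime(2)):

```c
#include <stdbool.h>
#include <time.h>

/* Elapsed time between two timestamps, in whole milliseconds. */
static long long
ex_elapsed_ms(const struct timespec *start, const struct timespec *end)
{
	long long ns;

	ns = ((long long)end->tv_sec - (long long)start->tv_sec) * 1000000000LL;
	ns += (long long)end->tv_nsec - (long long)start->tv_nsec;
	return (ns / 1000000LL);
}

/* True when the script took longer than limit_ms to finish. */
static bool
ex_ran_too_long(const struct timespec *start, const struct timespec *end,
    long long limit_ms)
{
	return (ex_elapsed_ms(start, end) > limit_ms);
}
```

In a dry run the same measurement could simply be reported, without the
watchdog ever being armed.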
Sponsored by: iXsystems, Inc.
MFC after: 1 month
This uses the recently-added jemalloc(3) feature of setting the lg_chunk
tuning option to zero to request that memory be allocated in the smallest
chunks possible. Without this option, the default is to initially map 8MB,
and then the mlockall() call wires that entire allocation even though the
program only uses a few Kbytes of it at runtime.
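A minimal sketch of the combination described above, assuming a jemalloc
version that still honours lg_chunk and reads the global malloc_conf string
before the first allocation. The helper name is hypothetical. On systems
with a different allocator the string is simply an unused global.

```c
#include <stdio.h>
#include <sys/mman.h>

/*
 * Read by jemalloc(3) at startup; requests the smallest possible chunks
 * so that mlockall() has less unused memory to wire.
 */
const char *malloc_conf = "lg_chunk:0";

/* Wire current and future mappings; returns 0 on success, -1 on failure. */
static int
ex_wire_process(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
		perror("mlockall");
		return (-1);
	}
	return (0);
}
```

The setting has to be a global initializer rather than a call in main(),
since the allocator may already have mapped its first chunk by then.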
PR: bin/173332
Approved by: cognet (mentor)
On machines with a huge amount of swap and high I/O activity,
watchdogd(8) may wait for swapped-out memory for longer than the timeout,
and the watchdog sometimes fires.
Approved by: kib (mentor)
MFC after: 1 week
I was considering committing all these patches one by one, but as
discussed with brooks@, there is no need to do this. If we ever
need/want to merge these changes back, it is still possible to do this
per application.
generic watchdog(9) interface.
Make watchdogd(8) perform as watchdog(8) as well, and make it
possible to specify a check command to run, timeout and sleep
periods.
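One way to picture the check command is as a gate on each pat: the daemon
pats only if the command exits successfully, then sleeps. The sketch below
is an assumption about that shape, not the committed code; the function
name is hypothetical and a NULL command is treated as "always healthy".

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run the check command through the shell. Returns 1 if the watchdog
 * should be patted (no command configured, or it exited with status 0),
 * and 0 otherwise.
 */
static int
ex_check_passes(const char *cmd)
{
	int status;

	if (cmd == NULL)
		return (1);
	status = system(cmd);
	return (status != -1 && WIFEXITED(status) &&
	    WEXITSTATUS(status) == 0);
}
```

A main loop built on this would call ex_check_passes(), pat the watchdog
with the configured timeout on success, and then sleep for the configured
period.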
Update watchdog(4) to talk about the generic interface and add
new watchdog(8) page.
- Reordered #includes
- Only include <sys/types.h>, not both it and <sys/cdefs.h>
o style.Makefile(5) fixes
- No SRCS= line when only one src file with same name as program
o Use warn()/errx() instead of fprintf()
- Integrated patch from Philippe Charnier <charnier@xp11.frmug.org>
Approved by: jeff (mentor)
This commit has two pieces. One half is the watchdog kernel code which lives
primarily in hardclock() in sys/kern/kern_clock.c. The other half is a userland
daemon which, when run, will keep the watchdog from firing while the userland
is intact and functioning.
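The split between the two halves can be modelled with a toy countdown. This
is a userland simulation under assumed names, not the kernel code from the
commit: ex_wd_tick() plays the role of the per-tick check in hardclock(),
and ex_wd_pat() plays the role of the daemon.

```c
/* Toy model of a software watchdog driven by clock ticks. */
struct ex_watchdog {
	int reload;	/* ticks granted by each pat */
	int remaining;	/* ticks left before firing */
	int fired;	/* set once the countdown reaches zero */
};

static void
ex_wd_init(struct ex_watchdog *wd, int ticks)
{
	wd->reload = ticks;
	wd->remaining = ticks;
	wd->fired = 0;
}

/* Kernel side: called once per clock tick. */
static void
ex_wd_tick(struct ex_watchdog *wd)
{
	if (wd->fired)
		return;
	if (--wd->remaining <= 0)
		wd->fired = 1;
}

/* Userland side: a healthy daemon calls this to restart the countdown. */
static void
ex_wd_pat(struct ex_watchdog *wd)
{
	if (!wd->fired)
		wd->remaining = wd->reload;
}
```

As long as the daemon keeps getting scheduled often enough to pat, the
countdown never reaches zero; if userland wedges, the ticks keep arriving
and the watchdog fires.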
Approved by: jeff (mentor)