Commit Graph

44 Commits

Author SHA1 Message Date
Chuck Silvers
f0f3e3e961 ipmi: use a queue for kcs driver requests when possible
The ipmi watchdog pretimeout action can trigger unintentionally in
certain rare, complicated situations.  What we have seen at Netflix
is that the BMC can sometimes be sent a continuous stream of
writes to port 0x80, and due to what is a bug or misconfiguration
in the BMC software, this results in the BMC running out of memory,
becoming very slow to respond to KCS requests, and eventually being
rebooted by its own internal watchdog.  While that is going on in
the BMC, back in the host OS, a number of requests are pending in
the ipmi request queue, and the kcs_loop thread is working on
processing these requests.  All of the KCS accesses to process
those requests are timing out and eventually failing because the
BMC is responding very slowly or not at all, and the kcs_loop thread
is holding the IPMI_IO_LOCK the whole time that is going on.
Meanwhile the watchdogd process in the host is trying to pat the
BMC watchdog, and this process is sleeping waiting to get the
IPMI_IO_LOCK.  It's not entirely clear why the watchdogd process
is sleeping for this lock, because the intention is that a thread
holding the IPMI_IO_LOCK should not sleep and thus any thread
that wants the lock should just spin to wait for it.  My best guess
is that the kcs_loop thread is spinning waiting for the BMC to
respond for so long that it is eventually preempted, and during
the brief interval when the kcs_loop thread is not running,
the watchdogd thread notices that the lock holder is not running
and sleeps.  When the kcs_loop thread eventually finishes processing
one request, it drops the IPMI_IO_LOCK and then immediately takes the
lock again so it can process the next request in the queue.
Because the watchdogd thread is sleeping at this point, the kcs_loop
always wins the race to acquire the IPMI_IO_LOCK, thus starving
the watchdogd thread.  The callout for the watchdog pretimeout
would be reset by the watchdogd thread after its request to the BMC
watchdog completes, but since that request never processed, the
pretimeout callout eventually fires, even though there is nothing
actually wrong with the host.

To prevent this saga from unfolding:

 - when kcs_driver_request() is called in a context where it can sleep,
   queue the request and let the worker thread process it rather than
   trying to process in the original thread.
 - add a new high-priority queue for driver requests, so that the
   watchdog patting requests will be processed as quickly as possible
   even if lots of application requests have already been queued.

With these two changes, the watchdog pretimeout action does not trigger
even if the BMC is completely out to lunch for long periods of time
(as long as the watchdogd check command does not also get stuck).

Sponsored by:	Netflix
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D36555
2022-11-01 10:55:14 -07:00
Eugene Grosbein
6d9d4b2da8 ipmi(4): spelling fix cyle_wait -> cycle_wait
There are no consumers of hw.ipmi.cyle_wait in our tree.
Also the knob is undocumented, so it should be safe to fix its name.
No MFC planned, though.
2022-07-20 18:32:24 +07:00
Philip Paeps
c4995b69db ipmi: fix a use-after-free bug in error handling
18db96dbfd introduced a use-after-free bug
in the error handling of the IPMICTL_RECEIVE_MSG ioctl.

Reported by:    Coverity (CID 1490456) (via vangyzen)
Differential Revision:	https://reviews.freebsd.org/D35605
2022-07-08 11:49:54 +08:00
Yuri
177f8b3294 ipmi: do not omit lun in BMC addresses
Some systems put sensors on non-0 lun, so we should not omit it.  This
was the only difference with the Linux driver, where DIMM sensors could
be queried, but not on FreeBSD.

See this report[1] on the FreeBSD forums:
https://forums.freebsd.org/threads/freebsd-cannot-get-dimm-temperature-sensor-value.85166/

Reviewed by:	philip
Tested by:	Andrey Lanin[1]
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D35612
2022-07-04 14:30:39 +08:00
Yuri
18db96dbfd ipmi: correctly handle ipmb requests
Handle IPMB requests using SEND_MSG (sent as driver request as we do not
need to return anything back to userland for this) and GET_MSG (sent as
usual request so we can return the data for RECEIVE_MSG ioctl) pair.

This fixes fetching complete sensor data from boards (e.g. HP ProLiant
DL380 Gen10).

Reviewed by:	philip
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D35605
2022-07-04 13:00:42 +08:00
John Baldwin
fd773e2bbf ipmi: Remove unused devclass arguments to DRIVER_MODULE. 2022-05-09 12:22:02 -07:00
John Baldwin
ac56d90a49 ipmi: Use devclass_find to lookup ipmi devclass in ipmi_unload.
Differential Revision:	https://reviews.freebsd.org/D35061
2022-05-05 16:34:33 -07:00
Alexander Motin
016d18229c ipmi: Make all sysctls also tunables.
MFC after:	1 week
2022-03-17 13:34:15 -04:00
Alexander Motin
6c2d440416 ipmi(4): Limit maximum watchdog pre-timeout interval.
Previous code by default setting pre-timeout interval to 120 seconds
made impossible to set timeout interval below that, resulting in error
0xcc (Invalid data field in Request) at least on Supermicro boards.

To fix that limit maximum pre-timeout interval to ~1/4 of the timeout
interval, that sounds like a reasonable default: not too short to fire
too late, but also not too long to give many false reports.

MFC after:	2 weeks
2021-09-14 21:06:39 -04:00
Wojciech Macek
e3500c602b ipmi: fix negative logic in watchdog control flag
Use wd_enable instead of wd_disable
2021-08-18 08:21:14 +02:00
Wojciech Macek
e8ad0a0059 ipmi: New tunable to deactivate IPMI watchdog
In case we want to use other WD than IPMI-provided, add
sysctl to disable initialization.

Obtained from:		Semihalf
Sponsored by:		Stormshield
Differential revision:	https://reviews.freebsd.org/D31548
2021-08-17 08:31:00 +02:00
Alexander Motin
9d3b47abbb ipmi(4): Add more watchdog error checks.
Add request submission status checks before checking req->ir_compcode,
otherwise it may be zero just because of initialization.

Add checks for req->ir_compcode errors in ipmi_reset_watchdog() and
ipmi_set_watchdog().  In first case explicitly check for 0x80, which
means timer was not previously set, that I found happening after BMC
cold reset.  This change makes watchdog timer to recover instead of
permanently ignoring reset errors after BMC reset or upgraded.

MFC after:	2 weeks
Sponsored by:   iXsystems, Inc.
2021-07-29 23:39:04 -04:00
Brooks Davis
562894f0dc Centralize compatability translation macros.
Copy the CP, PTRIN, etc macros from freebsd32.h into a sys/abi_compat.h
and replace existing definitation with includes where required. This
eliminates duplicate code and allows Linux and FreeBSD compatability
headers to be included in the same files.

Input from:	cem, jhb
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24275
2020-04-14 20:30:48 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Conrad Meyer
26649bb5e8 efirt: When present, attempt to use EFI runtime services to shutdown
PR:		maybe related to 233998 (inconclusive at this time)
Submitted by:	byuu <byuu AT tutanota.com> (previous version)
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D18506
2018-12-15 05:46:04 +00:00
Doug Ambrisko
3991dbf3fa Fix a module Makefile error on amd64 so the IPMI HW interfaces are built.
When the module is being unloaded and no HW interfaces were created don't
clean up.  This was exposed by the amd64 module build issue.
2018-08-16 15:59:02 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 14:52:40 +00:00
Peter Wemm
9ee3ea71b3 As a follow-on to r325378, make the shutdown timer default to 0 as well.
Otherwise an orderly shutdown will initiate a watchdog that will cause
a 7 minute delayed reboot *by default*,  In the freebsd.org cluster's case
this often worked out be a surprise reboot a minute or two after the
machine came back up.
2017-11-05 05:05:18 +00:00
Warner Losh
c154763db1 Make the startup timeout 0 seconds by default rathern than 420s. This
makes the default fail safe when watchdogd is disabled (which is also
the default).

Sponsored by
2017-11-04 03:01:58 +00:00
Warner Losh
16f0063e99 Make time we wait for a power cycle tunable.
hw.ipmi.cycle_time is the time to wait for the power down phase of the
ipmi power cycle before falling back to either reboot or halt.

Sponsored by: Netflix
2017-10-26 22:53:02 +00:00
Warner Losh
14d004507e Various IPMI watchdog timer improvements
o Make hw.ipmi.on a tuneable
o Changes to keep shutdown from hanging indefinitately after the wd
  would normally have been disabled.
o Add support for setting pretimeout (which fires an interrupt
  some time before the actual watchdog expires)
o Allow refinement of the actions to take when the watchdog expires
o Allow special startup timeout to keep us from hanging in boot
  before watchdogd is started, but after we've loaded the kernel.

Obtained From: Netflix OCA Firmware
2017-10-26 22:52:51 +00:00
Warner Losh
1170c2fecc Implement IPMI support for RB_POWRECYCLE
Some BMCs support power cycling the chassis via the chassis control
command 2 subcommand 2 (ipmitool called it 'chassis power cycle').  If
the BMC supports the chassis device, register a shutdown_final handler
that sends the power cycle command if request and waits up to 10s for
it to take effect. To minimize stack strain, we preallocate a ipmi
request in the softc. At the moment, we're verbose about what we're
doing.

Sponsored by: Netflix
2017-10-25 15:30:53 +00:00
Alexander Motin
ea2ef9931c Optimize IPMI watchdog patting.
Set watchdog timer parameters only when they really need to be changed.
In other cases just restart the timer with single Reset command instead
of two (Set and Reset).

From one side this visually reduces amount of CPU time burned in tight
loop waiting while some slow BMC configures its watchdog hardware, that
seems to be much more complicated task then just resetting the timer.

From another side on some BMCs those slow Set commands sometimes tend to
timeout, that leads to noisy log messages and even more CPU time burned,
so avoiding them can provide even bigger bonuses.

MFC after:	2 weeks
2016-03-22 06:24:52 +00:00
John Baldwin
9662eef57a Watchdog drivers need to support rearming the watchdog in contexts which
are not permitted to sleep.  Only use the IPMI watchdog with backends
which poll driver-initiated requests to meet this requirement.

In practice this means that watchdogs will no longer be used on systems
that use the SSIF backend.

Differential Revision:	https://reviews.freebsd.org/D2062
MFC after:	2 weeks
2015-04-24 16:56:23 +00:00
John Baldwin
c869aa71f0 Use direct hardware access for internal requests for KCS and SMIC. In
particular, updates to the watchdog should no longer sleep.
- Add a new IPMI_IO_LOCK for low-level I/O access.  Use this for
  kcs_polled_request() and smic_polled_request().
- Add a new backend callback "ipmi_driver_request" to handle a driver
  request.  The new callback performs the request sychronously for KCS
  and SMIC.  SSIF still defers the work to the worker thread since the
  worker thread sleeps during request processing anyway.
- Allocate driver requests on the stack rather than using malloc().

Differential Revision:	https://reviews.freebsd.org/D1723
Tested by:	scottl
MFC after:	2 weeks
2015-02-06 16:45:10 +00:00
Gleb Smirnoff
a9b3c1bf05 Provide a crutch that prevents watchdog to interrupt dumping
on a box with IPMI enabled.

Okay from:	jhb
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-31 05:13:53 +00:00
John Baldwin
1710852ebd Don't try to stop the IPMI watchdog timer if it is not running.
Starting or stopping the IPMI watchdog is rather expensive with the
current implementation as all IPMI requests are bounced via thread.
This is not viable during shutdown or dumps, and this avoids headache
in the common case that the watchdog is not enabled.  The IPMI watchdog
should probably be reworked to not use a separate thread to fix this
in the case when the watchdog timer is enabled.

MFC after:	2 weeks
2012-08-07 12:40:31 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Ed Schouten
d745c852be Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
Ruslan Ermilov
14689886a5 Fixed firmware revision decoding:
- the major is 7-bit binary encoded
- the minor is BCD encoded

PR:		kern/151586
MFC after:	3 days
2011-04-14 07:14:22 +00:00
Ruslan Ermilov
a46a1e767d - Fixed incorrect watchdog timeout setting: MSB of a 2-byte
value is obtained by dividing it by 256, not by 2550; also,
  one second is 10^9 nanoseconds, not 1800000000 nanoseconds.

- Due to rounding error, setting watchdog to a really small
  timeout (<1 sec) was turning the watchdog off.  It should
  set the watchdog to a small timeout instead.

- Implemented error checking in ipmi_wd_event(), as required
  by watchdog(9).

PR:		kern/130512
Submitted by:	Dmitrij Tejblum

- Additionally, check that the timeout value is within the
  supported range, and if it's too large, act as required by
  watchdog(9).

MFC after:	3 days
2009-12-18 12:10:42 +00:00
David E. O'Brien
c12dbd1d96 Fix typo where the code was missing the "IPMICTL_RECEIVE_MSG_32" condition
test.
2008-11-14 01:53:10 +00:00
John Baldwin
943bebd2c9 Remove hack attempt at using devfs cloning for per-file descriptor storage.
Use the much simpler cdevpriv for per-fd state and enable it.  This allows
multiple opens of /dev/ipmi0 (e.g. using ipmitool while ipmievd is running
in the background).

MFC after:	1 week
2008-08-28 02:13:53 +00:00
Nick Hibma
f29fa1dfa4 Revisit the watchdogs: Resetting the error to EINVAL after failing to set the
watchdog might hide the succesful arming of an earlier one. Accept that on
failing to arm any watchdog (because of non-supported timeouts) EOPNOTSUPP is
returned instead of the more appropriate EINVAL.

MFC after:	3 days
2007-03-27 21:03:37 +00:00
Paolo Pisati
ef544f6312 o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@
2007-02-23 12:19:07 +00:00
Nick Hibma
9079fff550 Align the interfaces for the various watchdogs and make the interface
behave as expected.

Also:
- Return an error if WD_PASSIVE is passed in to the ioctl as only
  WD_ACTIVE is implemented at the moment. See sys/watchdog.h for an
  explanation of the difference between WD_ACTIVE and WD_PASSIVE.
- Remove the I_HAVE_TOTALLY_LOST_MY_SENSE_OF_HUMOR define. If you've
  lost your sense of humor, than don't add a define.

Specific changes:

i80321_wdog.c
  Don't roll your own passive watchdog tickle as this would defeat the
  purpose of an active (userland) watchdog tickle.

ichwd.c / ipmi.c:
  WD_ACTIVE means active patting of the watchdog by a userland process,
  not whether the watchdog is active. See sys/watchdog.h.

kern_clock.c:
  (software watchdog) Remove a check for WD_ACTIVE as this does not make
  sense here. This reverts r1.181.
2006-12-15 21:44:49 +00:00
John Baldwin
d78cd1ad55 Fix some edge cases in detach() as well as a memory leak if we fail to
talk to the BMC.

Reported by:	Alexander Logvinov : ports at logvinov_com
MFC after:	1 week
2006-12-06 15:10:11 +00:00
John Baldwin
5283d39b98 ipmi_polled_enqueue_request() is already called with the lock held, just
assert it rather than recursing.

Reported by:	mjacob
Pointy hat:	jhb
MFC after:	3 days
2006-10-12 16:26:42 +00:00
John Baldwin
bec0c98eae Fix a memory leak in ipmi_unload().
CID:		1542
Found by:	Coverity Prevent
2006-09-26 15:48:13 +00:00
John Baldwin
d72a078647 Update the ipmi(4) driver:
- Split out the communication protocols into their own files and use
  a couple of function pointers in the softc that the commuication
  protocols setup in their own attach routine.
- Add support for the SSIF interface (talking to IPMI over SMBus).
- Add an ACPI attachment.
- Add a PCI attachment that attaches to devices with the IPMI interface
  subclass.
- Split the ISA attachment out into its own file: ipmi_isa.c.
- Change the code to probe the SMBIOS table for an IPMI entry to just use
  pmap_mapbios() to map the table in rather than trying to setup a fake
  resource on an isa device and then activating the resource to map in the
  table.
- Make bus attachments leaner by adding attach functions for each
  communication interface (ipmi_kcs_attach(), ipmi_smic_attach(), etc.)
  that setup per-interface data.
- Formalize the model used by the driver to handle requests by adding an
  explicit struct ipmi_request object that holds the state of a given
  request and reply for the entire lifetime of the request.  By bundling
  the request into an object, it is easier to add retry logic to the various
  communication backends (as well as eventually support BT mode which uses
  a slightly different message format than KCS, SMIC, and SSIF).
- Add a per-softc lock and remove D_NEEDGIANT as the driver is now MPSAFE.
- Add 32-bit compatibility ioctl shims so you can use a 32-bit ipmitool
  on FreeBSD/amd64.
- Add ipmi(4) to i386 and amd64 NOTES.

Submitted by:	ambrisko (large portions of 2 and 3)
Sponsored by:	IronPort Systems, Yahoo!
MFC after:	6 days
2006-09-22 22:11:29 +00:00
Poul-Henning Kamp
c40da00ca3 Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.
2006-05-16 14:37:58 +00:00
Doug Ambrisko
19d61f3f27 Don't conflict with the DEBUG define. 2006-02-13 16:50:45 +00:00
Doug Ambrisko
37b1ce132c Add an OpenIPMI mostly compatible driver. This driver was developed
to work with ipmitools.  It works with other tools that have an OpenIPMI
driver interface.  The port will need to get updated to used this.
I have not implemented the IPMB mode yet so ioctl's for that don't
really do much otherwise it should work like the OpenIPMI version.
The ipmi.h definitions was derived from the ipmitool header file.
The bus attachments are done for smbios and pci/smbios.  Differences
in bus probe order for modules/static are delt with.  ACPI attachment
should be done.

This drivers registers with the watchdod(4) interface

Work to do:
     - BT interface
     - IPMB mode

This has been tested on Dell PE2850, PE2650 & PE850 with i386 & amd64
kernel.

I will link this into the build on next week.

Tom Rhodes, helped me with the man page.

Sponsored by:   IronPort Systems Inc.
Inspired from:  ipmitool & Linux
2006-02-10 20:51:35 +00:00