Commit Graph

163 Commits

Author SHA1 Message Date
Warner Losh
685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Warner Losh
59dc489a7e mpr: Fix minor 'typos' comment
moving -> removing (we're removing the device)
CAM_REQ_CMO_ERROR -> CAM_REQ_ERR (the former isn't a thing)

Sponsored by:		Netflix
2023-07-20 17:18:28 -06:00
Mariusz Zaborski
444c661545 mpr: don't use hardcoded value in debug branch
Pointed out by:	imp
Sponsored by:   Klara Inc.
2023-04-21 10:01:38 +02:00
Mariusz Zaborski
ea6597c38c mpr: fix copying of event_mask
Before the commit 6cc44223cb the
field event_mask was fully copied to the EventMasks field.
After this commit the event_mask (uint8_t) is 4 times casted to
EventMask (uint32_t). Because of that 24 bits of each event_mask array
is lost.

This commits brings back simple copying of field, and after words
converting 32 bits field to the requested endian.

I don't think we need more sophisticated method,
as the array is of size 4 (for 32 bits version).

Reviewed by:	imp
MFC after:	1 week
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D39562
2023-04-21 10:01:38 +02:00
Alan Somers
72aad3f902 Fix kernel memory disclosures in mpr and mps
In every mpr and mps ioctl that copies kernel data to userland, validate
that the requested length does not exceed the size of the kernel's
buffer.

Note that all of these ioctls already required root access.

MFC after:	2 weeks
Sponsored by:	Axcient
Reviewed by:	imp
Differential Revision: https://reviews.freebsd.org/D38842
2023-03-02 13:31:06 -07:00
Kenneth D. Merry
11778fca4a Fix mpr(4) panic during a firmware update.
Issue Description:
The RequestCredits field of IOCFacts got changed between the Phase23
firmware to Phase24 firmware. So as part of firmware update operation,
driver has to free the resources & pools which are created with the Phase23
Firmware's IOCFacts data (i.e. during driver load time) and has to
reallocate the resources and pools using Phase24's IOCFacts data. Here
driver has freed the interrupts but missed to reallocate the interrupts and
hence config page read operation is getting timed out and controller is
going for recursive reinit (controller reset) operations and leading to
kernel panic.

Fix:
Reallocate the interrupts if the interrupts are disabled as part of
firmware update/downgrade operation.

Submitted by:	Sreekanth Ready <sreekanth.reddy@broadcom.com>
Tested by:	ken
MFC after:	3 days
2022-10-17 12:48:34 -04:00
Gordon Bergling
2f9de90b22 mpr(4): Remove a double word in a source code comment
- s/the the/the/

MFC after:	3 days
2022-09-04 13:46:44 +02:00
John Baldwin
d48ddc5f96 mpr/mps/mpt: Remove unused devclass arguments to DRIVER_MODULE. 2022-05-06 15:39:32 -07:00
Warner Losh
ca420b4ef2 mpr/mps: when sending reset on removal, include target in message
It's possible for muliple drives to be departing at the same time (if
the common power rail the share goes dark, for example). To understand
what's going on better, include target and handle in the messages
announcing the reset to allow matching with other corresponding events.

MFC After:		3 days
Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D35092
2022-04-28 16:30:00 -06:00
Warner Losh
c5041b4ee8 mpr/mps: Add comment explaining state transition
When we can't load a request due to a shortage of chains, we complete
the command's cm. However, to avoid an assert in mp?_complete_command,
we transition its state to INQUEUE. This transition is legitimate
because this is the only error path that terminates a cm before it's
enqueued and the only other alternative would be an additional transient
state that would add complexity w/o adding value. Add a comment
explainging all this because otherwise the transition can look a bit
weird.

Sponsored by:		Netflix
2022-04-28 11:19:39 -06:00
Ed Maste
27ac4281fd mpr: add \n in diagnostic printf
Diff reduction between mpr and mps.

Fixes:		e2997a03b7 ("Diagnostic buffer fixes for the mps(4)...")
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2022-03-30 20:14:07 -04:00
Ed Maste
8276c4149b mpr/mps/mpt: verify cfg page ioctl lengths
*_CFG_PAGE ioctl handlers in the mpr, mps, and mpt drivers allocated a
buffer of a caller-specified size, but copied to it a fixed size header.
Add checks that the size is at least the required minimum.

Note that the device nodes are owned by root:operator with 0640
permissions so the ioctls are not available to unprivileged users.

This change includes suggestions from scottl, markj and mav.

Two of the mpt cases were reported by Lucas Leong (@_wmliang_) of
Trend Micro Zero Day Initiative; scottl reported the third case in mpt.
Same issue found in mpr and mps after discussion with imp.

Reported by:	Lucas Leong (@_wmliang_), Trend Micro Zero Day Initiative
Reviewed by:	imp, mav
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34692
2022-03-28 20:35:47 -04:00
Alexander Motin
074bed4f48 mps/mpr: Add missing newlines in error messages.
MFC after:	1 week
2022-02-22 15:08:22 -05:00
Warner Losh
e35816c1c9 mpr/mps: Fix a race in diagnostic reset
There's a small race in freezing the simq when performing a diagnostic
reset. During this time, a transaction can slip through and encounter
the target id of 0. If we're still in diagnostic reset when we detect
this, return a CAM_DEVICE_NOT_THERE status. Instead, freeze the queue
and return a requeue status, similar to what we do when we're resetting
a target and a transaction get here. The race is unavoidable due to
separate locks for queue and SIM, but easy enough to detect and make
harmless.

Sponsored by:		Netflix
Reviewed by:		scottl, mav
Differential Revision:	https://reviews.freebsd.org/D34017
2022-01-25 19:15:46 -07:00
Warner Losh
802f8d4afe mpr/mps: Remove write-only flag and callout
The discovery callout is initialized and cancelled only, making it
write-only. Remove a state flag associated with it being pending as well
as two defines that aren't used that are associated with it. Remove
MP?SAS_SHUTDOWN flag, which is unused.

Sponsored by:		Netflix
Reviewed by:		ken, scottl, mav
Differential Revision:	https://reviews.freebsd.org/D33925
2022-01-24 13:21:09 -07:00
Alexander Motin
1849bc5f3f mps/mpr: Relax doorbell polling precision.
It does not matter how often do we check firmware for crashes.

MFC after:	2 weeks
2022-01-07 21:34:49 -05:00
Gordon Bergling
ddeb702f7b mpr(4): Fix a typo in a source code comment
- s/segement/segment/

MFC after:	3 days
2021-11-30 10:40:50 +01:00
Scott Long
61f17c5fd6 Fix "set but not used" warnings in the mpr driver. This fixes a minor
bug in error handling.
2021-11-25 03:28:29 +00:00
Warner Losh
a8837c77ef mpr: fix freeze / release mismatch in timeout code
So, if we're processing a timeout, and we've sent an ABORT to the
firmware for that timeout, but not yet received the response from the
firmware, AND we get another timeout, we queue the timeout and freeze
the queue. However, when we've finally processed them all, we only
release the queue once. This causes all I/O to halt as the devq remains
frozen forever.

Instead, only freeze the queue when we start the process (eg set INRESET
on the target). This will allow the release when all the timed out I/Os
have finished ABORTing.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D33054
2021-11-21 08:54:45 -07:00
Warner Losh
2bbaed4d7f mpr: Minor formatting changes to match mps.
Minor reformatting nits to make mprsas_scsiio_timeout match
mpssas_scsiio_timeout more closely. The differences aren't necessary and
are distracting when comparing the routines. No functional changes.

Sponsored by:		Netflix
2021-11-15 21:27:15 -07:00
Alexander Motin
02d8194012 mps/mpr(4): Move xpt_register_async() out of lock.
It fixes lock ordere reversal between SIM and device locks.  Also
remove registration for AC_FOUND_DEVICE, unused for a while now.

MFC after:	1 month
2021-09-14 17:40:32 -04:00
Alexander Motin
9781c28c6d mpr(4): Fix unmatched devq release.
Before this change devq was frozen only if some command was sent to
the target after reset started, but release was called always.  This
change freezes the devq immediately, leaving mprsas_action_scsiio()
check only to cover race condition due to different lock devq use.

This should also avoid unnecessary requeue of the commands, creating
additional log noise and confusing some broken apps.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2021-08-20 14:51:12 -04:00
Alexander Motin
e3c5965c25 mpr(4): Handle mprsas_alloc_tm() errors on device removal.
SAS9305-16e with firmware 16.00.01.00 report HighPriorityCredit of
only 8, while for comparison some other combinations I have report
100 or even 128.  In case of large JBOD detach requirement to send
target reset command to each target same time overflows the limit,
and without adequate handling makes devices stuck in half-detached
state, preventing later re-attach.

To handle that in case of allocation error mark the target with new
MPRSAS_TARGET_TOREMOVE flag, and retry the removal attempt next time
something else free high priority command.  With this patch I can
successfully detach/attach 102 disk JBOD from/to the SAS9305-16e.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2021-08-20 10:03:32 -04:00
Alexander Motin
b776de6796 Mark some sysctls as CTLFLAG_MPSAFE.
MFC after:	2 weeks
2021-08-10 20:44:27 -04:00
Warner Losh
33755dbb20 mpr/mps: Minor state machine fix
When a DMA chain can't be loaded, set the state to STATE_INQUEUE so that
the mp[rs]_complete_command can properly fail the command.

Sponsored by:		Netflix
2021-06-03 13:46:19 -06:00
Kenneth D. Merry
175ad3d003 Fix mpr(4) and mps(4) state transitions and a use-after-free panic.
When the mpr(4) and mps(4) drivers probe a SATA device, they issue an
ATA Identify command (via mp{s,r}sas_get_sata_identify()) before the
target is fully setup in the driver.  The drivers wait for completion of
the identify command, and have a 5 second timeout.  If the timeout
fires, the command is marked with the SATA_ID_TIMEOUT flag so it can be
freed later.

That is where the use-after-free problem comes in.  Once the ATA
Identify times out, the driver sends a target reset, and then frees any
identify commands that have timed out.  But, once the target reset
completes, commands that were queued to the drive are returned to the
driver by the controller.

At that point, the driver (in mp{s,r}_intr_locked()) looks up the
command descriptor for that particular SMID, marks it CM_STATE_BUSY and
sends it on for completion handling.

The problem at this stage is that the command has already been freed,
and put on the free queue, so its state is CM_STATE_FREE.  If INVARIANTS
are turned on, we get a panic as soon as this command is allocated,
because its state is no longer CM_STATE_FREE, but rather CM_STATE_BUSY.

So, the solution is to not free ATA Identify commands that get stuck
until they actually return from the controller.  Hopefully this works
correctly on older firmware versions.  If not, it could result in
commands hanging around indefinitely.  But, the alternative is a
use-after-free panic or assertion (in the INVARIANTS case).

This also tightens up the state transitions between CM_STATE_FREE,
CM_STATE_BUSY and CM_STATE_INQUEUE, so that the state transitions happen
once, and we have assertions to make sure that commands are in the
correct state before transitioning to the next state.  Also, for each
state assertion, we print out the current state of the command if it is
incorrect.

mp{s,r}.c:      Add a new sysctl variable, dump_reqs_alltypes,
                that controls the behavior of the dump_reqs sysctl.
                If dump_reqs_alltypes is non-zero, it will dump
                all commands, not just the commands that are in the
                CM_STATE_INQUEUE state.  (You can see the commands
                that are in the queue by using mp{s,r}util debug
                dumpreqs.)

                Make sure that the INQUEUE -> BUSY state transition
                happens in one place, the mp{s,r}_complete_command
                routine.

mp{s,r}_sas.c:  Make sure we print the current command type in
                command state assertions.

mp{s,r}_sas_lsi.c:
                Add a new completion handler,
                mp{s,r}sas_ata_id_complete.  This completion
                handler will free data allocated for an ATA
                Identify command and free the command structure.

                In mp{s,r}_ata_id_timeout, do not set the command
                state to CM_STATE_BUSY.  The command is still in
                queue in the controller.  Since we were blocking
                waiting for this command to complete, there was
                no completion handler previously.  Set the
                completion handler, so that whenever the command
                does come back, it will get freed properly.

                Do not free ATA Identify commands that have timed
                out in mp{s,r}sas_add_device().  Wait for them
                to actually come back from the controller.

mp{s,r}var.h:   Add a dump_reqs_alltypes variable for the new
                dump_reqs_alltypes sysctl.

                Make sure we print the current state for state
                transition asserts.

This was tested in the Spectra Logic test bed (as described in the
review), as well Netflix's Open Connect fleet (where panics dropped from
a dozen or two a month to zero).

Reviewed by:		imp@ (who is handling the commit with ken's OK)
Sponsored by:		Spectra Logic
Differential Revision:	https://reviews.freebsd.org/D25476
2021-06-03 13:46:11 -06:00
Edward Tomasz Napierala
7608b98c43 mpr, mps: clear CCBs allocated on the stack
Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D30301
2021-05-21 07:42:13 +01:00
Alexander Motin
b99419aee4 mpr/mps(4): Make device mapping some more robust.
Allow new enclosure to replace previously existing one if there is
no completely unused table entry, same as it is done for devices.

If we can not process DPM due to corruption -- wipe it and restart
from scratch.  Otherwise I don't see a way to recover persistence if
something go wrong and there is no BIOS to recover it for us.

Together this solves a problem that appeared when 9300-8i firmware
update to 16.00.10.00 somehow switched its mapping mode from Device
Persistence to Enclosure/Slot without wiping the DPM table.  It made
HBA completely unusable, since overflowed and conflicting mapping
table was unable to map any of enclosures and so devices.

Also while there make some enclosure mapping errors more informative.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2021-04-23 23:36:51 -04:00
John Baldwin
645b15e558 Remove unused wrappers around kproc_create() and kproc_exit().
Reviewed by:	imp, kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29205
2021-03-12 09:47:58 -08:00
Alfredo Dal'Ava Junior
71900a794d mpr: big-endian support
This fixes mpr driver on big-endian devices.
Tested on powerpc64 and powerpc64le targets using a SAS9300-8i card
(LSISAS3008 pci vendor=0x1000 device=0x0097)

Submitted by:	Andre Fernando da Silva <andre.silva@eldorado.org.br>
Reviewed by:	luporl, alfredo, Sreekanth Reddy <sreekanth.reddy@broadcom.com> (by email)
Sponsored by:	Eldorado Research Institute (eldorado.org.br)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25785
2021-03-02 22:21:42 -03:00
Mark Johnston
adc0dcc352 mpr, mps: Fix an off-by-one bug in the BTDH_MAPPING ioctl
The device mapping table contains sc->max_devices entries, so only
indices in [0, sc->max_devices) are valid.

MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27964
2021-01-08 13:32:05 -05:00
Mark Johnston
de828a91db mpr, mps: Fix a stack buffer overflow in the user passthru ioctl
Previously we copied in the request into a stack-allocated structure
that could be smaller than the request size.  Furthermore, we checked
the request size only after doing the copyin.

Fix this by allocating a buffer to hold the request, then copying the
buffer's contents into a command descriptor.  This is a bit heavy-handed
but I expect the overhead will not be noticeable.  The approach of
coping the header in first is susceptible to TOCTOU problems.

Reviewed by:	imp
Reported by:	maxpl0it@protonmail.com
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27963
2021-01-08 13:32:04 -05:00
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Alexander Motin
8836496815 Introduce support of SCSI Command Priority.
SAM-3 specification introduced concept of Task Priority, that was renamed
to Command Priority in SAM-4, and supported by all modern SCSI transports.
It provides 15 levels of relative priorities: 1 - highest, 15 - lowest and
0 - default.  SAT specification for SATA devices translates priorities 1-3
into NCQ high priority.

This change adds new "priority" field into empty spots of struct ccb_scsiio
and struct ccb_accept_tio of CAM and struct ctl_scsiio of CTL.  Respective
support is added into iscsi(4), isp(4), mpr(4), mps(4) and ocs_fc(4) drivers
for both initiator and where applicable target roles.  Minimal support was
added to CTL to receive the priority value from different frontends, pass it
between HA controllers and report in few places.

This patch does not add consumers of this functionality, so nothing should
really change yet, since the field is still set to 0 (default) on initiator
and not actively used on target.  Those are to be implemented separately.

I've confirmed priority working on WD Red SATA disks connected via mpr(4)
and properly transferred to CTL target via iscsi(4), isp(4) and ocs_fc(4).

While there, added missing tag_action support to ocs_fc(4) initiator role.

MFC after:	1 month
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
2020-10-25 19:34:02 +00:00
Scott Long
4bc604dcda Bring the request_descriptor union into harmony internally. No
functional change.
2020-10-13 14:10:49 +00:00
Scott Long
74c781ed91 Refine the busdma template interface. Provide tools for filling in fields
that can be extended, but also ensure compile-time type checking.  Refactor
common code out of arch-specific implementations.  Move the mpr and mps
drivers to this new API.  The template type remains visible to the consumer
so that it can be allocated on the stack, but should be considered opaque.
2020-09-14 05:58:12 +00:00
Mateusz Guzik
577858c836 mpr: clean up empty lines in .c and .h files 2020-09-01 22:07:12 +00:00
Alexander Motin
f0f2014387 Remove extra memset() left after r342388.
This memset() wiped MPI2_FUNCTION_SCSI_TASK_MGMT set by mprsas_alloc_tm(),
that broke target reset on device removal, making later re-insertion into
the same slot impossible, since firmware was still waiting for the driver
to finish with the removed device.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-08-04 19:27:03 +00:00
Mark Johnston
d2a5f0812b mpr(4), mps(4): Stop checking for failures from malloc(M_WAITOK).
PR:		240545
Submitted by:	Andrew Reiter <arr@watson.org>
Reviewed by:	imp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25766
2020-07-27 14:28:55 +00:00
Scott Long
8fae77f50c Add a small hack to the ioctl header files so that both mpr and mps can
be included.  This isn't a great solution, but fixing it correctly is a
bigger task and this is the lesser of the existing evils.
2020-04-16 03:28:28 +00:00
Brooks Davis
562894f0dc Centralize compatability translation macros.
Copy the CP, PTRIN, etc macros from freebsd32.h into a sys/abi_compat.h
and replace existing definitation with includes where required. This
eliminates duplicate code and allows Linux and FreeBSD compatability
headers to be included in the same files.

Input from:	cem, jhb
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24275
2020-04-14 20:30:48 +00:00
Alexander Motin
a2386b6f6a Increase buffer in mprsas_log_command() from 192 to 224 bytes.
192 bytes are not enough to print long commands, such as ATA COMMAND PASS
THROUGH(16), that makes debug output difficult to read.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2020-03-13 14:51:11 +00:00
Warner Losh
0d87f3c702 Remove support for all pre FreeBSD 11.0 versions from mpr and mps.
Remove a number of workarounds for older versions of FreeBSD. FreeBSD stable/10
was branched over 6 years ago. All of these changes date from about that time or
earlier. These workarounds are extensive and get in the way of understanding
the current flow in the driver.
2020-02-26 19:15:08 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Warner Losh
4c1cdd4a7c Before issing the REMOVE_DEVICE command to the firmware, make sure that all
commands have completed.

It's not OK to force complete any pending commands before we send the
REMOVE_DEVICE. Instead, make sure that all pending commands are complete before
sending that. By trying to second guess the firmware here, we run the risk of
completing commands twice, which leads to corruption.

This removes the forced completion of commands introduced in r218811. So it's a
partial backout of that commit, but replaces it with a more rebust
mechanism. Either these commands will complete due to the TARGET RESET, or they
will timeout and be aborted, but they will all complete.

Add assert that all commands are complete to REMOVE_DEVICE completion
routine. We attempt to assure this programatically, so we shouldn't have any
commands in the queue because we've waited for them all. Any commands that make
it into our action routine after we mark the target in removal will complete
immediately with an error.

When we're removing a target that's not a volume, advertise up the stack that
it's actually gone, as opposed to having a transient selection error we should
retry. Do this both in the action routine, and when we get a notification of an
aborted command. We don't do this for volumes because the driver tries hard not
to advertise to the OS a volume has disappeared.

Apply these changes to both mpr and mps since they are based on quite similar
designs.

Discussed with: scottl@
Differential Revision: https://reviews.freebsd.org/D23768
2020-02-25 04:27:23 +00:00
Scott Long
69e85eb8ae Advertise the MPI Message Version that's contained in the IOCFacts message
in the sysctl block for the driver.  mpsutil/mprutil needs this so it can
know how big of a buffer to allocate when requesting the IOCFacts from the
controller.  This eliminates the kernel console messages about wrong
allocation sizes.

Reported by:	imp
2020-02-07 12:15:39 +00:00
Scott Long
f5ead20562 Convert the mpr driver to use busdma templates. 2019-12-24 14:50:17 +00:00
Warner Losh
57aa9163fd Fix leak in state machine for commands.
When we get a device departed message from the firmware, we send a TARGET_REST
to the device to let the firmware know we're done and as part of the recovery
process. This will abort all the commands. While the documentation says the IOC
is responsible for writing the completion message for all the commands pending
with an aborted status, we sometimes have queued commands for the target that
haven't been completed so are in the INQUEUE state. So, when we later complete
the pending CCB as aborted, these commands are freed and we hit the "state not
busy" panic.

Elsewhere where we dequeue commands, we move the state to BUSY from INQUEUE. Do
that here as well. In talking to Ken, Scott and Justin, they recommended a
series of tests to see if this is 100% safe. Those tests are ongoing, but
preliminary tests suggest this is safe as we see no duplicate completions when
we hit this case at work. We have a machine that has a dodgy powersupply which
usually doesn't apply power to a few drives, but sometimes does when the machine
is under heavy load so we get a rash of the connect / disconnect messages over
half an hour. Without this change, we'd see state not busy panic. With this
change, the drives just annoyingly come and go without affecting the rest of the
machine, but without a complete error injection test suite, it's hard to know if
all edge cases are now covered or not.

Discussed with: scottl, ken, gibbs
2019-11-24 15:24:05 +00:00
Warner Losh
4b1ac5c2d8 More fully implement the state machine.
When a command is finished running, we must transition it from INQUEUE
to busy state. We were failing to do that, so we hit a panic when the
commands were freed. This only affects mpr, mps already did simmilar
things. Now both the polling and interrupt paths properly set BUSY as
appropriate.
2019-07-11 06:22:15 +00:00