81 Commits

Author SHA1 Message Date
Scott Long
f0779b0452 Improve command lifecycle debugging and detection of problems.
Sponsored by:	Netflix
2018-02-18 16:41:34 +00:00
Li-Wen Hsu
92ddc7b86d Fix non-64-bit platform build by printing bus_addr_t values using %#jx
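A minimal userland sketch of the portable printing idiom this commit title refers to: widen the value to uintmax_t and use the 'j' length modifier, so the format string does not depend on the width of the address type. The type demo_bus_addr_t and the function demo_format_addr() are hypothetical stand-ins for illustration, not driver code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in: the real bus_addr_t may be 32 or 64 bits wide. */
typedef uint32_t demo_bus_addr_t;

/*
 * Format an address the same way on every platform: widen it to
 * uintmax_t and print it with "%#jx", so the format string never has
 * to know the real width of the type.
 */
int
demo_format_addr(char *buf, size_t len, demo_bus_addr_t addr)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "%#jx", (uintmax_t)addr));
}
```

Changing the typedef to uint64_t requires no change to the format string, which is the point of the idiom.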
Reviewed by:	slm
Differential Revision:	https://reviews.freebsd.org/D14344
2018-02-13 16:26:06 +00:00
Scott Long
63b1a33514 Print out the shared memory queues during initialization
Sponsored by:	Netflix
2018-02-11 20:15:47 +00:00
Alexander Motin
4f5d657343 Teach mps(4) and mpr(4) drivers to autotune chain frames.
This is the first part of the change.  It makes the drivers calculate
the number of chain frames required to satisfy worst-case scenarios, but
it does not change the existing overly strict limits on them.  The next step
will be to rewrite the allocator so that it does not require megabytes of
physically contiguous address space, which may be problematic if allocated
after boot; once that is done, the limits can be removed.  Until then, this
code can only correct user-set limits that are set too high.

Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D14261
2018-02-10 00:55:46 +00:00
Scott Long
964107031b Cache the value of the request and reply frame size since it's used quite
a bit in the normal operation of the driver.  Convert it to represent bytes
instead of 32-bit words.  Fix what I believe to be a bug in this respect
with the Tri-mode cards.

Sponsored by:	Netflix
2018-02-06 21:01:38 +00:00
Alexander Motin
62a09ee976 Fix queue length reporting in mps(4) and mpr(4).
Both drivers were found to report to CAM a bigger queue depth than they can
really handle.  Under high load with many disks, this made them later return
some of the submitted requests with CAM_REQUEUE_REQ status for later
resubmission.

Reviewed by:	scottl
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D14215
2018-02-06 16:02:25 +00:00
Kenneth D. Merry
e2997a03b7 Diagnostic buffer fixes for the mps(4) and mpr(4) drivers.
In mp{r,s}_diag_register(), which is used to register diagnostic
buffers with the mp{r,s}(4) firmware, we allocate DMAable memory.

There were several issues here:
 o No checking of the bus_dmamap_load() return value.  If the load
   failed or got deferred, mp{r,s}_diag_register() continued on as if
   nothing had happened.  We now check the return value and bail
   out if it fails.

 o No waiting for a deferred load callback.  bus_dmamap_load()
   calls a supplied callback when the mapping is done.  This is
   generally done immediately, but it can be deferred.
   mp{r,s}_diag_register() did not check to see whether the callback
   was already done before proceeding on.  We now sleep until the
   callback is done if it is deferred.

 o No call to bus_dmamap_sync(... BUS_DMASYNC_PREREAD) after the
   memory is allocated and loaded.  This is necessary on some
   platforms to synchronize host memory that is going to be updated
   by a device.

Both drivers would also panic if the firmware was reinitialized while
a diagnostic buffer operation was in progress.  This fixes that problem
as well.  (The driver will reinitialize the firmware in various
circumstances, but the problem I ran into was that the firmware would
generate an IOC Fault due to a PCIe error.)

mp{r,s}var.h:
	Add a new structure, struct mpr_busdma_context, that is
	used for deferred busdma load callbacks.

	Add a prototype for mp{r,s}_memaddr_wait_cb().
mp{r,s}.c:
	Add a new busdma callback function, mp{r,s}_memaddr_wait_cb().
	This provides synchronization for callers that want to
	wait on a deferred bus_dmamap_load() callback.

mp{r,s}_user.c:
	In bus_dmamap_register(), add a call to bus_dmamap_sync()
	with the BUS_DMASYNC_PREREAD flag set after an allocation
	is loaded.

	Also, check the return value of bus_dmamap_load().  If it
	fails, bail out.  If it is EINPROGRESS, wait for the
	callback to happen.  We use an interruptible sleep (msleep
	with PCATCH) and let the callback clean things up if we get
	interrupted.

	In mpr_diag_read_buffer() and mps_diag_read_buffer(), call
	bus_dmamap_sync(..., BUS_DMASYNC_POSTREAD) before copying
	the data out to make sure the data is in stable storage.

	In mp{r,s}_post_fw_diag_buffer() and
	mp{r,s}_release_fw_diag_buffer(), check the reply to see
	whether it is NULL.  It can be NULL (and the command non-NULL)
	if the controller gets reinitialized while we're waiting for
	the command to complete but the driver structures aren't
	reallocated.  The driver structures generally won't be
	reallocated unless there is a firmware upgrade that changes
	one of the IOCFacts.

	When freeing diagnostic buffers in mp{r,s}_diag_register()
	and mp{r,s}_diag_unregister(), zero/NULL out the buffer after
	freeing it.  This will prevent a duplicate free in some
	situations.
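
The free-then-clear pattern described in the last item can be sketched in userland C as follows. This is only an illustration under assumptions: struct demo_diag_buffer and demo_diag_buffer_release() are hypothetical names, and free() stands in for the busdma teardown the drivers actually perform.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-buffer bookkeeping. */
struct demo_diag_buffer {
	void	*addr;
	size_t	 size;
};

/*
 * Release the buffer and clear the bookkeeping, so that a second call
 * is a harmless no-op rather than a duplicate free.
 */
void
demo_diag_buffer_release(struct demo_diag_buffer *buf)
{
	free(buf->addr);	/* free(NULL) is defined to do nothing */
	buf->addr = NULL;
	buf->size = 0;
}
```

Because the pointer is cleared at the point of release, both the register and unregister paths can call the helper without tracking which one ran first.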

Sponsored by:	Spectra Logic
Reviewed by:	mav, scottl
MFC after:	1 week
Differential Revision:	D13453
2018-02-06 15:58:22 +00:00
Justin Hibbits
77baa2256d Minimal changes for MPR to build on architectures with physical addresses larger than virtual
Summary:
Some architectures use large (36-bit) physical addresses, with smaller
virtual addresses.  Casting between vm_paddr_t (or bus_addr_t) and void * is
considered illegal, so cast through uintptr_t.  No functional change on existing
platforms.
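
A small sketch of the cast described above, assuming a hypothetical 64-bit address type demo_paddr_t in place of vm_paddr_t or bus_addr_t. The helper names are illustrative only.

```c
#include <stdint.h>

/* Hypothetical stand-in for an address type wider than a pointer. */
typedef uint64_t demo_paddr_t;

/* Pointer to address: convert to uintptr_t first, then widen. */
demo_paddr_t
demo_ptr_to_paddr(void *p)
{
	return ((demo_paddr_t)(uintptr_t)p);
}

/*
 * Address to pointer: narrow to uintptr_t first, then convert.  The
 * value is truncated if it does not fit in a pointer, so this is only
 * meaningful for addresses that originally came from a pointer.
 */
void *
demo_paddr_to_ptr(demo_paddr_t pa)
{
	return ((void *)(uintptr_t)pa);
}
```

Going through uintptr_t makes the width change explicit, which avoids the compiler diagnostics a direct cast between differently sized integer and pointer types would trigger.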

Reviewed By:	scottl
Differential Revision:	https://reviews.freebsd.org/D14042
2018-02-04 15:37:58 +00:00
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Pedro F. Giffuni
26c1d774b5 dev: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these is likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.
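
For illustration, the kind of check an overflow-aware array allocator adds over a bare multiplication inside malloc can be sketched in userland C. This is not the kernel's mallocarray(9) implementation; demo_alloc_array() is a hypothetical helper, and returning NULL on overflow is a choice made for this sketch.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userland analogue of an overflow-checked array
 * allocation.  Returns NULL instead of allocating when nmemb * size
 * would wrap around.
 */
void *
demo_alloc_array(size_t nmemb, size_t size)
{
	if (size != 0 && nmemb > SIZE_MAX / size)
		return (NULL);
	return (malloc(nmemb * size));
}
```

Callers must already cope with a NULL return for this to be safe, which matches the reason the sweep was limited to M_NOWAIT call sites.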

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the surrounding code could handle NULL values.
2018-01-13 22:30:30 +00:00
Scott Long
1069541760 Refactoring the interrupt setup code introduced a bug where the drivers
would attempt to re-allocate interrupts during a chip reset without
first de-allocating them.  Doing that right is going to be tricky, so
just band-aid it for now so that a re-init doesn't guarantee a failure
due to resource re-use.

Reported by:	gallatin
Sponsored by:	Netflix
2017-11-10 17:01:51 +00:00
Alan Somers
1d909844ab Fix mpr(4) panics caused by bad drive mapping tables
sys/dev/mpr/mpr_mapping.c
	If _mapping_process_dpm_pg0 detects inconsistencies in the drive
	mapping table (stored in the HBA's NVRAM), abort reading it and
	continue to boot as if the mapping table were blank.  I observed
	such inconsistencies in several HBAs after upgrading firmware from
	14.0.0.0 to 15.0.0.0.

Reviewed by:	slm
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D12901
2017-11-03 15:07:36 +00:00
Scott Long
cfd6fd5ad1 Improve the debug parsing to allow flags to be added and subtracted
from the existing set.
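
A rough sketch of what adding to and subtracting from an existing flag set could look like. Everything here is an assumption made for illustration: the flag names, the bit values, the comma separator, and the rule that a leading '+' adds, a leading '-' subtracts, and no prefix replaces the set. The driver's real syntax is documented in its manual page.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag names and bit values, for illustration only. */
static const struct {
	const char	*name;
	uint32_t	 bit;
} demo_flags[] = {
	{ "info",  0x01 },
	{ "fault", 0x02 },
	{ "trace", 0x04 },
};

static uint32_t
demo_lookup(const char *name, size_t len)
{
	size_t i;

	for (i = 0; i < sizeof(demo_flags) / sizeof(demo_flags[0]); i++) {
		if (strlen(demo_flags[i].name) == len &&
		    strncmp(demo_flags[i].name, name, len) == 0)
			return (demo_flags[i].bit);
	}
	return (0);	/* unknown names are ignored in this sketch */
}

/* Apply a spec such as "+trace" or "-info,fault" to the current flags. */
uint32_t
demo_parse_debug(uint32_t current, const char *spec)
{
	enum { SET, ADD, SUB } op = SET;
	uint32_t mask = 0;
	const char *p, *end;

	if (*spec == '+') {
		op = ADD;
		spec++;
	} else if (*spec == '-') {
		op = SUB;
		spec++;
	}

	/* Walk the comma-separated names and collect their bits. */
	for (p = spec; *p != '\0'; p = (*end == ',') ? end + 1 : end) {
		end = p + strcspn(p, ",");
		mask |= demo_lookup(p, (size_t)(end - p));
	}

	switch (op) {
	case ADD:
		return (current | mask);
	case SUB:
		return (current & ~mask);
	default:
		return (mask);
	}
}
```

With these assumed values, applying "+trace" to 0x01 yields 0x05, and applying "-info,fault" to 0x07 yields 0x04.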

Submitted by:	rea@freebsd.org
2017-10-01 15:35:21 +00:00
Scott Long
cb242d7cd9 Convert sysctl sbuf usage to use a fully dynamic sbuf. This isn't strictly
needed, but it silences an erroneous Coverity warning and makes the code a
little more logically consistent.  Also mark the sysctl as MPSAFE.

Sponsored by:	Netflix
2017-09-29 04:52:15 +00:00
Scott Long
867aa8cd99 Add the ability to report and set debug flags as text strings instead of
just integer flags.  Report both for convenience.

Submitted by:	Eygene Ryabinkin (manpage)
Sponsored by:	Netflix
2017-09-24 13:14:50 +00:00
Scott Long
55f1f05248 Garbage collect unused fields
Sponsored by:	Netflix
2017-09-23 08:26:42 +00:00
Scott Long
aeb9ac0df5 Clean up error messages related to device discovery
Sponsored by:	Netflix
2017-09-22 12:07:03 +00:00
Scott Long
7eed4c1853 Fix line wrap issues.
Sponsored by:	Netflix
2017-09-15 20:58:52 +00:00
Scott Long
3c5ac992c7 Add infrastructure for allocating multiple MSI-X interrupts. Also
add more fine-tuned controls for allocating requests and replies.

Sponsored by:	Netflix
2017-09-11 01:51:27 +00:00
Scott Long
a4bb51a4a2 Fix intrhook release in MPR and MPS for EARLY_AP_STARTUP.
Reported by:	Limelight
Sponsored by:	Netflix
2017-09-10 07:10:40 +00:00
Scott Long
1415db6ca2 More code refactoring in preparation for enabling multiqueue.
Sponsored by:	Netflix
2017-09-10 04:09:18 +00:00
Scott Long
2bf620cb8d Convert some in-line printing of diagnostic into tables.
Sponsored by:	Netflix
2017-09-09 22:02:36 +00:00
Scott Long
a7d065b3af Remove the unnecessary use of a temporary string buffer.
Sponsored by:	Netflix
2017-09-09 18:39:55 +00:00
Scott Long
bec09074ca Start separating the LSI drivers into per-queue structures. No
functional change.

Sponsored by:	Netflix
2017-09-09 18:03:40 +00:00
Scott Long
3d96cd7873 Refactor interrupt allocation and deallocation. Add some extra
diagnostics.  No other functional changes.

Sponsored by:	Netflix
2017-09-08 20:20:35 +00:00
Scott Long
6eea4f463d Checkpoint the next phase in debug message cleanup, this time focusing on
error recovery messages.

Sponsored by:	Netflix
2017-09-06 09:19:54 +00:00
Scott Long
757ff64216 Start overhauling debug printing in the MPS and MPR drivers. The focus of this
commit is to make initialization less chatty in the normal case, and more useful
and informative when real debugging is turned on.

Reviewed by:	ken (earlier version)
Sponsored by:	Netflix
2017-08-27 06:24:06 +00:00
Kenneth D. Merry
6d4ffcb4ac Changes to make mps(4) and mpr(4) handle reinit with reallocation.
When the mps(4) and mpr(4) drivers need to reinitialize the
firmware, they sometimes need to reallocate all of the memory
allocated by the driver.  The reallocation happens whenever the IOC
Facts change.  That should only happen after a firmware upgrade.

If the reinitialization happens as a result of a timed out command
sent to the card, the command that timed out and triggered the
reinit may have been freed if iocfacts_allocate() reallocated all
memory.  If the caller attempts to access the command after that,
the kernel will panic because the caller will be dereferencing
freed memory.

The solution is to set a flag in the softc when we reallocate,
and avoid dereferencing the command structure if we've reallocated.

The changes are largely the same in both drivers, since mpr(4) is a
derivative of mps(4).

 o In iocfacts_allocate(), if the IOC Facts have changed and we
   need to reallocate, set the REALLOCATED flag in the softc.

 o Change wait_command() to take a struct mps_command ** instead of
   a struct mps_command *.  This allows us to NULL out the caller's
   command pointer if we have to reinit the controller and the data
   structures get reallocated.  (The REALLOCATED flag will be set
   in the softc if that has happened.)

 o In every place that calls wait_command(), make sure we handle
   the case where the command is NULL after the call.

 o The mpr(4) driver has mpr_request_polled() which can also
   reinitialize the card.  Also check for reallocation there.
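
The pointer-to-pointer change in the list above can be sketched in userland C. The structures and demo_wait_command() are hypothetical simplifications: the timed_out argument stands in for a command timeout that forces a reinit with reallocation, and free() stands in for the driver tearing down its command memory.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the driver structures. */
struct demo_command {
	int	id;
};

struct demo_softc {
	bool	reallocated;	/* set when a reinit rebuilt command memory */
};

/*
 * Wait for a command.  Taking a pointer to the caller's command pointer
 * lets this function NULL it out when the command memory has been
 * released underneath the caller.
 */
int
demo_wait_command(struct demo_softc *sc, struct demo_command **cmp,
    bool timed_out)
{
	if (timed_out) {
		/* Simulated reinit with reallocation: the command is gone. */
		free(*cmp);
		*cmp = NULL;
		sc->reallocated = true;
		return (ETIMEDOUT);
	}
	return (0);
}
```

A caller then checks its own pointer after the call and skips any cleanup that would dereference the command when it is NULL.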

Reviewed by:	scottl, slm
MFC after:	1 week
Sponsored by:	Spectra Logic
2017-08-10 14:59:17 +00:00
Scott Long
2068b2aa88 Fix a logic bug in the split PCI interrupt code that slipped through
Reported by:	Harry Schmalzbauer
2017-07-31 16:55:56 +00:00
Scott Long
b618318ae3 Don't re-parse PCI IDs in order to set card-specific flags, use
the flags field in the PCIID table.
2017-07-31 00:05:49 +00:00
Scott Long
055e2653d4 Change from using underbar function names to normal function names for
the informational print functions.  Collapse the debug API a bit to be
more generic and not require as much code duplication.  While here, fix
a bug in MPS that was already fixed in MPR.
2017-07-30 22:34:24 +00:00
Scott Long
252b2b4f4f Split the interrupt setup code into two parts: allocation and configuration.
Do the allocation before requesting the IOCFacts message.  This triggers
the LSI firmware to recognize that multiqueue should be enabled if available.
Multiqueue isn't used by the driver yet, but this also fixes a problem with
the cached IOCFacts not matching later checks, leading to potential problems
with error recovery.

As a side-effect, fetch the driver tunables as early as possible.

Reviewed by:	slm
Obtained from:	Netflix
Differential Revision:	D9243
2017-07-30 06:53:58 +00:00
Scott Long
6c85e33ee8 Quiet a message that sounds far more dire than it really is.
2017-07-26 01:48:13 +00:00
Kenneth D. Merry
417aa6b850 Fix spurious timeouts on commands sent to mps(4) and mpr(4) controllers.
mps_wait_command() and mpr_wait_command() were using getmicrotime() to
determine elapsed time when checking for a timeout in polled mode.
getmicrotime() isn't guaranteed to monotonically increase, and that
caused spurious timeouts occasionally.

Switch to using getmicrouptime(), which does increase monotonically.
This fixes the spurious timeouts in my test case.
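
The same idea can be sketched in userland C, where CLOCK_MONOTONIC plays the role that getmicrouptime() plays in the kernel. The helper names are hypothetical; this shows the technique and is not the driver's code.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdbool.h>
#include <time.h>

/* Seconds elapsed on the monotonic clock since 'start'. */
double
demo_elapsed_sec(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return ((double)(now.tv_sec - start->tv_sec) +
	    (double)(now.tv_nsec - start->tv_nsec) / 1e9);
}

/*
 * Timeout check for a polling loop.  Because the clock never steps
 * backwards or jumps when wall-clock time is adjusted, the elapsed
 * value cannot suddenly exceed the timeout for that reason.
 */
bool
demo_timed_out(const struct timespec *start, double timeout_sec)
{
	return (demo_elapsed_sec(start) >= timeout_sec);
}
```

A polling loop records the start time once with clock_gettime(CLOCK_MONOTONIC, ...) and calls demo_timed_out() on each iteration.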

Reviewed by:	slm, scottl
MFC after:	3 days
Sponsored by:	Spectra Logic
2017-07-19 15:39:01 +00:00
Stephen McConnell
327f2e6c56 Fix several problems with mapping code.
Reviewed by:    ken, scottl, asomers, ambrisko, mav
Approved by:	ken, mav
MFC after:      1 week
Differential Revision: https://reviews.freebsd.org/D10861
2017-05-25 19:20:06 +00:00
Stephen McConnell
18982e8fb0 Fix powerpc compiler error.
Approved by:	ken
2017-05-22 20:27:29 +00:00
Stephen McConnell
67feec5045 Add tri-mode support (SAS/SATA/PCIe).
This includes NVMe device support and adds support for the following adapters:
    SAS 3408
    SAS 3416
    SAS 3508
    SAS 3516
    SAS 3616
    SAS 3708
    SAS 3716

Reviewed by:    ken, scottl, asomers, mav
Approved by:	ken, scottl, mav
MFC after:      2 weeks
Relnotes:	yes
Differential Revision: https://reviews.freebsd.org/D10095
2017-05-17 21:33:37 +00:00
Scott Long
855fe445b3 Improve error messages during command timeout for the mpr and mps
drivers.

Sponsored by:	Netflix
2017-05-11 15:19:04 +00:00
Alexander Motin
0656476aa6 Import mpr(4) driver P12 to P14 diff from vendor site.
This is mostly a version bump to stay in version number sync with firmware.
The only change there was cosmetic:  Display degraded speed message upon
receiving Active Cable Exception Event with DEGRADED reason code.

Discussed with:	slm@
MFC after:	1 week
2017-03-06 19:39:31 +00:00
Alan Somers
4e02badb18 Initialize a stack variable in mprsas_get_sas_address_for_sata_disk
Though it's difficult to reproduce, I think this variable was responsible
for a use-after-free panic when a SATA disk timed out responding to a SATA
identify command during boot.

Submitted by:	slm
Reviewed by:	slm
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D9364
2017-01-30 19:49:08 +00:00
Scott Long
c11c484f15 Rework the debug print API. Event printing no longer gets special handling.
All of the printing from the tables file now has wrappers so that the
handling is cleaner and it's possible to print something out (say, during
development) without having to fight the global debug flags. This re-org
will also make it easier to have the tables be compiled out at build time
if desired.

Other than fixing some minor bugs, there are no user-visible changes from
this change

Sponsored by:	Netflix, Inc.
Differential Revision:	D9238
2017-01-19 21:47:50 +00:00
Scott Long
94e4e732af Print out the number of queues/MSIx vectors.
Sponsored by:	Netflix
2017-01-12 01:13:05 +00:00
Alan Somers
4195c7de24 Always null-terminate ccb_pathinq.(sim_vid|hba_vid|dev_name)
The sim_vid, hba_vid, and dev_name fields of struct ccb_pathinq are
fixed-length strings. AFAICT the only place they're read is in
sbin/camcontrol/camcontrol.c, which assumes they'll be null-terminated.
However, the kernel doesn't null-terminate them. A bunch of copy-pasted code
uses strncpy to write them, and doesn't guarantee null-termination. For at
least 4 drivers (mpr, mps, ciss, and hyperv), the hba_vid field actually
overflows. You can see the result by doing "camcontrol negotiate da0 -v".

This change null-terminates those fields everywhere they're set in the
kernel. It also shortens a few strings to ensure they'll fit within the
16-character field.
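
A minimal sketch of the failure mode and one way to avoid it, written as portable userland C. demo_copy_field() and DEMO_VID_LEN are hypothetical names, and the sample string in the usage note is made up; the commit may well use a different routine in the kernel.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_VID_LEN	16	/* hypothetical; mirrors the 16-character field */

/*
 * Copy src into a fixed-size field, truncating if necessary, and always
 * leave the result null-terminated.  A bare strncpy(dst, src, dstlen)
 * leaves no terminator when src has dstlen or more characters.
 */
void
demo_copy_field(char *dst, size_t dstlen, const char *src)
{
	if (dstlen == 0)
		return;
	strncpy(dst, src, dstlen - 1);
	dst[dstlen - 1] = '\0';
}
```

Copying "a vendor string that is too long" into a DEMO_VID_LEN-byte field leaves the 15 characters "a vendor string" followed by the terminator, so a reader that expects a C string stays inside the field.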

PR:		215474
Reported by:	Coverity
CID:		1009997 1010000 1010001 1010002 1010003 1010004 1010005
CID:		1331519 1010006 1215097 1010007 1288967 1010008 1306000
CID:		1211924 1010009 1010010 1010011 1010012 1010013 1010014
CID:		1147190 1010017 1010016 1010018 1216435 1010020 1010021
CID:		1010022 1009666 1018185 1010023 1010025 1010026 1010027
CID:		1010028 1010029 1010030 1010031 1010033 1018186 1018187
CID:		1010035 1010036 1010042 1010041 1010040 1010039
Reviewed by:	imp, sephe, slm
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D9037
Differential Revision:	https://reviews.freebsd.org/D9038
2017-01-04 20:26:42 +00:00
Alan Somers
fa699bb23e misc minor fixes in mpr(4)
sys/dev/mpr/mpr_sas.c
	* Fix a potential null pointer dereference (CID 1305731)
	* Check for overrun of the ccb_scsiio.cdb_io.cdb_bytes buffer (CID
	  1211934)

sys/dev/mpr/mpr_sas_lsi.c
	* Nullify a dangling pointer in mprsas_get_sata_identify
	* Fix a memory leak in mprsas_SSU_to_SATA_devices (CID 1211935)

Reported by:	Coverity (partially)
CID:		1305731 1211934 1211935
Reviewed by:	slm
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D8880
2017-01-03 17:35:16 +00:00
Scott Long
694cb8b815 Record the LogInfo field when reporting the IOCStatus. Helps in
debugging errors.

Submitted by:	slm
Obtained from:	Netflix
MFC after:	3 days
2016-11-04 17:25:47 +00:00
Scott Long
4ab1cdc5ad Add a fallback to the device mapper logic. We've seen systems in the field
that are apparently misconfigured by the manufacturer and cause the mapping
logic to fail.  The fallback allows drive numbers to be assigned based on the
PHY number that they're attached to.  Add sysctls and tunables to override
this new behavior, but they should be considered only necessary for debugging.

Reviewed by:	 imp, smh
Obtained from:	Netflix
MFC after:	3 days
Sponsored by:	D8403
2016-11-02 15:13:25 +00:00
Stephen McConnell
32b0a21e43 Use real values to calculate Max I/O size instead of guessing.
Reviewed by:	ken, scottl
Approved by:	ken, scottl, ambrisko (mentors)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D7043
2016-07-12 19:34:10 +00:00
Edward Tomasz Napierala
13a8942827 Remove NULL checks after M_WAITOK allocations from mpr(4).
Reviewed by:	asomers@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6297
2016-05-10 15:04:24 +00:00
Stephen McConnell
e769dfac49 Bump version of mpr driver to 13.00.00.00-fbsd
Approved by:	ken, scottl, ambrisko
MFC after:      1 week
2016-05-09 16:38:51 +00:00
Stephen McConnell
58581c1363 Disks can go missing until a reboot is done in some cases.
This is due to the DevHandle not being released, which causes the Firmware to
not allow that disk to be re-added.

Reviewed by:    ken, scottl, ambrisko, asomers
Approved by:	ken, scottl, ambrisko
MFC after:      1 week
Differential Revision: https://reviews.freebsd.org/D6102
2016-05-09 16:36:40 +00:00