The original NVMe API used bit-fields to represent fields in data
structures defined by the specification (e.g. the op-code in the command
data structure). The implementation targeted x86_64 processors and
defined the bit fields for little endian dwords (i.e. 32 bits).
This approach does not work as-is for big endian architectures and was
changed to use a combination of bit shifts and masks to support PowerPC.
Unfortunately, this changed the NVMe API and forces #ifdef's based on
the OS revision level in user space code.
This change reverts to something that looks like the original API, but
it uses bytes instead of bit-fields inside the packed command structure.
As a bonus, this works as-is for both big and little endian CPU
architectures.
Bump __FreeBSD_version to 1200081 due to API change
Reviewed by: imp, kbowling, smh, mav
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D16404
r330951 by smh fixed the mps driver to avoid deadlocks when panicing.
The same code is needed for mpr, so port it here, along with the fix
which allows the CCBs scheduled to complete avoiding at least a scary
message and likely other unintended consequences.
Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D16663
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
The mps(4) and mpr(4) drivers and hardware handle T10 Protection
Information, which is a system of checksums and guard blocks to protect
data while it is being transferred and while it is on disk. It is also
known as T10 DIF. For more details, see section 4.22 of the SBC-4 spec.
Supporting Type 2 protection requires using 32 byte CDBs, and filling in
the fields in those CDBs. We don't yet support that in the da(4) driver.
Type 1 and Type 3 protection don't require that, and can be handled by
the mps(4)/mpr(4) driver's code and firmware without any additional
input from the da(4) driver.
If a drive has Type 2 protection enabled (you frequently see this with
SAS drives shipped from Dell), don't set the various EEDP fields in the
mps(4)/mpr(4) driver command fields. Otherwise, you wind up with errors
like this that would otherwise make no sense:
(da9:mpr0:0:18:0): READ(10). CDB: 28 00 00 00 00 00 00 02 00 00
(da9:mpr0:0:18:0): CAM status: SCSI Status Error
(da9:mpr0:0:18:0): SCSI status: Check Condition
(da9:mpr0:0:18:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
(da9:mpr0:0:18:0):
(da9:mpr0:0:18:0): Field Replaceable Unit: 0
(da9:mpr0:0:18:0): Command Specific Info: 0
(da9:mpr0:0:18:0):
(da9:mpr0:0:18:0): Descriptor 0x80: f8 21
(da9:mpr0:0:18:0): Descriptor 0x81: 00 00 00 00 00 00
(da9:mpr0:0:18:0): Error 22, Unretryable error
In other words, what kind of strange SAS hard drive doesn't support a
standard 10 byte SCSI READ command? In this case, one that has Type 2
protection enabled.
We can revisit this when we put Type 2 protection support in the da(4)
driver, but for now this will help people who put Type 2 formatted drives
in a system and wonder what in the world is going on.
MFC after: 3 days
Sponsored by: Spectra Logic
Version 16 is just a number bump, since we already had those changes.
Version 17 introduces new AdapterType value, that allows new user-space
tools from Broadcom to differentiate adapter generations 3 and 3.5.
Version 18 updates headers and adds SAS_DEVICE_DISCOVERY_ERROR reporting.
MFC after: 2 weeks
Chain frames required to satisfy all 2K of declared I/Os of 128KB each take
more then a megabyte of a physical memory, all of which existing code tries
allocate as physically contiguous. This patch removes that physical
contiguousness requirement, leaving only virtual contiguousness. I was
thinking about other ways of allocation, but the less granular allocation
becomes, the bigger is the overhead and/or complexity, reaching about 100%
overhead if allocate each frame separately.
The patch also bumps the chain frames hard limit from 2K to 16K. It is more
than enough for the case of default REQ_FRAMES and MAXPHYS (the drivers will
allocate less than that automatically), while in case of increased MAXPHYS
it will control maximal memory usage.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14420
Remove bitfields from defined structures as they are not portable.
Instead use shift and mask macros in the driver and nvmecontrol application.
NVMe is now working on powerpc64 host.
Submitted by: Michal Stanek <mst@semihalf.com>
Obtained from: Semihalf
Reviewed by: imp, wma
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13916
This is a first part of the change. It makes the drivers to calculate
the required number of chain frames to satisfy worst case scenarios, but
it does not change existing overly strict limits on them. The next step
will be to rewrite the allocator to not require megabytes of physically
contiguous address space, that may be problematic if done after boot,
after doing which the limits can be removed. Until that this code can
just correct user set limits, if they are set too high.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14261
a bit in the normal operation of the driver. Covert it to represent bytes
instead of 32bit words. Fix what I believe to be is a bug in this respect
with the Tri-mode cards.
Sponsored by: Netflix
Both drivers were found to report CAM bigger queue depth then they really
can handle. It made them later under high load with many disks return
some of submitted requests back with CAM_REQUEUE_REQ status for later
resubmission.
Reviewed by: scottl
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14215
In mp{r,s}_diag_register(), which is used to register diagnostic
buffers with the mp{r,s}(4) firmware, we allocate DMAable memory.
There were several issues here:
o No checking of the bus_dmamap_load() return value. If the load
failed or got deferred, mp{r,s}_diag_register() continued on as if
nothing had happened. We now check the return value and bail
out if it fails.
o No waiting for a deferred load callback. bus_dmamap_load()
calls a supplied callback when the mapping is done. This is
generally done immediately, but it can be deferred.
mp{r,s}_diag_register() did not check to see whether the callback
was already done before proceeding on. We now sleep until the
callback is done if it is deferred.
o No call to bus_dmamap_sync(... BUS_DMASYNC_PREREAD) after the
memory is allocated and loaded. This is necessary on some
platforms to synchronize host memory that is going to be updated
by a device.
Both drivers would also panic if the firmware was reinitialized while
a diagnostic buffer operation was in progress. This fixes that problem
as well. (The driver will reinitialize the firmware in various
circumstances, but the problem I ran into was that the firmware would
generate an IOC Fault due to a PCIe error.)
mp{r,s}var.h:
Add a new structure, struct mpr_busdma_context, that is
used for deferred busdma load callbacks.
Add a prototype for mp{r,s}_memaddr_wait_cb().
mp{r,s}.c:
Add a new busdma callback function, mp{r,s}_memaddr_wait_cb().
This provides synchronization for callers that want to
wait on a deferred bus_dmamap_load() callback.
mp{r,s}_user.c:
In bus_dmamap_register(), add a call to bus_dmamap_sync()
with the BUS_DMASYNC_PREREAD flag set after an allocation
is loaded.
Also, check the return value of bus_dmamap_load(). If it
fails, bail out. If it is EINPROGRESS, wait for the
callback to happen. We use an interruptible sleep (msleep
with PCATCH) and let the callback clean things up if we get
interrupted.
In mpr_diag_read_buffer() and mps_diag_read_buffer(), call
bus_dmamap_sync(..., BUS_DMASYNC_POSTREAD) before copying
the data out to make sure the data is in stable storage.
In mp{r,s}_post_fw_diag_buffer() and
mp{r,s}_release_fw_diag_buffer(), check the reply to see
whether it is NULL. It can be NULL (and the command non-NULL)
if the controller gets reinitialized while we're waiting for
the command to complete but the driver structures aren't
reallocated. The driver structures generally won't be
reallocated unless there is a firmware upgrade that changes
one of the IOCFacts.
When freeing diagnostic buffers in mp{r,s}_diag_register()
and mp{r,s}_diag_unregister(), zero/NULL out the buffer after
freeing it. This will prevent a duplicate free in some
situations.
Sponsored by: Spectra Logic
Reviewed by: mav, scottl
MFC after: 1 week
Differential Revision: D13453
Summary:
Some architectures use large (36-bit) physical addresses, with smaller
virtual addresses. Casting between vm_paddr_t (or bus_addr_t) and void * is
considered illegal, so cast through uintptr_t. No functional change on existing
platforms.
Reviewed By: scottl
Differential Revision: https://reviews.freebsd.org/D14042
Uses of mallocarray(9).
The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.
Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.
Reported by: wosch
PR: 225197
Focus on code where we are doing multiplications within malloc(9). None of
these is likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.
This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.
would attempt to re-allocate interrupts during a chip reset without
first de-allocating them. Doing that right is going to be tricky, so
just band-aid it for now so that a re-init doesn't guarantee a failure
due to resource re-use.
Reported by: gallatin
Sponsored by: Netflix
sys/dev/mpr/mpr_mapping.c
If _mapping_process_dpm_pg0 detects inconsistencies in the drive
mapping table (stored in the HBA's NVRAM), abort reading it and
continue to boot as if the mapping table were blank. I observed
such inconsistencies in several HBAs after upgrading firmware from
14.0.0.0 to 15.0.0.0.
Reviewed by: slm
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D12901
needed, but it silences an erroneous Coverity warning and makes the code a
little more logically consistent. Also mark the sysctl as MPSAFE.
Sponsored by: Netflix
commit it to make initiazation less chatty in the normal case, and more useful
and informative when real debugging is turned on.
Reviewed by: ken (earlier version)
Sponsored by: Netflix
When the mps(4) and mpr(4) drivers need to reinitialize the
firmware, they sometimes need to reallocate all of the memory
allocated by the driver. The reallocation happens whenever the IOC
Facts change. That should only happen after a firmware upgrade.
If the reinitialization happens as a result of a timed out command
sent to the card, the command that timed out and triggered the
reinit may have been freed if iocfacts_allocate() reallocated all
memory. If the caller attempts to access the command after that,
the kernel will panic because the caller will be dereferencing
freed memory.
The solution is to set a flag in the softc when we reallocate,
and avoid dereferencing the command strucure if we've reallocated.
The changes are largely the same in both drivers, since mpr(4) is a
derivative of mps(4).
o In iocfacts_allocate(), if the IOC Facts have changed and we
need to reallocate, set the REALLOCATED flag in the softc.
o Change wait_command() to take a struct mps_command ** instead of
a struct mps_command *. This allows us to NULL out the caller's
command pointer if we have to reinit the controller and the data
structures get reallocated. (The REALLOCATED flag will be set
in the softc if that has happened.)
o In every place that calls wait_command(), make sure we handle
the case where the command is NULL after the call.
o The mpr(4) driver has mpr_request_polled() which can also
reinitialize the card. Also check for reallocation there.
Reviewed by: scottl, slm
MFC after: 1 week
Sponsored by: Spectra Logic
the informational print functions. Collapse the debug API a bit to be
more generic and not require as much code duplication. While here, fix
a bug in MPS that was already fixed in MPR.
Do the allocation before requesting the IOCFacts message. This triggers
the LSI firmware to recognize the multiqueue should be enabled if available.
Multiqueue isn't used by the driver yet, but this also fixes a problem with
the cached IOCFacts not matching latter checks, leading to potential problems
with error recovery.
As a side-effect, fetch the driver tunables as early as possible.
Reviewed by: slm
Obtained from: Netflix
Differential Revision: D9243
mps_wait_command() and mpr_wait_command() were using getmicrotime() to
determine elapsed time when checking for a timeout in polled mode.
getmicrotime() isn't guaranteed to monotonically increase, and that
caused spurious timeouts occasionally.
Switch to using getmicrouptime(), which does increase monotonically.
This fixes the spurious timeouts in my test case.
Reviewed by: slm, scottl
MFC after: 3 days
Sponsored by: Spectra Logic
This includes NVMe device support and adds support for the following adapters:
SAS 3408
SAS 3416
SAS 3508
SAS 3516
SAS 3616
SAS 3708
SAS 3716
Reviewed by: ken, scottl, asomers, mav
Approved by: ken, scottl, mav
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D10095
This is mostly a version bump to stay in version number sync with firmware.
The only change there was cosmetic: Display degraded speed message upon
receiving Active Cable Exception Event with DEGRADED reason code.
Discussed with: slm@
MFC after: 1 week
Thought it's difficult to reproduce, I think this variable was responsible
for a use-after-free panic when a SATA disk timed out responding to a SATA
identify command during boot.
Submitted by: slm
Reviewed by: slm
MFC after: 4 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D9364
All of the printing from the tables file now has wrappers so that the
handling is cleaner and it's possible to print something out (say, during
development) without having to fight the global debug flags. This re-org
will also make it easier to have the tables be compiled out at build time
if desired.
Other than fixing some minor bugs, there are no user-visible changes from
this change
Sponsored by: Netflix, Inc.
Differential Revision: D9238