VMware returns BUSY status when storage has transient connectivity issues.
It is often better to wait and let the VM admin fix the problem than to crash.
Discussed with: ken
MFC after: 1 week
Replace iSCSI-specific LUN mapping mechanism with new one, working for any
ports. By default all ports are created without LUN mapping, exposing all
CTL LUNs as before. But, if needed, LUN mapping can be manually set on
per-port basis via ctladm. For its iSCSI ports ctld does it via ioctl(2).
The next step will be to teach ctld to work with FibreChannel ports also.
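
A minimal sketch of the per-port lookup this implies, assuming a NULL map means "no mapping, expose everything". The struct, field, and function names here are hypothetical and are not the ones CTL uses:

```c
#include <stddef.h>
#include <stdint.h>

#define LUN_UNMAPPED UINT32_MAX	/* hypothetical "no such LUN" marker */

/* Hypothetical per-port state; names are illustrative, not CTL's. */
struct port_sketch {
	const uint32_t	*lun_map;	/* NULL: no mapping, expose all LUNs */
	size_t		 lun_map_len;
};

/* Translate a LUN number seen by the initiator into a backend LUN. */
static uint32_t
port_lookup_lun(const struct port_sketch *port, uint32_t plun)
{
	if (port->lun_map == NULL)
		return (plun);		/* identity, as before the change */
	if (plun >= port->lun_map_len)
		return (LUN_UNMAPPED);
	return (port->lun_map[plun]);
}
```

With a map of { 7, LUN_UNMAPPED, 3 }, the port would show backend LUN 7 as LUN 0, nothing as LUN 1, and backend LUN 3 as LUN 2.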
Given the additional flexibility of the new mechanism, ctl.conf now allows an
alternative syntax for LUN definition. LUNs can now be defined in global
context, and then referenced from targets by unique name, as needed. This
allows the same LUN to be exposed several times via multiple targets.
While there, increase limit for LUNs per target in ctld from 256 to 1024.
Some initiators do not support LUNs above 255, but that is not our problem.
Discussed with: trasz
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
sys/cam/scsi/scsi_all.h:
In struct scsi_extended_inquiry_data:
- Increase the length field to 2 bytes, as it is 2 bytes in SPC-4.
- Add bit definitions for the various Activate Microcode actions.
- Add the Sequential Access Logical Block Protection support bit,
since we need that in the sa(4) driver. (For modifications
that will come later.)
- Add definitions for the various Multi I_T Nexus Microcode
Download modes.
sys/cam/ctl/ctl.c:
As of SPC-4, a single report of "REPORTED LUNS DATA HAS CHANGED"
is to be given per I_T nexus. Once it is reported, the unit
attention condition should be cleared for all LUNS attached to
an I_T nexus.
Previously that only happened when a REPORT LUNS command was
processed.
This behavior may be different (according to SAM-5) when the
UA_INTLCK_CTRL bits are non-zero in the control mode page but
CTL does not currently support that.
So, in view of the spec, whenever we report a LUN inventory
change unit attention, clear it on all LUNs for that
particular I_T nexus.
Add a new function, ctl_clear_ua() that will clear a unit
attention on all LUNs for the given I_T nexus.
One field in the extended inquiry data that we could potentially
report at some point is the maximum supported sense data length.
To do that, we would need the SIM to report (via path inquiry
perhaps) how much sense data it is able to send.
Add comments to explain some of the bits that are set in the
Extended Inquiry VPD page.
Add a few comments to make it more clear which functions handle
various VPD pages.
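
The ctl_clear_ua() behavior described above can be sketched roughly as follows. This is an illustration only: the table layout, sizes, and bit values are assumptions, not the CTL data structures.

```c
#include <stdint.h>

#define SK_MAX_LUNS		4
#define SK_MAX_INIT		8
#define SK_UA_LUN_CHANGE	0x01u	/* illustrative bit, not CTL's value */
#define SK_UA_POWERON		0x02u	/* illustrative bit, not CTL's value */

/* pending_ua[lun][initiator index] holds a bitmask of pending UAs. */
static uint32_t pending_ua[SK_MAX_LUNS][SK_MAX_INIT];

/* Clear one unit attention on every LUN for a single I_T nexus. */
static void
sketch_clear_ua(uint32_t initidx, uint32_t ua)
{
	for (int lun = 0; lun < SK_MAX_LUNS; lun++)
		pending_ua[lun][initidx] &= ~ua;
}
```

Other unit attentions for that nexus, and the same unit attention for other nexuses, are left pending.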
Sponsored by: Spectra Logic
MFC after: 1 week
This could cause data corruption due to accessing wrong LUN in case of
retries on write errors. Failed writes were retried to read LUN.
MFC after: 3 days
Define it as an atomic uint32_t. These increments happen infrequently
enough for the atomic overhead not to be a problem, and since they're now
independent atomics, they won't contend with xpt_lock_buses().
This counter is useful as a means of cheaply identifying whether any changes
have been made to the CAM peripheral list. Userland programs have no guarantee
that the counter won't change on them while being returned or while processing
the information, so they must be written accordingly.
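
As a rough userland illustration of that pattern, using C11 <stdatomic.h> as a stand-in for the kernel atomic(9) primitives (the variable and function names are hypothetical):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Bumped whenever the (hypothetical) peripheral list changes. */
static _Atomic uint32_t periph_generation;

static void
periph_list_changed(void)
{
	atomic_fetch_add(&periph_generation, 1);
}

/*
 * A consumer records the generation before walking the list and
 * checks it again afterwards; a mismatch means "start over".
 */
static int
snapshot_is_stale(uint32_t gen_before)
{
	return (atomic_load(&periph_generation) != gen_before);
}
```

The check only says whether something changed, not what, so a consumer would normally retry its whole pass when it fails.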
Discussed with: ken, mav (in general)
MFC after: 1 week
Sponsored by: Spectra Logic
If we aggregated status sending with data move and got an error, allow the
status to be updated and resent separately. Without this, the command may get
stuck without status being sent at all.
MFC after: 2 weeks
This includes a new summary mode (-s) for camcontrol defects that
quickly tells the user the most important thing: how many defects
are in the requested list. The actual location of the defects is
less important.
Modern drives frequently have more than the 8191 defects that can
be reported by the READ DEFECT DATA (10) command. If they don't
have that many grown defects, they certainly have more than 8191
defects in the primary (i.e. factory) defect list.
The READ DEFECT DATA (12) command allows for longer parameter
lists, as well as indexing into the list of defects, and so allows
reporting many more defects.
This has been tested with HGST drives and Seagate drives, but
does not fully work with Seagate drives. Once I have a Seagate
spec I may be able to determine whether it is possible to make it
work with Seagate drives.
scsi_da.h: Add a definition for the new long block defect
format.
Add bit and mask definitions for the new extended
physical sector and bytes from index defect
formats.
Add a prototype for the new scsi_read_defects() CDB
building function.
scsi_da.c: Add a new scsi_read_defects() CDB building function.
camcontrol(8) was previously composing CDBs manually.
This is long overdue.
camcontrol.c: Revamp the camcontrol defects subcommand. We now
go through multiple stages in trying to get defect
data off the drive while avoiding various drive
firmware quirks.
We start off by requesting the defect header with
the 10 byte command. If we're in summary mode (-s)
and the drive reports fewer defects than can be
represented in the 10 byte header, we're done.
Otherwise, we know that we need to issue the
12 byte command if the drive reports the maximum
number of defects.
If we're in summary mode, we're done if we get a
good response back when asking for the 12 byte header.
If the user has asked for the full list, then we
use the address descriptor index field in the 12
byte CDB to step through the list in 64K chunks.
64K is small enough to work with most any ancient
or modern SCSI controller.
Add support for printing the new long block defect
format, as well as the extended physical sector and
bytes from index formats. I don't have any drives
that support the new formats.
Add a hexadecimal output format that can be turned
on with -X.
Add a quiet mode (-q) that can be turned on with
the summary mode (-s) to just print out a number.
Revamp the error detection and recovery code for
the defects command to work with HGST drives.
Call the new scsi_read_defects() CDB building
function instead of rolling the CDB ourselves.
Pay attention to the residual from the defect list
request when printing it out, so we don't run off
the end of the list.
Use the new scsi_nv library routines to convert
from strings to numbers and back.
camcontrol.8: Document the new defect formats (longblock, extbfi,
extphys) and command line options (-q, -s, -S and
-X) for the defects subcommand.
Explain a little more about what drives generally
do and don't support.
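
The chunked walk described for camcontrol.c can be sketched like this. It is not the camcontrol code: the callback type and the fake drive are hypothetical, and the sketch assumes the index counts descriptors and that a short return (residual) marks the end of the list.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BYTES	65536u

/*
 * Hypothetical callback: fetch up to `len` bytes of descriptors starting
 * at descriptor `index`; returns the number of bytes actually returned.
 */
typedef size_t (*fetch_fn)(uint32_t index, size_t len, void *arg);

/* Step through the list in 64K chunks and count the descriptors. */
static uint64_t
walk_defect_list(fetch_fn fetch, size_t desc_size, void *arg)
{
	size_t per_chunk = CHUNK_BYTES / desc_size;
	uint32_t index = 0;
	uint64_t total = 0;

	for (;;) {
		size_t got = fetch(index, per_chunk * desc_size, arg) /
		    desc_size;

		total += got;
		if (got < per_chunk)	/* short chunk: end of list */
			break;
		index += (uint32_t)got;
	}
	return (total);
}

/* Stand-in for a drive holding *(uint32_t *)arg 8-byte descriptors. */
static size_t
fake_fetch(uint32_t index, size_t len, void *arg)
{
	uint32_t ndefects = *(uint32_t *)arg;
	size_t avail = (index < ndefects) ?
	    (size_t)(ndefects - index) * 8 : 0;

	return (avail < len ? avail : len);
}
```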
Sponsored by: Spectra Logic
MFC after: 1 week
data to go undetected.
The probe code does an MD5 checksum of the inquiry data (and page
0x80 serial number if available) before doing a reprobe of an
existing device, and then compares a checksum after the probe to
see whether the device has changed.
This check was broken in January, 2000 by change 56146 when the extended
inquiry probe code was added.
In the extended inquiry probe case, it was calculating the checksum
a second time. The second time it included the updated inquiry
data from the short inquiry probe (first 36 bytes). So it wouldn't
catch cases where the vendor, product, revision, etc. changed.
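
A small sketch of the failure mode, with FNV-1a swapped in for MD5 to keep it self-contained. The struct and function names are hypothetical. The point is that the "before" digest must be taken once, ahead of any update to the inquiry data:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, used here only as a stand-in for the MD5 the probe code uses. */
static uint64_t
fnv1a(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t h = 14695981039346656037ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;
	}
	return (h);
}

struct dev_sketch {
	char		inq[36];	/* short inquiry data */
	uint64_t	saved_digest;
	int		have_digest;
};

/* Record the digest once, before the reprobe overwrites inq. */
static void
save_digest(struct dev_sketch *d)
{
	if (d->have_digest)
		return;		/* already taken: do not recompute mid-probe */
	d->saved_digest = fnv1a(d->inq, sizeof(d->inq));
	d->have_digest = 1;
}

/* After the probe: has the inquiry data changed? */
static int
device_changed(const struct dev_sketch *d)
{
	return (fnv1a(d->inq, sizeof(d->inq)) != d->saved_digest);
}
```

Without the have_digest guard, a second save_digest() call after the short inquiry had refreshed inq would overwrite the old digest and the change would go unnoticed.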
This change will have the effect that when a device's inquiry data is
updated and a rescan is issued, it will disappear and then reappear.
This is the appropriate action, because if the inquiry data or serial
number changes, it is either a different device or the device
configuration may have changed significantly. (e.g. with updated
firmware.)
scsi_xpt.c: Don't calculate the initial MD5 checksum on
standard inquiry data and the page 0x80 serial
number if we have already calculated it.
MFC after: 1 week
Sponsored by: Spectra Logic
While in most cases CTL should correctly fetch those values from backing
storage, there are some initiators (like MS SQL) that may not like large
physical block sizes, even if they are true. For such cases allow overriding
the fetched values with supported ones (like 4K).
MFC after: 1 week
Since we don't support MCS, a hole in the received sequence numbers can only
mean PDU loss. Since we don't support lost PDU recovery either, terminate the
connection to avoid stuck commands.
While there, improve handling of sequence number wraparound after 2^32 PDUs.
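
One common way to make such comparisons survive wraparound is serial number arithmetic on 32-bit values. The sketch below assumes that approach; the names are hypothetical and it is not the iSCSI frontend code:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a precedes b modulo 2^32. Relies on the usual two's
 * complement conversion of the unsigned difference to int32_t.
 */
static bool
sn_lt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) < 0);
}

enum sn_verdict { SN_EXPECTED, SN_OLD, SN_HOLE };

/* Classify a received sequence number against the expected one. */
static enum sn_verdict
classify_sn(uint32_t received, uint32_t expected)
{
	if (received == expected)
		return (SN_EXPECTED);
	if (sn_lt(received, expected))
		return (SN_OLD);
	return (SN_HOLE);	/* gap: a PDU was lost */
}
```

In this sketch a result of SN_HOLE is the case in which the connection would be terminated.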
MFC after: 2 weeks
Technically read requests can be executed in any order or simultaneously
since they are not changing any data. But the ZFS prefetcher goes crazy when
it receives consecutive requests from different threads. Since prefetcher
works on level of separate blocks, instead of two consecutive 128K requests
it may receive 32 8K requests in mixed order.
This patch is more of a workaround than a real fix, and it does not fix all of
the prefetcher problems, but it improves sequential read speed by 3-4x
in some configurations. On the other hand, it may hurt performance if
some backing store has no prefetch, which is why it is disabled by default
for raw devices.
MFC after: 2 weeks
While without UNMAP support there is not much the initiator can do about it,
the administrator should still be notified about the storage overflow.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Previously it was supported only for ZVOL-backed LUNs, but now it should work
for file-backed LUNs too. The used value in this case is the space occupied by
the backing file, while the available value is the available space on the file
system. Pool thresholds are still not implemented in this case.
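
A userland analogue of those two values, using stat(2) and statvfs(3) rather than the in-kernel VFS calls. The function name is hypothetical, and the 512-byte unit for st_blocks is the traditional convention, assumed here:

```c
#include <sys/stat.h>
#include <sys/statvfs.h>
#include <stdint.h>

/*
 * used:  space occupied by the backing file
 * avail: space still available on its file system
 * Returns 0 on success, -1 if either lookup fails.
 */
static int
file_lun_space(const char *path, uint64_t *used, uint64_t *avail)
{
	struct stat st;
	struct statvfs vfs;

	if (stat(path, &st) != 0 || statvfs(path, &vfs) != 0)
		return (-1);
	*used = (uint64_t)st.st_blocks * 512;	/* assumed 512-byte units */
	*avail = (uint64_t)vfs.f_bavail * vfs.f_frsize;
	return (0);
}
```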
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
It is implemented for LUNs backed by ZVOLs in "dev" mode and files.
GEOM has no such API, so for LUNs backed by raw devices all LBAs will
be reported as mapped/unknown.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
After recent optimizations this change is no longer blocked by CTL memory
consumption. Those limits are still not free, but much cheaper now.
MFC after: 1 week
Relnotes: yes
Sponsored by: iXsystems, Inc.
Abusing the ability of major UAs to cover minor ones, we may skip accounting
UAs for inactive ports. Allocate UA storage for a port and start accounting
only after some initiator from that port fetched its first POWER ON OCCURRED.
This reduces per-LUN CTL memory usage from >1MB to less than 100K.
MFC after: 1 month
In configurations with many ports, like iSCSI, each LUN is typically
accessed only by a limited subset of ports. Allocating that memory on
demand reduces CTL memory usage from 5.3MB/LUN to 1.3MB/LUN.
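
The idea can be sketched as a per-LUN table of per-port pointers that stay NULL until a port first touches the LUN. Sizes and names below are made up for illustration and do not match CTL:

```c
#include <stdint.h>
#include <stdlib.h>

#define SK_MAX_PORTS		256	/* illustrative limits */
#define SK_INIT_PER_PORT	2048

struct lun_sketch {
	uint32_t *per_port[SK_MAX_PORTS];	/* NULL until first use */
};

/* Return the per-port state for this LUN, allocating it on demand. */
static uint32_t *
lun_port_state(struct lun_sketch *lun, unsigned int port)
{
	if (port >= SK_MAX_PORTS)
		return (NULL);
	if (lun->per_port[port] == NULL)
		lun->per_port[port] = calloc(SK_INIT_PER_PORT,
		    sizeof(uint32_t));
	return (lun->per_port[port]);
}
```

A LUN that is only ever reached through a handful of ports then pays for those ports alone.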
MFC after: 1 month
interpreting NULLs as EOLs, but converting them to spaces.
SPC-4 does not say that T10-based IDs should be NULL-terminated/padded.
And while it says that they should include only ASCII chars (0x20-0x7F),
there are some USB sticks (SanDisk Ultra Fit), that have NULLs inside
the value. Treating NULLs as EOLs there made those LUN IDs non-unique.
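
A minimal sketch of that conversion, with a hypothetical helper name. Two IDs that differ only after an embedded NUL stay distinct because the copy runs over the full field length:

```c
#include <stddef.h>

/*
 * Copy a fixed-length ID field into a C string, turning embedded
 * NUL bytes into spaces instead of stopping at the first one.
 * `out` must have room for len + 1 bytes.
 */
static void
t10_id_to_text(const unsigned char *id, size_t len, char *out)
{
	for (size_t i = 0; i < len; i++)
		out[i] = (id[i] == '\0') ? ' ' : (char)id[i];
	out[len] = '\0';
}
```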
MFC after: 1 week
Make CTL core and block backend set success status before initiating last
data move for read commands. Make CAM target and iSCSI frontends detect
such condition and send command status together with data. New I/O flag
allows to skip duplicate status sending on later fe_done() call.
For Fibre Channel this change saves one of three interrupts per read command,
increasing performance from 126K to 160K IOPS. For iSCSI this change saves
one of three PDUs per read command, increasing performance from 1M to 1.2M
IOPS.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
Old allocator created significant lock congestion protecting its lists
of preallocated I/Os, while UMA provides much better SMP scalability.
The downside of UMA is lack of reliable preallocation, that could guarantee
successful allocation in non-sleepable environments. But careful code
review has shown that only the CAM target frontend really has that requirement.
Fix that by making that frontend preallocate and statically bind a CTL I/O for
every ATIO/INOT it preallocates anyway. That avoids allocations
in the hot I/O path. Other frontends either may sleep in allocation context
or can properly handle allocation errors.
On 40-core server with 6 ZVOL-backed LUNs and 7 iSCSI client connections
this change increases peak performance from ~700K to >1M IOPS! Yay! :)
MFC after: 1 month
Sponsored by: iXsystems, Inc.
Previously, any timeout value for which (timeout * hz) overflows a
signed integer gave weird results, since callout(9) routines
convert negative values of ticks to '1'. With unsigned integer overflow we
get significantly smaller timeout values than expected.
Switch from callout_reset, which requires conversion to int based ticks
to callout_reset_sbt to avoid this.
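
The arithmetic can be illustrated in userland as below. The typedef and macros mimic the 32.32 fixed-point layout of sbintime_t but are local to this sketch, and the function names are hypothetical:

```c
#include <stdint.h>

typedef int64_t sbt_sketch_t;		/* 32.32 fixed-point seconds */
#define SBT_SK_1S	((sbt_sketch_t)1 << 32)
#define SBT_SK_1MS	(SBT_SK_1S / 1000)

/* Tick conversion done in 32 bits: the product wraps modulo 2^32. */
static int32_t
ticks_32bit(uint32_t timeout_ms, uint32_t hz)
{
	uint32_t prod = timeout_ms * hz;

	return ((int32_t)(prod / 1000));
}

/* 64-bit conversion: no intermediate product that can wrap. */
static sbt_sketch_t
timeout_to_sbt(uint32_t timeout_ms)
{
	return ((sbt_sketch_t)timeout_ms * SBT_SK_1MS);
}
```

For a 5,000,000 ms timeout at hz = 1000, the 32-bit path yields 705032 ticks (about 705 seconds instead of 5000), while the 64-bit value still represents just under 5000 seconds.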
Also correct isci to correctly resolve ccb timeout.
This was based on the original work done by Eygene Ryabinkin
<rea@freebsd.org> on 5 Aug 2011, which used a macro to help avoid
the overflow.
Differential Revision: https://reviews.freebsd.org/D1157
Reviewed by: mav, davide
MFC after: 1 month
Sponsored by: Multiplay
In this mode one head is in Active state, supporting all commands, while
another is in Standby state, supporting only minimal LUN discovery subset.
It is still incomplete since Standby state requires reservation support,
which is impossible to do right without having interlink between heads.
But it allows to run some basic experiments.
related cleanups:
- Require each driver to initialize a mutex in the scsi_low_softc that
is shared with the scsi_low code. This mutex is used for CAM SIMs,
timers, and interrupt handlers.
- Replace the osdep function switch with direct calls to the relevant
CAM functions and direct manipulation of timers via callout(9).
- Collapse the CAM-specific scsi_low_osdep_interface substructure
directly into scsi_low_softc.
- Use bus_*() instead of bus_space_*().
- Return BUS_PROBE_DEFAULT from probe routines instead of 0.
- No need to zero softcs.
- Pass 0ul and ~0ul instead of 0 and ~0 to bus_alloc_resource().
- Spell "dettach" as "detach".
- Remove unused 'dvname' variables.
- De-spl().
Tested by: no one
With command serialization used in CTL, there are no other commands to abort
when PREEMPT AND ABORT gets to run, so it is practically equal to PREEMPT.
MFC after: 1 week
For ZVOL-backed LUNs this allows informing initiators when the storage's used
or available space goes above/below the configured thresholds.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
This makes the VMware VAAI Thin Provisioning Stun primitive activate, pausing
the virtual machine, when backing storage (ZFS pool) is getting overflowed.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
This prevents BIO_DELETE requests getting stuck in the TRIM queue which
results in a panic on shutdown due to outstanding requests.
PR: 194606
Reported by: Guido Falsi
Reviewed by: mav
MFC after: 3 days
Sponsored by: Multiplay
- Wrong integer type was specified.
- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.
- Logical OR where binary OR was expected.
- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.
- Properly assert the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, because
there is no easy way to assert types in the C language outside a
C-function.
- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.
- Updated "EXAMPLES" section in SYSCTL manual page.
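
Two of the bug classes listed above, shown in isolation. The flag values and the struct are illustrative only, and C11 _Static_assert stands in for CTASSERT:

```c
#include <stdint.h>

#define SK_FLAG_RD	0x80000000u	/* illustrative flag values */
#define SK_FLAG_WR	0x40000000u

/* Hypothetical softc with a field that would be exported as an int. */
struct sk_softc {
	int	counter;
};

/* Compile-time size check on the exported field. */
_Static_assert(sizeof(((struct sk_softc *)0)->counter) == sizeof(int),
    "exported field must be int-sized");

/* Logical OR collapses any non-zero flags to 1. */
static uint32_t
access_logical(uint32_t a, uint32_t b)
{
	return (a || b);
}

/* Bitwise OR keeps both flag bits. */
static uint32_t
access_bitwise(uint32_t a, uint32_t b)
{
	return (a | b);
}
```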
MFC after: 3 days
Sponsored by: Mellanox Technologies
in userland rename in-kernel getenv()/setenv() to kern_getenv()/kern_setenv().
This fixes a namespace collision with libc symbols.
Submitted by: kmacy
Tested by: make universe
This includes support for:
- Read-Write Error Recovery mode page;
- Informational Exceptions Control mode page;
- Logical Block Provisioning mode page;
- LOG SENSE command.
No real Informational Exceptions features yet. This is only a placeholder.
Sponsored by: iXsystems, Inc.
SPC-4 r2 allows to return empty defect list if the list is not supported.
We don't really support defect data lists, but this suppresses some errors.
MFC after: 1 week
Make this subcommand less FC-specific, reporting target and port addresses
in more generic way. Also make it report list of connected initiators in
unified way, working for both FC and iSCSI, and potentially others.
MFC after: 1 week
Queued async events handling in CAM opened a race that may lead to duplicate
AC_PATH_REGISTERED events delivery during boot. That was not happening
before r272935 because the driver was initialized later. After that change
it started to create duplicate ports in CTL.
Target mode operation does not depend on the initiator mode scan process.
This change allows the target driver to attach earlier and receive some
async events (like AC_CONTRACT) that could be lost otherwise.
MFC after: 1 week
Such LUNs will be visible to initiators, but return "not ready" status
on media access commands. If backing storage becomes available later,
`ctladm modify ...` or `service ctld reload` can trigger its reopen.
It allows to push out some final data from the send queue to the socket
before its close. In particular, it increases chances for logout response
to be delivered to the initiator.