All requests arriving for processing after OFFLINE flag set are rejected
with BUSY status. Races around OFFLINE flag setting are closed by calling
taskqueue_drain_all().
CTL HA functionality was originally implemented by Copan many years ago,
but large part of the sources was never published. This change includes
clean room implementation of the missing code and fixes for many bugs.
This code supports dual-node HA with ALUA in four modes:
- Active/Unavailable without interlink between nodes;
- Active/Standby with second node handling only basic LUN discovery and
reservation, synchronizing with the first node through the interlink;
- Active/Active with both nodes processing commands and accessing the
backing storage, synchronizing with the first node through the interlink;
- Active/Active with second node working as proxy, transfering all
commands to the first node for execution through the interlink.
Unlike original Copan's implementation, depending on specific hardware,
this code uses simple custom TCP-based protocol for interlink. It has
no authentication, so it should never be enabled on public interfaces.
The code may still need some polishing, but generally it is functional.
Relnotes: yes
Sponsored by: iXsystems, Inc.
This is preparation for possibility to open/close media several times
per LUN life cycle. While there, rename variables to reduce confusion.
As additional bonus this allows to open read-only media, such as ZFS
snapshots.
Previously such LUNs were silently ignored. But while they indeed unable
to process most of SCSI commands, some, like RTPG, they still can.
MFC after: 1 month
The significant changes and bugs fixed here are:
1. Fixed a bug in the progress display code:
When the user's filename is too big, or his terminal width is too
small, the progress code could wind up using a negative number for
the length of the "stars" that it uses to indicate progress.
This negative value was assigned to an unsigned variable, resulting
in a very large positive value.
The result is that we wound up writing garbage from memory to the
user's terminal.
With an 80 column terminal, a file name length of more than 35
characters would generate this problem.
To address this, we now set a minimum progress bar length, and
truncate the user's file name as needed.
This has been tested with large filenames and small terminals, and
at least produces reasonable results. If the terminal is too
narrow, the progress display takes up an additional line with each
update, but this is more user friendly than writing garbage to the
tty.
2. SATA drives connected via a SATA controller didn't have SCSI Inquiry
data populated in struct cam_device. This meant that the code in
fw_get_vendor() in fwdownload.c would try to match a zero-length
vendor ID, and so return the first entry in the vendor table. (Which
used to be HITACHI.) Fixed by grabbing identify data, passing the
identify buffer into fw_get_vendor(), and matching against the model
name.
3. SATA drives connected via a SAS controller do have Inquiry data
populated. The table included a couple of entries -- "ATA ST" and
"ATA HDS", intended to handle Seagate and Hitachi SATA drives attached
via a SAS controller. SCSI to ATA translation layers use a vendor
ID of "ATA" (which is standard), and then the model name from the ATA
identify data as the SCSI product name when they are returning data on
SATA disks. The cam_strmatch code will match the first part of the
string (because the length it is given is the length of the vendor,
"ATA"), and return 0 (i.e. a match). So all SATA drives attached to
a SAS controller would be programmed using the Seagate method
(WRITE BUFFER mode 7) of SCSI firmware downloading.
4. Issue #2 above covered up a bug in fw_download_img() -- if the
maximum packet size in the vendor table was 0, it tried to default
to a packet size of 32K. But then it didn't actually succeed in
doing that, because it set the packet size to the value that was
in the vendor table (0). Now that we actually have ATA attached
drives fall use the VENDOR_ATA case, we need a reasonable default
packet size. So this is fixed to properly set the default packet size.
5. Add support for downloading firmware to IBM LTO drives, and add a
firmware file validation method to make sure that the firmware
file matches the drive type. IBM tape drives include a Load ID and
RU name in their vendor-specific VPD page 0x3. Those should match
the IDs in the header of the firmware file to insure that the
proper firmware file is loaded.
6. This also adds a new -q option to the camcontrol fwdownload
subcommand to suppress informational output. When -q is used in
combination with -y, the firmware upgrade will happen without
prompting and without output except if an error condition occurs.
7. Re-add support for printing out SCSI inquiry information when
asking the user to confirm that they want to download firmware, and
add printing of ATA Identify data if it is a SATA disk. This was
removed in r237281 when support for flashing ATA disks was added.
8. Add a new camcontrol(8) "opcodes" subcommand, and use the
underlying code to get recommended timeout values for drive
firmware downloads.
Many SCSI devices support the REPORT SUPPORTED OPERATION CODES
command, and some support the optional timeout descriptor that
specifies nominal and recommended timeouts for the commands
supported by the device.
The new camcontrol opcodes subcommand allows displaying all
opcodes supported by a drive, information about which fields
in a SCSI CDB are actually used by a given SCSI device, and the
nominal and recommended timeout values for each command.
Since firmware downloads can take a long time in some devices, and
the time varies greatly between different types of devices, take
advantage of the infrastructure used by the camcontrol opcodes
subcommand to determine the best timeout to use for the WRITE
BUFFER command in SCSI device firmware downloads.
If the device recommends a timeout, it is likely to be more
accurate than the default 50 second timeout used by the firmware
download code. If the user specifies a timeout, it will override
the default or device recommended timeout. If the device doesn't
support timeout descriptors, we fall back to the default.
9. Instead of downloading firmware to SATA drives behind a SAS controller
using WRITE BUFFER, use the SCSI ATA PASS-THROUGH command to compose
an ATA DOWNLOAD MICROCODE command and it to the drive. The previous
version of this code attempted to send a SCSI WRITE BUFFER command to
SATA drives behind a SAS controller. Although that is part of the
SAT-3 spec, it doesn't work with the parameters used with LSI
controllers at least.
10.Add a new mechanism for making common ATA passthrough and
ATA-behind-SCSI passthrough commands.
The existing camcontrol(8) ATA command mechanism checks the device
type on every command executed. That works fine for individual
commands, but is cumbersome for things like a firmware download
that send a number of commands.
The fwdownload code detects the device type up front, and then
sends the appropriate commands.
11.In simulation mode (-s), if the user specifies the -v flag, print out
the SCSI CDB or ATA registers that would be sent to the drive. This will
aid in debugging any firmware download issues.
sbin/camcontrol/fwdownload.c:
Add a device type to the fw_vendor structure, so that we can
specify different download methods for different devices from the
same vendor. In this case, IBM hard drives (from when they
still made hard drives) and tape drives.
Add a tur_status field to the fw_vendor structure so that we can
specify whether the drive to be upgraded should be ready, not
ready, or whether it doesn't matter. Add the corresponding
capability in fw_download_img().
Add comments describing each of the vendor table fields.
Add HGST and SmrtStor to the supported SCSI vendors list.
In fw_get_vendor(), look at ATA identify data if we have a SATA
device to try to identify what the drive vendor is.
Add IBM firmware file validation. This gets VPD page 0x3, and
compares the Load ID and RU name in the page to the values
included in the header. The validation code will refuse to load
a firmware file if the values don't match. This does allow the
user to attempt a downgrade; whether or not it succeeds will
likely depend on the drive settings.
Add a -q option, and disable all informative output
(progress bars, etc.) when this is enabled.
Re-add the inquiry in the confirmation dialog so the user has
a better idea of which device he is talking to. Add support for
displaying ATA identify data.
Don't automatically disable confirmation in simulation (-s) mode.
This allows the user to see the inquiry or identify data in the
dialog, and see exactly what they would see when the command
actually runs. Also, in simulation mode, if the user specifies
the -v flag, print out the SCSI CDB or ATA registers that would
be sent to the drive. This will aid in debugging any firmware
download issues.
Add a timeout field and timeout type to the firmware download
vendor table. This allows specifying a default timeout and allows
specifying whether we should attempt to probe for a recommended
timeout from the drive.
Add a new fuction, fw_get_timeout(), that will determine
which timeout to use for the WRITE BUFFER command. If the
user specifies a timeout, we always use that. Otherwise,
we will use the drive recommended timeout, if available,
and fall back to the default when a drive recommended
timeout isn't available.
When we prompt the user, tell him what timeout we're going
to use, and the source of the timeout.
Revamp the way SATA devices are handled.
In fwdownload(), use the new get_device_type() function to
determine what kind of device we're talking to.
Allow firmware downloads to any SATA device, but restrict
SCSI downloads to known devices. (The latter is not a
change in behavior.)
Break out the "ready" check from fw_download_img() into a
new subfunction, fw_check_device_ready(). This sends the
appropriate command to the device in question -- a TEST
UNIT READY or an IDENTIFY. The IDENTIFY for SATA devices
a SAT layer is done using the SCSI ATA PASS-THROUGH
command.
Use the new build_ata_cmd() function to build either a SCSI or
ATA I/O CCB to issue the DOWNLOAD MICROCODE command to SATA
devices. build_ata_cmd() figures looks at the devtype argument
and fills in the correct CCB type and CDB or ATA registers.
Revamp the vendor table to remove the previous
vendor-specific ATA entries and use a generic ATA vendor
placeholder. We currently use the same method for all ATA
drives, although we may have to add vendor-specific
behavior once we test this with more drives.
sbin/camcontrol/progress.c:
In progress_draw(), make barlength a signed value so that
we can easily detect a negative value.
If barlength (the length of the progress bar) would wind up
negative due to a small TTY width or a large filename,
set the bar length to the new minimum (10 stars) and
truncate the user's filename. We will truncate it down to
0 characters if necessary.
Calculate a new prefix_len variable (user's filename length)
and use it as the precision when printing the filename.
sbin/camcontrol/camcontrol.c:
Implement a new camcontrol(8) subcommand, "opcodes". The
opcodes subcommand allows displaying the entire list of
SCSI commands supported by a device, or details on an
individual command. In either case, it can display
nominal and recommended timeout values.
Add the scsiopcodes() function, which calls the new
scsigetopcodes() function to fetch opcode data from a
drive.
Add two new functions, scsiprintoneopcode() and
scsiprintopcodes(), which print information about one
opcode or all opcodes, respectively.
Remove the get_disk_type() function. It is no longer used.
Add a new function, dev_has_vpd_page(), that fetches the
supported INQUIRY VPD list from a device and tells the
caller whether the requested VPD page is available.
Add a new function, get_device_type(), that returns a more
precise device type than the old get_disk_type() function.
The get_disk_type() function only distinguished between
SCSI and ATA devices, and SATA devices behind a SCSI to ATA
translation layer were considered to be "SCSI".
get_device_type() offers a third type, CC_DT_ATA_BEHIND_SCSI.
We need to know this to know whether to attempt to send ATA
passthrough commands. If the device has the ATA
Information VPD page (0x89), then it is an ATA device
behind a SCSI to ATA translation layer.
Remove the type argument from the fwdownload() subcommand.
Add a new function, build_ata_cmd(), that will take one set
of common arguments and build either a SCSI or ATA I/O CCB,
depending on the device type passed in.
sbin/camcontrol/camcontrol.h:
Add a prototype for scsigetopcodes().
Add a new enumeration, camcontrol_devtype.
Add prototypes for dev_has_vpd_page(), get_device_type()
and build_ata_cmd().
Remove the type argument from the fwdownload() subcommand.
sbin/camcontrol/camcontrol.8
Explain that the fwdownload subcommand will use the drive
recommended timeout if available, and that the user can
override the timeout.
Document the new opcodes subcommand.
Explain that we will attempt to download firmware to any
SATA device.
Document supported SCSI vendors, and models tested if known.
Explain the commands used to download firmware for the
three different drive and controller combinations.
Document that the -v flag in simulation mode for the fwdownload
subcommand will print out the SCSI CDBs or ATA registers that would
be used.
sys/cam/scsi/scsi_all.h:
Add new bit definitions for the one opcode descriptor for
the REPORT SUPPORTED OPCODES command.
Add a function prototype for scsi_report_supported_opcodes().
sys/cam/scsi/scsi_all.c:
Add a new CDB building function, scsi_report_supported_opcodes().
Sponsored by: Spectra Logic
MFC after: 1 week
It has nothing to share with too huge ctl.c other then device descriptor,
but even that may be counted as design error that may be fixed later.
At some point we may even want to have several ioctl ports.
Its idea was to be a simple initiator and execute several commands from
kernel level, but FreeBSD never had consumer for that functionality,
while its implementation polluted many unrelated places..
Reporting SCSI errors to console is often useless, pollutes logs and may
affect performance. For debugging there is kern.cam.ctl.debug sysctl
MFC after: 1 week
At this point IMMED flag is translated to MNT_NOWAIT flag of VOP_FSYNC(),
hoping that file system implements that (ZFS seems doesn't).
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Before this change SYNCHRONIZE CACHE commands were executed exclusively,
as if they had ORDERED tag. But looking through SCSI specs I've found
no any reason to be so strict. For reads this ordering seems pointless.
For writes it looks less obvious, so I left ordering against preceeding
write commands, while following ones are no longer required to wait.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
During vMotion and Clone VMware by default runs multiple sequential 4MB
XCOPY requests same time. If CTL issues reads sequentially in 1MB chunks
for each XCOPY command, reads from different commands are not detected
as sequential by serseq option code and allowed to execute simultaneously.
Such read pattern confused ZFS prefetcher, causing suboptimal disk access.
Issuing all reads same time make serseq code work properly, serializing
reads both within each XCOPY command and between them.
My tests with ZFS pool of 14 disks in RAID10 shows prefetcher efficiency
improved from 37% to 99.7%, copying speed improved by 10-60%, average
read latency reduced twice on HDD layer and by five times on zvol layer.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
- Use pointer assignment rather than a combination of pointers and
flags to switch buffers between unmapped and mapped. This eliminates
multiple flags and generally simplifies the logic.
- Eliminate b_saveaddr since it is only used with pager bufs which have
their b_data re-initialized on each allocation.
- Gather up some convenience routines in the buffer cache for
manipulating buf space and buf malloc space.
- Add an inline, buf_mapped(), to standardize checks around unmapped
buffers.
In collaboration with: mlaier
Reviewed by: kib
Tested by: pho (many small revisions ago)
Sponsored by: EMC / Isilon Storage Division
Previously several places were doing it on its own, partially
incorrectly (e.g. without the filedesc locked) or even actively harmful
by populating jdir or assigning rootvnode without vrefing it.
Reviewed by: kib
To avoid conflicts between target and initiator devices in CAM, make
CTL use target ID reported by HBA as its initiator_id in XPT_PATH_INQ.
That target ID is known to never be used for initiator role, so it won't
conflict. For Fibre Channel and FireWire HBAs this specific ID choice
is irrelevant since all target IDs there are virtual. Same time for SPI
HBAs it seems could be even requirement to use same target ID for both
initiator and target roles.
While there are some more things to polish in isp(4) driver, first tests
of using both roles same time on the same port appeared successfull:
# camcontrol devlist -v
scbus0 on isp0 bus 0:
<FREEBSD CTLDISK 0001> at scbus0 target 1 lun 0 (da20,pass21)
<> at scbus0 target 256 lun 0 (ctl0)
<> at scbus0 target -1 lun ffffffff (ctl1)
- remove last remnants of never implemented multiple targets support;
- implement missing support for LUN mapping in this area.
Due to existing locking constraints LUN mapping code is practically
unlocked at this point. Hopefully it is not racy enough to live until
somebody get idea how to call sleeping fronend methods under lock also
taken by the same frontend in non-sleepable context. :(
At this point CTL has no known use case for device queue freezes.
Same time existing (considered to be broken) code was found to cause
modify-after-free issues.
Discussed with: ken
MFC after: 1 week
MAM is Medium Auxiliary Memory and is most commonly found as flash
chips on tapes.
This includes support for reading attributes and decoding most
known attributes, but does not yet include support for writing
attributes or reporting attributes in XML format.
libsbuf/Makefile:
Add subr_prf.c for the new sbuf_hexdump() function. This
function is essentially the same function.
libsbuf/Symbol.map:
Add a new shared library minor version, and include the
sbuf_hexdump() function.
libsbuf/Version.def:
Add version 1.4 of the libsbuf library.
libutil/hexdump.3:
Document sbuf_hexdump() alongside hexdump(3), since it is
essentially the same function.
camcontrol/Makefile:
Add attrib.c.
camcontrol/attrib.c:
Implementation of READ ATTRIBUTE support for camcontrol(8).
camcontrol/camcontrol.8:
Document the new 'camcontrol attrib' subcommand.
camcontrol/camcontrol.c:
Add the new 'camcontrol attrib' subcommand.
camcontrol/camcontrol.h:
Add a function prototype for scsiattrib().
share/man/man9/sbuf.9:
Document the existence of sbuf_hexdump() and point users to
the hexdump(3) man page for more details.
sys/cam/scsi/scsi_all.c:
Add a table of known attributes, text descriptions and
handler functions.
Add a new scsi_attrib_sbuf() function along with a number
of other related functions that help decode attributes.
scsi_attrib_ascii_sbuf() decodes ASCII format attributes.
scsi_attrib_int_sbuf() decodes binary format attributes, and
will pass them off to scsi_attrib_hexdump_sbuf() if they're
bigger than 8 bytes.
scsi_attrib_vendser_sbuf() decodes the vendor and drive
serial number attribute.
scsi_attrib_volcoh_sbuf() decodes the Volume Coherency
Information attribute that LTFS writes out.
sys/cam/scsi/scsi_all.h:
Add a number of attribute-related structure definitions and
other defines.
Add function prototypes for all of the functions added in
scsi_all.c.
sys/kern/subr_prf.c:
Add a new function, sbuf_hexdump(). This is the same as
the existing hexdump(9) function, except that it puts the
result in an sbuf.
This also changes subr_prf.c so that it can be compiled in
userland for includsion in libsbuf.
We should work to change this so that the kernel hexdump
implementation is a wrapper around sbuf_hexdump() with a
statically allocated sbuf with a drain. That will require
a drain function that goes to the kernel printf() buffer
that can take a non-NUL terminated string as input.
That is because an sbuf isn't NUL-terminated until it is
finished, and we don't want to finish it while we're still
using it.
We should also work to consolidate the userland hexdump and
kernel hexdump implemenatations, which are currently
separate. This would also mean making applications that
currently link in libutil link in libsbuf.
sys/sys/sbuf.h:
Add the prototype for sbuf_hexdump(), and add another copy
of the hexdump flag values if they aren't already defined.
Ideally the flags should be defined in one place but the
implemenation makes it difficult to do properly. (See
above.)
Sponsored by: Spectra Logic Corporation
MFC after: 1 week
referenced. I think that there does exist an unlikely edge case for a
memory leak, but only if a driver is incorrectly written and specifies no
valid range of targets to scan. That can be fixed in a follow-up commit.
Obtained from: Netflix, Inc.
AC_ADVINFO_CHANGED.
Without this change, newly inserted hard disks won't always have their
physical path device nodes created. The problem reproduces most readily
when attaching a large number of disks at once.
Differential Revision: https://reviews.freebsd.org/D2290
Reviewed by: mav, imp
MFC after: 2 weeks
Sponsored by: Spectra Logic
There are four places, all in cam_xpt.c, where ccbs are malloc'ed. Two of
these use M_ZERO, two don't. The two that don't meant that allocated ccbs
had trash in them making it hard to debug errors where they showed up. Due
to this, use M_ZERO all the time when allocating ccbs.
Submitted by: Scott Ferris <scott.ferris@isilon.com>
Sponsored by: EMC/Isilon Storage Division
Reviewed by: scottl, imp
Differential: https://reviews.freebsd.org/D2120
The only drives I have discovered so far that support medium type
reports are newer HP LTO (LTO-5 and LTO-6) drives. IBM drives
only support the density reports.
sys/cam/scsi/scsi_sa.h:
The number of possible density codes in the medium type
report is 9, not 8. This caused problems parsing all of
the medium type report after this point in the structure.
usr.bin/mt/mt.c:
Run the density codes returned in the medium type report
through denstostring(), just like the primary and secondary
density codes in the density report. This will print the
density code in hex, and give a text description if it
is available.
Thanks to Rudolf Cejka for doing extensive testing with HP LTO drives
and Bacula and discovering these problems.
Tested by: Rudolf Cejka <cejkar at fit.vutbr.cz>
Sponsored by: Spectra Logic
MFC after: 4 days
SCSI-2 devices.
Some older tape devices claim to be SCSI-2, but actually do support
long position information. (Long position information includes
the current file mark.) For example, the COMPAQ SuperDLT1.
So we now only disable the check on SCSI-1 and older devices.
sys/cam/scsi/scsi_sa.c:
In saregister(), only disable fetching long position
information on SCSI-1 and older drives. Update the
comment to explain why.
Confirmed by: dvl
Sponsored by: Spectra Logic
MFC after: 3 weeks
tracking.
It is important to subtract the residual from the requested
transfer size to see how much data was actually transferred. With
tape drives in particular, it is common to request more data than is
returned.
Also, add I/O latency tracking for CAM requests issued by
cam_periph_runccb().
If the caller supplies a struct devstat, and the I/O is a SCSI or
ATA I/O, we will track the elapsed time to provide I/O latency
statistics for the request.
sys/cam/scsi/cam_periph.c:
In cam_periph_runccb(), subtract the residual when reporting I/O
totals to devstat(9) for SCSI and ATA passthrough requests.
In cam_periph_runccb(), grab the I/O start time and supply
the start time to devstat_end_transaction() so that it can
calculate the elapsed I/O time.
Sponsored by: Spectra Logic
MFC after: 1 week
The primary focus of these changes is to modernize FreeBSD's
tape infrastructure so that we can take advantage of some of the
features of modern tape drives and allow support for LTFS.
Significant changes and new features include:
o sa(4) driver status and parameter information is now exported via an
XML structure. This will allow for changes and improvements later
on that will not break userland applications. The old MTIOCGET
status ioctl remains, so applications using the existing interface
will not break.
o 'mt status' now reports drive-reported tape position information
as well as the previously available calculated tape position
information. These numbers will be different at times, because
the drive-reported block numbers are relative to BOP (Beginning
of Partition), but the block numbers calculated previously via
sa(4) (and still provided) are relative to the last filemark.
Both numbers are now provided. 'mt status' now also shows the
drive INQUIRY information, serial number and any position flags
(BOP, EOT, etc.) provided with the tape position information.
'mt status -v' adds information on the maximum possible I/O size,
and the underlying values used to calculate it.
o The extra sa(4) /dev entries (/dev/saN.[0-3]) have been removed.
The extra devices were originally added as place holders for
density-specific device nodes. Some OSes (NetBSD, NetApp's OnTap
and Solaris) have had device nodes that, when you write to them,
will automatically select a given density for particular tape drives.
This is a convenient way of switching densities, but it was never
implemented in FreeBSD. Only the device nodes were there, and that
sometimes confused users.
For modern tape devices, the density is generally not selectable
(e.g. with LTO) or defaults to the highest availble density when
the tape is rewritten from BOT (e.g. TS11X0). So, for most users,
density selection won't be necessary. If they do need to select
the density, it is easy enough to use 'mt density' to change it.
o Protection information is now supported. This is either a
Reed-Solomon CRC or CRC32 that is included at the end of each block
read and written. On write, the tape drive verifies the CRC, and
on read, the tape drive provides a CRC for the userland application
to verify.
o New, extensible tape driver parameter get/set interface.
o Density reporting information. For drives that support it,
'mt getdensity' will show detailed information on what formats the
tape drive supports, and what formats the tape drive supports.
o Some mt(1) functionality moved into a new mt(3) library so that
external applications can reuse the code.
o The new mt(3) library includes helper routines to aid in parsing
the XML output of the sa(4) driver, and build a tree of driver
metadata.
o Support for the MTLOAD (load a tape in the drive) and MTWEOFI
(write filemark immediate) ioctls needed by IBM's LTFS
implementation.
o Improve device departure behavior for the sa(4) driver. The previous
implementation led to hangs when the device was open.
o This has been tested on the following types of drives:
IBM TS1150
IBM TS1140
IBM LTO-6
IBM LTO-5
HP LTO-2
Seagate DDS-4
Quantum DLT-4000
Exabyte 8505
Sony DDS-2
contrib/groff/tmac/doc-syms,
share/mk/bsd.libnames.mk,
lib/Makefile,
Add libmt.
lib/libmt/Makefile,
lib/libmt/mt.3,
lib/libmt/mtlib.c,
lib/libmt/mtlib.h,
New mt(3) library that contains functions moved from mt(1) and
new functions needed to interact with the updated sa(4) driver.
This includes XML parser helper functions that application writers
can use when writing code to query tape parameters.
rescue/rescue/Makefile:
Add -lmt to CRUNCH_LIBS.
src/share/man/man4/mtio.4
Clarify this man page a bit, and since it contains what is
essentially the mtio.h header file, add new ioctls and structure
definitions from mtio.h.
src/share/man/man4/sa.4
Update BUGS and maintainer section.
sys/cam/scsi/scsi_all.c,
sys/cam/scsi/scsi_all.h:
Add SCSI SECURITY PROTOCOL IN/OUT CDB definitions and CDB building
functions.
sys/cam/scsi/scsi_sa.c
sys/cam/scsi/scsi_sa.h
Many tape driver changes, largely outlined above.
Increase the sa(4) driver read/write timeout from 4 to 32
minutes. This is based on the recommended values for IBM LTO
5/6 drives. This may also avoid timeouts for other tape
hardware that can take a long time to do retries and error
recovery. Longer term, a better way to handle this is to ask
the drive for recommended timeout values using the REPORT
SUPPORTED OPCODES command. Modern IBM and Oracle tape drives
at least support that command, and it would allow for more
accurate timeout values.
Add XML status generation. This is done with a series of
macros to eliminate as much duplicate code as possible. The
new XML-based status values are reported through the new
MTIOCEXTGET ioctl.
Add XML driver parameter reporting, using the new MTIOCPARAMGET
ioctl.
Add a new driver parameter setting interface, using the new
MTIOCPARAMSET and MTIOCSETLIST ioctls.
Add a new MTIOCRBLIM ioctl to get block limits information.
Add CCB/CDB building routines scsi_locate_16, scsi_locate_10,
and scsi_read_position_10().
scsi_locate_10 implements the LOCATE command, as does the
existing scsi_set_position() command. It just supports
additional arguments and features. If/when we figure out a
good way to provide backward compatibility for older
applications using the old function API, we can just revamp
scsi_set_position(). The same goes for
scsi_read_position_10() and the existing scsi_read_position()
function.
Revamp sasetpos() to take the new mtlocate structure as an
argument. It now will use either scsi_locate_10() or
scsi_locate_16(), depending upon the arguments the user
supplies. As before, once we change position we don't have a
clear idea of what the current logical position of the tape
drive is.
For tape drives that support long form position data, we
read the current position and store that for later reporting
after changing the position. This should help applications
like Bacula speed tape access under FreeBSD once they are
modified to support the new ioctls.
Add a new quirk, SA_QUIRK_NO_LONG_POS, that is set for all
drives that report SCSI-2 or older, as well as drives that
report an Illegal Request type error for READ POSITION with
the long format. So we should automatically detect drives
that don't support the long form and stop asking for it after
an initial try.
Add a partition number to the sa(4) softc.
Improve device departure handling. The previous implementation
led to hangs when the device was open.
If an application had the sa(4) driver open, and attempted to
close it after it went away, the cam_periph_release() call in
saclose() would cause the periph to get destroyed because that
was the last reference to it. Because destroy_dev() was
called from the sa(4) driver's cleanup routine (sacleanup()),
and would block waiting for the close to happen, a deadlock
would result.
So instead of calling destroy_dev() from the cleanup routine,
call destroy_dev_sched_cb() from saoninvalidate() and wait for
the callback.
Acquire a reference for devfs in saregister(), and release it
in the new sadevgonecb() routine when all devfs devices for
the particular sa(4) driver instance are gone.
Add a new function, sasetupdev(), to centralize setting
per-instance devfs device parameters instead of repeating the
code in saregister().
Add an open count to the softc, so we know how many
peripheral driver references are a result of open
sessions.
Add the D_TRACKCLOSE flag to the cdevsw flags so
that we get a 1:1 mapping of open to close calls
instead of a N:1 mapping.
This should be a no-op for everything except the
control device, since we don't allow more than one
open on non-control devices.
However, since we do allow multiple opens on the
control device, the combination of the open count
and the D_TRACKCLOSE flag should result in an
accurate peripheral driver reference count, and an
accurate open count.
The accurate open count allows us to release all
peripheral driver references that are the result
of open contexts once we get the callback from devfs.
sys/sys/mtio.h:
Add a number of new mt(4) ioctls and the requisite data
structures. None of the existing interfaces been removed
or changed.
This includes definitions for the following new ioctls:
MTIOCRBLIM /* get block limits */
MTIOCEXTLOCATE /* seek to position */
MTIOCEXTGET /* get tape status */
MTIOCPARAMGET /* get tape params */
MTIOCPARAMSET /* set tape params */
MTIOCSETLIST /* set N params */
usr.bin/mt/Makefile:
mt(1) now depends on libmt, libsbuf and libbsdxml.
usr.bin/mt/mt.1:
Document new mt(1) features and subcommands.
usr.bin/mt/mt.c:
Implement support for mt(1) subcommands that need to
use getopt(3) for their arguments.
Implement a new 'mt status' command to replace the old
'mt status' command. The old status command has been
renamed 'ostatus'.
The new status function uses the MTIOCEXTGET ioctl, and
therefore parses the XML data to determine drive status.
The -x argument to 'mt status' allows the user to dump out
the raw XML reported by the kernel.
The new status display is mostly the same as the old status
display, except that it doesn't print the redundant density
mode information, and it does print the current partition
number and position flags.
Add a new command, 'mt locate', that will supersede the
old 'mt setspos' and 'mt sethpos' commands. 'mt locate'
implements all of the functionality of the MTIOCEXTLOCATE
ioctl, and allows the user to change the logical position
of the tape drive in a number of ways. (Partition,
block number, file number, set mark number, end of data.)
The immediate bit and the explicit address bits are
implemented, but not documented in the man page.
Add a new 'mt weofi' command to use the new MTWEOFI ioctl.
This allows the user to ask the drive to write a filemark
without waiting around for the operation to complete.
Add a new 'mt getdensity' command that gets the XML-based
tape drive density report from the sa(4) driver and displays
it. This uses the SCSI REPORT DENSITY SUPPORT command
to get comprehensive information from the tape drive about
what formats it is able to read and write.
Add a new 'mt protect' command that allows getting and setting
tape drive protection information. The protection information
is a CRC tacked on to the end of every read/write from and to
the tape drive.
Sponsored by: Spectra Logic
MFC after: 1 month
properly.
If there is garbage in the flags field, it can sometimes include a
set CDAI_FLAG_STORE flag, which may cause either an error or
perhaps result in overwriting the field that was intended to be
read.
sys/cam/cam_ccb.h:
Add a new flag to the XPT_DEV_ADVINFO CCB, CDAI_FLAG_NONE,
that callers can use to set the flags field when no store
is desired.
sys/cam/scsi/scsi_enc_ses.c:
In ses_setphyspath_callback(), explicitly set the
XPT_DEV_ADVINFO flags to CDAI_FLAG_NONE when fetching the
physical path information. Instead of ORing in the
CDAI_FLAG_STORE flag when storing the physical path, set
the flags field to CDAI_FLAG_STORE.
sys/cam/scsi/scsi_sa.c:
Set the XPT_DEV_ADVINFO flags field to CDAI_FLAG_NONE when
fetching extended inquiry information.
sys/cam/scsi/scsi_da.c:
When storing extended READ CAPACITY information, set the
XPT_DEV_ADVINFO flags field to CDAI_FLAG_STORE instead of
ORing it into a field that isn't initialized.
sys/dev/mpr/mpr_sas.c,
sys/dev/mps/mps_sas.c:
When fetching extended READ CAPACITY information, set the
XPT_DEV_ADVINFO flags field to CDAI_FLAG_NONE instead of
setting it to 0.
sbin/camcontrol/camcontrol.c:
When fetching a device ID, set the XPT_DEV_ADVINFO flags
field to CDAI_FLAG_NONE instead of 0.
sys/sys/param.h:
Bump __FreeBSD_version to 1100061 for the new XPT_DEV_ADVINFO
CCB flag, CDAI_FLAG_NONE.
Sponsored by: Spectra Logic
MFC after: 1 week
This change by 2-3 times improves performance of misaligned XCOPY and WUT
commands by avoiding unneeded read-modify-write cycles inside ZFS.
MFC after: 1 week
This change by 2-3 times improves performance of misaligned WRITE SAME
commands by avoiding unneeded read-modify-write cycles inside ZFS.
MFC after: 1 week
target iSCSI offload. Add mechanism to query maximum receive data segment
size supported by chosen hardware offload module, and use it in ctld(8)
to determine the value to advertise to the other side.
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
This VPD page is effectively an extension of the standard Inquiry
data page, and includes lots of additional bits.
This commit includes support for probing the page in the SCSI probe code,
and an additional request type for the XPT_DEV_ADVINFO CCB. CTL already
supports the Extended Inquiry page.
Support for querying this page in the sa(4) driver will come later.
sys/cam/scsi/scsi_xpt.c:
Probe the Extended Inquiry page, if the device supports it, and
return it in response to a XPT_DEV_ADVINFO CCB if it is requested.
sys/cam/scsi/cam_ccb.h:
Define a new advanced information CCB data type, CDAI_TYPE_EXT_INQ.
sys/cam/cam_xpt.c:
Free the extended inquiry data in a device when the device goes
away.
sys/cam/cam_xpt_internal.h:
Add an extended inquiry data pointer and length to struct cam_ed.
sys/sys/param.h
Bump __FreeBSD_version for the addition of the new
CDAI_TYPE_EXT_INQ advanced information type.
Sponsored by: Spectra Logic
MFC after: 1 week
While ctld(8) still does not allow multiple portal groups per target
to be configured, kernel should now be able to handle it.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
VMware returns BUSY status when storage has transient connectivity issues.
It is often better to wait and let VM admin fix the problem then crash.
Discussed with: ken
MFC after: 1 week
Replace iSCSI-specific LUN mapping mechanism with new one, working for any
ports. By default all ports are created without LUN mapping, exposing all
CTL LUNs as before. But, if needed, LUN mapping can be manually set on
per-port basis via ctladm. For its iSCSI ports ctld does it via ioctl(2).
The next step will be to teach ctld to work with FibreChannel ports also.
Respecting additional flexibility of the new mechanism, ctl.conf now allows
alternative syntax for LUN definition. LUNs can now be defined in global
context, and then referenced from targets by unique name, as needed. It
allows same LUN to be exposed several times via multiple targets.
While there, increase limit for LUNs per target in ctld from 256 to 1024.
Some initiators do not support LUNs above 255, but that is not our problem.
Discussed with: trasz
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
sys/cam/scsi/scsi_all.h:
In struct scsi_extended_inquiry_data:
- Increase the length field to 2 bytes, as it is 2 bytes in SPC-4.
- Add bit definitions for the various Activiate Microcode actions.
- Add the Sequential Access Logical Block Protection support bit,
since we need that in the sa(4) driver. (For modifications
that will come later.)
- Add definitions for the various Multi I_T Nexus Microcode
Download modes.
sys/cam/ctl/ctl.c:
As of SPC-4, a single report of "REPORTED LUNS DATA HAS CHANGED"
is to be given per I_T nexus. Once it is reported, the unit
attention condition should be cleared for all LUNS attached to
an I_T nexus.
Previously that only happened when a REPORT LUNS command was
processed.
This behavior may be different (according to SAM-5) when the
UA_INTLCK_CTRL bits are non-zero in the control mode page but
CTL does not currently support that.
So, in view of the spec, whenever we report a LUN inventory
change unit attention, clear it on all LUNs for that
particular I_T nexus.
Add a new function, ctl_clear_ua() that will clear a unit
attention on all LUNs for the given I_T nexus.
One field in the extended inquiry data that we could potentially
report at some point is the maximum supported sense data length.
To do that, we would the SIM to report (via path inquiry
perhaps) how much sense data it is able to send.
Add comments to explain some of the bits that are set in the
Extended Inquiry VPD page.
Add a few comments to make it more clear which functions handle
various VPD pages.
Sponsored by: Spectra Logic
MFC after: 1 week
This could cause data corruption due to accessing wrong LUN in case of
retries on write errors. Failed writes were retried to read LUN.
MFC after: 3 days
Define it as an atomic uint32_t. These increments happen infrequently
enough for the atomic overhead to be a problem, and since they're now
independent atomics, they won't contend with xpt_lock_buses().
This counter is useful as a means of cheaply identifying whether any changes
have been made to the CAM peripheral list. Userland programs have no guarantee
that the counter won't change on them while being returned or while processing
the information, so they must be written accordingly.
Discussed with: ken, mav (in general)
MFC after: 1 week
Sponsored by: Spectra Logic
If we aggregated status sending with data move and got error, allow status
to be updated and resent again separately. Without this command may stuck
without status sent at all.
MFC after: 2 weeks
This includes a new summary mode (-s) for camcontrol defects that
quickly tells the user the most important thing: how many defects
are in the requested list. The actual location of the defects is
less important.
Modern drives frequently have more than the 8191 defects that can
be reported by the READ DEFECT DATA (10) command. If they don't
have that many grown defects, they certainly have more than 8191
defects in the primary (i.e. factory) defect list.
The READ DEFECT DATA (12) command allows for longer parameter
lists, as well as indexing into the list of defects, and so allows
reporting many more defects.
This has been tested with HGST drives and Seagate drives, but
does not fully work with Seagate drives. Once I have a Seagate
spec I may be able to determine whether it is possible to make it
work with Seagate drives.
scsi_da.h: Add a definition for the new long block defect
format.
Add bit and mask definitions for the new extended
physical sector and bytes from index defect
formats.
Add a prototype for the new scsi_read_defects() CDB
building function.
scsi_da.c: Add a new scsi_read_defects() CDB building function.
camcontrol(8) was previously composing CDBs manually.
This is long overdue.
camcontrol.c: Revamp the camcontrol defects subcommand. We now
go through multiple stages in trying to get defect
data off the drive while avoiding various drive
firmware quirks.
We start off by requesting the defect header with
the 10 byte command. If we're in summary mode (-s)
and the drive reports fewer defects than can be
represented in the 10 byte header, we're done.
Otherwise, we know that we need to issue the
12 byte command if the drive reports the maximum
number of defects.
If we're in summary mode, we're done if we get a
good response back when asking for the 12 byte header.
If the user has asked for the full list, then we
use the address descriptor index field in the 12
byte CDB to step through the list in 64K chunks.
64K is small enough to work with most any ancient
or modern SCSI controller.
Add support for printing the new long block defect
format, as well as the extended physical sector and
bytes from index formats. I don't have any drives
that support the new formats.
Add a hexadecimal output format that can be turned
on with -X.
Add a quiet mode (-q) that can be turned on with
the summary mode (-s) to just print out a number.
Revamp the error detection and recovery code for
the defects command to work with HGST drives.
Call the new scsi_read_defects() CDB building
function instead of rolling the CDB ourselves.
Pay attention to the residual from the defect list
request when printing it out, so we don't run off
the end of the list.
Use the new scsi_nv library routines to convert
from strings to numbers and back.
camcontrol.8: Document the new defect formats (longblock, extbfi,
extphys) and command line options (-q, -s, -S and
-X) for the defects subcommand.
Explain a little more about what drives generally
do and don't support.
Sponsored by: Spectra Logic
MFC after: 1 week
data to go undetected.
The probe code does an MD5 checksum of the inquiry data (and page
0x80 serial number if available) before doing a reprobe of an
existing device, and then compares a checksum after the probe to
see whether the device has changed.
This check was broken in January, 2000 by change 56146 when the extended
inquiry probe code was added.
In the extended inquiry probe case, it was calculating the checksum
a second time. The second time it included the updated inquiry
data from the short inquiry probe (first 36 bytes). So it wouldn't
catch cases where the vendor, product, revision, etc. changed.
This change will have the effect that when a device's inquiry data is
updated and a rescan is issued, it will disappear and then reappear.
This is the appropriate action, because if the inquiry data or serial
number changes, it is either a different device or the device
configuration may have changed significantly. (e.g. with updated
firmware.)
scsi_xpt.c: Don't calculate the initial MD5 checksum on
standard inquiry data and the page 0x80 serial
number if we have already calculated it.
MFC after: 1 week
Sponsored by: Spectra Logic
While in most cases CTL should correctly fetch those values from backing
storages, there are some initiators (like MS SQL), that may not like large
physical block sizes, even if they are true. For such cases allow override
fetched values with supported ones (like 4K).
MFC after: 1 week
While we don't support MCS, hole in received sequence numbers may mean
only PDU loss. While we don't support lost PDU recovery, terminate the
connection to avoid stuck commands.
While there, improve handling of sequence numbers wrap after 2^32 PDUs.
MFC after: 2 weeks
Technically read requests can be executed in any order or simultaneously
since they are not changing any data. But ZFS prefetcher goes crasy when
it receives consecutive requests from different threads. Since prefetcher
works on level of separate blocks, instead of two consecutive 128K requests
it may receive 32 8K requests in mixed order.
This patch is more workaround then a real fix, and it does not fix all of
prefetcher problems, but it improves sequential read speed by 3-4x times
in some configurations. On the other side it may hurt performance if
some backing store has no prefetch, that is why it is disabled by default
for raw devices.
MFC after: 2 weeks
While without UNMAP support there is not much initiator can do about it,
the administrator still better be notified about the storage overflow.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Previously it was supported only for ZVOL-backed LUNs, but now should work
for file-backed LUNs too. Used value in this case is a space occupied by
the backing file, while available value is an available space on file
system. Pool thresholds are still not implemented in this case.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
It is implemented for LUNs backed by ZVOLs in "dev" mode and files.
GEOM has no such API, so for LUNs backed by raw devices all LBAs will
be reported as mapped/unknown.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
After recent optimizations this change is no longer blocked by CTL memory
consumption. Those limits are still not free, but much cheaper now.
MFC after: 1 week
Relnotes: yes
Sponsored by: iXsystems, Inc.
Abusing ability of major UAs cover minor ones we may not account UAs for
inactive ports. Allocate UAs storage for port and start accounting only
after some initiator from that port fetched its first POWER ON OCCURRED.
This reduces per-LUN CTL memory usage from >1MB to less then 100K.
MFC after: 1 month
In configurations with many ports, like iSCSI, each LUN is typically
accessed only by limited subset of ports. Allocating that memory on
demand allows to reduce CTL memory usage from 5.3MB/LUN to 1.3MB/LUN.
MFC after: 1 month
interpreating NULLs as EOLs, but converting them to spaces.
SPC-4 does not tell that T10-based IDs should be NULL-terminated/padded.
And while it tells that it should include only ASCII chars (0x20-0x7F),
there are some USB sticks (SanDisk Ultra Fit), that have NULLs inside
the value. Treating NULLs as EOLs there made those LUN IDs non-unique.
MFC after: 1 week
Make CTL core and block backend set success status before initiating last
data move for read commands. Make CAM target and iSCSI frontends detect
such condition and send command status together with data. New I/O flag
allows to skip duplicate status sending on later fe_done() call.
For Fibre Channel this change saves one of three interrupts per read command,
increasing performance from 126K to 160K IOPS. For iSCSI this change saves
one of three PDUs per read command, increasing performance from 1M to 1.2M
IOPS.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
Old allocator created significant lock congestion protecting its lists
of preallocated I/Os, while UMA provides much better SMP scalability.
The downside of UMA is lack of reliable preallocation, that could guarantee
successful allocation in non-sleepable environments. But careful code
review shown, that only CAM target frontend really has that requirement.
Fix that making that frontend preallocate and statically bind CTL I/O for
every ATIO/INOT it preallocates any way. That allows to avoid allocations
in hot I/O path. Other frontends either may sleep in allocation context
or can properly handle allocation errors.
On 40-core server with 6 ZVOL-backed LUNs and 7 iSCSI client connections
this change increases peak performance from ~700K to >1M IOPS! Yay! :)
MFC after: 1 month
Sponsored by: iXsystems, Inc.
Previously, any timeout value for which (timeout * hz) will overflow the
signed integer, will give weird results, since callout(9) routines will
convert negative values of ticks to '1'. For unsigned integer overflow we
will get sufficiently smaller timeout values than expected.
Switch from callout_reset, which requires conversion to int based ticks
to callout_reset_sbt to avoid this.
Also correct isci to correctly resolve ccb timeout.
This was based on the original work done by Eygene Ryabinkin
<rea@freebsd.org> back in 5 Aug 2011 which used a macro to help avoid
the overlow.
Differential Revision: https://reviews.freebsd.org/D1157
Reviewed by: mav, davide
MFC after: 1 month
Sponsored by: Multiplay
In this mode one head is in Active state, supporting all commands, while
another is in Standby state, supporting only minimal LUN discovery subset.
It is still incomplete since Standby state requires reservation support,
which is impossible to do right without having interlink between heads.
But it allows to run some basic experiments.
related cleanups:
- Require each driver to initalize a mutex in the scsi_low_softc that
is shared with the scsi_low code. This mutex is used for CAM SIMs,
timers, and interrupt handlers.
- Replace the osdep function switch with direct calls to the relevant
CAM functions and direct manipulation of timers via callout(9).
- Collapse the CAM-specific scsi_low_osdep_interface substructure
directly into scsi_low_softc.
- Use bus_*() instead of bus_space_*().
- Return BUS_PROBE_DEFAULT from probe routines instead of 0.
- No need to zero softcs.
- Pass 0ul and ~0ul instead of 0 and ~0 to bus_alloc_resource().
- Spell "dettach" as "detach".
- Remove unused 'dvname' variables.
- De-spl().
Tested by: no one
With command serialization used in CTL, there are no other commands to abort
when PREEMPT AND ABORT gets to run, so it is practically equal to PREEMPT.
MFC after: 1 week
For ZVOL-backed LUNs this allows to inform initiators if storage's used or
available spaces get above/below the configured thresholds.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
This makes VMWare VAAI Thin Provisioning Stun primitive activate, pausing
the virtual machine, when backing storage (ZFS pool) is getting overflowed.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
This prevents BIO_DELETE requests getting stuck in the TRIM queue which
results in a panic on shutdown due to outstanding requests.
PR: 194606
Reported by: Guido Falsi
Reviewed by: mav
MFC after: 3 days
Sponsored by: Multiplay