CTL HA functionality was originally implemented by Copan many years ago,
but large part of the sources was never published. This change includes
clean room implementation of the missing code and fixes for many bugs.
This code supports dual-node HA with ALUA in four modes:
- Active/Unavailable without interlink between nodes;
- Active/Standby with second node handling only basic LUN discovery and
reservation, synchronizing with the first node through the interlink;
- Active/Active with both nodes processing commands and accessing the
backing storage, synchronizing with the first node through the interlink;
- Active/Active with second node working as proxy, transfering all
commands to the first node for execution through the interlink.
Unlike original Copan's implementation, depending on specific hardware,
this code uses simple custom TCP-based protocol for interlink. It has
no authentication, so it should never be enabled on public interfaces.
The code may still need some polishing, but generally it is functional.
Relnotes: yes
Sponsored by: iXsystems, Inc.
This is preparation for possibility to open/close media several times
per LUN life cycle. While there, rename variables to reduce confusion.
As additional bonus this allows to open read-only media, such as ZFS
snapshots.
It has nothing to share with too huge ctl.c other then device descriptor,
but even that may be counted as design error that may be fixed later.
At some point we may even want to have several ioctl ports.
Its idea was to be a simple initiator and execute several commands from
kernel level, but FreeBSD never had consumer for that functionality,
while its implementation polluted many unrelated places..
Reporting SCSI errors to console is often useless, pollutes logs and may
affect performance. For debugging there is kern.cam.ctl.debug sysctl
MFC after: 1 week
At this point IMMED flag is translated to MNT_NOWAIT flag of VOP_FSYNC(),
hoping that file system implements that (ZFS seems doesn't).
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
- remove last remnants of never implemented multiple targets support;
- implement missing support for LUN mapping in this area.
Due to existing locking constraints LUN mapping code is practically
unlocked at this point. Hopefully it is not racy enough to live until
somebody get idea how to call sleeping fronend methods under lock also
taken by the same frontend in non-sleepable context. :(
Replace iSCSI-specific LUN mapping mechanism with new one, working for any
ports. By default all ports are created without LUN mapping, exposing all
CTL LUNs as before. But, if needed, LUN mapping can be manually set on
per-port basis via ctladm. For its iSCSI ports ctld does it via ioctl(2).
The next step will be to teach ctld to work with FibreChannel ports also.
Respecting additional flexibility of the new mechanism, ctl.conf now allows
alternative syntax for LUN definition. LUNs can now be defined in global
context, and then referenced from targets by unique name, as needed. It
allows same LUN to be exposed several times via multiple targets.
While there, increase limit for LUNs per target in ctld from 256 to 1024.
Some initiators do not support LUNs above 255, but that is not our problem.
Discussed with: trasz
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
sys/cam/scsi/scsi_all.h:
In struct scsi_extended_inquiry_data:
- Increase the length field to 2 bytes, as it is 2 bytes in SPC-4.
- Add bit definitions for the various Activiate Microcode actions.
- Add the Sequential Access Logical Block Protection support bit,
since we need that in the sa(4) driver. (For modifications
that will come later.)
- Add definitions for the various Multi I_T Nexus Microcode
Download modes.
sys/cam/ctl/ctl.c:
As of SPC-4, a single report of "REPORTED LUNS DATA HAS CHANGED"
is to be given per I_T nexus. Once it is reported, the unit
attention condition should be cleared for all LUNS attached to
an I_T nexus.
Previously that only happened when a REPORT LUNS command was
processed.
This behavior may be different (according to SAM-5) when the
UA_INTLCK_CTRL bits are non-zero in the control mode page but
CTL does not currently support that.
So, in view of the spec, whenever we report a LUN inventory
change unit attention, clear it on all LUNs for that
particular I_T nexus.
Add a new function, ctl_clear_ua() that will clear a unit
attention on all LUNs for the given I_T nexus.
One field in the extended inquiry data that we could potentially
report at some point is the maximum supported sense data length.
To do that, we would the SIM to report (via path inquiry
perhaps) how much sense data it is able to send.
Add comments to explain some of the bits that are set in the
Extended Inquiry VPD page.
Add a few comments to make it more clear which functions handle
various VPD pages.
Sponsored by: Spectra Logic
MFC after: 1 week
While in most cases CTL should correctly fetch those values from backing
storages, there are some initiators (like MS SQL), that may not like large
physical block sizes, even if they are true. For such cases allow override
fetched values with supported ones (like 4K).
MFC after: 1 week
Technically read requests can be executed in any order or simultaneously
since they are not changing any data. But ZFS prefetcher goes crasy when
it receives consecutive requests from different threads. Since prefetcher
works on level of separate blocks, instead of two consecutive 128K requests
it may receive 32 8K requests in mixed order.
This patch is more workaround then a real fix, and it does not fix all of
prefetcher problems, but it improves sequential read speed by 3-4x times
in some configurations. On the other side it may hurt performance if
some backing store has no prefetch, that is why it is disabled by default
for raw devices.
MFC after: 2 weeks
While without UNMAP support there is not much initiator can do about it,
the administrator still better be notified about the storage overflow.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
It is implemented for LUNs backed by ZVOLs in "dev" mode and files.
GEOM has no such API, so for LUNs backed by raw devices all LBAs will
be reported as mapped/unknown.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Abusing ability of major UAs cover minor ones we may not account UAs for
inactive ports. Allocate UAs storage for port and start accounting only
after some initiator from that port fetched its first POWER ON OCCURRED.
This reduces per-LUN CTL memory usage from >1MB to less then 100K.
MFC after: 1 month
In configurations with many ports, like iSCSI, each LUN is typically
accessed only by limited subset of ports. Allocating that memory on
demand allows to reduce CTL memory usage from 5.3MB/LUN to 1.3MB/LUN.
MFC after: 1 month