Commit Graph

1772 Commits

Author SHA1 Message Date
Alexander Motin
115dc0c762 Reduce code duplication around Write Exclusive persistent reservation.
While there, allow some more commands to pass persistent reservation.

MFC after:	1 week
2014-10-27 09:26:24 +00:00
Alexander Motin
b491e75b4b Allocate buffer for READ BUFFER/WRITE BUFFER commands on demand.
These commands are rare, but consume additional 256KB RAM per LUN.

MFC after:	1 week
2014-10-26 23:25:42 +00:00
Alexander Motin
ac4bf33203 Fix support for LUN flat space addressing.
MFC after:	1 week
2014-10-26 20:13:46 +00:00
Steven Hartland
467298f5e3 Fix CF ERASE breakage caused by 268205.
This prevents BIO_DELETE requests getting stuck in the TRIM queue which
results in a panic on shutdown due to outstanding requests.

PR:		194606
Reported by:	Guido Falsi
Reviewed by:	mav
MFC after:	3 days
Sponsored by:	Multiplay
2014-10-26 18:41:01 +00:00
Alexander Motin
fd86d88034 Fix printing non-terminated strings in devlist XML.
MFC after:	1 week
2014-10-26 15:28:07 +00:00
Alexander Motin
6f67ce91ca Add "rpm" and "formfactor" LUN options to match istgt functionality.
MFC after:	1 week
2014-10-26 07:40:37 +00:00
Alexander Motin
78c4829b8b Add support for 12/16-byte EUI and 16-byte NAA IDs.
MFC after:	1 week
2014-10-25 17:07:35 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
George V. Neville-Neil
e3a21bd139 Add new quirks for the latest Samsung SSD, model 850.
Submitted by:	sbruno
MFC after:	2 weeks
2014-10-19 16:46:36 +00:00
Alexander Motin
4cc7d0982c Make VPD 80h (Serial Number) transfer length match serial number length.
MFC after:	1 week
2014-10-18 17:11:02 +00:00
Sean Bruno
323e0f6d4c Add 4k quirks for PM853T Samsung SSD
MFC after:	2 weeks
Sponsored by:	Limelight Networks
2014-10-16 20:33:04 +00:00
Davide Italiano
2be111bf7d Follow up to r225617. In order to maximize the re-usability of kernel code
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.

Submitted by:   kmacy
Tested by:      make universe
2014-10-16 18:04:43 +00:00
Alexander Motin
218d5d4be2 Implement more functional CTL debug logging.
Setting bits in kern.cam.ctl.debug allows to log errors, commands and some
commands data respectively.

MFC after:	1 week
2014-10-16 08:42:17 +00:00
Alexander Motin
9a0190c9a1 Remove couple Copan's vendor-specific mode pages.
Those pages are highly system-/hardware-specific, the code is incomplete,
and so they hardly can be useful for anybody else.
2014-10-14 11:28:25 +00:00
Alexander Motin
523f047ea2 Some groundwork for later Informational Exceptions support.
This includes support for:
 - Read-Write Error Recovery mode page;
 - Informational Exceptions Control mode page;
 - Logical Block Provisioning mode page;
 - LOG SENSE command.

No real Informational Exceptions features yet. This is only a placeholder.

Sponsored by:	iXsystems, Inc.
2014-10-14 10:14:14 +00:00
Alexander Motin
ec05088b9c Add LBPERE mode bit definition. 2014-10-14 08:30:02 +00:00
Alexander Motin
09020352fe Don't confuse frontend with zero length data moves, just return immediately.
MFC after:	1 week
2014-10-13 16:15:32 +00:00
Alexander Motin
d70698b372 Add support for READ DEFECT DATA (10/12) commands.
SPC-4 r2 allows to return empty defect list if the list is not supported.
We don't reallu support defect data lists, but this suppresses some errors.

MFC after:	1 week
2014-10-13 14:48:49 +00:00
Alexander Motin
20a28d6cee Report physical block size for file-backed LUNs, using vattr.va_blocksize.
MFC after:	1 week
2014-10-13 11:00:58 +00:00
Alexander Motin
6461dd5c77 Remove stale comments. 2014-10-12 18:57:22 +00:00
Alexander Motin
555e99a3bf Improve and document ctladm portlist subcommand.
Make this subcommand less FC-specific, reporting target and port addresses
in more generic way.  Also make it report list of connected initiators in
unified way, working for both FC and iSCSI, and potentially others.

MFC after:	1 week
2014-10-12 06:55:34 +00:00
Alexander Motin
9045dbb21c Give physical and virtual ports numbers some more meaning. 2014-10-11 17:52:54 +00:00
Alexander Motin
5c8dcc1131 Shorten frontend name. 2014-10-11 12:56:49 +00:00
Alexander Motin
7d9cb4d9ab Filter out duplicate AC_PATH_REGISTERED async events.
Queued async events handling in CAM opened race, that may lead to duplicate
AC_PATH_REGISTERED events delivery during boot.  That was not happening
before r272935 because the driver was initialized later.  After that change
it started create duplicate ports in CTL.
2014-10-11 10:19:37 +00:00
Alexander Motin
dc5362fde9 Mark CTL frontend's CAM driver as CAM_PERIPH_DRV_EARLY.
Target mode operation does not depend on the initiator mode scan process.
This change allows the target driver to attach earlier and receive some
async events (like AC_CONTRACT) that could be lost otherwise.

MFC after:	1 week
2014-10-11 07:49:27 +00:00
Alexander Motin
19720f4113 Make ctld start even if some LUNs are unable to open backing storage.
Such LUNs will be visible to initiators, but return "not ready" status
on media access commands.  If backing storage become available later,
`ctladm modify ...` or `service ctld reload` can trigger its reopen.
2014-10-10 19:41:09 +00:00
Alexander Motin
0ca2fdeca2 Store persistent reservation keys as uint64_t instead of uint8_t[8].
This allows to simplify the code and save 512KB of RAM per LUN (8%)
by removing no longer needed "registered" keys flags.
2014-10-10 12:38:53 +00:00
Alexander Motin
259f12da87 Make iSCSI connection close somewhat less aggressive.
It allows to push out some final data from the send queue to the socket
before its close.  In particular, it increases chances for logout response
to be delivered to the initiator.
2014-10-09 09:12:08 +00:00
Alexander Motin
840a5dd447 Use proper variable when looping through periphs with CAM_PERIPH_FREE.
PR:		194256
Submitted by:	Scott M. Ferris <smferris@gmail.com>
MFC after:	3 days
Sponsored by:	EMC/Isilon Storage Division
2014-10-09 05:53:58 +00:00
Alexander Motin
5396f4d279 Implement software (mode page) and hardware (config) write protection. 2014-10-08 12:24:24 +00:00
Alexander Motin
8a41675372 Add support for WRITE ATOMIC (16) command and report SBC-4 compliance.
Atomic writes are only supported for ZVOLs in "dev" mode.  In other cases
atomicity can not be guarantied and so the command is blocked.
2014-10-08 07:48:36 +00:00
Alexander Motin
204ca472bf Set CAM_SIM_QUEUED flag before calling ctl_queue() to avoid race.
PR:		194128
Submitted by:	Scott M. Ferris <smferris@gmail.com>
MFC after:	3 days
Sponsored by:	EMC/Isilon Storage Division
2014-10-06 14:52:04 +00:00
Alexander Motin
04f10b6c96 Add support for MaxBurstLength and Expected Data transfer Length parameters.
Before this change target could send R2T request for write transfer of any
size, that could violate iSCSI RFC, which allows initiator to limit maximum
R2T size by negotiating MaxBurstLength connection parameter.

Also report an error in case of write underflow, when initiator provides
less data than initiator expects.  Previously in such case our target
sent R2T request for non-existing data, violating the RFC, and confusing
some initiators.  SCSI specs don't explicitly define how write underflows
should be handled and there are different oppinions, but reporting error
is hopefully better then violating iSCSI RFC with unpredictable results.

MFC after:	2 weeks
2014-10-06 12:20:46 +00:00
Alexander Motin
5554eb9bfc Fix length of Extended INQUIRY Data VPD page.
MFC after:	3 days
2014-10-06 07:01:32 +00:00
Alexander Motin
11cca94715 Use REPORT LUNS command for SPC-2 devices with LUN 0 disconnected.
SPC-2 tells REPORT LUNS shall be supported by devices supporting LUNs other
then LUN 0.  If we see LUN 0 disconnected, guess there may be others, and
so REPORT LUNS shall be supported.

MFC after:	1 month
2014-10-02 10:58:52 +00:00
Alexander Motin
b8f810fecd Make disconnected LUN 0 don't remain in half-configured state if there are
no LUNs on SPC-3 target after we tried REPORT LUNS.
2014-10-02 10:39:07 +00:00
Alexander Motin
5832a6aaf9 Restore CAM_QUIRK_NOLUNS check, lost in previous commit.
MFC after:	1 month
2014-10-02 10:02:38 +00:00
Alexander Motin
1d4bc8bce6 Rework the logic of sequential SCSI LUN scanner.
Previous logic was not differentiating disconnected LUNs and absent targets.
That made it to stop scan if LUN 0 was not found for any reason.  That made
problematic, for example, using iSCSI targets declaring SPC-2 compliance and
having no LUN 0 configured.

The new logic continues sequential LUN scan if:
 -- we have more configured LUNs that need recheck;
 -- this LUN is connected and its SCSI version allows more LUNs;
 -- this LUN is disconnected, its SCSI version allows more LUNs and we
    guess they may be connected (we haven't scanned first 8 LUNs yet or
    kern.cam.cam_srch_hi sysctl is set to scan more).

Reported by:	trasz
MFC after:	1 month
2014-10-02 09:42:11 +00:00
Alexander Motin
0b060244ab Fix couple issues with ROD tokens content.
MFC after:	3 days
2014-10-01 11:30:20 +00:00
Alexander Motin
3c21968c19 Do not transfer unneeded training zero bytes in INQUIRY response.
It is an addition to r269631.
2014-09-28 11:10:37 +00:00
Alexander Motin
975c8d15c2 Fix page length reported for Block Limits VPD page. 2014-09-27 20:08:34 +00:00
Alexander Motin
8f07b2d523 When reporting some major UNIT ATTENTION condition, like POWER ON OCCURRED
or I_T NEXUS LOSS, clear all minor UAs for the LUN, redundant in this case.

All SAM specifications tell that target MAY do it, but libiscsi initiator
seems require it to be done, terminating connection with error if some more
UAs happen to be reported during iSCSI connection.

MFC after:	3 days
2014-09-23 20:35:48 +00:00
Alexander Motin
eebb09cfe7 Fix ASCQ for "Logical unit not ready, manual intervention required" error. 2014-09-23 17:30:00 +00:00
Alexander Motin
5b5ad15088 Pretend that we support BYTCHK=1 in WRITE AND VERIFY command.
Technically that is not true, but since we don't implement VERIFY there
at all, doing only WRITE part, this is a minor sin.
2014-09-22 12:40:43 +00:00
Alexander Motin
228e6f4296 Fix read overrun handling, broken by using wrong variable.
MFC after:	3 days
2014-09-22 11:35:06 +00:00
Alexander Motin
227b3b9229 Deny ANCHOR flag set without UNMAP flag set in WRITE SAME commands. 2014-09-22 10:46:06 +00:00
Alexander Motin
9a9fbc3dbd Don't try to continue aborted commands if status was not set. 2014-09-22 10:05:36 +00:00
Alexander Motin
4a6d9c740a Fix UNMAP stuck if the last block descriptor in the list is empty.
MFC after:	3 days
2014-09-22 09:22:58 +00:00
Alexander Motin
2f872218e7 Simplify legacy reservation handling. Drop it on I_T nexus loss. 2014-09-22 07:59:25 +00:00
Alexander Motin
b0737f1a67 Don't report unsupported FUA_NV bit set in READ/WRITE commands as error.
While this bit is obsolete in SBC-3, SBC-2 allowed to silently ignore it.
2014-09-22 01:17:48 +00:00
Alexander Motin
d69a1908fc Report proper errors codes for unsupported SERVICE ACTION values. 2014-09-22 01:04:27 +00:00
Alexander Motin
2db9b8b5a0 Polish INQUIRY command fields validation. 2014-09-22 00:40:20 +00:00
Alexander Motin
c7a7dbbc0b Allow SUBPAGE CODE field in MODE SENSE commands. 2014-09-21 14:39:31 +00:00
Alexander Motin
810a5a5c08 Fix inverted expression to report block size in mode page block descriptor. 2014-09-19 11:15:30 +00:00
Alexander Motin
fb767c2ba2 Allow more commands to pass persistent reservation according to SPC-4 r37. 2014-09-18 22:22:14 +00:00
Alexander Motin
64c5167c91 Add support for "no Data-Out Buffer" (NDOB) flag of WRITE SAME (16) command. 2014-09-18 21:39:00 +00:00
Alexander Motin
71d8e97e35 When updating device media size use cached cdevsw pointer.
Using pointer from the cdev directly is dangerous since we have no reference
on it, and it may change any time.  That caused panic if device has gone.

While there, report capacity change only if it really changed.

MFC after:	3 days
2014-09-18 17:25:20 +00:00
Bryan Drewery
21cffce593 Correct a comment 2014-09-17 18:59:25 +00:00
Alexander Motin
4ab4d6879c Fix tpc_create_token() introduced in r269497 to encode CREATOR LOGICAL UNIT
DESCRIPTOR field as Identification Descriptor CSCD descriptor, not just as
Identification Descriptor.

MFC after:	3 days
2014-09-17 07:08:59 +00:00
Alexander Motin
13378399d6 Fix typo in defined ROD types in r269497.
MFC after:	3 days
2014-09-17 06:46:37 +00:00
Alexander Motin
cc47e5ee4f Add quirks to disable READ CAPACITY (16) for PNY USB 3.0 Flash Drives.
Submitted by:	Sean Fagan <sef@ixsystems.com>
MFC after:	3 days
2014-09-15 19:48:27 +00:00
Alexander Motin
29611ce906 Always report that we support REPORT TARGET PORT GROUPS command.
Without clustering support we any way have only one group of permanently
active ports, but that gives us one more supported VMWare feature. ;)

Solaris' Comstar also reports it even when only one port is present.
2014-09-14 23:39:13 +00:00
Alexander Motin
959ec2581b Update CAM CCB accounting for the new status quo.
devq_openings counter lost its meaning after allocation queues has gone.
held counter is still meaningful, but problematic to update due to separate
locking of CCB allocation and queuing.

To fix that replace devq_openings counter with allocated counter.  held is
now calculated on request as difference between number of allocated, queued
and active CCBs.

MFC after:	1 month
2014-09-14 11:59:49 +00:00
Alexander Motin
ab55ae255a Implement control over command reordering via options and control mode page.
It allows to bypass range checks between UNMAP and READ/WRITE commands,
which may introduce additional delays while waiting for UNMAP parameters.
READ and WRITE commands are always processed in safe order since their
range checks are almost free.
2014-09-13 10:34:23 +00:00
Alexander Motin
8e6441d87d Add "readcache" and "writecache" LUN options to control default behavior.
Default values are "on".  Disabling requires backend to support IO_DIRECT
and IO_SYNC flags respectively, or some alternatives.
2014-09-13 08:55:22 +00:00
Alexander Motin
abafbab15f Implement range checks between UNMAP and READ/WRITE commands.
Before this change UNMAP completely blocked other I/Os while running.
Now it blocks only colliding ones, slowing down others only due to ZFS
locks collisions.

Sponsored by:	iXsystems, Inc.
2014-09-13 07:45:03 +00:00
Alexander Motin
5e5ac52b42 Add support for Extended INQUIRY Data (0x86) VPD page. 2014-09-11 22:40:11 +00:00
Alexander Motin
a005d36245 Extend UNMAP blacklist on all STEC SSD models.
None of existing STEC devices need UNMAP or even support it well, having
many limitations and even hanging sometimes executing those commands.
New devices that may use UNMAP going to be released under HGST name.

MFC after:	3 days
2014-09-10 21:24:15 +00:00
Edward Tomasz Napierala
2c5b89e54b Make sure we handle less than zero timeouts in iSCSI initiator and target
in a reasonable way.

Sponsored by:	The FreeBSD Foundation
2014-09-10 14:04:10 +00:00
Edward Tomasz Napierala
42c12ebf91 Make it possible to disable NOP-In PDUs by the iSCSI initiator by setting
kern.cam.ctl.iscsi.ping_timeout to 0.  This fixes interoperability with
some initiators that don't properly support NOP-Ins, namely iPXE/gPXE.

Submitted by:	Chen Wen <pokkys@gmail.com>
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2014-09-10 13:34:27 +00:00
Alexander Motin
bfa005503c Make ctl_port_mask an array to support more then 32 ports.
Overflow reported by Coverity.

CID:		1229894
MFC after:	3 days
2014-09-10 07:16:17 +00:00
Alexander Motin
f198b1719a Remove uninitialized and unused variable, reported by Coverity.
CID:		1230015
2014-09-10 07:00:36 +00:00
Alexander Motin
436b3d2f5a Fix array overrun, reported by Coverity.
CID:		1229970
2014-09-10 06:56:45 +00:00
Alexander Motin
0d5de8346a Fix couple off-by-one range check errors, reported by Coverity.
CID:		1007837
2014-09-10 06:35:00 +00:00
Alexander Motin
41ee818335 Fix memory leak on error, reported by Coverity.
CID:		1007773
2014-09-10 06:29:31 +00:00
Alexander Motin
888da1578a Fix minor buffer overflow reported by Coverity.
CID:		1006781
2014-09-10 06:25:18 +00:00
Alexander Motin
4f3fe448f5 Report that DPO and FUA bits are supported after r271311. 2014-09-09 15:19:38 +00:00
Alexander Motin
a3c5994cdf Oops, missed piece of r271311. 2014-09-09 14:20:55 +00:00
Alexander Motin
c97f969b95 Add support for Mode Page Policy (0x87) VPD page. 2014-09-09 14:09:51 +00:00
Alexander Motin
55551d0542 Improve cache control support, including DPO/FUA flags and the mode page.
At this moment it works only for files and ZVOLs in device mode since BIOs
have no respective respective cache control flags (DPO/FUA).

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2014-09-09 11:38:29 +00:00
Warner Losh
2da8d262e0 Add a few defines and packet types for SATA 3.2 and FPDMA (First Party
DMA).

Sponsored by: Netflix
2014-08-30 02:13:04 +00:00
Warner Losh
e4bed0b403 We should never enter the PROBE_SETAN phase if we're not ATAPI, since
that's ATAPI specific. Instead, skip to PROBE_SET_MULTI instead for
non ATAPI protocols. The prior code incorrectly terminated the probe
with a break, rather than arranging for probedone to get called. This
caused panics or worse on some systems.
2014-08-22 13:15:59 +00:00
Sean Bruno
5f91863a54 Add the Samsung 843T as a 4k enabled drive
Submitted by:	Jason Wolfe <jason@llnw.com>
MFC after:	2 weeks
Sponsored by:	Limelight Networks
2014-08-21 21:05:58 +00:00
Edward Tomasz Napierala
ddc5fd903d Use proper include paths in kernel iSCSI code.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2014-08-21 16:08:17 +00:00
Warner Losh
15f48aaad6 Turns out that IDENTIFY DEVICE and IDENTIFY PACKET DEVICE return data
that's only mostly similar. Specifically word 78 bits are defined for
IDENTIFY DEVICE as
	5 Supports Hardware Feature Control
while a IDENTIFY PACKET DEVICE defines them as
	5 Asynchronous notification supported
Therefore, only pay attention to bit 5 when we're talking to ATAPI
devices (we don't use the hardware feature control at this time).
Ignore it for ATA devices. Remove kludge that papered over this issue
for Samsung SATA SSDs, since Micron drives also have the bit set and
the error was caused by this bad interpretation of the spec (which is
quite easy to do, since bits aren't normally overlapping like this).
2014-08-20 22:58:12 +00:00
John Baldwin
cef60f1868 Unexpand TAILQ_FOREACH(). 2014-08-20 16:07:56 +00:00
Alexander Motin
2ac1d5afc8 Fix lock recursion on LUN shutdown, introduced on r269497.
MFC after:	3 days
2014-08-19 17:04:18 +00:00
Steven Hartland
dc98c62f89 Added 4K quirks for Corsair Force GT and Samsung 840 SSDs
MFC after:	1 week
Sponsored by:	Multiplay
2014-08-14 13:57:17 +00:00
Warner Losh
b7cdc564ac is_full_id is set to 0 and then not used. remove it. 2014-08-08 11:46:45 +00:00
Alexander Motin
b0529e0a2d Reduce reported additional INQUIRY data length.
sizeof(struct scsi_inquiry_data) of 256 bytes combined with off-by-one
error in the changed code gave total INQUIRY data length above 255 bytes,
that was maximal INQUIRY length in SPC-2.  While SPC-3 increased the
maximal length to 64K, at least sg3_utils are still confused by that.

MFC after:	1 week
2014-08-06 17:02:19 +00:00
Alexander Motin
3406a2a083 Fix several issues and inconsistencies in UNMAP capabilities reporting.
This makes Windows 2012 to start using UNMAP on our disks.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-08-06 08:54:31 +00:00
Alexander Motin
e3e592bb7d Reimplement WRITE USING TOKEN with Block Zero token using WRITE SAME.
On my ZVOL of SSDs that increases speed of zero writing in that way from
1 to 2.5GB/s by reducing CPU overhead.
MFC after:	2 weeks
2014-08-05 15:01:30 +00:00
Alexander Motin
25eee848cd Add support for Windows dialect of EXTENDED COPY command, aka Microsoft ODX.
This allows to avoid extra network traffic when copying files on NTFS iSCSI
disks within one storage host by drag'n'dropping them in Windows Explorer
of Windows 8/2012.  It should also accelerate Hyper-V VM operations, etc.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-08-04 01:16:20 +00:00
Alexander Motin
6158ee0396 Do not retry on set of non-transient XCOPY errors.
MFC after:	1 week
2014-08-03 11:43:14 +00:00
Alexander Motin
be022505fe Do not retry token errors. They are not going to disappear by themselves.
MFC after:	1 week
2014-08-03 10:02:14 +00:00
Alexander Motin
c5c6059574 Rework r269444 to work also for lists without IDs.
MFC after:	3 days
2014-08-02 23:20:43 +00:00
Alexander Motin
475267eff1 Plug EXTENDED COPY request data memory leak.
MFC after:	3 days
2014-08-02 20:15:00 +00:00
Alexander Motin
a7c09f5c12 Fix some bugs in RECEIVE COPY STATUS data.
MFC after:	3 days
2014-08-02 19:59:19 +00:00
Alexander Motin
43d2d71953 Add missing comparisons to make list IDs in EXTENDED COPY per-initiator,
as they should be.  Wrap it into a function to not duplicate the code.

MFC after:	3 days
2014-08-02 19:51:10 +00:00
Joerg Wunsch
8d92719522 Fix breakage introduced by r256843: removing the SA_CCB_WAITING bit
left some of the decisions based on its counterpart, SA_CCB_BUFFER_IO
being random.  As a result, propagation of the residual information
for the SPACE command was broken, so the number of filemarks
encountered during a SPACE operation was miscalculated.  Consequently,
systems relying on properly tracked filemark counters (like Bacula)
fell apart.

The change also removes a switch/case in sadone() which r256843
degraded to a single remaining case label.

PR:		192285
Approved by:	ken
MFC after:	2 weeks
2014-07-31 22:09:50 +00:00
Alexander Motin
1b5307b2b0 Fix several cases of NULL dereference when INQUIRY sent to absent LUN.
MFC after:	3 days
2014-07-27 06:49:55 +00:00
Alexander Motin
67f586a8c1 Implement separate I/O dispatch method for ZVOLs in "dev" mode.
Unlike disk devices ZVOLs process all requests synchronously.  That makes
impossible sending multiple requests to them from single thread.  From the
other side ZVOLs have real d_read/d_write methods, which unlike d_strategy
can handle uio scatter/gather and have no strict I/O size limitations.

So, if ZVOL in "dev" mode is detected, use of d_read/d_write methods instead
of d_strategy allows to avoid pointless splitting of large requests into
MAXPHYS (128K) sized chunks.

MFC after:	1 week
2014-07-26 13:56:50 +00:00
Alexander Motin
696297ad2f Fix infinite loop, when doing WRITE SAME on file-backed LUN.
MFC after:	3 days
2014-07-26 13:43:25 +00:00
Edward Tomasz Napierala
cb6cc00d74 Fix ctl(4) kldload failure that manifested like this:
link_elf_obj: symbol icl_pdu_new_bhs undefined

PR:		192031
Submitted by:	Nils Beyer (earlier version)
MFC after:	3 days
Sponsored by:	FreeBSD Foundation
2014-07-25 11:29:45 +00:00
Alexander Motin
e2507751a6 Fix build with QUEUE_MACRO_DEBUG.
Submitted by:	benno@
MFC after:	3 days
2014-07-24 14:10:58 +00:00
Alexander Motin
8cbf9eae6f Increase maximal number of SCSI ports in CTL from 32 to 128.
After I gave each iSCSI target its own port, the old limit appeared to be
not so big.  This change almost proportionally increases per-LUN memory
use, but it is still three times better then it was before r268807.

MFC after:	2 weeks
2014-07-17 21:16:52 +00:00
Alexander Motin
38afa8f733 Reduce per-LUN memory usage from 18MB to 1.8MB.
CTL never had use for CA support code since SPI has gone, and there is no
even frontends supporting that.  But it still was reserving 256 bytes of
memory per LUN per every possible initiator on every possible port.

Wrap unused code with ifdef's in case somebody even need it.

MFC after:	2 weeks
2014-07-17 20:28:51 +00:00
Alexander Motin
984a2ea91f Add support for VMWare dialect of EXTENDED COPY command, aka VAAI Clone.
This allows to clone VMs and move them between LUNs inside one storage
host without generating extra network traffic to the initiator and back,
and without being limited by network bandwidth.

LUNs participating in copy operation should have UNIQUE NAA or EUI IDs set.
For LUNs without these IDs VMWare will use traditional copy operations.

Beware: the above LUN IDs explicitly set to values non-unique from the VM
cluster point of view may cause data corruption if wrong LUN is addressed!

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-07-16 15:57:17 +00:00
Alexander Motin
4d877c4148 Merge several equal serialization indexes. 2014-07-13 06:01:23 +00:00
Alexander Motin
409a3c1383 Add LUN options to specify 64-bit EUI and NAA identifiers. 2014-07-09 04:37:50 +00:00
Alexander Motin
3120a49e50 Remove status setting from datamove() path. Leave that to other places. 2014-07-08 18:51:03 +00:00
Alexander Motin
ad3cd840f2 Fix use-after-free on XPT_RESET_BUS.
That command is not queued, so does not use later status update.
2014-07-08 16:56:21 +00:00
Alexander Motin
b33b96e352 Enable TAS feature: notify initiator if its command was aborted by other.
That should make operation more kind to multi-initiator environment.
Without this, other initiators may find out that something bad happened
to their commands only via command timeout.
2014-07-08 16:38:05 +00:00
Alexander Motin
d6205772da Fix typo in r267873. 2014-07-08 13:28:37 +00:00
Alexander Motin
6c7a99be9f Do not return statuses for aborted iSCSI commands. 2014-07-08 12:16:28 +00:00
Alexander Motin
f5ffef352f Return task management requests to queued execution, but differently.
Testing shown that both original queued design with separate task queue,
and recent direct execution design had significant flaw: If abort request
arrives just after the victim, the last one may not be in the ooa_queue
yet, and so invisible for the task management function.

Unlike original queued implementation, use same queue for all SCSI and
TASK requests from the same initiator. That avoids races between them:
task functions are always executed in proper time, relatively to other
requests.
2014-07-08 12:15:15 +00:00
Alexander Motin
fdfc6c8ebd Fix task management functions status: task not found is not an error,
while not implemented function is.
2014-07-08 08:34:34 +00:00
Alexander Motin
dbd849d868 Fix "use after free" on port creation error in r268291. 2014-07-07 11:52:22 +00:00
Alexander Motin
1e5a8b8f4b Add support for READ FULL STATUS action of PERSISTENT RESERVE IN command. 2014-07-07 11:05:04 +00:00
Alexander Motin
604e257984 Teach ctl_add_initiator() to dynamically allocate IIDs from pool.
If port passed negative IID value, the function will try to allocate IID
from the pool of unused, based on passed wwpn or name arguments.  It does
all its best to make IID unique and persistent across reconnects.

This makes persistent reservation properly work for iSCSI.  Previously,
in case of reconnects, reservation could be unexpectedly lost, or even
migrate between intiators.
2014-07-07 09:37:22 +00:00
Alexander Motin
0f8de8afaa Fix bugs for PERSISTENT RESERVE OUT bits in r268096. 2014-07-07 08:58:36 +00:00
Alexander Motin
99f8c067e6 Correction to r268356: collide only sessions to the same target. 2014-07-07 06:17:07 +00:00
Alexander Motin
2c6c9e47b2 When new connection comes in, check whether we already have session from
the same intiator (Name+ISID).  If so -- terminate the old session and let
the new one take its place, as required by iSCSI RFC.
2014-07-07 05:48:11 +00:00
Alexander Motin
0020682baa Implement ABORT TASK SET and I_T NEXUS RESET task management functions.
Use the last one to terminate active commands on iSCSI session termination.
Previous code was aborting only commands doing some data moves.
2014-07-07 03:10:56 +00:00
Andreas Tobler
64175581d0 Make gcc happy, init idlen2. 2014-07-06 20:09:23 +00:00
Alexander Motin
1380b77c12 Close race in r268291 between port destruction, delayed by sessions
teardown, and new port creation during `service ctld restart`.

Close it by returning iSCSI port internal state, that allows to identify
dying ports, which should not be counted as existing, from really alive.
2014-07-06 17:57:59 +00:00
Alexander Motin
99ae56ac82 Add support for SCSI Ports (88h) VPD page. 2014-07-06 07:34:18 +00:00
Alexander Motin
69d7b87790 Make REPORT TARGET PORT GROUPS command report realistic data instead of
hardcoded garbage.
2014-07-06 07:02:36 +00:00
Alexander Motin
c26eee2dc9 Move lun_map() method from command nexus to port.
Previous implementation made impossible to do some things, such as calling
it for ports other then one through which command arrived.
2014-07-06 06:21:34 +00:00
Alexander Motin
561764b1c5 Relax some bit checks for INQUIRY command.
FreeBSD still tries to put LUN number in second byte until it get device
protocol version, even that it was obsoleted about 20 years ago.
2014-07-06 06:12:29 +00:00
Alexander Motin
6d81c129dd Pass through iSCSI session ISID from LOGIN request to the CTL frontend.
ISID is an important part of initiator transport ID for iSCSI.  It is not
used now, but should be to properly implement persistent reservation.
2014-07-05 21:18:33 +00:00
Alexander Motin
027e5269c9 Burry devid port method, which was a gross hack.
Instead make ports provide wanted port and target IDs, and LUNs provide
wanted LUN IDs.  After that core Device ID VPD code only had to link all
of them together and add relative port and port group numbers.

LUN ID for iSCSI LUNs no longer created by CTL, but by ctld, and passed
to CTL as "scsiname" LUN option.  This makes LUNs to report the same set
of IDs, independently from the port through which it is accessed, as
required by SCSI specifications.
2014-07-05 19:30:20 +00:00
Alexander Motin
917d38fb99 Create separate CTL port for every iSCSI target (and maybe portal group).
Having single port for all iSCSI connections makes problematic implementing
some more advanced SCSI functionality in CTL, that require proper ports
enumeration and identification.

This change extends CTL iSCSI API, making ctld daemon to control list of
iSCSI ports in CTL.  When new target is defined in config fine, ctld will
create respective port in CTL.  When target is removed -- port will be
also removed after all active commands through that port properly aborted.
This change require ctld to be rebuilt to match the kernel.

As a minor side effect, this allows to have iSCSI targets without LUNs.
While that may look odd and not very useful, that is not incorrect.
2014-07-05 18:15:00 +00:00
Alexander Motin
831e16f359 Improve CTL_BEARG_* flags support, including optional values copyout. 2014-07-05 14:32:42 +00:00
Alexander Motin
ab2616c5b0 Implement and use ctl_frontend_find(). 2014-07-05 13:50:05 +00:00
Alexander Motin
92782c33a6 Introduce new IOCTL CTL_PORT_LIST reporting in more flexible XML format.
Leave old CTL_GET_PORT_LIST in place so far.  Garbage-collect it later.
2014-07-05 05:44:26 +00:00
Alexander Motin
2cfbcb9b3a Improve readability of XML generated by CTL_LUN_LIST. 2014-07-05 04:10:24 +00:00
Alexander Motin
43fb3a65e3 Make options KPI more generic to allow it to be used for ports too,
not only for LUNs.
2014-07-05 03:34:52 +00:00
Alexander Motin
487ddad55e Use proper links field for ports linking. 2014-07-05 01:24:06 +00:00
Alexander Motin
92168f4c01 Separate concepts of frontend and port.
Before iSCSI implementation CTL had no knowledge about frontend drivers,
it had only frontends, which really were ports (alike to LUNs, if comparing
to backends).  But iSCSI added there ioctl() method, which does not belong
to frontend as a port, but belongs to a frontend driver.
2014-07-04 19:27:06 +00:00
Alexander Motin
2f5be87a14 Remove targ_enable()/targ_disable() frontend methods.
Those methods were never implemented, and I believe that their concept is
wrong, since single frontend (SCSI port) can not handle several targets.
2014-07-04 19:19:03 +00:00
Kenneth D. Merry
08df2e3eaf Add persistent reservation support to camcontrol(8).
camcontrol(8) now supports a new 'persist' subcommand that allows users to
issue SCSI PERSISTENT RESERVE IN / OUT commands.

sbin/camcontrol/Makefile:
	Add persist.c.

sbin/camcontrol/persist.c:
	New persistent reservation support for camcontrol(8).

	We have support for all known operation modes for PERSISTENT RESERVE
	IN and PERSISTENT RESERVE OUT.
	exceptions noted above.

sbin/camcontrol/camcontrol.8:
	Document the new 'persist' subcommand.

	In the section on the Transport ID (-I) option, explain what
	Transport IDs for each protocol should look like.  At some point
	some of this information could probably get moved off in a
	separate man page, either on Transport IDs alone or a man page
	documenting the Transport ID parsing code.

	Add a number of examples of persistent reservation commands.
	Persistent Reservations are complex enough that the average user
	probably won't be able to get the commands exactly right by just
	reading the man page.  These examples show a few basic and
	advanced examples of how to use persistent reservations.

sbin/camcontrol/camcontrol.h:
	Move the definition for camcontrol_optret here, so we can use it
	for the persistent reservation code.

	Add a definition for the new scsipersist() function.

sbin/camcontrol/camcontrol.c:
	Add 'persist' to the list of subcommands.

	Document 'persist' in the help text.

sys/cam/scsi/scsi_all.c:
	Add the scsi_persistent_reserve_in() and
	scsi_persistent_reserve_out() CCB building functions.

	Add a new function, scsi_transportid_sbuf().  This takes a
	SCSI Transport ID (documented in SPC-4), and prints it to
	an sbuf(9).  There are some transports (like ATA, USB, and
	SSA) for which there is no transport defined.  We need to
	come up with a reasonable thing to do if we're presented
	with a Transport ID that claims to be for one of those
	protocols.

	Add new routines scsi_get_nv() and scsi_nv_to_str().

	These functions do a table lookup to go between a string and an
	integer.  There are lots of table lookups needed in the
	persistent reservation code in camcontrol(8).

	Add a new function, scsi_parse_transportid(), along with leaf node
	functions to parse:
	FC, 1394 and SAS (scsi_parse_transportid_64bit())
	iSCSI (scsi_parse_transportid_iscsi())
	SPI (scsi_parse_transportid_spi())
	RDMA (scsi_parse_transportid_rdma())
	PCIe (scsi_parse_transportid_sop())

	Transport IDs.  Given a string with the general form proto,id these
	functions create a SCSI Transport ID structure.

sys/cam/scsi/scsi_all.h:
	Update the various persistent reservation data structures to
	SPC4r36l, but also rename some fields that were previously
	obsolete with the proper names from older SCSI specs.  This
	allows using older, obsolete persistent reservation types when
	desired.

	Add function prototypes for the new persistent reservation CCB
	building functions.

	Add a data strucure for the READ FULL STATUS service action
	of the PERSISTENT RESERVE IN command.

	Add Transport ID structures for all protocols described in SPC-4.

	Add a new series of SCSI_PROTO_XXX definitions, and
	redefine other defines in terms of these new definitions.

	Add a prototype for scsi_transportid_sbuf().

	Change a couple of "obsolete" persistent reservation data
	structure fields into something more meaningful, based on
	what the field was called when it was defined in the spec.
	(e.g. SPC, SPC-2, etc.)

	Create a new define, SPRI_MAX_LEN, for the maximum allocation
	length allowed for the PERSISTENT RESERVE IN command.

	Add data structures and enumerations for the new name/value
	translation functions.

	Add data structures for SCSI over PCIe Routing IDs.

	Bring the PERSISTENT RESERVE OUT Register and Move parameter list
	structure (struct scsi_per_res_out_parms) up to date with SPC-4.

	Add a data structure for the transport IDs that can optionally be
	appended to the basic PERSISTENT RESERVE OUT parameter list.

	Move SCSI protocol macro definitions out of the VPD page 0x83
	definition and combine them with the more up to date protocol
	definitions higher in the file.

	Add function prototypes for scsi_nv_to_str(), scsi_get_nv(),
	scsi_parse_transportid_64bit(), scsi_parse_transportid_spi(),
	scsi_parse_transportid_rdma(), scsi_parse_transportid_iscsi(),
	scsi_parse_transportid_sop(), and scsi_parse_transportid().

Sponsored by:	Spectra Logic Corporation
MFC after:	1 week
2014-07-03 23:09:44 +00:00
Warner Losh
7ddad071a5 Rework the BIO_DELETE code slightly. Always queue the BIO_DELETE
requests on the trim_queue, even for the CFA ERASE. This allows us, in
the future, to collapse adjacent requests. Since CFA ERASE is only for
CF cards, and it is so restrictive in what it can do, the collapse
code is not presently here. This also brings the ada driver more in
line with the da driver's treatment of BIO_DELETEs.

Reviewed by: mav@
2014-07-03 05:22:13 +00:00
Alexander Motin
22d6cbd4d3 Use separate memory type M_CTLIO for I/Os.
CTL allocate large amount of RAM.  This change give some more stats.

MFC after:	2 weeks
2014-07-03 04:26:53 +00:00
Alexander Motin
25c9d5e593 Add support for REPORT TIMESTAMP command.
MFC after:	2 weeks
2014-07-01 16:52:41 +00:00
Alexander Motin
1b08cb4ee7 Add more formal and strict command parsing and validation.
For every supported command define CDB length and mask of bits that are
allowed to be set.  This allows to remove bunch of checks through the code
and still make the validation more strict.  To properly do it for commands
supporting multiple service actions, formalize their parsing by adding
subtables for each of such commands.

As visible effect, this change allows to add support for REPORT SUPPORTED
OPERATION CODES command, reporting to client all the data about supported
SCSI commands, except timeouts.

MFC after:	2 weeks
2014-07-01 15:05:23 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Alexander Motin
acee7463b6 Remove odd practice of inverting error codes.
-EPERM is equal to ERESTART, returning which from ioctl() handler causes
infinite syscall restart.

MFC after:	2 weeks
2014-06-27 22:28:14 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Alexander Motin
1d6f7db544 Fix typo in r267481.
MFC after:	3 days
2014-06-27 06:52:37 +00:00
Alexander Motin
b88b05216a Simplify statistics calculation.
Instead of trying to guess size of disk I/O operations (it just won't work
that way for newly added commands, and is equal to data move size for old
ones), account data move traffic.  If disk I/Os are that interesting, then
backends have to account and provide that information.

Block backend already exports the information about disk I/Os via devstat,
so having it here too is excessive.

MFC after:	2 weeks
2014-06-26 20:06:37 +00:00
Alexander Motin
f82388fd84 Allow MODE SENSE commands through Write Exclusive persistent reservation,
as required by SPC-4.

Report that fact in persistent reservation capabilities.

MFC after:	2 weeks
2014-06-26 09:42:00 +00:00
Alexander Motin
85165a3f70 Add READ BUFFER and improve WRITE BUFFER SCSI commands support.
This gives some use to 512KB per-LUN buffers, allocated for Copan-specific
processor code and not used.  It allows, for example, to test transport
performance and/or correctness without accessing the media, as supported
by Linux version of sg3_utils.

MFC after:	2 weeks
2014-06-26 08:56:36 +00:00
Alexander Motin
75c7a1d357 Lock devstat updates in block backend to make it usable. Polish lock names.
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-06-25 17:54:36 +00:00
Alexander Motin
3a8ce4a36b Introduce fine-grained CTL locking to improve SMP scalability.
Split global ctl_lock, historically protecting most of CTL context:
 - remaining ctl_lock now protects lists of fronends and backends;
 - per-LUN lun_lock(s) protect LUN-specific information;
 - per-thread queue_lock(s) protect request queues.
This allows to radically reduce congestion on ctl_lock.

Create multiple worker threads, depending on number of CPUs, and assign
each LUN to one of them.  This allows to spread load between multiple CPUs,
still avoiging congestion on queues and LUNs locks.

On 40-core server, exporting 5 LUNs, each backed by gstripe of SATA SSDs,
accessed via 6 iSCSI connections, this change improves peak request rate
from 250K to 680K IOPS.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-06-25 17:02:01 +00:00
Alexander Motin
d309b227c5 Allow to use iSCSI immediate data by several ctl_datamove() calls.
While for FreeBSD client that is only a minor optimization, VMWare client
doesn't support additional data requests after all data being sent once as
immediate.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2014-06-25 16:12:14 +00:00
Alexander Motin
50fe38b6b8 Execute task management request directly in ctl_queue() context.
From one side it allows to remove CTL_FLAG_TASK_PENDING flag, handling of
which significantly complicates fine-grained locking.  From the other side
it reduces task management requests latency even below then that flag could.
As downside, it denies task management code to sleep, but that is not needed
any way now.

Discussed with:	ken
2014-06-19 13:19:35 +00:00
Alexander Motin
ead2f11724 Add some more CTL_FLAG_ABORT check points.
This should allow to abort commands doing mostly disk I/O, such as VERIFY
or WRITE SAME.  Before this change CTL_FLAG_ABORT was only checked around
data moves, which for these commands may not happen for a very long time.

MFC after:	2 weeks
2014-06-19 12:43:41 +00:00
Alexander Motin
28b9e53b7d Increase CTL_DEVID_LEN from 16 to 64 bytes.
SPC-4 recommends T10 vendor ID based LUN ID was created by concatenating
product name and serial number (and istgt follows that).  But product name
is 16 bytes long by itself, so 16 bytes total length is clearly not enough
to fit both.

To keep compatibility with existing configurations, pad short device IDs
to old length of 16, same as before.

This change probably breaks CTL user-level ABI, so control tools should
be rebuilt after this change.

MFC after:	2 weeks
2014-06-19 09:46:43 +00:00
Marius Strobl
de6a705e34 Don't denounce peripherals on system shutdown. Together with r267321,
we're now back to the pre-r228483 level of default verbosity. This in
turn again typically allows for reading information that userland might
have printed on the screen before initiating a halt, but still permits
to debug potential device shutdown problems on system shutdown via
CAM_DEBUG etc.

Reviewed by:	mav
MFC after:	3 days
Sponsored by:	Bally Wulff Games & Entertainment GmbH
2014-06-19 09:08:20 +00:00
Alexander Motin
9ad03ef5e7 Add iSCSI Target Name ID descriptor to VPD 83h.
It shall/should be there according to SPC-4, and istgt also provides it.

MFC after:	2 weeks
2014-06-19 08:13:53 +00:00
Edward Tomasz Napierala
f7d6790884 Rework session termination in iSCSI target to actually wait
for any outstanding commands to be properly aborted by CTL.
Without it, in some cases (such as files backing the LUNs
stored on failing disk drives), terminating a busy session
would result in panic.

Reviewed by:	mav@ (earlier version)
Sponsored by:	The FreeBSD Foundation
2014-06-18 17:13:18 +00:00
Edward Tomasz Napierala
2af142cafe Make cs_terminating a bool; no functional changes.
Sponsored by:	The FreeBSD Foundation
2014-06-17 09:02:10 +00:00
Edward Tomasz Napierala
57072b5118 Add comment explaining a potential problem with just added LUN ID.
Reminded by:	mav@
Sponsored by:	The FreeBSD Foundation
2014-06-16 19:05:51 +00:00
Edward Tomasz Napierala
a39adbef47 Add LUN-associated name to VPD, to make Hyper-V Failover Cluster happy.
Sponsored by:	The FreeBSD Foundation
2014-06-16 18:14:05 +00:00
Alexander Motin
11b569f7cb Add support for VERIFY(10/12/16) and COMPARE AND WRITE SCSI commands.
Make data_submit backends method support not only read and write requests,
but also two new ones: verify and compare.  Verify just checks readability
of the data in specified location without transferring them outside.
Compare reads the specified data and compares them to received data,
returning error if they are different.

VERIFY(10/12/16) commands request either verify or compare from backend,
depending on BYTCHK CDB field.  COMPARE AND WRITE command executed in two
stages: first it requests compare, and then, if succeesed, requests write.
Atomicity of operation is guarantied by CTL request ordering code.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-06-16 11:00:14 +00:00
Alexander Motin
e86a414238 Make backends track completion by processed number of sectors instead of
total transfer size.

Commands such as VERIFY or COMPARE AND WRITE may have transfer size not
matching directly to number of sectors.
2014-06-15 20:14:11 +00:00
Alexander Motin
66df9136e3 Remove memcpy() from ctl_private[] accesses.
That union is aligned enough to access data directly.
2014-06-15 18:16:51 +00:00
Alexander Motin
9c71cd5aae Move kern_total_len setting from backend to core code. 2014-06-15 17:14:52 +00:00
Alexander Motin
20a5f2d963 Format Portal Group Tag same as istgt does -- %4.4x instead of %x.
SPC-4 spec tells it should be "two or more hexadecimal digits".
RFC3720 tells it is 16-bit value.

MFC after:	2 weeks
2014-06-15 10:04:44 +00:00
Alexander Motin
eb3687a6a6 Remove custom processing for "file" option. 2014-06-15 09:37:06 +00:00
Alexander Motin
5777f09019 Respect "vendor" option in all places.
MFC after:	2 weeks
2014-06-15 08:43:52 +00:00
Alexander Motin
0c934f7f89 Add "vendor", "product" and "revision" options to control inquiry data.
MFC after:	2 weeks
2014-06-15 06:56:10 +00:00
Alexander Motin
ad9cb3314a Remove non-functional remnants of control LUN -- 18MB of RAM for nothing. 2014-06-14 20:25:14 +00:00
Alexander Motin
57a5db13b7 Implement small KPI to access LUN options instead doing it by hands.
MFC after:	2 weeks
2014-06-14 17:47:44 +00:00
Alexander Motin
9e005bbcc9 Fix some leaks on LUN creation error.
MFC after:	2 weeks
2014-06-12 21:50:46 +00:00
Warner Losh
dbb3f5b28b The code that combines adjacent ranges for BIO_DELETEs to optimize
trims to the device assumes the list is sorted. Don't apply the
optimization of not sorting the queue when we have SSDs to the
delete_queue, since it causes more discard traffic to the drive. While
one could argue that the higher levels should coalesce the trims,
that's not done today, so some optimization at this level is needed.

CR: https://phabric.freebsd.org/D142
2014-06-05 17:13:42 +00:00
Alexander Motin
94fe9f959c - Add support for SG_GET_SG_TABLESIZE IOCTL to report that we don't support
scatter/gather lists.
- Return error for still unsupported SG 3.x API read/write calls.

MFC after:	1 month
2014-06-04 12:05:47 +00:00
Alexander Motin
fcaf473cfc Overhaul CAM SG driver IOCTL interfaces.
Make it really work for native FreeBSD programs.  Before this it was broken
for years due to different number of pointer dereferences in Linux and
FreeBSD IOCTL paths, permanently returning errors to FreeBSD programs.
This change breaks the driver FreeBSD IOCTL ABI, making it more strict,
but since it was not working any way -- who bother.

Add shims for 32-bit programs on 64-bit host, translating the argument
of the SG_IO IOCTL for both FreeBSD and Linux ABIs.

With this change I was able to run 32-bit Linux sg3_utils tools and simple
32 and 64-bit FreeBSD test tools on both 32 and 64-bit FreeBSD systems.

MFC after:	1 month
2014-06-02 19:53:53 +00:00
Edward Tomasz Napierala
8cd22f5edf Provide better descriptions for 'struct ctl_scsiio' fields; based mostly
on emails from ken@.
2014-05-04 15:35:04 +00:00
Alexander Motin
51ad63daae Respect MAXIMUM TRANSFER LENGTH field of Block Limits VPD page.
Nobody yet reported disk supporting I/Os less then our MAXPHYS value, but
since we any way have code to read Block Limits VPD page, that is easy.

MFC after:	2 weeks
2014-04-30 19:44:31 +00:00
Alexander Motin
b28e753c93 Do not reread SCSI disk VPD pages on every device open.
Instead of rereading VPD pages on every device open, do it only on initial
device probe, and in cases when device reported via UNIT ATTENTIONs that
something has changed.  Capacity is still rereaded on every open because
it is more critical for operation and more probable to change in run time.

On my tests with Intel 530 SSDs on mps(4) HBA this change reduces time
GEOM needs to retaste the device (that includes few open/close cycles)
from ~150ms to ~30ms.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-04-30 17:38:26 +00:00
Alexander Motin
08a7cce543 Remove limits on size of READ/WRITE operations.
Instead of allocating up to 16MB or RAM at once to handle whole I/O,
allocate up to 1MB at a time, but do multiple ctl_datamove() and storage
I/Os if needed.
2014-04-24 16:19:49 +00:00
Alexander Motin
acf5bea460 Make CAM target CTL frontend respect SIM I/O size limitations.
If datamove size is bigger then SIM can handle, or it has more segments
then this code can handle -- split it into several CTIO requests.
2014-04-24 15:16:26 +00:00
Edward Tomasz Napierala
a7f6a46874 Modify CTL iSCSI frontend to properly handle situations where datamove
routine is called multiple times per SCSI task.

Sponsored by:	The FreeBSD Foundation
2014-04-24 12:54:35 +00:00
Alexander Motin
0fa3cb336b Disable UNMAP support for STEC 842 SSDs.
In some unknown cases UNMAP commands make device firmware stuck.

MFC after:	2 weeks
2014-04-23 19:50:35 +00:00
Edward Tomasz Napierala
8eab95d646 Properly pass the initiator address when running in proxy mode.
Sponsored by:	The FreeBSD Foundation
2014-04-16 11:00:10 +00:00
Edward Tomasz Napierala
6e4f347cd6 Make it possible to interrupt login when running in proxy mode.
Sponsored by:	The FreeBSD Foundation
2014-04-16 10:37:26 +00:00
Edward Tomasz Napierala
8cab2ed4cd Properly identify target portal when running in proxy mode. While here,
remove CTL_ISCSI_CLOSE, it wasn't used or implemented anyway.

Sponsored by:	The FreeBSD Foundation
2014-04-16 10:29:34 +00:00
Edward Tomasz Napierala
2ebde326cb Add some stuff to make it easier to figure out for the system administrator
whether the ICL_KERNEL_PROXY stuff got compiled in correctly.

Sponsored by:	The FreeBSD Foundation
2014-04-16 10:18:44 +00:00
Edward Tomasz Napierala
ba3a2d31c8 Make it possible for the iSCSI target side to operate in both normal
and ICL_KERNEL_PROXY mode, and fix some bit rot so the latter actually
works again.

Sponsored by:	The FreeBSD Foundation
2014-04-16 10:06:37 +00:00
Alexander Motin
2dfdd4ae19 Join CTL worker threads into one process for convenience.
Report their idle state as "-".
2014-04-13 11:10:36 +00:00
Alexander Motin
3710ae64b9 Report more readable state "-" for idle CAM scan thread. 2014-04-13 11:08:57 +00:00
Steven Hartland
43d0f063c2 Fix build breakage caused by r264295
X-MFC-With: r264295
MFC after:	1 week
2014-04-10 05:04:23 +00:00
Alexander Motin
7081bb15b0 Fix three refcounter leaks and lock recursion they covered.
MFC after:	1 week
2014-04-09 19:16:40 +00:00
Alexander Motin
004008d6e6 Introduce new serialization type CTL_SERIDX_UNMAP.
Unfortunately we can't check range collisions for UNMAP commands alike
to writes, because they include multiple ranges, which are also passed
in data block, not in CDB.  As result, UNMAP commands have to be treated
as colliding with any other command accessing the media.

From the other side all UNMAPs are equal (we don't support ANCHOR flag),
so we can execute several UNMAPs same time.
2014-04-09 10:58:52 +00:00
Alexander Motin
8f5a226a3c When splitting huge unmap requests, do it on sector boundary. 2014-04-09 10:44:09 +00:00
Alexander Motin
b8a8ed5664 Remove support of LUN-based CD changers from cd(4) driver.
This code was heavily broken few months ago during CAM locking changes.
Fixing it would require almost complete rewrite.  Since there are no
known devices on market using this interface younger then ~15 years, and
they are CD, not even DVD, I don't see much reason to rewrite it.

This change does not mean those devices won't work.  They will just work
slower due to inefficient disks load/unload schedule if several LUNs
accessed same time.

Discussed with:	ken@
Silence on:	scsi@, hardware@
MFC after:	1 week
2014-04-09 08:57:57 +00:00
Alexander Motin
7e0be0226f Another fix for r264274. Last moment cosmetic changes are evil! 2014-04-08 22:36:39 +00:00
Alexander Motin
f7ad1c4625 Oops! Few quick fixes for r264274. 2014-04-08 21:30:10 +00:00
Alexander Motin
ee7f31c068 Add support for SCSI UNMAP commands to CTL.
This patch adds support for three new SCSI commands: UNMAP, WRITE SAME(10)
and WRITE SAME(16).  WRITE SAME commands support both normal write mode
and UNMAP flag.  To properly report UNMAP capabilities this patch also adds
support for reporting two new VPD pages: Block limits and Logical Block
Provisioning.

UNMAP support can be enabled per-LUN by adding "-o unmap=on" to `ctladm
create` command line or "option unmap on" to lun sections of /etc/ctl.conf.

At this moment UNMAP supported for ramdisks and device-backed block LUNs.
It was tested to work great with ZFS ZVOLs.  For file-backed LUNs UNMAP
support is unfortunately missing due to absence of respective VFS KPI.

Reviewed by:	ken
MFC after:	1 month
Sponsored by:	iXsystems, Inc
2014-04-08 20:50:48 +00:00
Alexander Motin
1fa3ca3ca7 Wakeup only one thread of added in r263978i at a time.
This slightly reduces lock congestion between threads.

Submitted by:	trasz
2014-04-08 18:22:03 +00:00
Alexander Motin
f601272298 Report stripe size and offset of the backing device in READ CAPACITY (16)
as physical sector size and offset.

MFC after:	2 weeks
2014-04-06 10:13:14 +00:00
Edward Tomasz Napierala
605a34b065 All the iSCSI sysctls are also tunables; advertise that.
Sponsored by:	The FreeBSD Foundation
2014-04-04 08:48:55 +00:00
Edward Tomasz Napierala
01a111f447 Use atomic ops instead of mutexes where appropriate.
Submitted by:	mav@
Sponsored by:	The FreeBSD Foundation
2014-04-01 21:54:20 +00:00
Edward Tomasz Napierala
ecba49ddfc Instead of "icltx" and "iclrx", use thread names with prefix from upper
layer, so that one can see which side of the stack the threads are for.

Sponsored by:	The FreeBSD Foundation
2014-04-01 21:47:22 +00:00
Edward Tomasz Napierala
6ed8f5d236 Get rid of ICL lock; use upper-layer (initiator or target) lock instead.
This avoids extra locking in icl_pdu_queue(); the upper layer needs to call
it while holding its own lock anyway, to avoid sending PDUs out of order.

Sponsored by:	The FreeBSD Foundation
2014-04-01 21:40:46 +00:00
Edward Tomasz Napierala
a0e36aee93 Remove the homegrown ctl_be_block_io allocator, replacing it with UMA.
There is no performance difference.

Reviewed by:	mav@
Sponsored by:	The FreeBSD Foundation
2014-04-01 21:13:05 +00:00
Edward Tomasz Napierala
ac030c53c0 Hide CTL messages about SCSI error responses. Too many users take
them for actual target errors.  They can be enabled back by setting
kern.cam.ctl.verbose=1, or booting with bootverbose.

Sponsored by:	The FreeBSD Foundation
2014-03-31 21:04:15 +00:00
Edward Tomasz Napierala
9328f8a939 Make it possible to have multiple CTL worker threads. Leave the default
of 1 for now.

Sponsored by:	The FreeBSD Foundation
2014-03-31 20:49:33 +00:00
Warner Losh
8a27a339b6 Remove instances of variables that were set, but never used. gcc 4.9
warns about these by default.
2014-03-30 23:43:36 +00:00
Edward Tomasz Napierala
0f26fd2c61 Remove ctl_mem_pool.{c,h}.
Sponsored by:	The FreeBSD Foundation
2014-03-27 11:10:13 +00:00
Edward Tomasz Napierala
6d9d321647 Rework cfiscsi_datamove_in() to obey expected data transfer length
received from the initiator.

Sponsored by:	The FreeBSD Foundation
2014-03-27 10:15:35 +00:00
Edward Tomasz Napierala
ea86ccce9b Target Transfer Tag is opaque; no need to htonl(3) it.
Sponsored by:	The FreeBSD Foundation
2014-03-25 19:28:40 +00:00
Edward Tomasz Napierala
b1277ad1f7 Use a less unusual syntax in debug printfs.
Sponsored by:	The FreeBSD Foundation
2014-03-25 18:30:57 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
Alexander Motin
6d45fdc941 Fix support for increased logical sector size (4K-native drives).
- Logical sector size is measured in words, not bytes.
- If physical sector is not bigger then logical sector, it does not mean
it should be set equal to 512 bytes, but set to logical sector.

PR:		misc/187269
Submitted by:	Ravi Pokala <rpokala@panasas.com>
MFC after:	1 week
2014-03-07 09:45:40 +00:00
Edward Tomasz Napierala
2cb15089bb Make reset handling in iSCSI target RFC-compliant. This fixes some rare
hangs with Open-iSCSI (Linux).

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2014-03-06 10:45:53 +00:00
Edward Tomasz Napierala
561a971a76 Fix missing unlock in persistent reservations code, which resulted in panics
with Hyper-V Failover Cluster.

Reviewed by:	ken@
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2014-03-05 12:02:29 +00:00
Alexander Motin
357478a5af Do not retry on CAM_FUNC_NOTAVAIL error, but return immediately.
MFC after:	2 weeks
2014-03-04 15:07:00 +00:00
Alexander Motin
e0c2f97550 Make CTL block backend return proper error code for operations unsupposed
by the underlying device.

MFC after:	2 weeks
2014-02-06 03:54:58 +00:00
Alexander Motin
d0754f0829 Mostly revert r260267 and hopefully really fix the original problem.
The latest draft of SBC-3 tells: "A MAXIMUM UNMAP LBA COUNT field set to
a non-zero value indicates the maximum number of LBAs that may be unmapped
by an UNMAP command."  To me it does not sound like that limit is set per
single descriptor, but rather per all command.  And I have at least one
device that behaves exactly that way.  This patch fixes the problem there.

MFC after:	1 week
2014-01-22 22:19:53 +00:00
Alexander Motin
7f7aacb476 Fix memory and references leak due to unfreed path in case we can't
allocate bus scan CCB.

MFC after:	2 weeks
2014-01-21 23:15:23 +00:00
Alexander Motin
d718926b6d Move xpt_run_devq() call before request completion callback where it was
originally.

I am not sure why exactly have I moved it during one of many refactorings
during camlock project, but obviously it opens race window that may cause
use after free panics during SIM (in reported cases umass(4)) detach.

MFC after:	2 weeks
2014-01-11 16:52:09 +00:00
Alexander Motin
4515f70a9c Fix for r260541: do not drop periph reference when request is restarted.
CAM_DEV_QFREEZE flag is still there and it will freeze device again.
2014-01-11 16:37:20 +00:00
Alexander Motin
c33e40291b Take additional reference on SCSI probe periph to cover its freeze count.
Otherwise periph may be invalidated and freed before single-stepping freeze
is dropped, causing use after free panic.
2014-01-11 13:35:36 +00:00
Alexander Motin
431d3a5bfc Replace several instances of -1 with appropriate CAM_*_WILDCARD and types.
It was equal before r259397, but for good or bad, not any more for LUNs.

This change fixes at least CAM debugging.
2014-01-10 12:18:05 +00:00
Alexander Motin
5b4374aa27 Allow delete_method sysctl to be set to "DISABLE". 2014-01-07 20:12:10 +00:00
Steven Hartland
6907488efa Correct short delete issue in SCSI UNMAP support
Correct missing \n's in xpt_print's
Correct incorrect count being passed to short delete xpt_print

MFC after:	1 week
2014-01-04 17:52:43 +00:00
Nathan Whitehorn
92be6c51f0 Widen lun_id_t to 64 bits. This is a follow-on to r257345 to let the kernel
support all valid SAM-5 LUN IDs. CAM_VERSION is bumped, as the CAM ABI
(though not API) is changed. No behavior is changed relative to r257345
except that LUNs with non-zero high 32 bits will no longer be ignored
during device enumeration for SIMs that have set PIM_EXTLUNS.

Reviewed by:	scottl
2013-12-14 22:07:40 +00:00
Alexander Motin
c689c6239e When comparing device IDs, make sure that they have the same type
(like NAA assigned) and identify the same entity (like device or port).
Otherwise there can be false positives since at least some models of
Seagate disks use same IDs for the whole device and one of its ports.

MFC after:	2 weeks
2013-12-08 20:43:01 +00:00
Edward Tomasz Napierala
025a2301f8 Properly report an error instead of panicing when user tries to create
LUN backed by non-disk device, e.g. /dev/null.

Reviewed by:	ken (earlier version)
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2013-12-03 18:04:14 +00:00
Andriy Gapon
d9fae5ab88 dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE
In its stead use the Solaris / illumos approach of emulating '-' (dash)
in probe names with '__' (two consecutive underscores).

Reviewed by:	markj
MFC after:	3 weeks
2013-11-26 08:46:27 +00:00
Attilio Rao
54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
Alexander Motin
8c6d5f8282 Introduce seperate mutex lock to protect protect CTL I/O pools, slightly
reducing global CTL lock scope and congestion.

While there, simplify CTL I/O pools KPI, hiding implementation details.
2013-11-11 08:27:20 +00:00
Alexander Motin
daa5487f36 Some CAM locks polishing:
- Fix LOR and possible lock recursion when handling high-power commands.
Introduce new lock to protect left power quota and list of frozen devices.
 - Correct locking around xpt periph creation.
 - Remove seems never used XPT_FLAG_OPEN xpt periph flag.
2013-11-10 12:16:09 +00:00
Steven Hartland
e2b8af8404 Corrected definition for old_rate to match d_rotation_rate
MFC after:	2 Days
X-MFC-With:	r256956
2013-11-07 23:21:52 +00:00
Alexander Motin
9813c936c4 Fix lock recursion, triggered by smartctl -a /dev/adaX. 2013-11-01 00:14:15 +00:00
Nathan Whitehorn
abe8350519 printf() specifier updates to CAM to handle either 32-bit or 64-bit lun_id_t.
MFC after:	2 weeks
2013-10-30 14:13:15 +00:00
Nathan Whitehorn
ef5758fa10 Implement extended LUN support. If PIM_EXTLUNS is set by a SIM, encode
the upper 32-bits of the LUN, if possible, into the target_lun field as
passed directly from the REPORT LUNs response. This allows extended LUN
support to work for all LUNs with zeros in the lower 32-bits, which covers
most addressing modes without breaking KBI. Behavior for drivers not
setting PIM_EXTLUNS is unchanged. No user-facing interfaces are modified.

Extended LUNs are stored with swizzled 16-bit word order so that, for
devices implementing LUN addressing (like SCSI-2), the numerical
representation of the LUN is identical with and without PIM_EXTLUNS. Thus
setting PIM_EXTLUNS keeps most behavior, and user-facing LUN IDs, unchanged.
This follows the strategy used in Solaris. A macro (CAM_EXTLUN_BYTE_SWIZZLE)
is provided to transform a lun_id_t into a uint64_t ordered for the wire.

This is the second part of work for full 64-bit extended LUN support and is
designed to a bridge for stable/10 to the final 64-bit LUN code. The
third and final part will involve widening lun_id_t to 64 bits and will
not be MFCed. This third part will break the KBI but will keep the KPI
unchanged so that all drivers that will care about this can be updated now
and not require code changes between HEAD and stable/10.

Reviewed by:	scottl
MFC after:	2 weeks
2013-10-29 15:36:58 +00:00
Alexander Motin
030844d1e7 Some microoptimizations for da and ada drivers:
- Replace ordered_tag_count counter with single flag;
 - From da remove outstanding_cmds counter, duplicating pending_ccbs list;
 - From da_softc remove unused links field.
2013-10-24 14:05:44 +00:00
Alexander Motin
eeb9405409 Remove 128KB bzero() call done for every block I/O data buffer.
On my tests this improves performance of the new iSCSI target backed by
GEOM STRIPE of SSDs from 75K to 110K IOPS.

Reviewed by:	ken
2013-10-23 17:55:35 +00:00
Alexander Motin
ec08c07e8c Minor (mostly cosmetical) addition to r256960. 2013-10-23 14:58:09 +00:00
Alexander Motin
f1486b5163 Move CAM_UNQUEUED_INDEX setting to the last moment and under the periph lock.
This fixes race condition with cam_periph_ccbwait(), causing use-after-free.
2013-10-23 12:53:05 +00:00
Steven Hartland
c28078e903 Improve ZFS N-way mirror read performance by using load and locality
information.

The existing algorithm selects a preferred leaf vdev based on offset of the zio
request modulo the number of members in the mirror. It assumes the devices are
of equal performance and that spreading the requests randomly over both drives
will be sufficient to saturate them. In practice this results in the leaf vdevs
being under utilized.

The new algorithm takes into the following additional factors:
* Load of the vdevs (number outstanding I/O requests)
* The locality of last queued I/O vs the new I/O request.

Within the locality calculation additional knowledge about the underlying vdev
is considered such as; is the device backing the vdev a rotating media device.

This results in performance increases across the board as well as significant
increases for predominantly streaming loads and for configurations which don't
have evenly performing devices.

The following are results from a setup with 3 Way Mirror with 2 x HD's and
1 x SSD from a basic test running multiple parrallel dd's.

With pre-fetch disabled (vfs.zfs.prefetch_disable=1):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s

With pre-fetch enabled (vfs.zfs.prefetch_disable=0):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s

In addition to the performance changes the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow which also ensures
vdevs loads are only calculated from the set of valid candidates.

The following additional sysctls where added to allow the administrator
to tune the behaviour of the load algorithm:
* vfs.zfs.vdev.mirror.rotating_inc
* vfs.zfs.vdev.mirror.rotating_seek_inc
* vfs.zfs.vdev.mirror.rotating_seek_offset
* vfs.zfs.vdev.mirror.non_rotating_inc
* vfs.zfs.vdev.mirror.non_rotating_seek_inc

These changes where based on work started by the zfsonlinux developers:
https://github.com/zfsonlinux/zfs/pull/1487

Reviewed by:	gibbs, mav, will
MFC after:	2 weeks
Sponsored by:	Multiplay
2013-10-23 09:54:58 +00:00
Alexander Motin
3231e8bddb Fix memory and references leak due to unfreed path.
Coverity CID:	1054773
2013-10-22 13:56:30 +00:00
Alexander Motin
8ec5ab3f16 Unconditionally acquire periph reference on CCB allocation failure.
cam_periph_acquire() can return error if periph already invalidated, but
that may be unacceptable and cause deadlock if the invalidated periph can't
be destroyed without "executing" the scheduled request.

Coverity CID:	1109822
MFC after:	2 months
2013-10-22 12:58:22 +00:00
Alexander Motin
40ea77a036 Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.

The defined now safety requirements are:
 - caller should not hold any locks and should be reenterable;
 - callee should not depend on GEOM dual-threaded concurency semantics;
 - on the way down, if request is unmapped while callee doesn't support it,
   the context should be sleepable;
 - kernel thread stack usage should be below 50%.

To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
 - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
 - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
 - G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
 - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe.  If any of requirements are not met, request is queued to
g_up or g_down thread same as before.

Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).

To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION.  da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.

This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).

Sponsored by:	iXsystems, Inc.
MFC after:	2 months
2013-10-22 08:22:19 +00:00
Alexander Motin
227d67aa54 Merge CAM locking changes from the projects/camlock branch to radically
reduce lock congestion and improve SMP scalability of the SCSI/ATA stack,
preparing the ground for the coming next GEOM direct dispatch support.

Replace big per-SIM locks with bunch of smaller ones:
 - per-LUN locks to protect device and peripheral drivers state;
 - per-target locks to protect list of LUNs on target;
 - per-bus locks to protect reference counting;
 - per-send queue locks to protect queue of CCBs to be sent;
 - per-done queue locks to protect queue of completed CCBs;
 - remaining per-SIM locks now protect only HBA driver internals.

While holding LUN lock it is allowed (while not recommended for performance
reasons) to take SIM lock.  The opposite acquisition order is forbidden.
All the other locks are leaf locks, that can be taken anywhere, but should
not be cascaded.  Many functions, such as: xpt_action(), xpt_done(),
xpt_async(), xpt_create_path(), etc. are no longer require (but allow) SIM
lock to be held.

To keep compatibility and solve cases where SIM lock can't be dropped, all
xpt_async() calls in addition to xpt_done() calls are queued to completion
threads for async processing in clean environment without SIM lock held.

Instead of single CAM SWI thread, used for commands completion processing
before, use multiple (depending on number of CPUs) threads.  Load balanced
between them using "hash" of the device B:T:L address.

HBA drivers that can drop SIM lock during completion processing and have
sufficient number of completion threads to efficiently scale to multiple
CPUs can use new function xpt_done_direct() to avoid extra context switch.
Make ahci(4) driver to use this mechanism depending on hardware setup.

Sponsored by:	iXsystems, Inc.
MFC after:	2 months
2013-10-21 12:00:26 +00:00
Alexander Motin
2030b2943b MFprojects/camlock:
Remove hard limit on number of BIOs handled with one ATA TRIM request.
2013-10-21 08:57:27 +00:00
Alexander Motin
8d36a71b76 Unify periph invalidation and destruction reporting.
Print message containing device model and serial number on invalidation.

Requested by:   glebius
MFC after:	1 week
2013-10-15 17:59:41 +00:00
Steven Hartland
d85805b291 Added 4K quirks for Corsair Neutron GTX SSD's 2013-10-15 17:03:02 +00:00
Alexander Motin
aa93041d2c Unhide "Serial Number" lines from bootverbose. That information may be
useful for system administration to have in hard copy (in logs) if one of
several devices suddenly dies.

Requested by:	glebius
MFC after:	1 week
2013-10-15 12:59:40 +00:00
Edward Tomasz Napierala
c48f182bb1 Remove no longer useful debugging output and a stale comment.
Approved by:	re (gjb)
Sponsored by:	FreeBSD Foundation
2013-10-09 17:34:45 +00:00
Edward Tomasz Napierala
02147e9cd0 Make the error handling more consistant. Shouldn't make any functional
difference.

Approved by:	re (gjb)
Sponsored by:	FreeBSD Foundation
2013-10-09 17:06:03 +00:00
Edward Tomasz Napierala
8ba0396077 Tidy up, cache return value of a function, and add an assertion;
shouldn't make any functional difference.

Approved by:	re (gjb)
Sponsored by:	FreeBSD Foundation
2013-10-09 16:55:52 +00:00
Alexander Motin
6dfc67e379 Close the race on path ID allocation in xpt_bus_register() if two buses are
registered simultaneously. Due to topology unlock between the ID allocation
and the bus registration there is a chance that two buses may get the
same IDs. That is supposed reason of lock assertion panic in CAM during
initial bus scanning after new iscsid initiates two sessions same time.

Reported by:	trasz
Approved by:	re (glebus, marius)
MFC after:	2 weeks
2013-10-09 12:09:01 +00:00
Edward Tomasz Napierala
1008ac5eb7 Fix NOP-In/NOP-Out payload handling. Previous way didn't work at all; fortunately
nothing seems to actually use this feature, but it's required by standard.

Approved by:	re (glebius)
Sponsored by:	FreeBSD Foundation
2013-10-09 12:03:04 +00:00
Edward Tomasz Napierala
a9c80a534a Properly fix out of memory handling in the iSCSI target.
Approved by:	re (glebius)
Sponsored by:	FreeBSD Foundation
2013-10-08 19:18:02 +00:00
Edward Tomasz Napierala
0f30c5d3c0 Split cfiscsi_datamove() in two; no functional changes.
Approved by:	re (glebius)
Sponsored by:	FreeBSD Foundation
2013-10-05 16:22:33 +00:00
Edward Tomasz Napierala
c28c09c1f0 Don't leak memory when removing an unconnected session, and remove useless
UMA_ZONE_NOFREE that caused another leak when unloading the module.

Approved by:	re (glebius)
Sponsored by:	FreeBSD Foundation
2013-10-04 19:31:41 +00:00
Nathan Whitehorn
b559575358 Make sure the CCB xflags field is initialized to zero so that
CAM_EXTLUN_VALID is not erroneously set. Also add an XPORT_SRP
identifier to the known SCSI transports for the SCSI RDMA protocol, as
used, for example with Infiniband storage.

Reviewed by:	scottl
Approved by:	re (marius)
2013-09-27 16:02:40 +00:00
Scott Long
f564de00f7 Re-do r255853. Along with adding back the API/ABI changes from the
original, this hides the contents of cam_compat.h from ktrace/kdump/truss,
avoiding problems there.  There are no user-servicable parts in there, so
no need for those tools to be groping around in there.

Approved by:	re
2013-09-25 15:55:56 +00:00
Glen Barber
0082e54e9d Revert r255853 pending fixes to build errors in usr.bin/kdump
Approved by:	re (implicit)
2013-09-25 01:48:45 +00:00
Scott Long
185884259b Update the CAM API for FreeBSD 10:
- Remove the timeout_ch field.  It's been deprecated since FreeBSD 7.0;
  MPSAFE drivers should be managing their own timeout storage.  The
  remaining non-MPSAFE drivers have been modified to also manage their own
  storage, and should be considered for updating to MPSAFE (or removal)
  during the FreeBSD 10.x lifecycle.

- Add fields related to soft timeouts and quality of service, to be used
  in upcoming work.

- Add room for more flags in the CCB header and path_inq structures.

- Begin support for extended 64-bit LUNs.

- Bump the CAM version number to 0x18, but add compat shims.  Tested with
  camcontrol and smartctl.

Reviewed by:    nathanw, ken, kib
Approved by:    re
Obtained from:  Netflix
2013-09-24 16:50:53 +00:00
Edward Tomasz Napierala
9606f568fe Properly ignore PDUs with CmdSN outside of allowed range.
Approved by:	re (glebius)
Sponsored by:	FreeBSD Foundation
2013-09-24 13:46:13 +00:00
Edward Tomasz Napierala
69aa56bef2 Fix a few instances of M_WAITOK in threads marked as prohibited from sleep,
missed in r255824.

Approved by:	re (kib)
Sponsored by:	FreeBSD Foundation
2013-09-24 09:33:31 +00:00
Edward Tomasz Napierala
46aaea8995 Don't use M_WAITOK when running from context where sleeping is prohibited,
such as callout or a geom thread.

Approved by:	re (marius)
Sponsored by:	FreeBSD Foundation
2013-09-23 19:54:44 +00:00
Edward Tomasz Napierala
ac873bb350 Add some spare fields to structs used by the new iSCSI stack - some just
in case, some for future MC/S support.

This requires kernel and world rebuild.

Approved by:	re (blanket)
Sponsored by:	FreeBSD Foundation
2013-09-20 21:26:51 +00:00
Edward Tomasz Napierala
009ea47eb2 Bring in the new iSCSI target and initiator.
Reviewed by:	ken (parts)
Approved by:	re (delphij)
Sponsored by:	FreeBSD Foundation
2013-09-14 15:29:06 +00:00
Alexander Motin
f9004a5db0 Make SES driver adequately react on simple enclosure devices -- read Short
Enclosure status to enclosure status field, clear previous state and exit.
2013-09-06 15:41:37 +00:00
Bryan Venteicher
ffead710d5 Add camcontrol support for the SCSI sanitize command
Reviewed by:	ken, mjacob (eariler version)
Sponsored by:	Netapp
2013-09-06 15:19:57 +00:00
Alexander Motin
d7a52e7b49 Fix kernel panic if cache->nelms is zero.
MFC after:	2 weeks
2013-09-06 14:31:52 +00:00
Alexander Motin
0d4f3c316e Add debug trace points for freeze/release device queue. 2013-09-01 17:37:19 +00:00
Alexander Motin
1d64933fe2 Bring legacy CAM target implementation back into API/KPI-coherent and even
functional state.  While CTL is much more superior target from all points,
there is no reason why this code should not work.

Tested with ahc(4) as target side HBA.

MFC after:	2 weeks
2013-09-01 13:01:59 +00:00
Alexander Motin
f017ca80b1 Fix SES_ENABLE_PASSTHROUGH kernel option, unexpectedly broken during driver
overhaul.

MFC after:	3 days
2013-09-01 12:18:44 +00:00
Alexander Motin
d1d536f0eb Fix targbh crash on XPT_IMMED_NOTIFY error during attach. 2013-09-01 11:50:37 +00:00
Alexander Motin
27492bea85 Fix the build with CTLFEDEBUG, broken by unmapped I/O support changes. 2013-09-01 10:11:00 +00:00
Kenneth D. Merry
ee5bd4fc5a Bump up the default timeouts for move commands in the ch(4) driver
to 15 minutes, and 5 minutes for things like READ ELEMENT STATUS.

This is needed to account for the worst case scenarios on at least
some Spectra Logic tape libraries.

Sponsored by:	Spectra Logic
MFC after:	3 days
2013-08-29 21:25:27 +00:00
Kenneth D. Merry
73825c1732 If a drive returns ASC/ASCQ 0x04,0x11 "Logical unit not ready,
notify (enable spinup) required", instead of doing the normal
retries, poll for a change in status.

We will poll every half second for a minute for the status to
change.

Hitachi drives (and likely other SAS drives) return that ASC/ASCQ
when they are waiting to spin up.  What it means is that they are
waiting for the SAS expander to send them the SAS
NOTIFY (ENABLE SPINUP) primitive.

That primitive is the mechanism expanders/enclosures use to
sequence drive spinup to avoid overloading power supplies.

Sponsored by:	Spectra Logic
MFC after:	3 days
2013-08-27 19:47:03 +00:00
Alexander Motin
40f27d7cf6 Add new attribute lunname to report only textual LUN-specific device IDs.
While lunid attribute prefers to report numeric ones, having both may be
useful in some situations.
2013-08-24 09:42:14 +00:00
Kenneth D. Merry
93729c1796 Add support to physio(9) for devices that don't want I/O split and
configure sa(4) to request no I/O splitting by default.

For tape devices, the user needs to be able to clearly understand
what blocksize is actually being used when writing to a tape
device.  The previous behavior of physio(9) was that it would split
up any I/O that was too large for the device, or too large to fit
into MAXPHYS.  This means that if, for instance, the user wrote a
1MB block to a tape device, and MAXPHYS was 128KB, the 1MB write
would be split into 8 128K chunks.  This would be done without
informing the user.

This has suboptimal effects, especially when trying to communicate
status to the user.  In the event of an error writing to a tape
(e.g. physical end of tape) in the middle of a 1MB block that has
been split into 8 pieces, the user could have the first two 128K
pieces written successfully, the third returned with an error, and
the last 5 returned with 0 bytes written.  If the user is using
a standard write(2) system call, all he will see is the ENOSPC
error.  He won't have a clue how much actually got written.  (With
a writev(2) system call, he should be able to determine how much
got written in addition to the error.)

The solution is to prevent physio(9) from splitting the I/O.  The
new cdev flag, SI_NOSPLIT, tells physio that the driver does not
want I/O to be split beforehand.

Although the sa(4) driver now enables SI_NOSPLIT by default,
that can be disabled by two loader tunables for now.  It will not
be configurable starting in FreeBSD 11.0.  kern.cam.sa.allow_io_split
allows the user to configure I/O splitting for all sa(4) driver
instances.  kern.cam.sa.%d.allow_io_split allows the user to
configure I/O splitting for a specific sa(4) instance.

There are also now three sa(4) driver sysctl variables that let the
users see some sa(4) driver values.  kern.cam.sa.%d.allow_io_split
shows whether I/O splitting is turned on.  kern.cam.sa.%d.maxio shows
the maximum I/O size allowed by kernel configuration parameters
(e.g. MAXPHYS, DFLTPHYS) and the capabilities of the controller.
kern.cam.sa.%d.cpi_maxio shows the maximum I/O size supported by
the controller.

Note that a better long term solution would be to implement support
for chaining buffers, so that that MAXPHYS is no longer a limiting
factor for I/O size to tape and disk devices.  At that point, the
controller and the tape drive would become the limiting factors.

sys/conf.h:	Add a new cdev flag, SI_NOSPLIT, that allows a
		driver to tell physio not to split up I/O.

sys/param.h:	Bump __FreeBSD_version to 1000049 for the addition
		of the SI_NOSPLIT cdev flag.

kern_physio.c:	If the SI_NOSPLIT flag is set on the cdev, return
		any I/O that is larger than si_iosize_max or
		MAXPHYS, has more than one segment, or would have
		to be split because of misalignment with EFBIG.
		(File too large).

		In the event of an error, print a console message to
		give the user a clue about what happened.

scsi_sa.c:	Set the SI_NOSPLIT cdev flag on the devices created
		for the sa(4) driver by default.

		Add tunables to control whether we allow I/O splitting
		in physio(9).

		Explain in the comments that allowing I/O splitting
		will be deprecated for the sa(4) driver in FreeBSD
		11.0.

		Add sysctl variables to display the maximum I/O
		size we can do (which could be further limited by
		read block limits) and the maximum I/O size that
		the controller can do.

		Limit our maximum I/O size (recorded in the cdev's
		si_iosize_max) by MAXPHYS.  This isn't strictly
		necessary, because physio(9) will limit it to
		MAXPHYS, but it will provide some clarity for the
		application.

		Record the controller's maximum I/O size reported
		in the Path Inquiry CCB.

sa.4:		Document the block size behavior, and explain that
		the option of allowing physio(9) to split the I/O
		will disappear in FreeBSD 11.0.

Sponsored by:	Spectra Logic
2013-08-24 04:52:22 +00:00
Edward Tomasz Napierala
81a2151d5c CTL changes required for iSCSI target, most notably LUN remapping
and a mechanism to allow CTL frontends for retrieving LUN options.

Reviewed by:	ken (earlier version)
2013-08-24 01:50:31 +00:00
Edward Tomasz Napierala
83fd94a416 Fix the (unused for now) SCSI_PROTO_iSCSI define to match style(9). 2013-08-21 07:45:47 +00:00
Kenneth D. Merry
aeb681d798 Add unmapped I/O and larger I/O support to the sa(4) driver.
We now pay attention to the maxio field in the XPT_PATH_INQ CCB,
and if it is set, propagate it up to physio via the si_iosize_max
field in the cdev structure.

We also now pay attention to the PIM_UNMAPPED capability bit in the
XPT_PATH_INQ CCB, and set the new SI_UNMAPPED cdev flag when the
underlying SIM supports unmapped I/O.

scsi_sa.c:	Add unmapped I/O support and propagate the SIM's
		maximum I/O size up.

		Adjust scsi_tape_read_write() in the same way that
		scsi_read_write() was changed to support unmapped
		I/O.  We overload the readop parameter with bits
		that tell us whether it's an unmapped I/O, and we
		need to set the CAM_DATA_BIO CCB flag.  This change
		should be backwards compatible in source and
		binary forms.

MFC after:	1 week
Sponsored by:	Spectra Logic
2013-08-16 16:14:32 +00:00
Edward Tomasz Napierala
da4757e06b Turn comments about locking into actual lock assertions.
Reviewed by:	ken
Tested by:	ken
MFC after:	1 month
2013-08-15 20:00:32 +00:00
Steven Hartland
dce643c85f Added 4K quirks for:-
* OCZ Agility 2 SSDs
* Marvell SSDs
* Intel X25-M Series SSDs
2013-08-14 15:18:28 +00:00
Alexander Motin
a29779e865 Remove droping topology mutex after iterating 100 periphs in CAMGETPASSTHRU.
That is not so slow and so often operation to handle unneeded otherwise
xsoftc.xpt_generation and respective locking complications.
2013-08-07 11:34:20 +00:00
Alexander Motin
71185c66dd Improve r253721 by reporting detected lack of BIO_FLUSH support to GEOM.
That prevents more of such requests from coming and errors from logging.
2013-08-07 08:20:11 +00:00
Edward Tomasz Napierala
ea7c84e46f Remove dead code. 2013-08-06 10:42:18 +00:00
Alexander Motin
39fe6d85ee MFprojects/camlock r249006:
Pass SIM pointer as an argument to camisr_runqueue() instead of doneq
pointer.
2013-08-05 12:15:53 +00:00
Alexander Motin
ea541bfdaa MFprojects/camlock r249505:
Change CCB queue resize logic to be able safely handle overallocations:
 - (re)allocate queue space in power of 2 chunks with 64 elements minimum
and never shrink it; with only 4/8 bytes per element size is insignificant.
 - automatically reallocate the queue to double size if it is overflowed.
 - if queue reallocation failed, store extra CCBs in unsorted TAILQ,
fetching them back as soon as some queue element is freed.

To free space in CCB for TAILQ linking, change highpowerq from keeping
high-power CCBs to keeping devices frozen due to high-power CCBs.

This encloses all pieces of queue resize logic inside of cam_queue.[ch],
removing some not obvious duties from xpt_release_ccb().
2013-08-05 11:48:40 +00:00
Alexander Motin
1c509c52b7 Add NO_RC16 quirk to make da driver avoid using READ CAPACITY(16) command
if possible.  Use it for Kingston JetFlash USB sticks, that are known to
return garbage in response to that command.
2013-07-30 13:00:09 +00:00
Alexander Motin
7651b989e8 Fix returning incorrect bio_resid value with failed BIO_DELETE requests.
Neither residual length reported for ATA/SCSI command nor one from another
BIO_DELETE request are in any way related to the value to be returned.
2013-07-28 19:56:08 +00:00
Alexander Motin
69114bc0da Synchronize device cache on close only if there were some write operations.
While these operations are not really needed otherwise, at least for SCSI
they may cause extra errors if some other initiator holds write exclusive
reservation on the LUN (SYNCHRONIZE CACHE handled as "write" operation).
2013-07-27 22:44:55 +00:00
Alexander Motin
16bfc1cf28 Oops, revert unwanted part of r253721. 2013-07-27 22:21:10 +00:00
Alexander Motin
196619d9cc Detect unsupported PREVENT ALLOW MEDIUM REMOVAL and SYNCHRONIZE CACHE(10)
to not spam devices with useless commands and logs with errors.
2013-07-27 22:19:34 +00:00
Kenneth D. Merry
b01773b0f1 CAM and mps(4) driver scanning changes.
Add a PIM_NOSCAN flag to the CAM path inquiry CCB.  This tells CAM
not to perform a rescan on a bus when it is registered.

We now use this flag in the mps(4) driver.  Since it knows what
devices it has attached, it is more efficient for it to just issue
a target rescan on the targets that are attached.

Also, remove the private rescan thread from the mps(4) driver in
favor of the rescan thread already built into CAM.  Without this
change, but with the change above, the MPS scanner could run before
or during CAM's initial setup, which would cause duplicate device
reprobes and announcements.

sys/param.h:
	Bump __FreeBSD_version to 1000039 for the inclusion of the
	PIM_RESCAN CAM path inquiry flag.

sys/cam/cam_ccb.h:
sys/cam/cam_xpt.c:
	Added a PIM_NOSCAN flag.  If a SIM sets this in the path
	inquiry ccb, then CAM won't rescan the bus in
	xpt_bus_regsister.

sys/dev/mps/mps_sas.c
	For versions of FreeBSD that have the PIM_NOSCAN path
	inquiry flag, don't freeze the sim queue during scanning,
	because CAM won't be scanning this bus.  Instead, hold
	up the boot.  Don't call mpssas_rescan_target in
	mpssas_startup_decrement; it's redundant and I don't
	know why it was in there.

	Set PIM_NOSCAN in path inquiry CCBs.

	Remove methods related to the internal rescan daemon.

	Always use async events to trigger a probe for EEDP support.
	In older versions of FreeBSD where AC_ADVINFO_CHANGED is
	not available, use AC_FOUND_DEVICE and issue the
	necessary READ CAPACITY manually.

	Provide a path to xpt_register_async() so that we only
	receive events for our own SCSI domain.

	Improve error reporting in cases where setup for EEDP
	detection fails.

sys/dev/mps/mps_sas.h:
	Remove softc flags and data related to the scanner thread.

sys/dev/mps/mps_sas_lsi.c:
	Unconditionally rescan the target whenever a device is added.

Sponsored by:	Spectra Logic
MFC after:	1 week
2013-07-22 18:37:07 +00:00
Alexander Motin
e5736ac88c Make some improvements to r253322 to really rescan target, not a bus.
Add there and in two more places checks for NULL on xpt_alloc_ccb_nowait().
2013-07-15 18:17:31 +00:00