Commit Graph

2050 Commits

Author SHA1 Message Date
Warner Losh
c4231018d0 Be nicer on the dump stack by allocating only a ccb_nvmeio rather than
a full ccb. This saves a few hundre bytes, which might be important
during a crash dump...

Sponsored by: Netflix
Suggested by: scottl@
2017-10-15 16:18:03 +00:00
Warner Losh
717bff5d85 Update comment to reflect actual default timeout.
Sponsored by: Netflix
2017-10-15 16:17:55 +00:00
Edward Tomasz Napierala
5b4bc31eee Fix iSCSI target panics on concurrent session teardown and display
(eg removing a target and doing "ctladm islist -v" at the same time).

Reviewed by:	manu
Tested by:	manu
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2017-10-04 11:35:04 +00:00
Alexander Motin
748f120f7c Add sysctl/tunable for maximal request time.
MFC after:	1 week
2017-09-30 13:17:31 +00:00
Warner Losh
78ed811e6c cam iosched: Bettar account IOPS for smoother performance
Prevent cam_iosched_iops_tick() from discarding 'unspent' ios unless
it's a new accounting interval.

Previously ios that weren't used between ticks were lost, as a result
the iops limiter could enforce a limit below the configured maximum.

Obtained from: ElectroBSD
Submitted by: Fabian Keil
PR: 221974
2017-09-22 02:36:36 +00:00
Warner Losh
f777123b83 cam iosched: Enforce iop limits below the quanta value
Previously the iops limiter would always allow at least
quanta ios per second as cam_iosched_iops_tick() never set
ios->l_value1 below 1.

Submitted by: Fabian Keil <fk@fabiankeil.de>
Obtained from: ElectroBSD
PR: 221974
2017-09-22 02:36:32 +00:00
Jung-uk Kim
8c294161aa Remove an ancient comment about the existence of READ(16) and WRITE(16).
MFC after:	3 days
2017-09-21 00:03:59 +00:00
Warner Losh
89d26636f3 cam iosched: Call cam_iosched_limiter_init() after ios->current is set to the default
Previously ios->current was set to 0 until the first
cam_iosched_cl_maybe_steer() call.

PR: 221954
Obtained from: ElectroBSD
Submitted by: Fabian Keil
Differential Revision: https://reviews.freebsd.org/D12349
2017-09-20 21:26:01 +00:00
Warner Losh
3028dd8dd5 cam iosched: Schedule cam_iosched_ticker() quanta times per second
Previously callout_reset() was called with a "ticks" value that was
off by one.  As a result cam_iosched_ticker() was called a bit too
frequently: On systems with hz=1000 a quanta value of 200 resulted in
~250 calls and a value of 100 in ~111 calls.

For the "queue_depth" and "bandwidth" limiters the difference doesn't
matter but the "iops" limiter depends on the scheduling to enforce the
correct maximum.

PR: 221956
Obtained from: ElectroBSD
Submitted by: Fabian Keil
Differential Revision: https://reviews.freebsd.org/D12350
2017-09-20 21:25:56 +00:00
Warner Losh
2d22619adc cam iosched: Add a handler for the quanta sysctl to enforce valid values
Invalid values can result in devision-by-zero panics or other
undefined behaviour so lets not allow them.

PR: 221957
Obtained from: ElectroBSD
Submitted by: Fabian Keil
Differential Revision: https://reviews.freebsd.org/D12351
2017-09-20 21:19:53 +00:00
Warner Losh
84c12dcdd0 cam iosched: Use the write queue for BIO_ZONE commands
Use the write queue for BIO_ZONE commands so they can't get executed
ahead of writes that were sent after them. More generally, since they
introduce strong ordering into the list, they need to go to the write
queue (which is the only queue that BIO_ORDERED is honored for at the
moment). In fact, fix mismatch between queueing and dequeueing code by
changing this to queue all non-reads (and non-trims) to the write
queue.

As a side effect this prevents the kernel message:
kernel: Found bio_cmd = 0x9
which cam_iosched_next_bio() emits when finding commands
other than BIO_READ in the read queue.

PR: 221973
Obtained from: ElectroBSD
Submitted by: Fabian Keil
Differential Revision: https://reviews.freebsd.org/D12353
2017-09-20 21:13:20 +00:00
Ilya Bakulin
ffa317051b Add kern.features flag for MMCCAM
kern.features.mmcam will be present and equal to 1 if the kernel has been
compiled with option MMCCAM.
This will help sdio-related userland tools to fail-fast if running on the kernel
without MMCCAM enabled.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12386
2017-09-18 20:17:08 +00:00
Warner Losh
851063e16a Allow multiple TRIMs to be done for nda
Don't call cam_iosched_trim_done or cam_iosched_submit_trim for nda
since its hardware can handle almost an arbitrary number of TRIMs and
we don't have to be careful to only ever do one.

Sponsored by: Netflix
2017-09-15 20:16:06 +00:00
Warner Losh
55c770b40a Update comments on what the CAM_IOSCHED_FLAG_TRIM_ACTIVE means.
It's intended only for those situations where the periph driver
ones to limit the number of trims active to one and only one.
Also update comments on associated functions.

Sponsored by: Netflix
2017-09-15 20:15:55 +00:00
Ilya Bakulin
02c474b481 Miscellaneous fixes and improvements to MMCCAM stack
* Demote the level of several debug messages to CAM_DEBUG_TRACE
 * Add detection for SDHC cards that can do 1.8V. No voltage switch sequence
   is issued yet;
 * Don't create a separate LUN for each SDIO function. We need just one to make
   pass(4) attach;
 * Remove obsolete mmc_sdio* files. SDIO functionality will be moved into the
   separate device that will manage a new sdio(4) bus;
 * Terminate probing if got no reply to CMD0;
 * Make bcm2835 SDHCI host controller driver compile with 'option MMCCAM'.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12109
2017-09-15 19:47:44 +00:00
Warner Losh
d7fa1ab02d cam iosched: Limit the quanta default to hz if it's below 200
The cam_iosched_ticker() can't be scheduled more than once per tick.
Some limiters depend on quanta matching the number of calls per second
to enforce the proper limits. Limit the quanta to no faster than 1 per
clock tick. This fixes some features when running in VMs where the
default HZ is 100.

PR: 221953
Obtained from: ElectroBSD
Differential Revision: https://reviews.freebsd.org/D12337
Submitted by: Fabian Keil
2017-09-12 23:46:33 +00:00
Alan Somers
71cd87c66c Remove spaces from CTL devices' default serial numbers
It's awkward to have spaces in CAM device serial numbers. That leads to
such things as device nodes named "/dev/diskid/MYSERIAL%20%20%201". Better
to replace the spaces with "0"s. This change only affects the default
serial numbers for users who don't provide their own.

Reviewed by:	ken, mav
MFC after:	Never
Relnotes:	Yes
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D12263
2017-09-12 19:36:24 +00:00
Conrad Meyer
b1631dfb46 cam(4): Fix some warnings
When bcopy is treated as memcpy/memmove, Clang produces warnings that the
size argument doesn't match the type of the source.  This is true, it
doesn't match; we're aliasing the source.

Explicitly cast the source pointer to the expected type to remove the
warning.

No functional change.

Sponsored by:	Dell EMC Isilon
2017-09-07 07:24:22 +00:00
Warner Losh
a3cf03a519 Add missing test for NVME CCBs for nvme passthru support.
Submitted by: Chuck Tuffli
2017-08-29 21:04:29 +00:00
Warner Losh
9f8ed7e40b Fix NVMe's use of XPT_GDEV_TYPE
This patch changes the way XPT_GDEV_TYPE works for NVMe. The current
ccb_getdev structure includes pointers to the NVMe Identify Controller
and Namespace structures, but these are kernel virtual addresses which
are not accessible from user space.

As an alternative, the patch changes the pointers into padding in
ccb_getdev and adds two new types to ccb_dev_advinfo to retrieve the
Identify Controller (CDAI_TYPE_NVME_CNTRL) and Namespace
(CDAI_TYPE_NVME_NS) data structures.

Reviewed By: rpokala, imp
Differential Revision: https://reviews.freebsd.org/D10466
Submitted by: Chuck Tuffli
2017-08-29 17:03:30 +00:00
Warner Losh
c2005bba77 Fix a few overlooked spots where the coded uses 16-bit NSIDs. Chuck
Tuffli had submitted a more thorough patch that I was unaware of when
I did my work and this brings in the bits I missed from that patch.

PR: 220267
Submitted by: Chuck Tuffli
2017-08-29 15:46:34 +00:00
Warner Losh
519772814d Add CAM/NVMe support for CAM_DATA_SG
This adds support in pass(4) for data to be described with a
scatter-gather list (sglist) to augment the existing (single) virtual
address.

Differential Revision: https://reviews.freebsd.org/D11361
Submitted by: Chuck Tuffli
Reviewed by: imp@, scottl@, kenm@
2017-08-29 15:29:57 +00:00
Warner Losh
9754579b01 Add comment about where we need to place this routine, and why.
Sponsored by: Netflix
2017-08-28 19:27:33 +00:00
Warner Losh
43effc8ca8 Add comment about where we need to place this routine, and why.
Sponsored by: Netflix
2017-08-28 19:25:49 +00:00
Warner Losh
08fc2f23b3 Expand the latency tracking array from 1.024s to 8.192s to help track
extreme outliers from dodgy drives. Adjust comments to reflect this,
and make sure that the number of latency buckets match in the two
places where it matters.
2017-08-24 22:11:10 +00:00
Warner Losh
e4c9cba71f Fix 32-bit overflow on latency measurements
o Allow I/O scheduler to gather times on 32-bit systems. We do this by shifting
  the sbintime_t over by 8 bits and truncating to 32-bits. This gives us 8.24
  time. This is sufficient both in range (256 seconds is about 128x what current
  users need) and precision (60ns easily meets the 1ms smallest bucket size
  measurements). 64-bit systems are unchanged. Centralize all the time math so
  it's easy to tweak tha range / precision tradeoffs in the future.
o While I'm here, the I/O scheduler should be using periph_data rather than
  sim_data since it is operating on behalf of the periph.

Differential Review: https://reviews.freebsd.org/D12119
2017-08-24 22:10:58 +00:00
Ed Maste
0b4060b073 cam iosched: fix typos in comments
PR:		220947
Submitted by:	Fabian Keil
Obtained from:	ElectroBSD
2017-08-18 16:38:33 +00:00
Alexander Motin
c901a9b1f1 Do not loose CCB flags after r320493.
There is at least CAM_UNLOCKED that should be kept.

MFC after:	3 days
2017-08-09 09:13:15 +00:00
Warner Losh
d45e16744f Add nvd alias to nda ndoes.
All ndaX and ndaXpY nodes will appear as nvdX and nvdXpY as well
(through symlinks in devfs via the normal disk aliasing mechanism in
GEOM).

Differential Revision: https://reviews.freebsd.org/D11873
2017-08-07 21:12:43 +00:00
Alexander Motin
1d9aaa862b adaasync(): Set ADA_STATE_WCACHE based on ADA_FLAG_CAN_WCACHE
The attached patch lets adaasync() set ADA_STATE_WCACHE based on
ADA_FLAG_CAN_WCACHE instead of ADA_FLAG_CAN_RAHEAD.

This fixes a regression introduced in r300207 which changed
the flag names.

PR:		220948
Submitted by:	Fabian Keil <fk@fabiankeil.de>
Obtained from:	ElectroBSD
MFC after:	1 week
2017-07-27 07:28:29 +00:00
Warner Losh
df4245150a This adds CAM pass(4) support for NVMe IO's. Applications indicate
the IO type (Admin or NVM) using XPT op-codes XPT_NVME_ADMIN or
XPT_NVME_IO.

Submitted by:   Chuck Tuffli <chuck@tuffli.net>
Differential Revision:  https://reviews.freebsd.org/D10247
2017-07-14 14:52:20 +00:00
Sean Bruno
d03ae351ed Add 4k and NCQ_TRIM_BROKEN quirks for Samsung 845 SSDs.
Submitted by:	 hannula@gmail.com
Differential Revision:	https://reviews.freebsd.org/D7967
2017-07-13 16:56:26 +00:00
Sean Bruno
989e632aa7 Add 4K quirks for Samsung 750 EVO SSD
Submitted by:	lev
Reviewed by:	mav
Differential Revision:	https://reviews.freebsd.org/D9478
2017-07-13 15:33:08 +00:00
Warner Losh
9f74b6d90a Move mmc_parmas to the end of the structure for better compatability. 2017-07-10 21:55:19 +00:00
Warner Losh
4fccee4f22 Kill some unnecessary noise. 2017-07-10 21:38:26 +00:00
Warner Losh
4e38d89520 Include opt files in the kernel with "" instead of <>. 2017-07-10 05:08:01 +00:00
Warner Losh
6df06cd671 Opt files are included with single quotes. 2017-07-10 03:38:12 +00:00
Warner Losh
a94a63f0a6 An MMC/SD/SDIO stack using CAM
Implement the MMC/SD/SDIO protocol within a CAM framework. CAM's
flexible queueing will make it easier to write non-storage drivers
than the legacy stack. SDIO drivers from both the kernel and as
userland daemons are possible, though much of that functionality will
come later.

Some of the CAM integration isn't complete (there are sleeps in the
device probe state machine, for example), but those minor issues can
be improved in-tree more easily than out of tree and shouldn't gate
progress on other fronts. Appologies to reviews if specific items
have been overlooked.

Submitted by: Ilya Bakulin
Reviewed by: emaste, imp, mav, adrian, ian
Differential Review: https://reviews.freebsd.org/D4761

merge with first commit, various compile hacks.
2017-07-09 16:57:24 +00:00
Ed Maste
8fadf6a637 cam: EOL whitespace cleanup and line wrapping changes
NFC. This cleanup simplifies diffs for review of the MMC-CAM work.

Submitted by:	kibab
2017-07-04 18:48:08 +00:00
Alexander Motin
e8583acf9a Allow status aggregation for ramdisk reads. 2017-06-30 07:48:08 +00:00
Alexander Motin
86ce2be65a Unify INOT/ATIO setup. 2017-06-30 06:34:49 +00:00
Kenneth D. Merry
59fe76647c Fix a panic in camperiphfree().
If a peripheral driver (e.g. da, sa, cd) is added or removed from the
peripheral driver list while an unrelated peripheral driver instance (e.g.
da0, sa5, cd2) is going away and is inside camperiphfree(), we could
dereference an invalid pointer.

When peripheral drivers are added or removed (see periphdriver_register()
and periphdriver_unregister()), the peripheral driver array is resized
and existing entries are moved.

Although we hold the topology lock while we traverse the peripheral driver
list, we retain a pointer to the location of the peripheral driver pointer
and then drop the topology lock.  So we are still vulnerable to the list
getting moved around while the lock is dropped.

To solve the problem, cache a copy of the peripheral driver pointer.  If
its storage location in the list changes while we have the lock dropped, it
won't have any effect.

This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed
to individual instances like "da0", "cd0") are not generally part of a
reference counting scheme to guard against deregistering them while there
are instances active.  The caller (generally the person unloading a module)
has to be aware of active drivers and not unload something that is in use.

sys/cam/cam_periph.c:
	In camperiphfree(), cache a pointer to the peripheral driver
	instance to avoid holding a pointer to an invalid memory location
	in the event that the peripheral driver list changes while we have
	the topology lock dropped.

PR:		kern/219701
Submitted by:	avg
MFC after:	3 days
Sponsored by:	Spectra Logic
2017-06-27 19:26:02 +00:00
Kenneth D. Merry
89763b3f8e In scsi_zbc_in(), fill in the length in the ZBC IN CDB.
Without the allocation length set, the target will either reject
the command or complete it without transferring any data.

This fixes the REPORT ZONES command for SCSI ZBC protocol devices,
as well as ATA ZAC protocol devices that are behind a SCSI to ATA
translation layer.  (LSI/Broadcom's 12Gb SAS adapters translate ZBC
commands to ZAC commands.)  Those are Host Aware and Host Managed SMR
drives.

This will fix REPORT ZONE commands sent to the da(4) driver via the
GEOM bio interface and zonectl, and REPORT ZONE commands sent from
camcontrol(8).

Note that in the case of camcontrol(8), we currently only send
SCSI ZBC commands to native SCSI protocol devices, not ATA devices
behind a SAT layer.

sys/cam/scsi/scsi_da.c:
	Fill in the length field in scsi_zbc_in().

MFC after:	3 days
Sponsored by:	Spectra Logic
2017-06-27 17:55:25 +00:00
Warner Losh
1d6e811063 Namespace is 32-bits, don't cast it to 16 here 2017-06-27 16:48:05 +00:00
Mark Johnston
b3db6c0140 Fix a memory leak in ses_get_elm_devnames().
After r307132 the sbuf buffer is malloc()ed, but corresponding
sbuf_delete() call was missing.

Fix a nearby whitespace bug.

MFC after:	3 days
Sponsored by:	Dell EMC Isilon
2017-06-26 19:41:14 +00:00
Kenneth D. Merry
6f579fdb17 Fix a potential sleep while holding a mutex in the sa(4) driver.
If the user issues a MTIOCEXTGET ioctl, and the tape drive in question has
a serial number that is longer than 80 characters, we malloc a buffer in
saextget() to hold the output of cam_strvis().

Since a mutex is held in that codepath, doing a M_WAITOK malloc could lead
to sleeping while holding a mutex.  Change it to a M_NOWAIT malloc and bail
out if we fail to allocate the memory.  Devices with serial numbers longer
than 80 bytes are very rare (I don't recall seeing one), so this
should be a very unusual case to hit.  But it is a bug that should be fixed.

sys/cam/scsi/scsi_sa.c:
	In saextget(), if we need to malloc a buffer to hold the output of
	cam_strvis(), don't wait for the memory.  Fail and return an error
	if we can't allocate the memory immediately.

PR:		kern/220094
Submitted by:	Jia-Ju Bai <baijiaju1990@163.com>
MFC after:	3 days
Sponsored by:	Spectra Logic
2017-06-19 20:48:00 +00:00
Gleb Smirnoff
779f106aa1 Listening sockets improvements.
o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take out selinfo's from the socket buffers into the socket. The
    first reason is to support braindamaged scenario when a socket is
    added to kevent(2) and then listen(2) is cast on it. The second
    reason is that there is future plan to make socket buffers pluggable,
    so that for a dataflow socket a socket buffer can be changed, and
    in this case we also want to keep same selinfos through the lifetime
    of a socket.
  - Remove struct struct so_accf. Since now listening stuff no longer
    affects struct socket size, just move its fields into listening part
    of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow to acquire two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() to use atomic(9).  This allows in some situations
    to do soref() without owning socket lock.  There is place for improvement
    here, it is possible to make sorele() also to lock optionally.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parralelism in the UNIX sockets.
  - Now call into uipc_attach() may happen with unp_link_lock hold if, we
    are accepting, or without unp_link_lock in case if we are just creating
    a socket.
  - Another problem in UNIX sockets is that uipc_close() basicly did nothing
    for a listening socket.  The vnode remained opened for connections.  This
    is fixed by removing vnode in uipc_close().  Maybe the right way would be
    to do it for all sockets (not only listening), simply move the vnode
    teardown from uipc_detach() to uipc_close()?

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D9770
2017-06-08 21:30:34 +00:00
Wojciech Macek
631f8f40d3 Introduce Genesys GL3224 quirks
The Genesys chip is failing when issueing READ_CAP(16) command.
Force a quirk to disable it and use READ_CAP(10) instead.

Also, depending on used firmware, GL3224 can be recognized
either as 'storage device' or 'mass storage class' -
enable both variants in scsi_quirk_table.

Submitted by:    Wojciech Macek <wma@semihalf.com>
                 Konrad Adamczyk <ka@semihalf.com>
Obtained from:   Semihalf
Sponsored by:    Stormshield
Reviewed by:     mav
Differential revision: https://reviews.freebsd.org/D10902
2017-05-29 09:22:53 +00:00
Andriy Gapon
b5617df55b Allow PROBE_SPINUP to fail in CAM ATA transport
The motivation for this is two-fold.

1. Some old WD SATA disks may appear as if they need to be spun up
when they are already spinning.  Those disks would respond with
an error to the spin-up request.

2. Even if we really fail to spin up the disk, we still can try to
proceed to the subsequent phases.  If we fail later on, then no
difference.  Otherwise we get a chance to communicate with the
disk which is better than completely ignoring it, because a user
can try to recover the disk.

Reviewed by:	mav
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D10896
2017-05-26 17:44:47 +00:00
Kenneth D. Merry
64409eeee7 Add basic programmable early warning error injection to the sa(4) driver.
This will help application developers simulate end of tape conditions.

To inject an error in sa0:

sysctl kern.cam.sa.0.inject_eom=1

This will return the next read or write request queued with 0 bytes
written.  Any subsequent writes or reads will go along as usual.

This will also cause the early warning position flag to get set
for the next position query.  So, 'mt status' will show the BPEW
(Beyond Programmable Early Warning) flag on the first query after
an error injection.  After that, the position flags will be as they
are in the underlying tape drive.

Also, update the sa(4) man page to describe tape parameters,
which can be set via 'mt param'.

sys/cam/scsi/scsi_sa.c:
	In saregister(), create the inject_eom sysctl variable.

	In sastart(), check to see whether inject_eom is set.  If
	so, return the read or write with 0 bytes written to
	indicate EOM.  Set the set_pews_status flag so that we
	fake PEWS status in the next position call for reads, and the
	next 3 calls for writes.  This allows the user to see the BPEW
	flag one time via 'mt status'.

	In sagetpos(), check the set_pews_status flag and fake
	PEWS status and decrement the counter if it is set.

share/man/man4/sa.4:
	Document the inject_eom sysctl variable.

	Document all of the parameters currently supported via
	'mt param'.

usr.bin/mt/mt.1:
	Point the user to the sa(4) man page for more details on
	supported parameters.

MFC after:	3 days
Sponsored by:	Spectra Logic
2017-05-05 20:00:53 +00:00