Commit Graph

2162 Commits

Author SHA1 Message Date
Eric van Gyzen
2ebb808f8c cam nvme: fix array overrun
Fix a classic array overrun where the index could be one past the end.

Reported by:	Coverity
CID:		1356596
MFC after:	3 days
Sponsored by:	Dell EMC
2018-05-28 03:14:36 +00:00
Alexander Motin
f439e3a4ff Refactor NVMe CAM integration.
- Remove layering violation, when NVMe SIM code accessed CAM internal
device structures to set pointers on controller and namespace data.
Instead make NVMe XPT probe fetch the data directly from hardware.
 - Cleanup NVMe SIM code, fixing support for multiple namespaces per
controller (reporting them as LUNs) and adding controller detach support
and run-time namespace change notifications.
 - Add initial support for namespace change async events.  So far only
in CAM mode, but it allows run-time namespace arrival and departure.
 - Add missing nvme_notify_fail_consumers() call on controller detach.
Together with previous changes this allows NVMe device detach/unplug.

Non-CAM mode still requires a lot of love to stay on par, but at least
CAM mode code should not stay in the way so much, becoming much more
self-sufficient.

Reviewed by:	imp
MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2018-05-25 03:34:33 +00:00
Warner Losh
b1988d44b3 We can't release the refcount outside of the periph lock.
We're dropping the periph lock then dropping the refcount. However,
that violates the locking protocol and is racy. This seems to be
the cause of weird occasional panics with a bogus assert.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D15517
2018-05-24 16:31:18 +00:00
Ilya Bakulin
96e47614f9 Implement initial MMC partitions support for MMCCAM.
For MMC cards, add partitions found on the card as separate disk(9) devices.
Don't do anything with RPMB partition for now.
Lots of code is copied almost 1:1 from the mmcsd.c in the old stack,
credits Marius Strobl (marius@FreeBSD.org)

Reviewed by:	marius
Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12762
2018-05-22 22:16:49 +00:00
Ilya Bakulin
7fbf511890 Fix MMCCAM scanning for new cards.
r326645 used an incorrect argument for xpt_path_inq().

Reviewed by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15521
2018-05-22 16:32:34 +00:00
Warner Losh
d9a7a61b2b Hold the reference count until the CCB is released
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.) This extends
the method used in da to ada, nda, and mda.

Sponsored by: Netflix
Submitted by: Chuck Silvers
2018-05-15 22:22:10 +00:00
Warner Losh
0eedd21317 Hold the reference count until the CCB is released
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.)

Sponsored by: Netflix
Submitted by: Chuck Silvers
Differential Revision: https://reviews.freebsd.org/D15435
2018-05-15 21:25:35 +00:00
Li-Wen Hsu
137c41d763 Fix build for platforms using GCC:
- Remove unused or dead store variable
- Remove unused function ctl_copyin_alloc
- Add missing curly brackets, this seems a regression in r287720

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D15383
2018-05-10 17:22:04 +00:00
Marcelo Araujo
8951f05525 Rework CTL frontend & backend options to use nv(3), allow creating multiple
ioctl frontend ports.

This revision introduces two changes to CTL:
- Changes the way options are passed to CTL_LUN_REQ and CTL_PORT_REQ ioctls.
  Removes ctl_be_arg structure and associated logic and replaces it with
  nv(3)-based logic for passing in and out arguments.
- Allows creating multiple ioctl frontend ports using either ctladm(8) or
  ctld(8).
  New frontend ports are represented by /dev/cam/ctl<pp>.<vp> nodes, eg /dev/cam/ctl5.3.
  Those device nodes respond only to CTL_IO ioctl.

New command-line options for ctladm:
# creates new ioctl frontend port with using free pp and vp=0
ctladm port -c
# creates new ioctl frontend port with pp=10 and vp=0
ctladm port -c -O pp=10
# creates new ioctl frontend port with pp=11 and vp=12
ctladm port -c -O pp=11 -O vp=12
# removes port with number 4 (it's a "targ_port" number, not pp number)
ctladm port -r -p 4

New syntax for ctl.conf:
target ... {
    port ioctl/<pp>
    ...
}

target ... {
    port ioctl/<pp>/<vp>
    ...

Note: Most of this work was made by jceel@, thank you.

Submitted by:	jceel
Reworked by:	myself
Reviewed by:	mav (earlier versions and recently during the rework)
Obtained from:  FreeNAS and TrueOS
Relnotes:	Yes
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D9299
2018-05-10 03:50:20 +00:00
Warner Losh
041f49aece Remove the 'All Rights Reserved' clause from some of the stuff I've
done for Netflix, since I'm in the neighborhood.
2018-05-09 20:32:23 +00:00
Scott Long
4899b94bac Refactor dadone(). There was no useful code sharing in it; it was just
a 1500 line switch statement.  Callers now specify a discrete completion
handler, though they're still welcome to track state via ccb_state.

Sponsored by:	Netflix
2018-05-01 21:42:27 +00:00
Scott Long
eed99e7557 cam_periph_runccb() changed several years ago to overwrite the ccb callback
pointer.  It's now unhelpful and misleading for callers to continue to set
it, so bring all callers into conformance.  There's no real functional change,
but it makes reading the code a lot less confusing.

Sponsored by:	Netflix
2018-05-01 20:09:29 +00:00
Scott Long
7631477269 Add and fix comments for cam_periph_runccb()
Sponsored by:	Netflix
2018-05-01 17:48:50 +00:00
Warner Losh
c67f3c609b Just assert that the lock is held here, rather than taking it out and
dropping it.

Sponsored by: Netflix
2018-04-13 16:45:35 +00:00
Kenneth D. Merry
fc774835cb Handle Programmable Early Warning for control commands in sa(4).
When the tape position is inside the Early Warning area, the tape
drive will return a sense key of NO SENSE, and an ASC/ASCQ of
0x00,0x02, which means: End-of-partition/medium detected".  If
this was in response to a control command like WRITE FILEMARKS,
we correctly translate this as informational status and return
0 from saerror().

Programmable Early Warning should be handled the same way, but
we weren't handling it that way.  As a result, if a PEW status
(sense key of NO SENSE, ASC/ASCQ of 0x00,0x07, "Programmable early
warning detected") came back in response to a WRITE FILEMARKS,
we returned an error.

The impact of this was that if an application was writing to a
sa(4) device, and a PEW area was set (in the Device Configuration
Extension subpage -- mode page 0x10, subpage 1), and a filemark
needed to be written on close, we could wind up returning an error
to the user on close because of a "failure" to write the filemarks.

It actually isn't a failure, but rather just a status report from
the drive, and shouldn't be treated as a failure.

sys/cam/scsi/scsi_sa.c:
	For control commands in saerror(), treat asc/ascq 0x00,0x07
	the same as 0x00,{0-5} -- not an error.  Return 0, since
	the command actually did succeed.

Reported by:	Dr. Andreas Haakh <andreas@haakh.de>
Tested by:	Dr. Andreas Haakh <andreas@haakh.de>
Sponsored by:	Spectra Logic
MFC after:	3 days
2018-04-12 21:21:18 +00:00
Alexander Motin
d8d4983e5e Do not fail devices just for errors in descriptor format.
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2018-04-06 19:47:44 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Warner Losh
6a6c0d5844 Flag when we have a pending TUR. Don't schedule another one when we
have one pending. Otherwise, we can race and send two, which is
wasteful in close proximity. It can also cause the acaquire/release
count for TUR to be > 1, which is undexpected.

PR: 226510
Differential Review: https://reviews.freebsd.org/D14792
2018-03-23 16:23:15 +00:00
Warner Losh
df4ee7639e Revert r331273: "Release the "TUR" reference when clearing the TUR work flag. We mostly"
It exposes other issues, so revert to the pervious state of known issues.
2018-03-21 12:55:59 +00:00
Warner Losh
7b0eb8dbf8 Release the "TUR" reference when clearing the TUR work flag. We mostly
do this right, except when there's no BP and we do a TUR by request.
In that case, we clear the flag, but don't release the reference,
leaking the reference on rare occasion.

PR: 226510
Sponsored by: Netflix
2018-03-20 22:07:45 +00:00
John Baldwin
e875be212d Use <stdarg.h> instead of <machine/stdarg.h> in userland.
<machine/stdarg.h> is a kernel-only header.  The standard header for
userland is <stdarg.h>.  Using the standard header in userland avoids
weird build errors when building with external compilers that include
their own stdarg.h header.

Reviewed by:	arichardson, brooks, imp
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D14776
2018-03-20 21:00:45 +00:00
Warner Losh
400326b667 Kill assert I shouldn't have committed 2018-03-20 13:14:10 +00:00
Warner Losh
afdbfe1e1b Starting LBA is a 64bit number, so use htole64 instead of htole32. The
latter casts the LBA to a 32-bit number before assigning it to the 64
bit structure entity. This works fine on the first 2TB of TRIMs, but
terrible beyond that due to trucation.

Also, add an assert to make sure we don't end too many DSM TRIM
entries in one request.

Sponsored by: Netflix
2018-03-20 03:37:14 +00:00
Warner Losh
6f591d13fd Make kern.cam.nda.num_trim tunable to limit the number of BIO_DELETE
requests that we'll collapse into one DSM_TRIM. By default it is a
256, which is the max that will fit into a 4k page.

Sponsored by: Netflix
2018-03-20 03:37:09 +00:00
Warner Losh
fdfc0a83a3 Remove some redundant MPSAFE flags.
This was pointed out in a code review I'm having trouble finding right
now, but go ahead and eliminate these.

Sponsored by: Netfix
2018-03-20 03:37:04 +00:00
Kenneth D. Merry
0afdc47158 cam_periph_acquire() now returns an errno.
The ch(4) driver was missed in change 328918, which changed
cam_periph_acquire() to return an errno instead of cam_status.

As a result, ch(4) failed to attach.

Sponsored by:	Spectra Logic
2018-03-19 20:19:00 +00:00
Warner Losh
378e38c1cf Only take out the periph lock when we're modifying the flags of the
softc for an async unit attention. CAM locks, sometimes, the periph
lock and other times does not. We were taking the lock always and
running into lock recursion issues on a non-recursive lock. Now we
take it selectively. It's not clear why xpt takes the lock selectively
before calling us, though, and that's still under investigation.

Reported by:	avg
PR:		226510 (same panic, differnt circumstances)
Sponsored by:	Netflix
2018-03-17 16:04:06 +00:00
Edward Tomasz Napierala
6616539dcc Fix iSCSI target crash on session reinstation.
The crash scenario goes like this: there's a thread waiting on "reinstate";
because it doesn't update the timeout counter it gets terminated by the
callout; at this point the maintenance thread starts the termination routine.
The first thread finishes waiting, proceeds to icl_conn_handoff(), and drops
the refcount, which allows the maintenance thread to free its resources.  At
this point another thread receives a PDU.  Boom.

PR:		222898, 219866
Reported by:	Eugene M. Zheganin <emz at norma.perm.ru>
Tested by:	Eugene M. Zheganin <emz at norma.perm.ru>
Reviewed by:	mav@ (earlier version)
MFC after:	2 weeks
Sponsored by:	playkey.net
2018-03-15 17:36:13 +00:00
Warner Losh
d38677d23c Create a sysctl kern.cam.{,a,n}da.X.invalidate
kern.cam.{,a,n}da.X.invalidate=1 forces *daX to detach by calling
cam_periph_invalidate on the underlying periph. This is for testing
purposes only. Include only with options CAM_TEST_FAILURE and rename
the former [AN]DA_TEST_FAILURE, and fix nda to compile with it set.
We're using it at work to harden geom and the buffer cache to be
resilient in the face of drive failure. Today, it far too often
results in a panic. While much work was done on SIM initiated removal
for the USB thumnb drive removal work, little has been done for periph
initiated removal. This simulates what *daerror() does for some errors
nicely: we get the same panics with it that we do with failing drives.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14581
2018-03-14 17:53:37 +00:00
Warner Losh
157cb465c4 Fix inverted logic that counted all completions as errors, except when
they were actual errors.

Sponsored by: Netflix
2018-03-14 16:44:57 +00:00
Warner Losh
807e94b2c3 Implement trim collapsing in nda
When multiple trims are in the queue, collapse them as much as
possible. At present, this usually results in only a few trims being
collapsed together, but more work on that will make it possible to do
hundreds (up to some configurable max).

Sponsored by: Netflix
2018-03-14 16:44:50 +00:00
Warner Losh
8a3de7bc34 Allow NULL ccb to cam_iosched_bio_complete
When the ccb is NULL to cam_iosched_bio_complete, just update the
other statistics, but not the time. If many operations are collapsed
together, this is needed to keep stats properly for the grouped bp.
This should fix trim accounting.

Sponsored by: Netflix
2018-03-14 16:44:16 +00:00
Brooks Davis
405b67a225 Reject ioctls to SCSI enclosures from 32-bit compat processes.
The ioctl objects contain pointers and require translation and some
refactoring of the infrastructure to work. For now prevent opertion
on garbage values. This is very slightly overbroad in that ENCIOC_INIT
is safe.

Reviewed by:	imp, kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14671
2018-03-12 23:02:01 +00:00
Brooks Davis
871dc9833b Reject CAMIOGET and CAMIOQUEUE ioctl's on pass(4) in 32-bit compat mode.
These take a union ccb argument which is full of kernel pointers.
Substantial translation efforts would be required to make this work.
By rejecting the request we avoid processing or returning entierly
wrong data.

Reviewed by:	imp, ken, markj, cem
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14654
2018-03-12 22:58:07 +00:00
Warner Losh
af1823cde8 Tighten up periph lock to avoid some races
Make sure the periph lock is held around rmw access to softc data,
espeically flags, including work flags in iosched.
Add asserts for the periph lock where it should be held.

PR: 226510
Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D14456
2018-03-12 15:17:16 +00:00
Conrad Meyer
2e1fccf2cf nvme_da: Fix minor memory leak in error case
Reported by:	cppcheck
Sponsored by:	Dell EMC Isilon
2018-03-10 01:28:55 +00:00
Warner Losh
2d87718fda Use bool instead of int for predicate functions relating to work
available.
2018-02-23 16:06:54 +00:00
Wojciech Macek
0d787e9b35 NVMe: Add big-endian support
Remove bitfields from defined structures as they are not portable.
Instead use shift and mask macros in the driver and nvmecontrol application.

NVMe is now working on powerpc64 host.

Submitted by:          Michal Stanek <mst@semihalf.com>
Obtained from:         Semihalf
Reviewed by:           imp, wma
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13916
2018-02-22 13:32:31 +00:00
Warner Losh
07e5967a22 Revert r329814 as well. It should have been in r329819. 2018-02-22 11:51:50 +00:00
Warner Losh
0028abe633 Backout r329818, r329816 and r329815.
These aren't the commits I thought I was testing prior to
commit. Revert until I can sort out what happened and fix it.
2018-02-22 11:18:33 +00:00
Warner Losh
91acaad987 Fix typo in last commit after last rebase before commit... 2018-02-22 10:55:23 +00:00
Warner Losh
4d87e27125 Combine BIO_DELETE requests for nda devices
Now that we're queueing BIO_DELETE requests in the CAM I/O scheduler,
it make sense to try to combine as many as possible into a single
request to send down to hardware. Hopefully, lots of larger requests
like this are better than lots of individual transactions.

Note for future: need to limit based on total size of the trim
request. Should also collapse adjacent ranges where possible to
increase the size of the max payload.

Sponsored by: Netflix
2018-02-22 05:44:00 +00:00
Warner Losh
c5fe3ae9b8 Introduce capacity flags for periphs
Introduce flags word to describe the capacities of the peripheral.
First bit will describe if the periph driver allows multiple
outstanding TRIMS to be active in a device.

Modify the I/O scheduler so that the nda driver can queue trims
for a while after the first one arrives. We'll queue until we see
a I/O scheduler tick, then we'll schedule as many TRIMs as allowed
by other factors (currently this is slocts in the NVMe controller).
This mariginally helps the read latency issues we see with reads,
but sets the stage for the nda driver to do TRIM collapsing like the
da and ada drivers do today.

Sponsored by: Netflix
2018-02-22 05:43:55 +00:00
Warner Losh
c9878d6d63 Note when we tick.
To help implement a policy of 'queue all trims until next I/O sched
tick' policy to help coalesce them, note when we tick so we can do
something special on the first call after the tick to get more work.

Sponsored by: Netflix
2018-02-22 05:43:50 +00:00
Warner Losh
f2b9885036 Wrap an extra long line
This debugging line is too big for even my largest xterm. wrap it at
about 80 columns.

Sponsored by: Netflix
2018-02-22 05:43:45 +00:00
Warner Losh
97f8aa050e Don't sort TRIMs.
While the code for ada and da both assume that the trim list is
ordered when doing the coaleascing the TRIMs, it turns out that
creating the sorted list uses more resources than are saved by having
slightly fewer trims sent to the device.

Sponsored by: Netflix
2018-02-22 05:43:20 +00:00
Warner Losh
6ffe523f18 Minor formatting nits. 2018-02-21 23:49:35 +00:00
Edward Tomasz Napierala
8404ad78db Use proper buffer length (the announce_buf char pointer used to be anarray),
broken in r317143. This fixes those weird "cd0: Attempt" messages at boot.

PR:		222103
Reviewed by:	scottl@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14369
2018-02-21 14:05:13 +00:00
Warner Losh
bc40691e40 Report the number of remaining retries when we have an error that
we're retrying.
2018-02-15 18:57:54 +00:00
Warner Losh
9d602e4e3d Fix cut and pasted comments to reflect differences in code from the
original source.

Sponsored by: Netflix
2018-02-07 18:33:46 +00:00