2171 Commits

Author SHA1 Message Date
Warner Losh
62c94a0551 For the dynamic I/O scheduler, make the TRIM stuff also count against
read bias so we do reads in preference to TRIMs. This helps a lot when
many trims are delivered at once from the upper layers as they tend to
delay READs due to priority inversion in the code today.

The non-iosched case will be fixed when the trim combining changes
needed for nvme come in later this year.

Sponsored by: Netflix
2018-07-26 22:55:51 +00:00
Alexander Motin
79fab7d48a Stop further SCSI recovery attempts after one has failed.
We've got a set of probably damaged hard disks, reporting 0x04,0x02
("Logical unit not ready, initializing command required") in response
to READ CAPACITY(16), where attempts to use START STOP UNIT for recovery
results in 0x44,0x00 ("Internal target failure") after ~1 second delay.
As a result of all the recovery retries, the device open attempt took ~3 seconds
before finally reporting to GEOM that the device is opened but has no media.
If the open was for writing, then since it hasn't formally failed, the following
close triggered a GEOM retaste, opening the device a few more times with the
respective delays.

This change reduces the total time of this cycle from ~12 seconds to ~3 by
giving up on recovery after the first failure.

Reviewed by:	ken
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2018-07-21 21:34:10 +00:00
Andriy Gapon
b0af06052c remove unneeded inclusion of sys/interrupt.h from several files
It's likely that the header was needed in the past for swi(9).
But now that code does not use swi(9) or any other interfaces defined
in sys/interrupt.h.

MFC after:	1 week
2018-07-04 09:07:18 +00:00
Ilya Bakulin
e8e5c76419 Fix setting RCA for MMC cards
Unlike SD cards, that publish RCA in response to CMD3,
MMC cards expect the host to set RCA itself.

Since we don't support multiple MMC cards on the bus,
just assign a static RCA of 2 to the attached MMC card.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D13063
2018-06-19 20:02:03 +00:00
Ilya Bakulin
8b0e085f65 Don't try to power down the MMC bus if it is already down
The regulator framework doesn't like turning off already turned-off
regulators, so we get a panic on AllWinner boards.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15890
2018-06-19 11:28:50 +00:00
Ilya Bakulin
4c4200c6d9 Correctly define rawscr so initializing it doesn't result in overwriting memory.
We need 8 bytes of storage for rawscr.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15889
2018-06-19 11:25:40 +00:00
Ilya Bakulin
3f1cfdb122 Set MMC_DATA_MULTI flag when doing multi-block transfers
Lower layers (MMC / SDHCI controller drivers) may make certain decisions
based on the presence of this flag. The fact that sdhci.c doesn't
look at this flag is another problem that should be fixed separately.

Found when adding MMCCAM support to AllWinner MMC controller driver
where the presence of this flag actually matters.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15888
2018-06-19 11:23:48 +00:00
Kenneth D. Merry
e4b58dfe33 Fix da(4) locking when probing SMR drives.
Probing host aware and host managed SMR drives got broken in revision
330796.

The added cam_periph_lock() calls were in areas in dadone() where
the peripheral lock was already held.

Since then, dadone() has been split into separate functions that are
dedicated to each probe state.

The result is that when probing a host aware drive, I ran into a recursive
lock acquisition in dadone_probeatalogdir(). I would have run into the
same problem in dadone_probeataiddir(), and in dadone_probeatasup() and
dadone_probeatazone() in the error paths had the probe continued.

The solution is to take out all of the extra cam_periph_lock() calls. I
also added cam_periph_assert(periph, MA_OWNED) near the top of each of
the dadone_* calls. These make it clear to anyone coming along in the
future that the lock is held in the probe done functions.

Also add a locking assert in daprobedone(), to make it clear that it must
be called with the periph lock held.

Sponsored by:	Spectra Logic
Differential Revision:	https://reviews.freebsd.org/D15764
2018-06-14 17:08:44 +00:00
Ilya Bakulin
d670d9518f Enable high-speed on the card before increasing frequency on the controller
Increasing operating frequency without telling card to switch
to high-speed mode first upsets some cards and generates CRC errors.

While here, deselect / reselect cards after CMD6 and SCR fetch, as in original code.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15568
2018-06-05 11:03:24 +00:00
Eric van Gyzen
2ebb808f8c cam nvme: fix array overrun
Fix a classic array overrun where the index could be one past the end.

Reported by:	Coverity
CID:		1356596
MFC after:	3 days
Sponsored by:	Dell EMC
2018-05-28 03:14:36 +00:00
Alexander Motin
f439e3a4ff Refactor NVMe CAM integration.
 - Remove layering violation, when NVMe SIM code accessed CAM internal
device structures to set pointers on controller and namespace data.
Instead make NVMe XPT probe fetch the data directly from hardware.
 - Cleanup NVMe SIM code, fixing support for multiple namespaces per
controller (reporting them as LUNs) and adding controller detach support
and run-time namespace change notifications.
 - Add initial support for namespace change async events.  So far only
in CAM mode, but it allows run-time namespace arrival and departure.
 - Add missing nvme_notify_fail_consumers() call on controller detach.
Together with previous changes this allows NVMe device detach/unplug.

Non-CAM mode still requires a lot of love to stay on par, but at least
CAM mode code should not stay in the way so much, becoming much more
self-sufficient.

Reviewed by:	imp
MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2018-05-25 03:34:33 +00:00
Warner Losh
b1988d44b3 We can't release the refcount outside of the periph lock.
We're dropping the periph lock then dropping the refcount. However,
that violates the locking protocol and is racy. This seems to be
the cause of weird occasional panics with a bogus assert.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D15517
2018-05-24 16:31:18 +00:00
Ilya Bakulin
96e47614f9 Implement initial MMC partitions support for MMCCAM.
For MMC cards, add partitions found on the card as separate disk(9) devices.
Don't do anything with RPMB partition for now.
Lots of code is copied almost 1:1 from the mmcsd.c in the old stack;
credit goes to Marius Strobl (marius@FreeBSD.org).

Reviewed by:	marius
Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12762
2018-05-22 22:16:49 +00:00
Ilya Bakulin
7fbf511890 Fix MMCCAM scanning for new cards.
r326645 used an incorrect argument for xpt_path_inq().

Reviewed by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15521
2018-05-22 16:32:34 +00:00
Warner Losh
d9a7a61b2b Hold the reference count until the CCB is released
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.) This extends
the method used in da to ada, nda, and mda.

Sponsored by: Netflix
Submitted by: Chuck Silvers
2018-05-15 22:22:10 +00:00
Warner Losh
0eedd21317 Hold the reference count until the CCB is released
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.)

Sponsored by: Netflix
Submitted by: Chuck Silvers
Differential Revision: https://reviews.freebsd.org/D15435
2018-05-15 21:25:35 +00:00
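The lifetime rule in 0eedd21317 can be sketched with a toy model. This is an illustration only, not the da(4) code: the `Softc` class and the `start_io`/`complete_io` helpers are hypothetical names, and locking is left out. The point is that the reference taken when the I/O is started is dropped only in the completion path, so the softc is still valid when a pending I/O completes after the disk has gone away.

```python
class Softc:
    """Toy stand-in for a periph softc with a reference count (hypothetical)."""

    def __init__(self):
        self.refcount = 1      # reference owned by the periph itself
        self.freed = False

    def acquire(self):
        assert not self.freed, "use after free"
        self.refcount += 1

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # last reference gone, memory would be freed


def start_io(softc):
    # Reference taken when the I/O is queued; NOT dropped when the
    # submit call returns, only when the I/O completes.
    softc.acquire()


def complete_io(softc):
    # The completion handler touches the softc, so it must still be alive.
    alive = not softc.freed
    softc.release()
    return alive
```

With this ordering, invalidating the periph while an I/O is outstanding drops only the periph's own reference; the softc is freed after the last completion runs.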
Li-Wen Hsu
137c41d763 Fix build for platforms using GCC:
- Remove unused or dead store variable
- Remove unused function ctl_copyin_alloc
- Add missing curly brackets, this seems a regression in r287720

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D15383
2018-05-10 17:22:04 +00:00
Marcelo Araujo
8951f05525 Rework CTL frontend & backend options to use nv(3), allow creating multiple
ioctl frontend ports.

This revision introduces two changes to CTL:
- Changes the way options are passed to CTL_LUN_REQ and CTL_PORT_REQ ioctls.
  Removes ctl_be_arg structure and associated logic and replaces it with
  nv(3)-based logic for passing in and out arguments.
- Allows creating multiple ioctl frontend ports using either ctladm(8) or
  ctld(8).
  New frontend ports are represented by /dev/cam/ctl<pp>.<vp> nodes, e.g. /dev/cam/ctl5.3.
  Those device nodes respond only to CTL_IO ioctl.

New command-line options for ctladm:
# creates new ioctl frontend port using a free pp and vp=0
ctladm port -c
# creates new ioctl frontend port with pp=10 and vp=0
ctladm port -c -O pp=10
# creates new ioctl frontend port with pp=11 and vp=12
ctladm port -c -O pp=11 -O vp=12
# removes port with number 4 (it's a "targ_port" number, not pp number)
ctladm port -r -p 4

New syntax for ctl.conf:
target ... {
    port ioctl/<pp>
    ...
}

target ... {
    port ioctl/<pp>/<vp>
    ...
}

Note: Most of this work was made by jceel@, thank you.

Submitted by:	jceel
Reworked by:	myself
Reviewed by:	mav (earlier versions and recently during the rework)
Obtained from:  FreeNAS and TrueOS
Relnotes:	Yes
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D9299
2018-05-10 03:50:20 +00:00
Warner Losh
041f49aece Remove the 'All Rights Reserved' clause from some of the stuff I've
done for Netflix, since I'm in the neighborhood.
2018-05-09 20:32:23 +00:00
Scott Long
4899b94bac Refactor dadone(). There was no useful code sharing in it; it was just
a 1500 line switch statement.  Callers now specify a discrete completion
handler, though they're still welcome to track state via ccb_state.

Sponsored by:	Netflix
2018-05-01 21:42:27 +00:00
Scott Long
eed99e7557 cam_periph_runccb() changed several years ago to overwrite the ccb callback
pointer.  It's now unhelpful and misleading for callers to continue to set
it, so bring all callers into conformance.  There's no real functional change,
but it makes reading the code a lot less confusing.

Sponsored by:	Netflix
2018-05-01 20:09:29 +00:00
Scott Long
7631477269 Add and fix comments for cam_periph_runccb()
Sponsored by:	Netflix
2018-05-01 17:48:50 +00:00
Warner Losh
c67f3c609b Just assert that the lock is held here, rather than taking it out and
dropping it.

Sponsored by: Netflix
2018-04-13 16:45:35 +00:00
Kenneth D. Merry
fc774835cb Handle Programmable Early Warning for control commands in sa(4).
When the tape position is inside the Early Warning area, the tape
drive will return a sense key of NO SENSE, and an ASC/ASCQ of
0x00,0x02, which means "End-of-partition/medium detected".  If
this was in response to a control command like WRITE FILEMARKS,
we correctly translate this as informational status and return
0 from saerror().

Programmable Early Warning should be handled the same way, but
we weren't handling it that way.  As a result, if a PEW status
(sense key of NO SENSE, ASC/ASCQ of 0x00,0x07, "Programmable early
warning detected") came back in response to a WRITE FILEMARKS,
we returned an error.

The impact of this was that if an application was writing to a
sa(4) device, and a PEW area was set (in the Device Configuration
Extension subpage -- mode page 0x10, subpage 1), and a filemark
needed to be written on close, we could wind up returning an error
to the user on close because of a "failure" to write the filemarks.

It actually isn't a failure, but rather just a status report from
the drive, and shouldn't be treated as a failure.

sys/cam/scsi/scsi_sa.c:
	For control commands in saerror(), treat asc/ascq 0x00,0x07
	the same as 0x00,{0-5} -- not an error.  Return 0, since
	the command actually did succeed.

Reported by:	Dr. Andreas Haakh <andreas@haakh.de>
Tested by:	Dr. Andreas Haakh <andreas@haakh.de>
Sponsored by:	Spectra Logic
MFC after:	3 days
2018-04-12 21:21:18 +00:00
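The rule described in fc774835cb for control commands can be written as a small predicate. This is a sketch of the decision as the message states it, not the actual saerror() code; the function name is hypothetical, and it assumes the SCSI sense key value 0x0 for NO SENSE.

```python
NO_SENSE = 0x0  # SCSI sense key: NO SENSE


def control_cmd_is_informational(sense_key, asc, ascq):
    """Return True when the sense data is only a status report.

    Mirrors the commit text: for control commands, ASC/ASCQ 0x00,0x07
    (Programmable Early Warning) is treated the same as 0x00,{0-5}.
    """
    if sense_key != NO_SENSE or asc != 0x00:
        return False
    return 0x00 <= ascq <= 0x05 or ascq == 0x07
```

A caller modelled on this would return 0 (success) for a WRITE FILEMARKS that comes back with NO SENSE and 0x00,0x07, instead of reporting an error on close.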
Alexander Motin
d8d4983e5e Do not fail devices just for errors in descriptor format.
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2018-04-06 19:47:44 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compatibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensures opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Warner Losh
6a6c0d5844 Flag when we have a pending TUR. Don't schedule another one when we
have one pending. Otherwise, we can race and send two, which is
wasteful in close proximity. It can also cause the acquire/release
count for TUR to be > 1, which is unexpected.

PR: 226510
Differential Review: https://reviews.freebsd.org/D14792
2018-03-23 16:23:15 +00:00
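The guard described in 6a6c0d5844 amounts to a "pending" flag checked before scheduling. The sketch below is a simplified model with hypothetical names (`Periph`, `schedule_tur`, `tur_done`); the real code does this under the periph lock, which is omitted here.

```python
class Periph:
    """Toy model of a periph that may have one TUR in flight (hypothetical)."""

    def __init__(self):
        self.tur_pending = False
        self.tur_refs = 0   # references held on behalf of TURs
        self.tur_sent = 0   # how many TURs were actually issued

    def schedule_tur(self):
        # Skip if a TUR is already outstanding, so two requests racing
        # in close proximity issue only one command.
        if self.tur_pending:
            return False
        self.tur_pending = True
        self.tur_refs += 1
        self.tur_sent += 1
        return True

    def tur_done(self):
        # Completion clears the flag and drops the matching reference.
        self.tur_pending = False
        self.tur_refs -= 1
```

Because the flag is set before the reference is taken, the reference count attributed to TURs never exceeds 1 in this model.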
Warner Losh
df4ee7639e Revert r331273: "Release the "TUR" reference when clearing the TUR work flag. We mostly"
It exposes other issues, so revert to the previous state of known issues.
2018-03-21 12:55:59 +00:00
Warner Losh
7b0eb8dbf8 Release the "TUR" reference when clearing the TUR work flag. We mostly
do this right, except when there's no BP and we do a TUR by request.
In that case, we clear the flag, but don't release the reference,
leaking the reference on rare occasion.

PR: 226510
Sponsored by: Netflix
2018-03-20 22:07:45 +00:00
John Baldwin
e875be212d Use <stdarg.h> instead of <machine/stdarg.h> in userland.
<machine/stdarg.h> is a kernel-only header.  The standard header for
userland is <stdarg.h>.  Using the standard header in userland avoids
weird build errors when building with external compilers that include
their own stdarg.h header.

Reviewed by:	arichardson, brooks, imp
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D14776
2018-03-20 21:00:45 +00:00
Warner Losh
400326b667 Kill assert I shouldn't have committed
2018-03-20 13:14:10 +00:00
Warner Losh
afdbfe1e1b Starting LBA is a 64bit number, so use htole64 instead of htole32. The
latter casts the LBA to a 32-bit number before assigning it to the 64
bit structure entity. This works fine on the first 2TB of TRIMs, but
terribly beyond that due to truncation.

Also, add an assert to make sure we don't send too many DSM TRIM
entries in one request.

Sponsored by: Netflix
2018-03-20 03:37:14 +00:00
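The truncation fixed in afdbfe1e1b can be reproduced outside the kernel. The sketch below uses Python's `struct` module as a stand-in for htole32/htole64 and assumes 512-byte sectors, so that LBA 2^32 is the 2 TiB boundary; the function names are hypothetical.

```python
import struct

SECTOR_SIZE = 512  # assumed sector size behind the "first 2TB" figure


def encode_lba_32(lba):
    """Mimic the bug: the LBA is narrowed to 32 bits before being stored."""
    truncated = lba & 0xFFFFFFFF
    return struct.pack("<Q", truncated)  # written into a 64-bit LE field


def encode_lba_64(lba):
    """Mimic the fix: the full 64-bit value is converted to little-endian."""
    return struct.pack("<Q", lba)


def decode_lba(raw):
    return struct.unpack("<Q", raw)[0]


below = (2 * 1024**4) // SECTOR_SIZE - 1  # last LBA under 2 TiB (2**32 - 1)
above = below + 1 + 8                     # eight sectors past 2 TiB
```

Below the boundary both encodings agree; past it, the 32-bit path wraps around and the trim would land near the start of the device instead of at the intended LBA.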
Warner Losh
6f591d13fd Make kern.cam.nda.num_trim tunable to limit the number of BIO_DELETE
requests that we'll collapse into one DSM_TRIM. By default it is
256, which is the max that will fit into a 4k page.

Sponsored by: Netflix
2018-03-20 03:37:09 +00:00
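The default of 256 in 6f591d13fd follows from simple arithmetic, assuming the 16-byte range entry of the NVMe Dataset Management command (32-bit context attributes, 32-bit length in blocks, 64-bit starting LBA). The layout constant and helper name below are illustrative, not taken from the driver.

```python
import struct

PAGE_SIZE = 4096

# One DSM range entry, little-endian, no padding:
# context attributes (u32), length in logical blocks (u32), starting LBA (u64)
DSM_RANGE = struct.Struct("<IIQ")


def max_ranges_per_page(page_size=PAGE_SIZE):
    """How many range entries fit in one page of the given size."""
    return page_size // DSM_RANGE.size
```

With 16-byte entries, 4096 / 16 gives 256 ranges per 4k page, which matches the stated maximum.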
Warner Losh
fdfc0a83a3 Remove some redundant MPSAFE flags.
This was pointed out in a code review I'm having trouble finding right
now, but go ahead and eliminate these.

Sponsored by: Netflix
2018-03-20 03:37:04 +00:00
Kenneth D. Merry
0afdc47158 cam_periph_acquire() now returns an errno.
The ch(4) driver was missed in change 328918, which changed
cam_periph_acquire() to return an errno instead of cam_status.

As a result, ch(4) failed to attach.

Sponsored by:	Spectra Logic
2018-03-19 20:19:00 +00:00
Warner Losh
378e38c1cf Only take out the periph lock when we're modifying the flags of the
softc for an async unit attention. CAM sometimes holds the periph
lock and other times does not. We were taking the lock always and
running into lock recursion issues on a non-recursive lock. Now we
take it selectively. It's not clear why xpt takes the lock selectively
before calling us, though, and that's still under investigation.

Reported by:	avg
PR:		226510 (same panic, different circumstances)
Sponsored by:	Netflix
2018-03-17 16:04:06 +00:00
Edward Tomasz Napierala
6616539dcc Fix iSCSI target crash on session reinstation.
The crash scenario goes like this: there's a thread waiting on "reinstate";
because it doesn't update the timeout counter it gets terminated by the
callout; at this point the maintenance thread starts the termination routine.
The first thread finishes waiting, proceeds to icl_conn_handoff(), and drops
the refcount, which allows the maintenance thread to free its resources.  At
this point another thread receives a PDU.  Boom.

PR:		222898, 219866
Reported by:	Eugene M. Zheganin <emz at norma.perm.ru>
Tested by:	Eugene M. Zheganin <emz at norma.perm.ru>
Reviewed by:	mav@ (earlier version)
MFC after:	2 weeks
Sponsored by:	playkey.net
2018-03-15 17:36:13 +00:00
Warner Losh
d38677d23c Create a sysctl kern.cam.{,a,n}da.X.invalidate
kern.cam.{,a,n}da.X.invalidate=1 forces *daX to detach by calling
cam_periph_invalidate on the underlying periph. This is for testing
purposes only. Include only with options CAM_TEST_FAILURE and rename
the former [AN]DA_TEST_FAILURE, and fix nda to compile with it set.
We're using it at work to harden geom and the buffer cache to be
resilient in the face of drive failure. Today, it far too often
results in a panic. While much work was done on SIM initiated removal
for the USB thumb drive removal work, little has been done for periph
initiated removal. This simulates what *daerror() does for some errors
nicely: we get the same panics with it that we do with failing drives.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14581
2018-03-14 17:53:37 +00:00
Warner Losh
157cb465c4 Fix inverted logic that counted all completions as errors, except when
they were actual errors.

Sponsored by: Netflix
2018-03-14 16:44:57 +00:00
Warner Losh
807e94b2c3 Implement trim collapsing in nda
When multiple trims are in the queue, collapse them as much as
possible. At present, this usually results in only a few trims being
collapsed together, but more work on that will make it possible to do
hundreds (up to some configurable max).

Sponsored by: Netflix
2018-03-14 16:44:50 +00:00
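The collapsing in 807e94b2c3 can be pictured as batching queued trims into requests that each carry several ranges. The sketch below is a deliberately simplified model: it only groups queued `(lba, nblocks)` pairs up to a configurable maximum and does not merge adjacent ranges or model the queue itself. The function name and the default of 256 are assumptions made for illustration.

```python
def collapse_trims(pending, max_ranges=256):
    """Group queued (lba, nblocks) trims into multi-range requests.

    Each returned request is a list of at most max_ranges ranges,
    preserving queue order.
    """
    requests = []
    for start in range(0, len(pending), max_ranges):
        requests.append(pending[start:start + max_ranges])
    return requests
```

For example, 600 queued trims with a maximum of 256 ranges per request would be issued as three requests instead of 600.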
Warner Losh
8a3de7bc34 Allow NULL ccb to cam_iosched_bio_complete
When the ccb is NULL to cam_iosched_bio_complete, just update the
other statistics, but not the time. If many operations are collapsed
together, this is needed to keep stats properly for the grouped bp.
This should fix trim accounting.

Sponsored by: Netflix
2018-03-14 16:44:16 +00:00
Brooks Davis
405b67a225 Reject ioctls to SCSI enclosures from 32-bit compat processes.
The ioctl objects contain pointers and require translation and some
refactoring of the infrastructure to work. For now prevent operation
on garbage values. This is very slightly overbroad in that ENCIOC_INIT
is safe.

Reviewed by:	imp, kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14671
2018-03-12 23:02:01 +00:00
Brooks Davis
871dc9833b Reject CAMIOGET and CAMIOQUEUE ioctl's on pass(4) in 32-bit compat mode.
These take a union ccb argument which is full of kernel pointers.
Substantial translation efforts would be required to make this work.
By rejecting the request we avoid processing or returning entirely
wrong data.

Reviewed by:	imp, ken, markj, cem
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14654
2018-03-12 22:58:07 +00:00
Warner Losh
af1823cde8 Tighten up periph lock to avoid some races
Make sure the periph lock is held around rmw access to softc data,
especially flags, including work flags in iosched.
Add asserts for the periph lock where it should be held.

PR: 226510
Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D14456
2018-03-12 15:17:16 +00:00
Conrad Meyer
2e1fccf2cf nvme_da: Fix minor memory leak in error case
Reported by:	cppcheck
Sponsored by:	Dell EMC Isilon
2018-03-10 01:28:55 +00:00
Warner Losh
2d87718fda Use bool instead of int for predicate functions relating to work
available.
2018-02-23 16:06:54 +00:00
Wojciech Macek
0d787e9b35 NVMe: Add big-endian support
Remove bitfields from defined structures as they are not portable.
Instead use shift and mask macros in the driver and nvmecontrol application.

NVMe is now working on powerpc64 host.

Submitted by:          Michal Stanek <mst@semihalf.com>
Obtained from:         Semihalf
Reviewed by:           imp, wma
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13916
2018-02-22 13:32:31 +00:00
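The approach in 0d787e9b35 (shift and mask macros instead of bitfields) can be shown with a small example. The field layout below is hypothetical and is not an actual NVMe register definition. The idea is to convert the little-endian value to host order once, then extract fields arithmetically, which gives the same result on any host byte order.

```python
import struct

# Hypothetical layout of a 32-bit little-endian register:
#   bits 0-15  : queue size
#   bits 16-23 : timeout
QSIZE_SHIFT, QSIZE_MASK = 0, 0xFFFF
TIMEOUT_SHIFT, TIMEOUT_MASK = 16, 0xFF


def le32_to_host(raw):
    """Decode 4 bytes stored little-endian, regardless of host byte order."""
    return struct.unpack("<I", raw)[0]


def get_field(value, shift, mask):
    """Extract a field with shift and mask, as the macros would."""
    return (value >> shift) & mask


raw = bytes([0x34, 0x12, 0x0A, 0x00])  # register contents as seen in memory
reg = le32_to_host(raw)
```

Here `reg` is 0x000A1234, so the queue size field reads 0x1234 and the timeout field reads 0x0A. A bitfield struct laid over the raw bytes would only give these values on a little-endian host.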
Warner Losh
07e5967a22 Revert r329814 as well. It should have been in r329819.
2018-02-22 11:51:50 +00:00
Warner Losh
0028abe633 Backout r329818, r329816 and r329815.
These aren't the commits I thought I was testing prior to
commit. Revert until I can sort out what happened and fix it.
2018-02-22 11:18:33 +00:00
Warner Losh
91acaad987 Fix typo in last commit after last rebase before commit...
2018-02-22 10:55:23 +00:00