Commit Graph

2201 Commits

Author SHA1 Message Date
gonzo
14fafba2ca [ata] Add workaround for KingDian S200 SSD crash on receiving TRIM command
- Add ADA_Q_NO_TRIM quirk to be used with devices that falsely advertise TRIM support
- Add ADA_Q_NO_TRIM entry for KingDian S200 SSD

PR:		222802
Submitted by:	Bertrand Petit <bsdpr@phoe.frmug.org>
MFC after:	1 week
2019-01-18 04:23:52 +00:00
glebius
7ee1aa34d4 Allocate pager bufs from UMA instead of 80-ish mutex protected linked list.
o In vm_pager_bufferinit() create pbuf_zone and start accounting for how many
  pbufs we are going to have.
  In various subsystems that are going to utilize pbufs create private zones
  via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(),
  and sets a limit on created zone. After startup preallocate pbufs according
  to requirements of all pbuf zones.

  Subsystems that used to have a private limit with old allocator now have
  private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS,
  swap, vnode pager.

  The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9),
  aio(4). They should have their private limits, but changing that is out of
  scope of this commit.

o Fetch tunable value of kern.nswbuf from init_param2() and while here move
  NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only
  this option.
  Default values aren't touched by this commit, but they probably should be
  reviewed with respect to modern hardware.

This change removes a tight bottleneck from sendfile(2) operation, that
uses pbufs in vnode pager. Other pagers also would benefit from faster
allocation.

Together with:	gallatin
Tested by:	pho
2019-01-15 01:02:16 +00:00
imp
ffbdab75c8 Add NO_SYNC_CACHE quirk for PENTAX cameras
PR: 93389
Submitted by: Demin Alexander
2019-01-08 20:55:02 +00:00
imp
5fed75b757 Add NO_RC16 quirk for Chipfancier 16GB USB stick...
Submitted by: osef.lar@gmail.com
PR: 234503
2018-12-31 22:20:30 +00:00
avg
dcef4b8263 add a knob that disables detection of write protected disks
It has been reported that on some systems (with real hardware passed
through to a virtual machine) the WP detection causes USB disk probing
failures.

While here, also fix the selection of the next state in the case
of malloc failure in DA_STATE_PROBE_WP.  It was DA_STATE_PROBE_RC
unconditionally even when it should have been DA_STATE_PROBE_RC16.

PR:		225794
Reported by:	David Boyd <David.Boyd49@twc.com>
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D18496
2018-12-17 16:01:37 +00:00
chuck
2fd83c3710 nda(4) fix check for Dataset Management support
In the nda(4) driver, only set DISKFLAG_CANDELETE (a.k.a. can support
BIO_DELETE) if the drive supports Dataset Management. There are reports
that without this check, VMWare Workstation does not work reliably.

Fix is to check the ONCS field in the NVMe Controller Data structure for
support. This check previously existed but did not survive the
big-endian changes.

Reported by: yuripv@yuripv.net
Reviewed by: imp, mav, jimharris
Approved by: imp (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D18493
2018-12-13 13:25:37 +00:00
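
The check described in 2fd83c3710 can be sketched as follows. This is a minimal illustration, not the driver's code: the function names and the `SKETCH_` constants are hypothetical, and the bit position (bit 2 of ONCS for Dataset Management) is an assumption based on the NVMe Identify Controller layout. Decoding the field byte by byte keeps the test correct on big-endian hosts, which is the kind of breakage the message mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: bit 2 of ONCS signals Dataset Management support. */
#define SKETCH_ONCS_DSM        (1u << 2)
/* Hypothetical stand-in for DISKFLAG_CANDELETE. */
#define SKETCH_FLAG_CANDELETE  0x1u

/* Decode a 16-bit little-endian field regardless of host byte order. */
uint16_t
sketch_le16(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint16_t)(p[0] | (p[1] << 8)));
}

/* Advertise delete support only when the controller reports DSM. */
unsigned
sketch_disk_flags(const uint8_t *oncs_raw)
{
    unsigned flags = 0;

    if (sketch_le16(oncs_raw) & SKETCH_ONCS_DSM)
        flags |= SKETCH_FLAG_CANDELETE;
    return (flags);
}
```

With raw bytes `{ 0x04, 0x00 }` the flag is set; with `{ 0x00, 0x04 }` (the same bits misread in the wrong byte order) it is not.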
imp
568f85fc1c Send a START UNIT command when a disk responds with an ASC of 04/1C.
This will hopefully spin up a disk that's in low-power mode.

Sponsored by: Netflix
Submitted by: scottl@
2018-12-09 21:37:34 +00:00
scottl
a4498f7fb2 Don't allocate the config_intrhook separately from the softc, it's small
enough that it costs more code to handle the malloc/free than it saves.
2018-12-09 06:16:54 +00:00
avg
c1fb81f798 daprobedone: announce if a disk is write-protected
MFC after:	2 weeks
2018-12-07 12:02:31 +00:00
imp
02d67d047d NVME trim clocking
Add the ability to set two goals for trims in the I/O scheduler. The
first goal is the number of BIO_DELETEs to accumulate
(kern.cam.XX.U.trim_goal). When non-zero, this many trims will be
accumulated before we start to transfer them to lower layers. This is
useful for devices that like to get lots of trims all at once in one
transaction (not all devices are like this, and some vary by workload).

The second is a number of ticks to defer trims. If you've set a trim
goal, then kern.cam.XX.U.trim_ticks controls how long the system will
defer those trims before timing out and sending them anyway. It has no
effect when trim_goal is 0.

In any event, a BIO_FLUSH will cause all the TRIMs to be released to
the periph drivers. This may be a minor overloading of what BIO_FLUSH
is supposed to mean, but it's useful to preserve other ordering
semantics that users of BIO_FLUSH rely on.

Sponsored by: Netflix, Inc
2018-11-27 00:36:35 +00:00
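
The release rule described in 02d67d047d can be sketched as a single predicate. This is an illustration under stated assumptions, not the scheduler's code: `struct sketch_trim_state` and `sketch_trims_should_release()` are hypothetical names, ticks are plain integers, and tick wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state for deferred trims. */
struct sketch_trim_state {
    int queued;            /* BIO_DELETEs accumulated so far */
    int trim_goal;         /* release once this many are queued; 0 = no deferral */
    int trim_ticks;        /* longest time to hold trims when a goal is set */
    int first_queued_tick; /* tick at which the oldest pending trim arrived */
};

/* Return true if the pending trims should be sent to the lower layers now. */
bool
sketch_trims_should_release(const struct sketch_trim_state *ts, int now,
    bool flush_seen)
{
    assert(ts != NULL);
    if (ts->queued == 0)
        return (false);         /* nothing to send */
    if (flush_seen)
        return (true);          /* a BIO_FLUSH releases everything */
    if (ts->trim_goal == 0)
        return (true);          /* no goal set, so trim_ticks has no effect */
    if (ts->queued >= ts->trim_goal)
        return (true);          /* goal reached */
    /* Goal not reached: release only once the deferral has timed out. */
    return (now - ts->first_queued_tick >= ts->trim_ticks);
}
```

The order of the tests mirrors the message: a flush always wins, a zero goal disables deferral entirely, and the tick limit only matters while a non-zero goal has not yet been met.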
imp
e511976db9 Minor tweaks to the formatting
Tweak the format of the trim + read bias code. Add similar debug to
the read + writes case.

Sponsored by: Netflix
2018-11-26 22:50:30 +00:00
imp
a9d5ac8402 Add cam_iosched_set_latfcn to set a latency callback for high latency.
It's often useful to have a callback when an I/O takes more than a
threshold amount of time. This adds the infrastructure for periph
devices to register one.

One use-case is as a debugging aid when you need a semi-realtime
indication of an I/O outlier so you can trigger bus capture gear for
vendor analysis.

Sponsored by: Netflix, Inc
2018-11-15 16:02:45 +00:00
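
The idea behind a9d5ac8402 (call back when an I/O exceeds a latency threshold) can be sketched in a few lines. Everything here is hypothetical: the `sketch_` names, the microsecond units, and the argument order are assumptions for illustration and do not reflect the signature of cam_iosched_set_latfcn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: invoked for each I/O slower than the threshold. */
typedef void (*sketch_lat_fn)(void *arg, long latency_us);

struct sketch_lat_hook {
    sketch_lat_fn fn;   /* NULL means no callback registered */
    void *arg;          /* opaque pointer handed back to the callback */
    long threshold_us;  /* latencies above this trigger the callback */
};

/* Register (or clear, with fn == NULL) the latency callback. */
void
sketch_set_latfcn(struct sketch_lat_hook *h, sketch_lat_fn fn,
    long threshold_us, void *arg)
{
    assert(h != NULL);
    h->fn = fn;
    h->threshold_us = threshold_us;
    h->arg = arg;
}

/* Called on I/O completion with the measured latency. */
void
sketch_io_complete(const struct sketch_lat_hook *h, long latency_us)
{
    assert(h != NULL);
    if (h->fn != NULL && latency_us > h->threshold_us)
        h->fn(h->arg, latency_us);
}

/* Example callback: count outliers; a real one might trigger capture gear. */
void
sketch_count_outlier(void *arg, long latency_us)
{
    (void)latency_us;
    (*(int *)arg)++;
}
```

Registering `sketch_count_outlier` with a 500 us threshold leaves the counter untouched for a 100 us completion and increments it for a 900 us one.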
imp
c2bb195d18 Introduce scsi_ata_setfeatures() as a convenient way to make
a passthru ATA SETFEATURES command.

Sponsored by: Netflix, Inc
2018-11-15 16:02:34 +00:00
imp
a18b0830c4 Remove trailing white space in advance of other changes.
2018-11-14 23:15:50 +00:00
imp
9225311061 Only assert locked for many async events.
Many async events that we see are called for this specific path. When
calling an async callback for a targeted device, XPT will lock that
specific device's path lock (same as what cam_periph_lock does). For
those AC_ events, assert we have the lock rather than trying to
recursively take it (which causes panics since it's not recursive).

Add annotations about this and about the fact that AC_SCSI_AEN events
are generated now only in the ata stack (which cannot have a scsi_da
attachment). Leave it in place in case I've overlooked something as
the code is harmless.

This is fallout from my attempts to "fix" locking for softc->flags in
r330796 that's not been triggered often enough to get my attention
until now.

Sponsored by: Netflix
MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D17837
2018-11-05 18:47:29 +00:00
imp
8af15e0bcc Add comments explaining what hold/unhold do
They act as a simple one-deep semaphore to keep open/close/probe from
running at the same time, which avoids the races that would otherwise occur.
2018-11-01 21:51:41 +00:00
imp
2e9fda2a00 Add statistics for TRIM commands
Add a counter for the LBAs, Ranges and hardware commands so that we
can provide additional color to the statistics we provide to vendors.

Sponsored by: Netflix, Inc
2018-10-26 16:23:51 +00:00
imp
0c21ab179e Retire scsi_low
scsi_low was a common set of routines to do the SCSI bus sequencing
for the ncv, nsp and stg drivers. Those have been removed, so it's no
longer needed since nothing else in the tree uses it and nothing
likely ever will (it's for super-low-end 8-bit parallel SCSI cards).
2018-10-22 02:36:07 +00:00
brooks
3a94dca87f Move 32-bit compat support for CDIOREADTOCENTRYS to the right place.
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.

The new handler uses an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other members which require no translation.

Reviewed by:	kib (prior version), jhb
Approved by:	re (rgrimes)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Review:	https://reviews.freebsd.org/D17378
2018-10-02 23:23:56 +00:00
ken
b90df93520 Fix a da(4) driver memory leak for SCSI SMR devices.
In the probe case for SCSI SMR Host Aware or Host Managed drives, be sure
to free allocated memory.

sys/cam/scsi/scsi_da.c:
	In dadone_probezone(), free the data pointer before returning.

MFC after:	3 days
Sponsored by:	Spectra Logic
Approved by:	re (kib)
2018-10-01 19:00:46 +00:00
trasz
b2be995d83 Make the wait in cfiscsi_offline() interruptible. This is the second half
of the fix/workaround for the "ctld hanging on reload" problem.

PR:		220175
Reported by:	Eugene M. Zheganin <emz at norma.perm.ru>
Tested by:	Eugene M. Zheganin <emz at norma.perm.ru>
Approved by:	re (kib)
MFC after:	2 weeks
Sponsored by:	playkey.net
2018-09-11 11:39:59 +00:00
mav
231b46e180 Add missing copyin() to access LUN and port ioctl arguments.
Somehow this was working even after PTI went in, at least on amd64, and got
broken by something only very recently.

Reviewed by:	araujo
Approved by:	re (gjb)
2018-09-06 14:03:10 +00:00
trasz
9e7534ea78 Try harder in cfiscsi_offline(). This is believed to be the workaround
for the "ctld hanging on reload" problem observed in some cases under
high load.  I'm not 100% sure it's _the_ fix, as the issue is rather hard
to reproduce, but it was tested as part of a larger patch and the problem
disappeared.  It certainly shouldn't break anything.

Now, technically, it shouldn't be needed.  Quoting mav@, "After
ct->ct_online == 0 there should be no new sessions attached to the target.
And if you see some problems about it, it may either mean that there are
some races where single cfiscsi_session_terminate(cs) call may be lost,
or as a guess while this thread was sleeping target was reenabled and
redisabled again".  Should such a race be discovered and properly fixed
in the future, then this and the followup two commits can be backed out.

PR:		220175
Reported by:	Eugene M. Zheganin <emz at norma.perm.ru>
Tested by:	Eugene M. Zheganin <emz at norma.perm.ru>
Discussed with:	mav
Approved by:	re (gjb)
MFC after:	2 weeks
Sponsored by:	playkey.net
2018-09-01 16:16:40 +00:00
chuck
fa895cb8d2 Make NVMe compatible with the original API
The original NVMe API used bit-fields to represent fields in data
structures defined by the specification (e.g. the op-code in the command
data structure). The implementation targeted x86_64 processors and
defined the bit fields for little endian dwords (i.e. 32 bits).

This approach does not work as-is for big endian architectures and was
changed to use a combination of bit shifts and masks to support PowerPC.
Unfortunately, this changed the NVMe API and forces #ifdef's based on
the OS revision level in user space code.

This change reverts to something that looks like the original API, but
it uses bytes instead of bit-fields inside the packed command structure.
As a bonus, this works as-is for both big and little endian CPU
architectures.

Bump __FreeBSD_version to 1200081 due to API change

Reviewed by: imp, kbowling, smh, mav
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D16404
2018-08-22 04:29:24 +00:00
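
The difference between the shift-and-mask style and the byte-field style mentioned in fa895cb8d2 can be sketched as below. This is an illustration only: the `sketch_` names and the simplified four-byte `struct sketch_cmd` are hypothetical, and placing the opcode in bits 7:0 of the first command dword is an assumption based on the NVMe command format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a little-endian 32-bit value from raw bytes on any host. */
uint32_t
sketch_le32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/* Shift-and-mask style: the opcode is assumed to be bits 7:0 of dword 0. */
uint8_t
sketch_opc_from_dword(const uint8_t *cmd)
{
    return ((uint8_t)(sketch_le32(cmd) & 0xffu));
}

/* Byte-field style: hypothetical, simplified start of a command structure. */
struct sketch_cmd {
    uint8_t  opc;   /* opcode, the first byte on the wire */
    uint8_t  fuse;  /* remaining low-order control bits */
    uint16_t cid;   /* command identifier, little-endian on the wire */
};

uint8_t
sketch_opc_from_bytes(const uint8_t *cmd)
{
    struct sketch_cmd c;

    assert(cmd != NULL);
    memcpy(&c, cmd, sizeof(c));  /* single-byte members need no swapping */
    return (c.opc);
}
```

Both accessors return the same opcode for the same raw bytes on either byte order. A C bit-field cannot promise that, because its layout within the dword is left to the compiler and target.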
trasz
96a2cf426c Remove unnecessary code, which also introduced a (very minor)
race condition, due to a missing call to cfiscsi_target_release().

Discussed with:	mav@
Tested by:	Eugene M. Zheganin <emz at norma.perm.ru> (earlier version)
MFC after:	2 weeks
Sponsored by:	playkey.net
2018-08-21 14:34:24 +00:00
imp
09ab5192c4 Flesh out a comment about what we're doing with read bias and trims.
Sponsored by: Netflix
2018-08-15 00:15:40 +00:00
imp
ac2b4cbc15 Create xpt_sim_poll and refactor a bit using it.
xpt_sim_poll takes the sim to poll as an argument. It will do the
proper locking protocol, call the SIM polling routine, and then call
camisr_runqueue to process completions on any CCBs the SIM's poll
routine completed. It will be used during late shutdown when a SIM is
waiting for CCBs it sent during shutdown to finish and the scheduler
isn't running because we've panic'd.

This sequence was used twice in cam_xpt, so refactor those to use this
new function.

Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D16663
2018-08-13 19:59:32 +00:00
cem
f760da50b5 Walk back r337554 while discussion continues
The idea was to get the uncontroversial mechanical change out of the way,
then get the meatier functional changes reviewed subsequently.  I had not
realized that the immediately adjacent issue was addressed in a different
direction in r334506 (see Warner's guidance in D15592).

Discussion continues, trying to determine if there is a secondary issue
still[1] and how best to fix it.  With 12-related activities coming up,
while that is ongoing, just take this back for now.

[1]: Shutdown-time eventhandler events fire normally during panic's reboot
path.  Driver callbacks that attempt to issue and wait on interrupt-
completed IO may never complete, hanging the system.  This is particularly
obnoxious in the shutdown/panic path, as the debugger cannot be entered
anymore and the hang prevents reboot restoring availability.

(There's nothing CAM-specific about this problem -- any shutdown
event-triggered driver could do something like this during panic.  But most
NICs, etc.  don't try to send spin-down commands at shutdown. ;-))

Discussed with:	imp, markj
2018-08-10 19:19:07 +00:00
cem
5f3e2ff1af cam(4): Add an xpt-neutral flag indicating a valid panic CCB
No functional change.

Note that this change is careful to set the CCB header xflags after
foo_fill_bar() routines, which generally zero existing flags.  An earlier
version of this patch mistakenly set the flag before the fill routines.

Submitted by:	Scott Ferris <sferris AT isilon.com>, jhibbits@
Reviewed by:	bdrewery@, markj@, and non-committer FreeBSD contributor Anton Rang
Sponsored by:	Dell EMC Isilon
2018-08-09 21:53:32 +00:00
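
The ordering pitfall noted in 5f3e2ff1af (fill routines zero existing flags, so the flag must be set afterwards) can be sketched as follows. The structure, the flag value, and the function names are hypothetical stand-ins; only the ordering is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a "valid panic CCB" flag. */
#define SKETCH_XFLAG_PANIC 0x1u

/* Hypothetical, minimal CCB header. */
struct sketch_hdr {
    uint32_t func_code;
    uint32_t xflags;
};

/* Stand-in for a foo_fill_bar() routine: initializes the header from scratch. */
void
sketch_fill(struct sketch_hdr *h, uint32_t func_code)
{
    assert(h != NULL);
    h->func_code = func_code;
    h->xflags = 0;              /* wipes anything set earlier */
}

/* Wrong order: the flag is lost when the fill routine runs. */
void
sketch_prepare_wrong(struct sketch_hdr *h, uint32_t func_code)
{
    h->xflags |= SKETCH_XFLAG_PANIC;
    sketch_fill(h, func_code);
}

/* Right order: fill first, then mark the CCB. */
void
sketch_prepare_right(struct sketch_hdr *h, uint32_t func_code)
{
    sketch_fill(h, func_code);
    h->xflags |= SKETCH_XFLAG_PANIC;
}
```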
cem
8b9f945b19 cam_ccb.h: Remove redundant declarations of static inline functions
No functional change.

They're unnecessarily confusing for tools like grep or ctags.

Sponsored by:	Dell EMC Isilon
2018-08-09 21:20:07 +00:00
imp
61ca973984 For the dynamic I/O scheduler, make the TRIM stuff also count against
read bias so we do reads in preference to TRIMs. This helps a lot when
many trims are delivered at once from the upper layers as they tend to
delay READs due to priority inversion in the code today.

The non iosched case will be fixed when the trim combining changes
needed for nvme come in later this year.

Sponsored by: Netflix
2018-07-26 22:55:51 +00:00
mav
ae5d1fba64 Stop further SCSI recovery attempts after one has failed.
We've got a set of probably damaged hard disks reporting 0x04,0x02
("Logical unit not ready, initializing command required") in response
to READ CAPACITY(16), where attempts to use START STOP UNIT for recovery
result in 0x44,0x00 ("Internal target failure") after a ~1 second delay.
As a result of all the recovery retries, a device open attempt took ~3 seconds
before finally reporting to GEOM that the device is opened but has no media.
If the open was for writing, then since it had not formally failed, the following
close triggered a GEOM retaste, opening the device a few more times with the
respective delays.

This change reduces the total time of this cycle from ~12 seconds to ~3 by
giving up on recovery after the first failure.

Reviewed by:	ken
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2018-07-21 21:34:10 +00:00
avg
da4bc8f61e remove unneeded inclusion of sys/interrupt.h from several files
It's likely that the header was needed in the past for swi(9).
But now that code does not use swi(9) or any other interfaces defined
in sys/interrupt.h.

MFC after:	1 week
2018-07-04 09:07:18 +00:00
kibab
06f56d598b Fix setting RCA for MMC cards
Unlike SD cards, that publish RCA in response to CMD3,
MMC cards expect the host to set RCA itself.

Since we don't support multiple MMC cards on the bus,
just assign a static RCA of 2 to the attached MMC card.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D13063
2018-06-19 20:02:03 +00:00
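
The SD/MMC difference described in 06f56d598b can be sketched as below. The function names and the `SKETCH_` constant are hypothetical; the placement of the RCA in the upper 16 bits of both the SD CMD3 response and the MMC CMD3 argument is an assumption based on the SD and MMC command formats, not something taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Static address the host assigns to the single attached MMC card. */
#define SKETCH_MMC_STATIC_RCA 2u

/* SD: the card publishes its RCA in the CMD3 response (assumed bits 31:16). */
uint16_t
sketch_sd_rca_from_resp(uint32_t resp)
{
    return ((uint16_t)(resp >> 16));
}

/* MMC: the host chooses the RCA and passes it in the CMD3 argument. */
uint32_t
sketch_mmc_cmd3_arg(uint16_t rca)
{
    assert(rca != 0);   /* assumed: address 0 is not a valid card address */
    return ((uint32_t)rca << 16);
}
```

For the static address of 2, the MMC CMD3 argument comes out as 0x00020000.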
kibab
2d62377a21 Don't try to power down the MMC bus if it is already down
The regulator framework doesn't like turning off already turned off
regulators, so we get a panic on AllWinner boards.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15890
2018-06-19 11:28:50 +00:00
kibab
9d45ebbc21 Correctly define rawscr so initializing it doesn't result in overwriting memory.
We need 8 bytes of storage for rawscr.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15889
2018-06-19 11:25:40 +00:00
kibab
ea2b6880ec Set MMC_DATA_MULTI flag when doing multi-block transfers
Lower layers (MMC / SDHCI controller drivers) may make certain decisions
based on the presence of this flag. The fact that sdhci.c doesn't
look at this flag is another problem that should be fixed separately.

Found when adding MMCCAM support to AllWinner MMC controller driver
where the presence of this flag actually matters.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15888
2018-06-19 11:23:48 +00:00
ken
896df23a52 Fix da(4) locking when probing SMR drives.
Probing host aware and host managed SMR drives got broken in revision
330796.

The added cam_periph_lock() calls were in areas in dadone() where
the peripheral lock was already held.

Since then, dadone() has been split into separate functions that are
dedicated to each probe state.

The result is that when probing a host aware drive, I ran into a recursive
lock acquisition in dadone_probeatalogdir(). I would have run into the
same problem in dadone_probeataiddir(), and in dadone_probeatasup() and
dadone_probeatazone() in the error paths had the probe continued.

The solution is to take out all of the extra cam_periph_lock() calls. I
also added cam_periph_assert(periph, MA_OWNED) near the top of each of
the dadone_* calls. These make it clear to anyone coming along in the
future that the lock is held in the probe done functions.

Also add a locking assert in daprobedone(), to make it clear that it must
be called with the periph lock held.

Sponsored by:	Spectra Logic
Differential Revision:	https://reviews.freebsd.org/D15764
2018-06-14 17:08:44 +00:00
kibab
f53b0281b8 Enable high-speed on the card before increasing frequency on the controller
Increasing operating frequency without telling card to switch
to high-speed mode first upsets some cards and generates CRC errors.

While here, deselect / reselect cards after CMD6 and SCR fetch, as in original code.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15568
2018-06-05 11:03:24 +00:00
vangyzen
75bd7104d0 cam nvme: fix array overrun
Fix a classic array overrun where the index could be one past the end.

Reported by:	Coverity
CID:		1356596
MFC after:	3 days
Sponsored by:	Dell EMC
2018-05-28 03:14:36 +00:00
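
The class of bug fixed in 75bd7104d0 (an index allowed to be one past the end) can be illustrated with a generic lookup. This is not the code from the commit: the table, its size, and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NSLOTS 4

/* Hypothetical lookup table with SKETCH_NSLOTS valid entries. */
static const char *const sketch_names[SKETCH_NSLOTS] = {
    "zero", "one", "two", "three"
};

/* Return the name for idx, or NULL when idx is out of range. */
const char *
sketch_name(size_t idx)
{
    /*
     * The off-by-one form of this test is `idx > SKETCH_NSLOTS`, which
     * lets idx == SKETCH_NSLOTS through and reads past the last element.
     */
    if (idx >= SKETCH_NSLOTS)
        return (NULL);
    return (sketch_names[idx]);
}
```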
mav
a8d82e59ae Refactor NVMe CAM integration.
 - Remove layering violation, when NVMe SIM code accessed CAM internal
device structures to set pointers on controller and namespace data.
Instead make NVMe XPT probe fetch the data directly from hardware.
 - Cleanup NVMe SIM code, fixing support for multiple namespaces per
controller (reporting them as LUNs) and adding controller detach support
and run-time namespace change notifications.
 - Add initial support for namespace change async events.  So far only
in CAM mode, but it allows run-time namespace arrival and departure.
 - Add missing nvme_notify_fail_consumers() call on controller detach.
Together with previous changes this allows NVMe device detach/unplug.

Non-CAM mode still requires a lot of love to stay on par, but at least
CAM mode code should not stand in the way so much, becoming much more
self-sufficient.

Reviewed by:	imp
MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2018-05-25 03:34:33 +00:00
imp
b37bf7e1e0 We can't release the refcount outside of the periph lock.
We're dropping the periph lock then dropping the refcount. However,
that violates the locking protocol and is racy. This seems to be
the cause of weird occasional panics with a bogus assert.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D15517
2018-05-24 16:31:18 +00:00
kibab
acc25b4abd Implement initial MMC partitions support for MMCCAM.
For MMC cards, add partitions found on the card as separate disk(9) devices.
Don't do anything with RPMB partition for now.
Lots of code is copied almost 1:1 from the mmcsd.c in the old stack,
credits Marius Strobl (marius@FreeBSD.org)

Reviewed by:	marius
Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12762
2018-05-22 22:16:49 +00:00
kibab
e09ab09b6c Fix MMCCAM scanning for new cards.
r326645 used an incorrect argument for xpt_path_inq().

Reviewed by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15521
2018-05-22 16:32:34 +00:00
imp
f39d874a4b Hold the reference count until the CCB is released
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.) This extends
the method used in da to ada, nda, and mda.

Sponsored by: Netflix
Submitted by: Chuck Silvers
2018-05-15 22:22:10 +00:00
imp
b2910ffe25 Hold the reference count until the CCB is released
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.)

Sponsored by: Netflix
Submitted by: Chuck Silvers
Differential Revision: https://reviews.freebsd.org/D15435
2018-05-15 21:25:35 +00:00
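
The lifetime rule in b2910ffe25 (keep the reference taken at start until the I/O completes) can be sketched with a plain counter. All names are hypothetical and locking is omitted; the sketch only shows why the softc survives invalidation while an I/O is still outstanding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical softc with a reference count. */
struct sketch_softc {
    int  refcount;  /* references keeping the softc alive */
    bool freed;     /* set when the last reference goes away */
};

void
sketch_acquire(struct sketch_softc *sc)
{
    assert(sc != NULL && !sc->freed);
    sc->refcount++;
}

void
sketch_release(struct sketch_softc *sc)
{
    assert(sc != NULL && sc->refcount > 0);
    if (--sc->refcount == 0)
        sc->freed = true;       /* stand-in for tearing down the softc */
}

/* Start an I/O: take a reference that the completion path will drop. */
void
sketch_start_io(struct sketch_softc *sc)
{
    sketch_acquire(sc);
    /* The I/O is handed to the controller here; the reference is kept. */
}

/* Completion: the softc is still valid because our reference is held. */
void
sketch_io_done(struct sketch_softc *sc)
{
    assert(!sc->freed);
    sketch_release(sc);
}

/* The disk went away: drop the reference the periph itself was holding. */
void
sketch_invalidate(struct sketch_softc *sc)
{
    sketch_release(sc);
}
```

Had the reference been dropped right after submission, the invalidate step would free the softc and the later completion would touch freed state, which is the crash the message describes.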
lwhsu
a2b0dc578d Fix build for platforms using GCC:
- Remove unused or dead store variable
- Remove unused function ctl_copyin_alloc
- Add missing curly brackets, this seems a regression in r287720

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D15383
2018-05-10 17:22:04 +00:00
araujo
2549fc5001 Rework CTL frontend & backend options to use nv(3), allow creating multiple
ioctl frontend ports.

This revision introduces two changes to CTL:
- Changes the way options are passed to CTL_LUN_REQ and CTL_PORT_REQ ioctls.
  Removes ctl_be_arg structure and associated logic and replaces it with
  nv(3)-based logic for passing in and out arguments.
- Allows creating multiple ioctl frontend ports using either ctladm(8) or
  ctld(8).
  New frontend ports are represented by /dev/cam/ctl<pp>.<vp> nodes, eg /dev/cam/ctl5.3.
  Those device nodes respond only to CTL_IO ioctl.

New command-line options for ctladm:
# creates new ioctl frontend port using a free pp and vp=0
ctladm port -c
# creates new ioctl frontend port with pp=10 and vp=0
ctladm port -c -O pp=10
# creates new ioctl frontend port with pp=11 and vp=12
ctladm port -c -O pp=11 -O vp=12
# removes port with number 4 (it's a "targ_port" number, not pp number)
ctladm port -r -p 4

New syntax for ctl.conf:
target ... {
    port ioctl/<pp>
    ...
}

target ... {
    port ioctl/<pp>/<vp>
    ...
}

Note: Most of this work was done by jceel@, thank you.

Submitted by:	jceel
Reworked by:	myself
Reviewed by:	mav (earlier versions and recently during the rework)
Obtained from:  FreeNAS and TrueOS
Relnotes:	Yes
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D9299
2018-05-10 03:50:20 +00:00
imp
899bd2ec13 Remove the 'All Rights Reserved' clause from some of the stuff I've
done for Netflix, since I'm in the neighborhood.
2018-05-09 20:32:23 +00:00
scottl
88f39fc72c Refactor dadone(). There was no useful code sharing in it; it was just
a 1500 line switch statement.  Callers now specify a discrete completion
handler, though they're still welcome to track state via ccb_state.

Sponsored by:	Netflix
2018-05-01 21:42:27 +00:00