256 Commits

Author SHA1 Message Date
Sean Bruno
23030355c6 Add 4k quirk for Micron 5100 and Intel S3610 SSDs
Submitted by:	Jason Wolfe <j@nitrology.com>
MFH:		1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D9209
2017-01-17 14:52:48 +00:00
Ed Schouten
4c484fd216 Add label annotations to CAM sysctls.
Under kern.cam we have certain sysctls that are per-device, such as the
ones under kern.cam.ada.[0-9]+.*. Add a "device_index" label annotation
to such sysctls, so that the Prometheus metrics exporter will give all
of those metrics the same name. The device number will be added to the
metric name as the "device_index" label.

Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D8775
2016-12-14 12:53:33 +00:00
Alexander Motin
55a1720717 Replicate r307507 for ATA disks.
MFC after:	2 weeks
2016-10-17 08:38:24 +00:00
Sepherosa Ziehau
a11463fd84 cam/ata: Allow drivers to veto ATA disk attachment.
This eventhandler is mainly used by VMs, e.g. Hyper-V, whose disk
controllers share the disks with the simulated ATA controllers.

Submitted by:	Hongjiang Zhang <honzhan microsoft com>
Discussed with:	mav
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D7693
2016-09-28 08:35:05 +00:00
Alexander Motin
1e3d53e2c4 Decode some new ATA commands found in ACS-3.
MFC after:	1 week
2016-08-27 19:51:37 +00:00
Pedro F. Giffuni
a061aa46fe sys: replace comma with semicolon when pertinent.
Uses of commas instead of semicolons can easily go undetected. The comma
can serve as a statement separator but this shouldn't be abused when
statements are meant to be standalone.
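
A minimal sketch of why the distinction matters, not taken from the commit itself. The function names are hypothetical. Under an unbraced if, a trailing comma silently pulls the next assignment into the guarded statement, while a semicolon ends it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper.  The comma after "*a = 0" joins both assignments
 * into one expression statement, so both are guarded by the if.
 */
void
reset_with_comma(int reset, int *a, int *b)
{
	assert(a != NULL && b != NULL);
	if (reset)
		*a = 0,		/* comma: the next line is still the same statement */
		*b = 0;
}

/*
 * Same code with a semicolon.  Only "*a = 0" is guarded; "*b = 0" is a
 * standalone statement and always runs.
 */
void
reset_with_semicolon(int reset, int *a, int *b)
{
	assert(a != NULL && b != NULL);
	if (reset)
		*a = 0;		/* semicolon: the guarded statement ends here */
	*b = 0;			/* runs regardless of reset */
}
```

Calling both with reset set to 0 leaves b untouched in the comma version and clears it in the semicolon version, which is the kind of difference that is easy to miss in review.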

Detected with devel/coccinelle following a hint from DragonFlyBSD.

MFC after:	1 month
2016-08-09 19:42:20 +00:00
Warner Losh
08f1387933 Move protocol specific stuff into a linker set object that's
per-protocol. This reduces the number of scsi symbol references by
cam_xpt significantly, and eliminates all ata / nvme symbols. There's
still some NVME / ATA specific code for dealing with XPT_NVME_IO and
XPT_ATA_IO respectively, and a bunch of scsi-specific code, but this
is progress.

Differential Revision: https://reviews.freebsd.org/D7289
2016-07-28 22:55:21 +00:00
Warner Losh
ded2b70617 Switch to linker sets to find the xport callback object. This
eliminates the need to special case everything in cam_xpt for new
transports. It is now a failure to not have a transport object when
registering the bus as well. You can still, however, create a
transport that's unspecified (XPT_)

Differential Revision: https://reviews.freebsd.org/D7289
2016-07-28 22:55:14 +00:00
Alexander Motin
db8e94bb3b Restore PIM_ATA_EXT flag handling, lost at r300207.
This re-enables NCQ TRIM usage on capable hardware (bhyve).
2016-07-17 14:17:58 +00:00
Conrad Meyer
75548271a9 Fix memory leaks in (a|)daregister introduced in r298002
In the case where cam_iosched_init() fails, the ada and da softcs were leaked.
Instead, free them.

Reported by:	Coverity
CID:		1356039
Sponsored by:	EMC / Isilon Storage Division
2016-06-07 20:33:55 +00:00
Kenneth D. Merry
600fd98ff3 Fix a few ada(4) driver issues:
 o Some Samsung drives do not support the ATA READ LOG EXT or READ
   LOG DMA EXT commands, despite indicating that they do in their
   IDENTIFY data.  So, fix this in two ways:
	1. Only start the log directory probe (ADA_STATE_LOGDIR) if
	   the drive claims to be an SMR drive in the first place.
	   We don't need to do the extra probing for other devices.
	   This will also serve to prevent problems with other
	   drives that have the same issue.
	2. Add quirks for the two Samsung drives that have been
	   reported so far (thanks to Oleg Nauman and Alex Petrov).
	   If there is a reason to do a Read Log later on, we will
	   know that it doesn't work on these drives.

 o Add a quirk entry to mark Seagate Lamarr Drive Managed drives as
   drive managed.  They don't report this in their Identify data.

sys/cam/ata/ata_da.c:
	Add two new quirks:
	1. ADA_Q_LOG_BROKEN, for drives that claim to support Read
	   Log but don't really.
	2. ADA_Q_SMR_DM, for drives that are Drive Managed SMR, but
	   don't report it.  This can matter for software that
	   wants to know when it should make an extra effort to
	   write sequentially.

	Record two Samsung drives that don't support Read Log, and
	one Seagate drive that doesn't report that it is a SMR drive.
	The Seagate drive is already recorded in the da(4) driver.

	We may have to come up with a similar solution in the da(4)
	driver for SATA drives that don't properly support Read Log.

	In adasetflags(), don't set the ADA_FLAG_CAN_LOG bit if the
	device has the LOG_BROKEN quirk set.  Also, look at the
	SMR_DM quirk and set the device type accordingly if it is
	actually a drive managed drive.

	When deciding whether to go into the LOGDIR probe state,
	look to see whether the device claims to be an SMR device.
	If not, don't bother with the LOGDIR probe state.

Sponsored by:	Spectra Logic
2016-05-25 01:37:39 +00:00
Kenneth D. Merry
3f54ec85e8 Fix ada(4) trim support quirk setting.
I broke the quirk in the ada(4) driver disabling NCQ trim support
in revision 300207.  The support flags were set before the quirks were
loaded.

sys/cam/ata/ata_da.c:
	Call adasetflags() after loading quirks, so that we'll set the
	flags accurately.

Sponsored by:	Spectra Logic
2016-05-23 19:52:08 +00:00
Kenneth D. Merry
9a6844d55f Add support for managing Shingled Magnetic Recording (SMR) drives.
This change includes support for SCSI SMR drives (which conform to the
Zoned Block Commands or ZBC spec) and ATA SMR drives (which conform to
the Zoned ATA Command Set or ZAC spec) behind SAS expanders.

This includes full management support through the GEOM BIO interface, and
through a new userland utility, zonectl(8), and through camcontrol(8).

This is now ready for filesystems to use to detect and manage zoned drives.
(There is no work in progress that I know of to use this for ZFS or UFS; if
anyone is interested, let me know and I may have some suggestions.)

Also, improve ATA command passthrough and dispatch support, both via ATA
and ATA passthrough over SCSI.

Also, add support to camcontrol(8) for the ATA Extended Power Conditions
feature set.  You can now manage ATA device power states, and set various
idle time thresholds for a drive to enter lower power states.

Note that this change cannot be MFCed in full, because it depends on
changes to the struct bio API that break compatibility.  In order to
avoid breaking the stable API, only changes that don't touch or depend on
the struct bio changes can be merged.  For example, the camcontrol(8)
changes don't depend on the new bio API, but zonectl(8) and the probe
changes to the da(4) and ada(4) drivers do depend on it.

Also note that the SMR changes have not yet been tested with an actual
SCSI ZBC device, or a SCSI to ATA translation layer (SAT) that supports
ZBC to ZAC translation.  I have not yet gotten a suitable drive or SAT
layer, so any testing help would be appreciated.  These changes have been
tested with Seagate Host Aware SATA drives attached to both SAS and SATA
controllers.  Also, I do not have any SATA Host Managed devices, and I
suspect that it may take additional (hopefully minor) changes to support
them.

Thanks to Seagate for supplying the test hardware and answering questions.

sbin/camcontrol/Makefile:
	Add epc.c and zone.c.

sbin/camcontrol/camcontrol.8:
	Document the zone and epc subcommands.

sbin/camcontrol/camcontrol.c:
	Add the zone and epc subcommands.

	Add auxiliary register support to build_ata_cmd().  Make sure to
	set the CAM_ATAIO_NEEDRESULT, CAM_ATAIO_DMA, and CAM_ATAIO_FPDMA
	flags as appropriate for ATA commands.

	Add a new get_ata_status() function to parse ATA result from SCSI
	sense descriptors (for ATA passthrough over SCSI) and ATA I/O
	requests.

sbin/camcontrol/camcontrol.h:
	Update the build_ata_cmd() prototype

	Add get_ata_status(), zone(), and epc().

sbin/camcontrol/epc.c:
	Support for ATA Extended Power Conditions features.  This includes
	support for all features documented in the ACS-4 Revision 12
	specification from t13.org (dated February 18, 2016).

	The EPC feature set allows putting a drive into a power mode
	immediately, or setting timeouts so that the drive will
	automatically enter progressively lower power states after various
	idle times.

sbin/camcontrol/fwdownload.c:
	Update the firmware download code for the new build_ata_cmd()
	arguments.

sbin/camcontrol/zone.c:
	Implement support for Shingled Magnetic Recording (SMR) drives
	via SCSI Zoned Block Commands (ZBC) and ATA Zoned Device ATA
	Command Set (ZAC).

	These specs were developed in concert, and are functionally
	identical.  The primary differences are due to SCSI and ATA
	differences.  (SCSI is big endian, ATA is little endian, for
	example.)

	This includes support for all commands defined in the ZBC and
	ZAC specs.

sys/cam/ata/ata_all.c:
	Decode a number of additional ATA command names in ata_op_string().

	Add a new CCB building function, ata_read_log().

	Add ata_zac_mgmt_in() and ata_zac_mgmt_out() CCB building
	functions.  These support both DMA and NCQ encapsulation.

sys/cam/ata/ata_all.h:
	Add prototypes for ata_read_log(), ata_zac_mgmt_out(), and
	ata_zac_mgmt_in().

sys/cam/ata/ata_da.c:
	Revamp the ada(4) driver to support zoned devices.

	Add four new probe states to gather information needed for zone
	support.

	Add a new adasetflags() function to avoid duplication of large
	blocks of flag setting between the async handler and register
	functions.

	Add new sysctl variables that describe zone support and parameters.

	Add support for the new BIO_ZONE bio, and all of its subcommands:
	DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
	DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

sys/cam/scsi/scsi_all.c:
	Add command descriptions for the ZBC IN/OUT commands.

	Add descriptions for ZBC Host Managed devices.

	Add a new function, scsi_ata_pass() to do ATA passthrough over
	SCSI.  This will eventually replace scsi_ata_pass_16() -- it
	can create the 12, 16, and 32-byte variants of the ATA
	PASS-THROUGH command, and supports setting all of the
	registers defined as of SAT-4, Revision 5 (March 11, 2016).

	Change scsi_ata_identify() to use scsi_ata_pass() instead of
	scsi_ata_pass_16().

	Add a new scsi_ata_read_log() function to facilitate reading
	ATA logs via SCSI.

sys/cam/scsi/scsi_all.h:
	Add the new ATA PASS-THROUGH(32) command CDB.  Add extended and
	variable CDB opcodes.

	Add Zoned Block Device Characteristics VPD page.

	Add ATA Return SCSI sense descriptor.

	Add prototypes for scsi_ata_read_log() and scsi_ata_pass().

sys/cam/scsi/scsi_da.c:
	Revamp the da(4) driver to support zoned devices.

	Add five new probe states, four of which are needed for ATA
	devices.

	Add five new sysctl variables that describe zone support and
	parameters.

	The da(4) driver supports SCSI ZBC devices, as well as ATA ZAC
	devices when they are attached via a SCSI to ATA Translation (SAT)
	layer.  Since ZBC -> ZAC translation is a new feature in the T10
	SAT-4 spec, most SATA drives will be supported via ATA commands
	sent via the SCSI ATA PASS-THROUGH command.  The da(4) driver will
	prefer the ZBC interface, if it is available, for performance
	reasons, but will use the ATA PASS-THROUGH interface to the ZAC
	command set if the SAT layer doesn't support translation yet.
	As I mentioned above, ZBC command support is untested.

	Add support for the new BIO_ZONE bio, and all of its subcommands:
	DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
	DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

	Add scsi_zbc_in() and scsi_zbc_out() CCB building functions.

	Add scsi_ata_zac_mgmt_out() and scsi_ata_zac_mgmt_in() CCB/CDB
	building functions.  Note that these have return values, unlike
	almost all other CCB building functions in CAM.  The reason is
	that they can fail, depending upon the particular combination
	of input parameters.  The primary failure case is if the user
	wants NCQ, but fails to specify additional CDB storage.  NCQ
	requires using the 32-byte version of the SCSI ATA PASS-THROUGH
	command, and the current CAM CDB size is 16 bytes.

sys/cam/scsi/scsi_da.h:
	Add ZBC IN and ZBC OUT CDBs and opcodes.

	Add SCSI Report Zones data structures.

	Add scsi_zbc_in(), scsi_zbc_out(), scsi_ata_zac_mgmt_out(), and
	scsi_ata_zac_mgmt_in() prototypes.

sys/dev/ahci/ahci.c:
	Fix SEND / RECEIVE FPDMA QUEUED in the ahci(4) driver.

	ahci_setup_fis() previously set the top bits of the sector count
	register in the FIS to 0 for FPDMA commands.  This is okay for
	read and write, because the PRIO field is the only thing in
	those bits, and we don't implement that further up the stack.

	But, for SEND and RECEIVE FPDMA QUEUED, the subcommand is in that
	byte, so it needs to be transmitted to the drive.

	In ahci_setup_fis(), always set the top 8 bits of the
	sector count register.  We need it in both the standard
	and NCQ / FPDMA cases.
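
A simplified sketch of the behavior described above, not the actual ahci(4) code. The struct and function names are hypothetical. It assumes the Register Host-to-Device FIS layout in which byte 12 carries Count(7:0) and byte 13 carries Count(15:8), and that NCQ commands place the tag in bits 7:3 of the low byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of the command fields involved. */
struct ata_cmd_sketch {
	uint8_t sector_count;		/* Count(7:0) */
	uint8_t sector_count_exp;	/* Count(15:8) */
};

/*
 * Fill the two count bytes of a 20-byte H2D register FIS.  The high byte
 * is always copied, because SEND / RECEIVE FPDMA QUEUED keep their
 * subcommand there.
 */
void
fill_count_bytes(uint8_t fis[20], const struct ata_cmd_sketch *cmd,
    bool fpdma, uint8_t tag)
{
	assert(tag < 32);
	if (fpdma)
		fis[12] = (uint8_t)(tag << 3);	/* NCQ: tag lives in bits 7:3 */
	else
		fis[12] = cmd->sector_count;	/* non-NCQ: low count byte */
	fis[13] = cmd->sector_count_exp;	/* sent in both cases */
}
```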

sys/geom/eli/g_eli.c:
	Pass BIO_ZONE commands through the GELI class.

sys/geom/geom.h:
	Add g_io_zonecmd() prototype.

sys/geom/geom_dev.c:
	Add new DIOCZONECMD ioctl, which allows sending zone commands to
	disks.

sys/geom/geom_disk.c:
	Add support for BIO_ZONE commands.

sys/geom/geom_disk.h:
	Add a new flag, DISKFLAG_CANZONE, that indicates that a given
	GEOM disk client can handle BIO_ZONE commands.

sys/geom/geom_io.c:
	Add a new function, g_io_zonecmd(), that handles execution of
	BIO_ZONE commands.

	Add permissions check for BIO_ZONE commands.

	Add command decoding for BIO_ZONE commands.

sys/geom/geom_subr.c:
	Add DDB command decoding for BIO_ZONE commands.

sys/kern/subr_devstat.c:
	Record statistics for REPORT ZONES commands.  Note that the
	number of bytes transferred for REPORT ZONES won't quite match
	what is received from the hardware.  This is because we're
	necessarily counting bytes coming from the da(4) / ada(4) drivers,
	which are using the disk_zone.h interface to communicate up
	the stack.  The structure sizes it uses are slightly different
	than the SCSI and ATA structure sizes.

sys/sys/ata.h:
	Add many bit and structure definitions for ZAC, NCQ, and EPC
	command support.

sys/sys/bio.h:
	Convert the bio_cmd field to a straight enumeration.  This will
	yield more space for additional commands in the future.  After
	change r297955 and other related changes, this is now possible.
	Converting to an enumeration will also prevent use as a bitmask
	in the future.

sys/sys/disk.h:
	Define the DIOCZONECMD ioctl.

sys/sys/disk_zone.h:
	Add a new API for managing zoned disks.  This is very close to
	the SCSI ZBC and ATA ZAC standards, but uses integers in native
	byte order instead of big endian (SCSI) or little endian (ATA)
	byte arrays.
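
To illustrate the byte-order point, here is a small userland sketch of turning an 8-byte big endian (SCSI-style) or little endian (ATA-style) array into a native integer. The function names are hypothetical and are not part of disk_zone.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode 8 bytes stored most significant byte first (SCSI convention). */
uint64_t
decode_be64(const uint8_t *p)
{
	uint64_t v = 0;

	assert(p != NULL);
	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return (v);
}

/* Decode 8 bytes stored least significant byte first (ATA convention). */
uint64_t
decode_le64(const uint8_t *p)
{
	uint64_t v = 0;

	assert(p != NULL);
	for (size_t i = 8; i > 0; i--)
		v = (v << 8) | p[i - 1];
	return (v);
}
```

An interface that hands out native integers spares every consumer from repeating this kind of conversion per protocol.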

	This is intended to offer the complete feature set of the ZBC
	and ZAC disk management without requiring the application developer
	to include SCSI or ATA headers.  We also use one set of headers
	for ioctl consumers and kernel bio-level consumers.

sys/sys/param.h:
	Bump __FreeBSD_version for sys/bio.h command changes, and inclusion
	of SMR support.

usr.sbin/Makefile:
	Add the zonectl utility.

usr.sbin/diskinfo/diskinfo.c:
	Add disk zoning capability to the 'diskinfo -v' output.

usr.sbin/zonectl/Makefile:
	Add zonectl makefile.

usr.sbin/zonectl/zonectl.8:
	zonectl(8) man page.

usr.sbin/zonectl/zonectl.c:
	The zonectl(8) utility.  This allows managing SCSI or ATA zoned
	disks via the disk_zone.h API.  You can report zones, reset write
	pointers, get parameters, etc.

Sponsored by:	Spectra Logic
Differential Revision:	https://reviews.freebsd.org/D6147
Reviewed by:	wblock (documentation)
2016-05-19 14:08:36 +00:00
Pedro F. Giffuni
55e0987aea sys: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
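
A short sketch of the kind of open-coded rounding that howmany() replaces. The macro body matches the definition in FreeBSD's <sys/param.h>; it is repeated under a guard only so the sketch builds on other systems. The sectors_needed() helper is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Guarded copy of the <sys/param.h> definition, for portability of the sketch. */
#ifndef howmany
#define	howmany(x, y)	(((x) + ((y) - 1)) / (y))
#endif

/*
 * Number of sectors needed to hold nbytes, rounding up.
 * Replaces the open-coded (nbytes + sector_size - 1) / sector_size.
 */
size_t
sectors_needed(size_t nbytes, size_t sector_size)
{
	assert(sector_size > 0);
	return (howmany(nbytes, sector_size));
}
```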
2016-04-26 15:38:17 +00:00
Pedro F. Giffuni
323b076e9c sys: use our nitems() macro when param.h is available.
This should cover all the remaining cases in the kernel.
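
A sketch of the pattern, assuming a hypothetical quirk table. The nitems() body matches the <sys/param.h> definition and is guarded so the example also builds where that header does not provide it.

```c
#include <assert.h>
#include <stddef.h>

/* Guarded copy of the <sys/param.h> definition, for portability of the sketch. */
#ifndef nitems
#define	nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif

/* Hypothetical table entry; not a real CAM structure. */
struct quirk_sketch {
	const char *pattern;
	int flags;
};

static const struct quirk_sketch quirk_table[] = {
	{ "VENDOR_A*", 0x1 },
	{ "VENDOR_B*", 0x2 },
	{ "VENDOR_C*", 0x4 },
};

/* Element count derived from the array itself, so it cannot go stale. */
size_t
quirk_count(void)
{
	return (nitems(quirk_table));
}

/* Bounds-checked lookup using the same derived count. */
const struct quirk_sketch *
quirk_at(size_t i)
{
	assert(i < nitems(quirk_table));
	return (&quirk_table[i]);
}
```

Note that nitems() only works on true arrays; applied to a pointer it silently computes the wrong value.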

Discussed in:	freebsd-current
2016-04-21 19:40:10 +00:00
Pedro F. Giffuni
8dfea46460 Remove slightly used const values that can be replaced with nitems().
Suggested by:	jhb
2016-04-21 15:38:28 +00:00
Warner Losh
916d57dfc5 Implement Auxiliary register. Add PIM_ATA_EXT flag to flag that a SIM
can handle it, and add the code to add it to the FIS that's sent to
the drive. The mvs driver is the only other ATA driver in the system,
and its hardware doesn't appear to support setting the Auxiliary
register.

Differential Revision: https://reviews.freebsd.org/D5598
2016-04-17 05:24:36 +00:00
Warner Losh
e4cc6558b3 tag_action is not used at all in ata. It's set to 1 for ordered
transactions, but that value isn't used. It's bogusly used to report
in devstat, due to a cut and paste error from SCSI. Mark it as unused
in cam_fill_ataio. Reclaim the memory as a new ata_flags. In addition,
tag_id and init_id are completely unused, so reclaim those as 'unused'
now too. These were needlessly copied when ata was split from scsi.

This allows us, in the future, to create structures that can
communicate AUXILIARY register to the SIMs, which cannot be done now.

Differential Revision: https://reviews.freebsd.org/D5598
2016-04-17 05:24:28 +00:00
Warner Losh
bf95d6a610 Dell has an OEM drive from Samsung that has issues. NCQ Trim isn't
broken on this drive, but it doesn't support it and the fallback logic
is failing. Quirk it until those issues can be resolved in a more
generic way.
2016-04-17 02:06:10 +00:00
Warner Losh
acfc9b6862 Expand CAM_IO_STATS #ifdef to logical unit.
2016-04-15 05:10:39 +00:00
Warner Losh
b93ecd35e7 Out of an abundance of caution treat
 * Samsung 843T Series SSDs (MZ7WD*)
 * Samsung PM851 Series SSDs (MZ7TE*)
 * Samsung PM853T Series SSDs (MZ7GE*)
as known to have broken NCQ TRIM support as they appear to be based on
the same controller technology as the 840 and 850 series.

I've had at least one report of the PM853 being broken, so err on the
side of caution for the above drives. The PM863/SM863 appears to be
based on a newer controller, so give it the benefit of the doubt.
2016-04-15 05:10:31 +00:00
Warner Losh
555bb680cc Add FCCT M500 to the NCQ black list. Linux added it in 4.2 (August
2015). Correct the M500 firmware versions. EU07 was the engineering
test version, not the release version with the fix. MU07 is the
release version. It's the only Micron firmware version to actually
work. Remove support for EU07.

This brings the blacklist into parity with the Linux blacklist as of
4.5, except for the Micron M500 MU07 entry. I personally tested the
MU07 firmware on 12 machines running 6 drives each with no corruption
in the past 6 months with Netflix production loads. Prior versions of
the M500 firmware wouldn't last more than a few days.

Sponsored by: Netflix, Inc.
2016-04-15 03:10:04 +00:00
Enji Cooper
da908789ee Fix typos (intenral -> internal) in comments
2016-04-15 02:36:14 +00:00
Warner Losh
86ddf15ebd Add a comment about why the timeout for flush was lowered to 5s.
2016-04-14 22:13:46 +00:00
Warner Losh
a6e0c5da99 New CAM I/O scheduler for FreeBSD. The default I/O scheduler is the same
as before. The common scheduling bits have moved from inline code in
each of the CAM periph drivers into a library that implements the
default scheduling.

In addition, a number of rate-limiting and I/O preference options can
be enabled by adding CAM_IOSCHED_NETFLIX to your config file. A number
of extra stats are also maintained. CAM_IOSCHED_NETFLIX isn't on by
default because it uses a separate BIO_READ and BIO_WRITE queue, so
doesn't honor BIO_ORDERED between these two types of operations. We
already didn't honor it for BIO_DELETE, and we don't depend on
BIO_ORDERED between reads and writes anywhere in the system (it is
currently used with BIO_FLUSH in ZFS to make sure some writes are
complete before others start and as a poor-man's soft dependency in
one place in UFS where we won't be issuing READs until after the
operation completes). However, out of an abundance of caution, it
isn't enabled by default.

Plus, this also brings in NCQ TRIM support for those SSDs that support
it. A black list is also provided for known rogues that use NCQ trim
as an excuse to corrupt the drive. It was difficult to separate out
into a separate commit.

This code has run in production at Netflix for over a year now.

Sponsored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D4609
2016-04-14 21:47:58 +00:00
Scott Long
c9767ca834 Add sbuf variants ata_cmd_sbuf() and ata_res_sbuf(), and reimplement the
_string variants on top of this.  This requires a change to the function
signature of ata_res_sbuf().  Its use in the tree seems to be very limited,
and the change makes it more consistent with the rest of the API.

Reviewed by:	imp, mav, kenm
Sponsored by:	Netflix
Differential Revision:	D5940
2016-04-13 20:10:06 +00:00
Jean-Sébastien Pédron
eae90da9b0 CAM: Generalize 4k quirk to all Samsung MZ7* SSDs
This adds Samsung PM851 to the list. It can be found in Lenovo Thinkpad
T440 for instance.

Reviewed by:	Kevin Bowling <kevin.bowling@kev009.com>,
		Jason Wolfe <j@nitrology.com>
Approved by:	Kevin Bowling <kevin.bowling@kev009.com>,
		Jason Wolfe <j@nitrology.com>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D5753
2016-03-29 06:56:46 +00:00
Sean Bruno
844b798499 Add 4k enabled cam quirks for Samsung SM863 Series SSDs
Submitted by:	Jason (j@nitrology.com)
MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D5711
2016-03-24 14:20:33 +00:00
Ravi Pokala
dd4637c078 Add defines for WRITE_UNCORRECTABLE ATA command, and improve command logging
Add #defines for ATA_WRITE_UNCORRECTABLE48 and its features. Update the
decoding in ATACAM to recognize the new values. Also improve command
decoding for a few other commands (SMART, NOP, SET_FEATURES). Bring the
decoding in ata(4) up to parity with ATACAM.

Reviewed by:	mav, imp
MFC after:	1 month
Sponsored by:	Panasas, Inc.
Differential Revision:	https://reviews.freebsd.org/D5181
2016-02-04 19:53:54 +00:00
Kenneth D. Merry
a9934668aa Add asynchronous command support to the pass(4) driver, and the new
camdd(8) utility.

CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and
completed CCBs may be retrieved via the CAMIOGET ioctl.  User
processes can use poll(2) or kevent(2) to get notification when
I/O has completed.

While the existing CAMIOCOMMAND blocking ioctl interface only
supports user virtual data pointers in a CCB (generally only
one per CCB), the new CAMIOQUEUE ioctl supports user virtual and
physical address pointers, as well as user virtual and physical
scatter/gather lists.  This allows user applications to have more
flexibility in their data handling operations.

Kernel memory for data transferred via the queued interface is
allocated from the zone allocator in MAXPHYS sized chunks, and user
data is copied in and out.  This is likely faster than the
vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in
configurations with many processors (there are more TLB shootdowns
caused by the mapping/unmapping operation) but may not be as fast
as running with unmapped I/O.

The new memory handling model for user requests also allows
applications to send CCBs with request sizes that are larger than
MAXPHYS.  The pass(4) driver now limits queued requests to the I/O
size listed by the SIM driver in the maxio field in the Path
Inquiry (XPT_PATH_INQ) CCB.

There are some things that would be good to add:

1. Come up with a way to do unmapped I/O on multiple buffers.
   Currently the unmapped I/O interface operates on a struct bio,
   which includes only one address and length.  It would be nice
   to be able to send an unmapped scatter/gather list down to
   busdma.  This would allow eliminating the copy we currently do
   for data.

2. Add an ioctl to list currently outstanding CCBs in the various
   queues.

3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do
   that.

4. Test physical address support.  Virtual pointers and scatter/gather
   lists have been tested, but I have not yet tested physical
   addresses or physical scatter/gather lists.

5. Investigate multiple queue support.  At the moment there is one
   queue of commands per pass(4) device.  If multiple processes
   open the device, they will submit I/O into the same queue and
   get events for the same completions.  This is probably the right
   model for most applications, but it is something that could be
   changed later on.

Also, add a new utility, camdd(8) that uses the asynchronous pass(4)
driver interface.

This utility is intended to be a basic data transfer/copy utility,
a simple benchmark utility, and an example of how to use the
asynchronous pass(4) interface.

It can copy data to and from pass(4) devices using any target queue
depth, starting offset and blocksize for the input and output devices.
It currently only supports SCSI devices, but could be easily extended
to support ATA devices.

It can also copy data to and from regular files, block devices, tape
devices, pipes, stdin, and stdout.  It does not support queueing
multiple commands to any of those targets, since it uses the standard
read(2)/write(2)/writev(2)/readv(2) system calls.

The I/O is done by two threads, one for the reader and one for the
writer.  The reader thread sends completed read requests to the
writer thread in strictly sequential order, even if they complete
out of order.  That could be modified later on for random I/O patterns
or slightly out of order I/O.

camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from
the pass(4) driver and also to send request notifications internally.

For pass(4) devices, camdd(8) uses a single buffer (CAM_DATA_VADDR)
per CAM CCB on the reading side, and a scatter/gather list
(CAM_DATA_SG) on the writing side.  In addition to testing both
interfaces, this makes any potential reblocking of I/O easier.  No
data is copied between the reader and the writer, but rather the
reader's buffers are split into multiple I/O requests or combined
into a single I/O request depending on the input and output blocksize.

For the file I/O path, camdd(8) also uses a single buffer (read(2),
write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list
(readv(2), writev(2), preadv(2), pwritev(2)) on writes.

Things that would be nice to do for camdd(8) eventually:

1.  Add support for I/O pattern generation.  Patterns like all
    zeros, all ones, LBA-based patterns, random patterns, etc.  Right
    now you can always use /dev/zero, /dev/random, etc.

2.  Add support for a "sink" mode, so we do only reads with no
    writes.  Right now, you can use /dev/null.

3.  Add support for automatic queue depth probing, so that we can
    figure out the right queue depth on the input and output side
    for maximum throughput.  At the moment it defaults to 6.

4.  Add support for SATA device passthrough I/O.

5.  Add support for random LBAs and/or lengths on the input and
    output sides.

6.  Track average per-I/O latency and busy time.  The busy time
    and latency could also feed in to the automatic queue depth
    determination.

sys/cam/scsi/scsi_pass.h:
	Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue
	and fetch asynchronous CAM CCBs respectively.

	Although these ioctls do not have a declared argument, they
	both take a union ccb pointer.  If we declare a size here,
	the ioctl code in sys/kern/sys_generic.c will malloc and free
	a buffer for either the CCB or the CCB pointer (depending on
	how it is declared).  Since we have to keep a copy of the
	CCB (which is fairly large) anyway, having the ioctl malloc
	and free a CCB for each call is wasteful.

sys/cam/scsi/scsi_pass.c:
	Add asynchronous CCB support.

	Add two new ioctls, CAMIOQUEUE and CAMIOGET.

	CAMIOQUEUE adds a CCB to the incoming queue.  The CCB is
	executed immediately (and moved to the active queue) if it
	is an immediate CCB, but otherwise it will be executed
	in passstart() when a CCB is available from the transport layer.

	When CCBs are completed (because they are immediate or
	passdone() if they are queued), they are put on the done
	queue.

	If we get the final close on the device before all pending
	I/O is complete, all active I/O is moved to the abandoned
	queue and we increment the peripheral reference count so
	that the peripheral driver instance doesn't go away before
	all pending I/O is done.

	The new passcreatezone() function is called on the first
	call to the CAMIOQUEUE ioctl on a given device to allocate
	the UMA zones for I/O requests and S/G list buffers.  This
	may be good to move off to a taskqueue at some point.
	The new passmemsetup() function allocates memory and
	scatter/gather lists to hold the user's data, and copies
	in any data that needs to be written.  For virtual pointers
	(CAM_DATA_VADDR), the kernel buffer is malloced from the
	new pass(4) driver malloc bucket.  For virtual
	scatter/gather lists (CAM_DATA_SG), buffers are allocated
	from a new per-pass(4) UMA zone in MAXPHYS-sized chunks.
	Physical pointers are passed in unchanged.  We have support
	for up to 16 scatter/gather segments (for the user and
	kernel S/G lists) in the default struct pass_io_req, so
	requests with longer S/G lists require an extra kernel malloc.

	The new passcopysglist() function copies a user scatter/gather
	list to a kernel scatter/gather list.  The number of elements
	in each list may be different, but (obviously) the amount of data
	stored has to be identical.
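
The idea of copying between scatter/gather lists with different segment counts can be sketched in userland as follows. This is not passcopysglist() itself: the kernel function has to move data across the user/kernel boundary, while this sketch uses memcpy(), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical segment descriptor: a buffer and its length. */
struct sg_seg {
	unsigned char *addr;
	size_t len;
};

/*
 * Copy from one scatter/gather list to another.  The lists may have
 * different segment counts and sizes; the copy walks both at once and
 * moves the largest chunk that fits in the current pair of segments.
 * Returns the number of bytes copied.
 */
size_t
sg_copy(const struct sg_seg *src, size_t nsrc,
    const struct sg_seg *dst, size_t ndst)
{
	size_t si = 0, di = 0, soff = 0, doff = 0, total = 0;

	assert(src != NULL && dst != NULL);
	while (si < nsrc && di < ndst) {
		size_t sleft = src[si].len - soff;
		size_t dleft = dst[di].len - doff;
		size_t n = sleft < dleft ? sleft : dleft;

		memcpy(dst[di].addr + doff, src[si].addr + soff, n);
		total += n;
		soff += n;
		doff += n;
		if (soff == src[si].len) {	/* source segment exhausted */
			si++;
			soff = 0;
		}
		if (doff == dst[di].len) {	/* destination segment full */
			di++;
			doff = 0;
		}
	}
	return (total);
}
```

A caller would compare the return value against the expected total to confirm that both lists described the same amount of data.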

	The new passmemdone() function copies data out for the
	CAM_DATA_VADDR and CAM_DATA_SG cases.

	The new passiocleanup() function restores data pointers in
	user CCBs and frees memory.

	Add new functions to support kqueue(2)/kevent(2):

	passreadfilt() tells kevent whether or not the done
	queue is empty.

	passkqfilter() adds a knote to our list.

	passreadfiltdetach() removes a knote from our list.

	Add a new function, passpoll(), for poll(2)/select(2)
	to use.

	Add devstat(9) support for the queued CCB path.

sys/cam/ata/ata_da.c:
	Add support for the BIO_VLIST bio type.

sys/cam/cam_ccb.h:
	Add a new enumeration for the xflags field in the CCB header.
	(This doesn't change the CCB header, just adds an enumeration to
	use.)

sys/cam/cam_xpt.c:
	Add a new function, xpt_setup_ccb_flags(), that allows specifying
	CCB flags.

sys/cam/cam_xpt.h:
	Add a prototype for xpt_setup_ccb_flags().

sys/cam/scsi/scsi_da.c:
	Add support for BIO_VLIST.

sys/dev/md/md.c:
	Add BIO_VLIST support to md(4).

sys/geom/geom_disk.c:
	Add BIO_VLIST support to the GEOM disk class.  Re-factor the I/O size
	limiting code in g_disk_start() a bit.

sys/kern/subr_bus_dma.c:
	Change _bus_dmamap_load_vlist() to take a starting offset and
	length.

	Add a new function, _bus_dmamap_load_pages(), that will load a list
	of physical pages starting at an offset.

	Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios.
	Allow unmapped I/O to start at an offset.

sys/kern/subr_uio.c:
	Add two new functions, physcopyin_vlist() and physcopyout_vlist().

sys/pc98/include/bus.h:
	Guard kernel-only parts of the pc98 machine/bus.h header with
	#ifdef _KERNEL.

	This allows userland programs to include <machine/bus.h> to get the
	definition of bus_addr_t and bus_size_t.

sys/sys/bio.h:
	Add a new bio flag, BIO_VLIST.

sys/sys/uio.h:
	Add prototypes for physcopyin_vlist() and physcopyout_vlist().

share/man/man4/pass.4:
	Document the CAMIOQUEUE and CAMIOGET ioctls.

usr.sbin/Makefile:
	Add camdd.

usr.sbin/camdd/Makefile:
	Add a makefile for camdd(8).

usr.sbin/camdd/camdd.8:
	Man page for camdd(8).

usr.sbin/camdd/camdd.c:
	The new camdd(8) utility.

Sponsored by:	Spectra Logic
MFC after:	1 week
2015-12-03 20:54:55 +00:00
Alexander Motin
b94650a2bb Removed unused malloc types.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	1 week
2015-11-06 18:50:01 +00:00
Alexander Motin
6854699543 Remove legacy CHS geometry from dmesg and unify capacity outputs. 2015-10-11 13:48:20 +00:00
Alexander Motin
4a3760bae6 Remove compatibility shims for legacy ATA device names.
We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely
removed old stack at 10.x, so at 11.x it is time to remove compat shims.
2015-10-11 13:01:51 +00:00
Alexander Motin
9202485814 Attach pass driver to LUNs in OFFLINE state.
Previously such LUNs were silently ignored.  While they are indeed unable
to process most SCSI commands, they can still handle some, such as RTPG.

MFC after:	1 month
2015-08-29 11:21:20 +00:00
Alexander Motin
4beec13537 Remove some code duplication by using biofinish().
Submitted by:	imp
MFC after:	1 week
2015-08-22 15:58:35 +00:00
Alexander Motin
bac1eac93c Don't panic if disk lost TRIM support due to switching to PIO mode.
MFC after:	1 week
2015-08-08 11:22:45 +00:00
Eitan Adler
9073a96a85 Add some additional quirks for various Western Digital Caviar MHDDs
Submitted by:	Jeremy Chadwick
PR:		188685
MFC After:	1 month
2015-03-30 09:05:20 +00:00
Alexander Motin
4f42bb1021 Improve ATA and SCSI versions printing.
There is no "SCSI-6" and "ATA-9", but there is "SPC-4" and "ACS-2".

MFC after:	2 weeks
2015-03-17 13:21:49 +00:00
Warner Losh
0ac665747d Explain a bit of tricky code dealing with trims and how it prevents
starvation. These side effects aren't obvious without extremely
careful study, and are important to do just so.
2015-01-13 00:20:35 +00:00
Steven Hartland
467298f5e3 Fix CF ERASE breakage caused by 268205.
This prevents BIO_DELETE requests getting stuck in the TRIM queue which
results in a panic on shutdown due to outstanding requests.

PR:		194606
Reported by:	Guido Falsi
Reviewed by:	mav
MFC after:	3 days
Sponsored by:	Multiplay
2014-10-26 18:41:01 +00:00
George V. Neville-Neil
e3a21bd139 Add new quirks for the latest Samsung SSD, model 850.
Submitted by:	sbruno
MFC after:	2 weeks
2014-10-19 16:46:36 +00:00
Sean Bruno
323e0f6d4c Add 4k quirks for PM853T Samsung SSD
MFC after:	2 weeks
Sponsored by:	Limelight Networks
2014-10-16 20:33:04 +00:00
Davide Italiano
2be111bf7d Follow up to r225617. In order to maximize the re-usability of kernel code
in userland, rename in-kernel getenv()/setenv() to kern_getenv()/kern_setenv().
This fixes a namespace collision with libc symbols.

Submitted by:   kmacy
Tested by:      make universe
2014-10-16 18:04:43 +00:00
Warner Losh
2da8d262e0 Add a few defines and packet types for SATA 3.2 and FPDMA (First Party
DMA).

Sponsored by: Netflix
2014-08-30 02:13:04 +00:00
Warner Losh
e4bed0b403 We should never enter the PROBE_SETAN phase if we're not ATAPI, since
that's ATAPI specific. Instead, skip to PROBE_SET_MULTI for
non ATAPI protocols. The prior code incorrectly terminated the probe
with a break, rather than arranging for probedone to get called. This
caused panics or worse on some systems.
2014-08-22 13:15:59 +00:00
Sean Bruno
5f91863a54 Add the Samsung 843T as a 4k enabled drive
Submitted by:	Jason Wolfe <jason@llnw.com>
MFC after:	2 weeks
Sponsored by:	Limelight Networks
2014-08-21 21:05:58 +00:00
Warner Losh
15f48aaad6 Turns out that IDENTIFY DEVICE and IDENTIFY PACKET DEVICE return data
that's only mostly similar. Specifically word 78 bits are defined for
IDENTIFY DEVICE as
	5 Supports Hardware Feature Control
while IDENTIFY PACKET DEVICE defines them as
	5 Asynchronous notification supported
Therefore, only pay attention to bit 5 when we're talking to ATAPI
devices (we don't use the hardware feature control at this time).
Ignore it for ATA devices. Remove kludge that papered over this issue
for Samsung SATA SSDs, since Micron drives also have the bit set and
the error was caused by this bad interpretation of the spec (which is
quite easy to do, since bits aren't normally overlapping like this).
2014-08-20 22:58:12 +00:00
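The word 78 handling described in 15f48aaad6 can be sketched roughly as follows. This is a minimal illustration, not the kernel code: the enum, the macro, and the function name are hypothetical, and only the meaning of bit 5 is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for this sketch; not the kernel's identifiers. */
enum ident_kind { IDENT_ATA, IDENT_ATAPI };

#define SATA_WORD78_BIT5	(1u << 5)

/*
 * Bit 5 of word 78 means "asynchronous notification supported" only in
 * IDENTIFY PACKET DEVICE data.  In IDENTIFY DEVICE data the same bit
 * means "supports hardware feature control", so it is ignored here.
 */
static bool
supports_async_notification(enum ident_kind kind, uint16_t word78)
{
	if (kind != IDENT_ATAPI)
		return false;
	return (word78 & SATA_WORD78_BIT5) != 0;
}
```

With this shape, an ATA SSD that sets bit 5 is never asked to enable asynchronous notification, which is the behaviour the commit describes.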
Steven Hartland
dc98c62f89 Added 4K quirks for Corsair Force GT and Samsung 840 SSDs
MFC after:	1 week
Sponsored by:	Multiplay
2014-08-14 13:57:17 +00:00
Warner Losh
7ddad071a5 Rework the BIO_DELETE code slightly. Always queue the BIO_DELETE
requests on the trim_queue, even for the CFA ERASE. This allows us, in
the future, to collapse adjacent requests. Since CFA ERASE is only for
CF cards, and it is so restrictive in what it can do, the collapse
code is not presently here. This also brings the ada driver more in
line with the da driver's treatment of BIO_DELETEs.

Reviewed by: mav@
2014-07-03 05:22:13 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating children of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, since some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Warner Losh
dbb3f5b28b The code that combines adjacent ranges for BIO_DELETEs to optimize
trims to the device assumes the list is sorted. Don't apply the
optimization of not sorting the queue when we have SSDs to the
delete_queue, since it causes more discard traffic to the drive. While
one could argue that the higher levels should coalesce the trims,
that's not done today, so some optimization at this level is needed.

CR: https://phabric.freebsd.org/D142
2014-06-05 17:13:42 +00:00
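Why the range-combining code in dbb3f5b28b needs a sorted queue can be shown with a small sketch. The structure and function below are hypothetical stand-ins, assuming that only exactly adjacent ranges are merged; they are not the driver's actual data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical delete range: starting block and length in blocks. */
struct del_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Merge each range into its predecessor when it begins exactly where the
 * predecessor ends.  The array is rewritten in place and the new count is
 * returned.  Only neighbours in the array are compared, so the merge is
 * effective only when the input is sorted by start.
 */
static size_t
coalesce_adjacent(struct del_range *r, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;
	for (size_t i = 1; i < n; i++) {
		if (r[i].start == r[out].start + r[out].len)
			r[out].len += r[i].len;
		else
			r[++out] = r[i];
	}
	return out + 1;
}
```

Given the ranges (0,8), (8,8), (32,8) in sorted order, the first two collapse into one request. The same ranges queued as (8,8), (0,8), (32,8) stay as three requests, which is the extra discard traffic the commit message refers to.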
Alexander Motin
6d45fdc941 Fix support for increased logical sector size (4K-native drives).
- Logical sector size is measured in words, not bytes.
- If the physical sector is not bigger than the logical sector, it should be
set equal to the logical sector size, not to 512 bytes.

PR:		misc/187269
Submitted by:	Ravi Pokala <rpokala@panasas.com>
MFC after:	1 week
2014-03-07 09:45:40 +00:00
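The two points fixed in 6d45fdc941 can be sketched as below. This is an illustration under assumptions: the macro and function names are made up for the example, and the word 106 bit layout follows the ATA specification's IDENTIFY DEVICE description rather than anything stated in the commit message.

```c
#include <stdint.h>

/* IDENTIFY DEVICE word 106 bits; macro names are local to this sketch. */
#define W106_VALID_MASK	0xc000u
#define W106_VALID	0x4000u	/* bit 14 set, bit 15 clear */
#define W106_MULTI	0x2000u	/* several logical sectors per physical */
#define W106_LONG	0x1000u	/* logical sector longer than 256 words */
#define W106_EXP_MASK	0x000fu	/* log2(logical sectors per physical) */

/* Words 117-118 hold the logical sector size in 16-bit words, not bytes. */
static uint32_t
logical_sector_bytes(uint16_t w106, uint16_t w117, uint16_t w118)
{
	if ((w106 & W106_VALID_MASK) == W106_VALID && (w106 & W106_LONG)) {
		uint32_t words = (uint32_t)w117 | ((uint32_t)w118 << 16);

		return words * 2;
	}
	return 512;
}

/* Without a larger physical sector, fall back to the logical size. */
static uint32_t
physical_sector_bytes(uint16_t w106, uint16_t w117, uint16_t w118)
{
	uint32_t lss = logical_sector_bytes(w106, w117, w118);

	if ((w106 & W106_VALID_MASK) == W106_VALID && (w106 & W106_MULTI))
		return lss << (w106 & W106_EXP_MASK);
	return lss;
}
```

For a 4K-native drive reporting 2048 words in words 117-118, both functions return 4096, where the two bugs would have produced 2048 and 512 respectively.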
Alexander Motin
030844d1e7 Some microoptimizations for da and ada drivers:
- Replace ordered_tag_count counter with single flag;
- From da remove outstanding_cmds counter, duplicating pending_ccbs list;
- From da_softc remove unused links field.
2013-10-24 14:05:44 +00:00
Steven Hartland
c28078e903 Improve ZFS N-way mirror read performance by using load and locality
information.

The existing algorithm selects a preferred leaf vdev based on offset of the zio
request modulo the number of members in the mirror. It assumes the devices are
of equal performance and that spreading the requests randomly over both drives
will be sufficient to saturate them. In practice this results in the leaf vdevs
being under utilized.

The new algorithm takes into account the following additional factors:
* Load of the vdevs (number outstanding I/O requests)
* The locality of last queued I/O vs the new I/O request.

Within the locality calculation additional knowledge about the underlying vdev
is considered, such as whether the device backing the vdev is a rotating media device.

This results in performance increases across the board as well as significant
increases for predominantly streaming loads and for configurations which don't
have evenly performing devices.

The following are results from a setup with a 3-way mirror with 2 x HDs and
1 x SSD from a basic test running multiple parallel dd's.

With pre-fetch disabled (vfs.zfs.prefetch_disable=1):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s

With pre-fetch enabled (vfs.zfs.prefetch_disable=0):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s

In addition to the performance changes the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow which also ensures
vdevs loads are only calculated from the set of valid candidates.

The following additional sysctls were added to allow the administrator
to tune the behaviour of the load algorithm:
* vfs.zfs.vdev.mirror.rotating_inc
* vfs.zfs.vdev.mirror.rotating_seek_inc
* vfs.zfs.vdev.mirror.rotating_seek_offset
* vfs.zfs.vdev.mirror.non_rotating_inc
* vfs.zfs.vdev.mirror.non_rotating_seek_inc

These changes were based on work started by the zfsonlinux developers:
https://github.com/zfsonlinux/zfs/pull/1487

Reviewed by:	gibbs, mav, will
MFC after:	2 weeks
Sponsored by:	Multiplay
2013-10-23 09:54:58 +00:00
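The load-and-locality selection described in c28078e903 might look roughly like the sketch below. The tunable field names mirror the sysctls listed in the commit message; everything else is an assumption made for illustration, including the per-child state, the rule that a short seek costs half of a long one, and the choice of the first child on ties.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields named after the vfs.zfs.vdev.mirror.* sysctls. */
struct mirror_tunables {
	int		rotating_inc;
	int		rotating_seek_inc;
	uint64_t	rotating_seek_offset;
	int		non_rotating_inc;
	int		non_rotating_seek_inc;
};

/* Hypothetical per-child state for this sketch. */
struct child_state {
	int		pending;	/* outstanding I/O requests */
	uint64_t	last_offset;	/* offset of the last queued I/O */
	bool		rotating;	/* backed by rotating media */
};

/* Load is the queue depth plus a locality penalty for the new request. */
static int
child_load(const struct mirror_tunables *t, const struct child_state *c,
    uint64_t offset)
{
	uint64_t dist;

	if (c->last_offset == offset)
		return c->pending +
		    (c->rotating ? t->rotating_inc : t->non_rotating_inc);
	if (!c->rotating)
		return c->pending + t->non_rotating_seek_inc;
	dist = offset > c->last_offset ?
	    offset - c->last_offset : c->last_offset - offset;
	/* Assumed rule: a seek within the offset window costs half. */
	if (dist < t->rotating_seek_offset)
		return c->pending + t->rotating_seek_inc / 2;
	return c->pending + t->rotating_seek_inc;
}

/* Return the index of the least loaded child; the first one wins ties. */
static int
pick_child(const struct mirror_tunables *t, const struct child_state *c,
    int n, uint64_t offset)
{
	int best = 0;

	for (int i = 1; i < n; i++)
		if (child_load(t, &c[i], offset) <
		    child_load(t, &c[best], offset))
			best = i;
	return best;
}
```

The point of the sketch is that an idle disk far from the requested offset can lose to a busier disk that is already positioned there, which is what favours streaming loads.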
Alexander Motin
40ea77a036 Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, I/O requests no longer need to be passed
to the GEOM g_up/g_down threads and are executed directly in the caller
context.  That avoids CPU bottlenecks in the g_up/g_down threads, plus
several context switches per I/O.

The safety requirements currently defined are:
 - caller should not hold any locks and should be reentrant;
 - callee should not depend on GEOM dual-threaded concurrency semantics;
 - on the way down, if request is unmapped while callee doesn't support it,
   the context should be sleepable;
 - kernel thread stack usage should be below 50%.

To keep compatibility with GEOM classes not meeting the above requirements,
new provider and consumer flags were added:
 - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
 - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
 - G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
 - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
A capable GEOM class can set them, allowing direct dispatch in cases where
it is safe.  If any of the requirements are not met, the request is queued to
the g_up or g_down thread, same as before.

The following GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).

To declare direct completion capability, the disk(9) KPI got a new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION.  The da(4) and ada(4) disk
drivers now have it set, thanks to earlier CAM locking work.

This change more than doubles peak block storage performance on
systems with many CPUs, together with earlier CAM locking changes reaching
more than 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).

Sponsored by:	iXsystems, Inc.
MFC after:	2 months
2013-10-22 08:22:19 +00:00
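The downward dispatch decision described in 40ea77a036 can be pictured with the sketch below. Only the flag names and the listed requirements come from the commit message. The flag values, the context structure, and the function are hypothetical, and the real check in GEOM may be organised differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values; only the names come from the commit message. */
#define G_CF_DIRECT_SEND	0x1u
#define G_CF_DIRECT_RECEIVE	0x2u
#define G_PF_DIRECT_SEND	0x1u
#define G_PF_DIRECT_RECEIVE	0x2u

/* Hypothetical snapshot of what the dispatcher would have to look at. */
struct dispatch_ctx {
	unsigned	cflags;		/* consumer flags (caller on the way down) */
	unsigned	pflags;		/* provider flags (callee on the way down) */
	size_t		stack_used;	/* kernel stack bytes in use */
	size_t		stack_size;	/* total kernel stack bytes */
	bool		needs_mapping;	/* unmapped bio, callee cannot take it */
	bool		can_sleep;	/* current context is sleepable */
};

/* True if a request may go down directly instead of via g_down. */
static bool
can_dispatch_down_directly(const struct dispatch_ctx *c)
{
	if ((c->cflags & G_CF_DIRECT_SEND) == 0)
		return false;
	if ((c->pflags & G_PF_DIRECT_RECEIVE) == 0)
		return false;
	if (c->needs_mapping && !c->can_sleep)
		return false;
	/* Stack usage must stay below 50%. */
	return c->stack_used * 2 < c->stack_size;
}
```

When the function returns false, the request would simply be queued to the g_down thread as before, so classes that never set the flags keep their old behaviour.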
Alexander Motin
227d67aa54 Merge CAM locking changes from the projects/camlock branch to radically
reduce lock congestion and improve SMP scalability of the SCSI/ATA stack,
preparing the ground for the upcoming GEOM direct dispatch support.

Replace big per-SIM locks with a bunch of smaller ones:
 - per-LUN locks to protect device and peripheral drivers state;
 - per-target locks to protect list of LUNs on target;
 - per-bus locks to protect reference counting;
 - per-send queue locks to protect queue of CCBs to be sent;
 - per-done queue locks to protect queue of completed CCBs;
 - remaining per-SIM locks now protect only HBA driver internals.

While holding a LUN lock it is allowed (though not recommended for performance
reasons) to take the SIM lock.  The opposite acquisition order is forbidden.
All the other locks are leaf locks that can be taken anywhere, but should
not be cascaded.  Many functions, such as xpt_action(), xpt_done(),
xpt_async(), xpt_create_path(), etc., no longer require (but allow) the SIM
lock to be held.

To keep compatibility and solve cases where SIM lock can't be dropped, all
xpt_async() calls in addition to xpt_done() calls are queued to completion
threads for async processing in clean environment without SIM lock held.

Instead of the single CAM SWI thread used for command completion processing
before, use multiple threads (depending on the number of CPUs).  Load is balanced
between them using a "hash" of the device B:T:L address.

HBA drivers that can drop the SIM lock during completion processing and have
a sufficient number of completion threads to efficiently scale to multiple
CPUs can use the new function xpt_done_direct() to avoid an extra context switch.
Make the ahci(4) driver use this mechanism depending on hardware setup.

Sponsored by:	iXsystems, Inc.
MFC after:	2 months
2013-10-21 12:00:26 +00:00
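The B:T:L "hash" used in 227d67aa54 to spread completions over several threads could be as simple as the sketch below. The additive hash and the function name are assumptions for illustration; the commit message does not say how the hash is computed.

```c
#include <stdint.h>

/*
 * Map a device address (bus, target, LUN) to one of nqueues completion
 * queues.  The same device always lands on the same queue, so completions
 * for one device are never processed by two threads at once.
 * nqueues must be non-zero.
 */
static unsigned
done_queue_index(uint32_t bus, uint32_t target, uint64_t lun,
    unsigned nqueues)
{
	return (unsigned)(((uint64_t)bus + target + lun) % nqueues);
}
```

A cheap hash like this trades an even spread for determinism, which matters more here because per-device ordering has to be preserved.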
Alexander Motin
2030b2943b MFprojects/camlock:
Remove hard limit on number of BIOs handled with one ATA TRIM request.
2013-10-21 08:57:27 +00:00
Alexander Motin
8d36a71b76 Unify periph invalidation and destruction reporting.
Print message containing device model and serial number on invalidation.

Requested by:   glebius
MFC after:	1 week
2013-10-15 17:59:41 +00:00
Steven Hartland
d85805b291 Added 4K quirks for Corsair Neutron GTX SSD's 2013-10-15 17:03:02 +00:00
Steven Hartland
dce643c85f Added 4K quirks for:-
* OCZ Agility 2 SSDs
* Marvell SSDs
* Intel X25-M Series SSDs
2013-08-14 15:18:28 +00:00
Alexander Motin
7651b989e8 Fix returning incorrect bio_resid value with failed BIO_DELETE requests.
Neither residual length reported for ATA/SCSI command nor one from another
BIO_DELETE request are in any way related to the value to be returned.
2013-07-28 19:56:08 +00:00
Alexander Motin
69114bc0da Synchronize device cache on close only if there were some write operations.
While these operations are not really needed otherwise, at least for SCSI
they may cause extra errors if some other initiator holds write exclusive
reservation on the LUN (SYNCHRONIZE CACHE handled as "write" operation).
2013-07-27 22:44:55 +00:00
Steven Hartland
7f1c77876f Added 4K QUIRK for OCZ Vertex 4 SSDs
Submitted by:	Borja Marcos <borjam@sarenet.es>
MFC after:	2 days
2013-07-09 10:41:17 +00:00
Alexander Motin
2f87dfb0db Restore use of polling mode for disk cache flush in case of kernel panic.
While I am not sure that any extra hardware access is a good idea after
panic, that is an existing behaviour that should better work correctly.
2013-06-15 12:46:38 +00:00
Alexander Motin
967206bde7 Revert r251649:
ken@ noticed that with recently added d_gone() disk method GEOM already
holds reference on the periph, so we don't need another one.
2013-06-13 08:34:23 +00:00
Alexander Motin
7912f917ca Acquire periph reference when handling d_getattr() method call.
While GEOM in general has provider opened while sending BIO_GETATTR,
GEOM DISK does not really need to open disk to read medium-unrelated
attributes for own use.

Proposed by:	ken
2013-06-12 09:07:15 +00:00
Steven Hartland
32fe0ef7ac Added missing SCSI quirks from r241784
Re-ordered SSD quirks alphabetically so they are easier to maintain.

Removed my email and PR reference from comments on each quirk.

Added quirks for more SSDs:
* Crucial M4
* Corsair Force GT
* Intel 520 Series
* Kingston E100 Series
* Samsung 830 Series

Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	1 week
2013-05-28 14:44:37 +00:00
Steven Hartland
6fb5c84ea2 Added output of device QUIRKS for CAM and AHCI devices during boot.
Reviewed by:	mav
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-05-18 23:36:21 +00:00
Eitan Adler
883db1c1d9 Intel's 320-series and 510-series SSDs advertise 512-byte sector
sizes for both logical and physical. Add ADA_Q_4K quirks
for both.

PR:		kern/178040
Submitted by:	Jeremy Chadwick <jdc@koitsu.org>
2013-05-11 23:13:49 +00:00
Alexander Motin
2406f9e41b Disable sending Early R_OK on SiI3726/SiI3826 port multipliers.
With "cached read" HDD testing and multiple ports busy on a SATA
host controller, 3726/3826 PMP will very rarely drop a deferred
R_OK that was intended for the host. The symptom is that all 5 drives
under test time out, get reset, and recover.

Submitted by:	Rich Futyma <rich.futyma@sanmina.com>
MFC after:	2 weeks
2013-05-11 13:21:31 +00:00
Alexander Motin
3d6dd54e2f Rework r250298 in a more correct way. 2013-05-06 16:50:39 +00:00
Alexander Motin
5ab64734f3 Fix byte order of ATA WWN when converting it to SCSI LUN ID. 2013-05-06 15:58:53 +00:00
Steven Hartland
62cc3a6314 Correct comment typos
Add missing comment

Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-04-28 21:14:23 +00:00
Alexander Motin
7338ef1a6b MFprojects/camlock r249542:
Remove ADA_FLAG_PACK_INVALID flag. Since ATA disks have no concept of media
change it only duplicates CAM_PERIPH_INVALID flag, so we can use the latter.

Slightly clean up DA_FLAG_PACK_INVALID use.
2013-04-27 12:46:04 +00:00
Steven Hartland
90edda31ba Added automatic detection of non-rotating media which disables the
use of BIO queue sorting, hence optimising performance for devices
such as SSDs

Reviewed by:	scottl
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-04-26 16:31:03 +00:00
Steven Hartland
9fe9ba5bef Teach GEOM and CAM about the difference between the max "size" of r/w and delete
requests.

sys/geom/geom_disk.h:
        - Added d_delmaxsize which represents the maximum size of individual
          device delete requests in bytes. This can be used by devices to
          inform geom of their size limitations regarding delete operations
          which are generally different from the read / write limits as data
          is not usually transferred from the host to physical device.

sys/geom/geom_disk.c:
        - Use new d_delmaxsize to calculate the size of chunks passed through to
          the underlying strategy during deletes instead of using read / write
          optimised values. This defaults to d_maxsize if unset (0).

        - Moved d_maxsize default up so it can be used to default d_delmaxsize

sys/cam/ata/ata_da.c:
        - Added d_delmaxsize calculations for TRIM and CFA

sys/cam/scsi/scsi_da.c:
        - Added re-calculation of d_delmaxsize whenever delete_method is set.

        - Added kern.cam.da.X.delete_max sysctl which allows the max size for
          delete requests to be limited. This is useful in preventing timeouts
          on devices whose delete methods are slow. It should be noted that
          this limit is reset when the device delete method is changed and
          that it can only be lowered, not increased, from the device max.

Reviewed by:	mav
Approved by:	pjd (mentor)
2013-04-26 16:22:54 +00:00
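The d_delmaxsize default and its effect on delete chunking, as described in 9fe9ba5bef, can be sketched as follows. The field names come from the commit message; the structure and helper functions are hypothetical, and d_maxsize is assumed to be non-zero.

```c
#include <stdint.h>

/* Only the two size limits from struct disk that matter here. */
struct disk_limits {
	uint64_t	d_maxsize;	/* max read/write request, bytes */
	uint64_t	d_delmaxsize;	/* max delete request, bytes; 0 = unset */
};

/* An unset d_delmaxsize falls back to d_maxsize. */
static uint64_t
effective_delmaxsize(const struct disk_limits *d)
{
	return d->d_delmaxsize != 0 ? d->d_delmaxsize : d->d_maxsize;
}

/* Number of requests a delete of the given length is split into. */
static uint64_t
delete_chunks(const struct disk_limits *d, uint64_t length)
{
	uint64_t max = effective_delmaxsize(d);

	return (length + max - 1) / max;
}
```

With a 128 KiB d_maxsize and no delete limit set, a 1 MiB delete is split into 8 requests. A driver that advertises a 4 MiB d_delmaxsize gets the same delete as a single request.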
Steven Hartland
c213c55153 Updated TRIM calculations in cam/ata to be based off ATA_DSM_* defines
Reviewed by:	mav
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-04-26 15:59:19 +00:00
Alexander Motin
e5dfa058da MFprojects/camlock r248982:
Stop abusing xpt_periph in random places that really have no periph related
to CCB, for example, bus scanning.  NULL value is fine in such cases and it
is correctly logged in debug messages as "noperiph".  If at some point we
need some real XPT periphs (alike to pmpX now), quite likely they will be
per-bus, and not a single global instance as xpt_periph now.
2013-04-14 09:55:48 +00:00
Alexander Motin
cccf422080 MFprojects/camlock r248890, r248897, r248898, r248900, r248903, r248905,
r248917, r248918, r248978, r249001, r249014, r249030:

Remove multilevel freezing mechanism, implemented to handle specifics of
the ATA/SATA error recovery, when post-reset recovery commands should be
allocated when queues are already full of payload requests.  Instead of
removing frozen CCBs with specified range of priorities from the queue
to provide free openings, use a simple hack, allowing explicit CCB
over-allocation for requests with priority higher (numerically lower) than
the CAM_PRIORITY_OOB threshold.

Simplify CCB allocation logic by removing SIM-level allocation queue.
After that SIM-level queue manages only CCBs execution, while allocation
logic is localized within each single device.

Suggested by:	gibbs
2013-04-14 09:28:14 +00:00
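The over-allocation rule from cccf422080 can be reduced to a sketch like this one. The threshold value is a placeholder and the function is hypothetical; only the rule itself (priorities numerically below CAM_PRIORITY_OOB may exceed the openings) is taken from the commit message.

```c
#include <stdbool.h>

/* Placeholder value for this sketch; the kernel defines the real one. */
#define CAM_PRIORITY_OOB	16u

/*
 * Ordinary requests need a free opening.  Requests with priority higher
 * (numerically lower) than CAM_PRIORITY_OOB, such as post-reset recovery
 * commands, may be allocated even when no openings are left.
 */
static bool
may_allocate_ccb(int free_openings, unsigned priority)
{
	if (free_openings > 0)
		return true;
	return priority < CAM_PRIORITY_OOB;
}
```

This replaces the older approach of pulling frozen CCBs back off the queue to make room, at the cost of letting the queue briefly exceed its nominal depth.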
Alexander Motin
a4f17f083f MFprojects/camlock r248894:
Use full freeze while PMP does hard reset. This is only a cosmetic change.
2013-04-13 14:03:44 +00:00
Kenneth D. Merry
0ba1e4d063 Add a callback to the ada(4) driver so that it knows when GEOM has released
references to it.

This is the functional equivalent to change r237518, which added this
functionality to the cd(4) and da(4) drivers.

This fix prevents a panic caused by GEOM calling adaopen() while the device
is going away.  We now keep the device around until GEOM has finished
cleaning up its state.

ata_da.c:	In adaregister(), add a d_gone callback to the GEOM disk
		structure registered for the ada driver.  Increment the
		peripheral reference count for GEOM.

		Add a new callback, adadiskgonecb(), that GEOM calls when
		it is done with its resources.  This callback releases the
		reference acquired in adaregister().

Submitted by:	Po-Li Soong
Sponsored by:	Spectra Logic
MFC After:	5 days
2013-04-10 22:12:21 +00:00
Marius Strobl
d2ce15bd43 - With the demise of !ATA_CAM, ATA_STATIC_ID is the only ata(4) related
option left but actually consumed by ada(4), so move it to opt_ada.h
  and get rid of opt_ata.h.
- Fix stand-alone build of atacore(4) by adding opt_cam.h.
- Use __FBSDID.
- Use DEVMETHOD_END.
- Use NULL instead of 0 for pointers.
2013-04-06 19:12:49 +00:00
Alexander Motin
6bf435dc39 Replicate r245306 from SCSI to ATA. The problem didn't appear so far,
covered by multilevel freeze mechanism, but it is better to be safe.
2013-04-06 17:14:56 +00:00
Marius Strobl
2e1eb33217 Unbreak ATA_NO_48BIT_DMA with ATA_CAM by treating 48-bit DMA as an
optional property with PATA transport.

Reviewed by:	mav
MFC after:	3 days
2013-04-06 13:39:02 +00:00
Alexander Motin
dcdf6e7418 MFprojects/camlock:
r249017:
Some cosmetic things:
 - Unify device to target insertion inside xpt_alloc_device() instead of
duplicating it three times.
 - Remove extra checks for empty lists of devices and targets on release
since zero refcount check also implies it.
 - Reformat code to reduce indentation.

r249103:
 - Add lock assertions to every point where reference counters are modified.
 - When reference counters are reaching zero, add assertions that there are
no children items left.
 - Add a bit more locking to the xptpdperiphtraverse().
2013-04-04 20:31:40 +00:00
Alexander Motin
edec59d99e MFprojects/camlock r248931:
Replace some direct mutex operations with wrappers.

MFC after:	2 weeks
2013-04-04 19:07:37 +00:00
Alexander Motin
f86141290c MFprojects/camlock r248930:
Remove extra NULL checks. d_drv1 can never be NULL during periph life cycle.

MFC after:	2 weeks
2013-04-04 19:04:15 +00:00
Alexander Motin
45f6d66569 Remove all legacy ATA code parts, not used since options ATA_CAM was enabled in
most kernels before FreeBSD 9.0.  Remove such modules and respective kernel
options: atadisk, ataraid, atapicd, atapifd, atapist, atapicam.  Remove the
atacontrol utility and some man pages.  Remove the now useless options ATA_CAM.

No objections:	current@, stable@
MFC after:	never
2013-04-04 07:12:24 +00:00
Alexander Motin
d6794b7067 Add xpt_release_ccb() calls missed in r248872. Their absence made shutdown -p hang
on controllers with a small number of queue slots and several disks connected.
2013-04-03 11:30:18 +00:00
Steven Hartland
5f83aee5e5 Adds the ability to enable / disable sorting of BIO requests queued within
CAM. This can significantly improve performance particularly for SSDs
which don't suffer from seek latencies.

The sysctl / tunable kern.cam.sort_io_queues provides the systems default
setting where:-
0 = queued BIOs are NOT sorted
1 = queued BIOs are sorted (default)

Each device gets its own sysctl kern.cam.<type>.<id>.sort_io_queue
Valid values are:-
-1 = use system default (default)
0 = queued BIOs are NOT sorted
1 = queued BIOs are sorted

Note: An additional patch will look to add automatic use of non-sorted queues
for non-rotating media, e.g. SSDs.

Reviewed by:	scottl
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-29 22:58:15 +00:00
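How the per-device setting from 5f83aee5e5 falls back to the system-wide one can be sketched as below. The function name is hypothetical; the values -1, 0 and 1 and their meanings are those listed in the commit message.

```c
#include <stdbool.h>

/*
 * system_default: value of kern.cam.sort_io_queues (0 or 1).
 * per_device:     value of kern.cam.<type>.<id>.sort_io_queue
 *                 (-1 = use system default, 0 = unsorted, 1 = sorted).
 */
static bool
bio_sorting_enabled(int system_default, int per_device)
{
	if (per_device < 0)
		return system_default != 0;
	return per_device != 0;
}
```

A device left at -1 follows the global tunable, so an administrator can, for example, keep sorting on globally and switch it off for a single SSD.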
Alexander Motin
09cfadbe7f Make pre-shutdown flush and spindown routines not use xpt_polled_action(),
but execute the commands in the regular way.  There is no reason to burn CPU
while the system is still fully operational.  After this change polling in
CAM is used only for kernel dumping.
2013-03-29 08:33:18 +00:00
Alexander Motin
f371c9e260 Implement CAM_PERIPH_FOREACH() macro, safely iterating over the list of
driver's periphs, acquiring and releasing periph references while doing it.

Use it to iterate over the lists of ada and da periphs when flushing caches
and putting devices to sleep on shutdown and suspend.  Previous code could
panic in theory if some device disappeared in the middle of the process.
2013-03-29 07:50:47 +00:00
Alexander Motin
6d14d0d010 Remove two bzero()s that are erasing only a few more bytes than are set later. 2013-03-25 06:31:17 +00:00
Konstantin Belousov
abc1e60e0e Support unmapped i/o for the md(4).
The vnode-backed md(4) has to map the unmapped bio because VOP_READ()
and VOP_WRITE() interfaces do not allow to pass unmapped requests to
the filesystem. Vnode-backed md(4) uses pbufs instead of relying on
the bio_transient_map, to avoid usual md deadlock.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho, scottl
2013-03-19 15:01:50 +00:00
Alexander Motin
6efe203d7c Hide SEMB port of the SiI3826 Port Multiplier by default to avoid extra
errors while it tries to talk via I2C to usually missing external SEP.
There is a tunable to re-enable it when needed.
2013-02-22 19:53:12 +00:00
Alexander Motin
09dff10118 Fix problem with the Samsung 840 PRO series SSD detection.
The device reports support for SATA Asynchronous Notification in its
IDENTIFY data, but returns error on attempt to enable that feature.
Make SATA XPT of CAM only report these errors, but not fail the device.

MFC after:	1 week
2012-11-26 20:07:10 +00:00
Eitan Adler
9d3334e191 Adds 4K quirks for some SSDs which all perform better when 4K
aligned and only accept 4K deletes (TRIM).

PR:		kern/169974
Submitted by:	Steven Hartland <steven.hartland@multiplay.co.uk>
Tested by:	ak
Reviewed by:	mav
Approved by:	cperciva (implicit)
MFC after:	1 week
2012-10-20 15:30:14 +00:00
Alexander Motin
6884b66275 Protect xpt_getattr() calls with the SIM lock and assert that.
Submitted by:	ken@ (earlier version)
2012-10-12 17:18:24 +00:00