freebsd-nq

Author	SHA1	Message	Date
Alexander Motin	8c6d5f8282	Introduce separate mutex lock to protect CTL I/O pools, slightly reducing global CTL lock scope and congestion. While there, simplify CTL I/O pools KPI, hiding implementation details.	2013-11-11 08:27:20 +00:00
Alexander Motin	daa5487f36	Some CAM locks polishing: - Fix LOR and possible lock recursion when handling high-power commands. Introduce new lock to protect left power quota and list of frozen devices. - Correct locking around xpt periph creation. - Remove the apparently never used XPT_FLAG_OPEN xpt periph flag.	2013-11-10 12:16:09 +00:00
Steven Hartland	e2b8af8404	Corrected definition for old_rate to match d_rotation_rate MFC after: 2 Days X-MFC-With: r256956	2013-11-07 23:21:52 +00:00
Alexander Motin	9813c936c4	Fix lock recursion, triggered by `smartctl -a /dev/adaX`.	2013-11-01 00:14:15 +00:00
Nathan Whitehorn	abe8350519	printf() specifier updates to CAM to handle either 32-bit or 64-bit lun_id_t. MFC after: 2 weeks	2013-10-30 14:13:15 +00:00
Nathan Whitehorn	ef5758fa10	Implement extended LUN support. If PIM_EXTLUNS is set by a SIM, encode the upper 32-bits of the LUN, if possible, into the target_lun field as passed directly from the REPORT LUNs response. This allows extended LUN support to work for all LUNs with zeros in the lower 32-bits, which covers most addressing modes without breaking KBI. Behavior for drivers not setting PIM_EXTLUNS is unchanged. No user-facing interfaces are modified. Extended LUNs are stored with swizzled 16-bit word order so that, for devices implementing LUN addressing (like SCSI-2), the numerical representation of the LUN is identical with and without PIM_EXTLUNS. Thus setting PIM_EXTLUNS keeps most behavior, and user-facing LUN IDs, unchanged. This follows the strategy used in Solaris. A macro (CAM_EXTLUN_BYTE_SWIZZLE) is provided to transform a lun_id_t into a uint64_t ordered for the wire. This is the second part of work for full 64-bit extended LUN support and is designed to be a bridge for stable/10 to the final 64-bit LUN code. The third and final part will involve widening lun_id_t to 64 bits and will not be MFCed. This third part will break the KBI but will keep the KPI unchanged so that all drivers that will care about this can be updated now and not require code changes between HEAD and stable/10. Reviewed by: scottl MFC after: 2 weeks	2013-10-29 15:36:58 +00:00
Alexander Motin	030844d1e7	Some microoptimizations for da and ada drivers: - Replace ordered_tag_count counter with single flag; - From da remove outstanding_cmds counter, duplicating pending_ccbs list; - From da_softc remove unused links field.	2013-10-24 14:05:44 +00:00
Alexander Motin	eeb9405409	Remove 128KB bzero() call done for every block I/O data buffer. On my tests this improves performance of the new iSCSI target backed by GEOM STRIPE of SSDs from 75K to 110K IOPS. Reviewed by: ken	2013-10-23 17:55:35 +00:00
Alexander Motin	ec08c07e8c	Minor (mostly cosmetic) addition to r256960.	2013-10-23 14:58:09 +00:00
Alexander Motin	f1486b5163	Move CAM_UNQUEUED_INDEX setting to the last moment and under the periph lock. This fixes race condition with cam_periph_ccbwait(), causing use-after-free.	2013-10-23 12:53:05 +00:00
Steven Hartland	c28078e903	Improve ZFS N-way mirror read performance by using load and locality information. The existing algorithm selects a preferred leaf vdev based on offset of the zio request modulo the number of members in the mirror. It assumes the devices are of equal performance and that spreading the requests randomly over both drives will be sufficient to saturate them. In practice this results in the leaf vdevs being underutilized. The new algorithm takes into account the following additional factors: * Load of the vdevs (number of outstanding I/O requests) * The locality of the last queued I/O vs the new I/O request. Within the locality calculation, additional knowledge about the underlying vdev is considered, such as whether the device backing the vdev is a rotating media device. This results in performance increases across the board as well as significant increases for predominantly streaming loads and for configurations which don't have evenly performing devices. The following are results from a setup with 3 Way Mirror with 2 x HD's and 1 x SSD from a basic test running multiple parallel dd's.
With pre-fetch disabled (vfs.zfs.prefetch_disable=1): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s With pre-fetch enabled (vfs.zfs.prefetch_disable=0): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s In addition to the performance changes the code was also restructured, with the help of Justin Gibbs, to provide a more logical flow which also ensures vdevs loads are only calculated from the set of valid candidates. The following additional sysctls were added to allow the administrator to tune the behaviour of the load algorithm: * vfs.zfs.vdev.mirror.rotating_inc * vfs.zfs.vdev.mirror.rotating_seek_inc * vfs.zfs.vdev.mirror.rotating_seek_offset * vfs.zfs.vdev.mirror.non_rotating_inc * vfs.zfs.vdev.mirror.non_rotating_seek_inc These changes were based on work started by the zfsonlinux developers: https://github.com/zfsonlinux/zfs/pull/1487 Reviewed by: gibbs, mav, will MFC after: 2 weeks Sponsored by: Multiplay	2013-10-23 09:54:58 +00:00
Alexander Motin	3231e8bddb	Fix memory and references leak due to unfreed path. Coverity CID: 1054773	2013-10-22 13:56:30 +00:00
Alexander Motin	8ec5ab3f16	Unconditionally acquire periph reference on CCB allocation failure. cam_periph_acquire() can return error if periph already invalidated, but that may be unacceptable and cause deadlock if the invalidated periph can't be destroyed without "executing" the scheduled request. Coverity CID: 1109822 MFC after: 2 months	2013-10-22 12:58:22 +00:00
Alexander Motin	40ea77a036	Merge GEOM direct dispatch changes from the projects/camlock branch. When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O. The currently defined safety requirements are: - caller should not hold any locks and should be reentrant; - callee should not depend on GEOM dual-threaded concurrency semantics; - on the way down, if request is unmapped while callee doesn't support it, the context should be sleepable; - kernel thread stack usage should be below 50%. To keep compatibility with GEOM classes not meeting above requirements new provider and consumer flags added: - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request); - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done); - G_PF_DIRECT_SEND -- provider code meets caller requirements (done); - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request). Capable GEOM class can set them, allowing direct dispatch in cases where it is safe. If any of requirements are not met, request is queued to g_up or g_down thread same as before. Such GEOM classes were reviewed and updated to support direct dispatch: CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE, VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL, MAP, FLASHMAP, etc). To declare direct completion capability disk(9) KPI got new flag equivalent to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk drivers got it set now thanks to earlier CAM locking work. This change more than doubles peak block storage performance on systems with many CPUs, together with earlier CAM locking changes reaching more than 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to 256 user-level threads). Sponsored by: iXsystems, Inc.
MFC after: 2 months	2013-10-22 08:22:19 +00:00
Alexander Motin	227d67aa54	Merge CAM locking changes from the projects/camlock branch to radically reduce lock congestion and improve SMP scalability of the SCSI/ATA stack, preparing the ground for the upcoming GEOM direct dispatch support. Replace big per-SIM locks with a bunch of smaller ones: - per-LUN locks to protect device and peripheral drivers state; - per-target locks to protect list of LUNs on target; - per-bus locks to protect reference counting; - per-send queue locks to protect queue of CCBs to be sent; - per-done queue locks to protect queue of completed CCBs; - remaining per-SIM locks now protect only HBA driver internals. While holding LUN lock it is allowed (while not recommended for performance reasons) to take SIM lock. The opposite acquisition order is forbidden. All the other locks are leaf locks, that can be taken anywhere, but should not be cascaded. Many functions, such as: xpt_action(), xpt_done(), xpt_async(), xpt_create_path(), etc. no longer require (but allow) SIM lock to be held. To keep compatibility and solve cases where SIM lock can't be dropped, all xpt_async() calls in addition to xpt_done() calls are queued to completion threads for async processing in clean environment without SIM lock held. Instead of single CAM SWI thread, used for commands completion processing before, use multiple (depending on number of CPUs) threads. Load balanced between them using "hash" of the device B:T:L address. HBA drivers that can drop SIM lock during completion processing and have sufficient number of completion threads to efficiently scale to multiple CPUs can use new function xpt_done_direct() to avoid extra context switch. Make the ahci(4) driver use this mechanism depending on hardware setup. Sponsored by: iXsystems, Inc. MFC after: 2 months	2013-10-21 12:00:26 +00:00
Alexander Motin	2030b2943b	MFprojects/camlock: Remove hard limit on number of BIOs handled with one ATA TRIM request.	2013-10-21 08:57:27 +00:00
Alexander Motin	8d36a71b76	Unify periph invalidation and destruction reporting. Print message containing device model and serial number on invalidation. Requested by: glebius MFC after: 1 week	2013-10-15 17:59:41 +00:00
Steven Hartland	d85805b291	Added 4K quirks for Corsair Neutron GTX SSD's	2013-10-15 17:03:02 +00:00
Alexander Motin	aa93041d2c	Unhide "Serial Number" lines from bootverbose. That information may be useful for system administration to have in hard copy (in logs) if one of several devices suddenly dies. Requested by: glebius MFC after: 1 week	2013-10-15 12:59:40 +00:00
Edward Tomasz Napierala	c48f182bb1	Remove no longer useful debugging output and a stale comment. Approved by: re (gjb) Sponsored by: FreeBSD Foundation	2013-10-09 17:34:45 +00:00
Edward Tomasz Napierala	02147e9cd0	Make the error handling more consistent. Shouldn't make any functional difference. Approved by: re (gjb) Sponsored by: FreeBSD Foundation	2013-10-09 17:06:03 +00:00
Edward Tomasz Napierala	8ba0396077	Tidy up, cache return value of a function, and add an assertion; shouldn't make any functional difference. Approved by: re (gjb) Sponsored by: FreeBSD Foundation	2013-10-09 16:55:52 +00:00
Alexander Motin	6dfc67e379	Close the race on path ID allocation in xpt_bus_register() if two buses are registered simultaneously. Due to topology unlock between the ID allocation and the bus registration there is a chance that two buses may get the same IDs. That is the supposed reason of lock assertion panic in CAM during initial bus scanning after new iscsid initiates two sessions at the same time. Reported by: trasz Approved by: re (glebus, marius) MFC after: 2 weeks	2013-10-09 12:09:01 +00:00
Edward Tomasz Napierala	1008ac5eb7	Fix NOP-In/NOP-Out payload handling. Previous way didn't work at all; fortunately nothing seems to actually use this feature, but it's required by standard. Approved by: re (glebius) Sponsored by: FreeBSD Foundation	2013-10-09 12:03:04 +00:00
Edward Tomasz Napierala	a9c80a534a	Properly fix out of memory handling in the iSCSI target. Approved by: re (glebius) Sponsored by: FreeBSD Foundation	2013-10-08 19:18:02 +00:00
Edward Tomasz Napierala	0f30c5d3c0	Split cfiscsi_datamove() in two; no functional changes. Approved by: re (glebius) Sponsored by: FreeBSD Foundation	2013-10-05 16:22:33 +00:00
Edward Tomasz Napierala	c28c09c1f0	Don't leak memory when removing an unconnected session, and remove useless UMA_ZONE_NOFREE that caused another leak when unloading the module. Approved by: re (glebius) Sponsored by: FreeBSD Foundation	2013-10-04 19:31:41 +00:00
Nathan Whitehorn	b559575358	Make sure the CCB xflags field is initialized to zero so that CAM_EXTLUN_VALID is not erroneously set. Also add an XPORT_SRP identifier to the known SCSI transports for the SCSI RDMA protocol, as used, for example with Infiniband storage. Reviewed by: scottl Approved by: re (marius)	2013-09-27 16:02:40 +00:00
Scott Long	f564de00f7	Re-do r255853. Along with adding back the API/ABI changes from the original, this hides the contents of cam_compat.h from ktrace/kdump/truss, avoiding problems there. There are no user-serviceable parts in there, so no need for those tools to be groping around in there. Approved by: re	2013-09-25 15:55:56 +00:00
Glen Barber	0082e54e9d	Revert r255853 pending fixes to build errors in usr.bin/kdump Approved by: re (implicit)	2013-09-25 01:48:45 +00:00
Scott Long	185884259b	Update the CAM API for FreeBSD 10: - Remove the timeout_ch field. It's been deprecated since FreeBSD 7.0; MPSAFE drivers should be managing their own timeout storage. The remaining non-MPSAFE drivers have been modified to also manage their own storage, and should be considered for updating to MPSAFE (or removal) during the FreeBSD 10.x lifecycle. - Add fields related to soft timeouts and quality of service, to be used in upcoming work. - Add room for more flags in the CCB header and path_inq structures. - Begin support for extended 64-bit LUNs. - Bump the CAM version number to 0x18, but add compat shims. Tested with camcontrol and smartctl. Reviewed by: nathanw, ken, kib Approved by: re Obtained from: Netflix	2013-09-24 16:50:53 +00:00
Edward Tomasz Napierala	9606f568fe	Properly ignore PDUs with CmdSN outside of allowed range. Approved by: re (glebius) Sponsored by: FreeBSD Foundation	2013-09-24 13:46:13 +00:00
Edward Tomasz Napierala	69aa56bef2	Fix a few instances of M_WAITOK in threads marked as prohibited from sleep, missed in r255824. Approved by: re (kib) Sponsored by: FreeBSD Foundation	2013-09-24 09:33:31 +00:00
Edward Tomasz Napierala	46aaea8995	Don't use M_WAITOK when running from context where sleeping is prohibited, such as callout or a geom thread. Approved by: re (marius) Sponsored by: FreeBSD Foundation	2013-09-23 19:54:44 +00:00
Edward Tomasz Napierala	ac873bb350	Add some spare fields to structs used by the new iSCSI stack - some just in case, some for future MC/S support. This requires kernel and world rebuild. Approved by: re (blanket) Sponsored by: FreeBSD Foundation	2013-09-20 21:26:51 +00:00
Edward Tomasz Napierala	009ea47eb2	Bring in the new iSCSI target and initiator. Reviewed by: ken (parts) Approved by: re (delphij) Sponsored by: FreeBSD Foundation	2013-09-14 15:29:06 +00:00
Alexander Motin	f9004a5db0	Make SES driver adequately react on simple enclosure devices -- read Short Enclosure status to enclosure status field, clear previous state and exit.	2013-09-06 15:41:37 +00:00
Bryan Venteicher	ffead710d5	Add camcontrol support for the SCSI sanitize command Reviewed by: ken, mjacob (earlier version) Sponsored by: Netapp	2013-09-06 15:19:57 +00:00
Alexander Motin	d7a52e7b49	Fix kernel panic if cache->nelms is zero. MFC after: 2 weeks	2013-09-06 14:31:52 +00:00
Alexander Motin	0d4f3c316e	Add debug trace points for freeze/release device queue.	2013-09-01 17:37:19 +00:00
Alexander Motin	1d64933fe2	Bring legacy CAM target implementation back into API/KPI-coherent and even functional state. While CTL is a far superior target in every respect, there is no reason why this code should not work. Tested with ahc(4) as target side HBA. MFC after: 2 weeks	2013-09-01 13:01:59 +00:00
Alexander Motin	f017ca80b1	Fix SES_ENABLE_PASSTHROUGH kernel option, unexpectedly broken during driver overhaul. MFC after: 3 days	2013-09-01 12:18:44 +00:00
Alexander Motin	d1d536f0eb	Fix targbh crash on XPT_IMMED_NOTIFY error during attach.	2013-09-01 11:50:37 +00:00
Alexander Motin	27492bea85	Fix the build with CTLFEDEBUG, broken by unmapped I/O support changes.	2013-09-01 10:11:00 +00:00
Kenneth D. Merry	ee5bd4fc5a	Bump up the default timeouts for move commands in the ch(4) driver to 15 minutes, and 5 minutes for things like READ ELEMENT STATUS. This is needed to account for the worst case scenarios on at least some Spectra Logic tape libraries. Sponsored by: Spectra Logic MFC after: 3 days	2013-08-29 21:25:27 +00:00
Kenneth D. Merry	73825c1732	If a drive returns ASC/ASCQ 0x04,0x11 "Logical unit not ready, notify (enable spinup) required", instead of doing the normal retries, poll for a change in status. We will poll every half second for a minute for the status to change. Hitachi drives (and likely other SAS drives) return that ASC/ASCQ when they are waiting to spin up. What it means is that they are waiting for the SAS expander to send them the SAS NOTIFY (ENABLE SPINUP) primitive. That primitive is the mechanism expanders/enclosures use to sequence drive spinup to avoid overloading power supplies. Sponsored by: Spectra Logic MFC after: 3 days	2013-08-27 19:47:03 +00:00
Alexander Motin	40f27d7cf6	Add new attribute lunname to report only textual LUN-specific device IDs. While lunid attribute prefers to report numeric ones, having both may be useful in some situations.	2013-08-24 09:42:14 +00:00
Kenneth D. Merry	93729c1796	Add support to physio(9) for devices that don't want I/O split and configure sa(4) to request no I/O splitting by default. For tape devices, the user needs to be able to clearly understand what blocksize is actually being used when writing to a tape device. The previous behavior of physio(9) was that it would split up any I/O that was too large for the device, or too large to fit into MAXPHYS. This means that if, for instance, the user wrote a 1MB block to a tape device, and MAXPHYS was 128KB, the 1MB write would be split into 8 128K chunks. This would be done without informing the user. This has suboptimal effects, especially when trying to communicate status to the user. In the event of an error writing to a tape (e.g. physical end of tape) in the middle of a 1MB block that has been split into 8 pieces, the user could have the first two 128K pieces written successfully, the third returned with an error, and the last 5 returned with 0 bytes written. If the user is using a standard write(2) system call, all he will see is the ENOSPC error. He won't have a clue how much actually got written. (With a writev(2) system call, he should be able to determine how much got written in addition to the error.) The solution is to prevent physio(9) from splitting the I/O. The new cdev flag, SI_NOSPLIT, tells physio that the driver does not want I/O to be split beforehand. Although the sa(4) driver now enables SI_NOSPLIT by default, that can be disabled by two loader tunables for now. It will not be configurable starting in FreeBSD 11.0. kern.cam.sa.allow_io_split allows the user to configure I/O splitting for all sa(4) driver instances. kern.cam.sa.%d.allow_io_split allows the user to configure I/O splitting for a specific sa(4) instance. There are also now three sa(4) driver sysctl variables that let the users see some sa(4) driver values. kern.cam.sa.%d.allow_io_split shows whether I/O splitting is turned on. 
kern.cam.sa.%d.maxio shows the maximum I/O size allowed by kernel configuration parameters (e.g. MAXPHYS, DFLTPHYS) and the capabilities of the controller. kern.cam.sa.%d.cpi_maxio shows the maximum I/O size supported by the controller. Note that a better long term solution would be to implement support for chaining buffers, so that MAXPHYS is no longer a limiting factor for I/O size to tape and disk devices. At that point, the controller and the tape drive would become the limiting factors. sys/conf.h: Add a new cdev flag, SI_NOSPLIT, that allows a driver to tell physio not to split up I/O. sys/param.h: Bump __FreeBSD_version to 1000049 for the addition of the SI_NOSPLIT cdev flag. kern_physio.c: If the SI_NOSPLIT flag is set on the cdev, return any I/O that is larger than si_iosize_max or MAXPHYS, has more than one segment, or would have to be split because of misalignment with EFBIG. (File too large). In the event of an error, print a console message to give the user a clue about what happened. scsi_sa.c: Set the SI_NOSPLIT cdev flag on the devices created for the sa(4) driver by default. Add tunables to control whether we allow I/O splitting in physio(9). Explain in the comments that allowing I/O splitting will be deprecated for the sa(4) driver in FreeBSD 11.0. Add sysctl variables to display the maximum I/O size we can do (which could be further limited by read block limits) and the maximum I/O size that the controller can do. Limit our maximum I/O size (recorded in the cdev's si_iosize_max) by MAXPHYS. This isn't strictly necessary, because physio(9) will limit it to MAXPHYS, but it will provide some clarity for the application. Record the controller's maximum I/O size reported in the Path Inquiry CCB. sa.4: Document the block size behavior, and explain that the option of allowing physio(9) to split the I/O will disappear in FreeBSD 11.0. Sponsored by: Spectra Logic	2013-08-24 04:52:22 +00:00
Edward Tomasz Napierala	81a2151d5c	CTL changes required for iSCSI target, most notably LUN remapping and a mechanism to allow CTL frontends for retrieving LUN options. Reviewed by: ken (earlier version)	2013-08-24 01:50:31 +00:00
Edward Tomasz Napierala	83fd94a416	Fix the (unused for now) SCSI_PROTO_iSCSI define to match style(9).	2013-08-21 07:45:47 +00:00
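
Note on commit ef5758fa10 (extended LUN support): the message says extended LUNs are stored with swizzled 16-bit word order and that CAM_EXTLUN_BYTE_SWIZZLE converts a lun_id_t into a uint64_t ordered for the wire. The sketch below shows one way such a swizzle could look, assuming it means reversing the order of the four 16-bit words of a 64-bit value. The function name extlun_word_swizzle is hypothetical and is not the kernel macro; the macro's actual definition should be checked in the CAM headers.

```c
#include <stdint.h>

/*
 * Hypothetical illustration, not the FreeBSD macro: reverse the order of
 * the four 16-bit words in a 64-bit LUN value. This assumes "swizzled
 * 16-bit word order" means a full word-order reversal.
 */
uint64_t
extlun_word_swizzle(uint64_t lun)
{
	return (((lun & 0xffff000000000000ULL) >> 48) |	/* word 3 -> word 0 */
	    ((lun & 0x0000ffff00000000ULL) >> 16) |	/* word 2 -> word 1 */
	    ((lun & 0x00000000ffff0000ULL) << 16) |	/* word 1 -> word 2 */
	    ((lun & 0x000000000000ffffULL) << 48));	/* word 0 -> word 3 */
}
```

Under this assumption a small LUN number such as 1 maps to 0x0001000000000000, with the LUN in the first 16-bit word, and applying the function twice returns the original value.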

1287 Commits