freebsd-dev

Author	SHA1	Message	Date
Mariusz Zaborski	c27fb0b589	The kern.geom.part.auto_resize should be tunable.	2017-02-28 20:51:20 +00:00
Mariusz Zaborski	01ad653a81	Add sysctl to control auto resize of the GEOM metadata. Reviewed by: AllanJude Differential Revision: https://reviews.freebsd.org/D9603	2017-02-27 17:54:01 +00:00
Marius Strobl	4874af73c1	- Allow different slicers for different flash types to be registered with geom_flashmap(4) and teach it about MMC for slicing enhanced user data area partitions. The FDT slicer still is the default for CFI, NAND and SPI flash on FDT-enabled platforms. - In addition to a device_t, also pass the name of the GEOM provider in question to the slicers as a single device may provide more than provider. - Build a geom_flashmap.ko. - Use MODULE_VERSION() so other modules can depend on geom_flashmap(4). - Remove redundant/superfluous GEOM routines that either do nothing or provide/just call default GEOM (slice) functionality. - Trim/adjust includes Submitted by: jhibbits (RouterBoard bits) Reviewed by: jhibbits	2017-02-22 10:21:39 +00:00
Allan Jude	85c15ab853	improve PBKDF2 performance The PBKDF2 in sys/geom/eli/pkcs5v2.c is around half the speed it could be GELI's PBKDF2 uses a simple benchmark to determine a number of iterations that will takes approximately 2 seconds. The security provided is actually half what is expected, because an attacker could use the optimized algorithm to brute force the key in half the expected time. With this change, all newly generated GELI keys will be approximately 2x as strong. Previously generated keys will talk half as long to calculate, resulting in faster mounting of encrypted volumes. Users may choose to rekey, to generate a new key with the larger default number of iterations using the geli(8) setkey command. Security of existing data is not compromised, as ~1 second per brute force attempt is still a very high threshold. PR: 202365 Original Research: https://jbp.io/2015/08/11/pbkdf2-performance-matters/ Submitted by: Joe Pixton <jpixton@gmail.com> (Original Version), jmg (Later Version) Reviewed by: ed, pjd, delphij Approved by: secteam, pjd (maintainer) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8236	2017-02-19 19:30:31 +00:00
John Baldwin	dcbe5188da	Defer startup of gjournal switcher kproc. Don't start switcher kproc until the first GEOM is created. Reviewed by: pjd MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8576	2017-02-07 22:45:59 +00:00
Andrey V. Elsukov	9ef6004352	Check that primary GPT header is valid before wiping partitioning. This allows safely destroy corrupted GPT when primary header was rewritten by some data, that do not want to destroy. MFC after: 1 week	2017-02-04 05:09:47 +00:00
Yoshihiro Takahashi	2b375b4edd	Remove pc98 support completely. I thank all developers and contributors for pc98. Relnotes: yes	2017-01-28 02:22:15 +00:00
Alexander Motin	d3fef0a092	Report disk addition errors on `add` or `create` subcommand. MFC after: 1 week	2017-01-20 13:49:04 +00:00
Alexander Motin	17160457b4	Report random flash storage as non-rotating to GEOM_DISK. While doing it, introduce respective constants in geom_disk.h. MFC after: 1 week	2017-01-12 08:53:10 +00:00
Conrad Meyer	b28ea2c250	g_raid: Prevent tasters from attempting excessively large reads Some g_raid tasters attempt metadata reads in multiples of the provider sectorsize. Reads larger than MAXPHYS are invalid, so detect and abort in such situations. Spiritually similar to r217305 / PR 147851. PR: 214721 Sponsored by: Dell EMC Isilon	2017-01-12 06:58:31 +00:00
Dimitry Andric	012039fd55	Fix logic error in gvinum's gv_set_sd_state() With clang 4.0.0, I'm getting the following warnings: sys/geom/vinum/geom_vinum_state.c:186:7: error: logical not is only applied to the left hand side of this bitwise operator [-Werror,-Wlogical-not-parentheses] if (!flags & GV_SETSTATE_FORCE) ^ ~ The logical not operator should obiously be called after masking. Reviewed by: mav, pfg MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D9093	2017-01-08 17:56:54 +00:00
Sepherosa Ziehau	c22dceff9d	build: Unbreak LINT Sponsored by: Microsoft	2016-12-21 01:39:11 +00:00
Konrad Witaszczyk	480f31c214	Add support for encrypted kernel crash dumps. Changes include modifications in kernel crash dump routines, dumpon(8) and savecore(8). A new tool called decryptcore(8) was added. A new DIOCSKERNELDUMP I/O control was added to send a kernel crash dump configuration in the diocskerneldump_arg structure to the kernel. The old DIOCSKERNELDUMP I/O control was renamed to DIOCSKERNELDUMP_FREEBSD11 for backward ABI compatibility. dumpon(8) generates an one-time random symmetric key and encrypts it using an RSA public key in capability mode. Currently only AES-256-CBC is supported but EKCD was designed to implement support for other algorithms in the future. The public key is chosen using the -k flag. The dumpon rc(8) script can do this automatically during startup using the dumppubkey rc.conf(5) variable. Once the keys are calculated dumpon sends them to the kernel via DIOCSKERNELDUMP I/O control. When the kernel receives the DIOCSKERNELDUMP I/O control it generates a random IV and sets up the key schedule for the specified algorithm. Each time the kernel tries to write a crash dump to the dump device, the IV is replaced by a SHA-256 hash of the previous value. This is intended to make a possible differential cryptanalysis harder since it is possible to write multiple crash dumps without reboot by repeating the following commands: # sysctl debug.kdb.enter=1 db> call doadump(0) db> continue # savecore A kernel dump key consists of an algorithm identifier, an IV and an encrypted symmetric key. The kernel dump key size is included in a kernel dump header. The size is an unsigned 32-bit integer and it is aligned to a block size. The header structure has 512 bytes to match the block size so it was required to make a panic string 4 bytes shorter to add a new field to the header structure. If the kernel dump key size in the header is nonzero it is assumed that the kernel dump key is placed after the first header on the dump device and the core dump is encrypted. Separate functions were implemented to write the kernel dump header and the kernel dump key as they need to be unencrypted. The dump_write function encrypts data if the kernel was compiled with the EKCD option. Encrypted kernel textdumps are not supported due to the way they are constructed which makes it impossible to use the CBC mode for encryption. It should be also noted that textdumps don't contain sensitive data by design as a user decides what information should be dumped. savecore(8) writes the kernel dump key to a key.# file if its size in the header is nonzero. # is the number of the current core dump. decryptcore(8) decrypts the core dump using a private RSA key and the kernel dump key. This is performed by a child process in capability mode. If the decryption was not successful the parent process removes a partially decrypted core dump. Description on how to encrypt crash dumps was added to the decryptcore(8), dumpon(8), rc.conf(5) and savecore(8) manual pages. EKCD was tested on amd64 using bhyve and i386, mipsel and sparc64 using QEMU. The feature still has to be tested on arm and arm64 as it wasn't possible to run FreeBSD due to the problems with QEMU emulation and lack of hardware. Designed by: def, pjd Reviewed by: cem, oshogbo, pjd Partial review: delphij, emaste, jhb, kib Approved by: pjd (mentor) Differential Revision: https://reviews.freebsd.org/D4712	2016-12-10 16:20:39 +00:00
Alexander Motin	b6fe583c55	Add `gmirror create` subcommand, alike to gstripe, gconcat, etc. It is quite specific mode of operation without storing on-disk metadata. It can be useful in some cases in combination with some external control tools handling mirror creation and disks hot-plug. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2016-11-30 09:27:08 +00:00
Alexander Motin	dc399583ba	Use providergone method to cover race between destroy and g_access(). Reviewed by: markj MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2016-11-13 03:56:26 +00:00
Alexander Motin	80f0a89c62	Do not report error on close even if we have no paths left. MFC after: 2 weeks	2016-11-12 18:57:38 +00:00
Bryan Drewery	28323add09	Fix improper use of "its". Sponsored by: Dell EMC Isilon	2016-11-08 23:59:41 +00:00
Conrad Meyer	8532d381a9	Add BUF_TRACKING and FULL_BUF_TRACKING buffer debugging Upstream the BUF_TRACKING and FULL_BUF_TRACKING buffer debugging code. This can be handy in tracking down what code touched hung bios and bufs last. The full history is especially useful, but adds enough bloat that it shouldn't be enabled in release builds. Function names (or arbitrary string constants) are tracked in a fixed-size ring in bufs. Bios gain a pointer to the upper buf for tracking. SCSI CCBs gain a pointer to the upper bio for tracking. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8366	2016-10-31 23:09:52 +00:00
Ruslan Bukin	ae8b1f90fe	Fix alignment issues on MIPS: align the pointers properly. All the 5520 GEOM_ELI tests passed successfully on MIPS64EB. Sponsored by: DARPA, AFRL Sponsored by: HEIF5 Differential Revision: https://reviews.freebsd.org/D7905	2016-10-31 16:55:14 +00:00
Mark Johnston	5c2ac5cf2a	gmirror: Add a subroutine to free synchronization BIOs. This addresses a memory leak that occurs upon an I/O error during a mirror synchronization. MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2016-10-20 23:08:40 +00:00
Mark Johnston	b450976dc2	gmirror: Release pending regular requests when synchronization stops. Normally gmirror allows colliding requests to proceed whenever a synchronization request completes and advances to the next offset. However if an I/O request collides with one of the final g_mirror_syncreqs, nothing releases it once synchronization completes, resulting in an apparent I/O hang. The same problem can occur if synchronization is aborted by an I/O error. Therefore, be sure to requeue pending requests when mirror synchronization is stopped for any reason. While here, remove some dead code from g_mirror_regular_release(). MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2016-10-20 23:02:30 +00:00
Alexander Motin	5a236b0ef9	Fix possible geom destruction before final provider close. Introduce internal counter to track opens. Using provider's counters is not very successfull after calling g_wither_provider(). MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2016-10-06 15:20:05 +00:00
Mark Johnston	4dea20be45	gmirror: Write an updated syncid before queuing writes. When a syncid bump is pending, any write to the mirror results in the updated syncid being written to each component's metadata block. However, the update was only being performed after the writes to the mirror componenents were queued. Instead, synchronously update the metadata block first. MFC after: 3 weeks Sponsored by: Dell EMC Isilon	2016-10-06 00:13:55 +00:00
Mark Johnston	903618cd65	gmirror: Bump the syncid if broken disks are found during startup. Consider a mirror with two components, m1 and m2. Suppose a hardware error results in the removal of m2, with m1's genid bumped. Suppose further that a replacement mirror component m3 is created and synchronized, after which the system is shut down uncleanly. During a subsequent bootup, if gmirror tastes m1 and m2 first, m2 will be removed from the mirror because it is broken, but the mirror will be started without bumping the syncid on m1 because all elements of the mirror are accounted for. Then m3 will be added to the already-running mirror with the same syncid as m1, so the components will not be synchronized despite the unclean shutdown. Handle this scenario by bumping the syncid of healthy components if any broken mirrors are discovered during mirror startup. MFC after: 3 weeks Sponsored by: Dell EMC Isilon	2016-10-06 00:05:45 +00:00
Mark Johnston	fff048e4bc	gmirror: Use bool instead of boolean_t. MFC after: 1 week Sponsored by: Dell EMC Isilon	2016-10-05 23:55:01 +00:00
Adrian Chadd	85ab1aeccf	[geom_redboot] Extend geom_redboot to handle non-zero fis offset. Submitted by: Mori Hiroki <yamori813@yahoo.co.jp> Differential Revision: https://reviews.freebsd.org/D7237	2016-10-04 16:35:38 +00:00
Alexander Motin	8b64f3ca6c	Use g_wither_provider() where applicable. It is just a helper function combining G_PF_WITHER setting with g_orphan_provider().	2016-09-23 21:29:40 +00:00
Edward Tomasz Napierala	0c4440c3aa	Follow up r305988 by removing g_bio_run_task and related code. The g_io_schedule_up() gets its "if" condition swapped to make it more similar to g_io_schedule_down(). Suggested by: mav@ Reviewed by: mav@ MFC after: 1 month	2016-09-20 09:18:33 +00:00
Edward Tomasz Napierala	bbdd6614bd	Remove unused bio_taskqueue(). MFC after: 1 month	2016-09-19 17:46:15 +00:00
Mark Johnston	4bfb585351	Don't treat an error from g_mirror_clear_metadata() as fatal. Such errors can occur as the result of a write error or because the disk backing the mirror element was removed. They result in a generation ID bump on all active elements of the mirror, so we can safely disconnect the mirror component rather than destroy it. MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7750	2016-09-06 23:42:59 +00:00
Mark Johnston	40c5032d32	Add some fail points to gmirror. These are useful for testing changes to I/O error handling, and for reproducing existing bugs in a controlled manner. The fail points are g_mirror_regular_request_read g_mirror_regular_request_write g_mirror_sync_request_read g_mirror_sync_request_write g_mirror_metadata_write They all effectively allow one to inject an error value into the bio_error field of a corresponding BIO request as it is being completed. MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division	2016-09-06 23:35:48 +00:00
Andrey V. Elsukov	0428336393	Do not invoke resize event if initial disk size is zero. Some disks report the size only after first opening. And due to the events are asynchronous, some consumers can receive this event too late and this confuses them. This partially restores previous behaviour, and at the same time this should fix the problem, when already opened provider loses resize event. PR: 211028 MFC after: 3 weeks	2016-08-01 20:54:54 +00:00
Andrey V. Elsukov	1f353a2315	Do not invoke resize method if geom is being withered. PR: 211028 MFC after: 2 weeks	2016-07-25 09:12:08 +00:00
Andrey V. Elsukov	f1ff88cf8c	Use g_resize_provider() to change the size of GEOM_DISK provider, when it is being opened. This should fix the possible loss of a resize event when disk capacity changed. PR: 211028 Reported by: Dexuan Cui <decui at microsoft dot com> MFC after: 3 weeks	2016-07-19 05:36:21 +00:00
Maxim Sobolev	55f9588af4	Relax checking if the privider size matches size recorded in the superblock, allowing provider to be bit bigger, i.e. have some extra padding after the FS image. That in some cases might be a side-effect of using CLOOP format which enforces certain block size and trying to compress image that is not exactly the number of those blocks in size. The UFS itself does not have any issues mounting such padded file systems, so it's what GEOM_LABEL should do. Submitted by: @mizhka_gmail.com Differential Revision: https://reviews.freebsd.org/D6208	2016-07-18 05:00:01 +00:00
Mark Johnston	7d31c3939a	Move some gmirror metadata update messages to a higher debug level. These can be printed quite frequently from a mostly-idle mirror, cluttering the console. MFC after: 1 week	2016-07-14 00:40:24 +00:00
Maxim Sobolev	74ba4047a3	1.Improve handling around last compressed block of the file, which is necessary because CLOOP format lacks explicit EOF or length, so that in the presence of padding or when the CLOOP is put onto a larger partition upper level provider size may be larger. Bound amount of extra data that we might touch to the max length of the compressed block and detect zero-padding in the last cluster, which when sector is all-zero might cause us to emit bogus I/O error after decompression of that fails. To not make code any more complicated that it needs to be deal with it in lazy-manner, i.e. when we first access that specific cluster. This change also fixes stupid mistake in the LZMA code, inherited from geom_lzma, which does not share length of the output buffer buffer with the decompression routine, so that in the presence of corrupted or purposedly tailored data may easily cause heap overflow and kernel memory corruption. Beef up validation of the CLOOP TOC by checking that lengths of all but the last compressed clusters match upper limit set by the decompressor and improve some error diagnostic output while I am here. 2.Add kern.geom.uzip.attach_to tunable to artifically limit attaching uzip to certain devices in the dev tree only. For example the following only makes us attaching to the GPT labels: kern.geom.uzip.attach_to="gpt/" 3.Add kern.geom.uzip.noattach_to, which does opposite to the (2) above, i.e. prevents geom_uzip from tasting / attaching to providers matching some pattern. By default we don't attach to our own kind, i.e. kern.geom.uzip.noattach_to=".uzip". It saves us quite some CPU cycles, esp on low-end embedded systems. Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D7013	2016-06-29 18:19:05 +00:00
Kenneth D. Merry	a02e196edd	Switch geom_disk over to using a pool mutex. The GEOM disk d_mtx is only acquired on disk creation and destruction. It is a good candidate for replacement with a pool mutex. This eliminates the mutex initialization and teardown and the mutex and name variables themselves from struct disk. sys/geom/geom_disk.h: Take d_mtx and d_mtx_name out of struct disk. sys/geom/geom_disk.c: Use mtx_pool_lock() and mtx_pool_unlock() to guard the disk initialization state instead of a dedicated mutex. This allows removing the initialization and destruction of d_mtx. sys/sys/param.h: Bump __FreeBSD_version to 1100119 for the change to struct disk. Suggested by: jhb Sponsored by: Spectra Logic Approved by: re (gjb)	2016-06-23 20:05:59 +00:00
Mark Johnston	be20fc2e90	Do not complete pending gmirror BIOs when tearing down the provider. This will result in lock recursion and is more generally incorrect since the completion handlers will just reinsert the BIOs into the queue we're trying to drain. Reviewed by: imp, ngie Approved by: re (gjb) MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D6908	2016-06-22 21:00:28 +00:00
Kenneth D. Merry	e5616d65d0	Fix a bug that caused da(4) peripheral drivers to not fully go away after the underlying device went away. The problem was that callers who queue the GEOM resize provider event didn't check to make sure that the provider had not been withered. For the other equivalent case, g_new_provider_event(), the code checks to see whether the provider has been withered before queueing a g_new_provider_event() to the event thread. In some cases, a resize provider event would come through after the provider had been withered and all of the existing consumers had been orphaned. When the resize event triggered a taste of the provider, that would attach a new consumer to the now withered provider. The wither washer (g_wither_washer() would never be able to completely tear down the GEOM because of the consumers that were hanging around. The solution was to check the G_PF_WITHER provider flag before queueing the g_resize_provider_event(), and add an assert to g_resize_provider_event() to insure that it isn't called on a withered provider. sys/geom/geom_subr.c: In g_resize_provider(), don't try to continue if the G_PF_WITHER flag is set. In g_resize_provider_event(), add an assert that the G_PF_WITHER flag is not set. In g_access(), if a provider has an error, print out the name of the provider with the error. Sponsored by: Spectra Logic Approved by: re (marius) MFC after: 3 days	2016-06-22 14:39:13 +00:00
Kenneth D. Merry	1ff824e786	Fix a bug that caused da(4) instances to hang around after the underlying device is gone. The problem was that when disk_gone() is called, if the GEOM disk creation process has not yet happened, the withering process couldn't start. We didn't record any state in the GEOM disk code, and so the d_gone() callback to the da(4) driver never happened. The solution is to track the state of the creation process, and initiate the withering process from g_disk_create() if the disk is being created. This change does add fields to struct disk, and so I have bumped DISK_VERSION. geom_disk.c: Track where we are in the disk creation process, and check to see whether our underlying disk has gone away or not. In disk_gone(), set a new d_goneflag variable that g_disk_create() can check to see if it needs to clean up the disk instance. geom_disk.h: Add a mutex to struct disk (for internal use) disk init level, and a gone flag. Bump DISK_VERSION because the size of struct disk has changed and fields have been added at the beginning. Sponsored by: Spectra Logic Approved by: re (marius)	2016-06-21 20:18:19 +00:00
Gleb Smirnoff	a7c5163b5f	When we are in panic, always go the asynchronous path in g_mirror_destroy(), otherwise the system will hang. This is a temporarily least intrusive crutch to get certain panicing systems dumping. The proper fix should question is g_mirror_destroy() should be called on a panicing system at all. Discussed with: mav	2016-06-01 22:11:54 +00:00
Alan Somers	151746b244	Avoid issuing spa config updates for physical path when not necessary ZFS's configuration needs to be updated whenever the physical path for a device changes, but not when a new device is introduced. This is because new devices necessarily cause config updates, but only if they are actually accepted into the pool. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Split vdev_geom_set_physpath out of vdev_geom_attrchanged. When setting the vdev's physical path, only request a config update if the physical path has changed. Don't request it when opening a device for the first time, because the config sync will happen anyway upstack. sys/geom/geom_dev.c Split g_dev_set_physpath and g_dev_set_media out of g_dev_attrchanged Submitted by: will, asomers MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D6428	2016-05-27 22:32:44 +00:00
Konstantin Belousov	d5446cc8f4	Remove unneeded Giant locking around kthreads creation. Sponsored by: The FreeBSD Foundation	2016-05-20 08:28:11 +00:00
Konstantin Belousov	4e2732b550	Removal of Giant droping wrappers for GEOM classes. Sponsored by: The FreeBSD Foundation	2016-05-20 08:25:37 +00:00
Konstantin Belousov	dff9131e58	Remove asserts that Giant is not held on entrance into geom KPI, which outlived their usefulness. This allows to remove drop/pickup Giant wrappers around GEOM calls. Discussed with: alfred, imp, phk Sponsored by: The FreeBSD Foundation	2016-05-20 08:22:20 +00:00
Kenneth D. Merry	9a6844d55f	Add support for managing Shingled Magnetic Recording (SMR) drives. This change includes support for SCSI SMR drives (which conform to the Zoned Block Commands or ZBC spec) and ATA SMR drives (which conform to the Zoned ATA Command Set or ZAC spec) behind SAS expanders. This includes full management support through the GEOM BIO interface, and through a new userland utility, zonectl(8), and through camcontrol(8). This is now ready for filesystems to use to detect and manage zoned drives. (There is no work in progress that I know of to use this for ZFS or UFS, if anyone is interested, let me know and I may have some suggestions.) Also, improve ATA command passthrough and dispatch support, both via ATA and ATA passthrough over SCSI. Also, add support to camcontrol(8) for the ATA Extended Power Conditions feature set. You can now manage ATA device power states, and set various idle time thresholds for a drive to enter lower power states. Note that this change cannot be MFCed in full, because it depends on changes to the struct bio API that break compatilibity. In order to avoid breaking the stable API, only changes that don't touch or depend on the struct bio changes can be merged. For example, the camcontrol(8) changes don't depend on the new bio API, but zonectl(8) and the probe changes to the da(4) and ada(4) drivers do depend on it. Also note that the SMR changes have not yet been tested with an actual SCSI ZBC device, or a SCSI to ATA translation layer (SAT) that supports ZBC to ZAC translation. I have not yet gotten a suitable drive or SAT layer, so any testing help would be appreciated. These changes have been tested with Seagate Host Aware SATA drives attached to both SAS and SATA controllers. Also, I do not have any SATA Host Managed devices, and I suspect that it may take additional (hopefully minor) changes to support them. Thanks to Seagate for supplying the test hardware and answering questions. sbin/camcontrol/Makefile: Add epc.c and zone.c. sbin/camcontrol/camcontrol.8: Document the zone and epc subcommands. sbin/camcontrol/camcontrol.c: Add the zone and epc subcommands. Add auxiliary register support to build_ata_cmd(). Make sure to set the CAM_ATAIO_NEEDRESULT, CAM_ATAIO_DMA, and CAM_ATAIO_FPDMA flags as appropriate for ATA commands. Add a new get_ata_status() function to parse ATA result from SCSI sense descriptors (for ATA passthrough over SCSI) and ATA I/O requests. sbin/camcontrol/camcontrol.h: Update the build_ata_cmd() prototype Add get_ata_status(), zone(), and epc(). sbin/camcontrol/epc.c: Support for ATA Extended Power Conditions features. This includes support for all features documented in the ACS-4 Revision 12 specification from t13.org (dated February 18, 2016). The EPC feature set allows putting a drive into a power power mode immediately, or setting timeouts so that the drive will automatically enter progressively lower power states after various idle times. sbin/camcontrol/fwdownload.c: Update the firmware download code for the new build_ata_cmd() arguments. sbin/camcontrol/zone.c: Implement support for Shingled Magnetic Recording (SMR) drives via SCSI Zoned Block Commands (ZBC) and ATA Zoned Device ATA Command Set (ZAC). These specs were developed in concert, and are functionally identical. The primary differences are due to SCSI and ATA differences. (SCSI is big endian, ATA is little endian, for example.) This includes support for all commands defined in the ZBC and ZAC specs. sys/cam/ata/ata_all.c: Decode a number of additional ATA command names in ata_op_string(). Add a new CCB building function, ata_read_log(). Add ata_zac_mgmt_in() and ata_zac_mgmt_out() CCB building functions. These support both DMA and NCQ encapsulation. sys/cam/ata/ata_all.h: Add prototypes for ata_read_log(), ata_zac_mgmt_out(), and ata_zac_mgmt_in(). sys/cam/ata/ata_da.c: Revamp the ada(4) driver to support zoned devices. Add four new probe states to gather information needed for zone support. Add a new adasetflags() function to avoid duplication of large blocks of flag setting between the async handler and register functions. Add new sysctl variables that describe zone support and paramters. Add support for the new BIO_ZONE bio, and all of its subcommands: DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP, DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS. sys/cam/scsi/scsi_all.c: Add command descriptions for the ZBC IN/OUT commands. Add descriptions for ZBC Host Managed devices. Add a new function, scsi_ata_pass() to do ATA passthrough over SCSI. This will eventually replace scsi_ata_pass_16() -- it can create the 12, 16, and 32-byte variants of the ATA PASS-THROUGH command, and supports setting all of the registers defined as of SAT-4, Revision 5 (March 11, 2016). Change scsi_ata_identify() to use scsi_ata_pass() instead of scsi_ata_pass_16(). Add a new scsi_ata_read_log() function to facilitate reading ATA logs via SCSI. sys/cam/scsi/scsi_all.h: Add the new ATA PASS-THROUGH(32) command CDB. Add extended and variable CDB opcodes. Add Zoned Block Device Characteristics VPD page. Add ATA Return SCSI sense descriptor. Add prototypes for scsi_ata_read_log() and scsi_ata_pass(). sys/cam/scsi/scsi_da.c: Revamp the da(4) driver to support zoned devices. Add five new probe states, four of which are needed for ATA devices. Add five new sysctl variables that describe zone support and parameters. The da(4) driver supports SCSI ZBC devices, as well as ATA ZAC devices when they are attached via a SCSI to ATA Translation (SAT) layer. Since ZBC -> ZAC translation is a new feature in the T10 SAT-4 spec, most SATA drives will be supported via ATA commands sent via the SCSI ATA PASS-THROUGH command. The da(4) driver will prefer the ZBC interface, if it is available, for performance reasons, but will use the ATA PASS-THROUGH interface to the ZAC command set if the SAT layer doesn't support translation yet. As I mentioned above, ZBC command support is untested. Add support for the new BIO_ZONE bio, and all of its subcommands: DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP, DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS. Add scsi_zbc_in() and scsi_zbc_out() CCB building functions. Add scsi_ata_zac_mgmt_out() and scsi_ata_zac_mgmt_in() CCB/CDB building functions. Note that these have return values, unlike almost all other CCB building functions in CAM. The reason is that they can fail, depending upon the particular combination of input parameters. The primary failure case is if the user wants NCQ, but fails to specify additional CDB storage. NCQ requires using the 32-byte version of the SCSI ATA PASS-THROUGH command, and the current CAM CDB size is 16 bytes. sys/cam/scsi/scsi_da.h: Add ZBC IN and ZBC OUT CDBs and opcodes. Add SCSI Report Zones data structures. Add scsi_zbc_in(), scsi_zbc_out(), scsi_ata_zac_mgmt_out(), and scsi_ata_zac_mgmt_in() prototypes. sys/dev/ahci/ahci.c: Fix SEND / RECEIVE FPDMA QUEUED in the ahci(4) driver. ahci_setup_fis() previously set the top bits of the sector count register in the FIS to 0 for FPDMA commands. This is okay for read and write, because the PRIO field is in the only thing in those bits, and we don't implement that further up the stack. But, for SEND and RECEIVE FPDMA QUEUED, the subcommand is in that byte, so it needs to be transmitted to the drive. In ahci_setup_fis(), always set the the top 8 bits of the sector count register. We need it in both the standard and NCQ / FPDMA cases. sys/geom/eli/g_eli.c: Pass BIO_ZONE commands through the GELI class. sys/geom/geom.h: Add g_io_zonecmd() prototype. sys/geom/geom_dev.c: Add new DIOCZONECMD ioctl, which allows sending zone commands to disks. sys/geom/geom_disk.c: Add support for BIO_ZONE commands. sys/geom/geom_disk.h: Add a new flag, DISKFLAG_CANZONE, that indicates that a given GEOM disk client can handle BIO_ZONE commands. sys/geom/geom_io.c: Add a new function, g_io_zonecmd(), that handles execution of BIO_ZONE commands. Add permissions check for BIO_ZONE commands. Add command decoding for BIO_ZONE commands. sys/geom/geom_subr.c: Add DDB command decoding for BIO_ZONE commands. sys/kern/subr_devstat.c: Record statistics for REPORT ZONES commands. Note that the number of bytes transferred for REPORT ZONES won't quite match what is received from the harware. This is because we're necessarily counting bytes coming from the da(4) / ada(4) drivers, which are using the disk_zone.h interface to communicate up the stack. The structure sizes it uses are slightly different than the SCSI and ATA structure sizes. sys/sys/ata.h: Add many bit and structure definitions for ZAC, NCQ, and EPC command support. sys/sys/bio.h: Convert the bio_cmd field to a straight enumeration. This will yield more space for additional commands in the future. After change r297955 and other related changes, this is now possible. Converting to an enumeration will also prevent use as a bitmask in the future. sys/sys/disk.h: Define the DIOCZONECMD ioctl. sys/sys/disk_zone.h: Add a new API for managing zoned disks. This is very close to the SCSI ZBC and ATA ZAC standards, but uses integers in native byte order instead of big endian (SCSI) or little endian (ATA) byte arrays. This is intended to offer to the complete feature set of the ZBC and ZAC disk management without requiring the application developer to include SCSI or ATA headers. We also use one set of headers for ioctl consumers and kernel bio-level consumers. sys/sys/param.h: Bump __FreeBSD_version for sys/bio.h command changes, and inclusion of SMR support. usr.sbin/Makefile: Add the zonectl utility. usr.sbin/diskinfo/diskinfo.c Add disk zoning capability to the 'diskinfo -v' output. usr.sbin/zonectl/Makefile: Add zonectl makefile. usr.sbin/zonectl/zonectl.8 zonectl(8) man page. usr.sbin/zonectl/zonectl.c The zonectl(8) utility. This allows managing SCSI or ATA zoned disks via the disk_zone.h API. You can report zones, reset write pointers, get parameters, etc. Sponsored by: Spectra Logic Differential Revision: https://reviews.freebsd.org/D6147 Reviewed by: wblock (documentation)	2016-05-19 14:08:36 +00:00
John Baldwin	fdce57a042	Add an EARLY_AP_STARTUP option to start APs earlier during boot. Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot. This change moves SI_SUB_SMP up to just before software interrupt threads are created allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed. This last feature fixes a problem on with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu. Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case. As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code. These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling). PR: kern/199321 Reviewed by: markj, gnn, kib Sponsored by: Netflix	2016-05-14 18:22:52 +00:00
Maxim Sobolev	e3d7ead7df	Add missing include "opt_geom.h" to make GEOM_UZIP_DEBUG option working, also rename enum member so it does not conflict with GEOM_UZIP option name. Submitted by: mizhka@gmail.com Differential Revision: https://reviews.freebsd.org/D6207	2016-05-06 20:32:39 +00:00
Pedro F. Giffuni	4ed3c0e713	sys: Make use of our rounddown() macro when sys/param.h is available. No functional change.	2016-04-30 14:41:18 +00:00

1 2 3 4 5 ...

2075 Commits