freebsd-dev

Author	SHA1	Message	Date
Alexander Motin	8b64f3ca6c	Use g_wither_provider() where applicable. It is just a helper function combining G_PF_WITHER setting with g_orphan_provider().	2016-09-23 21:29:40 +00:00
Edward Tomasz Napierala	0c4440c3aa	Follow up r305988 by removing g_bio_run_task and related code. The g_io_schedule_up() gets its "if" condition swapped to make it more similar to g_io_schedule_down(). Suggested by: mav@ Reviewed by: mav@ MFC after: 1 month	2016-09-20 09:18:33 +00:00
Edward Tomasz Napierala	bbdd6614bd	Remove unused bio_taskqueue(). MFC after: 1 month	2016-09-19 17:46:15 +00:00
Mark Johnston	4bfb585351	Don't treat an error from g_mirror_clear_metadata() as fatal. Such errors can occur as the result of a write error or because the disk backing the mirror element was removed. They result in a generation ID bump on all active elements of the mirror, so we can safely disconnect the mirror component rather than destroy it. MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7750	2016-09-06 23:42:59 +00:00
Mark Johnston	40c5032d32	Add some fail points to gmirror. These are useful for testing changes to I/O error handling, and for reproducing existing bugs in a controlled manner. The fail points are g_mirror_regular_request_read g_mirror_regular_request_write g_mirror_sync_request_read g_mirror_sync_request_write g_mirror_metadata_write They all effectively allow one to inject an error value into the bio_error field of a corresponding BIO request as it is being completed. MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division	2016-09-06 23:35:48 +00:00
Andrey V. Elsukov	0428336393	Do not invoke the resize event if the initial disk size is zero. Some disks report their size only after first being opened. Because the events are asynchronous, some consumers can receive this event too late, which confuses them. This partially restores the previous behaviour, and at the same time should fix the problem where an already opened provider loses a resize event. PR: 211028 MFC after: 3 weeks	2016-08-01 20:54:54 +00:00
Andrey V. Elsukov	1f353a2315	Do not invoke resize method if geom is being withered. PR: 211028 MFC after: 2 weeks	2016-07-25 09:12:08 +00:00
Andrey V. Elsukov	f1ff88cf8c	Use g_resize_provider() to change the size of GEOM_DISK provider, when it is being opened. This should fix the possible loss of a resize event when disk capacity changed. PR: 211028 Reported by: Dexuan Cui <decui at microsoft dot com> MFC after: 3 weeks	2016-07-19 05:36:21 +00:00
Maxim Sobolev	55f9588af4	Relax the check that the provider size matches the size recorded in the superblock, allowing the provider to be a bit bigger, i.e. to have some extra padding after the FS image. In some cases that might be a side effect of using the CLOOP format, which enforces a certain block size, and trying to compress an image whose size is not an exact multiple of that block size. UFS itself does not have any issues mounting such padded file systems, so GEOM_LABEL should accept them too. Submitted by: @mizhka_gmail.com Differential Revision: https://reviews.freebsd.org/D6208	2016-07-18 05:00:01 +00:00
Mark Johnston	7d31c3939a	Move some gmirror metadata update messages to a higher debug level. These can be printed quite frequently from a mostly-idle mirror, cluttering the console. MFC after: 1 week	2016-07-14 00:40:24 +00:00
Maxim Sobolev	74ba4047a3	1. Improve handling around the last compressed block of the file, which is necessary because the CLOOP format lacks an explicit EOF or length, so that in the presence of padding, or when the CLOOP is put onto a larger partition, the upper level provider size may be larger. Bound the amount of extra data that we might touch to the max length of the compressed block and detect zero-padding in the last cluster, which when the sector is all-zero might cause us to emit a bogus I/O error after decompression of it fails. To not make the code any more complicated than it needs to be, deal with it in a lazy manner, i.e. when we first access that specific cluster. This change also fixes a stupid mistake in the LZMA code, inherited from geom_lzma, which does not share the length of the output buffer with the decompression routine, so that corrupted or purposely tailored data may easily cause heap overflow and kernel memory corruption. Beef up validation of the CLOOP TOC by checking that the lengths of all but the last compressed clusters match the upper limit set by the decompressor, and improve some error diagnostic output while I am here. 2. Add the kern.geom.uzip.attach_to tunable to artificially limit attaching uzip to certain devices in the dev tree only. For example, the following makes us attach only to the GPT labels: kern.geom.uzip.attach_to="gpt/" 3. Add kern.geom.uzip.noattach_to, which does the opposite of (2) above, i.e. prevents geom_uzip from tasting / attaching to providers matching some pattern. By default we don't attach to our own kind, i.e. kern.geom.uzip.noattach_to=".uzip". It saves us quite some CPU cycles, esp. on low-end embedded systems. Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D7013	2016-06-29 18:19:05 +00:00
Kenneth D. Merry	a02e196edd	Switch geom_disk over to using a pool mutex. The GEOM disk d_mtx is only acquired on disk creation and destruction. It is a good candidate for replacement with a pool mutex. This eliminates the mutex initialization and teardown and the mutex and name variables themselves from struct disk. sys/geom/geom_disk.h: Take d_mtx and d_mtx_name out of struct disk. sys/geom/geom_disk.c: Use mtx_pool_lock() and mtx_pool_unlock() to guard the disk initialization state instead of a dedicated mutex. This allows removing the initialization and destruction of d_mtx. sys/sys/param.h: Bump __FreeBSD_version to 1100119 for the change to struct disk. Suggested by: jhb Sponsored by: Spectra Logic Approved by: re (gjb)	2016-06-23 20:05:59 +00:00
Mark Johnston	be20fc2e90	Do not complete pending gmirror BIOs when tearing down the provider. This will result in lock recursion and is more generally incorrect since the completion handlers will just reinsert the BIOs into the queue we're trying to drain. Reviewed by: imp, ngie Approved by: re (gjb) MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D6908	2016-06-22 21:00:28 +00:00
Kenneth D. Merry	e5616d65d0	Fix a bug that caused da(4) peripheral drivers to not fully go away after the underlying device went away. The problem was that callers who queue the GEOM resize provider event didn't check to make sure that the provider had not been withered. For the other equivalent case, g_new_provider_event(), the code checks to see whether the provider has been withered before queueing a g_new_provider_event() to the event thread. In some cases, a resize provider event would come through after the provider had been withered and all of the existing consumers had been orphaned. When the resize event triggered a taste of the provider, that would attach a new consumer to the now withered provider. The wither washer (g_wither_washer()) would never be able to completely tear down the GEOM because of the consumers that were hanging around. The solution was to check the G_PF_WITHER provider flag before queueing the g_resize_provider_event(), and add an assert to g_resize_provider_event() to ensure that it isn't called on a withered provider. sys/geom/geom_subr.c: In g_resize_provider(), don't try to continue if the G_PF_WITHER flag is set. In g_resize_provider_event(), add an assert that the G_PF_WITHER flag is not set. In g_access(), if a provider has an error, print out the name of the provider with the error. Sponsored by: Spectra Logic Approved by: re (marius) MFC after: 3 days	2016-06-22 14:39:13 +00:00
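The guard described in the commit above can be illustrated with a small userland sketch. It is not the kernel code: `struct sk_provider`, `SK_PF_WITHER` and `sk_resize_provider()` are hypothetical stand-ins for `struct g_provider`, `G_PF_WITHER` and `g_resize_provider()`, and the "event queue" is reduced to a counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's G_PF_WITHER provider flag. */
#define SK_PF_WITHER	0x2u

/* Hypothetical, heavily reduced stand-in for struct g_provider. */
struct sk_provider {
	const char	*name;
	unsigned int	 flags;
	long long	 mediasize;
	int		 queued_resize_events;	/* models the event queue */
};

/*
 * Request a resize. Returns true if a resize event was queued and
 * false if the request was dropped.
 */
static bool
sk_resize_provider(struct sk_provider *pp, long long size)
{
	/* A withered provider must not receive new events. */
	if (pp->flags & SK_PF_WITHER)
		return (false);
	/* Nothing to do if the size did not change. */
	if (size == pp->mediasize)
		return (false);
	pp->mediasize = size;
	pp->queued_resize_events++;
	return (true);
}
```

Dropping the request before it reaches the queue is what keeps a late resize from triggering a new taste, and with it a new consumer, on a provider that is already being torn down.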
Kenneth D. Merry	1ff824e786	Fix a bug that caused da(4) instances to hang around after the underlying device is gone. The problem was that when disk_gone() is called, if the GEOM disk creation process has not yet happened, the withering process couldn't start. We didn't record any state in the GEOM disk code, and so the d_gone() callback to the da(4) driver never happened. The solution is to track the state of the creation process, and initiate the withering process from g_disk_create() if the disk is being created. This change does add fields to struct disk, and so I have bumped DISK_VERSION. geom_disk.c: Track where we are in the disk creation process, and check to see whether our underlying disk has gone away or not. In disk_gone(), set a new d_goneflag variable that g_disk_create() can check to see if it needs to clean up the disk instance. geom_disk.h: Add a mutex to struct disk (for internal use) disk init level, and a gone flag. Bump DISK_VERSION because the size of struct disk has changed and fields have been added at the beginning. Sponsored by: Spectra Logic Approved by: re (marius)	2016-06-21 20:18:19 +00:00
Gleb Smirnoff	a7c5163b5f	When we are in panic, always go the asynchronous path in g_mirror_destroy(), otherwise the system will hang. This is a temporary, least intrusive crutch to get certain panicking systems dumping. The proper fix should question whether g_mirror_destroy() should be called on a panicking system at all. Discussed with: mav	2016-06-01 22:11:54 +00:00
Alan Somers	151746b244	Avoid issuing spa config updates for physical path when not necessary ZFS's configuration needs to be updated whenever the physical path for a device changes, but not when a new device is introduced. This is because new devices necessarily cause config updates, but only if they are actually accepted into the pool. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Split vdev_geom_set_physpath out of vdev_geom_attrchanged. When setting the vdev's physical path, only request a config update if the physical path has changed. Don't request it when opening a device for the first time, because the config sync will happen anyway upstack. sys/geom/geom_dev.c Split g_dev_set_physpath and g_dev_set_media out of g_dev_attrchanged Submitted by: will, asomers MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D6428	2016-05-27 22:32:44 +00:00
Konstantin Belousov	d5446cc8f4	Remove unneeded Giant locking around kthreads creation. Sponsored by: The FreeBSD Foundation	2016-05-20 08:28:11 +00:00
Konstantin Belousov	4e2732b550	Removal of Giant-dropping wrappers for GEOM classes. Sponsored by: The FreeBSD Foundation	2016-05-20 08:25:37 +00:00
Konstantin Belousov	dff9131e58	Remove asserts that Giant is not held on entrance into the geom KPI, which outlived their usefulness. This allows removing the drop/pickup Giant wrappers around GEOM calls. Discussed with: alfred, imp, phk Sponsored by: The FreeBSD Foundation	2016-05-20 08:22:20 +00:00
Kenneth D. Merry	9a6844d55f	Add support for managing Shingled Magnetic Recording (SMR) drives. This change includes support for SCSI SMR drives (which conform to the Zoned Block Commands or ZBC spec) and ATA SMR drives (which conform to the Zoned ATA Command Set or ZAC spec) behind SAS expanders. This includes full management support through the GEOM BIO interface, and through a new userland utility, zonectl(8), and through camcontrol(8). This is now ready for filesystems to use to detect and manage zoned drives. (There is no work in progress that I know of to use this for ZFS or UFS; if anyone is interested, let me know and I may have some suggestions.) Also, improve ATA command passthrough and dispatch support, both via ATA and ATA passthrough over SCSI. Also, add support to camcontrol(8) for the ATA Extended Power Conditions feature set. You can now manage ATA device power states, and set various idle time thresholds for a drive to enter lower power states. Note that this change cannot be MFCed in full, because it depends on changes to the struct bio API that break compatibility. In order to avoid breaking the stable API, only changes that don't touch or depend on the struct bio changes can be merged. For example, the camcontrol(8) changes don't depend on the new bio API, but zonectl(8) and the probe changes to the da(4) and ada(4) drivers do depend on it. Also note that the SMR changes have not yet been tested with an actual SCSI ZBC device, or a SCSI to ATA translation layer (SAT) that supports ZBC to ZAC translation. I have not yet gotten a suitable drive or SAT layer, so any testing help would be appreciated. These changes have been tested with Seagate Host Aware SATA drives attached to both SAS and SATA controllers. Also, I do not have any SATA Host Managed devices, and I suspect that it may take additional (hopefully minor) changes to support them. Thanks to Seagate for supplying the test hardware and answering questions. 
sbin/camcontrol/Makefile: Add epc.c and zone.c. sbin/camcontrol/camcontrol.8: Document the zone and epc subcommands. sbin/camcontrol/camcontrol.c: Add the zone and epc subcommands. Add auxiliary register support to build_ata_cmd(). Make sure to set the CAM_ATAIO_NEEDRESULT, CAM_ATAIO_DMA, and CAM_ATAIO_FPDMA flags as appropriate for ATA commands. Add a new get_ata_status() function to parse ATA result from SCSI sense descriptors (for ATA passthrough over SCSI) and ATA I/O requests. sbin/camcontrol/camcontrol.h: Update the build_ata_cmd() prototype Add get_ata_status(), zone(), and epc(). sbin/camcontrol/epc.c: Support for ATA Extended Power Conditions features. This includes support for all features documented in the ACS-4 Revision 12 specification from t13.org (dated February 18, 2016). The EPC feature set allows putting a drive into a power power mode immediately, or setting timeouts so that the drive will automatically enter progressively lower power states after various idle times. sbin/camcontrol/fwdownload.c: Update the firmware download code for the new build_ata_cmd() arguments. sbin/camcontrol/zone.c: Implement support for Shingled Magnetic Recording (SMR) drives via SCSI Zoned Block Commands (ZBC) and ATA Zoned Device ATA Command Set (ZAC). These specs were developed in concert, and are functionally identical. The primary differences are due to SCSI and ATA differences. (SCSI is big endian, ATA is little endian, for example.) This includes support for all commands defined in the ZBC and ZAC specs. sys/cam/ata/ata_all.c: Decode a number of additional ATA command names in ata_op_string(). Add a new CCB building function, ata_read_log(). Add ata_zac_mgmt_in() and ata_zac_mgmt_out() CCB building functions. These support both DMA and NCQ encapsulation. sys/cam/ata/ata_all.h: Add prototypes for ata_read_log(), ata_zac_mgmt_out(), and ata_zac_mgmt_in(). sys/cam/ata/ata_da.c: Revamp the ada(4) driver to support zoned devices. 
Add four new probe states to gather information needed for zone support. Add a new adasetflags() function to avoid duplication of large blocks of flag setting between the async handler and register functions. Add new sysctl variables that describe zone support and parameters. Add support for the new BIO_ZONE bio, and all of its subcommands: DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP, DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS. sys/cam/scsi/scsi_all.c: Add command descriptions for the ZBC IN/OUT commands. Add descriptions for ZBC Host Managed devices. Add a new function, scsi_ata_pass() to do ATA passthrough over SCSI. This will eventually replace scsi_ata_pass_16() -- it can create the 12, 16, and 32-byte variants of the ATA PASS-THROUGH command, and supports setting all of the registers defined as of SAT-4, Revision 5 (March 11, 2016). Change scsi_ata_identify() to use scsi_ata_pass() instead of scsi_ata_pass_16(). Add a new scsi_ata_read_log() function to facilitate reading ATA logs via SCSI. sys/cam/scsi/scsi_all.h: Add the new ATA PASS-THROUGH(32) command CDB. Add extended and variable CDB opcodes. Add Zoned Block Device Characteristics VPD page. Add ATA Return SCSI sense descriptor. Add prototypes for scsi_ata_read_log() and scsi_ata_pass(). sys/cam/scsi/scsi_da.c: Revamp the da(4) driver to support zoned devices. Add five new probe states, four of which are needed for ATA devices. Add five new sysctl variables that describe zone support and parameters. The da(4) driver supports SCSI ZBC devices, as well as ATA ZAC devices when they are attached via a SCSI to ATA Translation (SAT) layer. Since ZBC -> ZAC translation is a new feature in the T10 SAT-4 spec, most SATA drives will be supported via ATA commands sent via the SCSI ATA PASS-THROUGH command. 
The da(4) driver will prefer the ZBC interface, if it is available, for performance reasons, but will use the ATA PASS-THROUGH interface to the ZAC command set if the SAT layer doesn't support translation yet. As I mentioned above, ZBC command support is untested. Add support for the new BIO_ZONE bio, and all of its subcommands: DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP, DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS. Add scsi_zbc_in() and scsi_zbc_out() CCB building functions. Add scsi_ata_zac_mgmt_out() and scsi_ata_zac_mgmt_in() CCB/CDB building functions. Note that these have return values, unlike almost all other CCB building functions in CAM. The reason is that they can fail, depending upon the particular combination of input parameters. The primary failure case is if the user wants NCQ, but fails to specify additional CDB storage. NCQ requires using the 32-byte version of the SCSI ATA PASS-THROUGH command, and the current CAM CDB size is 16 bytes. sys/cam/scsi/scsi_da.h: Add ZBC IN and ZBC OUT CDBs and opcodes. Add SCSI Report Zones data structures. Add scsi_zbc_in(), scsi_zbc_out(), scsi_ata_zac_mgmt_out(), and scsi_ata_zac_mgmt_in() prototypes. sys/dev/ahci/ahci.c: Fix SEND / RECEIVE FPDMA QUEUED in the ahci(4) driver. ahci_setup_fis() previously set the top bits of the sector count register in the FIS to 0 for FPDMA commands. This is okay for read and write, because the PRIO field is the only thing in those bits, and we don't implement that further up the stack. But, for SEND and RECEIVE FPDMA QUEUED, the subcommand is in that byte, so it needs to be transmitted to the drive. In ahci_setup_fis(), always set the top 8 bits of the sector count register. We need it in both the standard and NCQ / FPDMA cases. sys/geom/eli/g_eli.c: Pass BIO_ZONE commands through the GELI class. sys/geom/geom.h: Add g_io_zonecmd() prototype. sys/geom/geom_dev.c: Add new DIOCZONECMD ioctl, which allows sending zone commands to disks. 
sys/geom/geom_disk.c: Add support for BIO_ZONE commands. sys/geom/geom_disk.h: Add a new flag, DISKFLAG_CANZONE, that indicates that a given GEOM disk client can handle BIO_ZONE commands. sys/geom/geom_io.c: Add a new function, g_io_zonecmd(), that handles execution of BIO_ZONE commands. Add permissions check for BIO_ZONE commands. Add command decoding for BIO_ZONE commands. sys/geom/geom_subr.c: Add DDB command decoding for BIO_ZONE commands. sys/kern/subr_devstat.c: Record statistics for REPORT ZONES commands. Note that the number of bytes transferred for REPORT ZONES won't quite match what is received from the hardware. This is because we're necessarily counting bytes coming from the da(4) / ada(4) drivers, which are using the disk_zone.h interface to communicate up the stack. The structure sizes it uses are slightly different than the SCSI and ATA structure sizes. sys/sys/ata.h: Add many bit and structure definitions for ZAC, NCQ, and EPC command support. sys/sys/bio.h: Convert the bio_cmd field to a straight enumeration. This will yield more space for additional commands in the future. After change r297955 and other related changes, this is now possible. Converting to an enumeration will also prevent use as a bitmask in the future. sys/sys/disk.h: Define the DIOCZONECMD ioctl. sys/sys/disk_zone.h: Add a new API for managing zoned disks. This is very close to the SCSI ZBC and ATA ZAC standards, but uses integers in native byte order instead of big endian (SCSI) or little endian (ATA) byte arrays. This is intended to offer the complete feature set of the ZBC and ZAC disk management without requiring the application developer to include SCSI or ATA headers. We also use one set of headers for ioctl consumers and kernel bio-level consumers. sys/sys/param.h: Bump __FreeBSD_version for sys/bio.h command changes, and inclusion of SMR support. usr.sbin/Makefile: Add the zonectl utility. 
usr.sbin/diskinfo/diskinfo.c Add disk zoning capability to the 'diskinfo -v' output. usr.sbin/zonectl/Makefile: Add zonectl makefile. usr.sbin/zonectl/zonectl.8 zonectl(8) man page. usr.sbin/zonectl/zonectl.c The zonectl(8) utility. This allows managing SCSI or ATA zoned disks via the disk_zone.h API. You can report zones, reset write pointers, get parameters, etc. Sponsored by: Spectra Logic Differential Revision: https://reviews.freebsd.org/D6147 Reviewed by: wblock (documentation)	2016-05-19 14:08:36 +00:00
John Baldwin	fdce57a042	Add an EARLY_AP_STARTUP option to start APs earlier during boot. Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot. This change moves SI_SUB_SMP up to just before software interrupt threads are created allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed. This last feature fixes a problem with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu. Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case. As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). 
This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code. These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling). PR: kern/199321 Reviewed by: markj, gnn, kib Sponsored by: Netflix	2016-05-14 18:22:52 +00:00
Maxim Sobolev	e3d7ead7df	Add missing include "opt_geom.h" to make GEOM_UZIP_DEBUG option working, also rename enum member so it does not conflict with GEOM_UZIP option name. Submitted by: mizhka@gmail.com Differential Revision: https://reviews.freebsd.org/D6207	2016-05-06 20:32:39 +00:00
Pedro F. Giffuni	4ed3c0e713	sys: Make use of our rounddown() macro when sys/param.h is available. No functional change.	2016-04-30 14:41:18 +00:00
Pedro F. Giffuni	e8d5712284	sys/geom: spelling fixes in comments. No functional change.	2016-04-29 20:56:58 +00:00
Pedro F. Giffuni	310aef3257	sys/geom: spelling fixes. These affect debugging messages. MFC after: 2 weeks	2016-04-28 19:26:46 +00:00
Pedro F. Giffuni	b99bce73e2	geom: unsign some types to match their definitions and avoid overflows. In struct:gctl_req, nargs is unsigned. In mirror: g_mirror_syncreqs is unsigned. In raid: in struct:g_raid_volume, v_disks_count is unsigned. In virstor: in struct:g_virstor_softc, n_components is unsigned. MFC after: 2 weeks	2016-04-27 15:10:40 +00:00
Conrad Meyer	4a2776e538	g_part_bsd64: Delete duplicate/dead code RAW_PART is handled earlier in the loop. Reported by: Coverity CID: 1223201 Sponsored by: EMC / Isilon Storage Division	2016-04-26 22:32:33 +00:00
Conrad Meyer	5ad33e776f	g_part_bsd64: Check for valid on-disk npartitions value This value is u32 on disk, but assigned to an int in memory. After we do the implicit conversion via assignment, check that the result is at least one[1] (non-negative[2]). 1. The subsequent for-loop iterates from gpt_entries minus one, down, until reaching zero. A negative or zero initial index results in undefined signed integer overflow. 2. It is also used to index into arrays later. In practice, we expected non-malicious disks to contain small positive values. Reported by: Coverity CID: 1223202 Sponsored by: EMC / Isilon Storage Division	2016-04-26 22:30:54 +00:00
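The hazard in the commit above (a 32-bit on-disk count assigned to an `int`) can be sketched in userland as follows. This is a variation rather than the committed fix: the commit checks the value after the conversion, while this sketch rejects bad values in the unsigned domain before converting. `sk_npartitions_valid()` and the `SK_MAX_ENTRIES` bound are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound; the commit message does not state one. */
#define SK_MAX_ENTRIES	128u

/*
 * Validate an on-disk partition count before it is used as a loop
 * bound or array index. On success the value is stored in *out.
 */
static bool
sk_npartitions_valid(uint32_t ondisk, int *out)
{
	/*
	 * Compare while the value is still unsigned, so that counts
	 * above INT_MAX are rejected instead of turning negative.
	 */
	if (ondisk < 1 || ondisk > SK_MAX_ENTRIES)
		return (false);
	*out = (int)ondisk;
	return (true);
}
```

With the count known to be at least one, a loop that starts at `count - 1` and walks down to zero begins from a valid index.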
Pedro F. Giffuni	55e0987aea	sys: extend use of the howmany() macro when available. We have a howmany() macro in the <sys/param.h> header that is convenient to re-use as it makes things easier to read.	2016-04-26 15:38:17 +00:00
Maxim Sobolev	f260c3eadc	Relax TOC offsets checking somewhat, allowing offset pointing to the next byte past EOF to denote zero-block(s) at the very end of the file.	2016-04-26 06:50:38 +00:00
Maxim Sobolev	416ee66e25	o Fix handling of images with compression block sizes comparable to MAXPHYS. o Improve debug somewhat; o Convert "BUG BUG BUG message" into a proper KASSERT.	2016-04-23 06:31:46 +00:00
Alan Somers	1c2c346f09	DRY on buffer sizes. Update to r298420. sys/geom/geom_disk.c: In disk_attr_changed, don't repeat a buffer size. Reported by: ngie, hselasky MFC after: 4 weeks X-MFC-With: 298420 Sponsored by: Spectra Logic Corp	2016-04-21 21:13:41 +00:00
Pedro F. Giffuni	d9c9c81c08	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
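For readers unfamiliar with the rounding helpers these sweeps refer to, here is a small sketch of what they compute. The macros are local copies under `SK_` names so the example builds without `<sys/param.h>`; the bodies follow the usual BSD definitions, and `sk_sectors_needed()` is a hypothetical helper added for illustration.

```c
/* Number of y-sized chunks needed to hold x (rounds up). */
#define SK_HOWMANY(x, y)	(((x) + ((y) - 1)) / (y))
/* Round x down to a multiple of y; y may be any positive value. */
#define SK_ROUNDDOWN(x, y)	(((x) / (y)) * (y))
/* Round x up or down to a multiple of y; y must be a power of two. */
#define SK_ROUNDUP2(x, y)	(((x) + ((y) - 1)) & ~((y) - 1))
#define SK_ROUNDDOWN2(x, y)	((x) & ~((y) - 1))

/* How many sectors of secsize bytes are needed to store nbytes. */
static unsigned int
sk_sectors_needed(unsigned int nbytes, unsigned int secsize)
{
	return (SK_HOWMANY(nbytes, secsize));
}
```

The `2` variants replace a division with a mask, which is why they are only correct for power-of-two alignments.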
Alan Somers	42f42c9942	Notify userspace listeners when geom disk attributes have changed sys/geom/geom_disk.c: disk_attr_changed(): Generate a devctl event of type GEOM:<attr> for every call. MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D5952	2016-04-21 16:43:15 +00:00
Pedro F. Giffuni	63b6b7a74a	Indentation issues. Contract some lines leftover from r298310. Mea culpa.	2016-04-20 16:19:44 +00:00
Pedro F. Giffuni	02abd40029	kernel: use our nitems() macro when it is available through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current	2016-04-19 23:48:27 +00:00
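As a brief illustration of the kind of "trivial case" such a sweep converts, the sketch below counts the elements of a static array. `SK_NITEMS` is a local copy of the usual `nitems()` definition, and the `sk_cmd_names` table is hypothetical.

```c
#include <stddef.h>

/* Element count of a true array (not a pointer). */
#define SK_NITEMS(x)	(sizeof((x)) / sizeof((x)[0]))

/* Hypothetical lookup table used only for this example. */
static const char *const sk_cmd_names[] = {
	"read", "write", "delete", "flush",
};

/* Returns the number of entries without hard-coding the size. */
static size_t
sk_cmd_name_count(void)
{
	return (SK_NITEMS(sk_cmd_names));
}
```

The macro only works on arrays whose size is visible at the point of use; applied to a pointer it silently yields the wrong count.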
Pedro F. Giffuni	01b5c6f73e	g_gate: for pointers replace 0 with NULL. These are mostly cosmetic, no functional change. Found with devel/coccinelle.	2016-04-15 16:18:07 +00:00
Warner Losh	9a8fa125c1	Bump bio_cmd and bio_*flags from 8 bits to 16. Differential Revision: https://reviews.freebsd.org/D5784	2016-04-14 05:10:41 +00:00
Pedro F. Giffuni	74b8d63dcc	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
Allan Jude	d873662594	Create the GELIBOOT GEOM_ELI flag This flag indicates that the user wishes to use the GELIBOOT feature to boot from a fully encrypted root file system. Currently, GELIBOOT does not support key files, and in the future when it does, they will be loaded differently. Due to the design of GELI, and the desire for secrecy, the GELI metadata does not know if key files are used or not, it just adds the key material (if any) to the HMAC before the optional passphrase, so there is no way to tell if a GELI partition requires key files or not. Since the GELIBOOT code in boot2 and the loader does not support keys, they will now only attempt to attach if this flag is set. This will stop GELIBOOT from prompting for passwords to GELIs that it cannot decrypt, disrupting the boot process PR: 208251 Reviewed by: ed, oshogbo, wblock Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D5867	2016-04-08 01:25:25 +00:00
Pedro F. Giffuni	21ff1f7469	g_sched_destroy(): prevent return of uninitialized scalar variable. For the !gsp case there is some chance of returning an uninitialized return value. Prevent that from happening by initializing the error value. CID: 1006421	2016-04-03 16:25:51 +00:00
Warner Losh	ca19dfe480	Don't assume that bio_cmd is a bit mask. Differential Revision: https://reviews.freebsd.org/D5592	2016-03-10 06:25:39 +00:00
Warner Losh	8076d204da	Don't assume that bio_cmd is bit mask. Differential Revision: https://reviews.freebsd.org/D5593	2016-03-10 06:25:31 +00:00
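Why code must stop treating `bio_cmd` as a bit mask can be shown with a short sketch. The command codes below are hypothetical sequential values, not the real `BIO_*` constants from `sys/sys/bio.h`; they only model what happens once commands are no longer one bit each.

```c
#include <stdbool.h>

/* Hypothetical sequential command codes for this sketch. */
enum sk_cmd {
	SK_READ = 1,
	SK_WRITE = 2,
	SK_DELETE = 3,
	SK_GETATTR = 4,
	SK_FLUSH = 5
};

/* Mask-style test: only correct while every command is a distinct bit. */
static bool
sk_is_data_cmd_mask(int cmd)
{
	return ((cmd & (SK_READ | SK_WRITE)) != 0);
}

/* Equality-style test: correct for any encoding of the commands. */
static bool
sk_is_data_cmd(int cmd)
{
	switch (cmd) {
	case SK_READ:
	case SK_WRITE:
		return (true);
	default:
		return (false);
	}
}
```

With these values the mask test wrongly classifies `SK_DELETE` (3) as a data command because it shares bits with `SK_READ | SK_WRITE`, while the `switch` does not.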
Adrian Chadd	443a0f85dd	Fixes to make it compile under gcc-4.2.	2016-02-24 02:52:49 +00:00
Maxim Sobolev	5497acc527	Obsolete mkulzma(8) and geom_uncompress(4), their functionality is now provided by mkuzip(8) and geom_uzip(4) respectively. MFC after: 1 month	2016-02-24 00:39:36 +00:00
Maxim Sobolev	8f8cb840b0	Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however, that this feature, while it sounds great, results in very slight improvement in the overall compression ratio, since compressing the default 16k all-zero block produces only a 39-byte compressed output block, which is a 99.8% compression ratio. With the typical average compression ratio of amd64 binaries and data being around 60-70%, the difference between 99.8% and 100.0% is not that great, further diluted by the ratio of the number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from a performance standpoint, so that the kernel is not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create a huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate the output image. It turns out that if you twist the CLOOP format a bit you can do that as well. And unlike zero-block elimination, this gives a noticeable improvement in the overall compression ratio, reducing the output image by something like 3-4% on my test UFS2 3GB image consisting of the full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of the FreeBSD kernel, hence it's turned off by default. - provide options to control both features and document them in the manual page. - merge in all relevant LZMA compression support from mkulzma(8), add a new option to select between both. 
- switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333	2016-02-23 23:59:08 +00:00
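The 99.8% figure quoted in the commit above follows from simple arithmetic: a 16384-byte all-zero block that compresses to 39 bytes saves 1 - 39/16384 of its size. The helper below is a hypothetical illustration of that calculation, not code from mkuzip(8).

```c
/*
 * Percentage of space saved when in_len bytes compress to out_len
 * bytes. Hypothetical helper; assumes in_len is greater than zero.
 */
static double
sk_saved_percent(unsigned long in_len, unsigned long out_len)
{
	return (100.0 * (1.0 - (double)out_len / (double)in_len));
}
```

For the numbers in the commit message, `sk_saved_percent(16384, 39)` is about 99.76, which rounds to the quoted 99.8%. Skipping the block entirely would only recover the remaining fraction of a percent, which is the message's point about the small size benefit.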
Warner Losh	bd4c1dd6d6	Use the right size for zeroing. Submitted by: rpokala@	2016-02-17 18:28:38 +00:00
Warner Losh	c55f57071a	Create an API to reset a struct bio (g_reset_bio). This is mandatory for all struct bio you get back from g_{new,alloc}_bio. Temporary bios that you create on the stack or elsewhere should use this before first use of the bio, and between uses of the bio. At the moment, it is nothing more than a wrapper around bzero, but that may change in the future. The wrapper also removes one place where we encode the size of struct bio in the KBI.	2016-02-17 17:16:02 +00:00
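The reset pattern described in c55f57071a can be sketched in userland as follows. This is only an illustration under stated assumptions: `struct demo_bio`, `demo_reset_bio()` and `demo_reuse()` are hypothetical stand-ins, not the kernel's `struct bio` or `g_reset_bio()`, and `memset` is used in place of `bzero`. The point it shows is that callers go through one helper, so the structure size is encoded in a single place.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct bio; fields are illustrative. */
struct demo_bio {
	int	cmd;
	long	offset;
	size_t	length;
	int	error;
};

/*
 * Hypothetical reset helper: for now only a wrapper around zeroing,
 * but callers no longer need to know sizeof(struct demo_bio).
 */
static void
demo_reset_bio(struct demo_bio *bp)
{
	memset(bp, 0, sizeof(*bp));
}

/* Reset before first use of a stack bio and again between uses. */
static int
demo_reuse(void)
{
	struct demo_bio b;

	demo_reset_bio(&b);		/* before first use */
	b.cmd = 1;
	b.offset = 4096;
	b.error = 5;
	demo_reset_bio(&b);		/* between uses */
	return (b.cmd == 0 && b.offset == 0 && b.length == 0 && b.error == 0);
}
```

Calling `demo_reuse()` returns 1 when every field has been cleared by the second reset.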
Adrian Chadd	61789a9a76	Teach the flashmap code about the SPI flash. PR: kern/206227 Submitted by: Stanislav Galabov <sgalabov@gmail.com>	2016-01-23 05:26:29 +00:00
Ravi Pokala	cb03a5029b	Add rotationrate to geom disk dumpconf Parse and report the nominal rotation rate reported by the drive. Reviewed by: sbruno, jhb Approved by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D4483 Requested by: Kevin Bowling < kevin.bowling @ kev009.com >	2016-01-14 21:52:21 +00:00
Allan Jude	4332feca4b	Make additional parts of sys/geom/eli more usable in userspace The upcoming GELI support in the loader reuses parts of this code Some ifdefs are added, and some code is moved outside of existing ifdefs The HMAC parts of GELI are broken out into their own file, to separate them from the kernel crypto/openssl dependent parts that are replaced in the boot code. Passed the GELI regression suite (tools/regression/geom/eli) Files=20 Tests=14996 Result: PASS Reviewed by: pjd, delphij MFC after: 1 week Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D4699	2016-01-07 05:47:34 +00:00
Allan Jude	9c0c355f2a	Add some additional GPT partition types 4 ChromeOS GPT types 2 Microsoft partition types the new OpenBSD partition type Approved by: marcel (mentor) MFC after: 1 week Relnotes: yes Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D3841	2015-12-27 18:12:13 +00:00
Allan Jude	7a3f5d11fb	Replace sys/crypto/sha2/sha2.c with lib/libmd/sha512c.c cperciva's libmd implementation is 5-30% faster The same was done for SHA256 previously in r263218 cperciva's implementation was lacking SHA-384 which I implemented, validated against OpenSSL and the NIST documentation Extend sbin/md5 to create sha384(1) Chase dependencies on sys/crypto/sha2/sha2.{c,h} and replace them with sha512{c.c,.h} Reviewed by: cperciva, des, delphij Approved by: secteam, bapt (mentor) MFC after: 2 weeks Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D3929	2015-12-27 17:33:59 +00:00
Allan Jude	1747e1d875	Fix incorrect error message in geom map If geom_map fails to find the end of a mapped partition based on a search, it would return the incorrect error message, stating it could not parse the START value Reviewed by: adrian Approved by: bapt (mentor) Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D4187	2015-12-27 17:09:23 +00:00
Warner Losh	268f69f40b	It turns out that it's OK to sleep in this context, so use M_WAITOK for the softc for the delay module. Noticed by: rpokala@	2015-12-18 14:10:00 +00:00
Warner Losh	6a607537da	Scheduling module to introduce a fixed delay into the I/O path.	2015-12-18 05:39:25 +00:00
Steven Hartland	25080ac4d4	Prevent g_access calls to bad multipath members When a multipath member is orphaned its access members are zeroed before its removed if marked for wither, so prevent any future calls to g_access on such members. This prevents a panic on debug kernels which validates the resultant values aren't negative. Reviewed by: mav MFC after: 2 weeks Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4416	2015-12-15 21:11:41 +00:00
Andrey V. Elsukov	af90a87209	Make detection of GPT a bit more reliable. When we are detecting a partition table and didn't find PMBR, try to read backup GPT header from the last sector and if it is correct, assume that we have GPT. Reviewed by: rpokala MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D4282	2015-12-10 10:35:07 +00:00
Kenneth D. Merry	985108aeb1	Fix a style issue in g_disk_limit(). Noticed by: bdrewery MFC after: 1 week	2015-12-04 03:44:12 +00:00
Kenneth D. Merry	42fbdde413	Fix g_disk_vlist_limit() to work properly with deletes. Add a new bp argument to g_disk_maxsegs(), and add a new function, g_disk_maxsize() that will properly determine the maximum I/O size for a delete or non-delete bio. Submitted by: will MFC after: 1 week Sponsored by: Spectra Logic	2015-12-04 03:38:35 +00:00
Kenneth D. Merry	a9934668aa	Add asynchronous command support to the pass(4) driver, and the new camdd(8) utility. CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and completed CCBs may be retrieved via the CAMIOGET ioctl. User processes can use poll(2) or kevent(2) to get notification when I/O has completed. While the existing CAMIOCOMMAND blocking ioctl interface only supports user virtual data pointers in a CCB (generally only one per CCB), the new CAMIOQUEUE ioctl supports user virtual and physical address pointers, as well as user virtual and physical scatter/gather lists. This allows user applications to have more flexibility in their data handling operations. Kernel memory for data transferred via the queued interface is allocated from the zone allocator in MAXPHYS sized chunks, and user data is copied in and out. This is likely faster than the vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in configurations with many processors (there are more TLB shootdowns caused by the mapping/unmapping operation) but may not be as fast as running with unmapped I/O. The new memory handling model for user requests also allows applications to send CCBs with request sizes that are larger than MAXPHYS. The pass(4) driver now limits queued requests to the I/O size listed by the SIM driver in the maxio field in the Path Inquiry (XPT_PATH_INQ) CCB. There are some things that would be good to add: 1. Come up with a way to do unmapped I/O on multiple buffers. Currently the unmapped I/O interface operates on a struct bio, which includes only one address and length. It would be nice to be able to send an unmapped scatter/gather list down to busdma. This would allow eliminating the copy we currently do for data. 2. Add an ioctl to list currently outstanding CCBs in the various queues. 3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do that. 4. Test physical address support. 
Virtual pointers and scatter gather lists have been tested, but I have not yet tested physical addresses or scatter/gather lists. 5. Investigate multiple queue support. At the moment there is one queue of commands per pass(4) device. If multiple processes open the device, they will submit I/O into the same queue and get events for the same completions. This is probably the right model for most applications, but it is something that could be changed later on. Also, add a new utility, camdd(8) that uses the asynchronous pass(4) driver interface. This utility is intended to be a basic data transfer/copy utility, a simple benchmark utility, and an example of how to use the asynchronous pass(4) interface. It can copy data to and from pass(4) devices using any target queue depth, starting offset and blocksize for the input and output devices. It currently only supports SCSI devices, but could be easily extended to support ATA devices. It can also copy data to and from regular files, block devices, tape devices, pipes, stdin, and stdout. It does not support queueing multiple commands to any of those targets, since it uses the standard read(2)/write(2)/writev(2)/readv(2) system calls. The I/O is done by two threads, one for the reader and one for the writer. The reader thread sends completed read requests to the writer thread in strictly sequential order, even if they complete out of order. That could be modified later on for random I/O patterns or slightly out of order I/O. camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from the pass(4) driver and also to send request notifications internally. For pass(4) devices, camdd(8) uses a single buffer (CAM_DATA_VADDR) per CAM CCB on the reading side, and a scatter/gather list (CAM_DATA_SG) on the writing side. In addition to testing both interfaces, this makes any potential reblocking of I/O easier. 
No data is copied between the reader and the writer, but rather the reader's buffers are split into multiple I/O requests or combined into a single I/O request depending on the input and output blocksize. For the file I/O path, camdd(8) also uses a single buffer (read(2), write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list (readv(2), writev(2), preadv(2), pwritev(2)) on writes. Things that would be nice to do for camdd(8) eventually: 1. Add support for I/O pattern generation. Patterns like all zeros, all ones, LBA-based patterns, random patterns, etc. Right now you can always use /dev/zero, /dev/random, etc. 2. Add support for a "sink" mode, so we do only reads with no writes. Right now, you can use /dev/null. 3. Add support for automatic queue depth probing, so that we can figure out the right queue depth on the input and output side for maximum throughput. At the moment it defaults to 6. 4. Add support for SATA device passthrough I/O. 5. Add support for random LBAs and/or lengths on the input and output sides. 6. Track average per-I/O latency and busy time. The busy time and latency could also feed in to the automatic queue depth determination. sys/cam/scsi/scsi_pass.h: Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue and fetch asynchronous CAM CCBs respectively. Although these ioctls do not have a declared argument, they both take a union ccb pointer. If we declare a size here, the ioctl code in sys/kern/sys_generic.c will malloc and free a buffer for either the CCB or the CCB pointer (depending on how it is declared). Since we have to keep a copy of the CCB (which is fairly large) anyway, having the ioctl malloc and free a CCB for each call is wasteful. sys/cam/scsi/scsi_pass.c: Add asynchronous CCB support. Add two new ioctls, CAMIOQUEUE and CAMIOGET. CAMIOQUEUE adds a CCB to the incoming queue. 
The CCB is executed immediately (and moved to the active queue) if it is an immediate CCB, but otherwise it will be executed in passstart() when a CCB is available from the transport layer. When CCBs are completed (because they are immediate or passdone() if they are queued), they are put on the done queue. If we get the final close on the device before all pending I/O is complete, all active I/O is moved to the abandoned queue and we increment the peripheral reference count so that the peripheral driver instance doesn't go away before all pending I/O is done. The new passcreatezone() function is called on the first call to the CAMIOQUEUE ioctl on a given device to allocate the UMA zones for I/O requests and S/G list buffers. This may be good to move off to a taskqueue at some point. The new passmemsetup() function allocates memory and scatter/gather lists to hold the user's data, and copies in any data that needs to be written. For virtual pointers (CAM_DATA_VADDR), the kernel buffer is malloced from the new pass(4) driver malloc bucket. For virtual scatter/gather lists (CAM_DATA_SG), buffers are allocated from a new per-pass(9) UMA zone in MAXPHYS-sized chunks. Physical pointers are passed in unchanged. We have support for up to 16 scatter/gather segments (for the user and kernel S/G lists) in the default struct pass_io_req, so requests with longer S/G lists require an extra kernel malloc. The new passcopysglist() function copies a user scatter/gather list to a kernel scatter/gather list. The number of elements in each list may be different, but (obviously) the amount of data stored has to be identical. The new passmemdone() function copies data out for the CAM_DATA_VADDR and CAM_DATA_SG cases. The new passiocleanup() function restores data pointers in user CCBs and frees memory. Add new functions to support kqueue(2)/kevent(2): passreadfilt() tells kevent whether or not the done queue is empty. passkqfilter() adds a knote to our list. 
passreadfiltdetach() removes a knote from our list. Add a new function, passpoll(), for poll(2)/select(2) to use. Add devstat(9) support for the queued CCB path. sys/cam/ata/ata_da.c: Add support for the BIO_VLIST bio type. sys/cam/cam_ccb.h: Add a new enumeration for the xflags field in the CCB header. (This doesn't change the CCB header, just adds an enumeration to use.) sys/cam/cam_xpt.c: Add a new function, xpt_setup_ccb_flags(), that allows specifying CCB flags. sys/cam/cam_xpt.h: Add a prototype for xpt_setup_ccb_flags(). sys/cam/scsi/scsi_da.c: Add support for BIO_VLIST. sys/dev/md/md.c: Add BIO_VLIST support to md(4). sys/geom/geom_disk.c: Add BIO_VLIST support to the GEOM disk class. Re-factor the I/O size limiting code in g_disk_start() a bit. sys/kern/subr_bus_dma.c: Change _bus_dmamap_load_vlist() to take a starting offset and length. Add a new function, _bus_dmamap_load_pages(), that will load a list of physical pages starting at an offset. Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios. Allow unmapped I/O to start at an offset. sys/kern/subr_uio.c: Add two new functions, physcopyin_vlist() and physcopyout_vlist(). sys/pc98/include/bus.h: Guard kernel-only parts of the pc98 machine/bus.h header with #ifdef _KERNEL. This allows userland programs to include <machine/bus.h> to get the definition of bus_addr_t and bus_size_t. sys/sys/bio.h: Add a new bio flag, BIO_VLIST. sys/sys/uio.h: Add prototypes for physcopyin_vlist() and physcopyout_vlist(). share/man/man4/pass.4: Document the CAMIOQUEUE and CAMIOGET ioctls. usr.sbin/Makefile: Add camdd. usr.sbin/camdd/Makefile: Add a makefile for camdd(8). usr.sbin/camdd/camdd.8: Man page for camdd(8). usr.sbin/camdd/camdd.c: The new camdd(8) utility. Sponsored by: Spectra Logic MFC after: 1 week	2015-12-03 20:54:55 +00:00
Steven Hartland	86787e8d97	Fix early kernel dump via dumpdev env Setting the dumpdev via env e.g. loader.conf provides the ability to configure the kernel dump device during early boot. When using this g_io_getattr was returning EPERM due to cp->acr == 0. Fix this by calling g_access to ensure we're a read consumer prior to calling g_dev_setdumpdev. MFC after: 2 weeks Sponsored by: Multiplay	2015-11-17 20:55:50 +00:00
Steven Hartland	2dc7e36b0b	Fix g_eli error loss conditions * Ensure that error information isn't lost. * Log the error code in all cases. * Don't overwrite bio_completed set to 0 from the error condition. MFC after: 2 weeks Sponsored by: Multiplay	2015-11-05 17:37:35 +00:00
Alexander Motin	4a3760bae6	Remove compatibility shims for legacy ATA device names. We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely removed old stack at 10.x, so at 11.x it is time to remove compat shims.	2015-10-11 13:01:51 +00:00
Edward Tomasz Napierala	45d7de1d37	Make geom_nop(4) collect statistics on all types of BIOs, not just reads and writes. PR: kern/198405 Submitted by: Matthew D. Fuller <fullermd at over-yonder dot net> MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3679	2015-10-10 09:03:31 +00:00
Conrad Meyer	b4d7290796	geom_dev: Use kenv 'dumpdev' in the same way as rc/etc.d/dumpon Skip a /dev/ prefix, if one is present, when checking for matching device names for dump. Suggested by: avg Reviewed by: markj Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3725	2015-09-23 21:08:52 +00:00
Edward Tomasz Napierala	bb27d7ed90	Add a way to specify stripesize and stripeoffset to gnop(8). This makes it possible to "simulate" 4K media, to eg test alignment handling. Reviewed by: mav@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3664	2015-09-15 18:01:59 +00:00
Warner Losh	3f2e5b8584	After the introduction of direct dispatch, the pacing code in g_down() broke in two ways. One, the pacing variable was accessed in multiple threads in an unsafe way. Two, since large numbers of I/O could come down from the buf layer at one time, large numbers of allocation failures could happen all at once, resulting in a huge pace value that would limit I/Os to 10 IOPS for minutes (or even hours) at a time. While a real solution to these problems requires substantial work (to go to a no-allocation after the first model, or to have some way to wait for more memory with some kind of reserve for pager and swapper requests), it is relatively easy to make this simplistic pacing less pathological. Move to using a volatile variable with loads and stores. While this is a little racy, losing the race is safe: either you get memory and proceed, or you don't and queue. Second, sleep for 1ms (or one tick, whichever is larger) instead of 100ms. This removes the artificial 10 IOPS limit while still easing up on new I/Os during memory shortages. Remove tying the amount of time we do this to the number of failed requests and do it only as long as we keep failing requests. Finally, to avoid needless recursion when memory is tight (start -> g_io_deliver() -> g_io_request() -> start -> ... until we use 1/2 the stack), don't do direct dispatch while pacing. This should be a rare event (not steady state) so the performance hit here is worth the extra safety of not starving g_down() with directly dispatched I/O. Differential Review: https://reviews.freebsd.org/D3546	2015-09-02 17:29:30 +00:00
Justin Hibbits	6aabc119b6	Create a RouterBoard platform and use it to create a flash map Summary: The RouterBoard uses a predefined partition map which doesn't exist in the fdt. This change allows overriding the fdt slicer with a custom slicer, and uses this custom slicer to define the flash map on the RouterBoard RB800. D3305 converts the mpc85xx platform into a base class, so that systems based on the mpc85xx platform can add their own overrides. This change builds on D3305, and creates a RouterBoard (RB800) platform to initialize the slicer override. Reviewed By: nwhitehorn, imp Differential Revision: https://reviews.freebsd.org/D3345	2015-08-22 05:50:18 +00:00
Pedro F. Giffuni	6bc3fe5f4e	Clean out some externally visible "more then" grammar MFC after: 3 days	2015-08-11 03:12:09 +00:00
Enji Cooper	604083d74c	Make some debug printf's into DPRINTF's to reduce noise on attach/detach Similar reasoning to what was done in r286367 with geom_uzip(4) MFC after: 2 weeks Differential Revision: D3320 Sponsored by: EMC / Isilon Storage Division	2015-08-09 06:58:06 +00:00
Pawel Jakub Dawidek	46e3447026	Enable BIO_DELETE passthru in GELI, so TRIM/UNMAP can work as expected when GELI is used on a SSD or inside virtual machine, so that guest can tell host that it is no longer using some of the storage. Enabling BIO_DELETE passthru comes with a small security consequence - an attacker can tell how much space is being really used on encrypted device and has less data to analyse then. This is why the -T option can be given to the init subcommand to turn off this behaviour and -t/T options for the configure subcommand can be used to adjust this setting later. PR: 198863 Submitted by: Matthew D. Fuller fullermd at over-yonder dot net This commit also includes a fix from Fabian Keil freebsd-listen at fabiankeil.de for 'configure' on onetime providers which is not strictly related, but is entangled in the same code, so would cause conflicts if separated out.	2015-08-08 09:51:38 +00:00
Konstantin Belousov	347e9d5495	Minor style cleanup of the code surrounding r286404. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-08-07 08:24:12 +00:00
Konstantin Belousov	9b34965019	The condition to use direct processing for the unmapped bio is reverted. We can do direct processing when g_io_check() does not need to perform transient remapping of the bio, otherwise the thread has to sleep. Reviewed by: mav (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-08-07 08:13:34 +00:00
Pawel Jakub Dawidek	5ee9ea19fe	After crypto_dispatch() bio might be already delivered and destroyed, so we cannot access it anymore. Setting an error later led to memory corruption. Assert that crypto_dispatch() was successful. It can fail only if we pass a bogus crypto request, which is a bug in the program, not a runtime condition. PR: 199705 Submitted by: luke.tw Reviewed by: emaste MFC after: 3 days	2015-08-06 17:13:34 +00:00
Enji Cooper	fcc8461cfb	Make some debug printf's into DPRINTF's to reduce noise on attach/detach Differential Revision: https://reviews.freebsd.org/D3306 MFC after: 1 week Reviewed by: loos Sponsored by: EMC / Isilon Storage Division	2015-08-06 15:30:14 +00:00
Edward Tomasz Napierala	72800098bf	Fix panic triggered by code like this: open("/dev/md0", O_EXEC); Discussed with: kib@, mav@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3051	2015-08-04 10:40:08 +00:00
Edward Tomasz Napierala	d6cc35b287	Fix panic that would happen on forcibly unmounting devfs (note that as it is now, devfs ignores MNT_FORCE anyway, so it needs to be modified to trigger the panic) with consumers still opened. Note that this still results in a leak of r/w/e counters. It seems to be harmless, though. If anyone knows a better way to approach this - please tell. Discussed with: kib@, mav@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3050	2015-08-03 16:35:18 +00:00
Andrey V. Elsukov	da6c24e123	Report the scheme and provider names in warning message about unaligned partition. PR: 201873 MFC after: 1 week	2015-07-26 11:16:48 +00:00
Allan Jude	ce808c7ad8	Add a new option to gpart(8) to fix Lenovo BIOS boot issue PR: 184910 Reviewed by: ae, wblock Approved by: marcel MFC after: 3 days Relnotes: yes Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D3065	2015-07-15 02:23:55 +00:00
Pawel Jakub Dawidek	4273d41299	Spoil event can happen for some time now even on providers opened exclusively (on the media change event). Update GELI to handle that situation. PR: 201185 Submitted by: Matthew D. Fuller	2015-07-10 19:27:19 +00:00
Pawel Jakub Dawidek	fefb6a143a	Properly propagate errors in metadata reading. PR: 198860 Submitted by: Matthew D. Fuller	2015-07-02 10:57:34 +00:00
Pawel Jakub Dawidek	edaa9008ff	Allow to omit keyfile number for the first keyfile.	2015-07-02 10:55:32 +00:00
Edward Tomasz Napierala	628b712826	Fix off-by-one error in fstyp(8) and geom_label(4) that made them use a single space (" ") as a CD9660 label name when no label was present. Similar problem was also present in msdosfs label recognition. PR: 200828 Differential Revision: https://reviews.freebsd.org/D2830 Reviewed by: asomers@, emaste@ MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2015-06-18 21:55:55 +00:00
Andrey V. Elsukov	e7d0c7e458	Teach G_PART_GPT class to handle g_resize_provider event. MFC after: 10 days	2015-06-08 12:52:41 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Andrey V. Elsukov	153c57b5b4	Read GEOM_UNCOMPRESS metadata using several requests that fit into MAXPHYS. For large compressed images the metadata size can be bigger than MAXPHYS and this triggers KASSERT in g_read_data(). Also use g_free() to free memory allocated by g_read_data(). PR: 199476 MFC after: 2 weeks	2015-05-19 09:28:52 +00:00
Andrey V. Elsukov	4b8d4f97b0	Add apple-boot, apple-hfs and apple-ufs aliases to MBR scheme. Sort DOSPTYP_* entries in diskmbr.h by value. Document these scheme-specific types in gpart(8). MFC after: 1 week	2015-05-05 09:33:02 +00:00
Craig Rodrigues	d9db52256e	Move zlib.c from net to libkern. It is not network-specific code and would be better as part of libkern instead. Move zlib.h and zutil.h from net/ to sys/ Update includes to use sys/zlib.h and sys/zutil.h instead of net/ Submitted by: Steve Kiernan stevek@juniper.net Obtained from: Juniper Networks, Inc. GitHub Pull Request: https://github.com/freebsd/freebsd/pull/28 Relnotes: yes	2015-04-22 14:38:58 +00:00
Pedro F. Giffuni	4a5e6b854d	g_uncompress_taste: prevent a double free. Found by: Clang Static Analyzer MFC after: 1 week	2015-04-20 16:31:27 +00:00
Alexander Motin	0ada3afc25	Remove sleeps from geom_up thread on device destruction. MFC after: 3 days.	2015-04-09 13:09:05 +00:00
Alexander Motin	5d85cd2d11	Remove extra semicolon. MFC after: 1 week	2015-03-27 12:45:20 +00:00
Alexander Motin	3ab0187add	Remove request sorting from GEOM_MIRROR and GEOM_RAID. When CPU is not busy, those queues are typically empty. When CPU is busy, then one more extra sorting is the last thing it needs. If specific device (HDD) really needs sorting, then it will be done later by CAM. This is supposed to fix the livelock reported for a mirror of two SSDs, when UFS fires a zillion BIO_DELETE requests, which totally blocks the I/O subsystem by pointless sorting of requests and responses under a single mutex lock. MFC after: 2 weeks	2015-03-27 12:44:28 +00:00
Alexander Motin	41fe4ba647	Fix bug on memory allocation error in split method. While there, use bioq_takefirst() in place where it is convenient. MFC after: 1 week	2015-03-27 11:14:12 +00:00
Alexander Motin	5523c82c1a	Make GEOM_PART work in presence of previous withered self. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2015-03-26 12:17:47 +00:00
Alexander Motin	2f36085dcf	Report withered providers as such alike to GEOMs. MFC after: 2 weeks	2015-03-26 11:19:24 +00:00
Alexander Motin	ba772028db	When searching for provider by name, prefer non-withered one. MFC after: 2 weeks	2015-03-26 11:02:29 +00:00
Adrian Chadd	28d507fcec	Fix the label search routine in geom_map to not trip up on '\0' bytes. * Just do the buf check early and fail out * If the offset being searched is: 00110000 00 b5 7e 45 61 e2 76 d3 c1 78 dd 15 95 cd 1f f1 \|..~Ea.v..x......\| .. and the match string is '.!/bin/sh' .. then it'll set the match string[0] to '\0', do a strncmp() against the read buffer, find it's matching two zero-length strings, and think that's where to start. MFC after: 2 weeks	2015-03-19 03:58:25 +00:00
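The false match described in 28d507fcec can be reproduced in a small userland sketch. The assumptions here are mine: `naive_match()` and `bytewise_match()` are hypothetical helpers, and the wildcard handling (copying the buffer byte into the key wherever the pattern has '.') is inferred from the commit text, not taken from geom_map itself. The byte-wise comparison is an alternative shown for contrast; the commit itself fixes the problem by checking the buffer early and failing out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical reconstruction of the failure mode: '.' in the pattern is
 * replaced by the corresponding buffer byte, then strncmp() compares.
 * If that byte is '\0', both strings look empty and compare equal.
 */
static int
naive_match(const unsigned char *buf, const char *pattern, size_t len)
{
	char key[64];
	size_t i;

	if (len >= sizeof(key))
		return (0);
	for (i = 0; i < len; i++)
		key[i] = (pattern[i] == '.') ? (char)buf[i] : pattern[i];
	key[len] = '\0';
	return (strncmp((const char *)buf, key, len) == 0);
}

/* Byte-wise comparison that does not stop at '\0' bytes. */
static int
bytewise_match(const unsigned char *buf, const char *pattern, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (pattern[i] != '.' && (unsigned char)pattern[i] != buf[i])
			return (0);
	return (1);
}
```

With the first nine bytes from the commit message (`00 b5 7e 45 61 e2 76 d3 c1`) and the pattern `.!/bin/sh`, `naive_match()` reports a match even though nothing after the first byte agrees, while `bytewise_match()` rejects it.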
Andrey V. Elsukov	4fb4ebe0a4	Add GUID and alias for Apple Core Storage partition. PR: 196241 MFC after: 1 week	2015-03-12 18:51:31 +00:00
Alexander Motin	7715befdf2	Fix couple BIO_DELETE bugs in geom_mirror. Do not report GEOM::candelete if none of providers support BIO_DELETE. If consumer still requests BIO_DELETE, report error instead of hanging. MFC after: 2 weeks	2015-03-12 10:20:53 +00:00
Alexander Motin	0b1b7c2cec	Replace constant with proper sizeof(). Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> MFC after: 2 weeks	2015-02-25 10:18:11 +00:00
Edward Tomasz Napierala	01de1a0650	Add devd(8) notifications for creation and destruction of GEOM devices. Differential Revision: https://reviews.freebsd.org/D1211 MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-01-14 11:15:57 +00:00
Warner Losh	a91275f72f	Remove old ioctl use and support, once and for all.	2015-01-06 05:28:37 +00:00
Warner Losh	0acf08d985	Remove support for FreeBSD 7 and really old FreeBSD 8. The classifiers have been in the base for a while, so the gymnastics here aren't needed. In addition, the bugs in subr_disk.c have been fixed since 2009, so there's no need for an identical copy of it in the tree anymore. There's really no need to binary patch g_io_request, so let's get rid of the code (not compiled in anymore) lest others think it is a good idea.	2014-12-20 00:04:01 +00:00
John-Mark Gurney	08fca7a56b	Add some new modes to OpenCrypto. These modes are AES-ICM (can be used for counter mode), and AES-GCM. Both of these modes have been added to the aesni module. Included is a set of tests to validate that the software and aesni module calculate the correct values. These use the NIST KAT test vectors. To run the test, you will need to install a soon to be committed port, nist-kat that will install the vectors. Using a port is necessary as the test vectors are around 25MB. All the man pages were updated. I have added a new man page, crypto.7, which includes a description of how to use each mode. All the new modes and some other AES modes are present. It would be good for someone else to go through and document the other modes. A new ioctl was added to support AEAD modes which AES-GCM is one of them. Without this ioctl, it is not possible to test AEAD modes from userland. Add a timing safe bcmp for use to compare MACs. Previously we were using bcmp which could leak timing info and result in the ability to forge messages. Add a minor optimization to the aesni module so that single segment mbufs don't get copied and instead are updated in place. The aesni module needs to be updated to support blocked IO so segmented mbufs don't have to be copied. We require that the IV be specified for all calls for both GCM and ICM. This is to ensure proper use of these functions. Obtained from: p4: //depot/projects/opencrypto Relnotes: yes Sponsored by: FreeBSD Foundation Sponsored by: NetGate	2014-12-12 19:56:36 +00:00
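The idea behind a timing-safe comparison for MACs, as mentioned in 08fca7a56b, can be sketched like this. It is a minimal illustration, not the function added by the commit: `demo_timingsafe_cmp()` is a hypothetical name, and the approach shown (accumulating the XOR of every byte pair so the loop always runs to the end) is the common technique, assumed here rather than confirmed from the source.

```c
#include <stddef.h>

/*
 * Hypothetical constant-time comparison: every byte is examined regardless
 * of where the first difference occurs, so the running time does not reveal
 * how long the matching prefix is. Returns 0 if equal, 1 otherwise.
 */
static int
demo_timingsafe_cmp(const void *a, const void *b, size_t n)
{
	const unsigned char *pa = a;
	const unsigned char *pb = b;
	unsigned char diff = 0;
	size_t i;

	for (i = 0; i < n; i++)
		diff |= pa[i] ^ pb[i];	/* accumulate differences, no early exit */
	return (diff != 0);
}
```

An ordinary comparison that returns at the first mismatching byte runs longer the more leading bytes agree, which is the leak the commit message refers to; the sketch avoids the early exit at the cost of always touching all `n` bytes.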
Alexander Motin	1e68fe9c33	Avoid unneeded malloc/memcpy/free if there is no metadata on disk. Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> MFC after: 2 weeks	2014-12-05 10:23:18 +00:00
Alexander Motin	26f0f92fa2	Decode some binary fields of Intel metadata. Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> MFC after: 2 weeks	2014-12-04 15:54:45 +00:00
Warner Losh	66cc25a224	Actually, that was a bad idea. Go back to MAXPARTITIONS. Submitted by: bruce	2014-11-20 17:31:25 +00:00
Warner Losh	dd87e2c610	The number of BSD partitions is variable. Return the proper number (which is in basetable->gpt_entries). Submitted by: ae@	2014-11-19 18:55:27 +00:00
Warner Losh	73f49e9eef	Implement the historic DIOCGDINFO ioctl for gpart on BSD partitions. Several utilities still use this interface and require additional information since gpart was activated than before. This allows fsck of a UFS partition without having to specify it is UFS, per historic behavior.	2014-11-18 17:06:40 +00:00
Pawel Jakub Dawidek	5ebb15b942	Add missing privilege check when setting the dump device. Before that change it was possible for a regular user to setup the dump device if he had write access to the given device. In theory it is a security issue as user might get access to kernel's memory after provoking kernel crash, but in practise it is not recommended to give regular users direct access to storage devices. Rework the code so that we do privileges check within the set_dumper() function to avoid similar problems in the future. Discussed with: secteam	2014-11-11 04:48:09 +00:00
Dag-Erling Smørgrav	133cdd9e13	Constify the AES code and propagate to consumers. This allows us to update the Fortuna code to use SHAd-256 as defined in FS&K. Approved by: so (self)	2014-11-10 09:44:38 +00:00
Poul-Henning Kamp	cd15a01091	Translate the errno to gctl_error() texts. Spotted by: mwlucas	2014-11-09 15:52:11 +00:00
Alexander Motin	c3e7ba3e6d	Add to CTL support for logical block provisioning threshold notifications. For ZVOL-backed LUNs this allows to inform initiators if storage's used or available spaces get above/below the configured thresholds. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2014-11-06 00:48:36 +00:00
Alexander Motin	ccf8a5688a	Revert somewhat hackish geom_disk optimization, committed as part of r256880, and the following r273143 commit, supposed to workaround introduced issue by quite innocent-looking change. While there is no clear understanding why, r273143 is accused of data corruption in some environments with high I/O load. I personally don't see any problem in that commit, and possibly it is just a trigger to some other bug somewhere, but better safe than sorry for now. Requested by: scottl@ MFC after: 3 days	2014-10-25 15:16:19 +00:00
Colin Percival	66427784c1	Populate the GELI passphrase cache with the kern.geom.eli.passphrase variable (if any) provided in the boot environment. Unset it from the kernel environment after doing this, so that the passphrase is no longer present in kernel memory once we enter userland. This will make it possible to provide a GELI passphrase via the boot loader; FreeBSD's loader does not yet do this, but GRUB (and PCBSD) will have support for this soon. Tested by: kmoore	2014-10-22 23:41:15 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Andrey V. Elsukov	52fa0beb0a	Add provider's sectorsize and stripesize to confdot output. Submitted by: rpokala at panasas.com	2014-10-17 06:58:04 +00:00
Davide Italiano	2be111bf7d	Follow up to r225617. In order to maximize the re-usability of kernel code in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe	2014-10-16 18:04:43 +00:00
Andrey V. Elsukov	0478dc0c16	Add an ability to set dumpdev via loader(8) tunable. MFC after: 3 weeks	2014-10-08 12:18:16 +00:00
Hiroki Sato	d17183901f	Fix a bug in r272297 which prevented dumpdev from setting. !u is not equivalent to (u != 0).	2014-10-03 04:13:25 +00:00
Pawel Jakub Dawidek	227f68edbb	Be prepared that set_dumper() might fail even when resetting it or prefix the call with (void) to document that we intentionally ignore the return value - no way to handle an error in case of device disappearing.	2014-09-30 12:00:50 +00:00
Pawel Jakub Dawidek	7f5b50719b	Style fixes.	2014-09-30 11:51:32 +00:00
Colin Percival	835c4dd436	Cache GELI passphrases entered at the console during the boot process, in order to improve user-friendliness when a system has multiple disks encrypted using the same passphrase. When examining a new GELI provider, the most recently used passphrase will be attempted before prompting for a passphrase; and whenever a passphrase is entered, it is cached for later reference. When the root disk is mounted, the cached passphrase is zeroed (triggered by the "mountroot" event), in order to minimize the possibility of leakage of passphrases. (After root is mounted, the "taste and prompt for passphrases on the console" code path is disabled, so there is no potential for a passphrase to be stored after the zeroing takes place.) This behaviour can be disabled by setting kern.geom.eli.boot_passcache=0. Reviewed by: pjd, dteske, allanjude MFC after: 7 days	2014-09-16 08:40:52 +00:00
Sean Bruno	5f23eb4d9c	Add device name used in geom_map verbose output. This helps when using geom_map with multiple flash/spi devices. Phabric: https://reviews.freebsd.org/D766 Reviewed by: adrian MFC after: 2 weeks	2014-09-11 22:39:27 +00:00
John-Mark Gurney	89fac384c8	use a straight buffer instead of an iov w/ 1 segment... The aesni driver when it hits a mbuf/iov buffer, it mallocs and copies the data for processing.. This improves perf by ~8-10% on my machine... I have thoughts of fixing AES-NI so that it can better handle segmented buffers, which should help improve IPSEC performance, but that is for the future...	2014-09-04 23:53:51 +00:00
Scott Long	274919e965	Deal explicitly with possible failures of make_dev_alias_p() in GEOM. Submitted by: Mariusz Zaborski <oshogbo@FreeBSD.org> MFC after: 3 days	2014-08-18 19:27:47 +00:00
Andrey V. Elsukov	36b16d1f7d	Turn off kern.geom.part.mbr.enforce_chs by default.	2014-08-12 10:31:31 +00:00
Andrey V. Elsukov	fb86534cb1	Add sysctl and loader tunable kern.geom.part.mbr.enforce_chs that is set by default. It can be used to disable automatic alignment to CHS geometry, that GEOM_PART_MBR does. Reviewed by: wblock MFC after: 1 week	2014-08-12 09:10:13 +00:00
Warner Losh	cba7d97b61	cswitch is unsigned, so don't compare it < 0. Any negative numbers will look huge and be caught by > 100.	2014-08-07 21:56:42 +00:00
Warner Losh	86e26cb154	Unsigned values can never be less than 0.	2014-08-07 21:56:37 +00:00
Marcel Moolenaar	6c25615f39	In r264504, we prevented doing I/O for more than MAXPHYS by making the assumption that consumers would respect bio_completed and/or bio_resid to detect short reads. This assumption proved false and file corruption was the result. Create as many bios as we need to satisfy the original request. Check the cached chunk every time we need to do I/O to increase the hit rate. Obtained from: Juniper Networks, Inc. MFC after: 1 week	2014-07-22 17:30:05 +00:00
Nathan Whitehorn	1ee0f08975	After EFI support was added to the installer, it needed to allow boot partitions of types other than "freebsd-boot" (in particular, "efi"). This allows the removal of some nasty hacks for supporting PowerPC systems, in particular aliasing freebsd-boot to apple-boot on APM and an IBM-specific code on MBR. This changes the installer to use the correct names, which also breaks a degeneracy in the meaning of "freebsd-boot" that allows the addition of support for some newer IBM systems that can boot from GPT in addition to MBR. Since I have no idea how to detect which those systems are, leave the default on IBM PPC systems as MBR for now.	2014-07-04 15:55:32 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Andrey V. Elsukov	91ca76a590	Add disklabel64 support to GEOM_PART class. This partitioning scheme is used in DragonFlyBSD. It is similar to BSD disklabel, but has the following improvements: * metadata has own dedicated place and isn't accessible through partitions; * all offsets are 64-bit; * supports 16 partitions by default (has reserved place for more); * has reserved place for backup label (but not yet implemented); * has UUIDs for partitions and partition types; No objections from: geom MFC after: 2 weeks Relnotes: yes	2014-06-11 10:42:34 +00:00
Andrey V. Elsukov	4042ab48c7	Allow swapping to DragonFlyBSD's swap partition. MFC after: 2 weeks	2014-06-11 10:23:49 +00:00
Andrey V. Elsukov	0640b71dfe	Add aliases for DragonFlyBSD's partition types. MFC after: 2 weeks	2014-06-11 10:19:11 +00:00
Brad Davis	ebd05adab8	- Fix the keyfile being cleared prematurely after r259428 PR: 185084 Submitted by: fk@fabiankeil.de Reviewed by: pjd@	2014-06-06 03:17:37 +00:00
Andrey V. Elsukov	39dcac849e	Use g_conf_printf_escaped() to escape symbols, which can break an XML tree. MFC after: 1 week	2014-05-30 10:35:51 +00:00
Andrey V. Elsukov	17e0c43319	Add a topology trace to the g_spoil_event. MFC after: 1 week	2014-05-19 16:08:15 +00:00
Andrey V. Elsukov	362073c089	We have two functions from where a geom orphan method could be called: g_orphan_register and g_resize_provider_event. Both are called from the event queue. Also we have GEOM_DEV class, which does deferred destroy for its consumers via g_dev_destroy (also called from the event queue). So it is possible, that for some consumers an orphan method will be called twice. This triggers panic in g_dev_orphan. Check that consumer isn't already orphaned before call orphan method. MFC after: 2 weeks	2014-05-19 16:05:42 +00:00
Alexander Motin	413037c8e7	Make GEOM DISK to account also BIO_FLUSH operations.	2014-05-17 15:07:00 +00:00
Andrey V. Elsukov	579259ea0d	It is safe to allow shrinking, when aligned size is bigger than current. Tested by: jmg MFC after: 1 week	2014-05-07 11:18:27 +00:00
Edward Tomasz Napierala	c7c7d7d0f0	Make r242379 - the fix for UFS labels disappearing after resizing the provider - also apply to UFS1 filesystems. This should help with resizing filesystems created by makefs(8), which still uses UFS1. Tested by: jmg@ Sponsored by: The FreeBSD Foundation	2014-05-05 09:20:30 +00:00
Andrey V. Elsukov	4f31a94bd2	Add an advice what to do when partition was automatically resized. X-MFC after: r256690	2014-05-04 20:00:08 +00:00
Andrey V. Elsukov	c778397f26	Add better error description for case when we are doing resize and scheme-specific method returns EBUSY. MFC after: 1 week	2014-05-04 16:55:51 +00:00
Andrey V. Elsukov	0dd7f00cee	Prevent an unexpected shrinking on resizing due to alignment for MBR, PC98 and VTOC8 schemes. Reported by: jmg MFC after: 1 week	2014-05-04 16:43:57 +00:00
Andrey V. Elsukov	bc1e8f56ff	For schemes that do an automatic partition aligning move this code to separate function. MFC after: 1 week	2014-05-04 10:14:25 +00:00
Luiz Otavio O Souza	81694cde44	Fix a leak in g_uzip_taste(). After retrieve all the block offsets from the uzip image, free the last data read.	2014-05-01 15:23:20 +00:00
Luiz Otavio O Souza	ccb7284af1	Actually the FEATURE() macro is defined on sys/sysctl.h. Pointyhat to: loos	2014-05-01 14:59:04 +00:00
Luiz Otavio O Souza	6d8beede60	Some style and whitespace fixes. Reduce the difference between geom_uzip(4) and geom_uncompress(4). Now, they produce an almost clean diff(1) output. Remove a duplicated variable from g_uncompress.c and an unnecessary header from g_uzip.c. No functional changes.	2014-05-01 14:47:27 +00:00
Bryan Drewery	74679c6a99	Remove redundant include MFC after: 3 days	2014-04-29 01:17:43 +00:00
Alexander Motin	dea1e22600	Reduce number of opens by GEOM RAID during provider taste. Instead of opening/closing provider by each of metadata classes, do it only once in core code. Since for SCSI disks open/close means sending some SCSI commands to the device, this change reduces taste time. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2014-04-28 15:03:52 +00:00
Luiz Otavio O Souza	6f05733a1f	Keep geom_uncompress(4) in line with geom_uzip(4), bring in the r264504 fix. Make sure not to start I/O bigger than MAXPHYS bytes. Quoting r264504: When we detect the condition, we'll reduce the block count and perform a "short" read. In g_uncompress_done() we need to consider the original I/O length and stop early if we're about to deflate a block that we didn't read. By using bio_completed in the cloned BIO and not bio_length to check for this, we automatically and gracefully handle short reads that our providers may be doing on top of the short reads we may initiate ourselves. Reviewed by: marcel	2014-04-22 18:08:34 +00:00
Marcel Moolenaar	855be5b2c1	Make sure not to do I/O for more than MAXPHYS bytes. Doing so can cause problems in our providers, such as a KASSERT in md(4). We can initiate I/O for more than MAXPHYS bytes if we've been given a BIO for MAXPHYS bytes, the blocks from which we're reading couldn't be compressed and we had compression in preceding blocks resulting in misalignment of the blocks we're trying to read relative to the sector. We're forced to round up the I/O length to make it a multiple of the sector size. When we detect the condition, we'll reduce the block count and perform a "short" read. In g_uzip_done() we need to consider the original I/O length and stop early if we're about to deflate a block that we didn't read. By using bio_completed in the cloned BIO and not bio_length to check for this, we automatically and gracefully handle short reads that our providers may be doing on top of the short reads we may initiate ourselves. Obtained from: Juniper Networks, Inc.	2014-04-15 15:41:57 +00:00
Bryan Drewery	87bc328d63	Make g_access() KASSERT() more useful. Sponsored by: EMC / Isilon Storage Division Obtained from: Isilon OneFS MFC after: 2 weeks	2014-04-15 14:41:41 +00:00
Marcel Moolenaar	4787115d04	Align and round the partitionable disk space to 4K by default. Since this would also apply when recovering, make sure not to align or round when that would have a partition fall outside the partitionable area.	2014-04-12 20:28:39 +00:00
Bryan Drewery	1e4b22b44b	Fix spelling error in g_trace() call. Sponsored by: EMC / Isilon Storage Division MFC after: 1 week	2014-04-10 17:00:44 +00:00
Alexander Motin	1229e83d2b	Fix wrong sizes used to access PD_Type and PD_State DDF metadata fields. This caused incorrect behavior of arrays with big-endian DDF metadata. Little-endian (like used by Adaptec controllers) should not be harmed. Add workaround should be enough to manage compatibility. MFC after: 2 weeks	2014-04-10 16:00:33 +00:00
Alexander Motin	66b92c07fe	Do not increment bio_data in case of BIO_DELETE. This fixes KASSERT() panic in g_io_request().	2014-04-10 10:12:56 +00:00
Marcel Moolenaar	e8c166e85a	An all-or-nothing approach to labels isn't flexible enough. Embedded systems need fine-grained control over what's in and what's out. That's ideal. For now, separate GPT labels from the rest and allow g_label to be built with just GPT labels. Obtained from: Juniper Networks, Inc.	2014-04-06 02:44:37 +00:00
Marcel Moolenaar	12b2d77da9	Make sure we don't free memory that's already been freed by setting the geom->softc pointer to NULL before freeing the g_slicer softc. In g_slicer_free() the pointer is checked first. Obtained from: Juniper Networks, Inc.	2014-04-06 02:20:42 +00:00
Bryan Drewery	09adfca39f	Show error code when failing to destroy a mirror on delay Sponsored by: EMC / Isilon Storage Division MFC after: 2 weeks	2014-04-05 03:01:29 +00:00
Xin LI	c35ddb346f	In g_eli_crypto_hmac_init(), zero out after using the ipad buffer, k_ipad. Note that the two consumers in geli(4) are not affected by this issue because the way the code is constructed and as such, we believe there is no security impact with or without this change with geli(4)'s usage. Reported by: Serge van den Boom <serge vdboom.org> Reviewed by: pjd MFC after: 2 weeks	2014-02-08 05:17:49 +00:00
Luiz Otavio O Souza	d9ffbff9f0	Fix the build with DEBUG enabled. Where possible, fix style(9) issues. Reviewed by: bde Approved by: adrian (mentor)	2014-02-07 13:06:48 +00:00
Luiz Otavio O Souza	f0d701f048	Fix a logic error. Because of this inflateReset() wasn't being called and the output buffer wasn't being cleared between the inflate() calls, producing zeroed output after the first inflate() call. This fixes the read of mkuzip(8) images with geom_uncompress(4). Reviewed by: ray Approved by: adrian (mentor)	2014-02-03 17:25:36 +00:00
Luiz Otavio O Souza	c2d90f35d5	Remove some unnecessary code. The offsets read from the first block are overwritten a few lines below. Reviewed by: ray Approved by: adrian (mentor)	2014-02-03 17:21:36 +00:00
Andrey V. Elsukov	524d7a4d4e	Always free sbuf in gctl_free(). MFC after: 1 week	2014-01-23 21:30:31 +00:00
Andrey V. Elsukov	d14a7ff1f5	Remove another unneeded NULL check from geom_alloc_copyin(). Do copyout in case of gctl version mismatch and fix sbuf leak in g_ctl_ioctl_ctl(). MFC after: 1 week	2014-01-23 20:25:38 +00:00
Andrey V. Elsukov	7f0e13dfe0	In gctl_copyin() remove unused error variable. geom_alloc_copyin() can't return ENOMEM, so describe its fail as bad control request. Add check for NULL pointer in gctl_dump(), since it can be NULL when geom_alloc_copyin() failed. MFC after: 1 week	2014-01-23 19:55:02 +00:00
Andrey V. Elsukov	625ee733e3	Fix typo in r261084. Add to the gctl_error() an ability to specify error description even if numeric error code is already specified. Also by default set error code to EINVAL. PR: 185852 MFC after: 1 week	2014-01-23 19:31:17 +00:00
Andrey V. Elsukov	ee839ce84c	malloc() with M_WAITOK doesn't return NULL. MFC after: 1 week	2014-01-23 19:07:22 +00:00
Alexander Motin	eaed60f737	Removed unneeded and dangerous assignment. It would probably cause a NULL reference panic if the compiler did not optimize it out. Found with: Clang static analyzer MFC after: 2 weeks	2014-01-19 16:37:57 +00:00
Luiz Otavio O Souza	67619a4120	Build the geom_uncompress(4) module by default. Fix geom_uncompress(4) module loading. Don't link zlib.c (which is a module itself) directly. The built module was verified and used to read a few mkulzma(8) images on amd64 to validate some of the informations on the manual page. While here, don't overwrite CFLAGS. Reviewed by: ray Approved by: adrian (mentor)	2014-01-10 20:29:46 +00:00
Andrey V. Elsukov	ae3bc0acff	Add an ability to stop gmirror and clear its metadata in one command. This fixes the problem, when gmirror starts again just after stop. The problem occurs when gmirror's component has geom label with equal size. E.g. gpt and gptid have the same size as partition, diskid has the same size as entire disk. When gmirror's geom has been destroyed, glabel creates its providers and this initiate retaste. Now "gmirror destroy" command is available. It destroys geom and also erases gmirror's metadata. MFC after: 2 weeks	2013-12-27 02:43:53 +00:00
Dmitry Morozovsky	5cc596c46d	Add GPT UUID for VMware vSAN meta-data partition. Approved by: ae MFC after: 2 weeks	2013-12-26 21:06:12 +00:00
Andrey V. Elsukov	7c5710dbaf	Prevent users from deactivating the last component of a mirror. PR: 184985 MFC after: 1 week	2013-12-19 22:13:12 +00:00
Pawel Jakub Dawidek	396b29c74e	Clear some more places with potentially sensitive data. MFC after: 1 week	2013-12-15 22:52:18 +00:00
Pawel Jakub Dawidek	2a3237c84f	Clear content of keyfiles loaded by the loader after processing them. Pointed out by: rwatson MFC after: 1 week	2013-12-15 22:51:26 +00:00
Alexander Motin	2634da8cd5	Fix bug introduced at r256607. We have to recalculate bp_resid here since sizes of original and completed requests may differ due to end of media. Bisected by: pho	2013-12-12 08:23:28 +00:00
Justin Hibbits	6cec74b2e4	Partially revert r259080. bde@ pointed out that there are a lot more style bugs going on in here than can be fixed, and I introduced some of my own. Rather than fix the whole host of them, back out my bugs. Found by: bde X-MFC with: r259080	2013-12-08 09:34:56 +00:00
Justin Hibbits	8991c54091	Fix some integer signs. These unsigned integers should all be signed. Found by: clang (powerpc64)	2013-12-07 19:55:34 +00:00
Eitan Adler	7a22215c53	Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this shifts into the sign bit. Instead use (1U << 31) which gets the expected result. This fix is not ideal as it assumes a 32 bit int, but does fix the issue for most cases. A similar change was made in OpenBSD. Discussed with: -arch, rdivacky Reviewed by: cperciva	2013-11-30 22:17:27 +00:00
Alexander Motin	7ae1a87bfe	Escape special XML chars, returned by some devices, confusing XML parsers. MFC after: 1 month	2013-11-27 14:25:06 +00:00
Marcel Moolenaar	3e5a0a6b70	Have the GPT probe return a lower priority when the MBR is not a PMBR. The purpose of the PMBR is to have the disk appear in use to GPT unaware utilities (like fdisk). However, if the PMBR has been changed by a GPT unaware utility then we must assume that this was deliberate (as it involved removal of the special slice) and we should not treat the unmodified GPT-specific sectors as being valid. By lowering the probe priority in that case, the MBR scheme will take precedence and the kernel will end up using the MBR and not the GPT. We will still use the GPT if the kernel does not support the MBR scheme.	2013-11-21 22:02:59 +00:00
Andrey V. Elsukov	32cea4ca0f	Add "resize" verb to gmirror(8) and such functionality to geom_mirror(4). Now it is easy to expand the size of the mirror when all its components are replaced. Also add g_resize method to geom_mirror class. It will write updated metadata to new last sector, when parent provider is resized. Silence from: geom@ MFC after: 1 month	2013-11-19 22:55:17 +00:00
Alexander Motin	f8c79813cb	In addition to r258220 allow shrinking in "automatic" mode if there is already valid metadata found at the new location. This should allow easy transparent recovery if first resize was done by mistake. While there, unify metadata write code and fix minor memory leak. MFC after: 1 month	2013-11-17 05:38:54 +00:00
Alexander Motin	e6afd72b93	Implement automatic live resize support for GEOM MULTIPATH class. In "manual" mode just automatically resize provider in any direction. In "automatic" mode allow only growth (with new metadata write); in case of shrinking destroy the multipath device same as before since it may be undesirable to write new metadata within old user area. MFC after: 1 month	2013-11-16 14:31:49 +00:00
Andrey V. Elsukov	743437c451	Add missing line breaks. PR: 181900 MFC after: 1 week	2013-11-11 11:13:12 +00:00
Xin LI	7ac2e58818	When zero'ing out a buffer, make sure we are using right size. Without this change, in the worst but unlikely case scenario, certain administrative operations, including change of configuration, set or delete key from a GEOM ELI provider, may leave potentially sensitive information in buffer allocated from kernel memory. We believe that it is not possible to actively exploit these issues, nor does it impact the security of normal usage of GEOM ELI providers when these operations are not performed after system boot. Security: possible sensitive information disclosure Submitted by: Clement Lecigne <clecigne google com> MFC after: 3 days	2013-11-02 01:16:10 +00:00
John Baldwin	d6d78db57f	Reject attempts to attach a disk device that has the old NEEDSGIANT flag set. Reviewed by: mav	2013-10-25 19:19:12 +00:00
Steven Hartland	c28078e903	Improve ZFS N-way mirror read performance by using load and locality information. The existing algorithm selects a preferred leaf vdev based on offset of the zio request modulo the number of members in the mirror. It assumes the devices are of equal performance and that spreading the requests randomly over both drives will be sufficient to saturate them. In practice this results in the leaf vdevs being under utilized. The new algorithm takes into account the following additional factors: * Load of the vdevs (number of outstanding I/O requests) * The locality of last queued I/O vs the new I/O request. Within the locality calculation additional knowledge about the underlying vdev is considered such as; is the device backing the vdev a rotating media device. This results in performance increases across the board as well as significant increases for predominantly streaming loads and for configurations which don't have evenly performing devices. The following are results from a setup with 3 Way Mirror with 2 x HD's and 1 x SSD from a basic test running multiple parallel dd's. With pre-fetch disabled (vfs.zfs.prefetch_disable=1): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s With pre-fetch enabled (vfs.zfs.prefetch_disable=0): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s In addition to the performance changes the code was also restructured, with the help of Justin Gibbs, to provide a more logical flow which also ensures vdevs loads are only calculated from the set of valid candidates. The following additional sysctls were added to allow the administrator to tune the behaviour of the load algorithm: * vfs.zfs.vdev.mirror.rotating_inc * vfs.zfs.vdev.mirror.rotating_seek_inc * vfs.zfs.vdev.mirror.rotating_seek_offset * vfs.zfs.vdev.mirror.non_rotating_inc * vfs.zfs.vdev.mirror.non_rotating_seek_inc These changes were based on work started by the zfsonlinux developers: https://github.com/zfsonlinux/zfs/pull/1487 Reviewed by: gibbs, mav, will MFC after: 2 weeks Sponsored by: Multiplay	2013-10-23 09:54:58 +00:00
Mateusz Guzik	aa25ccfa36	gnop: make sure that newly allocated memory for softc is zeroed This prevents mtx_init from encountering non-zeros and panicking the kernel as a result. Reported by: Keith White <kwhite site.uottawa.ca>	2013-10-23 01:34:18 +00:00
Alexander Motin	1a29adad30	Remove Giant-locked drivers support (DISKFLAG_NEEDSGIANT flag) from disk(9). Since at least FreeBSD 7 we had only four of them in the base tree, and in head branch, thanks to jhb@, we have had none for more than a year.	2013-10-22 10:21:20 +00:00
Alexander Motin	40ea77a036	Merge GEOM direct dispatch changes from the projects/camlock branch. When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O. The defined now safety requirements are: - caller should not hold any locks and should be reenterable; - callee should not depend on GEOM dual-threaded concurrency semantics; - on the way down, if request is unmapped while callee doesn't support it, the context should be sleepable; - kernel thread stack usage should be below 50%. To keep compatibility with GEOM classes not meeting above requirements new provider and consumer flags added: - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request); - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done); - G_PF_DIRECT_SEND -- provider code meets caller requirements (done); - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request). Capable GEOM class can set them, allowing direct dispatch in cases where it is safe. If any of requirements are not met, request is queued to g_up or g_down thread same as before. Such GEOM classes were reviewed and updated to support direct dispatch: CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE, VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL, MAP, FLASHMAP, etc). To declare direct completion capability disk(9) KPI got new flag equivalent to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk drivers got it set now thanks to earlier CAM locking work. This change more than twice increases peak block storage performance on systems with many CPUs, together with earlier CAM locking changes reaching more than 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to 256 user-level threads). Sponsored by: iXsystems, Inc. MFC after: 2 months	2013-10-22 08:22:19 +00:00
Edward Tomasz Napierala	fb0e57b1a2	Fix build with gcc by spelling unused format string as "unused" instead of NULL. MFC after: 29 days	2013-10-19 08:20:00 +00:00
Edward Tomasz Napierala	19e5b2d50e	Make geom_label(4) resize-aware. This fixes a situation when "gpart resize" would resize a partition, but label providers - e.g. /dev/gptid/XXX - would stay the same size. Reviewed by: mav MFC after: 1 month Sponsored by: FreeBSD Foundation	2013-10-18 09:14:19 +00:00

2199 Commits