freebsd-skq

Author	SHA1	Message	Date
ken	68b33bd397	Fix a bug that caused da(4) instances to hang around after the underlying device is gone. The problem was that when disk_gone() is called, if the GEOM disk creation process has not yet happened, the withering process couldn't start. We didn't record any state in the GEOM disk code, and so the d_gone() callback to the da(4) driver never happened. The solution is to track the state of the creation process, and initiate the withering process from g_disk_create() if the disk is being created. This change does add fields to struct disk, and so I have bumped DISK_VERSION. geom_disk.c: Track where we are in the disk creation process, and check to see whether our underlying disk has gone away or not. In disk_gone(), set a new d_goneflag variable that g_disk_create() can check to see if it needs to clean up the disk instance. geom_disk.h: Add a mutex to struct disk (for internal use) disk init level, and a gone flag. Bump DISK_VERSION because the size of struct disk has changed and fields have been added at the beginning. Sponsored by: Spectra Logic Approved by: re (marius)	2016-06-21 20:18:19 +00:00
glebius	7461cb05f9	When we are in panic, always go the asynchronous path in g_mirror_destroy(), otherwise the system will hang. This is a temporarily least intrusive crutch to get certain panicing systems dumping. The proper fix should question is g_mirror_destroy() should be called on a panicing system at all. Discussed with: mav	2016-06-01 22:11:54 +00:00
asomers	6836baf662	Avoid issuing spa config updates for physical path when not necessary ZFS's configuration needs to be updated whenever the physical path for a device changes, but not when a new device is introduced. This is because new devices necessarily cause config updates, but only if they are actually accepted into the pool. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Split vdev_geom_set_physpath out of vdev_geom_attrchanged. When setting the vdev's physical path, only request a config update if the physical path has changed. Don't request it when opening a device for the first time, because the config sync will happen anyway upstack. sys/geom/geom_dev.c Split g_dev_set_physpath and g_dev_set_media out of g_dev_attrchanged Submitted by: will, asomers MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D6428	2016-05-27 22:32:44 +00:00
kib	e97aca5af0	Remove unneeded Giant locking around kthreads creation. Sponsored by: The FreeBSD Foundation	2016-05-20 08:28:11 +00:00
kib	f05c84067d	Removal of Giant droping wrappers for GEOM classes. Sponsored by: The FreeBSD Foundation	2016-05-20 08:25:37 +00:00
kib	0ffe69ace2	Remove asserts that Giant is not held on entrance into geom KPI, which outlived their usefulness. This allows to remove drop/pickup Giant wrappers around GEOM calls. Discussed with: alfred, imp, phk Sponsored by: The FreeBSD Foundation	2016-05-20 08:22:20 +00:00
ken	7eeed3c838	Add support for managing Shingled Magnetic Recording (SMR) drives. This change includes support for SCSI SMR drives (which conform to the Zoned Block Commands or ZBC spec) and ATA SMR drives (which conform to the Zoned ATA Command Set or ZAC spec) behind SAS expanders. This includes full management support through the GEOM BIO interface, and through a new userland utility, zonectl(8), and through camcontrol(8). This is now ready for filesystems to use to detect and manage zoned drives. (There is no work in progress that I know of to use this for ZFS or UFS, if anyone is interested, let me know and I may have some suggestions.) Also, improve ATA command passthrough and dispatch support, both via ATA and ATA passthrough over SCSI. Also, add support to camcontrol(8) for the ATA Extended Power Conditions feature set. You can now manage ATA device power states, and set various idle time thresholds for a drive to enter lower power states. Note that this change cannot be MFCed in full, because it depends on changes to the struct bio API that break compatilibity. In order to avoid breaking the stable API, only changes that don't touch or depend on the struct bio changes can be merged. For example, the camcontrol(8) changes don't depend on the new bio API, but zonectl(8) and the probe changes to the da(4) and ada(4) drivers do depend on it. Also note that the SMR changes have not yet been tested with an actual SCSI ZBC device, or a SCSI to ATA translation layer (SAT) that supports ZBC to ZAC translation. I have not yet gotten a suitable drive or SAT layer, so any testing help would be appreciated. These changes have been tested with Seagate Host Aware SATA drives attached to both SAS and SATA controllers. Also, I do not have any SATA Host Managed devices, and I suspect that it may take additional (hopefully minor) changes to support them. Thanks to Seagate for supplying the test hardware and answering questions. sbin/camcontrol/Makefile: Add epc.c and zone.c. sbin/camcontrol/camcontrol.8: Document the zone and epc subcommands. sbin/camcontrol/camcontrol.c: Add the zone and epc subcommands. Add auxiliary register support to build_ata_cmd(). Make sure to set the CAM_ATAIO_NEEDRESULT, CAM_ATAIO_DMA, and CAM_ATAIO_FPDMA flags as appropriate for ATA commands. Add a new get_ata_status() function to parse ATA result from SCSI sense descriptors (for ATA passthrough over SCSI) and ATA I/O requests. sbin/camcontrol/camcontrol.h: Update the build_ata_cmd() prototype Add get_ata_status(), zone(), and epc(). sbin/camcontrol/epc.c: Support for ATA Extended Power Conditions features. This includes support for all features documented in the ACS-4 Revision 12 specification from t13.org (dated February 18, 2016). The EPC feature set allows putting a drive into a power power mode immediately, or setting timeouts so that the drive will automatically enter progressively lower power states after various idle times. sbin/camcontrol/fwdownload.c: Update the firmware download code for the new build_ata_cmd() arguments. sbin/camcontrol/zone.c: Implement support for Shingled Magnetic Recording (SMR) drives via SCSI Zoned Block Commands (ZBC) and ATA Zoned Device ATA Command Set (ZAC). These specs were developed in concert, and are functionally identical. The primary differences are due to SCSI and ATA differences. (SCSI is big endian, ATA is little endian, for example.) This includes support for all commands defined in the ZBC and ZAC specs. sys/cam/ata/ata_all.c: Decode a number of additional ATA command names in ata_op_string(). Add a new CCB building function, ata_read_log(). Add ata_zac_mgmt_in() and ata_zac_mgmt_out() CCB building functions. These support both DMA and NCQ encapsulation. sys/cam/ata/ata_all.h: Add prototypes for ata_read_log(), ata_zac_mgmt_out(), and ata_zac_mgmt_in(). sys/cam/ata/ata_da.c: Revamp the ada(4) driver to support zoned devices. Add four new probe states to gather information needed for zone support. Add a new adasetflags() function to avoid duplication of large blocks of flag setting between the async handler and register functions. Add new sysctl variables that describe zone support and paramters. Add support for the new BIO_ZONE bio, and all of its subcommands: DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP, DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS. sys/cam/scsi/scsi_all.c: Add command descriptions for the ZBC IN/OUT commands. Add descriptions for ZBC Host Managed devices. Add a new function, scsi_ata_pass() to do ATA passthrough over SCSI. This will eventually replace scsi_ata_pass_16() -- it can create the 12, 16, and 32-byte variants of the ATA PASS-THROUGH command, and supports setting all of the registers defined as of SAT-4, Revision 5 (March 11, 2016). Change scsi_ata_identify() to use scsi_ata_pass() instead of scsi_ata_pass_16(). Add a new scsi_ata_read_log() function to facilitate reading ATA logs via SCSI. sys/cam/scsi/scsi_all.h: Add the new ATA PASS-THROUGH(32) command CDB. Add extended and variable CDB opcodes. Add Zoned Block Device Characteristics VPD page. Add ATA Return SCSI sense descriptor. Add prototypes for scsi_ata_read_log() and scsi_ata_pass(). sys/cam/scsi/scsi_da.c: Revamp the da(4) driver to support zoned devices. Add five new probe states, four of which are needed for ATA devices. Add five new sysctl variables that describe zone support and parameters. The da(4) driver supports SCSI ZBC devices, as well as ATA ZAC devices when they are attached via a SCSI to ATA Translation (SAT) layer. Since ZBC -> ZAC translation is a new feature in the T10 SAT-4 spec, most SATA drives will be supported via ATA commands sent via the SCSI ATA PASS-THROUGH command. The da(4) driver will prefer the ZBC interface, if it is available, for performance reasons, but will use the ATA PASS-THROUGH interface to the ZAC command set if the SAT layer doesn't support translation yet. As I mentioned above, ZBC command support is untested. Add support for the new BIO_ZONE bio, and all of its subcommands: DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP, DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS. Add scsi_zbc_in() and scsi_zbc_out() CCB building functions. Add scsi_ata_zac_mgmt_out() and scsi_ata_zac_mgmt_in() CCB/CDB building functions. Note that these have return values, unlike almost all other CCB building functions in CAM. The reason is that they can fail, depending upon the particular combination of input parameters. The primary failure case is if the user wants NCQ, but fails to specify additional CDB storage. NCQ requires using the 32-byte version of the SCSI ATA PASS-THROUGH command, and the current CAM CDB size is 16 bytes. sys/cam/scsi/scsi_da.h: Add ZBC IN and ZBC OUT CDBs and opcodes. Add SCSI Report Zones data structures. Add scsi_zbc_in(), scsi_zbc_out(), scsi_ata_zac_mgmt_out(), and scsi_ata_zac_mgmt_in() prototypes. sys/dev/ahci/ahci.c: Fix SEND / RECEIVE FPDMA QUEUED in the ahci(4) driver. ahci_setup_fis() previously set the top bits of the sector count register in the FIS to 0 for FPDMA commands. This is okay for read and write, because the PRIO field is in the only thing in those bits, and we don't implement that further up the stack. But, for SEND and RECEIVE FPDMA QUEUED, the subcommand is in that byte, so it needs to be transmitted to the drive. In ahci_setup_fis(), always set the the top 8 bits of the sector count register. We need it in both the standard and NCQ / FPDMA cases. sys/geom/eli/g_eli.c: Pass BIO_ZONE commands through the GELI class. sys/geom/geom.h: Add g_io_zonecmd() prototype. sys/geom/geom_dev.c: Add new DIOCZONECMD ioctl, which allows sending zone commands to disks. sys/geom/geom_disk.c: Add support for BIO_ZONE commands. sys/geom/geom_disk.h: Add a new flag, DISKFLAG_CANZONE, that indicates that a given GEOM disk client can handle BIO_ZONE commands. sys/geom/geom_io.c: Add a new function, g_io_zonecmd(), that handles execution of BIO_ZONE commands. Add permissions check for BIO_ZONE commands. Add command decoding for BIO_ZONE commands. sys/geom/geom_subr.c: Add DDB command decoding for BIO_ZONE commands. sys/kern/subr_devstat.c: Record statistics for REPORT ZONES commands. Note that the number of bytes transferred for REPORT ZONES won't quite match what is received from the harware. This is because we're necessarily counting bytes coming from the da(4) / ada(4) drivers, which are using the disk_zone.h interface to communicate up the stack. The structure sizes it uses are slightly different than the SCSI and ATA structure sizes. sys/sys/ata.h: Add many bit and structure definitions for ZAC, NCQ, and EPC command support. sys/sys/bio.h: Convert the bio_cmd field to a straight enumeration. This will yield more space for additional commands in the future. After change r297955 and other related changes, this is now possible. Converting to an enumeration will also prevent use as a bitmask in the future. sys/sys/disk.h: Define the DIOCZONECMD ioctl. sys/sys/disk_zone.h: Add a new API for managing zoned disks. This is very close to the SCSI ZBC and ATA ZAC standards, but uses integers in native byte order instead of big endian (SCSI) or little endian (ATA) byte arrays. This is intended to offer to the complete feature set of the ZBC and ZAC disk management without requiring the application developer to include SCSI or ATA headers. We also use one set of headers for ioctl consumers and kernel bio-level consumers. sys/sys/param.h: Bump __FreeBSD_version for sys/bio.h command changes, and inclusion of SMR support. usr.sbin/Makefile: Add the zonectl utility. usr.sbin/diskinfo/diskinfo.c Add disk zoning capability to the 'diskinfo -v' output. usr.sbin/zonectl/Makefile: Add zonectl makefile. usr.sbin/zonectl/zonectl.8 zonectl(8) man page. usr.sbin/zonectl/zonectl.c The zonectl(8) utility. This allows managing SCSI or ATA zoned disks via the disk_zone.h API. You can report zones, reset write pointers, get parameters, etc. Sponsored by: Spectra Logic Differential Revision: https://reviews.freebsd.org/D6147 Reviewed by: wblock (documentation)	2016-05-19 14:08:36 +00:00
jhb	bcc5b0c55d	Add an EARLY_AP_STARTUP option to start APs earlier during boot. Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot. This change moves SI_SUB_SMP up to just before software interrupt threads are created allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed. This last feature fixes a problem on with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu. Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case. As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code. These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling). PR: kern/199321 Reviewed by: markj, gnn, kib Sponsored by: Netflix	2016-05-14 18:22:52 +00:00
sobomax	76759cc2cf	Add missing include "opt_geom.h" to make GEOM_UZIP_DEBUG option working, also rename enum member so it does not conflict with GEOM_UZIP option name. Submitted by: mizhka@gmail.com Differential Revision: https://reviews.freebsd.org/D6207	2016-05-06 20:32:39 +00:00
pfg	e72339bbf0	sys: Make use of our rounddown() macro when sys/param.h is available. No functional change.	2016-04-30 14:41:18 +00:00
pfg	fafa173c28	sys/geom: spelling fixes in comments. No functional change.	2016-04-29 20:56:58 +00:00
pfg	586d106e19	sys/geom: spelling fixes. These affect debugging messages. MFC after: 2 weeks	2016-04-28 19:26:46 +00:00
pfg	863c16cbbd	geom: unsign some types to match their definitions and avoid overflows. In struct:gctl_req, nargs is unsigned. In mirror: g_mirror_syncreqs is unsigned. In raid: in struct:g_raid_volume, v_disks_count is unsigned. In virstor: in struct:g_virstor_softc, n_components is unsigned. MFC after: 2 weeks	2016-04-27 15:10:40 +00:00
cem	8bb71df062	g_part_bsd64: Delete duplicate/dead code RAW_PART is handled earlier in the loop. Reported by: Coverity CID: 1223201 Sponsored by: EMC / Isilon Storage Division	2016-04-26 22:32:33 +00:00
cem	87aa845246	g_part_bsd64: Check for valid on-disk npartitions value This value is u32 on disk, but assigned to an int in memory. After we do the implicit conversion via assignment, check that the result is at least one[1] (non-negative[2]). 1. The subsequent for-loop iterates from gpt_entries minus one, down, until reaching zero. A negative or zero initial index results in undefined signed integer overflow. 2. It is also used to index into arrays later. In practice, we expected non-malicious disks to contain small positive values. Reported by: Coverity CID: 1223202 Sponsored by: EMC / Isilon Storage Division	2016-04-26 22:30:54 +00:00
pfg	fc01419148	sys: extend use of the howmany() macro when available. We have a howmany() macro in the <sys/param.h> header that is convenient to re-use as it makes things easier to read.	2016-04-26 15:38:17 +00:00
sobomax	baab3f7672	Relax TOC offsets checking somewhat, allowing offset pointing to the next byte past EOF to denote zero-block(s) at the very end of the file.	2016-04-26 06:50:38 +00:00
sobomax	577d7115d6	o Fix handling of images with compression block sizes comparable to MAXPHYS. o Improve debug somewhat; o Convert "BUG BUG BUG message" into a proper KASSERT.	2016-04-23 06:31:46 +00:00
asomers	289cd3b2aa	DRY on buffer sizes. Update to r298420. sys/geom/geom_disk.c: In disk_attr_changed, don't repeat a buffer size. Reported by: ngie, hselasky MFC after: 4 weeks X-MFC-With: 298420 Sponsored by: Spectra Logic Corp	2016-04-21 21:13:41 +00:00
pfg	729533413f	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
asomers	5ffd1c3c24	Notify userspace listeners when geom disk attributes have changed sys/geom/geom_disk.c: disk_attr_changed(): Generate a devctl event of type GEOM:<attr> for every call. MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D5952	2016-04-21 16:43:15 +00:00
pfg	32dcf3933a	Indentation issues. Contract some lines leftover from r298310. Mea culpa.	2016-04-20 16:19:44 +00:00
pfg	a7d40a88c9	kernel: use our nitems() macro when it is available through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current	2016-04-19 23:48:27 +00:00
pfg	a954b9061c	g_gate: for pointers replace 0 with NULL. These are mostly cosmetical, no functional change. Found with devel/coccinelle.	2016-04-15 16:18:07 +00:00
imp	19f97c184f	Bump bio_cmd and bio_*flags from 8 bits to 16. Differential Revision: https://reviews.freebsd.org/D5784	2016-04-14 05:10:41 +00:00
pfg	b63211eed5	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
allanjude	9554f17afb	Create the GELIBOOT GEOM_ELI flag This flag indicates that the user wishes to use the GELIBOOT feature to boot from a fully encrypted root file system. Currently, GELIBOOT does not support key files, and in the future when it does, they will be loaded differently. Due to the design of GELI, and the desire for secrecy, the GELI metadata does not know if key files are used or not, it just adds the key material (if any) to the HMAC before the optional passphrase, so there is no way to tell if a GELI partition requires key files or not. Since the GELIBOOT code in boot2 and the loader does not support keys, they will now only attempt to attach if this flag is set. This will stop GELIBOOT from prompting for passwords to GELIs that it cannot decrypt, disrupting the boot process PR: 208251 Reviewed by: ed, oshogbo, wblock Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D5867	2016-04-08 01:25:25 +00:00
pfg	858326e785	g_sched_destroy(): prevent return of uninitialized scalar variable. For the !gsp case there some chance of returning an uninitialized return value. Prevent that from happening by initializing the error value. CID: 1006421	2016-04-03 16:25:51 +00:00
imp	79cf687df7	Don't assume that bio_cmd is a bit mask. Differential Revision: https://reviews.freebsd.org/D5592	2016-03-10 06:25:39 +00:00
imp	cfe83b0802	Don't assume that bio_cmd is bit mask. Differential Revision: https://reviews.freebsd.org/D5593	2016-03-10 06:25:31 +00:00
adrian	b4d231f895	Fixes to make it compile under gcc-4.2.	2016-02-24 02:52:49 +00:00
sobomax	85ce861e46	Obsolete mkulzma(8) and geom_uncompress(4), their functionality is now provided by mkuzip(8) and geom_uzip(4) respectively. MFC after: 1 month	2016-02-24 00:39:36 +00:00
sobomax	8abe971b5e	Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333	2016-02-23 23:59:08 +00:00
imp	edbfaba7f5	Use the right size for zeroing. Submitted by: rpokala@	2016-02-17 18:28:38 +00:00
imp	0bfb5dbc86	Create an API to reset a struct bio (g_reset_bio). This is mandatory for all struct bio you get back from g_{new,alloc}_bio. Temporary bios that you create on the stack or elsewhere should use this before first use of the bio, and between uses of the bio. At the moment, it is nothing more than a wrapper around bzero, but that may change in the future. The wrapper also removes one place where we encode the size of struct bio in the KBI.	2016-02-17 17:16:02 +00:00
adrian	4f6113524f	Teach the flashmap code about the SPI flash. PR: kern/206227 Submitted by: Stanislav Galabov <sgalabov@gmail.com>	2016-01-23 05:26:29 +00:00
rpokala	b44c9b2761	Add rotationrate to geom disk dumpconf Parse and report the nominal rotation rate reported by the drive. Reviewed by: sbruno, jhb Approved by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D4483 Requested by: Kevin Bowling < kevin.bowling @ kev009.com >	2016-01-14 21:52:21 +00:00
allanjude	19cde117cf	Make additional parts of sys/geom/eli more usable in userspace The upcoming GELI support in the loader reuses parts of this code Some ifdefs are added, and some code is moved outside of existing ifdefs The HMAC parts of GELI are broken out into their own file, to separate them from the kernel crypto/openssl dependant parts that are replaced in the boot code. Passed the GELI regression suite (tools/regression/geom/eli) Files=20 Tests=14996 Result: PASS Reviewed by: pjd, delphij MFC after: 1 week Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D4699	2016-01-07 05:47:34 +00:00
allanjude	fbff46d824	Add some additional GPT partition types 4 ChromeOS GPT types 2 Microsoft partition types the new OpenBSD partition type Approved by: marcel (mentor) MFC after: 1 week Relnotes: yes Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D3841	2015-12-27 18:12:13 +00:00
allanjude	c7c2f2dfab	Replace sys/crypto/sha2/sha2.c with lib/libmd/sha512c.c cperciva's libmd implementation is 5-30% faster The same was done for SHA256 previously in r263218 cperciva's implementation was lacking SHA-384 which I implemented, validated against OpenSSL and the NIST documentation Extend sbin/md5 to create sha384(1) Chase dependancies on sys/crypto/sha2/sha2.{c,h} and replace them with sha512{c.c,.h} Reviewed by: cperciva, des, delphij Approved by: secteam, bapt (mentor) MFC after: 2 weeks Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D3929	2015-12-27 17:33:59 +00:00
allanjude	e7bdbef729	Fix incorrect error message in geom map If geom_map fails to find the end of a mapped partition based on a search, it would return the incorrect error message, stating it could not parse the START value Reviewed by: adrian Approved by: bapt (mentor) Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D4187	2015-12-27 17:09:23 +00:00
imp	6e6c8135d3	It turns out that it's OK to sleep in this context, so use M_WAITOK for the softc for the delay module. Noticed by: rpokala@	2015-12-18 14:10:00 +00:00
imp	59c25660ce	Scheduling module to introduce a fixed delay into the I/O path.	2015-12-18 05:39:25 +00:00
smh	4bf76e39ad	Prevent g_access calls to bad multipath members When a multipath member is orphaned its access members are zeroed before its removed if marked for wither, so prevent any future calls to g_access on such members. This prevents a panic on debug kernels which validates the resultant values aren't negative. Reviewed by: mav MFC after: 2 weeks Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4416	2015-12-15 21:11:41 +00:00
ae	088c9d3832	Make detection of GPT a bit more reliable. When we are detecting a partition table and didn't find PMBR, try to read backup GPT header from the last sector and if it is correct, assume that we have GPT. Reviewed by: rpokala MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D4282	2015-12-10 10:35:07 +00:00
ken	5a6402b7b4	Fix a style issue in g_disk_limit(). Noticed by: bdrewery MFC after: 1 week	2015-12-04 03:44:12 +00:00
ken	134df9b39d	Fix g_disk_vlist_limit() to work properly with deletes. Add a new bp argument to g_disk_maxsegs(), and add a new function, g_disk_maxsize() tha will properly determine the maximum I/O size for a delete or non-delete bio. Submitted by: will MFC after: 1 week Sponsored by: Spectra Logic	2015-12-04 03:38:35 +00:00
ken	d0f081c521	Add asynchronous command support to the pass(4) driver, and the new camdd(8) utility. CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and completed CCBs may be retrieved via the CAMIOGET ioctl. User processes can use poll(2) or kevent(2) to get notification when I/O has completed. While the existing CAMIOCOMMAND blocking ioctl interface only supports user virtual data pointers in a CCB (generally only one per CCB), the new CAMIOQUEUE ioctl supports user virtual and physical address pointers, as well as user virtual and physical scatter/gather lists. This allows user applications to have more flexibility in their data handling operations. Kernel memory for data transferred via the queued interface is allocated from the zone allocator in MAXPHYS sized chunks, and user data is copied in and out. This is likely faster than the vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in configurations with many processors (there are more TLB shootdowns caused by the mapping/unmapping operation) but may not be as fast as running with unmapped I/O. The new memory handling model for user requests also allows applications to send CCBs with request sizes that are larger than MAXPHYS. The pass(4) driver now limits queued requests to the I/O size listed by the SIM driver in the maxio field in the Path Inquiry (XPT_PATH_INQ) CCB. There are some things things would be good to add: 1. Come up with a way to do unmapped I/O on multiple buffers. Currently the unmapped I/O interface operates on a struct bio, which includes only one address and length. It would be nice to be able to send an unmapped scatter/gather list down to busdma. This would allow eliminating the copy we currently do for data. 2. Add an ioctl to list currently outstanding CCBs in the various queues. 3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do that. 4. Test physical address support. Virtual pointers and scatter gather lists have been tested, but I have not yet tested physical addresses or scatter/gather lists. 5. Investigate multiple queue support. At the moment there is one queue of commands per pass(4) device. If multiple processes open the device, they will submit I/O into the same queue and get events for the same completions. This is probably the right model for most applications, but it is something that could be changed later on. Also, add a new utility, camdd(8) that uses the asynchronous pass(4) driver interface. This utility is intended to be a basic data transfer/copy utility, a simple benchmark utility, and an example of how to use the asynchronous pass(4) interface. It can copy data to and from pass(4) devices using any target queue depth, starting offset and blocksize for the input and ouptut devices. It currently only supports SCSI devices, but could be easily extended to support ATA devices. It can also copy data to and from regular files, block devices, tape devices, pipes, stdin, and stdout. It does not support queueing multiple commands to any of those targets, since it uses the standard read(2)/write(2)/writev(2)/readv(2) system calls. The I/O is done by two threads, one for the reader and one for the writer. The reader thread sends completed read requests to the writer thread in strictly sequential order, even if they complete out of order. That could be modified later on for random I/O patterns or slightly out of order I/O. camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from the pass(4) driver and also to send request notifications internally. For pass(4) devcies, camdd(8) uses a single buffer (CAM_DATA_VADDR) per CAM CCB on the reading side, and a scatter/gather list (CAM_DATA_SG) on the writing side. In addition to testing both interfaces, this makes any potential reblocking of I/O easier. No data is copied between the reader and the writer, but rather the reader's buffers are split into multiple I/O requests or combined into a single I/O request depending on the input and output blocksize. For the file I/O path, camdd(8) also uses a single buffer (read(2), write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list (readv(2), writev(2), preadv(2), pwritev(2)) on writes. Things that would be nice to do for camdd(8) eventually: 1. Add support for I/O pattern generation. Patterns like all zeros, all ones, LBA-based patterns, random patterns, etc. Right Now you can always use /dev/zero, /dev/random, etc. 2. Add support for a "sink" mode, so we do only reads with no writes. Right now, you can use /dev/null. 3. Add support for automatic queue depth probing, so that we can figure out the right queue depth on the input and output side for maximum throughput. At the moment it defaults to 6. 4. Add support for SATA device passthrough I/O. 5. Add support for random LBAs and/or lengths on the input and output sides. 6. Track average per-I/O latency and busy time. The busy time and latency could also feed in to the automatic queue depth determination. sys/cam/scsi/scsi_pass.h: Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue and fetch asynchronous CAM CCBs respectively. Although these ioctls do not have a declared argument, they both take a union ccb pointer. If we declare a size here, the ioctl code in sys/kern/sys_generic.c will malloc and free a buffer for either the CCB or the CCB pointer (depending on how it is declared). Since we have to keep a copy of the CCB (which is fairly large) anyway, having the ioctl malloc and free a CCB for each call is wasteful. sys/cam/scsi/scsi_pass.c: Add asynchronous CCB support. Add two new ioctls, CAMIOQUEUE and CAMIOGET. CAMIOQUEUE adds a CCB to the incoming queue. The CCB is executed immediately (and moved to the active queue) if it is an immediate CCB, but otherwise it will be executed in passstart() when a CCB is available from the transport layer. When CCBs are completed (because they are immediate or passdone() if they are queued), they are put on the done queue. If we get the final close on the device before all pending I/O is complete, all active I/O is moved to the abandoned queue and we increment the peripheral reference count so that the peripheral driver instance doesn't go away before all pending I/O is done. The new passcreatezone() function is called on the first call to the CAMIOQUEUE ioctl on a given device to allocate the UMA zones for I/O requests and S/G list buffers. This may be good to move off to a taskqueue at some point. The new passmemsetup() function allocates memory and scatter/gather lists to hold the user's data, and copies in any data that needs to be written. For virtual pointers (CAM_DATA_VADDR), the kernel buffer is malloced from the new pass(4) driver malloc bucket. For virtual scatter/gather lists (CAM_DATA_SG), buffers are allocated from a new per-pass(9) UMA zone in MAXPHYS-sized chunks. Physical pointers are passed in unchanged. We have support for up to 16 scatter/gather segments (for the user and kernel S/G lists) in the default struct pass_io_req, so requests with longer S/G lists require an extra kernel malloc. The new passcopysglist() function copies a user scatter/gather list to a kernel scatter/gather list. The number of elements in each list may be different, but (obviously) the amount of data stored has to be identical. The new passmemdone() function copies data out for the CAM_DATA_VADDR and CAM_DATA_SG cases. The new passiocleanup() function restores data pointers in user CCBs and frees memory. Add new functions to support kqueue(2)/kevent(2): passreadfilt() tells kevent whether or not the done queue is empty. passkqfilter() adds a knote to our list. passreadfiltdetach() removes a knote from our list. Add a new function, passpoll(), for poll(2)/select(2) to use. Add devstat(9) support for the queued CCB path. sys/cam/ata/ata_da.c: Add support for the BIO_VLIST bio type. sys/cam/cam_ccb.h: Add a new enumeration for the xflags field in the CCB header. (This doesn't change the CCB header, just adds an enumeration to use.) sys/cam/cam_xpt.c: Add a new function, xpt_setup_ccb_flags(), that allows specifying CCB flags. sys/cam/cam_xpt.h: Add a prototype for xpt_setup_ccb_flags(). sys/cam/scsi/scsi_da.c: Add support for BIO_VLIST. sys/dev/md/md.c: Add BIO_VLIST support to md(4). sys/geom/geom_disk.c: Add BIO_VLIST support to the GEOM disk class. Re-factor the I/O size limiting code in g_disk_start() a bit. sys/kern/subr_bus_dma.c: Change _bus_dmamap_load_vlist() to take a starting offset and length. Add a new function, _bus_dmamap_load_pages(), that will load a list of physical pages starting at an offset. Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios. Allow unmapped I/O to start at an offset. sys/kern/subr_uio.c: Add two new functions, physcopyin_vlist() and physcopyout_vlist(). sys/pc98/include/bus.h: Guard kernel-only parts of the pc98 machine/bus.h header with #ifdef _KERNEL. This allows userland programs to include <machine/bus.h> to get the definition of bus_addr_t and bus_size_t. sys/sys/bio.h: Add a new bio flag, BIO_VLIST. sys/sys/uio.h: Add prototypes for physcopyin_vlist() and physcopyout_vlist(). share/man/man4/pass.4: Document the CAMIOQUEUE and CAMIOGET ioctls. usr.sbin/Makefile: Add camdd. usr.sbin/camdd/Makefile: Add a makefile for camdd(8). usr.sbin/camdd/camdd.8: Man page for camdd(8). usr.sbin/camdd/camdd.c: The new camdd(8) utility. Sponsored by: Spectra Logic MFC after: 1 week	2015-12-03 20:54:55 +00:00
smh	3de9025c5c	Fix early kernel dump via dumpdev env Setting the dumpdev via env e.g. loader.conf provides the ability to configure the kernel dump device during early boot. When using this g_io_getattr was returning EPERM due to cp->acr == 0. Fix this by calling g_access to ensure we're a read consumer prior to calling g_dev_setdumpdev. MFC after: 2 weeks Sponsored by: Multiplay	2015-11-17 20:55:50 +00:00
smh	8ad428a4e9	Fix g_eli error loss conditions * Ensure that error information isn't lost. * Log the error code in all cases. * Don't overwrite bio_completed set to 0 from the error condition. MFC after: 2 weeks Sponsored by: Multiplay	2015-11-05 17:37:35 +00:00
mav	64d53c4c7d	Remove compatibility shims for legacy ATA device names. We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely removed old stack at 10.x, so at 11.x it is time to remove compat shims.	2015-10-11 13:01:51 +00:00
trasz	1c53e39027	Make geom_nop(4) collect statistics on all types of BIOs, not just reads and writes. PR: kern/198405 Submitted by: Matthew D. Fuller <fullermd at over-yonder dot net> MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3679	2015-10-10 09:03:31 +00:00
cem	a0f74ba8d8	geom_dev: Use kenv 'dumpdev' in the same way as rc/etc.d/dumpon Skip a /dev/ prefix, if one is present, when checking for matching device names for dump. Suggested by: avg Reviewed by: markj Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3725	2015-09-23 21:08:52 +00:00
trasz	58cf716bce	Add a way to specify stripesize and stripeoffset to gnop(8). This makes it possible to "simulate" 4K media, to eg test alignment handling. Reviewed by: mav@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3664	2015-09-15 18:01:59 +00:00
imp	97bb89fb50	After the introduction of direct dispatch, the pacing code in g_down() broke in two ways. One, the pacing variable was accessed in multiple threads in an unsafe way. Two, since large numbers of I/O could come down from the buf layer at one time, large numbers of allocation failures could happen all at once, resulting in a huge pace value that would limit I/Os to 10 IOPS for minutes (or even hours) at a time. While a real solution to these problems requires substantial work (to go to a no-allocation after the first model, or to have some way to wait for more memory with some kind of reserve for pager and swapper requests), it is relatively easy to make this simplistic pacing less pathological. Move to using a volatile variable with loads and stores. While this is a little racy, losing the race is safe: either you get memory and proceed, or you don't and queue. Second, sleep for 1ms (or one tick, whichever is larger) instead of 100ms. This removes the artificial 10 IOPS limit while still easing up on new I/Os during memory shortages. Remove tying the amount of time we do this to the number of failed requests and do it only as long as we keep failing requests. Finally, to avoid needless recursion when memory is tight (start -> g_io_deliver() -> g_io_request() -> start -> ... until we use 1/2 the stack), don't do direct dispatch while pacing. This should be a rare event (not steady state) so the performance hit here is worth the extra safety of not starving g_down() with directly dispatched I/O. Differential Review: https://reviews.freebsd.org/D3546	2015-09-02 17:29:30 +00:00
jhibbits	2693c49ecb	Create a RouterBoard platform and use it to create a flash map Summary: The RouterBoard uses a predefined partition map which doesn't exist in the fdt. This change allows overriding the fdt slicer with a custom slicer, and uses this custom slicer to define the flash map on the RouterBoard RB800. D3305 converts the mpc85xx platform into a base class, so that systems based on the mpc85xx platform can add their own overrides. This change builds on D3305, and creates a RouterBoard (RB800) platform to initialize the slicer override. Reviewed By: nwhitehorn, imp Differential Revision: https://reviews.freebsd.org/D3345	2015-08-22 05:50:18 +00:00
pfg	04e944197e	Clean out some externally visible "more then" grammar MFC after: 3 days	2015-08-11 03:12:09 +00:00
ngie	650a97229c	Make some debug printf's into DPRINTF's to reduce noise on attach/detahh Similar reasoning to what was done in r286367 with geom_uzip(4) MFC after: 2 weeks Differential Revision: D3320 Sponsored by: EMC / Isilon Storage Division	2015-08-09 06:58:06 +00:00
pjd	2ae822a47d	Enable BIO_DELETE passthru in GELI, so TRIM/UNMAP can work as expected when GELI is used on a SSD or inside virtual machine, so that guest can tell host that it is no longer using some of the storage. Enabling BIO_DELETE passthru comes with a small security consequence - an attacker can tell how much space is being really used on encrypted device and has less data no analyse then. This is why the -T option can be given to the init subcommand to turn off this behaviour and -t/T options for the configure subcommand can be used to adjust this setting later. PR: 198863 Submitted by: Matthew D. Fuller fullermd at over-yonder dot net This commit also includes a fix from Fabian Keil freebsd-listen at fabiankeil.de for 'configure' on onetime providers which is not strictly related, but is entangled in the same code, so would cause conflicts if separated out.	2015-08-08 09:51:38 +00:00
kib	c63f289a9d	Minor style cleanup of the code surrounding r286404. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-08-07 08:24:12 +00:00
kib	1ca7d277b1	The condition to use direct processing for the unmapped bio is reverted. We can do direct processing when g_io_check() does not need to perform transient remapping of the bio, otherwise the thread has to sleep. Reviewed by: mav (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-08-07 08:13:34 +00:00
pjd	ce28904c93	After crypto_dispatch() bio might be already delivered and destroyed, so we cannot access it anymore. Setting an error later lead to memory corruption. Assert that crypto_dispatch() was successful. It can fail only if we pass a bogus crypto request, which is a bug in the program, not a runtime condition. PR: 199705 Submitted by: luke.tw Reviewed by: emaste MFC after: 3 days	2015-08-06 17:13:34 +00:00
ngie	e4b151050a	Make some debug printf's into DPRINTF's to reduce noise on attach/detach Differential Revision: https://reviews.freebsd.org/D3306 MFC after: 1 week Reviewed by: loos Sponsored by: EMC / Isilon Storage Division	2015-08-06 15:30:14 +00:00
trasz	33d415d265	Fix panic triggered by code like this: open("/dev/md0", O_EXEC); Discussed with: kib@, mav@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3051	2015-08-04 10:40:08 +00:00
trasz	f15074a185	Fix panic that would happen on forcibly unmounting devfs (note that as it is now, devfs ignores MNT_FORCE anyway, so it needs to be modified to trigger the panic) with consumers still opened. Note that this still results in a leak of r/w/e counters. It seems to be harmless, though. If anyone knows a better way to approach this - please tell. Discussed with: kib@, mav@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3050	2015-08-03 16:35:18 +00:00
ae	9407ae01ca	Report the scheme and provider names in warning message about unaligned partition. PR: 201873 MFC after: 1 week	2015-07-26 11:16:48 +00:00
allanjude	44bd001082	Add a new option to gpart(8) to fix Lenovo BIOS boot issue PR: 184910 Reviewed by: ae, wblock Approved by: marcel MFC after: 3 days Relnotes: yes Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D3065	2015-07-15 02:23:55 +00:00
pjd	1f882f57d3	Spoil even can happen for some time now even on providers opened exclusively (on the media change event). Update GELI to handle that situation. PR: 201185 Submitted by: Matthew D. Fuller	2015-07-10 19:27:19 +00:00
pjd	8a028f1c90	Properly propagate errors in metadata reading. PR: 198860 Submitted by: Matthew D. Fuller	2015-07-02 10:57:34 +00:00
pjd	7d4cefa995	Allow to omit keyfile number for the first keyfile.	2015-07-02 10:55:32 +00:00
trasz	19250d4f16	Fix off-by-one error in fstyp(8) and geom_label(4) that made them use a single space (" ") as a CD9660 label name when no label was present. Similar problem was also present in msdosfs label recognition. PR: 200828 Differential Revision: https://reviews.freebsd.org/D2830 Reviewed by: asomers@, emaste@ MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2015-06-18 21:55:55 +00:00
ae	e9042a3e16	Teach G_PART_GPT class to handle g_resize_provider event. MFC after: 10 days	2015-06-08 12:52:41 +00:00
jkim	318c4f97e6	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
ae	d55982dfbf	Read GEOM_UNCOMPRESS metadata using several requests that fit into MAXPHYS. For large compressed images the metadata size can be bigger than MAXPHYS and this triggers KASSERT in g_read_data(). Also use g_free() to free memory allocated by g_read_data(). PR: 199476 MFC after: 2 weeks	2015-05-19 09:28:52 +00:00
ae	7814ded30c	Add apple-boot, apple-hfs and apple-ufs aliases to MBR scheme. Sort DOSPTYP_* entries in diskmbr.h by value. Document these scheme-specific types in gpart(8). MFC after: 1 week	2015-05-05 09:33:02 +00:00
rodrigc	7807fcddc4	Move zlib.c from net to libkern. It is not network-specific code and would be better as part of libkern instead. Move zlib.h and zutil.h from net/ to sys/ Update includes to use sys/zlib.h and sys/zutil.h instead of net/ Submitted by: Steve Kiernan stevek@juniper.net Obtained from: Juniper Networks, Inc. GitHub Pull Request: https://github.com/freebsd/freebsd/pull/28 Relnotes: yes	2015-04-22 14:38:58 +00:00
pfg	a9137e11b6	g_uncompress_taste: prevent a double free. Found by: Clang Static Analyzer MFC after: 1 week	2015-04-20 16:31:27 +00:00
mav	2e38078077	Remove sleeps from geom_up thread on device destruction. MFC after: 3 days.	2015-04-09 13:09:05 +00:00
mav	176aa76341	Remove extra semicolon. MFC after: 1 week	2015-03-27 12:45:20 +00:00
mav	3e52cc4fb2	Remove request sorting from GEOM_MIRROR and GEOM_RAID. When CPU is not busy, those queues are typically empty. When CPU is busy, then one more extra sorting is the last thing it needs. If specific device (HDD) really needs sorting, then it will be done later by CAM. This supposed to fix livelock reported for mirror of two SSDs, when UFS fires zillion of BIO_DELETE requests, that totally blocks I/O subsystem by pointless sorting of requests and responses under single mutex lock. MFC after: 2 weeks	2015-03-27 12:44:28 +00:00
mav	89133d309e	Fix bug on memory allocation error in split method. While there, use bioq_takefirst() in place where it is convenient. MFC after: 1 week	2015-03-27 11:14:12 +00:00
mav	dc6dad56dd	Make GEOM_PART work in presence of previous withered self. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2015-03-26 12:17:47 +00:00
mav	a8867c5ea6	Report withered providers as such alike to GEOMs. MFC after: 2 weeks	2015-03-26 11:19:24 +00:00
mav	4505b89481	When searching for provider by name, prefer non-withered one. MFC after: 2 weeks	2015-03-26 11:02:29 +00:00
adrian	a68ab2f735	Fix the label search routine in geom_map to not trip up on '\0' bytes. * Just do the buf check early and fail out * If the offset being searched is: 00110000 00 b5 7e 45 61 e2 76 d3 c1 78 dd 15 95 cd 1f f1 \|..~Ea.v..x......\| .. and the match string is '.!/bin/sh' .. then it'll set the match string[0] to '\0', do a strncmp() against the read buffer, find it's matching two zero-length strings, and think that's where to start. MFC after: 2 weeks	2015-03-19 03:58:25 +00:00
ae	0f7592dbe8	Add GUID and alias for Apple Core Storage partition. PR: 196241 MFC after: 1 week	2015-03-12 18:51:31 +00:00
mav	19bf0d882b	Fix couple BIO_DELETE bugs in geom_mirror. Do not report GEOM::candelete if none of providers support BIO_DELETE. If consumer still requests BIO_DELETE, report error instead of hanging. MFC after: 2 weeks	2015-03-12 10:20:53 +00:00
mav	15944be5bd	Replace constant with proper sizeof(). Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> MFC after: 2 weeks	2015-02-25 10:18:11 +00:00
trasz	a54e74c182	Add devd(8) notifications for creation and destruction of GEOM devices. Differential Revision: https://reviews.freebsd.org/D1211 MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-01-14 11:15:57 +00:00
imp	20163879d8	Remove old ioctl use and support, once and for all.	2015-01-06 05:28:37 +00:00
imp	b272d82120	Remove support for FreeBSD 7 and really old FreeBSD 8. The classifiers have been in the base for a while, so the gymnastics here aren't needed. In addition, the bugs in subr_disk.c have been fixed since 2009, so there's no need for an identical copy of it in the tree anymore. There's really no need to binary patch g_io_request, so let's get rid of the code (not compiled in anymore) lest others think it is a good idea.	2014-12-20 00:04:01 +00:00
jmg	c3ff54cc39	Add some new modes to OpenCrypto. These modes are AES-ICM (can be used for counter mode), and AES-GCM. Both of these modes have been added to the aesni module. Included is a set of tests to validate that the software and aesni module calculate the correct values. These use the NIST KAT test vectors. To run the test, you will need to install a soon to be committed port, nist-kat that will install the vectors. Using a port is necessary as the test vectors are around 25MB. All the man pages were updated. I have added a new man page, crypto.7, which includes a description of how to use each mode. All the new modes and some other AES modes are present. It would be good for someone else to go through and document the other modes. A new ioctl was added to support AEAD modes which AES-GCM is one of them. Without this ioctl, it is not possible to test AEAD modes from userland. Add a timing safe bcmp for use to compare MACs. Previously we were using bcmp which could leak timing info and result in the ability to forge messages. Add a minor optimization to the aesni module so that single segment mbufs don't get copied and instead are updated in place. The aesni module needs to be updated to support blocked IO so segmented mbufs don't have to be copied. We require that the IV be specified for all calls for both GCM and ICM. This is to ensure proper use of these functions. Obtained from: p4: //depot/projects/opencrypto Relnotes: yes Sponsored by: FreeBSD Foundation Sponsored by: NetGate	2014-12-12 19:56:36 +00:00
mav	f68d5de62f	Avoid unneeded malloc/memcpy/free if there is no metadata on disk. Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> MFC after: 2 weeks	2014-12-05 10:23:18 +00:00
mav	173190c9f8	Decode some binary fields of Intel metadata. Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> MFC after: 2 weeks	2014-12-04 15:54:45 +00:00
imp	21eead8e91	Actually, that was a bad idea. Go back to MAXPARTITIONS. Submitted by: bruce	2014-11-20 17:31:25 +00:00
imp	89b75f4cb6	The number of BSD partitions is variable. Return the proper number (which is in basetable->gpt_entries). Submitted by: ae@	2014-11-19 18:55:27 +00:00
imp	3c75037f3e	Implement the historic DIOCGDINFO ioctl for gpart on BSD partitions. Several utilities still use this interface and require additional information since gpart was activated than before. This allows fsck of a UFS partition without having to specify it is UFS, per historic behavior.	2014-11-18 17:06:40 +00:00
pjd	cb36b2a5c4	Add missing privilege check when setting the dump device. Before that change it was possible for a regular user to setup the dump device if he had write access to the given device. In theory it is a security issue as user might get access to kernel's memory after provoking kernel crash, but in practise it is not recommended to give regular users direct access to storage devices. Rework the code so that we do privileges check within the set_dumper() function to avoid similar problems in the future. Discussed with: secteam	2014-11-11 04:48:09 +00:00
des	d656003ded	Constify the AES code and propagate to consumers. This allows us to update the Fortuna code to use SHAd-256 as defined in FS&K. Approved by: so (self)	2014-11-10 09:44:38 +00:00
phk	60744bb25f	Translate the errno to gctl_error() texts. Spotted by: mwlucas	2014-11-09 15:52:11 +00:00
mav	e22f45febc	Add to CTL support for logical block provisioning threshold notifications. For ZVOL-backed LUNs this allows to inform initiators if storage's used or available spaces get above/below the configured thresholds. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2014-11-06 00:48:36 +00:00
mav	f21b293af4	Revert somewhat hackish geom_disk optimization, committed as part of r256880, and the following r273143 commit, supposed to workaround introduced issue by quite innocent-looking change. While there is no clear understanding why, but r273143 is accused in data corruption in some environments with high I/O load. I personally don't see any problem in that commit, and possibly it is just a trigger to some other bug somewhere, but better safe then sorry for now. Requested by: scottl@ MFC after: 3 days	2014-10-25 15:16:19 +00:00
cperciva	bdb0ac02b9	Populate the GELI passphrase cache with the kern.geom.eli.passphrase variable (if any) provided in the boot environment. Unset it from the kernel environment after doing this, so that the passphrase is no longer present in kernel memory once we enter userland. This will make it possible to provide a GELI passphrase via the boot loader; FreeBSD's loader does not yet do this, but GRUB (and PCBSD) will have support for this soon. Tested by: kmoore	2014-10-22 23:41:15 +00:00
hselasky	49c137f7be	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
ae	da21a8a9c1	Add provider's sectorsize and stripesize to confdot output. Submitted by: rpokala at panasas.com	2014-10-17 06:58:04 +00:00
davide	e88bd26b3f	Follow up to r225617. In order to maximize the re-usability of kernel code in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe	2014-10-16 18:04:43 +00:00
ae	e58acb98ef	Add an ability to set dumpdev via loader(8) tunable. MFC after: 3 weeks	2014-10-08 12:18:16 +00:00
hrs	4603b170c7	Fix a bug in r272297 which prevented dumpdev from setting. !u is not equivalent to (u != 0).	2014-10-03 04:13:25 +00:00
pjd	d59e29a877	Be prepared that set_dumper() might fail even when resetting it or prefix the call with (void) to document that we intentionally ignore the return value - no way to handle an error in case of device disappearing.	2014-09-30 12:00:50 +00:00
pjd	a652dcda5d	Style fixes.	2014-09-30 11:51:32 +00:00
cperciva	93cbefc174	Cache GELI passphrases entered at the console during the boot process, in order to improve user-friendliness when a system has multiple disks encrypted using the same passphrase. When examining a new GELI provider, the most recently used passphrase will be attempted before prompting for a passphrase; and whenever a passphrase is entered, it is cached for later reference. When the root disk is mounted, the cached passphrase is zeroed (triggered by the "mountroot" event), in order to minimize the possibility of leakage of passphrases. (After root is mounted, the "taste and prompt for passphrases on the console" code path is disabled, so there is no potential for a passphrase to be stored after the zeroing takes place.) This behaviour can be disabled by setting kern.geom.eli.boot_passcache=0. Reviewed by: pjd, dteske, allanjude MFC after: 7 days	2014-09-16 08:40:52 +00:00
sbruno	d549cc8e63	Add device name used in geom_map verbose output. This helps when using geom_map with multiple flash/spi devices. Phabric: https://reviews.freebsd.org/D766 Reviewed by: adrian MFC after: 2 weeks	2014-09-11 22:39:27 +00:00
jmg	5a07ed0920	use a straight buffer instead of an iov w/ 1 segment... The aesni driver when it hits a mbuf/iov buffer, it mallocs and copies the data for processing.. This improves perf by ~8-10% on my machine... I have thoughts of fixing AES-NI so that it can better handle segmented buffers, which should help improve IPSEC performance, but that is for the future...	2014-09-04 23:53:51 +00:00
scottl	cb231d7d66	Deal explicitly with possible failures of make_dev_alias_p() in GEOM. Submitted by: Mariusz Zaborski <oshogbo@FreeBSD.org> MFC after: 3 days	2014-08-18 19:27:47 +00:00
ae	74e6245779	Turn off kern.geom.part.mbr.enforce_chs by default.	2014-08-12 10:31:31 +00:00
ae	e3be5d165d	Add sysctl and loader tunable kern.geom.part.mbr.enforce_chs that is set by default. It can be used to disable automatic alignment to CHS geometry, that GEOM_PART_MBR does. Reviewed by: wblock MFC after: 1 week	2014-08-12 09:10:13 +00:00
imp	1cf5db2416	cswitch is unsigned, so don't compare it < 0. Any negative numbers will look huge and be caught by > 100.	2014-08-07 21:56:42 +00:00
imp	5999dde41e	Unsigned values can never be less than 0.	2014-08-07 21:56:37 +00:00
marcel	d784f2418c	In r264504, we prevented doing I/O for more than MAXPHYS by making the assumption that consumers would respect bio_completed and/or bio_resid to detect short reads. This assumption proved false and file corruption was the result. Create as many bios as we need to satisfy the original request. Check the cached chunk every time we need to do I/O to increase the hit rate. Obtained from: junipre Networks, Inc. MFC after: 1 week	2014-07-22 17:30:05 +00:00
nwhitehorn	6fa381c0bf	After EFI support was added to the installer, it needed to allow boot partitions of types other than "freebsd-boot" (in particular, "efi"). This allows the removal of some nasty hacks for supporting PowerPC systems, in particular aliasing freebsd-boot to apple-boot on APM and an IBM-specific code on MBR. This changes the installer to use the correct names, which also breaks a degeneracy in the meaning of "freebsd-boot" that allows the addition of support for some newer IBM systems that can boot from GPT in addition to MBR. Since I have no idea how to detect which those systems are, leave the default on IBM PPC systems as MBR for now.	2014-07-04 15:55:32 +00:00
hselasky	35b126e324	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
gjb	fc21f40567	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
hselasky	bd1ed65f0f	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
ae	c78bc4087e	Add disklabel64 support to GEOM_PART class. This partitioning scheme is used in DragonFlyBSD. It is similar to BSD disklabel, but has the following improvements: * metadata has own dedicated place and isn't accessible through partitions; * all offsets are 64-bit; * supports 16 partitions by default (has reserved place for more); * has reserved place for backup label (but not yet implemented); * has UUIDs for partitions and partition types; No objections from: geom MFC after: 2 weeks Relnotes: yes	2014-06-11 10:42:34 +00:00
ae	eee41776bd	Allow swapping to DragonFlyBSD's swap partition. MFC after: 2 weeks	2014-06-11 10:23:49 +00:00
ae	f045bddc4e	Add aliases for DragonFlyBSD's partition types. MFC after: 2 weeks	2014-06-11 10:19:11 +00:00
brd	f421527aa8	- Fix the keyfile being cleared prematurely after r259428 PR: 185084 Submitted by: fk@fabiankeil.de Reviewed by: pjd@	2014-06-06 03:17:37 +00:00
ae	09adbc3a8f	Use g_conf_printf_escaped() to escape symbols, which can break an XML tree. MFC after: 1 week	2014-05-30 10:35:51 +00:00
ae	26ea220350	Add a topology trace to the g_spoil_event. MFC after: 1 week	2014-05-19 16:08:15 +00:00
ae	120b176cea	We have two functions from where a geom orphan method could be called: g_orphan_register and g_resize_provider_event. Both are called from the event queue. Also we have GEOM_DEV class, which does deferred destroy for its consumers via g_dev_destroy (also called from the event queue). So it is possible, that for some consumers an orphan method will be called twice. This triggers panic in g_dev_orphan. Check that consumer isn't already orphaned before call orphan method. MFC after: 2 weeks	2014-05-19 16:05:42 +00:00
mav	b0e1e69974	Make GEOM DISK to account also BIO_FLUSH operations.	2014-05-17 15:07:00 +00:00
ae	9a65da5a22	It is safe to allow shrinking, when aligned size is bigger than current. Tested by: jmg MFC after: 1 week	2014-05-07 11:18:27 +00:00
trasz	54f013e376	Make r242379 - the fix for UFS labels disappearing after resizing the provider - also apply to UFS1 filesystems. This should help with resizing filesystems created by makefs(8), which still uses UFS1. Tested by: jmg@ Sponsored by: The FreeBSD Foundation	2014-05-05 09:20:30 +00:00
ae	9b1896e64f	Add an advice what to do when partition was automatically resized. X-MFC after: r256690	2014-05-04 20:00:08 +00:00
ae	65883d393c	Add better error description for case when we are doing resize and scheme-specific method returns EBUSY. MFC after: 1 week	2014-05-04 16:55:51 +00:00
ae	8488e4961e	Prevent an unexpected shrinking on resizing due to alignment for MBR, PC98 and VTOC8 schemes. Reported by: jmg MFC after: 1 week	2014-05-04 16:43:57 +00:00
ae	000f6e777d	For schemes that do an automatic partition aligning move this code to separate function. MFC after: 1 week	2014-05-04 10:14:25 +00:00
loos	a8fbe04bda	Fix a leak in g_uzip_taste(). After retrieve all the block offsets from the uzip image, free the last data read.	2014-05-01 15:23:20 +00:00
loos	37f1562c6b	Actually the FEATURE() macro is defined on sys/sysctl.h. Pointyhat to: loos	2014-05-01 14:59:04 +00:00
loos	affc38589f	Some style and whitespace fixes. Reduce the difference between geom_uzip(4) and geom_uncompress(4). Now, they produce an almost clean diff(1) output. Remove a duplicated variable from g_uncompress.c and an unnecessary header from g_uzip.c. No functional changes.	2014-05-01 14:47:27 +00:00
bdrewery	ead223adaa	Remove redundant include MFC after: 3 days	2014-04-29 01:17:43 +00:00
mav	b384f8775f	Reduce number of opens by REOM RAID during provider taste. Instead opening/closing provider by each of metadata classes, do it only once in core code. Since for SCSI disks open/close means sending some SCSI commands to the device, this change reduces taste time. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2014-04-28 15:03:52 +00:00
loos	5da15c1837	Keep geom_uncompress(4) in line with geom_uzip(4), bring in the r264504 fix. Make sure not to start I/O bigger than MAXPHYS bytes. Quoting r264504: When we detect the condition, we'll reduce the block count and perform a "short" read. In g_uncompress_done() we need to consider the original I/O length and stop early if we're about to deflate a block that we didn't read. By using bio_completed in the cloned BIO and not bio_length to check for this, we automatically and gracefully handle short reads that our providers may be doing on top of the short reads we may initiate ourselves. Reviewed by: marcel	2014-04-22 18:08:34 +00:00
marcel	8dd3747140	Make sure not to do I/O for more than MAXPHYS bytes. Doing so can cause problems in our providers, such as a KASSERT in md(4). We can initiate I/O for more than MAXPHYS bytes if we've been given a BIO for MAXPHYS bytes, the blocks from which we're reading couldn't be compressed and we had compression in preceeding blocks resulting in misalignment of the blocks we're trying to read relative to the sector. We're forced to round up the I/O length to make it an multiple of the sector size. When we detect the condition, we'll reduce the block count and perform a "short" read. In g_uzip_done() we need to consider the original I/O length and stop early if we're about to deflate a block that we didn't read. By using bio_completed in the cloned BIO and not bio_length to check for this, we automatically and gracefully handle short reads that our providers may be doing on top of the short reads we may initiate ourselves. Obtained from: Juniper Networks, Inc.	2014-04-15 15:41:57 +00:00
bdrewery	4d8489ab7f	Make g_access() KASSERT() more useful. Sponsored by: EMC / Isilon Storage Division Obtained from: Isilon OneFS MFC after: 2 weeks	2014-04-15 14:41:41 +00:00
marcel	eff424dd13	Align and round the partitionable disk space to 4K by default. Since this would also apply when recovering, make sure not to align or round when that would have a partition fall outside the partitionable area.	2014-04-12 20:28:39 +00:00
bdrewery	b8313a5a98	Fix spelling error in g_trace() call. Sponsored by: EMC / Isilon Storage Division MFC after: 1 week	2014-04-10 17:00:44 +00:00
mav	bc9e83bd99	Fix wrong sizes used to access PD_Type and PD_State DDF metadata fields. This caused incorrect behavior of arrays with big-endian DDF metadata. Little-endian (like used by Adaptec controllers) should not be harmed. Add workaround should be enough to manage compatibility. MFC after: 2 weeks	2014-04-10 16:00:33 +00:00
mav	37fe2a4bcb	Do not increment bio_data in case of BIO_DELETE. This fixes KASSERT() panic in g_io_request().	2014-04-10 10:12:56 +00:00
marcel	e76013f8c8	An all-or-nothing approach to labels isn't flexible enough. Embedded systems need fine-grained control over what's in and what's out. That's ideal. For now, separate GPT labels from the rest and allow g_label to be built with just GPT labels. Obtained from: Juniper Networks, Inc.	2014-04-06 02:44:37 +00:00
marcel	0353f8ca30	Make sure we don't free memory that's already been freed by setting the geom->softc pounter to NULL before freeing the g_slicer softc. In g_slicer_free() the pointer is checked first. Obtained from: Juniper Networks, Inc.	2014-04-06 02:20:42 +00:00
bdrewery	2eab8fff0d	Show error code when failing to destroy a mirror on delay Sponsored by: EMC / Isilon Storage Division MFC after: 2 weeks	2014-04-05 03:01:29 +00:00
delphij	47f8d13b54	In g_eli_crypto_hmac_init(), zero out after using the ipad buffer, k_ipad. Note that the two consumers in geli(4) are not affected by this issue because the way the code is constructed and as such, we believe there is no security impact with or without this change with geli(4)'s usage. Reported by: Serge van den Boom <serge vdboom.org> Reviewed by: pjd MFC after: 2 weeks	2014-02-08 05:17:49 +00:00
loos	3fd6ea64d7	Fix the build with DEBUG enabled. Where possible, fix style(9) issues. Reviewed by: bde Approved by: adrian (mentor)	2014-02-07 13:06:48 +00:00
loos	df1c99a134	Fix a logic error. Because of this inflateReset() wasn't being called and the output buffer wasn't being cleared between the inflate() calls, producing zeroed output after the first inflate() call. This fixes the read of mkuzip(8) images with geom_uncompress(4). Reviewed by: ray Approved by: adrian (mentor)	2014-02-03 17:25:36 +00:00
loos	dc52643210	Remove some unnecessary code. The offsets read from the first block are overwritten a few lines bellow. Reviewed by: ray Approved by: adrian (mentor)	2014-02-03 17:21:36 +00:00
ae	165a2b7500	Always free sbuf in gctl_free(). MFC after: 1 week	2014-01-23 21:30:31 +00:00
ae	cb5cd1d9fc	Remove another unneeded NULL check from geom_alloc_copyin(). Do copyout in case of gctl version mismatch and fix sbuf leak in g_ctl_ioctl_ctl(). MFC after: 1 week	2014-01-23 20:25:38 +00:00
ae	f9c14b9c5e	In gctl_copyin() remove unused error variable. geom_alloc_copyin() can't return ENOMEM, so describe its fail as bad control request. Add check for NULL pointer in gctl_dump(), since it can be NULL when geom_alloc_copyin() failed. MFC after: 1 week	2014-01-23 19:55:02 +00:00
ae	ce531f97ec	Fix typo in r261084. Add to the gctl_error() an ability to specify error description even if numeric error code is already specified. Also by default set error code to EINVAL. PR: 185852 MFC after: 1 week	2014-01-23 19:31:17 +00:00
ae	3ec50b516b	malloc() with M_WAITOK doesn't return NULL. MFC after: 1 week	2014-01-23 19:07:22 +00:00
mav	d0581e230a	Removed unneeded and dangerous assignment. It would probably cause NULL refererence panic if compiler not optimize it out. Found with: Clang static analyzer MFC after: 2 weeks	2014-01-19 16:37:57 +00:00
loos	74ff5d934c	Build the geom_uncompress(4) module by default. Fix geom_uncompress(4) module loading. Don't link zlib.c (which is a module itself) directly. The built module was verified and used to read a few mkulzma(8) images on amd64 to validate some of the informations on the manual page. While here, don't overwrite CFLAGS. Reviewed by: ray Approved by: adrian (mentor)	2014-01-10 20:29:46 +00:00
ae	150bba5c8f	Add an ability to stop gmirror and clear its metadata in one command. This fixes the problem, when gmirror starts again just after stop. The problem occurs when gmirror's component has geom label with equal size. E.g. gpt and gptid have the same size as partition, diskid has the same size as entire disk. When gmirror's geom has been destroyed, glabel creates its providers and this initiate retaste. Now "gmirror destroy" command is available. It destroys geom and also erases gmirror's metadata. MFC after: 2 weeks	2013-12-27 02:43:53 +00:00
marck	e207b96998	Add GPT UUID for VMware vSAN meta-data partition. Approved by: ae MFC after: 2 weeks	2013-12-26 21:06:12 +00:00
ae	86d162fb60	Prevent users from deactivating the last component of a mirror. PR: 184985 MFC after: 1 week	2013-12-19 22:13:12 +00:00
pjd	8f9b4c6a1e	Clear some more places with potentially sensitive data. MFC after: 1 week	2013-12-15 22:52:18 +00:00
pjd	170007786b	Clear content of keyfiles loaded by the loader after processing them. Pointed out by: rwatson MFC after: 1 week	2013-12-15 22:51:26 +00:00
mav	4dd7fdae6a	Fix bug introduced at r256607. We have to recalculate bp_resid here since sizes of original and completed requests may differ due to end of media. Bisected by: pho	2013-12-12 08:23:28 +00:00
jhibbits	d6564a729a	Partially revert r259080. bde@ pointed out that there are a lot more style bugs going on in here than can be fixed, and I introduced some of my own. Rather than fix the whole host of them, back out my bugs. Found by: bde X-MFC with: r259080	2013-12-08 09:34:56 +00:00
jhibbits	76f5b95a89	Fix some integer signs. These unsigned integers should all be signed. Found by: clang (powerpc64)	2013-12-07 19:55:34 +00:00
eadler	44c01df173	Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this shifts into the sign bit. Instead use (1U << 31) which gets the expected result. This fix is not ideal as it assumes a 32 bit int, but does fix the issue for most cases. A similar change was made in OpenBSD. Discussed with: -arch, rdivacky Reviewed by: cperciva	2013-11-30 22:17:27 +00:00
mav	cf37ee63fb	Escape special XML chars, returned by some devices, confusing XML parsers. MFC after: 1 month	2013-11-27 14:25:06 +00:00
marcel	3ce6c52230	Have the GPT probe return a lower priority when the MBR is not a PMBR The purpose of the PMBR is to have the disk appear in use to GPT unaware utilities (like fdisk). However, if the PMBR has been changed by a GPT unaware utlity then we must assume that this was deliberate (as it involved removal of the special slice) and we should not treat the unmodified GPT-specific sectors as being valid. By lowering the probe priority in that case, the MBR scheme will take precedence and the kernel will end up using the MBR and not the GPT. We will still use the GPT if the kernel does not support the MBR scheme.	2013-11-21 22:02:59 +00:00
ae	0b2e834032	Add "resize" verb to gmirror(8) and such functionality to geom_mirror(4). Now it is easy to expand the size of the mirror when all its components are replaced. Also add g_resize method to geom_mirror class. It will write updated metadata to new last sector, when parent provider is resized. Silence from: geom@ MFC after: 1 month	2013-11-19 22:55:17 +00:00
mav	c684e93584	In addition to r258220 allow shrinking in "automatic" mode if there is already valid metadata found at the new location. This should allow easy transparent recovery if first resize was done by mistake. While there, unify metadata write code and fix minor memory leak. MFC after: 1 month	2013-11-17 05:38:54 +00:00
mav	fb7bca5b5a	Implement automatic live resize support for GEOM MULTIPATH class. In "manual" mode just automatically resize provider in any direction. In "automatic" mode allow only growth (with new metadata write); in case of shrinking destroy the multipath device same as before since it may be undesirable to write new metadata within old user area. MFC after: 1 month	2013-11-16 14:31:49 +00:00
ae	69b3043c7e	Add missing line breaks. PR: 181900 MFC after: 1 week	2013-11-11 11:13:12 +00:00
delphij	f04c07285f	When zero'ing out a buffer, make sure we are using right size. Without this change, in the worst but unlikely case scenario, certain administrative operations, including change of configuration, set or delete key from a GEOM ELI provider, may leave potentially sensitive information in buffer allocated from kernel memory. We believe that it is not possible to actively exploit these issues, nor does it impact the security of normal usage of GEOM ELI providers when these operations are not performed after system boot. Security: possible sensitive information disclosure Submitted by: Clement Lecigne <clecigne google com> MFC after: 3 days	2013-11-02 01:16:10 +00:00
jhb	b132568f90	Reject attempts to attack a disk device that has the old NEEDSGIANT flag set. Reviewed by: mav	2013-10-25 19:19:12 +00:00
smh	5c7a6f5d92	Improve ZFS N-way mirror read performance by using load and locality information. The existing algorithm selects a preferred leaf vdev based on offset of the zio request modulo the number of members in the mirror. It assumes the devices are of equal performance and that spreading the requests randomly over both drives will be sufficient to saturate them. In practice this results in the leaf vdevs being under utilized. The new algorithm takes into the following additional factors: * Load of the vdevs (number outstanding I/O requests) * The locality of last queued I/O vs the new I/O request. Within the locality calculation additional knowledge about the underlying vdev is considered such as; is the device backing the vdev a rotating media device. This results in performance increases across the board as well as significant increases for predominantly streaming loads and for configurations which don't have evenly performing devices. The following are results from a setup with 3 Way Mirror with 2 x HD's and 1 x SSD from a basic test running multiple parrallel dd's. With pre-fetch disabled (vfs.zfs.prefetch_disable=1): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s With pre-fetch enabled (vfs.zfs.prefetch_disable=0): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s In addition to the performance changes the code was also restructured, with the help of Justin Gibbs, to provide a more logical flow which also ensures vdevs loads are only calculated from the set of valid candidates. The following additional sysctls where added to allow the administrator to tune the behaviour of the load algorithm: * vfs.zfs.vdev.mirror.rotating_inc * vfs.zfs.vdev.mirror.rotating_seek_inc * vfs.zfs.vdev.mirror.rotating_seek_offset * vfs.zfs.vdev.mirror.non_rotating_inc * vfs.zfs.vdev.mirror.non_rotating_seek_inc These changes where based on work started by the zfsonlinux developers: https://github.com/zfsonlinux/zfs/pull/1487 Reviewed by: gibbs, mav, will MFC after: 2 weeks Sponsored by: Multiplay	2013-10-23 09:54:58 +00:00
mjg	c2c7122a9a	gnop: make sure that newly allocated memory for softc is zeroed This prevents mtx_init from encountering non-zeros and panicking the kernel as a result. Reported by: Keith White <kwhite site.uottawa.ca>	2013-10-23 01:34:18 +00:00
mav	81c5dcd662	Remove Giant-locked drivers support (DISKFLAG_NEEDSGIANT flag) from disk(9). Since at least FreeBSD 7 we had only four of them in the base tree, and in head branch, thanks to jhb@, we have no any for more then a year.	2013-10-22 10:21:20 +00:00
mav	4219fc0074	Merge GEOM direct dispatch changes from the projects/camlock branch. When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O. The defined now safety requirements are: - caller should not hold any locks and should be reenterable; - callee should not depend on GEOM dual-threaded concurency semantics; - on the way down, if request is unmapped while callee doesn't support it, the context should be sleepable; - kernel thread stack usage should be below 50%. To keep compatibility with GEOM classes not meeting above requirements new provider and consumer flags added: - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request); - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done); - G_PF_DIRECT_SEND -- provider code meets caller requirements (done); - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request). Capable GEOM class can set them, allowing direct dispatch in cases where it is safe. If any of requirements are not met, request is queued to g_up or g_down thread same as before. Such GEOM classes were reviewed and updated to support direct dispatch: CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE, VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL, MAP, FLASHMAP, etc). To declare direct completion capability disk(9) KPI got new flag equivalent to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk drivers got it set now thanks to earlier CAM locking work. This change more then twice increases peak block storage performance on systems with manu CPUs, together with earlier CAM locking changes reaching more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to 256 user-level threads). Sponsored by: iXsystems, Inc. MFC after: 2 months	2013-10-22 08:22:19 +00:00
trasz	26d2e3d5ff	Fix build with gcc by spelling unused format string as "unused" instead of NULL. MFC after: 29 days	2013-10-19 08:20:00 +00:00
trasz	fb4e4700a1	Make geom_label(4) resize-aware. This fixes a situation when "gpart resize" would resize a partition, but label providers - e.g. /dev/gptid/XXX - would stay the same size. Reviewed by: mav MFC after: 1 month Sponsored by: FreeBSD Foundation	2013-10-18 09:14:19 +00:00
ae	05ca533af6	Add an automatic resize support to the GEOM_PART class. When parent provider has been resized, the scheme specific G_PART_RESIZE method does an update of scheme's metadata. But all changes are not saved to disk, until `gpart commit` will be called. Discussed with: trasz MFC after: 1 month	2013-10-17 16:18:43 +00:00
mav	bf87deb4e7	MFprojects/camlock r256445: Add unmapped I/O support to GEOM RAID.	2013-10-16 09:33:23 +00:00
mav	9059e21206	MFprojects/camlock r256371: Fix passing uninitialized bio_resid argument to g_trace().	2013-10-16 09:21:40 +00:00
mav	9a5bcf65b3	MFprojects/camlock r254907: Move g_io_deliver() out of the lock, as required for direct dispatch. Move g_destroy_bio() out too to reduce lock scope even more.	2013-10-16 09:18:01 +00:00
mav	2080607889	MFprojects/camlock r254905: Introduce new function devstat_end_transaction_bio_bt(), adding new argument to specify present time. Use this function to move binuptime() out of lock, substantially reducing lock congestion when slow timecounter is used.	2013-10-16 09:12:40 +00:00
des	140807754c	Introduce a kern.geom.notaste sysctl that can be used to temporarily disable GEOM tasting to avoid the "bouncing GEOM" problem where, when you shut down the consumer of a provider which can be viewed in multiple ways (typically a mirror whose members are labeled partitions), GEOM will immediately taste that provider's alter ego and reattach the consumer. Approved by: re (glebius)	2013-09-24 20:05:16 +00:00
ae	7f30f5be1c	Remove stub implementation. MFC after: 1 week	2013-09-05 09:44:09 +00:00
mav	8324fc3480	Make ELI destruction (including orphanization) less aggressive, making it always wait for provider close. Old algorithm was reported to cause NULL dereference panic on attempt to close provider after softc destruction. If not global workaroung in GEOM, that could even cause destruction with requests still in flight.	2013-09-02 10:44:54 +00:00
mav	3380a03b00	MFprojects/camlock r254895: Add unmapped BIO support to GEOM ZERO if kern.geom.zero.clear is cleared.	2013-08-26 20:39:02 +00:00
mav	e8031ce26c	Add new attribute lunname to report only textual LUN-specific device IDs. While lunid attribute prefers to report numeric ones, having both may be useful in some situations.	2013-08-24 09:42:14 +00:00
ken	5591de079d	Change the way that unmapped I/O capability is advertised. The previous method was to set the D_UNMAPPED_IO flag in the cdevsw for the driver. The problem with this is that in many cases (e.g. sa(4)) there may be some instances of the driver that can handle unmapped I/O and some that can't. The isp(4) driver can handle unmapped I/O, but the esp(4) driver currently cannot. The cdevsw is shared among all driver instances. So instead of setting a flag on the cdevsw, set a flag on the cdev. This allows drivers to indicate support for unmapped I/O on a per-instance basis. sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it with an SI_UNMAPPED cdev flag. kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine whether or not a particular driver can handle unmapped I/O. geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs. Since GEOM will create a temporary mapping when needed, setting SI_UNMAPPED unconditionally will work. Remove the D_UNMAPPED_IO flag. nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here if NVME_UNMAPPED_BIO_SUPPORT is enabled. vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a cdev instead of the D_UNMAPPED_IO flag on the cdevsw. sys/param.h: Bump __FreeBSD_version to `1000045` for the switch from setting the D_UNMAPPED_IO flag in the cdevsw to setting SI_UNMAPPED in the cdev. Reviewed by: kib, jimharris MFC after: 1 week Sponsored by: Spectra Logic	2013-08-15 22:52:39 +00:00
mav	eba4a485b2	Return error when opening read-only volumes (like RAID4/5/...) for writing. Previously opens succeeded, but actual write operations returned errors. Requested by: peter MFC after: 2 weeks	2013-08-13 07:56:40 +00:00
mav	d9e76bbffc	Oops, wrong constant at r254269.	2013-08-13 06:25:34 +00:00
mav	1ddae2c9b4	Fix reasonable but safe Clang warnings.	2013-08-13 06:21:36 +00:00

... 2 3 4 5 6 ...

2185 Commits