freebsd-skq

Author	SHA1	Message	Date
bdrewery	4d8489ab7f	Make g_access() KASSERT() more useful. Sponsored by: EMC / Isilon Storage Division Obtained from: Isilon OneFS MFC after: 2 weeks	2014-04-15 14:41:41 +00:00
marcel	eff424dd13	Align and round the partitionable disk space to 4K by default. Since this would also apply when recovering, make sure not to align or round when that would have a partition fall outside the partitionable area.	2014-04-12 20:28:39 +00:00
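The alignment and rounding described in eff424dd13 can be sketched roughly as follows. This is an illustration only, not the code from the commit: the function names, the `ALIGN_BYTES` constant, and the `lowest_part_start` parameter are hypothetical, and the recovery guard is reduced to a single comparison.

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_BYTES 4096u	/* hypothetical name for the 4K default */

/* Round the first usable LBA up to the next 4K boundary. */
static uint64_t
align_first_lba(uint64_t first, uint32_t sector_size)
{
	uint64_t spa;	/* sectors per alignment unit */

	assert(sector_size > 0 && ALIGN_BYTES % sector_size == 0);
	spa = ALIGN_BYTES / sector_size;
	return ((first + spa - 1) / spa * spa);
}

/* Round the last usable LBA down so the usable area ends on a 4K boundary. */
static uint64_t
round_last_lba(uint64_t last, uint32_t sector_size)
{
	uint64_t spa;

	assert(sector_size > 0 && ALIGN_BYTES % sector_size == 0);
	spa = ALIGN_BYTES / sector_size;
	assert(last + 1 >= spa);
	return ((last + 1) / spa * spa - 1);
}

/*
 * Guard for the recovery case: keep the unaligned value when aligning
 * would leave an existing partition outside the partitionable area.
 */
static uint64_t
first_lba_for_table(uint64_t first, uint64_t lowest_part_start,
    uint32_t sector_size)
{
	uint64_t aligned = align_first_lba(first, sector_size);

	return (aligned <= lowest_part_start ? aligned : first);
}
```

With 512-byte sectors, a first usable LBA of 34 moves to 40, unless an existing partition already starts at 34, in which case it is left alone.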
bdrewery	b8313a5a98	Fix spelling error in g_trace() call. Sponsored by: EMC / Isilon Storage Division MFC after: 1 week	2014-04-10 17:00:44 +00:00
mav	bc9e83bd99	Fix wrong sizes used to access PD_Type and PD_State DDF metadata fields. This caused incorrect behavior of arrays with big-endian DDF metadata. Little-endian (like used by Adaptec controllers) should not be harmed. The added workaround should be enough to manage compatibility. MFC after: 2 weeks	2014-04-10 16:00:33 +00:00
mav	37fe2a4bcb	Do not increment bio_data in case of BIO_DELETE. This fixes KASSERT() panic in g_io_request().	2014-04-10 10:12:56 +00:00
marcel	e76013f8c8	An all-or-nothing approach to labels isn't flexible enough. Embedded systems need fine-grained control over what's in and what's out. That's ideal. For now, separate GPT labels from the rest and allow g_label to be built with just GPT labels. Obtained from: Juniper Networks, Inc.	2014-04-06 02:44:37 +00:00
marcel	0353f8ca30	Make sure we don't free memory that's already been freed by setting the geom->softc pointer to NULL before freeing the g_slicer softc. In g_slicer_free() the pointer is checked first. Obtained from: Juniper Networks, Inc.	2014-04-06 02:20:42 +00:00
bdrewery	2eab8fff0d	Show error code when failing to destroy a mirror on delay Sponsored by: EMC / Isilon Storage Division MFC after: 2 weeks	2014-04-05 03:01:29 +00:00
delphij	47f8d13b54	In g_eli_crypto_hmac_init(), zero out after using the ipad buffer, k_ipad. Note that the two consumers in geli(4) are not affected by this issue because the way the code is constructed and as such, we believe there is no security impact with or without this change with geli(4)'s usage. Reported by: Serge van den Boom <serge vdboom.org> Reviewed by: pjd MFC after: 2 weeks	2014-02-08 05:17:49 +00:00
loos	3fd6ea64d7	Fix the build with DEBUG enabled. Where possible, fix style(9) issues. Reviewed by: bde Approved by: adrian (mentor)	2014-02-07 13:06:48 +00:00
loos	df1c99a134	Fix a logic error. Because of this inflateReset() wasn't being called and the output buffer wasn't being cleared between the inflate() calls, producing zeroed output after the first inflate() call. This fixes the read of mkuzip(8) images with geom_uncompress(4). Reviewed by: ray Approved by: adrian (mentor)	2014-02-03 17:25:36 +00:00
loos	dc52643210	Remove some unnecessary code. The offsets read from the first block are overwritten a few lines below. Reviewed by: ray Approved by: adrian (mentor)	2014-02-03 17:21:36 +00:00
ae	165a2b7500	Always free sbuf in gctl_free(). MFC after: 1 week	2014-01-23 21:30:31 +00:00
ae	cb5cd1d9fc	Remove another unneeded NULL check from geom_alloc_copyin(). Do copyout in case of gctl version mismatch and fix sbuf leak in g_ctl_ioctl_ctl(). MFC after: 1 week	2014-01-23 20:25:38 +00:00
ae	f9c14b9c5e	In gctl_copyin() remove unused error variable. geom_alloc_copyin() can't return ENOMEM, so describe its fail as bad control request. Add check for NULL pointer in gctl_dump(), since it can be NULL when geom_alloc_copyin() failed. MFC after: 1 week	2014-01-23 19:55:02 +00:00
ae	ce531f97ec	Fix typo in r261084. Add to the gctl_error() an ability to specify error description even if numeric error code is already specified. Also by default set error code to EINVAL. PR: 185852 MFC after: 1 week	2014-01-23 19:31:17 +00:00
ae	3ec50b516b	malloc() with M_WAITOK doesn't return NULL. MFC after: 1 week	2014-01-23 19:07:22 +00:00
mav	d0581e230a	Removed unneeded and dangerous assignment. It would probably cause a NULL dereference panic if the compiler did not optimize it out. Found with: Clang static analyzer MFC after: 2 weeks	2014-01-19 16:37:57 +00:00
loos	74ff5d934c	Build the geom_uncompress(4) module by default. Fix geom_uncompress(4) module loading. Don't link zlib.c (which is a module itself) directly. The built module was verified and used to read a few mkulzma(8) images on amd64 to validate some of the information on the manual page. While here, don't overwrite CFLAGS. Reviewed by: ray Approved by: adrian (mentor)	2014-01-10 20:29:46 +00:00
ae	150bba5c8f	Add an ability to stop gmirror and clear its metadata in one command. This fixes the problem, when gmirror starts again just after stop. The problem occurs when gmirror's component has geom label with equal size. E.g. gpt and gptid have the same size as partition, diskid has the same size as entire disk. When gmirror's geom has been destroyed, glabel creates its providers and this initiates a retaste. Now "gmirror destroy" command is available. It destroys geom and also erases gmirror's metadata. MFC after: 2 weeks	2013-12-27 02:43:53 +00:00
marck	e207b96998	Add GPT UUID for VMware vSAN meta-data partition. Approved by: ae MFC after: 2 weeks	2013-12-26 21:06:12 +00:00
ae	86d162fb60	Prevent users from deactivating the last component of a mirror. PR: 184985 MFC after: 1 week	2013-12-19 22:13:12 +00:00
pjd	8f9b4c6a1e	Clear some more places with potentially sensitive data. MFC after: 1 week	2013-12-15 22:52:18 +00:00
pjd	170007786b	Clear content of keyfiles loaded by the loader after processing them. Pointed out by: rwatson MFC after: 1 week	2013-12-15 22:51:26 +00:00
mav	4dd7fdae6a	Fix bug introduced at r256607. We have to recalculate bp_resid here since sizes of original and completed requests may differ due to end of media. Bisected by: pho	2013-12-12 08:23:28 +00:00
jhibbits	d6564a729a	Partially revert r259080. bde@ pointed out that there are a lot more style bugs going on in here than can be fixed, and I introduced some of my own. Rather than fix the whole host of them, back out my bugs. Found by: bde X-MFC with: r259080	2013-12-08 09:34:56 +00:00
jhibbits	76f5b95a89	Fix some integer signs. These unsigned integers should all be signed. Found by: clang (powerpc64)	2013-12-07 19:55:34 +00:00
eadler	44c01df173	Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this shifts into the sign bit. Instead use (1U << 31) which gets the expected result. This fix is not ideal as it assumes a 32 bit int, but does fix the issue for most cases. A similar change was made in OpenBSD. Discussed with: -arch, rdivacky Reviewed by: cperciva	2013-11-30 22:17:27 +00:00
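The undefined behavior fixed in 44c01df173 can be illustrated with a small helper. This is a sketch rather than the committed change: `bit_mask()` and `bit_is_set()` are hypothetical names, and, like the commit, the code assumes `unsigned int` is at least 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a mask with only bit n set (n counted from 0).
 * The literal is unsigned, so n == 31 does not shift into a sign bit,
 * which is what made (1 << 31) undefined.
 */
static uint32_t
bit_mask(unsigned int n)
{
	assert(n < 32);	/* shifting by the type width or more is also undefined */
	return ((uint32_t)(1U << n));
}

/* Returns nonzero when bit n of word is set. */
static int
bit_is_set(uint32_t word, unsigned int n)
{
	return ((word & bit_mask(n)) != 0);
}
```

Keeping the shift behind one helper also gives a single place to check the shift count.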
mav	cf37ee63fb	Escape special XML chars, returned by some devices, confusing XML parsers. MFC after: 1 month	2013-11-27 14:25:06 +00:00
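As a rough illustration of the escaping idea in cf37ee63fb, the sketch below replaces the five XML special characters with named entities. It is an assumption-laden userland example: `xml_escape()` is a hypothetical function, and the actual commit may escape a different character set or use a different encoding and buffer API.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy src into dst, replacing XML special characters with entities.
 * Returns 0 on success, or -1 if dst is too small; in that case dst
 * holds the escaped prefix that did fit, NUL-terminated.
 */
static int
xml_escape(const char *src, char *dst, size_t dstsize)
{
	size_t used = 0;

	assert(src != NULL && dst != NULL && dstsize > 0);
	for (; *src != '\0'; src++) {
		char one[2] = { *src, '\0' };	/* pass-through for ordinary chars */
		const char *rep;
		size_t len;

		switch (*src) {
		case '&':  rep = "&amp;";  break;
		case '<':  rep = "&lt;";   break;
		case '>':  rep = "&gt;";   break;
		case '"':  rep = "&quot;"; break;
		case '\'': rep = "&apos;"; break;
		default:   rep = one;      break;
		}
		len = strlen(rep);
		if (used + len >= dstsize) {
			dst[used] = '\0';
			return (-1);
		}
		memcpy(dst + used, rep, len);
		used += len;
	}
	dst[used] = '\0';
	return (0);
}
```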
marcel	3ce6c52230	Have the GPT probe return a lower priority when the MBR is not a PMBR. The purpose of the PMBR is to have the disk appear in use to GPT unaware utilities (like fdisk). However, if the PMBR has been changed by a GPT unaware utility then we must assume that this was deliberate (as it involved removal of the special slice) and we should not treat the unmodified GPT-specific sectors as being valid. By lowering the probe priority in that case, the MBR scheme will take precedence and the kernel will end up using the MBR and not the GPT. We will still use the GPT if the kernel does not support the MBR scheme.	2013-11-21 22:02:59 +00:00
ae	0b2e834032	Add "resize" verb to gmirror(8) and such functionality to geom_mirror(4). Now it is easy to expand the size of the mirror when all its components are replaced. Also add g_resize method to geom_mirror class. It will write updated metadata to new last sector, when parent provider is resized. Silence from: geom@ MFC after: 1 month	2013-11-19 22:55:17 +00:00
mav	c684e93584	In addition to r258220 allow shrinking in "automatic" mode if there is already valid metadata found at the new location. This should allow easy transparent recovery if first resize was done by mistake. While there, unify metadata write code and fix minor memory leak. MFC after: 1 month	2013-11-17 05:38:54 +00:00
mav	fb7bca5b5a	Implement automatic live resize support for GEOM MULTIPATH class. In "manual" mode just automatically resize provider in any direction. In "automatic" mode allow only growth (with new metadata write); in case of shrinking destroy the multipath device same as before since it may be undesirable to write new metadata within old user area. MFC after: 1 month	2013-11-16 14:31:49 +00:00
ae	69b3043c7e	Add missing line breaks. PR: 181900 MFC after: 1 week	2013-11-11 11:13:12 +00:00
delphij	f04c07285f	When zero'ing out a buffer, make sure we are using right size. Without this change, in the worst but unlikely case scenario, certain administrative operations, including change of configuration, set or delete key from a GEOM ELI provider, may leave potentially sensitive information in buffer allocated from kernel memory. We believe that it is not possible to actively exploit these issues, nor does it impact the security of normal usage of GEOM ELI providers when these operations are not performed after system boot. Security: possible sensitive information disclosure Submitted by: Clement Lecigne <clecigne google com> MFC after: 3 days	2013-11-02 01:16:10 +00:00
jhb	b132568f90	Reject attempts to attach a disk device that has the old NEEDSGIANT flag set. Reviewed by: mav	2013-10-25 19:19:12 +00:00
smh	5c7a6f5d92	Improve ZFS N-way mirror read performance by using load and locality information. The existing algorithm selects a preferred leaf vdev based on offset of the zio request modulo the number of members in the mirror. It assumes the devices are of equal performance and that spreading the requests randomly over both drives will be sufficient to saturate them. In practice this results in the leaf vdevs being underutilized. The new algorithm takes into account the following additional factors: * Load of the vdevs (number of outstanding I/O requests) * The locality of last queued I/O vs the new I/O request. Within the locality calculation additional knowledge about the underlying vdev is considered, such as: is the device backing the vdev a rotating media device. This results in performance increases across the board as well as significant increases for predominantly streaming loads and for configurations which don't have evenly performing devices. The following are results from a setup with 3 Way Mirror with 2 x HD's and 1 x SSD from a basic test running multiple parallel dd's. 
With pre-fetch disabled (vfs.zfs.prefetch_disable=1): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s With pre-fetch enabled (vfs.zfs.prefetch_disable=0): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s In addition to the performance changes the code was also restructured, with the help of Justin Gibbs, to provide a more logical flow which also ensures vdev loads are only calculated from the set of valid candidates. The following additional sysctls were added to allow the administrator to tune the behaviour of the load algorithm: * vfs.zfs.vdev.mirror.rotating_inc * vfs.zfs.vdev.mirror.rotating_seek_inc * vfs.zfs.vdev.mirror.rotating_seek_offset * vfs.zfs.vdev.mirror.non_rotating_inc * vfs.zfs.vdev.mirror.non_rotating_seek_inc These changes were based on work started by the zfsonlinux developers: https://github.com/zfsonlinux/zfs/pull/1487 Reviewed by: gibbs, mav, will MFC after: 2 weeks Sponsored by: Multiplay	2013-10-23 09:54:58 +00:00
mjg	c2c7122a9a	gnop: make sure that newly allocated memory for softc is zeroed This prevents mtx_init from encountering non-zeros and panicking the kernel as a result. Reported by: Keith White <kwhite site.uottawa.ca>	2013-10-23 01:34:18 +00:00
mav	81c5dcd662	Remove Giant-locked drivers support (DISKFLAG_NEEDSGIANT flag) from disk(9). Since at least FreeBSD 7 we had only four of them in the base tree, and in head branch, thanks to jhb@, we have had none for more than a year.	2013-10-22 10:21:20 +00:00
mav	4219fc0074	Merge GEOM direct dispatch changes from the projects/camlock branch. When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O. The currently defined safety requirements are: - caller should not hold any locks and should be reenterable; - callee should not depend on GEOM dual-threaded concurrency semantics; - on the way down, if request is unmapped while callee doesn't support it, the context should be sleepable; - kernel thread stack usage should be below 50%. To keep compatibility with GEOM classes not meeting above requirements new provider and consumer flags added: - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request); - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done); - G_PF_DIRECT_SEND -- provider code meets caller requirements (done); - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request). Capable GEOM class can set them, allowing direct dispatch in cases where it is safe. If any of requirements are not met, request is queued to g_up or g_down thread same as before. Such GEOM classes were reviewed and updated to support direct dispatch: CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE, VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL, MAP, FLASHMAP, etc). To declare direct completion capability disk(9) KPI got new flag equivalent to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk drivers got it set now thanks to earlier CAM locking work. This change more than doubles peak block storage performance on systems with many CPUs, together with earlier CAM locking changes reaching more than 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to 256 user-level threads). Sponsored by: iXsystems, Inc. 
MFC after: 2 months	2013-10-22 08:22:19 +00:00
trasz	26d2e3d5ff	Fix build with gcc by spelling unused format string as "unused" instead of NULL. MFC after: 29 days	2013-10-19 08:20:00 +00:00
trasz	fb4e4700a1	Make geom_label(4) resize-aware. This fixes a situation when "gpart resize" would resize a partition, but label providers - e.g. /dev/gptid/XXX - would stay the same size. Reviewed by: mav MFC after: 1 month Sponsored by: FreeBSD Foundation	2013-10-18 09:14:19 +00:00
ae	05ca533af6	Add an automatic resize support to the GEOM_PART class. When parent provider has been resized, the scheme specific G_PART_RESIZE method does an update of scheme's metadata. But the changes are not saved to disk until `gpart commit` is called. Discussed with: trasz MFC after: 1 month	2013-10-17 16:18:43 +00:00
mav	bf87deb4e7	MFprojects/camlock r256445: Add unmapped I/O support to GEOM RAID.	2013-10-16 09:33:23 +00:00
mav	9059e21206	MFprojects/camlock r256371: Fix passing uninitialized bio_resid argument to g_trace().	2013-10-16 09:21:40 +00:00
mav	9a5bcf65b3	MFprojects/camlock r254907: Move g_io_deliver() out of the lock, as required for direct dispatch. Move g_destroy_bio() out too to reduce lock scope even more.	2013-10-16 09:18:01 +00:00
mav	2080607889	MFprojects/camlock r254905: Introduce new function devstat_end_transaction_bio_bt(), adding new argument to specify present time. Use this function to move binuptime() out of lock, substantially reducing lock congestion when slow timecounter is used.	2013-10-16 09:12:40 +00:00
des	140807754c	Introduce a kern.geom.notaste sysctl that can be used to temporarily disable GEOM tasting to avoid the "bouncing GEOM" problem where, when you shut down the consumer of a provider which can be viewed in multiple ways (typically a mirror whose members are labeled partitions), GEOM will immediately taste that provider's alter ego and reattach the consumer. Approved by: re (glebius)	2013-09-24 20:05:16 +00:00
ae	7f30f5be1c	Remove stub implementation. MFC after: 1 week	2013-09-05 09:44:09 +00:00
mav	8324fc3480	Make ELI destruction (including orphanization) less aggressive, making it always wait for provider close. Old algorithm was reported to cause NULL dereference panic on attempt to close provider after softc destruction. If not for the global workaround in GEOM, that could even cause destruction with requests still in flight.	2013-09-02 10:44:54 +00:00
mav	3380a03b00	MFprojects/camlock r254895: Add unmapped BIO support to GEOM ZERO if kern.geom.zero.clear is cleared.	2013-08-26 20:39:02 +00:00
mav	e8031ce26c	Add new attribute lunname to report only textual LUN-specific device IDs. While lunid attribute prefers to report numeric ones, having both may be useful in some situations.	2013-08-24 09:42:14 +00:00
ken	5591de079d	Change the way that unmapped I/O capability is advertised. The previous method was to set the D_UNMAPPED_IO flag in the cdevsw for the driver. The problem with this is that in many cases (e.g. sa(4)) there may be some instances of the driver that can handle unmapped I/O and some that can't. The isp(4) driver can handle unmapped I/O, but the esp(4) driver currently cannot. The cdevsw is shared among all driver instances. So instead of setting a flag on the cdevsw, set a flag on the cdev. This allows drivers to indicate support for unmapped I/O on a per-instance basis. sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it with an SI_UNMAPPED cdev flag. kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine whether or not a particular driver can handle unmapped I/O. geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs. Since GEOM will create a temporary mapping when needed, setting SI_UNMAPPED unconditionally will work. Remove the D_UNMAPPED_IO flag. nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here if NVME_UNMAPPED_BIO_SUPPORT is enabled. vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a cdev instead of the D_UNMAPPED_IO flag on the cdevsw. sys/param.h: Bump __FreeBSD_version to `1000045` for the switch from setting the D_UNMAPPED_IO flag in the cdevsw to setting SI_UNMAPPED in the cdev. Reviewed by: kib, jimharris MFC after: 1 week Sponsored by: Spectra Logic	2013-08-15 22:52:39 +00:00
mav	eba4a485b2	Return error when opening read-only volumes (like RAID4/5/...) for writing. Previously opens succeeded, but actual write operations returned errors. Requested by: peter MFC after: 2 weeks	2013-08-13 07:56:40 +00:00
mav	d9e76bbffc	Oops, wrong constant at r254269.	2013-08-13 06:25:34 +00:00
mav	1ddae2c9b4	Fix reasonable but safe Clang warnings.	2013-08-13 06:21:36 +00:00
ed	e591d48c3e	Fix the formatting of the error message. The G_MIRROR_DEBUG() macro already appends a newline. Also, most of the log messages emitted by gmirror start with an uppercase letter.	2013-08-12 18:17:45 +00:00
ae	4c7750d3a8	gpt_entries is used as limit for the number of partition entries in the GEOM_PART. Instead of just using number of entries from the GPT header, calculate this limit based on the reserved space between GPT header and first available LBA. MFC after: 2 weeks	2013-08-08 16:09:20 +00:00
marcel	9f2f2e171a	Change <sys/diskpc98.h> to not redefine the same symbols that are being defined in <sys/diskmbr.h>. Instead give the symbols here a "PC98_" prefix. This way, both <sys/diskmbr.h> and <sys/diskpc98.h> can be included in the same C source file. The renaming is trivial. The only gotcha is that DOSBBSECTOR is also redefined from 0 to 1. This is because DOSBBSECTOR was always used in conjunction with an addition of 1. The PC98_BBSECTOR symbol is defined as 1 and the expression is simplified. Note: it is not believed that ports are seriously impacted; or at all for that matter. Approved by: nyan@	2013-08-07 00:00:48 +00:00
marcel	a56cdf0d34	Remove inclusion of <sys/diskmbr.h>. We have no business knowing anything related to MBR in this file.	2013-08-04 21:00:22 +00:00
mav	2b433ed777	Introduce 3 seconds timeout on `graid stop` command (mostly with -f flag). Since completion waiting goes in g_event thread, it may cause GEOM deadlock if consumer on top (for example, ZFS) uses g_event thread for closing.	2013-07-27 15:02:19 +00:00
kib	28425e8270	When panicking due to the gjournal overflow, print the geom metadata journal id. Requested by: Andreas Longwitz <longwitz@incore.de> MFC after: 1 week	2013-07-10 10:11:43 +00:00
kib	a7b76b76e1	There are several code sequences like vfs_busy(mp); vfs_write_suspend(mp); which are problematic if other thread starts unmount between two calls. The unmount starts a write, while vfs_write_suspend() drain writers. On the other hand, unmount drains busy references, causing the deadlock. Add a flag argument to vfs_write_suspend and require the callers of it to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the mount path, i.e. the covered vnode is not locked. The suspension is not attempted if VS_SKIP_UNMOUNT is specified and unmount is in progress. Reported and tested by: Andreas Longwitz <longwitz@incore.de> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2013-07-09 20:49:32 +00:00
smh	c0361090f3	Bump disk(9) ABI version to signify the addition of d_delmaxsize by r249940. Ensure that d_delmaxsize is always set, removing init to 0 which could cause future issues if use cases change. Allow kern.cam.da.X.delete_max (which maps to d_delmaxsize) to be increased up to the calculated max after being reduced. MFC after: 1 day X-MFC-With: r249940	2013-07-03 23:46:30 +00:00
jeff	e725dd5c1e	- Add a general purpose resource allocator, vmem, from NetBSD. It was originally inspired by the Solaris vmem detailed in the proceedings of usenix 2001. The NetBSD version was heavily refactored for bugs and simplicity. - Use this resource allocator to allocate the buffer and transient maps. Buffer cache defrags are reduced by 25% when used by filesystems with mixed block sizes. Ultimately this may permit dynamic buffer cache sizing on low KVA machines. Discussed with: alc, kib, attilio Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-06-28 03:51:20 +00:00
scottl	0aa8657901	Fix a mystery cut-n-paste corruption from the previous commit. Submitted by: Brenden Fabeny	2013-06-19 23:09:10 +00:00
scottl	ecb835e525	Mark geom_mirror as capable of unmapped i/o Obtained from: Netflix MFC after: 3 days	2013-06-19 21:52:32 +00:00
mav	d745b9017b	Make CAM return and GEOM DISK pass through new GEOM::lunid attribute. SPC-4 specification states that serial number may be property of device, but not a specific logical unit. People reported about FC storages using serial number in that way, making it unusable for purposes of LUN multipath detection. SPC-4 states that designators associated with logical unit from the VPD page 83h "Device Identification" should be used for that purpose. Report first of them in the new attribute in such preference order: NAA, EUI-64, T10 and SCSI name string. While there, make GEOM DISK properly report GEOM::ident in XML output also using d_getattr() method, if available. This fixes serial numbers reporting for SCSI disks in `geom disk list` output and confxml. Discussed with: gibbs, ken Sponsored by: iXsystems, Inc. MFC after: 2 weeks	2013-06-12 13:36:20 +00:00
mav	0fc5c36eb6	Don't update provider properties and don't set DISKFLAG_OPEN if d_open() disk method call returned error. GEOM considers devices in such case as still closed, and won't call symmetric d_close() for them.	2013-06-11 10:06:07 +00:00
marcel	0c3f6d383d	Change the set and unset ctlreqs by making the index argument optional. This allows setting attributes on tables. One simply does not provide an index in that case. Otherwise the entry corresponding to the index has the attribute set or unset. Use this change to fix a relatively longstanding bug in our GPT scheme that's the result of rev 198097 (relatively harmless) followed by rev 237057 (damaging). The damaging part being that our GPT scheme always has the active flag set on the PMBR slice. This is in violation with EFI. Existing EFI implementations for both x86 and ia64 reject the GPT. As such, GPT disks created by us aren't usable under EFI because of that. After this change, GPT disks never have the active flag set on the PMBR slice. In order to make the GPT disk bootable under some x86 BIOSes, the reason of rev 198097, one must now set the active attribute on the gpt table. The kernel will apply this to the PMBR slice. For (S)ATA: gpart set -a active ada0 To fix an existing GPT disk that has the active flag set in the PMBR, and that does not need the flag, use (again for (S)ATA): gpart unset -a active ada0 The EBR, MBR & PC98 schemes, which also implement at least 1 attribute, now check to make sure the entry passed is valid. They do not have attributes that apply to the table.	2013-06-09 23:34:26 +00:00
marcel	30f527d554	Remove stub implementation.	2013-06-09 23:12:43 +00:00
brooks	eeccce2597	MFP4 @222836 Add support for partitioning CFI disks from FDT using geom_flashmap. Sponsored by: DARPA, AFRL	2013-05-30 01:19:02 +00:00
jh	ba5d89f702	Remove an extra semicolon from the DOT language output. PR: kern/178540 Submitted by: Trond Endrestol MFC after: 1 week	2013-05-21 18:40:54 +00:00
mav	da1d4b7e77	Fix vdc->Secondary_Element_Count metadata field access from 16 to 8 bit. In some cases it could cause kernel panic during failed drive replacement. Reported by: trasz MFC after: 1 week	2013-05-20 00:33:54 +00:00
stas	f6095fb25a	- Use int8_t type for the mftrecsz field in g_label_ntfs. char type used previously caused probe failure on platforms where char is unsigned (e.g. ARM), as mftrecsz can be negative. Submitted by: Ilya Bakulin <ilya@bakulin.de> MFC after: 2 weeks	2013-05-05 08:00:16 +00:00
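The signedness problem behind f6095fb25a can be shown in a few lines. This sketch assumes the usual NTFS convention that a negative `mftrecsz` encodes the record size as a power of two; the function names are hypothetical and the arithmetic is simplified compared with the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Correct variant: the field is read as a signed 8-bit value. */
static uint32_t
mft_record_size(int8_t mftrecsz, uint32_t cluster_bytes)
{
	int shift;

	if (mftrecsz > 0)
		return ((uint32_t)mftrecsz * cluster_bytes);
	shift = -mftrecsz;
	assert(shift < 32);	/* keep the shift well defined */
	return ((uint32_t)1 << shift);
}

/*
 * What a platform with unsigned plain char effectively computes:
 * the stored byte 0xF6 (-10) is seen as 246 and takes the wrong branch.
 */
static uint32_t
mft_record_size_unsigned(unsigned char mftrecsz, uint32_t cluster_bytes)
{
	return ((uint32_t)mftrecsz * cluster_bytes);
}
```

A stored value of -10 should yield a 1024-byte record. Read as an unsigned byte it becomes 246 clusters, a nonsensical size, which is consistent with the probe failure the commit describes.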
mav	c050756861	Return "descr" field alike to "Intel RAID1 volume" for GEOM RAID to make it look better in bsdinstall.	2013-04-27 06:57:39 +00:00
smh	0418759378	Teach GEOM and CAM about the difference between the max "size" of r/w and delete requests. sys/geom/geom_disk.h: - Added d_delmaxsize which represents the maximum size of individual device delete requests in bytes. This can be used by devices to inform geom of their size limitations regarding delete operations which are generally different from the read / write limits as data is not usually transferred from the host to physical device. sys/geom/geom_disk.c: - Use new d_delmaxsize to calculate the size of chunks passed through to the underlying strategy during deletes instead of using read / write optimised values. This defaults to d_maxsize if unset (0). - Moved d_maxsize default up so it can be used to default d_delmaxsize sys/cam/ata/ata_da.c: - Added d_delmaxsize calculations for TRIM and CFA sys/cam/scsi/scsi_da.c: - Added re-calculation of d_delmaxsize whenever delete_method is set. - Added kern.cam.da.X.delete_max sysctl which allows the max size for delete requests to be limited. This is useful in preventing timeouts on devices whose delete methods are slow. It should be noted that this limit is reset when the device delete method is changed and that it can only be lowered not increased from the device max. Reviewed by: mav Approved by: pjd (mentor)	2013-04-26 16:22:54 +00:00
smh	5e3cfe527c	Added a sysctl (kern.geom.dev.delete_max_sectors) to control the maximum size of a delete request sent to the providing device performed by g_dev_ioctl. This allows the kernel and apps via ioctl e.g. newfs -E to request large LBA deletes which significantly improves performance. Previously this was hard coded to 65536 sectors, the new default is 262144 which doubles the throughput of deletes on commonly available SSD's. In tests on an Intel 520 120GB FW: 400i disk it improved the delete throughput from 1.6GB/s to over 2.6GB/s on a full disk delete such as that done via newfs -E For some SSD's where delete time is pretty much constant, no matter what the request, setting this to 0 will provide significantly better throughput e.g. Samsung 840 240GB FW DXT07B0Q @ 262144 = 79G/s, @ 0 = 2259G/s Reviewed by: mav Approved by: pjd (mentor) MFC after: 2 weeks	2013-04-26 15:43:24 +00:00
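The chunking rule from 0418759378 (split deletes by d_delmaxsize, falling back to d_maxsize when it is 0) can be sketched as below. The helper names are hypothetical and the loop only counts chunks. The real geom_disk.c code clones and issues bios instead.

```c
#include <assert.h>
#include <stdint.h>

/* Use d_delmaxsize for deletes, defaulting to d_maxsize when unset (0). */
static uint64_t
effective_delmaxsize(uint64_t d_delmaxsize, uint64_t d_maxsize)
{
	return (d_delmaxsize != 0 ? d_delmaxsize : d_maxsize);
}

/* Size of the next chunk to hand to the underlying strategy routine. */
static uint64_t
next_delete_chunk(uint64_t remaining, uint64_t limit)
{
	assert(limit > 0);
	return (remaining < limit ? remaining : limit);
}

/* Count how many chunks a delete of `length` bytes is split into. */
static unsigned int
count_delete_chunks(uint64_t length, uint64_t d_delmaxsize,
    uint64_t d_maxsize)
{
	uint64_t limit = effective_delmaxsize(d_delmaxsize, d_maxsize);
	uint64_t off = 0;
	unsigned int n = 0;

	while (off < length) {
		off += next_delete_chunk(length - off, limit);
		n++;
	}
	return (n);
}
```

A device that advertises a large d_delmaxsize receives one request where the read/write limit would have forced many.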
ivoras	1028220198	Comment typo fix. Is aware of the importance of comments: dim	2013-04-16 22:42:40 +00:00
ivoras	b7d8771305	Fix the buffer-overflow-fixing fixes. Pointy-hat to: me, for not realizing snprintf() is available in kernel. Thanks to: jh, for bringing me the good news of snprintf(), Pawel Worach, for noting that the panic can be provoked in i386 and not in amd64	2013-04-16 19:58:24 +00:00
brooks	d7ad3cab59	Partial MFP4 of 222836: Only look for FDT partitions if our potential parent is a DISK device. Excluding direct recursion on the flashmap geoms was insufficient because it did not prevent the underlying device from being retrieved if flashmap geoms were further partitioned. Reviewed by: imp Sponsored by: DARPA, AFRL	2013-04-16 17:47:13 +00:00
ivoras	255c9a6acb	Introduce glabel labels based on GEOM ident attributes. In this initial implementation, err on the side of conservatism and only create labels for GEOMs of classes DISK and MULTIPATH. Discussed with: trasz Approved by: silence from freebsd-geom@	2013-04-15 16:09:24 +00:00
ivoras	4893ec0b76	Introduce a symbol for the GEOM class name instead of using the ad-hoc string constant.	2013-04-15 15:55:40 +00:00
jmg	b9f17d62db	move the error report to a lower log level... Now you can see when it returns an error without getting every single io that went through it.. MFC after: 1 week	2013-04-13 19:02:58 +00:00
trasz	249764dd66	Make it possible to submit FLUSH bios through geom_dev strategy. This is required for CTL to work with device-backed LUNs. Reviewed by: mav	2013-04-06 10:32:06 +00:00
mav	9414459699	Following r241022, replace iteration over the provider list on media events by taking first one and asserting that there are no others. MFC after: 1 week	2013-04-05 13:11:28 +00:00
mav	babac8c981	geom_slice.c and its consumers like GEOM_LABEL are not touching the data unless hotspots are used. Pass G_PF_ACCEPT_UNMAPPED flag through except such rare cases (obsolete GEOM_SUNLABEL and GEOM_BSD).	2013-03-26 07:55:24 +00:00
mav	9856e08103	GEOM NOP does not touch the data, so pass G_PF_ACCEPT_UNMAPPED flag through.	2013-03-26 05:58:49 +00:00
mav	7997002b4f	Remove extra bio_data and bio_length copying to child request after calling g_clone_bio(), that already copied them.	2013-03-26 05:42:12 +00:00
kan	49a21b7c2e	Do not pass unmapped buffers to drivers that cannot handle them In physio, check if device can handle unmapped IO and pass an appropriately mapped buffer to the driver strategy routine. The only driver in the tree that can handle unmapped buffers is one exposed by GEOM, so mark it as such with the new flag in the driver cdevsw structure. This fixes insta-panics on hosts, running dconschat, as /dev/fwmem is an example of the driver that makes use of physio routine, but bypasses the g_down thread, where the buffer gets mapped normally. Discussed with: kib (earlier version)	2013-03-26 01:17:06 +00:00
mav	cdfcce8d39	Make GEOM MULTIPATH report unmapped bio support if the underlying path reports it. GEOM MULTIPATH itself never touches the data and so is transparent.	2013-03-25 07:24:58 +00:00
mav	aebf1ef11e	In GEOM DISK: - Replace single done mutex with per-disk ones. On system with several disks on several HBAs that removes small, but measurable lock congestion. - Modify disk destruction process to not destroy the mutex prematurely. - Remove some extra pointer dereferences.	2013-03-25 05:45:24 +00:00
mav	6cd6934fad	Fix long known deadlock between geom dev destruction and d_close() call. Use destroy_dev_sched_cb() to not wait for device destruction while holding GEOM topology lock (that actually caused deadlock). Use request counting protected by mutex to properly wait for outstanding requests completion in cases of device closing and geom destruction. Unlike r227009, this code does not block taskqueue thread for indefinite time, waiting for completion.	2013-03-24 10:14:25 +00:00
mav	2ebdf6d693	Make g_wither_washer() to not loop by itself, but only when there was some more topology change done that may require its attention. Add few missing g_do_wither() calls in respective places to signal it. This fixes potential infinite loop here when some provider is withered, but still opened or connected for some reason and so can not be destroyed. For example, see r227009 and r227510.	2013-03-24 03:15:20 +00:00
kib	c966fdfb31	Correct the page count when excess length is trimmed from the bio. Reported and tested by: Ivan Klymenko <fidaj@ukr.net>	2013-03-21 22:36:43 +00:00
kib	20a66ac403	Assert that transient mapping of the bio is only done when unmapped buffers are allowed. Sponsored by: The FreeBSD Foundation	2013-03-21 07:26:33 +00:00
kib	4f250cea7a	The geom_part provider supports unmapped bio iff the underlying provider does so, since geom_part never inspects the bio_data. Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-19 14:50:24 +00:00
kib	aa960ab755	A flag for the geom disk driver to indicate that it accepts the unmapped i/o requests. Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-19 14:49:15 +00:00
kib	7c26a038f9	Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminates the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitly requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bios carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitly specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. 
In the rework, filesystem metadata is not subject to the maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, are accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not work, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks	2013-03-19 14:13:12 +00:00
pjd	2e500238dd	We don't need buffer to handle BIO_DELETE, so don't check buffer size for it. This fixes handling BIO_DELETE larger than MAXPHYS.	2013-03-14 23:07:01 +00:00
sbruno	22372779e5	Add legacy support to geom raid to create a /dev/arX device for support of upgrading older machines using ataraid(4) to newer releases. This optional parameter is controlled via kern.geom.raid.legacy_aliases and will create a /dev/ar0 device that will point at /dev/raid/r0 for example. Tested on Dell SC 1425 DDF-1 format software raid controllers installing from stable/7 and upgrading to stable/9 without having to adjust /etc/fstab Reviewed by: mav Obtained from: Yahoo! MFC after: 2 Weeks	2013-03-08 20:07:32 +00:00
dumbbell	bd2d452e4e	g_label_ntfs_taste: Abort taste if recsize == 0 This will avoid a 0-byte read (in g_read_data()) leading to a panic, if previously read data are erroneous. Suggested by: John-Mark Gurney <jmg@funkthat.com>	2013-03-08 18:07:43 +00:00
gavin	5d48955751	Support the FAT16 partition type in gpart(8) PR: kern/174714 Submitted by: 4721 at hushmail dot com MFC after: 1 week	2013-03-07 22:32:41 +00:00
mav	a1b987fb96	Fix panic when Secondary_Element_Count == 1 and Secondary_Element_Seq is not set (255). Reported by: sbruno MFC after: 1 week	2013-03-07 18:55:37 +00:00
dumbbell	b05b4c54a0	g_label_ntfs.c: Mark structures as __packed Without this, read data is mis-interpreted. This could trigger a panic, as was the case on one computer where computed "recsize" was zero, leading to a call to g_read_page() asking for 0 bytes.	2013-03-05 11:02:05 +00:00
attilio	5775bdb2a4	Remove ntfs headers dependency for g_label_ntfs.c by redefining the used structs and values. This patch is not targeted for MFC.	2013-03-02 18:23:59 +00:00
mckusick	c04f4382b3	Add barrier write capability to the VFS buffer interface. A barrier write is a disk write request that tells the disk that the buffer being written must be committed to the media along with any writes that preceded it before any future blocks may be written to the drive. Barrier writes are provided by adding the functions bbarrierwrite (bwrite with barrier) and babarrierwrite (bawrite with barrier). Following a bbarrierwrite the client knows that the requested buffer is on the media. It does not ensure that buffers written before that buffer are on the media. It only ensures that buffers written before that buffer will get to the media before any buffers written after that buffer. A flush command must be sent to the disk to ensure that all earlier written buffers are on the media. Reviewed by: kib Tested by: Peter Holm	2013-02-16 14:51:30 +00:00
avg	c245f91f95	g_mirror: g_getattr() failure should not be fatal This allows to use gmirror e.g. on top of ZVOLs. PR: kern/175323 Submitted by: Alexei.Volkov@softlynx.ru, mav Reported by: Alexei.Volkov@softlynx.ru Tested by: Alexei.Volkov@softlynx.ru Reviewed by: ae, mav, pjd MFC after: 1 week	2013-01-26 10:50:04 +00:00
mav	28491a8c65	- Fix rebuild position broken at r245522. - Identify one more metadata field.	2013-01-17 03:27:08 +00:00
mav	d3c13df6d1	For Promise/AMD metadata add support for disks with capacity above 2TiB and for volumes with sector size above 512 bytes.	2013-01-17 00:50:25 +00:00
mav	559b3a7eac	Recalculate volume size only for real CONCATs. For SINGLE trust volume size given by metadata, as it should be correct and in some cases can be smaller than subdisk size.	2013-01-17 00:09:50 +00:00
mav	97cf52e0e8	Allow to insert new component to geom_raid3 without specifying number. PR: kern/160562 MFC after: 2 weeks	2013-01-15 10:06:35 +00:00
mav	48d12751a4	Alike to r242314 for GRAID make GRAID3 more aggressive in marking volumes as clean on shutdown and move that action from shutdown_pre_sync stage to shutdown_post_sync to avoid extra flapping. ZFS tends to not close devices on shutdown, that doesn't allow GEOM RAID to shutdown gracefully. To handle that, mark volume as clean just when shutdown time comes and there are no active writes. MFC after: 2 weeks	2013-01-15 01:27:04 +00:00
mav	19b676e81d	Alike to r242314 for GRAID make GMIRROR more aggressive in marking volumes as clean on shutdown and move that action from shutdown_pre_sync stage to shutdown_post_sync to avoid extra flapping. ZFS tends to not close devices on shutdown, that doesn't allow GEOM RAID to shutdown gracefully. To handle that, mark volume as clean just when shutdown time comes and there are no active writes. PR: kern/113957 MFC after: 2 weeks	2013-01-15 01:13:55 +00:00
mav	1d08afc7e3	Keep value of orig_config_id metadata field. Windows driver writes there previous value of config_id when it is changed in some cases. I guess it may be used to avoid some split-brain conditions.	2013-01-14 20:31:45 +00:00
mav	7ed3ee172a	Small cosmetic tuning of the IRRT status constants.	2013-01-14 16:38:43 +00:00
mav	163aff2e8d	Print some more metadata fields.	2013-01-14 13:06:35 +00:00
mav	257051502a	Windows driver writes relative volume IDs to metadata field. Use that value as a hint for raid/rX device number to make it persistent across reboots.	2013-01-14 00:38:51 +00:00
mav	44f703ac3d	- Add checks for Intel metadata version and attributes. Ignore disks with unsupported metadata types like Intel Smart Response to not corrupt them. - Improve setting of these things during metadata writing to protect from incapable BIOS'es and other implementations.	2013-01-13 23:00:40 +00:00
mav	6157d3ce33	Improve support for disabled disks. If disabled disk disconnected and then reconnected back, leave it as disconnected. If new disk inserted instead of disabled, rebuild it and leave as enabled.	2013-01-13 14:30:37 +00:00
mav	8de7e63765	Windows handles INIT and VERIFY as array-wide and it doesn't specify which disks should be rebuilt. Our rebuild code is at the same time disk-centric. To handle this situation properly check all disks for RBLD flags, and if no disk specified try rebuild/resync all of them except newly inserted.	2013-01-12 21:51:49 +00:00
mav	960e9d02c6	Implement migration from single disk to RAID1/IRRT for Intel metadata. Windows driver uses such migration when it creates new arrays. While GEOM RAID has no mechanism to implement migration in general case, this specific case still can be handled easily via degraded RAID1 creation followed by regular rebuild.	2013-01-12 18:25:48 +00:00
mav	2a61b082bf	Add basic support for Intel Rapid Recover Technology (Intel RRT). It is alike to RAID1, but with dedicating master and recovery disks and providing manual control over synchronization. It allows to use recovery disk as snapshot of the master disk from the time of the last sync. This implementation is not functionally complete comparing to Windows, but it is better than silent conversion to RAID1 on first boot.	2013-01-12 09:35:44 +00:00
kib	a1f9360638	Add flags argument to vfs_write_resume() and remove vfs_write_resume_flags(). Sponsored by: The FreeBSD Foundation	2013-01-11 06:08:32 +00:00
pjd	a86b6e7ae5	Reset provider-specific fields when resending I/O request in low memory conditions. This fixes assertion which checks those fields when kernel is compiled with DIAGNOSTIC. Reported by: kib, pho MFC after: 1 week	2012-12-26 20:07:47 +00:00
jh	93c8ab3bc4	Mangle label names containing spaces, non-printable characters, '%' or '"'. Mangling is only done for label names read from file system metadata. Encoding resembles URL encoding. For example, the space character becomes %20. Help by: kib Discussed with: imp, kib, pjd	2012-12-22 13:43:12 +00:00
jh	25dd09b996	- Don't pass geom and provider names as format strings. - Add __printflike() attributes. - Remove an extra argument for the g_new_geomf() call in swapongeom_ev(). Reviewed by: pjd	2012-11-20 12:32:18 +00:00
alfred	4a74d2e51a	Provide a device name in the sysctl tree for programs to query the state of crashdump target devices. This will be used to add a "-l" (ell) flag to dumpon(8) to list the currently configured dumpdev. Reviewed by: phk	2012-11-01 17:01:05 +00:00
trasz	ff5b37a93e	Fix problem with geom_label(4) not recognizing UFS labels on filesystems extended using growfs(8). The problem here is that geom_label checks if the filesystem size recorded in UFS superblock is equal to the provider (i.e. device) size. This check cannot be removed due to backward compatibility. On the other hand, in most cases growfs(8) cannot set fs_size in the superblock to match the provider size, because, differently from newfs(8), it cannot recompute cylinder group sizes. To fix this problem, add another superblock field, fs_providersize, used only for this purpose. The geom_label(4) will attach if either fs_size (filesystem created with newfs(8)) or fs_providersize (filesystem expanded using growfs(8)) matches the device size. PR: kern/165962 Reviewed by: mckusick Sponsored by: FreeBSD Foundation	2012-10-30 21:32:10 +00:00
mav	d77bd5cf53	Minor addition to r242323: Alike to BIO_WRITE, report success if at least one subdisk succeeded with BIO_DELETE. But unlike BIO_WRITE don't fail disk on BIO_DELETE error. Sponsored by: iXsystems, Inc. MFC after: 1 month	2012-10-29 21:08:06 +00:00
mav	a43d540d9e	Add basic BIO_DELETE support to GEOM RAID class for all RAID levels. If at least one subdisk in the volume supports it, BIO_DELETE requests will be propagated down. Unfortunately, for RAID levels with redundancy unmapped blocks will be mapped back during first rebuild/resync process. Sponsored by: iXsystems, Inc. MFC after: 1 month	2012-10-29 18:04:38 +00:00
trasz	76f8fadfa8	Fix locking problem in disk_resize(); previously it would run without topology lock, resulting in assertion when running with DIAGNOSTIC. Reviewed by: mav (earlier version)	2012-10-29 17:52:43 +00:00
mav	fa229bcba8	Make GEOM RAID more aggressive in marking volumes as clean on shutdown and move that action from shutdown_pre_sync to shutdown_post_sync stage to avoid extra flapping. ZFS tends to not close devices on shutdown, that doesn't allow GEOM RAID to shutdown gracefully. To handle that, mark volume as clean just when shutdown time comes and there are no active writes. MFC after: 2 weeks	2012-10-29 14:18:54 +00:00
kib	560aa751e0	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
attilio	ff4947736d	It seems that it is preferable to keep support for glabel also for filesystems that we don't support natively. Revert part of r241636 to do so. This patch is not targeted for MFC. Requested by: gleb, jhb	2012-10-18 22:18:11 +00:00
attilio	85c1a64cec	Disconnect non-MPSAFE NTFS from the build in preparation for dropping GIANT from VFS. This code is particularly broken and fragile and other in-kernel implementations around, found in other operating systems, don't really seem clean and solid enough to be imported at all. If someone wants to reconsider in-kernel NTFS implementation for inclusion again, a fair effort for completely fixing and cleaning it up is expected. In the meantime NTFS regular users can use FUSE interface and ntfs-3g port to work with their NTFS partitions. This is not targeted for MFC.	2012-10-17 11:30:00 +00:00
mav	7849b3fa4d	NULL-ify last previously used pointer instead of last possible pointer. This should be only a cosmetic change. Found by: Clang Static Analyzer	2012-10-10 20:41:37 +00:00
mav	bfb53c205c	Make graid command line a bit more friendly by allowing volume name or provider name to be specified instead of geom name (first argument in all subcommands except label). In most cases there is only one array used any way, so it is not really useful to make user type ugly geom names like Intel-f0bdf223 or SiI-732c2b9448cf. Though they can be used in some cases. Sponsored by: iXsystems, Inc. MFC after: 1 month	2012-10-07 19:30:16 +00:00
avg	da6a14b6d9	g_part_taste: directly destroy consumer and geom here, no need for withering Besides withered but still alive consumers may interfere with re-tasting. MFC after: 16 days	2012-10-06 19:52:50 +00:00
pjd	785571bd2b	Remove the topology lock from disk_gone(), it might be called with regular mutexes held and the topology lock is an sx lock. The topology lock was there to protect traversing through the list of providers of disk's geom, but it seems that disk's geom has always exactly one provider. Change the code to call g_wither_provider() for this one provider, which is safe to do without holding the topology lock and assert that there is indeed only one provider. Discussed with: ken MFC after: 1 week	2012-09-28 08:22:51 +00:00
pjd	04357ca182	Use the topology lock to protect list of providers while withering them. It is possible that provider is destroyed while we are iterating over the list. Reported by: Brian Parkison <parkison@panzura.com> Discussed with: phk MFC after: 1 week	2012-09-22 12:41:49 +00:00
avg	975f35c7e7	g_disk_flushcache definitely should not be traced under G_T_TOPOLOGY ... use G_T_BIO instead MFC after: 1 week	2012-09-18 07:57:34 +00:00
mav	db9e01aca9	Add global and per-module sysctls/tunables to enable/disable metadata taste. That should help to handle some cases when disk has some RAID metadata that should be ignored, especially during boot. MFC after: 3 days	2012-09-13 13:27:09 +00:00
glebius	d14a6db068	When synchronizing, include in the config dump amount of bytes synchronized. The rationale behind this is the following: for large disks the percent synchronisation counter ticks too seldom, and monitoring software (as well as human operator) can't tell whether synchronisation goes on or one of disks got stuck. On an idle server one can look into gstat and see whether synchronisation goes on or not, but on a busy server that won't work. Also, new value monitored can be differentiated obtaining the synchronisation speed quite precisely. Submitted by: Konstantin Kukushkin <dark ramtel.ru> Reviewed by: pjd	2012-09-11 20:20:13 +00:00
pjd	12303b5db5	Allow to pass providers with /dev/ prefix to g_provider_by_name(). MFC after: 3 days	2012-09-01 10:52:19 +00:00
ed	099a431e7f	Remove unneeded G_PF_CANDELETE flag. This flag is only used by GEOM so it can be propagated to the character device's SI_CANDELETE. Unfortunately, SI_CANDELETE seems to do nothing.	2012-08-28 19:28:31 +00:00
thomas	d5129fde3c	(g_multipath_rotate): Fix algorithm so that it does rotate over all good providers, not just the last two. PR: kern/170379 Reviewed by: mav MFC after: 2 weeks	2012-08-25 10:36:31 +00:00
pjd	50fe3717a6	Always initialize sc_ekey, because as of r238116 it is always used. If GELI provider was created on FreeBSD HEAD r238116 or later (but before this change), it is using very weak keys and the data is not protected. The bug was introduced on 4th July 2012. One can verify if its provider was created with weak keys by running: # geli dump <provider> | grep version If the version is 7 and the system didn't include this fix when provider was initialized, then the data has to be backed up, underlying provider overwritten with random data, system upgraded and provider recreated. Reported by: Fabian Keil <fk@fabiankeil.de> Tested by: Fabian Keil <fk@fabiankeil.de> Discussed with: so MFC after: 3 days	2012-08-10 18:43:29 +00:00
mav	9c79bfbda2	Add missing FAILED event to g_raid_subdisk_event2str() to print it properly in debug messages. Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com>	2012-08-10 13:36:33 +00:00
jimharris	187bdeb55e	Clone BIO_ORDERED flag, for disk drivers (namely CAM) that try to consume it. Sponsored by: Intel Discussed with: gibbs, scottl	2012-08-07 20:16:10 +00:00
trociny	be00f071cd	In g_gate_dumpconf() always check the result of g_gate_hold(). This fixes "Negative sc_ref" panic possible when sysctl_kern_geom_confxml() is run simultaneously with destroying GATE device. Reviewed by: pjd MFC after: 3 days	2012-08-07 18:50:33 +00:00
jimharris	af53082837	In virstor_ctl_stop(), check for a valid softc before trying to update metadata. Sponsored by: Intel Reported and tested by: Marcelo Gondim <gondim at bsdinfo dot com dot br> PR: kern/170199 MFC after: 3 days	2012-08-03 20:24:16 +00:00
thomas	671119f048	New command "gmultipath prefer" to force selection of a specified provider in an Active/Passive configuration. Reviewed by: mav MFC after: 4 weeks	2012-08-03 14:55:35 +00:00
mav	2e17d98c0d	Partially revert r238886 in part of GEOM_VFS spoiling. This change triggered interesting foot shooting condition in GEOM when RW access to root partition by fsck spoils VFS geom there, which has it opened RO at the same time. Seems spoiling concept needs some rework.	2012-07-29 20:04:09 +00:00
mav	24017b5387	Implement media change notification for DA and CD removable media devices. It includes three parts: 1) Modifications to CAM to detect media changes and report them to disk(9) layer. For modern SATA (and potentially UAS) devices it utilizes Asynchronous Notification mechanism to receive events from hardware. Active polling with TEST UNIT READY commands with 3 seconds period is used for incapable hardware. After that both CD and DA drivers work the same way, detecting two conditions: "NOT READY: Medium not present" after medium was detected previously, and "UNIT ATTENTION: Not ready to ready change, medium may have changed". First one reported to disk(9) as media removal, second as media insert/change. To reliably receive second event new AC_UNIT_ATTENTION async added to make UAs broadcasted to all periphs by generic error handling code in cam_periph_error(). 2) Modifications to GEOM core to handle media remove and change events. Media removal handled by spoiling all consumers attached to the provider. Media change event also schedules provider retaste after spoiling to probe new media. New flag G_CF_ORPHAN was added to consumers to reflect that consumer is in process of destruction. It allows retaste to create new geom instance of the same class, while previous one is still dying. 3) Modifications to some GEOM classes: DEV -- to report media change events to devd; VFS -- to handle spoiling same as orphan to prevent accessing replaced media. PART class already handles spoiling alike to orphan. Reviewed by: silence on geom@ and scsi@ Tested by: avg Sponsored by: iXsystems, Inc. / PC-BSD MFC after: 2 months	2012-07-29 11:51:48 +00:00
trociny	5792096bc6	Reorder things in g_gate_create() so at the moment when g_new_geomf() is called name is properly initialized. Discussed with: pjd MFC after: 2 weeks	2012-07-28 16:30:50 +00:00
trasz	b5f0adea7e	Make it possible to resize opened partitions. Sponsored by: FreeBSD Foundation	2012-07-20 17:51:20 +00:00
trasz	de6625db1c	Add missing free.	2012-07-18 07:26:20 +00:00
ken	1a9d4d87e6	Add back spare fields consumed in r237545. It seems that these should only be consumed to maintain backward compatibility in stable, but should not be consumed in head. Submitted by: trasz, attilio (indirectly)	2012-07-17 22:16:10 +00:00
trasz	7686baf110	The resize GEOM event has no references, thus cannot be canceled.	2012-07-16 17:41:38 +00:00
trasz	7368f44f0f	Add back spare fields reused in r238213. According to Attilio, the rule is to reuse spares only when MFC-ing, not in CURRENT.	2012-07-16 16:50:28 +00:00
trasz	b9655c9adb	Add trivial resize handling to gnop(8). Reviewed by: mav Sponsored by: FreeBSD Foundation	2012-07-07 22:22:13 +00:00
trasz	92879956ed	Add trivial resize handling to gmountver(8). Reviewed by: mav Sponsored by: FreeBSD Foundation	2012-07-07 22:20:47 +00:00
trasz	19d6a6d81b	Add disk_resize(), to make it possible for the disk drivers such as da(4) to notify GEOM about LUN size change. Reviewed by: mav (earlier version) Sponsored by: FreeBSD Foundation	2012-07-07 21:28:31 +00:00
trasz	f2e3e9e073	Add a new GEOM method, resize(), which is called after provider size changes. Add a new routine, g_resize_provider(), to use to notify GEOM about provider change. Reviewed by: mav Sponsored by: FreeBSD Foundation	2012-07-07 20:13:40 +00:00
trasz	c29dc3c961	Fix orphan() methods of several GEOM classes to not assume that there is an error set on the provider. With GEOM resizing, class can become orphaned when it doesn't implement resize() method and the provider size decreases. Reviewed by: mav Sponsored by: FreeBSD Foundation	2012-07-07 17:09:44 +00:00
trasz	ee444dc295	Fix typo in the comment.	2012-07-06 15:46:38 +00:00
pjd	5ef9eb30da	Extend GEOM Gate class to handle read I/O requests directly within the kernel. This will allow HAST to read directly from the local component without even communicating userland daemon. Sponsored by: Panzura, http://www.panzura.com MFC after: 1 month	2012-07-04 20:16:28 +00:00
pjd	38de8ef1dd	Use correct part of the Master-Key for generating encryption keys. Before this change the IV-Key was used to generate encryption keys, which was incorrect, but safe - for the XTS mode this key was unused anyway and for CBC mode it was used differently to generate IV vectors, so there is no risk that IV vector collides with encryption key somehow. Bump version number and keep compatibility for older versions. MFC after: 2 weeks	2012-07-04 17:54:17 +00:00
pjd	25a4db982f	Correct comment. MFC after: 3 days	2012-07-04 17:44:39 +00:00
pjd	7c1cf16027	Correct a comment and correct style of a flag check. MFC after: 3 days	2012-07-04 17:43:25 +00:00
glebius	1a62bb3224	Make geom_mirror more friendly to SSDs. To properly support TRIM, we need to pass BIO_DELETE requests down to providers that support it. Also, we need to announce our support for BIO_DELETE to upper consumer. This requires: - In g_mirror_start() return true for "GEOM::candelete" request. - In g_mirror_init_disk() probe below provider for "GEOM::candelete" attribute, and mark disk with a flag if it does support BIO_DELETE. - In g_mirror_register_request() distribute BIO_DELETE requests only to those disks, that do support it. Note that we announce "GEOM::candelete" as true unconditionally of whether we have TRIM-capable media down below or not. This is made intentionally, because upper consumer (usually UFS) requests the attribute only once at mount time. And if user ever migrates his mirror from HDDs to SSDs, then he/she would get TRIM working without remounting filesystem. Reviewed by: pjd	2012-07-01 15:43:52 +00:00
glebius	2c999b9f21	In g_mirror_regular_request() upon successful delivery treat BIO_DELETE requests same way as BIO_WRITE removing them from queue. This fixes panic with BIO_DELETE operations on geom_mirror. Reviewed by: pjd	2012-07-01 15:30:43 +00:00
imp	a7cffbbaef	Use %j to match intmax_t.	2012-07-01 05:22:13 +00:00
brooks	829d43b761	MFP4 #212266 Fix compile on MIPS64. Sponsored by: DARPA, AFRL	2012-06-29 20:15:00 +00:00
ken	b81d46dab2	In g_disk_providergone(), don't continue if the softc is NULL. This may be the case if we've already gone through g_disk_destroy(). Reported by: Michael Butler <imb@protected-networks.net> MFC after: 3 days	2012-06-27 16:05:09 +00:00
ken	9365119416	Consume spare fields for the providergone pointers added to the g_class and g_geom structures in change 237518. The original change would have broken the ABI. Suggested by: ae MFC after: 4 days	2012-06-25 04:26:10 +00:00
ken	be54b17782	Fix a bug which causes a panic in daopen(). The panic is caused by a da(4) instance going away while GEOM is still probing it. In this case, the GEOM disk class instance has been created by disk_create(), and the taste of the disk is queued in the GEOM event queue. While that event is queued, the da(4) instance goes away. When the open call comes into the da(4) driver, it dereferences the freed (but non-NULL) peripheral pointer provided by GEOM, which results in a panic. The solution is to add a callback to the GEOM disk code that is called when all of its resources are cleaned up. This is implemented inside GEOM by adding an optional callback that is called when all consumers have detached from a provider, and the provider is about to be deleted. scsi_cd.c, scsi_da.c: In the register routine for the cd(4) and da(4) routines, acquire a reference to the CAM peripheral instance just before we call disk_create(). Use the new GEOM disk d_gone() callback to register a callback (dadiskgonecb()/cddiskgonecb()) that decrements the peripheral reference count once GEOM has finished cleaning up its resources. In the cd(4) driver, clean up open and close behavior slightly. GEOM makes sure we only get one open() and one close call, so there is no need to set an open flag and decrement the reference count if we are not the first open. In the cd(4) driver, use cam_periph_release_locked() in a couple of error scenarios to avoid extra mutex calls. geom.h: Add a new, optional, providergone callback that is called when a provider is about to be deleted. geom_disk.h: Add a new d_gone() callback to the GEOM disk interface. Bump the DISK_VERSION to version 2. This probably should have been done after a couple of previous changes, especially the addition of the d_getattr() callback. geom_disk.c: Add a providergone callback for the disk class, g_disk_providergone(), that calls the user's d_gone() callback if it exists. Bump the DISK_VERSION to 2. 
geom_subr.c: In g_destroy_provider(), call the providergone callback if it has been provided. In g_new_geomf(), propagate the class's providergone callback to the new geom instance. blkfront.c: Callers of disk_create() are supposed to pass in DISK_VERSION, not an explicit disk API version number. Update the blkfront driver to do that. disk.9: Update the disk(9) man page to include information on the new d_gone() callback, as well as the previously added d_getattr() callback, d_descr field, and HBA PCI ID fields. MFC after: 5 days	2012-06-24 04:29:03 +00:00
ae	181a5cf0b0	Always reconstruct partition entries in the PMBR when Boot Camp is disabled. This helps to easily recover from situations when PMBR is damaged and contains no entries. MFC after: 1 week	2012-06-14 11:17:54 +00:00
mav	bd7c494553	Add missing newlines into XML output. MFC after: 3 days Sponsored by: iXsystems, Inc.	2012-06-05 16:46:34 +00:00
marcel	5306b1eeab	Add a partition type for nandfs to the apm, bsd, gpt and vtoc8 schemes. The gpart alias for these partition types is "freebsd-nandfs".	2012-05-25 20:33:34 +00:00
trasz	e85afbafb3	Revert r235918 for now and add comment explaining the reason for the size check.	2012-05-25 10:08:48 +00:00
trasz	fa26d41fa3	Make g_label(4) ignore provider size when looking for UFS labels. Without it, it fails to create labels for filesystems resized by growfs(8). PR: kern/165962 Submitted by: Olivier Cochard-Labbe <olivier at cochard dot me>	2012-05-24 16:48:33 +00:00
delphij	3eb6aeaebe	- Correct signedness for casts; - Wrap long line while I'm there. Noticed by: pjd, avg	2012-05-23 20:51:21 +00:00
delphij	582b1102f5	Use %ju to match uintmax_t usage	2012-05-23 18:17:02 +00:00
delphij	1b7baa2fc1	Use %j and cast off_t to intmax_t for now to fix build. Noticed by: bz	2012-05-23 17:49:59 +00:00
gber	a0e3acd0ee	Add a new geom class which allows to divide NAND Flash chip into partitions. Partitions are created based on data in dts file which are extracted and interpreted by slicer. Obtained from: Semihalf Supported by: FreeBSD Foundation, Juniper Networks	2012-05-22 08:33:14 +00:00
ae	c16abb371b	Prevent removing of the last active component from a mirror. PR: kern/154860 Reviewed by: pjd MFC after: 1 week	2012-05-18 09:22:21 +00:00
ae	a243137af2	Introduce new device flag G_MIRROR_DEVICE_FLAG_TASTING. It should protect geom from destroying while it is tasting. PR: kern/154860 Reviewed by: pjd MFC after: 1 week	2012-05-18 09:19:07 +00:00
eadler	badd8b3abc	Add missing period at the end of the error message Submitted by: pjd Approved by: cperciva (implicit) MFC after: 3 days X-MFC-With: r235201	2012-05-13 23:27:06 +00:00
mav	7e5e00e55f	- Prevent error status leak if write to some of the RAID1/1E volume disks failed while write to some other succeeded. Instead mark disk as failed. - Make RAID1E less aggressive in failing disks to avoid volume breakage. MFC after: 2 weeks	2012-05-11 13:20:17 +00:00
eadler	58bd935b72	Clarify error that geli generates when it finds corrupt data. PR: kern/165695 Submitted by: Robert Simmons <rsimmons0@gmail.com> Reviewed by: pjd Approved by: cperciva MFC after: 1 week	2012-05-09 17:26:52 +00:00
mav	64e3d8819b	Remove some hardcoded constants from code.	2012-05-06 16:41:27 +00:00
mav	6710f450f7	Plug small memory leaks.	2012-05-06 12:55:20 +00:00
mav	3d44dd0fea	Add support for RAID5R. Slightly improve support for RAIDMDF.	2012-05-06 11:32:36 +00:00
mav	997ac8e508	Fix `gmultipath configure` for big-endian machines. MFC after: 1 week	2012-05-06 05:49:23 +00:00
mav	3f57d6ecd5	Fix bug causing memory corruption and panics with big-endian metadata.	2012-05-04 08:59:19 +00:00
mav	4ed58415ed	Implement read-only support for volumes in optimal state (without using redundancy) for the following RAID levels: RAID4/5E/5EE/6/MDF.	2012-05-04 07:32:57 +00:00
mav	6a0688c8fd	Add optional -o argument to the `graid label` to specify some metadata format options. Use it for specifying byte order for the DDF metadata: big-endian defined by specification and little-endian used by Adaptec.	2012-05-03 05:32:56 +00:00
mav	ecf215ed8d	Improve spare disks support. Unluckily, for some reason Adaptec 1430SA RAID BIOS doesn't want to understand spare disks created by graid. But at least spares created by BIOS are working fine now.	2012-05-01 18:00:31 +00:00
mav	08b90a5b47	Implement volume deletion if disk has more than one partition.	2012-05-01 09:21:21 +00:00
mav	3a7fb06834	Improve DDF metadata writing.	2012-05-01 08:19:29 +00:00
mav	dbcc3abc52	Add to GEOM RAID class module, supporting the DDF metadata format, as defined by the SNIA Common RAID Disk Data Format Specification v2.0. Supports multiple volumes per array and multiple partitions per disk. Supports standard big-endian and Adaptec's little-endian byte ordering. Supports all single-layer RAID levels. Dual-layer RAID levels except RAID10 are not supported now because of GEOM RAID design limitations. Some work is still to be done, but the present code already manages basic interoperation with RAID BIOS of the Adaptec 1430SA SATA RAID controller. MFC after: 1 month Sponsored by: iXsystems, Inc.	2012-04-30 17:53:02 +00:00
mav	1781eecdcd	s/gmirror/graid/	2012-04-29 19:40:50 +00:00
mav	27867e6fc9	Fix RAID5 level names changed at r234603.	2012-04-27 08:49:15 +00:00
mav	511879d765	Fix copy-paste typo in r234603. Submitted by: kan	2012-04-23 16:35:19 +00:00
mav	2e83ed7d13	Add names for all primary RAID levels defined by DDF 2.0 specification.	2012-04-23 13:04:02 +00:00
mav	c283985a30	Add sos@ copyrights to RAID metadata modules, respecting his efforts in decoding metadata formats in ataraid(4) code.	2012-04-23 09:39:39 +00:00
mav	fdb56713f2	Add to GEOM RAID class module for reading non-degraded RAID5 volumes and some environment to differentiate 4 possible RAID5 on-disk layouts. Tested with Intel and AMD RAID BIOSes. MFC after: 2 weeks	2012-04-19 12:30:12 +00:00
marck	a91354cc7e	VMware environments are not unusual now. Add VMware partitions recognition (both MBR for ESXi <= 4.1 and GPT for ESXi 5) to g_part. Reviewed by: ae Approved by: ae MFC after: 2 weeks	2012-04-18 11:59:03 +00:00
mav	91acf8bc72	Some improvements to GEOM MULTIPATH: - Implement "configure" command to allow switching operation mode of running device on-fly without destroying and recreation. - Implement Active/Read mode as hybrid of Active/Active and Active/Passive. In this mode all paths not marked FAIL may handle reads same time, but unlike Active/Active only one path handles write requests at any point in time. It allows to closer follow original write request order if above layers need it for data consistency (not waiting for requisite write completion before sending dependent write). - Hide duplicate messages about device status change. - Remove periodic thread wake up with 10Hz rate. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2012-04-18 09:42:14 +00:00
mckusick	614fd4fe5e	Expand locking around identification of filesystem mount point when accounting for I/O counts at completion of I/O operation. Also switch from using global devmtx to vnode mutex to reduce contention. Suggested and reviewed by: kib	2012-04-08 06:20:21 +00:00
ae	0c8f817a74	VMDB offset should be greater than logical volume size only for MBR.	2012-03-29 07:29:27 +00:00
ae	c982232271	Do proper cleanup for the GPT case when an error occurs.	2012-03-29 06:37:02 +00:00
mckusick	9a7982e5a0	Keep track of the mount point associated with a special device to enable the collection of counts of synchronous and asynchronous reads and writes for its associated filesystem. The counts are displayed using `mount -v'. Ensure that buffers used for paging indicate the vnode from which they are operating so that counts of paging I/O operations from the filesystem are collected. This checkin only adds the setting of the mount point for the UFS/FFS filesystem, but it would be trivial to add the setting and clearing of the mount point at filesystem mount/unmount time for other filesystems too. Reviewed by: kib	2012-03-28 20:49:11 +00:00
ae	7232ff0e78	Check that scheme is not already registered. This may happen when a KLD is preloaded with loader(8) and leads to infinite loops. Also do not return EEXIST error code from MOD_LOAD handler, because we have undocumented(?) ability to replace kernel's module with preloaded one. And if we have so, then preloaded module will be initialized first. Thus error in MOD_LOAD handler will be triggered for the kernel. PR: kern/165573 MFC after: 3 weeks	2012-03-23 07:26:17 +00:00
ae	7049ead2d4	Add CTLFLAG_TUN to sysctls. MFC after: 1 month	2012-03-19 13:21:10 +00:00
ae	63a3c6125e	Add new GEOM_PART_LDM module that implements the Logical Disk Manager scheme. The LDM is a logical volume manager for MS Windows NT and it is also known as dynamic volumes. It supports about 2000 partitions and also provides the capability for software RAID implementations. This version implements only partitioning scheme capability and based on the linux-ntfs project documentation and several publications across the Web. NOTE: JBOD, RAID0 and RAID5 volumes aren't supported. An access to the LDM metadata is read-only. When LDM is on the disk partitioned with MBR we can also destroy metadata. For the GPT partitioned disks destroy action is not supported. Reviewed by: ivoras (previous version) MFC after: 1 month	2012-03-19 13:14:44 +00:00
ae	1fa040c42d	Make kern.geom.part node not static. Also add CTLFLAG_TUN to the check_integrity sysctl. MFC after: 1 month	2012-03-19 12:57:52 +00:00
ae	924dbaa61e	Add MODULE_DEPEND() to geom_part modules. MFC after: 2 weeks	2012-03-15 08:39:10 +00:00
emaste	459dac9ae1	Remove unactionable message about label geometry It's not clear to a user what they should do after seeing the "geometry does not match label" kernel message, and it does not appear to present a problem in practice. Thus, just remove the messages. Approved by: marcel	2012-03-08 01:48:44 +00:00
ae	d92d49e46f	If nested scheme allows dump kernel to its partition, we may allow dump for the parent partition too. MFC after: 2 weeks	2012-02-20 06:35:52 +00:00
ae	db69f3b1ae	Add alias for the partition type 0x0f. Now "ebr" name is used for both types 0x05 and 0x0f, but 0x05 is preferred and used when partition is created with "gpart add -t ebr ...". This should keep EBR partitions accessible after r231754 for those, who have EBR on the partition with type 0x0f.	2012-02-20 05:48:57 +00:00
ae	e5861657ac	Add additional check to EBR probe and create methods: don't try to probe and create EBR scheme when parent partition type is not "ebr". This fixes error messages about corrupted EBR for some partitions where there is actually another partition scheme. NOTE: if you have EBR on a partition with a type other than "ebr" (0x05), then you will lose access to partitions until it is changed. MFC after: 2 weeks	2012-02-15 10:33:29 +00:00
ae	3ccaf222af	Add PART::type attribute handler. It returns partition type as string. MFC after: 2 weeks	2012-02-15 10:02:19 +00:00
ae	3c2d86e2b5	Add alias for the partition with type 0x42 to the MBR scheme. MFC after: 1 week	2012-02-10 09:55:18 +00:00
ae	7a50922fdb	Let's be more realistic and limit maximum number of partition to 4k. MFC after: 1 week	2012-02-10 06:44:30 +00:00
kib	52c17430bc	Current implementations of sync(2) and syncer vnode fsync() VOP use mnt_noasync counter to temporarily remove MNTK_ASYNC mount option, which is needed to guarantee a synchronous completion of the initiated i/o before syscall or VOP return. Global removal of MNTK_ASYNC option is harmful because not only i/o started from corresponding thread becomes synchronous, but all i/o is synchronous on the filesystem which is initiated during sync(2) or syncer activity. Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local thread flag to disable async i/o for current thread only. Use the opportunity to move DOINGASYNC() macro into sys/vnode.h and consistently use it through places which tested for MNTK_ASYNC. Some testing demonstrated 60-70% improvements in run time for the metadata-intensive operations on async-mounted UFS volumes, but still with great deviation due to other reasons. Reviewed by: mckusick Tested by: scottl MFC after: 2 weeks	2012-02-06 11:04:36 +00:00
emaste	d6c0a587a3	Correct typo in comment (numbver)	2012-02-04 18:14:39 +00:00
ae	cc4809fb9a	The scheme code may not know about some inconsistency in the metadata. So, add an integrity check after recovery attempt. MFC after: 1 week	2012-02-01 09:28:16 +00:00
attilio	1521eb4479	Avoid checking the same cache line/variable from all the locking primitives by breaking stop_scheduler into a per-thread variable. Also, store the new td_stopsched very close to td_*locks members as they will be accessed mostly in the same codepaths as td_stopsched and this results in avoiding a further cache-line pollution, possibly. STOP_SCHEDULER() was pondered to use a new 'thread' argument, in order to take advantage of already cached curthread, but in the end there should not really be a performance benefit, while introducing a KPI breakage. In collaboration with: flo Reviewed by: avg MFC after: 3 months (or never) X-MFC: r228424	2012-01-28 14:00:21 +00:00
nwhitehorn	b41629d499	Experimental support for booting CHRP-type PowerPC systems from hard disks.	2012-01-25 03:37:39 +00:00
truckman	906ade7185	Allow an MBR primary or extended Linux swap partition to be specified as the system dump device. This was already allowed for GPT. The Linux swap metadata at the beginning of the partition should not be disturbed because the crash dump is written at the end. Reviewed by: alfred, pjd, marcel MFC after: 2 weeks	2012-01-13 18:32:56 +00:00
jimharris	7b24e93323	Add support for >2TB disks in GEOM RAID for Intel metadata format. Reviewed by: mav Approved by: scottl MFC after: 1 week	2012-01-09 23:01:42 +00:00
ray	f86cbc8446	GEOM_UNCOMPRESS module, can be used with uzip images and with new ulzma images. Approved by: adrian (mentor)	2012-01-04 23:39:11 +00:00
avg	abb1713421	replace uses of libkern gets with cngets MFC after: 2 months	2011-12-17 15:26:34 +00:00
mav	96d9af62bb	Close race between geom destruction on g_vfs_close() when softc destroyed and g_vfs_orphan() call that tries to access softc, introduced at r227015. PR: kern/162997	2011-12-02 17:09:48 +00:00
ae	65be46df50	Add an ability to increase number of allocated APM entries when we have reserved free space in the APM area. Also instead of one write request per each APM entry, use MAXPHY sized writes when we are updating APM. MFC after: 1 month	2011-11-28 16:07:26 +00:00
ae	a267596c2a	The size of APM could be bigger than number of already allocated entries. And the first usable sector should not start from the inside of APM area. MFC after: 1 month	2011-11-28 12:38:24 +00:00
mav	37d5d5108c	Temporarily revert r227009 to fix freeze on UP systems without PREEMPTION. Before r215687, if some withered geom or provider could not be destroyed, g_event thread went to sleep for 0.1s before retrying. After that change it is just restarting immediately. r227009 made orphaned (withered) provider to not detach immediately, but only after context switch. That made loop inside g_event thread infinite on UP systems without PREEMPTION. To address original problem with possible deadlock addressed by r227009 we have to fix r215687 change first, that needs some time to think and test.	2011-11-14 19:32:05 +00:00
mav	ec5f778a89	Major GEOM MULTIPATH class rewrite: - Improved locking and destruction process to fix crashes. - Improved "automatic" configuration method to make it consistent and safe by reading metadata back from all specified paths after writing to one. - Added provider size check to reduce chance of ordering conflict with other GEOM classes. - Added "manual" configuration method without using on-disk metadata. - Added "add" and "remove" commands to allow manage paths manually. - Failed paths are no longer dropped from geom, but only marked as FAIL and excluded from I/O operations. - Automatically restore failed paths when all others paths are marked as failed, for example, because of device-caused (not transport) errors. - Added "fail" and "restore" commands to manually control FAIL flag. - geom is now destroyed on last path disconnection. - Added optional Active/Active mode support. Unlike Active/Passive mode, load evenly distributed between all working paths. If supported by the device, it allows to significantly improve performance, utilizing bandwidth of all paths. It is controlled by -A option during creation. Disabled by default now. - Improved `status` and `list` commands output. Sponsored by: iXsystems, inc. MFC after: 1 month	2011-11-12 09:52:27 +00:00
ed	0c56cf839d	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
ed	e97eae1577	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
mav	f082fe73d7	Add mutex and two flags to make orphan() call properly asynchronous: - delay consumer closing and detaching on orphan() until all I/Os complete; - prevent new I/Os submission after orphan() called. Previous implementation could destroy consumers still having active requests and worked only because of global workaround made on GEOM level.	2011-11-02 09:24:59 +00:00
mav	98d097d322	Make orphan() method in geom_dev asynchronous using destroy_dev_sched_cb() instead of destroy_dev(). It moves device destruction waiting out of the topology lock and so fixes dead lock between orphanization and closing. Real provider and geom destruction called from swi context after device destroyed as callback of the destroy_dev_sched_cb().	2011-11-01 23:12:22 +00:00
mav	973c472be5	Refactor disk disconnection and geom destruction handling sequences. Do not close/destroy opened consumer directly in case of disconnect. Instead keep it existing until it will be closed in regular way in response to upstream provider destruction. Delay geom destruction in the same way. Previous implementation could destroy consumers still having active requests and worked only because of global workaround made on GEOM level.	2011-11-01 20:56:19 +00:00
mav	1cbd492c70	Refactor disk disconnection and geom destruction handling sequences. Do not close/destroy opened consumer directly in case of disconnect. Instead keep it existing until it will be closed in regular way in response to upstream provider destruction. Delay geom destruction in the same way. Previous implementation could destroy consumers still having active requests and worked only because of global workaround made on GEOM level.	2011-11-01 17:04:42 +00:00
mav	80b1c68a33	Workaround the problem introduced by combination of r162200 and r215687. r162200 delays provider orphanization until all running requests complete, to workaround broken orphan() method implementation in some classes. r215687 removes persistent periodic (10Hz) event thread wake ups. Together these changes can indefinitely delay orphanization until some other event wake up the event thread. One consequence of this is inability of CAM to destroy device disconnected when busy and, as consequence, create new one after reconnection. While the best solution would be to revert r162200, it is not easy, as some classes still look broken in that way. Instead conditionally wake up event thread if there are some providers waiting for orphanization. MFC after: 1 week	2011-11-01 08:57:49 +00:00
ae	97fe037955	Our geom withering function could take some time before geom with its providers and consumers will be destroyed. Before take some actions with a geom, check that it is not destroyed at the moment. Tested by: nwhitehorn MFC after: 1 week	2011-10-28 11:45:24 +00:00
pjd	247ca8dfcf	Before this change when GELI detected hardware crypto acceleration it would start only one worker thread. For software crypto it would start by default N worker threads where N is the number of available CPUs. This is not optimal if hardware crypto is AES-NI, which uses CPU for AES calculations. Change that to always start one worker thread for every available CPU. Number of worker threads per GELI provider can be easily reduced with kern.geom.eli.threads sysctl/tunable and even for software crypto it should be reduced when using more providers. While here, when number of threads exceeds number of CPUs available don't reduce this number, assume the user knows what he is doing. Reported by: Yuri Karaban <dev@dev97.com> MFC after: 3 days	2011-10-27 16:12:25 +00:00

2091 Commits