freebsd-nq

Author	SHA1	Message	Date
Andrey V. Elsukov	743437c451	Add missing line breaks. PR: 181900 MFC after: 1 week	2013-11-11 11:13:12 +00:00
Xin LI	7ac2e58818	When zero'ing out a buffer, make sure we are using right size. Without this change, in the worst but unlikely case scenario, certain administrative operations, including change of configuration, set or delete key from a GEOM ELI provider, may leave potentially sensitive information in buffer allocated from kernel memory. We believe that it is not possible to actively exploit these issues, nor does it impact the security of normal usage of GEOM ELI providers when these operations are not performed after system boot. Security: possible sensitive information disclosure Submitted by: Clement Lecigne <clecigne google com> MFC after: 3 days	2013-11-02 01:16:10 +00:00
John Baldwin	d6d78db57f	Reject attempts to attach a disk device that has the old NEEDSGIANT flag set. Reviewed by: mav	2013-10-25 19:19:12 +00:00
Steven Hartland	c28078e903	Improve ZFS N-way mirror read performance by using load and locality information. The existing algorithm selects a preferred leaf vdev based on offset of the zio request modulo the number of members in the mirror. It assumes the devices are of equal performance and that spreading the requests randomly over both drives will be sufficient to saturate them. In practice this results in the leaf vdevs being underutilized. The new algorithm takes into account the following additional factors: * Load of the vdevs (number of outstanding I/O requests) * The locality of the last queued I/O vs the new I/O request. Within the locality calculation additional knowledge about the underlying vdev is considered, such as whether the device backing the vdev is a rotating media device. This results in performance increases across the board as well as significant increases for predominantly streaming loads and for configurations which don't have evenly performing devices. The following are results from a setup with a 3-way mirror with 2 x HDs and 1 x SSD from a basic test running multiple parallel dd's. With pre-fetch disabled (vfs.zfs.prefetch_disable=1): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s With pre-fetch enabled (vfs.zfs.prefetch_disable=0): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s In addition to the performance changes the code was also restructured, with the help of Justin Gibbs, to provide a more logical flow which also ensures vdev loads are only calculated from the set of valid candidates. The following additional sysctls were added to allow the administrator to tune the behaviour of the load algorithm: * vfs.zfs.vdev.mirror.rotating_inc * vfs.zfs.vdev.mirror.rotating_seek_inc * vfs.zfs.vdev.mirror.rotating_seek_offset * vfs.zfs.vdev.mirror.non_rotating_inc * vfs.zfs.vdev.mirror.non_rotating_seek_inc These changes were based on work started by the zfsonlinux developers: https://github.com/zfsonlinux/zfs/pull/1487 Reviewed by: gibbs, mav, will MFC after: 2 weeks Sponsored by: Multiplay	2013-10-23 09:54:58 +00:00
Mateusz Guzik	aa25ccfa36	gnop: make sure that newly allocated memory for softc is zeroed This prevents mtx_init from encountering non-zeros and panicking the kernel as a result. Reported by: Keith White <kwhite site.uottawa.ca>	2013-10-23 01:34:18 +00:00
Alexander Motin	1a29adad30	Remove support for Giant-locked drivers (DISKFLAG_NEEDSGIANT flag) from disk(9). Since at least FreeBSD 7 we had only four of them in the base tree, and in head branch, thanks to jhb@, we have had none for more than a year.	2013-10-22 10:21:20 +00:00
Alexander Motin	40ea77a036	Merge GEOM direct dispatch changes from the projects/camlock branch. When safety requirements are met, it allows I/O requests to skip the GEOM g_up/g_down threads and execute directly in the caller context instead. That avoids CPU bottlenecks in the g_up/g_down threads, plus several context switches per I/O. The safety requirements now defined are: - caller should not hold any locks and should be reentrant; - callee should not depend on GEOM dual-threaded concurrency semantics; - on the way down, if request is unmapped while callee doesn't support it, the context should be sleepable; - kernel thread stack usage should be below 50%. To keep compatibility with GEOM classes not meeting the above requirements, new provider and consumer flags were added: - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request); - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done); - G_PF_DIRECT_SEND -- provider code meets caller requirements (done); - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request). A capable GEOM class can set them, allowing direct dispatch in cases where it is safe. If any of the requirements are not met, the request is queued to the g_up or g_down thread as before. These GEOM classes were reviewed and updated to support direct dispatch: CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE, VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL, MAP, FLASHMAP, etc). To declare direct completion capability, the disk(9) KPI got a new flag equivalent to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. The da(4) and ada(4) disk drivers now have it set, thanks to earlier CAM locking work. This change more than doubles peak block storage performance on systems with many CPUs; together with earlier CAM locking changes it reaches more than 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to 256 user-level threads). Sponsored by: iXsystems, Inc. MFC after: 2 months	2013-10-22 08:22:19 +00:00
Edward Tomasz Napierala	fb0e57b1a2	Fix build with gcc by spelling unused format string as "unused" instead of NULL. MFC after: 29 days	2013-10-19 08:20:00 +00:00
Edward Tomasz Napierala	19e5b2d50e	Make geom_label(4) resize-aware. This fixes a situation when "gpart resize" would resize a partition, but label providers - e.g. /dev/gptid/XXX - would stay the same size. Reviewed by: mav MFC after: 1 month Sponsored by: FreeBSD Foundation	2013-10-18 09:14:19 +00:00
Andrey V. Elsukov	884c8e4fea	Add automatic resize support to the GEOM_PART class. When the parent provider has been resized, the scheme-specific G_PART_RESIZE method updates the scheme's metadata. These changes are not saved to disk until `gpart commit` is called. Discussed with: trasz MFC after: 1 month	2013-10-17 16:18:43 +00:00
Alexander Motin	b43560ab19	MFprojects/camlock r256445: Add unmapped I/O support to GEOM RAID.	2013-10-16 09:33:23 +00:00
Alexander Motin	21d0712c33	MFprojects/camlock r256371: Fix passing uninitialized bio_resid argument to g_trace().	2013-10-16 09:21:40 +00:00
Alexander Motin	0fd2511ae2	MFprojects/camlock r254907: Move g_io_deliver() out of the lock, as required for direct dispatch. Move g_destroy_bio() out too to reduce lock scope even more.	2013-10-16 09:18:01 +00:00
Alexander Motin	e431d66c04	MFprojects/camlock r254905: Introduce new function devstat_end_transaction_bio_bt(), adding new argument to specify present time. Use this function to move binuptime() out of lock, substantially reducing lock congestion when slow timecounter is used.	2013-10-16 09:12:40 +00:00
Dag-Erling Smørgrav	1b2cb2b3f0	Introduce a kern.geom.notaste sysctl that can be used to temporarily disable GEOM tasting to avoid the "bouncing GEOM" problem where, when you shut down the consumer of a provider which can be viewed in multiple ways (typically a mirror whose members are labeled partitions), GEOM will immediately taste that provider's alter ego and reattach the consumer. Approved by: re (glebius)	2013-09-24 20:05:16 +00:00
Andrey V. Elsukov	87c0c612d8	Remove stub implementation. MFC after: 1 week	2013-09-05 09:44:09 +00:00
Alexander Motin	19351a14eb	Make ELI destruction (including orphanization) less aggressive, making it always wait for provider close. The old algorithm was reported to cause a NULL dereference panic on an attempt to close the provider after softc destruction. If not for a global workaround in GEOM, it could even cause destruction with requests still in flight.	2013-09-02 10:44:54 +00:00
Alexander Motin	3843eba85d	MFprojects/camlock r254895: Add unmapped BIO support to GEOM ZERO if kern.geom.zero.clear is cleared.	2013-08-26 20:39:02 +00:00
Alexander Motin	40f27d7cf6	Add new attribute lunname to report only textual LUN-specific device IDs. While lunid attribute prefers to report numeric ones, having both may be useful in some situations.	2013-08-24 09:42:14 +00:00
Kenneth D. Merry	ce625ec719	Change the way that unmapped I/O capability is advertised. The previous method was to set the D_UNMAPPED_IO flag in the cdevsw for the driver. The problem with this is that in many cases (e.g. sa(4)) there may be some instances of the driver that can handle unmapped I/O and some that can't. The isp(4) driver can handle unmapped I/O, but the esp(4) driver currently cannot. The cdevsw is shared among all driver instances. So instead of setting a flag on the cdevsw, set a flag on the cdev. This allows drivers to indicate support for unmapped I/O on a per-instance basis. sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it with an SI_UNMAPPED cdev flag. kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine whether or not a particular driver can handle unmapped I/O. geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs. Since GEOM will create a temporary mapping when needed, setting SI_UNMAPPED unconditionally will work. Remove the D_UNMAPPED_IO flag. nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here if NVME_UNMAPPED_BIO_SUPPORT is enabled. vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a cdev instead of the D_UNMAPPED_IO flag on the cdevsw. sys/param.h: Bump __FreeBSD_version to `1000045` for the switch from setting the D_UNMAPPED_IO flag in the cdevsw to setting SI_UNMAPPED in the cdev. Reviewed by: kib, jimharris MFC after: 1 week Sponsored by: Spectra Logic	2013-08-15 22:52:39 +00:00
Alexander Motin	0f0b2fd889	Return error when opening read-only volumes (like RAID4/5/...) for writing. Previously opens succeeded, but actual write operations returned errors. Requested by: peter MFC after: 2 weeks	2013-08-13 07:56:40 +00:00
Alexander Motin	db8645f05e	Oops, wrong constant at r254269.	2013-08-13 06:25:34 +00:00
Alexander Motin	e70b565ba4	Fix reasonable but safe Clang warnings.	2013-08-13 06:21:36 +00:00
Ed Schouten	647a92d62b	Fix the formatting of the error message. The G_MIRROR_DEBUG() macro already appends a newline. Also, most of the log messages emitted by gmirror start with an uppercase letter.	2013-08-12 18:17:45 +00:00
Andrey V. Elsukov	b74dd6c77b	gpt_entries is used as a limit for the number of partition entries in GEOM_PART. Instead of just using the number of entries from the GPT header, calculate this limit based on the reserved space between the GPT header and the first available LBA. MFC after: 2 weeks	2013-08-08 16:09:20 +00:00
Marcel Moolenaar	e01c6f329a	Change <sys/diskpc98.h> to not redefine the same symbols that are being defined in <sys/diskmbr.h>. Instead give the symbols here a "PC98_" prefix. This way, both <sys/diskmbr.h> and <sys/diskpc98.h> can be included in the same C source file. The renaming is trivial. The only gotcha is that DOSBBSECTOR is also redefined from 0 to 1. This is because DOSBBSECTOR was always used in conjunction with an addition of 1. The PC98_BBSECTOR symbol is defined as 1 and the expression is simplified. Note: it is not believed that ports are seriously impacted; or at all for that matter. Approved by: nyan@	2013-08-07 00:00:48 +00:00
Marcel Moolenaar	b9fdaa9b19	Remove inclusion of <sys/diskmbr.h>. We have no business knowing anything related to MBR in this file.	2013-08-04 21:00:22 +00:00
Alexander Motin	8531bb3f0c	Introduce a 3-second timeout on the `graid stop` command (mostly with -f flag). Since completion waiting happens in the g_event thread, it may cause a GEOM deadlock if the consumer on top (for example, ZFS) uses the g_event thread for closing.	2013-07-27 15:02:19 +00:00
Konstantin Belousov	a4a65e69c6	When panicking due to the gjournal overflow, print the geom metadata journal id. Requested by: Andreas Longwitz <longwitz@incore.de> MFC after: 1 week	2013-07-10 10:11:43 +00:00
Konstantin Belousov	cc3d8c35f5	There are several code sequences like vfs_busy(mp); vfs_write_suspend(mp); which are problematic if another thread starts unmount between the two calls. The unmount starts a write, while vfs_write_suspend() drains writers. On the other hand, unmount drains busy references, causing the deadlock. Add a flag argument to vfs_write_suspend and require the callers of it to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the mount path, i.e. the covered vnode is not locked. The suspension is not attempted if VS_SKIP_UNMOUNT is specified and unmount is in progress. Reported and tested by: Andreas Longwitz <longwitz@incore.de> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2013-07-09 20:49:32 +00:00
Steven Hartland	8383a92e5b	Bump disk(9) ABI version to signify the addition of d_delmaxsize by r249940. Ensure that d_delmaxsize is always set, removing init to 0 which could cause future issues if use cases change. Allow kern.cam.da.X.delete_max (which maps to d_delmaxsize) to be increased up to the calculated max after being reduced. MFC after: 1 day X-MFC-With: r249940	2013-07-03 23:46:30 +00:00
Jeff Roberson	5f51836645	- Add a general purpose resource allocator, vmem, from NetBSD. It was originally inspired by the Solaris vmem detailed in the proceedings of usenix 2001. The NetBSD version was heavily refactored for bugs and simplicity. - Use this resource allocator to allocate the buffer and transient maps. Buffer cache defrags are reduced by 25% when used by filesystems with mixed block sizes. Ultimately this may permit dynamic buffer cache sizing on low KVA machines. Discussed with: alc, kib, attilio Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-06-28 03:51:20 +00:00
Scott Long	f07b69478e	Fix a mystery cut-n-paste corruption from the previous commit. Submitted by: Brenden Fabeny	2013-06-19 23:09:10 +00:00
Scott Long	2084cbe975	Mark geom_mirror as capable of unmapped i/o Obtained from: Netflix MFC after: 3 days	2013-06-19 21:52:32 +00:00
Alexander Motin	ccba710262	Make CAM return and GEOM DISK pass through new GEOM::lunid attribute. SPC-4 specification states that serial number may be property of device, but not a specific logical unit. People have reported FC storage using the serial number in that way, making it unusable for purposes of LUN multipath detection. SPC-4 states that designators associated with logical unit from the VPD page 83h "Device Identification" should be used for that purpose. Report the first of them in the new attribute, in this order of preference: NAA, EUI-64, T10 and SCSI name string. While there, make GEOM DISK properly report GEOM::ident in XML output also using d_getattr() method, if available. This fixes serial number reporting for SCSI disks in `geom disk list` output and confxml. Discussed with: gibbs, ken Sponsored by: iXsystems, Inc. MFC after: 2 weeks	2013-06-12 13:36:20 +00:00
Alexander Motin	c145d6005f	Don't update provider properties and don't set DISKFLAG_OPEN if d_open() disk method call returned error. GEOM considers devices in such case as still closed, and won't call symmetric d_close() for them.	2013-06-11 10:06:07 +00:00
Marcel Moolenaar	3bd22a9cc8	Change the set and unset ctlreqs by making the index argument optional. This allows setting attributes on tables. One simply does not provide an index in that case. Otherwise the entry corresponding to the index has the attribute set or unset. Use this change to fix a relatively longstanding bug in our GPT scheme that's the result of rev 198097 (relatively harmless) followed by rev 237057 (damaging). The damaging part being that our GPT scheme always has the active flag set on the PMBR slice. This is in violation of EFI. Existing EFI implementations for both x86 and ia64 reject the GPT. As such, GPT disks created by us aren't usable under EFI because of that. After this change, GPT disks never have the active flag set on the PMBR slice. In order to make the GPT disk bootable under some x86 BIOSes, the reason for rev 198097, one must now set the active attribute on the gpt table. The kernel will apply this to the PMBR slice. For (S)ATA: gpart set -a active ada0 To fix an existing GPT disk that has the active flag set in the PMBR, and that does not need the flag, use (again for (S)ATA): gpart unset -a active ada0 The EBR, MBR & PC98 schemes, which also implement at least 1 attribute, now check to make sure the entry passed is valid. They do not have attributes that apply to the table.	2013-06-09 23:34:26 +00:00
Marcel Moolenaar	0f4389991c	Remove stub implementation.	2013-06-09 23:12:43 +00:00
Brooks Davis	444e780150	MFP4 @222836 Add support for partitioning CFI disks from FDT using geom_flashmap. Sponsored by: DARPA, AFRL	2013-05-30 01:19:02 +00:00
Jaakko Heinonen	9641a51279	Remove an extra semicolon from the DOT language output. PR: kern/178540 Submitted by: Trond Endrestol MFC after: 1 week	2013-05-21 18:40:54 +00:00
Alexander Motin	57eed4a86f	Fix vdc->Secondary_Element_Count metadata field access from 16 to 8 bit. In some cases it could cause kernel panic during failed drive replacement. Reported by: trasz MFC after: 1 week	2013-05-20 00:33:54 +00:00
Stanislav Sedov	77f8606428	- Use int8_t type for the mftrecsz field in g_label_ntfs. char type used previously caused probe failure on platforms where char is unsigned (e.g. ARM), as mftrecsz can be negative. Submitted by: Ilya Bakulin <ilya@bakulin.de> MFC after: 2 weeks	2013-05-05 08:00:16 +00:00
Alexander Motin	bcb6ad36f2	Return a "descr" field like "Intel RAID1 volume" for GEOM RAID to make it look better in bsdinstall.	2013-04-27 06:57:39 +00:00
Steven Hartland	9fe9ba5bef	Teach GEOM and CAM about the difference between the max "size" of r/w and delete requests. sys/geom/geom_disk.h: - Added d_delmaxsize which represents the maximum size of individual device delete requests in bytes. This can be used by devices to inform geom of their size limitations regarding delete operations which are generally different from the read / write limits as data is not usually transferred from the host to physical device. sys/geom/geom_disk.c: - Use new d_delmaxsize to calculate the size of chunks passed through to the underlying strategy during deletes instead of using read / write optimised values. This defaults to d_maxsize if unset (0). - Moved d_maxsize default up so it can be used to default d_delmaxsize sys/cam/ata/ata_da.c: - Added d_delmaxsize calculations for TRIM and CFA sys/cam/scsi/scsi_da.c: - Added re-calculation of d_delmaxsize whenever delete_method is set. - Added kern.cam.da.X.delete_max sysctl which allows the max size for delete requests to be limited. This is useful in preventing timeouts on devices whose delete methods are slow. It should be noted that this limit is reset when the device delete method is changed and that it can only be lowered not increased from the device max. Reviewed by: mav Approved by: pjd (mentor)	2013-04-26 16:22:54 +00:00
Steven Hartland	6f926c0b82	Added a sysctl (kern.geom.dev.delete_max_sectors) to control the maximum size of a delete request sent to the providing device performed by g_dev_ioctl. This allows the kernel and apps via ioctl e.g. newfs -E to request large LBA deletes which significantly improves performance. Previously this was hard coded to 65536 sectors, the new default is 262144 which doubles the throughput of deletes on commonly available SSD's. In tests on an Intel 520 120GB FW: 400i disk it improved the delete throughput from 1.6GB/s to over 2.6GB/s on a full disk delete such as that done via newfs -E For some SSD's where delete time is pretty much constant, no matter what the request, setting this to 0 will provide significantly better throughput e.g. Samsung 840 240GB FW DXT07B0Q @ 262144 = 79G/s, @ 0 = 2259G/s Reviewed by: mav Approved by: pjd (mentor) MFC after: 2 weeks	2013-04-26 15:43:24 +00:00
Ivan Voras	8e9405e8a7	Comment typo fix. Is aware of the importance of comments: dim	2013-04-16 22:42:40 +00:00
Ivan Voras	9a796b22f6	Fix the buffer-overflow-fixing fixes. Pointy-hat to: me, for not realizing snprintf() is available in kernel. Thanks to: jh, for bringing me the good news of snprintf(), Pawel Worach, for noting that the panic can be provoked in i386 and not in amd64	2013-04-16 19:58:24 +00:00
Brooks Davis	b7b63db789	Partial MFP4 of 222836: Only look for FDT partitions if our potential parent is a DISK device. Excluding direct recursion on the flashmap geoms was insufficient because it did not prevent the underlying device from being retrieved if flashmap geoms were further partitioned. Reviewed by: imp Sponsored by: DARPA, AFRL	2013-04-16 17:47:13 +00:00
Ivan Voras	c072011223	Introduce glabel labels based on GEOM ident attributes. In this initial implementation, error on the side of conservatism and only create labels for GEOMs of classes DISK and MULTIPATH. Discussed with: trasz Approved by: silence from freebsd-geom@	2013-04-15 16:09:24 +00:00
Ivan Voras	252c094e53	Introduce a symbol for the GEOM class name instead of using the ad-hoc string constant.	2013-04-15 15:55:40 +00:00
John-Mark Gurney	d7078f3ba0	move the error report to a lower log level... Now you can see when it returns an error without getting every single io that went through it.. MFC after: 1 week	2013-04-13 19:02:58 +00:00
Edward Tomasz Napierala	16fac6c92a	Make it possible to submit FLUSH bios through geom_dev strategy. This is required for CTL to work with device-backed LUNs. Reviewed by: mav	2013-04-06 10:32:06 +00:00
Alexander Motin	0fb832fdf0	Following r241022, replace iteration over the provider list on media events by taking first one and asserting that there is no others. MFC after: 1 week	2013-04-05 13:11:28 +00:00
Alexander Motin	7868ec506b	geom_slice.c and its consumers like GEOM_LABEL are not touching the data unless hotspots are used. Pass G_PF_ACCEPT_UNMAPPED flag through except in such rare cases (obsolete GEOM_SUNLABEL and GEOM_BSD).	2013-03-26 07:55:24 +00:00
Alexander Motin	6c6e13b6e1	GEOM NOP does not touch the data, so pass G_PF_ACCEPT_UNMAPPED flag through.	2013-03-26 05:58:49 +00:00
Alexander Motin	a93c0ed463	Remove extra bio_data and bio_length copying to child request after calling g_clone_bio(), that already copied them.	2013-03-26 05:42:12 +00:00
Alexander Kabaev	31932fae1e	Do not pass unmapped buffers to drivers that cannot handle them In physio, check if device can handle unmapped IO and pass an appropriately mapped buffer to the driver strategy routine. The only driver in the tree that can handle unmapped buffers is one exposed by GEOM, so mark it as such with the new flag in the driver cdevsw structure. This fixes insta-panics on hosts, running dconschat, as /dev/fwmem is an example of the driver that makes use of physio routine, but bypasses the g_down thread, where the buffer gets mapped normally. Discussed with: kib (earlier version)	2013-03-26 01:17:06 +00:00
Alexander Motin	f4673017b3	Make GEOM MULTIPATH report unmapped bio support if the underlying path reports it. GEOM MULTIPATH itself never touches the data and so is transparent.	2013-03-25 07:24:58 +00:00
Alexander Motin	30ba747160	In GEOM DISK: - Replace single done mutex with per-disk ones. On system with several disks on several HBAs that removes small, but measurable lock congestion. - Modify disk destruction process to not destroy the mutex prematurely. - Remove some extra pointer dereferences.	2013-03-25 05:45:24 +00:00
Alexander Motin	3c330aff3f	Fix long known deadlock between geom dev destruction and d_close() call. Use destroy_dev_sched_cb() to not wait for device destruction while holding GEOM topology lock (that actually caused deadlock). Use request counting protected by mutex to properly wait for outstanding requests completion in cases of device closing and geom destruction. Unlike r227009, this code does not block taskqueue thread for indefinite time, waiting for completion.	2013-03-24 10:14:25 +00:00
Alexander Motin	50199fa0d0	Make g_wither_washer() not loop by itself, but only when there was some more topology change done that may require its attention. Add a few missing g_do_wither() calls in respective places to signal it. This fixes a potential infinite loop here when some provider is withered, but still opened or connected for some reason and so can not be destroyed. For example, see r227009 and r227510.	2013-03-24 03:15:20 +00:00
Konstantin Belousov	e808788c05	Correct the page count when excess length is trimmed from the bio. Reported and tested by: Ivan Klymenko <fidaj@ukr.net>	2013-03-21 22:36:43 +00:00
Konstantin Belousov	6c83fce371	Assert that transient mapping of the bio is only done when unmapped buffers are allowed. Sponsored by: The FreeBSD Foundation	2013-03-21 07:26:33 +00:00
Konstantin Belousov	db7bfaa8ce	The geom_part provider supports unmapped bio iff the underlying provider does so, since geom_part never inspects the bio_data. Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-19 14:50:24 +00:00
Konstantin Belousov	f8c19ba466	A flag for the geom disk driver to indicate that it accepts the unmapped i/o requests. Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-19 14:49:15 +00:00
Konstantin Belousov	ee75e7de7b	Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminates the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitly requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carries a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitly specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable; disabling it makes the buffer (or cluster) creation requests ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is no longer subject to the maxbufspace limit. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, are accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not work, because buffer_map fragmentation does not allow the limit to be reached. At Jeff Roberson's request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks	2013-03-19 14:13:12 +00:00
Pawel Jakub Dawidek	c4d2d401f8	We don't need buffer to handle BIO_DELETE, so don't check buffer size for it. This fixes handling BIO_DELETE larger than MAXPHYS.	2013-03-14 23:07:01 +00:00
Sean Bruno	bd9fba0cfe	Add legacy support to geom raid to create a /dev/arX device for support of upgrading older machines using ataraid(4) to newer releases. This optional parameter is controlled via kern.geom.raid.legacy_aliases and will create a /dev/ar0 device that will point at /dev/raid/r0 for example. Tested on Dell SC 1425 DDF-1 format software raid controllers installing from stable/7 and upgrading to stable/9 without having to adjust /etc/fstab Reviewed by: mav Obtained from: Yahoo! MFC after: 2 Weeks	2013-03-08 20:07:32 +00:00
Jean-Sébastien Pédron	f5c1ef84f9	g_label_ntfs_taste: Abort taste if recsize == 0 This will avoid a 0-byte read (in g_read_data()) leading to a panic, if previously read data are erroneous. Suggested by: John-Mark Gurney <jmg@funkthat.com>	2013-03-08 18:07:43 +00:00
Gavin Atkinson	10f29053d2	Support the FAT16 partition type in gpart(8) PR: kern/174714 Submitted by: 4721 at hushmail dot com MFC after: 1 week	2013-03-07 22:32:41 +00:00
Alexander Motin	34d3281c57	Fix panic when Secondary_Element_Count == 1 and Secondary_Element_Seq is not set (255). Reported by: sbruno MFC after: 1 week	2013-03-07 18:55:37 +00:00
Jean-Sébastien Pédron	5943eed4b9	g_label_ntfs.c: Mark structures as __packed Without this, read data is mis-interpreted. This could trigger a panic, as was the case on one computer where computed "recsize" was zero, leading to a call to g_read_page() asking for 0 bytes.	2013-03-05 11:02:05 +00:00
Attilio Rao	0f90e981cb	Remove ntfs headers dependency for g_label_ntfs.c by redefining the used structs and values. This patch is not targeted for MFC.	2013-03-02 18:23:59 +00:00
Kirk McKusick	2bc1a1fe5c	Add barrier write capability to the VFS buffer interface. A barrier write is a disk write request that tells the disk that the buffer being written must be committed to the media along with any writes that preceded it before any future blocks may be written to the drive. Barrier writes are provided by adding the functions bbarrierwrite (bwrite with barrier) and babarrierwrite (bawrite with barrier). Following a bbarrierwrite the client knows that the requested buffer is on the media. It does not ensure that buffers written before that buffer are on the media. It only ensures that buffers written before that buffer will get to the media before any buffers written after that buffer. A flush command must be sent to the disk to ensure that all earlier written buffers are on the media. Reviewed by: kib Tested by: Peter Holm	2013-02-16 14:51:30 +00:00
Andriy Gapon	1f1088b843	g_mirror: g_getattr() failure should not be fatal This allows to use gmirror e.g. on top of ZVOLs. PR: kern/175323 Submitted by: Alexei.Volkov@softlynx.ru, mav Reported by: Alexei.Volkov@softlynx.ru Tested by: Alexei.Volkov@softlynx.ru Reviewed by: ae, mav, pjd MFC after: 1 week	2013-01-26 10:50:04 +00:00
Alexander Motin	c3ec009a97	- Fix rebuild position broken at r245522. - Identify one more metadata field.	2013-01-17 03:27:08 +00:00
Alexander Motin	821a0f639e	For Promise/AMD metadata add support for disks with capacity above 2TiB and for volumes with sector size above 512 bytes.	2013-01-17 00:50:25 +00:00
Alexander Motin	ed8180e665	Recalculate volume size only for real CONCATs. For SINGLE, trust the volume size given by metadata, as it should be correct and in some cases can be smaller than the subdisk size.	2013-01-17 00:09:50 +00:00
Alexander Motin	2c6a273750	Allow inserting a new component into geom_raid3 without specifying its number. PR: kern/160562 MFC after: 2 weeks	2013-01-15 10:06:35 +00:00
Alexander Motin	f62c1a47d6	Like r242314 for GRAID, make GRAID3 more aggressive in marking volumes as clean on shutdown, and move that action from the shutdown_pre_sync stage to shutdown_post_sync to avoid extra flapping. ZFS tends not to close devices on shutdown, which prevents GEOM RAID from shutting down gracefully. To handle that, mark the volume as clean as soon as shutdown time comes and there are no active writes. MFC after: 2 weeks	2013-01-15 01:27:04 +00:00
Alexander Motin	cbab616174	Like r242314 for GRAID, make GMIRROR more aggressive in marking volumes as clean on shutdown, and move that action from the shutdown_pre_sync stage to shutdown_post_sync to avoid extra flapping. ZFS tends not to close devices on shutdown, which prevents GEOM RAID from shutting down gracefully. To handle that, mark the volume as clean as soon as shutdown time comes and there are no active writes. PR: kern/113957 MFC after: 2 weeks	2013-01-15 01:13:55 +00:00
Alexander Motin	4c10c25e33	Keep the value of the orig_config_id metadata field. In some cases, when config_id is changed, the Windows driver writes its previous value there. I guess it may be used to avoid some split-brain conditions.	2013-01-14 20:31:45 +00:00
Alexander Motin	eb84fc957c	Small cosmetic tuning of the IRRT status constants.	2013-01-14 16:38:43 +00:00
Alexander Motin	511c69d9ce	Print some more metadata fields.	2013-01-14 13:06:35 +00:00
Alexander Motin	898a4b74f4	The Windows driver writes relative volume IDs to a metadata field. Use that value as a hint for the raid/rX device number to make it persistent across reboots.	2013-01-14 00:38:51 +00:00
Alexander Motin	f9462b9bbe	- Add checks for Intel metadata version and attributes. Ignore disks with unsupported metadata types, like Intel Smart Response, to avoid corrupting them. - Improve setting of these values during metadata writing to protect against incapable BIOSes and other implementations.	2013-01-13 23:00:40 +00:00
Alexander Motin	b99586c25f	Improve support for disabled disks. If a disabled disk is disconnected and then reconnected, leave it disconnected. If a new disk is inserted in place of a disabled one, rebuild it and leave it enabled.	2013-01-13 14:30:37 +00:00
Alexander Motin	865aea63c3	Windows handles INIT and VERIFY as array-wide and doesn't specify which disks should be rebuilt. Our rebuild code, on the other hand, is disk-centric. To handle this situation properly, check all disks for RBLD flags, and if no disk is specified, try to rebuild/resync all of them except newly inserted ones.	2013-01-12 21:51:49 +00:00
Alexander Motin	4c95a24141	Implement migration from a single disk to RAID1/IRRT for Intel metadata. The Windows driver uses such migration when it creates new arrays. While GEOM RAID has no mechanism to implement migration in the general case, this specific case can still be handled easily via degraded RAID1 creation followed by a regular rebuild.	2013-01-12 18:25:48 +00:00
Alexander Motin	26c538bc0b	Add basic support for Intel Rapid Recover Technology (Intel RRT). It is similar to RAID1, but with dedicated master and recovery disks and manual control over synchronization. It allows using the recovery disk as a snapshot of the master disk from the time of the last sync. This implementation is not functionally complete compared to Windows, but it is better than silent conversion to RAID1 on first boot.	2013-01-12 09:35:44 +00:00
Konstantin Belousov	ddd6b3fc33	Add flags argument to vfs_write_resume() and remove vfs_write_resume_flags(). Sponsored by: The FreeBSD Foundation	2013-01-11 06:08:32 +00:00
Pawel Jakub Dawidek	6011443800	Reset provider-specific fields when resending an I/O request in low memory conditions. This fixes a failure of the assertion that checks those fields when the kernel is compiled with DIAGNOSTIC. Reported by: kib, pho MFC after: 1 week	2012-12-26 20:07:47 +00:00
Jaakko Heinonen	efec959c2c	Mangle label names containing spaces, non-printable characters, '%' or '"'. Mangling is only done for label names read from file system metadata. Encoding resembles URL encoding. For example, the space character becomes %20. Help by: kib Discussed with: imp, kib, pjd	2012-12-22 13:43:12 +00:00
Jaakko Heinonen	02c62349c9	- Don't pass geom and provider names as format strings. - Add __printflike() attributes. - Remove an extra argument for the g_new_geomf() call in swapongeom_ev(). Reviewed by: pjd	2012-11-20 12:32:18 +00:00
Alfred Perlstein	bad7e7f3dd	Provide a device name in the sysctl tree for programs to query the state of crashdump target devices. This will be used to add a "-l" (ell) flag to dumpon(8) to list the currently configured dumpdev. Reviewed by: phk	2012-11-01 17:01:05 +00:00
Edward Tomasz Napierala	549f62fa42	Fix problem with geom_label(4) not recognizing UFS labels on filesystems extended using growfs(8). The problem here is that geom_label checks if the filesystem size recorded in the UFS superblock is equal to the provider (i.e. device) size. This check cannot be removed due to backward compatibility. On the other hand, in most cases growfs(8) cannot set fs_size in the superblock to match the provider size because, unlike newfs(8), it cannot recompute cylinder group sizes. To fix this problem, add another superblock field, fs_providersize, used only for this purpose. geom_label(4) will attach if either fs_size (filesystem created with newfs(8)) or fs_providersize (filesystem expanded using growfs(8)) matches the device size. PR: kern/165962 Reviewed by: mckusick Sponsored by: FreeBSD Foundation	2012-10-30 21:32:10 +00:00
Alexander Motin	650e245ebf	Minor addition to r242323: As with BIO_WRITE, report success if at least one subdisk succeeded with BIO_DELETE. But unlike BIO_WRITE, don't fail the disk on a BIO_DELETE error. Sponsored by: iXsystems, Inc. MFC after: 1 month	2012-10-29 21:08:06 +00:00
Alexander Motin	609a74746a	Add basic BIO_DELETE support to the GEOM RAID class for all RAID levels. If at least one subdisk in the volume supports it, BIO_DELETE requests will be propagated down. Unfortunately, for RAID levels with redundancy, unmapped blocks will be mapped back during the first rebuild/resync process. Sponsored by: iXsystems, Inc. MFC after: 1 month	2012-10-29 18:04:38 +00:00
Edward Tomasz Napierala	1af2d09b49	Fix locking problem in disk_resize(); previously it would run without the topology lock, resulting in an assertion failure when running with DIAGNOSTIC. Reviewed by: mav (earlier version)	2012-10-29 17:52:43 +00:00
Alexander Motin	a479c51be3	Make GEOM RAID more aggressive in marking volumes as clean on shutdown, and move that action from the shutdown_pre_sync to the shutdown_post_sync stage to avoid extra flapping. ZFS tends not to close devices on shutdown, which prevents GEOM RAID from shutting down gracefully. To handle that, mark the volume as clean as soon as shutdown time comes and there are no active writes. MFC after: 2 weeks	2012-10-29 14:18:54 +00:00
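The label-name mangling described in commit efec959c2c (2012-12-22) can be illustrated with a small userland sketch. It is not the geom_label(4) kernel code: the function name `mangle_label`, its return convention, and the choice of uppercase hex digits are assumptions made for illustration, and the sketch assumes the C locale for isprint(3). It replaces spaces, non-printable bytes, '%' and '"' with "%XX", as the commit message describes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userland sketch (not the geom_label(4) kernel code):
 * copy `in` to `out`, replacing spaces, non-printable bytes, '%' and
 * '"' with "%XX", where XX is the byte value in uppercase hex, similar
 * to URL encoding. Assumes the C locale for isprint(3).
 * Returns 0 on success, -1 if the result does not fit in `out`.
 */
int
mangle_label(const char *in, char *out, size_t outsize)
{
	const unsigned char *c;
	size_t n = 0;

	assert(in != NULL && out != NULL);
	if (outsize == 0)
		return (-1);

	for (c = (const unsigned char *)in; *c != '\0'; c++) {
		if (*c == ' ' || !isprint(*c) || *c == '%' || *c == '"') {
			/* Need room for "%XX" plus the final NUL. */
			if (n + 3 >= outsize)
				return (-1);
			snprintf(out + n, outsize - n, "%%%02X", *c);
			n += 3;
		} else {
			/* Need room for this byte plus the final NUL. */
			if (n + 1 >= outsize)
				return (-1);
			out[n++] = (char)*c;
		}
	}
	out[n] = '\0';
	return (0);
}
```

For example, mangling the label "My Disk" into a 64-byte buffer yields "My%20Disk". A buffer too small for the encoded result makes the sketch return -1 instead of silently truncating the name.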

1908 Commits