freebsd-dev

Author	SHA1	Message	Date
Alexander Motin	1631690677	Add GEOM::descr attribute for symmetry with GEOM::ident. MFC after: 2 weeks	2017-07-06 08:36:14 +00:00
Ryan Libby	fb0e3235ea	g_virstor.h: macro parenthesization Build with gcc -Wint-in-bool-context revealed a macro parenthesization error (invoking LOG_MSG with a ternary expression for lvl). Reviewed by: markj Approved by: markj (mentor) Sponsored by: Dell EMC Isilon Differential revision: https://reviews.freebsd.org/D11411	2017-06-30 22:01:18 +00:00
Marcelo Araujo	4323355e76	r318394 appears to break gpart(8) on some embedded systems such as PCEngines, RPI1-B, Alix and APU2 boards, as well as NanoBSD, with the following message: vnode_pager_generic_getpages_done: I/O read error 5. The breakage seems to have been caused by omitting acr from the glabel update. Reported by: Peter Blok <pblok@bsd4all.org>, madpilot, imp and trasz. Reviewed by: trasz Tested by: Peter Blok and madpilot. MFC after: 3 days. Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D11365	2017-06-27 01:22:27 +00:00
Stephen J. Kiernan	9a81ba0f24	Add MD_VERIFY option to enable O_VERIFY in open for vnode type. Add -o [no]verify option to mdconfig (and document in man page.) Implement GEOM attribute MNT::verified to ask md if the backing vnode is verified. Check for MNT::verified in cd9660 mount to flag the mount as MNT_VERIFIED if the underlying device has been verified. Reviewed by: rwatson Approved by: sjg (mentor) Obtained from: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D2902	2017-05-31 21:18:11 +00:00
Edward Tomasz Napierala	6635c8ed2f	Fix typo. MFC after: 2 weeks	2017-05-18 08:25:07 +00:00
Mark Johnston	db7c508323	Synchronize unclean mirrors before adding them to a running gmirror. During gmirror startup, if component mirrors are found to be dirty as is typical after a system crash, the mirrors are synchronized to the mirror with highest priority. However if a gmirror starts without all of its mirrors present, for example because of some transient delays during tasting, the remaining mirrors must be synchronized before they may become active. MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2017-05-02 23:29:42 +00:00
Alexander Motin	d109d8adc7	Dump md_iterations as signed, which it really is. PR: 208305 PR: 196834 MFC after: 2 weeks	2017-04-21 07:43:44 +00:00
Alexander Motin	d8880fd450	Always allow setting number of iterations for the first time. Before this change it was impossible to set number of PKCS#5v2 iterations, required to set passphrase, if it has two keys and never had any passphrase. Due to present metadata format limitations there are still cases when number of iterations can not be changed, but now it works in cases when it can. PR: 218512 MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D10338	2017-04-21 07:16:07 +00:00
Mark Johnston	a7d94fcc3e	Rename two gmirror state flags to make their meanings slightly clearer. No functional change. MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2017-04-14 17:13:57 +00:00
Mark Johnston	1e91412e40	Don't set the mirror GEOM softc to NULL in g_mirror_destroy(). At this point we have not rendezvous'ed with the mirror worker thread, and I/O may still be in flight. Various I/O completion paths expect to be able to obtain a reference to the mirror softc from the GEOM, so setting it to NULL may result in various NULL pointer dereferences if the mirror is stopped with -f or the kernel is shut down while a mirror is synchronizing. The worker thread will clear the softc pointer before exiting. Tested by: pho MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2017-04-14 17:08:37 +00:00
Mark Johnston	77011eac86	Check for a provider error before enqueuing mirror I/O. We are otherwise susceptible to a race with a concurrent teardown of the mirror provider, causing the I/O to be left uncompleted after the mirror started withering. Tested by: pho MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2017-04-14 17:03:32 +00:00
Mark Johnston	a65d524afc	Stop mirror synchronization before draining the I/O queue. Regular I/O requests may be blocked by concurrent synchronization requests targeted to the same LBAs, in which case they are moved to a holding queue until the conflicting I/O completes. We therefore want to stop synchronization before completing pending I/O in g_mirror_destroy_provider() since this ensures that blocked I/O requests are completed as well. Tested by: pho MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2017-04-14 16:54:50 +00:00
Mark Johnston	a4834289d6	Handle NULL entries in gmirror disk ds_bios arrays. Entries may be removed and freed if an I/O error occurs during mirror synchronization, so we cannot assume that all entries of ds_bios are valid. Also ensure that a synchronization BIO's array index is preserved after a successful write. Reported and tested by: pho MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2017-04-10 17:15:59 +00:00
Allan Jude	ec5c0e5be9	Implement boot-time encryption key passing (keybuf) This patch adds a general mechanism for providing encryption keys to the kernel from the boot loader. This is intended to enable GELI support at boot time, providing a better mechanism for passing keys to the kernel than environment variables. It is designed to be extensible to other applications, and can easily handle multiple encrypted volumes with different keys. This mechanism is currently used by the pending GELI EFI work. Additionally, this mechanism can potentially be used to interface with GRUB, opening up options for coreboot+GRUB configurations with completely encrypted disks. Another benefit over the existing system is that it does not require re-deriving the user key from the password at each boot stage. Most of this patch was written by Eric McCorkle. It was extended by Allan Jude with a number of minor enhancements and extending the keybuf feature into boot2. GELI user keys are now derived once, in boot2, then passed to the loader, which reuses the key, then passes it to the kernel, where the GELI module destroys the keybuf after decrypting the volumes. Submitted by: Eric McCorkle <eric@metricspace.net> (Original Version) Reviewed by: oshogbo (earlier version), cem (earlier version) MFC after: 3 weeks Relnotes: yes Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D9575	2017-04-01 05:05:22 +00:00
Allan Jude	39b7ca4533	sys/geom/eli: Switch bzero() to explicit_bzero() for sensitive data In GELI, anywhere we are zeroing out possibly sensitive data, like the metadata struct, the metadata sector (both contain the encrypted master key), the user key, or the master key, use explicit_bzero. Didn't touch the bzero() used to initialize structs. Reviewed by: delphij, oshogbo Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D9809	2017-03-31 00:07:03 +00:00
Mark Johnston	0d75d0dfbc	Avoid sleeping when the mirror I/O queue is non-empty. A request may be queued while the queue lock is dropped when the mirror is being destroyed. The corresponding wakeup would be lost, possibly resulting in an apparent hang of the mirror worker thread. Tested by: pho (part of a larger patch) MFC after: 1 week Sponsored by: Dell EMC Isilon	2017-03-29 19:39:07 +00:00
Mark Johnston	c1ab409cba	Remove an unneeded g_mirror_destroy_provider() call. The worker thread will destroy the mirror provider as part of its teardown sequence. The call made sense in the initial revision of gmirror, but became unnecessary in r137248. Tested by: pho (part of a larger diff) MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2017-03-29 19:30:22 +00:00
Mark Johnston	819cd913f4	Refine r301173 a bit. - Don't execute any of g_mirror_shutdown_post_sync() when panicking. We cannot safely idle the mirror or stop synchronization in that state, and the current attempts to do so complicate debugging of gmirror itself. - Check for a non-NULL panicstr instead of using SCHEDULER_STOPPED(). The latter was added for use in the locking primitives. Reviewed by: mav, pjd MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2017-03-27 16:25:58 +00:00
Marcelo Araujo	7f5f84f08f	In r315112 I broke the tests with eli. Instead of passing 0, I should pass M_NOWAIT to g_media_changed(), which will call g_post_event() with this flag. Reported by: lwhsu, ngie and ae	2017-03-13 13:56:01 +00:00
Scott Long	d8474e52e3	Report disk flags via the sysctl tree	2017-03-13 11:09:17 +00:00
Marcelo Araujo	2ae0afa8ee	Add the capability to refresh the gpart(8) label without needing a reboot. gpart(8) has functionality to change the label of a GPT partition. This functionality works as it should; however, after a label change the /dev/gpt/ entries remain unchanged. glabel(8) status output remains unchanged. The change only takes effect after a reboot. PR: 162690 Submitted by: sub.mesa@gmail, Ben RUBSON <ben.rubson@gmail.com>, ae Reviewed by: allanjude, bapt, bcr MFC after: 6 weeks. Differential Revision: https://reviews.freebsd.org/D9935	2017-03-12 04:15:56 +00:00
Alexander Motin	4d5832bc12	When chunking large DIOCGDELETE, do it on stripe edge. MFC after: 2 weeks	2017-03-08 12:18:58 +00:00
Mariusz Zaborski	c27fb0b589	The kern.geom.part.auto_resize should be tunable.	2017-02-28 20:51:20 +00:00
Mariusz Zaborski	01ad653a81	Add sysctl to control auto resize of the GEOM metadata. Reviewed by: AllanJude Differential Revision: https://reviews.freebsd.org/D9603	2017-02-27 17:54:01 +00:00
Marius Strobl	4874af73c1	- Allow different slicers for different flash types to be registered with geom_flashmap(4) and teach it about MMC for slicing enhanced user data area partitions. The FDT slicer still is the default for CFI, NAND and SPI flash on FDT-enabled platforms. - In addition to a device_t, also pass the name of the GEOM provider in question to the slicers, as a single device may provide more than one provider. - Build a geom_flashmap.ko. - Use MODULE_VERSION() so other modules can depend on geom_flashmap(4). - Remove redundant/superfluous GEOM routines that either do nothing or provide/just call default GEOM (slice) functionality. - Trim/adjust includes Submitted by: jhibbits (RouterBoard bits) Reviewed by: jhibbits	2017-02-22 10:21:39 +00:00
Allan Jude	85c15ab853	improve PBKDF2 performance The PBKDF2 in sys/geom/eli/pkcs5v2.c is around half the speed it could be. GELI's PBKDF2 uses a simple benchmark to determine a number of iterations that will take approximately 2 seconds. The security provided is actually half what is expected, because an attacker could use the optimized algorithm to brute force the key in half the expected time. With this change, all newly generated GELI keys will be approximately 2x as strong. Previously generated keys will take half as long to calculate, resulting in faster mounting of encrypted volumes. Users may choose to rekey, to generate a new key with the larger default number of iterations using the geli(8) setkey command. Security of existing data is not compromised, as ~1 second per brute force attempt is still a very high threshold. PR: 202365 Original Research: https://jbp.io/2015/08/11/pbkdf2-performance-matters/ Submitted by: Joe Pixton <jpixton@gmail.com> (Original Version), jmg (Later Version) Reviewed by: ed, pjd, delphij Approved by: secteam, pjd (maintainer) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8236	2017-02-19 19:30:31 +00:00
John Baldwin	dcbe5188da	Defer startup of gjournal switcher kproc. Don't start switcher kproc until the first GEOM is created. Reviewed by: pjd MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8576	2017-02-07 22:45:59 +00:00
Andrey V. Elsukov	9ef6004352	Check that primary GPT header is valid before wiping partitioning. This allows a corrupted GPT to be destroyed safely when the primary header has been overwritten by data that should not be destroyed. MFC after: 1 week	2017-02-04 05:09:47 +00:00
Yoshihiro Takahashi	2b375b4edd	Remove pc98 support completely. I thank all developers and contributors for pc98. Relnotes: yes	2017-01-28 02:22:15 +00:00
Alexander Motin	d3fef0a092	Report disk addition errors on `add` or `create` subcommand. MFC after: 1 week	2017-01-20 13:49:04 +00:00
Alexander Motin	17160457b4	Report random flash storage as non-rotating to GEOM_DISK. While doing it, introduce respective constants in geom_disk.h. MFC after: 1 week	2017-01-12 08:53:10 +00:00
Conrad Meyer	b28ea2c250	g_raid: Prevent tasters from attempting excessively large reads Some g_raid tasters attempt metadata reads in multiples of the provider sectorsize. Reads larger than MAXPHYS are invalid, so detect and abort in such situations. Spiritually similar to r217305 / PR 147851. PR: 214721 Sponsored by: Dell EMC Isilon	2017-01-12 06:58:31 +00:00
Dimitry Andric	012039fd55	Fix logic error in gvinum's gv_set_sd_state() With clang 4.0.0, I'm getting the following warnings: sys/geom/vinum/geom_vinum_state.c:186:7: error: logical not is only applied to the left hand side of this bitwise operator [-Werror,-Wlogical-not-parentheses] if (!flags & GV_SETSTATE_FORCE) ^ ~ The logical not operator should obviously be called after masking. Reviewed by: mav, pfg MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D9093	2017-01-08 17:56:54 +00:00
Sepherosa Ziehau	c22dceff9d	build: Unbreak LINT Sponsored by: Microsoft	2016-12-21 01:39:11 +00:00
Konrad Witaszczyk	480f31c214	Add support for encrypted kernel crash dumps. Changes include modifications in kernel crash dump routines, dumpon(8) and savecore(8). A new tool called decryptcore(8) was added. A new DIOCSKERNELDUMP I/O control was added to send a kernel crash dump configuration in the diocskerneldump_arg structure to the kernel. The old DIOCSKERNELDUMP I/O control was renamed to DIOCSKERNELDUMP_FREEBSD11 for backward ABI compatibility. dumpon(8) generates a one-time random symmetric key and encrypts it using an RSA public key in capability mode. Currently only AES-256-CBC is supported but EKCD was designed to implement support for other algorithms in the future. The public key is chosen using the -k flag. The dumpon rc(8) script can do this automatically during startup using the dumppubkey rc.conf(5) variable. Once the keys are calculated dumpon sends them to the kernel via DIOCSKERNELDUMP I/O control. When the kernel receives the DIOCSKERNELDUMP I/O control it generates a random IV and sets up the key schedule for the specified algorithm. Each time the kernel tries to write a crash dump to the dump device, the IV is replaced by a SHA-256 hash of the previous value. This is intended to make a possible differential cryptanalysis harder since it is possible to write multiple crash dumps without reboot by repeating the following commands: # sysctl debug.kdb.enter=1 db> call doadump(0) db> continue # savecore A kernel dump key consists of an algorithm identifier, an IV and an encrypted symmetric key. The kernel dump key size is included in a kernel dump header. The size is an unsigned 32-bit integer and it is aligned to a block size. The header structure has 512 bytes to match the block size so it was required to make a panic string 4 bytes shorter to add a new field to the header structure. 
If the kernel dump key size in the header is nonzero it is assumed that the kernel dump key is placed after the first header on the dump device and the core dump is encrypted. Separate functions were implemented to write the kernel dump header and the kernel dump key as they need to be unencrypted. The dump_write function encrypts data if the kernel was compiled with the EKCD option. Encrypted kernel textdumps are not supported due to the way they are constructed which makes it impossible to use the CBC mode for encryption. It should be also noted that textdumps don't contain sensitive data by design as a user decides what information should be dumped. savecore(8) writes the kernel dump key to a key.# file if its size in the header is nonzero. # is the number of the current core dump. decryptcore(8) decrypts the core dump using a private RSA key and the kernel dump key. This is performed by a child process in capability mode. If the decryption was not successful the parent process removes a partially decrypted core dump. Description on how to encrypt crash dumps was added to the decryptcore(8), dumpon(8), rc.conf(5) and savecore(8) manual pages. EKCD was tested on amd64 using bhyve and i386, mipsel and sparc64 using QEMU. The feature still has to be tested on arm and arm64 as it wasn't possible to run FreeBSD due to the problems with QEMU emulation and lack of hardware. Designed by: def, pjd Reviewed by: cem, oshogbo, pjd Partial review: delphij, emaste, jhb, kib Approved by: pjd (mentor) Differential Revision: https://reviews.freebsd.org/D4712	2016-12-10 16:20:39 +00:00
Alexander Motin	b6fe583c55	Add `gmirror create` subcommand, similar to gstripe, gconcat, etc. It is a quite specific mode of operation that does not store on-disk metadata. It can be useful in some cases in combination with external control tools that handle mirror creation and disk hot-plug. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2016-11-30 09:27:08 +00:00
Alexander Motin	dc399583ba	Use providergone method to cover race between destroy and g_access(). Reviewed by: markj MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2016-11-13 03:56:26 +00:00
Alexander Motin	80f0a89c62	Do not report error on close even if we have no paths left. MFC after: 2 weeks	2016-11-12 18:57:38 +00:00
Bryan Drewery	28323add09	Fix improper use of "its". Sponsored by: Dell EMC Isilon	2016-11-08 23:59:41 +00:00
Conrad Meyer	8532d381a9	Add BUF_TRACKING and FULL_BUF_TRACKING buffer debugging Upstream the BUF_TRACKING and FULL_BUF_TRACKING buffer debugging code. This can be handy in tracking down what code touched hung bios and bufs last. The full history is especially useful, but adds enough bloat that it shouldn't be enabled in release builds. Function names (or arbitrary string constants) are tracked in a fixed-size ring in bufs. Bios gain a pointer to the upper buf for tracking. SCSI CCBs gain a pointer to the upper bio for tracking. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8366	2016-10-31 23:09:52 +00:00
Ruslan Bukin	ae8b1f90fe	Fix alignment issues on MIPS: align the pointers properly. All the 5520 GEOM_ELI tests passed successfully on MIPS64EB. Sponsored by: DARPA, AFRL Sponsored by: HEIF5 Differential Revision: https://reviews.freebsd.org/D7905	2016-10-31 16:55:14 +00:00
Mark Johnston	5c2ac5cf2a	gmirror: Add a subroutine to free synchronization BIOs. This addresses a memory leak that occurs upon an I/O error during a mirror synchronization. MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2016-10-20 23:08:40 +00:00
Mark Johnston	b450976dc2	gmirror: Release pending regular requests when synchronization stops. Normally gmirror allows colliding requests to proceed whenever a synchronization request completes and advances to the next offset. However if an I/O request collides with one of the final g_mirror_syncreqs, nothing releases it once synchronization completes, resulting in an apparent I/O hang. The same problem can occur if synchronization is aborted by an I/O error. Therefore, be sure to requeue pending requests when mirror synchronization is stopped for any reason. While here, remove some dead code from g_mirror_regular_release(). MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2016-10-20 23:02:30 +00:00
Alexander Motin	5a236b0ef9	Fix possible geom destruction before final provider close. Introduce internal counter to track opens. Using provider's counters is not very successful after calling g_wither_provider(). MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2016-10-06 15:20:05 +00:00
Mark Johnston	4dea20be45	gmirror: Write an updated syncid before queuing writes. When a syncid bump is pending, any write to the mirror results in the updated syncid being written to each component's metadata block. However, the update was only being performed after the writes to the mirror components were queued. Instead, synchronously update the metadata block first. MFC after: 3 weeks Sponsored by: Dell EMC Isilon	2016-10-06 00:13:55 +00:00
Mark Johnston	903618cd65	gmirror: Bump the syncid if broken disks are found during startup. Consider a mirror with two components, m1 and m2. Suppose a hardware error results in the removal of m2, with m1's genid bumped. Suppose further that a replacement mirror component m3 is created and synchronized, after which the system is shut down uncleanly. During a subsequent bootup, if gmirror tastes m1 and m2 first, m2 will be removed from the mirror because it is broken, but the mirror will be started without bumping the syncid on m1 because all elements of the mirror are accounted for. Then m3 will be added to the already-running mirror with the same syncid as m1, so the components will not be synchronized despite the unclean shutdown. Handle this scenario by bumping the syncid of healthy components if any broken mirrors are discovered during mirror startup. MFC after: 3 weeks Sponsored by: Dell EMC Isilon	2016-10-06 00:05:45 +00:00
Mark Johnston	fff048e4bc	gmirror: Use bool instead of boolean_t. MFC after: 1 week Sponsored by: Dell EMC Isilon	2016-10-05 23:55:01 +00:00
Adrian Chadd	85ab1aeccf	[geom_redboot] Extend geom_redboot to handle non-zero fis offset. Submitted by: Mori Hiroki <yamori813@yahoo.co.jp> Differential Revision: https://reviews.freebsd.org/D7237	2016-10-04 16:35:38 +00:00
Alexander Motin	8b64f3ca6c	Use g_wither_provider() where applicable. It is just a helper function combining G_PF_WITHER setting with g_orphan_provider().	2016-09-23 21:29:40 +00:00
Edward Tomasz Napierala	0c4440c3aa	Follow up r305988 by removing g_bio_run_task and related code. The g_io_schedule_up() gets its "if" condition swapped to make it more similar to g_io_schedule_down(). Suggested by: mav@ Reviewed by: mav@ MFC after: 1 month	2016-09-20 09:18:33 +00:00
Edward Tomasz Napierala	bbdd6614bd	Remove unused bio_taskqueue(). MFC after: 1 month	2016-09-19 17:46:15 +00:00
Mark Johnston	4bfb585351	Don't treat an error from g_mirror_clear_metadata() as fatal. Such errors can occur as the result of a write error or because the disk backing the mirror element was removed. They result in a generation ID bump on all active elements of the mirror, so we can safely disconnect the mirror component rather than destroy it. MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7750	2016-09-06 23:42:59 +00:00
Mark Johnston	40c5032d32	Add some fail points to gmirror. These are useful for testing changes to I/O error handling, and for reproducing existing bugs in a controlled manner. The fail points are g_mirror_regular_request_read g_mirror_regular_request_write g_mirror_sync_request_read g_mirror_sync_request_write g_mirror_metadata_write They all effectively allow one to inject an error value into the bio_error field of a corresponding BIO request as it is being completed. MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division	2016-09-06 23:35:48 +00:00
Andrey V. Elsukov	0428336393	Do not invoke resize event if initial disk size is zero. Some disks report their size only after first being opened. Because the events are asynchronous, some consumers can receive this event too late, which confuses them. This partially restores the previous behaviour, and at the same time should fix the problem where an already opened provider loses a resize event. PR: 211028 MFC after: 3 weeks	2016-08-01 20:54:54 +00:00
Andrey V. Elsukov	1f353a2315	Do not invoke resize method if geom is being withered. PR: 211028 MFC after: 2 weeks	2016-07-25 09:12:08 +00:00
Andrey V. Elsukov	f1ff88cf8c	Use g_resize_provider() to change the size of GEOM_DISK provider, when it is being opened. This should fix the possible loss of a resize event when disk capacity changed. PR: 211028 Reported by: Dexuan Cui <decui at microsoft dot com> MFC after: 3 weeks	2016-07-19 05:36:21 +00:00
Maxim Sobolev	55f9588af4	Relax checking if the provider size matches size recorded in the superblock, allowing provider to be a bit bigger, i.e. have some extra padding after the FS image. That in some cases might be a side-effect of using CLOOP format which enforces certain block size and trying to compress image that is not exactly the number of those blocks in size. The UFS itself does not have any issues mounting such padded file systems, so it's what GEOM_LABEL should do. Submitted by: @mizhka_gmail.com Differential Revision: https://reviews.freebsd.org/D6208	2016-07-18 05:00:01 +00:00
Mark Johnston	7d31c3939a	Move some gmirror metadata update messages to a higher debug level. These can be printed quite frequently from a mostly-idle mirror, cluttering the console. MFC after: 1 week	2016-07-14 00:40:24 +00:00
Maxim Sobolev	74ba4047a3	1. Improve handling around the last compressed block of the file, which is necessary because the CLOOP format lacks an explicit EOF or length, so that in the presence of padding, or when the CLOOP is put onto a larger partition, the upper-level provider size may be larger. Bound the amount of extra data that we might touch to the max length of the compressed block and detect zero-padding in the last cluster, which, when the sector is all-zero, might cause us to emit a bogus I/O error after decompression of it fails. To avoid making the code any more complicated than it needs to be, deal with it lazily, i.e. when we first access that specific cluster. This change also fixes a stupid mistake in the LZMA code, inherited from geom_lzma, which does not share the length of the output buffer with the decompression routine, so that corrupted or purposely tailored data may easily cause heap overflow and kernel memory corruption. Beef up validation of the CLOOP TOC by checking that the lengths of all but the last compressed clusters match the upper limit set by the decompressor, and improve some error diagnostic output while I am here. 2. Add kern.geom.uzip.attach_to tunable to artificially limit attaching uzip to certain devices in the dev tree only. For example, the following makes us attach only to GPT labels: kern.geom.uzip.attach_to="gpt/" 3. Add kern.geom.uzip.noattach_to, which does the opposite of (2) above, i.e. prevents geom_uzip from tasting / attaching to providers matching some pattern. By default we don't attach to our own kind, i.e. kern.geom.uzip.noattach_to=".uzip". It saves us quite some CPU cycles, esp on low-end embedded systems. Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D7013	2016-06-29 18:19:05 +00:00
Kenneth D. Merry	a02e196edd	Switch geom_disk over to using a pool mutex. The GEOM disk d_mtx is only acquired on disk creation and destruction. It is a good candidate for replacement with a pool mutex. This eliminates the mutex initialization and teardown and the mutex and name variables themselves from struct disk. sys/geom/geom_disk.h: Take d_mtx and d_mtx_name out of struct disk. sys/geom/geom_disk.c: Use mtx_pool_lock() and mtx_pool_unlock() to guard the disk initialization state instead of a dedicated mutex. This allows removing the initialization and destruction of d_mtx. sys/sys/param.h: Bump __FreeBSD_version to 1100119 for the change to struct disk. Suggested by: jhb Sponsored by: Spectra Logic Approved by: re (gjb)	2016-06-23 20:05:59 +00:00
Mark Johnston	be20fc2e90	Do not complete pending gmirror BIOs when tearing down the provider. This will result in lock recursion and is more generally incorrect since the completion handlers will just reinsert the BIOs into the queue we're trying to drain. Reviewed by: imp, ngie Approved by: re (gjb) MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D6908	2016-06-22 21:00:28 +00:00
Kenneth D. Merry	e5616d65d0	Fix a bug that caused da(4) peripheral drivers to not fully go away after the underlying device went away. The problem was that callers who queue the GEOM resize provider event didn't check to make sure that the provider had not been withered. For the other equivalent case, g_new_provider_event(), the code checks to see whether the provider has been withered before queueing a g_new_provider_event() to the event thread. In some cases, a resize provider event would come through after the provider had been withered and all of the existing consumers had been orphaned. When the resize event triggered a taste of the provider, that would attach a new consumer to the now withered provider. The wither washer (g_wither_washer()) would never be able to completely tear down the GEOM because of the consumers that were hanging around. The solution was to check the G_PF_WITHER provider flag before queueing the g_resize_provider_event(), and add an assert to g_resize_provider_event() to ensure that it isn't called on a withered provider. sys/geom/geom_subr.c: In g_resize_provider(), don't try to continue if the G_PF_WITHER flag is set. In g_resize_provider_event(), add an assert that the G_PF_WITHER flag is not set. In g_access(), if a provider has an error, print out the name of the provider with the error. Sponsored by: Spectra Logic Approved by: re (marius) MFC after: 3 days	2016-06-22 14:39:13 +00:00
Kenneth D. Merry	1ff824e786	Fix a bug that caused da(4) instances to hang around after the underlying device is gone. The problem was that when disk_gone() is called, if the GEOM disk creation process has not yet happened, the withering process couldn't start. We didn't record any state in the GEOM disk code, and so the d_gone() callback to the da(4) driver never happened. The solution is to track the state of the creation process, and initiate the withering process from g_disk_create() if the disk is being created. This change does add fields to struct disk, and so I have bumped DISK_VERSION. geom_disk.c: Track where we are in the disk creation process, and check to see whether our underlying disk has gone away or not. In disk_gone(), set a new d_goneflag variable that g_disk_create() can check to see if it needs to clean up the disk instance. geom_disk.h: Add a mutex to struct disk (for internal use) disk init level, and a gone flag. Bump DISK_VERSION because the size of struct disk has changed and fields have been added at the beginning. Sponsored by: Spectra Logic Approved by: re (marius)	2016-06-21 20:18:19 +00:00
Gleb Smirnoff	a7c5163b5f	When we are in panic, always go the asynchronous path in g_mirror_destroy(), otherwise the system will hang. This is a temporary, least intrusive crutch to get certain panicking systems dumping. The proper fix should question whether g_mirror_destroy() should be called on a panicking system at all. Discussed with: mav	2016-06-01 22:11:54 +00:00
Alan Somers	151746b244	Avoid issuing spa config updates for physical path when not necessary ZFS's configuration needs to be updated whenever the physical path for a device changes, but not when a new device is introduced. This is because new devices necessarily cause config updates, but only if they are actually accepted into the pool. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Split vdev_geom_set_physpath out of vdev_geom_attrchanged. When setting the vdev's physical path, only request a config update if the physical path has changed. Don't request it when opening a device for the first time, because the config sync will happen anyway upstack. sys/geom/geom_dev.c Split g_dev_set_physpath and g_dev_set_media out of g_dev_attrchanged Submitted by: will, asomers MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D6428	2016-05-27 22:32:44 +00:00
Konstantin Belousov	d5446cc8f4	Remove unneeded Giant locking around kthreads creation. Sponsored by: The FreeBSD Foundation	2016-05-20 08:28:11 +00:00
Konstantin Belousov	4e2732b550	Removal of Giant dropping wrappers for GEOM classes. Sponsored by: The FreeBSD Foundation	2016-05-20 08:25:37 +00:00
Konstantin Belousov	dff9131e58	Remove asserts that Giant is not held on entrance into geom KPI, which outlived their usefulness. This allows to remove drop/pickup Giant wrappers around GEOM calls. Discussed with: alfred, imp, phk Sponsored by: The FreeBSD Foundation	2016-05-20 08:22:20 +00:00
Kenneth D. Merry	9a6844d55f	Add support for managing Shingled Magnetic Recording (SMR) drives. This change includes support for SCSI SMR drives (which conform to the Zoned Block Commands or ZBC spec) and ATA SMR drives (which conform to the Zoned ATA Command Set or ZAC spec) behind SAS expanders. This includes full management support through the GEOM BIO interface, and through a new userland utility, zonectl(8), and through camcontrol(8). This is now ready for filesystems to use to detect and manage zoned drives. (There is no work in progress that I know of to use this for ZFS or UFS, if anyone is interested, let me know and I may have some suggestions.) Also, improve ATA command passthrough and dispatch support, both via ATA and ATA passthrough over SCSI. Also, add support to camcontrol(8) for the ATA Extended Power Conditions feature set. You can now manage ATA device power states, and set various idle time thresholds for a drive to enter lower power states. Note that this change cannot be MFCed in full, because it depends on changes to the struct bio API that break compatibility. In order to avoid breaking the stable API, only changes that don't touch or depend on the struct bio changes can be merged. For example, the camcontrol(8) changes don't depend on the new bio API, but zonectl(8) and the probe changes to the da(4) and ada(4) drivers do depend on it. Also note that the SMR changes have not yet been tested with an actual SCSI ZBC device, or a SCSI to ATA translation layer (SAT) that supports ZBC to ZAC translation. I have not yet gotten a suitable drive or SAT layer, so any testing help would be appreciated. These changes have been tested with Seagate Host Aware SATA drives attached to both SAS and SATA controllers. Also, I do not have any SATA Host Managed devices, and I suspect that it may take additional (hopefully minor) changes to support them. Thanks to Seagate for supplying the test hardware and answering questions.
sbin/camcontrol/Makefile: Add epc.c and zone.c. sbin/camcontrol/camcontrol.8: Document the zone and epc subcommands. sbin/camcontrol/camcontrol.c: Add the zone and epc subcommands. Add auxiliary register support to build_ata_cmd(). Make sure to set the CAM_ATAIO_NEEDRESULT, CAM_ATAIO_DMA, and CAM_ATAIO_FPDMA flags as appropriate for ATA commands. Add a new get_ata_status() function to parse ATA result from SCSI sense descriptors (for ATA passthrough over SCSI) and ATA I/O requests. sbin/camcontrol/camcontrol.h: Update the build_ata_cmd() prototype. Add get_ata_status(), zone(), and epc(). sbin/camcontrol/epc.c: Support for ATA Extended Power Conditions features. This includes support for all features documented in the ACS-4 Revision 12 specification from t13.org (dated February 18, 2016). The EPC feature set allows putting a drive into a power mode immediately, or setting timeouts so that the drive will automatically enter progressively lower power states after various idle times. sbin/camcontrol/fwdownload.c: Update the firmware download code for the new build_ata_cmd() arguments. sbin/camcontrol/zone.c: Implement support for Shingled Magnetic Recording (SMR) drives via SCSI Zoned Block Commands (ZBC) and ATA Zoned Device ATA Command Set (ZAC). These specs were developed in concert, and are functionally identical. The primary differences are due to SCSI and ATA differences. (SCSI is big endian, ATA is little endian, for example.) This includes support for all commands defined in the ZBC and ZAC specs. sys/cam/ata/ata_all.c: Decode a number of additional ATA command names in ata_op_string(). Add a new CCB building function, ata_read_log(). Add ata_zac_mgmt_in() and ata_zac_mgmt_out() CCB building functions. These support both DMA and NCQ encapsulation. sys/cam/ata/ata_all.h: Add prototypes for ata_read_log(), ata_zac_mgmt_out(), and ata_zac_mgmt_in(). sys/cam/ata/ata_da.c: Revamp the ada(4) driver to support zoned devices.
Add four new probe states to gather information needed for zone support. Add a new adasetflags() function to avoid duplication of large blocks of flag setting between the async handler and register functions. Add new sysctl variables that describe zone support and parameters. Add support for the new BIO_ZONE bio, and all of its subcommands: DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP, DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS. sys/cam/scsi/scsi_all.c: Add command descriptions for the ZBC IN/OUT commands. Add descriptions for ZBC Host Managed devices. Add a new function, scsi_ata_pass() to do ATA passthrough over SCSI. This will eventually replace scsi_ata_pass_16() -- it can create the 12, 16, and 32-byte variants of the ATA PASS-THROUGH command, and supports setting all of the registers defined as of SAT-4, Revision 5 (March 11, 2016). Change scsi_ata_identify() to use scsi_ata_pass() instead of scsi_ata_pass_16(). Add a new scsi_ata_read_log() function to facilitate reading ATA logs via SCSI. sys/cam/scsi/scsi_all.h: Add the new ATA PASS-THROUGH(32) command CDB. Add extended and variable CDB opcodes. Add Zoned Block Device Characteristics VPD page. Add ATA Return SCSI sense descriptor. Add prototypes for scsi_ata_read_log() and scsi_ata_pass(). sys/cam/scsi/scsi_da.c: Revamp the da(4) driver to support zoned devices. Add five new probe states, four of which are needed for ATA devices. Add five new sysctl variables that describe zone support and parameters. The da(4) driver supports SCSI ZBC devices, as well as ATA ZAC devices when they are attached via a SCSI to ATA Translation (SAT) layer. Since ZBC -> ZAC translation is a new feature in the T10 SAT-4 spec, most SATA drives will be supported via ATA commands sent via the SCSI ATA PASS-THROUGH command.
The da(4) driver will prefer the ZBC interface, if it is available, for performance reasons, but will use the ATA PASS-THROUGH interface to the ZAC command set if the SAT layer doesn't support translation yet. As I mentioned above, ZBC command support is untested. Add support for the new BIO_ZONE bio, and all of its subcommands: DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP, DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS. Add scsi_zbc_in() and scsi_zbc_out() CCB building functions. Add scsi_ata_zac_mgmt_out() and scsi_ata_zac_mgmt_in() CCB/CDB building functions. Note that these have return values, unlike almost all other CCB building functions in CAM. The reason is that they can fail, depending upon the particular combination of input parameters. The primary failure case is if the user wants NCQ, but fails to specify additional CDB storage. NCQ requires using the 32-byte version of the SCSI ATA PASS-THROUGH command, and the current CAM CDB size is 16 bytes. sys/cam/scsi/scsi_da.h: Add ZBC IN and ZBC OUT CDBs and opcodes. Add SCSI Report Zones data structures. Add scsi_zbc_in(), scsi_zbc_out(), scsi_ata_zac_mgmt_out(), and scsi_ata_zac_mgmt_in() prototypes. sys/dev/ahci/ahci.c: Fix SEND / RECEIVE FPDMA QUEUED in the ahci(4) driver. ahci_setup_fis() previously set the top bits of the sector count register in the FIS to 0 for FPDMA commands. This is okay for read and write, because the PRIO field is the only thing in those bits, and we don't implement that further up the stack. But, for SEND and RECEIVE FPDMA QUEUED, the subcommand is in that byte, so it needs to be transmitted to the drive. In ahci_setup_fis(), always set the top 8 bits of the sector count register. We need it in both the standard and NCQ / FPDMA cases. sys/geom/eli/g_eli.c: Pass BIO_ZONE commands through the GELI class. sys/geom/geom.h: Add g_io_zonecmd() prototype. sys/geom/geom_dev.c: Add new DIOCZONECMD ioctl, which allows sending zone commands to disks.
sys/geom/geom_disk.c: Add support for BIO_ZONE commands. sys/geom/geom_disk.h: Add a new flag, DISKFLAG_CANZONE, that indicates that a given GEOM disk client can handle BIO_ZONE commands. sys/geom/geom_io.c: Add a new function, g_io_zonecmd(), that handles execution of BIO_ZONE commands. Add permissions check for BIO_ZONE commands. Add command decoding for BIO_ZONE commands. sys/geom/geom_subr.c: Add DDB command decoding for BIO_ZONE commands. sys/kern/subr_devstat.c: Record statistics for REPORT ZONES commands. Note that the number of bytes transferred for REPORT ZONES won't quite match what is received from the hardware. This is because we're necessarily counting bytes coming from the da(4) / ada(4) drivers, which are using the disk_zone.h interface to communicate up the stack. The structure sizes it uses are slightly different than the SCSI and ATA structure sizes. sys/sys/ata.h: Add many bit and structure definitions for ZAC, NCQ, and EPC command support. sys/sys/bio.h: Convert the bio_cmd field to a straight enumeration. This will yield more space for additional commands in the future. After change r297955 and other related changes, this is now possible. Converting to an enumeration will also prevent use as a bitmask in the future. sys/sys/disk.h: Define the DIOCZONECMD ioctl. sys/sys/disk_zone.h: Add a new API for managing zoned disks. This is very close to the SCSI ZBC and ATA ZAC standards, but uses integers in native byte order instead of big endian (SCSI) or little endian (ATA) byte arrays. This is intended to offer the complete feature set of the ZBC and ZAC disk management without requiring the application developer to include SCSI or ATA headers. We also use one set of headers for ioctl consumers and kernel bio-level consumers. sys/sys/param.h: Bump __FreeBSD_version for sys/bio.h command changes, and inclusion of SMR support. usr.sbin/Makefile: Add the zonectl utility.
usr.sbin/diskinfo/diskinfo.c Add disk zoning capability to the 'diskinfo -v' output. usr.sbin/zonectl/Makefile: Add zonectl makefile. usr.sbin/zonectl/zonectl.8 zonectl(8) man page. usr.sbin/zonectl/zonectl.c The zonectl(8) utility. This allows managing SCSI or ATA zoned disks via the disk_zone.h API. You can report zones, reset write pointers, get parameters, etc. Sponsored by: Spectra Logic Differential Revision: https://reviews.freebsd.org/D6147 Reviewed by: wblock (documentation)	2016-05-19 14:08:36 +00:00
John Baldwin	fdce57a042	Add an EARLY_AP_STARTUP option to start APs earlier during boot. Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot. This change moves SI_SUB_SMP up to just before software interrupt threads are created allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed. This last feature fixes a problem with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu. Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case. As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP).
This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code. These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling). PR: kern/199321 Reviewed by: markj, gnn, kib Sponsored by: Netflix	2016-05-14 18:22:52 +00:00
Maxim Sobolev	e3d7ead7df	Add missing include "opt_geom.h" to make GEOM_UZIP_DEBUG option working, also rename enum member so it does not conflict with GEOM_UZIP option name. Submitted by: mizhka@gmail.com Differential Revision: https://reviews.freebsd.org/D6207	2016-05-06 20:32:39 +00:00
Pedro F. Giffuni	4ed3c0e713	sys: Make use of our rounddown() macro when sys/param.h is available. No functional change.	2016-04-30 14:41:18 +00:00
Pedro F. Giffuni	e8d5712284	sys/geom: spelling fixes in comments. No functional change.	2016-04-29 20:56:58 +00:00
Pedro F. Giffuni	310aef3257	sys/geom: spelling fixes. These affect debugging messages. MFC after: 2 weeks	2016-04-28 19:26:46 +00:00
Pedro F. Giffuni	b99bce73e2	geom: unsign some types to match their definitions and avoid overflows. In struct:gctl_req, nargs is unsigned. In mirror: g_mirror_syncreqs is unsigned. In raid: in struct:g_raid_volume, v_disks_count is unsigned. In virstor: in struct:g_virstor_softc, n_components is unsigned. MFC after: 2 weeks	2016-04-27 15:10:40 +00:00
Conrad Meyer	4a2776e538	g_part_bsd64: Delete duplicate/dead code RAW_PART is handled earlier in the loop. Reported by: Coverity CID: 1223201 Sponsored by: EMC / Isilon Storage Division	2016-04-26 22:32:33 +00:00
Conrad Meyer	5ad33e776f	g_part_bsd64: Check for valid on-disk npartitions value This value is u32 on disk, but assigned to an int in memory. After we do the implicit conversion via assignment, check that the result is at least one[1] (non-negative[2]). 1. The subsequent for-loop iterates from gpt_entries minus one, down, until reaching zero. A negative or zero initial index results in undefined signed integer overflow. 2. It is also used to index into arrays later. In practice, we expected non-malicious disks to contain small positive values. Reported by: Coverity CID: 1223202 Sponsored by: EMC / Isilon Storage Division	2016-04-26 22:30:54 +00:00
Pedro F. Giffuni	55e0987aea	sys: extend use of the howmany() macro when available. We have a howmany() macro in the <sys/param.h> header that is convenient to re-use as it makes things easier to read.	2016-04-26 15:38:17 +00:00
Maxim Sobolev	f260c3eadc	Relax TOC offsets checking somewhat, allowing offset pointing to the next byte past EOF to denote zero-block(s) at the very end of the file.	2016-04-26 06:50:38 +00:00
Maxim Sobolev	416ee66e25	o Fix handling of images with compression block sizes comparable to MAXPHYS. o Improve debug somewhat; o Convert "BUG BUG BUG message" into a proper KASSERT.	2016-04-23 06:31:46 +00:00
Alan Somers	1c2c346f09	DRY on buffer sizes. Update to r298420. sys/geom/geom_disk.c: In disk_attr_changed, don't repeat a buffer size. Reported by: ngie, hselasky MFC after: 4 weeks X-MFC-With: 298420 Sponsored by: Spectra Logic Corp	2016-04-21 21:13:41 +00:00
Pedro F. Giffuni	d9c9c81c08	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
Alan Somers	42f42c9942	Notify userspace listeners when geom disk attributes have changed sys/geom/geom_disk.c: disk_attr_changed(): Generate a devctl event of type GEOM:<attr> for every call. MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D5952	2016-04-21 16:43:15 +00:00
Pedro F. Giffuni	63b6b7a74a	Indentation issues. Contract some lines leftover from r298310. Mea culpa.	2016-04-20 16:19:44 +00:00
Pedro F. Giffuni	02abd40029	kernel: use our nitems() macro when it is available through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current	2016-04-19 23:48:27 +00:00
Pedro F. Giffuni	01b5c6f73e	g_gate: for pointers replace 0 with NULL. These are mostly cosmetic, no functional change. Found with devel/coccinelle.	2016-04-15 16:18:07 +00:00
Warner Losh	9a8fa125c1	Bump bio_cmd and bio_*flags from 8 bits to 16. Differential Revision: https://reviews.freebsd.org/D5784	2016-04-14 05:10:41 +00:00
Pedro F. Giffuni	74b8d63dcc	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
Allan Jude	d873662594	Create the GELIBOOT GEOM_ELI flag This flag indicates that the user wishes to use the GELIBOOT feature to boot from a fully encrypted root file system. Currently, GELIBOOT does not support key files, and in the future when it does, they will be loaded differently. Due to the design of GELI, and the desire for secrecy, the GELI metadata does not know if key files are used or not, it just adds the key material (if any) to the HMAC before the optional passphrase, so there is no way to tell if a GELI partition requires key files or not. Since the GELIBOOT code in boot2 and the loader does not support keys, they will now only attempt to attach if this flag is set. This will stop GELIBOOT from prompting for passwords to GELIs that it cannot decrypt, disrupting the boot process PR: 208251 Reviewed by: ed, oshogbo, wblock Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D5867	2016-04-08 01:25:25 +00:00
Pedro F. Giffuni	21ff1f7469	g_sched_destroy(): prevent return of uninitialized scalar variable. For the !gsp case there is some chance of returning an uninitialized return value. Prevent that from happening by initializing the error value. CID: 1006421	2016-04-03 16:25:51 +00:00
Warner Losh	ca19dfe480	Don't assume that bio_cmd is a bit mask. Differential Revision: https://reviews.freebsd.org/D5592	2016-03-10 06:25:39 +00:00
Warner Losh	8076d204da	Don't assume that bio_cmd is a bit mask. Differential Revision: https://reviews.freebsd.org/D5593	2016-03-10 06:25:31 +00:00
Adrian Chadd	443a0f85dd	Fixes to make it compile under gcc-4.2.	2016-02-24 02:52:49 +00:00
Maxim Sobolev	5497acc527	Obsolete mkulzma(8) and geom_uncompress(4), their functionality is now provided by mkuzip(8) and geom_uzip(4) respectively. MFC after: 1 month	2016-02-24 00:39:36 +00:00
Maxim Sobolev	8f8cb840b0	Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however, that this feature, while it sounds great, results in a very slight improvement in the overall compression ratio, since compressing a default 16k all-zero block produces only a 39-byte compressed output block, which is a 99.8% compression ratio. With a typical average compression ratio of amd64 binaries and data being around 60-70%, the difference between 99.8% and 100.0% is not that great, further diluted by the ratio of the number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from a performance standpoint, so that the kernel is not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create a huge image mostly filled with zero blocks for testing purposes. - New feature allowing de-duplication of the output image. It turns out that if you twist the CLOOP format a bit you can do that as well. And unlike zero-block elimination, this gives a noticeable improvement in the overall compression ratio, reducing the output image by something like 3-4% on my test UFS2 3GB image consisting of a full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of the FreeBSD kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both.
- switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333	2016-02-23 23:59:08 +00:00
Warner Losh	bd4c1dd6d6	Use the right size for zeroing. Submitted by: rpokala@	2016-02-17 18:28:38 +00:00
Warner Losh	c55f57071a	Create an API to reset a struct bio (g_reset_bio). This is mandatory for all struct bio you get back from g_{new,alloc}_bio. Temporary bios that you create on the stack or elsewhere should use this before first use of the bio, and between uses of the bio. At the moment, it is nothing more than a wrapper around bzero, but that may change in the future. The wrapper also removes one place where we encode the size of struct bio in the KBI.	2016-02-17 17:16:02 +00:00
Adrian Chadd	61789a9a76	Teach the flashmap code about the SPI flash. PR: kern/206227 Submitted by: Stanislav Galabov <sgalabov@gmail.com>	2016-01-23 05:26:29 +00:00
Ravi Pokala	cb03a5029b	Add rotationrate to geom disk dumpconf Parse and report the nominal rotation rate reported by the drive. Reviewed by: sbruno, jhb Approved by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D4483 Requested by: Kevin Bowling < kevin.bowling @ kev009.com >	2016-01-14 21:52:21 +00:00
Allan Jude	4332feca4b	Make additional parts of sys/geom/eli more usable in userspace The upcoming GELI support in the loader reuses parts of this code Some ifdefs are added, and some code is moved outside of existing ifdefs The HMAC parts of GELI are broken out into their own file, to separate them from the kernel crypto/openssl dependent parts that are replaced in the boot code. Passed the GELI regression suite (tools/regression/geom/eli) Files=20 Tests=14996 Result: PASS Reviewed by: pjd, delphij MFC after: 1 week Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D4699	2016-01-07 05:47:34 +00:00
Allan Jude	9c0c355f2a	Add some additional GPT partition types 4 ChromeOS GPT types 2 Microsoft partition types the new OpenBSD partition type Approved by: marcel (mentor) MFC after: 1 week Relnotes: yes Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D3841	2015-12-27 18:12:13 +00:00
Allan Jude	7a3f5d11fb	Replace sys/crypto/sha2/sha2.c with lib/libmd/sha512c.c cperciva's libmd implementation is 5-30% faster The same was done for SHA256 previously in r263218 cperciva's implementation was lacking SHA-384 which I implemented, validated against OpenSSL and the NIST documentation Extend sbin/md5 to create sha384(1) Chase dependencies on sys/crypto/sha2/sha2.{c,h} and replace them with sha512{c.c,.h} Reviewed by: cperciva, des, delphij Approved by: secteam, bapt (mentor) MFC after: 2 weeks Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D3929	2015-12-27 17:33:59 +00:00
Allan Jude	1747e1d875	Fix incorrect error message in geom map If geom_map fails to find the end of a mapped partition based on a search, it would return the incorrect error message, stating it could not parse the START value Reviewed by: adrian Approved by: bapt (mentor) Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D4187	2015-12-27 17:09:23 +00:00
Warner Losh	268f69f40b	It turns out that it's OK to sleep in this context, so use M_WAITOK for the softc for the delay module. Noticed by: rpokala@	2015-12-18 14:10:00 +00:00
Warner Losh	6a607537da	Scheduling module to introduce a fixed delay into the I/O path.	2015-12-18 05:39:25 +00:00
Steven Hartland	25080ac4d4	Prevent g_access calls to bad multipath members When a multipath member is orphaned, its access members are zeroed before it is removed if marked for wither, so prevent any future calls to g_access on such members. This prevents a panic on debug kernels, which validate that the resultant values aren't negative. Reviewed by: mav MFC after: 2 weeks Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4416	2015-12-15 21:11:41 +00:00
Andrey V. Elsukov	af90a87209	Make detection of GPT a bit more reliable. When we are detecting a partition table and didn't find PMBR, try to read backup GPT header from the last sector and if it is correct, assume that we have GPT. Reviewed by: rpokala MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D4282	2015-12-10 10:35:07 +00:00
Kenneth D. Merry	985108aeb1	Fix a style issue in g_disk_limit(). Noticed by: bdrewery MFC after: 1 week	2015-12-04 03:44:12 +00:00
Kenneth D. Merry	42fbdde413	Fix g_disk_vlist_limit() to work properly with deletes. Add a new bp argument to g_disk_maxsegs(), and add a new function, g_disk_maxsize() that will properly determine the maximum I/O size for a delete or non-delete bio. Submitted by: will MFC after: 1 week Sponsored by: Spectra Logic	2015-12-04 03:38:35 +00:00
Kenneth D. Merry	a9934668aa	Add asynchronous command support to the pass(4) driver, and the new camdd(8) utility. CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and completed CCBs may be retrieved via the CAMIOGET ioctl. User processes can use poll(2) or kevent(2) to get notification when I/O has completed. While the existing CAMIOCOMMAND blocking ioctl interface only supports user virtual data pointers in a CCB (generally only one per CCB), the new CAMIOQUEUE ioctl supports user virtual and physical address pointers, as well as user virtual and physical scatter/gather lists. This allows user applications to have more flexibility in their data handling operations. Kernel memory for data transferred via the queued interface is allocated from the zone allocator in MAXPHYS sized chunks, and user data is copied in and out. This is likely faster than the vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in configurations with many processors (there are more TLB shootdowns caused by the mapping/unmapping operation) but may not be as fast as running with unmapped I/O. The new memory handling model for user requests also allows applications to send CCBs with request sizes that are larger than MAXPHYS. The pass(4) driver now limits queued requests to the I/O size listed by the SIM driver in the maxio field in the Path Inquiry (XPT_PATH_INQ) CCB. There are some things that would be good to add: 1. Come up with a way to do unmapped I/O on multiple buffers. Currently the unmapped I/O interface operates on a struct bio, which includes only one address and length. It would be nice to be able to send an unmapped scatter/gather list down to busdma. This would allow eliminating the copy we currently do for data. 2. Add an ioctl to list currently outstanding CCBs in the various queues. 3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do that. 4. Test physical address support.
Virtual pointers and scatter/gather lists have been tested, but I have not yet tested physical addresses or scatter/gather lists. 5. Investigate multiple queue support. At the moment there is one queue of commands per pass(4) device. If multiple processes open the device, they will submit I/O into the same queue and get events for the same completions. This is probably the right model for most applications, but it is something that could be changed later on. Also, add a new utility, camdd(8) that uses the asynchronous pass(4) driver interface. This utility is intended to be a basic data transfer/copy utility, a simple benchmark utility, and an example of how to use the asynchronous pass(4) interface. It can copy data to and from pass(4) devices using any target queue depth, starting offset and blocksize for the input and output devices. It currently only supports SCSI devices, but could be easily extended to support ATA devices. It can also copy data to and from regular files, block devices, tape devices, pipes, stdin, and stdout. It does not support queueing multiple commands to any of those targets, since it uses the standard read(2)/write(2)/writev(2)/readv(2) system calls. The I/O is done by two threads, one for the reader and one for the writer. The reader thread sends completed read requests to the writer thread in strictly sequential order, even if they complete out of order. That could be modified later on for random I/O patterns or slightly out of order I/O. camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from the pass(4) driver and also to send request notifications internally. For pass(4) devices, camdd(8) uses a single buffer (CAM_DATA_VADDR) per CAM CCB on the reading side, and a scatter/gather list (CAM_DATA_SG) on the writing side. In addition to testing both interfaces, this makes any potential reblocking of I/O easier.
No data is copied between the reader and the writer, but rather the reader's buffers are split into multiple I/O requests or combined into a single I/O request depending on the input and output blocksize. For the file I/O path, camdd(8) also uses a single buffer (read(2), write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list (readv(2), writev(2), preadv(2), pwritev(2)) on writes. Things that would be nice to do for camdd(8) eventually: 1. Add support for I/O pattern generation. Patterns like all zeros, all ones, LBA-based patterns, random patterns, etc. Right now you can always use /dev/zero, /dev/random, etc. 2. Add support for a "sink" mode, so we do only reads with no writes. Right now, you can use /dev/null. 3. Add support for automatic queue depth probing, so that we can figure out the right queue depth on the input and output side for maximum throughput. At the moment it defaults to 6. 4. Add support for SATA device passthrough I/O. 5. Add support for random LBAs and/or lengths on the input and output sides. 6. Track average per-I/O latency and busy time. The busy time and latency could also feed in to the automatic queue depth determination. sys/cam/scsi/scsi_pass.h: Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue and fetch asynchronous CAM CCBs respectively. Although these ioctls do not have a declared argument, they both take a union ccb pointer. If we declare a size here, the ioctl code in sys/kern/sys_generic.c will malloc and free a buffer for either the CCB or the CCB pointer (depending on how it is declared). Since we have to keep a copy of the CCB (which is fairly large) anyway, having the ioctl malloc and free a CCB for each call is wasteful. sys/cam/scsi/scsi_pass.c: Add asynchronous CCB support. Add two new ioctls, CAMIOQUEUE and CAMIOGET. CAMIOQUEUE adds a CCB to the incoming queue.
The CCB is executed immediately (and moved to the active queue) if it is an immediate CCB, but otherwise it will be executed in passstart() when a CCB is available from the transport layer. When CCBs are completed (because they are immediate or passdone() if they are queued), they are put on the done queue. If we get the final close on the device before all pending I/O is complete, all active I/O is moved to the abandoned queue and we increment the peripheral reference count so that the peripheral driver instance doesn't go away before all pending I/O is done. The new passcreatezone() function is called on the first call to the CAMIOQUEUE ioctl on a given device to allocate the UMA zones for I/O requests and S/G list buffers. This may be good to move off to a taskqueue at some point. The new passmemsetup() function allocates memory and scatter/gather lists to hold the user's data, and copies in any data that needs to be written. For virtual pointers (CAM_DATA_VADDR), the kernel buffer is malloced from the new pass(4) driver malloc bucket. For virtual scatter/gather lists (CAM_DATA_SG), buffers are allocated from a new per-pass(9) UMA zone in MAXPHYS-sized chunks. Physical pointers are passed in unchanged. We have support for up to 16 scatter/gather segments (for the user and kernel S/G lists) in the default struct pass_io_req, so requests with longer S/G lists require an extra kernel malloc. The new passcopysglist() function copies a user scatter/gather list to a kernel scatter/gather list. The number of elements in each list may be different, but (obviously) the amount of data stored has to be identical. The new passmemdone() function copies data out for the CAM_DATA_VADDR and CAM_DATA_SG cases. The new passiocleanup() function restores data pointers in user CCBs and frees memory. Add new functions to support kqueue(2)/kevent(2): passreadfilt() tells kevent whether or not the done queue is empty. passkqfilter() adds a knote to our list. 
passreadfiltdetach() removes a knote from our list. Add a new function, passpoll(), for poll(2)/select(2) to use. Add devstat(9) support for the queued CCB path. sys/cam/ata/ata_da.c: Add support for the BIO_VLIST bio type. sys/cam/cam_ccb.h: Add a new enumeration for the xflags field in the CCB header. (This doesn't change the CCB header, just adds an enumeration to use.) sys/cam/cam_xpt.c: Add a new function, xpt_setup_ccb_flags(), that allows specifying CCB flags. sys/cam/cam_xpt.h: Add a prototype for xpt_setup_ccb_flags(). sys/cam/scsi/scsi_da.c: Add support for BIO_VLIST. sys/dev/md/md.c: Add BIO_VLIST support to md(4). sys/geom/geom_disk.c: Add BIO_VLIST support to the GEOM disk class. Re-factor the I/O size limiting code in g_disk_start() a bit. sys/kern/subr_bus_dma.c: Change _bus_dmamap_load_vlist() to take a starting offset and length. Add a new function, _bus_dmamap_load_pages(), that will load a list of physical pages starting at an offset. Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios. Allow unmapped I/O to start at an offset. sys/kern/subr_uio.c: Add two new functions, physcopyin_vlist() and physcopyout_vlist(). sys/pc98/include/bus.h: Guard kernel-only parts of the pc98 machine/bus.h header with #ifdef _KERNEL. This allows userland programs to include <machine/bus.h> to get the definition of bus_addr_t and bus_size_t. sys/sys/bio.h: Add a new bio flag, BIO_VLIST. sys/sys/uio.h: Add prototypes for physcopyin_vlist() and physcopyout_vlist(). share/man/man4/pass.4: Document the CAMIOQUEUE and CAMIOGET ioctls. usr.sbin/Makefile: Add camdd. usr.sbin/camdd/Makefile: Add a makefile for camdd(8). usr.sbin/camdd/camdd.8: Man page for camdd(8). usr.sbin/camdd/camdd.c: The new camdd(8) utility. Sponsored by: Spectra Logic MFC after: 1 week	2015-12-03 20:54:55 +00:00
Steven Hartland	86787e8d97	Fix early kernel dump via dumpdev env Setting the dumpdev via env e.g. loader.conf provides the ability to configure the kernel dump device during early boot. When using this g_io_getattr was returning EPERM due to cp->acr == 0. Fix this by calling g_access to ensure we're a read consumer prior to calling g_dev_setdumpdev. MFC after: 2 weeks Sponsored by: Multiplay	2015-11-17 20:55:50 +00:00
Steven Hartland	2dc7e36b0b	Fix g_eli error loss conditions * Ensure that error information isn't lost. * Log the error code in all cases. * Don't overwrite bio_completed set to 0 from the error condition. MFC after: 2 weeks Sponsored by: Multiplay	2015-11-05 17:37:35 +00:00
Alexander Motin	4a3760bae6	Remove compatibility shims for legacy ATA device names. We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely removed old stack at 10.x, so at 11.x it is time to remove compat shims.	2015-10-11 13:01:51 +00:00
Edward Tomasz Napierala	45d7de1d37	Make geom_nop(4) collect statistics on all types of BIOs, not just reads and writes. PR: kern/198405 Submitted by: Matthew D. Fuller <fullermd at over-yonder dot net> MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3679	2015-10-10 09:03:31 +00:00
Conrad Meyer	b4d7290796	geom_dev: Use kenv 'dumpdev' in the same way as rc/etc.d/dumpon Skip a /dev/ prefix, if one is present, when checking for matching device names for dump. Suggested by: avg Reviewed by: markj Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3725	2015-09-23 21:08:52 +00:00
Edward Tomasz Napierala	bb27d7ed90	Add a way to specify stripesize and stripeoffset to gnop(8). This makes it possible to "simulate" 4K media, to eg test alignment handling. Reviewed by: mav@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3664	2015-09-15 18:01:59 +00:00
Warner Losh	3f2e5b8584	After the introduction of direct dispatch, the pacing code in g_down() broke in two ways. One, the pacing variable was accessed in multiple threads in an unsafe way. Two, since large numbers of I/O could come down from the buf layer at one time, large numbers of allocation failures could happen all at once, resulting in a huge pace value that would limit I/Os to 10 IOPS for minutes (or even hours) at a time. While a real solution to these problems requires substantial work (to go to a no-allocation after the first model, or to have some way to wait for more memory with some kind of reserve for pager and swapper requests), it is relatively easy to make this simplistic pacing less pathological. Move to using a volatile variable with loads and stores. While this is a little racy, losing the race is safe: either you get memory and proceed, or you don't and queue. Second, sleep for 1ms (or one tick, whichever is larger) instead of 100ms. This removes the artificial 10 IOPS limit while still easing up on new I/Os during memory shortages. Remove tying the amount of time we do this to the number of failed requests and do it only as long as we keep failing requests. Finally, to avoid needless recursion when memory is tight (start -> g_io_deliver() -> g_io_request() -> start -> ... until we use 1/2 the stack), don't do direct dispatch while pacing. This should be a rare event (not steady state) so the performance hit here is worth the extra safety of not starving g_down() with directly dispatched I/O. Differential Review: https://reviews.freebsd.org/D3546	2015-09-02 17:29:30 +00:00
Justin Hibbits	6aabc119b6	Create a RouterBoard platform and use it to create a flash map Summary: The RouterBoard uses a predefined partition map which doesn't exist in the fdt. This change allows overriding the fdt slicer with a custom slicer, and uses this custom slicer to define the flash map on the RouterBoard RB800. D3305 converts the mpc85xx platform into a base class, so that systems based on the mpc85xx platform can add their own overrides. This change builds on D3305, and creates a RouterBoard (RB800) platform to initialize the slicer override. Reviewed By: nwhitehorn, imp Differential Revision: https://reviews.freebsd.org/D3345	2015-08-22 05:50:18 +00:00
Pedro F. Giffuni	6bc3fe5f4e	Clean out some externally visible "more then" grammar MFC after: 3 days	2015-08-11 03:12:09 +00:00
Enji Cooper	604083d74c	Make some debug printf's into DPRINTF's to reduce noise on attach/detach Similar reasoning to what was done in r286367 with geom_uzip(4) MFC after: 2 weeks Differential Revision: D3320 Sponsored by: EMC / Isilon Storage Division	2015-08-09 06:58:06 +00:00
Pawel Jakub Dawidek	46e3447026	Enable BIO_DELETE passthru in GELI, so TRIM/UNMAP can work as expected when GELI is used on an SSD or inside a virtual machine, so that guest can tell host that it is no longer using some of the storage. Enabling BIO_DELETE passthru comes with a small security consequence - an attacker can tell how much space is being really used on encrypted device and has less data to analyse then. This is why the -T option can be given to the init subcommand to turn off this behaviour and -t/T options for the configure subcommand can be used to adjust this setting later. PR: 198863 Submitted by: Matthew D. Fuller fullermd at over-yonder dot net This commit also includes a fix from Fabian Keil freebsd-listen at fabiankeil.de for 'configure' on onetime providers which is not strictly related, but is entangled in the same code, so would cause conflicts if separated out.	2015-08-08 09:51:38 +00:00
Konstantin Belousov	347e9d5495	Minor style cleanup of the code surrounding r286404. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-08-07 08:24:12 +00:00
Konstantin Belousov	9b34965019	The condition to use direct processing for the unmapped bio is reverted. We can do direct processing when g_io_check() does not need to perform transient remapping of the bio, otherwise the thread has to sleep. Reviewed by: mav (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-08-07 08:13:34 +00:00
Pawel Jakub Dawidek	5ee9ea19fe	After crypto_dispatch() bio might be already delivered and destroyed, so we cannot access it anymore. Setting an error later led to memory corruption. Assert that crypto_dispatch() was successful. It can fail only if we pass a bogus crypto request, which is a bug in the program, not a runtime condition. PR: 199705 Submitted by: luke.tw Reviewed by: emaste MFC after: 3 days	2015-08-06 17:13:34 +00:00
Enji Cooper	fcc8461cfb	Make some debug printf's into DPRINTF's to reduce noise on attach/detach Differential Revision: https://reviews.freebsd.org/D3306 MFC after: 1 week Reviewed by: loos Sponsored by: EMC / Isilon Storage Division	2015-08-06 15:30:14 +00:00
Edward Tomasz Napierala	72800098bf	Fix panic triggered by code like this: open("/dev/md0", O_EXEC); Discussed with: kib@, mav@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3051	2015-08-04 10:40:08 +00:00
Edward Tomasz Napierala	d6cc35b287	Fix panic that would happen on forcibly unmounting devfs (note that as it is now, devfs ignores MNT_FORCE anyway, so it needs to be modified to trigger the panic) with consumers still opened. Note that this still results in a leak of r/w/e counters. It seems to be harmless, though. If anyone knows a better way to approach this - please tell. Discussed with: kib@, mav@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3050	2015-08-03 16:35:18 +00:00
Andrey V. Elsukov	da6c24e123	Report the scheme and provider names in warning message about unaligned partition. PR: 201873 MFC after: 1 week	2015-07-26 11:16:48 +00:00
Allan Jude	ce808c7ad8	Add a new option to gpart(8) to fix Lenovo BIOS boot issue PR: 184910 Reviewed by: ae, wblock Approved by: marcel MFC after: 3 days Relnotes: yes Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D3065	2015-07-15 02:23:55 +00:00
Pawel Jakub Dawidek	4273d41299	Spoil event can happen for some time now even on providers opened exclusively (on the media change event). Update GELI to handle that situation. PR: 201185 Submitted by: Matthew D. Fuller	2015-07-10 19:27:19 +00:00
Pawel Jakub Dawidek	fefb6a143a	Properly propagate errors in metadata reading. PR: 198860 Submitted by: Matthew D. Fuller	2015-07-02 10:57:34 +00:00
Pawel Jakub Dawidek	edaa9008ff	Allow to omit keyfile number for the first keyfile.	2015-07-02 10:55:32 +00:00
Edward Tomasz Napierala	628b712826	Fix off-by-one error in fstyp(8) and geom_label(4) that made them use a single space (" ") as a CD9660 label name when no label was present. Similar problem was also present in msdosfs label recognition. PR: 200828 Differential Revision: https://reviews.freebsd.org/D2830 Reviewed by: asomers@, emaste@ MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2015-06-18 21:55:55 +00:00
Andrey V. Elsukov	e7d0c7e458	Teach G_PART_GPT class to handle g_resize_provider event. MFC after: 10 days	2015-06-08 12:52:41 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Andrey V. Elsukov	153c57b5b4	Read GEOM_UNCOMPRESS metadata using several requests that fit into MAXPHYS. For large compressed images the metadata size can be bigger than MAXPHYS and this triggers KASSERT in g_read_data(). Also use g_free() to free memory allocated by g_read_data(). PR: 199476 MFC after: 2 weeks	2015-05-19 09:28:52 +00:00
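
The chunked metadata read described in the GEOM_UNCOMPRESS commit above might be sketched roughly as follows. This is a userland illustration only, not the kernel code: `CHUNK_MAX` stands in for MAXPHYS (its value here is an assumption), and `read_fn`, `read_in_chunks()` and `mem_read()` are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed stand-in for MAXPHYS; the real value depends on the platform. */
#define CHUNK_MAX ((size_t)128 * 1024)

/*
 * Hypothetical reader callback: copy len bytes starting at offset off
 * into dst.  Returns 0 on success, nonzero on failure.
 */
typedef int (*read_fn)(void *ctx, size_t off, void *dst, size_t len);

/*
 * Read total bytes starting at off into dst, issuing requests of at most
 * CHUNK_MAX bytes each.  Returns the number of requests issued, or -1 if
 * any single request fails.
 */
static int
read_in_chunks(read_fn rd, void *ctx, size_t off, void *dst, size_t total)
{
	unsigned char *p = dst;
	int nreq = 0;

	assert(rd != NULL);
	while (total > 0) {
		size_t len = total < CHUNK_MAX ? total : CHUNK_MAX;

		if (rd(ctx, off, p, len) != 0)
			return (-1);
		off += len;
		p += len;
		total -= len;
		nreq++;
	}
	return (nreq);
}

/* Example reader backed by an in-memory buffer passed as ctx. */
static int
mem_read(void *ctx, size_t off, void *dst, size_t len)
{
	memcpy(dst, (const unsigned char *)ctx + off, len);
	return (0);
}
```

With the assumed 128 KiB limit, a 300 KiB region would be fetched in three requests (128 KiB, 128 KiB and 44 KiB), so no single request exceeds the limit.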
Andrey V. Elsukov	4b8d4f97b0	Add apple-boot, apple-hfs and apple-ufs aliases to MBR scheme. Sort DOSPTYP_* entries in diskmbr.h by value. Document these scheme-specific types in gpart(8). MFC after: 1 week	2015-05-05 09:33:02 +00:00
Craig Rodrigues	d9db52256e	Move zlib.c from net to libkern. It is not network-specific code and would be better as part of libkern instead. Move zlib.h and zutil.h from net/ to sys/ Update includes to use sys/zlib.h and sys/zutil.h instead of net/ Submitted by: Steve Kiernan stevek@juniper.net Obtained from: Juniper Networks, Inc. GitHub Pull Request: https://github.com/freebsd/freebsd/pull/28 Relnotes: yes	2015-04-22 14:38:58 +00:00
Pedro F. Giffuni	4a5e6b854d	g_uncompress_taste: prevent a double free. Found by: Clang Static Analyzer MFC after: 1 week	2015-04-20 16:31:27 +00:00
Alexander Motin	0ada3afc25	Remove sleeps from geom_up thread on device destruction. MFC after: 3 days.	2015-04-09 13:09:05 +00:00
Alexander Motin	5d85cd2d11	Remove extra semicolon. MFC after: 1 week	2015-03-27 12:45:20 +00:00
Alexander Motin	3ab0187add	Remove request sorting from GEOM_MIRROR and GEOM_RAID. When CPU is not busy, those queues are typically empty. When CPU is busy, then one more extra sorting is the last thing it needs. If specific device (HDD) really needs sorting, then it will be done later by CAM. This is supposed to fix livelock reported for mirror of two SSDs, when UFS fires zillion of BIO_DELETE requests, that totally blocks I/O subsystem by pointless sorting of requests and responses under single mutex lock. MFC after: 2 weeks	2015-03-27 12:44:28 +00:00
Alexander Motin	41fe4ba647	Fix bug on memory allocation error in split method. While there, use bioq_takefirst() in place where it is convenient. MFC after: 1 week	2015-03-27 11:14:12 +00:00
Alexander Motin	5523c82c1a	Make GEOM_PART work in presence of previous withered self. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2015-03-26 12:17:47 +00:00
Alexander Motin	2f36085dcf	Report withered providers as such alike to GEOMs. MFC after: 2 weeks	2015-03-26 11:19:24 +00:00
Alexander Motin	ba772028db	When searching for provider by name, prefer non-withered one. MFC after: 2 weeks	2015-03-26 11:02:29 +00:00
Adrian Chadd	28d507fcec	Fix the label search routine in geom_map to not trip up on '\0' bytes. * Just do the buf check early and fail out * If the offset being searched is: 00110000 00 b5 7e 45 61 e2 76 d3 c1 78 dd 15 95 cd 1f f1 |..~Ea.v..x......| .. and the match string is '.!/bin/sh' .. then it'll set the match string[0] to '\0', do a strncmp() against the read buffer, find it's matching two zero-length strings, and think that's where to start. MFC after: 2 weeks	2015-03-19 03:58:25 +00:00
Andrey V. Elsukov	4fb4ebe0a4	Add GUID and alias for Apple Core Storage partition. PR: 196241 MFC after: 1 week	2015-03-12 18:51:31 +00:00
Alexander Motin	7715befdf2	Fix couple BIO_DELETE bugs in geom_mirror. Do not report GEOM::candelete if none of providers support BIO_DELETE. If consumer still requests BIO_DELETE, report error instead of hanging. MFC after: 2 weeks	2015-03-12 10:20:53 +00:00
Alexander Motin	0b1b7c2cec	Replace constant with proper sizeof(). Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> MFC after: 2 weeks	2015-02-25 10:18:11 +00:00
Edward Tomasz Napierala	01de1a0650	Add devd(8) notifications for creation and destruction of GEOM devices. Differential Revision: https://reviews.freebsd.org/D1211 MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-01-14 11:15:57 +00:00
Warner Losh	a91275f72f	Remove old ioctl use and support, once and for all.	2015-01-06 05:28:37 +00:00
Warner Losh	0acf08d985	Remove support for FreeBSD 7 and really old FreeBSD 8. The classifiers have been in the base for a while, so the gymnastics here aren't needed. In addition, the bugs in subr_disk.c have been fixed since 2009, so there's no need for an identical copy of it in the tree anymore. There's really no need to binary patch g_io_request, so let's get rid of the code (not compiled in anymore) lest others think it is a good idea.	2014-12-20 00:04:01 +00:00
John-Mark Gurney	08fca7a56b	Add some new modes to OpenCrypto. These modes are AES-ICM (can be used for counter mode), and AES-GCM. Both of these modes have been added to the aesni module. Included is a set of tests to validate that the software and aesni module calculate the correct values. These use the NIST KAT test vectors. To run the test, you will need to install a soon to be committed port, nist-kat that will install the vectors. Using a port is necessary as the test vectors are around 25MB. All the man pages were updated. I have added a new man page, crypto.7, which includes a description of how to use each mode. All the new modes and some other AES modes are present. It would be good for someone else to go through and document the other modes. A new ioctl was added to support AEAD modes which AES-GCM is one of them. Without this ioctl, it is not possible to test AEAD modes from userland. Add a timing safe bcmp for use to compare MACs. Previously we were using bcmp which could leak timing info and result in the ability to forge messages. Add a minor optimization to the aesni module so that single segment mbufs don't get copied and instead are updated in place. The aesni module needs to be updated to support blocked IO so segmented mbufs don't have to be copied. We require that the IV be specified for all calls for both GCM and ICM. This is to ensure proper use of these functions. Obtained from: p4: //depot/projects/opencrypto Relnotes: yes Sponsored by: FreeBSD Foundation Sponsored by: NetGate	2014-12-12 19:56:36 +00:00
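
The timing-safe MAC comparison mentioned in the commit above can be illustrated with a minimal sketch. This is not the routine that commit added; `ct_bytes_differ()` is a hypothetical name, and the code only shows the general idea of examining every byte regardless of where the first mismatch occurs, in contrast to a comparison that returns at the first differing byte.

```c
#include <stddef.h>

/*
 * Hypothetical constant-time comparison of two n-byte buffers.
 * Returns 0 if they are equal and 1 otherwise.  The loop never exits
 * early, so the running time does not depend on the position of the
 * first differing byte.
 */
static int
ct_bytes_differ(const void *a, const void *b, size_t n)
{
	const unsigned char *pa = a;
	const unsigned char *pb = b;
	unsigned char diff = 0;
	size_t i;

	for (i = 0; i < n; i++)
		diff |= pa[i] ^ pb[i];	/* accumulate any differing bits */
	return (diff != 0);
}
```

A caller verifying a MAC would treat a nonzero return as a verification failure. Whether a given compiler preserves the constant-time property is an assumption that would need checking for production use.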
Alexander Motin	1e68fe9c33	Avoid unneeded malloc/memcpy/free if there is no metadata on disk. Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> MFC after: 2 weeks	2014-12-05 10:23:18 +00:00
Alexander Motin	26f0f92fa2	Decode some binary fields of Intel metadata. Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> MFC after: 2 weeks	2014-12-04 15:54:45 +00:00
Warner Losh	66cc25a224	Actually, that was a bad idea. Go back to MAXPARTITIONS. Submitted by: bruce	2014-11-20 17:31:25 +00:00
Warner Losh	dd87e2c610	The number of BSD partitions is variable. Return the proper number (which is in basetable->gpt_entries). Submitted by: ae@	2014-11-19 18:55:27 +00:00
Warner Losh	73f49e9eef	Implement the historic DIOCGDINFO ioctl for gpart on BSD partitions. Several utilities still use this interface and require additional information since gpart was activated than before. This allows fsck of a UFS partition without having to specify it is UFS, per historic behavior.	2014-11-18 17:06:40 +00:00
Pawel Jakub Dawidek	5ebb15b942	Add missing privilege check when setting the dump device. Before that change it was possible for a regular user to setup the dump device if he had write access to the given device. In theory it is a security issue as user might get access to kernel's memory after provoking kernel crash, but in practise it is not recommended to give regular users direct access to storage devices. Rework the code so that we do privileges check within the set_dumper() function to avoid similar problems in the future. Discussed with: secteam	2014-11-11 04:48:09 +00:00
Dag-Erling Smørgrav	133cdd9e13	Constify the AES code and propagate to consumers. This allows us to update the Fortuna code to use SHAd-256 as defined in FS&K. Approved by: so (self)	2014-11-10 09:44:38 +00:00
Poul-Henning Kamp	cd15a01091	Translate the errno to gctl_error() texts. Spotted by: mwlucas	2014-11-09 15:52:11 +00:00
Alexander Motin	c3e7ba3e6d	Add to CTL support for logical block provisioning threshold notifications. For ZVOL-backed LUNs this allows to inform initiators if storage's used or available spaces get above/below the configured thresholds. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2014-11-06 00:48:36 +00:00
Alexander Motin	ccf8a5688a	Revert somewhat hackish geom_disk optimization, committed as part of r256880, and the following r273143 commit, supposed to workaround introduced issue by quite innocent-looking change. While there is no clear understanding why, but r273143 is accused in data corruption in some environments with high I/O load. I personally don't see any problem in that commit, and possibly it is just a trigger to some other bug somewhere, but better safe than sorry for now. Requested by: scottl@ MFC after: 3 days	2014-10-25 15:16:19 +00:00
Colin Percival	66427784c1	Populate the GELI passphrase cache with the kern.geom.eli.passphrase variable (if any) provided in the boot environment. Unset it from the kernel environment after doing this, so that the passphrase is no longer present in kernel memory once we enter userland. This will make it possible to provide a GELI passphrase via the boot loader; FreeBSD's loader does not yet do this, but GRUB (and PCBSD) will have support for this soon. Tested by: kmoore	2014-10-22 23:41:15 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Andrey V. Elsukov	52fa0beb0a	Add provider's sectorsize and stripesize to confdot output. Submitted by: rpokala at panasas.com	2014-10-17 06:58:04 +00:00
Davide Italiano	2be111bf7d	Follow up to r225617. In order to maximize the re-usability of kernel code in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe	2014-10-16 18:04:43 +00:00
Andrey V. Elsukov	0478dc0c16	Add an ability to set dumpdev via loader(8) tunable. MFC after: 3 weeks	2014-10-08 12:18:16 +00:00
Hiroki Sato	d17183901f	Fix a bug in r272297 which prevented dumpdev from setting. !u is not equivalent to (u != 0).	2014-10-03 04:13:25 +00:00
Pawel Jakub Dawidek	227f68edbb	Be prepared that set_dumper() might fail even when resetting it or prefix the call with (void) to document that we intentionally ignore the return value - no way to handle an error in case of device disappearing.	2014-09-30 12:00:50 +00:00
Pawel Jakub Dawidek	7f5b50719b	Style fixes.	2014-09-30 11:51:32 +00:00
Colin Percival	835c4dd436	Cache GELI passphrases entered at the console during the boot process, in order to improve user-friendliness when a system has multiple disks encrypted using the same passphrase. When examining a new GELI provider, the most recently used passphrase will be attempted before prompting for a passphrase; and whenever a passphrase is entered, it is cached for later reference. When the root disk is mounted, the cached passphrase is zeroed (triggered by the "mountroot" event), in order to minimize the possibility of leakage of passphrases. (After root is mounted, the "taste and prompt for passphrases on the console" code path is disabled, so there is no potential for a passphrase to be stored after the zeroing takes place.) This behaviour can be disabled by setting kern.geom.eli.boot_passcache=0. Reviewed by: pjd, dteske, allanjude MFC after: 7 days	2014-09-16 08:40:52 +00:00
Sean Bruno	5f23eb4d9c	Add device name used in geom_map verbose output. This helps when using geom_map with multiple flash/spi devices. Phabric: https://reviews.freebsd.org/D766 Reviewed by: adrian MFC after: 2 weeks	2014-09-11 22:39:27 +00:00
John-Mark Gurney	89fac384c8	use a straight buffer instead of an iov w/ 1 segment... The aesni driver when it hits a mbuf/iov buffer, it mallocs and copies the data for processing.. This improves perf by ~8-10% on my machine... I have thoughts of fixing AES-NI so that it can better handle segmented buffers, which should help improve IPSEC performance, but that is for the future...	2014-09-04 23:53:51 +00:00
Scott Long	274919e965	Deal explicitly with possible failures of make_dev_alias_p() in GEOM. Submitted by: Mariusz Zaborski <oshogbo@FreeBSD.org> MFC after: 3 days	2014-08-18 19:27:47 +00:00
Andrey V. Elsukov	36b16d1f7d	Turn off kern.geom.part.mbr.enforce_chs by default.	2014-08-12 10:31:31 +00:00
Andrey V. Elsukov	fb86534cb1	Add sysctl and loader tunable kern.geom.part.mbr.enforce_chs that is set by default. It can be used to disable automatic alignment to CHS geometry, that GEOM_PART_MBR does. Reviewed by: wblock MFC after: 1 week	2014-08-12 09:10:13 +00:00
Warner Losh	cba7d97b61	cswitch is unsigned, so don't compare it < 0. Any negative numbers will look huge and be caught by > 100.	2014-08-07 21:56:42 +00:00
Warner Losh	86e26cb154	Unsigned values can never be less than 0.	2014-08-07 21:56:37 +00:00
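
The point made by the two commits above about unsigned comparisons can be shown with a small sketch. The function name `pct_in_range()` and the 0 to 100 range are illustrative assumptions, not taken from the changed code: a value converted from a negative number wraps to a very large unsigned value, so the upper-bound test alone rejects it.

```c
#include <stdbool.h>

/*
 * Hypothetical range check for an unsigned value expected to lie in
 * 0..100.  A "v < 0" test is omitted because it can never be true for
 * an unsigned type; wrapped "negative" inputs appear huge and fail the
 * upper-bound test instead.
 */
static bool
pct_in_range(unsigned int v)
{
	return (v <= 100);
}
```

For example, `pct_in_range((unsigned int)-1)` is false, since the converted value is the largest `unsigned int`.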
Marcel Moolenaar	6c25615f39	In r264504, we prevented doing I/O for more than MAXPHYS by making the assumption that consumers would respect bio_completed and/or bio_resid to detect short reads. This assumption proved false and file corruption was the result. Create as many bios as we need to satisfy the original request. Check the cached chunk every time we need to do I/O to increase the hit rate. Obtained from: Juniper Networks, Inc. MFC after: 1 week	2014-07-22 17:30:05 +00:00
Nathan Whitehorn	1ee0f08975	After EFI support was added to the installer, it needed to allow boot partitions of types other than "freebsd-boot" (in particular, "efi"). This allows the removal of some nasty hacks for supporting PowerPC systems, in particular aliasing freebsd-boot to apple-boot on APM and an IBM-specific code on MBR. This changes the installer to use the correct names, which also breaks a degeneracy in the meaning of "freebsd-boot" that allows the addition of support for some newer IBM systems that can boot from GPT in addition to MBR. Since I have no idea how to detect which those systems are, leave the default on IBM PPC systems as MBR for now.	2014-07-04 15:55:32 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Andrey V. Elsukov	91ca76a590	Add disklabel64 support to GEOM_PART class. This partitioning scheme is used in DragonFlyBSD. It is similar to BSD disklabel, but has the following improvements: * metadata has own dedicated place and isn't accessible through partitions; * all offsets are 64-bit; * supports 16 partitions by default (has reserved place for more); * has reserved place for backup label (but not yet implemented); * has UUIDs for partitions and partition types; No objections from: geom MFC after: 2 weeks Relnotes: yes	2014-06-11 10:42:34 +00:00
Andrey V. Elsukov	4042ab48c7	Allow swapping to DragonFlyBSD's swap partition. MFC after: 2 weeks	2014-06-11 10:23:49 +00:00
Andrey V. Elsukov	0640b71dfe	Add aliases for DragonFlyBSD's partition types. MFC after: 2 weeks	2014-06-11 10:19:11 +00:00
Brad Davis	ebd05adab8	- Fix the keyfile being cleared prematurely after r259428 PR: 185084 Submitted by: fk@fabiankeil.de Reviewed by: pjd@	2014-06-06 03:17:37 +00:00
Andrey V. Elsukov	39dcac849e	Use g_conf_printf_escaped() to escape symbols, which can break an XML tree. MFC after: 1 week	2014-05-30 10:35:51 +00:00
Andrey V. Elsukov	17e0c43319	Add a topology trace to the g_spoil_event. MFC after: 1 week	2014-05-19 16:08:15 +00:00
Andrey V. Elsukov	362073c089	We have two functions from where a geom orphan method could be called: g_orphan_register and g_resize_provider_event. Both are called from the event queue. Also we have GEOM_DEV class, which does deferred destroy for its consumers via g_dev_destroy (also called from the event queue). So it is possible, that for some consumers an orphan method will be called twice. This triggers panic in g_dev_orphan. Check that consumer isn't already orphaned before calling the orphan method. MFC after: 2 weeks	2014-05-19 16:05:42 +00:00
Alexander Motin	413037c8e7	Make GEOM DISK to account also BIO_FLUSH operations.	2014-05-17 15:07:00 +00:00
Andrey V. Elsukov	579259ea0d	It is safe to allow shrinking, when aligned size is bigger than current. Tested by: jmg MFC after: 1 week	2014-05-07 11:18:27 +00:00
Edward Tomasz Napierala	c7c7d7d0f0	Make r242379 - the fix for UFS labels disappearing after resizing the provider - also apply to UFS1 filesystems. This should help with resizing filesystems created by makefs(8), which still uses UFS1. Tested by: jmg@ Sponsored by: The FreeBSD Foundation	2014-05-05 09:20:30 +00:00
Andrey V. Elsukov	4f31a94bd2	Add an advice what to do when partition was automatically resized. X-MFC after: r256690	2014-05-04 20:00:08 +00:00
Andrey V. Elsukov	c778397f26	Add better error description for case when we are doing resize and scheme-specific method returns EBUSY. MFC after: 1 week	2014-05-04 16:55:51 +00:00
Andrey V. Elsukov	0dd7f00cee	Prevent an unexpected shrinking on resizing due to alignment for MBR, PC98 and VTOC8 schemes. Reported by: jmg MFC after: 1 week	2014-05-04 16:43:57 +00:00
Andrey V. Elsukov	bc1e8f56ff	For schemes that do an automatic partition aligning move this code to separate function. MFC after: 1 week	2014-05-04 10:14:25 +00:00
Luiz Otavio O Souza	81694cde44	Fix a leak in g_uzip_taste(). After retrieving all the block offsets from the uzip image, free the last data read.	2014-05-01 15:23:20 +00:00

2247 Commits