Commit Graph

1987 Commits

Author SHA1 Message Date
Dag-Erling Smørgrav
133cdd9e13 Constify the AES code and propagate to consumers. This allows us to
update the Fortuna code to use SHAd-256 as defined in FS&K.

Approved by:	so (self)
2014-11-10 09:44:38 +00:00
Poul-Henning Kamp
cd15a01091 Translate the errno to gctl_error() texts.
Spotted by:	mwlucas
2014-11-09 15:52:11 +00:00
Alexander Motin
c3e7ba3e6d Add to CTL support for logical block provisioning threshold notifications.
For ZVOL-backed LUNs this allows to inform initiators if storage's used or
available spaces get above/below the configured thresholds.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-11-06 00:48:36 +00:00
Alexander Motin
ccf8a5688a Revert somewhat hackish geom_disk optimization, committed as part of r256880,
and the following r273143 commit, supposed to workaround introduced issue by
quite innocent-looking change.

While there is no clear understanding why, but r273143 is accused in data
corruption in some environments with high I/O load.  I personally don't see
any problem in that commit, and possibly it is just a trigger to some other
bug somewhere, but better safe then sorry for now.

Requested by:	scottl@
MFC after:	3 days
2014-10-25 15:16:19 +00:00
Colin Percival
66427784c1 Populate the GELI passphrase cache with the kern.geom.eli.passphrase
variable (if any) provided in the boot environment.  Unset it from
the kernel environment after doing this, so that the passphrase is
no longer present in kernel memory once we enter userland.

This will make it possible to provide a GELI passphrase via the boot
loader; FreeBSD's loader does not yet do this, but GRUB (and PCBSD)
will have support for this soon.

Tested by:	kmoore
2014-10-22 23:41:15 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Andrey V. Elsukov
52fa0beb0a Add provider's sectorsize and stripesize to confdot output.
Submitted by:	rpokala at panasas.com
2014-10-17 06:58:04 +00:00
Davide Italiano
2be111bf7d Follow up to r225617. In order to maximize the re-usability of kernel code
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.

Submitted by:   kmacy
Tested by:      make universe
2014-10-16 18:04:43 +00:00
Andrey V. Elsukov
0478dc0c16 Add an ability to set dumpdev via loader(8) tunable.
MFC after:	3 weeks
2014-10-08 12:18:16 +00:00
Hiroki Sato
d17183901f Fix a bug in r272297 which prevented dumpdev from setting.
!u is not equivalent to (u != 0).
2014-10-03 04:13:25 +00:00
Pawel Jakub Dawidek
227f68edbb Be prepared that set_dumper() might fail even when resetting it or prefix
the call with (void) to document that we intentionally ignore the return
value - no way to handle an error in case of device disappearing.
2014-09-30 12:00:50 +00:00
Pawel Jakub Dawidek
7f5b50719b Style fixes. 2014-09-30 11:51:32 +00:00
Colin Percival
835c4dd436 Cache GELI passphrases entered at the console during the boot process,
in order to improve user-friendliness when a system has multiple disks
encrypted using the same passphrase.

When examining a new GELI provider, the most recently used passphrase
will be attempted before prompting for a passphrase; and whenever a
passphrase is entered, it is cached for later reference.  When the root
disk is mounted, the cached passphrase is zeroed (triggered by the
"mountroot" event), in order to minimize the possibility of leakage
of passphrases.  (After root is mounted, the "taste and prompt for
passphrases on the console" code path is disabled, so there is no
potential for a passphrase to be stored after the zeroing takes place.)

This behaviour can be disabled by setting kern.geom.eli.boot_passcache=0.

Reviewed by:	pjd, dteske, allanjude
MFC after:	7 days
2014-09-16 08:40:52 +00:00
Sean Bruno
5f23eb4d9c Add device name used in geom_map verbose output. This helps when using
geom_map with multiple flash/spi devices.

Phabric:  https://reviews.freebsd.org/D766
Reviewed by:	adrian
MFC after:	2 weeks
2014-09-11 22:39:27 +00:00
John-Mark Gurney
89fac384c8 use a straight buffer instead of an iov w/ 1 segment... The aesni
driver when it hits a mbuf/iov buffer, it mallocs and copies the data
for processing..  This improves perf by ~8-10% on my machine...

I have thoughts of fixing AES-NI so that it can better handle segmented
buffers, which should help improve IPSEC performance, but that is for
the future...
2014-09-04 23:53:51 +00:00
Scott Long
274919e965 Deal explicitly with possible failures of make_dev_alias_p() in GEOM.
Submitted by:   Mariusz Zaborski <oshogbo@FreeBSD.org>
MFC after:      3 days
2014-08-18 19:27:47 +00:00
Andrey V. Elsukov
36b16d1f7d Turn off kern.geom.part.mbr.enforce_chs by default. 2014-08-12 10:31:31 +00:00
Andrey V. Elsukov
fb86534cb1 Add sysctl and loader tunable kern.geom.part.mbr.enforce_chs that is set
by default. It can be used to disable automatic alignment to CHS geometry,
that GEOM_PART_MBR does.

Reviewed by:	wblock
MFC after:	1 week
2014-08-12 09:10:13 +00:00
Warner Losh
cba7d97b61 cswitch is unsigned, so don't compare it < 0. Any negative numbers
will look huge and be caught by > 100.
2014-08-07 21:56:42 +00:00
Warner Losh
86e26cb154 Unsigned values can never be less than 0. 2014-08-07 21:56:37 +00:00
Marcel Moolenaar
6c25615f39 In r264504, we prevented doing I/O for more than MAXPHYS by making
the assumption that consumers would respect bio_completed and/or
bio_resid to detect short reads. This assumption proved false and
file corruption was the result.
Create as many bios as we need to satisfy the original request.
Check the cached chunk every time we need to do I/O to increase the
hit rate.

Obtained from:	junipre Networks, Inc.
MFC after:	1 week
2014-07-22 17:30:05 +00:00
Nathan Whitehorn
1ee0f08975 After EFI support was added to the installer, it needed to allow boot
partitions of types other than "freebsd-boot" (in particular, "efi").
This allows the removal of some nasty hacks for supporting PowerPC systems,
in particular aliasing freebsd-boot to apple-boot on APM and an IBM-specific
code on MBR.

This changes the installer to use the correct names, which also breaks a
degeneracy in the meaning of "freebsd-boot" that allows the addition
of support for some newer IBM systems that can boot from GPT in addition to
MBR. Since I have no idea how to detect which those systems are, leave
the default on IBM PPC systems as MBR for now.
2014-07-04 15:55:32 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Andrey V. Elsukov
91ca76a590 Add disklabel64 support to GEOM_PART class.
This partitioning scheme is used in DragonFlyBSD. It is similar to
BSD disklabel, but has the following improvements:
* metadata has own dedicated place and isn't accessible through partitions;
* all offsets are 64-bit;
* supports 16 partitions by default (has reserved place for more);
* has reserved place for backup label (but not yet implemented);
* has UUIDs for partitions and partition types;

No objections from:	geom
MFC after:	2 weeks
Relnotes:	yes
2014-06-11 10:42:34 +00:00
Andrey V. Elsukov
4042ab48c7 Allow swapping to DragonFlyBSD's swap partition.
MFC after:	2 weeks
2014-06-11 10:23:49 +00:00
Andrey V. Elsukov
0640b71dfe Add aliases for DragonFlyBSD's partition types.
MFC after:	2 weeks
2014-06-11 10:19:11 +00:00
Brad Davis
ebd05adab8 - Fix the keyfile being cleared prematurely after r259428
PR:		185084
Submitted by:	fk@fabiankeil.de
Reviewed by:	pjd@
2014-06-06 03:17:37 +00:00
Andrey V. Elsukov
39dcac849e Use g_conf_printf_escaped() to escape symbols, which can break
an XML tree.

MFC after:	1 week
2014-05-30 10:35:51 +00:00
Andrey V. Elsukov
17e0c43319 Add a topology trace to the g_spoil_event.
MFC after:	1 week
2014-05-19 16:08:15 +00:00
Andrey V. Elsukov
362073c089 We have two functions from where a geom orphan method could be called:
g_orphan_register and g_resize_provider_event. Both are called from the
event queue. Also we have GEOM_DEV class, which does deferred destroy
for its consumers via g_dev_destroy (also called from the event queue).
So it is possible, that for some consumers an orphan method will be
called twice. This triggers panic in g_dev_orphan.
Check that consumer isn't already orphaned before call orphan method.

MFC after:	2 weeks
2014-05-19 16:05:42 +00:00
Alexander Motin
413037c8e7 Make GEOM DISK to account also BIO_FLUSH operations. 2014-05-17 15:07:00 +00:00
Andrey V. Elsukov
579259ea0d It is safe to allow shrinking, when aligned size is bigger than current.
Tested by:	jmg
MFC after:	1 week
2014-05-07 11:18:27 +00:00
Edward Tomasz Napierala
c7c7d7d0f0 Make r242379 - the fix for UFS labels disappearing after resizing
the provider - also apply to UFS1 filesystems.  This should help with
resizing filesystems created by makefs(8), which still uses UFS1.

Tested by:	jmg@
Sponsored by:	The FreeBSD Foundation
2014-05-05 09:20:30 +00:00
Andrey V. Elsukov
4f31a94bd2 Add an advice what to do when partition was automatically resized.
X-MFC after:	r256690
2014-05-04 20:00:08 +00:00
Andrey V. Elsukov
c778397f26 Add better error description for case when we are doing resize and
scheme-specific method returns EBUSY.

MFC after:	1 week
2014-05-04 16:55:51 +00:00
Andrey V. Elsukov
0dd7f00cee Prevent an unexpected shrinking on resizing due to alignment for MBR,
PC98 and VTOC8 schemes.

Reported by:	jmg
MFC after:	1 week
2014-05-04 16:43:57 +00:00
Andrey V. Elsukov
bc1e8f56ff For schemes that do an automatic partition aligning move this code to
separate function.

MFC after:	1 week
2014-05-04 10:14:25 +00:00
Luiz Otavio O Souza
81694cde44 Fix a leak in g_uzip_taste(). After retrieve all the block offsets from
the uzip image, free the last data read.
2014-05-01 15:23:20 +00:00
Luiz Otavio O Souza
ccb7284af1 Actually the FEATURE() macro is defined on sys/sysctl.h.
Pointyhat to:	loos
2014-05-01 14:59:04 +00:00
Luiz Otavio O Souza
6d8beede60 Some style and whitespace fixes. Reduce the difference between geom_uzip(4)
and geom_uncompress(4).  Now, they produce an almost clean diff(1) output.

Remove a duplicated variable from g_uncompress.c and an unnecessary header
from g_uzip.c.

No functional changes.
2014-05-01 14:47:27 +00:00
Bryan Drewery
74679c6a99 Remove redundant include
MFC after:	3 days
2014-04-29 01:17:43 +00:00
Alexander Motin
dea1e22600 Reduce number of opens by REOM RAID during provider taste.
Instead opening/closing provider by each of metadata classes, do it only
once in core code.  Since for SCSI disks open/close means sending some
SCSI commands to the device, this change reduces taste time.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-04-28 15:03:52 +00:00
Luiz Otavio O Souza
6f05733a1f Keep geom_uncompress(4) in line with geom_uzip(4), bring in the r264504 fix.
Make sure not to start I/O bigger than MAXPHYS bytes.

Quoting r264504:

When we detect the condition, we'll reduce the block count and perform
a "short" read.  In g_uncompress_done() we need to consider the original
I/O length and stop early if we're about to deflate a block that we didn't
read.  By using bio_completed in the cloned BIO and not bio_length to
check for this, we automatically and gracefully handle short reads that
our providers may be doing on top of the short reads we may initiate
ourselves.

Reviewed by:	marcel
2014-04-22 18:08:34 +00:00
Marcel Moolenaar
855be5b2c1 Make sure not to do I/O for more than MAXPHYS bytes. Doing so can cause
problems in our providers, such as a KASSERT in md(4). We can initiate
I/O for more than MAXPHYS bytes if we've been given a BIO for MAXPHYS
bytes, the blocks from which we're reading couldn't be compressed and
we had compression in preceeding blocks resulting in misalignment of
the blocks we're trying to read relative to the sector. We're forced to
round up the I/O length to make it an multiple of the sector size.

When we detect the condition, we'll reduce the block count and perform
a "short" read. In g_uzip_done() we need to consider the original I/O
length and stop early if we're about to deflate a block that we didn't
read. By using bio_completed in the cloned BIO and not bio_length to
check for this, we automatically and gracefully handle short reads that
our providers may be doing on top of the short reads we may initiate
ourselves.

Obtained from:	Juniper Networks, Inc.
2014-04-15 15:41:57 +00:00
Bryan Drewery
87bc328d63 Make g_access() KASSERT() more useful.
Sponsored by:	EMC / Isilon Storage Division
Obtained from:	Isilon OneFS
MFC after:	2 weeks
2014-04-15 14:41:41 +00:00
Marcel Moolenaar
4787115d04 Align and round the partitionable disk space to 4K by default.
Since this would also apply when recovering, make sure not to
align or round when that would have a partition fall outside
the partitionable area.
2014-04-12 20:28:39 +00:00
Bryan Drewery
1e4b22b44b Fix spelling error in g_trace() call.
Sponsored by:	EMC / Isilon Storage Division
MFC after:	1 week
2014-04-10 17:00:44 +00:00
Alexander Motin
1229e83d2b Fix wrong sizes used to access PD_Type and PD_State DDF metadata fields.
This caused incorrect behavior of arrays with big-endian DDF metadata.
Little-endian (like used by Adaptec controllers) should not be harmed.
Add workaround should be enough to manage compatibility.

MFC after:	2 weeks
2014-04-10 16:00:33 +00:00
Alexander Motin
66b92c07fe Do not increment bio_data in case of BIO_DELETE.
This fixes KASSERT() panic in g_io_request().
2014-04-10 10:12:56 +00:00
Marcel Moolenaar
e8c166e85a An all-or-nothing approach to labels isn't flexible enough. Embedded
systems need fine-grained control over what's in and what's out.
That's ideal. For now, separate GPT labels from the rest and allow
g_label to be built with just GPT labels.

Obtained from:	Juniper Networks, Inc.
2014-04-06 02:44:37 +00:00
Marcel Moolenaar
12b2d77da9 Make sure we don't free memory that's already been freed by setting
the geom->softc pounter to NULL before freeing the g_slicer softc.
In g_slicer_free() the pointer is checked first.

Obtained from:	Juniper Networks, Inc.
2014-04-06 02:20:42 +00:00
Bryan Drewery
09adfca39f Show error code when failing to destroy a mirror on delay
Sponsored by:	EMC / Isilon Storage Division
MFC after:	2 weeks
2014-04-05 03:01:29 +00:00
Xin LI
c35ddb346f In g_eli_crypto_hmac_init(), zero out after using the ipad buffer,
k_ipad.

Note that the two consumers in geli(4) are not affected by this
issue because the way the code is constructed and as such, we
believe there is no security impact with or without this change
with geli(4)'s usage.

Reported by:	Serge van den Boom <serge vdboom.org>
Reviewed by:	pjd
MFC after:	2 weeks
2014-02-08 05:17:49 +00:00
Luiz Otavio O Souza
d9ffbff9f0 Fix the build with DEBUG enabled. Where possible, fix style(9) issues.
Reviewed by:	bde
Approved by:	adrian (mentor)
2014-02-07 13:06:48 +00:00
Luiz Otavio O Souza
f0d701f048 Fix a logic error. Because of this inflateReset() wasn't being called and
the output buffer wasn't being cleared between the inflate() calls,
producing zeroed output after the first inflate() call.

This fixes the read of mkuzip(8) images with geom_uncompress(4).

Reviewed by:	ray
Approved by:	adrian (mentor)
2014-02-03 17:25:36 +00:00
Luiz Otavio O Souza
c2d90f35d5 Remove some unnecessary code. The offsets read from the first block are
overwritten a few lines bellow.

Reviewed by:	ray
Approved by:	adrian (mentor)
2014-02-03 17:21:36 +00:00
Andrey V. Elsukov
524d7a4d4e Always free sbuf in gctl_free().
MFC after:	1 week
2014-01-23 21:30:31 +00:00
Andrey V. Elsukov
d14a7ff1f5 Remove another unneeded NULL check from geom_alloc_copyin().
Do copyout in case of gctl version mismatch and fix sbuf leak in
g_ctl_ioctl_ctl().

MFC after:	1 week
2014-01-23 20:25:38 +00:00
Andrey V. Elsukov
7f0e13dfe0 In gctl_copyin() remove unused error variable.
geom_alloc_copyin() can't return ENOMEM, so describe its fail as bad
control request. Add check for NULL pointer in gctl_dump(), since it
can be NULL when geom_alloc_copyin() failed.

MFC after:	1 week
2014-01-23 19:55:02 +00:00
Andrey V. Elsukov
625ee733e3 Fix typo in r261084.
Add to the gctl_error() an ability to specify error description even
if numeric error code is already specified. Also by default set
error code to EINVAL.

PR:		185852
MFC after:	1 week
2014-01-23 19:31:17 +00:00
Andrey V. Elsukov
ee839ce84c malloc() with M_WAITOK doesn't return NULL.
MFC after:	1 week
2014-01-23 19:07:22 +00:00
Alexander Motin
eaed60f737 Removed unneeded and dangerous assignment. It would probably cause NULL
refererence panic if compiler not optimize it out.

Found with:	Clang static analyzer
MFC after:	2 weeks
2014-01-19 16:37:57 +00:00
Luiz Otavio O Souza
67619a4120 Build the geom_uncompress(4) module by default.
Fix geom_uncompress(4) module loading.  Don't link zlib.c (which is a module
itself) directly.

The built module was verified and used to read a few mkulzma(8) images on
amd64 to validate some of the informations on the manual page.

While here, don't overwrite CFLAGS.

Reviewed by:	ray
Approved by:	adrian (mentor)
2014-01-10 20:29:46 +00:00
Andrey V. Elsukov
ae3bc0acff Add an ability to stop gmirror and clear its metadata in one command.
This fixes the problem, when gmirror starts again just after stop.

The problem occurs when gmirror's component has geom label with equal size.
E.g. gpt and gptid have the same size as partition, diskid has the same
size as entire disk. When gmirror's geom has been destroyed, glabel
creates its providers and this initiate retaste.

Now "gmirror destroy" command is available. It destroys geom and also
erases gmirror's metadata.

MFC after:	2 weeks
2013-12-27 02:43:53 +00:00
Dmitry Morozovsky
5cc596c46d Add GPT UUID for VMware vSAN meta-data partition.
Approved by:	ae
MFC after:	2 weeks
2013-12-26 21:06:12 +00:00
Andrey V. Elsukov
7c5710dbaf Prevent users from deactivating the last component of a mirror.
PR:		184985
MFC after:	1 week
2013-12-19 22:13:12 +00:00
Pawel Jakub Dawidek
396b29c74e Clear some more places with potentially sensitive data.
MFC after:	1 week
2013-12-15 22:52:18 +00:00
Pawel Jakub Dawidek
2a3237c84f Clear content of keyfiles loaded by the loader after processing them.
Pointed out by:	rwatson
MFC after:	1 week
2013-12-15 22:51:26 +00:00
Alexander Motin
2634da8cd5 Fix bug introduced at r256607. We have to recalculate bp_resid here since
sizes of original and completed requests may differ due to end of media.

Bisected by:	pho
2013-12-12 08:23:28 +00:00
Justin Hibbits
6cec74b2e4 Partially revert r259080. bde@ pointed out that there are a lot more style bugs
going on in here than can be fixed, and I introduced some of my own.  Rather
than fix the whole host of them, back out my bugs.

Found by:	bde
X-MFC with:	r259080
2013-12-08 09:34:56 +00:00
Justin Hibbits
8991c54091 Fix some integer signs. These unsigned integers should all be signed.
Found by:	clang (powerpc64)
2013-12-07 19:55:34 +00:00
Eitan Adler
7a22215c53 Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this
shifts into the sign bit.  Instead use (1U << 31) which gets the
expected result.

This fix is not ideal as it assumes a 32 bit int, but does fix the issue
for most cases.

A similar change was made in OpenBSD.

Discussed with:	-arch, rdivacky
Reviewed by:	cperciva
2013-11-30 22:17:27 +00:00
Alexander Motin
7ae1a87bfe Escape special XML chars, returned by some devices, confusing XML parsers.
MFC after:	1 month
2013-11-27 14:25:06 +00:00
Marcel Moolenaar
3e5a0a6b70 Have the GPT probe return a lower priority when the MBR is not a PMBR
The purpose of the PMBR is to have the disk appear in use to GPT
unaware utilities (like fdisk).  However, if the PMBR has been changed
by a GPT unaware utlity then we must assume that this was deliberate
(as it involved removal of the special slice) and we should not treat
the unmodified GPT-specific sectors as being valid.  By lowering the
probe priority in that case, the MBR scheme will take precedence and
the kernel will end up using the MBR and not the GPT. We will still
use the GPT if the kernel does not support the MBR scheme.
2013-11-21 22:02:59 +00:00
Andrey V. Elsukov
32cea4ca0f Add "resize" verb to gmirror(8) and such functionality to geom_mirror(4).
Now it is easy to expand the size of the mirror when all its components
are replaced. Also add g_resize method to geom_mirror class. It will write
updated metadata to new last sector, when parent provider is resized.

Silence from:	geom@
MFC after:	1 month
2013-11-19 22:55:17 +00:00
Alexander Motin
f8c79813cb In addition to r258220 allow shrinking in "automatic" mode if there is
already valid metadata found at the new location.  This should allow easy
transparent recovery if first resize was done by mistake.

While there, unify metadata write code and fix minor memory leak.

MFC after:	1 month
2013-11-17 05:38:54 +00:00
Alexander Motin
e6afd72b93 Implement automatic live resize support for GEOM MULTIPATH class.
In "manual" mode just automatically resize provider in any direction.
In "automatic" mode allow only growth (with new metadata write); in case
of shrinking destroy the multipath device same as before since it may be
undesirable to write new metadata within old user area.

MFC after:	1 month
2013-11-16 14:31:49 +00:00
Andrey V. Elsukov
743437c451 Add missing line breaks.
PR:		181900
MFC after:	1 week
2013-11-11 11:13:12 +00:00
Xin LI
7ac2e58818 When zero'ing out a buffer, make sure we are using right size.
Without this change, in the worst but unlikely case scenario, certain
administrative operations, including change of configuration, set or
delete key from a GEOM ELI provider, may leave potentially sensitive
information in buffer allocated from kernel memory.

We believe that it is not possible to actively exploit these issues, nor
does it impact the security of normal usage of GEOM ELI providers when
these operations are not performed after system boot.

Security:	possible sensitive information disclosure
Submitted by:	Clement Lecigne <clecigne google com>
MFC after:	3 days
2013-11-02 01:16:10 +00:00
John Baldwin
d6d78db57f Reject attempts to attack a disk device that has the old NEEDSGIANT
flag set.

Reviewed by:	mav
2013-10-25 19:19:12 +00:00
Steven Hartland
c28078e903 Improve ZFS N-way mirror read performance by using load and locality
information.

The existing algorithm selects a preferred leaf vdev based on offset of the zio
request modulo the number of members in the mirror. It assumes the devices are
of equal performance and that spreading the requests randomly over both drives
will be sufficient to saturate them. In practice this results in the leaf vdevs
being under utilized.

The new algorithm takes into the following additional factors:
* Load of the vdevs (number outstanding I/O requests)
* The locality of last queued I/O vs the new I/O request.

Within the locality calculation additional knowledge about the underlying vdev
is considered such as; is the device backing the vdev a rotating media device.

This results in performance increases across the board as well as significant
increases for predominantly streaming loads and for configurations which don't
have evenly performing devices.

The following are results from a setup with 3 Way Mirror with 2 x HD's and
1 x SSD from a basic test running multiple parrallel dd's.

With pre-fetch disabled (vfs.zfs.prefetch_disable=1):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s

With pre-fetch enabled (vfs.zfs.prefetch_disable=0):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s

In addition to the performance changes the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow which also ensures
vdevs loads are only calculated from the set of valid candidates.

The following additional sysctls where added to allow the administrator
to tune the behaviour of the load algorithm:
* vfs.zfs.vdev.mirror.rotating_inc
* vfs.zfs.vdev.mirror.rotating_seek_inc
* vfs.zfs.vdev.mirror.rotating_seek_offset
* vfs.zfs.vdev.mirror.non_rotating_inc
* vfs.zfs.vdev.mirror.non_rotating_seek_inc

These changes where based on work started by the zfsonlinux developers:
https://github.com/zfsonlinux/zfs/pull/1487

Reviewed by:	gibbs, mav, will
MFC after:	2 weeks
Sponsored by:	Multiplay
2013-10-23 09:54:58 +00:00
Mateusz Guzik
aa25ccfa36 gnop: make sure that newly allocated memory for softc is zeroed
This prevents mtx_init from encountering non-zeros and panicking
the kernel as a result.

Reported by:	Keith White <kwhite site.uottawa.ca>
2013-10-23 01:34:18 +00:00
Alexander Motin
1a29adad30 Remove Giant-locked drivers support (DISKFLAG_NEEDSGIANT flag) from disk(9).
Since at least FreeBSD 7 we had only four of them in the base tree, and
in head branch, thanks to jhb@, we have no any for more then a year.
2013-10-22 10:21:20 +00:00
Alexander Motin
40ea77a036 Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.

The defined now safety requirements are:
 - caller should not hold any locks and should be reenterable;
 - callee should not depend on GEOM dual-threaded concurency semantics;
 - on the way down, if request is unmapped while callee doesn't support it,
   the context should be sleepable;
 - kernel thread stack usage should be below 50%.

To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
 - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
 - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
 - G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
 - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe.  If any of requirements are not met, request is queued to
g_up or g_down thread same as before.

Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).

To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION.  da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.

This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).

Sponsored by:	iXsystems, Inc.
MFC after:	2 months
2013-10-22 08:22:19 +00:00
Edward Tomasz Napierala
fb0e57b1a2 Fix build with gcc by spelling unused format string as "unused" instead of NULL.
MFC after:	29 days
2013-10-19 08:20:00 +00:00
Edward Tomasz Napierala
19e5b2d50e Make geom_label(4) resize-aware. This fixes a situation when "gpart resize"
would resize a partition, but label providers - e.g. /dev/gptid/XXX - would
stay the same size.

Reviewed by:	mav
MFC after:	1 month
Sponsored by:	FreeBSD Foundation
2013-10-18 09:14:19 +00:00
Andrey V. Elsukov
884c8e4fea Add an automatic resize support to the GEOM_PART class.
When parent provider has been resized, the scheme specific G_PART_RESIZE
method does an update of scheme's metadata. But all changes are not saved
to disk, until `gpart commit` will be called.

Discussed with:	trasz
MFC after:	1 month
2013-10-17 16:18:43 +00:00
Alexander Motin
b43560ab19 MFprojects/camlock r256445:
Add unmapped I/O support to GEOM RAID.
2013-10-16 09:33:23 +00:00
Alexander Motin
21d0712c33 MFprojects/camlock r256371:
Fix passing uninitialized bio_resid argument to g_trace().
2013-10-16 09:21:40 +00:00
Alexander Motin
0fd2511ae2 MFprojects/camlock r254907:
Move g_io_deliver() out of the lock, as required for direct dispatch.
Move g_destroy_bio() out too to reduce lock scope even more.
2013-10-16 09:18:01 +00:00
Alexander Motin
e431d66c04 MFprojects/camlock r254905:
Introduce new function devstat_end_transaction_bio_bt(), adding new argument
to specify present time.  Use this function to move binuptime() out of lock,
substantially reducing lock congestion when slow timecounter is used.
2013-10-16 09:12:40 +00:00
Dag-Erling Smørgrav
1b2cb2b3f0 Introduce a kern.geom.notaste sysctl that can be used to temporarily
disable GEOM tasting to avoid the "bouncing GEOM" problem where, when
you shut down the consumer of a provider which can be viewed in multiple
ways (typically a mirror whose members are labeled partitions), GEOM
will immediately taste that provider's alter ego and reattach the
consumer.

Approved by:	re (glebius)
2013-09-24 20:05:16 +00:00
Andrey V. Elsukov
87c0c612d8 Remove stub implementation.
MFC after:	1 week
2013-09-05 09:44:09 +00:00
Alexander Motin
19351a14eb Make ELI destruction (including orphanization) less aggressive, making it
always wait for provider close.  Old algorithm was reported to cause NULL
dereference panic on attempt to close provider after softc destruction.
If not global workaroung in GEOM, that could even cause destruction with
requests still in flight.
2013-09-02 10:44:54 +00:00
Alexander Motin
3843eba85d MFprojects/camlock r254895:
Add unmapped BIO support to GEOM ZERO if kern.geom.zero.clear is cleared.
2013-08-26 20:39:02 +00:00
Alexander Motin
40f27d7cf6 Add new attribute lunname to report only textual LUN-specific device IDs.
While lunid attribute prefers to report numeric ones, having both may be
useful in some situations.
2013-08-24 09:42:14 +00:00
Kenneth D. Merry
ce625ec719 Change the way that unmapped I/O capability is advertised.
The previous method was to set the D_UNMAPPED_IO flag in the cdevsw
for the driver.  The problem with this is that in many cases (e.g.
sa(4)) there may be some instances of the driver that can handle
unmapped I/O and some that can't.  The isp(4) driver can handle
unmapped I/O, but the esp(4) driver currently cannot.  The cdevsw
is shared among all driver instances.

So instead of setting a flag on the cdevsw, set a flag on the cdev.
This allows drivers to indicate support for unmapped I/O on a
per-instance basis.

sys/conf.h:	Remove the D_UNMAPPED_IO cdevsw flag and replace it
		with an SI_UNMAPPED cdev flag.

kern_physio.c:	Look at the cdev SI_UNMAPPED flag to determine
		whether or not a particular driver can handle
		unmapped I/O.

geom_dev.c:	Set the SI_UNMAPPED flag for all GEOM cdevs.
		Since GEOM will create a temporary mapping when
		needed, setting SI_UNMAPPED unconditionally will
		work.

		Remove the D_UNMAPPED_IO flag.

nvme_ns.c:	Set the SI_UNMAPPED flag on cdevs created here
		if NVME_UNMAPPED_BIO_SUPPORT is enabled.

vfs_aio.c:	In aio_qphysio(), check the SI_UNMAPPED flag on a
		cdev instead of the D_UNMAPPED_IO flag on the cdevsw.

sys/param.h:	Bump __FreeBSD_version to 1000045 for the switch from
		setting the D_UNMAPPED_IO flag in the cdevsw to setting
		SI_UNMAPPED in the cdev.

Reviewed by:	kib, jimharris
MFC after:	1 week
Sponsored by:	Spectra Logic
2013-08-15 22:52:39 +00:00
Alexander Motin
0f0b2fd889 Return error when opening read-only volumes (like RAID4/5/...) for writing.
Previously opens succeeded, but actual write operations returned errors.

Requested by:	peter
MFC after:	2 weeks
2013-08-13 07:56:40 +00:00