Commit Graph

2190 Commits

Author SHA1 Message Date
Kenneth D. Merry
985108aeb1 Fix a style issue in g_disk_limit().
Noticed by:	bdrewery
MFC after:	1 week
2015-12-04 03:44:12 +00:00
Kenneth D. Merry
42fbdde413 Fix g_disk_vlist_limit() to work properly with deletes.
Add a new bp argument to g_disk_maxsegs(), and add a new function,
g_disk_maxsize() tha will properly determine the maximum I/O size for a
delete or non-delete bio.

Submitted by:	will
MFC after:	1 week
Sponsored by:	Spectra Logic
2015-12-04 03:38:35 +00:00
Kenneth D. Merry
a9934668aa Add asynchronous command support to the pass(4) driver, and the new
camdd(8) utility.

CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and
completed CCBs may be retrieved via the CAMIOGET ioctl.  User
processes can use poll(2) or kevent(2) to get notification when
I/O has completed.

While the existing CAMIOCOMMAND blocking ioctl interface only
supports user virtual data pointers in a CCB (generally only
one per CCB), the new CAMIOQUEUE ioctl supports user virtual and
physical address pointers, as well as user virtual and physical
scatter/gather lists.  This allows user applications to have more
flexibility in their data handling operations.

Kernel memory for data transferred via the queued interface is
allocated from the zone allocator in MAXPHYS sized chunks, and user
data is copied in and out.  This is likely faster than the
vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in
configurations with many processors (there are more TLB shootdowns
caused by the mapping/unmapping operation) but may not be as fast
as running with unmapped I/O.

The new memory handling model for user requests also allows
applications to send CCBs with request sizes that are larger than
MAXPHYS.  The pass(4) driver now limits queued requests to the I/O
size listed by the SIM driver in the maxio field in the Path
Inquiry (XPT_PATH_INQ) CCB.

There are some things things would be good to add:

1. Come up with a way to do unmapped I/O on multiple buffers.
   Currently the unmapped I/O interface operates on a struct bio,
   which includes only one address and length.  It would be nice
   to be able to send an unmapped scatter/gather list down to
   busdma.  This would allow eliminating the copy we currently do
   for data.

2. Add an ioctl to list currently outstanding CCBs in the various
   queues.

3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do
   that.

4. Test physical address support.  Virtual pointers and scatter
   gather lists have been tested, but I have not yet tested
   physical addresses or scatter/gather lists.

5. Investigate multiple queue support.  At the moment there is one
   queue of commands per pass(4) device.  If multiple processes
   open the device, they will submit I/O into the same queue and
   get events for the same completions.  This is probably the right
   model for most applications, but it is something that could be
   changed later on.

Also, add a new utility, camdd(8) that uses the asynchronous pass(4)
driver interface.

This utility is intended to be a basic data transfer/copy utility,
a simple benchmark utility, and an example of how to use the
asynchronous pass(4) interface.

It can copy data to and from pass(4) devices using any target queue
depth, starting offset and blocksize for the input and ouptut devices.
It currently only supports SCSI devices, but could be easily extended
to support ATA devices.

It can also copy data to and from regular files, block devices, tape
devices, pipes, stdin, and stdout.  It does not support queueing
multiple commands to any of those targets, since it uses the standard
read(2)/write(2)/writev(2)/readv(2) system calls.

The I/O is done by two threads, one for the reader and one for the
writer.  The reader thread sends completed read requests to the
writer thread in strictly sequential order, even if they complete
out of order.  That could be modified later on for random I/O patterns
or slightly out of order I/O.

camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from
the pass(4) driver and also to send request notifications internally.

For pass(4) devcies, camdd(8) uses a single buffer (CAM_DATA_VADDR)
per CAM CCB on the reading side, and a scatter/gather list
(CAM_DATA_SG) on the writing side.  In addition to testing both
interfaces, this makes any potential reblocking of I/O easier.  No
data is copied between the reader and the writer, but rather the
reader's buffers are split into multiple I/O requests or combined
into a single I/O request depending on the input and output blocksize.

For the file I/O path, camdd(8) also uses a single buffer (read(2),
write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list
(readv(2), writev(2), preadv(2), pwritev(2)) on writes.

Things that would be nice to do for camdd(8) eventually:

1.  Add support for I/O pattern generation.  Patterns like all
    zeros, all ones, LBA-based patterns, random patterns, etc. Right
    Now you can always use /dev/zero, /dev/random, etc.

2.  Add support for a "sink" mode, so we do only reads with no
    writes.  Right now, you can use /dev/null.

3.  Add support for automatic queue depth probing, so that we can
    figure out the right queue depth on the input and output side
    for maximum throughput.  At the moment it defaults to 6.

4.  Add support for SATA device passthrough I/O.

5.  Add support for random LBAs and/or lengths on the input and
    output sides.

6.  Track average per-I/O latency and busy time.  The busy time
    and latency could also feed in to the automatic queue depth
    determination.

sys/cam/scsi/scsi_pass.h:
	Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue
	and fetch asynchronous CAM CCBs respectively.

	Although these ioctls do not have a declared argument, they
	both take a union ccb pointer.  If we declare a size here,
	the ioctl code in sys/kern/sys_generic.c will malloc and free
	a buffer for either the CCB or the CCB pointer (depending on
	how it is declared).  Since we have to keep a copy of the
	CCB (which is fairly large) anyway, having the ioctl malloc
	and free a CCB for each call is wasteful.

sys/cam/scsi/scsi_pass.c:
	Add asynchronous CCB support.

	Add two new ioctls, CAMIOQUEUE and CAMIOGET.

	CAMIOQUEUE adds a CCB to the incoming queue.  The CCB is
	executed immediately (and moved to the active queue) if it
	is an immediate CCB, but otherwise it will be executed
	in passstart() when a CCB is available from the transport layer.

	When CCBs are completed (because they are immediate or
	passdone() if they are queued), they are put on the done
	queue.

	If we get the final close on the device before all pending
	I/O is complete, all active I/O is moved to the abandoned
	queue and we increment the peripheral reference count so
	that the peripheral driver instance doesn't go away before
	all pending I/O is done.

	The new passcreatezone() function is called on the first
	call to the CAMIOQUEUE ioctl on a given device to allocate
	the UMA zones for I/O requests and S/G list buffers.  This
	may be good to move off to a taskqueue at some point.
	The new passmemsetup() function allocates memory and
	scatter/gather lists to hold the user's data, and copies
	in any data that needs to be written.  For virtual pointers
	(CAM_DATA_VADDR), the kernel buffer is malloced from the
	new pass(4) driver malloc bucket.  For virtual
	scatter/gather lists (CAM_DATA_SG), buffers are allocated
	from a new per-pass(9) UMA zone in MAXPHYS-sized chunks.
	Physical pointers are passed in unchanged.  We have support
	for up to 16 scatter/gather segments (for the user and
	kernel S/G lists) in the default struct pass_io_req, so
	requests with longer S/G lists require an extra kernel malloc.

	The new passcopysglist() function copies a user scatter/gather
	list to a kernel scatter/gather list.  The number of elements
	in each list may be different, but (obviously) the amount of data
	stored has to be identical.

	The new passmemdone() function copies data out for the
	CAM_DATA_VADDR and CAM_DATA_SG cases.

	The new passiocleanup() function restores data pointers in
	user CCBs and frees memory.

	Add new functions to support kqueue(2)/kevent(2):

	passreadfilt() tells kevent whether or not the done
	queue is empty.

	passkqfilter() adds a knote to our list.

	passreadfiltdetach() removes a knote from our list.

	Add a new function, passpoll(), for poll(2)/select(2)
	to use.

	Add devstat(9) support for the queued CCB path.

sys/cam/ata/ata_da.c:
	Add support for the BIO_VLIST bio type.

sys/cam/cam_ccb.h:
	Add a new enumeration for the xflags field in the CCB header.
	(This doesn't change the CCB header, just adds an enumeration to
	use.)

sys/cam/cam_xpt.c:
	Add a new function, xpt_setup_ccb_flags(), that allows specifying
	CCB flags.

sys/cam/cam_xpt.h:
	Add a prototype for xpt_setup_ccb_flags().

sys/cam/scsi/scsi_da.c:
	Add support for BIO_VLIST.

sys/dev/md/md.c:
	Add BIO_VLIST support to md(4).

sys/geom/geom_disk.c:
	Add BIO_VLIST support to the GEOM disk class.  Re-factor the I/O size
	limiting code in g_disk_start() a bit.

sys/kern/subr_bus_dma.c:
	Change _bus_dmamap_load_vlist() to take a starting offset and
	length.

	Add a new function, _bus_dmamap_load_pages(), that will load a list
	of physical pages starting at an offset.

	Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios.
	Allow unmapped I/O to start at an offset.

sys/kern/subr_uio.c:
	Add two new functions, physcopyin_vlist() and physcopyout_vlist().

sys/pc98/include/bus.h:
	Guard kernel-only parts of the pc98 machine/bus.h header with
	#ifdef _KERNEL.

	This allows userland programs to include <machine/bus.h> to get the
	definition of bus_addr_t and bus_size_t.

sys/sys/bio.h:
	Add a new bio flag, BIO_VLIST.

sys/sys/uio.h:
	Add prototypes for physcopyin_vlist() and physcopyout_vlist().

share/man/man4/pass.4:
	Document the CAMIOQUEUE and CAMIOGET ioctls.

usr.sbin/Makefile:
	Add camdd.

usr.sbin/camdd/Makefile:
	Add a makefile for camdd(8).

usr.sbin/camdd/camdd.8:
	Man page for camdd(8).

usr.sbin/camdd/camdd.c:
	The new camdd(8) utility.

Sponsored by:	Spectra Logic
MFC after:	1 week
2015-12-03 20:54:55 +00:00
Steven Hartland
86787e8d97 Fix early kernel dump via dumpdev env
Setting the dumpdev via env e.g. loader.conf provides the ability to
configure the kernel dump device during early boot. When using this
g_io_getattr was returning EPERM due to cp->acr == 0.

Fix this by calling g_access to ensure we're a read consumer prior
to calling g_dev_setdumpdev.

MFC after:	2 weeks
Sponsored by:	Multiplay
2015-11-17 20:55:50 +00:00
Steven Hartland
2dc7e36b0b Fix g_eli error loss conditions
* Ensure that error information isn't lost.
* Log the error code in all cases.
* Don't overwrite bio_completed set to 0 from the error condition.

MFC after:	2 weeks
Sponsored by:	Multiplay
2015-11-05 17:37:35 +00:00
Alexander Motin
4a3760bae6 Remove compatibility shims for legacy ATA device names.
We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely
removed old stack at 10.x, so at 11.x it is time to remove compat shims.
2015-10-11 13:01:51 +00:00
Edward Tomasz Napierala
45d7de1d37 Make geom_nop(4) collect statistics on all types of BIOs, not just
reads and writes.

PR:		kern/198405
Submitted by:	Matthew D. Fuller <fullermd at over-yonder dot net>
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3679
2015-10-10 09:03:31 +00:00
Conrad Meyer
b4d7290796 geom_dev: Use kenv 'dumpdev' in the same way as rc/etc.d/dumpon
Skip a /dev/ prefix, if one is present, when checking for matching
device names for dump.

Suggested by:	avg
Reviewed by:	markj
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3725
2015-09-23 21:08:52 +00:00
Edward Tomasz Napierala
bb27d7ed90 Add a way to specify stripesize and stripeoffset to gnop(8). This makes
it possible to "simulate" 4K media, to eg test alignment handling.

Reviewed by:	mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3664
2015-09-15 18:01:59 +00:00
Warner Losh
3f2e5b8584 After the introduction of direct dispatch, the pacing code in g_down()
broke in two ways. One, the pacing variable was accessed in multiple
threads in an unsafe way. Two, since large numbers of I/O could come
down from the buf layer at one time, large numbers of allocation
failures could happen all at once, resulting in a huge pace value that
would limit I/Os to 10 IOPS for minutes (or even hours) at a
time. While a real solution to these problems requires substantial
work (to go to a no-allocation after the first model, or to have some
way to wait for more memory with some kind of reserve for pager and
swapper requests), it is relatively easy to make this simplistic
pacing less pathological.

Move to using a volatile variable with loads and stores. While this is
a little racy, losing the race is safe: either you get memory and
proceed, or you don't and queue. Second, sleep for 1ms (or one tick, whichever
is larger) instead of 100ms. This removes the artificial 10 IOPS limit
while still easing up on new I/Os during memory shortages. Remove
tying the amount of time we do this to the number of failed requests
and do it only as long as we keep failing requests.

Finally, to avoid needless recursion when memory is tight (start ->
g_io_deliver() -> g_io_request() -> start -> ... until we use 1/2 the
stack), don't do direct dispatch while pacing. This should be a rare
event (not steady state) so the performance hit here is worth the
extra safety of not starving g_down() with directly dispatched I/O.

Differential Review: https://reviews.freebsd.org/D3546
2015-09-02 17:29:30 +00:00
Justin Hibbits
6aabc119b6 Create a RouterBoard platform and use it to create a flash map
Summary:
The RouterBoard uses a predefined partition map which doesn't exist in the fdt.
This change allows overriding the fdt slicer with a custom slicer, and uses this
custom slicer to define the flash map on the RouterBoard RB800.
D3305 converts the mpc85xx platform into a base class, so that systems based on
the mpc85xx platform can add their own overrides.  This change builds on D3305,
and creates a RouterBoard (RB800) platform to initialize the slicer override.

Reviewed By: nwhitehorn, imp
Differential Revision: https://reviews.freebsd.org/D3345
2015-08-22 05:50:18 +00:00
Pedro F. Giffuni
6bc3fe5f4e Clean out some externally visible "more then" grammar
MFC after:	3 days
2015-08-11 03:12:09 +00:00
Enji Cooper
604083d74c Make some debug printf's into DPRINTF's to reduce noise on attach/detahh
Similar reasoning to what was done in r286367 with geom_uzip(4)

MFC after: 2 weeks
Differential Revision: D3320
Sponsored by: EMC / Isilon Storage Division
2015-08-09 06:58:06 +00:00
Pawel Jakub Dawidek
46e3447026 Enable BIO_DELETE passthru in GELI, so TRIM/UNMAP can work as expected when
GELI is used on a SSD or inside virtual machine, so that guest can tell
host that it is no longer using some of the storage.

Enabling BIO_DELETE passthru comes with a small security consequence - an
attacker can tell how much space is being really used on encrypted device and
has less data no analyse then. This is why the -T option can be given to the
init subcommand to turn off this behaviour and -t/T options for the configure
subcommand can be used to adjust this setting later.

PR:		198863
Submitted by:	Matthew D. Fuller fullermd at over-yonder dot net

This commit also includes a fix from Fabian Keil freebsd-listen at
fabiankeil.de for 'configure' on onetime providers which is not strictly
related, but is entangled in the same code, so would cause conflicts if
separated out.
2015-08-08 09:51:38 +00:00
Konstantin Belousov
347e9d5495 Minor style cleanup of the code surrounding r286404.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-08-07 08:24:12 +00:00
Konstantin Belousov
9b34965019 The condition to use direct processing for the unmapped bio is
reverted.  We can do direct processing when g_io_check() does not need
to perform transient remapping of the bio, otherwise the thread has to
sleep.

Reviewed by:	mav (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-08-07 08:13:34 +00:00
Pawel Jakub Dawidek
5ee9ea19fe After crypto_dispatch() bio might be already delivered and destroyed,
so we cannot access it anymore. Setting an error later lead to memory
corruption.

Assert that crypto_dispatch() was successful. It can fail only if we pass a
bogus crypto request, which is a bug in the program, not a runtime condition.

PR:		199705
Submitted by:	luke.tw
Reviewed by:	emaste
MFC after:	3 days
2015-08-06 17:13:34 +00:00
Enji Cooper
fcc8461cfb Make some debug printf's into DPRINTF's to reduce noise on attach/detach
Differential Revision: https://reviews.freebsd.org/D3306
MFC after: 1 week
Reviewed by: loos
Sponsored by: EMC / Isilon Storage Division
2015-08-06 15:30:14 +00:00
Edward Tomasz Napierala
72800098bf Fix panic triggered by code like this:
open("/dev/md0", O_EXEC);

Discussed with:	kib@, mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3051
2015-08-04 10:40:08 +00:00
Edward Tomasz Napierala
d6cc35b287 Fix panic that would happen on forcibly unmounting devfs (note that
as it is now, devfs ignores MNT_FORCE anyway, so it needs to be modified
to trigger the panic) with consumers still opened.

Note that this still results in a leak of r/w/e counters.  It seems
to be harmless, though.  If anyone knows a better way to approach
this - please tell.

Discussed with:	kib@, mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3050
2015-08-03 16:35:18 +00:00
Andrey V. Elsukov
da6c24e123 Report the scheme and provider names in warning message about unaligned
partition.

PR:		201873
MFC after:	1 week
2015-07-26 11:16:48 +00:00
Allan Jude
ce808c7ad8 Add a new option to gpart(8) to fix Lenovo BIOS boot issue
PR:		184910
Reviewed by:	ae, wblock
Approved by:	marcel
MFC after:	3 days
Relnotes:	yes
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D3065
2015-07-15 02:23:55 +00:00
Pawel Jakub Dawidek
4273d41299 Spoil even can happen for some time now even on providers opened exclusively
(on the media change event). Update GELI to handle that situation.

PR:		201185
Submitted by:	Matthew D. Fuller
2015-07-10 19:27:19 +00:00
Pawel Jakub Dawidek
fefb6a143a Properly propagate errors in metadata reading.
PR:		198860
Submitted by:	Matthew D. Fuller
2015-07-02 10:57:34 +00:00
Pawel Jakub Dawidek
edaa9008ff Allow to omit keyfile number for the first keyfile. 2015-07-02 10:55:32 +00:00
Edward Tomasz Napierala
628b712826 Fix off-by-one error in fstyp(8) and geom_label(4) that made them use
a single space (" ") as a CD9660 label name when no label was present.
Similar problem was also present in msdosfs label recognition.

PR:		200828
Differential Revision:	https://reviews.freebsd.org/D2830
Reviewed by:	asomers@, emaste@
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2015-06-18 21:55:55 +00:00
Andrey V. Elsukov
e7d0c7e458 Teach G_PART_GPT class to handle g_resize_provider event.
MFC after:	10 days
2015-06-08 12:52:41 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Andrey V. Elsukov
153c57b5b4 Read GEOM_UNCOMPRESS metadata using several requests that fit into
MAXPHYS. For large compressed images the metadata size can be bigger
than MAXPHYS and this triggers KASSERT in g_read_data().
Also use g_free() to free memory allocated by g_read_data().

PR:		199476
MFC after:	2 weeks
2015-05-19 09:28:52 +00:00
Andrey V. Elsukov
4b8d4f97b0 Add apple-boot, apple-hfs and apple-ufs aliases to MBR scheme.
Sort DOSPTYP_* entries in diskmbr.h by value.
Document these scheme-specific types in gpart(8).

MFC after:	1 week
2015-05-05 09:33:02 +00:00
Craig Rodrigues
d9db52256e Move zlib.c from net to libkern.
It is not network-specific code and would
be better as part of libkern instead.
Move zlib.h and zutil.h from net/ to sys/
Update includes to use sys/zlib.h and sys/zutil.h instead of net/

Submitted by:		Steve Kiernan stevek@juniper.net
Obtained from:		Juniper Networks, Inc.
GitHub Pull Request:	https://github.com/freebsd/freebsd/pull/28
Relnotes:		yes
2015-04-22 14:38:58 +00:00
Pedro F. Giffuni
4a5e6b854d g_uncompress_taste: prevent a double free.
Found by:	Clang Static Analyzer
MFC after:	1 week
2015-04-20 16:31:27 +00:00
Alexander Motin
0ada3afc25 Remove sleeps from geom_up thread on device destruction.
MFC after:	3 days.
2015-04-09 13:09:05 +00:00
Alexander Motin
5d85cd2d11 Remove extra semicolon.
MFC after:	1 week
2015-03-27 12:45:20 +00:00
Alexander Motin
3ab0187add Remove request sorting from GEOM_MIRROR and GEOM_RAID.
When CPU is not busy, those queues are typically empty.  When CPU is busy,
then one more extra sorting is the last thing it needs.  If specific device
(HDD) really needs sorting, then it will be done later by CAM.

This supposed to fix livelock reported for mirror of two SSDs, when UFS
fires zillion of BIO_DELETE requests, that totally blocks I/O subsystem by
pointless sorting of requests and responses under single mutex lock.

MFC after:	2 weeks
2015-03-27 12:44:28 +00:00
Alexander Motin
41fe4ba647 Fix bug on memory allocation error in split method.
While there, use bioq_takefirst() in place where it is convenient.

MFC after:	1 week
2015-03-27 11:14:12 +00:00
Alexander Motin
5523c82c1a Make GEOM_PART work in presence of previous withered self.
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-03-26 12:17:47 +00:00
Alexander Motin
2f36085dcf Report withered providers as such alike to GEOMs.
MFC after:	2 weeks
2015-03-26 11:19:24 +00:00
Alexander Motin
ba772028db When searching for provider by name, prefer non-withered one.
MFC after:	2 weeks
2015-03-26 11:02:29 +00:00
Adrian Chadd
28d507fcec Fix the label search routine in geom_map to not trip up on '\0' bytes.
* Just do the buf check early and fail out
* If the offset being searched is:

00110000  00 b5 7e 45 61 e2 76 d3  c1 78 dd 15 95 cd 1f f1  |..~Ea.v..x......|

.. and the match string is '.!/bin/sh'

.. then it'll set the match string[0] to '\0', do a strncmp() against
the read buffer, find it's matching two zero-length strings, and think
that's where to start.

MFC after:	2 weeks
2015-03-19 03:58:25 +00:00
Andrey V. Elsukov
4fb4ebe0a4 Add GUID and alias for Apple Core Storage partition.
PR:		196241
MFC after:	1 week
2015-03-12 18:51:31 +00:00
Alexander Motin
7715befdf2 Fix couple BIO_DELETE bugs in geom_mirror.
Do not report GEOM::candelete if none of providers support BIO_DELETE.
If consumer still requests BIO_DELETE, report error instead of hanging.

MFC after:	2 weeks
2015-03-12 10:20:53 +00:00
Alexander Motin
0b1b7c2cec Replace constant with proper sizeof().
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2015-02-25 10:18:11 +00:00
Edward Tomasz Napierala
01de1a0650 Add devd(8) notifications for creation and destruction of GEOM devices.
Differential Revision:	https://reviews.freebsd.org/D1211
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-01-14 11:15:57 +00:00
Warner Losh
a91275f72f Remove old ioctl use and support, once and for all. 2015-01-06 05:28:37 +00:00
Warner Losh
0acf08d985 Remove support for FreeBSD 7 and really old FreeBSD 8. The classifiers
have been in the base for a while, so the gymnastics here aren't
needed. In addition, the bugs in subr_disk.c have been fixed since
2009, so there's no need for an identical copy of it in the tree
anymore. There's really no need to binary patch g_io_request, so let's
get rid of the code (not compiled in anymore) lest others think it is
a good idea.
2014-12-20 00:04:01 +00:00
John-Mark Gurney
08fca7a56b Add some new modes to OpenCrypto. These modes are AES-ICM (can be used
for counter mode), and AES-GCM.  Both of these modes have been added to
the aesni module.

Included is a set of tests to validate that the software and aesni
module calculate the correct values.  These use the NIST KAT test
vectors.  To run the test, you will need to install a soon to be
committed port, nist-kat that will install the vectors.  Using a port
is necessary as the test vectors are around 25MB.

All the man pages were updated.  I have added a new man page, crypto.7,
which includes a description of how to use each mode.  All the new modes
and some other AES modes are present.  It would be good for someone
else to go through and document the other modes.

A new ioctl was added to support AEAD modes which AES-GCM is one of them.
Without this ioctl, it is not possible to test AEAD modes from userland.

Add a timing safe bcmp for use to compare MACs.  Previously we were using
bcmp which could leak timing info and result in the ability to forge
messages.

Add a minor optimization to the aesni module so that single segment
mbufs don't get copied and instead are updated in place.  The aesni
module needs to be updated to support blocked IO so segmented mbufs
don't have to be copied.

We require that the IV be specified for all calls for both GCM and ICM.
This is to ensure proper use of these functions.

Obtained from:	p4: //depot/projects/opencrypto
Relnotes:	yes
Sponsored by:	FreeBSD Foundation
Sponsored by:	NetGate
2014-12-12 19:56:36 +00:00
Alexander Motin
1e68fe9c33 Avoid unneeded malloc/memcpy/free if there is no metadata on disk.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2014-12-05 10:23:18 +00:00
Alexander Motin
26f0f92fa2 Decode some binary fields of Intel metadata.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2014-12-04 15:54:45 +00:00
Warner Losh
66cc25a224 Actually, that was a bad idea. Go back to MAXPARTITIONS.
Submitted by: bruce
2014-11-20 17:31:25 +00:00
Warner Losh
dd87e2c610 The number of BSD partitions is variable. Return the proper number
(which is in basetable->gpt_entries).

Submitted by: ae@
2014-11-19 18:55:27 +00:00
Warner Losh
73f49e9eef Implement the historic DIOCGDINFO ioctl for gpart on BSD
partitions. Several utilities still use this interface and require
additional information since gpart was activated than before. This
allows fsck of a UFS partition without having to specify it is UFS,
per historic behavior.
2014-11-18 17:06:40 +00:00
Pawel Jakub Dawidek
5ebb15b942 Add missing privilege check when setting the dump device. Before that change it
was possible for a regular user to setup the dump device if he had write access
to the given device. In theory it is a security issue as user might get access
to kernel's memory after provoking kernel crash, but in practise it is not
recommended to give regular users direct access to storage devices.

Rework the code so that we do privileges check within the set_dumper() function
to avoid similar problems in the future.

Discussed with:	secteam
2014-11-11 04:48:09 +00:00
Dag-Erling Smørgrav
133cdd9e13 Constify the AES code and propagate to consumers. This allows us to
update the Fortuna code to use SHAd-256 as defined in FS&K.

Approved by:	so (self)
2014-11-10 09:44:38 +00:00
Poul-Henning Kamp
cd15a01091 Translate the errno to gctl_error() texts.
Spotted by:	mwlucas
2014-11-09 15:52:11 +00:00
Alexander Motin
c3e7ba3e6d Add to CTL support for logical block provisioning threshold notifications.
For ZVOL-backed LUNs this allows to inform initiators if storage's used or
available spaces get above/below the configured thresholds.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-11-06 00:48:36 +00:00
Alexander Motin
ccf8a5688a Revert somewhat hackish geom_disk optimization, committed as part of r256880,
and the following r273143 commit, supposed to workaround introduced issue by
quite innocent-looking change.

While there is no clear understanding why, but r273143 is accused in data
corruption in some environments with high I/O load.  I personally don't see
any problem in that commit, and possibly it is just a trigger to some other
bug somewhere, but better safe then sorry for now.

Requested by:	scottl@
MFC after:	3 days
2014-10-25 15:16:19 +00:00
Colin Percival
66427784c1 Populate the GELI passphrase cache with the kern.geom.eli.passphrase
variable (if any) provided in the boot environment.  Unset it from
the kernel environment after doing this, so that the passphrase is
no longer present in kernel memory once we enter userland.

This will make it possible to provide a GELI passphrase via the boot
loader; FreeBSD's loader does not yet do this, but GRUB (and PCBSD)
will have support for this soon.

Tested by:	kmoore
2014-10-22 23:41:15 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Andrey V. Elsukov
52fa0beb0a Add provider's sectorsize and stripesize to confdot output.
Submitted by:	rpokala at panasas.com
2014-10-17 06:58:04 +00:00
Davide Italiano
2be111bf7d Follow up to r225617. In order to maximize the re-usability of kernel code
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.

Submitted by:   kmacy
Tested by:      make universe
2014-10-16 18:04:43 +00:00
Andrey V. Elsukov
0478dc0c16 Add an ability to set dumpdev via loader(8) tunable.
MFC after:	3 weeks
2014-10-08 12:18:16 +00:00
Hiroki Sato
d17183901f Fix a bug in r272297 which prevented dumpdev from setting.
!u is not equivalent to (u != 0).
2014-10-03 04:13:25 +00:00
Pawel Jakub Dawidek
227f68edbb Be prepared that set_dumper() might fail even when resetting it or prefix
the call with (void) to document that we intentionally ignore the return
value - no way to handle an error in case of device disappearing.
2014-09-30 12:00:50 +00:00
Pawel Jakub Dawidek
7f5b50719b Style fixes. 2014-09-30 11:51:32 +00:00
Colin Percival
835c4dd436 Cache GELI passphrases entered at the console during the boot process,
in order to improve user-friendliness when a system has multiple disks
encrypted using the same passphrase.

When examining a new GELI provider, the most recently used passphrase
will be attempted before prompting for a passphrase; and whenever a
passphrase is entered, it is cached for later reference.  When the root
disk is mounted, the cached passphrase is zeroed (triggered by the
"mountroot" event), in order to minimize the possibility of leakage
of passphrases.  (After root is mounted, the "taste and prompt for
passphrases on the console" code path is disabled, so there is no
potential for a passphrase to be stored after the zeroing takes place.)

This behaviour can be disabled by setting kern.geom.eli.boot_passcache=0.

Reviewed by:	pjd, dteske, allanjude
MFC after:	7 days
2014-09-16 08:40:52 +00:00
Sean Bruno
5f23eb4d9c Add device name used in geom_map verbose output. This helps when using
geom_map with multiple flash/spi devices.

Phabric:  https://reviews.freebsd.org/D766
Reviewed by:	adrian
MFC after:	2 weeks
2014-09-11 22:39:27 +00:00
John-Mark Gurney
89fac384c8 use a straight buffer instead of an iov w/ 1 segment... The aesni
driver when it hits a mbuf/iov buffer, it mallocs and copies the data
for processing..  This improves perf by ~8-10% on my machine...

I have thoughts of fixing AES-NI so that it can better handle segmented
buffers, which should help improve IPSEC performance, but that is for
the future...
2014-09-04 23:53:51 +00:00
Scott Long
274919e965 Deal explicitly with possible failures of make_dev_alias_p() in GEOM.
Submitted by:   Mariusz Zaborski <oshogbo@FreeBSD.org>
MFC after:      3 days
2014-08-18 19:27:47 +00:00
Andrey V. Elsukov
36b16d1f7d Turn off kern.geom.part.mbr.enforce_chs by default. 2014-08-12 10:31:31 +00:00
Andrey V. Elsukov
fb86534cb1 Add sysctl and loader tunable kern.geom.part.mbr.enforce_chs that is set
by default. It can be used to disable automatic alignment to CHS geometry,
that GEOM_PART_MBR does.

Reviewed by:	wblock
MFC after:	1 week
2014-08-12 09:10:13 +00:00
Warner Losh
cba7d97b61 cswitch is unsigned, so don't compare it < 0. Any negative numbers
will look huge and be caught by > 100.
2014-08-07 21:56:42 +00:00
Warner Losh
86e26cb154 Unsigned values can never be less than 0. 2014-08-07 21:56:37 +00:00
Marcel Moolenaar
6c25615f39 In r264504, we prevented doing I/O for more than MAXPHYS by making
the assumption that consumers would respect bio_completed and/or
bio_resid to detect short reads. This assumption proved false and
file corruption was the result.
Create as many bios as we need to satisfy the original request.
Check the cached chunk every time we need to do I/O to increase the
hit rate.

Obtained from:	junipre Networks, Inc.
MFC after:	1 week
2014-07-22 17:30:05 +00:00
Nathan Whitehorn
1ee0f08975 After EFI support was added to the installer, it needed to allow boot
partitions of types other than "freebsd-boot" (in particular, "efi").
This allows the removal of some nasty hacks for supporting PowerPC systems,
in particular aliasing freebsd-boot to apple-boot on APM and an IBM-specific
code on MBR.

This changes the installer to use the correct names, which also breaks a
degeneracy in the meaning of "freebsd-boot" that allows the addition
of support for some newer IBM systems that can boot from GPT in addition to
MBR. Since I have no idea how to detect which those systems are, leave
the default on IBM PPC systems as MBR for now.
2014-07-04 15:55:32 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Andrey V. Elsukov
91ca76a590 Add disklabel64 support to GEOM_PART class.
This partitioning scheme is used in DragonFlyBSD. It is similar to
BSD disklabel, but has the following improvements:
* metadata has own dedicated place and isn't accessible through partitions;
* all offsets are 64-bit;
* supports 16 partitions by default (has reserved place for more);
* has reserved place for backup label (but not yet implemented);
* has UUIDs for partitions and partition types;

No objections from:	geom
MFC after:	2 weeks
Relnotes:	yes
2014-06-11 10:42:34 +00:00
Andrey V. Elsukov
4042ab48c7 Allow swapping to DragonFlyBSD's swap partition.
MFC after:	2 weeks
2014-06-11 10:23:49 +00:00
Andrey V. Elsukov
0640b71dfe Add aliases for DragonFlyBSD's partition types.
MFC after:	2 weeks
2014-06-11 10:19:11 +00:00
Brad Davis
ebd05adab8 - Fix the keyfile being cleared prematurely after r259428
PR:		185084
Submitted by:	fk@fabiankeil.de
Reviewed by:	pjd@
2014-06-06 03:17:37 +00:00
Andrey V. Elsukov
39dcac849e Use g_conf_printf_escaped() to escape symbols, which can break
an XML tree.

MFC after:	1 week
2014-05-30 10:35:51 +00:00
Andrey V. Elsukov
17e0c43319 Add a topology trace to the g_spoil_event.
MFC after:	1 week
2014-05-19 16:08:15 +00:00
Andrey V. Elsukov
362073c089 We have two functions from where a geom orphan method could be called:
g_orphan_register and g_resize_provider_event. Both are called from the
event queue. Also we have GEOM_DEV class, which does deferred destroy
for its consumers via g_dev_destroy (also called from the event queue).
So it is possible, that for some consumers an orphan method will be
called twice. This triggers panic in g_dev_orphan.
Check that consumer isn't already orphaned before call orphan method.

MFC after:	2 weeks
2014-05-19 16:05:42 +00:00
Alexander Motin
413037c8e7 Make GEOM DISK to account also BIO_FLUSH operations. 2014-05-17 15:07:00 +00:00
Andrey V. Elsukov
579259ea0d It is safe to allow shrinking, when aligned size is bigger than current.
Tested by:	jmg
MFC after:	1 week
2014-05-07 11:18:27 +00:00
Edward Tomasz Napierala
c7c7d7d0f0 Make r242379 - the fix for UFS labels disappearing after resizing
the provider - also apply to UFS1 filesystems.  This should help with
resizing filesystems created by makefs(8), which still uses UFS1.

Tested by:	jmg@
Sponsored by:	The FreeBSD Foundation
2014-05-05 09:20:30 +00:00
Andrey V. Elsukov
4f31a94bd2 Add an advice what to do when partition was automatically resized.
X-MFC after:	r256690
2014-05-04 20:00:08 +00:00
Andrey V. Elsukov
c778397f26 Add better error description for case when we are doing resize and
scheme-specific method returns EBUSY.

MFC after:	1 week
2014-05-04 16:55:51 +00:00
Andrey V. Elsukov
0dd7f00cee Prevent an unexpected shrinking on resizing due to alignment for MBR,
PC98 and VTOC8 schemes.

Reported by:	jmg
MFC after:	1 week
2014-05-04 16:43:57 +00:00
Andrey V. Elsukov
bc1e8f56ff For schemes that do an automatic partition aligning move this code to
separate function.

MFC after:	1 week
2014-05-04 10:14:25 +00:00
Luiz Otavio O Souza
81694cde44 Fix a leak in g_uzip_taste(). After retrieve all the block offsets from
the uzip image, free the last data read.
2014-05-01 15:23:20 +00:00
Luiz Otavio O Souza
ccb7284af1 Actually the FEATURE() macro is defined on sys/sysctl.h.
Pointyhat to:	loos
2014-05-01 14:59:04 +00:00
Luiz Otavio O Souza
6d8beede60 Some style and whitespace fixes. Reduce the difference between geom_uzip(4)
and geom_uncompress(4).  Now, they produce an almost clean diff(1) output.

Remove a duplicated variable from g_uncompress.c and an unnecessary header
from g_uzip.c.

No functional changes.
2014-05-01 14:47:27 +00:00
Bryan Drewery
74679c6a99 Remove redundant include
MFC after:	3 days
2014-04-29 01:17:43 +00:00
Alexander Motin
dea1e22600 Reduce number of opens by REOM RAID during provider taste.
Instead opening/closing provider by each of metadata classes, do it only
once in core code.  Since for SCSI disks open/close means sending some
SCSI commands to the device, this change reduces taste time.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-04-28 15:03:52 +00:00
Luiz Otavio O Souza
6f05733a1f Keep geom_uncompress(4) in line with geom_uzip(4), bring in the r264504 fix.
Make sure not to start I/O bigger than MAXPHYS bytes.

Quoting r264504:

When we detect the condition, we'll reduce the block count and perform
a "short" read.  In g_uncompress_done() we need to consider the original
I/O length and stop early if we're about to deflate a block that we didn't
read.  By using bio_completed in the cloned BIO and not bio_length to
check for this, we automatically and gracefully handle short reads that
our providers may be doing on top of the short reads we may initiate
ourselves.

Reviewed by:	marcel
2014-04-22 18:08:34 +00:00
Marcel Moolenaar
855be5b2c1 Make sure not to do I/O for more than MAXPHYS bytes. Doing so can cause
problems in our providers, such as a KASSERT in md(4). We can initiate
I/O for more than MAXPHYS bytes if we've been given a BIO for MAXPHYS
bytes, the blocks from which we're reading couldn't be compressed and
we had compression in preceeding blocks resulting in misalignment of
the blocks we're trying to read relative to the sector. We're forced to
round up the I/O length to make it an multiple of the sector size.

When we detect the condition, we'll reduce the block count and perform
a "short" read. In g_uzip_done() we need to consider the original I/O
length and stop early if we're about to deflate a block that we didn't
read. By using bio_completed in the cloned BIO and not bio_length to
check for this, we automatically and gracefully handle short reads that
our providers may be doing on top of the short reads we may initiate
ourselves.

Obtained from:	Juniper Networks, Inc.
2014-04-15 15:41:57 +00:00
Bryan Drewery
87bc328d63 Make g_access() KASSERT() more useful.
Sponsored by:	EMC / Isilon Storage Division
Obtained from:	Isilon OneFS
MFC after:	2 weeks
2014-04-15 14:41:41 +00:00
Marcel Moolenaar
4787115d04 Align and round the partitionable disk space to 4K by default.
Since this would also apply when recovering, make sure not to
align or round when that would have a partition fall outside
the partitionable area.
2014-04-12 20:28:39 +00:00
Bryan Drewery
1e4b22b44b Fix spelling error in g_trace() call.
Sponsored by:	EMC / Isilon Storage Division
MFC after:	1 week
2014-04-10 17:00:44 +00:00
Alexander Motin
1229e83d2b Fix wrong sizes used to access PD_Type and PD_State DDF metadata fields.
This caused incorrect behavior of arrays with big-endian DDF metadata.
Little-endian (like used by Adaptec controllers) should not be harmed.
Add workaround should be enough to manage compatibility.

MFC after:	2 weeks
2014-04-10 16:00:33 +00:00
Alexander Motin
66b92c07fe Do not increment bio_data in case of BIO_DELETE.
This fixes KASSERT() panic in g_io_request().
2014-04-10 10:12:56 +00:00
Marcel Moolenaar
e8c166e85a An all-or-nothing approach to labels isn't flexible enough. Embedded
systems need fine-grained control over what's in and what's out.
That's ideal. For now, separate GPT labels from the rest and allow
g_label to be built with just GPT labels.

Obtained from:	Juniper Networks, Inc.
2014-04-06 02:44:37 +00:00
Marcel Moolenaar
12b2d77da9 Make sure we don't free memory that's already been freed by setting
the geom->softc pounter to NULL before freeing the g_slicer softc.
In g_slicer_free() the pointer is checked first.

Obtained from:	Juniper Networks, Inc.
2014-04-06 02:20:42 +00:00
Bryan Drewery
09adfca39f Show error code when failing to destroy a mirror on delay
Sponsored by:	EMC / Isilon Storage Division
MFC after:	2 weeks
2014-04-05 03:01:29 +00:00
Xin LI
c35ddb346f In g_eli_crypto_hmac_init(), zero out after using the ipad buffer,
k_ipad.

Note that the two consumers in geli(4) are not affected by this
issue because the way the code is constructed and as such, we
believe there is no security impact with or without this change
with geli(4)'s usage.

Reported by:	Serge van den Boom <serge vdboom.org>
Reviewed by:	pjd
MFC after:	2 weeks
2014-02-08 05:17:49 +00:00
Luiz Otavio O Souza
d9ffbff9f0 Fix the build with DEBUG enabled. Where possible, fix style(9) issues.
Reviewed by:	bde
Approved by:	adrian (mentor)
2014-02-07 13:06:48 +00:00
Luiz Otavio O Souza
f0d701f048 Fix a logic error. Because of this inflateReset() wasn't being called and
the output buffer wasn't being cleared between the inflate() calls,
producing zeroed output after the first inflate() call.

This fixes the read of mkuzip(8) images with geom_uncompress(4).

Reviewed by:	ray
Approved by:	adrian (mentor)
2014-02-03 17:25:36 +00:00
Luiz Otavio O Souza
c2d90f35d5 Remove some unnecessary code. The offsets read from the first block are
overwritten a few lines bellow.

Reviewed by:	ray
Approved by:	adrian (mentor)
2014-02-03 17:21:36 +00:00
Andrey V. Elsukov
524d7a4d4e Always free sbuf in gctl_free().
MFC after:	1 week
2014-01-23 21:30:31 +00:00
Andrey V. Elsukov
d14a7ff1f5 Remove another unneeded NULL check from geom_alloc_copyin().
Do copyout in case of gctl version mismatch and fix sbuf leak in
g_ctl_ioctl_ctl().

MFC after:	1 week
2014-01-23 20:25:38 +00:00
Andrey V. Elsukov
7f0e13dfe0 In gctl_copyin() remove unused error variable.
geom_alloc_copyin() can't return ENOMEM, so describe its fail as bad
control request. Add check for NULL pointer in gctl_dump(), since it
can be NULL when geom_alloc_copyin() failed.

MFC after:	1 week
2014-01-23 19:55:02 +00:00
Andrey V. Elsukov
625ee733e3 Fix typo in r261084.
Add to the gctl_error() an ability to specify error description even
if numeric error code is already specified. Also by default set
error code to EINVAL.

PR:		185852
MFC after:	1 week
2014-01-23 19:31:17 +00:00
Andrey V. Elsukov
ee839ce84c malloc() with M_WAITOK doesn't return NULL.
MFC after:	1 week
2014-01-23 19:07:22 +00:00
Alexander Motin
eaed60f737 Removed unneeded and dangerous assignment. It would probably cause NULL
refererence panic if compiler not optimize it out.

Found with:	Clang static analyzer
MFC after:	2 weeks
2014-01-19 16:37:57 +00:00
Luiz Otavio O Souza
67619a4120 Build the geom_uncompress(4) module by default.
Fix geom_uncompress(4) module loading.  Don't link zlib.c (which is a module
itself) directly.

The built module was verified and used to read a few mkulzma(8) images on
amd64 to validate some of the informations on the manual page.

While here, don't overwrite CFLAGS.

Reviewed by:	ray
Approved by:	adrian (mentor)
2014-01-10 20:29:46 +00:00
Andrey V. Elsukov
ae3bc0acff Add an ability to stop gmirror and clear its metadata in one command.
This fixes the problem, when gmirror starts again just after stop.

The problem occurs when gmirror's component has geom label with equal size.
E.g. gpt and gptid have the same size as partition, diskid has the same
size as entire disk. When gmirror's geom has been destroyed, glabel
creates its providers and this initiate retaste.

Now "gmirror destroy" command is available. It destroys geom and also
erases gmirror's metadata.

MFC after:	2 weeks
2013-12-27 02:43:53 +00:00
Dmitry Morozovsky
5cc596c46d Add GPT UUID for VMware vSAN meta-data partition.
Approved by:	ae
MFC after:	2 weeks
2013-12-26 21:06:12 +00:00
Andrey V. Elsukov
7c5710dbaf Prevent users from deactivating the last component of a mirror.
PR:		184985
MFC after:	1 week
2013-12-19 22:13:12 +00:00
Pawel Jakub Dawidek
396b29c74e Clear some more places with potentially sensitive data.
MFC after:	1 week
2013-12-15 22:52:18 +00:00
Pawel Jakub Dawidek
2a3237c84f Clear content of keyfiles loaded by the loader after processing them.
Pointed out by:	rwatson
MFC after:	1 week
2013-12-15 22:51:26 +00:00
Alexander Motin
2634da8cd5 Fix bug introduced at r256607. We have to recalculate bp_resid here since
sizes of original and completed requests may differ due to end of media.

Bisected by:	pho
2013-12-12 08:23:28 +00:00
Justin Hibbits
6cec74b2e4 Partially revert r259080. bde@ pointed out that there are a lot more style bugs
going on in here than can be fixed, and I introduced some of my own.  Rather
than fix the whole host of them, back out my bugs.

Found by:	bde
X-MFC with:	r259080
2013-12-08 09:34:56 +00:00
Justin Hibbits
8991c54091 Fix some integer signs. These unsigned integers should all be signed.
Found by:	clang (powerpc64)
2013-12-07 19:55:34 +00:00
Eitan Adler
7a22215c53 Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this
shifts into the sign bit.  Instead use (1U << 31) which gets the
expected result.

This fix is not ideal as it assumes a 32 bit int, but does fix the issue
for most cases.

A similar change was made in OpenBSD.

Discussed with:	-arch, rdivacky
Reviewed by:	cperciva
2013-11-30 22:17:27 +00:00
Alexander Motin
7ae1a87bfe Escape special XML chars, returned by some devices, confusing XML parsers.
MFC after:	1 month
2013-11-27 14:25:06 +00:00
Marcel Moolenaar
3e5a0a6b70 Have the GPT probe return a lower priority when the MBR is not a PMBR
The purpose of the PMBR is to have the disk appear in use to GPT
unaware utilities (like fdisk).  However, if the PMBR has been changed
by a GPT unaware utlity then we must assume that this was deliberate
(as it involved removal of the special slice) and we should not treat
the unmodified GPT-specific sectors as being valid.  By lowering the
probe priority in that case, the MBR scheme will take precedence and
the kernel will end up using the MBR and not the GPT. We will still
use the GPT if the kernel does not support the MBR scheme.
2013-11-21 22:02:59 +00:00
Andrey V. Elsukov
32cea4ca0f Add "resize" verb to gmirror(8) and such functionality to geom_mirror(4).
Now it is easy to expand the size of the mirror when all its components
are replaced. Also add g_resize method to geom_mirror class. It will write
updated metadata to new last sector, when parent provider is resized.

Silence from:	geom@
MFC after:	1 month
2013-11-19 22:55:17 +00:00
Alexander Motin
f8c79813cb In addition to r258220 allow shrinking in "automatic" mode if there is
already valid metadata found at the new location.  This should allow easy
transparent recovery if first resize was done by mistake.

While there, unify metadata write code and fix minor memory leak.

MFC after:	1 month
2013-11-17 05:38:54 +00:00
Alexander Motin
e6afd72b93 Implement automatic live resize support for GEOM MULTIPATH class.
In "manual" mode just automatically resize provider in any direction.
In "automatic" mode allow only growth (with new metadata write); in case
of shrinking destroy the multipath device same as before since it may be
undesirable to write new metadata within old user area.

MFC after:	1 month
2013-11-16 14:31:49 +00:00
Andrey V. Elsukov
743437c451 Add missing line breaks.
PR:		181900
MFC after:	1 week
2013-11-11 11:13:12 +00:00
Xin LI
7ac2e58818 When zero'ing out a buffer, make sure we are using right size.
Without this change, in the worst but unlikely case scenario, certain
administrative operations, including change of configuration, set or
delete key from a GEOM ELI provider, may leave potentially sensitive
information in buffer allocated from kernel memory.

We believe that it is not possible to actively exploit these issues, nor
does it impact the security of normal usage of GEOM ELI providers when
these operations are not performed after system boot.

Security:	possible sensitive information disclosure
Submitted by:	Clement Lecigne <clecigne google com>
MFC after:	3 days
2013-11-02 01:16:10 +00:00
John Baldwin
d6d78db57f Reject attempts to attack a disk device that has the old NEEDSGIANT
flag set.

Reviewed by:	mav
2013-10-25 19:19:12 +00:00
Steven Hartland
c28078e903 Improve ZFS N-way mirror read performance by using load and locality
information.

The existing algorithm selects a preferred leaf vdev based on offset of the zio
request modulo the number of members in the mirror. It assumes the devices are
of equal performance and that spreading the requests randomly over both drives
will be sufficient to saturate them. In practice this results in the leaf vdevs
being under utilized.

The new algorithm takes into the following additional factors:
* Load of the vdevs (number outstanding I/O requests)
* The locality of last queued I/O vs the new I/O request.

Within the locality calculation additional knowledge about the underlying vdev
is considered such as; is the device backing the vdev a rotating media device.

This results in performance increases across the board as well as significant
increases for predominantly streaming loads and for configurations which don't
have evenly performing devices.

The following are results from a setup with 3 Way Mirror with 2 x HD's and
1 x SSD from a basic test running multiple parrallel dd's.

With pre-fetch disabled (vfs.zfs.prefetch_disable=1):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s

With pre-fetch enabled (vfs.zfs.prefetch_disable=0):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s

In addition to the performance changes the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow which also ensures
vdevs loads are only calculated from the set of valid candidates.

The following additional sysctls where added to allow the administrator
to tune the behaviour of the load algorithm:
* vfs.zfs.vdev.mirror.rotating_inc
* vfs.zfs.vdev.mirror.rotating_seek_inc
* vfs.zfs.vdev.mirror.rotating_seek_offset
* vfs.zfs.vdev.mirror.non_rotating_inc
* vfs.zfs.vdev.mirror.non_rotating_seek_inc

These changes where based on work started by the zfsonlinux developers:
https://github.com/zfsonlinux/zfs/pull/1487

Reviewed by:	gibbs, mav, will
MFC after:	2 weeks
Sponsored by:	Multiplay
2013-10-23 09:54:58 +00:00
Mateusz Guzik
aa25ccfa36 gnop: make sure that newly allocated memory for softc is zeroed
This prevents mtx_init from encountering non-zeros and panicking
the kernel as a result.

Reported by:	Keith White <kwhite site.uottawa.ca>
2013-10-23 01:34:18 +00:00
Alexander Motin
1a29adad30 Remove Giant-locked drivers support (DISKFLAG_NEEDSGIANT flag) from disk(9).
Since at least FreeBSD 7 we had only four of them in the base tree, and
in head branch, thanks to jhb@, we have no any for more then a year.
2013-10-22 10:21:20 +00:00
Alexander Motin
40ea77a036 Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.

The defined now safety requirements are:
 - caller should not hold any locks and should be reenterable;
 - callee should not depend on GEOM dual-threaded concurency semantics;
 - on the way down, if request is unmapped while callee doesn't support it,
   the context should be sleepable;
 - kernel thread stack usage should be below 50%.

To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
 - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
 - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
 - G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
 - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe.  If any of requirements are not met, request is queued to
g_up or g_down thread same as before.

Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).

To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION.  da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.

This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).

Sponsored by:	iXsystems, Inc.
MFC after:	2 months
2013-10-22 08:22:19 +00:00
Edward Tomasz Napierala
fb0e57b1a2 Fix build with gcc by spelling unused format string as "unused" instead of NULL.
MFC after:	29 days
2013-10-19 08:20:00 +00:00
Edward Tomasz Napierala
19e5b2d50e Make geom_label(4) resize-aware. This fixes a situation when "gpart resize"
would resize a partition, but label providers - e.g. /dev/gptid/XXX - would
stay the same size.

Reviewed by:	mav
MFC after:	1 month
Sponsored by:	FreeBSD Foundation
2013-10-18 09:14:19 +00:00
Andrey V. Elsukov
884c8e4fea Add an automatic resize support to the GEOM_PART class.
When parent provider has been resized, the scheme specific G_PART_RESIZE
method does an update of scheme's metadata. But all changes are not saved
to disk, until `gpart commit` will be called.

Discussed with:	trasz
MFC after:	1 month
2013-10-17 16:18:43 +00:00
Alexander Motin
b43560ab19 MFprojects/camlock r256445:
Add unmapped I/O support to GEOM RAID.
2013-10-16 09:33:23 +00:00
Alexander Motin
21d0712c33 MFprojects/camlock r256371:
Fix passing uninitialized bio_resid argument to g_trace().
2013-10-16 09:21:40 +00:00
Alexander Motin
0fd2511ae2 MFprojects/camlock r254907:
Move g_io_deliver() out of the lock, as required for direct dispatch.
Move g_destroy_bio() out too to reduce lock scope even more.
2013-10-16 09:18:01 +00:00
Alexander Motin
e431d66c04 MFprojects/camlock r254905:
Introduce new function devstat_end_transaction_bio_bt(), adding new argument
to specify present time.  Use this function to move binuptime() out of lock,
substantially reducing lock congestion when slow timecounter is used.
2013-10-16 09:12:40 +00:00
Dag-Erling Smørgrav
1b2cb2b3f0 Introduce a kern.geom.notaste sysctl that can be used to temporarily
disable GEOM tasting to avoid the "bouncing GEOM" problem where, when
you shut down the consumer of a provider which can be viewed in multiple
ways (typically a mirror whose members are labeled partitions), GEOM
will immediately taste that provider's alter ego and reattach the
consumer.

Approved by:	re (glebius)
2013-09-24 20:05:16 +00:00
Andrey V. Elsukov
87c0c612d8 Remove stub implementation.
MFC after:	1 week
2013-09-05 09:44:09 +00:00
Alexander Motin
19351a14eb Make ELI destruction (including orphanization) less aggressive, making it
always wait for provider close.  Old algorithm was reported to cause NULL
dereference panic on attempt to close provider after softc destruction.
If not global workaroung in GEOM, that could even cause destruction with
requests still in flight.
2013-09-02 10:44:54 +00:00
Alexander Motin
3843eba85d MFprojects/camlock r254895:
Add unmapped BIO support to GEOM ZERO if kern.geom.zero.clear is cleared.
2013-08-26 20:39:02 +00:00
Alexander Motin
40f27d7cf6 Add new attribute lunname to report only textual LUN-specific device IDs.
While lunid attribute prefers to report numeric ones, having both may be
useful in some situations.
2013-08-24 09:42:14 +00:00
Kenneth D. Merry
ce625ec719 Change the way that unmapped I/O capability is advertised.
The previous method was to set the D_UNMAPPED_IO flag in the cdevsw
for the driver.  The problem with this is that in many cases (e.g.
sa(4)) there may be some instances of the driver that can handle
unmapped I/O and some that can't.  The isp(4) driver can handle
unmapped I/O, but the esp(4) driver currently cannot.  The cdevsw
is shared among all driver instances.

So instead of setting a flag on the cdevsw, set a flag on the cdev.
This allows drivers to indicate support for unmapped I/O on a
per-instance basis.

sys/conf.h:	Remove the D_UNMAPPED_IO cdevsw flag and replace it
		with an SI_UNMAPPED cdev flag.

kern_physio.c:	Look at the cdev SI_UNMAPPED flag to determine
		whether or not a particular driver can handle
		unmapped I/O.

geom_dev.c:	Set the SI_UNMAPPED flag for all GEOM cdevs.
		Since GEOM will create a temporary mapping when
		needed, setting SI_UNMAPPED unconditionally will
		work.

		Remove the D_UNMAPPED_IO flag.

nvme_ns.c:	Set the SI_UNMAPPED flag on cdevs created here
		if NVME_UNMAPPED_BIO_SUPPORT is enabled.

vfs_aio.c:	In aio_qphysio(), check the SI_UNMAPPED flag on a
		cdev instead of the D_UNMAPPED_IO flag on the cdevsw.

sys/param.h:	Bump __FreeBSD_version to 1000045 for the switch from
		setting the D_UNMAPPED_IO flag in the cdevsw to setting
		SI_UNMAPPED in the cdev.

Reviewed by:	kib, jimharris
MFC after:	1 week
Sponsored by:	Spectra Logic
2013-08-15 22:52:39 +00:00
Alexander Motin
0f0b2fd889 Return error when opening read-only volumes (like RAID4/5/...) for writing.
Previously opens succeeded, but actual write operations returned errors.

Requested by:	peter
MFC after:	2 weeks
2013-08-13 07:56:40 +00:00
Alexander Motin
db8645f05e Oops, wrong constant at r254269. 2013-08-13 06:25:34 +00:00
Alexander Motin
e70b565ba4 Fix reasonable but safe Clang warnings. 2013-08-13 06:21:36 +00:00
Ed Schouten
647a92d62b Fix the formatting of the error message.
The G_MIRROR_DEBUG() macro already appends a newline. Also, most of the
log messages emitted by gmirror start with an uppercase letter.
2013-08-12 18:17:45 +00:00
Andrey V. Elsukov
b74dd6c77b gpt_entries is used as limit for the number of partition entries in
the GEOM_PART. Instead of just using number of entries from the GPT
header, calculate this limit based on the reserved space between
GPT header and first available LBA.

MFC after:	2 weeks
2013-08-08 16:09:20 +00:00
Marcel Moolenaar
e01c6f329a Change <sys/diskpc98.h> to not redefine the same symbols that are
being defined in <sys/diskmbr.h>. Instead give the symbols here a
"PC98_" prefix. This way, both <sys/diskmbr.h> and <sys/diskpc98.h>
can be included in the same C source file.

The renaming is trivial. The only gotcha is that DOSBBSECTOR is
also redefined from 0 to 1. This because DOSBBSECTOR was always
used in conjunction with an addition of 1. The PC98_BBSECTOR symbol
is defined as 1 and the expression is simplified.

Note: it is not believed that ports are seriously impacted; or at
all for that matter.

Approved by: nyan@
2013-08-07 00:00:48 +00:00
Marcel Moolenaar
b9fdaa9b19 Remove inclusion of <sys/diskmbr.h>. We have no business knowing
anything related to MBR in this file.
2013-08-04 21:00:22 +00:00
Alexander Motin
8531bb3f0c Introduce 3 seconds timeout on graid stop command (mostly with -f flag).
Since completion waiting goes in g_event thread, it may cause GEOM deadlock
if consumer on top (for example, ZFS) uses g_event thread for closing.
2013-07-27 15:02:19 +00:00
Konstantin Belousov
a4a65e69c6 When panicing due to the gjournal overflow, print the geom metadata
journal id.

Requested by:	Andreas Longwitz <longwitz@incore.de>
MFC after:	1 week
2013-07-10 10:11:43 +00:00
Konstantin Belousov
cc3d8c35f5 There are several code sequences like
vfs_busy(mp);
      vfs_write_suspend(mp);
which are problematic if other thread starts unmount between two
calls.  The unmount starts a write, while vfs_write_suspend() drain
writers.  On the other hand, unmount drains busy references, causing
the deadlock.

Add a flag argument to vfs_write_suspend and require the callers of it
to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the
mount path, i.e. the covered vnode is not locked.  The suspension is
not attempted if VS_SKIP_UNMOUNT is specified and unmount is in
progress.

Reported and tested by:	Andreas Longwitz <longwitz@incore.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2013-07-09 20:49:32 +00:00
Steven Hartland
8383a92e5b Bump disk(9) ABI version to signify the addition of d_delmaxsize by r249940.
Ensure that d_delmaxsize is always set, removing init to 0 which could cause
future issues if use cases change.

Allow kern.cam.da.X.delete_max (which maps to d_delmaxsize) to be increased
up to the calculated max after being reduced.

MFC after:	1 day
X-MFC-With: r249940
2013-07-03 23:46:30 +00:00
Jeff Roberson
5f51836645 - Add a general purpose resource allocator, vmem, from NetBSD. It was
originally inspired by the Solaris vmem detailed in the proceedings
   of usenix 2001.  The NetBSD version was heavily refactored for bugs
   and simplicity.
 - Use this resource allocator to allocate the buffer and transient maps.
   Buffer cache defrags are reduced by 25% when used by filesystems with
   mixed block sizes.  Ultimately this may permit dynamic buffer cache
   sizing on low KVA machines.

Discussed with:	alc, kib, attilio
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-06-28 03:51:20 +00:00
Scott Long
f07b69478e Fix a mystery cut-n-paste corruption from the previous commit.
Submitted by:	Brenden Fabeny
2013-06-19 23:09:10 +00:00
Scott Long
2084cbe975 Mark geom_mirror as capable of unmapped i/o
Obtained from:	Netflix
MFC after:	3 days
2013-06-19 21:52:32 +00:00
Alexander Motin
ccba710262 Make CAM return and GEOM DISK pass through new GEOM::lunid attribute.
SPC-4 specification states that serial number may be property of device,
but not a specific logical unit.  People reported about FC storages using
serial number in that way, making it unusable for purposes of LUN multipath
detection.  SPC-4 states that designators associated with logical unit from
the VPD page 83h "Device Identification" should be used for that purpose.
Report first of them in the new attribute in such preference order: NAA,
EUI-64, T10 and SCSI name string.

While there, make GEOM DISK properly report GEOM::ident in XML output also
using d_getattr() method, if available.  This fixes serial numbers reporting
for SCSI disks in `geom disk list` output and confxml.

Discussed with:	gibbs, ken
Sponsored by:	iXsystems, Inc.
MFC after:	2 weeks
2013-06-12 13:36:20 +00:00
Alexander Motin
c145d6005f Don't update provider properties and don't set DISKFLAG_OPEN if d_open()
disk method call returned error.  GEOM considers devices in such case as
still closed, and won't call symmetric d_close() for them.
2013-06-11 10:06:07 +00:00
Marcel Moolenaar
3bd22a9cc8 Change the set and unset ctlreqs by making the index argument optional.
This allows setting attributes on tables. One simply does not provide
an index in that case. Otherwise the entry corresponding the index has
the attribute set or unset.

Use this change to fix a relatively longstanding bug in our GPT scheme
that's the result of rev 198097 (relatively harmless) followed by rev
237057 (damaging). The damaging part being that our GPT scheme always
has the active flag set on the PMBR slice. This is in violation with
EFI. Existing EFI implementions for both x86 and ia64 reject the GPT.
As such, GPT disks created by us aren't usable under EFI because of
that.

After this change, GPT disks never have the active flag set on the PMBR
slice. In order to make the GPT disk bootable under some x86 BIOSes,
the reason of rev 198097, one must now set the active attribute on the
gpt table. The kernel will apply this to the PMBR slice For (S)ATA:
	gpart set -a active ada0

To fix an existing GPT disk that has the active flag set in the PMBR,
and that does not need the flag, use (again for (S)ATA):
	gpart unset -a active ada0

The EBR, MBR & PC98 schemes, which also impement at least 1 attribute,
now check to make sure the entry passed is valid. They do not have
attributes that apply to the table.
2013-06-09 23:34:26 +00:00
Marcel Moolenaar
0f4389991c Remove stub implementation. 2013-06-09 23:12:43 +00:00
Brooks Davis
444e780150 MFP4 @222836
Add support for partitioning CFI disks from FDT using geom_flashmap.

Sponsored by:	DARPA, AFRL
2013-05-30 01:19:02 +00:00
Jaakko Heinonen
9641a51279 Remove an extra semicolon from the DOT language output.
PR:		kern/178540
Submitted by:	Trond Endrestol
MFC after:	1 week
2013-05-21 18:40:54 +00:00
Alexander Motin
57eed4a86f Fix vdc->Secondary_Element_Count metadata field access from 16 to 8 bit.
In some cases it could cause kernel panic during failed drive replacement.

Reported by:	trasz
MFC after:	1 week
2013-05-20 00:33:54 +00:00
Stanislav Sedov
77f8606428 - Use int8_t type for the mftrecsz field in g_label_ntfs. char type
used previously caused probe failure on platforms where char is unsigned
  (e.g. ARM), as mftrecsz can be negative.

Submitted by:	Ilya Bakulin <ilya@bakulin.de>
MFC after:	2 weeks
2013-05-05 08:00:16 +00:00
Alexander Motin
bcb6ad36f2 Return "descr" field alike to "Intel RAID1 volume" for GEOM RAID to make
it look better in bsdinstall.
2013-04-27 06:57:39 +00:00
Steven Hartland
9fe9ba5bef Teach GEOM and CAM about the difference between the max "size" of r/w and delete
requests.

sys/geom/geom_disk.h:
        - Added d_delmaxsize which represents the maximum size of individual
          device delete requests in bytes. This can be used by devices to
          inform geom of their size limitations regarding delete operations
          which are generally different from the read / write limits as data
          is not usually transferred from the host to physical device.

sys/geom/geom_disk.c:
        - Use new d_delmaxsize to calculate the size of chunks passed through to
          the underlying strategy during deletes instead of using read / write
          optimised values. This defaults to d_maxsize if unset (0).

        - Moved d_maxsize default up so it can be used to default d_delmaxsize

sys/cam/ata/ata_da.c:
        - Added d_delmaxsize calculations for TRIM and CFA

sys/cam/scsi/scsi_da.c:
        - Added re-calculation of d_delmaxsize whenever delete_method is set.

        - Added kern.cam.da.X.delete_max sysctl which allows the max size for
          delete requests to be limited. This is useful in preventing timeouts
          on devices who's delete methods are slow. It should be noted that
          this limit is reset then the device delete method is changed and
          that it can only be lowered not increased from the device max.

Reviewed by:	mav
Approved by:	pjd (mentor)
2013-04-26 16:22:54 +00:00
Steven Hartland
6f926c0b82 Added a sysctl (kern.geom.dev.delete_max_sectors) to control the maximum
size of a delete request sent to the providing device performed by g_dev_ioctl.

This allows the kernel and apps via ioctl e.g. newfs -E to request large LBA
deletes which siginificantly improves performance.

Previously this was hard coded to 65536 sectors, the new default is 262144
which doubles the throughput of deletes on commonly available SSD's.

In tests on a Intel 520 120GB FW: 400i disk it improved the delete throughput
from 1.6GB/s to over 2.6GB/s on a full disk delete such as that done via
newfs -E

For some SSD's where delete time is pretty much constant, no matter what
the request, setting this to 0 will provide significantly better throughput
e.g. Samsung 840 240GB FW DXT07B0Q @ 262144 = 79G/s, @ 0 = 2259G/s

Reviewed by:	mav
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-04-26 15:43:24 +00:00
Ivan Voras
8e9405e8a7 Comment typo fix.
Is aware of the importance of comments: dim
2013-04-16 22:42:40 +00:00
Ivan Voras
9a796b22f6 Fix the buffer-overflow-fixing fixes.
Pointy-hat to: me, for not realizing snprintf() is available in kernel.
Thanks to: jh, for bringing me the good news of snprintf(), Pawel Worach, for
           noting that the panic can be provoked in i386 and not in amd64
2013-04-16 19:58:24 +00:00
Brooks Davis
b7b63db789 Partial MFP4 of 222836:
Only look for FDT partitions if our potential parent is a DISK device.

Excluding direct recursion on the flashmap geoms was insufficient
because it did not prevent the underlying device from being retrieved if
flashmap geoms were further partitioned.

Reviewed by:	imp
Sponsored by:	DARPA, AFRL
2013-04-16 17:47:13 +00:00
Ivan Voras
c072011223 Introduce glabel labels based on GEOM ident attributes. In this initial
implementation, error on the side of conservatism and only create labels
for GEOMs of classes DISK and MULTIPATH.

Discussed with:	trasz
Approved by:	silence from freebsd-geom@
2013-04-15 16:09:24 +00:00
Ivan Voras
252c094e53 Introduce a symbol for the GEOM class name instead of using the ad-hoc string
constant.
2013-04-15 15:55:40 +00:00
John-Mark Gurney
d7078f3ba0 move the error report to a lower log level... Now you can see when it
returns an error without getting every single io that went through it..

MFC after:	1 week
2013-04-13 19:02:58 +00:00
Edward Tomasz Napierala
16fac6c92a Make it possible to submit FLUSH bios through geom_dev strategy. This
is required for CTL to work with device-backed LUNs.

Reviewed by:	mav
2013-04-06 10:32:06 +00:00
Alexander Motin
0fb832fdf0 Following r241022, replace iteration over the provider list on media events
by taking first one and asserting that there is no others.

MFC after:	1 week
2013-04-05 13:11:28 +00:00
Alexander Motin
7868ec506b geom_slice.c and its consumers like GEOM_LABEL are not touching the data
unless hotspots are used.  Pass G_PF_ACCEPT_UNMAPPED flag through except
such rare cases (obsolete GEOM_SUNLABEL and GEOM_BSD).
2013-03-26 07:55:24 +00:00
Alexander Motin
6c6e13b6e1 GEOM NOP does not touch the data, so pass G_PF_ACCEPT_UNMAPPED flag through. 2013-03-26 05:58:49 +00:00
Alexander Motin
a93c0ed463 Remove extra bio_data and bio_length copying to child request after calling
g_clone_bio(), that already copied them.
2013-03-26 05:42:12 +00:00
Alexander Kabaev
31932fae1e Do not pass unmapped buffers to drivers that cannot handle them
In physio, check if device can handle unmapped IO and pass an
appropriately mapped buffer to the driver strategy routine. The
only driver in the tree that can handle unmapped buffers is one
exposed by GEOM, so mark it as such with the new flag in the
driver cdevsw structure.

This fixes insta-panics on hosts, running dconschat, as /dev/fwmem
is an example of the driver that makes use of physio routine, but
bypasses the g_down thread, where the buffer gets mapped normally.

Discussed with: kib (earlier version)
2013-03-26 01:17:06 +00:00
Alexander Motin
f4673017b3 Make GEOM MULTIPATH to report unmapped bio support if underling path report
it.  GEOM MULTIPATH itself never touches the data and so transparent.
2013-03-25 07:24:58 +00:00
Alexander Motin
30ba747160 In GEOM DISK:
- Replace single done mutex with per-disk ones.  On system with several
disks on several HBAs that removes small, but measurable lock congestion.
 - Modify disk destruction process to not destroy the mutex prematurely.
 - Remove some extra pointer derefences.
2013-03-25 05:45:24 +00:00
Alexander Motin
3c330aff3f Fix long known deadlock between geom dev destruction and d_close() call.
Use destroy_dev_sched_cb() to not wait for device destruction while holding
GEOM topology lock (that actually caused deadlock).  Use request counting
protected by mutex to properly wait for outstanding requests completion in
cases of device closing and geom destruction.  Unlike r227009, this code
does not block taskqueue thread for indefinite time, waiting for completion.
2013-03-24 10:14:25 +00:00
Alexander Motin
50199fa0d0 Make g_wither_washer() to not loop by itself, but only when there was some
more topology change done that may require its attention.  Add few missing
g_do_wither() calls in respective places to signal it.

This fixes potential infinite loop here when some provider is withered, but
still opened or connected for some reason and so can not be destroyed.  For
example, see r227009 and r227510.
2013-03-24 03:15:20 +00:00
Konstantin Belousov
e808788c05 Correct the page count when excess length is trimmed from the bio.
Reported and tested by:	Ivan Klymenko <fidaj@ukr.net
2013-03-21 22:36:43 +00:00
Konstantin Belousov
6c83fce371 Assert that transient mapping of the bio is only done when unmapped
buffers are allowed.

Sponsored by:	The FreeBSD Foundation
2013-03-21 07:26:33 +00:00
Konstantin Belousov
db7bfaa8ce The geom_part provider supports unmapped bio iff the underlying
provider does so, since geom_part never inspects the bio_data.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:50:24 +00:00
Konstantin Belousov
f8c19ba466 A flag for the geom disk driver to indicate that it accepts the
unmapped i/o requests.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:49:15 +00:00
Konstantin Belousov
ee75e7de7b Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA.  The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer.  For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer.  The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings.  Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags.  Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by:	The FreeBSD Foundation
Discussed with:	jeff (previous version)
Tested by:	pho, scottl (previous version), jhb, bf
MFC after:	2 weeks
2013-03-19 14:13:12 +00:00
Pawel Jakub Dawidek
c4d2d401f8 We don't need buffer to handle BIO_DELETE, so don't check buffer size for it.
This fixes handling BIO_DELETE larger than MAXPHYS.
2013-03-14 23:07:01 +00:00
Sean Bruno
bd9fba0cfe Add legacy support to geom raid to create a /dev/arX device for support
of upgrading older machines using ataraid(4) to newer releases.

This optional parameter is controlled via kern.geom.raid.legacy_aliases
and will create a /dev/ar0 device that will point at /dev/raid/r0 for
example.

Tested on Dell SC 1425 DDF-1 format software raid controllers installing from
stable/7 and upgrading to stable/9 without having to adjust /etc/fstab

Reviewed by:	mav
Obtained from:	Yahoo!
MFC after:	2 Weeks
2013-03-08 20:07:32 +00:00
Jean-Sébastien Pédron
f5c1ef84f9 g_label_ntfs_taste: Abort taste is recsize == 0
This will avoid a 0-byte read (in g_read_data()) leading to a panic, if
previously read data are erroneous.

Suggested by:	John-Mark Gurney <jmg@funkthat.com>
2013-03-08 18:07:43 +00:00
Gavin Atkinson
10f29053d2 Support the FAT16 partition type in gpart(8)
PR:		kern/174714
Submitted by:	4721 at hushmail dot com
MFC after:	1 week
2013-03-07 22:32:41 +00:00
Alexander Motin
34d3281c57 Fix panic when Secondary_Element_Count == 1 and Secondary_Element_Seq
is not set (255).

Reported by:	sbruno
MFC after:	1 week
2013-03-07 18:55:37 +00:00
Jean-Sébastien Pédron
5943eed4b9 g_label_ntfs.c: Mark structures as __packed
Without this, read data is mis-interpreted. This could trigger a panic,
as was the case on one computer where computed "recsize" was zero,
leading to a call to g_read_page() asking for 0 bytes.
2013-03-05 11:02:05 +00:00
Attilio Rao
0f90e981cb Remove ntfs headers dependency for g_label_ntfs.c by redefining the
used structs and values.

This patch is not targeted for MFC.
2013-03-02 18:23:59 +00:00
Kirk McKusick
2bc1a1fe5c Add barrier write capability to the VFS buffer interface. A barrier
write is a disk write request that tells the disk that the buffer
being written must be committed to the media along with any writes
that preceeded it before any future blocks may be written to the drive.

Barrier writes are provided by adding the functions bbarrierwrite
(bwrite with barrier) and babarrierwrite (bawrite with barrier).

Following a bbarrierwrite the client knows that the requested buffer
is on the media. It does not ensure that buffers written before that
buffer are on the media. It only ensure that buffers written before
that buffer will get to the media before any buffers written after
that buffer. A flush command must be sent to the disk to ensure that
all earlier written buffers are on the media.

Reviewed by: kib
Tested by:   Peter Holm
2013-02-16 14:51:30 +00:00
Andriy Gapon
1f1088b843 g_mirror: g_getattr() failure should not be fatal
This allows to use gmirror e.g. on top of ZVOLs.

PR:		kern/175323
Submitted by:	Alexei.Volkov@softlynx.ru, mav
Reported by:	Alexei.Volkov@softlynx.ru
Tested by:	Alexei.Volkov@softlynx.ru
Reviewed by:	ae, mav, pjd
MFC after:	1 week
2013-01-26 10:50:04 +00:00
Alexander Motin
c3ec009a97 - Fix rebuild position broken at r245522.
- Identify one more metadata field.
2013-01-17 03:27:08 +00:00
Alexander Motin
821a0f639e For Promise/AMD metadata add support for disks with capacity above 2TiB
and for volumes with sector size above 512 bytes.
2013-01-17 00:50:25 +00:00
Alexander Motin
ed8180e665 Recalculate volume size only for real CONCATs. For SINGLE trust volume
size given by metadata, as it should be correct and in some cases can be
smaller then subdisk size.
2013-01-17 00:09:50 +00:00
Alexander Motin
2c6a273750 Allow to insert new component to geom_raid3 without specifying number.
PR:		kern/160562
MFC after:	2 weeks
2013-01-15 10:06:35 +00:00
Alexander Motin
f62c1a47d6 Alike to r242314 for GRAID make GRAID3 more aggressive in marking volumes
as clean on shutdown and move that action from shutdown_pre_sync stage to
shutdown_post_sync to avoid extra flapping.

ZFS tends to not close devices on shutdown, that doesn't allow GEOM RAID
to shutdown gracefully.  To handle that, mark volume as clean just when
shutdown time comes and there are no active writes.

MFC after:	2 weeks
2013-01-15 01:27:04 +00:00
Alexander Motin
cbab616174 Alike to r242314 for GRAID make GMIRROR more aggressive in marking volumes
as clean on shutdown and move that action from shutdown_pre_sync stage to
shutdown_post_sync to avoid extra flapping.

ZFS tends to not close devices on shutdown, that doesn't allow GEOM RAID
to shutdown gracefully.  To handle that, mark volume as clean just when
shutdown time comes and there are no active writes.

PR:		kern/113957
MFC after:	2 weeks
2013-01-15 01:13:55 +00:00
Alexander Motin
4c10c25e33 Keep value of orig_config_id metadata field. Windows driver writes there
previous value of config_id when it is changed in some cases.  I guess it
may be used do avoid some split-brain conditions.
2013-01-14 20:31:45 +00:00
Alexander Motin
eb84fc957c Small cosmetic tuning of the IRRT status constants. 2013-01-14 16:38:43 +00:00
Alexander Motin
511c69d9ce Print some more metadata fields. 2013-01-14 13:06:35 +00:00
Alexander Motin
898a4b74f4 Windows driver writes relative volume IDs to metadata field. Use that value
as a hint for raid/rX device number to make it persistent across reboots.
2013-01-14 00:38:51 +00:00
Alexander Motin
f9462b9bbe - Add checks for Intel metadata version and attributes. Ignore disks with
unsupported metadata types like Intel Smart Response to not corrupt them.
 - Improve setting of these things during metadata writing to protect from
incapable BIOS'es and other implementations.
2013-01-13 23:00:40 +00:00
Alexander Motin
b99586c25f Improve support for disabled disks. If disabled disk disconnected and then
reconnected back, leave it as disconnected. If new disk inserted instead of
disabled, rebuild it and leave as enabled.
2013-01-13 14:30:37 +00:00
Alexander Motin
865aea63c3 Windows handles INIT and VERIFY as array-wide and it doesn't specify which
disks should be rebuilt. Our rebuild code is same time disk-centric.  To
handle this situation  properly check all disks for RBLD flags, and if no
disk specified try rebuild/resync all of them except newly inserted.
2013-01-12 21:51:49 +00:00
Alexander Motin
4c95a24141 Implement migration from single disk to RAID1/IRRT for Intel metadata.
Windows driver uses such migration when it creates new arrays.  While GEOM
RAID has no mechanism to implement migration in general case, this specifc
case still can be handled easily via degraded RAID1 creation followed by
regular rebuild.
2013-01-12 18:25:48 +00:00
Alexander Motin
26c538bc0b Add basic support for Intel Rapid Recover Technology (Intel RRT).
It is alike to RAID1, but with dedicating master and recovery disks and
providing manual control over synchronization.  It allows to use recovery
disk as snapshot of the master disk from the time of the last sync.

This implementation is not functionaly complete comparing to Windows,
but it is better then silent conversion to RAID1 on first boot.
2013-01-12 09:35:44 +00:00
Konstantin Belousov
ddd6b3fc33 Add flags argument to vfs_write_resume() and remove
vfs_write_resume_flags().

Sponsored by:	The FreeBSD Foundation
2013-01-11 06:08:32 +00:00
Pawel Jakub Dawidek
6011443800 Reset provider-specific fields when resending I/O request in low memory
conditions. This fixes assertion which checks those fields when kernel is
compiled with DIAGNOSTIC.

Reported by:	kib, pho
MFC after:	1 week
2012-12-26 20:07:47 +00:00
Jaakko Heinonen
efec959c2c Mangle label names containing spaces, non-printable characters '%' or
'"'.  Mangling is only done for label names read from file system
metadata. Encoding resembles URL encoding. For example, the space
character becomes %20.

Help by:	kib
Discussed with:	imp, kib, pjd
2012-12-22 13:43:12 +00:00
Jaakko Heinonen
02c62349c9 - Don't pass geom and provider names as format strings.
- Add __printflike() attributes.
- Remove an extra argument for the g_new_geomf() call in swapongeom_ev().

Reviewed by:	pjd
2012-11-20 12:32:18 +00:00
Alfred Perlstein
bad7e7f3dd Provide a device name in the sysctl tree for programs to query the
state of crashdump target devices.

This will be used to add a "-l" (ell) flag to dumpon(8) to list the
currently configured dumpdev.

Reviewed by:	phk
2012-11-01 17:01:05 +00:00
Edward Tomasz Napierala
549f62fa42 Fix problem with geom_label(4) not recognizing UFS labels on filesystems
extended using growfs(8).  The problem here is that geom_label checks if
the filesystem size recorded in UFS superblock is equal to the provider
(i.e. device) size.  This check cannot be removed due to backward
compatibility.  On the other hand, in most cases growfs(8) cannot set
fs_size in the superblock to match the provider size, because, differently
from newfs(8), it cannot recompute cylinder group sizes.

To fix this problem, add another superblock field, fs_providersize, used
only for this purpose.  The geom_label(4) will attach if either fs_size
(filesystem created with newfs(8)) or fs_providersize (filesystem expanded
using growfs(8)) matches the device size.

PR:		kern/165962
Reviewed by:	mckusick
Sponsored by:	FreeBSD Foundation
2012-10-30 21:32:10 +00:00
Alexander Motin
650e245ebf Minor addition to r242323:
Alike to BIO_WRITE, report success if at least one subdisk succeeded with
BIO_DELETE.  But unlike BIO_WRITE don't fail disk on BIO_DELETE error.

Sponsored by:	iXsystems, Inc.
MFC after:	1 month
2012-10-29 21:08:06 +00:00
Alexander Motin
609a74746a Add basic BIO_DELETE support to GEOM RAID class for all RAID levels.
If at least one subdisk in the volume supports it, BIO_DELETE requests
will be propagated down.  Unfortunatelly, for RAID levels with redundancy
unmapped blocks will be mapped back during first rebuild/resync process.

Sponsored by:	iXsystems, Inc.
MFC after:	1 month
2012-10-29 18:04:38 +00:00
Edward Tomasz Napierala
1af2d09b49 Fix locking problem in disk_resize(); previously it would run without
topology lock, resulting in assertion when running with DIAGNOSTIC.

Reviewed by:	mav (earlier version)
2012-10-29 17:52:43 +00:00
Alexander Motin
a479c51be3 Make GEOM RAID more aggressive in marking volumes as clean on shutdown
and move that action from shutdown_pre_sync to shutdown_post_sync stage
to avoid extra flapping.

ZFS tends to not close devices on shutdown, that doesn't allow GEOM RAID
to shutdown gracefully.  To handle that, mark volume as clean just when
shutdown time comes and there are no active writes.

MFC after:	2 weeks
2012-10-29 14:18:54 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Attilio Rao
682ee99e7a It seems that it is preferable to keep support for glabel also for
filesystems that we don't support natively.
Revert part of r241636 to do so.

This patch is not targeted for MFC.

Requested by:	gleb, jhb
2012-10-18 22:18:11 +00:00
Attilio Rao
a42ac676f5 Disconnect non-MPSAFE NTFS from the build in preparation for dropping
GIANT from VFS. This code is particulary broken and fragile and other
in-kernel implementations around, found in other operating systems,
don't really seem clean and solid enough to be imported at all.
If someone wants to reconsider in-kernel NTFS implementation for
inclusion again, a fair effort for completely fixing and cleaning it
up is expected.

In the while NTFS regular users can use FUSE interface and ntfs-3g
port to work with their NTFS partitions.

This is not targeted for MFC.
2012-10-17 11:30:00 +00:00
Alexander Motin
c6f0cd57e3 NULL-ify last previously used pointer instead of last possible pointer.
This should be only a cosmetic change.

Found by:	Clang Static Analyzer
2012-10-10 20:41:37 +00:00
Alexander Motin
6871a543f9 Make graid command line a bit more friendly by allowing volume name or
provider name to be specified instead of geom name (first argument in all
subcommands except label).  In most cases there is only one array used
any way, so it is not really useful to make user type ugly geom names like
Intel-f0bdf223 or SiI-732c2b9448cf.  Though they can be used in some cases.

Sponsored by:	iXsystems, Inc.
MFC after:	1 month
2012-10-07 19:30:16 +00:00
Andriy Gapon
a90c9dfeab g_part_taste: directly destroy consumer and geom here, no need for withering
Besides withered but still alive consumers may interfere with
re-tatsing.

MFC after:	16 days
2012-10-06 19:52:50 +00:00
Pawel Jakub Dawidek
5d8a6a1078 Remove the topology lock from disk_gone(), it might be called with regular
mutexes held and the topology lock is an sx lock.

The topology lock was there to protect traversing through the list of providers
of disk's geom, but it seems that disk's geom has always exactly one provider.

Change the code to call g_wither_provider() for this one provider, which is
safe to do without holding the topology lock and assert that there is indeed
only one provider.

Discussed with:	ken
MFC after:	1 week
2012-09-28 08:22:51 +00:00
Pawel Jakub Dawidek
171f6b3a34 Use the topology lock to protect list of providers while withering them.
It is possible that provider is destroyed while we are iterating over the
list.

Reported by:	Brian Parkison <parkison@panzura.com>
Discussed with:	phk
MFC after:	1 week
2012-09-22 12:41:49 +00:00
Andriy Gapon
85f5b9aa70 g_disk_flushcache definitely should not be traced under G_T_TOPOLOGY
... use G_T_BIO instead

MFC after:	1 week
2012-09-18 07:57:34 +00:00
Alexander Motin
c89d2fbe18 Add global and per-module sysctls/tunables to enable/disable metadata taste.
That should help to handle some cases when disk has some RAID metadata that
should be ignored, especially during boot.

MFC after:	3 days
2012-09-13 13:27:09 +00:00
Gleb Smirnoff
4a7f7b10b5 When synchronizing, include in the config dump amount of
bytes syncronized.
  The rationale behind this is the following: for large disks the
percent synchronisation counter ticks too seldom, and monitoring
software (as well as human operator) can't tell whether
synchronisation goes on or one of disks got stuck. On an idle
server one can look into gstat and see whether synchronisation goes
on or not, but on a busy server that won't work. Also, new value
monitored can be differentiated obtaining the synchronisation speed
quite precisely.

Submitted by:	Konstantin Kukushkin <dark ramtel.ru>
Reviewed by:	pjd
2012-09-11 20:20:13 +00:00
Pawel Jakub Dawidek
769afdc71e Allow to pass providers with /dev/ prefix to g_provider_by_name().
MFC after:	3 days
2012-09-01 10:52:19 +00:00
Ed Schouten
24d1105dde Remove unneeded G_PF_CANDELETE flag.
This flag is only used by GEOM so it can be propagated to the character
device's SI_CANDELETE. Unfortunately, SI_CANDELETE seems to do nothing.
2012-08-28 19:28:31 +00:00
Thomas Quinot
8fb378d6b1 (g_multipath_rotate): Fix algorithm so that it does rotate over all good
providers, not just the last two.

PR:		kern/170379
Reviewed by:	mav
MFC after:	2 weeks
2012-08-25 10:36:31 +00:00
Pawel Jakub Dawidek
9d18043979 Always initialize sc_ekey, because as of r238116 it is always used.
If GELI provider was created on FreeBSD HEAD r238116 or later (but before this
change), it is using very weak keys and the data is not protected.
The bug was introduced on 4th July 2012.

One can verify if its provider was created with weak keys by running:

	# geli dump <provider> | grep version

If the version is 7 and the system didn't include this fix when provider was
initialized, then the data has to be backed up, underlying provider overwritten
with random data, system upgraded and provider recreated.

Reported by:	Fabian Keil <fk@fabiankeil.de>
Tested by:	Fabian Keil <fk@fabiankeil.de>
Discussed with:	so
MFC after:	3 days
2012-08-10 18:43:29 +00:00
Alexander Motin
d9d6849693 Add missing FAILED event to g_raid_subdisk_event2str() to print it properly
in debug messages.

Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
2012-08-10 13:36:33 +00:00
Jim Harris
82a6ae1009 Clone BIO_ORDERED flag, for disk drivers (namely CAM) that try to
consume it.

Sponsored by: Intel
Discussed with: gibbs, scottl
2012-08-07 20:16:10 +00:00
Mikolaj Golub
1d9db37c77 In g_gate_dumpconf() always check the result of g_gate_hold().
This fixes "Negative sc_ref" panic possible when sysctl_kern_geom_confxml()
is run simultaneously with destroying GATE device.

Reviewed by:	pjd
MFC after:	3 days
2012-08-07 18:50:33 +00:00