Commit Graph

2406 Commits

Author SHA1 Message Date
Kirk McKusick
dffce2150e Refactoring of reading and writing of the UFS/FFS superblock.
Specifically reading is done if ffs_sbget() and writing is done
in ffs_sbput(). These functions are exported to libufs via the
sbget() and sbput() functions which then used in the various
filesystem utilities. This work is in preparation for adding
subperblock check hashes.

No functional change intended.

Reviewed by: kib
2018-01-26 00:58:32 +00:00
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Alan Somers
6f7f85e0e1 gnop(8): add the ability to set a nop provider's physical path
While I'm here, expand the existing tests a bit.

MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D13579
2018-01-18 05:57:10 +00:00
Pedro F. Giffuni
0cee6dcdb0 misc geom and gnu: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these ire likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.

Differential revision: https://reviews.freebsd.org/D13837
2018-01-15 21:23:16 +00:00
Andriy Gapon
6ce374aa94 geom_disk / scsi_da: deny opening write-protected disks for writing
Ths change consists of two parts.

geom_disk: deny opening a disk for writing if it's marked as
write-protected.  A new disk(9) flag is added to mark write protected
disks.  A possible alternative could be to add another parameter to d_open,
so that the open mode could be passed to it and the disk drivers could
make the decision internally, but the flag required less churn.

scsi_da: add a new phase of disk probing to query the all pages mode
sense page.  We can determine if the disk is write protected using bit 7
of the device specific field in the mode parameter header returned by
MODE SENSE.

PR:		224037
Reviewed by:	mav
MFC after:	4 weeks
Differential Revision: https://reviews.freebsd.org/D13360
2018-01-15 11:20:00 +00:00
Mark Johnston
762f440f15 Fix handling of read errors during mirror synchronization.
We would previously just free the request BIO, which would either cause
the disk to stay stuck in the SYNCHRONIZING state, or result in
synchronization completing without having copied the block which
returned an error.

With this change, if the disk which returned an error is the only active
disk in the mirror, the synchronizing disk is kicked out. Otherwise, the
read is retried.

Reported and tested by:	pho (previous version)
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2018-01-10 19:37:21 +00:00
Mark Johnston
792f0c3b09 Clarify the use of the gmirror flag mask constants.
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2018-01-10 15:21:36 +00:00
Mark Johnston
aed882a9fb Avoid referencing a possibly freed consumer after r327496.
g_mirror_regular_request() may free the gmirror consumer for a disk
if that disk is being disconnected, after which we must not dereference
the consumer pointer.

CID:		1384280
X-MFC with:	r327496
2018-01-10 05:06:21 +00:00
Mark Johnston
8b0a00b745 Sort and remove unneeded includes.
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2018-01-08 15:56:40 +00:00
Mark Johnston
7653e6d781 Release the queue lock before restarting the worker loop.
Reported and tested by:	pho
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
2018-01-08 15:41:49 +00:00
Mark Johnston
1787c3feb4 Fix some I/O ordering issues in gmirror.
- BIO_FLUSH requests were dispatched to the disks directly from
  g_mirror_start() rather than going through the mirror's I/O request
  queue, so they could have been reordered with preceding writes.
  Address this by processing such requests from the queue, avoiding
  direct dispatch.
- Handling for collisions with synchronization requests was too
  fine-grained and could cause reordering of writes. In particular,
  BIO_ORDERED was not being honoured. Address this by effectively
  freezing the request queue any time a collision with a synchronization
  request occurs. The queue is unfrozen once the collision with the
  first frozen request is over.
- The above-mentioned collision handling allowed reads to jump ahead
  of writes to the same offset. Address this by freezing all request
  types when a collision occurs, not just BIO_WRITEs and BIO_DELETEs.

Also add some more fail points for use in testing error handling.

Reviewed by:	imp
MFC after:	3 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13559
2018-01-02 18:11:54 +00:00
Colin Percival
8b8a7c43a9 Instrument "boot holds" for the benefit of the TSLOG framework. These
are places where the "main thread" of the booting kernel (either the
thread which later becomes swapper or the thread which later becomes
init) has to stop and wait for action to take place in another thread
before continuing.

There are currently three such holds:
1. The intr_config_hooks SYSINIT waits for hooks registered via the
config_intrhook_establish function; this allows (typically) devices
which need interrupts enabled to complete their initialization to do
so before root is mounted.

2. The g_waitidle function waits for the GEOM event queue to be empty;
this ensures that all of the disks which have been attached have been
tasted before we attempt to mount root.

3. The vfs_mountroot_wait function (in addition to calling g_waitidle)
waits for holds registered via root_mount_hold; among other things, this
is used by the USB subsystem to ensure that we don't fail to mount root
if it's located on a USB disk which takes a while to probe.
2017-12-31 09:23:52 +00:00
Pedro F. Giffuni
2afb21f309 geom_ccd.c: Fix the licenses properly
The license merging in r109471 didn't take into account that licensing
could change. Just removing the 3rd clause obviates the copyright
assignment to the NetBSD Foundation.

We do have plenty of files that have two or more licensing as in this
case, so fix this properly by splitting back the licenses as they are
upstream.

Obtained from:	NetBSD
2017-12-30 02:07:18 +00:00
Pedro F. Giffuni
68689f580b geom_ccd.c: Update the license with changes from upstream.
Part of this file originated in NetBSD, with the original file
carrying two versions of 4-clause BSD licenses. r109471 attempted to
simplify the situation by putting both licenses together.

Meanwhile, NetBSD dropped Clauses 3 and 4 from their own license, and
eventually NetBSD got permission from the University of Utah to drop the
3rd clause.

Keep the license "simple" by dropping the third clause since both TNF,
Utah/Berkeley and phk agree in principle that it can be dropped.

Obtained from:	NetBSD (ccd.c CVS 1.128, 1.138)
2017-12-30 01:37:08 +00:00
Alexander Kabaev
151ba7933a Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
Mark Johnston
9abe2e7e98 Avoid using bioq_* in gmirror.
gmirror does not perform any sorting of I/O requests, so the bioq API
doesn't provide any advantages over plain TAILQs. The API also does not
provide operations needed by an upcoming change.

No functional change intended. The diff shrinks the geom_mirror.ko
text and the gmirror softc slightly.

Tested by:	pho (part of a larger patch)
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-12-19 17:13:04 +00:00
Mark Johnston
68eadcec0f Give a couple of predication functions a bool return type.
No functional change intended.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-12-15 19:14:21 +00:00
Mark Johnston
204d94f161 Typo.
MFC after:	1 week
2017-12-15 19:03:03 +00:00
Mark Johnston
8b93770503 Address a possible lost wakeup for gmirror events.
g_mirror_event_send() acquires the I/O queue lock to deliver a wakeup
to the worker thread, and this is done after enqueuing the event.
So it's sufficient to check the event queue before atomically releasing
the queue lock and going to sleep.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-12-12 17:29:34 +00:00
Mark Johnston
b634781eac Give g_mirror_event_get() a more accurate name.
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-12-12 17:25:25 +00:00
Mark Johnston
a3584ee355 Decrement sc_writes when BIO_DELETE requests complete.
Otherwise a gmirror that has received a BIO_DELETE request will never be
marked clean (unless sc_writes overflows).

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-12-12 17:24:30 +00:00
Eugene Grosbein
1b8ea9beff geom_raid (RAID5): do not lose bp->bio_error, keep it in pbp->bio_error
and return it by passing to g_raid_iodone()

Approved by:	mav (mentor)
MFC after:	3 days
2017-12-07 20:09:17 +00:00
Eugene Grosbein
01a51aba38 Fix use-after-free that sometimes results in a garbage returned
instead of right error code after requests to SINGLE/CONCAT volumes, f.e:

# dd if=/dev/raid/r0 bs=512 of=/dev/null
dd: /dev/raid/r0: Unknown error: -559038242

Reviewed by:	avg (mentor), mav (mentor)
MFC after:	3 days
2017-12-07 05:55:18 +00:00
Warner Losh
700b6f8e23 When building standalone, include stand.h rather than the kernel
includes or the userland includes.

Sponsored by: Netflix
2017-12-05 21:37:32 +00:00
Warner Losh
3a7d67e741 We don't need both _STAND and _STANDALONE. There's more places that
use _STANDALONE, so change the former to the latter.

Sponsored by: Netflix
2017-12-02 00:07:09 +00:00
Mark Johnston
2ceafb776e Update gmirror metadata less frequently when synchronizing.
We periodically record synchronization progress in the metadata
block of the disk being synchronized; this allows an interrupted
synchronization to be resumed. However, the frequency of these
updates heavily pessimized synchronization time on some media. This
change modifies gmirror to update metadata based on a time period,
and adds a sysctl to control that period. The default value results
in a much lower update frequency and increases the completion time
for an interrupted rebuild only marginally.

Reported by:	Andre Albsmeier <andre@fbsd.e4m.org>
MFC after:	3 weeks
2017-11-30 20:36:29 +00:00
Pedro F. Giffuni
3728855a0f sys/geom: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:17:37 +00:00
Mark Johnston
0349817103 Allow kern.geom.mirror.debug to be negative.
A negative value can be used to suppress all prints from the gmirror
kernel code, which can be useful when attempting to trigger race
conditions using stress tests.

MFC after:	1 week
2017-11-23 14:07:52 +00:00
Warner Losh
fdcf0c7477 While the EFI spec allows numbers to be in many forms, libefivar
produces hex numbers for the dsn. Since that come is from EDK2, change
this for symmetry, by generating the dsn as a hex number.

Noticed by: gpart list | grep efimedia | awk -F: '{print $2;}' | \
	sed -e 's/^ *//g;s/,,/,/' | grep MBR | efidp -p | efidp -f
Sponsored by: Netflix
2017-11-21 06:12:21 +00:00
Warner Losh
2ab9683565 Remove trailing whitespace (one I just introduced and a bunch of
others in the same directory).

Sponsored by: Netflix
2017-11-21 05:42:13 +00:00
Warner Losh
d65b6588d6 Implement efi media tagging for MBR partitioning types.
Sponsored by: Netflix
2017-11-21 05:35:21 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Andriy Gapon
f0fa2af656 geom_slice: fix r325227, protect against multiple calls to g_slice_free
This geom does not immediately detach its consumer relying on the
wither-washer to do that.  Since that happens asynchronously we may get
additional spoiling events.  So, we need to account for that.

There are multiple options for fixing this issue like detaching
immediately or checking for G_CF_ORPHAN in g_slice_spoiled().
The most reliable and least intrusive fix seems to be setting
geom->softc to NULL on the first call and checking for NULL on
subsequent calls.  This is something that the code did before r325227.

Reported by:	David Wolfskill <david@catwhisker.org>,
		O. Hartmann <o.hartmann@walstatt.org>
Tested by:	David Wolfskill <david@catwhisker.org> (earlier version)
Discussed with:	mav
MFC after:	1 week
X-MFC with:	r325227
2017-11-01 10:53:10 +00:00
Andriy Gapon
9662d80af5 geom_slice: do not destroy softc until providers are gone
At present, g_slice_orphan and g_slice_spoiled destroy the softc
(struct g_slicer) even before calling g_wither_geom, so there can
be active and incoming io requests at that time and g_slice_start
can access the softc.

This commit changes the code to destroy the softc only after all
providers are closed.

While there, a couple of small cleanups.

Reported by:	Ben RUBSON <ben.rubson@gmail.com>
Tested by:	Ben RUBSON <ben.rubson@gmail.com>
Reviewed by:	mav, smh (earlier version)
MFC after:	2 weeks
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D12809
2017-10-31 10:10:13 +00:00
Edward Tomasz Napierala
338ed98ad2 Add back missing MTX_DEF, it still needs to be there.
(Although it's defined to be 0, so there's no functional change.)

Reported by:	glebius
MFC after:	2 weeks
2017-10-29 12:03:06 +00:00
Mark Johnston
cef5abd140 Fix a lock leak in g_mirror_destroy().
g_mirror_destroy() is supposed to unlock the softc before indicating
success, but it wasn't doing so if the caller raced with another
thread destroying the mirror.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-10-27 17:05:14 +00:00
Edward Tomasz Napierala
0d73fface2 Make gmountver(8) use direct dispatch.
MFC after:	2 weeks
2017-10-26 10:18:31 +00:00
Edward Tomasz Napierala
0a8cfed8cf Make gmountver(8) use G_PF_ACCEPT_UNMAPPED.
MFC after:	2 weeks
2017-10-26 09:29:35 +00:00
Mark Johnston
64a16434d8 Add support for compressed kernel dumps.
When using a kernel built with the GZIO config option, dumpon -z can be
used to configure gzip compression using the in-kernel copy of zlib.
This is useful on systems with large amounts of RAM, which require a
correspondingly large dump device. Recovery of compressed dumps is also
faster since fewer bytes need to be copied from the dump device.

Because we have no way of knowing the final size of a compressed dump
until it is written, the kernel will always attempt to dump when
compression is configured, regardless of the dump device size. If the
dump is aborted because we run out of space, an error is reported on
the console.

savecore(8) is modified to handle compressed dumps and save them to
vmcore.<index>.gz, as it does when given the -z option.

A new rc.conf variable, dumpon_flags, is added. Its value is added to
the boot-time dumpon(8) invocation that occurs when a dump device is
configured in rc.conf.

Reviewed by:	cem (earlier version)
Discussed with:	def, rgrimes
Relnotes:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11723
2017-10-25 00:51:00 +00:00
Alan Somers
27f0f2ec5f Display rotation rate and TRIM/UNMAP support in diskinfo(8)
Bump __FreeBSD_version due to the expansion of struct diocgattr_arg.

Reviewed by:	mav, allanjude, imp
MFC after:	3 weeks
Relnotes:	yes
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D12578
2017-10-04 15:09:49 +00:00
Edward Tomasz Napierala
b73e1f746a Don't destroy gmountver(8) devices on shutdown, unless they are orphaned.
Otherwise we would fail to sync the filesystem on reboot.

MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2017-10-04 12:25:39 +00:00
Edward Tomasz Napierala
2b4490a5d3 Clear G_CF_ORPHAN when attaching. This fixes cases where the same
GEOM consumer can be orphaned, and then reattach to another provider.

From a user point of view, this makes gmountver(4) work again.

Reviewed by:	avg, mav
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D12228
2017-10-02 11:57:00 +00:00
Conrad Meyer
a523de2365 g_resize_provider_event: Do not invoke orphan method twice
Like r266444, g_resize_provider_event can attempt to orphan an already
orphaned geom_dev consumer.  This will cause a panic in g_dev_orphan.  Apply
the same fix as was applied to g_orphan_register.

Reviewed by:	ae
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12469
2017-09-24 19:59:26 +00:00
Andriy Gapon
7103ac8ad6 gmirror: treat ENXIO as disk disconnect, not media error
In theory, all data access errors mean that a member is out of sync
at most.  But they were treated as more serious errors to avoid the
situation where a flaky disk gets repeatedly disconnected, re-synchronized,
reconnected and then disconnected again.

ENXIO is a special error that means that the member disk disappeared,
so it should get the same handling as the GEOM orphaning event.
There is a better chance that when the disk is reconnected, it will be
a good member again.

When ENXIO happens on a read we use the exisiting G_MIRROR_BUMP_SYNCID
mechanism which means that the mirror's syncid is increased as soon
as there is a write to the mirror.  That's because no data has got out
of sync yet, but the problematic memeber is disconnected, so the future
write will make it stale.

When ENXIO happens on a write we use a new G_MIRROR_BUMP_SYNCID_NOW
mechanism which means that we update the mirror metadata as soon as
possible because the problematic memeber is already behind.

Reviewed by:	markj, imp
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D9463
2017-09-15 13:57:08 +00:00
Conrad Meyer
ea5eee641e Fix information leak in geli(8) integrity mode
In integrity mode, a larger logical sector (e.g., 4096 bytes) spans several
physical sectors (e.g., 512 bytes) on the backing device.  Due to hash
overhead, a 4096 byte logical sector takes 8.5625 512-byte physical sectors.
This means that only 288 bytes (256 data + 32 hash) of the last 512 byte
sector are used.

The memory allocation used to store the encrypted data to be written to the
physical sectors comes from malloc(9) and does not use M_ZERO.

Previously, nothing initialized the final physical sector backing each
logical sector, aside from the hash + encrypted data portion.  So 224 bytes
of kernel heap memory was leaked to every block :-(.

This patch addresses the issue by initializing the trailing portion of the
physical sector in every logical sector to zeros before use.  A much simpler
but higher overhead fix would be to tag the entire allocation M_ZERO.

PR:		222077
Reported by:	Maxim Khitrov <max AT mxcrypt.com>
Reviewed by:	emaste
Security:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12272
2017-09-09 01:41:01 +00:00
Warner Losh
a905e3962c The hard drive media device path contains the size of the partition,
not its end. This makes the GEOM efimedia attribute match the
FreeBSD:Boot1Device environment variable now.

Sponsored by: Netflix
2017-09-02 07:04:06 +00:00
Warner Losh
ab4effdc68 Add efimedia attribute for all GPT partitions.
Sposnored by: Netflix
Differential Revision: https://reviews.freebsd.org/D12206
2017-09-01 17:55:25 +00:00
Konstantin Belousov
58d8f357c7 Let g_access() log the actual error number.
Submitted by:	 Fabian Keil <fk@fabiankeil.de>
PR:	221855
MFC after:	1 week
2017-08-27 12:24:25 +00:00
Mariusz Zaborski
3453dc72ad Hide length of geli passphrase during boot.
Introduce additional flag to the geli which allows to restore previous
behavior.

Reviewed by:	AllanJude@, cem@ (previous version)
MFC:		1 month
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D11751
2017-08-26 14:07:24 +00:00
Kirk McKusick
037331ddbd When read requests are sent from a filesystem running above g_journal,
the g_journal level needs to check whether it is holding a newer
copy of the block than that which exists on the disk. If so, it
needs to return its copy. If not, it should pass the request down
to the disk to fulfill. It currently considers six queues:

0) delayed queue,
1) unsent (current queue),
2) in-flight to the journal (flush queue),
3) active journal (active queue),
4) inactive journal (inactive queue), and
5) inflight to the disk (copy queue).

Checking on two of these queues is unnecessary:

0) The delayed requests should not be used for reads because they
   have not yet been entered into the journal, so their value should
   reflect the disk contents, not the future contents that are not
   yet committed.

2) Because all the bio's in the flush queue are also found on the
   active queue, there is no need to inspect the flush queue for
   reads since they will be found when searching the active queue.

Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Discussed with: kib
MFC after: 1 week
2017-08-13 18:09:22 +00:00
Kirk McKusick
8fccf8ffd7 Eliminate a variable that is only ever set.
Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Discussed with: kib
MFC after: 1 week
2017-08-13 18:06:38 +00:00
Warner Losh
0038725697 Also provide a warning for geom_fox.
Differential Review: https://reviews.freebsd.org/D11935
Requested by: jhb@
MFC After: 3 days
2017-08-09 16:37:37 +00:00
Warner Losh
20995eab57 Mark geom classes as deprecated.
geom_bsd, geom_mbr and geom_sunlabel have been obsolete since Marcel
Moolenaar's geom_part was in FreeBSD 7. They haven't been in GENERIC
since FreeBSD 8. Add warning when used.

geom_vol_ffs has been obsolete since ufs support to geom_label was
committed in FreeBSD 5. It hasn't been in GENERIC since FreeBSD 5.
Add warning when used.

geom_fox has been obsolete since gmultipath was committed in FreeBSD 7.
(no warning added, since this is a very obscure class).

These will all be removed in FreeBSD 12.

MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D11935

Note: Classes will be removed after MFC
2017-08-09 16:15:24 +00:00
Warner Losh
36d6e01474 Eliminate useless adjustments of aliased device.
No need to set any fields in the cloned device. devfs uses symlinks,
so the adev entries returned won't be presented to the drivers. Since
we don't save copies, nothing else will see them. This code came from
the old compat code, and it appears to be obsolete or never needed.

Submitted by: kib@
Differential Review: https://reviews.freebsd.org/D11919
2017-08-07 22:42:46 +00:00
Warner Losh
d3517d306c Expose API to allow disks to ask for alias names in devfs.
Implement disk_add_alias to allow aliases to be added to disks. All
disk have a primary name (say "foo") can also have secondary names
(say "bar") such that all instances of "foo" also have a "bar"
alias. So if you have foo0, foo0p1, foo1, foo1s1 and foo1s1a nodes
created by the foo driver and gpart, device nodes bar0, bar0p1, bar1,
bar1s1 and bar1s1a will appear as symlinks back to the original nodes.
This generalizes to multiple aliases. However, since the unit number
follows the primary name, multiple device drivers can't create the
same aliases unless those drives coorinate the unit number space (eg
you couldn't add an alias 'disk' to both 'da' and 'ada' because it's
possible to have da0 and ada0, because 'disk0' is ambiguous).

Differential Revision: https://reviews.freebsd.org/D11873
2017-08-07 21:12:38 +00:00
Warner Losh
5d7d13290a Add alias support to gpart.
When we're creating new providers for each of the partitions, add
aliases to the geom before we create the provider so when geom_dev
tastes the provider, the aliases are in place so the proper /dev
entries are created. So foo5p6 gets created as an alias for bar5p6
when foo is an alias for bar in the geom we're partitioning with
g_part. This also copies aliases from the container geom (eg disk) to
the label geom (the disk with GPT partitioning) so that aliases nest
properly.

Differential Revision: https://reviews.freebsd.org/D11873
2017-08-07 21:12:33 +00:00
Warner Losh
c624eb2598 Add aliasing concept to geom.
Add an alias name list to geoms. Use them in geom_dev to create
aliases. Previously, geom_dev would create an device node for the name
of the geom. Now, additional nodes are created pointing back to the
primary node with make_dev_alias_p. Aliases must be in place on the
geom before any tasting occurs.

Differential Revision: https://reviews.freebsd.org/D11873
2017-08-07 21:12:28 +00:00
Kirk McKusick
6c6118b390 gjournal is broken in handling its flush_queue. If we have 10 bio's
in the flush_queue:
         1 2 3 4 5 6 7 8 9 10
and another 10 bio's go into the flush queue after only the first five
bio's are removed from the flush queue, the queue should look like:
         6 7 8 9 10 11 12 13 14 15 16 17 18 19 20,
but because of the bug we end up with
         6 11 12 13 14  15 16 17 18 19 20 7 8 9 10.
So the sequence of the bio's is damaged in the flush queue (and
therefore in the journal on disk !). This error can be triggered by
ffs_snapshot() when a block is read with readblock() and gjournal finds
this block in the broken flush queue before it goes to the correct
active queue.

The fix is to place all new blocks at the end of the queue.

Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Discussed with: kib
MFC after: 1 week
2017-08-07 19:40:03 +00:00
Kirk McKusick
683590b642 sysctl kern.geom.journal.cache.limit shows negative value for FreeBSD/amd64
system having over 4GB RAM. That's due to:

1) the limit being u_int instead of u_long like vm.kmem_size (the limit is
   half of vm.kmem_size by default for amd64);
2) sysctl handler g_journal_cache_limit_sysctl() using u_int instead of u_long.

The fix is to replace u_int with u_long for the kern.geom.journal.cache.limit
sysctl variable.

PR: 198500
Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Reported by: Eugene Grosbein
Discussed with: kib
MFC after: 1 week
2017-08-07 19:18:27 +00:00
Alexander Motin
1631690677 Add GEOM::descr attribute for symmetry with GEOM::ident.
MFC after:	2 weeks
2017-07-06 08:36:14 +00:00
Ryan Libby
fb0e3235ea g_virstor.h: macro parenthesization
Build with gcc -Wint-in-bool-context revealed a macro parenthesization
error (invoking LOG_MSG with a ternary expression for lvl).

Reviewed by:	markj
Approved by:	markj (mentor)
Sponsored by:	Dell EMC Isilon
Differential revision:	https://reviews.freebsd.org/D11411
2017-06-30 22:01:18 +00:00
Marcelo Araujo
4323355e76 With r318394 seems it breaks gpart(8) in some embedded systems such like PCEngines,
RPI1-B, Alix and APU2 boards as well as NanoBSD with the following message:

vnode_pager_generic_getpages_done: I/O read error 5

Seems the breakage was because it was missed to include acr in glabel update.

Reported by:	Peter Blok <pblok@bsd4all.org>,
		madpilot, imp and trasz.
Reviewed by:	trasz
Tested by:	Peter Blok and madpilot.
MFC after:	3 days.
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D11365
2017-06-27 01:22:27 +00:00
Stephen J. Kiernan
9a81ba0f24 Add MD_VERIFY option to enable O_VERIFY in open for vnode type.
Add -o [no]verify option to mdconfig (and document in man page.)
Implement GEOM attribute MNT::verified to ask md if the backing vnode is
  verified.
Check for MNT::verified in cd9660 mount to flag the mount as MNT_VERIFIED if
  the underlying device has been verified.

Reviewed by:	rwatson
Approved by:	sjg (mentor)
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D2902
2017-05-31 21:18:11 +00:00
Edward Tomasz Napierala
6635c8ed2f Fix typo.
MFC after:	2 weeks
2017-05-18 08:25:07 +00:00
Mark Johnston
db7c508323 Synchronize unclean mirrors before adding them to a running gmirror.
During gmirror startup, if component mirrors are found to be dirty as is
typical after a system crash, the mirrors are synchronized to the mirror
with highest priority. However if a gmirror starts without all of its
mirrors present, for example because of some transient delays during
tasting, the remaining mirrors must be synchronized before they may become
active.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-05-02 23:29:42 +00:00
Alexander Motin
d109d8adc7 Dump md_iterations as signed, which it really is.
PR:		208305
PR:		196834
MFC after:	2 weeks
2017-04-21 07:43:44 +00:00
Alexander Motin
d8880fd450 Always allow setting number of iterations for the first time.
Before this change it was impossible to set number of PKCS#5v2 iterations,
required to set passphrase, if it has two keys and never had any passphrase.
Due to present metadata format limitations there are still cases when number
of iterations can not be changed, but now it works in cases when it can.

PR:		218512
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D10338
2017-04-21 07:16:07 +00:00
Mark Johnston
a7d94fcc3e Rename two gmirror state flags to make their meanings slightly clearer.
No functional change.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-04-14 17:13:57 +00:00
Mark Johnston
1e91412e40 Don't set the mirror GEOM softc to NULL in g_mirror_destroy().
At this point we have not rendezvous'ed with the mirror worker thread, and
I/O may still be in flight. Various I/O completion paths expect to be able
to obtain a reference to the mirror softc from the GEOM, so setting it to
NULL may result in various NULL pointer dereferences if the mirror is
stopped with -f or the kernel is shut down while a mirror is
synchronizing. The worker thread will clear the softc pointer before
exiting.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-04-14 17:08:37 +00:00
Mark Johnston
77011eac86 Check for a provider error before enqueuing mirror I/O.
We are otherwise susceptible to a race with a concurrent teardown of the
mirror provider, causing the I/O to be left uncompleted after the mirror
started withering.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-04-14 17:03:32 +00:00
Mark Johnston
a65d524afc Stop mirror synchronization before draining the I/O queue.
Regular I/O requests may be blocked by concurrent synchronization requests
targeted to the same LBAs, in which case they are moved to a holding queue
until the conflicting I/O completes. We therefore want to stop
synchronization before completing pending I/O in g_mirror_destroy_provider()
since this ensures that blocked I/O requests are completed as well.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-04-14 16:54:50 +00:00
Mark Johnston
a4834289d6 Handle NULL entries in gmirror disk ds_bios arrays.
Entries may be removed and freed if an I/O error occurs during mirror
synchronization, so we cannot assume that all entries of ds_bios are
valid.

Also ensure that a synchronization BIO's array index is preserved after
a successful write.

Reported and tested by:	pho
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-04-10 17:15:59 +00:00
Allan Jude
ec5c0e5be9 Implement boot-time encryption key passing (keybuf)
This patch adds a general mechanism for providing encryption keys to the
kernel from the boot loader. This is intended to enable GELI support at
boot time, providing a better mechanism for passing keys to the kernel
than environment variables. It is designed to be extensible to other
applications, and can easily handle multiple encrypted volumes with
different keys.

This mechanism is currently used by the pending GELI EFI work.
Additionally, this mechanism can potentially be used to interface with
GRUB, opening up options for coreboot+GRUB configurations with completely
encrypted disks.

Another benefit over the existing system is that it does not require
re-deriving the user key from the password at each boot stage.

Most of this patch was written by Eric McCorkle. It was extended by
Allan Jude with a number of minor enhancements and extending the keybuf
feature into boot2.

GELI user keys are now derived once, in boot2, then passed to the loader,
which reuses the key, then passes it to the kernel, where the GELI module
destroys the keybuf after decrypting the volumes.

Submitted by:	Eric McCorkle <eric@metricspace.net> (Original Version)
Reviewed by:	oshogbo (earlier version), cem (earlier version)
MFC after:	3 weeks
Relnotes:	yes
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D9575
2017-04-01 05:05:22 +00:00
Allan Jude
39b7ca4533 sys/geom/eli: Switch bzero() to explicit_bzero() for sensitive data
In GELI, anywhere we are zeroing out possibly sensitive data, like
the metadata struct, the metadata sector (both contain the encrypted
master key), the user key, or the master key, use explicit_bzero.

Didn't touch the bzero() used to initialize structs.

Reviewed by:	delphij, oshogbo
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D9809
2017-03-31 00:07:03 +00:00
Mark Johnston
0d75d0dfbc Avoid sleeping when the mirror I/O queue is non-empty.
A request may be queued while the queue lock is dropped when the mirror is
being destroyed. The corresponding wakeup would be lost, possibly resulting
in an apparent hang of the mirror worker thread.

Tested by:	pho (part of a larger patch)
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-03-29 19:39:07 +00:00
Mark Johnston
c1ab409cba Remove an unneeded g_mirror_destroy_provider() call.
The worker thread will destroy the mirror provider as part of its teardown
sequence. The call made sense in the initial revision of gmirror, but
became unnecessary in r137248.

Tested by:	pho (part of a larger diff)
MFC afteR:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-03-29 19:30:22 +00:00
Mark Johnston
819cd913f4 Refine r301173 a bit.
- Don't execute any of g_mirror_shutdown_post_sync() when panicking. We
  cannot safely idle the mirror or stop synchronization in that state, and
  the current attempts to do so complicate debugging of gmirror itself.
- Check for a non-NULL panicstr instead of using SCHEDULER_STOPPED(). The
  latter was added for use in the locking primitives.

Reviewed by:	mav, pjd
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-03-27 16:25:58 +00:00
Marcelo Araujo
7f5f84f08f After r315112 I broke the tests with eli, instead to pass 0, I should pass
M_NOWAIT to g_media_changed() that will call g_post_event() with this flag.

Reported by:	lwhsu, ngie and ae
2017-03-13 13:56:01 +00:00
Scott Long
d8474e52e3 Report disk flags via the sysctl tree 2017-03-13 11:09:17 +00:00
Marcelo Araujo
2ae0afa8ee Add the capability to refresh the gpart(8) label without need a reboot.
gpart(8) has functionality to change the label of an GPT partition.
This functionality works like it should, however, after a label change
the /dev/gpt/ entries remain unchanged. glabel(8) status output remains
unchanged. The change only takes effect after a reboot.

PR:		162690
Submitted by:	sub.mesa@gmail, Ben RUBSON <ben.rubson@gmail.com>, ae
Reviewed by:	allanjude, bapt, bcr
MFC after:	6 weeks.
Differential Revision:	https://reviews.freebsd.org/D9935
2017-03-12 04:15:56 +00:00
Alexander Motin
4d5832bc12 When chunking large DIOCGDELETE, do it on stripe edge.
MFC after:	2 weeks
2017-03-08 12:18:58 +00:00
Mariusz Zaborski
c27fb0b589 The kern.geom.part.auto_resize should be tunable. 2017-02-28 20:51:20 +00:00
Mariusz Zaborski
01ad653a81 Add sysctl to control auto resize of the GEOM metadata.
Reviewed by:	AllanJude
Differential Revision:	https://reviews.freebsd.org/D9603
2017-02-27 17:54:01 +00:00
Marius Strobl
4874af73c1 - Allow different slicers for different flash types to be registered
with geom_flashmap(4) and teach it about MMC for slicing enhanced
  user data area partitions. The FDT slicer still is the default for
  CFI, NAND and SPI flash on FDT-enabled platforms.
- In addition to a device_t, also pass the name of the GEOM provider
  in question to the slicers as a single device may provide more than
  provider.
- Build a geom_flashmap.ko.
- Use MODULE_VERSION() so other modules can depend on geom_flashmap(4).
- Remove redundant/superfluous GEOM routines that either do nothing
  or provide/just call default GEOM (slice) functionality.
- Trim/adjust includes

Submitted by:	jhibbits (RouterBoard bits)
Reviewed by:	jhibbits
2017-02-22 10:21:39 +00:00
Allan Jude
85c15ab853 improve PBKDF2 performance
The PBKDF2 in sys/geom/eli/pkcs5v2.c is around half the speed it could be

GELI's PBKDF2 uses a simple benchmark to determine a number of iterations
that will takes approximately 2 seconds. The security provided is actually
half what is expected, because an attacker could use the optimized
algorithm to brute force the key in half the expected time.

With this change, all newly generated GELI keys will be approximately 2x
as strong. Previously generated keys will talk half as long to calculate,
resulting in faster mounting of encrypted volumes. Users may choose to
rekey, to generate a new key with the larger default number of iterations
using the geli(8) setkey command.

Security of existing data is not compromised, as ~1 second per brute force
attempt is still a very high threshold.

PR:		202365
Original Research:	https://jbp.io/2015/08/11/pbkdf2-performance-matters/
Submitted by:	Joe Pixton <jpixton@gmail.com> (Original Version), jmg (Later Version)
Reviewed by:	ed, pjd, delphij
Approved by:	secteam, pjd (maintainer)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D8236
2017-02-19 19:30:31 +00:00
John Baldwin
dcbe5188da Defer startup of gjournal switcher kproc.
Don't start switcher kproc until the first GEOM is created.

Reviewed by:	pjd
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8576
2017-02-07 22:45:59 +00:00
Andrey V. Elsukov
9ef6004352 Check that primary GPT header is valid before wiping partitioning.
This allows safely destroy corrupted GPT when primary header was
rewritten by some data, that do not want to destroy.

MFC after:	1 week
2017-02-04 05:09:47 +00:00
Yoshihiro Takahashi
2b375b4edd Remove pc98 support completely.
I thank all developers and contributors for pc98.

Relnotes:	yes
2017-01-28 02:22:15 +00:00
Alexander Motin
d3fef0a092 Report disk addition errors on add or create subcommand.
MFC after:	1 week
2017-01-20 13:49:04 +00:00
Alexander Motin
17160457b4 Report random flash storage as non-rotating to GEOM_DISK.
While doing it, introduce respective constants in geom_disk.h.

MFC after:	1 week
2017-01-12 08:53:10 +00:00
Conrad Meyer
b28ea2c250 g_raid: Prevent tasters from attempting excessively large reads
Some g_raid tasters attempt metadata reads in multiples of the provider
sectorsize.  Reads larger than MAXPHYS are invalid, so detect and abort
in such situations.

Spiritually similar to r217305 / PR 147851.

PR:		214721
Sponsored by:	Dell EMC Isilon
2017-01-12 06:58:31 +00:00
Dimitry Andric
012039fd55 Fix logic error in gvinum's gv_set_sd_state()
With clang 4.0.0, I'm getting the following warnings:

    sys/geom/vinum/geom_vinum_state.c:186:7: error: logical not is only
    applied to the left hand side of this bitwise operator
    [-Werror,-Wlogical-not-parentheses]
                    if (!flags & GV_SETSTATE_FORCE)
                        ^      ~

The logical not operator should obiously be called after masking.

Reviewed by:	mav, pfg
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D9093
2017-01-08 17:56:54 +00:00
Sepherosa Ziehau
c22dceff9d build: Unbreak LINT
Sponsored by:	Microsoft
2016-12-21 01:39:11 +00:00
Konrad Witaszczyk
480f31c214 Add support for encrypted kernel crash dumps.
Changes include modifications in kernel crash dump routines, dumpon(8) and
savecore(8). A new tool called decryptcore(8) was added.

A new DIOCSKERNELDUMP I/O control was added to send a kernel crash dump
configuration in the diocskerneldump_arg structure to the kernel.
The old DIOCSKERNELDUMP I/O control was renamed to DIOCSKERNELDUMP_FREEBSD11 for
backward ABI compatibility.

dumpon(8) generates an one-time random symmetric key and encrypts it using
an RSA public key in capability mode. Currently only AES-256-CBC is supported
but EKCD was designed to implement support for other algorithms in the future.
The public key is chosen using the -k flag. The dumpon rc(8) script can do this
automatically during startup using the dumppubkey rc.conf(5) variable.  Once the
keys are calculated dumpon sends them to the kernel via DIOCSKERNELDUMP I/O
control.

When the kernel receives the DIOCSKERNELDUMP I/O control it generates a random
IV and sets up the key schedule for the specified algorithm. Each time the
kernel tries to write a crash dump to the dump device, the IV is replaced by
a SHA-256 hash of the previous value. This is intended to make a possible
differential cryptanalysis harder since it is possible to write multiple crash
dumps without reboot by repeating the following commands:
# sysctl debug.kdb.enter=1
db> call doadump(0)
db> continue
# savecore

A kernel dump key consists of an algorithm identifier, an IV and an encrypted
symmetric key. The kernel dump key size is included in a kernel dump header.
The size is an unsigned 32-bit integer and it is aligned to a block size.
The header structure has 512 bytes to match the block size so it was required to
make a panic string 4 bytes shorter to add a new field to the header structure.
If the kernel dump key size in the header is nonzero it is assumed that the
kernel dump key is placed after the first header on the dump device and the core
dump is encrypted.

Separate functions were implemented to write the kernel dump header and the
kernel dump key as they need to be unencrypted. The dump_write function encrypts
data if the kernel was compiled with the EKCD option. Encrypted kernel textdumps
are not supported due to the way they are constructed which makes it impossible
to use the CBC mode for encryption. It should be also noted that textdumps don't
contain sensitive data by design as a user decides what information should be
dumped.

savecore(8) writes the kernel dump key to a key.# file if its size in the header
is nonzero. # is the number of the current core dump.

decryptcore(8) decrypts the core dump using a private RSA key and the kernel
dump key. This is performed by a child process in capability mode.
If the decryption was not successful the parent process removes a partially
decrypted core dump.

Description on how to encrypt crash dumps was added to the decryptcore(8),
dumpon(8), rc.conf(5) and savecore(8) manual pages.

EKCD was tested on amd64 using bhyve and i386, mipsel and sparc64 using QEMU.
The feature still has to be tested on arm and arm64 as it wasn't possible to run
FreeBSD due to the problems with QEMU emulation and lack of hardware.

Designed by:	def, pjd
Reviewed by:	cem, oshogbo, pjd
Partial review:	delphij, emaste, jhb, kib
Approved by:	pjd (mentor)
Differential Revision:	https://reviews.freebsd.org/D4712
2016-12-10 16:20:39 +00:00
Alexander Motin
b6fe583c55 Add gmirror create subcommand, alike to gstripe, gconcat, etc.
It is quite specific mode of operation without storing on-disk metadata.
It can be useful in some cases in combination with some external control
tools handling mirror creation and disks hot-plug.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2016-11-30 09:27:08 +00:00
Alexander Motin
dc399583ba Use providergone method to cover race between destroy and g_access().
Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2016-11-13 03:56:26 +00:00
Alexander Motin
80f0a89c62 Do not report error on close even if we have no paths left.
MFC after:	 2 weeks
2016-11-12 18:57:38 +00:00
Bryan Drewery
28323add09 Fix improper use of "its".
Sponsored by:	Dell EMC Isilon
2016-11-08 23:59:41 +00:00
Conrad Meyer
8532d381a9 Add BUF_TRACKING and FULL_BUF_TRACKING buffer debugging
Upstream the BUF_TRACKING and FULL_BUF_TRACKING buffer debugging code.
This can be handy in tracking down what code touched hung bios and bufs
last. The full history is especially useful, but adds enough bloat that
it shouldn't be enabled in release builds.

Function names (or arbitrary string constants) are tracked in a
fixed-size ring in bufs. Bios gain a pointer to the upper buf for
tracking. SCSI CCBs gain a pointer to the upper bio for tracking.

Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8366
2016-10-31 23:09:52 +00:00
Ruslan Bukin
ae8b1f90fe Fix alignment issues on MIPS: align the pointers properly.
All the 5520 GEOM_ELI tests passed successfully on MIPS64EB.

Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
Differential Revision:	https://reviews.freebsd.org/D7905
2016-10-31 16:55:14 +00:00
Mark Johnston
5c2ac5cf2a gmirror: Add a subroutine to free synchronization BIOs.
This addresses a memory leak that occurs upon an I/O error during a mirror
synchronization.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2016-10-20 23:08:40 +00:00
Mark Johnston
b450976dc2 gmirror: Release pending regular requests when synchronization stops.
Normally gmirror allows colliding requests to proceed whenever a
synchronization request completes and advances to the next offset. However
if an I/O request collides with one of the final g_mirror_syncreqs, nothing
releases it once synchronization completes, resulting in an apparent I/O
hang. The same problem can occur if synchronization is aborted by an
I/O error. Therefore, be sure to requeue pending requests when
mirror synchronization is stopped for any reason.

While here, remove some dead code from g_mirror_regular_release().

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2016-10-20 23:02:30 +00:00
Alexander Motin
5a236b0ef9 Fix possible geom destruction before final provider close.
Introduce internal counter to track opens.  Using provider's counters is
not very successfull after calling g_wither_provider().

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2016-10-06 15:20:05 +00:00
Mark Johnston
4dea20be45 gmirror: Write an updated syncid before queuing writes.
When a syncid bump is pending, any write to the mirror results in the
updated syncid being written to each component's metadata block. However,
the update was only being performed after the writes to the mirror
componenents were queued. Instead, synchronously update the metadata block
first.

MFC after:	3 weeks
Sponsored by:	Dell EMC Isilon
2016-10-06 00:13:55 +00:00
Mark Johnston
903618cd65 gmirror: Bump the syncid if broken disks are found during startup.
Consider a mirror with two components, m1 and m2. Suppose a hardware error
results in the removal of m2, with m1's genid bumped. Suppose further that
a replacement mirror component m3 is created and synchronized, after which
the system is shut down uncleanly. During a subsequent bootup, if gmirror
tastes m1 and m2 first, m2 will be removed from the mirror because it is
broken, but the mirror will be started without bumping the syncid on m1
because all elements of the mirror are accounted for. Then m3 will be
added to the already-running mirror with the same syncid as m1, so the
components will not be synchronized despite the unclean shutdown.

Handle this scenario by bumping the syncid of healthy components if any
broken mirrors are discovered during mirror startup.

MFC after:	3 weeks
Sponsored by:	Dell EMC Isilon
2016-10-06 00:05:45 +00:00
Mark Johnston
fff048e4bc gmirror: Use bool instead of boolean_t.
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2016-10-05 23:55:01 +00:00
Adrian Chadd
85ab1aeccf [geom_redboot] Extend geom_redboot to handle non-zero fis offset.
Submitted by:	Mori Hiroki <yamori813@yahoo.co.jp>
Differential Revision:	https://reviews.freebsd.org/D7237
2016-10-04 16:35:38 +00:00
Alexander Motin
8b64f3ca6c Use g_wither_provider() where applicable.
It is just a helper function combining G_PF_WITHER setting with
g_orphan_provider().
2016-09-23 21:29:40 +00:00
Edward Tomasz Napierala
0c4440c3aa Follow up r305988 by removing g_bio_run_task and related code.
The g_io_schedule_up() gets its "if" condition swapped to make
it more similar to g_io_schedule_down().

Suggested by:	mav@
Reviewed by:	mav@
MFC after:	1 month
2016-09-20 09:18:33 +00:00
Edward Tomasz Napierala
bbdd6614bd Remove unused bio_taskqueue().
MFC after:	1 month
2016-09-19 17:46:15 +00:00
Mark Johnston
4bfb585351 Don't treat an error from g_mirror_clear_metadata() as fatal.
Such errors can occur as the result of a write error or because the disk
backing the mirror element was removed. They result in a generation ID bump
on all active elements of the mirror, so we can safely disconnect the mirror
component rather than destroy it.

MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D7750
2016-09-06 23:42:59 +00:00
Mark Johnston
40c5032d32 Add some fail points to gmirror.
These are useful for testing changes to I/O error handling, and for
reproducing existing bugs in a controlled manner. The fail points are

    g_mirror_regular_request_read
    g_mirror_regular_request_write
    g_mirror_sync_request_read
    g_mirror_sync_request_write
    g_mirror_metadata_write

They all effectively allow one to inject an error value into the bio_error
field of a corresponding BIO request as it is being completed.

MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2016-09-06 23:35:48 +00:00
Andrey V. Elsukov
0428336393 Do not invoke resize event if initial disk size is zero. Some disks
report the size only after first opening.  And due to the events are
asynchronous, some consumers can receive this event too late and
this confuses them. This partially restores previous behaviour, and
at the same time this should fix the problem, when already opened
provider loses resize event.

PR:		211028
MFC after:	3 weeks
2016-08-01 20:54:54 +00:00
Andrey V. Elsukov
1f353a2315 Do not invoke resize method if geom is being withered.
PR:		211028
MFC after:	2 weeks
2016-07-25 09:12:08 +00:00
Andrey V. Elsukov
f1ff88cf8c Use g_resize_provider() to change the size of GEOM_DISK provider,
when it is being opened. This should fix the possible loss of a resize
event when disk capacity changed.

PR:		211028
Reported by:	Dexuan Cui <decui at microsoft dot com>
MFC after:	3 weeks
2016-07-19 05:36:21 +00:00
Maxim Sobolev
55f9588af4 Relax checking if the privider size matches size recorded in the
superblock, allowing provider to be bit bigger, i.e. have some
extra padding after the FS image. That in some cases might be
a side-effect of using CLOOP format which enforces certain block
size and trying to compress image that is not exactly the number
of those blocks in size. The UFS itself does not have any issues
mounting such padded file systems, so it's what GEOM_LABEL should
do.

Submitted by:	@mizhka_gmail.com
Differential Revision:	https://reviews.freebsd.org/D6208
2016-07-18 05:00:01 +00:00
Mark Johnston
7d31c3939a Move some gmirror metadata update messages to a higher debug level.
These can be printed quite frequently from a mostly-idle mirror, cluttering
the console.

MFC after:	1 week
2016-07-14 00:40:24 +00:00
Maxim Sobolev
74ba4047a3 1.Improve handling around last compressed block of the file, which is
necessary because CLOOP format lacks explicit EOF or length, so that
  in the presence of padding or when the CLOOP is put onto a larger
  partition upper level provider size may be larger. Bound amount
  of extra data that we might touch to the max length of the compressed
  block and detect zero-padding in the last cluster, which when
  sector is all-zero might cause us to emit bogus I/O error after
  decompression of that fails. To not make code any more complicated
  that it needs to be deal with it in lazy-manner, i.e. when we
  first access that specific cluster.

  This change also fixes stupid mistake in the LZMA code, inherited
  from geom_lzma, which does not share length of the output buffer
  buffer with the decompression routine, so that in the presence
  of corrupted or purposedly tailored data may easily cause heap
  overflow and kernel memory corruption.

  Beef up validation of the CLOOP TOC by checking that lengths of
  all but the last compressed clusters match upper limit set by
  the decompressor and improve some error diagnostic output while
  I am here.

2.Add kern.geom.uzip.attach_to tunable to artifically limit
  attaching uzip to certain devices in the dev tree only.

    For example the following only makes us attaching to the
    GPT labels:

    kern.geom.uzip.attach_to="gpt/*"

3.Add kern.geom.uzip.noattach_to, which does opposite to the (2)
  above, i.e. prevents geom_uzip from tasting / attaching to
  providers matching some pattern. By default we don't attach
  to our own kind, i.e. kern.geom.uzip.noattach_to="*.uzip".
  It saves us quite some CPU cycles, esp on low-end embedded
  systems.

Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D7013
2016-06-29 18:19:05 +00:00
Kenneth D. Merry
a02e196edd Switch geom_disk over to using a pool mutex.
The GEOM disk d_mtx is only acquired on disk creation and destruction.
It is a good candidate for replacement with a pool mutex.  This eliminates
the mutex initialization and teardown and the mutex and name variables
themselves from struct disk.

sys/geom/geom_disk.h:
	Take d_mtx and d_mtx_name out of struct disk.

sys/geom/geom_disk.c:
	Use mtx_pool_lock() and mtx_pool_unlock() to guard the disk
	initialization state instead of a dedicated mutex.

	This allows removing the initialization and destruction of
	d_mtx.

sys/sys/param.h:
	Bump __FreeBSD_version to 1100119 for the change to struct disk.

Suggested by:	jhb
Sponsored by:	Spectra Logic
Approved by:	re (gjb)
2016-06-23 20:05:59 +00:00
Mark Johnston
be20fc2e90 Do not complete pending gmirror BIOs when tearing down the provider.
This will result in lock recursion and is more generally incorrect since
the completion handlers will just reinsert the BIOs into the queue we're
trying to drain.

Reviewed by:	imp, ngie
Approved by:	re (gjb)
MFC after:	3 weeks
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D6908
2016-06-22 21:00:28 +00:00
Kenneth D. Merry
e5616d65d0 Fix a bug that caused da(4) peripheral drivers to not fully go away
after the underlying device went away.

The problem was that callers who queue the GEOM resize provider
event didn't check to make sure that the provider had not been
withered.  For the other equivalent case, g_new_provider_event(),
the code checks to see whether the provider has been withered
before queueing a g_new_provider_event() to the event thread.

In some cases, a resize provider event would come through after
the provider had been withered and all of the existing consumers
had been orphaned.  When the resize event triggered a taste of
the provider, that would attach a new consumer to the now
withered provider.  The wither washer (g_wither_washer() would
never be able to completely tear down the GEOM because of the
consumers that were hanging around.

The solution was to check the G_PF_WITHER provider flag before
queueing the g_resize_provider_event(), and add an assert to
g_resize_provider_event() to insure that it isn't called on a
withered provider.

sys/geom/geom_subr.c:
	In g_resize_provider(), don't try to continue if the
	G_PF_WITHER flag is set.

	In g_resize_provider_event(), add an assert that the
	G_PF_WITHER flag is not set.

	In g_access(), if a provider has an error, print out the
	name of the provider with the error.

Sponsored by:	Spectra Logic
Approved by:	re (marius)
MFC after:	3 days
2016-06-22 14:39:13 +00:00
Kenneth D. Merry
1ff824e786 Fix a bug that caused da(4) instances to hang around after the underlying
device is gone.

The problem was that when disk_gone() is called, if the GEOM disk
creation process has not yet happened, the withering process
couldn't start.

We didn't record any state in the GEOM disk code, and so the d_gone()
callback to the da(4) driver never happened.

The solution is to track the state of the creation process, and
initiate the withering process from g_disk_create() if the disk is
being created.

This change does add fields to struct disk, and so I have bumped
DISK_VERSION.

geom_disk.c:	Track where we are in the disk creation process,
		and check to see whether our underlying disk has
		gone away or not.

		In disk_gone(), set a new d_goneflag variable that
		g_disk_create() can check to see if it needs to
		clean up the disk instance.

geom_disk.h:    Add a mutex to struct disk (for internal use) disk
		init level, and a gone flag.

		Bump DISK_VERSION because the size of struct disk has
		changed and fields have been added at the beginning.

Sponsored by:	Spectra Logic
Approved by:	re (marius)
2016-06-21 20:18:19 +00:00
Gleb Smirnoff
a7c5163b5f When we are in panic, always go the asynchronous path in g_mirror_destroy(),
otherwise the system will hang.

This is a temporarily least intrusive crutch to get certain panicing systems
dumping. The proper fix should question is g_mirror_destroy() should be called
on a panicing system at all.

Discussed with:	mav
2016-06-01 22:11:54 +00:00
Alan Somers
151746b244 Avoid issuing spa config updates for physical path when not necessary
ZFS's configuration needs to be updated whenever the physical path for a
device changes, but not when a new device is introduced. This is because new
devices necessarily cause config updates, but only if they are actually
accepted into the pool.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
	Split vdev_geom_set_physpath out of vdev_geom_attrchanged.  When
	setting the vdev's physical path, only request a config update if
	the physical path has changed.  Don't request it when opening a
	device for the first time, because the config sync will happen
	anyway upstack.

sys/geom/geom_dev.c
	Split g_dev_set_physpath and g_dev_set_media out of
	g_dev_attrchanged

Submitted by:	will, asomers
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D6428
2016-05-27 22:32:44 +00:00
Konstantin Belousov
d5446cc8f4 Remove unneeded Giant locking around kthreads creation.
Sponsored by:	The FreeBSD Foundation
2016-05-20 08:28:11 +00:00
Konstantin Belousov
4e2732b550 Removal of Giant droping wrappers for GEOM classes.
Sponsored by:	The FreeBSD Foundation
2016-05-20 08:25:37 +00:00
Konstantin Belousov
dff9131e58 Remove asserts that Giant is not held on entrance into geom KPI, which
outlived their usefulness.  This allows to remove drop/pickup Giant
wrappers around GEOM calls.

Discussed with:	alfred, imp, phk
Sponsored by:	The FreeBSD Foundation
2016-05-20 08:22:20 +00:00
Kenneth D. Merry
9a6844d55f Add support for managing Shingled Magnetic Recording (SMR) drives.
This change includes support for SCSI SMR drives (which conform to the
Zoned Block Commands or ZBC spec) and ATA SMR drives (which conform to
the Zoned ATA Command Set or ZAC spec) behind SAS expanders.

This includes full management support through the GEOM BIO interface, and
through a new userland utility, zonectl(8), and through camcontrol(8).

This is now ready for filesystems to use to detect and manage zoned drives.
(There is no work in progress that I know of to use this for ZFS or UFS, if
anyone is interested, let me know and I may have some suggestions.)

Also, improve ATA command passthrough and dispatch support, both via ATA
and ATA passthrough over SCSI.

Also, add support to camcontrol(8) for the ATA Extended Power Conditions
feature set.  You can now manage ATA device power states, and set various
idle time thresholds for a drive to enter lower power states.

Note that this change cannot be MFCed in full, because it depends on
changes to the struct bio API that break compatilibity.  In order to
avoid breaking the stable API, only changes that don't touch or depend on
the struct bio changes can be merged.  For example, the camcontrol(8)
changes don't depend on the new bio API, but zonectl(8) and the probe
changes to the da(4) and ada(4) drivers do depend on it.

Also note that the SMR changes have not yet been tested with an actual
SCSI ZBC device, or a SCSI to ATA translation layer (SAT) that supports
ZBC to ZAC translation.  I have not yet gotten a suitable drive or SAT
layer, so any testing help would be appreciated.  These changes have been
tested with Seagate Host Aware SATA drives attached to both SAS and SATA
controllers.  Also, I do not have any SATA Host Managed devices, and I
suspect that it may take additional (hopefully minor) changes to support
them.

Thanks to Seagate for supplying the test hardware and answering questions.

sbin/camcontrol/Makefile:
	Add epc.c and zone.c.

sbin/camcontrol/camcontrol.8:
	Document the zone and epc subcommands.

sbin/camcontrol/camcontrol.c:
	Add the zone and epc subcommands.

	Add auxiliary register support to build_ata_cmd().  Make sure to
	set the CAM_ATAIO_NEEDRESULT, CAM_ATAIO_DMA, and CAM_ATAIO_FPDMA
	flags as appropriate for ATA commands.

	Add a new get_ata_status() function to parse ATA result from SCSI
	sense descriptors (for ATA passthrough over SCSI) and ATA I/O
	requests.

sbin/camcontrol/camcontrol.h:
	Update the build_ata_cmd() prototype

	Add get_ata_status(), zone(), and epc().

sbin/camcontrol/epc.c:
	Support for ATA Extended Power Conditions features.  This includes
	support for all features documented in the ACS-4 Revision 12
	specification from t13.org (dated February 18, 2016).

	The EPC feature set allows putting a drive into a power power mode
	immediately, or setting timeouts so that the drive will
	automatically enter progressively lower power states after various
	idle times.

sbin/camcontrol/fwdownload.c:
	Update the firmware download code for the new build_ata_cmd()
	arguments.

sbin/camcontrol/zone.c:
	Implement support for Shingled Magnetic Recording (SMR) drives
	via SCSI Zoned Block Commands (ZBC) and ATA Zoned Device ATA
	Command Set (ZAC).

	These specs were developed in concert, and are functionally
	identical.  The primary differences are due to SCSI and ATA
	differences.  (SCSI is big endian, ATA is little endian, for
	example.)

	This includes support for all commands defined in the ZBC and
	ZAC specs.

sys/cam/ata/ata_all.c:
	Decode a number of additional ATA command names in ata_op_string().

	Add a new CCB building function, ata_read_log().

	Add ata_zac_mgmt_in() and ata_zac_mgmt_out() CCB building
	functions.  These support both DMA and NCQ encapsulation.

sys/cam/ata/ata_all.h:
	Add prototypes for ata_read_log(), ata_zac_mgmt_out(), and
	ata_zac_mgmt_in().

sys/cam/ata/ata_da.c:
	Revamp the ada(4) driver to support zoned devices.

	Add four new probe states to gather information needed for zone
	support.

	Add a new adasetflags() function to avoid duplication of large
	blocks of flag setting between the async handler and register
	functions.

	Add new sysctl variables that describe zone support and paramters.

	Add support for the new BIO_ZONE bio, and all of its subcommands:
	DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
	DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

sys/cam/scsi/scsi_all.c:
	Add command descriptions for the ZBC IN/OUT commands.

	Add descriptions for ZBC Host Managed devices.

	Add a new function, scsi_ata_pass() to do ATA passthrough over
	SCSI.  This will eventually replace scsi_ata_pass_16() -- it
	can create the 12, 16, and 32-byte variants of the ATA
	PASS-THROUGH command, and supports setting all of the
	registers defined as of SAT-4, Revision 5 (March 11, 2016).

	Change scsi_ata_identify() to use scsi_ata_pass() instead of
	scsi_ata_pass_16().

	Add a new scsi_ata_read_log() function to facilitate reading
	ATA logs via SCSI.

sys/cam/scsi/scsi_all.h:
	Add the new ATA PASS-THROUGH(32) command CDB.  Add extended and
	variable CDB opcodes.

	Add Zoned Block Device Characteristics VPD page.

	Add ATA Return SCSI sense descriptor.

	Add prototypes for scsi_ata_read_log() and scsi_ata_pass().

sys/cam/scsi/scsi_da.c:
	Revamp the da(4) driver to support zoned devices.

	Add five new probe states, four of which are needed for ATA
	devices.

	Add five new sysctl variables that describe zone support and
	parameters.

	The da(4) driver supports SCSI ZBC devices, as well as ATA ZAC
	devices when they are attached via a SCSI to ATA Translation (SAT)
	layer.  Since ZBC -> ZAC translation is a new feature in the T10
	SAT-4 spec, most SATA drives will be supported via ATA commands
	sent via the SCSI ATA PASS-THROUGH command.  The da(4) driver will
	prefer the ZBC interface, if it is available, for performance
	reasons, but will use the ATA PASS-THROUGH interface to the ZAC
	command set if the SAT layer doesn't support translation yet.
	As I mentioned above, ZBC command support is untested.

	Add support for the new BIO_ZONE bio, and all of its subcommands:
	DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
	DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

	Add scsi_zbc_in() and scsi_zbc_out() CCB building functions.

	Add scsi_ata_zac_mgmt_out() and scsi_ata_zac_mgmt_in() CCB/CDB
	building functions.  Note that these have return values, unlike
	almost all other CCB building functions in CAM.  The reason is
	that they can fail, depending upon the particular combination
	of input parameters.  The primary failure case is if the user
	wants NCQ, but fails to specify additional CDB storage.  NCQ
	requires using the 32-byte version of the SCSI ATA PASS-THROUGH
	command, and the current CAM CDB size is 16 bytes.

sys/cam/scsi/scsi_da.h:
	Add ZBC IN and ZBC OUT CDBs and opcodes.

	Add SCSI Report Zones data structures.

	Add scsi_zbc_in(), scsi_zbc_out(), scsi_ata_zac_mgmt_out(), and
	scsi_ata_zac_mgmt_in() prototypes.

sys/dev/ahci/ahci.c:
	Fix SEND / RECEIVE FPDMA QUEUED in the ahci(4) driver.

	ahci_setup_fis() previously set the top bits of the sector count
	register in the FIS to 0 for FPDMA commands.  This is okay for
	read and write, because the PRIO field is in the only thing in
	those bits, and we don't implement that further up the stack.

	But, for SEND and RECEIVE FPDMA QUEUED, the subcommand is in that
	byte, so it needs to be transmitted to the drive.

	In ahci_setup_fis(), always set the the top 8 bits of the
	sector count register.  We need it in both the standard
	and NCQ / FPDMA cases.

sys/geom/eli/g_eli.c:
	Pass BIO_ZONE commands through the GELI class.

sys/geom/geom.h:
	Add g_io_zonecmd() prototype.

sys/geom/geom_dev.c:
	Add new DIOCZONECMD ioctl, which allows sending zone commands to
	disks.

sys/geom/geom_disk.c:
	Add support for BIO_ZONE commands.

sys/geom/geom_disk.h:
	Add a new flag, DISKFLAG_CANZONE, that indicates that a given
	GEOM disk client can handle BIO_ZONE commands.

sys/geom/geom_io.c:
	Add a new function, g_io_zonecmd(), that handles execution of
	BIO_ZONE commands.

	Add permissions check for BIO_ZONE commands.

	Add command decoding for BIO_ZONE commands.

sys/geom/geom_subr.c:
	Add DDB command decoding for BIO_ZONE commands.

sys/kern/subr_devstat.c:
	Record statistics for REPORT ZONES commands.  Note that the
	number of bytes transferred for REPORT ZONES won't quite match
	what is received from the harware.  This is because we're
	necessarily counting bytes coming from the da(4) / ada(4) drivers,
	which are using the disk_zone.h interface to communicate up
	the stack.  The structure sizes it uses are slightly different
	than the SCSI and ATA structure sizes.

sys/sys/ata.h:
	Add many bit and structure definitions for ZAC, NCQ, and EPC
	command support.

sys/sys/bio.h:
	Convert the bio_cmd field to a straight enumeration.  This will
	yield more space for additional commands in the future.  After
	change r297955 and other related changes, this is now possible.
	Converting to an enumeration will also prevent use as a bitmask
	in the future.

sys/sys/disk.h:
	Define the DIOCZONECMD ioctl.

sys/sys/disk_zone.h:
	Add a new API for managing zoned disks.  This is very close to
	the SCSI ZBC and ATA ZAC standards, but uses integers in native
	byte order instead of big endian (SCSI) or little endian (ATA)
	byte arrays.

	This is intended to offer to the complete feature set of the ZBC
	and ZAC disk management without requiring the application developer
	to include SCSI or ATA headers.  We also use one set of headers
	for ioctl consumers and kernel bio-level consumers.

sys/sys/param.h:
	Bump __FreeBSD_version for sys/bio.h command changes, and inclusion
	of SMR support.

usr.sbin/Makefile:
	Add the zonectl utility.

usr.sbin/diskinfo/diskinfo.c
	Add disk zoning capability to the 'diskinfo -v' output.

usr.sbin/zonectl/Makefile:
	Add zonectl makefile.

usr.sbin/zonectl/zonectl.8
	zonectl(8) man page.

usr.sbin/zonectl/zonectl.c
	The zonectl(8) utility.  This allows managing SCSI or ATA zoned
	disks via the disk_zone.h API.  You can report zones, reset write
	pointers, get parameters, etc.

Sponsored by:	Spectra Logic
Differential Revision:	https://reviews.freebsd.org/D6147
Reviewed by:	wblock (documentation)
2016-05-19 14:08:36 +00:00
John Baldwin
fdce57a042 Add an EARLY_AP_STARTUP option to start APs earlier during boot.
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.

This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed).  This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP.  It also permits all CPUs to be available for
handling interrupts before any devices are probed.

This last feature fixes a problem on with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot.  Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.

However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system.  In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU.  Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.

Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code.  This removes the need to treat the single-CPU boot environment
as a special case.

As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP).  This will allow the option to be turned off
if need be during initial testing.  I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0.  Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.

These changes have only been tested on x86.  Other platform maintainers
are encouraged to port their architectures over as well.  The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).

PR:		kern/199321
Reviewed by:	markj, gnn, kib
Sponsored by:	Netflix
2016-05-14 18:22:52 +00:00
Maxim Sobolev
e3d7ead7df Add missing include "opt_geom.h" to make GEOM_UZIP_DEBUG option working,
also rename enum member so it does not conflict with GEOM_UZIP option
name.

Submitted by:	mizhka@gmail.com
Differential Revision:	https://reviews.freebsd.org/D6207
2016-05-06 20:32:39 +00:00
Pedro F. Giffuni
4ed3c0e713 sys: Make use of our rounddown() macro when sys/param.h is available.
No functional change.
2016-04-30 14:41:18 +00:00
Pedro F. Giffuni
e8d5712284 sys/geom: spelling fixes in comments.
No functional change.
2016-04-29 20:56:58 +00:00
Pedro F. Giffuni
310aef3257 sys/geom: spelling fixes.
These affect debugging messages.

MFC after:	2 weeks
2016-04-28 19:26:46 +00:00
Pedro F. Giffuni
b99bce73e2 geom: unsign some types to match their definitions and avoid overflows.
In struct:gctl_req, nargs is unsigned.

In mirror:
g_mirror_syncreqs is unsigned.

In raid:
in struct:g_raid_volume, v_disks_count is unsigned.

In virstor:
in struct:g_virstor_softc, n_components is unsigned.

MFC after:	2 weeks
2016-04-27 15:10:40 +00:00
Conrad Meyer
4a2776e538 g_part_bsd64: Delete duplicate/dead code
RAW_PART is handled earlier in the loop.

Reported by:	Coverity
CID:		1223201
Sponsored by:	EMC / Isilon Storage Division
2016-04-26 22:32:33 +00:00
Conrad Meyer
5ad33e776f g_part_bsd64: Check for valid on-disk npartitions value
This value is u32 on disk, but assigned to an int in memory.  After we do the
implicit conversion via assignment, check that the result is at least one[1]
(non-negative[2]).

1. The subsequent for-loop iterates from gpt_entries minus one, down, until
   reaching zero.  A negative or zero initial index results in undefined signed
   integer overflow.
2. It is also used to index into arrays later.

In practice, we expected non-malicious disks to contain small positive values.

Reported by:	Coverity
CID:		1223202
Sponsored by:	EMC / Isilon Storage Division
2016-04-26 22:30:54 +00:00
Pedro F. Giffuni
55e0987aea sys: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
2016-04-26 15:38:17 +00:00
Maxim Sobolev
f260c3eadc Relax TOC offsets checking somewhat, allowing offset pointing to
the next byte past EOF to denote zero-block(s) at the very end of
the file.
2016-04-26 06:50:38 +00:00
Maxim Sobolev
416ee66e25 o Fix handling of images with compression block sizes comparable to
MAXPHYS.

o Improve debug somewhat;

o Convert "BUG BUG BUG message" into a proper KASSERT.
2016-04-23 06:31:46 +00:00
Alan Somers
1c2c346f09 DRY on buffer sizes. Update to r298420.
sys/geom/geom_disk.c:
	In disk_attr_changed, don't repeat a buffer size.

Reported by: ngie, hselasky
MFC after:	4 weeks
X-MFC-With:	298420
Sponsored by:	Spectra Logic Corp
2016-04-21 21:13:41 +00:00
Pedro F. Giffuni
d9c9c81c08 sys: use our roundup2/rounddown2() macros when param.h is available.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
2016-04-21 19:57:40 +00:00
Alan Somers
42f42c9942 Notify userspace listeners when geom disk attributes have changed
sys/geom/geom_disk.c:
	disk_attr_changed(): Generate a devctl event of type GEOM:<attr> for
	every call.

MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D5952
2016-04-21 16:43:15 +00:00
Pedro F. Giffuni
63b6b7a74a Indentation issues.
Contract some lines leftover from r298310.

Mea culpa.
2016-04-20 16:19:44 +00:00
Pedro F. Giffuni
02abd40029 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
Pedro F. Giffuni
01b5c6f73e g_gate: for pointers replace 0 with NULL.
These are mostly cosmetical, no functional change.

Found with devel/coccinelle.
2016-04-15 16:18:07 +00:00
Warner Losh
9a8fa125c1 Bump bio_cmd and bio_*flags from 8 bits to 16.
Differential Revision: https://reviews.freebsd.org/D5784
2016-04-14 05:10:41 +00:00
Pedro F. Giffuni
74b8d63dcc Cleanup unnecessary semicolons from the kernel.
Found with devel/coccinelle.
2016-04-10 23:07:00 +00:00
Allan Jude
d873662594 Create the GELIBOOT GEOM_ELI flag
This flag indicates that the user wishes to use the GELIBOOT feature to boot from a fully encrypted root file system.
Currently, GELIBOOT does not support key files, and in the future when it does, they will be loaded differently.
Due to the design of GELI, and the desire for secrecy, the GELI metadata does not know if key files are used or not, it just adds the key material (if any) to the HMAC before the optional passphrase, so there is no way to tell if a GELI partition requires key files or not.

Since the GELIBOOT code in boot2 and the loader does not support keys, they will now only attempt to attach if this flag is set. This will stop GELIBOOT from prompting for passwords to GELIs that it cannot decrypt, disrupting the boot process

PR:		208251
Reviewed by:	ed, oshogbo, wblock
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D5867
2016-04-08 01:25:25 +00:00
Pedro F. Giffuni
21ff1f7469 g_sched_destroy(): prevent return of uninitialized scalar variable.
For the !gsp case there some chance of returning an uninitialized
return value. Prevent that from happening by initializing the
error value.

CID:	1006421
2016-04-03 16:25:51 +00:00
Warner Losh
ca19dfe480 Don't assume that bio_cmd is a bit mask.
Differential Revision: https://reviews.freebsd.org/D5592
2016-03-10 06:25:39 +00:00
Warner Losh
8076d204da Don't assume that bio_cmd is bit mask.
Differential Revision: https://reviews.freebsd.org/D5593
2016-03-10 06:25:31 +00:00
Adrian Chadd
443a0f85dd Fixes to make it compile under gcc-4.2. 2016-02-24 02:52:49 +00:00
Maxim Sobolev
5497acc527 Obsolete mkulzma(8) and geom_uncompress(4), their functionality
is now provided by mkuzip(8) and geom_uzip(4) respectively.

MFC after:	1 month
2016-02-24 00:39:36 +00:00
Maxim Sobolev
8f8cb840b0 Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8)
and geom_uncompress(4):

1. mkuzip(8):

 - Proper support for eliminating all-zero blocks when compressing an
   image. This feature is already supported by the geom_uzip(4) module
   and CLOOP format in general, so it's just a matter of making mkuzip(8)
   match. It should be noted, however that this feature while it sounds
   great, results in very slight improvement in the overall compression
   ratio, since compressing default 16k all-zero block produces only 39
   bytes compressed output block, which is 99.8% compression ratio. With
   typical average compression ratio of amd64 binaries and data being
   around 60-70% the difference between 99.8% and 100.0% is not that
   great further diluted by the ratio of number of zero blocks in the
   uncompressed image to the overall number of blocks being less than
   0.5 (typically). However, this may be important from performance
   standpoint, so that kernel are not spinning its wheels decompressing
   those empty blocks every time this zero region is read. It could also
   be important when you create huge image mostly filled with zero
   blocks for testing purposes.

 - New feature allowing to de-duplicate output image. It turns out that
   if you twist CLOOP format a bit you can do that as well. And unlike
   zero-blocks elimination, this gives a noticeable improvement in the
   overall compression ratio, reducing output image by something like
   3-4% on my test UFS2 3GB image consisting of full FreeBSD base system
   plus some of the packages (openjdk, apache etc), about 2.3GB worth of
   file data (800+MB compressed). The only caveat is that images created
   with this feature "on" would not work on older versions of FeeBSDxi
   kernel, hence it's turned off by default.

 - provide options to control both features and document them in manual
   page.

 - merge in all relevant LZMA compression support from the mkulzma(8),
   add new option to select between both.

 - switch license from ad-hoc beerware into standard 2-clause BSD.

2. geom_uzip(4):

 - implement support for de-duplicated images;

 - optimize some code paths to handle "all-zero" blocks without reading
   any compressed data;

 - beef up manual page to explain that geom_uzip(4) is not limited only
   to md(4) images. The compressed data can be written to the block
   device and accessed directly via magic of GEOM(4) and devfs(4),
   including to mount root fs from a compressed drive.

 - convert debug log code from being compiled in conditionally into
   being present all the time and provide two sysctls to turn it on or
   off. Due to intended use of the module, it can be used in
   environments where there may not be a luxury to put new kernel with
   debug code enabled. Having those options handy allows debug issues
   without as much problem by just having access to serial console or
   network shell access to a box/appliance. The resulting additional
   CPU cycles are just few int comparisons and branches, and those are
   minuscule when compared to data decompression which is the main
   feature of the module.

 - hopefully improve robustness and resiliency of the geom_uzip(4) by
   performing some of the data validation / range checking on the TOC
   entries and rejecting to attach to an image if those checks fail.

 - merge in all relevant LZMA decompression support from the
   geom_uncompress(4), enable automatically when appropriate format is
   indicated in the header.

 - move compilation work into its own worker thread so that it does not
   clog g_up. This allows multiple instances work in parallel utilizing
   smp cores.

 - document new knobs in the manual page.

Reviewed by:		adrian
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
Warner Losh
bd4c1dd6d6 Use the right size for zeroing.
Submitted by: rpokala@
2016-02-17 18:28:38 +00:00
Warner Losh
c55f57071a Create an API to reset a struct bio (g_reset_bio). This is mandatory
for all struct bio you get back from g_{new,alloc}_bio. Temporary
bios that you create on the stack or elsewhere should use this before
first use of the bio, and between uses of the bio. At the moment, it
is nothing more than a wrapper around bzero, but that may change in
the future. The wrapper also removes one place where we encode the
size of struct bio in the KBI.
2016-02-17 17:16:02 +00:00
Adrian Chadd
61789a9a76 Teach the flashmap code about the SPI flash.
PR:		kern/206227
Submitted by:	Stanislav Galabov <sgalabov@gmail.com>
2016-01-23 05:26:29 +00:00
Ravi Pokala
cb03a5029b Add rotationrate to geom disk dumpconf
Parse and report the nominal rotation rate reported by the drive.

Reviewed by:	sbruno, jhb
Approved by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D4483
Requested by:	Kevin Bowling < kevin.bowling @ kev009.com >
2016-01-14 21:52:21 +00:00
Allan Jude
4332feca4b Make additional parts of sys/geom/eli more usable in userspace
The upcoming GELI support in the loader reuses parts of this code
Some ifdefs are added, and some code is moved outside of existing ifdefs

The HMAC parts of GELI are broken out into their own file, to separate
them from the kernel crypto/openssl dependant parts that are replaced
in the boot code.

Passed the GELI regression suite (tools/regression/geom/eli)
 Files=20 Tests=14996
 Result: PASS

Reviewed by:	pjd, delphij
MFC after:	1 week
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D4699
2016-01-07 05:47:34 +00:00
Allan Jude
9c0c355f2a Add some additional GPT partition types
4 ChromeOS GPT types
2 Microsoft partition types
the new OpenBSD partition type

Approved by:	marcel (mentor)
MFC after:	1 week
Relnotes:	yes
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D3841
2015-12-27 18:12:13 +00:00
Allan Jude
7a3f5d11fb Replace sys/crypto/sha2/sha2.c with lib/libmd/sha512c.c
cperciva's libmd implementation is 5-30% faster

The same was done for SHA256 previously in r263218

cperciva's implementation was lacking SHA-384 which I implemented, validated against OpenSSL and the NIST documentation

Extend sbin/md5 to create sha384(1)

Chase dependancies on sys/crypto/sha2/sha2.{c,h} and replace them with sha512{c.c,.h}

Reviewed by:	cperciva, des, delphij
Approved by:	secteam, bapt (mentor)
MFC after:	2 weeks
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D3929
2015-12-27 17:33:59 +00:00
Allan Jude
1747e1d875 Fix incorrect error message in geom map
If geom_map fails to find the end of a mapped partition based on a search, it would return the incorrect error message, stating it could not parse the START value

Reviewed by:	adrian
Approved by:	bapt (mentor)
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D4187
2015-12-27 17:09:23 +00:00
Warner Losh
268f69f40b It turns out that it's OK to sleep in this context, so use M_WAITOK
for the softc for the delay module.

Noticed by: rpokala@
2015-12-18 14:10:00 +00:00
Warner Losh
6a607537da Scheduling module to introduce a fixed delay into the I/O path. 2015-12-18 05:39:25 +00:00
Steven Hartland
25080ac4d4 Prevent g_access calls to bad multipath members
When a multipath member is orphaned its access members are zeroed before its
removed if marked for wither, so prevent any future calls to g_access on
such members.

This prevents a panic on debug kernels which validates the resultant values
aren't negative.

Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D4416
2015-12-15 21:11:41 +00:00
Andrey V. Elsukov
af90a87209 Make detection of GPT a bit more reliable.
When we are detecting a partition table and didn't find PMBR, try to
read backup GPT header from the last sector and if it is correct,
assume that we have GPT.

Reviewed by:	rpokala
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D4282
2015-12-10 10:35:07 +00:00
Kenneth D. Merry
985108aeb1 Fix a style issue in g_disk_limit().
Noticed by:	bdrewery
MFC after:	1 week
2015-12-04 03:44:12 +00:00
Kenneth D. Merry
42fbdde413 Fix g_disk_vlist_limit() to work properly with deletes.
Add a new bp argument to g_disk_maxsegs(), and add a new function,
g_disk_maxsize() tha will properly determine the maximum I/O size for a
delete or non-delete bio.

Submitted by:	will
MFC after:	1 week
Sponsored by:	Spectra Logic
2015-12-04 03:38:35 +00:00
Kenneth D. Merry
a9934668aa Add asynchronous command support to the pass(4) driver, and the new
camdd(8) utility.

CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and
completed CCBs may be retrieved via the CAMIOGET ioctl.  User
processes can use poll(2) or kevent(2) to get notification when
I/O has completed.

While the existing CAMIOCOMMAND blocking ioctl interface only
supports user virtual data pointers in a CCB (generally only
one per CCB), the new CAMIOQUEUE ioctl supports user virtual and
physical address pointers, as well as user virtual and physical
scatter/gather lists.  This allows user applications to have more
flexibility in their data handling operations.

Kernel memory for data transferred via the queued interface is
allocated from the zone allocator in MAXPHYS sized chunks, and user
data is copied in and out.  This is likely faster than the
vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in
configurations with many processors (there are more TLB shootdowns
caused by the mapping/unmapping operation) but may not be as fast
as running with unmapped I/O.

The new memory handling model for user requests also allows
applications to send CCBs with request sizes that are larger than
MAXPHYS.  The pass(4) driver now limits queued requests to the I/O
size listed by the SIM driver in the maxio field in the Path
Inquiry (XPT_PATH_INQ) CCB.

There are some things things would be good to add:

1. Come up with a way to do unmapped I/O on multiple buffers.
   Currently the unmapped I/O interface operates on a struct bio,
   which includes only one address and length.  It would be nice
   to be able to send an unmapped scatter/gather list down to
   busdma.  This would allow eliminating the copy we currently do
   for data.

2. Add an ioctl to list currently outstanding CCBs in the various
   queues.

3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do
   that.

4. Test physical address support.  Virtual pointers and scatter
   gather lists have been tested, but I have not yet tested
   physical addresses or scatter/gather lists.

5. Investigate multiple queue support.  At the moment there is one
   queue of commands per pass(4) device.  If multiple processes
   open the device, they will submit I/O into the same queue and
   get events for the same completions.  This is probably the right
   model for most applications, but it is something that could be
   changed later on.

Also, add a new utility, camdd(8) that uses the asynchronous pass(4)
driver interface.

This utility is intended to be a basic data transfer/copy utility,
a simple benchmark utility, and an example of how to use the
asynchronous pass(4) interface.

It can copy data to and from pass(4) devices using any target queue
depth, starting offset and blocksize for the input and ouptut devices.
It currently only supports SCSI devices, but could be easily extended
to support ATA devices.

It can also copy data to and from regular files, block devices, tape
devices, pipes, stdin, and stdout.  It does not support queueing
multiple commands to any of those targets, since it uses the standard
read(2)/write(2)/writev(2)/readv(2) system calls.

The I/O is done by two threads, one for the reader and one for the
writer.  The reader thread sends completed read requests to the
writer thread in strictly sequential order, even if they complete
out of order.  That could be modified later on for random I/O patterns
or slightly out of order I/O.

camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from
the pass(4) driver and also to send request notifications internally.

For pass(4) devcies, camdd(8) uses a single buffer (CAM_DATA_VADDR)
per CAM CCB on the reading side, and a scatter/gather list
(CAM_DATA_SG) on the writing side.  In addition to testing both
interfaces, this makes any potential reblocking of I/O easier.  No
data is copied between the reader and the writer, but rather the
reader's buffers are split into multiple I/O requests or combined
into a single I/O request depending on the input and output blocksize.

For the file I/O path, camdd(8) also uses a single buffer (read(2),
write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list
(readv(2), writev(2), preadv(2), pwritev(2)) on writes.

Things that would be nice to do for camdd(8) eventually:

1.  Add support for I/O pattern generation.  Patterns like all
    zeros, all ones, LBA-based patterns, random patterns, etc. Right
    Now you can always use /dev/zero, /dev/random, etc.

2.  Add support for a "sink" mode, so we do only reads with no
    writes.  Right now, you can use /dev/null.

3.  Add support for automatic queue depth probing, so that we can
    figure out the right queue depth on the input and output side
    for maximum throughput.  At the moment it defaults to 6.

4.  Add support for SATA device passthrough I/O.

5.  Add support for random LBAs and/or lengths on the input and
    output sides.

6.  Track average per-I/O latency and busy time.  The busy time
    and latency could also feed in to the automatic queue depth
    determination.

sys/cam/scsi/scsi_pass.h:
	Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue
	and fetch asynchronous CAM CCBs respectively.

	Although these ioctls do not have a declared argument, they
	both take a union ccb pointer.  If we declare a size here,
	the ioctl code in sys/kern/sys_generic.c will malloc and free
	a buffer for either the CCB or the CCB pointer (depending on
	how it is declared).  Since we have to keep a copy of the
	CCB (which is fairly large) anyway, having the ioctl malloc
	and free a CCB for each call is wasteful.

sys/cam/scsi/scsi_pass.c:
	Add asynchronous CCB support.

	Add two new ioctls, CAMIOQUEUE and CAMIOGET.

	CAMIOQUEUE adds a CCB to the incoming queue.  The CCB is
	executed immediately (and moved to the active queue) if it
	is an immediate CCB, but otherwise it will be executed
	in passstart() when a CCB is available from the transport layer.

	When CCBs are completed (because they are immediate or
	passdone() if they are queued), they are put on the done
	queue.

	If we get the final close on the device before all pending
	I/O is complete, all active I/O is moved to the abandoned
	queue and we increment the peripheral reference count so
	that the peripheral driver instance doesn't go away before
	all pending I/O is done.

	The new passcreatezone() function is called on the first
	call to the CAMIOQUEUE ioctl on a given device to allocate
	the UMA zones for I/O requests and S/G list buffers.  This
	may be good to move off to a taskqueue at some point.
	The new passmemsetup() function allocates memory and
	scatter/gather lists to hold the user's data, and copies
	in any data that needs to be written.  For virtual pointers
	(CAM_DATA_VADDR), the kernel buffer is malloced from the
	new pass(4) driver malloc bucket.  For virtual
	scatter/gather lists (CAM_DATA_SG), buffers are allocated
	from a new per-pass(9) UMA zone in MAXPHYS-sized chunks.
	Physical pointers are passed in unchanged.  We have support
	for up to 16 scatter/gather segments (for the user and
	kernel S/G lists) in the default struct pass_io_req, so
	requests with longer S/G lists require an extra kernel malloc.

	The new passcopysglist() function copies a user scatter/gather
	list to a kernel scatter/gather list.  The number of elements
	in each list may be different, but (obviously) the amount of data
	stored has to be identical.

	The new passmemdone() function copies data out for the
	CAM_DATA_VADDR and CAM_DATA_SG cases.

	The new passiocleanup() function restores data pointers in
	user CCBs and frees memory.

	Add new functions to support kqueue(2)/kevent(2):

	passreadfilt() tells kevent whether or not the done
	queue is empty.

	passkqfilter() adds a knote to our list.

	passreadfiltdetach() removes a knote from our list.

	Add a new function, passpoll(), for poll(2)/select(2)
	to use.

	Add devstat(9) support for the queued CCB path.

sys/cam/ata/ata_da.c:
	Add support for the BIO_VLIST bio type.

sys/cam/cam_ccb.h:
	Add a new enumeration for the xflags field in the CCB header.
	(This doesn't change the CCB header, just adds an enumeration to
	use.)

sys/cam/cam_xpt.c:
	Add a new function, xpt_setup_ccb_flags(), that allows specifying
	CCB flags.

sys/cam/cam_xpt.h:
	Add a prototype for xpt_setup_ccb_flags().

sys/cam/scsi/scsi_da.c:
	Add support for BIO_VLIST.

sys/dev/md/md.c:
	Add BIO_VLIST support to md(4).

sys/geom/geom_disk.c:
	Add BIO_VLIST support to the GEOM disk class.  Re-factor the I/O size
	limiting code in g_disk_start() a bit.

sys/kern/subr_bus_dma.c:
	Change _bus_dmamap_load_vlist() to take a starting offset and
	length.

	Add a new function, _bus_dmamap_load_pages(), that will load a list
	of physical pages starting at an offset.

	Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios.
	Allow unmapped I/O to start at an offset.

sys/kern/subr_uio.c:
	Add two new functions, physcopyin_vlist() and physcopyout_vlist().

sys/pc98/include/bus.h:
	Guard kernel-only parts of the pc98 machine/bus.h header with
	#ifdef _KERNEL.

	This allows userland programs to include <machine/bus.h> to get the
	definition of bus_addr_t and bus_size_t.

sys/sys/bio.h:
	Add a new bio flag, BIO_VLIST.

sys/sys/uio.h:
	Add prototypes for physcopyin_vlist() and physcopyout_vlist().

share/man/man4/pass.4:
	Document the CAMIOQUEUE and CAMIOGET ioctls.

usr.sbin/Makefile:
	Add camdd.

usr.sbin/camdd/Makefile:
	Add a makefile for camdd(8).

usr.sbin/camdd/camdd.8:
	Man page for camdd(8).

usr.sbin/camdd/camdd.c:
	The new camdd(8) utility.

Sponsored by:	Spectra Logic
MFC after:	1 week
2015-12-03 20:54:55 +00:00
Steven Hartland
86787e8d97 Fix early kernel dump via dumpdev env
Setting the dumpdev via env e.g. loader.conf provides the ability to
configure the kernel dump device during early boot. When using this
g_io_getattr was returning EPERM due to cp->acr == 0.

Fix this by calling g_access to ensure we're a read consumer prior
to calling g_dev_setdumpdev.

MFC after:	2 weeks
Sponsored by:	Multiplay
2015-11-17 20:55:50 +00:00
Steven Hartland
2dc7e36b0b Fix g_eli error loss conditions
* Ensure that error information isn't lost.
* Log the error code in all cases.
* Don't overwrite bio_completed set to 0 from the error condition.

MFC after:	2 weeks
Sponsored by:	Multiplay
2015-11-05 17:37:35 +00:00
Alexander Motin
4a3760bae6 Remove compatibility shims for legacy ATA device names.
We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely
removed old stack at 10.x, so at 11.x it is time to remove compat shims.
2015-10-11 13:01:51 +00:00
Edward Tomasz Napierala
45d7de1d37 Make geom_nop(4) collect statistics on all types of BIOs, not just
reads and writes.

PR:		kern/198405
Submitted by:	Matthew D. Fuller <fullermd at over-yonder dot net>
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3679
2015-10-10 09:03:31 +00:00
Conrad Meyer
b4d7290796 geom_dev: Use kenv 'dumpdev' in the same way as rc/etc.d/dumpon
Skip a /dev/ prefix, if one is present, when checking for matching
device names for dump.

Suggested by:	avg
Reviewed by:	markj
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3725
2015-09-23 21:08:52 +00:00
Edward Tomasz Napierala
bb27d7ed90 Add a way to specify stripesize and stripeoffset to gnop(8). This makes
it possible to "simulate" 4K media, to eg test alignment handling.

Reviewed by:	mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3664
2015-09-15 18:01:59 +00:00
Warner Losh
3f2e5b8584 After the introduction of direct dispatch, the pacing code in g_down()
broke in two ways. One, the pacing variable was accessed in multiple
threads in an unsafe way. Two, since large numbers of I/O could come
down from the buf layer at one time, large numbers of allocation
failures could happen all at once, resulting in a huge pace value that
would limit I/Os to 10 IOPS for minutes (or even hours) at a
time. While a real solution to these problems requires substantial
work (to go to a no-allocation after the first model, or to have some
way to wait for more memory with some kind of reserve for pager and
swapper requests), it is relatively easy to make this simplistic
pacing less pathological.

Move to using a volatile variable with loads and stores. While this is
a little racy, losing the race is safe: either you get memory and
proceed, or you don't and queue. Second, sleep for 1ms (or one tick, whichever
is larger) instead of 100ms. This removes the artificial 10 IOPS limit
while still easing up on new I/Os during memory shortages. Remove
tying the amount of time we do this to the number of failed requests
and do it only as long as we keep failing requests.

Finally, to avoid needless recursion when memory is tight (start ->
g_io_deliver() -> g_io_request() -> start -> ... until we use 1/2 the
stack), don't do direct dispatch while pacing. This should be a rare
event (not steady state) so the performance hit here is worth the
extra safety of not starving g_down() with directly dispatched I/O.

Differential Review: https://reviews.freebsd.org/D3546
2015-09-02 17:29:30 +00:00
Justin Hibbits
6aabc119b6 Create a RouterBoard platform and use it to create a flash map
Summary:
The RouterBoard uses a predefined partition map which doesn't exist in the fdt.
This change allows overriding the fdt slicer with a custom slicer, and uses this
custom slicer to define the flash map on the RouterBoard RB800.
D3305 converts the mpc85xx platform into a base class, so that systems based on
the mpc85xx platform can add their own overrides.  This change builds on D3305,
and creates a RouterBoard (RB800) platform to initialize the slicer override.

Reviewed By: nwhitehorn, imp
Differential Revision: https://reviews.freebsd.org/D3345
2015-08-22 05:50:18 +00:00
Pedro F. Giffuni
6bc3fe5f4e Clean out some externally visible "more then" grammar
MFC after:	3 days
2015-08-11 03:12:09 +00:00
Enji Cooper
604083d74c Make some debug printf's into DPRINTF's to reduce noise on attach/detahh
Similar reasoning to what was done in r286367 with geom_uzip(4)

MFC after: 2 weeks
Differential Revision: D3320
Sponsored by: EMC / Isilon Storage Division
2015-08-09 06:58:06 +00:00
Pawel Jakub Dawidek
46e3447026 Enable BIO_DELETE passthru in GELI, so TRIM/UNMAP can work as expected when
GELI is used on a SSD or inside virtual machine, so that guest can tell
host that it is no longer using some of the storage.

Enabling BIO_DELETE passthru comes with a small security consequence - an
attacker can tell how much space is being really used on encrypted device and
has less data no analyse then. This is why the -T option can be given to the
init subcommand to turn off this behaviour and -t/T options for the configure
subcommand can be used to adjust this setting later.

PR:		198863
Submitted by:	Matthew D. Fuller fullermd at over-yonder dot net

This commit also includes a fix from Fabian Keil freebsd-listen at
fabiankeil.de for 'configure' on onetime providers which is not strictly
related, but is entangled in the same code, so would cause conflicts if
separated out.
2015-08-08 09:51:38 +00:00
Konstantin Belousov
347e9d5495 Minor style cleanup of the code surrounding r286404.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-08-07 08:24:12 +00:00
Konstantin Belousov
9b34965019 The condition to use direct processing for the unmapped bio is
reverted.  We can do direct processing when g_io_check() does not need
to perform transient remapping of the bio, otherwise the thread has to
sleep.

Reviewed by:	mav (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-08-07 08:13:34 +00:00
Pawel Jakub Dawidek
5ee9ea19fe After crypto_dispatch() bio might be already delivered and destroyed,
so we cannot access it anymore. Setting an error later lead to memory
corruption.

Assert that crypto_dispatch() was successful. It can fail only if we pass a
bogus crypto request, which is a bug in the program, not a runtime condition.

PR:		199705
Submitted by:	luke.tw
Reviewed by:	emaste
MFC after:	3 days
2015-08-06 17:13:34 +00:00
Enji Cooper
fcc8461cfb Make some debug printf's into DPRINTF's to reduce noise on attach/detach
Differential Revision: https://reviews.freebsd.org/D3306
MFC after: 1 week
Reviewed by: loos
Sponsored by: EMC / Isilon Storage Division
2015-08-06 15:30:14 +00:00
Edward Tomasz Napierala
72800098bf Fix panic triggered by code like this:
open("/dev/md0", O_EXEC);

Discussed with:	kib@, mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3051
2015-08-04 10:40:08 +00:00
Edward Tomasz Napierala
d6cc35b287 Fix panic that would happen on forcibly unmounting devfs (note that
as it is now, devfs ignores MNT_FORCE anyway, so it needs to be modified
to trigger the panic) with consumers still opened.

Note that this still results in a leak of r/w/e counters.  It seems
to be harmless, though.  If anyone knows a better way to approach
this - please tell.

Discussed with:	kib@, mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3050
2015-08-03 16:35:18 +00:00
Andrey V. Elsukov
da6c24e123 Report the scheme and provider names in warning message about unaligned
partition.

PR:		201873
MFC after:	1 week
2015-07-26 11:16:48 +00:00
Allan Jude
ce808c7ad8 Add a new option to gpart(8) to fix Lenovo BIOS boot issue
PR:		184910
Reviewed by:	ae, wblock
Approved by:	marcel
MFC after:	3 days
Relnotes:	yes
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D3065
2015-07-15 02:23:55 +00:00
Pawel Jakub Dawidek
4273d41299 Spoil even can happen for some time now even on providers opened exclusively
(on the media change event). Update GELI to handle that situation.

PR:		201185
Submitted by:	Matthew D. Fuller
2015-07-10 19:27:19 +00:00
Pawel Jakub Dawidek
fefb6a143a Properly propagate errors in metadata reading.
PR:		198860
Submitted by:	Matthew D. Fuller
2015-07-02 10:57:34 +00:00
Pawel Jakub Dawidek
edaa9008ff Allow to omit keyfile number for the first keyfile. 2015-07-02 10:55:32 +00:00
Edward Tomasz Napierala
628b712826 Fix off-by-one error in fstyp(8) and geom_label(4) that made them use
a single space (" ") as a CD9660 label name when no label was present.
Similar problem was also present in msdosfs label recognition.

PR:		200828
Differential Revision:	https://reviews.freebsd.org/D2830
Reviewed by:	asomers@, emaste@
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2015-06-18 21:55:55 +00:00
Andrey V. Elsukov
e7d0c7e458 Teach G_PART_GPT class to handle g_resize_provider event.
MFC after:	10 days
2015-06-08 12:52:41 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Andrey V. Elsukov
153c57b5b4 Read GEOM_UNCOMPRESS metadata using several requests that fit into
MAXPHYS. For large compressed images the metadata size can be bigger
than MAXPHYS and this triggers KASSERT in g_read_data().
Also use g_free() to free memory allocated by g_read_data().

PR:		199476
MFC after:	2 weeks
2015-05-19 09:28:52 +00:00
Andrey V. Elsukov
4b8d4f97b0 Add apple-boot, apple-hfs and apple-ufs aliases to MBR scheme.
Sort DOSPTYP_* entries in diskmbr.h by value.
Document these scheme-specific types in gpart(8).

MFC after:	1 week
2015-05-05 09:33:02 +00:00
Craig Rodrigues
d9db52256e Move zlib.c from net to libkern.
It is not network-specific code and would
be better as part of libkern instead.
Move zlib.h and zutil.h from net/ to sys/
Update includes to use sys/zlib.h and sys/zutil.h instead of net/

Submitted by:		Steve Kiernan stevek@juniper.net
Obtained from:		Juniper Networks, Inc.
GitHub Pull Request:	https://github.com/freebsd/freebsd/pull/28
Relnotes:		yes
2015-04-22 14:38:58 +00:00
Pedro F. Giffuni
4a5e6b854d g_uncompress_taste: prevent a double free.
Found by:	Clang Static Analyzer
MFC after:	1 week
2015-04-20 16:31:27 +00:00
Alexander Motin
0ada3afc25 Remove sleeps from geom_up thread on device destruction.
MFC after:	3 days.
2015-04-09 13:09:05 +00:00
Alexander Motin
5d85cd2d11 Remove extra semicolon.
MFC after:	1 week
2015-03-27 12:45:20 +00:00
Alexander Motin
3ab0187add Remove request sorting from GEOM_MIRROR and GEOM_RAID.
When CPU is not busy, those queues are typically empty.  When CPU is busy,
then one more extra sorting is the last thing it needs.  If specific device
(HDD) really needs sorting, then it will be done later by CAM.

This supposed to fix livelock reported for mirror of two SSDs, when UFS
fires zillion of BIO_DELETE requests, that totally blocks I/O subsystem by
pointless sorting of requests and responses under single mutex lock.

MFC after:	2 weeks
2015-03-27 12:44:28 +00:00
Alexander Motin
41fe4ba647 Fix bug on memory allocation error in split method.
While there, use bioq_takefirst() in place where it is convenient.

MFC after:	1 week
2015-03-27 11:14:12 +00:00
Alexander Motin
5523c82c1a Make GEOM_PART work in presence of previous withered self.
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-03-26 12:17:47 +00:00
Alexander Motin
2f36085dcf Report withered providers as such alike to GEOMs.
MFC after:	2 weeks
2015-03-26 11:19:24 +00:00
Alexander Motin
ba772028db When searching for provider by name, prefer non-withered one.
MFC after:	2 weeks
2015-03-26 11:02:29 +00:00
Adrian Chadd
28d507fcec Fix the label search routine in geom_map to not trip up on '\0' bytes.
* Just do the buf check early and fail out
* If the offset being searched is:

00110000  00 b5 7e 45 61 e2 76 d3  c1 78 dd 15 95 cd 1f f1  |..~Ea.v..x......|

.. and the match string is '.!/bin/sh'

.. then it'll set the match string[0] to '\0', do a strncmp() against
the read buffer, find it's matching two zero-length strings, and think
that's where to start.

MFC after:	2 weeks
2015-03-19 03:58:25 +00:00
Andrey V. Elsukov
4fb4ebe0a4 Add GUID and alias for Apple Core Storage partition.
PR:		196241
MFC after:	1 week
2015-03-12 18:51:31 +00:00
Alexander Motin
7715befdf2 Fix couple BIO_DELETE bugs in geom_mirror.
Do not report GEOM::candelete if none of providers support BIO_DELETE.
If consumer still requests BIO_DELETE, report error instead of hanging.

MFC after:	2 weeks
2015-03-12 10:20:53 +00:00
Alexander Motin
0b1b7c2cec Replace constant with proper sizeof().
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2015-02-25 10:18:11 +00:00
Edward Tomasz Napierala
01de1a0650 Add devd(8) notifications for creation and destruction of GEOM devices.
Differential Revision:	https://reviews.freebsd.org/D1211
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-01-14 11:15:57 +00:00
Warner Losh
a91275f72f Remove old ioctl use and support, once and for all. 2015-01-06 05:28:37 +00:00
Warner Losh
0acf08d985 Remove support for FreeBSD 7 and really old FreeBSD 8. The classifiers
have been in the base for a while, so the gymnastics here aren't
needed. In addition, the bugs in subr_disk.c have been fixed since
2009, so there's no need for an identical copy of it in the tree
anymore. There's really no need to binary patch g_io_request, so let's
get rid of the code (not compiled in anymore) lest others think it is
a good idea.
2014-12-20 00:04:01 +00:00
John-Mark Gurney
08fca7a56b Add some new modes to OpenCrypto. These modes are AES-ICM (can be used
for counter mode), and AES-GCM.  Both of these modes have been added to
the aesni module.

Included is a set of tests to validate that the software and aesni
module calculate the correct values.  These use the NIST KAT test
vectors.  To run the test, you will need to install a soon to be
committed port, nist-kat that will install the vectors.  Using a port
is necessary as the test vectors are around 25MB.

All the man pages were updated.  I have added a new man page, crypto.7,
which includes a description of how to use each mode.  All the new modes
and some other AES modes are present.  It would be good for someone
else to go through and document the other modes.

A new ioctl was added to support AEAD modes which AES-GCM is one of them.
Without this ioctl, it is not possible to test AEAD modes from userland.

Add a timing safe bcmp for use to compare MACs.  Previously we were using
bcmp which could leak timing info and result in the ability to forge
messages.

Add a minor optimization to the aesni module so that single segment
mbufs don't get copied and instead are updated in place.  The aesni
module needs to be updated to support blocked IO so segmented mbufs
don't have to be copied.

We require that the IV be specified for all calls for both GCM and ICM.
This is to ensure proper use of these functions.

Obtained from:	p4: //depot/projects/opencrypto
Relnotes:	yes
Sponsored by:	FreeBSD Foundation
Sponsored by:	NetGate
2014-12-12 19:56:36 +00:00
Alexander Motin
1e68fe9c33 Avoid unneeded malloc/memcpy/free if there is no metadata on disk.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2014-12-05 10:23:18 +00:00
Alexander Motin
26f0f92fa2 Decode some binary fields of Intel metadata.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2014-12-04 15:54:45 +00:00
Warner Losh
66cc25a224 Actually, that was a bad idea. Go back to MAXPARTITIONS.
Submitted by: bruce
2014-11-20 17:31:25 +00:00
Warner Losh
dd87e2c610 The number of BSD partitions is variable. Return the proper number
(which is in basetable->gpt_entries).

Submitted by: ae@
2014-11-19 18:55:27 +00:00
Warner Losh
73f49e9eef Implement the historic DIOCGDINFO ioctl for gpart on BSD
partitions. Several utilities still use this interface and require
additional information since gpart was activated than before. This
allows fsck of a UFS partition without having to specify it is UFS,
per historic behavior.
2014-11-18 17:06:40 +00:00
Pawel Jakub Dawidek
5ebb15b942 Add missing privilege check when setting the dump device. Before that change it
was possible for a regular user to setup the dump device if he had write access
to the given device. In theory it is a security issue as user might get access
to kernel's memory after provoking kernel crash, but in practise it is not
recommended to give regular users direct access to storage devices.

Rework the code so that we do privileges check within the set_dumper() function
to avoid similar problems in the future.

Discussed with:	secteam
2014-11-11 04:48:09 +00:00
Dag-Erling Smørgrav
133cdd9e13 Constify the AES code and propagate to consumers. This allows us to
update the Fortuna code to use SHAd-256 as defined in FS&K.

Approved by:	so (self)
2014-11-10 09:44:38 +00:00
Poul-Henning Kamp
cd15a01091 Translate the errno to gctl_error() texts.
Spotted by:	mwlucas
2014-11-09 15:52:11 +00:00
Alexander Motin
c3e7ba3e6d Add to CTL support for logical block provisioning threshold notifications.
For ZVOL-backed LUNs this allows to inform initiators if storage's used or
available spaces get above/below the configured thresholds.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-11-06 00:48:36 +00:00
Alexander Motin
ccf8a5688a Revert somewhat hackish geom_disk optimization, committed as part of r256880,
and the following r273143 commit, supposed to workaround introduced issue by
quite innocent-looking change.

While there is no clear understanding why, but r273143 is accused in data
corruption in some environments with high I/O load.  I personally don't see
any problem in that commit, and possibly it is just a trigger to some other
bug somewhere, but better safe then sorry for now.

Requested by:	scottl@
MFC after:	3 days
2014-10-25 15:16:19 +00:00
Colin Percival
66427784c1 Populate the GELI passphrase cache with the kern.geom.eli.passphrase
variable (if any) provided in the boot environment.  Unset it from
the kernel environment after doing this, so that the passphrase is
no longer present in kernel memory once we enter userland.

This will make it possible to provide a GELI passphrase via the boot
loader; FreeBSD's loader does not yet do this, but GRUB (and PCBSD)
will have support for this soon.

Tested by:	kmoore
2014-10-22 23:41:15 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Andrey V. Elsukov
52fa0beb0a Add provider's sectorsize and stripesize to confdot output.
Submitted by:	rpokala at panasas.com
2014-10-17 06:58:04 +00:00
Davide Italiano
2be111bf7d Follow up to r225617. In order to maximize the re-usability of kernel code
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.

Submitted by:   kmacy
Tested by:      make universe
2014-10-16 18:04:43 +00:00
Andrey V. Elsukov
0478dc0c16 Add an ability to set dumpdev via loader(8) tunable.
MFC after:	3 weeks
2014-10-08 12:18:16 +00:00
Hiroki Sato
d17183901f Fix a bug in r272297 which prevented dumpdev from setting.
!u is not equivalent to (u != 0).
2014-10-03 04:13:25 +00:00
Pawel Jakub Dawidek
227f68edbb Be prepared that set_dumper() might fail even when resetting it or prefix
the call with (void) to document that we intentionally ignore the return
value - no way to handle an error in case of device disappearing.
2014-09-30 12:00:50 +00:00
Pawel Jakub Dawidek
7f5b50719b Style fixes. 2014-09-30 11:51:32 +00:00
Colin Percival
835c4dd436 Cache GELI passphrases entered at the console during the boot process,
in order to improve user-friendliness when a system has multiple disks
encrypted using the same passphrase.

When examining a new GELI provider, the most recently used passphrase
will be attempted before prompting for a passphrase; and whenever a
passphrase is entered, it is cached for later reference.  When the root
disk is mounted, the cached passphrase is zeroed (triggered by the
"mountroot" event), in order to minimize the possibility of leakage
of passphrases.  (After root is mounted, the "taste and prompt for
passphrases on the console" code path is disabled, so there is no
potential for a passphrase to be stored after the zeroing takes place.)

This behaviour can be disabled by setting kern.geom.eli.boot_passcache=0.

Reviewed by:	pjd, dteske, allanjude
MFC after:	7 days
2014-09-16 08:40:52 +00:00
Sean Bruno
5f23eb4d9c Add device name used in geom_map verbose output. This helps when using
geom_map with multiple flash/spi devices.

Phabric:  https://reviews.freebsd.org/D766
Reviewed by:	adrian
MFC after:	2 weeks
2014-09-11 22:39:27 +00:00
John-Mark Gurney
89fac384c8 use a straight buffer instead of an iov w/ 1 segment... The aesni
driver when it hits a mbuf/iov buffer, it mallocs and copies the data
for processing..  This improves perf by ~8-10% on my machine...

I have thoughts of fixing AES-NI so that it can better handle segmented
buffers, which should help improve IPSEC performance, but that is for
the future...
2014-09-04 23:53:51 +00:00
Scott Long
274919e965 Deal explicitly with possible failures of make_dev_alias_p() in GEOM.
Submitted by:   Mariusz Zaborski <oshogbo@FreeBSD.org>
MFC after:      3 days
2014-08-18 19:27:47 +00:00
Andrey V. Elsukov
36b16d1f7d Turn off kern.geom.part.mbr.enforce_chs by default. 2014-08-12 10:31:31 +00:00
Andrey V. Elsukov
fb86534cb1 Add sysctl and loader tunable kern.geom.part.mbr.enforce_chs that is set
by default. It can be used to disable automatic alignment to CHS geometry,
that GEOM_PART_MBR does.

Reviewed by:	wblock
MFC after:	1 week
2014-08-12 09:10:13 +00:00
Warner Losh
cba7d97b61 cswitch is unsigned, so don't compare it < 0. Any negative numbers
will look huge and be caught by > 100.
2014-08-07 21:56:42 +00:00
Warner Losh
86e26cb154 Unsigned values can never be less than 0. 2014-08-07 21:56:37 +00:00
Marcel Moolenaar
6c25615f39 In r264504, we prevented doing I/O for more than MAXPHYS by making
the assumption that consumers would respect bio_completed and/or
bio_resid to detect short reads. This assumption proved false and
file corruption was the result.
Create as many bios as we need to satisfy the original request.
Check the cached chunk every time we need to do I/O to increase the
hit rate.

Obtained from:	junipre Networks, Inc.
MFC after:	1 week
2014-07-22 17:30:05 +00:00
Nathan Whitehorn
1ee0f08975 After EFI support was added to the installer, it needed to allow boot
partitions of types other than "freebsd-boot" (in particular, "efi").
This allows the removal of some nasty hacks for supporting PowerPC systems,
in particular aliasing freebsd-boot to apple-boot on APM and an IBM-specific
code on MBR.

This changes the installer to use the correct names, which also breaks a
degeneracy in the meaning of "freebsd-boot" that allows the addition
of support for some newer IBM systems that can boot from GPT in addition to
MBR. Since I have no idea how to detect which those systems are, leave
the default on IBM PPC systems as MBR for now.
2014-07-04 15:55:32 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Andrey V. Elsukov
91ca76a590 Add disklabel64 support to GEOM_PART class.
This partitioning scheme is used in DragonFlyBSD. It is similar to
BSD disklabel, but has the following improvements:
* metadata has own dedicated place and isn't accessible through partitions;
* all offsets are 64-bit;
* supports 16 partitions by default (has reserved place for more);
* has reserved place for backup label (but not yet implemented);
* has UUIDs for partitions and partition types;

No objections from:	geom
MFC after:	2 weeks
Relnotes:	yes
2014-06-11 10:42:34 +00:00
Andrey V. Elsukov
4042ab48c7 Allow swapping to DragonFlyBSD's swap partition.
MFC after:	2 weeks
2014-06-11 10:23:49 +00:00
Andrey V. Elsukov
0640b71dfe Add aliases for DragonFlyBSD's partition types.
MFC after:	2 weeks
2014-06-11 10:19:11 +00:00
Brad Davis
ebd05adab8 - Fix the keyfile being cleared prematurely after r259428
PR:		185084
Submitted by:	fk@fabiankeil.de
Reviewed by:	pjd@
2014-06-06 03:17:37 +00:00
Andrey V. Elsukov
39dcac849e Use g_conf_printf_escaped() to escape symbols, which can break
an XML tree.

MFC after:	1 week
2014-05-30 10:35:51 +00:00
Andrey V. Elsukov
17e0c43319 Add a topology trace to the g_spoil_event.
MFC after:	1 week
2014-05-19 16:08:15 +00:00
Andrey V. Elsukov
362073c089 We have two functions from where a geom orphan method could be called:
g_orphan_register and g_resize_provider_event. Both are called from the
event queue. Also we have GEOM_DEV class, which does deferred destroy
for its consumers via g_dev_destroy (also called from the event queue).
So it is possible, that for some consumers an orphan method will be
called twice. This triggers panic in g_dev_orphan.
Check that consumer isn't already orphaned before call orphan method.

MFC after:	2 weeks
2014-05-19 16:05:42 +00:00
Alexander Motin
413037c8e7 Make GEOM DISK to account also BIO_FLUSH operations. 2014-05-17 15:07:00 +00:00
Andrey V. Elsukov
579259ea0d It is safe to allow shrinking, when aligned size is bigger than current.
Tested by:	jmg
MFC after:	1 week
2014-05-07 11:18:27 +00:00
Edward Tomasz Napierala
c7c7d7d0f0 Make r242379 - the fix for UFS labels disappearing after resizing
the provider - also apply to UFS1 filesystems.  This should help with
resizing filesystems created by makefs(8), which still uses UFS1.

Tested by:	jmg@
Sponsored by:	The FreeBSD Foundation
2014-05-05 09:20:30 +00:00
Andrey V. Elsukov
4f31a94bd2 Add an advice what to do when partition was automatically resized.
X-MFC after:	r256690
2014-05-04 20:00:08 +00:00
Andrey V. Elsukov
c778397f26 Add better error description for case when we are doing resize and
scheme-specific method returns EBUSY.

MFC after:	1 week
2014-05-04 16:55:51 +00:00
Andrey V. Elsukov
0dd7f00cee Prevent an unexpected shrinking on resizing due to alignment for MBR,
PC98 and VTOC8 schemes.

Reported by:	jmg
MFC after:	1 week
2014-05-04 16:43:57 +00:00
Andrey V. Elsukov
bc1e8f56ff For schemes that do an automatic partition aligning move this code to
separate function.

MFC after:	1 week
2014-05-04 10:14:25 +00:00
Luiz Otavio O Souza
81694cde44 Fix a leak in g_uzip_taste(). After retrieve all the block offsets from
the uzip image, free the last data read.
2014-05-01 15:23:20 +00:00
Luiz Otavio O Souza
ccb7284af1 Actually the FEATURE() macro is defined on sys/sysctl.h.
Pointyhat to:	loos
2014-05-01 14:59:04 +00:00
Luiz Otavio O Souza
6d8beede60 Some style and whitespace fixes. Reduce the difference between geom_uzip(4)
and geom_uncompress(4).  Now, they produce an almost clean diff(1) output.

Remove a duplicated variable from g_uncompress.c and an unnecessary header
from g_uzip.c.

No functional changes.
2014-05-01 14:47:27 +00:00
Bryan Drewery
74679c6a99 Remove redundant include
MFC after:	3 days
2014-04-29 01:17:43 +00:00
Alexander Motin
dea1e22600 Reduce number of opens by REOM RAID during provider taste.
Instead opening/closing provider by each of metadata classes, do it only
once in core code.  Since for SCSI disks open/close means sending some
SCSI commands to the device, this change reduces taste time.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-04-28 15:03:52 +00:00
Luiz Otavio O Souza
6f05733a1f Keep geom_uncompress(4) in line with geom_uzip(4), bring in the r264504 fix.
Make sure not to start I/O bigger than MAXPHYS bytes.

Quoting r264504:

When we detect the condition, we'll reduce the block count and perform
a "short" read.  In g_uncompress_done() we need to consider the original
I/O length and stop early if we're about to deflate a block that we didn't
read.  By using bio_completed in the cloned BIO and not bio_length to
check for this, we automatically and gracefully handle short reads that
our providers may be doing on top of the short reads we may initiate
ourselves.

Reviewed by:	marcel
2014-04-22 18:08:34 +00:00
Marcel Moolenaar
855be5b2c1 Make sure not to do I/O for more than MAXPHYS bytes. Doing so can cause
problems in our providers, such as a KASSERT in md(4). We can initiate
I/O for more than MAXPHYS bytes if we've been given a BIO for MAXPHYS
bytes, the blocks from which we're reading couldn't be compressed and
we had compression in preceeding blocks resulting in misalignment of
the blocks we're trying to read relative to the sector. We're forced to
round up the I/O length to make it an multiple of the sector size.

When we detect the condition, we'll reduce the block count and perform
a "short" read. In g_uzip_done() we need to consider the original I/O
length and stop early if we're about to deflate a block that we didn't
read. By using bio_completed in the cloned BIO and not bio_length to
check for this, we automatically and gracefully handle short reads that
our providers may be doing on top of the short reads we may initiate
ourselves.

Obtained from:	Juniper Networks, Inc.
2014-04-15 15:41:57 +00:00
Bryan Drewery
87bc328d63 Make g_access() KASSERT() more useful.
Sponsored by:	EMC / Isilon Storage Division
Obtained from:	Isilon OneFS
MFC after:	2 weeks
2014-04-15 14:41:41 +00:00
Marcel Moolenaar
4787115d04 Align and round the partitionable disk space to 4K by default.
Since this would also apply when recovering, make sure not to
align or round when that would have a partition fall outside
the partitionable area.
2014-04-12 20:28:39 +00:00
Bryan Drewery
1e4b22b44b Fix spelling error in g_trace() call.
Sponsored by:	EMC / Isilon Storage Division
MFC after:	1 week
2014-04-10 17:00:44 +00:00
Alexander Motin
1229e83d2b Fix wrong sizes used to access PD_Type and PD_State DDF metadata fields.
This caused incorrect behavior of arrays with big-endian DDF metadata.
Little-endian (like used by Adaptec controllers) should not be harmed.
Add workaround should be enough to manage compatibility.

MFC after:	2 weeks
2014-04-10 16:00:33 +00:00
Alexander Motin
66b92c07fe Do not increment bio_data in case of BIO_DELETE.
This fixes KASSERT() panic in g_io_request().
2014-04-10 10:12:56 +00:00
Marcel Moolenaar
e8c166e85a An all-or-nothing approach to labels isn't flexible enough. Embedded
systems need fine-grained control over what's in and what's out.
That's ideal. For now, separate GPT labels from the rest and allow
g_label to be built with just GPT labels.

Obtained from:	Juniper Networks, Inc.
2014-04-06 02:44:37 +00:00
Marcel Moolenaar
12b2d77da9 Make sure we don't free memory that's already been freed by setting
the geom->softc pounter to NULL before freeing the g_slicer softc.
In g_slicer_free() the pointer is checked first.

Obtained from:	Juniper Networks, Inc.
2014-04-06 02:20:42 +00:00
Bryan Drewery
09adfca39f Show error code when failing to destroy a mirror on delay
Sponsored by:	EMC / Isilon Storage Division
MFC after:	2 weeks
2014-04-05 03:01:29 +00:00
Xin LI
c35ddb346f In g_eli_crypto_hmac_init(), zero out after using the ipad buffer,
k_ipad.

Note that the two consumers in geli(4) are not affected by this
issue because the way the code is constructed and as such, we
believe there is no security impact with or without this change
with geli(4)'s usage.

Reported by:	Serge van den Boom <serge vdboom.org>
Reviewed by:	pjd
MFC after:	2 weeks
2014-02-08 05:17:49 +00:00
Luiz Otavio O Souza
d9ffbff9f0 Fix the build with DEBUG enabled. Where possible, fix style(9) issues.
Reviewed by:	bde
Approved by:	adrian (mentor)
2014-02-07 13:06:48 +00:00
Luiz Otavio O Souza
f0d701f048 Fix a logic error. Because of this inflateReset() wasn't being called and
the output buffer wasn't being cleared between the inflate() calls,
producing zeroed output after the first inflate() call.

This fixes the read of mkuzip(8) images with geom_uncompress(4).

Reviewed by:	ray
Approved by:	adrian (mentor)
2014-02-03 17:25:36 +00:00
Luiz Otavio O Souza
c2d90f35d5 Remove some unnecessary code. The offsets read from the first block are
overwritten a few lines bellow.

Reviewed by:	ray
Approved by:	adrian (mentor)
2014-02-03 17:21:36 +00:00
Andrey V. Elsukov
524d7a4d4e Always free sbuf in gctl_free().
MFC after:	1 week
2014-01-23 21:30:31 +00:00
Andrey V. Elsukov
d14a7ff1f5 Remove another unneeded NULL check from geom_alloc_copyin().
Do copyout in case of gctl version mismatch and fix sbuf leak in
g_ctl_ioctl_ctl().

MFC after:	1 week
2014-01-23 20:25:38 +00:00
Andrey V. Elsukov
7f0e13dfe0 In gctl_copyin() remove unused error variable.
geom_alloc_copyin() can't return ENOMEM, so describe its fail as bad
control request. Add check for NULL pointer in gctl_dump(), since it
can be NULL when geom_alloc_copyin() failed.

MFC after:	1 week
2014-01-23 19:55:02 +00:00
Andrey V. Elsukov
625ee733e3 Fix typo in r261084.
Add to the gctl_error() an ability to specify error description even
if numeric error code is already specified. Also by default set
error code to EINVAL.

PR:		185852
MFC after:	1 week
2014-01-23 19:31:17 +00:00
Andrey V. Elsukov
ee839ce84c malloc() with M_WAITOK doesn't return NULL.
MFC after:	1 week
2014-01-23 19:07:22 +00:00
Alexander Motin
eaed60f737 Removed unneeded and dangerous assignment. It would probably cause NULL
refererence panic if compiler not optimize it out.

Found with:	Clang static analyzer
MFC after:	2 weeks
2014-01-19 16:37:57 +00:00
Luiz Otavio O Souza
67619a4120 Build the geom_uncompress(4) module by default.
Fix geom_uncompress(4) module loading.  Don't link zlib.c (which is a module
itself) directly.

The built module was verified and used to read a few mkulzma(8) images on
amd64 to validate some of the informations on the manual page.

While here, don't overwrite CFLAGS.

Reviewed by:	ray
Approved by:	adrian (mentor)
2014-01-10 20:29:46 +00:00
Andrey V. Elsukov
ae3bc0acff Add an ability to stop gmirror and clear its metadata in one command.
This fixes the problem, when gmirror starts again just after stop.

The problem occurs when gmirror's component has geom label with equal size.
E.g. gpt and gptid have the same size as partition, diskid has the same
size as entire disk. When gmirror's geom has been destroyed, glabel
creates its providers and this initiate retaste.

Now "gmirror destroy" command is available. It destroys geom and also
erases gmirror's metadata.

MFC after:	2 weeks
2013-12-27 02:43:53 +00:00
Dmitry Morozovsky
5cc596c46d Add GPT UUID for VMware vSAN meta-data partition.
Approved by:	ae
MFC after:	2 weeks
2013-12-26 21:06:12 +00:00
Andrey V. Elsukov
7c5710dbaf Prevent users from deactivating the last component of a mirror.
PR:		184985
MFC after:	1 week
2013-12-19 22:13:12 +00:00
Pawel Jakub Dawidek
396b29c74e Clear some more places with potentially sensitive data.
MFC after:	1 week
2013-12-15 22:52:18 +00:00
Pawel Jakub Dawidek
2a3237c84f Clear content of keyfiles loaded by the loader after processing them.
Pointed out by:	rwatson
MFC after:	1 week
2013-12-15 22:51:26 +00:00
Alexander Motin
2634da8cd5 Fix bug introduced at r256607. We have to recalculate bp_resid here since
sizes of original and completed requests may differ due to end of media.

Bisected by:	pho
2013-12-12 08:23:28 +00:00
Justin Hibbits
6cec74b2e4 Partially revert r259080. bde@ pointed out that there are a lot more style bugs
going on in here than can be fixed, and I introduced some of my own.  Rather
than fix the whole host of them, back out my bugs.

Found by:	bde
X-MFC with:	r259080
2013-12-08 09:34:56 +00:00
Justin Hibbits
8991c54091 Fix some integer signs. These unsigned integers should all be signed.
Found by:	clang (powerpc64)
2013-12-07 19:55:34 +00:00
Eitan Adler
7a22215c53 Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this
shifts into the sign bit.  Instead use (1U << 31) which gets the
expected result.

This fix is not ideal as it assumes a 32 bit int, but does fix the issue
for most cases.

A similar change was made in OpenBSD.

Discussed with:	-arch, rdivacky
Reviewed by:	cperciva
2013-11-30 22:17:27 +00:00
Alexander Motin
7ae1a87bfe Escape special XML chars, returned by some devices, confusing XML parsers.
MFC after:	1 month
2013-11-27 14:25:06 +00:00
Marcel Moolenaar
3e5a0a6b70 Have the GPT probe return a lower priority when the MBR is not a PMBR
The purpose of the PMBR is to have the disk appear in use to GPT
unaware utilities (like fdisk).  However, if the PMBR has been changed
by a GPT unaware utlity then we must assume that this was deliberate
(as it involved removal of the special slice) and we should not treat
the unmodified GPT-specific sectors as being valid.  By lowering the
probe priority in that case, the MBR scheme will take precedence and
the kernel will end up using the MBR and not the GPT. We will still
use the GPT if the kernel does not support the MBR scheme.
2013-11-21 22:02:59 +00:00
Andrey V. Elsukov
32cea4ca0f Add "resize" verb to gmirror(8) and such functionality to geom_mirror(4).
Now it is easy to expand the size of the mirror when all its components
are replaced. Also add g_resize method to geom_mirror class. It will write
updated metadata to new last sector, when parent provider is resized.

Silence from:	geom@
MFC after:	1 month
2013-11-19 22:55:17 +00:00
Alexander Motin
f8c79813cb In addition to r258220 allow shrinking in "automatic" mode if there is
already valid metadata found at the new location.  This should allow easy
transparent recovery if first resize was done by mistake.

While there, unify metadata write code and fix minor memory leak.

MFC after:	1 month
2013-11-17 05:38:54 +00:00
Alexander Motin
e6afd72b93 Implement automatic live resize support for GEOM MULTIPATH class.
In "manual" mode just automatically resize provider in any direction.
In "automatic" mode allow only growth (with new metadata write); in case
of shrinking destroy the multipath device same as before since it may be
undesirable to write new metadata within old user area.

MFC after:	1 month
2013-11-16 14:31:49 +00:00
Andrey V. Elsukov
743437c451 Add missing line breaks.
PR:		181900
MFC after:	1 week
2013-11-11 11:13:12 +00:00
Xin LI
7ac2e58818 When zero'ing out a buffer, make sure we are using right size.
Without this change, in the worst but unlikely case scenario, certain
administrative operations, including change of configuration, set or
delete key from a GEOM ELI provider, may leave potentially sensitive
information in buffer allocated from kernel memory.

We believe that it is not possible to actively exploit these issues, nor
does it impact the security of normal usage of GEOM ELI providers when
these operations are not performed after system boot.

Security:	possible sensitive information disclosure
Submitted by:	Clement Lecigne <clecigne google com>
MFC after:	3 days
2013-11-02 01:16:10 +00:00