Commit Graph

2275 Commits

Author SHA1 Message Date
Mateusz Guzik
879e0604ee Add KERNEL_PANICKED macro for use in place of direct panicstr tests 2020-01-12 06:07:54 +00:00
Mateusz Guzik
c8b3463dd0 vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)
The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by:	kib (previous version)
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23036
2020-01-07 15:56:24 +00:00
Alexander Motin
0aabbeff36 Remove extra check for provider being closed.
We already checked for that earlier, and since we hold topology lock
it could not change.

MFC after:	1 week
2020-01-02 20:30:53 +00:00
Alexander Motin
4aa1289a38 Avoid few memory accesses in g_disk_done(). 2019-12-31 03:43:13 +00:00
Alexander Motin
024932aae9 Use atomic for start_count in devstat_start_transaction().
Combined with earlier nstart/nend removal it allows to remove several locks
from request path of GEOM and few other places.  It would be cool if we had
more SMP-friendly statistics, but this helps too.

Sponsored by:	iXsystems, Inc.
2019-12-30 03:13:38 +00:00
Alexander Motin
9794a803fd Retire nstart/nend counters.
Those counters were abused for decade to workaround broken orphanization
process in different classes by delaying the call while there are active
requests.  But from one side it did not close all the races, while from
another was quite expensive on SMP due to trashing twice per request cache
lines of consumer and provider and requiring locks.  It lost its sense
after I manually went through all the GEOM classes in base and made
orphanization wait for either provider close or request completion.

Consumer counters are still used under INVARIANTS to detect premature
consumer close and detach.  Provider counters are removed completely.

Sponsored by:	iXsystems, Inc.
2019-12-30 00:46:10 +00:00
Alexander Motin
86c06ff886 Remove GEOM_SCHED class and gsched tool.
This code was not actively maintained since it was introduced 10 years ago.
It lacks support for many later GEOM features, such as direct dispatch,
unmapped I/O, stripesize/stripeoffset, resize, etc.  Plus it is the only
remaining use of GEOM nstart/nend request counters, used there to implement
live insertion/removal, questionable by itself.  Plus, as number of people
commented, GEOM is not the best place for I/O scheduler, since it has
limited information about layers both above and below it, required for
efficient scheduling.  Plus with the modern shift to SSDs there is just no
more significant need for this kind of scheduling.

Approved by:	imp, phk, luigi
Relnotes:	yes
2019-12-29 21:16:03 +00:00
Alexander Motin
cfdb91850c Missed part of r356162.
If we postpone consumer destruction till close, then the close calls should
not be ignored.  Delay geom withering till the last close too.

MFC after:	2 weeks
X-MFC-with:	r356162
Sponsored by:	iXsystems, Inc.
2019-12-29 19:33:41 +00:00
Alexander Motin
1d301810d3 Fix GEOM_VIRSTOR orphanization.
Previous code closed and destroyed consumer even with I/O in progress.
This patch postpones the destruction till the last close.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-29 19:21:29 +00:00
Alexander Motin
d2d5fee931 Fix GEOM_MOUNTVER orphanization.
Previous code closed and detached consumer even with I/O still in progress.
This patch adds locking and request counting to postpone the close till
the last of running requests completes.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-29 17:10:21 +00:00
Mariusz Zaborski
645532a448 gnop: change the "count until fail" option
Change the "count_until_fail" option of gnop, now it enables the failing
rating instead of setting them to 100%.

The original patch introduced the new flag, which sets the fail/rate to 100%
after N requests. In some cases, we don't want to have 100% of failure
probabilities. We want to start failing at some point.
For example, on the early stage, we may like to allow some read/writes requests
before having some requests delayed - when we try to mount the partition,
or when we are trying to import the pool.
Another case may be to check how scrub in ZFS will behave on different stages.

This allows us to cover more cases.
The previous behavior still may be configured.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22632
2019-12-29 15:47:37 +00:00
Mariusz Zaborski
80e63e0a90 gnop: allow to change the name of created device
Thanks to this option we can create more then one gnop provider from
single provider. This may be useful for temporary labeling some data
on the disk.

Reviewed by:	markj, allanjude, bcr
Differential Revision:	https://reviews.freebsd.org/D22304
2019-12-29 15:40:02 +00:00
Alexander Motin
6a8eef35b5 Fix GEOM_SHSEC orphanization.
Previous code closed and destroyed consumer even with I/O in progress.
This patch postpones the destruction till the last close, identical to
GEOM_STRIPE, since they seem to have common origin.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-28 23:21:53 +00:00
Alexander Motin
351d0fa6df Fix GEOM_GATE orphanization.
Previous code closed and destroyed direct read consumer even with I/O still
in progress.  This patch adds locking and request counting to postpone the
close till the last of running requests completes.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-28 17:52:53 +00:00
Alexander Motin
2178f45b86 Fix GEOM_UZIP orphanization.
Previous code destroyed softc even with provider still open, that resulted
in panic under load.  This change postpones the free till the final close,
when we know for sure there will be no more I/O requests.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-27 21:44:13 +00:00
Alexander Motin
a29df733fa Reimplement gvinum orphanization.
gvinum was the only GEOM class, using consumer nstart/nend fields. Making
it do its own accounting for orphanization purposes allows in perspective
to remove burden of that expensive for SMP accounting from GEOM.

Also the previous implementation spinned in a tight event loop, waiting
for all active BIOs to complete, while the new one knows exactly when it
is possible to close the consumer.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2019-12-27 01:36:53 +00:00
Warner Losh
b182c79211 Add BIO_SPEEDUP
Add BIO_SPEEDUP bio command and g_io_speedup wrapper. It tells the
lower layers that the upper layers are dealing with some shortage
(dirty pages and/or disk blocks). The lower layers should do what they
can to speed up anything that's been delayed.

The first use will be to tell the CAM I/O scheduler that any TRIM
shaping should be short-circuited because the system needs
blocks. We'll also call it when there's too many resources used by
UFS.

Reviewed by: kirk, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D18351
2019-12-17 00:13:35 +00:00
Edward Tomasz Napierala
2006d590d6 Add kern.geom.part.separator tunable. This makes it possible
to specify an optional separator to insert before partition name;
eg if it's set to "c/", you'll get "ada0c/s1" instead of "ada0s1".
(It cannot be set to just “/“, since ada0 is a device node, not
a directory.)

Reviewed by:	imp
MFC after:	2 weeks
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D22193
2019-12-13 09:28:44 +00:00
Alexander Motin
5ccbeea1c5 Remove some branching from GEOM_DISK hot path.
pp->private just can not be NULL in those places.

In g_disk_start() and g_disk_ioctl() both dp != NULL and !dp->d_destroyed
should always be true if disk_gone() and disk_destroy() are used properly,
since GEOM does not send requests to errored providers.  If the protocol is
not followed, then no amount of additional checks here give real safety.

In g_disk_access() though the checks are useful, since GEOM blocks only
new opens for errored providers, but allows closes.  It should not happen
if disk_gone() and disk_destroy() are used properly, but may otherwise.

To improve cases when disk_gone() is not used, call it from disk_destroy().
It does not give full guaranties, but it errors the provider and makes
GEOM block unwanted requests at least after some race.

MFC after:	2 weeks
2019-12-06 16:48:36 +00:00
Alexander Motin
19cfcf253e Block ioctls for dying GEOM_DEV instances.
For normal I/Os consumer and provider statuses are checked by g_io_check().
But ioctl calls often do not go through it, being dispatched directly. This
change makes their semantics more alike, protecting lower levels.

MFC after:	2 weeks
2019-12-06 03:46:38 +00:00
Alexander Motin
6b3c68bf09 Make GEOM_DEV code slightly more compact.
Should be no functional change.

MFC after:	2 weeks
2019-12-06 03:18:37 +00:00
Alan Somers
67f72211dd gmultipath: add ATF tests
Add ATF tests for most gmultipath operations. Add some dtrace probes too,
primarily for configuration changes that happen in response to provider
errors.

PR:		178473
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D22235
2019-12-06 00:12:14 +00:00
Alexander Motin
c4c88d4718 Remove duplicate g_debugflags declaration.
While there, define G_F_FOOTSHOOTING instead of numeric constants.

MFC after:	13 days
X-MFX-with:	r355412
2019-12-05 15:07:32 +00:00
Alexander Motin
2efaef42e4 Wrap g_trace() into a macro to avoid unneeded calls.
In most cases with debug disabled this function does nothing, but argument
passing and the call still cost measurable time due to cache misses, etc.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-05 04:52:19 +00:00
Alexander Motin
e61ed7983e Switch GEOM_DEV from make_dev_p() to make_dev_s().
It closes the race condition and so allows to remove few NULL checks.

Also while there, use dev->si_drv1 in addition to cp->private to store
softc pointer.  For calls coming from the dev side it gives reliable cache
hit instead of often miss before.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-05 04:03:08 +00:00
Alexander Motin
61322a0a8a Mark some more hot global variables with __read_mostly.
MFC after:	1 week
2019-12-04 21:26:03 +00:00
Warner Losh
283a5a3796 We don't even need Giant here. It isn't protecting anything internal
to geom, and nothing we call requires it to be held. It's left over
from a time when the latter wasn't the case. Retire it.

Reviewed in concept: scottl@
2019-11-23 23:44:00 +00:00
Edward Tomasz Napierala
b5961be1ab Add GEOM attribute to report physical device name, and report it
via 'diskinfo -v'.  This avoids the need to track it down via CAM,
and should also work for disks that don't use CAM.  And since it's
inherited thru the GEOM hierarchy, in most cases one doesn't need
to walk the GEOM graph either, eg you can use it on a partition
instead of disk itself.

Reviewed by:	allanjude, imp
Sponsored by:	Klara Inc
Differential Revision:	https://reviews.freebsd.org/D22249
2019-11-09 17:30:19 +00:00
Chuck Silvers
30738a349d Make all the gnop parameters optional in the request from userland,
filling in the same defaults that the current userland module uses.
This allows an old geom_nop.so userland module to work with a new kernel.

Approved by:	imp (mentor)
Reviewed by:	cem
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21972
2019-10-16 21:49:44 +00:00
Chuck Silvers
1e00bb458d Add a new gctl_get_paraml_opt() interface to extract optional parameters from
the request.  It is the same as gctl_get_paraml() except that the request
is not marked with an error if the parameter is not present.

Approved by:	imp (mentor)
Reviewed by:	cem
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21972
2019-10-16 21:49:39 +00:00
Chuck Silvers
090a3ea3c2 Add a "count_until_fail" option to gnop, which says to start failing
I/O requests after the given number have been allowed though.

Approved by:    imp (mentor)
Reviewed by:    rpokala kib 0mp mckusick
Sponsored by:   Netflix
Differential Revision:  https://reviews.freebsd.org/D21593
2019-09-13 23:03:56 +00:00
Kyle Evans
ef03f57dd2 Allow more nesting of GEOM partitioning schemes
GEOM is supposed to be topology-agnostic, but the GPT and BSD partition code
has arbitrary restrictions on nesting that are annoying in cases such as
running VMs on raw partitions (since the VM's partitioning scheme is not
visible to the host).

This patch adds sysctls to disable the restrictions except in the case of
BSD label (and similar) partitions with offset 0 (where we need to avoid
recursively recognizing the label).

Submitted by:	Andrew Gierth
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21350
2019-09-03 20:57:20 +00:00
Conrad Meyer
eefd8f96fb geom_uzip(4), mkuzip(8): Add Zstd image mode
The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility
with older systems.  Support in geom_uzip(4) is conditional on the ZSTDIO
kernel option, which is enabled in amd64 GENERIC, but not all in-tree
configurations.

mkuzip(8) was modified slightly to always initialize the nblocks + 1'th
offset in the CLOOP file format.  Previously, it was only initialized in the
case where the final compressed block happened to be unaligned w.r.t.
DEV_BSIZE.  The "Fake" last+1 block change in r298619 means that the final
compressed block's 'blen' was never correct unless the compressed uzip image
happened to be BSIZE-aligned.  This happened in about 1 out of every 512
cases.  The zlib and lzma decompressors are probably tolerant of extra trash
following the frame they were told to decode, but Zstd complains that the
input size is incorrect.

Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the
nblocks + 1'th offset when it is known to be initialized to a good value.
This corrects the calculated final real cluster compressed length to match
that printed by mkuzip(8).

mkuzip(8) was refactored somewhat to reduce code duplication and increase
ease of adding other compression formats.

  * Input block size validation was pulled out of individual compression
    init routines into main().

  * Init routines now validate a user-provided compression level or select
    an algorithm-specific default, if none was provided.

  * A new interface for calculating the maximal compressed size of an
    incompressible input block was added for each driver.  The generic code
    uses it to validate against MAXPHYS as well as to allocate compression
    result buffers in the generic code.

  * Algorithm selection is now driven by a table lookup, to increase ease of
    adding other formats in the future.

mkuzip(8) gained the ability to explicitly specify a compression level with
'-C'.  The prior defaults -- 9 for zlib and 6 for lzma -- are maintained.
The new zstd default is 9, to match zlib.

Rather than select lzma or zlib with '-L' or its absense, respectively, a
new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or
'zstd'.  '-L' is considered deprecated, but will probably never be removed.

All of the new features were documented in mkuzip.8; the page was also
cleaned up slightly.

Relnotes:	yes
2019-08-13 23:32:56 +00:00
Conrad Meyer
ac8e5d02cf Remove deprecated GEOM classes
Follow-up on r322318 and r322319 and remove the deprecated modules.

Shift some now-unused kernel files into userspace utilities that incorporate
them.  Remove references to removed GEOM classes in userspace utilities.

Reviewed by:	imp (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21249
2019-08-13 20:06:55 +00:00
Xin LI
2b0cabbdae Update geom_uzip to use new zlib:
- Use new zlib headers;
 - Removed z_alloc and z_free to use the common sys/dev/zlib version.
 - Replace z_compressBound with compressBound from zlib.

While there, limit LZMA CFLAGS to apply only for g_uzip_lzma.c.

PR:		229763
Submitted by:	Yoshihiro Ota <ota j email ne jp> (with changes,
		bugs are mine)
Differential Revision:	https://reviews.freebsd.org/D20271
2019-08-08 06:27:39 +00:00
Conrad Meyer
ac03832ef3 GEOM: Reduce unnecessary log interleaving with sbufs
Similar to what was done for device_printfs in r347229.

Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an
sbuf; documented in g_bio.9.

Reviewed by:	markj
Discussed with:	rlibby
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21165
2019-08-07 19:28:35 +00:00
Kirk McKusick
e9660daffb Ignore UFS/FFS superblock check hash failures so as to allow a higher
level in the filesystem stack to decide what to do about them.

Reported by:  Peter Holm
Tested by:    Peter Holm
Sponsored by: Netflix
2019-08-06 18:28:44 +00:00
Mariusz Zaborski
4d7486c30f gnop: style nits 2019-07-31 17:51:06 +00:00
Mariusz Zaborski
4f80c85519 gnop: Introduce requests delay.
This allows to simulated disk that is responding slowly to the IO requests.

Reviewed by:	markj, bcr, pjd (previous version)
Differential Revision:	https://reviews.freebsd.org/D21052
2019-07-31 17:47:12 +00:00
Ryan Libby
9167705c8c g_mirror_taste: avoid deadlock, always clear tasting flag
If g_mirror_taste encountered an error at g_mirror_add_disk, it might
try to g_mirror_destroy the device with the G_MIRROR_DEVICE_FLAG_TASTING
flag still set.  This would wait on a worker to complete the destruction
with g_mirror_try_destroy, but that function bails out if the tasting
flag is set, resulting in a deadlock.  Clear the tasting flag before
trying to destroy the device.

Test Plan:
sysctl debug.fail_point.mnowait="1%return"
kyua test -k /usr/tests/sys/geom/class/mirror/Kyuafile

Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20744
2019-07-01 22:06:36 +00:00
Ryan Libby
3bb6e0f0c7 g_eli_create: only dec g_access acw if we inc'd it
Reviewed by:	cem, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20743
2019-07-01 22:06:16 +00:00
Warner Losh
f5a95d9a07 Remove NAND and NANDFS support
NANDFS has been broken for years. Remove it. The NAND drivers that
remain are for ancient parts that are no longer relevant. They are
polled, have terrible performance and just for ancient arm
hardware. NAND parts have evolved significantly from this early work
and little to none of it would be relevant should someone need to
update to support raw nand. This code has been off by default for
years and has violated the vnode protocol leading to panics since it
was committed.

Numerous posts to arch@ and other locations have found no actual users
for this software.

Relnotes:	Yes
No Objection From: arch@
Differential Revision: https://reviews.freebsd.org/D20745
2019-06-25 04:50:09 +00:00
Alexander Motin
49ee0fcea5 Use sbuf_cat() in GEOM confxml generation.
When it comes to megabytes of text, difference between sbuf_printf() and
sbuf_cat() becomes substantial.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-06-19 15:36:02 +00:00
Alexander Motin
5c32e9fcb2 Optimize kern.geom.conf* sysctls.
On large systems those sysctls may generate megabytes of output.  Before
this change sbuf(9) code was resizing buffer by 4KB each time many times,
generating tons of TLB shootdowns.  Unfortunately in this case existing
sbuf_new_for_sysctl() mechanism, supposed to help with this issue, is not
applicable, since all the sbuf writes are done in different kernel thread.

This change improves situation in two ways:
 - on first sysctl call, not providing any output buffer, it sets special
sbuf drain function, just counting the data and so not needing big buffer;
 - on second sysctl call it uses as initial buffer size value saved on
previous call, so that in most cases there will be no reallocation, unless
GEOM topology changed significantly.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-06-18 21:05:10 +00:00
Xin LI
f89d207279 Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR:		229763
Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision:	https://reviews.freebsd.org/D20193
2019-06-17 19:49:08 +00:00
Mariusz Zaborski
a802439365 geli: style nits 2019-06-12 19:29:48 +00:00
Mariusz Zaborski
e7630efbe6 geli: partially revert r348709
Let's change the unsigned arguments to the signed one, but let's don't
change pointers to the array notation.

Requested by:	pjd
2019-06-12 19:29:12 +00:00
Mariusz Zaborski
1808673cc4 geli: build warning fixes
Submitted by:	Aaron Prieger <aprieger@llnw.com>
Reviewed by:	sbruno
Differential Revision:	https://reviews.freebsd.org/D11068
2019-06-05 22:46:18 +00:00
Kirk McKusick
10519e1398 When using the destroy option to shut down a nop GEOM module, I/O
operations already in its queue were not being properly drained.
The GEOM framework does the queue draining, but the module needs
to wait for the draining to happen. The waiting is done by adding
a g_nop_providergone() function to wait for the I/O operations to
finish up. This change is similar to change -r345758 made to the
memory-disk driver.

Submitted by: Chuck Silvers
Tested by:    Chuck Silvers
MFC after:    1 week
Sponsored by: Netflix
2019-05-25 00:07:49 +00:00
John Baldwin
5c420aae3b Add deprecation warnings for weaker algorithms to geli(4).
- Triple DES has been formally deprecated in Kerberos (RFC 8429)
  and is soon to be deprecated in IPsec (RFC 8221).
- Blowfish is deprecated.  FreeBSD doesn't support its successor
  (Twofish).
- MD5 is generally considered a weak digest that has known attacks.

geli refuses to create new volumes using these algorithms via 'geli
init'.  It also warns when attaching to existing volumes or creating
temporary volumes via 'geli onetime' .  The plan is to fully remove
support for these algorithms in FreeBSD 13.

Note that none of these algorithms have ever been the default
algorithm used by geli(8).  Users would have had to explicitly select
these algorithms when creating volumes in the past.

Reviewed by:	cem, delphij
MFC after:	3 days
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20344
2019-05-23 22:31:55 +00:00