Commit Graph

2423 Commits

Author SHA1 Message Date
Robert Wing
65c87a6c81 geom_dev: extend kevent support for geom dev
Add support for the following NOTE events:
    NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, NOTE_READ, and NOTE_WRITE.

Differential Revision:	https://reviews.freebsd.org/D34777
2022-04-28 08:40:13 -08:00
Warner Losh
347a8e93fe g_vfs_done: Only report ENXIO once
The contract with the lower layers is that once ENXIO is reported, all
further I/O to the device is not possible. This is reported when the
device departs for good or changes in some material manner out from
underneath the system. Since the lower layers terminate all pending I/O
when this is detected with ENXIO, reporting more than one provides no
extra value. ENXIO suppression done with atomics due to race described
in e8827f4094. It's on the error path and a rare event, so this won't affect
performance.

Sponsored by:		Netflix
Reviewed by:		mckusick, kib
Differential Revision:	https://reviews.freebsd.org/D35034
2022-04-24 14:01:33 -06:00
Warner Losh
e8827f4094 g_vfs_done: Report when we switch on ENXIO conversion
On the 0 -> 1 transition of sc_enxio_active, report that we're doing
this. This is a rare, but interesting, event. Convert to using atomics
to set this field to prevent a rare race:

    In CAM, when we invalidate a device, one thread (T1) will start the
    process in error processing called from *dadone
    (cam_periph_error). This routine will queue work to xpt_async_td
    (T2) and indicate to *dadone to call biodone(ENXIO) for the bio. T2
    wakes up and basically waits to acquire the periph lock. T2 will do
    so when T1 drops the periph lock just before T1's call to
    biodone. T2 acquires the lock and calls biodone(ENXIO) on all
    pending bios. These two threads will race and we could lose the
    printf or get two in rare cases. Since we only touch sc_enxio_active
    in an error path that's infrequent, the extra atomic traffic will be
    rare but will ensure robustness.

Sponsored by:		Netflix
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D35037
2022-04-24 14:01:08 -06:00
Warner Losh
f58385f3da geom_vfs: make sc_orphaned a bool
Differential Revision:	https://reviews.freebsd.org/D35036
Sponsored by:		Netflix
2022-04-24 14:01:08 -06:00
Mark Johnston
081b4452a7 geli: Add a chicken switch for unmapped I/O
We have a report of a panic in GELI that appears to go away when
unmapped I/O is disabled.  Add a tunable to make such investigations
easier in the future.  No functional change intended.

PR:		262894
Reviewed by:	asomers
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34944
2022-04-18 17:55:24 -04:00
Mitchell Horne
0a90043e63 Remove 12.x ABI compat for kernel dump ioctls
This code was marked gone_in(14), so it can now be removed.

The only consumer of this interface is dumpon(8). We do not maintain
strict backwards compatibility for this utility because a) it
can't/shouldn't be used from a jail or chroot and b) it is highly
specific interface unique to FreeBSD. The host's (presumably more
up-to-date) copy of dumpon(8) should be used to configure kernel dump
devices.

Reviewed by:	markj, emaste
MFC after:	never
Differential Revision:	https://reviews.freebsd.org/D34914
2022-04-15 12:06:05 -03:00
Mitchell Horne
9c90bfcd31 Remove 11.x ABI compat for kernel dump ioctls
This code was marked gone_in(13), so its time has passed.

The only consumer of this interface is dumpon(8). We do not maintain
strict backwards compatibility for this utility because a) it
can't/shouldn't be used from a jail or chroot and b) it is highly
specific interface unique to FreeBSD. The host's (presumably more
up-to-date) copy of dumpon(8) should be used to configure kernel dump
devices.

Reviewed by:	markj, emaste
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D34913
2022-04-15 12:06:04 -03:00
Kirk McKusick
4710aa248b Avoid dereferencing a possibly null pointer.
Reported by: Coverity
CID:         1475868
2022-04-06 14:25:55 -07:00
Robert Wing
f3f6e0ebe9 geom_vinum: fix set but not used warnings 2022-04-04 13:23:47 -08:00
Robert Wing
8f7878e3e1 geom_eli: fix set but not used warning 2022-04-04 13:20:27 -08:00
Gordon Bergling
81ed3cae69 gpart(8): Fix two typos in source code comments
- s/partiton/partition/

MFC after:	3 days
2022-03-28 19:36:48 +02:00
Alexander Motin
7f16b501e2 GEOM: Introduce partial confxml API
Traditionally the GEOM's primary channel of information from kernel to
user-space was confxml, fetched by libgeom through kern.geom.confxml
sysctl.  It is convenient and informative, representing full state of
GEOM in a single XML document.  But problems start to arise on systems
with hundreds of disks, where the full confxml size reaches many
megabytes, taking significant time to first write it and then parse.

This patch introduces alternative solution, allowing to fetch much
smaller XML document, subset of the full confxml, limited to 64KB and
representing only one specified geom and optionally its parents.  It
uses existing GEOM control interface, extended with new "getxml" verb.
In case of any error, such as the buffer overflow, it just transparently
falls back to traditional full confxml.  This patch uses the new API in
user-space GEOM tools where it is possible.

Reviewed by:	imp
MFC after:	2 month
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D34529
2022-03-12 11:55:52 -05:00
Alexander Motin
dd7a5bc1e6 GEOM: Make G_F_CTLDUMP also dump result.
MFC after:	1 month
2022-03-07 14:41:47 -05:00
Alexander Motin
01b9c48b5d GEOM: Skip copyin() for GCTL_PARAM_WR parameters.
Kernel does not read those parameters, so copyin is pointless.

While there, replace some KASSERT()s with CTASSERT()s.

MFC after:	1 month
2022-03-07 11:12:25 -05:00
Warner Losh
094f1dc40e g_part: Allow attributes to be querried
Create g_part_getattr to allow gpart geoms to have their attributes queried.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D32782
2022-03-01 08:06:42 -07:00
Kirk McKusick
3cf2f812f5 Add casts to printf statements to keep armv6, armv7, and powerpc
builds happy.
2022-02-28 19:28:02 -08:00
Kirk McKusick
c7996ddf80 Create a new GEOM utility, gunion(8).
The gunion(8) utility is used to track changes to a read-only disk on
a writable disk. Logically, a writable disk is placed over a read-only
disk. Write requests are intercepted and stored on the writable
disk. Read requests are first checked to see if they have been
written on the top (writable disk) and if found are returned. If
they have not been written on the top disk, then they are read from
the lower disk.

The gunion(8) utility can be especially useful if you have a large
disk with a corrupted filesystem that you are unsure of how to
repair. You can use gunion(8) to place another disk over the corrupted
disk and then attempt to repair the filesystem. If the repair fails,
you can revert all the changes in the upper disk and be back to the
unchanged state of the lower disk thus allowing you to try another
approach to repairing it. If the repair is successful you can commit
all the writes recorded on the top disk to the lower disk.

Another use of the gunion(8) utility is to try out upgrades to your
system. Place the upper disk over the disk holding your filesystem
that is to be upgraded and then run the upgrade on it. If it works,
commit it; if it fails, revert the upgrade.

Further details can be found in the gunion(8) manual page.

Reviewed by: Chuck Silvers, kib (earlier version)
tested by:   Peter Holm
Differential Revision: https://reviews.freebsd.org/D32697
2022-02-28 16:36:08 -08:00
Kirk McKusick
c8cc568961 Provide an interface that allows GEOM modules to return multiple messages.
The gctl_error() function provides GEOM modules with the ability
to report only a single message. When running with the verbose
flag, commands that handle multiple devices may want to report a
message for each of the devices on which it operates. This commit
adds the gctl_msg() function that can be called multiple times
to post messages. When finished issuing messages, the application
must either call gctl_post_messages() or call gctl_error() to cause
the messages to be reported to the calling process.

Tested by:    Peter Holm
2022-02-19 21:33:02 -08:00
Kirk McKusick
85f7e9a4f0 In GEOM debugging output, show consumer for cloned and duplicated bio's.
When using bio's created by g_clone_bio() or g_duplicate_bio()
their consumer device (the device to which their I/O requests
are sent) is listed by the geom debugging facility as [unknown].
If available, this update lists the consumer associated with
the bio's parent.

MFC after:    2 weeks
Sponsored by: Netflix
2022-01-30 17:21:13 -08:00
Alexander Motin
67c58cd729 GEOM: Remove g_wait_sim.
It seems never been used since addition.
2022-01-29 22:12:43 -05:00
Alexander Motin
10ae42ccbd GEOM: Set G_CF_DIRECT_SEND/RECEIVE for taste consumers.
All I/O requests through the taste consumers are synchronous, done
with g_read_data() and without any locks held.  It makes no sense
to delegate the I/O to g_down/g_up threads.

This removes many of context switches during disk retaste.

MFC after:	2 weeks
2022-01-29 21:59:03 -05:00
Peter Jeremy
afcd121024
geom_gate: Distinguish between classes of errors
The geom_gate API provides 2 distinct paths for exchanging error
details between the kernel and the userland client: Including an error
code in the g_gate_ctl_io structure passed in the ioctl(2) call or
having the ioctl(2) call return -1 with an error code in errno. The
latter reflects errors in the ioctl(2) call itself whilst the former
reflects errors within the geom_gate instance.

The G_GATE_CMD_START ioctl blocks waiting for an I/O request to be
directed to the geom_gate instance and the wait can fail
(necessitating an error return) if the geom_gate instance is destroyed
or if the msleep(9) fails. The code previously treated both error
cases indentically: Returning ECANCELED as a geom_gate instance error
(which the ggatec treats as a fatal error).  Whilst this is the correct
behaviour if the geom_gate instance is destroyed, a msleep(9) failure
is unrelated to the geom_gate instance itself and should be reported
as an ioctl(2) "failure".  The distinction is important because
msleep(9) can return ERESTART, which means the system call should be
retried (and this will occur automatically as part of the generic
syscall return processing).

This change alters the msleep(9) handling to directly return the error
code from msleep(9), which ensures ERESTART is correctly handled,
rather than being treated as a fatal error.

Reviewed by:    Johannes Totz <jo@bruelltuete.com>
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D33996
2022-01-29 21:15:51 +11:00
Alexander Motin
29998bf2ac glabel: Set G_CF_DIRECT_SEND/RECEIVE for taste consumer.
All I/O requests through the taste consumer are synchronous, done
with g_read_data() and without any locks held.  It makes no sense
to delegate the I/O to g_down/g_up threads.

This removes many of context switches during disk retaste.

MFC after:	2 weeks
2022-01-28 14:22:41 -05:00
Alexander Motin
ffc1cc95e7 GEOM: Relax direct dispatch for GEOM threads.
The only cases when direct dispatch does not make sense is for I/O
submission from down thread and for completion from up thread.  In
all other cases, if both consumer and producer are OK about it, we
can save on context switches.

MFC after:	2 weeks
2022-01-28 14:21:21 -05:00
Alexander Motin
0d8cec7658 graid: Set G_CF_DIRECT_SEND for task consumer.
Unlike normal consumers all taste consumer I/O is synchronous, done
with g_read_data() and without any locks held.  It makes no sense to
delegate I/O submission to g_down thread.

This should remove number of context switches during disk retaste.

MFC after:	2 weeks
2022-01-28 11:09:30 -05:00
Mark Johnston
38da0c96dc geom: Assert that BIO_SPEEDUP BIOs have bio_data set to NULL
Like BIO_FLUSH, there is no reason for consumers to pass a BIO_SPEEDUP
request with non-NULL bio_data, so assert this.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-01-27 09:58:19 -05:00
Mark Johnston
a2dfffb989 shsec: Allocate data blocks only for BIO_READ/WRITE requests
In particular, there is no need to allocate a data block when passing
BIO_FLUSH requests to child providers, and g_io_request() asserts that
bp->bio_data == NULL for such requests.

PR:		255131
Reported and tested by:	nvass@gmx.com
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-01-27 09:56:07 -05:00
Andriy Gapon
5d5f44623e g_mirror: don't fail reads while losing next-to-last disk
I observed a situation where some read requests failed when a 2-way geom
mirror lost one disk.  The problem appears to be in the logic that skips
retrying a failed request when a mirror has only one active disk.
Generally, that makes sense.  But during a transition from two disks to
one it is possible that the request failed on the failing disk before it
was inactivated and, so, the remaining active disk is the disk that
should be tried.

This change adds an additional check to ensure that it was the (only)
active disk that was already tried.

Reviewed by:	mav
MFC after:	3 weeks
2022-01-27 13:22:52 +02:00
Ed Maste
9c296a2105 geom: Add HiFive boot partitions
As documented in the HiFive Unmatched Software Reference Manual.

Reviewed by:	imp, mhorne
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34010
2022-01-26 10:54:45 -05:00
Mark Johnston
d91d2b513e geom: Handle partial I/O in g_{read,write,delete}_data()
These routines are used internally by GEOM to dispatch I/O requests to a
provider, typically for tasting or for updating GEOM class metadata
blocks.

These routines assumed that partial I/O did not occur without setting
BIO_ERROR, but this is possible in at least two cases:
- Some or all of the I/O range is beyond the provider's mediasize.
  In this scenario g_io_check() truncates the bounds of the request
  before it is handed to the target provider.
- A read from vnode-backed md(4) device returns EOF (the backing vnode
  is allowed to be smaller than the device itself) or partial vnode I/O
  occurs.
In these scenarios g_read_data() could return a partially uninitialized
buffer.  Many consumers are not affected by the first case, since the
offsets used for provider metadata or tasting are relative to the
provider's mediasize, but in some cases metadata is read at fixed
offsets, such as when searching for a UFS superblock using the offsets
defined by SBLOCKSEARCH.

Thus, modify the routines to explicitly check for a non-zero residual
and return EIO in that case.  Remove a related check from the
DIOCGDELETE ioctl handler, it is handled within g_delete_data() now.

Reviewed by:	mav, imp, kib
Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31293
2022-01-20 08:29:39 -05:00
Robert Wing
a50e92cc20 geom: add kqfilter support for geom dev
The only event hooked up is NOTE_ATTRIB, which is triggered when the
device is resized. Support for other NOTE_* events to follow.

Reviewed by:	kib, jhb
Differential Revision:	https://reviews.freebsd.org/D33402
2022-01-18 10:54:59 -09:00
John Baldwin
d61effd38b Use G_ELI_IVKEYLEN as the size of IV in the user test code.
IVs are not the size of keys as a general case.  Most often they are
the size of a single block.

Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33885
2022-01-13 17:22:06 -08:00
Konstantin Belousov
9f4073d446 geom label msdosfs: sanity check BPB before using it for io request
It must be greater than zero, and be multiple of the device block size.

In collaboration with:	pho
Reviewed by:	markj, mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33721
2022-01-08 05:41:44 +02:00
Alan Somers
f284bed200 geom_gate: ensure readprov is null-terminated
With crafted input to the G_GATE_CMD_CREATE ioctl, geom_gate can be made
to print kernel memory to the system console, potentially revealing
sensitive data from whatever was previously in that memory page.

But but but: this is a case of the sys admin misconfiguring, and you'd
need root privileges to do this.

Submitted By:	Johannes Totz <jo@bruelltuete.com>
MFC after:	2 weeks
Reviewed By:	asomers
Differential Revision: https://reviews.freebsd.org/D31727
2022-01-02 18:01:23 -07:00
John Baldwin
0ff783dc15 sys/geom: Use C99 fixed-width integer types.
No functional change.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D33635
2021-12-28 09:41:51 -08:00
Alexander Motin
f4bf48c25c GEOM: Minor polishing in geom_event.
- Remove timeouts from msleep()'s.  Those should always be woken up.
 - Move wakeup() under the lock to not call on possibly freed pointer.
 - Remove some dead code.

MFC after:	2 weeks
2021-12-27 21:01:08 -05:00
Edward Tomasz Napierala
739a9c51b0 geom(4): Fix some of the "set but not used" warnings
The few I've left in place look like potential bugs.

Sponsored By:	EPSRC
2021-12-18 11:42:34 +00:00
Mateusz Guzik
8ad5b9498e Revert "geom_bde: plug set-but-not-used vars"
The commit at hand happens to break userspace build as the header ins
included by sbin/gbde/gbde.c and the __diagused macro is not provided to
userspace.

Revert until this gets sorted out.

This reverts commit 26e837e2d4.
2021-12-09 19:23:05 +00:00
Mateusz Guzik
2cc5a480a6 geom_raid3: plug set-but-not-unused vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 18:08:03 +00:00
Mateusz Guzik
c904812018 geom_eli: mostly plug set-but-not-unused vars
The remaining case is an ignored error.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 18:05:06 +00:00
Mateusz Guzik
0d81fba680 geom_mirror: plug set-but-not-unused vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 18:00:27 +00:00
Mateusz Guzik
26e837e2d4 geom_bde: plug set-but-not-used vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 17:53:48 +00:00
Scott Long
2d5d242406 Fix "set but not used" for geom
Sponsored by: Rubicon Communications, LLC ("Netgate")
2021-12-03 23:40:24 -07:00
Mitchell Horne
0d2224733e Implement GET_STACK_USAGE on remaining archs
This definition enables callers to estimate remaining space on the
kstack, and take action on it. Notably, it enables optimizations in the
GEOM and netgraph subsystems to directly dispatch work items when there
is sufficient stack space, rather than queuing them for a worker thread.

Implement it for riscv, arm, and mips. Remove the #ifdefs, so it will
not go unimplemented elsewhere.

PR:		259157
Reviewed by:	mav, kib, markj (previous version)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32580
2021-11-30 11:15:56 -04:00
Mateusz Guzik
b74fdaaf1c geom_multipath: plug set-but-not-used vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-11-25 11:31:50 +00:00
Mateusz Guzik
cb2bfd3ecb geom_journal: plug set-but-not-unused vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-11-24 21:21:59 +00:00
Alexander Motin
06bd74e1e3 GEOM: Switch g_io_deliver() locking from cp to pp.
Single provider may have multiple consumers, and locking one of consumers
is not sufficient to protect the provider.  Though the only part of the
provider this locking protects now is its statistics.

Reported by:	Arka Sharma <arka.sw1988@gmail.com>
MFC after:	2 weeks
2021-11-21 18:50:59 -05:00
Wuyang Chung
9cb485d18f geom: Remove g_class.config
g_class.config is write only, remove it.
2021-11-18 23:17:07 -07:00
Konstantin Belousov
4fdc5b8494 g_vfs_close(): vp is unused
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-11-18 05:02:59 +02:00
Konstantin Belousov
c34a5148e8 ffs: fix newly introduced LOR between mntfs vnode lock and topology lock
The mntfs vnode lock should be before topology, as established in
ffs_mountfs().  Extend the locked region in ffs_unmount().

Reported and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33013
2021-11-16 20:01:31 +02:00