Commit Graph

2406 Commits

Author SHA1 Message Date
Kirk McKusick
c8cc568961 Provide an interface that allows GEOM modules to return multiple messages.
The gctl_error() function provides GEOM modules with the ability
to report only a single message. When running with the verbose
flag, commands that handle multiple devices may want to report a
message for each of the devices on which it operates. This commit
adds the gctl_msg() function that can be called multiple times
to post messages. When finished issuing messages, the application
must either call gctl_post_messages() or call gctl_error() to cause
the messages to be reported to the calling process.

Tested by:    Peter Holm
2022-02-19 21:33:02 -08:00
Kirk McKusick
85f7e9a4f0 In GEOM debugging output, show consumer for cloned and duplicated bio's.
When using bio's created by g_clone_bio() or g_duplicate_bio()
their consumer device (the device to which their I/O requests
are sent) is listed by the geom debugging facility as [unknown].
If available, this update lists the consumer associated with
the bio's parent.

MFC after:    2 weeks
Sponsored by: Netflix
2022-01-30 17:21:13 -08:00
Alexander Motin
67c58cd729 GEOM: Remove g_wait_sim.
It seems never been used since addition.
2022-01-29 22:12:43 -05:00
Alexander Motin
10ae42ccbd GEOM: Set G_CF_DIRECT_SEND/RECEIVE for taste consumers.
All I/O requests through the taste consumers are synchronous, done
with g_read_data() and without any locks held.  It makes no sense
to delegate the I/O to g_down/g_up threads.

This removes many of context switches during disk retaste.

MFC after:	2 weeks
2022-01-29 21:59:03 -05:00
Peter Jeremy
afcd121024
geom_gate: Distinguish between classes of errors
The geom_gate API provides 2 distinct paths for exchanging error
details between the kernel and the userland client: Including an error
code in the g_gate_ctl_io structure passed in the ioctl(2) call or
having the ioctl(2) call return -1 with an error code in errno. The
latter reflects errors in the ioctl(2) call itself whilst the former
reflects errors within the geom_gate instance.

The G_GATE_CMD_START ioctl blocks waiting for an I/O request to be
directed to the geom_gate instance and the wait can fail
(necessitating an error return) if the geom_gate instance is destroyed
or if the msleep(9) fails. The code previously treated both error
cases indentically: Returning ECANCELED as a geom_gate instance error
(which the ggatec treats as a fatal error).  Whilst this is the correct
behaviour if the geom_gate instance is destroyed, a msleep(9) failure
is unrelated to the geom_gate instance itself and should be reported
as an ioctl(2) "failure".  The distinction is important because
msleep(9) can return ERESTART, which means the system call should be
retried (and this will occur automatically as part of the generic
syscall return processing).

This change alters the msleep(9) handling to directly return the error
code from msleep(9), which ensures ERESTART is correctly handled,
rather than being treated as a fatal error.

Reviewed by:    Johannes Totz <jo@bruelltuete.com>
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D33996
2022-01-29 21:15:51 +11:00
Alexander Motin
29998bf2ac glabel: Set G_CF_DIRECT_SEND/RECEIVE for taste consumer.
All I/O requests through the taste consumer are synchronous, done
with g_read_data() and without any locks held.  It makes no sense
to delegate the I/O to g_down/g_up threads.

This removes many of context switches during disk retaste.

MFC after:	2 weeks
2022-01-28 14:22:41 -05:00
Alexander Motin
ffc1cc95e7 GEOM: Relax direct dispatch for GEOM threads.
The only cases when direct dispatch does not make sense is for I/O
submission from down thread and for completion from up thread.  In
all other cases, if both consumer and producer are OK about it, we
can save on context switches.

MFC after:	2 weeks
2022-01-28 14:21:21 -05:00
Alexander Motin
0d8cec7658 graid: Set G_CF_DIRECT_SEND for task consumer.
Unlike normal consumers all taste consumer I/O is synchronous, done
with g_read_data() and without any locks held.  It makes no sense to
delegate I/O submission to g_down thread.

This should remove number of context switches during disk retaste.

MFC after:	2 weeks
2022-01-28 11:09:30 -05:00
Mark Johnston
38da0c96dc geom: Assert that BIO_SPEEDUP BIOs have bio_data set to NULL
Like BIO_FLUSH, there is no reason for consumers to pass a BIO_SPEEDUP
request with non-NULL bio_data, so assert this.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-01-27 09:58:19 -05:00
Mark Johnston
a2dfffb989 shsec: Allocate data blocks only for BIO_READ/WRITE requests
In particular, there is no need to allocate a data block when passing
BIO_FLUSH requests to child providers, and g_io_request() asserts that
bp->bio_data == NULL for such requests.

PR:		255131
Reported and tested by:	nvass@gmx.com
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-01-27 09:56:07 -05:00
Andriy Gapon
5d5f44623e g_mirror: don't fail reads while losing next-to-last disk
I observed a situation where some read requests failed when a 2-way geom
mirror lost one disk.  The problem appears to be in the logic that skips
retrying a failed request when a mirror has only one active disk.
Generally, that makes sense.  But during a transition from two disks to
one it is possible that the request failed on the failing disk before it
was inactivated and, so, the remaining active disk is the disk that
should be tried.

This change adds an additional check to ensure that it was the (only)
active disk that was already tried.

Reviewed by:	mav
MFC after:	3 weeks
2022-01-27 13:22:52 +02:00
Ed Maste
9c296a2105 geom: Add HiFive boot partitions
As documented in the HiFive Unmatched Software Reference Manual.

Reviewed by:	imp, mhorne
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34010
2022-01-26 10:54:45 -05:00
Mark Johnston
d91d2b513e geom: Handle partial I/O in g_{read,write,delete}_data()
These routines are used internally by GEOM to dispatch I/O requests to a
provider, typically for tasting or for updating GEOM class metadata
blocks.

These routines assumed that partial I/O did not occur without setting
BIO_ERROR, but this is possible in at least two cases:
- Some or all of the I/O range is beyond the provider's mediasize.
  In this scenario g_io_check() truncates the bounds of the request
  before it is handed to the target provider.
- A read from vnode-backed md(4) device returns EOF (the backing vnode
  is allowed to be smaller than the device itself) or partial vnode I/O
  occurs.
In these scenarios g_read_data() could return a partially uninitialized
buffer.  Many consumers are not affected by the first case, since the
offsets used for provider metadata or tasting are relative to the
provider's mediasize, but in some cases metadata is read at fixed
offsets, such as when searching for a UFS superblock using the offsets
defined by SBLOCKSEARCH.

Thus, modify the routines to explicitly check for a non-zero residual
and return EIO in that case.  Remove a related check from the
DIOCGDELETE ioctl handler, it is handled within g_delete_data() now.

Reviewed by:	mav, imp, kib
Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31293
2022-01-20 08:29:39 -05:00
Robert Wing
a50e92cc20 geom: add kqfilter support for geom dev
The only event hooked up is NOTE_ATTRIB, which is triggered when the
device is resized. Support for other NOTE_* events to follow.

Reviewed by:	kib, jhb
Differential Revision:	https://reviews.freebsd.org/D33402
2022-01-18 10:54:59 -09:00
John Baldwin
d61effd38b Use G_ELI_IVKEYLEN as the size of IV in the user test code.
IVs are not the size of keys as a general case.  Most often they are
the size of a single block.

Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33885
2022-01-13 17:22:06 -08:00
Konstantin Belousov
9f4073d446 geom label msdosfs: sanity check BPB before using it for io request
It must be greater than zero, and be multiple of the device block size.

In collaboration with:	pho
Reviewed by:	markj, mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33721
2022-01-08 05:41:44 +02:00
Alan Somers
f284bed200 geom_gate: ensure readprov is null-terminated
With crafted input to the G_GATE_CMD_CREATE ioctl, geom_gate can be made
to print kernel memory to the system console, potentially revealing
sensitive data from whatever was previously in that memory page.

But but but: this is a case of the sys admin misconfiguring, and you'd
need root privileges to do this.

Submitted By:	Johannes Totz <jo@bruelltuete.com>
MFC after:	2 weeks
Reviewed By:	asomers
Differential Revision: https://reviews.freebsd.org/D31727
2022-01-02 18:01:23 -07:00
John Baldwin
0ff783dc15 sys/geom: Use C99 fixed-width integer types.
No functional change.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D33635
2021-12-28 09:41:51 -08:00
Alexander Motin
f4bf48c25c GEOM: Minor polishing in geom_event.
- Remove timeouts from msleep()'s.  Those should always be woken up.
 - Move wakeup() under the lock to not call on possibly freed pointer.
 - Remove some dead code.

MFC after:	2 weeks
2021-12-27 21:01:08 -05:00
Edward Tomasz Napierala
739a9c51b0 geom(4): Fix some of the "set but not used" warnings
The few I've left in place look like potential bugs.

Sponsored By:	EPSRC
2021-12-18 11:42:34 +00:00
Mateusz Guzik
8ad5b9498e Revert "geom_bde: plug set-but-not-used vars"
The commit at hand happens to break userspace build as the header ins
included by sbin/gbde/gbde.c and the __diagused macro is not provided to
userspace.

Revert until this gets sorted out.

This reverts commit 26e837e2d4.
2021-12-09 19:23:05 +00:00
Mateusz Guzik
2cc5a480a6 geom_raid3: plug set-but-not-unused vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 18:08:03 +00:00
Mateusz Guzik
c904812018 geom_eli: mostly plug set-but-not-unused vars
The remaining case is an ignored error.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 18:05:06 +00:00
Mateusz Guzik
0d81fba680 geom_mirror: plug set-but-not-unused vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 18:00:27 +00:00
Mateusz Guzik
26e837e2d4 geom_bde: plug set-but-not-used vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 17:53:48 +00:00
Scott Long
2d5d242406 Fix "set but not used" for geom
Sponsored by: Rubicon Communications, LLC ("Netgate")
2021-12-03 23:40:24 -07:00
Mitchell Horne
0d2224733e Implement GET_STACK_USAGE on remaining archs
This definition enables callers to estimate remaining space on the
kstack, and take action on it. Notably, it enables optimizations in the
GEOM and netgraph subsystems to directly dispatch work items when there
is sufficient stack space, rather than queuing them for a worker thread.

Implement it for riscv, arm, and mips. Remove the #ifdefs, so it will
not go unimplemented elsewhere.

PR:		259157
Reviewed by:	mav, kib, markj (previous version)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32580
2021-11-30 11:15:56 -04:00
Mateusz Guzik
b74fdaaf1c geom_multipath: plug set-but-not-used vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-11-25 11:31:50 +00:00
Mateusz Guzik
cb2bfd3ecb geom_journal: plug set-but-not-unused vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-11-24 21:21:59 +00:00
Alexander Motin
06bd74e1e3 GEOM: Switch g_io_deliver() locking from cp to pp.
Single provider may have multiple consumers, and locking one of consumers
is not sufficient to protect the provider.  Though the only part of the
provider this locking protects now is its statistics.

Reported by:	Arka Sharma <arka.sw1988@gmail.com>
MFC after:	2 weeks
2021-11-21 18:50:59 -05:00
Wuyang Chung
9cb485d18f geom: Remove g_class.config
g_class.config is write only, remove it.
2021-11-18 23:17:07 -07:00
Konstantin Belousov
4fdc5b8494 g_vfs_close(): vp is unused
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-11-18 05:02:59 +02:00
Konstantin Belousov
c34a5148e8 ffs: fix newly introduced LOR between mntfs vnode lock and topology lock
The mntfs vnode lock should be before topology, as established in
ffs_mountfs().  Extend the locked region in ffs_unmount().

Reported and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33013
2021-11-16 20:01:31 +02:00
Kirk McKusick
728f2c6131 Suppress UFS/FFS superblock check-hash failure messages when identifying
disk labels.

When the geom label subsystem is checking labels to discover if they
are UFS/FFS filesystems, do not print a kernel error message if a
superblock is found with a check-hash error. That issue is best
handled later if an attempt is made to actually use the filesystem.

Sponsored by: Netflix
2021-11-15 09:26:21 -08:00
Konstantin Belousov
8db7d16526 geom_vfs: lock devvp in g_vfs_close()
It is needed for g_vfs_close() invalidating the buffers.  We rely on the
vnode lock for correctness.

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32761
2021-11-13 01:00:13 +02:00
Gordon Bergling
9d2e51884e gjournal(8): Fix a typo in a source code comment
- s/writting/writing/

MFC after:	3 days
2021-11-03 17:14:00 +01:00
Warner Losh
edfbbfd541 gpart: Move MBR efimedia reporting to a separate routine
Move the efimedia reporting to g_part_mbr_efimedia and use that from
g_part_mbr_dumpconf to report it.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D32781
2021-11-02 17:09:17 -06:00
Warner Losh
e3ab141fda gpart: Move GPT efimedia reporting to a separate routine
Move the efimedia reporting to g_part_gpt_efimedia and use that from
g_part_gpt_dumpconf to report it.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D32780
2021-11-02 17:09:17 -06:00
Mateusz Guzik
627d5d1966 geli: eli data -> eli_data for consistency with other geom classes
PR:	259392
Reported by:	dewayne@heuristicsystems.com.au
MFC after:	1 week
2021-10-31 20:36:51 +00:00
Jessica Clarke
63d24336fd Fix off-by-one error in msdosfs FAT32 volume label copying
I dropped the + 1 from the other two instances in each file but failed
to do so for this one, resulting in a more egregious buffer overread
than the one I was fixing (since the read character ended up in the
output if there was space).

Reported by:	Jenkins
Fixes:	34fb1c133c ("Fix intra-object buffer overread for labeled msdosfs volumes")
2021-10-28 01:01:00 +01:00
Jessica Clarke
34fb1c133c Fix intra-object buffer overread for labeled msdosfs volumes
Volume labels, like directory entries, are padded with spaces and so
have no NUL terminator. Whilst the MIN for the dsize argument to strlcpy
ensures that the copy does not overflow the destination, strlcpy is
defined to return the number of characters in the source string,
regardless of the provided dsize, and so keeps reading until it finds a
NUL, which likely exists somewhere within the following fields, but On
CHERI with the subobject bounds enabled in the compiler this buffer
overread will be detected and trap with a bounds violation.

Found by:	CHERI
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D32579
2021-10-27 18:38:37 +01:00
Mark Johnston
f0a08fa9f5 geom_label: Add more validation for NTFS volume tasting
- Ensure that the computed MFT record size isn't negative or larger than
  maxphys before trying to read $Volume.
- Guard against truncated records in volume metadata.
- Ensure that the record length is large enough to contain the volume
  name.
- Verify that the (UTF-16-encoded) volume name's length is a multiple of
  two.

PR:		258833, 258914
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-10-04 18:15:06 -04:00
Gleb Smirnoff
b984d153e0 Don't set GELI UMA zone as UMA_ZONE_NOFREE.
That fixes memory leak on last GELI provider destroyed, introduced
in 2dbc9a388e. This patch was originally developed late 2019 and
the flag was necessary to prevent zone drainage under memory pressure.
Today, with f09cbea31a the UMA is fixed not to drain into reserves.

Discussed with:	jtl, markj
Fixes:		2dbc9a388e
PR:		258787
2021-10-01 10:31:17 -07:00
Gleb Smirnoff
2dbc9a388e Fix memory deadlock when GELI partition is used for swap.
When we get low on memory, the VM system tries to free some by swapping
pages. However, if we are so low on free pages that GELI allocations block,
then the swapout operation cannot complete. This keeps the VM system from
being able to free enough memory so the allocation can complete.

To alleviate this, keep a UMA pool at the GELI layer which is used for data
buffer allocation in the fast path, and reserve some of that memory for swap
operations. If an IO operation is a swap, then use the reserved memory. If
the allocation still fails, return ENOMEM instead of blocking.

For non-swap allocations, change the default to using M_NOWAIT. In general,
this *should* be better, since it gives upper layers a signal of the memory
pressure and a chance to manage their failure strategy appropriately. However,
a user can set the kern.geom.eli.blocking_malloc sysctl/tunable to restore
the previous M_WAITOK strategy.

Submitted by:		jtl
Reviewed by:		imp
Differential Revision:	https://reviews.freebsd.org/D24400
2021-09-28 11:23:52 -07:00
Gleb Smirnoff
c6213beff4 Add flag BIO_SWAP to mark IOs that are associated with swap.
Submitted by:		jtl
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D24400
2021-09-28 11:23:51 -07:00
Mark Johnston
5402baa5b5 g_label: Handle small sector sizes when tasting
Make sure that the provider sector size is large enough to contain a
valid label before trying to read it.  We performed this check already
for most label types, but not for several filesystem labels.

Reported by:	syzbot+f52918174cdf193ae29c@syzkaller.appspotmail.com
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-09-07 11:19:29 -04:00
Mark Johnston
9e9ba9c73d graid: Avoid tasting devices with small sector sizes
The RAID metadata parsers effectively assume a sector size of 512 bytes
or larger, but md(4) devices can be created with a sector size that's
any power of 2.  Add some seatbelts to graid tasting routines to ensure
that the requested sector(s) are large enough for the device to
plausibly contain RAID metadata.

Reported by:	syzbot+f43583c9bf8357c8b56f@syzkaller.appspotmail.com
Reported by:	syzbot+537dd9f22b91b698e161@syzkaller.appspotmail.com
Reported by:	syzbot+51509dd48871c57c6e47@syzkaller.appspotmail.com
Reported by:	syzbot+c882a31037ea2a54ff63@syzkaller.appspotmail.com
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-08-31 17:09:52 -04:00
Gordon Bergling
5bdf58e196 Fix some common typos in source code comments
- s/priviledged/privileged/
- s/funtion/function/
- s/doens't/doesn't/
- s/sychronization/synchronization/

MFC after:	3 days
2021-08-28 18:57:23 +02:00
Mark Johnston
645b7efd49 geom_disk: Add KMSAN checks
- In g_disk_start(), verify that the data to be written is initialized
  according to KMSAN shadow state.
- In g_disk_done(), verify that the block driver updated shadow state as
  expected, so as to catch sources of false positives early.

Sponsored by:	The FreeBSD Foundation
2021-08-11 16:33:41 -04:00
Alexander Motin
c2da954203 geom(4): Mark all sysctls as CTLFLAG_MPSAFE.
This code does not use Giant lock for very long time.

MFC after:	2 weeks
2021-08-10 20:18:46 -04:00