Commit Graph

2386 Commits

Author SHA1 Message Date
Mateusz Guzik
8ad5b9498e Revert "geom_bde: plug set-but-not-used vars"
The commit at hand happens to break userspace build as the header ins
included by sbin/gbde/gbde.c and the __diagused macro is not provided to
userspace.

Revert until this gets sorted out.

This reverts commit 26e837e2d4.
2021-12-09 19:23:05 +00:00
Mateusz Guzik
2cc5a480a6 geom_raid3: plug set-but-not-unused vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 18:08:03 +00:00
Mateusz Guzik
c904812018 geom_eli: mostly plug set-but-not-unused vars
The remaining case is an ignored error.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 18:05:06 +00:00
Mateusz Guzik
0d81fba680 geom_mirror: plug set-but-not-unused vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 18:00:27 +00:00
Mateusz Guzik
26e837e2d4 geom_bde: plug set-but-not-used vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 17:53:48 +00:00
Scott Long
2d5d242406 Fix "set but not used" for geom
Sponsored by: Rubicon Communications, LLC ("Netgate")
2021-12-03 23:40:24 -07:00
Mitchell Horne
0d2224733e Implement GET_STACK_USAGE on remaining archs
This definition enables callers to estimate remaining space on the
kstack, and take action on it. Notably, it enables optimizations in the
GEOM and netgraph subsystems to directly dispatch work items when there
is sufficient stack space, rather than queuing them for a worker thread.

Implement it for riscv, arm, and mips. Remove the #ifdefs, so it will
not go unimplemented elsewhere.

PR:		259157
Reviewed by:	mav, kib, markj (previous version)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32580
2021-11-30 11:15:56 -04:00
Mateusz Guzik
b74fdaaf1c geom_multipath: plug set-but-not-used vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-11-25 11:31:50 +00:00
Mateusz Guzik
cb2bfd3ecb geom_journal: plug set-but-not-unused vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-11-24 21:21:59 +00:00
Alexander Motin
06bd74e1e3 GEOM: Switch g_io_deliver() locking from cp to pp.
Single provider may have multiple consumers, and locking one of consumers
is not sufficient to protect the provider.  Though the only part of the
provider this locking protects now is its statistics.

Reported by:	Arka Sharma <arka.sw1988@gmail.com>
MFC after:	2 weeks
2021-11-21 18:50:59 -05:00
Wuyang Chung
9cb485d18f geom: Remove g_class.config
g_class.config is write only, remove it.
2021-11-18 23:17:07 -07:00
Konstantin Belousov
4fdc5b8494 g_vfs_close(): vp is unused
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-11-18 05:02:59 +02:00
Konstantin Belousov
c34a5148e8 ffs: fix newly introduced LOR between mntfs vnode lock and topology lock
The mntfs vnode lock should be before topology, as established in
ffs_mountfs().  Extend the locked region in ffs_unmount().

Reported and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33013
2021-11-16 20:01:31 +02:00
Kirk McKusick
728f2c6131 Suppress UFS/FFS superblock check-hash failure messages when identifying
disk labels.

When the geom label subsystem is checking labels to discover if they
are UFS/FFS filesystems, do not print a kernel error message if a
superblock is found with a check-hash error. That issue is best
handled later if an attempt is made to actually use the filesystem.

Sponsored by: Netflix
2021-11-15 09:26:21 -08:00
Konstantin Belousov
8db7d16526 geom_vfs: lock devvp in g_vfs_close()
It is needed for g_vfs_close() invalidating the buffers.  We rely on the
vnode lock for correctness.

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32761
2021-11-13 01:00:13 +02:00
Gordon Bergling
9d2e51884e gjournal(8): Fix a typo in a source code comment
- s/writting/writing/

MFC after:	3 days
2021-11-03 17:14:00 +01:00
Warner Losh
edfbbfd541 gpart: Move MBR efimedia reporting to a separate routine
Move the efimedia reporting to g_part_mbr_efimedia and use that from
g_part_mbr_dumpconf to report it.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D32781
2021-11-02 17:09:17 -06:00
Warner Losh
e3ab141fda gpart: Move GPT efimedia reporting to a separate routine
Move the efimedia reporting to g_part_gpt_efimedia and use that from
g_part_gpt_dumpconf to report it.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D32780
2021-11-02 17:09:17 -06:00
Mateusz Guzik
627d5d1966 geli: eli data -> eli_data for consistency with other geom classes
PR:	259392
Reported by:	dewayne@heuristicsystems.com.au
MFC after:	1 week
2021-10-31 20:36:51 +00:00
Jessica Clarke
63d24336fd Fix off-by-one error in msdosfs FAT32 volume label copying
I dropped the + 1 from the other two instances in each file but failed
to do so for this one, resulting in a more egregious buffer overread
than the one I was fixing (since the read character ended up in the
output if there was space).

Reported by:	Jenkins
Fixes:	34fb1c133c ("Fix intra-object buffer overread for labeled msdosfs volumes")
2021-10-28 01:01:00 +01:00
Jessica Clarke
34fb1c133c Fix intra-object buffer overread for labeled msdosfs volumes
Volume labels, like directory entries, are padded with spaces and so
have no NUL terminator. Whilst the MIN for the dsize argument to strlcpy
ensures that the copy does not overflow the destination, strlcpy is
defined to return the number of characters in the source string,
regardless of the provided dsize, and so keeps reading until it finds a
NUL, which likely exists somewhere within the following fields, but On
CHERI with the subobject bounds enabled in the compiler this buffer
overread will be detected and trap with a bounds violation.

Found by:	CHERI
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D32579
2021-10-27 18:38:37 +01:00
Mark Johnston
f0a08fa9f5 geom_label: Add more validation for NTFS volume tasting
- Ensure that the computed MFT record size isn't negative or larger than
  maxphys before trying to read $Volume.
- Guard against truncated records in volume metadata.
- Ensure that the record length is large enough to contain the volume
  name.
- Verify that the (UTF-16-encoded) volume name's length is a multiple of
  two.

PR:		258833, 258914
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-10-04 18:15:06 -04:00
Gleb Smirnoff
b984d153e0 Don't set GELI UMA zone as UMA_ZONE_NOFREE.
That fixes memory leak on last GELI provider destroyed, introduced
in 2dbc9a388e. This patch was originally developed late 2019 and
the flag was necessary to prevent zone drainage under memory pressure.
Today, with f09cbea31a the UMA is fixed not to drain into reserves.

Discussed with:	jtl, markj
Fixes:		2dbc9a388e
PR:		258787
2021-10-01 10:31:17 -07:00
Gleb Smirnoff
2dbc9a388e Fix memory deadlock when GELI partition is used for swap.
When we get low on memory, the VM system tries to free some by swapping
pages. However, if we are so low on free pages that GELI allocations block,
then the swapout operation cannot complete. This keeps the VM system from
being able to free enough memory so the allocation can complete.

To alleviate this, keep a UMA pool at the GELI layer which is used for data
buffer allocation in the fast path, and reserve some of that memory for swap
operations. If an IO operation is a swap, then use the reserved memory. If
the allocation still fails, return ENOMEM instead of blocking.

For non-swap allocations, change the default to using M_NOWAIT. In general,
this *should* be better, since it gives upper layers a signal of the memory
pressure and a chance to manage their failure strategy appropriately. However,
a user can set the kern.geom.eli.blocking_malloc sysctl/tunable to restore
the previous M_WAITOK strategy.

Submitted by:		jtl
Reviewed by:		imp
Differential Revision:	https://reviews.freebsd.org/D24400
2021-09-28 11:23:52 -07:00
Gleb Smirnoff
c6213beff4 Add flag BIO_SWAP to mark IOs that are associated with swap.
Submitted by:		jtl
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D24400
2021-09-28 11:23:51 -07:00
Mark Johnston
5402baa5b5 g_label: Handle small sector sizes when tasting
Make sure that the provider sector size is large enough to contain a
valid label before trying to read it.  We performed this check already
for most label types, but not for several filesystem labels.

Reported by:	syzbot+f52918174cdf193ae29c@syzkaller.appspotmail.com
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-09-07 11:19:29 -04:00
Mark Johnston
9e9ba9c73d graid: Avoid tasting devices with small sector sizes
The RAID metadata parsers effectively assume a sector size of 512 bytes
or larger, but md(4) devices can be created with a sector size that's
any power of 2.  Add some seatbelts to graid tasting routines to ensure
that the requested sector(s) are large enough for the device to
plausibly contain RAID metadata.

Reported by:	syzbot+f43583c9bf8357c8b56f@syzkaller.appspotmail.com
Reported by:	syzbot+537dd9f22b91b698e161@syzkaller.appspotmail.com
Reported by:	syzbot+51509dd48871c57c6e47@syzkaller.appspotmail.com
Reported by:	syzbot+c882a31037ea2a54ff63@syzkaller.appspotmail.com
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-08-31 17:09:52 -04:00
Gordon Bergling
5bdf58e196 Fix some common typos in source code comments
- s/priviledged/privileged/
- s/funtion/function/
- s/doens't/doesn't/
- s/sychronization/synchronization/

MFC after:	3 days
2021-08-28 18:57:23 +02:00
Mark Johnston
645b7efd49 geom_disk: Add KMSAN checks
- In g_disk_start(), verify that the data to be written is initialized
  according to KMSAN shadow state.
- In g_disk_done(), verify that the block driver updated shadow state as
  expected, so as to catch sources of false positives early.

Sponsored by:	The FreeBSD Foundation
2021-08-11 16:33:41 -04:00
Alexander Motin
c2da954203 geom(4): Mark all sysctls as CTLFLAG_MPSAFE.
This code does not use Giant lock for very long time.

MFC after:	2 weeks
2021-08-10 20:18:46 -04:00
John Baldwin
419d406e4e geom_vfs: Pre-allocate event for g_vfs_destroy.
When an active g_vfs is orphaned due to an underlying disk going away
the destroy is deferred until the filesystem is unmounted in
g_vfs_done().  However, g_vfs_done() is invoked from a non-sleepable
context and cannot use M_WAITOK to allocate the event.  Instead,
allocate the event in g_vfs_orphan() and save it in the softc to be
retrieved by the last call to g_vfs_done().

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31354
2021-07-29 17:09:23 -07:00
John Baldwin
5b5d78897c Use a more specific type for geom_disk.d_event.
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D31353
2021-07-29 16:34:46 -07:00
Warner Losh
47aeda7b70 geom_disk: use a preallocated geom_event for disk destruction.
Preallocate a geom_event (using the new geom_alloc_event) when we create
a disk. When we create the disk, we're going to be in a sleepable
context, so we can always allocate this extra bit of memory. Then use
this preallocated memory to free the disk. CAM can try to free the disk
from an unsleepable context if there was I/O outstanding when the disk
was destroyted (say because the SIM said it had gone away). The I/O
context isn't sleepable. Rather than trying to invent a retry mechanism
and making sure all the other geom_disk consumers did it properly,
preallocating the event ensure that the geom_disk will be properly torn
down, even when there's memory pressure when the disk departs.

Reviewd by:		jhb
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30544
2021-07-23 18:08:52 -06:00
Warner Losh
380710a5c8 geom: create an API to allocate events, and use that storage to send them
g_alloc_event will allocate storage for an opaque event. g_post_event_ep
can use memory returned by g_alloc_event to send an event from a context
that might not be able to allocate the event. Occasionally, we can
alloate memory when we create an object, but not while we're destroy
it. This allows one to allocate at creation time memory to use when
destorying the object.

Reviewed by:		jhb
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30544
2021-07-23 18:08:45 -06:00
Jessica Clarke
e77ef47d36 geom_label: Partially reinstate old sysinstall(8) workaround
This partially reverts commit af433832f7.
Since such bogus disklabels still exist in the wild, we now probe for a
disklabel to decide whether to ignore the UFS partition or not; if there
is a label then we use the old behaviour, and if there isn't one then we
use the new behaviour.

Reviewed by:	cy, mckusick
Differential Revision:	https://reviews.freebsd.org/D31068
2021-07-21 02:51:25 +01:00
Mark Johnston
0fcafe8516 eli: Zero pad bytes that arise when certain auth algorithms are used
When authentication is configured, GELI ensures that the amount of data
per sector is a multiple of 16 bytes.  This is done in
eli_metadata_softc().  When the digest size is not a multiple of 16
bytes, this leaves some extra pad bytes at the end of every sector, and
they were not being zeroed before being written to disk.  In particular,
this happens with the HMAC/SHA1, HMAC/RIPEMD160 and HMAC/SHA384 data
authentication algorithms.

This change ensures that they are zeroed before being written to disk.

Reported by:	KMSAN
Reviewed by:	delphij, asomers
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31170
2021-07-15 12:23:04 -04:00
Mark Johnston
39552dff7b graid3: Zero the metadata block before writing
Ensure that string buffers and pad bytes are zero-filled before writing
graid3 metadata.

Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-07-13 17:46:02 -04:00
Mark Johnston
0f09ab89cc gconcat: Zero the metadata block before writing
Ensure that string buffers and pad bytes are zero-filled before writing
gconcat metadata.  Also make sure to zero the full block buffer before
encoding the metadata and writing.

Fix some style bugs in g_concat_write_metadata() while here.

Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-07-13 17:45:59 -04:00
Mark Johnston
7f053a44ae gmirror: Zero the metadata block before writing
The mirror metadata fields contain string buffers and pad bytes, neither
were being zeroed before metadata was written to disk.  Also, the
metadata structure is smaller than the sector size, and in one case
gmirror was failing to zero-fill the full buffer before writing.

Fix these problems by pre-zeroing the metadata structure and the sector
buffer.

Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-07-13 17:45:57 -04:00
Jessica Clarke
af433832f7 geom_label: Remove an old sysinstall(8) workaround
We removed sysinstall(8) back in 2011, so this workaround should be long
since unnecessary. This workaround can end up breaking cases that are
hit in the real world, such as dd'ing a small pre-built disk image to a
large partition that you intend to grow on first boot and uses a UFS
disk label for / in its /etc/fstab (as the only reliable thing a raw UFS
image can reference).

Reviewed by:	imp, mckusick
Differential Revision:	https://reviews.freebsd.org/D30825
2021-07-05 16:15:32 +01:00
Noah Bergbauer
d575e81fbc gconcat: Implement new online append feature
Implement the "gconcat append" command which can be used
to append a disk to the end of an existing gconcat device
without unmounting.

If the gconcat device is using the "automatic" method, i.e.,
stores metadata on the devices, new metadata is written
to all existing components, as well as to the newly added one.

Pull Request:	https://github.com/freebsd/freebsd-src/pull/472
Reviewed by:	imp@
2021-06-14 11:42:03 -06:00
Noah Bergbauer
56fd97660a gconcat: Add new lock to allow modifications to the disk list in preparation for online append
In addition, rename existing sc_lock to sc_append_lock

Reviewed by:		imp@
Pull Request:		https://github.com/freebsd/freebsd-src/pull/447
Sponsored by:		Netflix
2021-06-02 15:59:25 -06:00
Noah Bergbauer
e61e072f3b gconcat: Switch array to TAILQ to prepare for online append
Reviewed by:		imp@
Pull Request:		https://github.com/freebsd/freebsd-src/pull/447
Sponsored by:		Netflix
2021-06-02 15:50:27 -06:00
Alan Somers
420dbe763f gmultipath: make physpath distinct from the underlying providers'
zfsd uses a device's physical path attribute to automatically replace a
missing ZFS disk when a blank disk is inserted into the same physical
slot.  Currently gmultipath passes through its underlying providers'
physical path attribute.  That may cause zfsd to replace a missing
gmultipath provider with a newly arrived, single-path disk.  That would
be bad.

This commit fixes that problem by simply appending "/mp" to the
underlying providers' physical path, in a manner similar to what geli
already does.

Sponsored by:	Axcient
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D29941
2021-05-06 12:32:27 -06:00
Mark Johnston
2f1cfb7f63 gmirror: Pre-allocate the timeout event structure
We can't call malloc(M_WAITOK) in a callout handler.

Reviewed by:	imp
Reported by:	pho
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29223
2021-03-11 15:45:15 -05:00
Mark Johnston
68f6800ce0 opencrypto: Introduce crypto_dispatch_async()
Currently, OpenCrypto consumers can request asynchronous dispatch by
setting a flag in the cryptop.  (Currently only IPSec may do this.)   I
think this is a bit confusing: we (conditionally) set cryptop flags to
request async dispatch, and then crypto_dispatch() immediately examines
those flags to see if the consumer wants async dispatch. The flag names
are also confusing since they don't specify what "async" applies to:
dispatch or completion.

Add a new KPI, crypto_dispatch_async(), rather than encoding the
requested dispatch type in each cryptop. crypto_dispatch_async() falls
back to crypto_dispatch() if the session's driver provides asynchronous
dispatch. Get rid of CRYPTOP_ASYNC() and CRYPTOP_ASYNC_KEEPORDER().

Similarly, add crypto_dispatch_batch() to request processing of a tailq
of cryptops, rather than encoding the scheduling policy using cryptop
flags.  Convert GELI, the only user of this interface (disabled by
default) to use the new interface.

Add CRYPTO_SESS_SYNC(), which can be used by consumers to determine
whether crypto requests will be dispatched synchronously. This is just
a helper macro. Use it instead of looking at cap flags directly.

Fix style in crypto_done(). Also get rid of CRYPTO_RETW_EMPTY() and
just check the relevant queues directly. This could result in some
unnecessary wakeups but I think it's very uncommon to be using more than
one queue per worker in a given workload, so checking all three queues
is a waste of cycles.

Reviewed by:	jhb
Sponsored by:	Ampere Computing
Submitted by:	Klara, Inc.
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28194
2021-02-08 09:19:19 -05:00
Edward Tomasz Napierala
123019739c geom(4): make g_newprovider_event() return if G_P_WITHER is set
This fixes a failed assertion in scenario where the provider
disappears, disk_gone() gets called, and at the exact same
time something else closes the device node triggering a retaste.

Reviewed By:	mav
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27330
2020-12-29 14:29:59 +00:00
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Mateusz Guzik
d5127d1ae2 gbde: replace malloc_last_fail with a kludge
This facilitates removal of malloc_last_fail without really impacting
anything.
2020-11-12 20:20:57 +00:00
Mark Johnston
f44994874b ffs: Clamp BIO_SPEEDUP length
On 32-bit platforms, the computed size of the BIO_SPEEDUP requested by
softdep_request_cleanup() may be negative when assigned to bp->b_bcount,
which has type "long".

Clamp the size to LONG_MAX.  Also convert the unused g_io_speedup() to
use an off_t for the magnitude of the shortage for consistency with
softdep_send_speedup().

Reviewed by:	chs, kib
Reported by:	pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27081
2020-11-11 13:48:07 +00:00