Commit Graph

2351 Commits

Author SHA1 Message Date
Mark Johnston
0fcafe8516 eli: Zero pad bytes that arise when certain auth algorithms are used
When authentication is configured, GELI ensures that the amount of data
per sector is a multiple of 16 bytes.  This is done in
eli_metadata_softc().  When the digest size is not a multiple of 16
bytes, this leaves some extra pad bytes at the end of every sector, and
they were not being zeroed before being written to disk.  In particular,
this happens with the HMAC/SHA1, HMAC/RIPEMD160 and HMAC/SHA384 data
authentication algorithms.

This change ensures that they are zeroed before being written to disk.

Reported by:	KMSAN
Reviewed by:	delphij, asomers
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31170
2021-07-15 12:23:04 -04:00
Mark Johnston
39552dff7b graid3: Zero the metadata block before writing
Ensure that string buffers and pad bytes are zero-filled before writing
graid3 metadata.

Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-07-13 17:46:02 -04:00
Mark Johnston
0f09ab89cc gconcat: Zero the metadata block before writing
Ensure that string buffers and pad bytes are zero-filled before writing
gconcat metadata.  Also make sure to zero the full block buffer before
encoding the metadata and writing.

Fix some style bugs in g_concat_write_metadata() while here.

Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-07-13 17:45:59 -04:00
Mark Johnston
7f053a44ae gmirror: Zero the metadata block before writing
The mirror metadata fields contain string buffers and pad bytes, neither
were being zeroed before metadata was written to disk.  Also, the
metadata structure is smaller than the sector size, and in one case
gmirror was failing to zero-fill the full buffer before writing.

Fix these problems by pre-zeroing the metadata structure and the sector
buffer.

Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-07-13 17:45:57 -04:00
Jessica Clarke
af433832f7 geom_label: Remove an old sysinstall(8) workaround
We removed sysinstall(8) back in 2011, so this workaround should be long
since unnecessary. This workaround can end up breaking cases that are
hit in the real world, such as dd'ing a small pre-built disk image to a
large partition that you intend to grow on first boot and uses a UFS
disk label for / in its /etc/fstab (as the only reliable thing a raw UFS
image can reference).

Reviewed by:	imp, mckusick
Differential Revision:	https://reviews.freebsd.org/D30825
2021-07-05 16:15:32 +01:00
Noah Bergbauer
d575e81fbc gconcat: Implement new online append feature
Implement the "gconcat append" command which can be used
to append a disk to the end of an existing gconcat device
without unmounting.

If the gconcat device is using the "automatic" method, i.e.,
stores metadata on the devices, new metadata is written
to all existing components, as well as to the newly added one.

Pull Request:	https://github.com/freebsd/freebsd-src/pull/472
Reviewed by:	imp@
2021-06-14 11:42:03 -06:00
Noah Bergbauer
56fd97660a gconcat: Add new lock to allow modifications to the disk list in preparation for online append
In addition, rename existing sc_lock to sc_append_lock

Reviewed by:		imp@
Pull Request:		https://github.com/freebsd/freebsd-src/pull/447
Sponsored by:		Netflix
2021-06-02 15:59:25 -06:00
Noah Bergbauer
e61e072f3b gconcat: Switch array to TAILQ to prepare for online append
Reviewed by:		imp@
Pull Request:		https://github.com/freebsd/freebsd-src/pull/447
Sponsored by:		Netflix
2021-06-02 15:50:27 -06:00
Alan Somers
420dbe763f gmultipath: make physpath distinct from the underlying providers'
zfsd uses a device's physical path attribute to automatically replace a
missing ZFS disk when a blank disk is inserted into the same physical
slot.  Currently gmultipath passes through its underlying providers'
physical path attribute.  That may cause zfsd to replace a missing
gmultipath provider with a newly arrived, single-path disk.  That would
be bad.

This commit fixes that problem by simply appending "/mp" to the
underlying providers' physical path, in a manner similar to what geli
already does.

Sponsored by:	Axcient
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D29941
2021-05-06 12:32:27 -06:00
Mark Johnston
2f1cfb7f63 gmirror: Pre-allocate the timeout event structure
We can't call malloc(M_WAITOK) in a callout handler.

Reviewed by:	imp
Reported by:	pho
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29223
2021-03-11 15:45:15 -05:00
Mark Johnston
68f6800ce0 opencrypto: Introduce crypto_dispatch_async()
Currently, OpenCrypto consumers can request asynchronous dispatch by
setting a flag in the cryptop.  (Currently only IPSec may do this.)   I
think this is a bit confusing: we (conditionally) set cryptop flags to
request async dispatch, and then crypto_dispatch() immediately examines
those flags to see if the consumer wants async dispatch. The flag names
are also confusing since they don't specify what "async" applies to:
dispatch or completion.

Add a new KPI, crypto_dispatch_async(), rather than encoding the
requested dispatch type in each cryptop. crypto_dispatch_async() falls
back to crypto_dispatch() if the session's driver provides asynchronous
dispatch. Get rid of CRYPTOP_ASYNC() and CRYPTOP_ASYNC_KEEPORDER().

Similarly, add crypto_dispatch_batch() to request processing of a tailq
of cryptops, rather than encoding the scheduling policy using cryptop
flags.  Convert GELI, the only user of this interface (disabled by
default) to use the new interface.

Add CRYPTO_SESS_SYNC(), which can be used by consumers to determine
whether crypto requests will be dispatched synchronously. This is just
a helper macro. Use it instead of looking at cap flags directly.

Fix style in crypto_done(). Also get rid of CRYPTO_RETW_EMPTY() and
just check the relevant queues directly. This could result in some
unnecessary wakeups but I think it's very uncommon to be using more than
one queue per worker in a given workload, so checking all three queues
is a waste of cycles.

Reviewed by:	jhb
Sponsored by:	Ampere Computing
Submitted by:	Klara, Inc.
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28194
2021-02-08 09:19:19 -05:00
Edward Tomasz Napierala
123019739c geom(4): make g_newprovider_event() return if G_P_WITHER is set
This fixes a failed assertion in scenario where the provider
disappears, disk_gone() gets called, and at the exact same
time something else closes the device node triggering a retaste.

Reviewed By:	mav
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27330
2020-12-29 14:29:59 +00:00
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Mateusz Guzik
d5127d1ae2 gbde: replace malloc_last_fail with a kludge
This facilitates removal of malloc_last_fail without really impacting
anything.
2020-11-12 20:20:57 +00:00
Mark Johnston
f44994874b ffs: Clamp BIO_SPEEDUP length
On 32-bit platforms, the computed size of the BIO_SPEEDUP requested by
softdep_request_cleanup() may be negative when assigned to bp->b_bcount,
which has type "long".

Clamp the size to LONG_MAX.  Also convert the unused g_io_speedup() to
use an off_t for the magnitude of the shortage for consistency with
softdep_send_speedup().

Reviewed by:	chs, kib
Reported by:	pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27081
2020-11-11 13:48:07 +00:00
Warner Losh
a3f4217ec0 Remove frontstuff
Nothing implements this in the tree. Remove the ioctl and the
conversion to the geom atttribute stuff.

This was introduced in r94287 in 2002 and was retired in r113390
2003. It appeared in FreeBSD 5.0, but no other releases. This is a
vestige that was missed at the time and overlooked until now. No
compat is provided for this reason.  And there's no implementation of
it today. And it was never part of a release from a stable branch.

Reviewed by: phk@
Differential Revision: https://reviews.freebsd.org/D26967
2020-10-27 06:43:24 +00:00
Alexander Motin
8b220f8915 Fix asymmetry in devstat(9) calls by GEOM.
Before this GEOM passed bio pointer to transaction start, but not end.
It was irrelevant until devstat(9) got DTrace hooks, that appeared to
provide bio pointer on I/O completion, but not on submission.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2020-10-24 21:07:10 +00:00
Robert Wing
a2b559df1e geom_ctl.c: remove stale header files
- Remove "opt_geom.h", no kernel options are used.

- Remove <sys/sysctl.h>, no sysctl functionality is used here.

- Remove <sys/bio.h>, requirements for bio moved out in r112534.

- Remove <sys/lock.h> and <sys/mutex.h>, last used by DROP_GIANT() and
  PICKUP_GIANT(), which were removed in r115624.

- Remove <sys/disk.h> and <sys/kernel.h>, not used.

Reviewed by: phk, kevans (mentor)
Approved by: phk, kevans (mentor)
Differential Revision: https://reviews.freebsd.org/D26805
2020-10-20 20:59:13 +00:00
Edward Tomasz Napierala
3001e97deb Fix fallout from r366811.
PR:		250442
Reported by:	lwhsu
Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26855
2020-10-19 20:26:37 +00:00
Edward Tomasz Napierala
d22ff249d9 Make g_attach() return ENXIO for orphaned providers; update various
classes to add missing error checking.

Reviewed by:	imp
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26658
2020-10-18 16:24:08 +00:00
Warner Losh
bc683a89a3 Move kernel env global variables, etc to sys/kenv.h
The kernel globals for kenv are confined to 2 files that need them and
a few that likely shouldn't (but as written the code does). Move them
from sys/systm.h to sys/kenv.h. This removed a XXX from systm.h and
cleans it up a little bit...
2020-10-07 06:16:37 +00:00
Eugene Grosbein
b2b5d4c07d geom_part: make it possible recovering broken GPT after some LBAs cut off
This is followup to r365477.

If pre-formatted device has GPT and a partition covering
last available LBAs and the device is attached using
a bridge reducing amount of LBAs, then it could be not enough
forcing GEOM to use primary GPT. Also, we should make it possible
to recover GPT and this requires either deleting or resizing the partition.

This change enables "gpart delete" and "gpart resize" commands
on corrupted GPT with following "gpart recover".

It still does not allow modifying corrupted GPT without
preliminary setting sysctl kern.geom.part.check_integrity=0

For example:

# gpart show da0
=>        34  3906963389  da0  GPT  (1.8T) [CORRUPT]
          34      262144    1  ms-reserved  (128M)
      262178        2014       - free -  (1.0M)
      264192  3906764943    2  freebsd-swap  (1.8T)
# gpart resize -i 2 -s 3900000000 da0
# gpart recover da0

Reported by:	Alex Korchmar
MFC after:	3 days
2020-09-17 04:39:39 +00:00
Warner Losh
0c97af56a7 We don't need the sc_ekeys_lock in standalone environment.
When we bring in geli into the boot loader, we are single threaded so
we don't have to worry about locking. We have no mutexes, and don't need
to use them, so comment it out.

MFC After: 3 days
2020-09-14 23:51:14 +00:00
Edward Tomasz Napierala
60f083efe2 Move TDP_GEOM check from userret() to ast(); this code path is quite
infrequent.

Reviewed by:	kib
No objections:	mav
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26374
2020-09-14 10:14:03 +00:00
Eugene Grosbein
cea05ed9a9 geom_part: extend kern.geom.part.check_integrity to work on GPT
There are multiple USB/SATA bridges on the market that unconditionally
cut some LBAs off connected media. This could be a problem
for pre-partitioned drives so GEOM complains and does not create
devices in /dev for slices/partitions preventing access to existing data.

We have kern.geom.part.check_integrity that allows us to correct
partitioning if changed from default 1 to 0 but it works for MBR only.
If backup copy of GPT is unavailable due to decreases number of LBAs,
kernel still does not give access to partitions and prints to dmesg:

GEOM: md0: corrupt or invalid GPT detected.
GEOM: md0: GPT rejected -- may not be recoverable.

This change makes it work for GPT too, so it created partitions in /dev
and prints to dmesg this instead:

GEOM: md0: the secondary GPT table is corrupt or invalid.
GEOM: md0: using the primary only -- recovery suggested.

Then "gpart recover" re-created backup copy of GPT
and allows further manipulations with partitions.

This change is no-op for default configuration having
kern.geom.part.check_integrity=1

Reported by:	Alex Korchmar
MFC after:	3 days.
2020-09-08 22:23:53 +00:00
Mateusz Guzik
d40bc60752 geom: clean up empty lines in .c and .h files 2020-09-01 22:14:09 +00:00
Warner Losh
887611b122 Retire devctl_notify_f()
devctl_notify_f isn't needed, so retire it. The flags argument is now
unused, so rather than keep it around, retire it. Convert all old
users of it to devctl_notify(). This path no longer sleeps, so is safe
to call from any context. Since it doesn't sleep, it doesn't need to
know if it is OK to sleep or not.

Reviewed by: markj@
Differential Revision: https://reviews.freebsd.org/D26140
2020-08-29 04:30:06 +00:00
Alan Somers
7d874f0f36 geli: use unmapped I/O
Use unmapped I/O for geli. Unlike most geom providers, geli needs to
manipulate data on every read or write. Previously it would always map bios.

On my 16-core, dual socket server using geli atop md(4) devices, with 512B
sectors, this change increases geli IOPs by about 3x.

Note that geli still can't use unmapped I/O when data integrity verification
is enabled (but it could, with a little more work).  And it can't use
unmapped I/O in combination with ZFS, because ZFS uses mapped bios.

Reviewed by:	markj, kib, jhb, mjg, mat, bcr (manpages)
MFC after:	1 week
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D25671
2020-08-26 02:44:35 +00:00
Warner Losh
773e541e8d Use devctl.h instead of bus.h to reduce newbus pollution.
There's no need for these parts of the kernel to know about newbus,
so narrow what is included to devctl.h for device_notify_*.

Suggested by: kib@
2020-08-21 00:03:24 +00:00
Conrad Meyer
cb1480f8d4 gpart(8): Recognize apple-zfs and solaris-reserved partition ids
Introduce G_PART_ALIAS_SOLARIS_RESERVED, GPT_ENT_TYPE_SOLARIS_RESERVED et al.,
to make gpart show output more convenient on systems with illumos/openindiana
disks visible.

Submitted by:	Juraj Lutter <otis AT sk.FreeBSD.org>
Reviewed by:	bcr(manpages), delphij, myself
Differential Revision:	https://reviews.freebsd.org/D26012
2020-08-17 17:07:05 +00:00
John Baldwin
e2bbd168ad Fix indentation. 2020-07-27 16:31:21 +00:00
Xin LI
a450ecfdbd gctl_get_geom: Skip validation of g_class.
The caller from kernel is expected to provide an valid g_class
pointer, instead of traversing the global g_class list, just
use that pointer directly instead.

Reviewed by:		mav
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D25811
2020-07-26 22:30:55 +00:00
Xin LI
178d88fa39 geom_map and geom_redboot: Remove unused ctlreq handler.
The two classes do not take any verbs and always gctl_error for
all requests, so don't bother to provide a ctlreq handler.

Reviewed by:		mav
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D25810
2020-07-26 22:30:01 +00:00
Xin LI
7201590bbf Use snprintf instead of sprintf.
MFC after:	2 weeks
2020-07-26 01:45:26 +00:00
Xin LI
795c5f365e geom_label: Make glabel labels more trivial by separating the tasting
routines out.

While there, also simplify the creation of label paths a little bit
by requiring the / suffix for label directory prefixes (ld_dir renamed
to ld_dirprefix to indicate the change) and stop defining macros for
these when they are only used once.

Reviewed by:		cem
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D25597
2020-07-26 00:44:59 +00:00
Xin LI
fcf69f3dbc Consistently use gctl_get_provider instead of home-grown variants.
Reviewed by:		cem, imp
MFC after:		2 weeks
Differential revision:	https://reviews.freebsd.org/D25739
2020-07-22 02:15:21 +00:00
Xin LI
0ab851aac3 gctl_get_class, gctl_get_geom and gctl_get_provider: provide feedback
when the requested argument is missing.

Reviewed by:		cem
MFC after:		2 weeks
Differential revision:	https://reviews.freebsd.org/D25738
2020-07-22 02:14:27 +00:00
Alan Somers
aafaa8b794 Fix geli's null cipher, and add a test case
PR:		247954
Submitted by:	jhb (sys), asomers (tests)
Reviewed by:	jhb (tests), asomers (sys)
MFC after:	2 weeks
Sponsored by:	Axcient
2020-07-21 19:18:29 +00:00
Xin LI
82b17c8e91 Fix indent for if clause.
MFC after:	2 weeks
2020-07-20 01:55:19 +00:00
Xin LI
b23a7fbaab g_concat_find_device: trim /dev/ if it is present, like other GEOM
classes.

Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25596
2020-07-09 08:00:46 +00:00
Xin LI
8510f61acd sys/geom: consistently use _PATH_DEV instead of hardcoding "/dev/".
Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25565
2020-07-09 02:52:39 +00:00
Alan Somers
6f818c1fb0 geli: enable direct dispatch
geli does all of its crypto operations in a separate thread pool, so
g_eli_start, g_eli_read_done, and g_eli_write_done don't actually do very
much work. Enabling direct dispatch eliminates the g_up/g_down bottlenecks,
doubling IOPs on my system. This change does not affect the thread pool.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D25587
2020-07-08 17:12:12 +00:00
Conrad Meyer
64612d4e44 geom(4): Kill GEOM_PART_EBR_COMPAT option
Take advantage of Warner's nice new real GEOM aliasing system and use it for
aliased partition names that actually work.

Our canonical EBR partition name is the weird, not-default-on-x86-prior-to-
this-revision "da1p4+00001234."  However, if compatibility mode (tunable
kern.geom.part.ebr.compat_aliases) is enabled (1, default), we continue to
provide the alias names like "da1p5" in addition to the weird canonical
names.

Naming partition providers was just one aspect of the COMPAT knob; in
addition it limited mutability, in part because it did not preserve existing
EBR header content aside from that of LBA 0.  This change saves the EBR
header for LBA 0, as well as for every EBR partition encountered.  That way,
when we write out the EBR partition table on modification, we can restore
any bootloader or other metadata in both LBA0 (the first data-containing EBR
may start after 0) as well as every logical EBR we read from the disk, and
only update the geometry metadata and linked list pointers that describe the
actual partitioning.

(This change does not add support for the 'bootcode' verb to EBR.)

PR:		232463
Reported by:	Manish Jain <bourne.identity AT hotmail.com>
Discussed with:	ae (no objection)
Relnotes:	maybe
Differential Revision:	https://reviews.freebsd.org/D24939
2020-07-01 02:16:36 +00:00
John Baldwin
6572e5ff66 Use explicit_bzero() instead of bzero() for sensitive data.
Reviewed by:	delphij
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25441
2020-06-25 20:25:35 +00:00
John Baldwin
b172f23dd7 Use zfree() instead of bzero() and free().
These bzero's should have been explicit_bzero's.

Reviewed by:	cem, delphij
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25437
2020-06-25 20:20:22 +00:00
John Baldwin
4a711b8d04 Use zfree() instead of explicit_bzero() and free().
In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().

Suggested by:	cem
Reviewed by:	cem, delphij
Approved by:	csprng (cem)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25435
2020-06-25 20:17:34 +00:00
Kirk McKusick
9407f25df2 Optimize g_journal's superblock update by noting that the summary
information is neither read nor written so it need not be written
out when updating the superblock.

PR:           247425
Sponsored by: Netflix
2020-06-23 21:44:00 +00:00
Baptiste Daroussin
5b990a9463 Revert r362466
Such change should not have happen without prior discussion and review.

With hat:	transitioning core
2020-06-22 07:46:24 +00:00
Hans Petter Selasky
7747001b12 Improve wording to be more precise and clear.
No functional change intended.

s/Master Boot/Main Boot/ (also called MBR)

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-21 13:34:08 +00:00
Kirk McKusick
34816cb9ae Move the pointers stored in the superblock into a separate
fs_summary_info structure. This change was originally done
by the CheriBSD project as they need larger pointers that
do not fit in the existing superblock.

This cleanup of the superblock eases the task of the commit
that immediately follows this one.

Suggested by: brooks
Reviewed by:  kib
PR:           246983
Sponsored by: Netflix
2020-06-19 01:02:53 +00:00