18548 Commits

Author SHA1 Message Date
John Baldwin
470e851c4b ktls: Support asynchronous dispatch of AEAD ciphers.
KTLS OCF support was originally targeted at software backends that
used host CPU cycles to encrypt TLS records.  As a result, each KTLS
worker thread queued a single TLS record at a time and waited for it
to be encrypted before processing another TLS record.  This works well
for software backends but limits throughput on OCF drivers for
coprocessors that support asynchronous operation such as qat(4) or
ccr(4).  This change uses an alternate function (ktls_encrypt_async)
when encrypting TLS records via a coprocessor.  This function queues TLS
records for encryption and returns.  It defers the work done after a
TLS record has been encrypted (such as marking the mbufs ready) to a
callback invoked asynchronously by the coprocessor driver when a
record has been encrypted.

- Add a struct ktls_ocf_state that holds the per-request state stored
  on the stack for synchronous requests.  Asynchronous requests malloc
  this structure while synchronous requests continue to allocate this
  structure on the stack.

- Add a ktls_encrypt_async() variant of ktls_encrypt() which does not
  perform request completion after dispatching a request to OCF.
  Instead, the ktls_ocf backends invoke ktls_encrypt_cb() when a TLS
  record request completes for an asynchronous request.

- Flag AEAD software TLS sessions as async if the backend driver
  selected by OCF is an async driver.

- Pull code to create and dispatch an OCF request out of
  ktls_encrypt() into a new ktls_encrypt_one() function used by both
  ktls_encrypt() and ktls_encrypt_async().

- Pull code to "finish" the VM page shuffling for a file-backed TLS
  record into a helper function ktls_finish_noanon() used by both
  ktls_encrypt() and ktls_encrypt_cb().
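
The split between synchronous and asynchronous dispatch described above
can be modeled in userland C.  This is a minimal sketch under stated
assumptions, not the kernel code: every model_* name, the fixed-size
pending queue, and the completion counter are hypothetical.  It only
shows the two ideas from the message: per-request state lives on the
stack for synchronous requests and on the heap for asynchronous ones,
and completion work is deferred to a callback in the asynchronous case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define	MODEL_QUEUE_DEPTH	8	/* hypothetical device queue depth */

/* Hypothetical per-request state (stands in for struct ktls_ocf_state). */
struct model_state {
	int	record;		/* which TLS record this request covers */
	bool	on_heap;	/* true only for asynchronous requests */
};

static int model_completed;	/* records whose completion work has run */
static struct model_state *model_pending[MODEL_QUEUE_DEPTH];
static int model_npending;

/* Completion work shared by both paths (e.g. marking mbufs ready). */
static void
model_finish(struct model_state *st)
{
	model_completed++;
	if (st->on_heap)
		free(st);
}

/* Synchronous path: state on the stack, completion before returning. */
static void
model_encrypt_sync(int record)
{
	struct model_state st = { record, false };

	model_finish(&st);
}

/* Asynchronous path: queue the request and return without completing. */
static bool
model_encrypt_async(int record)
{
	struct model_state *st;

	if (model_npending == MODEL_QUEUE_DEPTH)
		return (false);
	st = malloc(sizeof(*st));
	if (st == NULL)
		return (false);
	st->record = record;
	st->on_heap = true;
	model_pending[model_npending++] = st;
	return (true);
}

/* Stands in for the driver invoking the callback for finished requests. */
static void
model_device_complete(void)
{
	for (int i = 0; i < model_npending; i++)
		model_finish(model_pending[i]);
	model_npending = 0;
}
```

In this model a worker can queue several records before any of them
completes, which is the throughput benefit the message describes for
coprocessor-backed drivers.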

Reviewed by:	markj
Tested on:	ccr(4) (jhb), qat(4) (markj)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D31665
2021-08-30 13:11:52 -07:00
Andrew Turner
b792434150 Create sys/reg.h for the common code previously in machine/reg.h
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preparation for adding PT_GETREGSET to ptrace(2).

Reviewed by:	imp, markj
Sponsored by:	DARPA, AFRL (original work)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19830
2021-08-30 12:50:53 +01:00
Ka Ho Ng
a58e222b3b vfs: yield in vn_deallocate_impl() loop
Yield at the end of each loop iteration if there is remaining work, as
indicated by the value of *len updated by VOP_DEALLOCATE. Without this,
when calling vop_stddeallocate to zero a large region, the
implementation only zerofills a relatively small chunk and returns.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D31705
2021-08-29 16:26:00 +08:00
Mark Johnston
7326e8589c fsetown: Avoid process group lock recursion
Restore the pre-1d874ba4f8ba behaviour of disassociating the current
SIGIO recipient before looking up the specified process or process
group.  This avoids a lock recursion in the scenario where a process
group is configured to receive SIGIO for an fd when it has already been
so configured.

Reported by:	pho
Tested by:	pho
Reviewed by:	kib
MFC after:	3 days
2021-08-28 15:50:44 -04:00
Rick Macklem
da779f262c vfs_default: Change vop_stddeallocate() from static to global
A future commit to the NFS client uses vop_stddeallocate() for
cases where the NFS server does not support a Deallocate operation.
Change vop_stddeallocate() from static to global so that it can
be called by the NFS client.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D31640
2021-08-27 18:25:44 -07:00
Konstantin Belousov
f19063ab02 vfs_hash_rehash(): require the vnode to be exclusively locked
Rehash updates v_hash.  Also, rehash moves the vnode to a different hash
bucket, which should be noticed in vfs_hash_get() after sleeping for
the vnode lock.

Reviewed by:	mckusick, rmacklem
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31464
2021-08-27 18:39:45 +03:00
Konstantin Belousov
7c1e4aab79 vfs_hash_insert: ensure that predicate is true
After vnode lock, recheck v_hash. When vfs_hash_insert() is used with
a predicate, recheck it after the selected vnode is locked. Since
vfs_hash_lock is dropped, vnode could be rehashed during the sleep for
the vnode lock, which could go unnoticed there.

Reported and tested by:	pho
Reviewed by:	mckusick, rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31464
2021-08-27 18:39:45 +03:00
Mark Johnston
091869def9 connect: Use soconnectat() unconditionally in kern_connect()
soconnect(...) is equivalent to soconnectat(AT_FDCWD, ...), so rely on
this to save a branch.  No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-08-27 08:32:07 -04:00
Mateusz Guzik
f1e2cc1c66 vfs: drop dedicated sysinit for mountlist_mtx
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-26 20:52:03 +02:00
Mateusz Guzik
0d28d014c8 vfs: refactor kern_unmount
Split unmounting by path and id in preparation for other changes.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-26 13:58:28 +02:00
Mateusz Guzik
7b2561b46b vfs: stop open-coding vfs_getvfs in kern_unmount
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-26 11:38:31 +00:00
Mark Johnston
a507a40f3b fsetown: Simplify error handling
No functional change intended.

Suggested by:	kib
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31671
2021-08-25 16:20:07 -04:00
Mark Johnston
1d874ba4f8 fsetown: Fix process lookup bugs
- pget()/pfind() will acquire the PID hash bucket locks, which are
  sleepable sx locks, but this means that the sigio mutex cannot be held
  while calling these functions.  Instead, use pget() to hold the
  process, after which we lock the sigio and proc locks, respectively.
- funsetownlst() assumes that processes cannot be registered for SIGIO
  once they have P_WEXIT set.  However, pfind() will happily return
  exiting processes, breaking the invariant.  Add an explicit check for
  P_WEXIT in fsetown() to fix this. [1]

Fixes:	f52979098d ("Fix a pair of races in SIGIO registration")
Reported by:	syzkaller [1]
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31661
2021-08-25 16:18:10 -04:00
Ka Ho Ng
9e202d036d fspacectl(2): Changes on rmsr.r_offset's minimum value returned
rmsr.r_offset now is set to rqsr.r_offset plus the number of bytes
zeroed before hitting the end-of-file. After this change rmsr.r_offset
no longer contains the EOF when the requested operation range is
completely beyond the end-of-file. Instead, in such a case, rmsr.r_offset is
equal to rqsr.r_offset.  Callers can obtain the number of bytes zeroed
by subtracting rqsr.r_offset from rmsr.r_offset.
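
The offset rule above can be checked with a small model.  This is a
sketch only: model_rmsr_offset() and model_bytes_zeroed() are
hypothetical helpers that predict the returned offset from the request
and the file size as described in this message; they do not call
fspacectl(2) and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: predict rmsr.r_offset for a deallocation request
 * starting at rq_offset with length rq_len on a file of file_size bytes.
 */
static int64_t
model_rmsr_offset(int64_t rq_offset, int64_t rq_len, int64_t file_size)
{
	assert(rq_offset >= 0 && rq_len > 0 && file_size >= 0);

	/* Range completely beyond EOF: nothing is zeroed. */
	if (rq_offset >= file_size)
		return (rq_offset);
	/* Zeroing stops at EOF if the request extends past it. */
	if (rq_len > file_size - rq_offset)
		return (file_size);
	return (rq_offset + rq_len);
}

/* Bytes zeroed, obtained the way the message says callers can. */
static int64_t
model_bytes_zeroed(int64_t rq_offset, int64_t rq_len, int64_t file_size)
{
	return (model_rmsr_offset(rq_offset, rq_len, file_size) - rq_offset);
}
```

For example, a request at offset 200 on a 100-byte file yields a
returned offset of 200 and zero bytes zeroed in this model, rather than
the EOF position.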

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D31677
2021-08-26 00:03:37 +08:00
Ka Ho Ng
5c1428d2c4 uipc_shm: Handle offset on shm_size as if it is beyond shm_size
This avoids any unnecessary work in such a case.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	markj, kib
Differential Revision:	https://reviews.freebsd.org/D31655
2021-08-24 23:49:18 +08:00
Ka Ho Ng
1eaa36523c fspacectl(2): Clarifies the return values
rmacklem@ spotted two things in the system call:
- Upon returning from a successful operation, vop_stddeallocate can
  update rmsr.r_offset to a value greater than file size. This behavior,
  although being harmless, can be confusing.
- The EINVAL return value for rqsr.r_offset + rqsr.r_len > OFF_MAX is
  undocumented.

This commit has the following changes:
- vop_stddeallocate and shm_deallocate now bound the affected area
  further by the file size.
- The EINVAL case for rqsr.r_offset + rqsr.r_len > OFF_MAX is
  documented.
- For fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9), the returned
  len is explicitly documented to be 0, and the returned offset is
  restricted to the smaller of off + len and the current file size, as
  suggested by kib@. These semantics allow callers to interact better
  with potential file size growth after the call.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D31604
2021-08-24 17:08:28 +08:00
Mateusz Guzik
b65ad70195 cache: retire cache_fast_revlookup sysctl
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-23 15:31:44 +02:00
Mateusz Guzik
7fd856ba07 vfs: s/__unused/__diagused in crossmp_*
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-23 15:23:42 +02:00
Mateusz Guzik
614faa3269 vfs: fix cache-related LOR introduced in the previous change
Reported by:	kib
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-22 16:20:07 +00:00
Thomas Munro
f30a1ae8d5 lio_listio(2): Allow LIO_READV and LIO_WRITEV.
Allow multiple vector IOs to be started with one system call.
aio_readv() and aio_writev() already used these opcodes under the
covers.  This commit makes them available to user space.

Being non-standard extensions, they're only visible if __BSD_VISIBLE is
defined, like the functions.

Reviewed by:    asomers, kib
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D31627
2021-08-22 23:00:42 +12:00
Jason A. Harmening
e81e71b0e9 Use interruptible wait for blocking recursive unmounts
Now that we allow recursive unmount attempts to be abandoned upon
exceeding the retry limit, we should avoid leaving an unkillable
thread when a synchronous unmount request was issued against the
base filesystem.

Reviewed by:	kib (earlier revision), mckusick
Differential Revision:  https://reviews.freebsd.org/D31450
2021-08-20 13:21:56 -07:00
Jason A. Harmening
a8c732f4e5 VFS: add retry limit and delay for failed recursive unmounts
A forcible unmount attempt may fail due to a transient condition, but
it may also fail due to some issue in the filesystem implementation
that will indefinitely prevent successful unmount.  In such a case,
the retry logic in the recursive unmount facility will cause the
deferred unmount taskqueue to execute constantly.

Avoid this scenario by imposing a retry limit, with a default value
of 10, beyond which the recursive unmount facility will emit a log
message and give up.  Additionally, introduce a grace period, with
a default value of 1s, between successive unmount retries on the
same mount.

Create a new sysctl node, vfs.deferred_unmount, to export the total
number of failed recursive unmount attempts since boot, and to allow
the retry limit and retry grace period to be tuned.
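
The retry policy can be sketched as a small decision function.  This is
a model under stated assumptions, not the kernel implementation: the
model_* names are hypothetical, and the exact comparison against the
limit is a guess.  Only the default values (10 retries, 1 second) and
the existence of a failure counter come from the message above.

```c
#include <assert.h>

#define	MODEL_RETRY_LIMIT	10	/* default retry limit from the message */
#define	MODEL_RETRY_DELAY_SEC	1	/* default grace period between retries */

enum model_action {
	MODEL_RETRY_AFTER_DELAY,	/* requeue after the grace period */
	MODEL_GIVE_UP			/* log a message and stop retrying */
};

/* Hypothetical per-mount bookkeeping. */
struct model_mount {
	unsigned int	retries;	/* retries already scheduled */
};

/* Failed attempts since "boot", as the new sysctl node would export. */
static unsigned long model_failed_total;

/* Decide what to do after one failed recursive unmount attempt. */
static enum model_action
model_unmount_failed(struct model_mount *mp, unsigned int limit)
{
	model_failed_total++;
	if (mp->retries >= limit)
		return (MODEL_GIVE_UP);
	mp->retries++;
	return (MODEL_RETRY_AFTER_DELAY);
}
```

With the default limit, a mount that never unmounts successfully is
retried ten times in this model and abandoned on the eleventh failure,
so the deferred unmount task no longer runs indefinitely.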

Reviewed by:	kib (earlier revision), mckusick
Differential Revision:  https://reviews.freebsd.org/D31450
2021-08-20 13:20:50 -07:00
Mateusz Guzik
5d75ffdd0c vfs: remove an unused variable from nameicap_tracker_add
Reported by cc --analyze

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-20 17:52:24 +00:00
Mateusz Guzik
dbc689cdef vfs: use vn_lock_pair to avoid establishing an ordering on mount
This fixes some of the LORs seen on mount/unmount.

A complete fix will require taking care of unmount as well.

Reviewed by:	kib
Tested by:	pho (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31611
2021-08-20 17:52:24 +00:00
Kyle Evans
d7e1bdfeba uipc: avoid circular pr_{slow,fast}timos
domain_init() gets reinvoked for each vnet on a system, so we must not
alter global state.  Practically speaking, we were creating circular
lists and tying up a softclock thread into an infinite loop.

The breakage here was most easily observed by simply creating a jail
in a new vnet and watching the system suddenly become erratic.
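
How re-running an initializer can produce a circular list is easy to
reproduce in isolation.  The sketch below is a generic illustration,
not the uipc code: the model_* types are hypothetical, and the "linked"
guard is just one possible way to make insertion idempotent, not
necessarily what this commit does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list entry standing in for a protocol's timeout hook. */
struct model_entry {
	struct model_entry *next;
	bool	linked;		/* used only by the guarded insert */
};

struct model_list {
	struct model_entry *head;
};

/* Head insertion with no guard: inserting twice links e to itself. */
static void
model_insert_unguarded(struct model_list *l, struct model_entry *e)
{
	e->next = l->head;
	l->head = e;
}

/* Idempotent variant: a second insertion of the same entry is a no-op. */
static void
model_insert_once(struct model_list *l, struct model_entry *e)
{
	if (e->linked)
		return;
	e->linked = true;
	e->next = l->head;
	l->head = e;
}

/* Floyd's cycle detection: true if walking the list never terminates. */
static bool
model_list_is_circular(const struct model_list *l)
{
	const struct model_entry *slow = l->head, *fast = l->head;

	while (fast != NULL && fast->next != NULL) {
		slow = slow->next;
		fast = fast->next->next;
		if (slow == fast)
			return (true);
	}
	return (false);
}
```

A thread that walks the unguarded list after the second insertion never
reaches the end, which matches the infinite loop described above.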

Reported by:	markj
Fixes:	e0a17c3f06 ("uipc: create dedicated lists for fast ...")
Pointy hat:	kevans
2021-08-18 12:46:54 -05:00
Kristof Provost
07edc89c39 witness: remove ifnet_rw
This lock no longer exists. It was removed in
a60100fdfc (if: Remove ifnet_rwlock, 2020-11-25)

Reviewed by:		mjg
Pointed out by:		Dheeraj Kandula <dheerajk@netapp.com>
Differential Revision:	https://reviews.freebsd.org/D31585
2021-08-18 08:51:26 +02:00
Kristof Provost
a051ca72e2 Introduce m_get3()
Introduce m_get3() which is similar to m_get2(), but can allocate up to
MJUM16BYTES bytes (m_get2() can only allocate up to MJUMPAGESIZE).
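
The difference in upper bounds can be illustrated with a size-class
picker.  This is a simplified model, not the mbuf allocator: the
model_* functions are hypothetical, the constants assume the usual
values (2 KB clusters, 4 KB pages), and the small-request case that fits
in a plain mbuf is deliberately left out.

```c
#include <assert.h>

/* Assumed values; the kernel derives these from its configuration. */
#define	MODEL_MCLBYTES		2048
#define	MODEL_MJUMPAGESIZE	4096		/* assumes 4 KB pages */
#define	MODEL_MJUM9BYTES	(9 * 1024)
#define	MODEL_MJUM16BYTES	(16 * 1024)

/* Cluster size an m_get2()-like allocator would pick, or 0 if too big. */
static int
model_get2_class(int size)
{
	if (size <= MODEL_MCLBYTES)
		return (MODEL_MCLBYTES);
	if (size <= MODEL_MJUMPAGESIZE)
		return (MODEL_MJUMPAGESIZE);
	return (0);
}

/* Same idea with the larger jumbo classes that m_get3() adds. */
static int
model_get3_class(int size)
{
	int class;

	class = model_get2_class(size);
	if (class != 0)
		return (class);
	if (size <= MODEL_MJUM9BYTES)
		return (MODEL_MJUM9BYTES);
	if (size <= MODEL_MJUM16BYTES)
		return (MODEL_MJUM16BYTES);
	return (0);
}
```

In this model a 9000-byte request fails under the m_get2()-style limit
but is served from the 9 KB class under the m_get3()-style limit.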

This simplifies the bpf improvement in f13da24715.

Suggested by:	glebius
Differential Revision:	https://reviews.freebsd.org/D31455
2021-08-18 08:48:27 +02:00
Mateusz Guzik
e0a17c3f06 uipc: create dedicated lists for fast and slow timeout callbacks
This avoids having to walk all possible protocols only to check if they
have one (the vast majority do not).

Original patch by kevans@.

Reviewed by:	kevans
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-17 21:56:05 +02:00
Mark Johnston
c4feb1ab0a sigtimedwait: Use a unique wait channel for sleeping
When a sigtimedwait(2) caller goes to sleep, it uses a wait channel of
p->p_sigacts with the proc lock as the interlock.  However, p_sigacts
can be shared between processes if a child is created with
rfork(RFSIGSHARE | RFPROC).  Thus we can end up with two threads
sleeping on the same wait channel using different locks, which is not
permitted.

Fix the problem simply by using a process-unique wait channel, following
the example of sigsuspend.  The actual wait channel value is irrelevant
here; sleeping threads are awoken using sleepq_abort().

Reported by:	syzbot+8c417afabadb50bb8827@syzkaller.appspotmail.com
Reported by:	syzbot+1d89fc2a9ef92ef64fa8@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31563
2021-08-16 15:11:15 -04:00
John Baldwin
d16cb228c1 ktls: Fix accounting for TLS 1.0 empty fragments.
TLS 1.0 empty fragment mbufs have no payload and thus m_epg_npgs is
zero.  However, these mbufs need to occupy a "unit" of space for the
purposes of M_NOTREADY tracking similar to regular mbufs.  Previously
this was done for the page count returned from ktls_frame() and passed
to ktls_enqueue() as well as the page count passed to pru_ready().

However, sbready() and mb_free_notready() only use m_epg_nrdy to
determine the number of "units" of space in an M_EXT mbuf, so when a
TLS 1.0 fragment was marked ready it would mark one unit of the next
mbuf in the socket buffer as ready as well.  To fix, set m_epg_nrdy to
1 for empty fragments.  This actually simplifies the code as now only
ktls_frame() has to handle TLS 1.0 fragments explicitly and the rest
of the KTLS functions can just use m_epg_nrdy.
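
The accounting fix can be modeled with two counters per mbuf.  This is
a sketch under stated assumptions: struct model_mbuf and the model_*
functions are hypothetical and only mimic the roles of m_epg_npgs and
m_epg_nrdy; the real sbready() logic is more involved.

```c
#include <assert.h>

/* Hypothetical stand-in for the two fields discussed above. */
struct model_mbuf {
	int	npgs;	/* payload pages (0 for a TLS 1.0 empty fragment) */
	int	nrdy;	/* "units" still not ready */
};

/* Framing step: an empty fragment still occupies one unit. */
static void
model_frame(struct model_mbuf *m, int npgs)
{
	m->npgs = npgs;
	m->nrdy = (npgs == 0) ? 1 : npgs;
}

/*
 * Mark count units ready along a chain, consuming only nrdy per mbuf.
 * Returns the number of units that could not be applied.
 */
static int
model_mark_ready(struct model_mbuf *chain, int n, int count)
{
	for (int i = 0; i < n && count > 0; i++) {
		int take = chain[i].nrdy < count ? chain[i].nrdy : count;

		chain[i].nrdy -= take;
		count -= take;
	}
	return (count);
}
```

Because the empty fragment now carries one unit of its own, marking it
ready no longer spills into the next mbuf in the chain.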

Reviewed by:	gallatin
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D31536
2021-08-16 10:42:46 -07:00
Konstantin Belousov
81b895a95b pipe_paircreate(): do not leak pipepair memory on error
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-08-16 17:08:44 +03:00
Kyle Evans
29e400e994 domain: make it safer to add domains post-domainfinalize
I can see two concerns for adding domains after domainfinalize:

1.) The slow/fast callouts have already been set up.
2.) Userland could create a socket while we're in the middle of
  initialization.

We can address #1 fairly easily by tracking whether the domain's been
initialized for at least the default vnet. There are still some concerns
about the callbacks being invoked while a vnet is in the process of
being created/destroyed, but this is a pre-existing issue that the
callbacks must coordinate anyways.

We should also address #2, but technically this has been an issue
anyways because we don't assert on post-domainfinalize additions; we
don't seem to hit it in practice.

Future work can fix that up to make sure we don't find partially
constructed domains, but care must be taken to make sure that at least,
e.g., the usages of pffindproto in ip_input.c can still find them.

Differential Revision:	https://reviews.freebsd.org/D25459
2021-08-16 00:59:56 -05:00
Kyle Evans
239aebee61 domain: give domains a chance to probe for availability
This gives any given domain a chance to indicate that it's not actually
supported on the current system. If dom_probe isn't supplied, we assume
the domain is universally applicable as most of them are. Keeping
fully-initialized and registered domains around that physically can't
work on a large majority of FreeBSD deployments is sub-optimal and leads
to errors that aren't consistent with the reality of why the socket
can't be created (e.g. ESOCKTNOSUPPORT) because such a scenario has to be
caught upon pru_attach, at which point kicking back the more-appropriate
EAFNOSUPPORT would seem weird.

The initial consumer of this will be hvsock, which is only available on
HyperV guests.

Reviewed by:	cem (earlier version), bcr (manpages)
Differential Revision:	https://reviews.freebsd.org/D25062
2021-08-16 00:59:56 -05:00
Konstantin Belousov
9446d9e88f fstatat(2): handle non-vnode file descriptors for AT_EMPTY_PATH
Set NIRES_EMPTYPATH earlier, so that use of EMPTYPATH is recorded even if
we are going to return an error.  When namei_setup() refuses to accept a
dirfd that is not of the vnode type, as indicated by an ENOTDIR error
return, fall back to kern_fstat(dirfd).

Reported by:	dchagin
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31530
2021-08-14 00:17:18 +03:00
Ka Ho Ng
454bc887f2 uipc_shm: Implements fspacectl(2) support
This implements fspacectl(2) support on shared memory objects. The
semantic of SPACECTL_DEALLOC is equivalent to clearing the backing
store and freeing the pages within the affected range. If the call
succeeds, subsequent reads on the affected range return all zero.

tests/sys/posixshm/posixshm_tests.c is expanded to include a
fspacectl(2) functional test.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kevans, kib
Differential Revision:	https://reviews.freebsd.org/D31490
2021-08-12 23:04:18 +08:00
Ka Ho Ng
a638dc4ebc vfs: Add ioflag to VOP_DEALLOCATE(9)
The addition of ioflag allows callers to pass
IO_SYNC/IO_DATASYNC/IO_DIRECT down to the file system implementation.
The vop_stddeallocate fallback implementation is updated to pass the
ioflag to the file system implementation. vn_deallocate(9) internally is
also changed to pass ioflag to the VOP_DEALLOCATE call.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D31500
2021-08-12 23:03:49 +08:00
Ka Ho Ng
c15384f896 vfs: Add get_write_ioflag helper to calculate ioflag
Converted vn_write to use this helper.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D31513
2021-08-12 17:35:34 +08:00
Dmitry Chagin
71854d9b2b fork: Remove the unnecessary spaces.
MFC after:		2 weeks
2021-08-12 11:58:17 +03:00
Dmitry Chagin
de8374df28 fork: Allow ABI to specify fork return values for child.
At least the Linux x86 ABI does not use the carry bit and expects the dx
register to be preserved. To support this, add a new sv_set_fork_retval
hook and call it from cpu_fork().

Add a short comment about touching dx in x86_set_fork_retval(), for more details
see phab comments from kib@ and imp@.

Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D31472
MFC after:		2 weeks
2021-08-12 11:45:25 +03:00
Eric van Gyzen
13a58148de netdump: send key before dump, in case dump fails
Previously, if an encrypted netdump failed, such as due to a timeout or
network failure, the key was not saved, so a partial dump was
completely useless.

Send the key first, so the partial dump can be decrypted, because even a
partial dump can be useful.

Reviewed by:	bdrewery, markj
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D31453
2021-08-11 10:54:56 -05:00
Mark Johnston
10a8e93da1 kmsan: Export kmsan_mark_mbuf() and kmsan_mark_bio()
Sponsored by:	The FreeBSD Foundation
2021-08-11 16:33:41 -04:00
Andrew Gallatin
95c51fafa4 ktls: Init reset tag task for cloned sessions
When cloning a ktls session (which is needed when we need to
switch output NICs for a NIC TLS session), we need to also
init the reset task, like we do when creating a new tls session.

Reviewed by: jhb
Sponsored by: Netflix
2021-08-11 14:06:43 -04:00
Mitchell Horne
4ccaa87f69 kdb: Handle process enumeration before procinit()
Make kdb_thr_first() and kdb_thr_next() return sane values if the
allproc list and pidhashtbl haven't been initialized yet. This can
happen if the debugger is entered very early on, for example with the
'-d' boot flag.

This allows remote gdb to attach at such a time, and fixes some ddb
commands like 'show threads'.

Be explicit about the static initialization of these variables. This
part has no functional change.

Reviewed by:	markj, imp (previous version)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D31495
2021-08-11 14:44:22 -03:00
Ka Ho Ng
4a9b832a2a vfs: Rename ioflg to ioflag in vn_deallocate
This includes a style fix around ioflag checking as well.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib, bcr
Differential Revision:	https://reviews.freebsd.org/D31505
2021-08-11 17:45:47 +08:00
Alexander Motin
67f508db84 Mark some sysctls as CTLFLAG_MPSAFE.
MFC after:	2 weeks
2021-08-10 22:18:26 -04:00
Mark Johnston
100949103a uma: Add KMSAN hooks
For now, just hook the allocation path: upon allocation, items are
marked as initialized (absent M_ZERO).  Some zones are exempted from
this when it would otherwise raise false positives.

Use kmsan_orig() to update the origin map for UMA and malloc(9)
allocations.  This allows KMSAN to print the return address when an
uninitialized UMA item is implicated in a report.  For example:
  panic: MSan: Uninitialized UMA memory from m_getm2+0x7fe

Sponsored by:	The FreeBSD Foundation
2021-08-10 21:27:54 -04:00
Mark Johnston
693c9516fa busdma: Add KMSAN integration
Sanitizer instrumentation of course cannot automatically update shadow
state when devices write to host memory.  KMSAN thus hooks into busdma,
both to update shadow state after a device write, and to verify that the
kernel does not publish uninitialized bytes to devices.

To implement this, when KMSAN is configured, each dmamap embeds a memory
descriptor describing the region currently loaded into the map.
bus_dmamap_sync() uses the operation flags to determine whether to
validate the loaded region or to mark it as initialized in the shadow
map.

Note that in cases where the amount of data written is less than the
buffer size, the entire buffer is marked initialized even when it is
not.  For example, if a NIC writes a 128B packet into a 2KB buffer, the
entire buffer will be marked initialized, but subsequent accesses past
the first 128 bytes are likely caused by bugs.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31338
2021-08-10 21:27:54 -04:00
Mark Johnston
b0f71f1bc5 amd64: Add MD bits for KMSAN
Interrupt and exception handlers must call kmsan_intr_enter() prior to
calling any C code.  This is because the KMSAN runtime maintains some
TLS in order to track initialization state of function parameters and
return values across function calls.  Then, to ensure that this state is
kept consistent in the face of asynchronous kernel-mode exceptions, the
runtime uses a stack of TLS blocks, and kmsan_intr_enter() and
kmsan_intr_leave() push and pop that stack, respectively.
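
The push/pop discipline can be sketched with a small per-thread stack.
This is an illustrative model only: the model_* names, the stack depth,
and the single shadow field are hypothetical and are not taken from the
KMSAN runtime; clearing the new block on entry is also an assumption.

```c
#include <assert.h>

#define	MODEL_TLS_DEPTH	4	/* hypothetical maximum nesting */

/* Hypothetical block tracking shadow state for call arguments. */
struct model_tls {
	int	param_shadow;
};

/* Hypothetical per-thread state: a stack of TLS blocks. */
struct model_thread {
	struct model_tls blocks[MODEL_TLS_DEPTH];
	int	depth;		/* index of the block currently in use */
};

static struct model_tls *
model_tls_current(struct model_thread *td)
{
	return (&td->blocks[td->depth]);
}

/* Handler entry: switch to a fresh block so the old one is untouched. */
static void
model_intr_enter(struct model_thread *td)
{
	assert(td->depth + 1 < MODEL_TLS_DEPTH);
	td->depth++;
	model_tls_current(td)->param_shadow = 0;
}

/* Handler exit: return to the interrupted context's block. */
static void
model_intr_leave(struct model_thread *td)
{
	assert(td->depth > 0);
	td->depth--;
}
```

State written by the handler lands in its own block, so the interrupted
code sees the same shadow values after the handler returns as before.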

Use these functions in amd64 interrupt and exception handlers.  Note
that handlers for user->kernel transitions need not be annotated.

Also ensure that trap frames pushed by the CPU and by handlers are
marked as initialized before they are used.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31467
2021-08-10 21:27:53 -04:00
Mark Johnston
8978608832 amd64: Populate the KMSAN shadow maps and integrate with the VM
- During boot, allocate PDP pages for the shadow maps.  The region above
  KERNBASE is currently not shadowed.
- Create a dummy shadow for the vm page array.  For now, this array is
  not protected by the shadow map to help reduce kernel memory usage.
- Grow shadows when growing the kernel map.
- Increase the default kernel stack size when KMSAN is enabled.  As with
  KASAN, sanitizer instrumentation appears to create stack frames large
  enough that the default value is not sufficient.
- Disable UMA's use of the direct map when KMSAN is configured.  KMSAN
  cannot validate the direct map.
- Disable unmapped I/O when KMSAN is configured.
- Lower the limit on paging buffers when KMSAN is configured.  Each
  buffer has a static MAXPHYS-sized allocation of KVA, which in turn
  eats 2*MAXPHYS of space in the shadow map.

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31295
2021-08-10 21:27:53 -04:00
Mark Johnston
5dda15adbc kern: Ensure that thread-local KMSAN state is available
Sponsored by:	The FreeBSD Foundation
2021-08-10 21:27:53 -04:00