Commit Graph

136893 Commits

Author SHA1 Message Date
Edward Tomasz Napierala
1b11173c00 linux: extend the LINUX_O_ constants to make room for O_PATH
No functional changes.

Sponsored By:	EPSRC
2021-04-15 15:04:44 +01:00
Konstantin Belousov
e3d6759585 b_vflags update requries bufobj lock
The trunc_dependencies() issue was reported by	Alexander Lochmann
<alexander.lochmann@tu-dortmund.de>, who found the problem by performing
lock analysis using LockDoc, see https://doi.org/10.1145/3302424.3303948.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-15 15:47:42 +03:00
Konstantin Belousov
bbf7a4e878 O_PATH: allow vnode kevent filter on such files
if VREAD access is checked as allowed during open

Requested by:	wulf
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:49:18 +03:00
Konstantin Belousov
f9b923af34 O_PATH: Allow to open symlink
When O_NOFOLLOW is specified, namei() returns the symlink itself.  In
this case, open(O_PATH) should be allowed, to denote the location of symlink
itself.

Prevent O_EXEC in this case, execve(2) code is not ready to try to execute
symlinks.

Reported by:	wulf
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:49:09 +03:00
Konstantin Belousov
a5970a529c Make files opened with O_PATH to not block non-forced unmount
by only keeping hold count on the vnode, instead of the use count.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:48:27 +03:00
Konstantin Belousov
8d9ed174f3 open(2): Implement O_PATH
Reviewed by:	markj
Tested by:	pho
Discussed with:	walker.aj325_gmail.com, wulf
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:48:24 +03:00
Konstantin Belousov
509124b626 Add AT_EMPTY_PATH for several *at(2) syscalls
It is currently allowed to fchownat(2), fchmodat(2), fchflagsat(2),
utimensat(2), fstatat(2), and linkat(2).

For linkat(2), PRIV_VFS_FHOPEN privilege is required to exercise the flag.
It allows to link any open file.

Requested by:	trasz
Tested by:	pho, trasz
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29111
2021-04-15 12:48:11 +03:00
Konstantin Belousov
d51b4b0aac AT_RESOLVE_BENEATH is bsd-specific
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29111
2021-04-15 12:48:02 +03:00
Konstantin Belousov
437c241d0c vfs_vnops.c: Make vn_statfile() non-static
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:47:56 +03:00
Konstantin Belousov
42be0a7b10 Style.
Add missed spaces, wrap long lines.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:47:46 +03:00
Mateusz Guzik
4f0279e064 cache: extend mismatch vnode assert print to include the name 2021-04-15 07:55:43 +00:00
Kirk McKusick
14d0cd7225 Ensure that the mount command shows "with quotas" when quotas are enabled.
When quotas are enabled with the quotaon(8) command, it sets the
MNT_QUOTA flag in the mount structure mnt_flag field. The mount
structure holds a cached copy of the filesystem statfs structure
in mnt_stat that includes a copy of the mnt_flag field in
mnt_stat.f_flags. The mnt_stat structure may not be updated for
hours. Since the mount command requests mount details using the
MNT_NOWAIT option, it gets the mount's mnt_stat statfs structure
whose f_flags field does not yet show the MNT_QUOTA flag being set
in mnt_flag.

The fix is to have quotaon(8) set the MNT_QUOTA flag in both mnt_flag
and in mnt_stat.f_flags so that it will be immediately visible to
callers of statfs(2).

Reported by:  Christos Chatzaras
Tested by:    Christos Chatzaras
PR:           254682
MFC after:    3 days
Sponsored by: Netflix
2021-04-14 15:25:08 -07:00
Vladimir Kondratyev
8e84712d01 hidmap: add missing opt_hid.h to module Makefile
Reported by:	pstef
MFC after:	2 weeks
2021-04-14 23:05:59 +03:00
Mark Johnston
aabe13f145 uma: Introduce per-domain reclamation functions
Make it possible to reclaim items from a specific NUMA domain.

- Add uma_zone_reclaim_domain() and uma_reclaim_domain().
- Permit parallel reclamations.  Use a counter instead of a flag to
  synchronize with zone_dtor().
- Use the zone lock to protect cache_shrink() now that parallel reclaims
  can happen.
- Add a sysctl that can be used to trigger reclamation from a specific
  domain.

Currently the new KPIs are unused, so there should be no functional
change.

Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29685
2021-04-14 13:03:34 -04:00
Mark Johnston
54f421f9e8 uma: Split bucket_cache_drain() to permit per-domain reclamation
Note that the per-domain variant does not shrink the target bucket size.

No functional change intended.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-04-14 13:03:34 -04:00
Mark Johnston
29bb6c19f0 domainset: Define additional global policies
Add global definitions for first-touch and interleave policies.  The
former may be useful for UMA, which implements a similar policy without
using domainset iterators.

No functional change intended.

Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29104
2021-04-14 13:03:33 -04:00
Emmanuel Vadot
0c80ad2dc6 arm: Add no-cftconvert for sdma-imx6 files
Fixes a warning when building kernel:
ctfconvert: file.c: Couldn't read ehdr: Invalid argument

MFC after:	3 days
2021-04-14 15:43:37 +02:00
Martin Matuska
6db169e920 zfs: merge openzfs/zfs@3522f57b6 (master)
Notable upstream pull request merges:
  #11742 When specifying raidz vdev name, parity count should match
  #11744 Use a helper function to clarify gang block size
  #11771 Support running FreeBSD buildworld on Arm-based macOS hosts

This is the last update that will be MFCed into stable/13.

From now on, the tracking of OpenZFS branches will be different:
- main continues tracking openzfs/zfs/master
- stable/13 is going to track openzfs/zfs/zfs-2.1-release

Obtained from:	OpenZFS
MFC after:	1 week
2021-04-14 12:51:51 +02:00
Vladimir Kondratyev
6678e75e4f pchtherm: Add IDs for CannonLake-H, CometLake and Lewisburg controllers
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2021-04-14 13:15:19 +03:00
Konstantin Belousov
75c5cf7a72 filt_timerexpire: avoid process lock recursion
Found by:	syzkaller
Reported and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29746
2021-04-14 10:53:28 +03:00
Konstantin Belousov
5cc1d19941 realtimer_expire: avoid proc lock recursion when called from itimer_proc_continue()
It is fine to drop the process lock there, process cannot exit until its
timers are cleared.

Found by:	syzkaller
Reported and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29746
2021-04-14 10:53:19 +03:00
Konstantin Belousov
5edf7227ec pseudofs: limit writes to 1M
Noted and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29752
2021-04-14 10:23:21 +03:00
Konstantin Belousov
116f26f947 sbuf_uionew(): sbuf_new() takes int as length
and length should be not less than SBUF_MINSIZE

Reported and tested by:	pho
Noted and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29752
2021-04-14 10:23:20 +03:00
Vladimir Kondratyev
fb451895fb ichsmb: Add PCI ID for Intel Gemini Lake SMBus controller
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2021-04-14 03:58:07 +03:00
Navdeep Parhar
d107ee06f3 cxgbe(4): RSS hash for VXLAN traffic is computed from the inner frame.
Sponsored by:	Chelsio Communications
2021-04-13 16:50:12 -07:00
John Baldwin
774c4c82ff TOE: Use a read lock on the PCB for syncache_add().
Reviewed by:	np, glebius
Fixes:		08d9c92027
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29739
2021-04-13 16:31:04 -07:00
Tai-hwa Liang
d9b61e7153 if_firewire: fixing panic upon packet reception for VNET build
netisr_dispatch_src() needs valid VNET pointer or firewire_input() will panic
when receiving a packet.

Reviewed by:	glebius
MFC after:	2 weeks
2021-04-13 22:59:58 +00:00
Mark Johnston
06a53ecf24 malloc: Add state transitions for KASAN
- Reuse some REDZONE bits to keep track of the requested and allocated
  sizes, and use that to provide red zones.
- As in UMA, disable memory trashing to avoid unnecessary CPU overhead.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29461
2021-04-13 17:42:21 -04:00
Mark Johnston
f1c3adefd9 execve: Mark exec argument buffers
We cache mapped execve argument buffers to avoid the overhead of TLB
shootdowns.  Mark them invalid when they are freed to the cache.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29460
2021-04-13 17:42:21 -04:00
Mark Johnston
b261bb4057 vfs: Add KASAN state transitions for vnodes
vnodes are a bit special in that they may exist on per-CPU lists even
while free.  Add a KASAN-only destructor that poisons regions of each
vnode that are not expected to be accessed after a free.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29459
2021-04-13 17:42:21 -04:00
Mark Johnston
2b914b85dd kmem: Add KASAN state transitions
Memory allocated with kmem_* is unmapped upon free, so KASAN doesn't
provide a lot of benefit, but since allocations are always a multiple of
the page size we can create a redzone when the allocation request size
is not a multiple of the page size.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29458
2021-04-13 17:42:21 -04:00
Mark Johnston
244f3ec642 kstack: Add KASAN state transitions
We allocate kernel stacks using a UMA cache zone.  Cache zones have
KASAN disabled by default, but in this case it makes sense to enable it.

Reviewed by:	andrew
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D29457
2021-04-13 17:42:21 -04:00
Mark Johnston
09c8cb717d uma: Add KASAN state transitions
- Add a UMA_ZONE_NOKASAN flag to indicate that items from a particular
  zone should not be sanitized.  This is applied implicitly for NOFREE
  and cache zones.
- Add KASAN call backs which get invoked:
  1) when a slab is imported into a keg
  2) when an item is allocated from a zone
  3) when an item is freed to a zone
  4) when a slab is freed back to the VM

  In state transitions 1 and 3, memory is poisoned so that accesses will
  trigger a panic.  In state transitions 2 and 4, memory is marked
  valid.
- Disable trashing if KASAN is enabled.  It just adds extra CPU overhead
  to catch problems that are detected by KASAN.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29456
2021-04-13 17:42:21 -04:00
Mark Johnston
f115c06121 amd64: Add MD bits for KASAN
- Initialize KASAN before executing SYSINITs.
- Add a GENERIC-KASAN kernel config, akin to GENERIC-KCSAN.
- Increase the kernel stack size if KASAN is enabled.  Some of the
  ASAN instrumentation increases stack usage and it's enough to
  trigger stack overflows in ZFS.
- Mark the trapframe as valid in interrupt handlers if it is
  assigned to td_intr_frame.  Otherwise, an interrupt in a function
  which creates a poisoned alloca region can trigger false positives.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29455
2021-04-13 17:42:20 -04:00
Mark Johnston
6faf45b34b amd64: Implement a KASAN shadow map
The idea behind KASAN is to use a region of memory to track the validity
of buffers in the kernel map.  This region is the shadow map.  The
compiler inserts calls to the KASAN runtime for every emitted load
and store, and the runtime uses the shadow map to decide whether the
access is valid.  Various kernel allocators call kasan_mark() to update
the shadow map.

Since the shadow map tracks only accesses to the kernel map, accesses to
other kernel maps are not validated by KASAN.  UMA_MD_SMALL_ALLOC is
disabled when KASAN is configured to reduce usage of the direct map.
Currently we have no mechanism to completely eliminate uses of the
direct map, so KASAN's coverage is not comprehensive.

The shadow map uses one byte per eight bytes in the kernel map.  In
pmap_bootstrap() we create an initial set of page tables for the kernel
and preloaded data.

When pmap_growkernel() is called, we call kasan_shadow_map() to extend
the shadow map.  kasan_shadow_map() uses pmap_kasan_enter() to allocate
memory for the shadow region and map it.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29417
2021-04-13 17:42:20 -04:00
Mark Johnston
38da497a4d Add the KASAN runtime
KASAN enables the use of LLVM's AddressSanitizer in the kernel.  This
feature makes use of compiler instrumentation to validate memory
accesses in the kernel and detect several types of bugs, including
use-after-frees and out-of-bounds accesses.  It is particularly
effective when combined with test suites or syzkaller.  KASAN has high
CPU and memory usage overhead and so is not suited for production
environments.

The runtime and pmap maintain a shadow of the kernel map to store
information about the validity of memory mapped at a given kernel
address.

The runtime implements a number of functions defined by the compiler
ABI.  These are prefixed by __asan.  The compiler emits calls to
__asan_load*() and __asan_store*() around memory accesses, and the
runtime consults the shadow map to determine whether a given access is
valid.

kasan_mark() is called by various kernel allocators to update state in
the shadow map.  Updates to those allocators will come in subsequent
commits.

The runtime also defines various interceptors.  Some low-level routines
are implemented in assembly and are thus not amenable to compiler
instrumentation.  To handle this, the runtime implements these routines
on behalf of the rest of the kernel.  The sanitizer implementation
validates memory accesses manually before handing off to the real
implementation.

The sanitizer in a KASAN-configured kernel can be disabled by setting
the loader tunable debug.kasan.disable=1.

Obtained from:	NetBSD
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29416
2021-04-13 17:42:20 -04:00
Mark Johnston
01028c736c Add a KASAN option to the kernel build
LLVM support for enabling KASAN has not yet landed so the option is not
yet usable, but hopefully this will change soon.

Reviewed by:	imp, andrew
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29454
2021-04-13 17:42:20 -04:00
Mitchell Horne
5742f2d89c arm64: adjust comments in dbg_monitor_exit()
These comments were copied from dbg_monitor_enter(), but the intended
modifications weren't made. Update them to reflect what this code
actually does.

MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2021-04-13 14:41:31 -03:00
Mitchell Horne
a2a8b582bd arm64: clear debug registers after execve(2)
This is both intuitive and required, as any previous breakpoint settings
may not be applicable to the new process.

Reported by:	arichardson
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29672
2021-04-13 14:41:03 -03:00
Konstantin Belousov
8cca7b7f28 nfs client: depend on xdr
Since 7763814fc9 nfsrpc_setclient() uses mem_alloc() that is macro
around malloc(M_RPC).  M_RPC is provided by xdr.ko.

Reviewed by:	rmacklem
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2021-04-13 18:04:43 +03:00
Edward Tomasz Napierala
ca6e1fa3ce linux: adjust ordering of Linux auxv and add dummy AT_HWCAP2
This should be a no-op; the purpose of this is to reduce
a spurious difference between Linuxulator and Linux, to make
debugging core dumps slightly easier.

Note that AT_HWCAP2 we pass to Linux binaries is always 0,
instead of being equal to 'cpu_feature2'.  This matches what
I've observed under Ubuntu Focal VM.

Reviewed By:	chuck, dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29609
2021-04-13 13:14:30 +01:00
Kurosawa Takahiro
2aa21096c7 pf: Implement the NAT source port selection of MAP-E Customer Edge
MAP-E (RFC 7597) requires special care for selecting source ports
in NAT operation on the Customer Edge because a part of bits of the port
numbers are used by the Border Relay to distinguish another side of the
IPv4-over-IPv6 tunnel.

PR:		254577
Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D29468
2021-04-13 10:53:18 +02:00
John Baldwin
45d5c28439 cxgbe: Ignore doomed virtual interfaces when updating the clip table.
A doomed VI does not have a valid ifnet.

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29662
2021-04-12 14:36:40 -07:00
John Baldwin
76681661be OCF: Remove support for asymmetric cryptographic operations.
There haven't been any non-obscure drivers that supported this
functionality and it has been impossible to test to ensure that it
still works.  The only known consumer of this interface was the engine
in OpenSSL < 1.1.  Modern OpenSSL versions do not include support for
this interface as it was not well-documented.

Reviewed by:	cem
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29736
2021-04-12 14:28:43 -07:00
John Baldwin
89df484739 iscsi: Kick threads out of iscsi_ioctl() during unload.
iscsid can be sleeping in iscsi_ioctl() causing the destroy_dev() to
sleep forever if iscsi.ko is unloaded while iscsid is running.

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	mav
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29688
2021-04-12 13:58:21 -07:00
John Baldwin
568e69e4eb cxgbe: Add counters for iSCSI PDUs transmitted via TOE.
Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29297
2021-04-12 13:57:45 -07:00
Warner Losh
662053e8dc hptrr: Move to using .o files
Use .o files directly. Replace the .o.uu files that we uudecode with .o files.
Adjust the kernel and module build to cope.

Suggestions by:		markj@, emaste@
Sposnored by:           Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D29636
2021-04-12 13:47:55 -06:00
Warner Losh
fddb3f4d7d hptmv: use .o files directly
uudecode the .o.uu files and commit directly to the tree. Adjust the build
infrastructure to cope with the new location, both for the kernel and modules.

Sposnored by:           Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D29635
2021-04-12 13:47:55 -06:00
Warner Losh
550cb4ab85 hpt27xx: store the .o files directly in the tree
Store the .o files directly in the tree. We no longer need to play uuencode
games like we did in the CVS days. Adjust the build infrastructure to match.

Reviewed by:            markj@
Sposnored by:           Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D29634
2021-04-12 13:47:55 -06:00
Warner Losh
5b20c5e1f8 hptnr: Store the .o files directly in the repo
We no longer need to use uuencode to uuencode files in our tree.  Store the .o
file directly instead. Adjust the build to cope with the new arrangement.

Suggestions by:		emaste, bz, donner
Reviewed by:		markm
Sposnored by:		Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D29632
2021-04-12 13:47:55 -06:00