Commit Graph

135701 Commits

Author SHA1 Message Date
Mitchell Horne
818390ce0c arm64: fix early devmap assertion
The purpose of this KASSERT is to ensure that we do not run out of space
in the early devmap. However, the devmap grew beyond its initial size of
2MB in r336519, and this assertion did not grow with it.

A devmap mapping of a 1080p framebuffer requires 1920x1080 bytes, or
1.977 MB, so it is just barely able to fit without triggering the
assertion, provided no other devices are mapped before it. With the
addition of `options GDB` in GENERIC by bbfa199cbc, the uart is now
mapped for the purposes of a debug port, before mapping the framebuffer.
The presence of both these conditions pushes the selected virtual
address just below the threshold, triggering the assertion.

To fix this, use the correct size of the devmap, defined by
PMAP_MAPDEV_EARLY_SIZE. Since this code is shared with RISC-V, define
it for that platform as well (although it is a different size).
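
The arithmetic above can be checked with a short sketch. It assumes that "2MB" means 2 MiB and takes the message's figure of one byte per pixel at face value; the macro and function names are hypothetical and are not taken from the pmap code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: the initial early devmap size of "2MB" is 2 MiB. */
#define	SKETCH_DEVMAP_INITIAL_SIZE	((size_t)2 * 1024 * 1024)

/* Size in bytes of a framebuffer mapping at the given bytes per pixel. */
static size_t
fb_mapping_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
	assert(bytes_per_pixel > 0);
	return (width * height * bytes_per_pixel);
}

/* Space left in a region of `limit` bytes once `used` bytes are mapped. */
static size_t
devmap_headroom(size_t limit, size_t used)
{
	assert(used <= limit);
	return (limit - used);
}
```

Under these assumptions, fb_mapping_bytes(1920, 1080, 1) leaves only a few pages of headroom below the 2 MiB mark, which is why one extra early mapping is enough to matter.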

PR:		25241
Reported by:	gbe
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2021-01-13 17:27:44 -04:00
John Baldwin
074a91f746 Enable accelerated AES-XTS software crypto in GENERIC.
In particular, using GELI on a root filesystem will only use
accelerated software crypto drivers if they are available before the
root filesystem is mounted.  While these modules can be loaded from
the loader, including them in GENERIC provides a better out-of-the-box
experience for users.

Both aesni(4) and armv8crypto(4) provide accelerated implementations
of the default cipher used by GELI (AES-XTS) in addition to other
ciphers.

Reviewed by:	mhorne, allanjude, markj
Differential Revision:	https://reviews.freebsd.org/D28100
2021-01-13 13:13:01 -08:00
Kristof Provost
ea36212bf5 pf: Don't hold PF_RULES_WLOCK during copyin() on DIOCRCLRTSTATS
We cannot hold a non-sleepable lock during copyin(). This means we can't
safely count the table, so instead we fall back to the pf_ioctl_maxcount
used in other ioctls to protect against overly large requests.

Reported by:	syzbot+81e380344d4a6c37d78a@syzkaller.appspotmail.com
MFC after:	1 week
2021-01-13 19:49:42 +01:00
Emmanuel Vadot
6003bf9290 dwwdt: Add PNP info for the driver 2021-01-13 18:43:51 +01:00
Emmanuel Vadot
0a05676b44 Add driver for Synopsys Designware Watchdog timer.
This driver supports some arm and arm64 boards equipped with
"snps,dw-wdt"-compatible watchdog device.
Tested on RK3399-based board (RockPro64).
Once started, the watchdog device cannot be stopped.
The interrupt handler has a mode that kicks the watchdog even when software
does not do so properly.
This can be controlled via sysctl: dev.dwwdt.prevent_restart.
Also, the driver handles system shutdown and prevents a restart when the
system is asked to reboot.

Submitted by:	kjopek@gmail.com
Differential Revision:	https://reviews.freebsd.org/D26761
2021-01-13 18:43:47 +01:00
Andrew Turner
63c858a04d Switch the arm64 pcpu to a global register variable
This removes an unneeded instruction to move the pointer from x18 to a
temporary register.

Reviewed by:	emaste
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D26971
2021-01-13 16:36:52 +00:00
Andrew Turner
594389d1de Create a stack frame when needed in the arm64 kernel
When building the arm64 kernel for use with dtrace or hwpmc we need
to include a stack frame so they can extract a stack trace.

As with amd64 also build a stack frame in modules.

Sponsored by:	Innovate UK
2021-01-13 16:36:52 +00:00
Konstantin Belousov
f9d85a0821 Revert "x86 busdma_bounce: do not make assumptions about alignment of malloc(9) results."
This reverts commit 8f54940f01.
The free needs to be called on the address returned by malloc,
not the realigned address.
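
A minimal userland sketch of the rule stated above, assuming a hypothetical wrapper struct; this is not the busdma_bounce code. The pointer returned by malloc() is kept next to the realigned pointer so that free() always receives the former.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: remembers what malloc() returned. */
struct aligned_buf {
	void	*base;		/* returned by malloc(); the only valid free() argument */
	void	*aligned;	/* base rounded up to the requested alignment */
};

static int
aligned_buf_alloc(struct aligned_buf *b, size_t size, size_t align)
{
	/* The alignment must be a non-zero power of two. */
	assert(align != 0 && (align & (align - 1)) == 0);
	/* Over-allocate so that rounding up stays inside the allocation. */
	b->base = malloc(size + align - 1);
	if (b->base == NULL)
		return (-1);
	b->aligned = (void *)(((uintptr_t)b->base + align - 1) &
	    ~((uintptr_t)align - 1));
	return (0);
}

static void
aligned_buf_free(struct aligned_buf *b)
{
	free(b->base);		/* never b->aligned */
	b->base = b->aligned = NULL;
}
```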

Noted by:	andrew
Sponsored by:	The FreeBSD Foundation
2021-01-13 17:44:00 +02:00
Mateusz Guzik
ef23df1354 vfs: set NC_KEEPPOSENTRY alongside NOCACHE when creating a file
Arguably the entire NOCACHE logic should get retired, in the meantime
at least prevent the code from evicting existing entries.
2021-01-13 15:29:34 +00:00
Mateusz Guzik
5753be8e43 fd: add refcount argument to falloc_noinstall
This lets callers avoid atomic ops by initializing the count to the required
value from the start.

While here add falloc_abort to backpedal from this without having to
fdrop.
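
The idea can be sketched in userland with C11 atomics. The struct and function names below are hypothetical and only illustrate the technique; they are not the falloc_noinstall() interface.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a reference-counted file object. */
struct sketch_file {
	atomic_uint	refcount;
};

/*
 * Set the count to the value the caller needs up front.  The object is
 * not shared yet, so no atomic read-modify-write is required.
 */
static void
sketch_file_init(struct sketch_file *fp, unsigned int refs)
{
	assert(refs >= 1);
	atomic_init(&fp->refcount, refs);
}

/* The path being avoided: take one more reference with an atomic op. */
static unsigned int
sketch_file_hold(struct sketch_file *fp)
{
	return (atomic_fetch_add(&fp->refcount, 1) + 1);
}
```

A caller that needs two references can either call sketch_file_init(fp, 1) followed by sketch_file_hold(fp), or call sketch_file_init(fp, 2) and skip the atomic increment.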
2021-01-13 15:29:34 +00:00
Konstantin Belousov
8f54940f01 x86 busdma_bounce: do not make assumptions about alignment of malloc(9) results.
Reported by:	dim
Reviewed by:	dim, jah
Tested by:	dim, pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28108
2021-01-13 17:00:49 +02:00
Konstantin Belousov
895ad33784 x86 busdma_bounce: style.
Reviewed by:	dim, jah
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28108
2021-01-13 17:00:46 +02:00
Hans Petter Selasky
f2a7b434b3 Since C99 and r363250, variable declarations are allowed inside for-loops.
Partial revert of bafb682656.

Suggested by:	mmel@
2021-01-13 12:30:41 +01:00
Edward Tomasz Napierala
ec2700e015 linux: mute the "unsupported prctl option 23" warnings
Make the PR_CAPBSET_READ prctl(2) return EINVAL without logging
any warnings; this is way too noisy with Focal.

Sponsored by:	The FreeBSD Foundation
2021-01-13 10:31:56 +00:00
Alexander V. Chernikov
a6b7689718 Remove redundant rtinit() calls from tuntap.
The removed code iterates over if_addrhead and tries to remove
 routes for each ifa.
This is exactly what if_purgeaddrs() does, and
 if_purgeaddrs() is already called at the end.

Reviewed by:		glebius
MFC after:		2 weeks
Differential revision:	https://reviews.freebsd.org/D28106
2021-01-13 10:03:15 +00:00
Alexander V. Chernikov
e58c8da068 Map IPv6 link-local prefix to the link-local ifa.
Currently we create link-local route by creating an always-on IPv6 prefix
 in the prefix list. This prefix is not tied to the link-local ifa.

This leads to the following problems:

First, when flushing interface addresses we skip on-link route, leaving
 fe80::/64 prefix on the interface without any IPv6 addresses.
Second, when creating and removing link-local alias we lose fe80::/64 prefix
 from the routing table.

Fix this by attaching link-local prefix to the ifa at the initial creation.

Reviewed by:	hrs
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D28129
2021-01-13 10:03:15 +00:00
Edward Tomasz Napierala
a339b4223a linux: bump the default version from 3.10.0 to 3.17.0
This is required for Qt5, as found in Ubuntu Focal.  The library contains
the minimum kernel version encoded in an ELF note; when the reported version
is too old, rtld ignores the library altogether, with a confusing error
message.  Without this change, things fail like this:

$ konsole: error while loading shared libraries: libQt5Core.so.5: cannot
open shared object file: No such file or directory

For reference, the Qt kernel version requirements can be found at:
https://github.com/qt/qtbase/blob/dev/src/corelib/global/minimum-linux_p.h
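
A sketch of the kind of comparison involved, assuming a hypothetical packed encoding of major.minor.patch; the struct and helpers are illustrative, and the 3.17.0 requirement used in the usage below is an assumption taken from the new default rather than from the Qt header.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical version triple. */
struct kver {
	unsigned int	major;
	unsigned int	minor;
	unsigned int	patch;
};

/* Pack the triple so that numeric order matches version order. */
static unsigned int
kver_pack(struct kver v)
{
	assert(v.minor < 256 && v.patch < 256);
	return ((v.major << 16) | (v.minor << 8) | v.patch);
}

/* True if the reported kernel version meets the library's minimum. */
static bool
kver_satisfies(struct kver reported, struct kver required)
{
	return (kver_pack(reported) >= kver_pack(required));
}
```

With a requirement of 3.17.0, a reported version of 3.10.0 fails the check and 3.17.0 passes it.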

Sponsored by:	The FreeBSD Foundation
Reviewed By:	emaste
Differential Revision:	https://reviews.freebsd.org/D28105
2021-01-13 10:02:16 +00:00
Hans Petter Selasky
bafb682656 Fix for off-by-one in GPIO driver after r368585.
While at it declare the iteration variable outside the for-loop
to appease older compilers.

Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-01-13 10:06:30 +01:00
Mateusz Guzik
5171310e66 vfs: use finstall_refed in openat
This avoids 2 atomic ops in the common case: 1 to grab an extra
reference and 1 to release it.
2021-01-13 03:30:38 +00:00
Mateusz Guzik
530b699a62 fd: add finstall_refed
Can be used to consume an already existing reference and consequently
avoid atomic ops.
2021-01-13 03:27:03 +01:00
Mateusz Guzik
4faa375cdd fd: provide a dedicated closef variant for unix socket code
This avoids testing for td != NULL.
2021-01-13 03:27:03 +01:00
Konstantin Belousov
0659df6fad vm_map_protect: allow to set prot and max_prot in one go.
This prevents a situation where another thread modifies the map entries'
permissions between setting max_prot, relocking, and setting prot,
confusing the operation's outcome.  E.g. you can get an error that would
not be possible if the operation were performed atomically.

Also allow setting rwx for max_prot even if the map does not allow setting
effective rwx protection.

Reviewed by:	brooks, markj (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28117
2021-01-13 01:35:22 +02:00
Mitchell Horne
d89e1db5a3 if_wg: fix modules load on !x86
Only x86 provides optimized implementations via the blake2 module. The
software "reference" implementation is already included in the crypto(4)
module, we can drop the extra MODULE_DEPEND for other platforms.

Without this change, if_wg.ko could not be loaded due to the missing
dependency.

PR:		252156
Reported by:	gbe
Sponsored by:	The FreeBSD Foundation
2021-01-12 18:07:10 -04:00
Rick Macklem
f6dc363f6d nfs-over-tls: handle res.gid.gid_val correctly for memory allocation
When the server side nfs-over-tls does an upcall to rpc.tlsservd(8)
for the handshake and the rpc.tlsservd "-u" command line option has
been specified, a list of gids may be returned.
The list will be returned in malloc'd memory pointed to by
res.gid.gid_val. To ensure the malloc occurs, res.gid.gid_val must
be NULL before the call. Then, the malloc'd memory needs to be free'd.
mem_free() just calls free(9), so a NULL pointer argument is fine
and a length argument == 0 is ok, since the "len" argument is not used.

This bug would have only affected nfs-over-tls and only when
rpc.tlsservd(8) is running with the "-u" command line option.
2021-01-12 13:59:52 -08:00
Hans Petter Selasky
6e5baec33c Fix for use-after-free in if_ure(4) driver.
When detaching the if_ure(4) driver, the TX active USB transfer array may
point to freed USB transfers. Given that the number of USB transfers is
very low, simply start all transfers every time there is a packet to
keep safe from use-after-free.

PR: 252608
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-12 17:57:58 +01:00
mhorne
0628f68357 riscv pmap: add some pv list assertions
Ensure that we don't end up with a superpage in the vm_page_t's pv list.

This may help with debugging the panic reported in PR 250866, in which
l3 in pmap_remove_write() was found to be NULL. Adding a KASSERT to this
function will help narrow down the cause of this panic the next time it
occurs.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D28109
2021-01-12 11:12:02 -04:00
Mateusz Guzik
70ba77706d vfs: extend vfs:namei:lookup:return probe with nameidata 2021-01-12 13:35:27 +00:00
Mateusz Guzik
cdb62ab74e vfs: add NDFREE_NOTHING and convert several NDFREE_PNBUF callers
Check the comment above the routine for reasoning.
2021-01-12 13:16:10 +00:00
Mateusz Guzik
6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
Andrew Turner
c00ec4dab2 Handle using a sub instruction in the arm64 fbt
Some stack frames are too large for a store pair instruction we already
detect in the arm64 fbt code. Add support for handling subtracting the
stack pointer directly.

Sponsored by:	Innovate UK
2021-01-12 12:42:23 +00:00
Andrew Turner
d0df1a2d54 Only allow a store through sp in the arm64 fbt
When searching for an instruction to patch out in the arm64 function
boundary trace we search for a store pair with a write back. This
instruction is commonly used to store two registers to the stack
and update the stack pointer to hold space for more.

This works in many cases, however not all functions use this, e.g.
when the stack frame is too large. In these cases we may find another
instruction of the same type that doesn't store through the stack
pointer. Filter these instructions out and assume if we see one we
are past the function prologue.
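
The filter described above can be sketched as a check on the A64 encoding of the 64-bit pre-indexed store pair, whose base register sits in bits 9..5. The macro and function names are hypothetical and this is not the fbt code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A64 "STP (pre-index, 64-bit)": bits 31..22 are 1010100110. */
#define	SKETCH_STP_PRE64_MASK	0xffc00000u
#define	SKETCH_STP_PRE64_VALUE	0xa9800000u
#define	SKETCH_RN_SHIFT		5
#define	SKETCH_RN_MASK		0x1fu
#define	SKETCH_REG_SP		31u

static_assert((SKETCH_STP_PRE64_VALUE & ~SKETCH_STP_PRE64_MASK) == 0,
    "opcode value must lie inside the mask");

/* True for any 64-bit store pair with write back (pre-index). */
static bool
is_stp_pre64(uint32_t insn)
{
	return ((insn & SKETCH_STP_PRE64_MASK) == SKETCH_STP_PRE64_VALUE);
}

/* True only if that store pair uses sp as its base register. */
static bool
is_prologue_stp(uint32_t insn)
{
	return (is_stp_pre64(insn) &&
	    ((insn >> SKETCH_RN_SHIFT) & SKETCH_RN_MASK) == SKETCH_REG_SP);
}
```

For example, 0xa9bf7bfd (stp x29, x30, [sp, #-16]!) passes both checks, while 0xa9bf0440 (stp x0, x1, [x2, #-16]!) is the same instruction type but is rejected by is_prologue_stp().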

Reported by:	rwatson
Sponsored by:	Innovate UK
2021-01-12 12:42:23 +00:00
Emmanuel Vadot
35a39dc5b3 Bump __FreeBSD_version after linuxkpi changes 2021-01-12 12:31:00 +01:00
Emmanuel Vadot
11d62b6f31 linuxkpi: add kernel_fpu_begin/kernel_fpu_end
With newer AMD GPUs (>=Navi,Renoir) there is FPU context usage in the
amdgpu driver.
The `kernel_fpu_begin/end` implementations in drm did not even allow nested
begin-end blocks.

Submitted by: Greg V
Reviewed By: manu, hselasky
Differential Revision: https://reviews.freebsd.org/D28061
2021-01-12 12:31:00 +01:00
Emmanuel Vadot
2c95fb753f linuxkpi: Add shrinker support
A driver can register a shrinker that will be called when the kernel
wants to free some memory.
Add support for that in linuxkpi and call the registered shrinkers
when the lowmem event is triggered.

Reviewed by:	bz
Differential Revision:	 https://reviews.freebsd.org/D27728
2021-01-12 12:31:00 +01:00
Emmanuel Vadot
105a37cac7 linuxkpi: Add more pci functions needed by DRM
 - pci_get_class : This function searches for a matching pci device based on
   the class/subclass and returns a newly created pci_dev.
 - pci_{save,restore}_state : These are analogous to ours with the same name
 - pci_is_root_bus : Returns true if this is the root bus
 - pci_get_domain_bus_and_slot : This function searches for a matching pci
   device based on domain, bus and slot/function concatenated into a single
   unsigned int (devfn) and returns a newly created pci_dev
 - pci_bus_{read,write}_config* : Read/write to the config space.

While here, add some helper functions to allocate and fill the pci_dev struct.

Reviewed by:   hselasky, bz (older version)
Differential Revision:	   https://reviews.freebsd.org/D27550
2021-01-12 12:31:00 +01:00
Emmanuel Vadot
8517a547a0 pci: Add pci_find_class_from
pci_find_class_from helps find one or more devices matching
a class and subclass.
If the from argument is not NULL, we first walk the device list
until we reach that device and only then start checking whether the
class/subclass matches.

Reviewed by:   jhb
Differential Revision:	https://reviews.freebsd.org/D27549
2021-01-12 12:25:28 +01:00
Konstantin Belousov
57f22c828e sigfastblock: do not skip cursig/postsig loop in ast()
Even if sigfastblock block is non-zero, non-blockable signals must be
checked on ast and delivered now.  This also affects the debugger's ability
to attach, because issignal() also calls ptracestop() if there is
a pending stop for the debuggee.

Instead of checking for sigfastblock, and either setting PENDING flag
for usermode or doing signal delivery loop, always do the loop after
checking, and then handle PENDING bit. issignal() already does the right
thing for fast-blocked case, allowing only STOPs and SIGKILL delivery to
happen.

Reported by:	Vasily Postnicov <shamaz.mazum@gmail.com>, markj
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28089
2021-01-12 12:45:26 +02:00
Konstantin Belousov
513320c0f1 sigfastblock_setpend(): do not set PEND user flag unless TDP_SIGFASTPENDING is set.
The user pending bit should not be set if the kernel did not note a pending signal.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28089
2021-01-12 12:43:34 +02:00
Kristof Provost
f743976583 dtrace: Blacklist riscv exception handlers for fbt
We can't safely instrument those exception handlers, so blacklist them.

Test case: dtrace -n :::

Reviewed by:		markj (previous version)
Differential Revision:	https://reviews.freebsd.org/D27754
2021-01-12 10:33:16 +01:00
Mateusz Guzik
44121a0fbe amd64: fix tlb shootdown when all cpus are passed in the bitmap
Right now the routine leaves the current CPU in the map, later tripping
on an assert when filling in the scoreboard: panic: IPI scoreboard is
zero, initiator 1 target 1

Instead pre-check if all CPUs are present in the map and remember that
outcome for later.

Fixes:	7eaea04a5b ("amd64: compare TLB shootdown target to all_cpus")
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28111
2021-01-12 08:47:32 +00:00
Konstantin Belousov
9402bb44f1 vmspace_fork: preserve wx settings in the child vm map after fork
Noted by:	markj
Sponsored by:	The FreeBSD Foundation
2021-01-12 08:09:59 +02:00
Alan Somers
ff1a307801 lio_listio: validate aio_lio_opcode
Previously, we would accept any kind of LIO_* opcode, including ones
that were intended for in-kernel use only like LIO_SYNC (which is not
defined in userland).  The situation became more serious with
022ca2fc7f.  After that revision, setting
aio_lio_opcode to LIO_WRITEV or LIO_READV would trigger an assertion.

Note that POSIX does not specify what should happen if aio_lio_opcode is
invalid.
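
A sketch of this kind of validation in userland C. The accepted set here (the three POSIX opcodes from <aio.h>) is an assumption for illustration, and the helper name is hypothetical; the kernel's actual accepted set may differ.

```c
#include <aio.h>
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the first opcode outside the accepted set,
 * or -1 if every opcode is accepted.
 */
static long
lio_first_bad_opcode(const int *opcodes, size_t n)
{
	assert(opcodes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		switch (opcodes[i]) {
		case LIO_READ:
		case LIO_WRITE:
		case LIO_NOP:
			break;
		default:
			return ((long)i);
		}
	}
	return (-1);
}
```

Checking the whole list up front lets a caller reject the request with an error before any individual operation is queued.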

MFC-with:	022ca2fc7f
Reviewed by:	jhb, tmunro, 0mp
Differential Revision:	https://reviews.freebsd.org/D28078
2021-01-11 19:53:01 -07:00
Andrew Gallatin
7eaea04a5b amd64: compare TLB shootdown target to all_cpus
On amd64, the pmap code passes all_cpus to
smp_targeted_tlb_shootdown() when unmapping from the
kernel pmap.  This function has an optimized path to send IPIs
to all but itself, which it intends to do when the target
is all cpus.   However, we need to compare the target cpu mask
with all_cpus, rather than using CPU_ISFULLSET().  Comparing with
CPU_ISFULLSET() will only work when we have MAXCPU cpus active in
the system, otherwise, we'll be sending repeated IPIs, rather than
a single IPI to all CPUs but ourself.

Fixing this should reduce the time spent in native_lapic_ipi_wait()
as we will be sending ipis in parallel, rather than one-by-one.
This is confirmed by dtrace.
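
The difference between the two tests can be sketched with a plain 64-bit mask standing in for cpuset_t. The type, the 64-CPU limit and the helper names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpuset_t with a compile-time limit of 64 CPUs. */
typedef uint64_t sketch_cpumask_t;

/* Mask with the first n CPU ids set, i.e. the CPUs actually present. */
static sketch_cpumask_t
sketch_mask_first(unsigned int n)
{
	assert(n >= 1 && n <= 64);
	return (n == 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1);
}

/* Analogue of a "full set" test: true only if every possible bit is set. */
static bool
sketch_mask_is_full(sketch_cpumask_t m)
{
	return (m == UINT64_MAX);
}

/* The comparison the message asks for: target equals the present CPUs. */
static bool
sketch_targets_all(sketch_cpumask_t target, sketch_cpumask_t all_cpus)
{
	return (target == all_cpus);
}
```

On a hypothetical 8-CPU system, a target equal to all_cpus is not a full set, so only the direct comparison selects the optimized path.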

Reviewed by: alc, jhb, kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D28102
2021-01-11 20:09:32 -05:00
Kirk McKusick
2d4422e799 Eliminate lock order reversal in UFS ffs_unmount().
UFS uses a new "mntfs" pseudo file system which provides private
device vnodes for a file system to safely access its disk device.
The original device vnode is saved in um_odevvp to hold the exclusive
lock on the device so that any attempts to open it for writing will
fail. But it is otherwise unused and has its BO_NOBUFS flag set to
enforce that file systems using mntfs vnodes do not accidentally
use the original devfs vnode. When the file system is unmounted,
um_odevvp is no longer needed and is released.

The lock order reversal happens because device vnodes must be locked
before UFS vnodes. During unmount, the root directory vnode lock
is held. When calling vrele() on um_odevvp, vrele() attempts to
exclusive lock um_odevvp causing the lock order reversal. The problem
is eliminated by doing a non-blocking exclusive lock on um_odevvp
which will always succeed since there are no users of um_odevvp.
With um_odevvp locked, it can be released using vput which does not
attempt to do a blocking exclusive lock request and thus avoids the
lock order reversal.

Sponsored by: Netflix
2021-01-11 16:49:07 -08:00
Alan Somers
58a08f9e99 [skip ci] Delete an accidentally-committed comment
MFC-With:	19cca0b961
2021-01-11 17:01:22 -07:00
Jason A. Harmening
e8a5a1ad71 rctl(4): support throttling resource usage to 0
For rate-based resources that support throttling (e.g.
readiops/writeiops), this fixes a divide-by-zero panic when rctl(8)
passes 0 as the throttle value.  For these resources, treat
zero-throttle requests as requests to suspend forward progress as long
as possible using the duration specified in
kern.racct.rctl.throttle_max.
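
A minimal sketch of the guard, assuming a simplified sleep computation; the function, its parameters and the formula are hypothetical and do not reproduce the racct code. The point is only that a zero limit is handled before the division.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: seconds to sleep so that `used` units average out to
 * `limit` units per second, capped at `max_sleep`.
 */
static uint64_t
throttle_sleep_seconds(uint64_t used, uint64_t limit, uint64_t max_sleep)
{
	uint64_t secs;

	assert(max_sleep > 0);
	if (limit == 0)
		return (max_sleep);	/* dividing would trap; stall as long as allowed */
	secs = used / limit;
	return (secs > max_sleep ? max_sleep : secs);
}
```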

PR:		251803
Reported by:	chris@cretaforce.gr
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27858
2021-01-11 15:36:57 -08:00
Alexander V. Chernikov
2defbe9f0e Use rn_match instead of doing indirect calls in fib_algo.
Relevant inet/inet6 code has the control over deciding what
 the RIB lookup function currently is. With that in mind,
 explicitly set it to the current value (rn_match) in the
 datapath lookups. This avoids the cost of an indirect call.

Differential Revision: https://reviews.freebsd.org/D28066
2021-01-11 23:30:35 +00:00
Konstantin Belousov
4ea65707d3 exec_new_vmspace: print useful error message on ctty if stack cannot be mapped.
After old vmspace is destroyed during execve(2), but before the new space
is fully constructed, an error during image activation cannot be returned
because there is no executing program to receive it.

In the relatively common case of failure to map stack, print some hints
on the control terminal.  Note that user has enough knobs to cause stack
mapping error, and this is the most common reason for execve(2) aborting
the process.

Requested by:	jhb
Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28050
2021-01-12 01:15:43 +02:00
Konstantin Belousov
2e1c94aa1f Implement enforcing write XOR execute mapping policy.
vm_map_insert() and vm_map_protect() check that PROT_WRITE and
PROT_EXEC are never specified together if the vm_map has the MAP_WX flag set.
A FreeBSD control flag allows a specific binary to request a W^X exemption,
and the per-ABI boolean sysctls kern.elf{32,64}.allow_wx control the policy
globally.
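
The check can be sketched in userland C with the PROT_* constants from <sys/mman.h>. The function name and the boolean standing in for the per-map flag are hypothetical; this is not the vm_map code.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/*
 * Return true if a mapping with protection `prot` is acceptable.
 * `map_enforces_wx` stands in for the per-map policy flag.
 */
static bool
wx_request_allowed(int prot, bool map_enforces_wx)
{
	const int wx = PROT_WRITE | PROT_EXEC;

	/* Only the basic protection bits are expected here. */
	assert((prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC)) == 0);
	if (!map_enforces_wx)
		return (true);
	/* Reject only requests that ask for write and execute together. */
	return ((prot & wx) != wx);
}
```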

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28050
2021-01-12 01:15:43 +02:00
Kristof Provost
86b653ed7e pf: quiet debugging printfs
Only log these when debugging output is enabled.
2021-01-11 22:30:44 +01:00