Commit Graph

136729 Commits

Author SHA1 Message Date
Martin Matuska
9db44a8e5d zfs: merge OpenZFS master-9305ff2ed
Notable upstream pull request merges:
  #11153 Scalable teardown lock for FreeBSD
  #11651 Don't bomb out when using keylocation=file://
  #11667 zvol: call zil_replaying() during replay
  #11683 abd_get_offset_struct() may allocate new abd
  #11693 Intentionally allow ZFS_READONLY in zfs_write
  #11716 zpool import cachefile improvements
  #11720 FreeBSD: Clean up zfsdev_close to match Linux
  #11730 FreeBSD: bring back possibility to rewind the
         checkpoint from bootloader

Obtained from:	OpenZFS
MFC after:	2 weeks
2021-03-14 02:32:14 +01:00
Gordon Bergling
5666643a95 Fix some common typos in comments
- occured -> occurred
- normaly -> normally
- controling -> controlling
- fileds -> fields
- insterted -> inserted
- outputing -> outputting

MFC after:	1 week
2021-03-13 18:26:15 +01:00
Gordon Bergling
183502d162 Fix a few typos in comments
- trough -> through

MFC after:	1 week
2021-03-13 16:37:28 +01:00
Gordon Bergling
564a3ac63a i386: Fix a few typos
- wheter -> whether
- while here, fix some whitespace issues

MFC after:	1 week
2021-03-13 16:10:01 +01:00
Gordon Bergling
d197bf2b20 net80211: Fix a typo in a comment
- destionation -> destination
- while here, fix some whitespace issues

MFC after:	1 week
2021-03-13 15:51:30 +01:00
Mariusz Zaborski
653ed678c7 zfs: bring back possibility to rewind the checkpoint from
Add parsing of the rewind options.

When I was upstreaming the change [1], I omitted the part where we
detect that the pool should be rewind. When the FreeBSD repo has
synced with the OpenZFS, this part of the code was removed.

[1] FreeBSD repo: 277f38abff
[2] OpenZFS repo: f2c027bd6a

Originally reviewed by:		tsoome, allanjude
Originally reviewed by:		kevans (ok from high-level overview)

Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>

PR:		254152
Reported by:	Zhenlei Huang <zlei.huang at gmail.com>
Obtained from:	https://github.com/openzfs/zfs/pull/11730
2021-03-13 12:56:17 +01:00
Mateusz Guzik
59146a6921 zfs: make seqc asserts conditional on replay
Avoids tripping on asserts when doing pool recovery.
2021-03-13 09:31:49 +00:00
Adrian Chadd
fb3edd4f3f [ath] do a cold reset if TSFOOR triggers
TSFOOR happens if a beacon with a given TSF isn't received within the
programmed/expected TSF value, plus/minus a fudge range. (OOR == out of range.)

If this happens then it could be because the baseband/mac is stuck, or
the baseband is deaf.  So, do a cold reset and resync the beacon to
try and unstick the hardware.

It also happens when a bad AP decides to err, slew its TSF because they
themselves are resetting and they don't preserve the TSF "well."

This has fixed a bunch of weird corner cases on my 2GHz AP radio upstairs
here where it occasionally goes deaf due to how much 2GHz noise is up
here (and ANI gets a little sideways) and this unsticks the station
VAP.

For AP modes a hung baseband/mac usually ends up as a stuck beacon
and those have been addressed for a long time by just resetting the
hardware.  But similar hangs in station mode didn't have a similar
recovery mechanism.

Tested:

* AR9380, STA mode, 2GHz/5GHz
* AR9580, STA mode, 5GHz
* QCA9344 SoC w/ on-board wifi (TL-WDR4300/3600 devices); 2GHz
  STA mode
2021-03-12 23:30:25 -08:00
Adrian Chadd
51dfae383b [ath] validate ts_antenna before updating tx statistics
Right now ts_antenna is either 0 or 1 in each supported HAL so
this is purely a sanity check.

Later on if I ever get magical free time I may add some extensions
for the NICs that can have slightly more complicated antenna switches
for transmit and I'd like this to not bust memory.
2021-03-12 17:33:07 -08:00
Peter Jeremy
07564e1762 arm64: Add support for the RK805/RK808 RTC
Implement a driver for the RTC embedded in the RK805/RK808 power
management system used for RK3328 and RK3399 SoCs.

Based on experiments on my RK808, setting the time doesn't alter the
internal/inaccessible sub-second counter, therefore there's no point
in calling clock_schedule().

Based on an earlier revision by andrew.

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D22692
Sponsored by:	Google
MFC after:	1 week
2021-03-13 09:06:04 +11:00
Warner Losh
7381bbee29 cam: Run all XPT_ASYNC ccbs in a dedicated thread
Queue all XPT_ASYNC ccb's and run those in a new cam async thread. This thread
is allowed to sleep for things like memory. This should allow us to make all the
registration routines for cam periph drivers simpler since they can assume they
can always allocate memory. This is a separate thread so that any I/O that's
completed in xpt_done_td isn't held up.

This should fix the panics for WAITOK alloations that are elsewhere in the
storage stack that aren't so easy to convert to NOWAIT. Additional future work
will convert other allocations in the registration path to WAITOK should
detailed analysis show it to be safe.

Reviewed by: chs@, rpokala@
Differential Revision:	https://reviews.freebsd.org/D29210
2021-03-12 13:29:42 -07:00
John Baldwin
5fe0cd6503 ccr: Disable requests on port 1 when needed to workaround a firmware bug.
Completions for crypto requests on port 1 can sometimes return a stale
cookie value due to a firmware bug.  Disable requests on port 1 by
default on affected firmware.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D26581
2021-03-12 10:59:35 -08:00
John Baldwin
9c5137beb5 ccr: Add per-port stats of queued and completed requests.
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29176
2021-03-12 10:59:35 -08:00
John Baldwin
8f885fd1f3 ccr: Set the RX channel ID correctly in work requests.
These fixes are only relevant for requests on the second port.  In
some cases, the crypto completion data, completion message, and
receive descriptor could be written in the wrong order.

- Add a separate rx_channel_id that is a copy of the port's rx_c_chan
  and use it when an RX channel ID is required in crypto requests
  instead of using the tx_channel_id.

- Set the correct rx_channel_id in the CPL_RX_PHYS_ADDR used to write
  the crypto result.

- Set the FID to the first rx queue ID on the adapter rather than the
  queue ID of the first rx queue for the port.

- While here, use tx_chan to set the tx_channel_id though this is
  identical to the previous value.

Reviewed by:	np
Reported by:	Chelsio QA
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29175
2021-03-12 10:59:35 -08:00
Jonathan T. Looney
dbec10e088 Fetch the sigfastblock value in syscalls that wait for signals
We have seen several cases of processes which have become "stuck" in
kern_sigsuspend(). When this occurs, the kernel's td_sigblock_val
is set to 0x10 (one block outstanding) and the userspace copy of the
word is set to 0 (unblocked). Because the kernel's cached value
shows that signals are blocked, kern_sigsuspend() blocks almost all
signals, which means the process hangs indefinitely in sigsuspend().

It is not entirely clear what is causing this condition to occur.
However, it seems to make sense to add some protection against this
case by fetching the latest sigfastblock value from userspace for
syscalls which will sleep waiting for signals. Here, the change is
applied to kern_sigsuspend() and kern_sigtimedwait().

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29225
2021-03-12 18:14:17 +00:00
John Baldwin
40d593d17e x86: Update some stale comments in cpu_fork() and cpu_copy_thread().
Neither of these routines allocate stacks.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29227
2021-03-12 09:48:49 -08:00
John Baldwin
c7b0213523 x86: Always use clean FPU and segment base state for new kthreads.
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29208
2021-03-12 09:48:36 -08:00
John Baldwin
640d54045b Set TDP_KTHREAD before calling cpu_fork() and cpu_copy_thread().
This permits these routines to use special logic for initializing MD
kthread state.

For the kproc case, this required moving the logic to set these flags
from kproc_create() into do_fork().

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29207
2021-03-12 09:48:20 -08:00
John Baldwin
5a50eb6585 Don't pass RFPROC to kproc_create(), it is redundant.
Reviewed by:	tuexen, kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29206
2021-03-12 09:48:10 -08:00
John Baldwin
645b15e558 Remove unused wrappers around kproc_create() and kproc_exit().
Reviewed by:	imp, kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29205
2021-03-12 09:47:58 -08:00
John Baldwin
755efb8d8f x86: Copy the FPU/XSAVE state from the creating thread to new threads.
POSIX states that new threads created via pthread_create() should
inherit the "floating point environment" from the creating thread.

Discussed with:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29204
2021-03-12 09:47:41 -08:00
John Baldwin
704547ce1c amd64: Cleanups to setting TLS registers for Linux binaries.
- Use update_pcb_bases() when updating FS or GS base addresses to
  permit use of FSBASE and GSBASE in Linux processes.  This also sets
  PCB_FULL_IRET.  linux32 was setting PCB_32BIT which should be a
  no-op (exec sets it).

- Remove write-only variables to construct unused segment descriptors
  for linux32.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29026
2021-03-12 09:47:31 -08:00
John Baldwin
9221145868 amd64: Only update fsbase/gsbase in pcb for curthread.
Before the pcb is copied to the new thread during cpu_fork() and
cpu_copy_thread(), the kernel re-reads the current register values in
case they are stale.  This is done by setting PCB_FULL_IRET in
pcb_flags.

This works fine for user threads, but the creation of kernel processes
and kernel threads do not follow the normal synchronization rules for
pcb_flags.  Specifically, new kernel processes are always forked from
thread0, not from curthread, so adjusting pcb_flags via a simple
instruction without the LOCK prefix can race with thread0 running on
another CPU.  Similarly, kthread_add() clones from the first thread in
the relevant kernel process, not from curthread.  In practice, Netflix
encountered a panic where the pcb_flags in the first kthread of the
KTLS process were trashed due to update_pcb_bases() in
cpu_copy_thread() running from thread0 to create one of the other KTLS
threads racing with the first KTLS kthread calling fpu_kern_thread()
on another CPU.  In the panicking case, the write to update pcb_flags
in fpu_kern_thread() was lost triggering an "Unregistered use of FPU
in kernel" panic when the first KTLS kthread later tried to use the
FPU.

Reported by:	gallatin
Discussed with:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29023
2021-03-12 09:45:18 -08:00
Edward Tomasz Napierala
0dfbdd9fc2 linux(4): make getcwd(2) return ERANGE instead of ENOMEM
For native FreeBSD binaries, the return value from __getcwd(2)
doesn't really matter, as the libc wrapper takes over and returns
the proper errno.

PR:		kern/254120
Reported By:	Alex S <iwtcex@gmail.com>
Reviewed By:	kib
Sponsored By:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29217
2021-03-12 15:31:45 +00:00
Kristof Provost
cecfaf9bed pf: Fully remove interrupt events on vnet cleanup
swi_remove() removes the software interrupt handler but does not remove
the associated interrupt event.
This is visible when creating and remove a vnet jail in `procstat -t
12`.

We can remove it manually with intr_event_destroy().

PR:		254171
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29211
2021-03-12 12:12:43 +01:00
Kristof Provost
28dc2c954f pf: Simplify cleanup
We can now counter_u64_free(NULL), so remove the checks.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29190
2021-03-12 12:12:35 +01:00
Kristof Provost
b8f7267d49 uma: allow uma_zfree_pcu(..., NULL)
We already allow free(NULL) and uma_zfree(..., NULL). Make
uma_zfree_pcpu(..., NULL) work as well.
This also means that counter_u64_free(NULL) will work.

These make cleanup code simpler.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29189
2021-03-12 12:12:35 +01:00
Konstantin Belousov
16dea83410 null_vput_pair(): release use reference on dvp earlier
We might own the last use reference, and then vrele() at the end would
need to take the dvp vnode lock to inactivate, which causes deadlock
with vp. We cannot vrele() dvp from start since this might unlock ldvp.

Handle it by holding the vnode and dropping use ref after lowerfs
VOP_VPUT_PAIR() ended.  This effectivaly requires unlock of the vp vnode
after VOP_VPUT_PAIR(), so the call is changed to set unlock_vp to true
unconditionally.  This opens more opportunities for vp to be reclaimed,
if lvp is still alive we reinstantiate vp with null_nodeget().

Reported and tested by:	pho
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:31:08 +02:00
Konstantin Belousov
44691b33cc vlrureclaim: only skip vnode with resident pages if it own the pages
Nullfs vnode which shares vm_object and pages with the lower vnode should
not be exempt from the reclaim just because lower vnode cached a lot.
Their reclamation is actually very cheap and should be preferred over
real fs vnodes, but this change is already useful.

Reported and tested by:	pho
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:31:08 +02:00
Konstantin Belousov
0b3948e73b softdep_unmount: assert that no dandling dependencies are left
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:31:08 +02:00
Konstantin Belousov
7a8d4b4da6 FFS: assign fully initialized struct mount_softdeps to um_softdep
Other threads observing the non-NULL um_softdep can assume that it is
safe to use it. This is important for ro->rw remounts where change from
read-only to read-write status cannot be made atomic.

Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:31:08 +02:00
Konstantin Belousov
2af934cc15 Assert that um_softdep is NULL on free(ump), i.e. softdep_unmount() was called
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:31:08 +02:00
Konstantin Belousov
f776c54cee ffs_mount: when remounting ro->rw and sbupdate failed, cleanup softdeps
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:31:08 +02:00
Konstantin Belousov
d7e5e37416 softdep_unmount: handle spurious wakeups
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:31:08 +02:00
Konstantin Belousov
fabbc3d879 softdep_flush(): do not access ump after we acked FLUSH_EXIT and unlocked SU lock
otherwise we might follow a pointer in the freed memory.

Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:31:08 +02:00
Konstantin Belousov
7c7a6681fa ffs: clear MNT_SOFTDEP earlier when remounting rw to ro
Suppose that we remount rw->ro and in parallel some reader tries to
instantiate a vnode, e.g. during lookup.  Suppose that softdep_unmount()
already started, but we did not cleared the MNT_SOFTDEP flag yet.
Then ffs_vgetf() calls into softdep_load_inodeblock() which accessed
destroyed hashes and freed memory.

Set/clear fs_ronly simultaneously (WRT to files flush) with MNT_SOFTDEP.
It might be reasonable to move the change of fs_ronly to under MNT_ILOCK,
but no readers take it.

Reported and tested by:	pho
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:31:07 +02:00
Konstantin Belousov
7f682bdcab Rework MOUNTED/DOING SOFTDEP/SUJ macros
Now MNT_SOFTDEP indicates that SU are active in any variant +-J, and
SU+J is indicated by MNT_SOFTDEP | MNT_SUJ combination.  The reason is
that unmount will be able to easily hide SU from other operations by
clearing MNT_SOFTDEP while keeping the record of the active journal.

Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:31:07 +02:00
Konstantin Belousov
81cdb19e04 ffs softdep: clear ump->um_softdep on softdep_unmount()
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:31:07 +02:00
Konstantin Belousov
a285d3edac ffs_extern.h: Add comments for ffs_vgetf() flags
Requested and reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:30:59 +02:00
Konstantin Belousov
fd97fa6463 Add FFSV_FORCEINODEDEP flag for ffs_vgetf()
It will be used to allow SU flush code to sync the volume while external
consumers see that SU is already disabled on the filesystem.  Use it where
ffs_vgetf() called by SU code to process dependencies.

Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:30:38 +02:00
Konstantin Belousov
25aac48d2c simplify journal_mount: move the out label after success block
This removes the need to check for error == 0.

Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:30:37 +02:00
Wei Hu
a491581f3f Hyper-V: hn: Enable vSwitch RSC support in hn netvsc driver
Receive Segment Coalescing (RSC) in the vSwitch is a feature available in
Windows Server 2019 hosts and later. It reduces the per packet processing
overhead by coalescing multiple TCP segments when possible. This happens
mostly when TCP traffics are among different guests on same host.
This patch adds netvsc driver support for this feature.

The patch also updates NVS version to 6.1 as needed for RSC
enablement.

MFC after:	2 weeks
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D29075
2021-03-12 04:35:16 +00:00
Warner Losh
ba5de7e930 SPDX: Spell 4 clause BSD license correctly 2021-03-11 14:17:54 -07:00
Mark Johnston
2f1cfb7f63 gmirror: Pre-allocate the timeout event structure
We can't call malloc(M_WAITOK) in a callout handler.

Reviewed by:	imp
Reported by:	pho
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29223
2021-03-11 15:45:15 -05:00
Warner Losh
8423f5d4c1 nvme: use config_intrhook_drain to avoid removable card races
nvme drives are configured early in boot. However, a number of the configuration
steps takes which take a while, so we defer those to a config intrhook that runs
before the root filesystem is mounted. At the same time, the PCI hot plug wakes
up and tests the status of the card. It may decide that the card has gone away
and deletes the child. As part of that process nvme_detach is called. If this
call happens after the config_intrhook starts to run, but before it is finished,
there's a race where we can tear down the device's soft state while the
config_intrhook is still using it. Use the new config_intrhook_drain to
disestablish the hook. Either it will be removed w/o running, or the routine
will wait for it to finish. This closes the race and allows safe hotplug at any
time, even very early in boot.

Sponsored by:		Netflix, Inc
Reviewed by:		jhb, mav
Differential Revision:	https://reviews.freebsd.org/D29006
2021-03-11 09:45:10 -07:00
Warner Losh
e52368365d config_intrhook: provide config_intrhook_drain
config_intrhook_drain will remove the hook from the list as
config_intrhook_disestablish does if the hook hasn't been called.  If it has,
config_intrhook_drain will wait for the hook to be disestablished in the normal
course (or expedited, it's up to the driver to decide how and when
to call config_intrhook_disestablish).

This is intended for removable devices that use config_intrhook and might be
attached early in boot, but that may be removed before the kernel can call the
config_intrhook or before it ends. To prevent all races, the detach routine will
need to call config_intrhook_train.

Sponsored by:		Netflix, Inc
Reviewed by:		jhb, mav, gde (in D29006 for man page)
Differential Revision:	https://reviews.freebsd.org/D29005
2021-03-11 09:45:10 -07:00
Edward Tomasz Napierala
dc0119c281 linsysfs: create /sys/bus/ and /sys/subsystem/
This looks like a no-op, but it prevents udevadm(8) with failing
loudly, which in turn unbreaks installation of libfprint-2-2, which
in Focal is a dependency for make-4.2.1-1.2.

One might wonder why installing a build utility involves messing
with device handling...

Sponsored By:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29133
2021-03-11 15:50:51 +00:00
Mark Johnston
968079f253 vm_reserv: Fix list locking in vm_reserv_reclaim_contig()
The per-domain partpop queue is locked by the combination of the
per-domain lock and individual reservation mutexes.
vm_reserv_reclaim_contig() scans the queue looking for partially
populated reservations that can be reclaimed in order to satisfy the
caller's allocation.

During the scan, we drop the per-domain lock.  At this point, the rvn
pointer may be invalidated.  Take care to load rvn after re-acquiring
the per-domain lock.

While here, simplify the condition used to check whether a reservation
was dequeued while the per-domain lock was dropped.

Reviewed by:	alc, kib
Reported by:	gallatin
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29203
2021-03-11 10:35:35 -05:00
Warner Losh
1645a4ae64 usb: tiny formatting nit
Format 300 baud like all the others here. No functional change.
2021-03-11 08:24:13 -07:00
Kristof Provost
913e7dc3e0 pf: Remove redundant kif != NULL checks
pf_kkif_free() already checks for NULL, so we don't have to check before
we call it.

Reviewed by:	melifaro@
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29195
2021-03-11 10:39:43 +01:00
Kristof Provost
5e9dae8e14 pf: Factor out pf_krule_free()
Reviewed by:	melifaro@
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29194
2021-03-11 10:39:43 +01:00
Oskar Holmund
17b14d8f77 usr.sbin/pwm/pwm add support for flags
The pwm utility cant set the only flag defined (PWM_POLARITY_INVERTED) so this
patch add the option -I (capital letter i) to send it to the drivers.

None of existing PWM driver have implemented support for flags.
But soon:ish I will put up an review of a pwm driver using TI OMAP DMTimer.

Differential Revision: https://reviews.freebsd.org/D29137
MFC after:   2 weeks
2021-03-11 09:57:56 +01:00
Oskar Holmund
7d4a5de84d share/man/man9/pwmbus.9 fix types in arguments
Fix the types of period and duty in share/man/man9/pwmbus.9 to match the one in sys/dev/pmw/pwmbus.c.

Reviewed By: rpokala
Differential Revision: https://reviews.freebsd.org/D29139
MFC after:   3 days
2021-03-11 09:57:04 +01:00
Greg V
15565e0a21 kern.mk: fix -Wno-error style to fix build with Clang 12
Clang 12 no longer supports -Wno-error-..., only the -Wno-error=...
style (which is already used everywhere else in the tree).

Differential Revision:	https://reviews.freebsd.org/D29157
2021-03-10 17:34:35 -05:00
Alexander V. Chernikov
b1d63265ac Flush remaining routes from the routing table during VNET shutdown.
Summary:
This fixes rtentry leak for the cloned interfaces created inside the
 VNET.

PR:	253998
Reported by:	rashey at superbox.pl
MFC after:	3 days

Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown).
Thus, any route table operations are too late to schedule.
As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`.
It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.

Test Plan:
```
set_skip:set_skip_group_lo  ->  passed  [0.053s]
tail -n 200 /var/log/messages | grep rtentry
```

Reviewers: #network, kp, bz

Reviewed By: kp

Subscribers: imp, ae

Differential Revision: https://reviews.freebsd.org/D29116
2021-03-10 21:10:14 +00:00
John Baldwin
3fa034210c ktls: Fix non-inplace TLS 1.3 encryption.
Copy the iovec for the trailer from the proper place.  This is the same
fix for CBC encryption from ff6a7e4ba6.

Reported by:	gallatin
Reviewed by:	gallatin, markj
Fixes:		49f6925ca
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29177
2021-03-10 11:07:40 -08:00
Alexander Motin
2cee045b4d Move time math out of disabled interrupts sections.
We don't need the result before next sleep time, so no reason to
additionally increase interrupt latency.

While there, remove extra PM ticks to microseconds conversion, making
C2/C3 sleep times look 4 times smaller than really.  The conversion
is already done by AcpiGetTimerDuration().  Now I see reported sleep
times up to 0.5s, just as expected for planned 2 wakeups per second.

MFC after:	1 month
2021-03-10 13:52:51 -05:00
Olivier Houchard
c328f64d81 arm64: Fix COMPAT_FREEBSD32.
The ENTRY() macro was modified by commit
28d945204e to add an optional NOP instruction
at the beginning of the function. It is of course an arm64 instruction, so
unsuitable for the 32bits sigcode. So just use EENTRY() instead for
aarch32_sigcode. This should fix receiving signals when running 32bits
binaries on FreeBSD/arm64.

MFC After: 1 week
2021-03-10 19:06:42 +01:00
Mitchell Horne
7e7f7beee7 ns8250: don't drop IER_TXRDY on bus_grab/ungrab
It has been observed that some systems are often unable to resume from
ddb after entering with debug.kdb.enter=1. Checking the status further
shows the terminal is blocked waiting in tty_drain(), but it never makes
progress in clearing the output queue, because sc->sc_txbusy is high.

I noticed that when entering polling mode for the debugger, IER_TXRDY is
set in the failure case. Since this bit is never tracked by the softc,
it will not be restored by ns8250_bus_ungrab(). This creates a race in
which a TX interrupt can be lost, creating the hang described above.
Ensuring that this bit is restored is enough to prevent this, and resume
from ddb as expected.

The solution is to track this bit in the sc->ier field, for the same
lifetime that TX interrupts are enabled.

PR:		223917, 240122
Reviewed by:	imp, manu
Tested by:	bz
MFC after:	5 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29130
2021-03-10 11:04:42 -04:00
Alex Richardson
953a7d7c61 Arch64: Clear VFP state on execve()
I noticed that many of the math-related tests were failing on AArch64.
After a lot of debugging, I noticed that the floating point exception flags
were not being reset when starting a new process. This change resets the
VFP inside exec_setregs() to ensure no VFP register state is leaked from
parent processes to children.

This commit also moves the clearing of fpcr that was added in 65618fdda0
from fork() to execve() since that makes more sense: fork() can retain
current register values, but execve() should result in a well-defined
clean state.

Reviewed By:	andrew
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D29060
2021-03-10 12:44:42 +00:00
Hans Petter Selasky
dfb33cb0ef Allocating the LinuxKPI current structure from a software interrupt thread
must be done using the M_NOWAIT flag after 1ae20f7c70 .

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-03-10 13:27:40 +01:00
Hans Petter Selasky
6eb60f5b7f Use the word "LinuxKPI" instead of "Linux compatibility", to not confuse with
user-space Linux compatibility support. No functional change.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-03-10 12:35:16 +01:00
Hans Petter Selasky
d1cbe79089 Allocating the LinuxKPI current structure from an interrupt thread must be
done using the M_NOWAIT flag after 1ae20f7c70 .

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-03-10 10:51:04 +01:00
Hans Petter Selasky
ebe5cf355d Implement basic support for allocating memory from a specific numa node
in the LinuxKPI.

Differential Revision:	https://reviews.freebsd.org/D29077
Reviewed by:	markj@ and kib@
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-03-09 21:01:47 +01:00
Kyle Evans
94dddbfd00 if_wg: export tx_bytes, rx_bytes, and last_handshake
The names are self-explanatory; these are currently only used by the
wg(8) tool, but they are handy data points to have.

Reviewed by:	grehan
MFC after:	3 days
Discussed with:	decke
Differential Revision:	https://reviews.freebsd.org/D29143
2021-03-09 13:50:41 -06:00
Kyle Evans
0dd691b412 iflib: allow clone detach if not yet init
If we hit an error during init, then we'll unwind our state and attempt
to detach the device -- don't block it.

This was discovered by creating a wg0 with missing parameters; said
failure ended up leaving this orphaned device in place and ended up
panicking the system upon enumeration of the dev.* sysctl space.

Reviewed by:	gallatin, markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D29145
2021-03-09 13:49:13 -06:00
Kyle Evans
299f8977ce if_wg: wg_input: remove a couple locals (NFC)
We have no use for the udphdr or this hlen local, just spell out the
addition inline.

MFC after:	3 days
Reviewed by:	grehan, markj
Differential Revision:	https://reviews.freebsd.org/D29142
2021-03-09 13:49:13 -06:00
Jason A. Harmening
e4b8deb222 amd64 pmap: convert to counter(9), add PV and pagetable page counts
This change converts most of the counters in the amd64 pmap from
global atomics to scalable counter(9) counters.  Per discussion
with kib@, it also removes the handrolled per-CPU PCID save count
as it isn't considered generally useful.

The bulk of these counters remain guarded by PV_STATS, as it seems
unlikely that they will be useful outside of very specific debugging
scenarios.  However, this change does add two new counters that
are available without PV_STATS.  pt_page_count and pv_page_count
track the number of active physical-to-virtual list pages and page
table pages, respectively.  These will be useful in evaluating
the memory footprint of pmap structures under various workloads,
which will help to guide future changes in this area.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28923
2021-03-09 09:27:10 -08:00
Leandro Lupori
043577b721 ofwfb: fix boot on LE
Some framebuffer properties obtained from the device tree were not being
properly converted to host endian.
Replace OF_getprop calls by OF_getencprop where needed to fix this.

This fixes boot on PowerPC64 LE, when using ofwfb as the system console.

Reviewed by:    bdragon
Sponsored by:   Eldorado Research Institute (eldorado.org.br)
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D27475
2021-03-09 13:29:24 -03:00
Kyle Evans
b3dac3913d ifconfig: allow displaying/setting persistent-keepalive
The kernel-side already accepted a persistent-keepalive-interval, so
just add a verb to ifconfig(8) for it and start exporting it so that
ifconfig(8) can view it.

PR:		253790
MFC after:	3 days
Discussed with:	decke
2021-03-09 05:16:42 -06:00
Kyle Evans
1ae20f7c70 kern: malloc: fix panic on M_WAITOK during THREAD_NO_SLEEPING()
Simple condition flip; we wanted to panic here after epoch_trace_list().

Reviewed by:	glebius, markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D29125
2021-03-09 05:16:39 -06:00
Kyle Evans
e80e371d79 if_wg: avoid sleeping under the net epoch
No sleeping allowed here, so avoid it.  Collect the subset of data we
want inside of the epoch, as we'll need extra allocations when we add
items to the nvlist.

Reviewed by:	grehan (earlier version), markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D29124
2021-03-09 05:16:30 -06:00
Kyle Evans
bae59285f9 if_wg: return to m_defrag() of incoming mbuf, sans leak
This partially reverts df55485085 but still fixes the leak. It was
overlooked (sigh) that some packets will exceed MHLEN and cannot be
physically contiguous without clustering, but we don't actually need
it to be. m_defrag() should pull up enough for any of the headers that
we do need to be accessible.

Fixes:	df55485085
Pointy hat;	kevans
2021-03-09 04:52:22 -06:00
Alexander Motin
075e4807df Do not read timer extra time when MWAIT is used.
When we enter C2+ state via memory read, it may take chipset some
time to stop CPU.  Extra register read covers that time.  But MWAIT
makes CPU stop immediately, so we don't need to waste time after
wakeup with interrupts still disabled, increasing latency.

On my system it reduces ping localhost latency, waking up all CPUs
once a second, from 277us to 242us.

MFC after:	1 month
2021-03-08 18:43:47 -05:00
Alexander Motin
455219675d Change mwait_bm_avoidance use to match Linux.
Even though the information is very limited, it seems the intent of
this flag is to control ACPI_BITREG_BUS_MASTER_STATUS use for C3,
not force ACPI_BITREG_ARB_DISABLE manipulations for C2, where it was
never needed, and which register not really doing anything for years.
It wasted lots of CPU time on congested global ACPI hardware lock
when many CPU cores were trying to enter/exit deep C-states same time.

On idle 80-core system it pushed ping localhost latency up to 20ms,
since badport_bandlim() via counter_ratecheck() wakes up all CPUs
same time once a second just to synchronously reset the counters.
Now enabling C-states increases the latency from 0.1 to just 0.25ms.

Discussed with:	kib
MFC after:	1 month
2021-03-08 18:27:36 -05:00
Warner Losh
6ffdaa5f2d Move back the isa non-PNP driver deadline to FreeBSD 14. 2021-03-08 16:00:23 -07:00
Warner Losh
88a5591203 config_intrhook: Move from TAILQ to STAILQ and padding
config_intrhook doesn't need to be a two-pointer TAILQ. We rarely add/delete
from this and so those need not be optimized. Instaed, use the one-pointer
STAILQ plus a uintptr_t to be used as a flags word. This will allow these
changes to be MFC'd to 12 and 13 to fix a race in removable devices.

Feedback from: jhb
Reviewed by: mav
Differential Revision:	https://reviews.freebsd.org/D29004
2021-03-08 15:59:00 -07:00
Alexander V. Chernikov
7634919e15 Fix 'in6_purgeaddr: err=65, destination address delete failed' message.
P2P ifa may require 2 routes: one is the loopback route, another is
 the "prefix" route towards its destination.

Current code marks loopback routes existence with IFA_RTSELF and
 "prefix" p2p routes with IFA_ROUTE.

For historic reasons, we fill in ifa_dstaddr for loopback interfaces.
To avoid installing the same route twice, we preemptively set
 IFA_RTSELF when adding "prefix" route for loopback.
However, the teardown part doesn't have this hack, so we try to
 remove the same route twice.

Fix this by checking if ifa_dstaddr is different from the ifa_addr
 and moving this logic into a separate function.

Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D29121
MFC after:	3 days
2021-03-08 21:28:35 +00:00
Jessica Clarke
f2f8405cf6 if_vtbe: Add missing includes to fix build
PR:		254137
Reported by:	Mina Galić <me@igalic.co>
MFC after:	3 days
Fixes:		f8bc74e2f4 ("tap: add support for virtio-net offloads")
2021-03-08 20:48:48 +00:00
Kyle Evans
8cc15b0dfc x86: tsc: deprioritize TSC on VirtualBox
Misbehavior has been observed with TSC under VirtualBox, where threads
doing small sleeps (~1 second) may miss their wake up and hang around
in a sleep state indefinitely.  Switching back to ACPI-fast decidedly
fixes it, so stop using TSC on VirtualBox at least for the time being.

This partially reverts 84eaf2ccc6, applying it only to VirtualBox and
increasing the quality to 0. Negative qualities can never be chosen and
cannot be chosen with the tunable recently added. If we do not have a
timecounter with a higher quality than 0, then TSC does at least leave
the system mostly usable.

PR:		253087
Reviewed by:	emaste, kib
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D29132
2021-03-08 14:43:06 -06:00
Mark Johnston
1bb7c45c53 wg: Fix a mismerge
df55485085 fixed a leak that I had initially fixed in a11009dccb.

Fixes:	a11009dccb
2021-03-08 12:56:14 -05:00
Mark Johnston
ffe3def903 iflib: Make if_shared_ctx_t a pointer to const
This structure is shared among multiple instances of a driver, so we
should ensure that it doesn't somehow get treated as if there's a
separate instance per interface.  This is especially important for
software-only drivers like wg.

DEVICE_REGISTER() still returns a void * and so the per-driver sctx
structures are not yet defined with the const qualifier.

Reviewed by:	gallatin, erj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29102
2021-03-08 12:39:06 -05:00
Mark Johnston
435c7cfb24 Rename _cscan_atomic.h and _cscan_bus.h to atomic_san.h and bus_san.h
Other kernel sanitizers (KMSAN, KASAN) require interceptors as well, so
put these in a more generic place as a step towards importing the other
sanitizers.

No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29103
2021-03-08 12:39:06 -05:00
Mark Johnston
0e72eb4602 ath_hal: Stop printing messages during boot
ath_hal is compiled into the kernel by default and so always prints a
message to dmesg even when the system has no ath hardware.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-03-08 12:39:06 -05:00
Mark Johnston
7995dae9d3 posix timers: Improve the overrun calculation
timer_settime(2) may be used to configure a timeout in the past.  If
the timer is also periodic, we also try to compute the number of timer
overruns that occurred between the initial timeout and the time at which
the timer fired.  This is done in a loop which iterates once per period
between the initial timeout and now.  If the period is small and the
initial timeout was a long time ago, this loop can take forever to run,
so the system is effectively DOSed.

Replace the loop with a more direct calculation of
(now - initial timeout) / period to compute the number of overruns.

Reported by:	syzkaller
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29093
2021-03-08 12:39:06 -05:00
Mark Johnston
60d12ef952 posix timers: Sprinkle some style fixes
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-03-08 12:39:06 -05:00
Mark Johnston
8ff2b41c05 posix timers: Declare unexported functions as static
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-03-08 12:39:06 -05:00
Mark Johnston
d8cebef50e wg: Style
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-03-08 12:39:05 -05:00
Mark Johnston
a11009dccb wg: Avoid leaking mbufs when the input handshake queue is full
Reviewed by:	grehan
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D29011
2021-03-08 12:39:05 -05:00
Alex Richardson
01fe4cac28 kern.mk: Fix wrong variable being used for linker path after 172a624f0
When I synchronized kern.mk with bsd.sys.mk, I accidentally changed
CCLDFLAGS to LDFLAGS which is not used by the kernel builds. This commit
should unbreak the GitHub actions cross-build CI. I didn't notice it
locally because cheribuild already passes -fuse-ld in the linker flags as
it predates this being done in the makefiles.

Reported By:	Jose Luis Duran
Fixes:		172a624f0 ("Silence annoying and incorrect non-default linker warning with GCC")
2021-03-08 09:37:46 +00:00
Kyle Evans
2b82c94acf if_wg: avoid null ptr deref
While we're here, sync up with OpenBSD and don't use a keypair !kp_valid

MFC after:	3 days
2021-03-08 00:25:34 -06:00
Kyle Evans
df55485085 wg_input: avoid leaking due to an m_defrag failure
m_defrag() will not free the chain on failure, leaking the mbuf.

Obtained from:	OpenBSD
MFC after:	3 days
2021-03-08 00:21:23 -06:00
Kyle Evans
d9a50109e2 if_wg: release correct lock in noise_remote_begin_session()
The keypair lock is not taken until later.

Obtained from:	Jason A. Donenfeld via OpenBSD
MFC after:	3 days
2021-03-08 00:21:03 -06:00
Konstantin Belousov
56b9bee63a Make kern.timecounter.hardware tunable
Noted and reviewed by:	kevans
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D29122
2021-03-08 03:48:21 +02:00
Alexander V. Chernikov
d5be41beb7 Fix dpdk/ldradix fib lookup algorithm preference calculation.
The current preference number were copied from IPv4 code,
 assuming 500k routes to be the full-view. Adjust with the current
 reality (100k full-view).

Reported by:	Marek Zarychta <zarychtam at plan-b.pwste.edu.pl>
MFC after:	3 days
2021-03-07 22:17:53 +00:00
Oleksandr Tymoshenko
748be78e60 armv8crypto: fix AES-XTS regression introduced by ed9b7f44
Initialization of the XTS key schedule was accidentally dropped
when adding AES-GCM support so all-zero schedule was used instead.
This rendered previously created GELI partitions unusable.
This change restores proper XTS key schedule initialization.

Reported by:	Peter Jeremy <peter@rulingia.com>
MFC after:	immediately
2021-03-07 12:03:47 -08:00
Bjoern A. Zeeb
3fca90af43 net80211: ratectl header guard against multiple inclusions
Add missing #ifndef/#define/#endif guards against multiple inclusions
to ieee80211_ratectl.h as they are missing.

MFC after:	3 days
Sponsored-by:	Rubicon Communications, LLC ("Netgate")
2021-03-07 17:35:58 +00:00
Michal Meloun
01c6d79189 mvebu_gpio: Fix settings of gpio pin direction.
Data Output Enable Control register is inverted – 0 means output direction.
Reflect this fact in code.

MFC after:	3 weeks
2021-03-07 11:41:30 +01:00
Brandon Bergren
bad9fa5662 [PowerPC] Fix AP bringup on 32-bit AIM SMP
In r361544, the pmap drivers were converted to ifuncs. When doing so,
this changed the call type of pmap functions to be called via the
secure-plt stubs.

These stubs depend on the TOC base being loaded to r30 to run properly.

On SMP AIM (i.e. a dual processor G4 or running 32-bit on G5), since the
APs were being started up from the reset vector instead of going
through __start, they had never had r30 initialized properly, so when the
cpu_reset code in trap_subr32.S attempted to branch to
pmap_cpu_bootstrap(), it was loading the target from the wrong location.

Ensure r30 is set up directly in the cpu_reset trap code, so we can make
PLT calls as normal.

Fixes boot on my SMP G4.

Reviewed by:	jhibbits
MFC after:	3 days
Sponsored by:	Tag1 Consulting, Inc.
2021-03-06 15:46:28 -06:00
Edward Tomasz Napierala
cd84c82c6a linux: add support for SO_PEERGROUPS
The su(8) and sudo(8) from Ubuntu Bionic use it.

Sponsored By:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28165
2021-03-06 19:48:58 +00:00
Tai-hwa Liang
092f3f0812 net: fixing a memory leak in if_deregister_com_alloc()
Drain the callbacks upon if_deregister_com_alloc() such that the
if_com_free[type] won't be nullified before if_destroy().

Taking fwip(4) as an example, before this fix, kldunload if_fwip will
go through the following:

  1. fwip_detach()
  2. if_free() -> schedule if_destroy() through NET_EPOCH_CALL
  3. fwip_detach() returns
  4. firewire_modevent(MOD_UNLOAD) -> if_deregister_com_alloc()
  5. kernel complains about:
	Warning: memory type fw_com leaked memory on destroy (1 allocations, 64 bytes leaked).
  6. EPOCH runs if_destroy() -> if_free_internal()i

By this time, if_com_free[if_alloctype] is NULL since it's already
nullified by if_deregister_com_alloc(); hence, firewire_free() won't
have a chance to release the allocated fw_com.

Reviewed by:	hselasky, glebius
MFC after:	2 weeks
2021-03-06 14:43:16 +00:00
Gordon Bergling
e797dc58bd arm64: Add support for bcm2838 RNG
The hardware random number generator of the RPi4 differs slightly
from the version found on the RPi3.

This commit extends the existing bcm2835_rng driver to function on the RPi4.

Submitted by:	James Mintram <me at jamesrm dot com>
Reviewed by:	markm, cem, delphij
Approved by:	csprng(cem, markm)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22493
2021-03-06 12:28:35 +01:00
Hans Petter Selasky
c743a6bd4f Implement mallocarray_domainset(9) variant of mallocarray(9).
Reviewed by:	kib @
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-03-06 11:38:55 +01:00
Konstantin Belousov
ead7697f04 Restore AT_RESOLVE_BENEATH support for funlinkat(2)/unlinkat(2).
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-03-06 07:24:18 +02:00
Alexander Motin
6ed39db257 Do not exit ctl_be_block_worker() prematurely.
Return while there are any I/Os in a queue may result in them stuck
indefinitely, since there is only one taskqueue task for all of them.
I think I've reproduced this by switching ha_role to secondary under
heavy load.

MFC after:	3 days
2021-03-05 22:45:47 -05:00
Eric Joyner
d08b8680e1 ice(4): Update to version 0.28.1-k
This updates the driver to align with the version included in
the "Intel Ethernet Adapter Complete Driver Pack", version 25.6.

There are no major functional changes; this mostly contains
bug fixes and changes to prepare for new features. This version
of the driver uses the previously committed ice_ddp package
1.3.19.0.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Tested by:	jeffrey.e.pieper@intel.com
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D28640
2021-03-05 17:33:39 -08:00
Richard Scheffenegger
e53138694a tcp: Add prr_out in preparation for PRR/nonSACK and LRD
Reviewed By:           #transport, kbowling
MFC after:             3 days
Sponsored By:          Netapp, Inc.
Differential Revision: https://reviews.freebsd.org/D29058
2021-03-06 00:38:22 +01:00
Mark Johnston
fef8450971 netmap: Stop printing a line to the dmesg in netmap_init()
netmap is compiled into the kernel by default so initialization was
always reported, and netmap uses a formatting convention not used in the
rest of the kernel.

Reviewed by:	vmaffione
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29099
2021-03-05 18:07:47 -05:00
Navdeep Parhar
765d623d60 cxgbe(4): Remove extra blank line.
No functional change.
2021-03-05 12:48:39 -08:00
Navdeep Parhar
4a4e9c516c cxgbe(4): Fix an assertion that is not valid during attach.
Firmware access from t4_attach takes place without any synchronization.
The driver should not panic (debug kernels) if something goes wrong in
early communication with the firmware.  It should still load so that
it's possible to poke around with cxgbetool.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-03-05 11:28:18 -08:00
Eric van Gyzen
8b434feedf Only set delayed inval for procs using PTI
invltlb_invpcid_pti_handler() was requesting delayed TLB invalidation
even for processes that aren't using PTI.  With an out-of-tree
change to avoid PTI for non-jailed root processes, this caused an
assertion failure in pmap_activate_sw_pcid_pti() when context-switching
between PTI and non-PTI processes.

Reviewed by:	bdrewery kib tychon
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D29094
2021-03-05 13:20:08 -06:00
Mark Johnston
4fc60fa929 opencrypto: Make cryptosoft attach silently
cryptosoft is always present and doesn't print any useful information
when it attaches.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29098
2021-03-05 13:11:25 -05:00
Mark Johnston
89b650872b ktls: Hide initialization message behind bootverbose
We don't typically print anything when a subsystem initializes itself,
and KTLS is currently disabled by default anyway.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29097
2021-03-05 13:11:02 -05:00
John Baldwin
bb6e84c988 poly1305: Don't export generic Poly1305_* symbols from xform_poly1305.c.
There currently isn't a need to provide a public interface to a
software Poly1305 implementation beyond what is already available via
libsodium's APIs and these symbols conflict with symbols shared within
the ossl.ko module between ossl_poly1305.c and ossl_chacha20.c.

Reported by:	se, kp
Fixes:		78991a93eb
Sponsored by:	Netflix
2021-03-05 09:55:11 -08:00
Mark Johnston
732b69c9f9 acpi: Make nexus_acpi quiet on amd64 and i386
Otherwise during attach newbus prints "nexus0", which is not very
useful.

The generic nexus device is already quiet, as is nexus_acpi on arm64.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-03-05 12:54:00 -05:00
Richard Scheffenegger
9a13d9dcee tcp: remove a superfluous local var in tcp_sack_partialack()
No functional change.

Reviewed By: #transport, tuexen
MFC after:   3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29088
2021-03-05 18:20:23 +01:00
Richard Scheffenegger
4a8f3aad37 tcp: remove incorrect reset of SACK variable in PRR
Reviewed By:   #transport, rrs, tuexen
PR:            253848
MFC after:     3 days
Sponsored By:  NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29083
2021-03-05 17:45:54 +01:00
Michael Tuexen
705d06b289 rack: unbreak TCP fast open for the client side
Allow sending user data on the SYN segment.

Reviewed by:		rrs
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D29082
Sponsored by:		Netflix, Inc.
2021-03-05 16:03:03 +01:00
Martin Matuska
6781b8a32e zfs: update openzfs version reference to bedbc13da
It was missed in the latest merge.

X-MFC-with:	caed7b1c39
2021-03-05 15:55:58 +01:00
Michal Vanco
5084dde5f0 pchtherm: fix a wrong bit and a wrong register use
Probably just copy-paste errors that slipped in.

PR:		253915
Reported by:	Michal Vanco <michal.vanco@gmail.com>
MFC after:	1 week
2021-03-05 11:01:28 +02:00
Kristof Provost
29698ed904 pf: Mark struct pf_pdesc as kernel only
This structure is only used by the kernel module internally. It's not
shared with user space, so hide it behind #ifdef _KERNEL.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-03-05 09:21:06 +01:00
Wei Hu
80f39bd95f Hyper-V: hn: Store host hash value in flowid
When rx packet contains hash value sent from host, store it in
the mbuf's flowid field so when the same mbuf is on the tx path,
the hash value can be used by the host to determine the outgoing
network queue.

MFC after:	2 weeks
Sponsored by:	Microsoft
2021-03-05 04:46:54 +00:00
Mark Johnston
ff6a7e4ba6 ktls: Fix CBC encryption when input and output iov sizes are different
Reported by:	gallatin
Tested by:	gallatin
Fixes:		49f6925ca
Differential Revision:	https://reviews.freebsd.org/D29073
2021-03-04 22:45:40 -05:00
Mitchell Horne
0d3b3beeb2 riscv: fix errors in some atomic type aliases
This appears to be a copy-and-paste error that has simply been
overlooked. The tree contains only two calls to any of the affected
variants, but recent additions to the test suite started exercising the
call to atomic_clear_rel_int() in ng_leave_write(), reliably causing
panics.

Apparently, the issue was inherited from the arm64 atomic header. That
instance was addressed in c90baf6817, but the fix did not make its way
to RISC-V.

Note that the particular test case ng_macfilter_test:main still appears
to fail on this platform, but this change reduces the panic to a
timeout.

PR:		253237
Reported by:	Jenkins, arichardson
Reviewed by:	kp, arichardson
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D29064
2021-03-04 16:59:58 -04:00
Kristof Provost
448732b8e2 altq: Increase maximum number of CBQ and HFSC classes
In some configurations we need more classes than ALTQ supports by
default.  Increase the maximum number of classes we allow.
This will only cost us a comparatively trivial amount of memory, so
there's little reason not to do so.

If ever we find we want even more we may want to consider turning these
defines into a tunable, but for now do the easy thing.

Reviewed by:	donner@
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29034
2021-03-04 20:58:22 +01:00
Kristof Provost
bb4a7d94b9 net: Introduce IPV6_DSCP(), IPV6_ECN() and IPV6_TRAFFIC_CLASS() macros
Introduce convenience macros to retrieve the DSCP, ECN or traffic class
bits from an IPv6 header.

Use them where appropriate.

Reviewed by:	ae (previous version), rscheff, tuexen, rgrimes
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29056
2021-03-04 20:56:48 +01:00
Kristof Provost
f19323847c pf: Retrieve DSCP value from the IPv6 header
Teach pf to read the DSCP value from the IPv6 header so that we can
match on them.

Reviewed by:	donner
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29048
2021-03-04 20:56:48 +01:00
Alex Richardson
172a624f0c Silence annoying and incorrect non-default linker warning with GCC
The CROSS_TOOLCHAIN GCC .mk files include -B${CROSS_BINUTILS_PREFIX}, so
GCC will select the right linker and we don't need to warn.
While here also apply 17b8b8fb5f to kern.mk.

Test Plan:	no more warning printed with CROSS_TOOLCHAIN=mips-gcc6
Reviewed By:	jhb
Differential Revision: https://reviews.freebsd.org/D29015
2021-03-04 18:27:39 +00:00
Alex Richardson
b2c8cbf992 Remove unnecessary semicolon from CRITICAL_ASSERT()
This fixes off-by-default "empty statement" compiler warnings.
2021-03-04 18:26:50 +00:00
Alex Richardson
0072e5e0f3 sys/arm64/arm64/vfp.c: Fix -Wunused and -Wpointer-sign warnings
These are off by default but were flagged by my IDE while adding some
debugging printfs for D29060.
2021-03-04 18:25:44 +00:00
Michal Meloun
a5dce53b75 mvebu_gpio: Multiple fixes.
- gpio register access primitives
- locking in interrupt path
- cleanup

In cooperation with: mw
Reviewed by:	mw (initial version)
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D29044
Differential Revision:	https://reviews.freebsd.org/D28911
2021-03-04 17:54:40 +01:00
Michal Meloun
f97f57b518 simple_mfd: switch to controllable locking for syscon provider.
MFC after	3 weeks
2021-03-04 16:12:39 +01:00
Mark Johnston
5e6989ba4f link_elf_obj: Handle init_array sections in KLDs
Reuse existing handling for .ctors, print a warning if multiple
constructor sections are present.   Destructors are not handled as of
yet.

This is required for KASAN.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29049
2021-03-04 10:07:10 -05:00
Andrew Turner
23553d6b94 Fix creating the early arm64 level 2 blocks
In 48ba9b2669 we switched from creating level 1 blocks to smaller
level 2 blocks when creating the early arm64 page tables. On issue
was that they had a different meaning for register x7. The former used
it to hold page table attributes, while the latter held just the memory
type. This caused these attributes to be incorrectly shifted.

Fix this by changing the meaning of x7 to hold the block attributes
and fix the only caller that used the old meaning.

Most hardware seems to have handled the bits being off however qemu
failed to boot as reserved bits that should be zero were being set and
qemu fails to clear these when translating from a virtual address to a
physical address.

Sponsored by:	Innovate UK
2021-03-04 14:39:12 +00:00
John Baldwin
78991a93eb ossl: Add support for the ChaCha20 + Poly1305 AEAD cipher from RFC 8439
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D28757
2021-03-03 15:20:57 -08:00
John Baldwin
92aecd1e6f ossl: Add ChaCha20 cipher support.
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D28756
2021-03-03 15:20:57 -08:00
John Baldwin
a079e38b08 ossl: Add Poly1305 digest support.
Reviewed by:	cem
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D28754
2021-03-03 15:20:57 -08:00
Vladimir Kondratyev
6241b57131 hid: add opt_hid.h to modules that use HID_DEBUG
Submitted by:	Greg V <greg_AT_unrelenting_DOT_technology>
Reviewed by:	imp, wulf
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D28995
2021-03-04 01:43:29 +03:00
Mark Johnston
49f6925ca3 ktls: Cache output buffers for software encryption
Maintain a cache of physically contiguous runs of pages for use as
output buffers when software encryption is configured and in-place
encryption is not possible.  This makes allocation and free cheaper
since in the common case we avoid touching the vm_page structures for
the buffer, and fewer calls into UMA are needed.  gallatin@ reports a
~10% absolute decrease in CPU usage with sendfile/KTLS on a Xeon after
this change.

It is possible that we will not be able to allocate these buffers if
physical memory is fragmented.  To avoid frequently calling into the
physical memory allocator in this scenario, rate-limit allocation
attempts after a failure.  In the failure case we fall back to the old
behaviour of allocating a page at a time.

N.B.: this scheme could be simplified, either by simply using malloc()
and looking up the PAs of the pages backing the buffer, or by falling
back to page by page allocation and creating a mapping in the cache
zone.  This requires some way to save a mapping of an M_EXTPG page array
in the mbuf, though.  m_data is not really appropriate.  The second
approach may be possible by saving the mapping in the plinks union of
the first vm_page structure of the array, but this would force a vm_page
access when freeing an mbuf.

Reviewed by:	gallatin, jhb
Tested by:	gallatin
Sponsored by:	Ampere Computing
Submitted by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D28556
2021-03-03 17:34:01 -05:00
Alexander Motin
afc3e54eee Move ic_check_send_space clear to the actual check.
It closes tiny race when the flag could be set between being cleared
and the space is checked, that would create us some more work.  The
flag setting is protected by both locks, so we can clear it in either
place, but in between both locks are dropped.

MFC after:	1 week
2021-03-03 15:29:35 -05:00
Alexander Motin
aff9b9ee89 Restore condition removed in df3747c660.
I think it allowed to avoid some TX thread wakeups while the socket
buffer is full.  But add there another options if ic_check_send_space
is set, which means socket just reported that new space appeared, so
it may have sense to pull more data from ic_to_send for better TX
coalescing.

MFC after:	1 week
2021-03-03 12:03:08 -05:00
Zyta Szpak
622d17da46 arm64: mv_ap806_gicp: Fix spi_ranges_cnt
Previously the spi_ranges_cnt stored the table size in bytes
instead of the number of elements. Fix that.

Reviewed by: mmel
Submitted by: Zyta Szpak <zr@semihalf.com>
Obtained from: Semihalf
Sponsored by: Marvell
2021-03-03 17:08:12 +01:00
Andrew Turner
28d945204e Handle functions that use a nop in the arm64 fbt
To trace leaf asm functions we can insert a single nop instruction as
the first instruction in a function and trigger off this.

Reviewed by:	gnn
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D28132
2021-03-03 14:18:03 +00:00
Andrew Turner
48ba9b2669 Use L2 blocks when in the identity map
This reduces the memory mapped to be closer to the minimal memory
needed to enable the MMU.

Reviewed by:	mmel
Sponsored by:	Innovate UK
Differential Revision:://reviews.freebsd.org/D27765
2021-03-03 14:18:03 +00:00
Xin LI
5842073a9b arcmsr(4): Fixed no action of hot plugging device on type_F adapter.
Many thanks to Areca for continuing to support FreeBSD.

Submitted by:	黃清隆 <ching2048 areca com tw>
MFC after:	3 days
2021-03-02 22:57:20 -08:00
Krzysztof Galazka
21802a127d ixl(4): Add ability to control link state on ifconfig down
Add sysctl link_active_on_if_down, which allows user to control
if interface is kept in active state when it is brought
down with ifconfig. Set it to enabled by default to preserve
backwards compatibility.

Reviewed by:	erj
Tested by:	gowtham.kumar.ks@intel.com
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D28028
2021-03-02 17:43:38 -08:00
Krzysztof Galazka
9f99061ef9 ixl(4): Report RX errors as sum of all RX error counters
HW keeps track of RX errors using several counters, each for
specific type of errors. Report RX errors to OS as sum
of all those counters: CRC errors, illegal bytes, checksum,
length, undersize, fragment, oversize and jabber errors.

There is no HW counter for frames with invalid L3/L4 checksums
so add a SW one.

Also add a "rx_errors" sysctl with a copy of netstat IERRORS
counter value to make it easier accessible from scripts.

Reviewed By:	erj
Tested By:	gowtham.kumar.ks@intel.com
Sponsored By:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D27639
2021-03-02 17:37:04 -08:00
Warner Losh
b3fce46a3e cam: remove redundant scsi_vpd_block_characteristics definition
There were two definitions for the SCSI VPD Block Device Characteristics (page
0xb1): struct scsi_vpd_block_characteristics and struct
scsi_vpd_block_device_characteristics. The latter is more complete and more
widely used. Convert uses of the former to the latter by tweaking the da driver
and removing sturct scsi_vpd_block_characteristics.
2021-03-02 18:35:09 -07:00
Alan Somers
ab63da3564 Speed up geom_stats_resync in the presence of many devices
The old code had a O(n) loop, where n is the size of /dev/devstat.
Multiply that by another O(n) loop in devstat_mmap for a total of
O(n^2).

This change adds DIOCGMEDIASIZE support to /dev/devstat so userland can
quickly determine the right amount of memory to map, eliminating the
O(n) loop in userland.

This change decreases the time to run "gstat -bI0.001" with 16,384 md
devices from 29.7s to 4.2s.

Also, fix a memory leak first reported as PR 203097.

Sponsored by:	Axcient
Reviewed by:	mav, imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28968
2021-03-02 18:33:45 -07:00
Piotr Pietruszewski
afb1aa4e6d ix(4): Report RX errors as sum of all RX error counters
HW keeps track of RX errors using several counters, each for
specific type of errors. Report RX errors to OS as sum
of all those counters: CRC errors, illegal bytes, checksum,
length, undersize, fragment, oversize and jabber errors.

Also, add new "rx_errs" sysctl in the dev.ix.N.mac_stats tree. This is
to provide an another way to display the sum of RX errors.

Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>

Reviewed By: erj
Tested By: gowtham.kumar.ks@intel.com
Sponsored By: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D27191
2021-03-02 17:25:32 -08:00
Martin Matuska
caed7b1c39 zfs: merge OpenZFS master-bedbc13da
Notable upstream commits:
  8e43fa12c Fix vdev_rebuild_thread deadlock
  03ef8f09e Add missing checks for unsupported features
  2e160dee9 Fix assert in FreeBSD-specific dmu_read_pages
  bedbc13da Cancel TRIM / initialize on FAULTED non-writeable vdevs

MFC after:	1 week
Obtained from:	OpenZFS
2021-03-03 02:15:33 +01:00
Alexander Motin
df3747c660 Replace STAILQ_SWAP() with simpler STAILQ_CONCAT().
Also remove stray STAILQ_REMOVE_AFTER(), not causing problems only
because STAILQ_SWAP() fixed corrupted stqh_last.

MFC after:	1 week
2021-03-02 18:48:49 -05:00
Marcin Wojtas
09c3f04ff3 iflib: add support for admin completion queues
For interfaces with admin completion queues, introduce a new devmethod
IFDI_ADMIN_COMPLETION_HANDLE and a corresponding flag IFLIB_HAS_ADMINCQ.

This provides an option for handling any admin cq logic, which cannot be
run from an interrupt context.

Said method is called from within iflib's admin task, making it safe to
sleep.

Reviewed by: mmacy
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D28708
2021-03-03 00:40:47 +01:00
Marcin Wojtas
819760b35f mvebu_gpio: fix interrupt cause register configuration
According to Armada 8k documentation, the interrupt cause register
(at offset 0x14) is RW0C. Update the configuration in attach and
the mvebu_gpio_isrc_eoi() to follow the description.

Reviewed by: mmel
Obtained from: Semihalf
Sponsored by: Marvell
Differential Revision: https://reviews.freebsd.org/D29013
2021-03-03 00:22:42 +01:00
Robert Wing
0cc746f193 filt_fsevent: only record interested events
Respect filter-specific flags for the EVFILT_FS filter.

When a kevent is registered with the EVFILT_FS filter, it is always
triggered when an EVFILT_FS event occurs, regardless of the
filter-specific flags used. Fix that.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28974
2021-03-02 14:19:22 -09:00
Alfredo Dal'Ava Junior
231633a2e9 [PowerPC64] add mpr to GENERIC64 and GENERIC64LE
Submitted by:	Andre Fernando da Silva <andre.silva@eldorado.org.br>
Reviewed by:	luporl, alfredo, Sreekanth Reddy <sreekanth.reddy@broadcom.com> (by email)
Sponsored by:	Eldorado Research Institute (eldorado.org.br)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25785
2021-03-02 22:21:43 -03:00
Alfredo Dal'Ava Junior
71900a794d mpr: big-endian support
This fixes mpr driver on big-endian devices.
Tested on powerpc64 and powerpc64le targets using a SAS9300-8i card
(LSISAS3008 pci vendor=0x1000 device=0x0097)

Submitted by:	Andre Fernando da Silva <andre.silva@eldorado.org.br>
Reviewed by:	luporl, alfredo, Sreekanth Reddy <sreekanth.reddy@broadcom.com> (by email)
Sponsored by:	Eldorado Research Institute (eldorado.org.br)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25785
2021-03-02 22:21:42 -03:00
Rick Macklem
c04199affe nfsclient: Fix ReadDS/WriteDS/CommitDS nfsstats RPC counts for a NFSv3 DS
During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
was detected when doing I/O DS operations on a Flexible File Layout pNFS
server.  For an NFSv3 DS, the Read/Write/Commit nfsstats were incremented
instead of the ReadDS/WriteDS/CommitDS counts.
This patch fixes this.

Only the RPC counts reported by nfsstat(1) were affected by this bug,
the I/O operations were performed correctly.

MFC after:	2 weeks
2021-03-02 14:18:23 -08:00
Alexander Motin
06e9c71099 Fix initiator panic after 6895f89fe5.
There are sessions without socket that are not disconnecting yet.

MFC after:	3 weeks
2021-03-02 16:11:29 -05:00
Konstantin Belousov
28cd3a673e O_RELATIVE_BENEATH: return ENOTCAPABLE instead of EINVAL for abs path
Requested and reviewed by:	markj
Tested by:	arichardson,  pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:21:40 +02:00
Konstantin Belousov
49c98a4bf3 nameicap_check_dotdot: trim tracker on check
Tracker should contain exactly the path from the starting directory to
the current lookup point. Otherwise we might not detect some cases of
dotdot escape. Consequently, if we are walking up the tree by dotdot
lookup, we must remove an entries below the walked directory.

Reviewed by:	markj
Tested by:	arichardson, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:21:35 +02:00
Konstantin Belousov
e8a2862aa0 Add nameicap_cleanup_from(), to clean tracker list starting from some element
Reviewed by:	markj
Tested by:	arichardson, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:21:30 +02:00
Konstantin Belousov
2388ad7c29 nameicap_tracker_add: avoid duplicates in the tracker list
Reviewed by:	markj
Tested by:	arichardson, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:21:23 +02:00
Konstantin Belousov
59e7494281 Do not call nameicap_tracker_add() for dotdot case.
Reviewed by:	markj
Tested by:	arichardson, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:21:14 +02:00
Konstantin Belousov
20e91ca36a open(2): Remove O_BENEATH and AT_BENEATH
with the reasoning that the flags did not worked properly, and were not
shipped in a release.

O_RESOLVE_BENEATH is kept as useful.

Reviewed by:	markj
Tested by:	arichardson, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:16:55 +02:00
Mark Johnston
0401989282 vm: Round up npages and alignment for contig reclamation
When searching for runs to reclaim, we need to ensure that the entire
run will be added to the buddy allocator as a single unit.  Otherwise,
it will not be visible to vm_phys_alloc_contig() as it is currently
implemented.  This is a problem for allocation requests that are not a
power of 2 in size, as with 9KB jumbo mbuf clusters.

Reported by:	alc
Reviewed by:	alc
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28924
2021-03-02 10:21:02 -05:00
Michael Tuexen
99adf23006 RACK: fix an issue triggered by using the CDG CC module
Obtained from:		rrs@
MFC after:		3 days
PR:			238741
Sponsored by:		Netlix, Inc.
2021-03-02 12:32:16 +01:00
Andrey V. Elsukov
a9f7eba959 ipfw: add IPv6 support for sockarg opcode.
MFC after:	1 week
Sponsored by:	Yandex LLC
2021-03-02 12:45:59 +03:00
Peter Grehan
95331c228a Import wireguard fixes from pfSense 2.5
Merge the following fixes from https://github.com/pfsense/FreeBSD-src
 1940e7d3  Save	address	of ingress packets to allow wg to work on HA
 8f5531f1  Fix connection to IPv6 endpoint
 825ed9ee  Fix tcpdump for wg IPv6 rx tunnel traffic
 2ec232d3  Fix issue with replying to INITIATION messages in server mode
 ec77593a  Return immediately in wg_init if in DETACH'd state
 0f0dde6f  Remove unnecessary wg debug printf on transmit
 2766dc94  Detect and fix case in wg_init() where sockets weren't cleaned up
 b62cc7ac  Close the UDP tunnel sockets when the interface has been stopped

Reviewed by:	kevans
Obtained from:	pfSense 2.5
MFC after:	3 days
Relnotes:	yes
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D28962
2021-03-02 14:55:46 +10:00
Alexander Motin
b85a67f54a Optimize TX coalescing by keeping pointer to last mbuf.
Before m_cat() each time traversed through all the coalesced chain.

MFC after:	1 week
2021-03-01 23:34:11 -05:00
Konstantin Belousov
8742817ba6 FFS extattr: fix handling of the tail
There are three issues with change that stopped truncating ea area before
write, and resulted in possible zero tail in the ea area:
- Truncate to zero checked i_ea_len after the reference was dropped,
  making the last drop effectively truncate to zero length always.
- Loop to fill uio for zeroing specified too large length, that triggered
  assert in normal situation.
- Integrity check could trip over the tail, instead we must allow
  partial header or header with zero length, and clamp ea image in
  memory at it.

Reported by:	arichardson
Tested by:	arichardson, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Fixup:	5e198e7646
Differential Revision:	https://reviews.freebsd.org/D28999
2021-03-02 02:19:34 +02:00
Alexander Motin
a59e2982fe Optimize out few extra memory accesses.
MFC after:	1 week
2021-03-01 18:36:33 -05:00
Rick Macklem
94f2e42f5e nfsclient: Fix the stripe unit size for a File Layout pNFS layout
During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
was detected when doing a File Layout pNFS DS I/O operation.
The size of the I/O operation was smaller than expected.
The I/O size is specified as a stripe unit size in bits 6->31 of nflh_util
in the layout.  I had misinterpreted RFC5661 and had shifted the value
right by 6 bits. The correct interpretation is to use the value as
presented (it is always an exact multiple of 64), clearing bits 0->5.
This patch fixes this.

Without the patch, I/O through the DSs work, but the I/O size is 1/64th
of what is optimal.

MFC after:	2 weeks
2021-03-01 12:49:32 -08:00
Kyle Evans
60c4ec806d jail: allow root to implicitly widen its cpuset to attach
The default behavior for attaching processes to jails is that the jail's
cpuset augments the attaching processes, so that it cannot be used to
escalate a user's ability to take advantage of more CPUs than the
administrator wanted them to.

This is problematic when root needs to manage jails that have disjoint
sets with whatever process is attaching, as this would otherwise result
in a deadlock. Therefore, if we did not have an appropriate common
subset of cpus/domains for our new policy, we now allow the process to
simply take on the jail set *if* it has the privilege to widen its mask
anyways.

With the new logic, root can still usefully cpuset a process that
attaches to a jail with the desire of maintaining the set it was given
pre-attachment while still retaining the ability to manage child jails
without jumping through hoops.

A test has been added to demonstrate the issue; cpuset of a process
down to just the first CPU and attempting to attach to a jail without
access to any of the same CPUs previously resulted in EDEADLK and now
results in taking on the jail's mask for privileged users.

PR:		253724
Reviewed by:	jamie (also discussed with)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D28952
2021-03-01 12:38:31 -06:00
Richard Scheffenegger
0b0f8b359d calculate prr_out correctly when pipe < ssthresh
Reviewed By:	#transport, tuexen
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D28998
2021-03-01 16:26:05 +01:00
Rick Macklem
a5f9fe2bab copy_file_range(2): Fix for small values of input file offset and len
r366302 broke copy_file_range(2) for small values of
input file offset and len.

It was possible for rem to be greater than len and then
"len - rem" was a large value, since both variables are
unsigned.

Reported by: koobs, Pablo <pablogsal gmail com> (Python)
Reviewed by:	asomers, koobs
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D28981
2021-03-01 06:31:10 -08:00
Alex Richardson
0e4ff0acbe AArch64: Don't set flush-subnormals-to-zero flag on startup
This flag has been set on startup since 65618fdda0.
However, This causes some of the math-related tests to fail as they report
zero instead of a tiny number. This fixes at least
/usr/tests/lib/msun/ldexp_test and possibly others.
Additionally, setting this flag prevents printf() from printing subnormal
numbers in decimal form.
See also https://www.openwall.com/lists/musl/2021/02/26/1

PR:		253847
Reviewed By:	mmel
Differential Revision: https://reviews.freebsd.org/D28938
2021-03-01 14:27:30 +00:00
Mitchell Horne
e152c88273 arm64: add definition for IS_SSTEP_TRAP()
arm64 has a distinct exception code for single-step, so we can use this
to detect when an unexpected SS trap is encountered, or when an expected
one is not. See db_stop_at_pc().

Reviewed by:	markj, jhb
MFC after:	5 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28942
2021-03-01 10:04:23 -04:00
Mitchell Horne
bd0b7cbf5a arm64: update kdb_thrctx->pcb_lr with BKPT_SKIP
This value should be kept in sync with updates to kdb_frame->tf_elr,
since it is queried by PC_REGS() in several places.

Reviewed by:	markj, jhb
MFC after:	5 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28943
2021-03-01 10:04:22 -04:00
Mitchell Horne
874635e381 arm64: fix hardware single-stepping from EL1
The main issue is that debug exceptions must to be disabled for the
entire duration that SS bit in MDSCR_EL1 is set. Otherwise, a
single-step exception will be generated immediately. This can occur
before returning from the debugger (when MDSCR is written to) or before
re-entering it after the single-step (when debug exceptions are unmasked
in the exception handler).

Solve this by delaying the unmask to C code for EL1, and avoid unmasking
at all while handling debug exceptions, thus avoiding any recursive
debug traps.

Reviewed by:	markj, jhb
MFC after:	5 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28944
2021-03-01 10:04:22 -04:00
Michal Meloun
ce5a4083de pci_dw_mv: Don't enable unhandled interrupts.
Mainly link errors interrupts should only be activated on fully linked port,
otherwise noise on lanes can cause livelock. But we don't have error
counters yet, so leave these interrupts disabled.
2021-03-01 14:03:34 +01:00
Elliott Mitchell
a2c0e94ccf xen: remove x86-ism from Xen common code
PAT_WRITE_BACK is x86-only, whereas sys/dev/xen could be shared
between multiple architectures.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D28831
2021-03-01 13:33:01 +01:00
Mateusz Guzik
2c1c1255e4 kcsan: add atomic_interrupt_fence
Unbreaks building GENERIC-KCSAN.
2021-03-01 07:43:27 +00:00
Rick Macklem
15bed8c46b nfsclient: add nfs node locking around uses of n_direofoffset
During code inspection I noticed that the n_direofoffset field
of the NFS node was being manipulated without any lock being
held to make it SMP safe.
This patch adds locking of the NFS node's mutex around
handling of n_direofoffset to make it SMP safe.

I have not seen any failure that could be attributed to n_direofoffset
being manipulated concurrently by multiple processors, but I think this
is possible, since directories are read with shared vnode
locking, plus locks only on individual buffer cache blocks.
However, there have been as yet unexplained issues w.r.t reading
large directories over NFS that could have conceivably been caused
by concurrent manipulation of n_direofoffset.

MFC after:	2 weeks
2021-02-28 14:53:54 -08:00
Rick Macklem
3e04ab36ba nfsclient: add checks for a server returning the current directory
Commit 3fe2c68ba2 dealt with a panic in cache_enter_time() where
the vnode referred to the directory argument.
It would also be possible to get these panics if a broken
NFS server were to return the directory as an new object being
created within the directory or in a Lookup reply.

This patch adds checks to avoid the panics and logs
messages to indicate that the server is broken for the
file object creation cases.

Reviewd by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28987
2021-02-28 14:15:32 -08:00
Elliott Mitchell
530d38441d armv8crypto: add missing newline
The missing newline mildly garbles boot-time messages and this can be
troublesome if you need those.

Fixes:		a520f5ca58 ("armv8crypto: print a message on probe failure")
Reported by:	Mike Karels (mike@karels.net)
Reviewed By:	gonzo
Differential Revision:	https://reviews.freebsd.org/D28988
2021-02-28 16:03:55 -04:00
Bjoern A. Zeeb
a9cc796fa7 net80211: rx_stats add 160Mhz channel width.
Add the missing receive stat(u)s flag for 160Mhz channel width.
While here correct the comment for c_phytype to reference the correct
flags.

MFC-after:	3 days
Sponsored-by:	Rubicon Communications, LLC ("Netgate")
2021-02-28 19:24:22 +00:00
Richard Scheffenegger
e9071000c9 Improve PRR initial transmission timing
Reviewed By:	tuexen, #transport
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D28953
2021-02-28 15:46:54 +01:00
Alexander Motin
d01032736c Fix diroffdiroff, probably copy/paste bug.
Too long name looks bad in `vmstat -m`.

MFC after:	1 week
2021-02-28 09:08:31 -05:00
Rick Macklem
3fe2c68ba2 nfsclient: fix panic in cache_enter_time()
Juraj Lutter (otis@) reported a panic "dvp != vp not true" in
cache_enter_time() called from the NFS client's nfsrpc_readdirplus()
function.
This is specific to an NFSv3 mount with the "rdirplus" mount
option. Unlike NFSv4, NFSv3 replies to ReaddirPlus
includes entries for the current directory.

This trivial patch avoids doing a cache_enter_time()
call for the current directory to avoid the panic.

Reported by:	otis
Tested by:	otis
Reviewed by:	mjg
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D28969
2021-02-27 17:54:05 -08:00
Konstantin Belousov
b5449c92b4 Use atomic_interrupt_fence() instead of bare __compiler_membar()
for the which which definitely use membar to sync with interrupt handlers.

libc and rtld uses of __compiler_membar() seems to want compiler barriers
proper.

The barrier in sched_unpin_lite() after td_pinned decrement seems to be not
needed and removed, instead of convertion.

Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28956
2021-02-28 01:27:29 +02:00
Mateusz Guzik
1d8510c1a6 zfs: add missing seqc write begin/end around zfs_acl_chown_setattr
It happens to trip over an assert but does not matter for correctness at
this time. However, do it for future proofing.

Reported by:	avg
2021-02-27 22:29:50 +00:00
Mateusz Guzik
1239a72221 cache: temporarily drop the assert that dvp != vp when adding an entry
Historically it was allowed for any names, but arguably should never be
even attempted. Allow it again since there is a release pending and
allowing it is bug-compatible with previous behavior.

Reported by:	otis
2021-02-27 22:29:50 +00:00
Michael Tuexen
70e95f0b69 sctp: avoid integer overflow when starting the HB timer
MFC after:	3 days
Reported by:	syzbot+14b9d7c3c64208fae62f@syzkaller.appspotmail.com
2021-02-27 23:27:30 +01:00
Robert Watson
a92c6b24c0 Add a comment on why the call to mac_vnode_relabel() might be in the wrong
place -- in the VOP rather than vn_setexttr() -- and that it is for historic
reasons.  We might wish to relocate it in due course, but this way at least
we document the asymmetry.
2021-02-27 16:25:26 +00:00
Alexander Motin
9d9fd8b79f Micro-optimize OOA queue processing.
- Move ctl_get_cmd_entry() calls from every OOA traversal to when
  the requests first inserted, storing seridx in struct ctl_scsiio.
- Move some checks out of the loop in ctl_check_ooa().
- Replace checks for errors that can not happen with asserts.
- Transpose ctl_serialize_table, so that any OOA traversal accessed
  only one row (cache line).  Compact it from enum to uint8_t.
- Optimize static branch predictions in hottest places.

Due to O(n) nature on deep LUN queues this can be the hottest code
path in CTL, and additional 20% of IOPS I see in some 4KB I/O tests
are good to have in reserve.  About 50% of CPU time here according
to the profiles is now spent in two memory accesses per traversed
request in OOA.

Sponsored by:	iXsystems, Inc.
MFC after:	2 weeks
2021-02-27 10:40:24 -05:00
Warner Losh
c01da939b0 cardbus: Be sure to acquire Giant when calling into newbus
Acquire Giant in cardbus_detach_card. This used to be done above us, but no
more.

Tested by: kargl@
MFC After: 3 days
2021-02-27 01:23:09 -07:00
Martin Matuska
c170aa9f37 zfs: add missing checks for unsupported features
After the merge of OpenZFS master-9312e0fd1 it has become possible to
import ZFS pools witn an active org.illumos:edonr feature on FreeBSD,
leading to a panic.

In addition, "zpool status" reported all pools without edonr as upgradable
and "zpool upgrade -v" lists edonr in the list of upgradable features.

This is an accepted but not yet included bugfix by upstream.

Obtained from:		https://github.com/openzfs/zfs/pull/11653
Differential Revision:	https://reviews.freebsd.org/D28935
Reported by:		garga (on freebsd-current@)
Reviewed by:		freqlabs
X-MFC-with:		ba27dd8be8
2021-02-27 00:05:50 +01:00
Richard Scheffenegger
9e83a6a556 Include new data sent in PRR calculation
Reviewed By:	#transport, kbowling
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D28941
2021-02-26 22:31:58 +01:00
Warner Losh
34d6961108 cam: add new ASC and ASCQ values related to drive depopulation
Add 04/25 Depopulation restoration in progress, 31/04 Depopulation failed, and
31/05 Depopulation restoration failed.

These are defined in SPC-6r2 (though 31/4 was added in an earlier draft). They
relate to different aspects of in-progress or failed depopulation removal and
restoration commands.
2021-02-26 11:45:06 -07:00
Navdeep Parhar
dfff1de729 cxgbe(4): Read the rx 'c' channel for a port and make it available.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-02-25 23:46:14 -08:00
Jamie Gritton
589e4c1df4 jail: Add safety around prison_deref() flags.
do_jail_attach() now only uses the PD_XXX flags that refer to lock
status, so make sure that something else like PD_KILL doesn't slip
through.

Add a KASSERT() in prison_deref() to catch any further PD_KILL misuse.
2021-02-25 20:10:42 -08:00
Jamie Gritton
108a9384e9 jail: Fix locking on an early jail_set error.
I had locked allprison_lock without immediately setting PD_LIST_LOCKED.
2021-02-25 19:52:58 -08:00
Mitchell Horne
1fc928770b Remove stale references to opt_sio.h
The sio(4) driver was removed entirely in 2019, commit 71f0077631.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D28929
2021-02-25 21:43:12 -04:00
Alexander Motin
a9bd22814f Remove pointless lun->be_lun checks.
There is no such thing as LUN without backend, at least for years.

MFC after:	1 week
2021-02-25 19:48:03 -05:00
Mark Johnston
aac25e2225 pmap: Fix largemap restart checks in the kernel_maps sysctl handler
The purpose of these checks is to ensure that the address of the
next-level page table page is valid, since nothing is synchronizing with
a concurrent update of the large map and large map PTPs are freed to the
system.  However, if PG_PS is set, there is no next level.

Reported by:	rpokala
Reviewed by:	kib
Tested by:	rpokala
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28922
2021-02-25 18:49:47 -05:00
John Baldwin
90972f0402 ktls: Use COUNTER_U64_DEFINE_EARLY for the ktls_toe_chacha20 counter.
I missed updating this counter when rebasing the changes in
9c64fc4029 after the switch to
COUNTER_U64_DEFINE_EARLY in 1755b2b989.

Fixes:		9c64fc4029 Add Chacha20-Poly1305 as a KTLS cipher suite.
Sponsored by:	Netflix
2021-02-25 15:00:13 -08:00
Kristof Provost
5f1b1f184b pf: Fix incorrect fragment handling
A sequence of overlapping IPv4 fragments could crash the kernel in
pf due to an assertion.

Reported by:	Alexander Bluhm
Obtained from:	OpenBSD
MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-02-25 21:51:08 +01:00
Adrian Chadd
547739cc00 [ar71xx] Fix routerstation / routerstation pro redboot FIS probing
Some changes back in ye olde times somewhere has changed the default
block size the flash device exposes.  So, the default geom redboot
FIS probing (to find the partition table structure in flash!)
is no longer finding it.

So, force it to probe at the last 64k of flash regardless of the
underlying flash block size.

Tested:

* Ubiquiti Routerstation pro, boots -HEAD MIPS
2021-02-25 13:14:55 -08:00
Brandon Bergren
5001c579ba [PowerPC64LE] pseries: Fix input buffering logic.
In uart_phyp_get(), when the internal buffer is empty, we make a
hypercall to retrieve up to 16 bytes of input data from the
hypervisor. As this is specified to be returned in BE format, we need
to do a 64-bit byte swap on the first and second half of the data.

If the buffer being passed in was insufficient to return the fetched
data, we store the remainder in the internal buffer and use it to
satisfy the following calls to uart_phyp_get() until it is drained.

However, in this case, we were accidentally byteswapping the internal
buffer again.

Move the byteswapping code to just after the hypercall so it only gets
swapped when we're filling the buffer.

Fixes arrow keys in qemu on pseries, among other console oddities.

Sponsored by:	Tag1 Consulting, Inc.
MFC after:	3 days
2021-02-25 14:50:13 -06:00
Ryan Libby
d7671ad8d6 Close races in vm object chain traversal for unlock
We were unlocking the vm object before reading the backing_object field.
In the meantime, the object could be freed and reused.  This could cause
us to go off the rails in the object chain traversal, failing to unlock
the rest of the objects in the original chain and corrupting the lock
state of the victim chain.

Reviewed by:	bdrewery, kib, markj, vangyzen
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28926
2021-02-25 12:11:19 -08:00
Edward Tomasz Napierala
e7a5b3bd05 Modify lock_delay() to increase the delay time after spinning
Modify lock_delay() to increase the delay time after spinning,
not before.  Previously we would spin at least twice instead of once.
In NetApp's benchmarks this fixes a performance regression compared
to FreeBSD 10, which called cpu_spinwait() directly.

Reviewed By:	mjg
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D27331
2021-02-25 18:55:26 +00:00
Richard Scheffenegger
2593f858d7 A TCP server has to take into consideration, if TCP_NOOPT is preventing
the negotiation of TCP features. This affects most TCP options but
adherance to RFC7323 with the timestamp option will prevent a session
from getting established.

PR:	253576
Reviewed By:	tuexen, #transport
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28652
2021-02-25 19:12:20 +01:00
Richard Scheffenegger
31d7a27c6e PRR: Avoid accounting left-edge twice in partial ACK.
Reviewed By:	#transport, kbowling
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D28819
2021-02-25 18:37:47 +01:00
Richard Scheffenegger
48396dc779 Address two incorrect calculations and enhance readability of PRR code
- address second instance of cwnd potentially becoming zero
- fix sublte bug due to implicit int to uint typecase in max()
- fix bug due to typo in hand-coded CEILING() function by using howmany() macro
- use int instead of long, and add a missing long typecast
- replace if conditionals with easier to read imax/imin (as in pseudocode)

Reviewed By: #transport, kbowling
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28813
2021-02-25 18:32:04 +01:00
Mark Johnston
369706a6f8 buf: Fix the dirtybufthresh check
dirtybufthresh is a watermark, slightly below the high watermark for
dirty buffers.  When a delayed write is issued, the dirtying thread will
start flushing buffers if the dirtybufthresh watermark is reached.  This
helps ensure that the high watermark is not reached, otherwise
performance will degrade as clustering and other optimizations are
disabled (see buf_dirty_count_severe()).

When the buffer cache was partitioned into "domains", the dirtybufthresh
threshold checks were not updated.  Fix this.

Reported by:	Shrikanth R Kamath <kshrikanth@juniper.net>
Reviewed by:	rlibby, mckusick, kib, bdrewery
Sponsored by:	Juniper Networks, Inc., Klara, Inc.
Fixes:		3cec5c77d6
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28901
2021-02-25 10:04:44 -05:00
Mark Johnston
faa998f6ff sendfile: Use the pager size to determine the file extent when possible
Previously sendfile would issue a VOP_GETATTR and use the returned size,
i.e., the file size.  When paging in file data, sendfile_swapin() will
use the pager to determine whether it needs to zero-fill, most often
because of a hole in a sparse file.  An attempt to page in beyond the
end of a file is treated this way, and occurs when the requested page is
past the end of the pager.  In other words, both the file size and pager
size were used interchangeably.

With ZFS, updates to the pager and file sizes are not synchronized by
the exclusive vnode lock, at least partially due to its use of
MNTK_SHARED_WRITES.  In particular, the pager size is updated after the
file size, so in the presence of a writer concurrently extending the
file, sendfile could incorrectly instantiate "holes" in the page cache
pages backing the file, which manifests as data corruption when reading
the file back from the page cache.  The on-disk copy is unaffected.

Fix this by consistently using the pager size when available.

Reported by:	dumbbell
Reviewed by:	chs, kib
Tested by:	dumbbell, pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28811
2021-02-25 10:04:44 -05:00
Andrew Turner
3fd63ddfdf Limit when we call DELAY from KCSAN on amd64
In some cases the DELAY implementation on amd64 can recurse on a spin
mutex in the i8254 early delay code. Detect when this is going to
happen and don't call delay in this case. It is safe to not delay here
with the only issue being KCSAN may not detect data races.

Reviewed by:	kib
Tested by:	arichardson
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D28895
2021-02-25 12:38:05 +00:00
Andrew Turner
59f6ddb2bc Use pmap_qenter in the N1SDP PCIe driver
In the Neoverse N1 SDP PCIe driver we need to map a page shared
between the firmware and the kernel. Previously we would use
pmap_kenter for this, however as this is not standardised between
architectures switch to the common pmap_qenter.

While here fix the error handling code to clean up on failure.

Reviewed by:	br
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D28890
2021-02-25 12:38:05 +00:00
Kristof Provost
f5537cd069 bridgestp: Ensure we send STP on VLAN interfaces
Reviewed by:	donner@
MFC after:	1 week
X-MFC-with:	711ed156b9
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D28916
2021-02-25 10:16:25 +01:00
Kristof Provost
f3245be349 net: remove legacy in_addmulti()
Despite the comment to the contrary neither pf nor carp use
in_addmulti(). Nothing does, so get rid of it.

Carp stopped using it in 08b68b0e4c
(2011). It's unclear when pf stopped using it, but before
d6d3f01e0a (2012).

Reviewed by:	bz@, melifaro@
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D28918
2021-02-25 10:13:52 +01:00
Jamie Gritton
c861373bdf jail: re-commit 811e27fa3c with fixes
Make sure PD_KILL isn't passed to do_jail_attach, where it might end
up trying to kill the caller's prison (even prison0).

Fix the child jail loop in prison_deref_kill, which was doing the
post-order part during the pre-order part.  That's not a system-
killer, but make jails not always die correctly.
2021-02-24 21:54:49 -08:00
Jamie Gritton
ddfffb41a2 jail: back out 811e27fa3c until it doesn't break Jenkins
Reported by:	arichardson
2021-02-24 21:10:47 -08:00
Marcin Wojtas
ef567155d3 Fix powerpc build after 6dd69f0064
Commit 6dd69f0064 ("iflib: introduce isc_dma_width")
failed to build on powerpc due to implicit type conversion
error. Fix that.

Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
2021-02-25 02:35:41 +01:00
Ryan Libby
bf667f282a ofed: quiet gcc -Wint-in-bool-context
The int in the argument to the ternary triggered -Wint-in-bool-context
from gcc.  Upstream linux has a larger and more entangled patch,
12f727721eee61b3d19dedb95cb893b2baa9fe41, which doesn't apply cleanly.
When we eventually sync that, we can just drop this change.

Reviewed by:	hselasky, imp, kib
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28762
2021-02-24 15:56:16 -08:00
Ryan Libby
d8404b7ec3 ddb: just move cursor when the lexer backs up
Get rid of db_look_char because it's not compatible with db_get_line().
This fixes the following issue:

db> script lockinfo=show alllocks
db> run lockinfo
db:0:lockinfo> how alllocks
No such command; use "help" to list available commands

Reported by:	markj
Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28725
2021-02-24 15:56:16 -08:00
Ryan Libby
d85c9cef13 ddb: reliably fail with ambiguous commands
db_cmd_match had an even/odd bug, where if a third command was partially
matched (or any odd number greater than one) the search result would be
set back from CMD_AMBIGUOUS to CMD_FOUND, causing the last command in
the list to be executed instead of failing the match.

Reported by:	mlaier
Reviewed by:	markj, mlaier, vangyzen
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28659
2021-02-24 15:56:16 -08:00
Max Laier
14b5a3c7d5 vm pqbatch: move unmanaged page assert under pagequeue lock
This KASSERT is overzealous because of the following race condition:
 1) A managed page which is currently in PQ_LAUNDRY is freed.
    vm_page_free_prep calls vm_page_dequeue_deferred()

    The page state is:
       PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED

 2) The laundry worker comes around and pick up the page and calls
    vm_pageout_defer(m, PQ_LAUNDRY, true) to check if page is still in the
    queue.  We do a vm_page_astate_load and get
       PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED
    as per above.

 3) The laundry worker is pre-empted and another thread allocates our page
    from the free pool.  For example vm_page_alloc_domain_after calls
    vm_page_dequeue() and sets VPO_UNMANAGED because we are allocating for
    an OBJT_UNMANAGED object.

    The page state is:
       PQ_NONE, 0 - VPO_UNMANAGED

 4) The laundry worker resumes, and processes vm_pageout_defer based on the
    stale astate which leads to a call to vm_page_pqbatch_submit, which will
    trip on the KASSERT.

Submitted by:	mlaier
Reviewed by:	markj, rlibby
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28563
2021-02-24 15:56:16 -08:00
Marcin Wojtas
6dd69f0064 iflib: introduce isc_dma_width
Some DMA controllers are unable to address the full host memory space
and are instead limited to a subset of address range (e.g. 48-bit).

Allow the driver to specify the maximum allowed DMA addressing width
(in bits) for the NIC hardware, by introducing a new field in
if_softc_ctx.

If said field is omitted (set to 0), the lowaddr of DMA window bounds
defaults to BUS_SPACE_MAXADDR.

Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D28706
2021-02-25 00:25:39 +01:00
Alexander V. Chernikov
cc3fa1e29f Fix crash with rtadv-originated multipath IPv6 routes.
PR:		253800
Reported by:	Frederic Denis <freebsdml at hecian.net>
MFC after:	immediately
2021-02-24 16:44:10 +00:00
Konstantin Belousov
e2494f7561 atomic: add atomic_interrupt_fence()
with the semantic following C11 signal_fence, that is, it establishes
ordering between its place and any interrupt handler executing on the
same CPU.

Reviewed by:	markj, mjg, rlibby
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28909
2021-02-24 22:45:24 +02:00
Brett Mastbergen
43d4dfac96 pwm_backlight: Add MODULE_DEPEND on backlight
Make the pwm_backlight module depend on backlight, so it
has access to the backlight interface symbols.  Otherwise you'll
get an error like:

link_elf: symbol backlight_get_info_desc undefined

Signed-off-by: Brett Mastbergen <brett.mastbergen@gmail.com>
MFC after:	3 days
PR: 		253765
2021-02-24 17:56:26 +01:00
Mark Johnston
b6999635b1 iflib: Avoid double counting in rxeof
iflib_rxeof() was counting everything twice.  This was introduced when
pfil hooks were added to the iflib receive path.  We want to count rx
packets/bytes before the pfil hooks are executed, so remove the counter
adjustments that are executed after.

PR:		253583
Reviewed by:	gallatin, erj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28900
2021-02-24 10:08:53 -05:00
Konstantin Belousov
6f30ac9995 Call softdep_prealloc() before taking ffs_lock_ea(), if unlock is committing
softdep_prealloc() must be called to ensure enough journal space is
available, before ffs_extwrite(). Also it must be done before taking
ffs_lock_ea(), because it calls ffs_syncvnode(), potentially dropping
the vnode lock.

Reviewed by:	mckusick
Tested by:	pho
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
2021-02-24 09:55:21 +02:00
Konstantin Belousov
5e198e7646 ffs_close_ea: do not relock vnode under lock_ea
ffs_lock_ea is after the vnode lock, so vnode must not be relocked under
lock_ea. Move ffs_truncate() call in ffs_close_ea() after the lock_ea is
dropped, and only truncate to length zero, since this is the only mode
supported by ffs_truncate() for EAs. Previously code did truncation and
then write.

Zero the part of the ext area that is unused, if truncation is due but not
done because ea area is not zero-length.

Reviewed by:	mckusick
Tested by:	pho
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
2021-02-24 09:55:04 +02:00
Konstantin Belousov
c6d68ca842 ffs_vnops.c: style
Use local var to shorten ap->a_vp expression.

Reviewed by:	mckusick
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-02-24 09:54:53 +02:00
Konstantin Belousov
4983146279 ffs: do not call softdep_prealloc() from UFS_BALLOC()
Do it in ffs_write(), where we can gracefuly handle relock and its
consequences. In particular, recheck the v_data to see if the vnode
reclamation ended, and return EBADF when we cannot proceed with the
write.

Reviewed by:	mckusick
Reported by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-02-24 09:54:50 +02:00
Konstantin Belousov
cc9958bf22 ffs_reallocblks: change the guard for softdep_prealloc() call to DOINGSUJ()
instead of DOINGSOFTDEP().  The softdep_prealloc() function does nothing
in SU case.

Note that the call should be safe with regard to the vnode relock,
because it is called with MNT_NOWAIT, which does not descend into fsync.

Reviewed by:	mckusick
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-02-24 09:54:30 +02:00
Mark Johnston
1d44514fcd rmlock: Add a required compiler membar to the rlock slow path
The tracker flags need to be loaded only after the tracker is removed
from its per-CPU queue.  Otherwise, readers may fail to synchronize with
pending writers attempting to propagate priority to active readers, and
readers and writers deadlock on each other.  This was observed in a
stable/12-based armv7 kernel where the compiler had reordered the load
of rmp_flags to before the stores updating the queue.

Reviewed by:	rlibby, scottl
Discussed with:	kib
Sponsored by:	Rubicon Communications, LLC ("Netgate")
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28821
2021-02-23 21:17:12 -05:00
Allan Jude
6d67af5f8e Revert "ipmi_smbios: Deduplicate smbios entry point discovery logic"
This depends on another commit that has not landed yet, and broke the build

This reverts commit ba6e37e47f.
2021-02-23 22:49:13 +00:00
Allan Jude
4a5dfded17 Revert "ipmi_smbios: remove unused smbios_cksum function"
This reverts commit d2589dc3d5.
2021-02-23 22:48:59 +00:00
Alexander V. Chernikov
9c4a8d24f0 Fix nd6 rib_action() handling.
rib_action() guarantees valid rc filling IFF it returns without error.
Check rib_action() return code instead of checking rc fields.

PR:		253800
Reported by:	Frederic Denis <freebsdml@hecian.net>
MFC after:	immediately
2021-02-23 22:40:01 +00:00
Vladimir Kondratyev
bbacb7ce72 ig4: Add PCI IDs for Intel Gemini Lake I2C controller.
Submitted by:	Dmitry Luhtionov
MFC after:	2 weeks
2021-02-24 01:23:43 +03:00
Allan Jude
d2589dc3d5 ipmi_smbios: remove unused smbios_cksum function
Sponsored By:	Ampere Computing LLC
Submitted By:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D28751
2021-02-23 21:24:47 +00:00
Allan Jude
ba6e37e47f ipmi_smbios: Deduplicate smbios entry point discovery logic
Sponsored by:	Ampere Computing LLC
Submitted by:	Klara Inc.
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D28743
2021-02-23 21:17:37 +00:00
Allan Jude
d0673fe160 smbios: Move smbios driver out from x86 machdep code
Add it to the x86 GENERIC and MINIMAL kernels

Sponsored by:	Ampere Computing LLC
Submitted by:	Klara Inc.
Reviewed by:	rpokala
Differential Revision:	https://reviews.freebsd.org/D28738
2021-02-23 21:17:09 +00:00
Allan Jude
11ba8488b8 iicsmb: Request the bus recursively in bread()
ipmi_ssif will `smbus_request_bus()` to do multiple smbus requests
(which requests the iicbus), and then here in `bread()` we also need to
request the bus because `bread()` takes multiple transactions.
This causes deadlock as it's waiting for the bus it already has without
`IIC_RECURSIVE`.

Sponsored by:	Ampere Computing LLC
Submitted by:	Klara Inc.
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D28742
2021-02-23 20:06:16 +00:00
Konstantin Belousov
3ae8d83d04 Remove __NO_TLS.
All supported platforms support thread-local vars and __thread.

Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28796
2021-02-23 20:08:10 +02:00
Alex Richardson
fa32350347 close_range: add audit support
This fixes the closefrom test in sys/audit.

Includes cherry-picks of the following commits from openbsm:

4dfc628aaf
99ff6fe32a
da48a0399e

Reviewed By:	kevans
Differential Revision: https://reviews.freebsd.org/D28388
2021-02-23 17:47:07 +00:00
Alexander Motin
7d4c444374 Bump CTL block backend threads from 14 to 32 per LUN.
This makes random read benchmarks look better on a wide ZFS pools.
I am not sure where the original value goes from, but it is there
for too long now.

MFC after:	1 week
2021-02-23 11:03:32 -05:00