134778 Commits

Author SHA1 Message Date
Navdeep Parhar
76b976ad98 cxgbe(4): Add the firmware binaries missing in r367428.
Obtained from:	Chelsio Communications
MFC after:	5 days
Sponsored by:	Chelsio Communications
2020-11-08 22:30:13 +00:00
Mitchell Horne
4a3fc6e22e Fix definition of rn_addmask()
Add the missing static keyword present in the declaration.

Reviewed by:	melifaro
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D27024
2020-11-08 19:02:22 +00:00
Mitchell Horne
b02c4e5c78 igmp: convert igmpstat to use PCPU counters
Currently there is no locking done to protect this structure. It is
likely okay due to the low-volume nature of IGMP, but allows for
the possibility of underflow. This appears to be one of the only
holdouts of the conversion to counter(9) which was done for most
protocol stat structures around 2013.
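
A minimal userland sketch of the per-CPU counter idea, not the counter(9)
implementation itself. The names pcpu_counter, pcpu_counter_add,
pcpu_counter_fetch and the fixed NCPU are hypothetical. Each CPU updates only
its own slot, so updates need no lock, and a reader sums all slots.

```c
#include <stdint.h>

#define NCPU 4	/* hypothetical CPU count for this sketch */

/* One slot per CPU; a CPU only ever writes its own slot. */
struct pcpu_counter {
	uint64_t slot[NCPU];
};

/* Add a (possibly negative) delta to the slot of the given CPU. */
static void
pcpu_counter_add(struct pcpu_counter *c, unsigned cpu, int64_t delta)
{
	c->slot[cpu % NCPU] += (uint64_t)delta;
}

/* Sum all slots; unsigned wraparound makes negative deltas cancel out. */
static uint64_t
pcpu_counter_fetch(const struct pcpu_counter *c)
{
	uint64_t sum = 0;

	for (unsigned i = 0; i < NCPU; i++)
		sum += c->slot[i];
	return (sum);
}
```

An individual slot may wrap when a decrement lands on a different CPU than the
matching increment, but the summed value is still correct.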

This also updates the visibility of this stats structure so that it can
be consumed from elsewhere in the kernel, consistent with the vast
majority of VNET_PCPUSTAT structures.

Reviewed by:	kp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D27023
2020-11-08 18:49:23 +00:00
Richard Scheffenegger
4d0770f172 Prevent premature SACK block transmission during loss recovery
Under specific conditions, a window update can be sent with
outdated SACK information. Some clients react to this by
subsequently delaying loss recovery, making TCP perform very
poorly.

Reported by:	chengc_netapp.com
Reviewed by:	rrs, jtl
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D24237
2020-11-08 18:47:05 +00:00
Alexander V. Chernikov
2d39824195 Switch net.add_addr_allfibs default to 0.
The goal of the fib support is to provide multiple independent
 routing tables, isolated from each other.
net.add_addr_allfibs default tries to shift gears in the opposite
 direction, unconditionally inserting all addresses to all of the fibs.

There are use cases when this is necessary, however this is not a
 default expected behaviour, especially compared to other implementations.

Provide WARNING message for the setups with multiple fibs to notify
 potential users of the feature.

Differential Revision:	https://reviews.freebsd.org/D26076
2020-11-08 18:27:49 +00:00
Alexander V. Chernikov
76e6b37f6b Temporarily revert setting net.add_addr_allfibs to 0.
It was accidentally swept in with r367486.
Revert to allow for proper commit message & warning.
2020-11-08 18:11:12 +00:00
Edward Tomasz Napierala
a1bd83fede Move syscall_thread_{enter,exit}() into the slow path. This is only
needed for syscalls from unloadable modules.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D26988
2020-11-08 15:54:59 +00:00
Mariusz Zaborski
36d6566e59 Check if the ZVOL has been written before calling zil_async_to_sync.
The ZIL will be opened on the first write, not earlier.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
OpenZFS Pull Request: https://github.com/openzfs/zfs/pull/11152
PR:		250934
2020-11-08 14:08:00 +00:00
Alexander V. Chernikov
770495f4c0 Fix build broken by r367484: add route_ifaddrs.c.
Pointy hat to: melifaro
Reported by:	jenkins
2020-11-08 13:30:44 +00:00
Alexander V. Chernikov
bad6b23606 Move all ifaddr route creation business logic to net/route/route_ifaddr.c
Differential Revision:	https://reviews.freebsd.org/D26318
2020-11-08 11:12:00 +00:00
Alexander Leidinger
8ec6c4a38b - add more linux socket options (sorted by value)
- map those IPv4 / IPv6 socket options which exist in FreeBSD
   + most of them visually verified to have the same type/layout of arguments
   + not tested with linux programs to behave as intended
- be more human readable for known options which are not handled
- be more verbose for unhandled socket message flags we know about
- print the jail ID in linux_msg if run in a jail
- add possibility to print debug message about known missing parts only once
- add multiple levels of sysctl linux.debug:
   1: print debug messages, tell about unimplemented stuff (only once)
   2: like 1, but also print messages about implemented but not tested
      stuff (only once)
   3+: like 2, but no rate limiting of messages
- increase default linux debug level from 1 to 3

We are a lot more verbose than we need to be (e.g. some of the IP socket
options which are the same, and share the same memory layout, and are
believed to work). The reason is that we have no good testsuite to test those
linux-bits. The LTP or other test suites, like the python one, are not fully
up to the task. Hence the excessive messages about emulated but not
tested socket options.

IMO any MFC (possible, but most probably not by me) should set the default
debug level to 1.

Discussed with:	trasz
2020-11-08 09:50:58 +00:00
Kyle Evans
8c28aa5e45 imgact_binmisc: limit the extent of match on incoming entries
imgact_binmisc matches magic/mask from imgp->image_header, which is only a
single page in size mapped from the first page of an image. One can specify
an interpreter that matches on, e.g., --offset 4096 --size 256 to read up to
256 bytes past the mapped first page.

The limitation is that we cannot specify a magic string that exceeds a
single page, and we can't allow offset + size to exceed a single page
either.  A static assert has been added in case someone finds it useful to
try and expand the size, but it does seem a little unlikely.
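
A minimal sketch of the kind of bounds check described above, assuming a
4096-byte page. SKETCH_PAGE_SIZE and match_fits_in_first_page are hypothetical
names, not the identifiers used in the commit. The comparison is arranged so
that offset + size cannot overflow.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* True if [offset, offset + size) lies entirely within the first page. */
static bool
match_fits_in_first_page(size_t offset, size_t size)
{
	if (size > SKETCH_PAGE_SIZE)
		return (false);
	/* Subtract instead of adding so a huge offset cannot wrap around. */
	return (offset <= SKETCH_PAGE_SIZE - size);
}
```

With this check, the --offset 4096 --size 256 example from the message is
rejected, while --offset 3840 --size 256 is still accepted.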

While this looks kind of exploitable at a sideways squinty-glance, there are
a couple of mitigating factors:

1.) imgact_binmisc is not enabled by default,
2.) entries may only be added by the superuser,
3.) trying to exploit this information to read what's mapped past the end
  would be worse than a root canal or some other relatably painful
  experience, and
4.) there's no way one could pull this off without it being completely
  obvious.

The first page is mapped out of an sf_buf, the implementation of which (or
lack thereof) depends on your platform.

MFC after:	1 week
2020-11-08 04:24:29 +00:00
Michael Tuexen
f908d8247e The ioctl() calls using FIONREAD, FIONWRITE, FIONSPACE, and SIOCATMARK
access the socket send or receive buffer. This is not possible for
listening sockets since r319722.
Because send()/recv() calls fail on listening sockets, fail also ioctl()
indicating EINVAL.

PR:			250366
Reported by:		Yong-Hao Zou
Reviewed by:		glebius, rscheff
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D26897
2020-11-07 21:17:49 +00:00
Kyle Evans
1024ef27fe imgact_binmisc: move some calculations out of the exec path
The offset we need to account for in the interpreter string comes in two
variants:

1. Fixed - macros other than #a that will not vary from invocation to
   invocation
2. Variable - #a, which is substituted with the argv0 that we're replacing

Note that we don't have a mechanism to modify an existing entry.  By
recording both of these offset requirements when the interpreter is added,
we can avoid some unnecessary calculations in the exec path.
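
A rough sketch of recording the macro information once, when an entry is
added, rather than on every exec. The struct and function names are
hypothetical, and treating any '#' followed by another character as a
two-character macro (so that "##" is skipped as a unit) is an assumption of
this sketch, not a statement about the real parser.

```c
#include <stddef.h>

/* Hypothetical entry: interpreter string plus facts computed at add time. */
struct sketch_entry {
	const char *interp;
	size_t argv0_count;	/* number of "#a" macros found */
};

/* Walk the string once at add time and remember how many #a it has. */
static void
sketch_entry_init(struct sketch_entry *e, const char *interp)
{
	e->interp = interp;
	e->argv0_count = 0;
	for (const char *p = interp; *p != '\0'; p++) {
		if (p[0] == '#' && p[1] != '\0') {
			if (p[1] == 'a')
				e->argv0_count++;
			p++;	/* skip the macro character as well */
		}
	}
}
```

The exec path can then test argv0_count against zero and skip the filename
work entirely for entries without #a.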

Most importantly, we can know up-front whether we need to
calculate/grab the filename for this interpreter. We also get to avoid
walking the string a first time looking for macros. For most invocations,
it's a swift exit as they won't have any, but there's no point entering a
loop and searching for the macro indicator if we already know there will not
be one.

While we're here, go ahead and only calculate the argv0 name length once per
invocation. While it's unlikely that we'll have more than one #a, there's no
reason to recalculate it every time we encounter an #a when it will not
change.

I have not bothered trying to benchmark this at all, because it's arguably a
minor and straightforward/obvious improvement.

MFC after:	1 week
2020-11-07 18:07:55 +00:00
Mateusz Guzik
ff19fd6242 zfs: remove 2 assertions that teardown lock is not held
They are not very useful and hard to implement with rms.

This has a side effect of simplifying the code.
2020-11-07 16:58:38 +00:00
Mateusz Guzik
42e7abd5db rms: several cleanups + debug read lockers handling
This adds a dedicated counter updated with atomics when INVARIANTS
is used. As a side effect one can reliably determine the lock is held
for reading by at least one thread, but it's still not possible to
find out whether curthread has the lock in said mode.

This should be good enough in practice.

Problem spotted by avg.
2020-11-07 16:57:53 +00:00
Kyle Evans
ecb4fdf943 imgact_binmisc: reorder members of struct imgact_binmisc_entry (NFC)
This doesn't change anything at the moment since the out-of-order elements
were a pair of uint32_t, but future additions may have caused unnecessary
padding by following the existing precedent.
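
A small illustration of the padding concern, using two hypothetical structs
that are not the real imgact_binmisc_entry. Placing a 64-bit member between
two 32-bit members typically forces padding on 64-bit ABIs, while grouping the
32-bit members avoids it.

```c
#include <stdint.h>

/* 32-bit members split by a 64-bit one: padding is likely on LP64. */
struct scattered {
	uint32_t a;
	uint64_t big;
	uint32_t b;
};

/* Same members with the two 32-bit fields adjacent. */
struct grouped {
	uint64_t big;
	uint32_t a;
	uint32_t b;
};
```

Comparing sizeof(struct scattered) with sizeof(struct grouped) on the target
ABI shows whether the ordering matters there; the exact sizes are
ABI-dependent.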

MFC after:	1 week
2020-11-07 16:41:59 +00:00
Kyle Evans
e0f14ecf60 vt: resolve conflict between VT_ALT_TO_ESC_HACK and DBG
When using the ALT+CTRL+ESC sequence to break into kdb, the keyboard is
completely borked when you return. watch(8) shows that it's working, but
it's inserting escape sequences.

Further investigation revealed that VT_ALT_TO_ESC_HACK is the default and
directly conflicts with this sequence, so upon return from the debugger
ALKED is set.

If they triggered the break to debugger, it's safe to assume they didn't
mean to use VT_ALT_TO_ESC_HACK, so just unset it to reduce the surprise when
the keyboard seems non-functional upon return.

Reviewed by:	tsoome
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27109
2020-11-07 15:38:01 +00:00
Michal Meloun
eb20867f52 Add a method to determine whether given interrupt is per CPU or not.
MFC after:	2 weeks
2020-11-07 14:58:01 +00:00
Edward Tomasz Napierala
da45ea6bc6 Move TDB_USERWR check under 'if (traced)'.
If we hadn't been traced in the first place when syscallenter()
started executing, we can ignore TDB_USERWR.  TDB_USERWR can get set,
sure, but if it does, it's because the debugger raced with the syscall,
and it cannot depend on winning that race.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D26585
2020-11-07 13:09:51 +00:00
Kyle Evans
2192cd125f imgact_binmisc: abstract away the list lock (NFC)
This module handles relatively few execs (initial qemu-user-static, then
qemu-user-static handles exec'ing itself for binaries it's already running),
but all execs pay the price of at least taking the relatively expensive
sx/slock to check for a match when this module is loaded. Future work will
almost certainly swap this out for another lock, perhaps an rmslock.

The RLOCK/WLOCK phrasing was chosen based on what the callers are really
wanting, rather than using the verbiage typically appropriate for an sx.

MFC after:	1 week
2020-11-07 05:10:46 +00:00
Kyle Evans
7d3ed9777a imgact_binmisc: validate flags coming from userland
We may want to reserve bits in the future for kernel-only use, so start
rejecting any that aren't the two that we're currently expecting from
userland.
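
A sketch of rejecting unknown flag bits, assuming two user-visible flags. The
SK_* names and values and check_user_flags are hypothetical stand-ins for
whatever the module actually defines.

```c
#include <errno.h>
#include <stdint.h>

#define SK_FLAG_ENABLED		0x0001u	/* hypothetical user flag */
#define SK_FLAG_USE_MASK	0x0002u	/* hypothetical user flag */
#define SK_USER_MASK		(SK_FLAG_ENABLED | SK_FLAG_USE_MASK)

/* Return 0 if only known user flags are set, EINVAL otherwise. */
static int
check_user_flags(uint32_t flags)
{
	if ((flags & ~SK_USER_MASK) != 0)
		return (EINVAL);
	return (0);
}
```

Any bit outside SK_USER_MASK stays free for later kernel-only use because
userland can never have set it.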

MFC after:	1 week
2020-11-07 04:10:23 +00:00
Kyle Evans
7667824ade epoch: support non-preemptible epochs checking in_epoch()
Previously, non-preemptible epochs could not check; in_epoch() would always
fail, usually because non-preemptible epochs don't imply THREAD_NO_SLEEPING.

For default epochs, it's easy enough to verify that we're in the given
epoch: if we're in a critical section and our record for the given epoch
is active, then we're in it.

This patch also adds some additional INVARIANTS bookkeeping. Notably, we set
and check the recorded thread in epoch_enter/epoch_exit to try and catch
some edge-cases for the caller. It also checks upon freeing that none of the
records had a thread in the epoch, which may make it a little easier to
diagnose some improper use if epoch_free() took place while some other
thread was inside.

This version differs slightly from what was just previously reviewed by the
below-listed, in that in_epoch() will assert that no CPU has this thread
recorded even if it *is* currently in a critical section. This is intended
to catch cases where the caller might have somehow messed up critical
section nesting, we can catch both if they exited the critical section or if
they exited, migrated, then re-entered (on the wrong CPU).

Reviewed by:	kib, markj (both previous version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27098
2020-11-07 03:29:04 +00:00
Kyle Evans
80083216cb imgact_binmisc: minor re-organization of imgact_binmisc_exec exits
Notably, streamline error paths through the existing 'done' label, making it
easier to quickly verify correct cleanup.

Future work might add a kernel-only flag to indicate that an interpreter uses
#a. Currently, all executions via imgact_binmisc pay the penalty of
constructing sname/fname, even if they will not use it. qemu-user-static
doesn't need it, the stock rc script for qemu-user-static certainly doesn't
use it, and I suspect these are the vast majority of (if not the only)
current users.

MFC after:	1 week
2020-11-07 03:28:32 +00:00
Mateusz Guzik
e25d8b67c3 malloc: tweak the version check in r367432 to include type name
While here fix a whitespace problem.
2020-11-07 01:32:16 +00:00
Bjoern A. Zeeb
2144eb7568 usb_hub: giving up port reset - device vanished
Improve the output of the recently often experienced debug message in order
to gather further data.

PR:		237666
Reviewed by:	hselasky
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D27108
2020-11-06 22:40:00 +00:00
Conrad Meyer
76b2bfeda4 linux(4): Fix loadable modules after r367395
Move dtrace SDT definitions into linux_common module code.  Also, build
linux_dummy.c into the linux_common kld -- we don't need separate
versions of these stubs for 32- and 64-bit emulation.

Reported by:	several
PR:		250897
Discussed with:	emaste, trasz
Tested by:	John Kennedy, Yasuhiro KIMURA, Oleg Sidorkin
X-MFC-With:	r367395
Differential Revision:	https://reviews.freebsd.org/D27124
2020-11-06 22:04:57 +00:00
Mateusz Guzik
bdcc222644 malloc: move malloc_type_internal into malloc_type
According to code comments the original motivation was to allow for
malloc_type_internal changes without ABI breakage. This can be trivially
accomplished by providing spare fields and versioning the struct, as
implemented in the patch below.

The upshots are one less memory indirection on each alloc and disappearance
of mt_zone.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27104
2020-11-06 21:33:59 +00:00
Toomas Soome
0244378f43 efifb: vt_generate_cons_palette() takes max color, not mask
vt_generate_cons_palette() does take max values of RGB component colours, not
mask. Also we need to set info->fb_cmsize, or vt_fb_init() will re-initialize
the info->fb_cmap.
2020-11-06 21:27:54 +00:00
Edward Tomasz Napierala
096068b976 Make powerpc use MAXARGS (defined as 8) instead of hardcoding '10'.
This brings its 'struct syscall_args' in sync with other architectures.

Reviewed by:	bdragon, jhibbits
MFC after:	2 weeks
Sponsored by:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D26605
2020-11-06 19:27:27 +00:00
Edward Tomasz Napierala
24adaab477 Remove 'struct trapframe' pointer from mips64's 'struct syscall_args'.
While here, use MAXARGS.  This brings its 'struct syscall_args' in sync
with most other architectures.

Reviewed by:	arichardson, brooks
MFC after:	2 weeks
Sponsored by:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D26619
2020-11-06 19:19:51 +00:00
Navdeep Parhar
890efa1ab9 cxgbe(4): Update firmwares to 1.25.0.40.
This fixes a potential crash in firmware 1.25.0.0 on the passive open
side during TOE operation.

Obtained from:	Chelsio Communications
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-11-06 19:04:20 +00:00
Leandro Lupori
8b2133d4e1 Fix powerpc and LINT builds
Fix build errors introduced by r367417 and r367390:

- Guard label reached only by powerpc64
- Guard vm_reserv_level_iffullpop call, that is not defined on powerpc
  variants that don't support superpages
- Add missing hwpmc file, for when hwpmc is built into kernel
2020-11-06 18:50:00 +00:00
John Baldwin
3acf4d2374 Use void * in place of caddr_t.
Reviewed by:	markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27065
2020-11-06 18:09:52 +00:00
John Baldwin
c423784dc5 Group session management routines together before first use.
- Rename cse*() to cse_*() to more closely match other local APIs in
  this file.

- Merge the old csecreate() into cryptodev_create_session() and rename
  the new function to cse_create().

Reviewed by:	markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27070
2020-11-06 18:05:29 +00:00
Mark Johnston
acb5785aae Add firmware modules for qat(4), take two
My script to convert git commits to svn patch does not handle binary
files correctly, and r367387 committed a set of empty files as a result.

MFC with:	r367387
Sponsored by:	Rubicon Communications, LLC (Netgate)
2020-11-06 16:12:06 +00:00
Leandro Lupori
e2d6c417e3 Implement superpages for PowerPC64 (HPT)
This change adds support for transparent superpages for PowerPC64
systems using Hashed Page Tables (HPT). All pmap operations are
supported.

The changes were inspired by RISC-V implementation of superpages,
by @markj (r344106), but heavily adapted to fit PPC64 HPT architecture
and existing MMU OEA64 code.

While these changes are not better tested, superpages support is disabled by
default. To enable it, use vm.pmap.superpages_enabled=1.

In this initial implementation, when superpages are disabled, system
performance stays at the same level as without these changes. When
superpages are enabled, buildworld time increases a bit (~2%). However,
for workloads that put a heavy pressure on the TLB the performance boost
is much bigger (see HPC Challenge and pgbench on D25237).

Reviewed by:	jhibbits
Sponsored by:	Eldorado Research Institute (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D25237
2020-11-06 14:12:45 +00:00
Alfredo Dal'Ava Junior
5d0e861910 [POWERPC] Floating-Point Exception trap support
Add support for Floating-Point Exception traps on 32 and 64 bit platforms.
Also make sure to clean FPSCR on EXEC and thread exit

Author of initial version: Renato Riolino <renato.riolino@eldorad.org.br>

Reviewed by:	jhibbits
Sponsored by:	Eldorado Research Institute (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D23623
2020-11-06 13:34:30 +00:00
John Baldwin
f5074add75 Move cryptof_ioctl() below the routines it calls.
Reviewed by:	markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27069
2020-11-06 00:15:52 +00:00
John Baldwin
b19d4c075f Split logic to create new sessions into a separate function.
This simplifies cryptof_ioctl as it is now a wrapper around functions that
contain the bulk of the per-ioctl logic.

Reviewed by:	markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27068
2020-11-06 00:10:58 +00:00
John Baldwin
c54004c6a9 Move cryptodev_cb earlier before it is used.
This is consistent with cryptodevkey_cb being defined before it is used
and removes a prototype in the middle of the file.

Reviewed by:	markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27067
2020-11-05 23:42:36 +00:00
John Baldwin
195105254f Check cipher key lengths during probesession.
OCF drivers in general should perform as many session parameter checks
as possible during probesession rather than when creating a new
session.  I got this wrong for aesni(4) in r359374.  In addition,
aesni(4) was performing the check for digest-only requests and failing
to create digest-only sessions as a result.

Reported by:	jkim
Tested by:	jkim
Sponsored by:	Chelsio Communications
2020-11-05 23:31:58 +00:00
John Baldwin
5973f4922d Style fixes for function prototypes and definitions.
Reviewed by:	markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27066
2020-11-05 23:28:05 +00:00
John Baldwin
84fea065db Don't modify the destination pointer in ioctl requests.
This breaks the case where the original pointer was NULL but an
in-line IV was used.

Reviewed by:	markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27064
2020-11-05 23:26:02 +00:00
Mateusz Guzik
71460dfcb2 nvme: change namei_request_zone into a malloc type
Both the size (128 bytes) and ephemeral nature of allocations make it a great
fit for malloc.

A dedicated zone unnecessarily avoids sharing buckets with 128-byte objects.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D27103
2020-11-05 21:44:58 +00:00
Konstantin Belousov
f10845877e Suspend all writeable local filesystems on power suspend.
This ensures that no writes are pending in memory, either metadata or
user data, but not including dirty pages not yet converted to fs writes.

Only filesystems declared local are suspended.

Note that this does not guarantee absence of the metadata errors or
leaks if resume is not done: for instance, on UFS unlinked but opened
inodes are leaked and require fsck to gc.

Reviewed by:	markj
Discussed with:	imp
Tested by:	imp (previous version), pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D27054
2020-11-05 20:52:49 +00:00
Leandro Lupori
6a32dae2b7 Fix powerpc and powerpcspe builds
This change fixes 32-bit PowerPC builds, that r367390 broke
(shift count >= width of type).
2020-11-05 20:18:00 +00:00
Conrad Meyer
e9b13c6612 linux(4): Deduplicate unimpl/dummy syscall handlers
No functional change.

Reviewed by:	emaste, trasz
Differential Revision:	https://reviews.freebsd.org/D27099
2020-11-05 19:30:31 +00:00
Edward Tomasz Napierala
6998d5b1c8 Remove the 'nap' field from ARM's 'struct syscall_args', to bring it
in sync with (most) other architectures.  No functional changes.

Reviewed by:	manu
Tested by:	mmel
MFC after:	2 weeks
Sponsored by:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D26604
2020-11-05 18:10:03 +00:00
Leandro Lupori
68dd718256 [PowerPC] hwpmc: add support for POWER8/9 PMCs
This change adds support for POWER8 and POWER9 PMCs (bare metal and
pseries).
All PowerISA 2.07B non-random events are supported.

Implementation was based on that of PPC970.

Reviewed by:	jhibbits
Sponsored by:	Eldorado Research Institute (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D26110
2020-11-05 16:36:39 +00:00
Mateusz Guzik
16b971ed6d malloc: add a helper returning size allocated for given request
Sample usage: kernel modules can decide whether to stick to malloc or
create their own zone.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27097
2020-11-05 16:21:21 +00:00
Mark Johnston
f078c492a9 Add firmware modules for qat(4)
MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC (Netgate)
2020-11-05 16:00:30 +00:00
Mark Johnston
72143e89bb Add qat(4)
This provides an OpenCrypto driver for Intel QuickAssist devices.  The
driver was initially ported from NetBSD and comes with a few
improvements:
- support for GMAC/AES-GCM, AES-CTR and AES-XTS, and support for
  SHA/HMAC-authenticated encryption
- support for detaching the driver
- various bug fixes
- DH895X support

Discussed with:	jhb
MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D26963
2020-11-05 15:55:23 +00:00
Mateusz Guzik
2dee296a3d Rationalize per-cpu zones.
The 2 provided zones had inconsistent naming between each other
("int" and "64") and other allocator zones (which use bytes).

Follow malloc by naming them "pcpu-" + size in bytes.

This is a step towards replacing ad-hoc per-cpu zones with
general slabs.
2020-11-05 15:08:56 +00:00
Leandro Lupori
9fe896ec79 [PowerPC] Make PPC 970 PMC SPRs the standard ones
And add a _74XX suffix to 74XX SPRs.

This is a preparation for adding support to POWER8/9 PMCs, which have most
SPRs equal to 970 ones.

Reviewed by:	jhibbits
Sponsored by:	Eldorado Research Institute (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D26532
2020-11-05 14:15:50 +00:00
Mateusz Guzik
ea33cca971 poll/select: change selfd_zone into a malloc type
On a sample box vmstat -z shows:

ITEM                   SIZE  LIMIT     USED     FREE      REQ
64:                      64,      0, 1043784, 4367538,3698187229
selfd:                   64,      0,    1520,   13726,182729008

But at the same time:
vm.uma.selfd.keg.domain.1.pages: 121
vm.uma.selfd.keg.domain.0.pages: 121

Thus 242 pages got pulled even though the malloc zone would likely accommodate
the load without using extra memory.
2020-11-05 12:24:37 +00:00
Mateusz Guzik
2fbb45c601 vfs: change nt_zone into a malloc type
Elements are small in size and allocated for short periods.
2020-11-05 12:06:50 +00:00
Mateusz Guzik
f24aa01f9d tmpfs: reorder struct tmpfs_node to shrink it by 8 bytes
The reduction (232 -> 224 bytes) allows UMA to fit one more item (17 -> 18)
per slab as reported in vm.uma.TMPFS_node.keg.ipers.
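
The arithmetic behind those figures can be checked with a one-line helper,
assuming a 4096-byte slab and ignoring any slab header or alignment overhead
(both are assumptions of this sketch; the authoritative number is the ipers
sysctl quoted above). items_per_slab is a hypothetical name.

```c
/* Whole items that fit in a slab, ignoring header and alignment overhead. */
static unsigned
items_per_slab(unsigned slab_bytes, unsigned item_bytes)
{
	return (slab_bytes / item_bytes);
}
```

Under these assumptions 4096 / 232 rounds down to 17 and 4096 / 224 rounds
down to 18, matching the reported change.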
2020-11-05 11:24:45 +00:00
Andrew Turner
d3d8ca7425 Stop trying to bounce in memory allocated by bus dma
Memory allocated by bus_dmamem_alloc will take into account any alignment
requirements of the CPU it's running on. Stop trying to bounce in this case
as there is no bounce zone allocated.

Reported by:	manu, tuexen
Tested by:	manu
Sponsored by:	Innovate UK
2020-11-05 09:55:55 +00:00
Conrad Meyer
20172854ab Add sbuf streaming mode to pseudofs(9), use in linprocfs(5)
Add a pseudofs node flag 'PFS_AUTODRAIN', which automatically emits sbuf
contents to the caller when the sbuf buffer fills.  This is only
permissible if the corresponding PFS node fill function can sleep
whenever it appends to the sbuf.

linprocfs' /proc/self/maps node happens to meet this requirement.
Streaming out the file as it is composed avoids truncating the output
and also avoids preallocating a very large buffer.

Reviewed by:	markj; earlier version: emaste, kib, trasz
Differential Revision:	https://reviews.freebsd.org/D27047
2020-11-05 06:48:51 +00:00
Kyle Evans
df69035d7f imgact_binmisc: fix up some minor nits
- Removed a bunch of redundant headers
- Don't explicitly initialize to 0
- The !error check prior to setting imgp->interpreter_name is redundant, all
  error paths should and do return or go to 'done'. We have larger problems
  otherwise.
2020-11-05 04:19:48 +00:00
Mateusz Guzik
aebc96831f zfs: lz4: add optional kmem_alloc support
lz4 port from illumos to Linux added a 16KB per-CPU cache to accommodate for
the missing 16KB malloc. FreeBSD supports this size, making the extra cache
harmful as it can't share buckets.
2020-11-05 03:25:23 +00:00
Mateusz Guzik
3c50616fc1 fd: make all f_count uses go through refcount_*
2020-11-05 02:12:33 +00:00
Mateusz Guzik
d737e9eaf5 fd: hide _fdrop 0 count check behind INVARIANTS
While here use refcount_load and make sure to report the tested value.
2020-11-05 02:12:08 +00:00
Mitchell Horne
caaddb88e8 riscv: set kernel_pmap hart mask more precisely
In pmap_bootstrap(), we fill kernel_pmap->pm_active since it is
invariably active on all harts. However, this marks it as active even
for harts that don't exist in the system, which can cause issues when the
mask is passed to the SBI firmware via sbi_remote_sfence_vma().
Specifically, the SBI spec allows SBI_ERR_INVALID_PARAM to be returned
when an invalid hart is set in the mask.

The latest version of OpenSBI does not have this issue, but v0.6 does,
and this is triggering a recently added KASSERT in CI. Switch to only
setting bits in pm_active for harts that enter the system.

Reported by:	Jenkins
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27080
2020-11-05 00:52:52 +00:00
Justin Hibbits
01db2f5461 Fix UMA alignment for COP2 context structure.
UMA alignment needs to be specified as (power-of-2) - 1, not power-of-2.
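
A short illustration of why a mask of (power-of-2) - 1 is what mask-based
rounding expects. align_up is a hypothetical helper, not a UMA function.

```c
#include <stdint.h>

/* Round addr up using an alignment mask, i.e. (power-of-2) - 1. */
static uintptr_t
align_up(uintptr_t addr, uintptr_t align_mask)
{
	return ((addr + align_mask) & ~align_mask);
}
```

Passing 63 rounds up to a 64-byte boundary. Passing 64 instead clears only a
single bit, so the result is generally not 64-byte aligned.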

Discussed with:	gonzo
MFC after:	3 days
2020-11-04 23:29:27 +00:00
Mateusz Guzik
331c21dd5e pipe: whitespace nit in previous
2020-11-04 23:17:41 +00:00
Mateusz Guzik
c22ba7bb06 pipe: fix POLLHUP handling if no events were specified
Linux allows polling without any events specified and it happens to be the case
in FreeBSD as well. POLLHUP has to be delivered regardless of the event mask
and this works fine if the condition is already present. However, if it is
missing, selrecord is only called if the eventmask has relevant bits set. This
in particular leads to a condition where pipe_poll can return 0 events and
neglect to selrecord, while kern_poll takes it as an indication it has to go to
sleep, but then there is nobody to wake it up.
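
A userland sketch of polling a pipe with an empty event mask. It covers only
the easy case in which the hangup is already present when poll(2) is called,
which the message above says works; the bug concerns a hangup that arrives
while the poller sleeps. poll_closed_pipe_no_events is a hypothetical name.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Close the write end of a pipe, then poll the read end with events == 0.
 * Returns the revents reported by poll(2), or -1 on error.
 */
static int
poll_closed_pipe_no_events(void)
{
	int fds[2];
	struct pollfd pfd;
	int rv;

	if (pipe(fds) != 0)
		return (-1);
	close(fds[1]);		/* hangup condition is now present */

	pfd.fd = fds[0];
	pfd.events = 0;		/* no events requested */
	pfd.revents = 0;
	rv = poll(&pfd, 1, 1000);	/* bounded wait, in milliseconds */
	close(fds[0]);
	if (rv < 0)
		return (-1);
	return (pfd.revents);
}
```

POLLHUP is expected in the returned mask even though no events were
requested, since it is reported regardless of the events field.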

While the problem seems systemic to *_poll handlers the least we can do is fix
it up for pipes.

Reported by:	Jeremie Galarneau <jeremie.galarneau at efficios.com>
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D27094
2020-11-04 23:11:54 +00:00
Vladimir Kondratyev
07030f3362 atkbdc(4): Add quirk for "System76 lemur Pro" laptops.
Currently atkbdc(4) assumes all coreboot BIOSes belong to Chromebooks
and unconditionally sets a number of quirks to work around known issues.

Exclude "System76" laptops from this set as they appear to be
traditional hardware ("lemur Pro" is a rebranded Clevo chassis) with
coreboot firmware on board. The KBDC_QUIRK_KEEP_ACTIVATED quirk activated for
the Chromebook platform makes the keyboard on these devices inoperable.

"Purism Librem" laptops may require the same exclusion too.

PR:		250711
Reported by:	nick.lott@gmail.com
MFC after:	2 weeks
2020-11-04 21:52:10 +00:00
Edward Tomasz Napierala
cdf6e4e922 Unbreak buildworld after r367339.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-11-04 21:39:04 +00:00
Mateusz Guzik
9a97c95070 Bump __FreeBSD_version after rms changes
2020-11-04 21:23:25 +00:00
Mateusz Guzik
926ad187fd zfs: use rms lock for teardown handling
This deserializes otherwise non-contending operations.

The previous scheme of using 17 locks hashed by curthread runs into
conflicts very quickly.
2020-11-04 21:22:41 +00:00
Mateusz Guzik
4008dd4581 zfs: macroify teardown handling
2020-11-04 21:19:54 +00:00
Mateusz Guzik
ae5642a670 zfs: rename teardown inactive macros to mimic rrm convention
2020-11-04 21:19:25 +00:00
Mateusz Guzik
4a0b7fd502 zfs: add branch prediction to ZFS_ENTER and ZFS_VERIFY_ZP macros
They are expected to fail only in corner cases.
2020-11-04 21:18:51 +00:00
Mateusz Guzik
8ce21ae6ba zfs: even up assert
2020-11-04 21:18:27 +00:00
Mateusz Guzik
6fc2b069ca rms: fixup concurrent writer handling and add more features
Previously the code had one wait channel for all pending writers.
This could result in a buggy scenario where, after a writer switches
the lock mode from readers to writers and goes off CPU, another writer
queues itself and then the last reader wakes up the latter instead
of the former.

Use a separate channel.

While here add features to reliably detect whether curthread has
the lock write-owned. This will be used by ZFS.
2020-11-04 21:18:08 +00:00
Emmanuel Vadot
4e306624d1 dtb/rockchip: Add rockpi-4 to the build
We boot on this board, so add the dtb to the build.

Requested by:	Daniel Engberg <daniel.engberg.lists@pyret.net>
2020-11-04 20:15:14 +00:00
Edward Tomasz Napierala
2f927d87f9 Add linux_to_bsd_errtbl[], mapping Linux errnos to their BSD counterparts.
This will be used by fuse(4).
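
A partial sketch of such a table, covering only Linux errno values 0 through
39. The numeric values are hard-coded from the usual Linux and FreeBSD
errno.h definitions, and the names sketch_errtbl and sketch_linux_to_bsd_errno
are hypothetical; this is not the table added by the commit.

```c
#include <stddef.h>

/*
 * Indexed by Linux errno; a zero entry means the value is the same on both
 * systems (true for 1-10 and 12-34).
 */
static const int sketch_errtbl[40] = {
	[11] = 35,	/* Linux EAGAIN       -> BSD EAGAIN */
	[35] = 11,	/* Linux EDEADLK      -> BSD EDEADLK */
	[36] = 63,	/* Linux ENAMETOOLONG -> BSD ENAMETOOLONG */
	[37] = 77,	/* Linux ENOLCK       -> BSD ENOLCK */
	[38] = 78,	/* Linux ENOSYS       -> BSD ENOSYS */
	[39] = 66,	/* Linux ENOTEMPTY    -> BSD ENOTEMPTY */
};

/* Translate a Linux errno; returns -1 for values outside the sketch. */
static int
sketch_linux_to_bsd_errno(int lerr)
{
	size_t n = sizeof(sketch_errtbl) / sizeof(sketch_errtbl[0]);

	if (lerr < 0 || (size_t)lerr >= n)
		return (-1);
	return (sketch_errtbl[lerr] != 0 ? sketch_errtbl[lerr] : lerr);
}
```

The EAGAIN/EDEADLK swap is the classic trap: the two systems use 11 and 35
for opposite meanings.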

Reviewed by:	asomers
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26974
2020-11-04 19:54:18 +00:00
Emmanuel Vadot
03d0d84bf6 Plug minor memory leak in dwc3 USB2/USB3 controller.
OF_getprop_alloc called earlier requires corresponding OF_prop_free to release allocated memory.

Submitted by:	kjopek@gmail.com
Differential Revision:	https://reviews.freebsd.org/D27085
2020-11-04 18:23:59 +00:00
Mark Johnston
cff169880e amd64: Make it easier to configure exception stack sizes
The amd64 kernel handles certain types of exceptions on a dedicated
stack.  Currently the sizes of these stacks are all hard-coded to
PAGE_SIZE, but for at least NMI handling it can be useful to use larger
stacks.  Add constants to intr_machdep.h to make this easier to tweak.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D27076
2020-11-04 16:42:20 +00:00
Mark Johnston
f7db0c9532 vmspace: Convert to refcount(9)
This is mostly mechanical except for vmspace_exit().  There, use the new
refcount_release_if_last() to avoid switching to vmspace0 unless other
processes are sharing the vmspace.  In that case, upon switching to
vmspace0 we can unconditionally release the reference.

Remove the volatile qualifier from vm_refcnt now that accesses are
protected using refcount(9) KPIs.

Reviewed by:	alc, kib, mmel
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27057
2020-11-04 16:30:56 +00:00
Mark Johnston
e89004612a refcount(9): Add refcount_release_if_last() and refcount_load()
The former is intended for use in vmspace_exit().  The latter is to
encourage use of explicit loads rather than relying on the volatile
qualifier.  This works better with kernel sanitizers, which can
intercept atomic(9) calls, and makes tricky lockless code easier to read
by not forcing the reader to remember which variables are declared
volatile.

Reviewed by:	kib, mjg, mmel
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27056
2020-11-04 16:30:30 +00:00
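To illustrate the semantics described in the commit above, here is a userspace sketch using C11 atomics. It is an analogue only, not the kernel's refcount(9) implementation; `ref_release_if_last` and `ref_load` are hypothetical names, and the memory orderings are one plausible choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Drop the reference only if the caller holds the last one (1 -> 0).
 * Returns true if the count was released, false if others still hold
 * references, in which case the count is left untouched.
 */
static bool
ref_release_if_last(atomic_uint *count)
{
	unsigned int expected = 1;

	assert(count != NULL);
	return (atomic_compare_exchange_strong_explicit(count, &expected, 0,
	    memory_order_acq_rel, memory_order_relaxed));
}

/* Explicit load, so readers need not rely on a volatile qualifier. */
static unsigned int
ref_load(atomic_uint *count)
{
	assert(count != NULL);
	return (atomic_load_explicit(count, memory_order_relaxed));
}
```

A caller in the style of vmspace_exit() would try `ref_release_if_last()` first and take the slower shared path only when it returns false.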
Bjoern A. Zeeb
4ceeb398bf arm64: implement bs_sr_<N>
Implement the bs_sr_<N> generic functions based on the generic
mips implementation calling the generic bs_w_<N> functions in a loop.

ral(4) (rt2860.c) panics in RAL_SET_REGION_4() because bs_sr_4()
is NULL.  ral(4) and ti(4) seem to be the only consumers of these
functions that I could find quickly, so keep them in C rather than asm.

Reported by:	Steve Wheeler (https://redmine.pfsense.org/issues/11021)
Reviewed by:	mmel
MFC after:	3 days
2020-11-04 12:11:50 +00:00
Bjoern A. Zeeb
60ec31e93f net80211: fix a typo
Correct a typo referring to the wrong flags in a comment.
No functional changes.

MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC (d/b/a "Netgate")
2020-11-04 12:07:33 +00:00
Andrew Turner
9815c092de Add the pmap.h changes missed in r367320
Reported by:	bz
Sponsored by:	Innovate UK
2020-11-04 11:48:08 +00:00
Mateusz Piotrowski
a858a39b31 Fix a typo 2020-11-04 10:38:25 +00:00
Andrew Turner
6f802908da Allow the creation of 3 level page tables on arm64
The stage 2 arm64 page tables may need to start at a lower level. This
is because we may only be able to map a limited IPA range and trying
to use a full 4 levels will cause the CPU to fault in an unrecoverable
way.

To simplify the code we still allocate the full 4 levels, however level 0
will only ever be used to find the level 1 table used as the base. Handle
this by creating a dummy entry in the level 0 table to point to the level 1
table.

Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D26066
2020-11-04 10:21:30 +00:00
John Baldwin
9038e6a1e4 Replace some K&R function definitions with ANSI C.
Reviewed by:	markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27062
2020-11-03 22:32:30 +00:00
John Baldwin
d3d79e968b Consistently use C99 fixed-width types in the in-kernel crypto code.
Reviewed by:	markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27061
2020-11-03 22:27:54 +00:00
Ilya Bakulin
b6b885c4fe Always return MMC errors from mmc_handle_reply()
There are two ways to propagate the error in MMCCAM:
 * Using cmd.error which is set by the peripheral driver;
 * Using CCB status which is... also set by the driver.

The problem is that those two error conditions don't necessarily match.
This leads to confusion when handling the MMC reply. So enforce consistency
by panicking if a request is marked as completed successfully but an MMC-level
error is present (this points to a programming error).

Reviewed by:	manu
Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D26925
2020-11-03 21:38:59 +00:00
Conrad Meyer
9e47480e94 linux(4): Improve netlink diagnostics
Add some missing netlink_family definitions and produce vaguely
human-readable error messages for those definitions, like we used to do for
just ROUTE and KOBJECT_UEVENTS.

Additionally, if we know it's a netfilter socket but didn't find it in the
table, fall back to printing that instead of the generic handler ("socket
domain 16, ...").

No change to the emulator correctness, just mildly improved diagnostics for
gaps.
2020-11-03 19:50:42 +00:00
Brooks Davis
19647e76fc sysvshm: pass relevant uap members as arguments
Alter shmget_allocate_segment and shmget_existing to take the values
they want from struct shmget_args rather than passing the struct
around.  In general, uap structures should only be the interface to
sys_<foo> functions.

This makes one small functional change and records the allocated space
rather than the requested space.  If this turns out to be a problem (e.g.
if software tries to find undersized segments by exact size rather than
using keys), we can correct that easily.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27077
2020-11-03 19:14:03 +00:00
Edward Tomasz Napierala
7abf30d339 Make linux_errtbl[] static.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27004
2020-11-03 19:12:33 +00:00
Edward Tomasz Napierala
939e5de8d4 Fix rookie mistake - it's nitems(), not sizeof().
Reported by:	xtouqh_icloud.com
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-11-03 14:44:33 +00:00
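The mistake named in the commit above is easy to reproduce: sizeof() on an array yields bytes, while nitems() yields the element count. The sketch below defines a local `NITEMS` with the same shape as the macro in FreeBSD's <sys/param.h> so it builds anywhere; `demo_table` and `demo_lookup` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array; must not be used on a pointer. */
#define NITEMS(x)	(sizeof((x)) / sizeof((x)[0]))

static const int demo_table[] = { 10, 20, 30, 40 };

/*
 * Bounds-checked lookup.  Comparing idx against sizeof(demo_table)
 * would wrongly accept indexes up to the byte size of the array.
 */
static int
demo_lookup(size_t idx)
{
	if (idx >= NITEMS(demo_table))
		return (-1);
	return (demo_table[idx]);
}
```

With four `int` elements, `sizeof(demo_table)` is `4 * sizeof(int)` while `NITEMS(demo_table)` is 4, so a sizeof-based bound would let out-of-range indexes through.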
Konstantin Belousov
80ba361b2f if_media.c SIOCGMEDIAX handler: improve loop
Stop advancing the counter past the current iteration number at the start
of each iteration.  This removes the need to subtract one when
calculating the index for copyout, and arguably fixes off-by-one reporting
of copied out elements when copyout fails.

Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies / NVidia Networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D27073
2020-11-03 14:33:04 +00:00
Conrad Meyer
eaa5afcefa linux(4) prctl(2): Implement PR_[GS]ET_DUMPABLE
Proxy the flag to the roughly analogous FreeBSD procctl 'TRACE'.

TRACE-disabled processes are not coredumped, and Linux !DUMPABLE processes
can not be ptraced.  There are some additional semantics around ownership of
files in the /proc/[pid] pseudo-filesystem, which we do not attempt to
emulate correctly at this time.

Reviewed by:	markj (earlier version)
Differential Revision:	https://reviews.freebsd.org/D27015
2020-11-03 02:10:54 +00:00
Conrad Meyer
443d8a07df linux(4): Emulate Linux SOL_SOCKET:SO_PASSCRED
This is required by some major Linux applications, such as Chrome and
Firefox.  (As well as Electron-using applications, which are essentially
a bundled version of Chrome.)

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27012
2020-11-03 01:19:13 +00:00
Conrad Meyer
2de07e4096 unix(4): Add SOL_LOCAL:LOCAL_CREDS_PERSISTENT
This option is intended to be semantically identical to Linux's
SOL_SOCKET:SO_PASSCRED.  For now, it is mutually exclusive with the
pre-existing sockopt SOL_LOCAL:LOCAL_CREDS.

Reviewed by:	markj (penultimate version)
Differential Revision:	https://reviews.freebsd.org/D27011
2020-11-03 01:17:45 +00:00
Conrad Meyer
a98f03786e linux(4): style: Eliminate dead 'break' after 'return'
No functional change.
2020-11-03 01:10:27 +00:00