268508 Commits

Author SHA1 Message Date
Mark Johnston
2bd9826995 vfs: Permit unix sockets to be opened with O_PATH
As with FIFOs, a path descriptor for a unix socket cannot be used with
kevent().

In principle connectat(2) and bindat(2) could be modified to support an
AT_EMPTY_PATH-like mode which operates on the socket referenced by an
O_PATH fd referencing a unix socket.  That would eliminate the path
length limit imposed by sockaddr_un.

Update O_PATH tests.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31970
2021-09-17 14:19:06 -04:00
Mark Johnston
c13f6dd7d2 aio_test: Validate interactions between AIO on sockets and shutdown(2)
Reviewed by:	asomers, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31976
2021-09-17 14:19:06 -04:00
Mark Johnston
ade1daa5c0 socket: Synchronize soshutdown() with listen(2) and AIO
To handle shutdown(SHUT_RD) we flush the receive buffer of the socket.
This may involve searching for control messages of type SCM_RIGHTS,
since we need to close the file references.  Closing arbitrary files
with socket buffer locks held is undesirable, mainly due to lock
ordering issues, so we instead make a copy of the socket buffer and
operate on that without any locks.  Fields in the original buffer are
cleared.

This behaviour clobbered the AIO job queue associated with a receive
buffer.  It could also cause us to leak a KTLS session reference.
Reorder socket buffer fields to address this.

An alternate solution would be to remove the hack in sorflush(), but
this is not quite feasible (yet).  In particular, though sorflush()
flags the sockbuf with SBS_CANTRCVMORE, it is possible for more data to
be queued - the flag just prevents userspace from reading more data.  I
suspect we should fix this; SBS_CANTRCVMORE represents a terminal state
and protocols can likely just drop any data destined for such a buffer.
Many of them already do, but in some cases the check is racy, and some
KPI churn will be needed to fix everything.  This approach is more
straightforward for now.

Reported by:	syzbot+104d8ee3430361cb2795@syzkaller.appspotmail.com
Reported by:	syzbot+5bd2e7d05f84a59d0d1b@syzkaller.appspotmail.com
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31976
2021-09-17 14:19:06 -04:00
Mark Johnston
883761f0a8 socket: Remove NOFREE from the socket zone
This flag was added during the transition away from the legacy zone
allocator, commit c897b81311792ccf6a93feff2a405e2ae53f664e.  The old
zone allocator effectively provided _NOFREE semantics, but it seems that
they are not required for sockets.  In particular, we use reference
counting to keep sockets live.

One somewhat dangerous case is sonewconn(), which returns a pointer to a
socket with reference count 0.  This socket is still effectively owned
by the listening socket.  Protocols must therefore be careful to
synchronize sonewconn() calls with their pru_close implementations,
since for listening sockets soclose() will abort the child sockets.  For
example, TCP holds the listening socket's PCB read locked across the
sonewconn() call, which blocks tcp_usr_close(), and sofree()
synchronizes with a concurrent soabort() of the nascent socket.
However, _NOFREE semantics are not required here.

Eliminating _NOFREE has several benefits: it enables use-after-free
detection (e.g., by KASAN) and lets the system reclaim memory from the
socket zone under memory pressure.  No functional change intended.

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31975
2021-09-17 14:19:06 -04:00
Mark Johnston
6b288408ca socket: Add assertions around naked refcount decrements
Sockets in a listen queue hold a reference to the parent listening
socket.  Several code paths release this reference manually when moving
a child socket out of the queue.

Replace comments about the expected post-decrement refcount value with
assertions.  Use refcount_load() instead of a plain load.  No functional
change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31974
2021-09-17 14:19:06 -04:00
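
Illustrative sketch for the refcount assertion change above, not taken from the commit: it uses C11 atomics rather than the kernel's refcount(9) KPI, and `struct parent` and `child_release_parent()` are hypothetical names. It shows the idea of turning a "the count cannot reach zero here" comment into a checked assertion.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a listening socket; only the counter matters. */
struct parent {
	atomic_uint refs;
};

/*
 * Drop the reference a queued child holds on its parent.  The caller is
 * assumed to hold its own reference, so this must never be the last one.
 */
static void
child_release_parent(struct parent *p)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(&p->refs, 1, memory_order_release);
	assert(old > 1);	/* post-decrement count must stay positive */
}
```

With a parent holding two references, one call leaves the count at one; a call that would drop the last reference trips the assertion instead of passing silently.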
Mark Johnston
dfcef87714 socket: Fix a use-after-free in soclose()
After releasing the fd reference to a socket "so", we should avoid
testing SOLISTENING(so) since the socket may have been freed.  Instead,
directly test whether the list of unaccepted sockets is empty.

Fixes:		f4bb1869ddd2 ("Consistently use the SOLISTENING() macro")
Pointy hat:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31973
2021-09-17 14:19:05 -04:00
Mark Johnston
bf25678226 ktls: Fix error/mode confusion in TCP_*TLS_MODE getsockopt handlers
ktls_get_(rx|tx)_mode() can return an errno value or a TLS mode, so
errors are effectively hidden.  Fix this by using a separate output
parameter.  Convert to the new socket buffer locking macros while here.

Note that the socket buffer lock is not needed to synchronize the
SOLISTENING check here, we can rely on the PCB lock.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31977
2021-09-17 14:19:05 -04:00
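
Illustrative sketch for the error/mode confusion described above, not the actual ktls code: `struct conn` and `conn_get_mode()` are hypothetical. When one return value carries either an errno or a mode, a small mode number and a small errno are indistinguishable; returning the error and writing the mode through a separate output parameter removes the ambiguity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical connection state; field names are assumptions. */
struct conn {
	int listening;	/* non-zero for a listening socket */
	int mode;	/* some small non-negative mode value */
};

/*
 * Return 0 and store the mode in *modep on success, or return an errno
 * value and leave *modep untouched on failure.
 */
static int
conn_get_mode(const struct conn *c, int *modep)
{
	assert(modep != NULL);
	if (c->listening)
		return (EINVAL);
	*modep = c->mode;
	return (0);
}
```

A caller checks the return value first and only then reads the mode, so an error can no longer be mistaken for a valid mode.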
Mark Johnston
d6e77cda9b uma: Show the count of free slabs in each per-domain keg's sysctl tree
This is useful for measuring the number of pages that could be freed
from a NOFREE zone under memory pressure.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-09-17 14:19:05 -04:00
Mark Johnston
7fabaac221 rpc: Convert an SOLISTENING check to an assertion
Per the comment, this socket should always be a listening socket.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-09-17 14:19:05 -04:00
Mark Johnston
40fcdb9366 kcov: Disable address and memory sanitizers in get_kinfo()
get_kinfo() is only called from the coverage sanitizer callbacks, which
are similarly uninstrumented.

Sponsored by:	The FreeBSD Foundation
2021-09-17 14:19:05 -04:00
Konstantin Belousov
197a4f29f3 buffer pager: allow get_blksize method to return error
Reported and reviewed by:	asomers
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31998
2021-09-17 20:29:55 +03:00
John Baldwin
c6efcb1281 bhyve: Support setting the disk serial number for VirtIO block devices.
Reviewed by:	allanjude
Obtained from:	illumos
Differential Revision:	https://reviews.freebsd.org/D31983
2021-09-17 09:55:48 -07:00
Mark Johnston
7eb138a9e5 libc/locale: Fix races between localeconv(3) and setlocale(3)
Each locale embeds a lazily initialized lconv which is populated by
localeconv(3) and localeconv_l(3).  When setlocale(3) updates the global
locale, the lconv needs to be (lazily) reinitialized.  To signal this,
we set flag variables in the locale structure.  There are two problems:

- The flags are set before the locale is fully updated, so a concurrent
  localeconv() call can observe partially initialized locale data.
- No barriers ensure that localeconv() observes a fully initialized
  locale if a flag is set.

So, move the flag update appropriately, and use acq/rel barriers to
provide some synchronization.  Note that this is inadequate in the face
of multiple concurrent calls to setlocale(3), but this is not expected
to work regardless.

Thanks to Henry Hu <henry.hu.sh@gmail.com> for providing a test case
demonstrating the race.

PR:		258360
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31899
2021-09-17 10:47:46 -04:00
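
Illustrative sketch for the flag ordering described above, written with C11 atomics rather than libc's internal primitives; `struct lazy_conv`, `publish()` and `lookup()` are hypothetical. The writer updates the data first and sets the flag last with release semantics; the reader tests the flag with acquire semantics before rebuilding its cached value. Like the commit, this assumes a single writer and does not make concurrent writers safe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lazily derived value guarded by a "changed" flag. */
struct lazy_conv {
	atomic_int changed;	/* set after source is fully updated */
	int source;		/* stands in for the locale data */
	int cached;		/* stands in for the derived lconv */
};

/* Writer: update the data, then publish the flag (release). */
static void
publish(struct lazy_conv *lc, int value)
{
	lc->source = value;
	atomic_store_explicit(&lc->changed, 1, memory_order_release);
}

/* Reader: consume the flag (acquire) and rebuild only when it was set. */
static int
lookup(struct lazy_conv *lc)
{
	assert(lc != NULL);
	if (atomic_exchange_explicit(&lc->changed, 0,
	    memory_order_acq_rel) != 0)
		lc->cached = lc->source * 2;	/* placeholder derivation */
	return (lc->cached);
}
```

Setting the flag before the data is complete, or reading it without acquire ordering, would let the reader rebuild from a partially updated source, which is the class of problem the commit describes.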
Ed Maste
deef4b8ce8 readelf: document that -u / --unwind is not yet implemented
ELF tool chain readelf accepts -u / --unwind but just ignores the
option.  This was previously undocumented, which could be confusing for
someone encountering `readelf -u` (in a script or GNU readelf example).

Reported by:	markj (in D32003)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-09-17 10:22:05 -04:00
Ed Maste
f161abf9f2 readelf: include notes (-n) and unwind (-u) in --all/-a
This matches the GNU and LLVM versions of readelf.

As markj noted in the review, -u is not actually implemented yet and has
no effect.  The option is accepted and just ignored.

Reported by:	andrew
Reviewed by:	andrew, markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32003
2021-09-17 09:51:59 -04:00
Konstantin Belousov
ac8af19380 proccontrol(1): Add wxmap control
Reviewed by:	brooks, emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31779
2021-09-17 15:42:07 +03:00
Konstantin Belousov
796a8e1ad1 procctl(2): Add PROC_WXMAP_CTL/STATUS
It allows overriding kern.elf{32,64}.allow_wx on a per-process basis.
In particular, it makes it possible to run binaries without PT_GNU_STACK
and without elfctl note while allow_wx = 0.

Reviewed by:	brooks, emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31779
2021-09-17 15:42:01 +03:00
Konstantin Belousov
1349891a0e Style
Reviewed by:	brooks, emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31779
2021-09-17 15:41:54 +03:00
Mike Karels
fd0765933c Change lowest address on subnet (host 0) not to broadcast by default.
The address with a host part of all zeros was used as a broadcast long
ago, but the default has been all ones since 4.3BSD and RFC1122.  Until
now, we would broadcast the host zero address as well as the configured
address.  Change to not broadcasting that address by default, but add a
sysctl (net.inet.ip.broadcast_lowest) to re-enable it.  Note that the
correct way to use the zero address for broadcast would be to configure
it as the broadcast address for the network.

See https://datatracker.ietf.org/doc/draft-schoen-intarea-lowest-address/
and the discussion in https://reviews.freebsd.org/D19316.  Note, Linux
now implements this.

Reviewed by:	rgrimes, tuexen; melifaro (previous version)
MFC after:	1 month
Relnotes:	yes
Differential Revision: https://reviews.freebsd.org/D31861
2021-09-16 19:42:20 -05:00
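
Illustrative sketch of the host-zero rule described above, not the kernel's implementation: `is_subnet_broadcast()` is a hypothetical helper, addresses are in host byte order, and the `broadcast_lowest` parameter merely mirrors the role of the net.inet.ip.broadcast_lowest sysctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether dst should be treated as a broadcast for a subnet.
 * net is the network address (host part all zeros), bcast the configured
 * broadcast address.  The lowest address counts only when explicitly
 * re-enabled.
 */
static bool
is_subnet_broadcast(uint32_t dst, uint32_t net, uint32_t mask,
    uint32_t bcast, bool broadcast_lowest)
{
	assert((net & ~mask) == 0);	/* net must have no host bits set */
	if (dst == bcast)
		return (true);
	return (broadcast_lowest && dst == net);
}
```

For 192.0.2.0/24 with broadcast 192.0.2.255, the address 192.0.2.0 is a broadcast only when `broadcast_lowest` is true, while 192.0.2.255 is one in both cases.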
Colin Percival
b43d7aa09b EC2: Default to UEFI booting
This reduces the FreeBSD boot time by approximately 5 seconds,
roughly equally divided between two factors:
* Disk I/O is faster in the EFI loader since it can perform larger
I/Os.  (The BIOS loader is limited due to the use of bounce buffers
in sub-1M memory.)
* The EFI console is much faster than the VGA console.

Note however that not all EC2 instance types support UEFI; as a
general rule the newer instances (based on Amazon's "Nitro" platform)
support UEFI but the older instances (based on Xen) do not.

X-MFC:	TBD based on tradeoff between performance and compatibility
Relnotes:	yes
Sponsored by:	https://www.patreon.com/cperciva
2021-09-16 12:23:19 -07:00
Colin Percival
0aa2a94ea6 EC2: Allow AMI boot mode to be specified
The default boot method for amd64 AMIs is BIOS, but at AMI creation
time a flag can be set to specify that UEFI should be used instead.
This commit adds a variable AMIBOOTMETHOD which, if set to "UEFI",
causes the appropriate flag to be set during AMI creation.

The only boot method supported by EC2 for arm64 is UEFI.

The names of AMIs are also amended to include the boot method; they
now look like "FreeBSD 14.0-CURRENT-amd64-20210915 UEFI".

MFC after:	1 week
Sponsored by:	https://www.patreon.com/cperciva
2021-09-16 12:23:19 -07:00
Ed Maste
adb56e58e8 openssh: use global state for blacklist in grace_alarm_handler
Obtained from:	security/openssh-portable
Fixes:		19261079b743 ("openssh: update to OpenSSH v8.7p1")
Sponsored by:	The FreeBSD Foundation
2021-09-16 14:10:11 -04:00
Warner Losh
7cf62c68c0 nanobsd: Provide empty routines for new embedded scheme
calculate_partitioning and create_code_slice are now required in
nanobsd.sh. While things work with the ones provided by legacy.sh, it's
fighting embedded/common's other actions. Instead, replace them with
stubs.

Sponsored by:		Netflix
2021-09-16 11:54:18 -06:00
Konstantin Belousov
9a8eb5db55 test/ptrace/scescx.c: fix printing of braces for syscalls without args
Also do not print stray closing brace for error condition.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-09-16 20:26:18 +03:00
Konstantin Belousov
f575573ca5 Remove PT_GET_SC_ARGS_ALL
Reimplement bdf0f24bb16d556a5b by checking for the caller's ABI in
the implementation of PT_GET_SC_ARGS, and copying out everything if
it is the Linuxolator.

Also fix a minor information leak: if PT_GET_SC_ARGS_ALL is done on a
thread reused from another process, it allows reading some of that
thread's last syscall arguments. Clear td_sa.args in thread_alloc().

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D31968
2021-09-16 20:11:27 +03:00
Konstantin Belousov
181bfb42fd vm_phys: do not ignore phys_avail[] segments that do not fit completely into vm_phys segments
If a phys_avail[] segment only intersects some vm_phys segment, add
the pages from it that belong to the given vm_phys_seg to the free list
instead of dropping them.

The vm_phys segments are generally the result of subdividing phys_avail
segments; for instance, DMA32 or LOWMEM boundaries split them. On
amd64, after UEFI in-place kernel activation (copy_staging disable)
was enabled, we typically have a large phys_avail[] segment below 4G
which crosses the LOWMEM (1M) boundary. With the current requirement
that a phys_avail[] segment fit fully into a vm_phys_seg, this memory
was ignored.

Reported by:	madpilot
Reviewed by:	markj
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31958
2021-09-16 20:01:19 +03:00
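
Illustrative sketch of the intersection idea described above, not the vm_phys code: `struct range` and `range_clip()` are hypothetical, and ranges are half-open byte ranges. Instead of requiring an available segment to fit entirely inside a physical segment, the overlapping part is kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Clip avail to seg.  Returns true and fills *out when the two ranges
 * overlap; returns false when nothing of avail lies within seg.
 */
static bool
range_clip(struct range avail, struct range seg, struct range *out)
{
	assert(avail.start <= avail.end && seg.start <= seg.end);
	out->start = avail.start > seg.start ? avail.start : seg.start;
	out->end = avail.end < seg.end ? avail.end : seg.end;
	return (out->start < out->end);
}
```

An available range that straddles the 1M boundary yields one non-empty piece below 1M and another above it, rather than being discarded for fitting in neither segment.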
Marcin Wojtas
e8a8725360 pci_host_generic: update Synopsys device description for ACPI
The recent addition of the Synopsys ECAM quirk set the
device description only for the DT variant.
Do the same in the ACPI case.

Reported by: jrtc27
2021-09-16 16:53:11 +02:00
Artur Rojek
a3f0d18237 ena: fix building in-kernel driver
When building ENA as compiled into the kernel, the driver would fail to
build. Resolve the problem by introducing the following changes:
1. Add missing `ena_rss.c` entry in `sys/conf/files`.
2. Prevent SYSCTL_ADD_INT from throwing an assert due to an extra
CTLTYPE_INT flag.

Fixes: 986e7b92276 ("ena: Move RSS logic into its own source files")
Fixes: 6d1ef2abd33 ("ena: Implement full RSS reconfiguration")

Obtained from: Semihalf
Sponsored by: Amazon, Inc.
MFC after: 1 week
2021-09-16 16:47:45 +02:00
Marko Zec
eb3148cc4d [fib algo][dxr] Fix division by zero.
A division by zero would occur if DXR were activated on a vnet
with no IP addresses configured on any interfaces.

PR:		257965
MFC after:	3 days
Reported by:	Raul Munoz
2021-09-16 16:34:05 +02:00
Piotr Pawel Stefaniak
12061d2626 diff: link with libm for sqrt()
Reported by:	Jenkins
Fixes:		bcf2e78dc48378456798191f1c15cb76d6221a65
2021-09-16 09:31:44 +02:00
Peter Holm
bab406830a stress2: Added more unionfs tests
2021-09-16 06:29:07 +00:00
Rick Macklem
9ebe4b8c67 nfscl: Add vfs.nfs.maxalloclen to limit Allocate/Deallocate RPC RTT
Unlike Copy, the NFSv4.2 Allocate and Deallocate operations do not
allow a reply with partial completion.  As such, the only way to
limit the time the operation takes to provide a reasonable RPC RTT
is to limit the size of the allocation/deallocation in the NFSv4.2
client.

This patch adds a sysctl called vfs.nfs.maxalloclen to set
the limit on the size of the Allocate operation.
There is no way to know how long a server will take to do an
allocate operation, but 64Mbytes results in a reasonable
RPC RTT for the slow hardware I test on, so that is what
the default value for vfs.nfs.maxalloclen is set to.

For an 8Gbyte allocation, the elapsed time for doing it in 64Mbyte
chunks was the same as the elapsed time taken for a single large
allocation operation for a FreeBSD server with a UFS file system.

MFC after:	2 weeks
2021-09-15 17:29:45 -07:00
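
Illustrative sketch of the chunking described above, not the NFS client code: `allocate_in_chunks()`, `alloc_fn` and `count_rpc()` are hypothetical, and the callback stands in for issuing one Allocate RPC. A large request is split into pieces no longer than a configured limit, so no single operation covers more than that limit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback that performs one bounded allocate operation. */
typedef int (*alloc_fn)(uint64_t off, uint64_t len, void *arg);

/* Example callback: count the calls instead of doing any I/O. */
static int
count_rpc(uint64_t off, uint64_t len, void *arg)
{
	(void)off;
	(void)len;
	(*(unsigned int *)arg)++;
	return (0);
}

/* Issue fn over [off, off + len) in pieces of at most maxlen bytes. */
static int
allocate_in_chunks(uint64_t off, uint64_t len, uint64_t maxlen,
    alloc_fn fn, void *arg)
{
	assert(maxlen > 0);
	while (len > 0) {
		uint64_t n = len < maxlen ? len : maxlen;
		int error = fn(off, n, arg);

		if (error != 0)
			return (error);	/* stop at the first failure */
		off += n;
		len -= n;
	}
	return (0);
}
```

With the figures from the message, an 8 Gbyte request and a 64 Mbyte limit result in 128 calls to the callback.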
Piotr Pawel Stefaniak
e51aabf8cb diff: implement option -F (--show-function-line)
With unified and context diffs, show the last line that matches the
provided pattern before the context.

Reviewed by:	bapt
Differential Revision:	https://reviews.freebsd.org/D31714
2021-09-16 01:46:44 +02:00
Cameron Katri
f38702e5a5 diff(1): Add --color support
Adds a --color flag to diff(1) that supports the same options as GNU's
diff(1). The colors are customizable with the env var DIFFCOLORS in
a format similar to grep(1)'s GREPCOLORS. For example, 04;36:41 makes
additions underlined light blue and gives deletions a red
background.

Differential Revision:	https://reviews.freebsd.org/D30545
2021-09-16 01:46:44 +02:00
Piotr Pawel Stefaniak
7760b85414 diff: decrease indent level
An upcoming change will add more code in the loop.
2021-09-16 01:46:44 +02:00
Piotr Pawel Stefaniak
2171b2cbe0 diff: avoid applying offsets to null pointer
This was the only instance of undefined behavior I could find so far.
2021-09-16 01:46:44 +02:00
Piotr Pawel Stefaniak
bcf2e78dc4 diff: replace isqrt() with sqrt()
Remove cruft and use a system-provided and maintained function instead.
2021-09-16 01:46:43 +02:00
Piotr Pawel Stefaniak
e43df07e37 diff: move functions around and reduce their visibility
Most of them become static. There will be more such functions added in
upcoming commits, so they would be inconsistent with existing code.
Improve the existing code instead of reinforcing the unwanted pattern.
2021-09-16 01:36:41 +02:00
Piotr Pawel Stefaniak
b5541f456d diff: convert boolean flag variables to bool
There will be more boolean flags added in upcoming commits and they
would have to be stored in ints in order to be consistent with existing
code. Change the existing code to use the bool type.
2021-09-16 01:36:41 +02:00
Piotr Pawel Stefaniak
0358202111 diff: improve code style
Reflow comments, strip trailing space, improve wrapping of lines.
2021-09-16 01:36:41 +02:00
Kevin Bowling
22b20b45c9 e1000: Fix variable typo
Forgot to git add this in last commit

Reported by:	jenkins
Fixes:		2796f7cab107
MFC after:	2 weeks
2021-09-15 09:18:59 -07:00
Kevin Bowling
2796f7cab1 e1000: Fix up HW vlan ops
* Don't reset the entire adapter for vlan changes, fix up the problems
* Add some functions for vlan filter (vfta) manipulation
* Don't muck with the vfta if we aren't doing HW vlan filtering
* Disable interrupts when manipulating vfta on lem(4)-class NICs
* On the I350 there is a specification update (2.4.20) in which the
suggested workaround is to write to the vfta 10 times (if at first you
don't succeed, try, try again). Our shared code has the goods, use it
* Increase a VF's frame receive size in the case of vlans

From the referenced PR, this reduced vlan configuration from minutes
to seconds with hundreds or thousands of vlans and prevents wedging the
adapter with needless adapter reinitialization for each vlan ID.

PR:		230996
Reviewed by:	markj
Tested by:	Ozkan KIRIK <ozkan.kirik@gmail.com>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D30002
2021-09-15 08:03:01 -07:00
Marko Zec
b51f8bae57 [fib algo][dxr] Optimize trie updating.
Don't needlessly rebuild trie parts unaffected by accumulated
incremental RIB updates.

PR:		257965
Tested by:	Konrad Kreciwilk
MFC after:	3 days
2021-09-15 22:42:49 +02:00
Marko Zec
442c8a245e [fib algo][dxr] Fix undefined behavior.
The result of shifting uint32_t by 32 (or more) is undefined: fix it.
2021-09-15 22:42:48 +02:00
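
Illustrative sketch of the shift problem described above, not the DXR code: `prefix_to_mask()` is a hypothetical helper. In C, shifting a 32-bit value by 32 or more bits is undefined, so the zero-length prefix, which would need exactly such a shift, is handled separately.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build an IPv4 netmask for a prefix length in [0, 32] without ever
 * shifting a 32-bit value by 32 bits.
 */
static uint32_t
prefix_to_mask(unsigned int plen)
{
	assert(plen <= 32);
	if (plen == 0)
		return (0);	/* avoids the undefined UINT32_MAX << 32 */
	return (UINT32_MAX << (32 - plen));
}
```

The shift count in the remaining path is always between 0 and 31, which is well defined for an unsigned 32-bit operand.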
John Baldwin
0cd6e85e24 iscsi: Abort data-out tasks queued on a terminating session.
cfiscsi_datamove_out() can race with cfiscsi_session_terminate_tasks()
and enqueue a new task after the latter function has aborted existing
tasks.  This could result in a deadlock as
cfiscsi_session_terminate_tasks() waited forever for this task to
complete.

Reviewed by:	mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31892
2021-09-15 13:25:30 -07:00
John Baldwin
529364b032 iscsi: Add a helper routine to abort a data-out task.
Reviewed by:	mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31891
2021-09-15 13:25:04 -07:00
Alan Somers
ff33e5c83f stress2: replace fuse.ko with fusefs.ko
It got renamed in FreeBSD 13

Reviewed by:	pho
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision: https://reviews.freebsd.org/D31963
2021-09-15 12:59:21 -06:00
Leandro Lupori
a58abcde2c powerpc64: change CAS to support Radix MMU
Use radix_mmu environment variable to select between Hash or Radix
MMU, when performing the CAS method call. This matches kernel's
behavior, by selecting Hash MMU by default and Radix if radix_mmu
is not zero, to make sure that both loader and kernel always select
the same MMU.

The device tree is queried to detect Radix/GTSE support and to
find out if CAS is supported, making the old CPU version and HV
bit checks unnecessary now.

Reviewed by:		jhibbits
MFC after:		2 weeks
Sponsored by:		Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D31951
2021-09-15 15:24:40 -03:00
Emmanuel Vadot
78bf40e10c arm: rockchip: rk3288: Use the macros that already exist in rk_cru.h
2021-09-15 20:10:42 +02:00
Emmanuel Vadot
548a706608 arm64: rockchip: rk3328: Add watchdog clock
The watchdog clock is controlled by the secure world but we need a clock
to satisfy the driver, so add a fixed clock for it.

Reported by:   avg
2021-09-15 19:09:56 +02:00