This should be triggered only if arc_read() fails, i.e., quite rarely.
The same logic is already present in OpenZFS.
PR: 247445
Submitted by: jdolecek@NetBSD.org
MFC after: 1 week
The FDT may contain a short /chosen/bootargs string which we should pass
to boot_parse_cmdline. Notably, this allows the use of qemu's -append
option to pass things like -s to boot to single user mode.
Submitted by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by: mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25544
MK_EFI was added to kern.opts.mk in r331099, but is currently unused.
Take advantage of that fact and gate the build of efirt behind it.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D24673
Limit manipulations to use %rax as scratch to the pti portion of the
syscall entry code.
Submitted by: alc
Reviewed by: markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25722
Split the MANAGE privilege into MANAGE, SETMAC and CREATE_VAP.
+ VAP_MANAGE is everything but setting the MAC and creating a VAP.
+ VAP_SETMAC is setting the MAC address of the VAP.
Typically you wouldn't want the jail to be able to modify this.
+ CREATE_VAP is to create a new VAP. Again, you don't want to be doing
this in a jail, but this DOES stop being able to run some corner
cases like Dynamic WDS (DWDS) AP in a jail/vnet. We can figure this
bit out later.
This allows me to run wpa_supplicant in a jail after transferring
a STA VAP into it. I unfortunately can't currently set the wlan
debugging inside the jail; that would be super useful!
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D25630
This avoids a use-after-free reported for the userland stack.
Thanks to Taylor Brandstetter for suggesting a patch for
the userland stack.
MFC after: 1 week
Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib,
in6_rtrequest, rtrequest_fib> and their uses and switch to
to rib_action(). This is part of the new routing KPI.
Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25546
When pmap operates in PTI mode, we must reload %cr3 on return to
userspace. In non-PCID mode the reload always flushes all non-global
TLB entries and we take advantage of it by only invalidating the KPT
TLB entries (there is no cached UPT entries at all).
In PCID mode, we flush both KPT and UPT TLB explicitly, but we can
take advantage of the fact that PCID mode command to reload %cr3
includes a flag to flush/not flush target TLB. In particular, we can
avoid the flush for UPT, instead record that load of pc_ucr3 into %cr3
on return to usermode should be flushing. This is done by providing
either all-1s or ~CR3_PCID_MASK in pc_ucr3_load_mask. The mask is
automatically reset to all-1s on return to usermode.
Similarly, we can avoid flushing UPT TLB on context switch, replacing
it by setting pc_ucr3_load_mask. This unifies INVPCID and non-INVPCID
PTI ifunc, leaving only 4 cases instead of 6. This trick is also
applicable both to the TLB shootdown IPI handlers, since handlers
interrupt the target thread.
But then we need to check pc_curpmap in handlers, and this would
reopen the same race for INVPCID machines as was fixed in r306350 for
non-INVPCID. To not introduce the same bug, unconditionally do
spinlock_enter() in pmap_activate().
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25483
While it doesn't trigger INVARIANTS or WITNESS on head it does in stable/12.
There's also no reason for it, as we can easily report the out of memory error
to the caller (i.e. userspace). All of these can already fail.
PR: 248046
MFC after: 3 days
verified under VMWare Fusion, comparing to what's reported under CentOS,
and by comparing numbers reported by linuxulator on T420 with a googled
up Linux cpuinfo (https://lkml.org/lkml/2011/11/29/116).
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20693
Rather than marking the read ahead/behind pages valid even though they were
not initialized, free them using the new function vm_page_free_invalid().
Reviewed by: markj, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25430
that might be wired. If the page is wired then it cannot be freed now,
but the thread that eventually unwires it will free it at that point.
Reviewed by: markj, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25430
ACL packet boundary flag should be 0 instead of 2 for LE PDU.
Some HCI will drop LE packet with PB flag is 2, and if sent,
some target may reject the packet.
PR: 248024
Reported by: Greg V
Reviewed by: Greg V, emax
Differential Revision: https://reviews.freebsd.org/D25704
This new function allows us to find the SMMU instance assigned
for a particular PCI RID.
Reviewed by: andrew
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D25687
The function is called from a KLD load handler, so it may sleep.
- Stop checking for errors from uma_zcreate(), they don't happen.
- Convert M_NOWAIT allocations to M_WAITOK.
- Remove error handling for existing M_WAITOK allocations.
- Fix style.
Reviewed by: cem, delphij, jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25696
When emulating a load pair or store pair in dtrace on arm64 we need to
copy the data between the stack and trap frame. When the registers are
either the link register or the zero register we will access memory
past the end of the trap frame as these are encoded as registers 30 and
31 respectively while the array they access only has 30 entries.
Fix this by creating 2 helper functions to perform the operation with
special cases for these registers.
Sponsored by: Innovate UK
Subsequent to r240317, kmem_free() was replaced with kva_free() (r254025).
kva_free() releases the KVA allocation for the mapped region, but no longer
clears the pmap (pagetable) entries.
An affected pmap_unmapdev operation would leave the still-pmap'd VA space
free for allocation by other KVA consumers. However, this bug easily
avoided notice for ~7 years because most devices (1) never call
pmap_unmapdev and (2) on amd64, mostly fit within the DMAP and do not need
KVA allocations. Other affected arch are less popular: i386, MIPS, and
PowerPC. Arm64, arm32, and riscv are not affected.
Reported by: Don Morris <dgmorris AT earthlink.net>
Submitted by: Don Morris (amd64 part)
Reviewed by: kib, markj, Don (!amd64 parts)
MFC after: I don't intend to, but you might want to
Sponsored by: Dell Isilon
Differential Revision: https://reviews.freebsd.org/D25689
These routines are similar to crypto_getreq() and crypto_freereq() but
operate on caller-supplied storage instead of allocating crypto
requests from a UMA zone.
Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25691
In xpt_release_device(), callout_stop() was being called without
holding the mutex (send_mtx) that is used to protect the callout.
So, move the mtx_unlock() call so that it is protected.
MFC after: 1 week
Sponsored by: Spectra Logic
In r361752 an error handling was introduced for using 0.0.0.0 or
255.255.255.255 as the address in connect() for TCP, since both
addresses can't be used. However, the stack maps 0.0.0.0 implicitly
to a local address and at least two regressions were reported.
Therefore, re-allow the usage of 0.0.0.0.
While there, change the error indicated when using 255.255.255.255
from EAFNOSUPPORT to EACCES as mentioned in the man-page of connect().
Reviewed by: rrs
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D25401
If the hypervisor advertises support for the DISCARD command then the
guest can perform TRIM commands, freeing space on the backing store.
If VIRTIO_BLK_F_DISCARD is enabled, advertise DISKFLAG_CANDELETE
Tested with FreeBSD guests on bhyve and KVM
Reviewed by: jhb
Tested by: freqlabs
MFC after: 1 month
Relnotes: yes
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D21708
This removes SCTP from in-tree kernel configuration files. Now, SCTP
can be enabled by simply loading the module, as discussed on
freebsd-net@.
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25611
The code would unconditionally lock the vnode to audit or call the
mac hoook, even if neither want to do anything. Pre-check the state
to avoid locking in the common case of nothing to do.
Note this code should not be normally executed anyway as vnodes are
always return ready. However, poll1/2 from will-it-scale use regular
files for benchmarking, presumably to focus on the interface itself
as the vnode handler is not supposed to do almost anything.
This in particular fixes poll2 which passes 128 fds.
$ ./poll2_processes -s 10
before: 134411
after: 271572
if_attach() -> if_attach_internal() will call if_attachdomain1(ifp) any time
an ethernet interface is setup *after*
SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_FIRST. This eventually leads to
nd6_ifattach() -> nd6_setmtu0() stashing off ifp->if_mtu in ndi->maxmtu
*before* ifp->if_mtu has been properly set in some scenarios, e.g., USB
ethernet adapter plugged in later on.
For interfaces that are created in early boot, we don't have this issue as
domains aren't constructed enough for them to attach and thus it gets
deferred to domainifattach at SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_SECOND
*after* the mtu has been set earlier in ether_ifattach().
PR: 248005
Submitted by: Mathew <mjanelle blackberry com>
MFC after: 1 week
This shortens fdalloc by over 60 bytes. Correctness verified by running both
variants at the same time and comparing the result of each call.
Note someone(tm) should make a pass at converting everything else feasible.