Thread switching used to be atomic with respect to the current CPU's
tdq lock. Since commit 686bcb5c14ab that is no longer the case. Now
sched_switch() does this:
1. lock tdq (might already be locked)
2. maybe put the current thread in the tdq, choose a new thread to run
2a. update tdq_lowpri
3. unlock tdq
4. switch CPU context, update curthread
Some code paths in ULE will load pc_curthread from a remote CPU with
that CPU's tdq lock held, usually to inspect its priority. But, as of
the aforementioned commit this is racy.
The problem I noticed is in tdq_notify(), which optionally sends an IPI
to a remote CPU when a new thread is added to its runqueue. If the new
thread's priority is higher (lower) than the currently running thread's
priority, then we deliver an IPI. But inspecting
pc_curthread->td_priority doesn't work, since pc_curthread might be
between steps 3 and 4 above. If pc_curthread's priority is higher than
that of the newly added thread, but pc_curthread is switching to a
lower-priority thread, then tdq_notify() might fail to deliever an IPI,
leaving a high priority thread stuck on the runqueue for longer than it
should. This can cause multi-millisecond stalls in
interactive/ithread/realtime threads.
Fix this problem by modifying tdq_add() and tdq_move() to return the
value of tdq_lowpri before the addition of the new thread. This ensures
that tdq_notify() has the correct priority value to compare against.
The other two uses of pc_curthread are susceptible to the same race. To
fix the one in sched_rem()->tdq_setlowpri() we need to have an exact
value for curthread. Thus, introduce a new tdq_curthread field to the
tdq which gets updated any time a new thread is selected to run on the
CPU. Because this field is synchronized by the thread lock, its
priority reflects the correct lowpri value for the tdq.
PR: 264867
Fixes: 686bcb5c14ab ("schedlock 4/4")
Reviewed by: mav, kib, jhb
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35736
Older sysctls with constant OID macros were identified with those
in inet.4, tcp.4, and udp.4; newer sysctls with automatic numbering
were identified by sysctl names. No one remembers the OID macros,
or knows what they are; sysctls are always done by name now, usually
via sysctl(8).
Replace the OID macro names with sysctl names so that there is one
uniform identifier type; sysctl names were previously in parens.
Make the formatting a little more consistent in this area. In inet.4
and udp.4, move the "ip." or "udp." prefix from each entry into the
top-level name at the start of the section, as they are all the same.
Reviewed by: rpokala
Differential Revision: https://reviews.freebsd.org/D35806
with grab_faults, we can try to print out the trace of function calls.
Without symbol table, we can not translate addresses to function names,
but even addresses can help to track the bugs.
For loader functions, print out absolute address, so it could be
searched from objdump -d output.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D35433
On x86 systems, the debug.late_console tunable makes it possible to set
up the console before we call pmap_bootstrap. (The tunable is turned
on by default; setting late_console=0 results in consoles being probed
early.)
Unfortunately this is not compatible with using the ACPI SPCR table to
find the console, since consulting ACPI tables requires mapping memory
addresses. As such, we skip the call to uart_cpu_acpi_spcr from
uart_cpu_x86 in the !late_console case.
Reviewed by: imp
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D35794
Currently for distributeworld we pass DESTDIR to certctl.sh as an
environment variable, which sets the default value in the script.
However, for -DNO_ROOT builds, CERTCTLFLAGS has METALOG_INSTALLFLAGS
which includes -D ${DESTDIR}, overriding the custom DESTDIR pointing at
the base dist directory.
Moreover, in order to ensure that the METALOG includes the base/ prefix
for all the files, we need to have certctl call install with -D set to
DESTDIR/DISTDIR without the /base suffix but also ensure the files get
installed to DESTDIR/DISTDIR/base.
Fix these by passing the custom DESTDIR to certctl via -D rather than in
the environment and to pass the /base suffix in the distributeworld case
via the newly-added -d option.
We also need to run certctl rehash before we generate the .meta files
from the METALOG, not after, otherwise they won't include the METALOG
additions, so move the certctl rehash call.
Finally, add a missing semicolon that results in no message being
printed in the missing openssl case. By not including the semicolon,
else echo "..." is treated as extra arguments to certctl, which is lax
in its argument parsing and ignores additional arguments, and the
semicolon and fi after the intended echo terminate the if statement as
normal so there's no syntax error at the shell level. This is harmless
as we weren't trying to do anything other than echo anyway, all that
happens is the echo doesn't actually get run.
Reported by: markj (missing semicolon)
Reviewed by: brooks, kevans
Obtained from: CheriBSD
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D35809
This will be used by Makefile.inc1 to fix -DNO_ROOT distributeworld,
which needs to split out DESTDIR from DISTBASE so the METALOG file
includes the base/ prefix.
Reviewed by: kevans
Obtained from: CheriBSD
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D35808
I mis-read the RFC w.r.t. handling of the sequenceid
when a CreateSession is done after the initial one
that confirms the ClientID. Fortunately this does
not affect most extant NFSv4.1/4.2 clients, since
they only acquire a single session for TCP for a
ClientID (Solaris might be an exception?).
This patch fixes the server to handle this case,
where the RFC requires the sequenceid be incremented
for each CreateSession and is required to reply to
a retried CreateSession with a cached reply.
It adds a field to nfsclient called lc_prevsess,
which caches the sessionid, which is the only field
in a CreateSession reply that will change for a
retry, to implement this reply cache.
The recent commits up to d4a11b3e3bdd that mark
session slots bad when "intr" and/or "soft" mounts
are used by the client needs this server patch.
Without this patch, the client will do a full
recovery, including a new ClientID, losing all
byte range locks. However, prior to the recent
client commits, the client would hang when all
session slots were bad, so even without this
patch it is not a regression.
PR: 260011
MFC after: 2 weeks
Not including ${dist} results in the following non-fatal error printed
once per extra distribution:
mkdir //usr/obj/usr/src/amd64.amd64/release/dist/usr/include/i386
mkdir: //usr/obj/usr/src/amd64.amd64/release/dist/usr/include: No such file or directory
*** Error code 1 (ignored)
Also fix a whitespace nit on this line whilst here.
Reviewed by: brooks
Fixes: a09ea2bbc305 ("amd64: add an i386 include directory")
The K&R style in UFS and other places in the tree's days are numbered
as this syntax is removed in C2x proposal N2432:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2432.pdf
Though running to nearly 6000 lines of diffs this update should cause
no functional change to the code.
Requested by: Warner Losh
MFC after: 2 weeks
Combined changes to allow experimentation with net 0/8 (network 0),
240/4 (Experimental/"Class E"), and part of the loopback net 127/8
(all but 127.0/16). All changes are disabled by default, and can be
enabled by the following sysctls:
net.inet.ip.allow_net0=1
net.inet.ip.allow_net240=1
net.inet.ip.loopback_prefixlen=16
When enabled, the corresponding addresses can be used as normal
unicast IP addresses, both as endpoints and when forwarding.
Add descriptions of the new sysctls to inet.4.
Add <machine/param.h> to vnet.h, as CACHE_LINE_SIZE is undefined in
various C files when in.h includes vnet.h.
The proposals motivating this experimentation can be found in
https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-0https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-240https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-127
Reviewed by: rgrimes, pauamma_gundo.com; previous versions melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D35741
- Remove a flag variable.
- Convert a runtime check of the passed cpuid to a KASSERT.
- Remove the cc_inited flag. An attempt to schedule a callout before
SI_SUB_CPU will crash anyway since the per-CPU mutexes won't have been
initialized, and that flag was only checked in the case where a cpuid
was explicitly specified by the caller.
No functional change intended.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
In preparation for removing OBJT_DEFAULT, eliminate some stale/unhelpful
comments from vm_mmap(), and remove an unused case. In particular, the
remaining callers of vm_mmap() in the tree do not specify OBJT_DEFAULT.
It's much more common to use vm_map_find() to map an object into user
memory, so rather than adjusting vm_mmap() to handle OBJT_SWAP objects,
let's further discourage its use and simply remove OBJT_DEFAULT
handling.
Reviewed by: dougm, alc, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35778
When ice is built as a module, it can't be loaded due to unresolved
symbol. Ensuring in Makefile that the ice_rdma.c is built fixes the
problem.
Signed-off-by: Bartosz Sobczak <bartosz.sobczak@intel.com>
Signed-off-by: Eric Joyner <erj@freebsd.org>
Reviewed by: erj@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D35687
The hw and ifp of a vsi will be NULL if such ixl_vsi is allocated
for a VF. When ixl_reconfigure_filters called, it is trying to
access vsi->ifp and hw->mac.addr and therefore is casuing panic.
This commit add checks to determine if vsi is a VF by checking
if vsi->hw is NULL, before adding IXL_VLAN_ANY filter (which
is already in a VF vsi's filter list) and VLAN filters.
(erj's Note: The SR-IOV flows need revisiting; this will help
prevent panics for now)
Reviewed by: krzysztof.galazka@intel.com, erj@
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D35649
55abf23dd36b inverted the value passed to origin_subst_one when rolling
up the existing code into a loop. If the first token is found ($ORIGIN),
this results in a wild free of part of strtab. Processing the second
token works fine and will act how the first should have regardless of
whether found, allocating memory for the string without freeing.
Processing subsequent tokens however will then leak, regardless of
whether found, as they will also believe they need to allocate memory
and can't free the string.
Found by: CHERI
Reviewed by: kib, markj
Fixes: 55abf23dd36b ("rtld: make token substitution table-driven")
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D35792
vm_object_allocate_anon() automatically sets "charge" to 0 if no cred
reference is provided, so the caller doesn't need any conditional logic.
No functional change intended.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35781
This is in preparation for removal of OBJT_DEFAULT. In particular, it
is now cheap to check whether an OBJT_SWAP object has any swap blocks
allocated, so the benefit of having a separate OBJT_DEFAULT type is
quite marginal, and the OBJT_DEFAULT->SWAP transition is a source of
bugs.
Reviewed by: alc, hselasky, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35779
If the link is down or we can't find a peer we do not transmit the
packet, but also don't fee it.
Remember to m_freem() mbufs we can't transmit.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Commit 981ef32230b2 added optional use of the session
slots marked bad to recover a new session when all
slots are marked bad. The recovery worked against
a FreeBSD NFSv4.1/4.2 server, but not a Linux one.
It turns out that it was a bug in the FreeBSD client
and not the Linux server.
This patch fixes the client so that DeleteSession
followed by CreateSession after receiving a
NFSERR_BADSESSION error reply works against the
Linux server (and conforms to the RFC).
This also implies that the FreeBSD NFSv4.1/4.2
server needs to be fixed in a future commit.
Without the fix, the FreeBSD server does a full
recovery, including creation of a new ClientID,
but since "intr" mounts were broken, this does
not result in a regression.
This patch only affects the case where a CreateSession
is done for an already confirmed ClientID, which was
not being done prior to commit 981ef32230b2.
PR: 260011
MFC after: 2 weeks
Commit 326bcf9394c7 added a new "cred" argument to nfscl_reqstart().
Fsinfo is a NFSv3 RPC and since the "cred" argument is not
used for NFSv3, it does not matter what is passed in.
However, to be consistent with the rest of the patch, change the
argument to NULL.
This patch should not result in a semantics change.
PR: 260011
MFC after: 2 weeks
The reason is that in order to protect a process procctl(2) needs
the PRIV_VM_MADV_PROTECT privilege, which is currently denied in jails
(see kern_jail.c).
MFC after: 1 week
When we extended the switch statement to allow for PROTO_LAYER2 |
PROTO_IPV6 in c21cbaca2b we didn't extend the check for a non-NULL
struct ifnet pointer.
Happily the only PROTO_IPV6 case is pf's layer 2 support, which always
provides one.
Reported by: Coverity (CID 1490459)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Stop including the current CPU in all event messages, since it's already
saved in KTR log entries and thus is redundant. All eventtimer traces
occur in a context where CPU migration is not possible.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
uma_timeout() has several responsibilities; it visits every UMA zone and
as of recently will drain underutilized caches, so is rather expensive
(>1ms in some cases). Currently it is executed by softclock threads
and so will preempt most other CPU activity. None of this work requires
a high scheduling priority, though, so defer it to a taskqueue so as to
avoid stalling higher-priority work.
Reviewed by: rlibby, alc, mav, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35738
In handleevents(), lock the timer state before fetching the time for the
next event. A concurrent callout_cc_add() call might be changing the
next event time, and the race can cause handleevents() to program an
out-of-date time, causing the callout to run later (by an unbounded
period, up to the idle hardclock period of 1s) than requested.
In cpu_idleclock(), call getnextcpuevent() with the timer state mutex
held, for similar reasons. In particular, cpu_idleclock() runs with
interrupts enabled, so an untimely timer interrupt can result in a stale
next event time being programmed. Further, an interrupt can cause
cpu_idleclock() to use a stale value for "now".
In cpu_activeclock(), disable interrupts before loading "now", so as to
avoid going backwards in time when calling handleevents(). It's ok to
leave interrupts enabled when checking "state->idle", since the race at
worst will cause handleevents() to be called unnecessarily. But use an
atomic load to indicate that the test is racy.
PR: 264867
Reviewed by: mav, jhb, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35735
Callers have already loaded the pointer, so these functions don't need
to fetch it again.
No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
- Correct the description (vm_fault_copy_entry() does not create a
shadow object).
- Move some initialization and assertions out of the scope of the object
locks, when doing so makes sense.
- Merge a pair of conditional blocks.
- Use __unused when appropriate.
No functional change intended.
Reviewed by: alc
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Teach pft_ether.py to send a range of packet sizes. Use this to move the
size sweep into Python, removing the repeated Python startup overhead
and greatly speeding up the pf.ether.short_pkt test.
This should fix test timeouts seen on ci.freebsd.org.
While here also extend the range of packet sizes tested, because it adds
very little runtime now.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Replace ppp(4) removed since FreeBSD 8.0-RELEASE with vlan(4).
While here, remove commented out reference to non-existing "egress"
interface group hiding since initial import of interface groups
from OpenBSD in 2006.
Commit 981ef32230b2 enabled marking of potentially bad
session slots when an RPC is interrupted if the "intr"
mount option is used. As such, it no longer makes
sense to call nfscl_hasexpired() for I/O operations that
reply NFSERR_BADSTATEID for NFSv4.1/4.2, which does a full
recovery of NFSv4 open state, destroying all byte range locks.
Recovery of open state should not be usually needed, since
the session slot has been marked potentially bad and,
although opens for the process that has been terminated via
a signal may be broken, locks for other processes will still
be valid.
This patch disables calls to nfscl_hasexpired for NFSv4.1/4.2
mounts, when I/O RPCs receive NFSERR_BADSTATEID replies.
It does not affect the behaviour of NFSv4.0 mounts nor
hard (non "intr") mounts.
PR: 260011
MFC after: 2 weeks
To deal with broken session slots caused by the use of the
"soft" and/or "intr" mount options, nfsv4_sequencelookup()
has been modified to track the potentially broken session
slots (commit 40ada74ee1da). Then, when all session slots
are potentially broken, nfsv4_sequencelookup() does a
DeleteSession operation, so that the NFSv4.1/4.2 server will
reply NFSERR_BADSESSION to uses of the session.
The client will then recover by doing a CreateSession to
acquire a new session.
This patch adds the code that marks potentially bad
slots, so that the above semantics become functional.
It has been successfully tested against a FreeBSD
NFSv4.1/4.2 server, but does not work against a Linux 5.15
NFSv4.1/4.2 server. (The Linux 5.15 server creates
a new session with the same sessionid as the destroyed
one and, as such, keeps returning NFSERR_BADSESSION.
I believe this is a bug in the Linux server.)
However, this should not cause a regression and will
make "intr" mounts fairly usable against the NFSv4.1/4.2
servers where it works.
PR: 260011
MFC after: 2 weeks
Merge lowermatch and uppermatch into find_space. Eliminate uppermatch
recursion. Merge match_insert into match_one and eliminate some
redundant calculation. Move some initialization out of find_space and
into map (and out from under a lock).
Reviewed by: kib (previous version), alc
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D35440