which makes stack prot correct for non-main threads created by binaries
with statically linked libthr.
Cache result, but do not engage into the full double-checked locking,
since calculation of the return value is idempotent.
PR: 252549
Reported and reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28075
Track the the current lock/reference state in a single variable,
rather than deducing the proper prison_deref() flags from a
combination of equations and hard-coded values.
Despite TMPFS_UNLOCK() is done in both paths later, unlocking not locked
mutex provides different failure mode.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Detect and use RDTSCP if available, instead of fence+RDTSC. For AMD Zens+,
use LFENCE+RDTSC instead of RDTSCP (or MFENCE;RDTSC previously).
Reviewed by: gallatin, markj
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27986
Create array of rdtsc selectors and provide helper that calculate the
index into the selectors array.
Reviewed by: gallatin, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27986
Instead of providing ifuncs for each kind of fence, define ifuncs
that combine fence and invocation of RDTSC. This refactoring makes
introduction of RDTSCP use possible.
Reviewed by: gallatin, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27986
Use it in preference of Xfenced RDTSC if RDTSCP is supported. It is
recommended by both Intel and AMD. But, on AMD Zens and newer use
LFENCE, as recommended by AMD [*]. In particular, this means that now
AMD CPUs use more appropriate fence instead of too harsh MFENCe.
Add comment explaining the intent of the selection logic.
Reported by: gallatin [*]
Reviewed by: gallatin, markj
Tested by: gallatin, pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27986
Instead of trying to maintain pg_jobc counter on each process group
update (and sometimes before), just calculate the counter when needed.
Still, for the benefit of the signal delivery code, explicitly mark
orphaned groups as such with the new process group flag.
This way we prevent bugs in the corner cases where updates to the counter
were missed due to complicated configuration of p_pptr/p_opptr/real_parent
(debugger).
Since we need to iterate over all children of the process on exit, this
change mostly affects the process group entry and leave, where we need
to iterate all process group members to detect orpaned status.
(For MFC, keep pg_jobc around but unused).
Reported by: jhb
Reviewed by: jilles
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27871
This improves code structure and allows to put the lock asserts right
into place where the locks are needed.
Also move zeroing of the kinfo_proc structure from fill_kinfo_proc_only()
to fill_kinfo_proc(), this looks more symmetrical.
Reviewed by: jilles
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27871
Proctree lock is needed for correct calculation and collection of the
job-control related data in kinfo_proc. There was even an XXX comment
about it.
Satisfy locking and lock ordering requirements by taking proctree lock
around pass over each bucket in proc_iterate(), and in sysctl_kern_proc()
and note_procstat_proc() for individual process reporting.
Reviewed by: jilles
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27871
Increase the scope of the process group lock ownership. This ensures that
we are consistent in returning EIO for tty write from an orphan and delivery
of TTYOUT signals.
Reviewed by: jilles
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27871
Often, we have a process locked and need to get locked process group.
In this case, because progress group lock is before process lock,
unlocking process allows the group to be freed. See for instance
tty_wait_background().
Make pgrp structures allocated from nofree zone, and ensure type stability
of the pgrp mutex.
Reviewed by: jilles
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27871
Don't try to cleanup the zpool if we couldn't create a zpool in the
first place.
Submitted by: tmunro
MFC-with: 022ca2fc7fe08d51f33a1d23a9be49e6d132914e
Similarly to what done for iflib in 1d238b07d5d4d9660ae0e,
this patch prevents access to the krings during the interface
reset triggered by netmap_register().
MFC after: 1 week
The netmap_reset() function is meant to be called by the driver
when they initialize (or re-initialize) a hardware ring.
However, since the introduction of support for opening (in
netmap mode) a subset of the available rings, netmap_reset()
may be called multiple times on actively used rings, causing
both kring and netmap ring to transition to an inconsistent
state.
This changes improves the situation by resetting all the
indices fields of the kring to 0, as expected after the
reinitialization of a hardware ring.
PR: 252518
MFC after: 1 week
When netmap_fl_refill() is called at initialization time (e.g.,
during netmap_iflib_register()), nic_i must be 0, since the
free list is reinitialized. At the end of the refill cycle, nic_i
must still be zero, because exactly N descriptors (N is the ring size)
are refilled.
This patch therefore fixes the assertions to check on nic_i rather
than on nm_i. The current netmap_reset() may in fact cause nm_i
to be != 0 while the device is resetting: this may happen when
multiple non-cooperating processes open different subsets of the
available netmap rings.
PR: 252518
MFC after: 1 week
When different processes open separate subsets of the
available rings of a same netmap interface, a device
reset may be performed while one of the processes
is actively using some rings (e.g., caused by another
process executing a nmport_open()).
With this patch, such situation will cause the
active process to get a POLLERR, so that it can
have a chance to detect the situation.
We also guarantee that no process is running a txsync
or rxsync (ioctl or poll) while an iflib device reset
is in progress.
PR: 252453
MFC after: 1 week
This imposes a fairly severe limitation on space available for mmap that
was not noticed prior to commit. Unfixed mmap will only map from
[data + MAXSIZE, end of user VA space], bringing the amount of usable space
down way too low for non-trivial link jobs (for instance).
Reported by: mmel
When using NOTE_NSECONDS in the kevent(2) API, US_TO_SBT should be
used instead of NS_TO_SBT, otherwise the timeout results are
misleading.
PR: 252539
Reviewed by: kevans, kib
Approved by: kevans
MFC after: 3 weeks
ldd had #defines for AOUT, ELF, and ELF32. The removal of AOUT left a
possibly confusing gap. These are not used anywhere but this file so
renumber to avoid the gap.
Reported by: allanjude
Previously -q (just print a line when files differ) ignored flags like
-w (ignore whitespace). Avoid the D_BRIEF short-circuit when flags are
in effect.
PR: 252515
Reported by: Scott Aitken
Reviewed by: kevans
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28064
Add two simple examples showing the use of the flags: d, n, s, t
While here, reorder cross references properly by section
Bump .Dd
Approved by: manpages (gbe@)
Differential Revision: https://reviews.freebsd.org/D27540
last(1): Bump .Dd
Add some examples showing the use of the flags: a, k, P, w
Reviewed by: gbe@, yuripv@
Approved by: manpages (gbe@)
Differential Revision: https://reviews.freebsd.org/D27545
This module intended to measure performance of routing lookups.
Uses a list of IP addresses specified by sysctl one-by-one.
Performance testing is triggered by changing sysctl OID with a number of lookups to execute.
Lookups are done by the chunks of 10K routes, entering/exiting epoch on
chunk granularity to amortise cost.
Example:
make -C sys/modules/test/fib_lookup unload load
for i in `cat ~/ip4.txt`; do sysctl net.route.test.add_inet_addr=$i; done
for i in `cat ~/ip6.txt`; do sysctl net.route.test.add_inet6_addr=$i; done
sysctl net.route.test.run_inet=10000000
dmesg | tail
Dec 13 23:24:05 current kernel: 10000000 packets in 417240173 nanoseconds, 23967011 pps
Dec 13 23:24:06 current kernel: run: 10000000 packets vnet 0xfffff80003073f00
Dec 13 23:24:07 current kernel: 10000000 packets in 423086254 nanoseconds, 23635842 pps
Differential Revision: https://reviews.freebsd.org/D27604
This change introduces loadable fib lookup modules based on
DPDK rte_lpm lib targeted for high-speed lookups in large-scale tables.
It is based on the lookup framework described in D27401.
IPv4 module is called dpdk_lpm4. It wraps around rte_lpm [1] library.
This library implements variation of DIR24-8 [2] lookup algorithm.
Module provide lockless route lookups and in-place incremental updates,
allowing for good RIB performance.
IPv6 module is called dpdk_lpm6. It wraps around rte_lpm6 [3] library.
Implementation can be seen as multi-bit trie where the stride or number of bits
inspected on each level varies from level to level.
It can vary from 1 to 14 memory accesses, with 5 being the average value
for the lengths that are most commonly used in IPv6.
Module provide lockless route lookups for global unicast addresses
and in-place incremental updates, allowing for good RIB performance.
Implementation details:
* wrapper code lives in `sys/contrib/dpdk_rte_lpm/dpdk_lpm[6].c`.
* rte_lpm[6] implementation contains both RIB and FIB code.
. RIB ("rule_") code, backed by array of hash tables part has been commented out,
as base radix already provides all the necessary primitives.
* link-local lookups are currently implemented as base radix lookup.
This part should be converted to something like read-only radix trie.
Usage detail:
Compile kernel with option FIB_ALGO and load dpdk_lpm4/dpdk_lpm6
module at any time. They will be picked up automatically when
amount of routes raises to several thousand.
[1]: https://doc.dpdk.org/guides/prog_guide/lpm_lib.html
[2]: http://yuba.stanford.edu/~nickm/papers/Infocom98_lookup.pdf
[3]: https://doc.dpdk.org/guides/prog_guide/lpm6_lib.html
Differential Revision: https://reviews.freebsd.org/D27412
create_blacklisted() will identify a cert whether it's provided a path to
a cert or the hash.serial format that is shown by `certctl list`.
Factor this logic out into a resolve_certname() so that it may be reused
elsewhere.
Use the new user.localbase sysctl here as well, to reduce the number of
hardcoded localbase by one (1).
MFC after: 3 days (note: just use a literal /usr/local default)
The NVMe byte-swap routines for big-endian platforms used memcpy() to
move the unaligned 64-bit value into a temp register to byte swap it.
Instead of introducing a dependency, manually byte-swap the values in
place.
Point hat: me
The general style in sbin/nvmecontrol apppears to print uint64_t types
using %j, so I'm using that instead of the more general (but admittedly
ugly) PRIu64.
A test of this is funcs/tst.strtok.d which has this filter:
BEGIN
/(this->field = strtok(this->str, ",")) == NULL/
{
exit(1);
}
The test will randomly fail with exit status of 1 indicating that this->field
was NULL even though printing it out shows it is not.
This is compiled to the DTrace instruction set:
// Pushed arguments not shown here
// call strtok() and set result into %r1
07: 2f001f01 call DIF_SUBR(31), %r1 ! strtok
// set thread local scalar this->field from %r1
08: 39050101 stls %r1, DT_VAR(1281) ! DT_VAR(1281) = "field"
// Prepare for the == comparison
// Set right side of %r2 to NULL
09: 25000102 setx DT_INTEGER[1], %r2 ! 0x0
// string compare %r1 (strtok result) to %r2
10: 27010200 scmp %r1, %r2
In this case only %r1 is loaded with a string limit set to lim1. %r2 being
NULL does not get loaded and does not set lim2. Then we call dtrace_strncmp()
with MIN(lim1, lim2) resulting in passing 0 and comparing neither side.
dtrace_strncmp() handles this case fine and it already has been while
being lucky with what lim2 was [un]initialized as.
Reviewed by: markj, Don Morris <dgmorris AT earthlink.net>
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D27671
Currently only libexec/rtld-elf32 uses internal LIBC_NOSSP_PIC during
the build but it gets it directly from the objdir rather than a sysroot.
For example, /usr/obj/usr/src/amd64.amd64/obj-lib32/lib/libc/libc_nossp_pic.a.
We don't stage lib32 libraries in WORLDTMP/usr/lib32 and doing so doesn't
buy much. If we want to use a staged lib32 library then we need to look in
LIBCOMPATTMP where they were staged. For example if LIBC_PIC were wanted then
look for /usr/obj/usr/src/amd64.amd64/obj-lib32/tmp/usr/lib32/libc_pic.a.
Reported by: rlibby
Reviewed by: rlibby
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D27648
release/Makefile.inc1 has git executions that were being ran for each of
these lookups. The results were not needed so just lookup what we want
directly instead.
Reviewed by: gjb, rlibby, emaste (maybe)
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D27643
Add an additional set of braces to clarify intention. The '&' operator
has a higher precedence than '|', but the reader may not always remember
this. No functional change.
This is just like debug.kdb.panic, except the string that's passed in
is reported in the panic message. This allows people with automated
systems to collect kernel panics over a large fleet of machines to
flag panics better. Strings like "Warner look at this hang" or "see
JIRA ABC-1234 for details" allow these automated systems to route the
forced panic to the appropriate engineers like you can with other
types of panics. Other users are likely possible.
Relnotes: Yes
Sponsored by: Netflix
Reviewed by: allanjude (earlier version)
Suggestions from review folded in by: 0mp, emaste, lwhsu
Differential Revision: https://reviews.freebsd.org/D28041