The current global list is a significant problem; in particular, it induces a
lot of cross-domain thread frees. When running poudriere on a 2-domain box,
about half of all frees were of that nature.
Patch below introduces per-domain thread data containing zombie lists and
domain-aware reaping. By default it only reaps from the current domain, only
reaping from others if there is a shortage of free TIDs.
A dedicated callout is introduced to reap lingering threads if there happens
to be no activity.
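A minimal userspace sketch of the reaping policy described above, not the
kernel code itself. NDOMAINS, struct domain_data, reap_domain() and reap() are
hypothetical names, and a plain counter stands in for each zombie list:

```c
#include <stdbool.h>

#define	NDOMAINS	2	/* hypothetical domain count */

/* Hypothetical per-domain state; a counter stands in for a zombie list. */
struct domain_data {
	int	zombies;
};

static struct domain_data domains[NDOMAINS];

/* Reap every zombie queued on one domain and return how many were freed. */
static int
reap_domain(int domain)
{
	int n;

	n = domains[domain].zombies;
	domains[domain].zombies = 0;
	return (n);
}

/*
 * Reap from the current domain first; only visit the other domains when
 * the caller reports a shortage of free TIDs.
 */
static int
reap(int curdomain, bool tid_shortage)
{
	int d, n;

	n = reap_domain(curdomain);
	if (tid_shortage) {
		for (d = 0; d < NDOMAINS; d++) {
			if (d != curdomain)
				n += reap_domain(d);
		}
	}
	return (n);
}
```

With no shortage, reap(0, false) leaves the zombies of domain 1 untouched, so
frees stay local to the domain that queued them.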
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D27185
Pipes get stat'ed all the time and this avoidably contributed to contention.
The pipe lock is only held to accommodate MAC and to check the type.
Since normally there is no probe for pipe stat, depessimize this by having the
flag.
The pipe_state field gets modified with locks held all the time and it's not
feasible to convert them to use atomic store. Move the type flag away to a
separate variable as a simple cleanup and to provide a stable field to read.
Use short for both fields to avoid growing the struct.
While here short-circuit MAC for pipe_poll as well.
The arm configs that required it have been removed from the tree.
Removing this option makes the callout code easier to read and
discourages developers from adding new configs without eventtimer
drivers.
Reviewed by: ian, imp, mav
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27270
The suser_enable sysctl allows removing privileged rights from uid 0.
This change introduces a per-jail setting which allows making root a
normal user.
Reviewed by: jamie
Previous version reviewed by: kevans, emaste, markj, me_igalic.co
Discussed with: pjd
Differential Revision: https://reviews.freebsd.org/D27128
- Mask out recently added VV_* bits to avoid printing them twice.
- Keep VI_LOCKed on the same line as the rest of the flags.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27261
A copy-pasto left us copying in 24 bytes at the address of the rb pointer
instead of the intended target.
Reported by: sigsys@gmail.com
Sighing: kevans
The two flags are distinct and it is impossible to correctly handle clone(2)
without the assistance of fork1(). This change depends on the pwddesc split
introduced in r367777.
I've added a fork_req flag, FR2_SHARE_PATHS, which indicates that p_pd
should be treated the opposite way p_fd is (based on RFFDG flag). This is a
little ugly, but the benefit is that existing RFFDG API is preserved.
Holding FR2_SHARE_PATHS disabled, RFFDG indicates both p_fd and p_pd are
copied, while !RFFDG indicates both should be cloned.
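The flag interaction above can be sketched as a small decision function. This
is an illustration only: the X_ prefixed constants and struct fork_plan are
hypothetical stand-ins rather than the kernel definitions, and it assumes that
"cloned" above means shared with the parent rather than copied:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for RFFDG and FR2_SHARE_PATHS. */
#define	X_RFFDG			(1 << 2)
#define	X_FR2_SHARE_PATHS	(1 << 0)

/* Hypothetical result: true means "copy", false means "share". */
struct fork_plan {
	bool	copy_fd;	/* treatment of p_fd */
	bool	copy_pd;	/* treatment of p_pd */
};

static struct fork_plan
plan_fork(int flags, int flags2)
{
	struct fork_plan p;

	/* RFFDG alone decides what happens to the fd table. */
	p.copy_fd = (flags & X_RFFDG) != 0;
	/* FR2_SHARE_PATHS inverts that decision for p_pd only. */
	if ((flags2 & X_FR2_SHARE_PATHS) != 0)
		p.copy_pd = !p.copy_fd;
	else
		p.copy_pd = p.copy_fd;
	return (p);
}
```

Under these assumptions, clone(2) with CLONE_FS but without CLONE_FILES maps to
plan_fork(X_RFFDG, X_FR2_SHARE_PATHS): the fd table is copied while p_pd is
shared.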
In Chrome, clone(2) is used with CLONE_FS, without CLONE_FILES, and expects
independent fd tables.
The previous conflation of CLONE_FS and CLONE_FILES was introduced in
r163371 (2006).
Discussed with: markj, trasz (earlier version)
Differential Revision: https://reviews.freebsd.org/D27016
No functional change intended.
Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).
__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.
Reviewed by: kib
Discussed with: markj, mjg
Differential Revision: https://reviews.freebsd.org/D27037
As this ABI is still fresh (r367287), let's correct some mistakes now:
- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess
Discussed with: kib, markj, trasz
Differential Revision: https://reviews.freebsd.org/D27084
One of the last shifts inadvertently moved these static assertions out of a
COMPAT_FREEBSD32 block, which the relevant definitions are limited to.
Fix it.
Pointy hat: kevans
All of the compat32 variants are substantially the same, save for
copyin/copyout (mostly). Apply the same kind of technique used with kevent
here by having the syscall routines supply a umtx_copyops describing the
operations needed.
umtx_copyops carries the bare minimum needed: the sizes of timespec and
_umtx_time are used for determining if copyout is needed in the sem2_wait
case.
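A rough userspace sketch of the copyops idea, not the actual umtx_copyops
layout. struct copyops, struct ts32/ts64, the copyin_* helpers and
wait_common() are all hypothetical names, and memcpy() stands in for
copyin(9):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical native and compat32 timespec layouts. */
struct ts64 {
	int64_t	sec;
	int64_t	nsec;
};

struct ts32 {
	int32_t	sec;
	int32_t	nsec;
};

/* Hypothetical ops table supplied by each syscall entry point. */
struct copyops {
	int	(*copyin_timeout)(const void *uaddr, struct ts64 *out);
	size_t	timespec_sz;
};

static int
copyin_native(const void *uaddr, struct ts64 *out)
{
	memcpy(out, uaddr, sizeof(*out));
	return (0);
}

static int
copyin_compat32(const void *uaddr, struct ts64 *out)
{
	struct ts32 t;

	/* Read the narrow layout, then widen it to the native one. */
	memcpy(&t, uaddr, sizeof(t));
	out->sec = t.sec;
	out->nsec = t.nsec;
	return (0);
}

static const struct copyops native_ops = {
	.copyin_timeout = copyin_native,
	.timespec_sz = sizeof(struct ts64),
};

static const struct copyops compat32_ops = {
	.copyin_timeout = copyin_compat32,
	.timespec_sz = sizeof(struct ts32),
};

/* Shared body: the only ABI-specific step goes through the ops table. */
static int
wait_common(const void *utimeout, const struct copyops *ops, struct ts64 *out)
{
	return (ops->copyin_timeout(utimeout, out));
}
```

The native and compat32 entry points would differ only in which table they
pass to the shared routine.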
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27222
Specifically, if we're waking up some value n > BATCH_SIZE, then the
copyin(9) is wrong on the second iteration due to upp being the wrong type.
upp is currently a uint32_t**, so upp + pos advances it by twice as many
elements as it should (host pointer size vs. compat32 pointer size).
Fix it by just making upp a uint32_t*; it's still technically a double
pointer, but the distinction doesn't matter all that much here since we're
just doing arithmetic on it.
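To illustrate the stride problem, here is a small userspace sketch; the helper
names are hypothetical, and BATCH_SIZE is set to 128 on the assumption that it
matches the 128 threads mentioned in this message:

```c
#include <stddef.h>
#include <stdint.h>

#define	BATCH_SIZE	128	/* assumed from the 128 quoted in this message */

/* Byte offset of "upp + pos" when upp is a uint32_t ** (the old type). */
static size_t
old_offset(size_t pos)
{
	return (pos * sizeof(uint32_t *));
}

/* Byte offset of "upp + pos" when upp is a uint32_t * (the fixed type). */
static size_t
new_offset(size_t pos)
{
	return (pos * sizeof(uint32_t));
}
```

On a host with 8-byte pointers, old_offset(BATCH_SIZE) is twice
new_offset(BATCH_SIZE), so the second copyin(9) starts past the entries it
should have read.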
Add a test case that demonstrates the problem, placed with the libthr tests
since anyone messing with _umtx_op should be running these tests. Running under
compat32, the new test case will hang as threads after the first 128 get
missed in the wake. It's not immediately clear how to hit it in practice,
since pthread_cond_broadcast() uses a smaller (sleepq batch?) size observed
to be around 50 -- I did not spend much time digging into it.
The uintptr_t change makes no functional difference, but I've tossed it in
since it's more accurate (semantically).
Reported by: Andrew Gierth (andrew_tao173.riddles.org.uk, inspection)
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27231
Add __unused to some args.
Change type of the iterator variables to match loop control.
Remove excessive {}.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27220
This moves the entire large alloc handling out of all consumers, apart from
deciding to go there.
This is a step towards creating a fast path.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27198
Since thread_zone is marked NOFREE the thread_fini callback is never
executed, meaning memory allocated by seltdinit is never released.
Adding the call to thread_dtor is not sufficient as exiting processes
cache the main thread.
Refcounting was added to combat a race between selfdfree and doselwakeup,
but it adds avoidable overhead.
selfdfree detects it can free the object by ->sf_si == NULL, thus we can
ensure that the condition only holds after all accesses are completed.
The global array has prohibitive performance impact on multicore systems.
The same data (and more) can be obtained with dtrace.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27199
Restart syscalls and some sync operations when the filesystem indicates an
ERELOOKUP condition, mostly for VOPs operating on metadata. In
particular, lookup results cached in the inode/v_data are no longer
valid and need recalculating. Right now this should be a nop.
Assert that ERELOOKUP is caught everywhere and not returned to
userspace, by asserting that td_errno != ERELOOKUP on the syscall return
path.
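The restart pattern can be sketched in userspace as follows. This is not the
kernel code: X_ERELOOKUP is a stand-in value, and op_fn, run_with_restart()
and flaky_op() are hypothetical names:

```c
/* Stand-in for the kernel-internal ERELOOKUP value (assumption). */
#define	X_ERELOOKUP	(-5)

/* Hypothetical operation type: returns 0, an errno, or X_ERELOOKUP. */
typedef int (*op_fn)(void *arg);

/*
 * Re-run the operation for as long as it asks for a restart, so the
 * restart indication never reaches the caller.
 */
static int
run_with_restart(op_fn op, void *arg)
{
	int error;

	do {
		error = op(arg);
	} while (error == X_ERELOOKUP);
	return (error);
}

/* Example operation: asks for a restart *arg times, then succeeds. */
static int
flaky_op(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (X_ERELOOKUP);
	}
	return (0);
}
```

The value returned by run_with_restart() is never X_ERELOOKUP, which is the
property the assertion on the syscall return path checks.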
In collaboration with: pho
Reviewed by: mckusick (previous version), markj
Tested by: markj (syzkaller), pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26136
The routine does not serve any practical purpose.
Memory can be allocated in many other ways and most consumers pass the
M_WAITOK flag, making malloc not fail in the first place.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27143
This works for amd64, but no others -- drop it, because we already have a
proper definition in sys/compat/freebsd32/freebsd32.h that correctly uses
time32_t.
MFC after: 1 week
This gets rid of the most contended spinlock seen when creating/destroying
threads in a loop (modulo kstack).
Tested by: alfredo (ppc64), bdragon (ppc64)
First, funsetownlst() looks at the first element of the list to see
whether it's processing a process or a process group list. Then it
acquires the global sigio lock and processes the list. However, nothing
prevents the first sigio tracker from being freed by a concurrent
funsetown() before the sigio lock is acquired.
Fix this by acquiring the global sigio lock immediately after checking
whether the list is empty. Callers of funsetownlst() ensure that new
sigio trackers cannot be added concurrently.
Second, fsetown() uses funsetown() to remove an existing sigio structure
from a file object. However, funsetown() uses a racy check to avoid the
sigio lock, so two threads may call fsetown() on the same file object,
both observe that no sigio tracker is present, and enqueue two sigio
trackers for the same file object. However, if the file object is
destroyed, funsetown() will only remove one sigio tracker, and
funsetownlst() may later trigger a use-after-free when it clears the
file object reference for each entry in the list.
Fix this by introducing funsetown_locked(), which avoids the racy check.
Reviewed by: kib
Reported by: pho
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27157
Note this still does not scale but is enough to move it out of the way
for the foreseeable future.
In particular, a trivial benchmark spawning/killing threads stops contending
on tidhash.
This in particular unbreaks rtkit.
The limitation was a leftover of previous state, to quote a
comment:
	/*
	 * Though lwpid is unique, only current process is supported
	 * since there is no efficient way to look up a LWP yet.
	 */
Long since then a global tid hash was introduced to remedy
the problem.
Permission checks still apply.
Submitted by: greg_unrelenting.technology (Greg V)
Differential Revision: https://reviews.freebsd.org/D27158
There are workloads with very bursty tid allocation and since unr tries very
hard to have small-sized bitmaps it keeps reallocating memory. Just doing
buildkernel gives almost 150k calls to free coming from unr.
This also gets rid of the hack which tried to postpone TID reuse.
Reviewed by: kib, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D27101
The intent is to replace the current id allocation method and a known upper
bound will be useful.
Reviewed by: kib (previous version), markj (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D27100