KCSAN complains about racy accesses in the locking code. Those races are
fine since they are inside a TD_SET_RUNNING() loop that expects the value
to be changed by another CPU.
Use relaxed atomic stores/loads to indicate that this variable can be
written/read by multiple CPUs at the same time. This will also prevent
the compiler from doing unexpected re-ordering.
Reported by: GENERIC-KCSAN
Test Plan: KCSAN no longer complains, kernel still runs fine.
Reviewed By: markj, mjg (earlier version)
Differential Revision: https://reviews.freebsd.org/D28569
The "findstack", "show all trace", and "show active trace" commands
were iterating over allproc to enumerate threads. This missed threads
executing in exit1() after being removed from allproc.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D27829
Exiting processes that have been removed from allproc but are still
executing are not yet marked PRS_ZOMBIE, so they were not listed (for
example, if a thread panics during exit1()). To detect these
processes, clear p_list.le_prev to NULL explicitly after removing a
process from the allproc list and check for this sentinel rather than
PRS_ZOMBIE when walking the pidhash.
While here, simplify the pidhash walk to use a single outer loop.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D27824
This is useful for stack unwinders which need to avoid out-of-bounds
reads of a kernel stack which can trigger kernel faults.
Reviewed by: kib, markj
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27356
_sleep(9), wakeup(9), sleepqueue(9), et al do not dereference or modify the
channel pointers provided in any way; they are merely used as intptrs into a
dictionary structure to match waiters with wakers. Correctly annotate this
such that _sleep() and wakeup() may be used on const pointers without
invoking ugly patterns like __DECONST(). Plumb const through all of the
underlying sleepqueue bits.
No functional change.
Reviewed by: rlibby
Discussed with: kib, markj
Differential Revision: https://reviews.freebsd.org/D22914
It is not needed by anything in the kernel and it slightly drives up contention
on both proctree and allproc locks.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21447
The previous calculations for displaying the time since last switch
easily overflowed, after less than 36 min for hz=1000. Now overflow
takes 2000 times longer (as long as ticks takes to wrap).
Reviewed by: cem, markj
Sponsored by: Dell EMC Isilon
Differential revision: https://reviews.freebsd.org/D20273
This can aid with debugging when a thread is running and has no backtrace.
State can be estimated based on the pcb, and refined from there, for
example, to get a rough idea of the stack pointer.
We use ps to collect the information of all processes in textdump. But
it doesn't contain process arguments which however sometimes are very
useful for debugging. The new 'a' modifier adds that capability.
While here, remove 'm' modifier from ddb.4. It was in the manual page
from its very first revision, but I could not find any evidence of the
code ever supporting it.
Submitted by: Terry Hu <thu@panzura.com>
Reviewed by: kib
MFC after: 1 week
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D16603
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
the fixed-width numeric field 'wchan', as in ps(1). They were sort
of centered, although the template shows 'state' as right-justified.
The `wmesg' field very rarely has a prefix of '*' (for lock names)
that is still to the left of the header, and the width of this field
is reduced from 8 to 7 (more than 6 is an error).
The 'wmesg' and 'wchan' fields are still misnamed and poorly handled.
They are named sort of backwards relative to ps(1):
- wmesg in ddb = mwchan in ps
- wmesg in ddb = wchan in ps (if it is a wait channel name, not a lock name)
- wchan in ddb = nwchan in ps
ddb ps wastes lots of space for the unimportant 'wchan' field (20
columns altogether on 64-bit arches). ps(1) documents using a
compressed format, but the compression only omits leading nybbles of
0 so it has neveqr worked on arches that put the kernel in the top half
of the address space. It just avoids wasting space for an 0x prefix.
specifics of callout KPI. Esp., do not depend on the exact interface
of callout_stop(9) return values.
The main change is that instead of requiring precise callouts, code
maintains absolute time to wake up. Callouts now should ensure that a
wake occurs at the requested moment, but we can tolerate both run-away
callout, and callout_stop(9) lying about running callout either way.
As consequence, it removes the constant source of the bugs where
sleepq_check_timeout() causes uninterruptible thread state where the
thread is detached from CPU, see e.g. r234952 and r296320.
Patch also removes dual meaning of the TDF_TIMEOUT flag, making code
(IMO much) simpler to reason about.
Tested by: pho
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7137
initial thread stack is not adjusted by the tunable, the stack is
allocated too early to get access to the kernel environment. See
TD0_KSTACK_PAGES for the thread0 stack sizing on i386.
The tunable was tested on x86 only. From the visual inspection, it
seems that it might work on arm and powerpc. The arm
USPACE_SVC_STACK_TOP and powerpc USPACE macros seems to be already
incorrect for the threads with non-default kstack size. I only
changed the macros to use variable instead of constant, since I cannot
test.
On arm64, mips and sparc64, some static data structures are sized by
KSTACK_PAGES, so the tunable is disabled.
Sponsored by: The FreeBSD Foundation
MFC after: 2 week
If KSTACK_PAGES was changed to anything alse than the default,
the value from param.h was taken instead in some places and
the value from KENRCONF in some others. This resulted in
inconsistency which caused corruption in SMP envorinment.
Ensure all places where KSTACK_PAGES are used the opt_kstack_pages.h
is included.
The file opt_kstack_pages.h could not be included in param.h
because was breaking the toolchain compilation.
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3094
The replacement started at r283088 was necessarily incomplete without
replacing boolean_t with bool. This also involved cleaning some type
mismatches and ansifying old C function declarations.
Pointed out by: bde
Discussed with: bde, ian, jhb
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.
Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.
Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.
The ULE scheduler compiles again but I have no idea if it works.
The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.
Tested by David Xu, and Dan Eischen using libthr and libpthread.
- Right justify 'pid' label.
- Move the uid column to the right 2 columns so that the 3 process id
columns (pid, ppid, pgrp) are grouped together.
- Expand the uid column to 5 chars.
- Don't indent the tid for multithreaded processes.
Requested by: bde (1, 2, 4)
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer). So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt. Also, now that it's easy to do so,
enable paging by default for all ddb commands. Any command that wishes to
honor the quit flag can do so by checking db_pager_quit. Note that the
pager can also be effectively disabled by setting $lines to 0.
Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
terminates early.
- 'show intr' now actually checks the quit flag and terminates early.
now back to using fixed-size columns for output and each line of output
should fit in 80 columns on both 32-bit and 64-bit architectures. In
general the output is close to that of the userland ps(1) with the
exception that the 'wmesg' field is mostly similar to the "state" field
in top(1) in that it will show either a wmesg, a lock name (prefixed with
an *), "CPU xx" (for a running thread), or nothing if none of those three
conditions are true. It also respects td_name when listing threads in
a multithreaded process. There is a somewhat evilly-defined PTR64 macro
I use to make account for the change in the size of the 'wchan' column
in the formatted output (wchan is now the only pointer in the ps output
and is available so it can be passed to 'show sleepq', 'show turnstile',
or 'show lock').
- Add two new commands "show proc [process]" and "show thread [thread]"
that show details about the specified process or thread (specified
either by pid/tid or pointer), respectively. If an address it not
specified, it uses the current kdb thread.
control the number of lines per page rather than a constant. The variable
can be examined and changed in ddb as '$lines'. Setting the variable to
0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
newlines and carriage returns so that one can rub out content on the
current line via '\r \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
the routine exits.
- Add some aliases to the simple pager to make it more compatible with
more(1): 'e' and 'j' do a single line. 'd' does half a page, and
'f' does a full page.
MFC after: 1 month
Inspired by: kris
but with slightly cleaned up interfaces.
The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.
The KSE (or td_sched) structure is now allocated per thread and has no
allocation code of its own.
Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.
Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.
The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.
A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.
Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.
Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by: scottl, peter
MFC after: 1 week
Most of the changes are a direct result of adding thread awareness.
Typically, DDB_REGS is gone. All registers are taken from the
trapframe and backtraces use the PCB based contexts. DDB_REGS was
defined to be a trapframe on all platforms anyway.
Thread awareness introduces the following new commands:
thread X switch to thread X (where X is the TID),
show threads list all threads.
The backtrace code has been made more flexible so that one can
create backtraces for any thread by giving the thread ID as an
argument to trace.
With this change, ia64 has support for breakpoints.