userspace to control NUMA policy administratively and programmatically.
Implement domainset based iterators in the page layer.
Remove the now legacy numa_* syscalls.
Cleanup some header polution created by having seq.h in proc.h.
Reviewed by: markj, kib
Discussed with: alc
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13403
We currently use a set of subroutines in kern_gzio.c to perform
compression of user and kernel core dumps. In the interest of adding
support for other compression algorithms (zstd) in this role without
complicating the API consumers, add a simple compressor API which can be
used to select an algorithm.
Also change the (non-default) GZIO kernel option to not enable
compressed user cores by default. It's not clear that such a default
would be desirable with support for multiple algorithms implemented,
and it's inconsistent in that it isn't applied to kernel dumps.
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D13632
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
This helps simplify the code in kern_shutdown.c and reduces the number
of globally visible functions.
No functional change intended.
Reviewed by: cem, def
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11603
A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.
ANSIfy related prototypes while here.
Reviewed by: cem, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10193
I fixed this in 1997, but the fix was over-engineered and fragile and
was broken in 2003 if not before. i386 parameters were copied to 8
other arches verbatim, mostly after they stopped working on i386, and
mostly without the large comment saying how the values were chosen on
i386. powerpc has a non-verbatim copy which just changes the uncritical
parameter and seems to add a sign extension bug to it.
Just treat negative offsets as offsets if they are no more negative than
-db_offset_max (default -64K), and remove all the broken parameters.
-64K is not very negative, but it is enough for frame and stack pointer
offsets since kernel stacks are small.
The over-engineering was mainly to go more negative than -64K for the
negative offset format, without affecting printing for more than a
single address.
Addresses in the top 64K of a (full 32-bit or 64-bit) address space
are now printed less well, but there aren't many interesting ones.
For arches that have many interesting ones very near the top (e.g.,
68k has interrupt vectors there), there would be no good limit for
the negative offset format and -64K is a good as anything.
in practice).
db_expr_t is a signed type, but right shifts are fudged to evaluate
them in an unsigned type, and the unsigned type was broken by hard-
coding it as 'unsigned', so casting to it lost the top bits on arches
with db_expr_t larger than u_int.
The unsigned type with the same size as db_expr_t is not declared;
assume that db_addr_t gives it. Fixing this properly is less important
than using the correct type for db_expr_t (originally always long for
C90, but always intmax_t since C99).
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
This lets one interrupt DDB's output, which is useful if paging is
disabled and the output device is slow.
Submitted by: Anton Rang <rang@acm.org>
Reviewed by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D9138
On all of our platforms, db_expr_t is a signed integer while
db_addr_t is an unsigned integer value. db_search_symbol used variables
of type db_expr_t to hold the current offset of the requested address from
the "best" symbol found so far. This value was initialized to '~0'.
When a new symbol is found from a symbol table, the associated diff for the
new symbol is compared against the existing value as 'if (newdiff < diff)'
to determine if the new symbol had a smaller diff and was thus a closer
match.
On 64-bit MIPS, the '~0' was treated as a negative value (-1). A lookup
that found a perfect match of an address against a symbol returned a diff
of 0. However, in signed comparisons, 0 is not less than -1. As a result,
DDB on 64-bit MIPS never resolved any addresses to symbols. Workaround
this by using casts to force an unsigned comparison.
Probably the diff returned from db_search_symbol() and X_db_search_symbol()
should be changed to a db_addr_t instead of a db_expr_t as it is an
unsigned value (and is an offset of an address, so should fit in the same
size as an address).
Sponsored by: DARPA / AFRL
Changes include modifications in kernel crash dump routines, dumpon(8) and
savecore(8). A new tool called decryptcore(8) was added.
A new DIOCSKERNELDUMP I/O control was added to send a kernel crash dump
configuration in the diocskerneldump_arg structure to the kernel.
The old DIOCSKERNELDUMP I/O control was renamed to DIOCSKERNELDUMP_FREEBSD11 for
backward ABI compatibility.
dumpon(8) generates an one-time random symmetric key and encrypts it using
an RSA public key in capability mode. Currently only AES-256-CBC is supported
but EKCD was designed to implement support for other algorithms in the future.
The public key is chosen using the -k flag. The dumpon rc(8) script can do this
automatically during startup using the dumppubkey rc.conf(5) variable. Once the
keys are calculated dumpon sends them to the kernel via DIOCSKERNELDUMP I/O
control.
When the kernel receives the DIOCSKERNELDUMP I/O control it generates a random
IV and sets up the key schedule for the specified algorithm. Each time the
kernel tries to write a crash dump to the dump device, the IV is replaced by
a SHA-256 hash of the previous value. This is intended to make a possible
differential cryptanalysis harder since it is possible to write multiple crash
dumps without reboot by repeating the following commands:
# sysctl debug.kdb.enter=1
db> call doadump(0)
db> continue
# savecore
A kernel dump key consists of an algorithm identifier, an IV and an encrypted
symmetric key. The kernel dump key size is included in a kernel dump header.
The size is an unsigned 32-bit integer and it is aligned to a block size.
The header structure has 512 bytes to match the block size so it was required to
make a panic string 4 bytes shorter to add a new field to the header structure.
If the kernel dump key size in the header is nonzero it is assumed that the
kernel dump key is placed after the first header on the dump device and the core
dump is encrypted.
Separate functions were implemented to write the kernel dump header and the
kernel dump key as they need to be unencrypted. The dump_write function encrypts
data if the kernel was compiled with the EKCD option. Encrypted kernel textdumps
are not supported due to the way they are constructed which makes it impossible
to use the CBC mode for encryption. It should be also noted that textdumps don't
contain sensitive data by design as a user decides what information should be
dumped.
savecore(8) writes the kernel dump key to a key.# file if its size in the header
is nonzero. # is the number of the current core dump.
decryptcore(8) decrypts the core dump using a private RSA key and the kernel
dump key. This is performed by a child process in capability mode.
If the decryption was not successful the parent process removes a partially
decrypted core dump.
Description on how to encrypt crash dumps was added to the decryptcore(8),
dumpon(8), rc.conf(5) and savecore(8) manual pages.
EKCD was tested on amd64 using bhyve and i386, mipsel and sparc64 using QEMU.
The feature still has to be tested on arm and arm64 as it wasn't possible to run
FreeBSD due to the problems with QEMU emulation and lack of hardware.
Designed by: def, pjd
Reviewed by: cem, oshogbo, pjd
Partial review: delphij, emaste, jhb, kib
Approved by: pjd (mentor)
Differential Revision: https://reviews.freebsd.org/D4712
db_segsize().
Use db_segsize() to set the default operand/address size for
disassembling. Allow overriding this with the "alternate" display
format /I. The API of db_disasm() should be debooleanized to pass a
more general request (amd64 needs overrides to sizes of 16, 32, and
64, but this commit doesn't implement anything for amd64 since much
larger changes are needed to restore the amd64 disassmbler's support
for non-default sizes).
Fix db_print_loc_and_inst() to ask for the normal format and not the
alternate in normal operation.
This is most useful for vm86 mode, but also works for 16-bit protected
mode.
Use db_segsize() to avoid trying to print a garbage stack trace if %cs
is 16 bits. Print something like the stack trace termination message
for a trap boundary instead.
Document that the alternate format is now useful on i386.
off single-stepping). Only do this on arches (only x86 so far)
which classify single-step traps unambiguously.
This allows other parts of the kernel to be intentionally and
unintentionally sloppy about generating single-step traps. On
x86, at least the following places were unintentionally sloppy:
- all operations that context-switched [er]flags. Especially
spinlock_enter()/exit() and cpu_switch(). When single-stepped,
saving the flags leaves PSL_T set in the saved flags, so
restoring gives a trap that is spurious if it occurs after
single-step mode has been left. Switching contexts away from
a low priority thread gives especially long-lived saved copies.
- the vm86 emulation allows user mode to set PSL_T. This was
correct until vm86 bios call mode was unintentionally given
access to kdb handling its single-step traps.
Now these places are intentionally sloppy, but unexpected
debugger traps still cause panics if no debugger that handles
the trap is attached when the trap is delivered.
current on first entry. This fixes a spurious "Stepping aborted"
message when the first entry is for a breakpoint.
Don't reset to the run mode to STEP_NONE when stopping, and remove
STEP_NONE. This mode was never really used, except transiently to
mis-decide whether to print the message on first entry.
This is not very easy to do, since ddb didn't know when traps are
for single-stepping. It more or less assumed that traps are either
breakpoints or single-step, but even for x86 this became inadequate
with the release of the i386 in ~1986, and FreeBSD passes it other
trap types for NMIs and panics.
On x86, teach ddb when a trap is for single stepping using the %dr6
register. Unknown traps are now treated almost the same as breakpoints
instead of as the same as single-steps. Previously, the classification
of breakpoints was almost correct and everything else was unknown so
had to be treated as a single-step. Now the classification of single-
steps is precise, the classification of breakpoints is almost correct
(as before) and everything else is unknown and treated like a
breakpoint.
This fixes:
- breakpoints not set by ddb, including the main one in kdb_enter(),
were treated as single-steps and not stopped on when stepping
(except for the usual, simple case of a step with residual count 1).
As special cases, kdb_enter() didn't stop for fatal traps or panics
- similarly for "hardware breakpoints".
Use a new MD macro IS_SSTEP_TRAP(type, code) to code to classify
single-steps. This is excessively complicated for bug-for-bug and
backwards compatibilty. Design errors apparently started in Mach
in ~1990 or perhaps in the FreeBSD interface in ~1993. Common trap
types like single steps should have a unique MI code (like the TRAP*
codes for user SIGTRAP) so that debuggers don't need macros like
IS_SSTEP_TRAP() to decode them. But 'type' is actually an ambiguous
MD trap number, and code was always 0 (now it is (int)%dr6 on x86).
So it was impossible to determine the trap type from the args.
Global variables had to be used.
There is already a classification macro db_pc_is_single_step(), but
this just gets in the way. It is only used to recover from bugs in
IS_BREAKPOINT_TRAP(). On some arches, IS_BREAKPOINT_TRAP() just
duplicates the ambiguity in 'type' and misclassifies single-steps as
breakpoints. It defaults to 'false', which is the opposite of what is
needed for bug-for-bug compatibility.
When this is cleaned up, MI classification bits should be passed in
'code'. This could be done now for positive-logic bits, since 'code'
was always 0, but some negative logic is needed for compatibility so
a simple MI classificition is not usable yet.
After reading %dr6, clear the single-step bit in it so that the type
of the next debugger trap can be decoded. This is a little
ddb-specific. ddb doesn't understand the need to clear this bit and
doing it before calling kdb is easiest. gdb would need to reverse
this to support hardware breakpoints, but it just doesn't support
them now since gdbstub doesn't support %dr*.
Fix a bug involving %dr6: when emulating a single-step trap for vm86,
set the bit for it in %dr6. Userland debuggers need this. ddb now
needs this for vm86 bios calls. The bit gets copied to 'code' then
cleared again.
Fix related style bugs:
- when clearing bits for hardware breakpoints in %dr6, spell the mask
as ~0xf on both amd64 and i386 to get the correct number of bits
using sign extension and not need a comment about using the wrong
mask on amd64 (amd64 traps for invalid results but clearing the
reserved top bits didn't trap since they are 0).
- rewrite my old wrong comments about using %dr6 for ddb watchpoints.
that the latter can easily determine what the trap type actually is
after callers are fixed to encode the type unambigously.
ddb currently barely understands breakpoints, and it treats all
non-breakpoints as single-step traps. This works OK for stopping
after every instruction when single-stepping, but is broken for
single-stepping with a count > 1 (especially with a large count).
ddb needs to stop on the first non-single-step trap while single-
stepping. Otherwise, ddb doesn't even stop the first time for
fatal traps and external breakpoints like the one in kdb_enter().
On big endian hardware that uses 1 byte bool a type mismatch of bool vs int will
cause the least signifcant byte of db_cmd_loop_done to be set, but the MSB to be
read, and read as 0. This causes ddb to stay in an infinite loop.
MFC after: 1 week
and negative shift counts.
Fix error messages: print "Division" instead of "Divide"; print
multiplier-like, addition-like and logical operator tokens instead of
garbage (usually the command name).
ddb has a primitive lexer with excessive information hiding that makes
it hard to find even the point in the line where a syntax error is
detected. Old ddb just printed "Syntax error" and this was unimproved
in most places by printing a garbage token.
'show active trace', or 'acttrace' for short, prints backtraces from running
threads only.
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D7646
the fixed-width numeric field 'wchan', as in ps(1). They were sort
of centered, although the template shows 'state' as right-justified.
The `wmesg' field very rarely has a prefix of '*' (for lock names)
that is still to the left of the header, and the width of this field
is reduced from 8 to 7 (more than 6 is an error).
The 'wmesg' and 'wchan' fields are still misnamed and poorly handled.
They are named sort of backwards relative to ps(1):
- wmesg in ddb = mwchan in ps
- wmesg in ddb = wchan in ps (if it is a wait channel name, not a lock name)
- wchan in ddb = nwchan in ps
ddb ps wastes lots of space for the unimportant 'wchan' field (20
columns altogether on 64-bit arches). ps(1) documents using a
compressed format, but the compression only omits leading nybbles of
0 so it has neveqr worked on arches that put the kernel in the top half
of the address space. It just avoids wasting space for an 0x prefix.
single stepping of multiple instructions (e.g., s/p,<count> and n/p).
db_print_loc_and_inst() already prints a newline on all arches although
it probably shouldn't.
Especially on SMP systems, single stepping tends to deadlock or panic
too quickly to be useful for anything except finding bugs in itself,
but with printing "itself" includes console drivers so it is useful
for generating stress tests for console drivers.
specifics of callout KPI. Esp., do not depend on the exact interface
of callout_stop(9) return values.
The main change is that instead of requiring precise callouts, code
maintains absolute time to wake up. Callouts now should ensure that a
wake occurs at the requested moment, but we can tolerate both run-away
callout, and callout_stop(9) lying about running callout either way.
As consequence, it removes the constant source of the bugs where
sleepq_check_timeout() causes uninterruptible thread state where the
thread is detached from CPU, see e.g. r234952 and r296320.
Patch also removes dual meaning of the TDF_TIMEOUT flag, making code
(IMO much) simpler to reason about.
Tested by: pho
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7137
This are based on Mach3.
Documentation is pending but has been promised.
Submitted by: Dan Partelly
Reviewed by: adrian, jhb (older version)
Differential Revision: https://reviews.freebsd.org/D4230
RelNotes: yes
* Change x/a to work similar to gdb. The content of the memory is
treated as an address, printed symbolically and the address is advanced.
This way you can x/a <stack_address> and then just hit return a bunch
of times to locate useful data on the stack.
* Add x/p. The content of the memory is treated as an address and
printed as hex.
This is based on the similar commit from DragonFlyBSD without the
cosmetic changes.
Relnotes: yes
Obtained from: DragonflyBSD (Matthew Dillon)
Reference: 0624d20e86affcd708609cbf9014207537537a72