sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
and condition variables. Having sleep qeueus in a hash table avoids
having to allocate a queue head for each wait channel. Thus, struct cv
has shrunk down to just a single char * pointer now. However, the
hash table does not hold threads directly, but queue heads. This means
that once you have located a queue in the hash bucket, you no longer have
to walk the rest of the hash chain looking for threads. Instead, you have
a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
differentiates between cv's and sleep/wakeup. For example, calls to
abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and
cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
sleep's no longer inherently bump the priority. Instead, this is soley
a propery of msleep() which explicitly calls sched_prio() before
blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used. The
associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
dropped and replaced with a single explicit clearing of td_wchan.
TD_SET_ONSLEEPQ() would really have only made sense if it had taken
the wait channel and message as arguments anyway. Now that that only
happens in one place, a macro would be overkill.
also prints the actual numerical value of the symbol in question.
Users of addr2line(1) will be less proficient in hex arithmetic as a
consequence.
This amongst other things means that traceback lines change from:
siointr1(c4016800,c073bda0,0,c06b699c,69f) at siointr1+0xc5
to
siointr1(c4016800,c073bda0,0,c06b699c,69f) at 0xc062b0bd = siointr1+0xc5
I made this an option to avoid bikesheds.
~
~
~
debug.ddb_use_printf sysctl, output kernel debugger data to both the
console and kernel message buffer via printf. This fixes the case where
backtrace() went directly to the console and should help debugging greatly.
Thanks to Ian Dowse for the work, minor edits or any bugs are by myself.
Submitted by: iedowse
some symbols in X_db_search_symbol(). Reject the same symbols that
rev.1.13 did (all except STT_OBJECT and STT_FUNC), except don't reject
typeless symbols. This keeps the typeless symbols in non-verbosely
written assembler code visible, but makes file symbols invisible. ELF
file symbols have type STT_FILE and value 0, so this stops small values
and offsets sometimes being displayed in terms of the first file symbol
in the kernel (usually device_if.c). I think it rejects some other
unwanted symbols (small absolute symbols for things like struct offsets).
It may reject some wanted symbols (large absolute symbols for addresses
like PTmap).
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.
ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).
alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a seperate commit.
powerpc: MD prototypes left in cpu.h. Comment added.
Suggested by: bde
Tested with: make universe (pc98 incomplete)
integer value and then to construct the integer from it. This buffer
was sizeof(int) bytes long, which was fine until the (undocumented) 'g'
modifier for 8-byte integers was introduced. Change this to sizeof(uint64_t).
callout when a specified number of lines have been output. This can be
used to implement pagers for ddb commands that output a lot of text. A
simple paging function is included that automatically rearms itself when
fired.
Reviewed by: bde, julian
fit on one line. Account for threads better.
* No need to report that it is on a sleep queue if it is actually sleeping
* "Normal" state is almost ubiquitous.. only report abnormal states.
* increment the #lines count for each separate thread shown in threaded
programs.
makes it less likely that a threaded program will make all the data
on a screen overflow off the top of the screen.
I'm not convinced there is anything major wrong with the patch but
them's the rules..
I am using my "David's mentor" hat to revert this as he's
offline for a while.
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.
A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.
Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.
Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.
KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.
When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.
The code hasn't been tested under SMP by author due to lack of hardware.
Reviewed by: julian
implement ALT_BREAK_TO_DEBUGGER. The caller provides a pointer to a state
variable to allow different state to be maintained for separate instances of
a device.
- Use struct vm_map * instead of vm_map_t in db_break.h to avoid its users
needing to include vm headers.
Requested by: njl
(show thread {address})
Remove the IDLE kse state and replace it with a change in
the way threads sahre KSEs. Every KSE now has a thread, which is
considered its "owner" however a KSE may also be lent to other
threads in the same group to allow completion of in-kernel work.
n this case the owner remains the same and the KSE will revert to the
owner when the other work has been completed.
All creations of upcalls etc. is now done from
kse_reassign() which in turn is called from mi_switch or
thread_exit(). This means that special code can be removed from
msleep() and cv_wait().
kse_release() does not leave a KSE with no thread any more but
converts the existing thread into teh KSE's owner, and sets it up
for doing an upcall. It is just inhibitted from being scheduled until
there is some reason to do an upcall.
Remove all trace of the kse_idle queue since it is no-longer needed.
"Idle" KSEs are now on the loanable queue.
- Make DDB use %y instead of %z.
- Teach GCC about %y.
- Implement support for the C99 %z format modifier.
Approved by: re@
Reviewed by: peter
Tested on: i386, sparc64
It is never used. I left it there from pre-KSE days as I didn't know
if I'd need it or not but now I know I don't.. It's functionality
is in TDI_IWAIT in the thread.
in specific situations. The owner thread must be blocked, and the
borrower can not proceed back to user space with the borrowed KSE.
The borrower will return the KSE on the next context switch where
teh owner wants it back. This removes a lot of possible
race conditions and deadlocks. It is consceivable that the
borrower should inherit the priority of the owner too.
that's another discussion and would be simple to do.
Also, as part of this, the "preallocatd spare thread" is attached to the
thread doing a syscall rather than the KSE. This removes the need to lock
the scheduler when we want to access it, as it's now "at hand".
DDB now shows a lot mor info for threaded proceses though it may need
some optimisation to squeeze it all back into 80 chars again.
(possible JKH project)
Upcalls are now "bound" threads, but "KSE Lending" now means that
other completing syscalls can be completed using that KSE before the upcall
finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is
one of the highest priorities in the KSE system.) The upcall when it happens
will present all the completed syscalls to the KSE for selection.
name instead. (e.g., SLOCK instead of SMTX, TD_ON_LOCK() instead of
TD_ON_MUTEX()) Eventually a turnstile abstraction will be added that
will be shared with mutexes and other types of locks. SLOCK/TDI_LOCK will
be used internally by the turnstile code and will not be specific to
mutexes. Making the change now ensures that turnstiles can be dropped
in at a later date without affecting the ABI of userland applications.
MD function is just a wrapper around db_stack_trace_cmd() that prints out
a backtrace of curthread. Currently, this function is only implemented
on i386 and alpha (and the alpha version isn't quite tested yet, will do
that in a bit). Other changes:
- For i386, fix a bug in the raw frame address case. The eip we extract
from the passed in frame address does not match the frame we received.
Thus, instead of printing a bogus frame with the wrong eip, go ahead
and advance frame down to the same frame as the eip we are using.
- For alpha, attempt to add a way of doing a raw trace for alpha. Instead
of passing a frame address in 'addr', pass in a pointer to a structure
containing PC and KSP and use those to start the backtrace. The alpha
db_print_backtrace() uses asm to read in the current PC and KSP values
into such a request.
Tested on: i386
Requested by: many
X_db_search_symbol(). Otherwise we don't see important symbols in
non-verbosely written assembler code.
NetBSD already has this. The kld version already has a stronger form
of it without really trying -- linker_ddb_search_symbol() doesn't
support ddb's symbol search strategy parameter, so the kld
X_db_search_symbol() doesn't pass the parameter to linker_ddb...() and
linker_ddb...() doesn't make distinctions based on the symbol type.
db_elf.c now works better than db_kld.c when it works (which is essentially
when there are no modules except the kernel). It works after booting
with -d. db_kld.c doesn't work until lots of SYSINIT()s have run.
symbol table sections. Reconstruct the necessary section headers from
(ksym_start, ksym_end). This was much easier than converting to use
module metadata, and just works for static symbols, unlike db_kld when
there is no module metadata. Initialize (ksym_start, ksym_end) from
bootinfo on i386's only.
The boot loader should load section headers for all sections that it
loads, and apparently did this for at least the symbol table sections
when this file last worked under FreeBSD (on alphas only) and always
did this under NetBSD (where this file was obtained from). At least
on i386's, boot2 discards the section headers (except for converting
them to (bootinfo.bi_symtab, bootinfo.bi_esymtab), and as far as I can
tell, loader(8) discards them apart from converting them to the bootinfo
values and module metadata.
and renaming ALIGNED_POINTER() to _ALIGNED_POINTER() plus the following
hacks for i386's:
- define _ALIGNED_POINTER() if it is not already defined. Most non-i386
arches define it <machine/param.h> define it in <machine/param.h>,
although none actually used it in the kernel.
- define ksym_start and ksym_end. Most non-i386 arches still define and
initialize these in machdep.c although they didn't used them. Here is
a better place to define them but not to initialize them.
Don't attempt to follow null pointers for zombie processes in db_ps().
Style fix: use explicit an comparison with NULL for all null pointer
checks in db_ps() instead of for half of them.
db_interface.c:
Fixed ddb's handling of traps from with ddb on i386's only.
This was mostly fixed in rev.1.27 (by longjmp()'ing back to the top
level) but was completly broken in rev.1.48 (by not unwinding the new
state (mainly db_active) either before or after the longjmp(). This
mostly never worked for other arches, since rev.1.27 has not been ported
and lower level longjmp()'s only handle traps for memory accesses. All
cases should be handled at a lower level to provided better control and
simplify unwinding of state.
Implementation details: don't pretend to maintain db_active in a nested
way -- ddb cannot be reentered in a nested way. Use db_active instead
of the db_global_jmpbuf_valid flag and longjmp()'s return value for things
related to reentering ddb. [re]entering is still not atomic enough.
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..
the indirection operator ('*') and address examination ('x/a') on
big-endian platoforms for which the above is not true, as well as on
little-endian platforms if the cut-off bits are not 0.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.
Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
respects locks. Before SMPng, one was able to call psignal()
using the "call" command, but this is no longer possible because it
does not respect locks by itself. This is very useful when one has
gotten their machine into a state where it is impossible to spawn
ps/kill or su to root.
In this case, respecting locks essentially means trying to aquire the
proc lock before calling psignal(). We can't block in the debugger,
so if trylock fails, the operation fails. This also means that we
can't use pfind(), since that will attempt to lock the process for us.
Reviewed by: jhb
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
'dwatch'. The new commands install hardware watchpoints if supported
by the architecture and if there are enough registers to cover the
desired memory area.
No objection by: audit@, hackers@
MFC after: 2 weeks
Replace the a.out emulation of 'struct linker_set' with something
a little more flexible. <sys/linker_set.h> now provides macros for
accessing elements and completely hides the implementation.
The linker_set.h macros have been on the back burner in various
forms since 1998 and has ideas and code from Mike Smith (SET_FOREACH()),
John Polstra (ELF clue) and myself (cleaned up API and the conversion
of the rest of the kernel to use it).
The macros declare a strongly typed set. They return elements with the
type that you declare the set with, rather than a generic void *.
For ELF, we use the magic ld symbols (__start_<setname> and
__stop_<setname>). Thanks to Richard Henderson <rth@redhat.com> for the
trick about how to force ld to provide them for kld's.
For a.out, we use the old linker_set struct.
NOTE: the item lists are no longer null terminated. This is why
the code impact is high in certain areas.
The runtime linker has a new method to find the linker set
boundaries depending on which backend format is in use.
linker sets are still module/kld unfriendly and should never be used
for anything that may be modular one day.
Reviewed by: eivind
real uid, saved uid, real gid, and saved gid to ucred, as well as the
pcred->pc_uidinfo, which was associated with the real uid, only rename
it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
original macro that pointed.
p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
we figure out locking and optimizations; generally speaking, this
means moving to a structure like this:
newcred = crdup(oldcred);
...
p->p_ucred = newcred;
crfree(oldcred);
It's not race-free, but better than nothing. There are also races
in sys_process.c, all inter-process authorization, fork, exec, and
exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
allocation.
o Clean up ktrcanset() to take into account changes, and move to using
suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
calls to better document current behavior. In a couple of places,
current behavior is a little questionable and we need to check
POSIX.1 to make sure it's "right". More commenting work still
remains to be done.
o Update credential management calls, such as crfree(), to take into
account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
change_euid()
change_egid()
change_ruid()
change_rgid()
change_svuid()
change_svgid()
In each case, the call now acts on a credential not a process, and as
such no longer requires more complicated process locking/etc. They
now assume the caller will do any necessary allocation of an
exclusive credential reference. Each is commented to document its
reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
processes and pcreds. Note that this authorization, as well as
CANSIGIO(), needs to be updated to use the p_cansignal() and
p_cansched() centralized authorization routines, as they currently
do not take into account some desirable restrictions that are handled
by the centralized routines, as well as being inconsistent with other
similar authorization instances.
o Update libkvm to take these changes into account.
Obtained from: TrustedBSD Project
Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
with 'options DDB'). Setting this to `ddb' or `gdb' breaks into the
kernel debugger in the corresponding mode. This mechanism has proven
very useful at Whistle for setting breakpoints, etc., while doing
remote serial line kernel debugging.
Obtained from: Whistle source tree
ddb is entered. Don't refer to `in_Debugger' to see if we
are in the debugger. (The variable used to be static in Debugger()
and wasn't updated if ddb is entered via traps and panic anyway.)
- Don't refer to `in_Debugger'.
- Add `db_active' to i386/i386/db_interface.d (as in
alpha/alpha/db_interface.c).
- Remove cnpollc() stub from ddb/db_input.c.
- Add the dbctl function to syscons, pcvt, and sio. (The function for
pcvt and sio is noop at the moment.)
Jointly developed by: bde and me
(The final version was tweaked by me and not reviewed by bde. Thus,
if there is any error in this commit, that is entirely of mine, not
his.)
Some changes were obtained from: NetBSD
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.
This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
/sys/ddb/db_input.c rev 1.19 to recognize syscons's cursor keycodes.
It is unnecessary now that scgetc() in syscons returns the escape
sequence for the cursor keys rather than their raw, internal key
codes.
I'm not too happy about the result either, but at least it has less
chance of backfiring.
This particular feature could be called "a mess" without offending
anybody.
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.
These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.
Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>
get to all the symbol tables for all modules, not just the core kernel
symbol table. Yes, DDB can see KLD module symbols with this, both by
lookup and in tracebacks. No more references to _end from tracebacks
within an LKM. :-)
because the alpha boot loader hasn't been converted yet, and because
it needs the full symbol tables with local symbols in order to make sense
of stack tracebacks. KLD will implement this (using full sybmol table
rather than the globals only) shortly.
case it's possible to compile in something like ECOFF)
The three db_xxx.c symbol interfaces are "standard" because config isn't
flexible enough without forcing the user to know about it.
Use them to `make gcc -Wformat' check formats for all printf-like
and scanf-like functions in /usr/src except for the err()/warn()
family. err() isn't quite printf-like since its format arg can
legitimately be NULL. syslog() isn't quite printf-like, but gcc
already accepts %m, even for plain printf() when it shouldn't.
(nonstandard %n and '+' with %x), and ones not found by -Wformat on
386's (some db_expr_t's are still printed as ints).
I decided not to change the arg type for %n from [unsigned] int to
register_t, since about half of the uses of %n are to print plain
ints and casting to [unsigned] long for %n is no harder than for %x.
work in progress and has never booted a real machine. Initial
development and testing was done using SimOS (see
http://simos.stanford.edu for details). On the SimOS simulator, this
port successfully reaches single-user mode and has been tested with
loads as high as one copy of /bin/ls :-).
Obtained from: partly from NetBSD/alpha
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
Clean up (or if antipodic: down) some of the msgbuf stuff.
Use an inline function rather than a macro for timecounter delta.
Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.
Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()
This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.
WARNING: Programs which muck about with struct proc in userland
will have to be fixed.
Reviewed, but found imperfect by: bde
1) Fix the initialization of malloc structure that changed
due to perf opt.
2) Remove unneeded include.
3) An initialization assert added to malloc.
Submitted by: John Hood <cgull@smoke.marlboro.vt.us>
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.
Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
`show vmopag', `show page' and `show pageq'. Moved all vm ddb stuff
to the ends of the vm source files.
Changed printf() to db_printf(), `indent' to db_indent, and iprintf()
to db_iprintf() in ddb commands. Moved db_indent and db_iprintf()
from vm to ddb.
vm_page.c:
Don't use __pure. Staticized.
db_output.c:
Reduced page width from 80 to 79 to inhibit double spacing for long
lines (there are still some problems if words are printed across
column 79).