we are required to do if we let user processes use the extra 128 bit
registers etc.
This is the base part of the diff I got from:
http://www.issei.org/issei/FreeBSD/sse.html
I believe this is by: Mr. SUZUKI Issei <issei@issei.org>
SMP support apparently by: Takekazu KATO <kato@chino.it.okayama-u.ac.jp>
Test code by: NAKAMURA Kazushi <kaz@kobe1995.net>, see
http://kobe1995.net/~kaz/FreeBSD/SSE.en.html
I have fixed a couple of style(9) deviations. I have some followup
commits to fix a couple of non-style things.
'dwatch'. The new commands install hardware watchpoints if supported
by the architecture and if there are enough registers to cover the
desired memory area.
No objection by: audit@, hackers@
MFC after: 2 weeks
Also removed some spl's and added some VM mutexes, but they are not actually
used yet, so this commit does not really make any operational changes
to the system.
vm_page.c relates to vm_page_t manipulation, including high level deactivation,
activation, etc... vm_pageq.c relates to finding free pages and aquiring
exclusive access to a page queue (exclusivity part not yet implemented).
And the world still builds... :-)
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
lock until after grabbing the sched_lock to avoid CURSIG racing with
psignal.
- Don't grab Giant for addupc_task() as it isn't needed.
Reported by: tegge (signal race), bde (addupc_task a while back)
startup routine more closely matches that of alpha and ia64. At some
point the common mutexes shared across all platforms probably should move
into sys/kern_mutex.c.
trace code that was brought over from NetBSD.)
- Check for "syscall_with_err_pushed" as the label prior to a syscall trap
frame rather than "Xlcall_syscall" and "Xint0x80_syscall". We don't
have a valid trapframe during the short range of code that those two
symbols now cover.
- Simplify db_next_frame() to avoid duplicating the code for the different
trap frame types.
- Don't try to trace a swapped-out process. (Brought over from NetBSD via
the new alpha trace code.)
- Replace some very poorly thought out API hacks that should have been
fixed a long while ago.
- Provide some much more flexible search functions (resource_find_*())
- Use strings for storage instead of an outgrowth of the rather
inconvenient temporary ioconf table from config(). We already had a
fallback to using strings before malloc/vm was running anyway.
- move the sysctl code to kern_intr.c
- do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine
the length of the intrcnt array
- move the declarations of intrnames, eintrnames, intrcnt and eintrcnt
from machine-dependent include files to sys/interrupt.h
- remove the hw.nintr sysctl, it is not needed.
- fix various style bugs
Requested by: bde
Reviewed by: bde (some time ago)
from cpu_switch(), curproc has been changed, but the sched_lock owner will
not be updated until we return to mi_switch(), thus we deadlock against
ourselves. As a workaround, push the acquire and release of sched_lock out
to the callers of set_user_ldt(). Note that we can't use a mtx_assert() in
set_user_ldt for the same reason.
Sleuting by: tmm
Tested by: tmm, dougb
simpler for npx exceptions that start as traps (no assembly required...)
and works better for npx exceptions that start as interrupts (there is
no longer a problem for nested interrupts).
Submitted by: original (pre-SMPng) version by luoqi
npxsave() went to great lengths to excecute fnsave with interrupts
enabled in case executing it froze the CPU. This case can't happen,
at least for Intel CPU/NPX's. Spurious IRQ13's don't imply spurious
freezes. Anyway, the complications were usually no-ops because IRQ13
is not used on i486's and newer CPUs, and because SMPng broke them in
rev.1.84. Forcible enabling of interrupts was changed to
write_eflags(old_eflags), but since SMPng usually calls npxsave() from
cpu_switch() with interrupts disabled, write_eflags() usually just
kept interrupts disabled.
npxinit() didn't have the usual race because it doesn't save to curpcb,
but it may have had a worse form of it since it uses the npx when it
doesn't "own" it. I'm not sure if locking prevented this. npxinit()
is normally caled with the proc lock but not sched_lock.
Use a critical region to protect pushing of curproc's npx state to
curpcb in npxexit(). Not doing so was harmless since it at worst
saved a wrong state to a dieing pcb.
Not doing this was fairly harmless because savectx() is only called
for panic dumps and the bug could at worse reset the state.
savectx() is still missing saving of (volatile) debug registers, and
still isn't called for core dumps.
vm_mtx does not recurse and is required for most low level
vm operations.
faults can not be taken without holding Giant.
Memory subsystems can now call the base page allocators safely.
Almost all atomic ops were removed as they are covered under the
vm mutex.
Alpha and ia64 now need to catch up to i386's trap handlers.
FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).
Reviewed (partially) by: jake, jhb
- Attach a writable sysctl to bootverbose (debug.bootverbose) so it can be
toggled after boot.
- Move the printf of the version string to a SI_SUB_COPYRIGHT SYSINIT just
afer the display of the copyright message instead of doing it by hand in
three MD places.
If for some reason DEVFS is undesired, the "NODEVFS" option is
needed now.
Pending any significant issues, DEVFS will be made mandatory in
-current on july 1st so that we can start reaping the full
benefits of having it.
pcb for fork(). It was possible for the state to be saved twice when an
interrupt handler saved it concurrently. This corrupted (reset) the state
because fnsave has the (in)convenient side effect of doing an implicit
fninit. Mundane null pointer bugs were not possible, because we save to
an "arbitrary" process's pcb and not to the "right" place (npxproc).
Push the parent's %gs to the pcb for fork(). Changes to %gs before
fork() were not preserved in the child unless an accidental context
switch did the pushing. Updated the list of pcb contents which is
supposed to inhibit bugs like this. pcb_dr*, pcb_gs and pcb_ext were
missing. Copying is correct for pcb_dr*, and pcb_ext is already
handled specially (although XXX'ly).
Reducing the savectx() call to an npxsave() call in rev.1.80 was a
mistake. The above bugs are duplicated in many places, including in
savectx() itself.
The arbitraryness of the parent process pointer for the fork()
subroutines, the pcb pointer for savectx(), and the save87 pointer
for npxsave(), is illusory. These functions don't work "right" unless
the pointers are precisely curproc, curpcb, and the address of npxproc's
save87 area, respectively, although the special context in which they
are called allows savectx(&dumppcb) to sort of work and npxsave(&dummy)
to work. cpu_fork() just doesn't work unless the parent process
pointer is curproc, or the caller has pushed %gs to the pcb, or %gs
happens to already be in the pcb.
follow Linux' convention and use %gs. This adds back the setting of
%fs to a sane value in sendsig(). The value of %gs remains preserved
to whatever it was in user context.
safe from preemption and concurrent access to the LDT.
- Move the prototype for i386_extend_pcb() to <machine/pcb_ext.h>.
Reviewed by: silence on -hackers
%fs and %gs registers instead of setting them to known sane values.
%fs is going to be used for thread/KSE specific data by the new
threads library; we'll want it to be valid inside of signal handlers.
According to bde, Linux preserves the state of %fs and %gs when setting
up signal handlers, so there is precedent for doing this.
The same changes should be made in the Linux emulator, but when made,
they seem to break (at least one version of) the IBM JDK for Linux
(reported by drew).
Approved by: bde
handling, SMPng always switches the npx context away from curproc
before calling the handler, so the handler always paniced. When using
exception 16 exception handling, SMPng sometimes switches the npx
context away from curproc before calling the handler, so the handler
sometimes paniced. Also, we didn't lock the context while using it,
so we sometimes didn't detect the switch and then paniced in a less
controlled way.
Just lock the context while using it, and return without doing anything
except clearing the busy latch if the context is not for curproc. This
fixes the exception 16 case and makes the IRQ13 case harmless. In both
cases, the instruction that caused the exception is restarted and the
exception repeats. In the exception 16 case, we soon get an exception
that can be handled without doing anything special. In the IRQ13 case,
we get an easy to kill hung process.
other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
been made machine independent and various other adjustments have been made
to support Alpha SMP.
- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.
Reviewed by: jake, peter
Looked over by: eivind
panic_cpu shared variable. I used a simple atomic operation here instead
of a spin lock as it seemed to be excessive overhead. Also, this can avoid
recursive panics if, for example, witness is broken.
API for IPI's that isn't tied to the Intel APIC. MD code can still use
the apic_ipi() function or dink with the apic directly if needed to send
MD IPI's.
because:
- it used a better namespace (smp_ipi_* rather than *_ipi),
- it used better constant names for the IPI's (IPI_* rather than
X*_OFFSET), and
- this API also somewhat exists for both alpha and ia64 already.
Since pid's are not in the kernel address space, this doesn't conflict
with the funcionality of specifying an arbitrary frame pointer to the
trace command.
- If the first function of a backtrace maps to fork_trampoline, then this
is a newly fork'd process that has not been executed yet, so just print
out the first frame and then return for that case.
- Lower the default count from 65535 to 1024. ddb doesn't trace into
userland, and if the stack gets hosed and starts looping it's less
annoying.
Specifically, the cpuid, curproc, curpcb, npxproc, and idleproc members.
Also, if witness is compiled into the kernel, then a list of all the spin
locks held by this CPU is displayed. By default the information for the
current CPU is displayed, but a decimal cpu id may be specified as a
parameter to obtain information on a specific CPU.
stylistic.
# Yes, this break K&R, but this file already used so many gcc extensions
# keeping K&R support seemed too anachronistic for me.
Didn't fix the bug where functions that can only be used in the kernel
are exported to userland.
that people use from userland in C++ programs. I've had this in my
tree for ages and just got bit by it not being in the real tree again.
This is a MFC candidate.
- Introduce lock classes and lock objects. Each lock class specifies a
name and set of flags (or properties) shared by all locks of a given
type. Currently there are three lock classes: spin mutexes, sleep
mutexes, and sx locks. A lock object specifies properties of an
additional lock along with a lock name and all of the extra stuff needed
to make witness work with a given lock. This abstract lock stuff is
defined in sys/lock.h. The lockmgr constants, types, and prototypes have
been moved to sys/lockmgr.h. For temporary backwards compatability,
sys/lock.h includes sys/lockmgr.h.
- Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin
locks held. By making this per-cpu, we do not have to jump through
magic hoops to deal with sched_lock changing ownership during context
switches.
- Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with
proc->p_sleeplocks, which is a list of held sleep locks including sleep
mutexes and sx locks.
- Add helper macros for logging lock events via the KTR_LOCK KTR logging
level so that the log messages are consistent.
- Add some new flags that can be passed to mtx_init():
- MTX_NOWITNESS - specifies that this lock should be ignored by witness.
This is used for the mutex that blocks a sx lock for example.
- MTX_QUIET - this is not new, but you can pass this to mtx_init() now
and no events will be logged for this lock, so that one doesn't have
to change all the individual mtx_lock/unlock() operations.
- All lock objects maintain an initialized flag. Use this flag to export
a mtx_initialized() macro that can be safely called from drivers. Also,
we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness
performs the corresponding checks using the initialized flag.
- The lock order reversal messages have been improved to output slightly
more accurate file and line numbers.
and change the u_int mtx_saveintr member of struct mtx to a critical_t
mtx_savecrit.
- On the alpha we no longer need a custom _get_spin_lock() macro to avoid
an extra PAL call, so remove it.
- Partially fix using mutexes with WITNESS in modules. Change all the
_mtx_{un,}lock_{spin,}_flags() macros to accept explicit file and line
parameters and rename them to use a prefix of two underscores. Inside
of kern_mutex.c, generate wrapper functions for
_mtx_{un,}lock_{spin,}_flags() (only using a prefix of one underscore)
that are called from modules. The macros mtx_{un,}lock_{spin,}_flags()
are mapped to the __mtx_* macros inside of the kernel to inline the
usual case of mutex operations and map to the internal _mtx_* functions
in the module case so that modules will use WITNESS and KTR logging if
the kernel is compiled with support for it.