be allocated as arrays indexed by the cpu id. Previously the only reliable
way to know the max cpu id was through MAXCPU. mp_ncpus isn't useful here
because cpu ids may be sparsely mapped, although x86 and alpha do not do this.
Also, call cpu_mp_probe much earlier so the max cpu id is known before the VM
starts up. This is intended to help support per cpu queues for the new
allocator, but may be useful elsewhere.
Reviewed by: jake
Approved by: jake
device drivers for bus system with other endinesses than the CPU (using
interfaces compatible to NetBSD):
- bwap16() and bswap32(). These have optimized implementations on some
architectures; for those that don't, there exist generic implementations.
- macros to convert from a certain byte order to host byte order and vice
versa, using a naming scheme like le16toh(), htole16().
These are implemented using the bswap functions.
- stream bus space access functions, which do not perform a byte order
conversion (while the normal access functions would if the bus endianess
differs from the CPU endianess).
htons(), htonl(), ntohs() and ntohl() are implemented using the new
functions above for kernel usage. None of the above interfaces is currently
exported to user land.
Make use of the new functions in a few places where local implementations
of the same functionality existed.
Reviewed by: mike, bde
Tested on alpha by: mike
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels. Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely. Userland programs still crashed.
enabled in critical sections and streamline critical_enter() and
critical_exit().
This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not effected by this change.
This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.
This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.
The following changes have been made:
* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section.
Other architectures are unaffected.
* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
in regards to hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather then hardcoded
in MI code.
The new code places the burden of entering the critical section
in the trampoline code where it belongs.
* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with the fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.
* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.
* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.
* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.
* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code in regards to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.
* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Being per-cpu variables
it is not currently necessary to lock; bus cycles modifying them.
Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.
* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.
Performance issues:
One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly making the effective savings of the new
critical section code's performance very significant.
The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather then defer.
The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API which currently has serious cross pollution
in MI code (in fork_exit() and ast() for example).
The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().
Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.
on for a while:
- fine grained TLB shootdown for SMP on i386
- ranged TLB shootdowns.. eg: specify a range of pages to shoot down with
a single IPI, since the IPI is very expensive. Adjust some callers
that used to trigger this inside tight loops to do a ranged shootdown
at the end instead.
- PG_G support for SMP on i386 (options ENABLE_PG_G)
- defer PG_G activation till after we decide what we are going to do with
PSE and the 4MB pages at the start of the kernel. This should solve
some rumored strangeness about stale PG_G entries getting stuck
underneath the 4MB pages.
- add some instrumentation for the fine TLB shootdown
- convert some asm instruction wrappers from functions to inlines. gcc
seems to do a fair bit better with this.
- [temporarily!] pessimize the tlb shootdown IPI handlers. I will fix
this again shortly.
This has been working fairly well for me for a while, but I have tweaked
it again prior to commit since my last major testing round. The only
outstanding problem that I know of is PG_G related, which is why there
is an option for it (not on by default for SMP). I have seen a world
speedups by a few percent (as much as 4 or 5% in one case) but I have
*not* accurately measured this - I am a bit sceptical of these numbers.
ucontext_t. Forward declare struct __ucontext in <sys/signal.h> and
remove reliance on <sys/ucontext.h> being included.
While I'm here, also hide osigcontext types from userland; suggested
by bde.
Namespace pollution noticed by: Kevin Day <toasty@shell.dragondata.com>
for SMP in the plain profiling case. It seems to work too.
This error was not detected by LINT because LINT only compiles the
GUPROF profiling case, which is is a superset of the plain profiling
case for !SMP but which is so broken for SMP that the buggy code is
not compiled.
they make it through to userland. This should fix the p5-smp problem
without affecting the other cpus (eg: cyrix, see initcpu.c and the special
cache handling for these cpu types).
whether the machine context is valid and whether the FPU state is
valid (saved).
Mark the machine context valid before copying it out when sending a
signal.
Approved by: -arch
and it's associated state variables: icu_lock with the name "icu". This
renames the imen_mtx for x86 SMP, but also uses the lock to protect
access to the 8259 PIC on x86 UP. This also adds an appropriate lock to
the various Alpha chipsets which fixes problems with Alpha SMP machines
dropping interrupts with an SMP kernel.
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().
Tested on: i386, alpha
- Axe inlvtlb_ok as it was completely redundant with smp_active.
- Remove references to non-existent variable and non-existent file
in i386/include/smp.h.
- Don't perform initializations local to each CPU while holding the
ap boot lock on i386 while an AP bootstraps itself.
- Reorganize the AP startup code some to unify the latter half of the
functions to bring an AP up. Eventually this might be broken out into
a MI function in subr_smp.c.
- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.
Tested on: alpha, i386
Reviewed by: peter, jake
alpha pmap. In particular -
- pd_entry_t and pt_entry_t are now u_int32_t instead of a pointer.
This is to enable cleaner PAE and x86-64 support down the track sor
that we can change the pd_entry_t/pt_entry_t types to 64 bit entities.
- Terminate "unsigned *ptep, pte" with extreme prejudice and use the
correct pt_entry_t/pd_entry_t types.
- Various other cosmetic changes to match cleanups elsewhere.
- This eliminates a boatload of casts.
- use VM_MAXUSER_ADDRESS in place of UPT_MIN_ADDRESS in a couple of places
where we're testing user address space limits. Assuming the page tables
start directly after the end of user space is not a safe assumption.
There is still more to go.
ill effects. This should fix problems threaded programs are having with
auto-detecting CPU type.
Reported by: Joe Clarke <marcus@marcuscom.com>
Tested by: Joe Clarke <marcus@marcuscom.com>
Reviewed by: jhb
MFC after: 1 week
o Make <stdint.h> a symbolic link to <sys/stdint.h>.
o Move most of <sys/inttypes.h> into <sys/stdint.h>, as per C99.
o Remove <sys/inttypes.h>.
o Adjust includes in sys/types.h and boot/efi/include/ia64/efibind.h
to reflect new location of integer types in <sys/stdint.h>.
o Remove previously symbolicly linked <inttypes.h>, instead create a
new file.
o Add MD headers <machine/_inttypes.h> from NetBSD.
o Include <sys/stdint.h> in <inttypes.h>, as required by C99; and
include <machine/_inttypes.h> in <inttypes.h>, to fill in the
remaining requirements for <inttypes.h>.
o Add additional integer types in <machine/ansi.h> and
<machine/limits.h> which are included via <sys/stdint.h>.
Partially obtain from: NetBSD
Tested on: alpha, i386
Discussed on: freebsd-standards@bostonradio.org
Reviewed by: bde, fenner, obrien, wollman
a temporary fix so that we can compile kernels. I waited 30 minutes
for a response from the person who would likely know, but any longer
is too long to wait with breakage at ToT.
by the profiler on a running system. This is not done sparsely, as
memory is cheaper than processor speed and each gprof mcount() and
mexitcount() operation is already very expensive.
Obtained from: NAI Labs CBOSS project
Funded by: DARPA
{set,fill}_{,fp,db}regs() fixup:
- Add dummy {set,fill}_dbregs() on architectures that don't have them.
- KSEfy the powerpc versions (struct proc -> struct thread).
- Some architectures had the prototypes in md_var.h, some in reg.h, and
some in both; for consistency, move them to reg.h on all platforms.
These functions aren't really MD (the implementation is MD, but the interface
is MI), so they should move to an MI header, but I haven't figured out which
one yet.
Run-tested on i386, build-tested on Alpha, untested on other platforms.
the size of the kernel virtual address space relatively painlessly.
Userland will adapt via the exported kernbase symbol. Increasing
this causes the user part of address space to reduce.
will be private to each CPU.
- Re-style(9) the globaldata structures. There really needs to be a MI
struct pcpu that has a MD struct mdpcpu member at some point.
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
with system statistics monitoring tools (such as systat, vmstat...)
because of stopping RTC interrupts generation.
Restore all the timers (RTC and i8254) atomically.
Reviewed by: bde
MFC after: 1 week
level implementation stuff out of machine/globaldata.h to avoid exposing
UPAGES to lots more places. The end result is that we can double
the kernel stack size with 'options UPAGES=4' etc.
This is mainly being done for the benefit of a MFC to RELENG_4 at some
point. -current doesn't really need this so much since each interrupt
runs on its own kstack.
and such was just a bad idea and one that users should be forced to
enable if they want it. This patch introduces a hw.pci.enable_pcibios
tunable for those people. This does not impact the pcibios interrupt
routing at all.
Approved by: peter, msmith
some bios vendors took it apon themselves to "censor" the
host->pci bridges from PCIBIOS callers, even when the caller
explicitly asks for them. This includes certain Compaq machines
(eg: DL360) and some laptops.
If we detect this, shut down pcibios and revert to using IO
port bashing.
Under -current, apcica does a better job anyway.
the process of exiting the kernel. The ast() function now loops as long
as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with
preemption disabled so that any further AST's that arrive via an
interrupt will be delayed until the low-level MD code returns to user
mode.
- Use u_int's to store the tick counts for profiling purposes so that we
do not need sched_lock just to read p_sticks. This also closes a
problem where the call to addupc_task() could screw up the arithmetic
due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
clear_resched(), and resched_wanted() in favor of direct bit operations
on p_sflag.
- Fix up locking with sched_lock some. In addupc_intr(), use sched_lock
to ensure pr_addr and pr_ticks are updated atomically with setting
PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing
PS_OWEUPC. We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.
Reviewed by: bde (mostly)
are a really nasty interface that should have been killed long ago
when 'ptrace(PT_[SG]ETREGS' etc came along. The entity that they
operate on (struct user) will not be around much longer since it
is part-per-process and part-per-thread in a post-KSE world.
gdb does not actually use this except for the obscure 'info udot'
command which does a hexdump of as much of the child's 'struct user'
as it can get. It carries its own #defines so it doesn't break
compiles.
dynamic symbol table buckets and chains. The sparc64 toolchain uses 32
bit .hash entries, unlike other 64 bits architectures (alpha), which use
64 bit entries.
Discussed with: dfr, jdp
were indices in a dense array. The cpuids are a sparse set and treat
them as such, setting up containers only for CPUs activated during
mb_init().
- Fix netstat(1) and systat(1) to treat the per-CPU stats area as a sparse
map, in accordance with the above.
This allows us to properly boot with certain CPUs disactivated. However, if
we later decide to re-activate said CPUs, we will barf until we decide to
implement CPU spinon/spinoff callback hooks to allow for said CPUs' per-CPU
containers to get configured on their activation.
Reported by: mjacob
Partially (sys/ diffs) Submitted by: mjacob
we are required to do if we let user processes use the extra 128 bit
registers etc.
This is the base part of the diff I got from:
http://www.issei.org/issei/FreeBSD/sse.html
I believe this is by: Mr. SUZUKI Issei <issei@issei.org>
SMP support apparently by: Takekazu KATO <kato@chino.it.okayama-u.ac.jp>
Test code by: NAKAMURA Kazushi <kaz@kobe1995.net>, see
http://kobe1995.net/~kaz/FreeBSD/SSE.en.html
I have fixed a couple of style(9) deviations. I have some followup
commits to fix a couple of non-style things.
simpler for npx exceptions that start as traps (no assembly required...)
and works better for npx exceptions that start as interrupts (there is
no longer a problem for nested interrupts).
Submitted by: original (pre-SMPng) version by luoqi
safe from preemption and concurrent access to the LDT.
- Move the prototype for i386_extend_pcb() to <machine/pcb_ext.h>.
Reviewed by: silence on -hackers
other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
been made machine independent and various other adjustments have been made
to support Alpha SMP.
- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.
Reviewed by: jake, peter
Looked over by: eivind
panic_cpu shared variable. I used a simple atomic operation here instead
of a spin lock as it seemed to be excessive overhead. Also, this can avoid
recursive panics if, for example, witness is broken.
because:
- it used a better namespace (smp_ipi_* rather than *_ipi),
- it used better constant names for the IPI's (IPI_* rather than
X*_OFFSET), and
- this API also somewhat exists for both alpha and ia64 already.
stylistic.
# Yes, this break K&R, but this file already used so many gcc extensions
# keeping K&R support seemed too anachronistic for me.
Didn't fix the bug where functions that can only be used in the kernel
are exported to userland.
that people use from userland in C++ programs. I've had this in my
tree for ages and just got bit by it not being in the real tree again.
This is a MFC candidate.
- Introduce lock classes and lock objects. Each lock class specifies a
name and set of flags (or properties) shared by all locks of a given
type. Currently there are three lock classes: spin mutexes, sleep
mutexes, and sx locks. A lock object specifies properties of an
additional lock along with a lock name and all of the extra stuff needed
to make witness work with a given lock. This abstract lock stuff is
defined in sys/lock.h. The lockmgr constants, types, and prototypes have
been moved to sys/lockmgr.h. For temporary backwards compatability,
sys/lock.h includes sys/lockmgr.h.
- Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin
locks held. By making this per-cpu, we do not have to jump through
magic hoops to deal with sched_lock changing ownership during context
switches.
- Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with
proc->p_sleeplocks, which is a list of held sleep locks including sleep
mutexes and sx locks.
- Add helper macros for logging lock events via the KTR_LOCK KTR logging
level so that the log messages are consistent.
- Add some new flags that can be passed to mtx_init():
- MTX_NOWITNESS - specifies that this lock should be ignored by witness.
This is used for the mutex that blocks a sx lock for example.
- MTX_QUIET - this is not new, but you can pass this to mtx_init() now
and no events will be logged for this lock, so that one doesn't have
to change all the individual mtx_lock/unlock() operations.
- All lock objects maintain an initialized flag. Use this flag to export
a mtx_initialized() macro that can be safely called from drivers. Also,
we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness
performs the corresponding checks using the initialized flag.
- The lock order reversal messages have been improved to output slightly
more accurate file and line numbers.
and change the u_int mtx_saveintr member of struct mtx to a critical_t
mtx_savecrit.
- On the alpha we no longer need a custom _get_spin_lock() macro to avoid
an extra PAL call, so remove it.
- Partially fix using mutexes with WITNESS in modules. Change all the
_mtx_{un,}lock_{spin,}_flags() macros to accept explicit file and line
parameters and rename them to use a prefix of two underscores. Inside
of kern_mutex.c, generate wrapper functions for
_mtx_{un,}lock_{spin,}_flags() (only using a prefix of one underscore)
that are called from modules. The macros mtx_{un,}lock_{spin,}_flags()
are mapped to the __mtx_* macros inside of the kernel to inline the
usual case of mutex operations and map to the internal _mtx_* functions
in the module case so that modules will use WITNESS and KTR logging if
the kernel is compiled with support for it.
sections.
- Add implementations of the critical_enter() and critical_exit() functions
and remove restore_intr() and save_intr().
- Remove the somewhat bogus disable_intr() and enable_intr() functions on
the alpha as the alpha actually uses a priority level and not simple bit
flag on the CPU.
For UP, we were using $tmp_stk as a stack from the data section. If the
kernel text section grew beyond ~3MB, the data section would be pushed
beyond the temporary 4MB P==V mapping. This would cause the trampoline
up to high memory to fault. The hack workaround I did was to use all of
the page table pages that we already have while preparing the initial
P==V mapping, instead of just the first one.
For SMP, the AP bootstrap process suffered the same sort of problem and
got the same treatment.
MFC candidate - this breaks on 4.x just the same..
Thanks to: Richard Todd <rmtodd@ichotolot.servalan.com>
if we hold a spin mutex, since we can trivially get into deadlocks if we
start switching out of processes that hold spinlocks. Checking to see if
interrupts were disabled was a sort of cheap way of doing this since most
of the time interrupts were only disabled when holding a spin lock. At
least on the i386. To fix this properly, use a per-process counter
p_spinlocks that counts the number of spin locks currently held, and
instead of checking to see if interrupts are disabled in the witness code,
check to see if we hold any spin locks. Since child processes always
start up with the sched lock magically held in fork_exit(), we initialize
p_spinlocks to 1 for child processes. Note that proc0 doesn't go through
fork_exit(), so it starts with no spin locks held.
Consulting from: cp
with egcs-1.1.1. bus_space_write_multi_2() had an extra operation that
should have been removed.
Remove it.
This fixes the panic when bus_space_write_multi_2() is used.
Obtained from: jake
and used in C or vice versa. The elf compiler uses the same names
for both. Remove asnames.h with great prejudice; it has served its
purpose.
Note that this does not affect the ability to generate an aout kernel
due to gcc's -mno-underscores option.
moral support from: peter, jhb