enabled in critical sections and streamline critical_enter() and
critical_exit().
This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not effected by this change.
This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.
This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.
The following changes have been made:
* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section.
Other architectures are unaffected.
* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
in regards to hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather then hardcoded
in MI code.
The new code places the burden of entering the critical section
in the trampoline code where it belongs.
* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with the fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.
* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.
* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.
* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.
* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code in regards to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.
* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Being per-cpu variables
it is not currently necessary to lock; bus cycles modifying them.
Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.
* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.
Performance issues:
One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly making the effective savings of the new
critical section code's performance very significant.
The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather then defer.
The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API which currently has serious cross pollution
in MI code (in fork_exit() and ast() for example).
The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().
Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.
on for a while:
- fine grained TLB shootdown for SMP on i386
- ranged TLB shootdowns.. eg: specify a range of pages to shoot down with
a single IPI, since the IPI is very expensive. Adjust some callers
that used to trigger this inside tight loops to do a ranged shootdown
at the end instead.
- PG_G support for SMP on i386 (options ENABLE_PG_G)
- defer PG_G activation till after we decide what we are going to do with
PSE and the 4MB pages at the start of the kernel. This should solve
some rumored strangeness about stale PG_G entries getting stuck
underneath the 4MB pages.
- add some instrumentation for the fine TLB shootdown
- convert some asm instruction wrappers from functions to inlines. gcc
seems to do a fair bit better with this.
- [temporarily!] pessimize the tlb shootdown IPI handlers. I will fix
this again shortly.
This has been working fairly well for me for a while, but I have tweaked
it again prior to commit since my last major testing round. The only
outstanding problem that I know of is PG_G related, which is why there
is an option for it (not on by default for SMP). I have seen a world
speedups by a few percent (as much as 4 or 5% in one case) but I have
*not* accurately measured this - I am a bit sceptical of these numbers.
While in userland, keep the thread's ucred reference in a shadow
field so that the usual place to store it is NULL.
If DIAGNOSTIC is not set, the thread ucred is kept valid until the next
kernel entry, at which time it is checked against the process cred
and possibly corrected. Produces a BIG speedup in
kernels with INVARIANTS set. (A previous commit corrected it
for the non INVARIANTS case already)
Reviewed by: dillon@freebsd.org
ucontext_t. Forward declare struct __ucontext in <sys/signal.h> and
remove reliance on <sys/ucontext.h> being included.
While I'm here, also hide osigcontext types from userland; suggested
by bde.
Namespace pollution noticed by: Kevin Day <toasty@shell.dragondata.com>
reaquiring it. In the same vein, don't bother dropping the thread cred
when goinf ot userland. We are guaranteed to nned it when we come back,
(which we are guaranteed to do).
Reviewed by: jhb@freebsd.org, bde@freebsd.org (slightly different version)
SMP we'd like as much feedback as possible from users about possible
locking problems as early as possible.
To negate most of the performance impact I've also enabled
WITNESS_SKIPSPIN. I've done this as we've been running WITNESS
over the spinlock code for a while without incident and it goes a
long way to making the performance problems of WITNESS much more
bearable.
Users who should be running current should know about turning WITNESS
off for performance reasons.
That said and done, WITNESS could/should be made into a tuneable,
but we'll leave that as an excersize to those that want to disable
it without a kernel recompile.
slower, and may be impeding adoption of -CURRENT by developers. We
recommend turning on WITNESS by default on crash boxes, and when doing
locking development. It will probably get turned on by default for a week
or two following any major locking commits, also.
Approved by: all and sundry (jhb, phk, ...)
feature bit on newer Athlon CPUs if the BIOS has forgotten to enable
it.
This patch was constructed using some info made available by John
Clemens at http://www.deater.net/john/PavilionN5430.html
Reviewed by: -audit
MFC after: 3 weeks
- Collected i486 identification codes in one place like
586 and 686.
- Merged two cases (0x470 and 0x490) for `Enhanced Am486DX4
Write-Back.'
- Replaced `unknown' into `Unknown'.
Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata)
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.
Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
from old signal handlers. This is simpler and faster, and fixes (new)
sigreturn(2) when %eip in the new signal context happens to match the
magic value (0x1d516). 0x1d516 is below the default ELF text section,
so this probably never broken anything in practice.
locore.s:
In addition, don't build the signal trampoline for old signal handlers
when it is not used.
alpha:
Not fixed, but seems to be even less broken in practice due to more
advanced magic. A false match occurs for register #32 in mc_regs[].
Since there is no hardware register #32, a false match is only possible
for direct calls to sigreturn(2) that happen to have the magic number
in the spare mc_regs[32] field.
some arches and the syscall table is machine-independent. It was
(bogusly) conditional on COMPAT_43, so this usually makes no difference.
ia64: in addition:
- replace the bogus cloned comment before osigreturn() by a correct one.
osigreturn() is just a stub fo ia64's.
- fix the formatting of cloned comment before sigreturn().
- fix the return code. use nosys() instead of returning ENOSYS to get
the same semantics as if the syscall is not in the syscall table.
Generating SIGSYS is actually correct here.
- fix style bugs.
powerpc: copy the cleaned up ia64 stub. This mainly fixes a bogus comment.
sparc64: copy the cleaned up the ia64 stub, since there was no stub before.
for SMP in the plain profiling case. It seems to work too.
This error was not detected by LINT because LINT only compiles the
GUPROF profiling case, which is is a superset of the plain profiling
case for !SMP but which is so broken for SMP that the buggy code is
not compiled.
the packet transfer routines, since rev.1.468 of machdep.c does this
better. I'm surprised that disabling interrupts helped much. Disabling
them in the packet receive routine is too late.
Fixed some minor style bugs in rev.1.14.
to fetch the magic word instead of useracc() plus a direct access.
This is more efficient as well as simpler and less incorrect:
- it was inefficent because useracc() takes much longer than just
accessing the data using a correct access method, at least on i386's.
- it was incorrect because direct access is incorrect unless the address
has been mapped. This and nearby direct accesses are mostly handled
better for other arches because they have to be (direct accesses don't
work).
- using magic in sigreturn is still fundamentally broken because false
matches are possible. On i386's, a false match occurs when %eip in a
new signal context happens to equal the magic value. This is not
handled better for other arches.
is not configured. Including <isa/isavar.h> when it is not used is
harmful as well as bogus, since it includes "isa_if.h" which is not
generated when isa is not configured.
This was fixed in 1999 but was broken by unconditionalizing PNPBIOS.
prior ICP Vortex models. This driver was developed by Achim Leubner
of Intel (previously with ICP Vortex) and Boji Kannanthanam of Intel.
Submitted by: "Kannanthanam, Boji T" <boji.t.kannanthanam@intel.com>
MFC after: 2 weeks
This typo keeps us from properly routing an interrupt for CardBus
bridges on this machine. So, now we look for $PIR and then _PIR to
cope. With these changes, the Libretto L1 now works properly.
Evidentally, the idea comes from patch that the Japanese version of
RedHat (or against a Japanese version of Red Hat), but my Japanese
isn't good enough to to know for sure.
Reported by: Hiroyuki Aizu-san <eyes@navi.org>
# This may be an MFC candidate, but I'm not yet sure.
cpuid with %eax=1 will return a logical cpu count in bits 16-23 of %ebx.
Bit 29 is actually 'TM' according to AP-485. This signifies the presence
of the thermal control circuit (which I believe can slow the clock down
to reduce core temperature).