general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).
Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.
This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.
Reviewed by: core
Approved by: core
Problem:
selwakeup required calling pfind which would cause lock order
reversals with the allproc_lock and the per-process filedesc lock.
Solution:
Instead of recording the pid of the select()'ing process into the
selinfo structure, actually record a pointer to the thread. To
avoid dereferencing a bad address all the selinfo structures that
are in use by a thread are kept in a list hung off the thread
(protected by sellock). When a selwakeup occurs the selinfo is
removed from that threads list, it is also removed on the way out
of select or poll where the thread will traverse its list removing
all the selinfos from its own list.
Problem:
Previously the PROC_LOCK was used to provide the mutual exclusion
needed to ensure proper locking, this couldn't work because there
was a single condvar used for select and poll and condvars can
only be used with a single mutex.
Solution:
Introduce a global mutex 'sellock' which is used to provide mutual
exclusion when recording events to wait on as well as performing
notification when an event occurs.
Interesting note:
schedlock is required to manipulate the per-thread TDF_SELECT
flag, however if given its own field it would not need schedlock,
also because TDF_SELECT is only manipulated under sellock one
doesn't actually use schedlock for syncronization, only to protect
against corruption.
Proc locks are no longer used in select/poll.
Portions contributed by: davidc
enabled in critical sections and streamline critical_enter() and
critical_exit().
This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not effected by this change.
This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.
This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.
The following changes have been made:
* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section.
Other architectures are unaffected.
* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
in regards to hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather then hardcoded
in MI code.
The new code places the burden of entering the critical section
in the trampoline code where it belongs.
* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with the fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.
* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.
* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.
* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.
* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code in regards to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.
* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Being per-cpu variables
it is not currently necessary to lock; bus cycles modifying them.
Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.
* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.
Performance issues:
One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly making the effective savings of the new
critical section code's performance very significant.
The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather then defer.
The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API which currently has serious cross pollution
in MI code (in fork_exit() and ast() for example).
The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().
Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.
is not configured. Including <isa/isavar.h> when it is not used is
harmful as well as bogus, since it includes "isa_if.h" which is not
generated when isa is not configured.
This was fixed in 1999 but was broken by unconditionalizing PNPBIOS.
and it's associated state variables: icu_lock with the name "icu". This
renames the imen_mtx for x86 SMP, but also uses the lock to protect
access to the 8259 PIC on x86 UP. This also adds an appropriate lock to
the various Alpha chipsets which fixes problems with Alpha SMP machines
dropping interrupts with an SMP kernel.
otherwise breaks on the Alpha arch. I think this is wrong since i'd
actually like to probe for a PC architecture, not for a particular CPU
type. Anyway, now it's again the way it used to be.
. The main device node now supports automatic density selection for
commonly used media densities. So you can stuff your 1.44 MB and
720 KB media into your drive and just access /dev/fd0, no questions
asked. It's all that easy, isn't it? :)
. Device density handling has been completely overhauled. The old way
of hardwired kernel density knowledge is no longer there. Instead,
the kernel now implements 16 subdevices per drive. The first
subdevice uses automatic density selection, while the remaining 15
devices are freely programmable. They can be assigned an arbitrary
name of the form /dev/fd[:digit]+.[:digit:]{1,4}, where the second
number is meant to either implement device names that are mnemonic
for their raw capacity (as it used to be), or they can alternatively
be created as "anonymous" devices like fd0.1 through fd0.15,
depending on the taste of the administrator. After creating a
subdevice, it is initialized to the maximal native density of the
respective drive type, so it needs to be customized for other
densities by using fdcontrol(8). Pseudo-partition devices (fd0a
through fd0h) are still supported as symlinks.
. The old hack to use flags 0x1 to always assume drive 0 were there is
no longer supported; this is now supposed to be done by wiring the
devices down from the loader via device flags. On IA32
architectures, the first two drives are looked up in the CMOS
configuration records though. On PCMCIA (i. e., the Y-E Data
controller of the Toshiba Libretto), a single drive is always
assumed.
. Other specialities like disabling the FIFO and not probing the drive
at boot-time are selected by per-controller or per-drive flags, too.
. Unit attentions (media has been changed) are supposed to be detected
now; density autoselection only occurs after a unit attention. (Can
be turned off by a per-drive flag, this will cause each Fdopen() to
perform the autoselection.)
. FM floppies can be handled now (on controllers that actually support
it -- not all do these days).
. Fdopen() can be told to avoid density selection by setting
O_NONBLOCK; this leaves the descriptor in a half-opened state where
only a few ioctls are accepted. This is necessary to run fdformat
on a device that uses automatic density selection (since you cannot
autoselect on an unformatted medium, obviously).
. Just differentiate between a plain old NE765 and the enhanced chips,
but don't try more; the existing code was wrong and only misdetected
the chips anyway.
BUGS and TODOs:
. All documentation update still needs to be done.
. Formatting not-so-standard format yields unpredictable results; i
have yet to figure out why this happens. "Standard" formats like
720 and 1440 KB do work, however.
. rc scripts are needed to setup device nodes with nonstandard
densities (like the old /dev/fdN.MMM we used to have).
. Obtaining device flags from the kernel environment doesn't work yet,
thus currently only drives that are present in (IA32) CMOS are
really detected. Someone who knows the odds and ends about device
flags is needed here, i can't figure out what i'm doing wrong.
. 2.88 MB still needs to be done.
- Now that apm loadable module can inform its existence to other kernel
components (e.g. i386/isa/clock.c:startrtclock()'s TCS hack).
- Exchange priority of SI_SUB_CPU and SI_SUB_KLD for above purpose.
- Add simple arbitration mechanism for APM vs. ACPI. This prevents
the kernel enables both of them.
- Remove obsolete `#ifdef DEV_APM' related code.
- Add abstracted interface for Powermanagement operations. Public apm(4)
functions, such as apm_suspend(), should be replaced new interfaces.
Currently only power_pm_suspend (successor of apm_suspend) is implemented.
Reviewed by: peter, arch@ and audit@
sio_pccard_detach to use new siodetach. Add an extra arg to sioprobe
to tell driver to probe/not probe the device for IRQs.
This incorporates most of Bruce's review material. I'm at a good
checkpoint, but there will be more to come based on bde's further
reviews.
Reviewed by: bde
Move sio from isa/sio.c to dev/sio/sio.c. The next step is to break
out the front end attachments, improve support for these parts on
different busses, and maybe, if we're lucky, merging in pc98 support.
It will also be MI and live in conf/files rather than files.*.
Approved by: bde
Tested with: i386, pc98
- Count the number of this error.
- When the error is detected for the first time, the psm driver will
throw few data bytes (up to entire packet size) and see if it can
get back to sync.
- If the error still persists, the psm driver disable/enable the mouse
and see if it works.
- If the error still persists and the count goes up to 20,
the psm driver reset and reinitialize the mouse. The counter
is reset to zero.
- It also discards an incomplete data packet when the interval
between two consequtive bytes are longer than pre-defined timeout
(2 seconds). The last byte which arrived late will be regarded as
the first byte of a new packet. This is louie's idea.
You may see the following error logs during the above operations:
"psmintr: delay too long; resetting byte count"
"psmintr: out of sync (%04x != %04x)"
"psmintr: discard a byte (%d)"
"psmintr: re-enable the mouse"
"psmintr: reset the mouse"
MFC after: 1 month
sio_lock has been initialized. This prevents the low level console
output (kernel printf) from clobbering the sio settings if the system
happens to be in the middle of comstart().
problems currently experienced in -CURRENT.
This should fix the problem that the PS/2 mouse is detected
twice if the acpi module is not loaded on some systems.
I am not sure if this is absolutely necessary on all systems. Yet,
there certainly are motherboards and notebook systems which require
this, although there are other systems which just don't. I hope we
shall know when to do this on which systems, as the development of our
ACPI subsystem progresses... (I know we didn't need this for the APM
resume.)
mandatory "card" identifier string. A logical devices on the ISA PnP
card may optionally have a "device" identifier string. Do not confuse
them.
The "card" identifier string is assigned to a logical device as the
default description string when the device is found. (If the "card"
identifier string has not been found, use the EISA PnP ID string.
Strictly speaking, this is an error.) We will override it when a
"device" identifier string is found later.
- Add workaround for the problematic PnP BIOS which does not assign
irq resource for the PS/2 mouse device node; if there is no irq
assigned for the PS/2 mouse node, refer to device.hints for an
irq number. If we still don't find an irq number in the hints
database, use a hard-coded value.
- Delete unused ivars.
- Bit of clean up in probe/attach.
- Add PnP ID for the PS/2 mouse port on some IBM ThinkPad models.
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
more cleanly and consistently in all APCI, PnP BIOS, and "hint"
cases.
NOTE: this doesn't necessarily solve the problem that the PS/2
mouse is not detected after the recent ACPI update.
the following bugs.
- When constructing a resource configuration, respect the order
in which resource descriptors are read, in order to establish
the correct mapping between the descriptors and configuration
registers.
"Plug and Play ISA Specification, Version 1.0a", Sec 4.6.1, May 5,
1994. "Clarifications to the Plug and Play ISA Specification,
Version 1.0a", Sec 6.2.1, Dec. 10, 1994.
- Do not ignore null (empty) descriptors; they are valid descriptors
acting as filler.
"Clarifications to the Plug and Play ISA Specification, Version 1.0a",
Sec 6.2.1.
- Correctly set up logical device configuration registers for null
resources.
"Clarifications to the Plug and Play ISA Specification, Version 1.0a"
- Handle null resources properly in the resource allocator for the
ISA bus.
with system statistics monitoring tools (such as systat, vmstat...)
because of stopping RTC interrupts generation.
Restore all the timers (RTC and i8254) atomically.
Reviewed by: bde
MFC after: 1 week