source count pointers at them so that intr_execute_handlers() won't
choke when it tries to handle an unregisterd ATPIC interrupt source.
- Install the low-level ATPIC interrupt handlers when we first program the
ATPIC in atpic_startup() rather than at SI_SUB_INTR. This is only
necessary to work around buggy code that enables interrupts too early
in the boot process (namely, the vm86 code).
Approved by: re (rwatson)
more than one sf_buf for one vm_page. To accomplish this, we add
a global hash table mapping vm_pages to sf_bufs and a reference
count to each sf_buf. (This is similar to the patches for RELENG_4
at http://www.cs.princeton.edu/~yruan/debox/.)
For the uninitiated, an sf_buf is nothing more than a kernel virtual
address that is used for temporary virtual-to-physical mappings by
sendfile(2) and zero-copy sockets. As such, there is no reason for
one vm_page to have several sf_bufs mapping it. In fact, using more
than one sf_buf for a single vm_page increases the likelihood that
sendfile(2) blocks, hurting throughput.
(See http://www.cs.princeton.edu/~yruan/debox/.)
is the warning that points to the bug in `(char *)malloc(...)' where
malloc() is implicitly declared as returning int. We do similar things
here, but they work because u_int is the same as uintptr_t on i386's.)
- improve sysinfo(2) syscall;
- add dummy fadvise64(2) syscall;
- add dummy *xattr(2) family of syscalls;
- add protos for the syscalls 222-225, 238-249 and 253-267;
- add exit_group(2) syscall, which is currently just wired to exit(2).
Obtained from: OpenBSD
MFC after: 2 weeks
Its restoration in rev.1.102 was mistranslated to the equivalent of
setsofttty() in rev.1.105. This increased overheads by causing a
context switch to the SWI handler after almost every interrupt. The
increase was approx. 50% on a Celeron 366 (from 23 usec to 34 usec
per interrupt).
physical mapping.
- Move the sf_buf API to its own header file; make struct sf_buf's
definition machine dependent. In this commit, we remove an
unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
Ultimately, we may eliminate struct sf_buf on those architecures
except as an opaque pointer that references a vm page.
longer uses these interrupt vectors for its ISA interrupt pins, so these
entries will not be overwritten. If we get a spurious interrupt from the
ATPIC when using the APIC, it will be treated as a stray interrupt instead
of causing a panic.
- Move the IPI and local APIC interrupt vectors up into the 0xf0 - 0xff
range. The pmap lazyfix IPI was reordered down next to the TLB
shootdowns to avoid conflicting with the spurious interrupt vector.
- Move the base of APIC interrupts up 16 so that the first 16 APIC
interrupts do not overlap the vectors used by the ATPIC.
- Remove bogus interrupt vector reservations for LINT[01].
- Now that 0xc0 - 0xef are available, use them for device interrupts.
This increases the number of APIC device interrupts to 191.
- Increase the system-wide number of global interrupts to 191 to catch up
to more APIC interrupts.
Requested by: peter (2)
vector stubs and into the C functions they call.
- Move disabling and EOIing of interrupt sources out of PIC driver entry
points and into intr_execute_handlers(). Intr_execute_handlers() only
disables a source for an interrupt if it is a stray interrupt or has
threaded handlers. Sources with fast handlers no longer disable (mask)
the source while executing the handlers.
- Move the setting of clkintr_pending into intr_execute_handlers() and set
the variable for any interrupt source with a vector of 0. (Should only
be true for IRQ 0.) This fixes clkintr_pending in the NO_MIXED_MODE
case.
- Implement lapic_eoi() and use it to implement ioapic_eoi_source().
- Rename atpic_sched_ithd() to atpic_handle_intr() since it is used to
handle all atpic interrupts and not just threaded ones.
Inspired by: peter's changes to amd64 in p4 (1)
Requested by: bde (2)
interrupt such as IRQ 22 or 19. However, the ACPI BIOS still routes
interrupts from some PCI devices to the same intpin calling the pin
IRQ 22. Thus, ACPI expects to address a single interrupt source via two
different names. To work around this, if the SCI is remapped to a non-ISA
interrupt (i.e., greater than 15), then we use
acpi_OverrideInterruptLevel() function to tell ACPI to use IRQ 22 or 19
rather than IRQ 9 for the SCI.
Previously we would change IRQ 22 or 19's name to IRQ 9 when we encountered
such an Interrupt Source Override entry in the MADT which routed the SCI
properly but left PCI devices mapped to IRQ 22 or 19 w/o a routable
interrupt.
Tested by: sos
should now only have HTT CPUs if they have explicitly asked for them
either by enabling HyperThreading in the BIOS or by using the
MPTABLE_FORCE_HTT kernel option.
should only be used if they are enabled in the BIOS. Now that we support
enumerating CPUs using the ACPI MADT, any HTT machine using ACPI should
respect the BIOS setting. For HTT machines with ACPI disabled in the
kernel, the MPTABLE_FORCE_HTT kernel option can be used to try to probe HTT
CPUs like have done in the past for the MP Table case. This option should
only be enabled if HTT is enabled in the BIOS.
Since all callers either passed 0 or 1 for clear_ret, define bit 0 in
the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI
code for possible (but unlikely) future use. The remaining bits are for
use by MD code.
This change is triggered by a need on ia64 to have another knob for
get_mcontext().
thread being waken up. The thread waken up can run at a priority as
high as after tsleep().
- Replace selwakeup()s with selwakeuppri()s and pass appropriate
priorities.
- Add cv_broadcastpri() which raises the priority of the broadcast
threads. Used by selwakeuppri() if collision occurs.
Not objected in: -arch, -current
pmap_pte() and pmap_pte_quick(). The distinction being based upon the
locks that are held by the caller. When the given pmap is not the
current pmap, pmap_pte() should be used when Giant is held and
pmap_pte_quick() should be used when the vm page queues lock is held.
- When assigning to PMAP1 or PMAP2, include PG_A anf PG_M.
- Reenable the inlining of pmap_is_current().
In collaboration with: tegge
pin that is used by the default identity mapping if it still maps to the
old vector. The ACPI case might need some tweaking for the SCI interrupt
case since ACPI likes to address the intpin using both the IRQ remapped to
it as well as the previous existing PCI IRQ mapped to it.
Reported by: kan
rather than signed. This fixes some cosmetics such as verbose printf's
for IRQs greater than 127.
- The calculation for next_ioapic_base was also adjusted so that it will
only complain once for each hole in the IRQs provided by ACPI for IO
APICs.
Reported by: Michal Mertl <mime@traveller.cz>
isa_device pointer as its argument and uses that to call the driver's
interrupt handler passing the unit number as its argument. This should
fix COMPAT_OLDISA devices with a unit number of 0.
Reviewed by: peter
Reported by: bde
- Compile 'device acpi' into GENERIC by default as well. Note that
the beastie loader menu item to disable ACPI still works if ACPI is
compiled into the kernel.
we would manage this better by having the interrupt code add each
interrupt vector to the resource map when each source is registered.
- Use the new interrupt code API for registering and tearing down interrupt
handlers.
- The MP code no longer knows anything specific about an MP Table.
Instead, the local APIC code adds CPUs via the cpu_add() function when
a local APIC is enumerated by an APIC enumerator.
- Don't divide the argument to mp_bootaddress() by 1024 just so that we
can turn around and mulitply it by 1024 again.
- We no longer panic if SMP is enabled but we are booted on a UP machine.
- init_secondary(), the asm code between init_secondary() and ap_init()
in mpboot.s and ap_init() have all been merged together in C into
init_secondary().
- We now use the cpuid feature bits to determine if we should enable
PSE, PGE, or VME on each AP.
- Due to the change in the implementation of critical sections, acquire
the SMP TLB mutex around a slightly larger chunk of code for TLB
shootdowns.
- Remove some of the debug code from the original SMP implementation
that is no longer used or no longer applies to the new APIC code.
- Use a temporary hack to disable the ACPI module until the SMP code has
been further reorganized to allow ACPI to work as a module again.
- Add a DDB command to dump the interesting contents of the IDT.
devices claiming resources that they don't actually use. The PIC drivers
only register valid interrupt sources, so we don't need to rely on these
drivers to claim invalid IRQs to prevent their use by other drivers.
slave pin on the master PIC in the !APIC_IO case. The PIC drivers now
manage these details internally.
- Remove an spl0() that hasn't done anything since SMPng was first
committed.
- Update some comments that have rotted since SMPng.
- Use intr_suspend/resume() callouts to the interrupt code layer which
suspends and resumes all the known interrupt sources instead of calling
icu_reinit() directly.
APIC Descriptor Table to enumerate both I/O APICs and local APICs. ACPI
does not embed PCI interrupt routing information in the MADT like the MP
Table does. Instead, ACPI stores the PCI interrupt routing information
in the _PRT object under each PCI bus device. The MADT table simply
provides hints about which interrupt vectors map to which I/O APICs. Thus
when using ACPI, the existing ACPI PCI bridge drivers are sufficient to
route PCI interrupts.
- The apic interrupt entry points have been rewritten so that each entry
point can serve 32 different vectors. When the entry is executed, it
uses one of the 32-bit ISR registers to determine which vector in its
assigned range was triggered. Thus, the apic code can support 159
different interrupt vectors with only 5 entry points.
- We now always to disable the local APIC to work around an errata in
certain PPros and then re-enable it again if we decide to use the APICs
to route interrupts.
- We no longer map IO APICs or local APICs using special page table
entries. Instead, we just use pmap_mapdev(). We also no longer
export the virtual address of the local APIC as a global symbol to
the rest of the system, but only in local_apic.c. To aid this, the
APIC ID of each CPU is exported as a per-CPU variable.
- Interrupt sources are provided for each intpin on each IO APIC.
Currently, each source is given a unique interrupt vector meaning that
PCI interrupts are not shared on most machines with an I/O APIC.
That mapping for interrupt sources to interrupt vectors is up to the
APIC enumerator driver however.
- We no longer probe to see if we need to use mixed mode to route IRQ 0,
instead we always use mixed mode to route IRQ 0 for now. This can be
disabled via the 'NO_MIXED_MODE' kernel option.
- The npx(4) driver now always probes to see if a built-in FPU is present
since this test can now be performed with the new APIC code. However,
an SMP kernel will panic if there is more than one CPU and a built-in
FPU is not found.
- PCI interrupts are now properly routed when using APICs to route
interrupts, so remove the hack to psuedo-route interrupts when the
intpin register was read.
- The apic.h header was moved to apicreg.h and a new apicvar.h header
that declares the APIs used by the new APIC code was added.
default we provide 16 interrupt sources for IRQs 0 through 15. However,
if the I/O APIC driver has already registered sources for any of those IRQs
then we will silently fail to register our own source for that IRQ.
Note that i386/isa/icu.h is now specific to the 8259A and no longer
contains any info relevant to APICs. Also note that fast interrupts no
longer use a separate entry point. Instead, both fast and threaded
interrupts share the same entry point which merely looks up the appropriate
source and passes control to intr_execute_handlers().
that provides methods via a PIC driver to do things like mask a source,
unmask a source, enable it when the first interrupt handler is added, etc.
The interrupt code provides a table of interrupt sources indexed by IRQ
numbers, or vectors. These vectors are what new-bus uses for its IRQ
resources and for bus_setup_intr()/bus_teardown_intr(). The interrupt
code then maps that vector a given interrupt source object. When an
interrupt comes in, the low-level interrupt code looks up the interrupt
source for the source that triggered the interrupt and hands it off to
this code to execute the appropriate handlers.
By having an interrupt source abstraction, this allows us to have different
types of interrupt source providers within the shared IRQ address space.
For example, IRQ 0 may map to pin 0 of the master 8259A PIC, IRQs 1
through 60 may map to pins on various I/O APICs, and IRQs 120 through
128 may map to MSI interrupts for various PCI devices.
Requested by: jhb
Initialize the real mode stack. This is needed at least for the return
address from the lcall.
Requested by: takawata
Fix style bugs in acpi_wakecode.S
Requested by: bde
Remove the kernel option now that we have the tunable.
to use the direct mapped KVA at KERNBASE to service the request. This also
allows pmap_mapdev() to be used for such addresses very early during the
boot process and might provide some small savings on KVA.
Reviewed by: peter
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.
Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
routines. Otherwise we run into trouble with speculative tlb preloads
on SMP systems. This effectively defeats Jeff's revision 1.438
optimization (for his pentium4-M laptop) in the SMP case. It breaks
other systems, particularly athlon-MP's.
the ACPI timer and we shouldn't do that if ACPI is already around to do
that for us.
- Set a description and tweak the order of checks in the probe function
to more closely match other PCI drivers.
This should probably be moved to sys/dev/piix/piix.c at some point and
turned on for all i386 kernels rather than just SMP ones.
enable strict checks of the AML. Our default behavior will be to relax
checks to work on as many platforms as possible. Also clean up and document
other ACPI options while I'm here.
Xcpustop(). %es is used in at least the call to savectx() when savectx()
calls bcopy(), so not loading it was fatal if a stop IPI interrupts
user mode.
This reduces bugs starting and stopping CPUs for debuggers. CPUs are
stopped mainly in kdb_trap() and cpu_reset(). At reset time there is
a good chance that all the CPUs are in the kernel, so the bug was
probably harmless then.
I changed. That is never a good sign.
1) only map 1 page at address zero, not 4096 pages
2) page 1 starts at address 4096 (PAGE_SIZE) not 4095 (PAGE_MASK). I
don't even want to think what the pte's looked like.
3) subtract the r/o page group start address from the end before
converting it to a count. Otherwise an extra page is mapped.
If you were affected by this, the symptoms of this was a hang at boot
after the spinner. Sorry folks. :-(
"You broke my laptop!" by: sam
use because a kernel thread is borrowing it. The borrowed page table
can change spontaneously, making any dependence on its continued use
subject to a race condition.
- _pmap_unwire_pte_hold() cannot use pmap_is_current(): If a change is
made to a page table page mapping for a borrowed page table, the TLB
must be updated.
In collaboration with: tegge
- Return NULL instead of returning memory outside of the stackgap
in stackgap_alloc() (FreeBSD-SA-00:42.linux)
- Check for stackgap_alloc() returning NULL in ibcs2_emul_find();
other calls to stackgap_alloc() have not been changed since they
are small fixed-size allocations.
- Replace use of strcpy() with strlcpy() in exec_coff_imgact()
to avoid buffer overflow
- Use strlcat() instead of strcat() to avoid a one byte buffer
overflow in ibcs2_setipdomainname()
- Use copyinstr() instead of copyin() in ibcs2_setipdomainname()
to ensure that the string is null-terminated
- Avoid integer overflow in ibcs2_setgroups() and ibcs2_setgroups()
by checking that gidsetsize argument is non-negative and
no larger than NGROUPS_MAX.
- Range-check signal numbers in ibcs2_wait(), ibcs2_sigaction(),
ibcs2_sigsys() and ibcs2_kill() to avoid accessing array past
the end (or before the start)
work in, but we had it mapped read-only. While this has always been the
case, the PG_PS enable hack hid it and the apm bios code ended up taking
advantage of it.
I do not yet understand why, but apm *depended* on the fact that the old
PSE code caused the first 1MB of ram to be mapped read/write because it
was in the same 4MB page as the kernel text+data+bss blob.
If anybody ever tried DISABLE_PSE before, apm would not work.
If your cpu did not have PSE, apm would not work there either (eg: 486).
This bug has been around for a Very Long Time.
The Pentium-4-fix commits did not emulate this unintended side effect of
the PSE post-early-boot fixup, and thus apm blew up. I've added a hack to
emulate the bug until either apm is fixed or we set fire to our bridges.
This is bad though because it gives kernel mode code the opportunity
to accidently write to the first few megs of the general page pool
which is remapped at KERNBASE. It needs to be fixed properly.
A small helper function pmap_is_prefaultable() is added. This function
encapsulate the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.
avoid problems with some Pentium 4 cpus and some older PPro/Pentium2
cpus. There are several problems, some documented in Intel errata.
This patch:
1) moves the kernel to the second page in the PSE case. There is an
errata that says that you Must Not point a 4MB page at physical
address zero on older cpus. We avoided bugs here due to sheer luck.
2) sets up PSE page tables right from the start in locore, rather than
trying to switch from 4K to 4M (or 2M) pages part way through the boot
sequence at the same time that we're messing with PG_G.
For some reason, the pmap work over the last 18 months seems to tickle
the problems, and the PAE infrastructure changes disturb the cpu
bugs even more.
A couple of people have reported a problem with APM bios calls during
boot. I'll work with people to get this resolved.
Obtained from: bmilekic
prior to invalidating the TLB to be certain that the processor doesn't
keep a cached copy.
Discussed with: pete
Paniced: tegge
Pointy Hat: The usual spot
the TLB and ~1600 if it is not. Therefore, it is more effecient to
invalidate the TLB after operations that use CMAP rather than before.
- So that the tlb is invalidated prior to switching off of a processor, we
must change the switchin functions to switchout functions.
- Remove td_switchout from the thread and move it to the x86 pcb.
- Move the code that calls switchout into swtch.s. These changes make this
optimization truely x86 specific.
This is just a cleanup here (modulo rev.1.108 of kern/tty.c), since the
input speed can be different from to output speed and extra code to
handle both speeds naturally handled all cases.
provide no methods does not make any sense, and is not used by any
driver.
It is a pretty hard to come up with even a theoretical concept of
a device driver which would always fail open and close with ENODEV.
Change the defaults to be nullopen() and nullclose() which simply
does nothing.
Remove explicit initializations to these from the drivers which
already used them.
cd_setreg() were still using !(read_eflags() & PSL_I) as the condition
for the lock hidden by COM_LOCK() (if any) being held. This worked
when spin mutexes and/or critical_enter() used hard interrupt disablement,
but it has caused recursion on the non-recursive mutex com_mtx since
all relevant interrupt disablement became soft. The recursion is
harmless unless there are other bugs, but it breaks an invariant so
it is fatal if spinlocks are witnessed.