- export the rest of the cpu features (and amd's features).
- turn on EFER_NXE, depending on the NX amd feature bit
- reorg the identcpu stuff a bit in order to stop treating the
amd features as second class features (since it is now a primary feature
bit set) and make it easier to export.
lives in the top 12 'available' bits. atop() in the PHYS_TO_VM_PAGE()
macro only masks off the lower bits (by accident) and the upper bits
in the 64 bit ptes turn into "interesting" index values.
pmap_remove() would be called with a huge range and we'd stride across
it in only 2MB chunks. This would manifest as massive cpu time and a
largely unresponsive system during hard swap. Instead, check the higher
page directories which means we can run pmap_remove() in just a few
hundred loop iterations instead of millions since we can process
address space in chunks of 512GB and 1GB as well as 2MB.
Eternal thanks to: tmm
of this micro-optimization occurs when we call pmap_enter() to wire an
already mapped page. Because of the micro-optimization, we fail to
mark the PTE as wired. Later, on teardown of the address space,
pmap_remove_pages() destroys the PTE before vm_fault_unwire() has
unwired the page. (pmap_remove_pages() is not supposed to destroy
wired PTEs. They are destroyed by a later call to pmap_remove().)
Thus, the page becomes lost.
Note: The page is not lost if the application called munlock(2), only
if it relies on teardown of the address space to unwire its pages.
For the historically inclined, this bug was introduced by a
megacommit, revision 1.182, roughly six years ago.
Leak observed by: green@ and dillon independently
Patch submitted by: dillon at backplane dot com
Reviewed by: tegge@
MFC after: 1 week
gmon and struct gmonhdr was originally just to represent the kernel
(profiling) clock frequency and it remains poorly suited to representing
the frequencies of fast counters like the TSC. It broke a year or two
ago. This quick fix keeps it working for another year or month or two
until TSC frequencies can exceed 2^32, by dividing the frequency by 2.
Dividing the frequency by 4 would work for a little longer but would
lose a little too much precision.
ordinary functions, essentially by backing out half of rev.1.115 of
amd64/exception.S. The handlers must be between certain labels for
the purposes of profiling, and this was broken by scattering them in
separately compiled .S files, especially for ordinary functions that
ended up between the labels. Merge the files by #including them as
before, except with different pathnames and better comments and
organization. Changes to the scattered files are minimal -- just
move the labels to the file that does the #includes.
This also partly fixes profiling of IPIs -- all IPI handlers are now
correctly classified as interrupt handlers, but many are still missing
mcount calls.
- perfmon headers must be avoided until perfmon is supported.
- all call-used registers including return registers must be preserved
by .mcount(), etc., not quite as in profile.h. __cyg_profile_func_*()
don't require this, but they are (mis)implemented as aliases for
.mcount(), etc. so they preserve the registers.
- i386 ifdefs related to perfmon have not been adjusted yet.
amd64 as necessary. This is routine, except:
- the FAKE_MCOUNT($bintr) in doreti was missing the '$'. This gave a
a garbage address made up of padding bytes (with the nop byte 0x90 as
the MSB) instead of the intended address of bintr. This accidentally
worked on i386's because (0x90 << 24) is close enough to bintr, but
it doesn't work on amd64's because (0x90 << 56) is much further away
from bintr.
- the FAKE_MCOUNT($btrap) in calltrap was similarly broken. It hasn't
been needed since FreeBSD-1, so just delete it.
and high resolution profiling of interrupt handlers. The adjustments
are routine once the magic stack offset 13*4 is decoded to be TF_RIP
(there were originally more types of stack frames so using TF_EIP for
one of them wouldn't have been much simpler).
Removed garbage comments attached to some of the FAKE_MCOUNT()s.
that the usual macro for "ret" hides the detail of calling .mexitcount
before returning.
Fixed missing call to .mexitcount in lgdt(). This was missing on
i386's, mainly because lgdt() uses lret[q] insted of ret. This is
very unimportant since lgdt() is not (normally?) called until after
profiling is initialized.
and improved some comments). Also, made the documented {f,s}uword()
functions the standard entry points and the undocumented {f,s}uword64()
functions alternative entry points, like {f,s}uword32() for i386's. The
bitrot in the comments was a little larger here -- there are new undocumented
32-bit sub-word functions, not just renaming of 16-bit functions from
documented ones to undocumented ones.
to <sys/gmon.h>. Cleaned them up a little by not attempting to ifdef
for incomplete and out of date support for GUPROF in userland, as in
the sparc64 version.
different context support for 32 vs 64 bit processes. This simply omits
the save/restore of the segment selector registers for non 32 bit
processes. This avoids the rdmsr/rwmsr juggling when restoring %gs
clobbers the kernel msr that holds the gsbase.
However, I suspect it might be better to conditionally do this at
user<->kernel transition where we wouldn't need to do the juggling in the
first place. Or have per-thread extended context save/restore hooks.
to help the AMD cpus (which have a hardware tlb flush filter). I held
off to see what the 64 bit Intel cpus did, but it doesn't seem to help
much there either. Oh well, store it in the Attic.
elf_reloc() backends for two reasons. First, to support the possibility
of there being two elf linkers in the kernel (eg: amd64), and second, to
pass the relocbase explicitly (for relocating .o format kld files).
individual asm versions. The global lock is shared between the BIOS and
OS and thus cannot use our mutexes. It is defined in section 5.2.9.1 of
the ACPI specification.
Reviewed by: marcel, bde, jhb
register controlled the trigger mode and polarity of EISA interrupts.
However, it appears that most (all?) PCI systems use the ELCR to manage
the trigger mode and polarity of ISA interrupts as well since ISA IRQs used
to route PCI interrupts need to be level triggered with active low
polarity. We check to see if the ELCR exists by sanity checking the value
we get back ensuring that IRQS 0 (8254), 1 (atkbd), 2 (the link from the
slave PIC), and 8 (RTC) are all clear indicating edge trigger and active
high polarity.
This mini-driver will be used by the atpic driver to manage the trigger and
polarity of ISA IRQs. Also, the mptable parsing code will use this mini
driver rather than examining the ELCR directly.
move its declaration to the machine-dependent header file on those
machines that use it. In principle, only i386 should have it.
Alpha and AMD64 should use their direct virtual-to-physical mapping.
- Remove pmap_kenter_temporary() from ia64. It is unused. Approved
by: marcel@
Report the %ecx bits in cpuid function 1. This is a hack.
When reporting AMD Features, only mask off the common bits. Otherwise
the SEP bit masks off SYSCALL etc in the report.
level of abstraction for any and all CPU mask and CPU bitmap variables
so that platforms have the ability to break free from the hard limit
of 32 CPUs, simply because we don't have more bits in an u_int. Note
that the type is not supposed to solve massive parallelism, where
the number of CPUs can be larger than the width of the widest integral
type. As such, cpumask_t is not supposed to be a compound type. If
such would be necessary in the future, we can deal with the issues
then and there. For now, it can be assumed that the type is integral
and unsigned.
With this commit, all MD definitions start off as u_int. This allows
us to phase-in cpumask_t at our leasure without breaking anything.
Once cpumask_t is used consistently, platforms can switch to wider
(or smaller) types if such would be beneficial (or not; whatever :-)
Compile-tested on: i386
dependent function by the same name and a machine-independent function,
sf_buf_mext(). Aside from the virtue of making more of the code machine-
independent, this change also makes the interface more logical. Before,
sf_buf_free() did more than simply undo an sf_buf_alloc(); it also
unwired and if necessary freed the page. That is now the purpose of
sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used
as a general-purpose emphemeral map cache.
all the ancient Intel/VIA/SIS/etc chipsets on amd64 systems. Even the
newer intel stuff won't need this since we use acpi by default and we
don't have all their magic programming information. Just use a generic
"Host to PCI bridge" name if we ever hit this code.
to build the kernel. It doesn't affect the operation if gcc.
Most of the changes are just adding __INTEL_COMPILER to #ifdef's, as
icc v8 may define __GNUC__ some parts may look strange but are
necessary.
Additional changes:
- in_cksum.[ch]:
* use a generic C version instead of the assembly version in the !gcc
case (ASM code breaks with the optimizations icc does)
-> no bad checksums with an icc compiled kernel
Help from: andre, grehan, das
Stolen from: alpha version via ppc version
The entire checksum code should IMHO be replaced with the DragonFly
version (because it isn't guaranteed future revisions of gcc will
include similar optimizations) as in:
---snip---
Revision Changes Path
1.12 +1 -0 src/sys/conf/files.i386
1.4 +142 -558 src/sys/i386/i386/in_cksum.c
1.5 +33 -69 src/sys/i386/include/in_cksum.h
1.5 +2 -0 src/sys/netinet/igmp.c
1.6 +0 -1 src/sys/netinet/in.h
1.6 +2 -0 src/sys/netinet/ip_icmp.c
1.4 +3 -4 src/contrib/ipfilter/ip_compat.h
1.3 +1 -2 src/sbin/natd/icmp.c
1.4 +0 -1 src/sbin/natd/natd.c
1.48 +1 -0 src/sys/conf/files
1.2 +0 -1 src/sys/conf/files.amd64
1.13 +0 -1 src/sys/conf/files.i386
1.5 +0 -1 src/sys/conf/files.pc98
1.7 +1 -1 src/sys/contrib/ipfilter/netinet/fil.c
1.10 +2 -3 src/sys/contrib/ipfilter/netinet/ip_compat.h
1.10 +1 -1 src/sys/contrib/ipfilter/netinet/ip_fil.c
1.7 +1 -1 src/sys/dev/netif/txp/if_txp.c
1.7 +1 -1 src/sys/net/ip_mroute/ip_mroute.c
1.7 +1 -2 src/sys/net/ipfw/ip_fw2.c
1.6 +1 -2 src/sys/netinet/igmp.c
1.4 +158 -116 src/sys/netinet/in_cksum.c
1.6 +1 -1 src/sys/netinet/ip_gre.c
1.7 +1 -2 src/sys/netinet/ip_icmp.c
1.10 +1 -1 src/sys/netinet/ip_input.c
1.10 +1 -2 src/sys/netinet/ip_output.c
1.13 +1 -2 src/sys/netinet/tcp_input.c
1.9 +1 -2 src/sys/netinet/tcp_output.c
1.10 +1 -1 src/sys/netinet/tcp_subr.c
1.10 +1 -1 src/sys/netinet/tcp_syncache.c
1.9 +1 -2 src/sys/netinet/udp_usrreq.c
1.5 +1 -2 src/sys/netinet6/ipsec.c
1.5 +1 -2 src/sys/netproto/ipsec/ipsec.c
1.5 +1 -1 src/sys/netproto/ipsec/ipsec_input.c
1.4 +1 -2 src/sys/netproto/ipsec/ipsec_output.c
and finally remove
sys/i386/i386 in_cksum.c
sys/i386/include in_cksum.h
---snip---
- endian.h:
* DTRT in C++ mode
- quad.h:
* we don't use gcc v1 anymore, remove support for it
Suggested by: bde (long ago)
- assym.h:
* avoid zero-length arrays (remove dependency on a gcc specific
feature)
This change changes the contents of the object file, but as it's
only used to generate some values for a header, and the generator
knows how to handle this, there's no impact in the gcc case.
Explained by: bde
Submitted by: Marius Strobl <marius@alchemy.franken.de>
- aicasm.c:
* minor change to teach it about the way icc spells "-nostdinc"
Not approved by: gibbs (no reply to my mail)
- bump __FreeBSD_version (lang/icc needs to know about the changes)
Incarnations of this patch survive gcc compiles since a loooong time,
I use it on my desktop. An icc compiled kernel works since Nov. 2003
(exceptions: snd_* if used as modules), it survives a build of the
entire ports collection with icc.
Parts of this commit contains suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.
Reviewed by: -arch
Submitted by: netchild
in the non-_KERNEL case. This "fixes" applications that include
this "kernel-only" header and also include <strings.h> (or get
<strings.h> via the default _BSD_VISIBLE pollution in <string.h>.
In C++ there was a fatal error: the declaration specifies C linkage
but the implementation gives C++ linkage. In C there was only a
static/extern mismatch if the headers were included in a certain order
order, and a partially redundant declaration for all include orders;
gcc emits incomplete or wrong diagnostics for these, but only for
compiling with -Wsystem-headers and certain other warning options, so
the problem was usually not seen for C.
Ports breakage reported by: kris
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha. Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_init2() served to avoid a race between pmap
initialization and pmap_growkernel(). Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel(). Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.
Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
to catch are already nicely caught by trapping the null pointer derefs.
Remove no-longer-used noswitch/nothrow strings. They were referenced
by the stub cpu_switch() etc functions before they were implemented.
Try something a little different for the lock prefixes.
Prompted by: bde (the first two items anyway)
updated for the regparm ABI on amd64.
Context switch debug regs.
Update for fpu simplification
Don't needlessly reload %cr3, in case the cpu has the tlb flush filter
turned off. Re-add LAZY_SWITCH stubs.
at it, use the ANSI C generic pointer type for the second argument,
thus matching the documentation.
Remove the now extraneous (and now conflicting) function declarations
in various libc sources. Remove now unnecessary casts.
Reviewed by: bde
is useless for threaded programs, multiple threads can not share same
stack.
The alternative signal stack is private for thread, no lock is needed,
the orignal P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single thread or Linux clone() based threaded program, there is no
semantic changed, because those programs only have one kernel thread
in every process.
Reviewed by: deischen, dfr
faster.)
MFi386:
- Don't bother clearing PG_ZERO on the page table page in
_pmap_allocpte(); it serves no purpose.
- Don't bother clearing and setting PG_BUSY on page table directory pages.
pmap_init(). Such a large preallocation is unnecessary and wastes
nearly eight megabytes of kernel virtual address space per gigabyte
of managed physical memory.
- Increase UMA_BOOT_PAGES by two. This enables the removal of
pmap_pv_allocf(). (Note: this function was only used during
initialization, specifically, after pmap_init() but before
pmap_init2(). During pmap_init2(), a new allocator is installed.)
such that 'ispcvt' can build. Unforunately 'ispcvt' is needed in order for
/etc/rc.d/syscons to run. This fixes the bug where I could not get my
keymap effective at boot.
of doing a loop and taking two 32 bit passes at the runqueue bits. All
the 64 bit platforms should probably do this since there are 64 run queues.
Approved by: re (scottl)
used on amd64, and were actually totally broken. They had the wrong
calling conventions. I believe the i386 versions are going away too.
Approved by: re (scottl)
i386 version. The curthread special case in pcpu.h solves my complaint
about the verbose macro expansion in this case. Note that the i386
version still has some OBE comments, I didn't re-add them back again.
Approved by: re (scottl)
1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.
Approved by: re (scottl)
Tested on: i386, amd64, alpha
the MTRR Base/Mask registers. If you use the documented algorithm in the
systems programming guide, you'll get a GPF. The only thing that has
prevented this so far is that the bios pre-sets some MTRR entries which
we mis-interpreted sufficiently to fool the memcontrol interface into
thinking all the address space was taken and therefore rejected XFree86's
requests. However, not all bioses do this.. You get an insta-panic in
that case. Grrr. A better fix (dynamic mask) will happen by 5.3/5-stable
so that we automatically adapt to more than 40 physical bits.
Approved by: re (scottl)
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
actually present and sets mp_ncpus and all_cpus. Splitting these up
allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the
CPU probing code to live in a module, for example, since modules
sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is
needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
its contents in a few places. Also, add a smp_cpu_enabled() function
to avoid duplicating some code. There is room for further code
reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
to before this change. i386 now sets mp_maxid to MAXCPU.
Tested on: alpha, amd64, i386, ia64, sparc64
Approved by: re (scottl)
known samples of broken chipsets that needed mixed mode in the first place
are so broken (ie: locks up) that we can't use IO APIC mode at all and it
needs to be turned off in the bios. So, the MIXED_MODE penalty on the
good chipsets gained nothing.
Approved by: re (scottl)
the compiler having to parse and optimize the PCPU_GET(curthread) so often.
__curthread() is an inline optimized version of PCPU_GET(curthread) that
knows that pc_curthread is at offset zero in the pcpu struct. Add a
CTASSERT() to catch any possible changes to this. This accounts for
just over a 1% wall clock speedup for total kernel compile/link time,
and 20% compile time speedup on some specific files depending on which
compile options are used.
Approved by: re (jhb)
- turn on SMP in generic
- add 'device atpic' - this is unconditional on i386, but certain nvidia
based systems need to disable acpi because the reference bios seems to be
hosed. If acpi is disabled, we won't find the apic. amd64 has the
mptable code in a seperate compile option as well.
- turn sym back on, it doesn't fail to compile anymore.
Approved by: re
- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
(in particular, bootstrap)
- This is still a WIP. It seems that there are some highly bogus bioses
on nVidia nForce3-150 boards. I can't stress how broken these boards
are. I have a workaround in mind, but right now the Asus SK8N is broken.
The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE. SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
atpic' in addition, because they somehow managed to forget to connect the
8254 timer to the apic, even though its in the same silicon! ARGH!
This directly violates the ACPI spec.
physical mapping.
- Move the sf_buf API to its own header file; make struct sf_buf's
definition machine dependent. In this commit, we remove an
unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
Ultimately, we may eliminate struct sf_buf on those architecures
except as an opaque pointer that references a vm page.
longer uses these interrupt vectors for its ISA interrupt pins, so these
entries will not be overwritten. If we get a spurious interrupt from the
ATPIC when using the APIC, it will be treated as a stray interrupt instead
of causing a panic.
- Move the IPI and local APIC interrupt vectors up into the 0xf0 - 0xff
range. The pmap lazyfix IPI was reordered down next to the TLB
shootdowns to avoid conflicting with the spurious interrupt vector.
- Move the base of APIC interrupts up 16 so that the first 16 APIC
interrupts do not overlap the vectors used by the ATPIC.
- Remove bogus interrupt vector reservations for LINT[01].
- Now that 0xc0 - 0xef are available, use them for device interrupts.
This increases the number of APIC device interrupts to 191.
- Increase the system-wide number of global interrupts to 191 to catch up
to more APIC interrupts.
Requested by: peter (2)
vector stubs and into the C functions they call.
- Move disabling and EOIing of interrupt sources out of PIC driver entry
points and into intr_execute_handlers(). Intr_execute_handlers() only
disables a source for an interrupt if it is a stray interrupt or has
threaded handlers. Sources with fast handlers no longer disable (mask)
the source while executing the handlers.
- Move the setting of clkintr_pending into intr_execute_handlers() and set
the variable for any interrupt source with a vector of 0. (Should only
be true for IRQ 0.) This fixes clkintr_pending in the NO_MIXED_MODE
case.
- Implement lapic_eoi() and use it to implement ioapic_eoi_source().
- Rename atpic_sched_ithd() to atpic_handle_intr() since it is used to
handle all atpic interrupts and not just threaded ones.
Inspired by: peter's changes to amd64 in p4 (1)
Requested by: bde (2)
interrupt such as IRQ 22 or 19. However, the ACPI BIOS still routes
interrupts from some PCI devices to the same intpin calling the pin
IRQ 22. Thus, ACPI expects to address a single interrupt source via two
different names. To work around this, if the SCI is remapped to a non-ISA
interrupt (i.e., greater than 15), then we use
acpi_OverrideInterruptLevel() function to tell ACPI to use IRQ 22 or 19
rather than IRQ 9 for the SCI.
Previously we would change IRQ 22 or 19's name to IRQ 9 when we encountered
such an Interrupt Source Override entry in the MADT which routed the SCI
properly but left PCI devices mapped to IRQ 22 or 19 w/o a routable
interrupt.
Tested by: sos
should now only have HTT CPUs if they have explicitly asked for them
either by enabling HyperThreading in the BIOS or by using the
MPTABLE_FORCE_HTT kernel option.
should only be used if they are enabled in the BIOS. Now that we support
enumerating CPUs using the ACPI MADT, any HTT machine using ACPI should
respect the BIOS setting. For HTT machines with ACPI disabled in the
kernel, the MPTABLE_FORCE_HTT kernel option can be used to try to probe HTT
CPUs like have done in the past for the MP Table case. This option should
only be enabled if HTT is enabled in the BIOS.
Since all callers either passed 0 or 1 for clear_ret, define bit 0 in
the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI
code for possible (but unlikely) future use. The remaining bits are for
use by MD code.
This change is triggered by a need on ia64 to have another knob for
get_mcontext().
is highly MD in an emulation environment since it operates on the host
environment. Although the setregs functions are really for exec support
rather than signals, they deal with the same sorts of context and include
files. So I put it there rather than create yet another file.
pin that is used by the default identity mapping if it still maps to the
old vector. The ACPI case might need some tweaking for the SCI interrupt
case since ACPI likes to address the intpin using both the IRQ remapped to
it as well as the previous existing PCI IRQ mapped to it.
Reported by: kan
rather than signed. This fixes some cosmetics such as verbose printf's
for IRQs greater than 127.
- The calculation for next_ioapic_base was also adjusted so that it will
only complain once for each hole in the IRQs provided by ACPI for IO
APICs.
Reported by: Michal Mertl <mime@traveller.cz>
- The MP code no longer knows anything specific about an MP Table.
Instead, the local APIC code adds CPUs via the cpu_add() function when
a local APIC is enumerated by an APIC enumerator.
- Don't divide the argument to mp_bootaddress() by 1024 just so that we
can turn around and mulitply it by 1024 again.
- We no longer panic if SMP is enabled but we are booted on a UP machine.
- init_secondary(), the asm code between init_secondary() and ap_init()
in mpboot.s and ap_init() have all been merged together in C into
init_secondary().
- We now use the cpuid feature bits to determine if we should enable
PSE, PGE, or VME on each AP.
- Due to the change in the implementation of critical sections, acquire
the SMP TLB mutex around a slightly larger chunk of code for TLB
shootdowns.
- Remove some of the debug code from the original SMP implementation
that is no longer used or no longer applies to the new APIC code.
- Use a temporary hack to disable the ACPI module until the SMP code has
been further reorganized to allow ACPI to work as a module again.
- Add a DDB command to dump the interesting contents of the IDT.
devices claiming resources that they don't actually use. The PIC drivers
only register valid interrupt sources, so we don't need to rely on these
drivers to claim invalid IRQs to prevent their use by other drivers.
APIC Descriptor Table to enumerate both I/O APICs and local APICs. ACPI
does not embed PCI interrupt routing information in the MADT like the MP
Table does. Instead, ACPI stores the PCI interrupt routing information
in the _PRT object under each PCI bus device. The MADT table simply
provides hints about which interrupt vectors map to which I/O APICs. Thus
when using ACPI, the existing ACPI PCI bridge drivers are sufficient to
route PCI interrupts.
- The apic interrupt entry points have been rewritten so that each entry
point can serve 32 different vectors. When the entry is executed, it
uses one of the 32-bit ISR registers to determine which vector in its
assigned range was triggered. Thus, the apic code can support 159
different interrupt vectors with only 5 entry points.
- We now always to disable the local APIC to work around an errata in
certain PPros and then re-enable it again if we decide to use the APICs
to route interrupts.
- We no longer map IO APICs or local APICs using special page table
entries. Instead, we just use pmap_mapdev(). We also no longer
export the virtual address of the local APIC as a global symbol to
the rest of the system, but only in local_apic.c. To aid this, the
APIC ID of each CPU is exported as a per-CPU variable.
- Interrupt sources are provided for each intpin on each IO APIC.
Currently, each source is given a unique interrupt vector meaning that
PCI interrupts are not shared on most machines with an I/O APIC.
That mapping for interrupt sources to interrupt vectors is up to the
APIC enumerator driver however.
- We no longer probe to see if we need to use mixed mode to route IRQ 0,
instead we always use mixed mode to route IRQ 0 for now. This can be
disabled via the 'NO_MIXED_MODE' kernel option.
- The npx(4) driver now always probes to see if a built-in FPU is present
since this test can now be performed with the new APIC code. However,
an SMP kernel will panic if there is more than one CPU and a built-in
FPU is not found.
- PCI interrupts are now properly routed when using APICs to route
interrupts, so remove the hack to psuedo-route interrupts when the
intpin register was read.
- The apic.h header was moved to apicreg.h and a new apicvar.h header
that declares the APIs used by the new APIC code was added.
default we provide 16 interrupt sources for IRQs 0 through 15. However,
if the I/O APIC driver has already registered sources for any of those IRQs
then we will silently fail to register our own source for that IRQ.
Note that i386/isa/icu.h is now specific to the 8259A and no longer
contains any info relevant to APICs. Also note that fast interrupts no
longer use a separate entry point. Instead, both fast and threaded
interrupts share the same entry point which merely looks up the appropriate
source and passes control to intr_execute_handlers().
that provides methods via a PIC driver to do things like mask a source,
unmask a source, enable it when the first interrupt handler is added, etc.
The interrupt code provides a table of interrupt sources indexed by IRQ
numbers, or vectors. These vectors are what new-bus uses for its IRQ
resources and for bus_setup_intr()/bus_teardown_intr(). The interrupt
code then maps that vector a given interrupt source object. When an
interrupt comes in, the low-level interrupt code looks up the interrupt
source for the source that triggered the interrupt and hands it off to
this code to execute the appropriate handlers.
By having an interrupt source abstraction, this allows us to have different
types of interrupt source providers within the shared IRQ address space.
For example, IRQ 0 may map to pin 0 of the master 8259A PIC, IRQs 1
through 60 may map to pins on various I/O APICs, and IRQs 120 through
128 may map to MSI interrupts for various PCI devices.
setting. Make va_copy be an alias if __ISO_C_VISIBLE >= 1999.
Why? more than a few ports have an autoconf that looks for __va_copy
because it is available on glibc. It is critical that we use it if
at all possible on amd64. It generally isn't a problem for i386 and its
ilk because autoconf driven code tends to fall back to an assignment.
implement i386 compat numbers where it makes sense. This would save a
syscall translation layer. Yes, this breaks the abi slightly again, but
fortunately its just a recompile rather than tweaking the source. I will
be fixing the libc stubs while I'm here.
Xcpustop(). %es is used in at least the call to savectx() when savectx()
calls bcopy(), so not loading it was fatal if a stop IPI interrupts
user mode.
This reduces bugs starting and stopping CPUs for debuggers. CPUs are
stopped mainly in kdb_trap() and cpu_reset(). At reset time there is
a good chance that all the CPUs are in the kernel, so the bug was
probably harmless then.
sigreturn() ABI and the signal context on the stack.
Make the trapframe (and its shadows in the ucontext and sigframe etc)
8 bytes larger in order to preserve 16 byte stack alignment for the
following C code calls. I could have done some padding after the
trapframe was saved, but some of the C code still expects an argument of
'struct trapframe'. Anyway, this gives me a spare field that can be used
to store things like 'partial trapframe' status or something else in
the future.
The runtime impact is fairly small, *except* for threaded apps and things
that decode contexts and the signal stack (eg: cvsup binary). Signal
delivery isn't too badly affected because the kernel generates the
sigframe that sigreturn uses after the handler has been called.
The size of mcontext_t and struct sigframe hasn't changed. Only
the last few fields (sc_eip etc) got moved a little and I eliminated
a spare field. mc_len/sc_len did change location though so the
sanity checks there will still trap it.
A small helper function pmap_is_prefaultable() is added. This function
encapsulate the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.
avoid problems with some Pentium 4 cpus and some older PPro/Pentium2
cpus. There are several problems, some documented in Intel errata.
This patch:
1) moves the kernel to the second page in the PSE case. There is an
errata that says that you Must Not point a 4MB page at physical
address zero on older cpus. We avoided bugs here due to sheer luck.
2) sets up PSE page tables right from the start in locore, rather than
trying to switch from 4K to 4M (or 2M) pages part way through the boot
sequence at the same time that we're messing with PG_G.
For some reason, the pmap work over the last 18 months seems to tickle
the problems, and the PAE infrastructure changes disturb the cpu
bugs even more.
A couple of people have reported a problem with APM bios calls during
boot. I'll work with people to get this resolved.
Obtained from: bmilekic
Reimplement pmap_release() such that it uses the page table rather than
the pte object to locate the page table directory pages. (Temporarily,
retain an assertion on the emptiness of the pte object.)
systems where the data/stack/etc limits are too big for a 32 bit process.
Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.
Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 heirarchy.
Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.
Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.
Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.
32 bit binary stuff. 32 bit binaries do not like it much when the kernel
tries hard to put things above the 8GB mark.
I have a work-in-progress to fix this properly, but I didn't want to burn
anybody with this yet.
initialize a TSC timecounter until we know if it is broke or not.
XXX I think there is a bug in the i386 code here. init_TSC_tc() comes
after:
if (statclock_disable)
return;
ie: if you turn off the statclock interrupt, you dont get the TSC either.
for breakpoint and trace traps from usermode. Although all the setidt
entries are interrupt gates on amd64, all but the trace and bpt trap
entry handlers reenable interrupts after the swapgs instruction in order
to simulate the trap/interrupt gate distinction. In other words, the
amd64 code behaves the same way that i386 does here.