- Compile 'device acpi' into GENERIC by default as well. Note that
the beastie loader menu item to disable ACPI still works if ACPI is
compiled into the kernel.
we would manage this better by having the interrupt code add each
interrupt vector to the resource map when each source is registered.
- Use the new interrupt code API for registering and tearing down interrupt
handlers.
- The MP code no longer knows anything specific about an MP Table.
Instead, the local APIC code adds CPUs via the cpu_add() function when
a local APIC is enumerated by an APIC enumerator.
- Don't divide the argument to mp_bootaddress() by 1024 just so that we
can turn around and multiply it by 1024 again.
- We no longer panic if SMP is enabled but we are booted on a UP machine.
- init_secondary(), the asm code between init_secondary() and ap_init()
in mpboot.s and ap_init() have all been merged together in C into
init_secondary().
- We now use the cpuid feature bits to determine if we should enable
PSE, PGE, or VME on each AP.
- Due to the change in the implementation of critical sections, acquire
the SMP TLB mutex around a slightly larger chunk of code for TLB
shootdowns.
- Remove some of the debug code from the original SMP implementation
that is no longer used or no longer applies to the new APIC code.
- Use a temporary hack to disable the ACPI module until the SMP code has
been further reorganized to allow ACPI to work as a module again.
- Add a DDB command to dump the interesting contents of the IDT.
devices claiming resources that they don't actually use. The PIC drivers
only register valid interrupt sources, so we don't need to rely on these
drivers to claim invalid IRQs to prevent their use by other drivers.
slave pin on the master PIC in the !APIC_IO case. The PIC drivers now
manage these details internally.
- Remove an spl0() that hasn't done anything since SMPng was first
committed.
- Update some comments that have rotted since SMPng.
- Use intr_suspend/resume() callouts to the interrupt code layer which
suspends and resumes all the known interrupt sources instead of calling
icu_reinit() directly.
APIC Descriptor Table to enumerate both I/O APICs and local APICs. ACPI
does not embed PCI interrupt routing information in the MADT like the MP
Table does. Instead, ACPI stores the PCI interrupt routing information
in the _PRT object under each PCI bus device. The MADT table simply
provides hints about which interrupt vectors map to which I/O APICs. Thus
when using ACPI, the existing ACPI PCI bridge drivers are sufficient to
route PCI interrupts.
- The apic interrupt entry points have been rewritten so that each entry
point can serve 32 different vectors. When the entry is executed, it
uses one of the 32-bit ISR registers to determine which vector in its
assigned range was triggered. Thus, the apic code can support 159
different interrupt vectors with only 5 entry points.
- We now always disable the local APIC to work around an erratum in
certain PPros and then re-enable it again if we decide to use the APICs
to route interrupts.
- We no longer map IO APICs or local APICs using special page table
entries. Instead, we just use pmap_mapdev(). We also no longer
export the virtual address of the local APIC as a global symbol to
the rest of the system, but only in local_apic.c. To aid this, the
APIC ID of each CPU is exported as a per-CPU variable.
- Interrupt sources are provided for each intpin on each IO APIC.
Currently, each source is given a unique interrupt vector meaning that
PCI interrupts are not shared on most machines with an I/O APIC.
The mapping of interrupt sources to interrupt vectors is up to the
APIC enumerator driver, however.
- We no longer probe to see if we need to use mixed mode to route IRQ 0,
instead we always use mixed mode to route IRQ 0 for now. This can be
disabled via the 'NO_MIXED_MODE' kernel option.
- The npx(4) driver now always probes to see if a built-in FPU is present
since this test can now be performed with the new APIC code. However,
an SMP kernel will panic if there is more than one CPU and a built-in
FPU is not found.
- PCI interrupts are now properly routed when using APICs to route
interrupts, so remove the hack to pseudo-route interrupts when the
intpin register was read.
- The apic.h header was moved to apicreg.h and a new apicvar.h header
that declares the APIs used by the new APIC code was added.
default we provide 16 interrupt sources for IRQs 0 through 15. However,
if the I/O APIC driver has already registered sources for any of those IRQs
then we will silently fail to register our own source for that IRQ.
Note that i386/isa/icu.h is now specific to the 8259A and no longer
contains any info relevant to APICs. Also note that fast interrupts no
longer use a separate entry point. Instead, both fast and threaded
interrupts share the same entry point which merely looks up the appropriate
source and passes control to intr_execute_handlers().
that provides methods via a PIC driver to do things like mask a source,
unmask a source, enable it when the first interrupt handler is added, etc.
The interrupt code provides a table of interrupt sources indexed by IRQ
numbers, or vectors. These vectors are what new-bus uses for its IRQ
resources and for bus_setup_intr()/bus_teardown_intr(). The interrupt
code then maps that vector to a given interrupt source object. When an
interrupt comes in, the low-level interrupt code looks up the interrupt
source for the source that triggered the interrupt and hands it off to
this code to execute the appropriate handlers.
Having an interrupt source abstraction allows us to have different
types of interrupt source providers within the shared IRQ address space.
For example, IRQ 0 may map to pin 0 of the master 8259A PIC, IRQs 1
through 60 may map to pins on various I/O APICs, and IRQs 120 through
128 may map to MSI interrupts for various PCI devices.
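
As a rough illustration of the abstraction described above, the sketch
below keeps a table of interrupt sources indexed by IRQ, with each source
pointing at the PIC driver that owns it. All of the demo_* names and the
table size are hypothetical and are not the identifiers used by the
commit; the sketch only shows the shape of the idea.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NUM_IRQS 256	/* hypothetical size of the shared IRQ space */

struct demo_intsrc;

/* Methods a PIC driver supplies for the sources it owns (hypothetical). */
struct demo_pic {
	void (*mask_source)(struct demo_intsrc *);
	void (*unmask_source)(struct demo_intsrc *);
};

/* One interrupt source: the PIC that owns it and the IRQ it answers to. */
struct demo_intsrc {
	struct demo_pic *pic;
	int irq;
	int masked;
};

/* The table new-bus style IRQ numbers index into. */
static struct demo_intsrc *demo_sources[DEMO_NUM_IRQS];

/* Register a source; fail if the IRQ is out of range or already claimed. */
static int
demo_register_source(struct demo_intsrc *src)
{
	if (src->irq < 0 || src->irq >= DEMO_NUM_IRQS)
		return (-1);
	if (demo_sources[src->irq] != NULL)
		return (-1);
	demo_sources[src->irq] = src;
	return (0);
}

/* Map an IRQ number back to its source object, or NULL if there is none. */
static struct demo_intsrc *
demo_lookup_source(int irq)
{
	if (irq < 0 || irq >= DEMO_NUM_IRQS)
		return (NULL);
	return (demo_sources[irq]);
}

static void demo_mask(struct demo_intsrc *src) { src->masked = 1; }
static void demo_unmask(struct demo_intsrc *src) { src->masked = 0; }

/* Two providers sharing the one IRQ space. */
static struct demo_pic demo_atpic = { demo_mask, demo_unmask };
static struct demo_pic demo_ioapic = { demo_mask, demo_unmask };
```

Because the low-level code only ever calls through the method table, an
8259A source and an I/O APIC source can sit side by side in the same
table. A second registration for an already claimed IRQ simply fails.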
Requested by: jhb
Initialize the real mode stack. This is needed at least for the return
address from the lcall.
Requested by: takawata
Fix style bugs in acpi_wakecode.S
Requested by: bde
Remove the kernel option now that we have the tunable.
to use the direct mapped KVA at KERNBASE to service the request. This also
allows pmap_mapdev() to be used for such addresses very early during the
boot process and might provide some small savings on KVA.
Reviewed by: peter
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo
device creation and configuration semantics.
Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
routines. Otherwise we run into trouble with speculative tlb preloads
on SMP systems. This effectively defeats Jeff's revision 1.438
optimization (for his pentium4-M laptop) in the SMP case. It breaks
other systems, particularly athlon-MP's.
the ACPI timer and we shouldn't do that if ACPI is already around to do
that for us.
- Set a description and tweak the order of checks in the probe function
to more closely match other PCI drivers.
This should probably be moved to sys/dev/piix/piix.c at some point and
turned on for all i386 kernels rather than just SMP ones.
enable strict checks of the AML. Our default behavior will be to relax
checks to work on as many platforms as possible. Also clean up and document
other ACPI options while I'm here.
Xcpustop(). %es is used in at least the call to savectx() when savectx()
calls bcopy(), so not loading it was fatal if a stop IPI interrupts
user mode.
This reduces bugs starting and stopping CPUs for debuggers. CPUs are
stopped mainly in kdb_trap() and cpu_reset(). At reset time there is
a good chance that all the CPUs are in the kernel, so the bug was
probably harmless then.
I changed. That is never a good sign.
1) only map 1 page at address zero, not 4096 pages
2) page 1 starts at address 4096 (PAGE_SIZE) not 4095 (PAGE_MASK). I
don't even want to think what the pte's looked like.
3) subtract the r/o page group start address from the end before
converting it to a count. Otherwise an extra page is mapped.
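
The arithmetic behind points 2 and 3 can be sketched as follows. This is
a minimal illustration assuming 4K pages; the demo_* names are
hypothetical and this is not the locore code itself.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT	12
#define DEMO_PAGE_SIZE	(1u << DEMO_PAGE_SHIFT)	/* 4096 */
#define DEMO_PAGE_MASK	(DEMO_PAGE_SIZE - 1)	/* 4095: an offset mask, not an address */

/* Number of whole pages in [start, end); both must be page aligned. */
static uint32_t
demo_page_count(uint32_t start, uint32_t end)
{
	assert((start & DEMO_PAGE_MASK) == 0 && (end & DEMO_PAGE_MASK) == 0);
	assert(end >= start);
	/* Subtract first, then convert to a count. */
	return ((end - start) >> DEMO_PAGE_SHIFT);
}

/* Address at which page number pageno begins. */
static uint32_t
demo_page_addr(uint32_t pageno)
{
	return (pageno << DEMO_PAGE_SHIFT);
}
```

For a group running from 0x1000 to 0x5000 this gives 4 pages, whereas
converting the end address alone (0x5000 >> 12) would give 5, which is
the extra page mentioned in point 3.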
If you were affected by this, the symptom was a hang at boot
after the spinner. Sorry folks. :-(
"You broke my laptop!" by: sam
use because a kernel thread is borrowing it. The borrowed page table
can change spontaneously, making any dependence on its continued use
subject to a race condition.
- _pmap_unwire_pte_hold() cannot use pmap_is_current(): If a change is
made to a page table page mapping for a borrowed page table, the TLB
must be updated.
In collaboration with: tegge
- Return NULL instead of returning memory outside of the stackgap
in stackgap_alloc() (FreeBSD-SA-00:42.linux)
- Check for stackgap_alloc() returning NULL in ibcs2_emul_find();
other calls to stackgap_alloc() have not been changed since they
are small fixed-size allocations.
- Replace use of strcpy() with strlcpy() in exec_coff_imgact()
to avoid buffer overflow
- Use strlcat() instead of strcat() to avoid a one byte buffer
overflow in ibcs2_setipdomainname()
- Use copyinstr() instead of copyin() in ibcs2_setipdomainname()
to ensure that the string is null-terminated
- Avoid integer overflow in ibcs2_getgroups() and ibcs2_setgroups()
by checking that gidsetsize argument is non-negative and
no larger than NGROUPS_MAX.
- Range-check signal numbers in ibcs2_wait(), ibcs2_sigaction(),
ibcs2_sigsys() and ibcs2_kill() to avoid accessing array past
the end (or before the start)
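
The range checks in the last two items follow a common pattern, sketched
below. The limits and the demo_* names are stand-ins chosen for
illustration (the real code checks against NGROUPS_MAX and its own
signal table); this is not the ibcs2 code.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_NGROUPS_MAX 16	/* stand-in for NGROUPS_MAX */
#define DEMO_NSIG 32		/* stand-in for the number of signals */

/* Reject sizes that would overflow a later size computation. */
static int
demo_check_gidsetsize(int gidsetsize)
{
	if (gidsetsize < 0 || gidsetsize > DEMO_NGROUPS_MAX)
		return (EINVAL);
	return (0);
}

/* Hypothetical translation table, indexed by (signal number - 1). */
static const int demo_sigmap[DEMO_NSIG] = { 0 };

/* Translate a signal number, refusing to index outside the table. */
static int
demo_translate_signal(int sig, int *out)
{
	if (sig < 1 || sig > DEMO_NSIG)
		return (EINVAL);
	*out = demo_sigmap[sig - 1];
	return (0);
}
```

Checking both bounds matters because the argument is a signed int coming
from userland: a negative value would index before the start of the
array, and a large one past its end.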
work in, but we had it mapped read-only. While this has always been the
case, the PG_PS enable hack hid it and the apm bios code ended up taking
advantage of it.
I do not yet understand why, but apm *depended* on the fact that the old
PSE code caused the first 1MB of ram to be mapped read/write because it
was in the same 4MB page as the kernel text+data+bss blob.
If anybody ever tried DISABLE_PSE before, apm would not work.
If your cpu did not have PSE, apm would not work there either (eg: 486).
This bug has been around for a Very Long Time.
The Pentium-4-fix commits did not emulate this unintended side effect of
the PSE post-early-boot fixup, and thus apm blew up. I've added a hack to
emulate the bug until either apm is fixed or we set fire to our bridges.
This is bad though because it gives kernel mode code the opportunity
to accidentally write to the first few megs of the general page pool
which is remapped at KERNBASE. It needs to be fixed properly.
A small helper function pmap_is_prefaultable() is added. This function
encapsulates the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.
avoid problems with some Pentium 4 cpus and some older PPro/Pentium2
cpus. There are several problems, some documented in Intel errata.
This patch:
1) moves the kernel to the second page in the PSE case. There is an
erratum that says that you Must Not point a 4MB page at physical
address zero on older cpus. We avoided bugs here due to sheer luck.
2) sets up PSE page tables right from the start in locore, rather than
trying to switch from 4K to 4M (or 2M) pages part way through the boot
sequence at the same time that we're messing with PG_G.
For some reason, the pmap work over the last 18 months seems to tickle
the problems, and the PAE infrastructure changes disturb the cpu
bugs even more.
A couple of people have reported a problem with APM bios calls during
boot. I'll work with people to get this resolved.
Obtained from: bmilekic
prior to invalidating the TLB to be certain that the processor doesn't
keep a cached copy.
Discussed with: pete
Paniced: tegge
Pointy Hat: The usual spot
the TLB and ~1600 if it is not. Therefore, it is more efficient to
invalidate the TLB after operations that use CMAP rather than before.
- So that the tlb is invalidated prior to switching off of a processor, we
must change the switchin functions to switchout functions.
- Remove td_switchout from the thread and move it to the x86 pcb.
- Move the code that calls switchout into swtch.s. These changes make this
optimization truly x86 specific.
This is just a cleanup here (modulo rev.1.108 of kern/tty.c), since the
input speed can be different from the output speed and extra code to
handle both speeds naturally handled all cases.
provide no methods does not make any sense, and is not used by any
driver.
It is pretty hard to come up with even a theoretical concept of
a device driver which would always fail open and close with ENODEV.
Change the defaults to be nullopen() and nullclose() which simply
do nothing.
Remove explicit initializations to these from the drivers which
already used them.
cd_setreg() were still using !(read_eflags() & PSL_I) as the condition
for the lock hidden by COM_LOCK() (if any) being held. This worked
when spin mutexes and/or critical_enter() used hard interrupt disablement,
but it has caused recursion on the non-recursive mutex com_mtx since
all relevant interrupt disablement became soft. The recursion is
harmless unless there are other bugs, but it breaks an invariant so
it is fatal if spinlocks are witnessed.
systems where the data/stack/etc limits are too big for a 32 bit process.
Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.
Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 hierarchy.
Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.
Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.
Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.
be gone in FreeBSD 6, so put BURN_BRIDGES around it. The TRB also
felt that if something better comes along sooner, it can be used to
replace this code.
Delayed by: BSDcon and subsequent disk crash.
known constants at compile time rather than at run time. We have a number
of nasty hacks around the place to cache ntohl() of constants (eg: nfs).
This change allows the compiler to compile-time evaluate ntohl(1) as
0x01000000 rather than having to emit assembler code to do it. This
has other smaller flow-on effects because the compiler can see that
ntohl(constant) itself has a constant value now and can propagate the
compile time evaluation.
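
One way to get this effect, shown here only as a sketch, is to select
between a pure-expression byte swap and a run-time routine with the
GCC/Clang __builtin_constant_p() extension. The DEMO_* names are
hypothetical, and the example assumes a little-endian host, where
ntohl() is a byte swap.

```c
#include <assert.h>
#include <stdint.h>

/* Pure expression: the compiler can fold it when x is a constant. */
#define DEMO_BSWAP32_CONST(x)					\
	((((uint32_t)(x) & 0xff000000u) >> 24) |		\
	 (((uint32_t)(x) & 0x00ff0000u) >> 8) |			\
	 (((uint32_t)(x) & 0x0000ff00u) << 8) |			\
	 (((uint32_t)(x) & 0x000000ffu) << 24))

/* Run-time path; a real kernel would typically emit a bswap here. */
static inline uint32_t
demo_bswap32_var(uint32_t x)
{
	return (DEMO_BSWAP32_CONST(x));
}

/* Pick the foldable form when x is known at compile time. */
#define DEMO_BSWAP32(x)						\
	(__builtin_constant_p(x) ? DEMO_BSWAP32_CONST(x) :	\
	    demo_bswap32_var(x))
```

With this shape DEMO_BSWAP32(1) is itself a constant (0x01000000), so
callers no longer need to cache the swapped value by hand.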
Obtained from: Ideas from NetBSD and Linux, and some code from NetBSD
of "dumb" PCI-based serial/parallel boards get a hint how to enable
them.
I wasn't sure about the ia64, pc98, powerpc, and sparc64 archs whether
they'd support puc(4) or not.
reserved bits in the port that must be zero are 24:30, not 20:30. Bits
16:23 are used to set the bus number. This meant that when we tested for
config mechanism #1, if the previous PCI configuration transaction sent
used a bus number greater than 15, one of the bits in 20:23 would be
non-zero and we would fail to use config mechanism #1 and thus fail to see
that PCI existed on the machine at all.
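
The effect of the wrong mask can be reproduced with a few lines of
arithmetic. In the sketch below the demo_* names are hypothetical; the
field layout (enable in bit 31, bus in 16:23, device in 11:15, function
in 8:10) is the standard configuration mechanism #1 address format.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CONF1_ENABLE	0x80000000u	/* bit 31: enable */
#define DEMO_CONF1_RSVD_OK	0x7f000000u	/* bits 24:30 must be zero */
#define DEMO_CONF1_RSVD_BAD	0x7ff00000u	/* bits 20:30: overlaps the bus field */

/* Build a mechanism #1 address for bus/device/function/register. */
static uint32_t
demo_conf1_addr(unsigned bus, unsigned dev, unsigned func, unsigned reg)
{
	return (DEMO_CONF1_ENABLE | ((bus & 0xffu) << 16) |
	    ((dev & 0x1fu) << 11) | ((func & 0x7u) << 8) | (reg & 0xfcu));
}

/* Does a value read back from the address port look like mechanism #1? */
static int
demo_conf1_plausible(uint32_t port_val, uint32_t reserved_mask)
{
	return ((port_val & reserved_mask) == 0);
}
```

A leftover address for bus 16 is 0x80100000. It passes the 24:30 check
but fails the 20:30 check because bit 20 belongs to the bus number.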
Obtained from: Shanley's PCI System Architecture book
Tested by: des
Proxied through: njl
user mode. This goes with rev.1.468 of machdep.c which changed the gates
for these traps to interrupt gates. Having the interrupts disabled for
these traps from user mode is just an unwanted side effect.
This fixes at least 1 case of "panic: absolutely cannot call
smp_ipi_shootdown with interrupts already disabled". Too much code was
run with interrupts disabled, and it sometimes hit a sanity check.
Fix verified by: deischen
pmap_remove_pte(), passed NULL instead of the required page table
page to pmap_unuse_pt(). Compute the necessary page table page
in pmap_remove_pte(). Also, remove some unreachable code from
pmap_remove_pte().
intpin register is expressed in hardware where 0 means none, 1 means INTA,
2 INTB, etc. The other way is commonly used in loops where 0 means INTA,
1 means INTB, etc. The matchpin argument to pci_cfgintr_search() is
supposed to be the first form, but we passed in a loop index of the
second. This fix adds one to the loop index to convert to the first form.
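
The two conventions differ only by one, which a small sketch makes
explicit. The helper names below are hypothetical and are not part of
the PCI code.

```c
#include <assert.h>

/*
 * Register form: 0 = none, 1 = INTA, 2 = INTB, 3 = INTC, 4 = INTD.
 * Loop form:     0 = INTA, 1 = INTB, 2 = INTC, 3 = INTD.
 */
static int
demo_intpin_from_index(int index)
{
	assert(index >= 0 && index < 4);
	return (index + 1);
}

/* Letter for a register-form value; '-' when no pin is used. */
static char
demo_intpin_name(int regval)
{
	return (regval == 0 ? '-' : (char)('A' + regval - 1));
}
```

Passing the raw loop index where the register form is expected makes the
first iteration look like "no pin" and shifts every other pin down by one.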
Reported by: Pavlin Radoslavov <pavlin@icir.org>
written by Stuart Walsh and Duncan Barclay (with some kibitzing by
me). I'm checking it in on Stuart's behalf.
The BCM4401 is built into several x86 laptop and desktop systems. For the
moment, I have only enabled it in the x86 kernel config because although
it's a PCI device, I haven't heard of any standalone NICs that use it. If
somebody knows of one, we can easily add it to the other arches.
This driver uses register/structure data gleaned from the Linux
driver released by Broadcom, but does not contain any of the code
from the Linux driver itself. It uses busdma.
atomically extracts and holds the physical page that is associated with the
given pmap and virtual address. Such a function is needed to make the
memory mapping optimizations used by, for example, pipes and raw disk I/O
MP-safe.
Reviewed by: tegge
rl(4) driver and put it in a new re(4) driver. The re(4) driver shares
the if_rlreg.h file with rl(4) but is a separate module. (Ultimately
I may change this. For now, it's convenient.)
rl(4) has been modified so that it will never attach to an 8139C+
chip, leaving it to re(4) instead. Only re(4) has the PCI IDs to
match the 8169/8169S/8110S gigE chips. if_re.c contains the same
basic code that was originally bolted onto if_rl.c, with the
following updates:
- Added support for jumbo frames. Currently, there seems to be
a limit of approximately 6200 bytes for jumbo frames on transmit.
(This was determined via experimentation.) The 8169S/8110S chips
apparently are limited to 7.5K frames on transmit. This may require
some more work, though the framework to handle jumbo frames on RX
is in place: the re_rxeof() routine will gather up frames that span
multiple 2K clusters into a single mbuf list.
- Fixed bug in re_txeof(): if we reap some of the TX buffers,
but there are still some pending, re-arm the timer before exiting
re_txeof() so that another timeout interrupt will be generated, just
in case re_start() doesn't do it for us.
- Handle the 'link state changed' interrupt
- Fix a detach bug. If re(4) is loaded as a module, and you do
tcpdump -i re0, then you do 'kldunload if_re,' the system will
panic after a few seconds. This happens because ether_ifdetach()
ends up calling the BPF detach code, which notices the interface
is in promiscuous mode and tries to switch promisc mode off while
detaching the BPF listener. This ultimately results in a call
to re_ioctl() (due to SIOCSIFFLAGS), which in turn calls re_init()
to handle the IFF_PROMISC flag change. Unfortunately, calling re_init()
here turns the chip back on and restarts the 1-second timeout loop
that drives re_tick(). By the time the timeout fires, if_re.ko
has been unloaded, which results in a call to invalid code and
blows up the system.
To fix this, I cleared the IFF_UP flag before calling ether_ifdetach(),
which stops the ioctl routine from trying to reset the chip.
- Modified comments in re_rxeof() relating to the difference in
RX descriptor status bit layout between the 8139C+ and the gigE
chips. The layout is different because the frame length field
was expanded from 12 bits to 13, and they got rid of one of the
status bits to make room.
- Add diagnostic code (re_diag()) to test for the case where a user
has installed a broken 32-bit 8169 PCI NIC in a 64-bit slot. Some
NICs have the REQ64# and ACK64# lines connected even though the
board is 32-bit only (in this case, they should be pulled high).
This fools the chip into doing 64-bit DMA transfers even though
there is no 64-bit data path. To detect this, re_diag() puts the
chip into digital loopback mode and sets the receiver to promiscuous
mode, then initiates a single 64-byte packet transmission. The
frame is echoed back to the host, and if the frame contents are
intact, we know DMA is working correctly, otherwise we complain
loudly on the console and abort the device attach. (At the moment,
I don't know of any way to work around the problem other than
physically modifying the board, so until/unless I can think of a
software workaround, this will have to do.)
- Created re(4) man page
- Modified rlphy.c to allow re(4) to attach as well as rl(4).
Note that this code works for the sample 8169/Marvell 88E1000 NIC
that I have, but probably won't work for the 8169S/8110S chips.
RealTek has sent me some sample NICs, but they haven't arrived yet.
I will probably need to add an rlgphy driver to handle the on-board
PHY in the 8169S/8110S (it needs special DSP initialization).
Quick fix for calling DELAY() for ddb input in some (atkbd-based)
console drivers. ddb must not use any normal locks, but DELAY()
normally calls getit() which needs clock_lock. One problem with using
normal locks in ddb is that deadlock is possible, but deadlock on
clock_lock is unlikely because clock_lock is bogusly recursive,
apparently just to hide the problem of ddb using it. The i8254 clock
hardware has mostly write-only registers so it is important for it to
use a lock that gives exclusive access. (atkbd hardware is also
unfriendly to reentrant software but that problem is more local and
already solved.) I mostly saw the symptoms of the bug caused by
unlocking in getit() running cpu_unpend(). cpu_unpend() should not
be called while in ddb and Debugger() calls for failing assertions
about this caused a breakpoint within ddb.
ddb must also not call getit() because ddb may be being used to step
through clock initialization code that has stopped or otherwise mangled
the clock. If the clock is stopped, then getit() always returns the
same value and DELAY() takes forever if it trusts getit().
The quick fix is to implement DELAY(n) as (n * timer_freq / 1000000)
inb(0x84)'s if ddb is active.
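
The iteration count in that formula can be worked through as below. This
is only a sketch of the arithmetic: the function name is hypothetical,
the frequency is the nominal i8254 input clock, and the formula appears
to treat each inb(0x84) as costing roughly one timer tick.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TIMER_FREQ 1193182u	/* nominal i8254 input clock, in Hz */

/* Number of dummy port reads used to approximate a delay of usec. */
static uint64_t
demo_ddb_delay_iterations(uint32_t usec, uint32_t timer_freq)
{
	/* Widen before multiplying so large delays do not overflow. */
	return ((uint64_t)usec * timer_freq / 1000000u);
}
```

A 10 microsecond delay therefore becomes 11 port reads, and no lock or
clock read is needed to produce it.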
machdep.c:
Don't permit recursion on clock_lock.
kdb_trap(). Stopping the other CPUs acts like locking them out, but
it wasn't done early enough or held long enough to prevent concurrent
accesses to shared data. In particular, the saved regs could be
clobbered.
sockets into machine-dependent files. The rationale for this
migration is illustrated by the modified amd64 allocator. It uses the
amd64's direct map to avoid ephemeral mappings in the kernel's
address space. On an SMP, the ephemeral mappings result in an IPI
for TLB shootdown for each transmitted page. Yuck.
Maintainers of other 64-bit platforms with direct maps should be able
to use the amd64 allocator as a reference implementation.
change also disables interrupts around non-S4 suspends whereas before we
did not do this. Our version of AcpiEnterSleepStateS4bios was almost
identical to the ACPICA version.
- Add a new PCIM_HDRTYPE constant for the field in PCIR_HDRTYPE that holds
the header type.
- Replace several magic numbers with appropriate constants for the header
type register and a couple of PCI_FUNCMAX.
- Merge to amd64 the fix to the i386 bridge code to skip devices with
unknown header types.
Requested by: imp (1, 2)
_pmap_allocpte(): Guarantee that the page table page is zero filled before
adding it to the directory. Otherwise, a 2nd, 3rd, etc. thread could
access a nearby virtual address and use garbage for the address
translation.
Discussed with: peter, tegge
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.
ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).
alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a separate commit.
powerpc: MD prototypes left in cpu.h. Comment added.
Suggested by: bde
Tested with: make universe (pc98 incomplete)
A timecounter will be selected when registered if its quality is
not negative and no less than the current timecounter's.
Add a sysctl to report all available timecounters and their qualities.
Give the dummy timecounter a solid negative quality of minus a million.
Give the i8254 zero and the ACPI 1000.
The TSC gets 800, unless APM or SMP forces it negative.
Other timecounters default to zero quality and thereby retain current
selection behaviour.
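
The selection rule can be sketched in a few lines. The struct and
function names below are hypothetical and much simpler than the real
timecounter code; only the quality values are taken from the text above.

```c
#include <assert.h>

struct demo_timecounter {
	const char *name;
	int quality;
};

/* The dummy timecounter is active until something better registers. */
static struct demo_timecounter demo_dummy = { "dummy", -1000000 };
static struct demo_timecounter *demo_current = &demo_dummy;

/* Returns 1 if tc became the active timecounter, 0 otherwise. */
static int
demo_tc_register(struct demo_timecounter *tc)
{
	if (tc->quality < 0)
		return (0);
	if (tc->quality < demo_current->quality)
		return (0);
	demo_current = tc;
	return (1);
}
```

Registering the i8254 (0), then ACPI (1000), then the TSC (800) leaves
ACPI selected; a TSC forced negative is never selected automatically.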
- Add a macro for the logical shift needed to extract an APIC ID from
either the local APIC ICR Hi register or the APIC ID registers of
the local and IO APICs.
was masked. However KIMURA Yasuhiro-san noticed my mistake and was
kind enough to provide a better patch in PR 55581. I've merged that
into the routine. Hopefully I've not overlooked anything this time.
MFC After: 5 days
ioctls.
In the particular case of ptrace(), this commit more-or-less reverts
revision 1.53 of sys_process.c, which appears to have been erroneous.
Reviewed by: iedowse, jhb
parameter. The new name better reflects what the function does and
how it is used. The last parameter was always FALSE.
Note: In theory, gcc would perform constant propagation and dead code
elimination to achieve the same effect as removing the last parameter,
which is always FALSE. In practice, recent versions do not. So, there
is little point in letting unused code pessimize execution.
queues lock such that it isn't held around the call to get_pv_entry(),
which calls uma_zalloc(). At the point of the call to get_pv_entry(), the
lock isn't necessary and holding it could lead to recursive acquisition,
which isn't allowed.
that the page's busy flag could be relied upon to synchronize access to the
pv list. I don't any longer. See, for example, the call to
pmap_insert_entry() from pmap_copy().)
(short) types for the port arg of inb() (rev.1.56). The warning started
working for u_short types with gcc-3.3. The pessimizations exposed
by this have been fixed except for the cx and oltr drivers where the breakage
of the warning has been pushed to the drivers.
completeness. The pessimization is tiny compared with i/o port slowness
except on very old machines, but code that used signed short types for
i/o ports was unpessimized long ago, and the macro that detected it
recently started working for u_short types too. Use of bus space
should have made this moot long ago.
Not tested at runtime by: bde
they haven't been counted before. This test was omitted when bus_dmamap_load()
was merged into this function, and results in the pagesneeded field growing
without bounds when multiple deferrals happen.
Thanks to Paul Saab for beating his head against this for a few hours =-)
- Move the enabling of interrupts out of assembly and into C a few
instructions later at cpu_critical_fork_exit(). This puts more of the
MD critical section implementation under the MD critical section API
making it easier to test and develop alternative implementations.
Also change "Auto mode" to use a "special" value
instead of 0, and define and document it.
I had thought libpthread had already been switched to use auto mode but
it appears that patch hasn't been committed yet.
Discussed with: Davidxu
there is code that blindly allocates LDTEs starting at slot 6
and I guess it doesn't really matter to us if they overwrite the BSDI
syscall slot, since it isn't a BSDI binary. Also add some code to help track
down other such users (commented out for now).
Reviewed by: deischen@
type. We know about header types 0, 1 and 2. Ignore the rest in the
MD i386 code when we're looking for bridges. You cannot look at the
vendor tag. And if you don't you certainly can't look at function > 0
if the device isn't there.
The new soekris boards' GEODE cpu has issues with the old way. This
is reported to have fixed it.
MFC After: 2 days
considered to be good to try when it otherwise has no clue about which
interrupts to try. This is a band-aid and we really should try to
balance the IRQs that we arbitrarily pick, but it should help some
people that would otherwise get bad IRQs.
The other option would be to remove it, but I can imagine it may be useful
for the foreseeable future as we fiddle with segments in KSE and thr libraries,
that while many maps can exist and be loaded per tag, bus_dmamap_load() and
friends can only be called on one map at a time from the tag. This is
enforced via the mutex arguments in the tag.
Fixing this bug means that s/g lists can be arbitrarily long in length, and
also removes an ugly GNU-ism from the code. No API or ABI change is
incurred. Similar changes for other platforms are forthcoming.
created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE. In
the i386 case in particular, the pmap code would hook a special
page allocation routine that allocated from kernel_map and not kmem_map,
and so when/if the pageout daemon drained the zones, it could actually
push out slabs from the PV ENTRY zone but call UMA's default page_free,
which resulted in pages allocated from kernel_map being freed to
kmem_map; bad. kmem_free() ignores the return value of the
vm_map_delete and just returns. I'm not sure what the exact
repercussions could be, but it doesn't look good.
In the PAE case on i386, we also set-up a zone in pmap, so be
conservative for now and make that zone also ZONE_NOFREE and
ZONE_VM. Do this for the pmap zones for the other archs too,
although in some cases it may not be entirely necessary. We'd
rather be safe than sorry at this point.
Perhaps all UMA_ZONE_VM zones should by default be also
UMA_ZONE_NOFREE?
May fix some of silby's crashes on the PV ENTRY zone.
or free an LDT entry. The function has the following prototype:
int i386_set_ldt(int start_sel, union descriptor *descs, int num_sels);
Added Features:
o If start_sel is 0, num_sels is 1 and the descriptor pointed to by descs
is legal, then i386_set_ldt() will allocate a descriptor and return its
selector number.
o If num_sels is 1, start_sel is valid, and descs is NULL, then
i386_set_ldt() will free that descriptor (making it available to be
reallocated again later).
o If num_sels is 0, start_sel is 0 and descs is NULL then, as a special
case, i386_set_ldt() will free all descriptors.
Reviewed by: julian
use "\n\" instead of "\" at the end of each source line, and don't use
semicolons). Fixed some older style bugs on the same lines (mainly
English errors in comments).
with up to date comments. This fixes booting kernels with boot2
(except for loss of the features provided by loader) and is suitable
for MFC. Contrary to the old comments, most loaders don't clear the bss.
biosboot lost clearing of the bss in a code crunch in 1997, and boot2
never did it.
kan didn't notice the problem with gcc-3.3 putting variables that are
initialized to 0 in the bss until after committing gcc-3.3 because he
was already using essentially this patch. Before gcc-3.3, only the
non-critical `bootdev' variable was clobbered by clearing the bss.
MFC after: 3 days
HIDENAME() macro seems to be unimplementable in C. (HIDENAME() used
to use invalid token pasting using ## for the STDC case until gcc
started rejecting that; now it uses unportable token pasting using
juxtaposition in all cases.) This reduces use of HIDENAME() in the
kernel to only i386 and amd64 profiling code so that it doesn't bite
most kernels whenever gcc becomes stricter. Problems with HIDENAME()
in userland are smaller because userland mostly doesn't use strict
flags yet. There are some advantages to hiding the name of mcount,
but newer arches shouldn't do it; only amd64 does.
MFC after: 3 days
On second thoughts hide tmpstk better by staticizing it.
in the `video_state' structure, to larger ones (from u_char to
u_short). Each can now hold values at least as large as the
size of the array it is meant to point into.
This eliminates warnings printed by GCC 3.3.1 and hence makes
pcvt compilable using -Werror.
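A sketch of why the narrower type was a problem, assuming a hypothetical
array of 300 entries and 8-bit chars: an index stored in an unsigned char
cannot represent every position, while an unsigned short can. The struct and
function names are illustrative and are not taken from pcvt.

```c
#include <stddef.h>

#define MODEL_TABLE_SIZE 300    /* hypothetical array size, larger than 255 */

struct narrow_state { unsigned char  idx; };   /* old-style field */
struct wide_state   { unsigned short idx; };   /* widened field */

/* Stores i in the narrow field and returns what was actually kept. */
size_t store_narrow(struct narrow_state *s, size_t i)
{
    s->idx = (unsigned char)i;      /* values above UCHAR_MAX wrap */
    return s->idx;
}

/* Stores i in the wide field and returns what was actually kept. */
size_t store_wide(struct wide_state *s, size_t i)
{
    s->idx = (unsigned short)i;     /* fits for any i < MODEL_TABLE_SIZE */
    return s->idx;
}
```

With the narrow field, a bounds check of the stored value against the array
size can never fail, which is the kind of comparison compilers warn about.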
memory in bus_dmamem_alloc(). This is possible now that
contigmalloc() supports the M_ZERO flag.
- Remove the locking of Giant around calls to contigmalloc() since
contigmalloc() now grabs Giant itself.
contain the file descriptor number on opens from userland.
The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to fdescfs and /dev/fd/* in particular.
For now pass -1 all over the place.
written as a template that when inlined is specialized for the caller
through constant value propagation and dead code elimination. Thus,
the specialized code that is generated for pmap_clear_reference() et
al. avoids several conditional branches inside of a loop.
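A minimal sketch of the template technique described above, assuming a flat
array of 32-bit entries in place of real page tables. The MODEL_ names and
functions are hypothetical. Each wrapper passes a compile-time constant, so
an optimizing compiler can inline the template and drop the branch on `bit`
from the loop.

```c
#include <stdint.h>

#define MODEL_PG_RW 0x002u   /* writable */
#define MODEL_PG_A  0x020u   /* accessed (referenced) */
#define MODEL_PG_M  0x040u   /* modified */

/*
 * Template: clears `bit` in every entry that has it set and returns the
 * number of entries changed.  When `bit` is a constant at the call site,
 * the comparison below is resolved at compile time.
 */
static inline int model_clear_ptes(uint32_t *ptes, int n, uint32_t bit)
{
    int changed = 0;

    for (int i = 0; i < n; i++) {
        if ((ptes[i] & bit) == 0)
            continue;
        if (bit == MODEL_PG_RW)
            /* Removing write access also clears the modified bit. */
            ptes[i] &= ~(MODEL_PG_M | MODEL_PG_RW);
        else
            ptes[i] &= ~bit;
        changed++;
    }
    return changed;
}

/* Specializations: one constant argument each. */
int model_clear_reference(uint32_t *ptes, int n)
{
    return model_clear_ptes(ptes, n, MODEL_PG_A);
}

int model_clear_modify(uint32_t *ptes, int n)
{
    return model_clear_ptes(ptes, n, MODEL_PG_M);
}

int model_remove_write(uint32_t *ptes, int n)
{
    return model_clear_ptes(ptes, n, MODEL_PG_RW);
}
```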
fields in the low 32 bits of the local APIC ICR register. Use this macro
in place of APIC_RESV2_MASK when masking off existing bits from the ICR
when writing to it to send an IPI.
Tested by: scottl
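A sketch of the masking step, assuming the reserved bits of the low ICR word
are bits 13, 16-17 and 20-31 (my reading of the documented register layout).
MODEL_ICRLO_RESV_MASK and model_icr_lo_compose() are hypothetical names, and
the register is modelled as a plain integer rather than a memory-mapped
access.

```c
#include <stdint.h>

/* Reserved bits in the low 32 bits of the ICR: 13, 16-17, 20-31. */
#define MODEL_ICRLO_RESV_MASK 0xfff32000u

/*
 * Builds the value to write back to the low ICR word: reserved bits are
 * preserved from the current register contents, everything else comes
 * from the new IPI command.
 */
uint32_t model_icr_lo_compose(uint32_t current, uint32_t command)
{
    return (current & MODEL_ICRLO_RESV_MASK) |
        (command & ~MODEL_ICRLO_RESV_MASK);
}
```

A mask that covers only the top 12 bits would let stale values in bits 13,
16 and 17 be overwritten by the command word.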
LAZY_SWITCH changes. He pointed out the acpi code sets up an identity
mapping in the current vmspace and that got messed up by the %cr3 being
out of sync with the current page directory. As a workaround, restore
%cr3 across the sleep/resume. A more complete fix would be to undo the
lazy state and clear the pm_active bit from the borrowed pmap, but this
works and people are currently hurting. I'll clean this up.
This is mostly Ian's patch, plus a PAE tweak from me.
work when using a graphics chipset which identifies itself as
`VIA CLE266', used in some VIA EPIA boards. Two values need to be
patched in the VESA mode information structure: the widths of the modes
mentioned above are encoded in a format which was unknown to the VESA
module (and to my copy of the VBE spec.) whereas the window memory
segment values seem to be just incorrect.
I tested this on a VIA EPIA-M9000 and -M10000.
it to the bss section and skips the initialization. This causes all
sorts of havoc because the bogus bss zero code clobbered previously set
variables. All our supported boot loaders already zero the bss, even
kgzip for the elf case. Since we don't generate a.out kernels, the old
a.out bootblocks and the a.out kgzip are not a factor anymore.
reset them only if they were previously in use. Unconditionally
resetting the registers wipes them out frequently, which interferes
with their use for kernel debugging.
While I'm here, be less verbose in the associated comment of a
neighboring function.
Noticed by: bde
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.
This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.
disabled.
- Change the apm driver to match the acpi driver's behavior by checking to
see if the device is disabled in the identify routine instead of in the
probe routine. This way if the device is disabled it is never created.
Note that a few places (ips(4), Alpha SMP) used "disable" instead of
"disabled" for their hint names, and these hints must be changed to
"disabled". If this is a big problem, resource_disabled() can always be
changed to honor both names.
Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg.
Lockfunc allows a driver to provide a function for managing its locking
semantics while using busdma. At the moment, this is used for the
asynchronous busdma_swi and callback mechanism. Two lockfunc implementations
are provided: busdma_lock_mutex() performs standard mutex operations on the
mutex that is specified from lockfuncarg. dflt_lock() is a panic
implementation and is defaulted to when NULL, NULL are passed to
bus_dma_tag_create(). The only time that NULL, NULL should ever be used is
when the driver ensures that bus_dmamap_load() will not be deferred.
Drivers that do not provide their own locking can pass
busdma_lock_mutex,&Giant args in order to preserve the former behaviour.
sparc64 and powerpc do not provide real busdma_swi functions, so this is
largely a no-op on those platforms. The busdma_swi on ia64 is not properly
locked yet, so warnings will be emitted on this platform when busdma
callback deferrals happen.
If anyone gets panics or warnings from dflt_lock() being called, please
let me know right away.
Reviewed by: tmm, gibbs
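A userland sketch of the lockfunc idea, assuming a toy mutex and tag. None
of the model_ names below are the kernel API; they only mirror the shape
described above: a per-tag lock function plus argument, a mutex-based
implementation, and a default that refuses to run when no lock function was
supplied.

```c
#include <stddef.h>
#include <stdlib.h>

typedef enum { MODEL_LOCK, MODEL_UNLOCK } model_lock_op;
typedef void model_lockfunc(void *arg, model_lock_op op);

/* Toy mutex: just records state so the behaviour can be observed. */
struct model_mutex { int held; int acquisitions; };

/* Analogue of the mutex lockfunc: arg is the mutex to operate on. */
void model_lock_mutex(void *arg, model_lock_op op)
{
    struct model_mutex *m = arg;

    if (op == MODEL_LOCK) {
        m->held = 1;
        m->acquisitions++;
    } else {
        m->held = 0;
    }
}

/* Analogue of the default: a deferred callback here is a driver bug. */
void model_dflt_lock(void *arg, model_lock_op op)
{
    (void)arg;
    (void)op;
    abort();            /* stands in for panic() */
}

struct model_tag { model_lockfunc *lockfunc; void *lockfuncarg; };

/* Passing NULL, NULL selects the panicking default. */
void model_tag_init(struct model_tag *t, model_lockfunc *f, void *arg)
{
    t->lockfunc = (f != NULL) ? f : model_dflt_lock;
    t->lockfuncarg = (f != NULL) ? arg : NULL;
}

/* Deferred path: the driver's lock is held around its callback. */
void model_run_deferred(struct model_tag *t, void (*cb)(void *), void *cbarg)
{
    t->lockfunc(t->lockfuncarg, MODEL_LOCK);
    cb(cbarg);
    t->lockfunc(t->lockfuncarg, MODEL_UNLOCK);
}

/* Sample callback that records whether the mutex was held when it ran. */
struct model_probe { const struct model_mutex *m; int saw_held; };

void model_probe_cb(void *arg)
{
    struct model_probe *p = arg;

    p->saw_held = p->m->held;
}
```

A driver that guarantees its loads are never deferred would never reach
model_run_deferred(), which is why the default may safely abort.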
implementation of a largely MI pmap_object_init_pt() for vnode-backed
objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64
and powerpc.
- Correct a mismatch between pmap_object_init_pt()'s prototype and its
various implementations. (I plan to keep pmap_object_init_pt() as
the MD hook for device-backed objects on i386 and amd64.)
- Correct an error in ia64's pmap_enter_quick() and adjust its interface
to match the other versions. Discussed with: marcel
bus_dma async callback scheme. Note that sparc64 does not seem to do
async callbacks. Note that ia64 callbacks might not be MPSAFE at the
moment. Note that powerpc doesn't seem to do async callbacks due to
the implementation being incomplete.
Reviewed by: mostly silence on arch@
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.
By giving the vnode a separate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up for subtype-specific use.
At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
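A sketch of the simplification, assuming a cut-down file structure. The
model_ types and the particular set of subtypes are hypothetical; only the
f_type, f_data and f_vnode field names come from the description above.

```c
#include <stddef.h>

enum model_ftype { MODEL_DTYPE_VNODE, MODEL_DTYPE_FIFO, MODEL_DTYPE_SOCKET };

struct model_vnode { int v_id; };

struct model_file {
    enum model_ftype    f_type;
    void               *f_data;   /* subtype-specific pointer */
    struct model_vnode *f_vnode;  /* NULL when no vnode is associated */
};

/* Before: every caller has to know which subtypes carry a vnode. */
struct model_vnode *model_file_vnode_old(const struct model_file *fp)
{
    if (fp->f_type == MODEL_DTYPE_VNODE || fp->f_type == MODEL_DTYPE_FIFO)
        return fp->f_data;
    return NULL;
}

/* After: one field answers the question for all subtypes. */
struct model_vnode *model_file_vnode_new(const struct model_file *fp)
{
    return fp->f_vnode;
}
```

While f_data still points at the vnode, both functions agree, so callers can
be converted one at a time.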
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.
Two details:
1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable the guard page, set
KSTACK_GUARD_PAGES to 0.
2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x (but not 4.x), PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().
while after the legacy device was added since this driver hangs from
legacy and not nexus.
- Make several methods non-static so they can be reused in a mptable
host -> pci bridge driver that will be added at a later date.
- Let legacy_pcib() use pcibios_pcib_route_interrupt() directly instead of
wrapping it in a private function. Originally, I thought I was going to
have the nexus_pcib() driver make a runtime APIC vs. 8259A check and call
the appropriate routing method (MPTable vs. PIR) that way, but it ended
up being cleaner to make nexus_pcib() just work with PIR and have a
separate host -> pci bridge driver for the mptable/apic case.
bridge lives on (i.e., the parent bus) when probing the PIR table for a
bus. This could cause the PCIBIOS PCI-PCI bridge driver to bogusly attach
to bridges that weren't in the PIR but whose parent bus was in the PIR.
we were passing in a void* representing the PCB of the parent thread.
Now we pass a pointer to the parent thread itself.
The prime reason for this change is to allow cpu_set_upcall() to copy
(parts of) the trapframe instead of having it done in MI code in each
caller of cpu_set_upcall(). Copying the trapframe cannot always be
done with a simple bcopy() or may not always be optimal that way. On
ia64 specifically the trapframe contains information that is specific
to an entry into the kernel and can only be used by the corresponding
exit from the kernel. A trapframe copied verbatim from another frame
is in most cases useless without some additional normalization.
Note that this change removes the assignment to td->td_frame in some
implementations of cpu_set_upcall(). The assignment is redundant.
A previous call to cpu_thread_setup() already did the exact same
assignment. An added benefit of removing the redundant assignment is
that we can now change td_pcb without nasty side-effects.
This change officially marks ia64 as able to support 1:1 threading.
Not tested on: amd64, powerpc
Compile & boot tested on: alpha, sparc64
Functionally tested on: i386, ia64
on my part. The output asm looks correct with the previous commit in place
and it works on amd64, but on my laptop I got a spew of AE_BAD_PARAMETER
errors trying to unlock the acpi global lock.
and releasing ACPI global locks instead of (ab)using the pointers to those
locks as the constants. Also, rather than require that the address of
the lock be stored in a register, use a memory constraint allowing the
memory address to be used directly.
Noticed by: peter
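The inline assembly itself is not reproduced here. As an illustration of
returning named constants instead of reusing the lock pointer as the result,
this is a C11 atomics sketch of the global lock handshake as I understand it
from the ACPI specification (bit 0 pending, bit 1 owned). The MODEL_GL_
constants and function names are hypothetical, and C11 atomics are swapped in
for the commit's asm with a memory constraint.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_GL_BIT_PENDING 0x1u
#define MODEL_GL_BIT_OWNED   0x2u

/* Named results, rather than overloading a pointer value. */
#define MODEL_GL_ACQUIRED 1     /* caller now owns the lock */
#define MODEL_GL_BUSY     0     /* lock was owned; pending bit is now set */
#define MODEL_GL_RELEASE_NOTIFY 1   /* a waiter must be signalled */
#define MODEL_GL_RELEASE_QUIET  0

/* Tries to take the lock; marks it pending if someone else owns it. */
int model_gl_acquire(_Atomic uint32_t *lock)
{
    uint32_t old = atomic_load(lock);
    uint32_t new;

    do {
        new = (old & ~MODEL_GL_BIT_PENDING) | MODEL_GL_BIT_OWNED;
        if (old & MODEL_GL_BIT_OWNED)
            new |= MODEL_GL_BIT_PENDING;
        /* On failure, old is refreshed with the current value. */
    } while (!atomic_compare_exchange_weak(lock, &old, new));

    return (old & MODEL_GL_BIT_OWNED) ? MODEL_GL_BUSY : MODEL_GL_ACQUIRED;
}

/* Drops the lock and reports whether a waiter was pending. */
int model_gl_release(_Atomic uint32_t *lock)
{
    uint32_t old = atomic_load(lock);
    uint32_t new;

    do {
        new = old & ~(MODEL_GL_BIT_PENDING | MODEL_GL_BIT_OWNED);
    } while (!atomic_compare_exchange_weak(lock, &old, new));

    return (old & MODEL_GL_BIT_PENDING) ?
        MODEL_GL_RELEASE_NOTIFY : MODEL_GL_RELEASE_QUIET;
}
```

Operating on the lock through a pointer to the atomic object plays the role
of the memory constraint: the lock word is addressed directly and no separate
copy of its address has to be kept as a result value.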