- Add the following functions to the api: sched_bind(), sched_unbind(),
sched_pin(), and sched_unpin(). Bind/unbind are used for traditional
cpu binding. Pin and unpin are meant to allow the kernel to hold a thread
on a particular cpu so that it may cache per-cpu data without fear of
being migrated.
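A minimal user-space sketch of the pin/unpin idea, assuming a per-thread
nesting counter. The struct, its fields, and the sketch_* names are
hypothetical and are not taken from the scheduler sources:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler's per-thread state. */
struct thread_sketch {
	int pinned;	/* nesting depth of pin requests */
	int cpu;	/* cpu the thread last ran on */
};

/* Hold the thread on its current cpu; calls may nest. */
static void
sketch_pin(struct thread_sketch *td)
{
	td->pinned++;
}

/* Drop one pin; the thread may migrate again once the count reaches 0. */
static void
sketch_unpin(struct thread_sketch *td)
{
	assert(td->pinned > 0);
	td->pinned--;
}

/* A scheduler would consult this before moving the thread elsewhere. */
static bool
sketch_can_migrate(const struct thread_sketch *td)
{
	return (td->pinned == 0);
}
```

A caller would bracket its per-cpu access with sketch_pin() and
sketch_unpin(); the counter lets such sections nest safely.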
no matter where in the directory structure it may be. Use this and the "-k"
flag in the generated gdbinit files so that the "getsyms" function in gdb
requires no user intervention to run and will find every module if they're
in the kernel build's module directory. This is still quite useful for
cases where gdb knows that the path for some modules is /boot/kernel and
others are in the object directory for /usr/src/sys/$ARCH/compile/kernel.
Approved by: grog
waitrunningbufspace() calls so that they are always able to
proceed and clean up buffer space.
Submitted by: Brian Fundakowski Feldman <green@freebsd.org>
o Fix minor type problems
o Fix minor problem with a couple debug printfs.
o Default to a sane media type when none is reported.
o Minor style changes
The PR claims this will fix the IBM 300GL cards.
Submitted by: Max Gotlib
PR: 11462
Use zpfind() to see if the process became a zombie if pfind() doesn't find it
and if the caller wants to know about process death, so that the caller knows
the process died even if it happened before the kevent was actually registered.
MFC after: 1 week
This fixes a race condition (specifically with signal events) that could
lead to the kn being re-inserted into the list after it has been destroyed,
which is not something we want to happen.
PR: kern/58258
operating mode to HostAP, the card will lock up indefinitely (but
the wi(4) driver can recover if you eject the card). The problem is
that the card needs to be "reset" in a way before you even change the
media to hostap. In practice this isn't as noticeable because you
probably do some operation beforehand which prevents the lock-up
before you enable hostap mode.
e.g.:
"ifconfig wi0 up media autoselect mediaopt hostap" will lock up
(if you just inserted the card).
"ifconfig wi0 up ssid foo media autoselect mediaopt hostap" won't lock up.
- Compile 'device acpi' into GENERIC by default as well. Note that
the beastie loader menu item to disable ACPI still works if ACPI is
compiled into the kernel.
let the MD code choose whether or not to implement such a policy. The new
i386 interrupt code allows multiple FAST handlers for a given source for
example. However, the code does not allow FAST and non-FAST handlers to be
mixed.
we would manage this better by having the interrupt code add each
interrupt vector to the resource map when each source is registered.
- Use the new interrupt code API for registering and tearing down interrupt
handlers.
- The MP code no longer knows anything specific about an MP Table.
Instead, the local APIC code adds CPUs via the cpu_add() function when
a local APIC is enumerated by an APIC enumerator.
- Don't divide the argument to mp_bootaddress() by 1024 just so that we
can turn around and multiply it by 1024 again.
- We no longer panic if SMP is enabled but we are booted on a UP machine.
- init_secondary(), the asm code between init_secondary() and ap_init()
in mpboot.s and ap_init() have all been merged together in C into
init_secondary().
- We now use the cpuid feature bits to determine if we should enable
PSE, PGE, or VME on each AP.
- Due to the change in the implementation of critical sections, acquire
the SMP TLB mutex around a slightly larger chunk of code for TLB
shootdowns.
- Remove some of the debug code from the original SMP implementation
that is no longer used or no longer applies to the new APIC code.
- Use a temporary hack to disable the ACPI module until the SMP code has
been further reorganized to allow ACPI to work as a module again.
- Add a DDB command to dump the interesting contents of the IDT.
devices claiming resources that they don't actually use. The PIC drivers
only register valid interrupt sources, so we don't need to rely on these
drivers to claim invalid IRQs to prevent their use by other drivers.
slave pin on the master PIC in the !APIC_IO case. The PIC drivers now
manage these details internally.
- Remove an spl0() that hasn't done anything since SMPng was first
committed.
- Update some comments that have rotted since SMPng.
- Use intr_suspend/resume() callouts to the interrupt code layer which
suspends and resumes all the known interrupt sources instead of calling
icu_reinit() directly.
APIC Descriptor Table to enumerate both I/O APICs and local APICs. ACPI
does not embed PCI interrupt routing information in the MADT like the MP
Table does. Instead, ACPI stores the PCI interrupt routing information
in the _PRT object under each PCI bus device. The MADT table simply
provides hints about which interrupt vectors map to which I/O APICs. Thus
when using ACPI, the existing ACPI PCI bridge drivers are sufficient to
route PCI interrupts.
- The apic interrupt entry points have been rewritten so that each entry
point can serve 32 different vectors. When the entry is executed, it
uses one of the 32-bit ISR registers to determine which vector in its
assigned range was triggered. Thus, the apic code can support 159
different interrupt vectors with only 5 entry points.
- We now always disable the local APIC to work around an erratum in
certain PPros and then re-enable it again if we decide to use the APICs
to route interrupts.
- We no longer map IO APICs or local APICs using special page table
entries. Instead, we just use pmap_mapdev(). We also no longer
export the virtual address of the local APIC as a global symbol to
the rest of the system, but only in local_apic.c. To aid this, the
APIC ID of each CPU is exported as a per-CPU variable.
- Interrupt sources are provided for each intpin on each IO APIC.
Currently, each source is given a unique interrupt vector meaning that
PCI interrupts are not shared on most machines with an I/O APIC.
That mapping for interrupt sources to interrupt vectors is up to the
APIC enumerator driver however.
- We no longer probe to see if we need to use mixed mode to route IRQ 0,
instead we always use mixed mode to route IRQ 0 for now. This can be
disabled via the 'NO_MIXED_MODE' kernel option.
- The npx(4) driver now always probes to see if a built-in FPU is present
since this test can now be performed with the new APIC code. However,
an SMP kernel will panic if there is more than one CPU and a built-in
FPU is not found.
- PCI interrupts are now properly routed when using APICs to route
interrupts, so remove the hack to pseudo-route interrupts when the
intpin register was read.
- The apic.h header was moved to apicreg.h and a new apicvar.h header
that declares the APIs used by the new APIC code was added.
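The shared entry point scheme above (one entry per 32 vectors, with a
32-bit ISR register picking out the vector) can be illustrated in C. This
is only a sketch: the real entry points are assembly, the function name is
hypothetical, and choosing the highest set bit is an assumption of this
example.

```c
#include <assert.h>
#include <stdint.h>

#define VECTORS_PER_ENTRY 32

/*
 * Return the vector for the highest bit set in one 32-bit ISR word, or
 * -1 if no bit is set.  'word' is the index of the ISR word, so word 1
 * covers vectors 32-63, word 2 covers 64-95, and so on.
 */
static int
isr_word_to_vector(unsigned int word, uint32_t isr)
{
	int bit;

	assert(word < 8);
	for (bit = VECTORS_PER_ENTRY - 1; bit >= 0; bit--) {
		if (isr & ((uint32_t)1 << bit))
			return ((int)(word * VECTORS_PER_ENTRY) + bit);
	}
	return (-1);
}
```

Each entry point only needs to know which ISR word it owns; the bit
position inside that word supplies the rest of the vector number.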
default we provide 16 interrupt sources for IRQs 0 through 15. However,
if the I/O APIC driver has already registered sources for any of those IRQs
then we will silently fail to register our own source for that IRQ.
Note that i386/isa/icu.h is now specific to the 8259A and no longer
contains any info relevant to APICs. Also note that fast interrupts no
longer use a separate entry point. Instead, both fast and threaded
interrupts share the same entry point which merely looks up the appropriate
source and passes control to intr_execute_handlers().
that provides methods via a PIC driver to do things like mask a source,
unmask a source, enable it when the first interrupt handler is added, etc.
The interrupt code provides a table of interrupt sources indexed by IRQ
numbers, or vectors. These vectors are what new-bus uses for its IRQ
resources and for bus_setup_intr()/bus_teardown_intr(). The interrupt
code then maps that vector to a given interrupt source object. When an
interrupt comes in, the low-level interrupt code looks up the interrupt
source for the source that triggered the interrupt and hands it off to
this code to execute the appropriate handlers.
By having an interrupt source abstraction, this allows us to have different
types of interrupt source providers within the shared IRQ address space.
For example, IRQ 0 may map to pin 0 of the master 8259A PIC, IRQs 1
through 60 may map to pins on various I/O APICs, and IRQs 120 through
128 may map to MSI interrupts for various PCI devices.
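A rough sketch of the abstraction described above, assuming a flat table
indexed by IRQ and a small set of PIC methods. All names here are
hypothetical and the real structures carry more state:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_IRQS_SKETCH 256	/* hypothetical table size */

struct intsrc_sketch;

/* Methods a PIC driver supplies for the sources it owns. */
struct pic_sketch {
	void (*mask)(struct intsrc_sketch *);
	void (*unmask)(struct intsrc_sketch *);
};

/* One interrupt source: an 8259A pin, an I/O APIC pin, an MSI message. */
struct intsrc_sketch {
	struct pic_sketch *pic;
	int masked;
};

static struct intsrc_sketch *sources[NUM_IRQS_SKETCH];

/* Example PIC methods that only track the mask state. */
static void sketch_mask(struct intsrc_sketch *isrc) { isrc->masked = 1; }
static void sketch_unmask(struct intsrc_sketch *isrc) { isrc->masked = 0; }
static struct pic_sketch sketch_pic = { sketch_mask, sketch_unmask };

/* Register a source for an IRQ; fail if another provider got there first. */
static int
sketch_register_source(int irq, struct intsrc_sketch *isrc)
{
	assert(isrc != NULL && isrc->pic != NULL);
	if (irq < 0 || irq >= NUM_IRQS_SKETCH || sources[irq] != NULL)
		return (-1);
	sources[irq] = isrc;
	return (0);
}

/* Low-level interrupt code would call this to find the source to run. */
static struct intsrc_sketch *
sketch_lookup_source(int irq)
{
	if (irq < 0 || irq >= NUM_IRQS_SKETCH)
		return (NULL);
	return (sources[irq]);
}
```

Because callers only see the source object and its PIC methods, sources
backed by different hardware can sit side by side in the same table.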
the root path. This is reported to make non-PXE netbooting, such as
is used on sparc64 systems, work correctly when the TFTP server is
not the same as the root server.
PR: kern/57328
Submitted by: Per Kristian Hove <Per.Hove@math.ntnu.no>
header copy made on input path: this is now handled differently.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
is really EtherExpress or EEPro or what, but it does appear in a
couple of ethernet cards that have appeared recently on ebay. Silicom
appears to make these cards, and they have the 82595TX chipset in
them, and sometimes uarts. The ex driver needs some work to support
these cards, but I thought I'd get the device into pccarddevs.
The hardware driver decides the name under /dev/led and provides
the function to turn the lamp on/off.
All leds are serviced by a single timeout which runs at a basic rate
of hz/10.
The LED is controlled by ascii strings as follows.
0 Turn off.
1 Turn on.
f Flash: _-
f2 Flash: __--
f3 Flash: ___---
f4...f9 etc.
d%d Digits. "d12": -__________-_-______________________________
s%s String, roll your own:
'a-j' gives on for (1...10)/10 sec.
'A-J' gives off for (1...10)/10 sec.
'sAaAbBa': _-_--__-
m%s Morse
'.' dot
'-' dash
' ' letter space
'\n' word space
My mdoc skills do not reach to express that.
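As an illustration of the flash and string commands, here is a small
user-space sketch that expands a command into one cycle of the pattern,
using the same notation as above ('_' off, '-' on, one character per 1/10
second). Following the 'sAaAbBa' example, lowercase letters are treated as
on and uppercase as off. The function names are hypothetical, this is not
the driver code, and treating "f1" like "f" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append n copies of c to out, keeping it NUL-terminated. */
static int
emit(char *out, size_t outlen, char c, int n)
{
	size_t len = strlen(out);

	if (len + (size_t)n + 1 > outlen)
		return (-1);
	memset(out + len, c, (size_t)n);
	out[len + (size_t)n] = '\0';
	return (0);
}

/* Expand "f", "fN" or "s..." into one cycle; returns 0 or -1 on error. */
static int
led_expand(const char *spec, char *out, size_t outlen)
{
	int error, n;

	assert(spec != NULL && out != NULL && outlen > 0);
	out[0] = '\0';
	if (spec[0] == 'f') {
		n = 1;
		if (spec[1] != '\0') {
			if (spec[1] < '1' || spec[1] > '9' || spec[2] != '\0')
				return (-1);
			n = spec[1] - '0';
		}
		if (emit(out, outlen, '_', n) != 0)
			return (-1);
		return (emit(out, outlen, '-', n));
	}
	if (spec[0] != 's')
		return (-1);
	for (spec++; *spec != '\0'; spec++) {
		if (*spec >= 'a' && *spec <= 'j')
			error = emit(out, outlen, '-', *spec - 'a' + 1);
		else if (*spec >= 'A' && *spec <= 'J')
			error = emit(out, outlen, '_', *spec - 'A' + 1);
		else
			return (-1);
		if (error != 0)
			return (-1);
	}
	return (0);
}
```

With a 64-byte buffer, led_expand("sAaAbBa", buf, sizeof(buf)) leaves
"_-_--__-" in buf, matching the example in the table.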
Add a sysctl declaration for hw.ata.atapi_dma, which had gone MIA (though
setting it in loader.conf still worked, it was not visible at runtime)
Approved by: sos
to the pci attachment. Cardbus is a derived class of pci so all pci
drivers are automatically available for matching against cardbus devices.
Reviewed by: imp
message encoding and decoding stuff into the base module. All of this
is accessed by several of the NgATM modules and putting this into
atmbase reduces the memory footprint.
cr.isr sanity check. We actually encounter insanities, which very
likely means that the insanity check itself is insane. Remove an empty
comment while I'm at it.
directly on the radix tree and does not hold any routing table references.
This fixes the reference counting problems that manifested themselves as a
panic during unmount of filesystems that were mounted by NFS over an
interface that had been removed.
Supported by: FreeBSD Foundation
idle. They figure out that we're idle fast enough that the cache pollution
introduced by scanning their run queue is more expensive than waiting
a little longer.
- Add kseq_setidle() to mark us as being idle. Use this in place of
kseq_find().
- Remove kseq_load_highest(), kseq_find() was the only consumer of this
interface. kseq_balance() has its own customized version that finds the
lowest and highest loads simultaneously.
Continuously told that this would be faster by: terry
GEOM was not designed to handle media that does not have
a size. Blank CD's are of that type, so cheat and set the
media size to -1. This allows burning to work, but makes
GEOM issue out-of-range reads that make the ATAPI subsystem
spew out a few warnings. GEOM should be taught about this.
GEOM was not designed to handle changing the sectorsize
between opens. Writing multitrack CD's with both audio and
data tracks needs to change sector size on the fly. We
cheat here and stuff the current sectorsize into GEOM
private internals. GEOM should grow some clean way for this.
o Fix MFC cards. We were bogusly setting CCR_IOBASE[01] and CCR_IOLIMIT.
Now when we activate the resource, we adjust these for MFC cards, per the
spec.
o Change type of pf_mfc_* to be bus_addr_t, which is more correct than
long.
This makes my 3C362D/3C363D and 3CXEM556 cards work! Woo Hoo!
o Remove redundant $FreeBSD$
o Better comments about ep_get_macaddr.
o remove one tab in a switch statement (style only)
o Recognize ID 0x0035 as the device ID for the 3CXEM556 that I have. This
makes the 3CXEM556 work for me. Not 100% sure this is the assigned ID,
as I don't have the datasheets for this part, but it does work and get
the correct ethernet address.
o Comment about the whole fake IRQ 3 thing. some need it, some don't, all
work with it.
the total load, the timeshare load, and the number of threads that can
be migrated to another cpu. Account for these separately.
- Introduce a KSE_CAN_MIGRATE() macro which determines whether or not a KSE
can be migrated to another CPU. Currently, this only checks to see if
we're an interrupt handler. Eventually this will also be used to support
CPU binding.
An example of useless is bios.h. An example of wrong is msdos.h (due
to the use of long for 32-bit fields).
display.h cannot be removed because it's used by syscons. That header
however has no platform dependency and shouldn't really be here.
Removal of these headers may cause build failures in the ports tree.
It's the ports that need fixing in that case.
Tested with: buildworld, LINT
previously only considered the send sequence space. Unfortunately,
some OSes (windows) still use a random positive increments scheme for
their syn-ack ISNs, so I must consider receive sequence space as well.
The value of 250000 bytes / second for Microsoft's ISN rate of increase
was determined by testing with an XP machine.
wasn't curthread, i.e. when we receive a thread pointer to use
as a function argument. Use VOP_UNLOCK/vrele in these cases.
The only case where td != curthread known at the moment is
boot() calling sync with thread0 pointer.
This fixes the panic on shutdown people have reported.
slice assignment. Add a comment describing what it does.
- Remove a stale XXX comment; nice should not impact the interactivity,
nice adjustments only affect non-interactive tasks in ULE.
- Don't allow nice -20 tasks to totally starve nice 0 tasks. Give them at
least SCHED_SLICE_MIN ticks. We still allow nice 0 tasks to starve nice
+20 tasks as intended.
- SCHED_PRI_NRESV does not have the off by one error in PRIO_TOTAL so we
do not have to account for it in the few places that we use it.
Requested by: bde
0 and SCHED_SLP_RUN_MAX * 2. This allows us to simplify the algorithm
quite a bit. Before, it dealt with arbitrary values which required us
to do nasty integer division tricks that didn't quite work out correctly.
- Change sched_wakeup() to detect conditions where the slp+runtime could
exceed SCHED_SLP_RUN_MAX * 2. This can happen if we go to sleep for
longer than 6 seconds. In this case, we'll just clear the runtime and
set the sleep time to the max.
- Define a new function, sched_interact_fork() which updates the slp+runtime
of a newly forked thread. We want to limit the amount of history retained
from the parent so that we learn the child's behavior quickly. We don't,
however want to decay it to nothing. Previously, we would simply divide
each parameter by 100 whenever we forked. After a few forks the values
would reach 0 and tasks would not be considered interactive.
- Add another KTR entry, cleanup some existing entries.
- Remove a useless sched_interact_update() from sched_priority(). This is
already done by the callers that require it.
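The bounding of slp+runtime described above can be sketched roughly as
follows. The constants, the struct, and the ratio-based scaling in the fork
path are assumptions made for illustration; only the overall behaviour
(clamp on wakeup, limit but do not erase history on fork) comes from the
description.

```c
#include <assert.h>

#define SLP_RUN_MAX_SKETCH	1000	/* stand-in for SCHED_SLP_RUN_MAX */
#define SLP_RUN_FORK_SKETCH	(SLP_RUN_MAX_SKETCH / 10)	/* assumed cap */

struct interact_sketch {
	int slptime;
	int runtime;
};

/* Wakeup path: if the sum would pass the bound, keep only "slept a lot". */
static void
sketch_wakeup(struct interact_sketch *in, int slept)
{
	assert(slept >= 0);
	if (slept > SLP_RUN_MAX_SKETCH * 2 - in->slptime - in->runtime) {
		in->runtime = 0;
		in->slptime = SLP_RUN_MAX_SKETCH;
	} else
		in->slptime += slept;
}

/* Fork path: scale both values by one ratio so their proportion survives. */
static void
sketch_interact_fork(struct interact_sketch *in)
{
	int sum = in->slptime + in->runtime;

	if (sum > SLP_RUN_FORK_SKETCH) {
		int ratio = sum / SLP_RUN_FORK_SKETCH;

		in->slptime /= ratio;
		in->runtime /= ratio;
	}
}
```

Unlike dividing by a fixed 100 at every fork, the scaling only applies
while the sum is above the cap, so repeated forks do not drive both
values to 0.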
destination objects are locked on entry and exit. Add comments to
the callers noting that the locks can be released by swap_pager_copy().
- Remove several instances of GIANT_REQUIRED.
we will generate for a given ip/port tuple has advanced far enough
for the time_wait socket in question to be safely recycled.
- Have in_pcblookup_local use tcp_twrecycleable to determine if
time_wait sockets which are hogging local ports can be safely
freed.
This change preserves proper TIME_WAIT behavior under normal
circumstances while allowing for safe and fast recycling whenever
ephemeral port space is scarce.
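The recycling test boils down to a wraparound-safe comparison in 32-bit
sequence space. A minimal sketch, assuming the only inputs are the ISN that
would be generated next and the highest sequence number the old connection
sent; the function names and parameters are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is after b" in 32-bit TCP sequence space. */
static bool
seq_gt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) > 0);
}

/*
 * Hypothetical check: a time_wait entry may be recycled once the ISN we
 * would hand out next lies beyond everything the old connection sent.
 */
static bool
sketch_tw_recycleable(uint32_t next_iss, uint32_t old_snd_nxt)
{
	return (seq_gt(next_iss, old_snd_nxt));
}
```

The signed difference keeps the comparison correct when the sequence
space wraps, e.g. a new ISN of 5 is "after" an old value of 0xfffffff0.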
o change os glue API to be compatible with Linux so hal.o's can
be used on any system
o add ABI version to catch driver-HAL mismatches
o move hal version information from ah_osdep.c to binary component
o remove ath_hal_wait os glue component
o assign constant values to all enums to avoid potential compiler
incompatibilities
o add support for 3Com badged cards (PCI vendor ID)
o add support for IBM mini-pci cards (PCI device ID)
o expose MAC, PHY, and radio hardware revisions
o support for big-endian platforms
o new method to set slot time in us
o bug fix for 5211: beacon timers not setup correctly
o bug fix for 5212: don't crash when handed a 5112 radio
Requested by: jhb
Initialize the real mode stack. This is needed at least for the return
address from the lcall.
Requested by: takawata
Fix style bugs in acpi_wakecode.S
Requested by: bde
Remove the kernel option now that we have the tunable.
to use the direct mapped KVA at KERNBASE to service the request. This also
allows pmap_mapdev() to be used for such addresses very early during the
boot process and might provide some small savings on KVA.
Reviewed by: peter
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo
device creation and configuration semantics.
Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
mbuf cluster, copy the data to a separate mbuf that does not use a
cluster. This change will reduce the possibility of packet loss
in the socket layer.
Obtained from: KAME
256 raw receive buffers (96 byte each) fit into one page. This breaks the
limit imposed by the usage of an uint8_t for the buffer number. Restrict
the allocation size for buffers to a maximum of 8192.
- Add an IPI based mechanism for migrating kses. This mechanism is
broken down into several components. This is intended to reduce cache
thrashing by eliminating most cases where one cpu touches another's
run queues.
- kseq_notify() appends a kse to a lockless singly linked list and
conditionally sends an IPI to the target processor. Right now this is
protected by sched_lock but at some point I'd like to get rid of the
global lock. This is why I used something more complicated than a
standard queue.
- kseq_assign() processes our list of kses that have been assigned to us
by other processors. This simply calls sched_add() for each item on the
list after clearing the new KEF_ASSIGNED flag. This flag is used to
indicate that we have been appended to the assigned queue but not
added to the run queue yet.
- In sched_add(), instead of adding a KSE to another processor's queue we
use kseq_notify() so that we don't touch their queue. Also in sched_add(),
if KEF_ASSIGNED is already set return immediately. This can happen if
a thread is removed and readded so that the priority is recorded properly.
- In sched_rem() return immediately if KEF_ASSIGNED is set. All callers
immediately readd simply to adjust priorities etc.
- In sched_choose(), if we're running an IDLE task or the per cpu idle thread
set our cpumask bit in 'kseq_idle' so that other processors may know that
we are idle. Before this, make a single pass through the run queues of
other processors so that we may find work more immediately if it is
available.
- In sched_runnable(), don't scan each processor's run queue, they will IPI
us if they have work for us to do.
- In sched_add(), if we're adding a thread that can be migrated and we have
plenty of work to do, try to migrate the thread to an idle kseq.
- Simplify the logic in sched_prio() and take the KEF_ASSIGNED flag into
consideration.
- No longer use kseq_choose() to steal threads, it can lose its last
argument.
- Create a new function runq_steal() which operates like runq_choose() but
skips threads based on some criteria. Currently it will not steal
PRI_ITHD threads. In the future this will be used for CPU binding.
- Create a kseq_steal() that checks each run queue with runq_steal(), use
kseq_steal() in the places where we used kseq_choose() to steal with
before.
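The notify/assign pair above can be illustrated with a lock-free singly
linked list built on C11 atomics. This is a user-space sketch with
hypothetical names. It pushes at the head rather than appending, and it
leaves out the IPI and the sched_lock that the description mentions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a kse queued for another cpu. */
struct kse_sketch {
	struct kse_sketch *next;
	int assigned;	/* plays the role of KEF_ASSIGNED */
	int id;
};

/* Per-cpu inbox that other cpus push onto; the head starts out NULL. */
struct inbox_sketch {
	_Atomic(struct kse_sketch *) head;
};

/* Producer side (kseq_notify-like): push with a compare-and-swap loop. */
static void
sketch_notify(struct inbox_sketch *ib, struct kse_sketch *ke)
{
	struct kse_sketch *old;

	assert(ke != NULL);
	ke->assigned = 1;
	old = atomic_load(&ib->head);
	do {
		ke->next = old;
	} while (!atomic_compare_exchange_weak(&ib->head, &old, ke));
	/* A real scheduler would conditionally send an IPI here. */
}

/* Consumer side (kseq_assign-like): take the whole list in one swap. */
static int
sketch_assign(struct inbox_sketch *ib)
{
	struct kse_sketch *ke, *next;
	int n = 0;

	ke = atomic_exchange(&ib->head, (struct kse_sketch *)NULL);
	for (; ke != NULL; ke = next) {
		next = ke->next;
		ke->assigned = 0;	/* clear the flag, then "sched_add()" */
		n++;
	}
	return (n);
}
```

Since the consumer detaches the entire list with a single exchange, the
owning cpu never walks a list that another cpu is still modifying.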
the rstack functionality:
1. Fix a KASSERT that tests for the address to be above the upward
growable stack. Typically for rstack, the faulting address can be
identical to the record end of the upward growable entry, and
very likely is on ia64. The KASSERT tested for greater than, not
greater than or equal, so whenever the register stack had to be grown
the assertion fired.
2. When we grow the upward growable stack entry and adjust the
underlying object, don't forget to adjust the size of the VM map.
Not doing so would trigger an assert in vm_mapzdtor().
Pointy hat: marcel (for not testing with INVARIANTS).
those cylinder groups that have at least 75% of the average free
space per cylinder group for that file system are considered as
candidates for the creation of a new directory. The previous formula
for minbfree would set it to zero if the file system was more than
75% full, which allowed cylinder groups with no free space at all
to be chosen as candidates for directory creation, which resulted
in an expensive search for free blocks for each file that was
subsequently created in that directory.
Modify the calculation of minifree in the same way.
Decrease maxcontigdirs as the file system fills to decrease the
likelihood that a cluster of directories will overflow the available
space in a cylinder group.
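The 75% rule can be written down as a small calculation. This is a sketch
with hypothetical names and parameters; the floor of 1 is an assumption of
the example, added so that a group with no free blocks never qualifies.

```c
#include <assert.h>

/*
 * Free-block threshold a cylinder group must meet to be a candidate for
 * a new directory: 75% of the per-group average.
 */
static long
sketch_minbfree(long total_free_blocks, long ncg)
{
	long avgbfree, minbfree;

	assert(ncg > 0 && total_free_blocks >= 0);
	avgbfree = total_free_blocks / ncg;
	minbfree = avgbfree - avgbfree / 4;
	if (minbfree < 1)
		minbfree = 1;
	return (minbfree);
}
```

With 4000 free blocks spread over 10 groups the average is 400 and the
threshold is 300; on a nearly full file system the threshold bottoms out
at 1 instead of 0.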
Reviewed by: mckusick
Tested by: kmarx@vicor.com
MFC after: 2 weeks
that libtool-using packages seem to love using this flag.
/usr/include/sys/cdefs.h:184:5: warning: "__STDC_VERSION__" is not defined
/usr/include/sys/cdefs.h:372:5: warning: "_POSIX_C_SOURCE" is not defined
/usr/include/sys/cdefs.h:378:5: warning: "_POSIX_C_SOURCE" is not defined
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE). This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).
Supported by: FreeBSD Foundation
are now in the header of the external buffer itself which allows us
to manipulate them in the free routine without having to lock the softc
structure or the free list. To get space for these flags the chunk number
is reduced to 8 bit which amounts to a maximum of 256 chunks per allocated
page. This restriction is now enforced by a CTASSERT.
structures come out the right size.
Fix the ones that broke. stat32 had some missing fields from the end
and statfs32 was broken due to the strange definition of MNAMELEN
(which is dependent on sizeof(long))
I'm not sure if this fixes any actual problems or not.
the module. Previously we grabbed the mutex used by the callouts,
then stopped the callout with callout_stop, but if the callout
was already active and blocked by the mutex then it would continue
later and reference the mutex after it was destroyed. Instead
stop the callout first then lock.
Supported by: FreeBSD Foundation
o add some more debugging help for figuring out why folks are
getting complaints about releasing routing table entries with
a zero refcnt
o fix comment that talked about spl's
o remove duplicate define of DUMMYNET_DEBUG
Supported by: FreeBSD Foundation
direct dispatch) to avoid extensive kernel stack usage and to
avoid directly re-entering the network stack. The latter causes
locking problems when, for example, a complete TCP handshake
happens w/o a context switch.
clobbers this variable. Long ago, when the idle loop wasn't in a
process, it set switchtime.tv_sec to zero to indicate that the time
needs to be read after the idle loop finishes. The special case for
this isn't needed now that there is an idle process (for each CPU).
The time is read in the normal way when the idle process is switched
away from. The seconds component of the time is only zero for the
first second after the uptime is set, and the mostly-dead code was only
executed during this time. (This was slightly broken by using uptimes
instead of times relative to the Epoch -- in the original version the
seconds component of the time was only 0 for the first second after
the Epoch.)
In mi_switch(), moved the setting of switchticks to just after the
first (and now only) setting of switchtime. This setting used to be
delayed since a late setting was needed for the idle case and an early
setting was not needed. Now the early setting is needed so that
fork_exit() doesn't need to set either switchtime or switchticks.
Removed now-completely-rotted comment attached to this. Most of the
code described by the comment had already moved to sched_switch().
very first cell in the mbuf should have a cell header word (of which
everything except the payload type and the CLP bit is ignored). All
other cells should be 48 byte and get the same header as the first cell.
This fixes a problem with sending more than 120000 raw cells/sec through
an HE155. The card seems to need 2 cell times to DMA the transmit buffer
ready queue entry and the transmit buffer descriptor so at 1/3 the
link rate the transmit buffer ready queue starts to fill up. Even with this
patch it's obviously impossible to send raw cells at link rate.
- implement the tunnel egress rule in ip_ecn_egress() in ip_ecn.c.
make ip{,6}_ecn_egress() return integer to tell the caller that
this packet should be dropped.
- handle ECN at fragment reassembly in ip_input.c and frag6.c.
Obtained from: KAME
begin with sched_lock held but not recursed, so this variable was
always 0.
Removed fixup of sched_lock.mtx_recurse after context switches in
sched_switch(). Context switches always end with this variable in the
same state that it began in, so there is no need to fix it up. Only
sched_lock.mtx_lock really needs a fixup.
Replaced fixup of sched_lock.mtx_recurse in fork_exit() by an assertion
that sched_lock is owned and not recursed after it is fixed up. This
assertion must match the one in mi_switch(), and if sched_lock were
recursed then a non-null fixup of sched_lock.mtx_recurse would probably
be needed again, unlike in sched_switch(), since fork_exit() doesn't
return to its caller in the normal way.
Correct a bug when the number of pages for external mbufs was
very large. In this case the page number could overflow into the large
buffer flag. Make this more unlikely by moving that flag further away.
is returned from the card to the driver. Add a counter that shows
how many times this allocation has failed. Note that we could delay
the allocation of the mbuf even further, until we know that we need it
(there are no receive errors and the connection is open). This will be done
in a later commit.
Print the new statistics field in atmconfig.
atomic instructions instead. Remove the stuff used to track
whether an external mbuf travels through the system. This is
temporary only and will come back soon.
code has the typical branch prediction detour, which creates cross-
section branches. A LINT kernel is apparently large enough nowadays
that the .text and .text2 sections cannot always be laid out so that
branches between them reach.
The fix is to stop using the alpha-specific bitops and instead use
the portable implementation used by all platforms other than alpha
and i386.
with an mbuf until it is reclaimed. This is in contrast to tags that
vanish when an mbuf chain passes through an interface. Persistent tags
are used, for example, by MAC labels.
Add an m_tag_delete_nonpersistent function to strip non-persistent tags
from mbufs and use it to strip such tags from packets as they pass through
the loopback interface and when turned around by icmp. This fixes problems
with "tag leakage".
Pointed out by: Jonathan Stone
Reviewed by: Robert Watson
reset in a child process after a fork(). Currently, however, only the
real timer is cleared while the virtual and profiling timers are inherited.
The realtimer is cleared because it lives directly in struct proc in
p_realtimer. It is in the zero'd section of struct proc. The other timers
live in the p_timer[] array in struct pstats. These timers are copied on
fork() rather than zero'd. The fix is to move p_timer[] to the zero'd
part of struct pstats so that they are zero'd instead of copied on fork().
Note: Since at least FreeBSD 2.0 (and possibly earlier) we've had storage
for two real interval timers. Now that the uarea is less important,
perhaps we could move all of p_timer[] over to struct proc and drop the
p_realtimer special case to fix that.
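The zeroed versus copied split can be illustrated with a structure whose
leading members are cleared and whose trailing members are inherited. The
layout and names below are hypothetical and much smaller than the real
struct pstats:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: members before 'copy_start' are zeroed on fork. */
struct pstats_sketch {
	/* zeroed section */
	long timer_virtual;
	long timer_prof;
	/* copied section */
	long copy_start;	/* first inherited member */
	long prof_scale;
};

/* Build the child's stats from the parent's, fork(2)-style. */
static void
sketch_fork_stats(const struct pstats_sketch *parent,
    struct pstats_sketch *child)
{
	size_t zlen = offsetof(struct pstats_sketch, copy_start);

	assert(parent != NULL && child != NULL);
	memset(child, 0, zlen);
	memcpy((char *)child + zlen, (const char *)parent + zlen,
	    sizeof(*child) - zlen);
}
```

Moving a member from the copied section to the zeroed section is then
enough to change what a child inherits, with no change to the fork code.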
PR: kern/58647
Reported by: Dan Nelson <dnelson@allantgroup.com>
MFC after: 1 week
the RNAT bit index constant. The net effect of this is that there's
no discontinuity WRT NaT collections which greatly simplifies certain
operations. The cost of this is that there can be up to 504 bytes of
unused stack between the true base of the kernel stack and the start
of the RSE backing store. The cost of adjusting the backing store
pointer to keep the RNAT bit index constant, for each kernel entry,
is negligible.
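The 504-byte figure follows from the address arithmetic. A sketch,
assuming the RNAT bit index is bits 8:3 of the backing store address and
that the kernel stack base is 512-byte aligned; the function names are
hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Start the kernel backing store at the same offset within a 512-byte
 * window as the user backing store pointer, so the RNAT bit index
 * carries over unchanged.
 */
static uint64_t
sketch_kernel_bspstore(uint64_t kstack_base, uint64_t user_bspstore)
{
	/* Registers are 8 bytes wide; the base is assumed 512-byte aligned. */
	assert((kstack_base & 0x1ff) == 0 && (user_bspstore & 0x7) == 0);
	return (kstack_base + (user_bspstore & 0x1ff));
}

/* Bits 8:3 of the address select the bit within the NaT collection. */
static unsigned int
sketch_rnat_index(uint64_t bspstore)
{
	return ((unsigned int)((bspstore >> 3) & 0x3f));
}
```

The largest possible offset is 0x1f8, i.e. 504 bytes, which is the
worst-case gap between the stack base and the start of the backing store.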
The primary reasons for this change are:
1. Asynchronous contexts in KSE processes have the disadvantage of
having to copy the dirty registers from the kernel stack onto the
user stack. The implementation we had so far copied the registers
one at a time without calculating NaT collection values. A process
that used speculation would not work. Now that the RNAT bit index
is constant, we can block-copy the registers from the kernel stack
to the user stack without having to worry about NaT collections.
They will be in the right place on the user stack.
2. The ndirty field in the trapframe is now also usable in userland.
This was previously not the case because ndirty also includes the
space occupied by NaT collections. The value could be off by 8,
depending on the discontinuity. Now that the RNAT bit index is
constant, we have exactly the same number of NaT collection points
on the kernel stack as we would have had on the user stack if we
didn't switch backing stores.
3. Debuggers and other applications that use ptrace(2) can now copy
the dirty registers from the kernel stack (using ptrace(2)) and
copy them wherever they want them (onto the user stack of the
inferior as might be the case for gdb) without having to worry
about NaT collections in the same way the kernel doesn't have to
worry about them.
There's a second order effect caused by the randomization of the
base of the backing store, for it depends on the number of dirty
registers the processor happened to have at the time of entry into
the kernel. The second order effect is that the RSE will have a
better cache utilization as compared to having the backing store
always aligned at page boundaries. This has not been measured and
may be in practice only minimally beneficial, if at all measurable.
license. Only clause 3 has been revoked. Restore the fourth clause
as clause 3.
Pointed out by: das@
Remove my name as a copyright holder since I don't use a BSD license
compatible or comparable to the UCB license. I choose not to add a
complete second license for my work for aesthetic reasons, nor to
replace the UCB license on grounds of rewriting more than 90% of the
source files. The rewrite can also be seen as an enhancement and since
the files were practically empty, it's rather trivial to have changed
90% of the files.
the maximum number of pages for buffers) return -1 instead of 0.
This fixes a panic under conditions when many mbufs are needed.
Update the head pointer of the receive buffer pool queue even when
we could not supply a buffer to the chip. Otherwise the chip will
not re-interrupt us for another try. A better strategy would probably
be to remember this condition and to supply buffers without an interrupt
as soon as buffers become available.
Contributed by: Thomaswuerfl@gmx.de
- In sched_prio(), adjust the run queue for threads which may need to move
to the current queue due to priority propagation.
- In sched_switch(), fix style bug introduced when the KSE support went in.
Columns are 80 chars wide, not 90.
- In sched_switch(), fix the comparison in the idle case and explicitly
re-initialize the runq in the not propagated case.
- Remove dead code in sched_clock().
- In sched_clock(), if we're an IDLE class td set NEEDRESCHED so that threads
that have become runnable will get a chance to run.
- In sched_runnable(), if we're not the IDLETD, we should not consider
curthread when examining the load. This mimics the 4BSD behavior of
returning 0 when the only runnable thread is running.
- In sched_userret(), remove the code for setting NEEDRESCHED entirely.
This is not necessary and is not implemented in 4BSD.
- Use the correct comparison in sched_add() when checking to see if an idle
prio task has had its priority temporarily elevated.
instead of retrying them blindly.
This should fix some of the problems people have been having with cdrom
drives taking a long time to probe. This should also eliminate the need
for the initial TUR in cdsize().
cam_periph.c: Don't keep retrying if the error we get back is a fatal
error. This should help us detect the transition from
"Logical unit not ready, cause not reportable" to "Medium
not present" in the "TUR many" handler. (The TUR many
handler gets triggered for Logical unit not ready, cause
not reportable errors.)
scsi_cd.c: Remove the initial test unit ready in cdsize(). Hopefully
it isn't necessary after the above change.
Submitted by: gibbs (mostly)
Tested by: peter
MFC After: 2 weeks
added for XFree86. There are 2 reasons for doing this with sysarch():
1. The memory mapped I/O space is not at a fixed physical address. An
application has to use some interface to get the base address. It
gets worse if the machine has multiple memory mapped I/O spaces.
2. Access to the memory mapped I/O space needs to happen through a
translation that is flagged as uncachable. There's no interface
that allows a process to do uncached memory I/O, other than through
/dev/mem (possibly).
So, until we either disallow direct access to I/O or bus space from
userland or have a better way of doing this, sysarch() has the least
negative impact on existing interfaces.