directly on the radix tree and does not hold any routing table references.
This fixes the reference counting problems that manifested themselves as a
panic during unmount of filesystems that were mounted by NFS over an
interface that had been removed.
Supported by: FreeBSD Foundation
idle. They figure out that we're idle fast enough that the cache pollution
introduced by scanning their run queue is more expensive than waiting
a little longer.
- Add kseq_setidle() to mark us as being idle. Use this in place of
kseq_find().
- Remove kseq_load_highest(), kseq_find() was the only consumer of this
interface. kseq_balance() has its own customized version that finds the
lowest and highest loads simultaneously.
Continuously told that this would be faster by: terry
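As a sketch, the single-pass scan in kseq_balance() amounts to something
like this (KSEQ_CPU() and ksq_load are the per-cpu queue accessor and load
field; treat the details as illustrative):

    /* Sketch: find the least and most loaded kseqs in a single pass. */
    static void
    kseq_balance_scan(struct kseq **lowp, struct kseq **highp)
    {
            struct kseq *kseq, *low, *high;
            int i;

            low = high = KSEQ_CPU(0);
            for (i = 1; i <= mp_maxid; i++) {
                    if (CPU_ABSENT(i))
                            continue;
                    kseq = KSEQ_CPU(i);
                    if (kseq->ksq_load < low->ksq_load)
                            low = kseq;
                    if (kseq->ksq_load > high->ksq_load)
                            high = kseq;
            }
            *lowp = low;
            *highp = high;
    }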
GEOM was not designed to handle media that does not have
a size. Blank CDs are of that type, so cheat and set the
media size to -1. This allows burning to work, but makes
GEOM issue out-of-range reads that make the ATAPI subsystem
spew out a few warnings. GEOM should be taught about this.
GEOM was not designed to handle changing the sectorsize
between opens. Writing multitrack CDs with both audio and
data tracks needs to change sector size on the fly. We
cheat here and stuff the current sectorsize into GEOM
private internals. GEOM should grow some clean way for this.
o Fix MFC cards. We were bogusly setting CCR_IOBASE[01] and CCR_IOLIMIT.
Now when we activate the resource, we adjust these for MFC cards, per the
spec.
o Change type of pf_mfc_* to be bus_addr_t, which is more correct than
long.
This makes my 3C362D/3C363D and 3CXEM556 cards work! Woo Hoo!
o Remove redundant $FreeBSD$
o Better comments about ep_get_macaddr.
o remove one tab in a switch statement (style only)
o Recognize ID 0x0035 as the device ID for the 3CXEM556 that I have. This
makes the 3CXEM556 work for me. Not 100% sure this is the assigned ID,
as I don't have the datasheets for this part, but it does work and gets
the correct ethernet address.
o Comment about the whole fake IRQ 3 thing. Some need it, some don't, all
work with it.
the total load, the timeshare load, and the number of threads that can
be migrated to another cpu. Account for these separately.
- Introduce a KSE_CAN_MIGRATE() macro which determines whether or not a KSE
can be migrated to another CPU. Currently, this only checks to see if
we're an interrupt handler. Eventually this will also be used to support
CPU binding.
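Sketched, with the priority class standing in for the interrupt-handler
test (the real macro lives in sched_ule.c and may differ in detail):

    /* Sketch: a KSE may migrate unless it backs an interrupt thread. */
    #define KSE_CAN_MIGRATE(ke, class)                                  \
            ((class) != PRI_ITHD)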
An example of useless is bios.h. An example of wrong is msdos.h (due
to the use of long for 32-bit fields).
display.h cannot be removed because it's used by syscons. That header
however has no platform dependency and shouldn't really be here.
Removal of these headers may cause build failures in the ports tree.
It's the ports that need fixing in that case.
Tested with: buildworld, LINT
previously only considered the send sequence space. Unfortunately,
some OSes (Windows) still use a random positive increments scheme for
their syn-ack ISNs, so I must consider receive sequence space as well.
The value of 250000 bytes / second for Microsoft's ISN rate of increase
was determined by testing with an XP machine.
wasn't curthread, i.e. when we receive a thread pointer to use
as a function argument. Use VOP_UNLOCK/vrele in these cases.
The only case where td != curthread that is known at the moment is
boot() calling sync with the thread0 pointer.
This fixes the panic on shutdown people have reported.
slice assignment. Add a comment describing what it does.
- Remove a stale XXX comment; the nice value should not impact interactivity,
as nice adjustments only affect non-interactive tasks in ULE.
- Don't allow nice -20 tasks to totally starve nice 0 tasks. Give them at
least SCHED_SLICE_MIN ticks. We still allow nice 0 tasks to starve nice
+20 tasks as intended.
- SCHED_PRI_NRESV does not have the off by one error in PRIO_TOTAL so we
do not have to account for it in the few places that we use it.
Requested by: bde
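A sketch of the clamp described above; SCHED_SLICE_SCALE() and min_nice
are invented for illustration, only SCHED_SLICE_MIN/MAX and
SCHED_PRI_NRESV are real tunables:

    /* Sketch: the slice shrinks as a task's nice moves away from the
     * lowest (best) nice on the queue.  Eligible tasks always get at
     * least SCHED_SLICE_MIN ticks; tasks beyond the reserved range
     * (e.g. nice +20 vs. nice 0) still get no slice and may starve. */
    if (nice - min_nice > SCHED_PRI_NRESV)
            slice = 0;
    else {
            slice = SCHED_SLICE_MAX - SCHED_SLICE_SCALE(nice - min_nice);
            if (slice < SCHED_SLICE_MIN)
                    slice = SCHED_SLICE_MIN;
    }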
0 and SCHED_SLP_RUN_MAX * 2. This allows us to simplify the algorithm
quite a bit. Before, it dealt with arbitrary values which required us
to do nasty integer division tricks that didn't quite work out correctly.
- Change sched_wakeup() to detect conditions where the slp+runtime could
exceed SCHED_SLP_RUN_MAX * 2. This can happen if we go to sleep for
longer than 6 seconds. In this case, we'll just clear the runtime and
set the sleep time to the max.
- Define a new function, sched_interact_fork() which updates the slp+runtime
of a newly forked thread. We want to limit the amount of history retained
from the parent so that we learn the child's behavior quickly. We don't,
however, want to decay it to nothing. Previously, we would simply divide
each parameter by 100 whenever we forked. After a few forks the values
would reach 0 and tasks would not be considered interactive.
- Add another KTR entry, cleanup some existing entries.
- Remove a useless sched_interact_update() from sched_priority(). This is
already done by the callers that require it.
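A sketch of sched_interact_fork(); the field names follow the ksegrp and
the window constant is shown as SCHED_SLP_RUN_MAX for illustration:

    /* Sketch: shrink inherited history to a fixed window, preserving
     * the sleep/run ratio, so the child starts near the parent's
     * behavior but adapts quickly instead of decaying to zero. */
    static void
    sched_interact_fork(struct ksegrp *kg)
    {
            int sum;

            sum = kg->kg_slptime + kg->kg_runtime;
            if (sum > SCHED_SLP_RUN_MAX) {
                    kg->kg_slptime = kg->kg_slptime * SCHED_SLP_RUN_MAX / sum;
                    kg->kg_runtime = kg->kg_runtime * SCHED_SLP_RUN_MAX / sum;
            }
    }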
destination objects are locked on entry and exit. Add comments to
the callers noting that the locks can be released by swap_pager_copy().
- Remove several instances of GIANT_REQUIRED.
we will generate for a given ip/port tuple has advanced far enough
for the time_wait socket in question to be safely recycled.
- Have in_pcblookup_local use tcp_twrecycleable to determine if
TIME_WAIT sockets which are hogging local ports can be safely
freed.
This change preserves proper TIME_WAIT behavior under normal
circumstances while allowing for safe and fast recycling whenever
ephemeral port space is scarce.
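Sketched, close to what tcp_twrecycleable() in tcp_subr.c does
(MS_ISN_BYTES_PER_SECOND is the 250000 figure above):

    static int
    tcp_twrecycleable(struct tcptw *tw)
    {
            tcp_seq new_iss = tw->iss;
            tcp_seq new_irs = tw->irs;

            /* Project both ISN spaces forward from the time the
             * connection entered TIME_WAIT. */
            new_iss += (ticks - tw->t_starttime) * (ISN_BYTES_PER_SECOND / hz);
            new_irs += (ticks - tw->t_starttime) * (MS_ISN_BYTES_PER_SECOND / hz);

            /* Recycle only once both sides' sequence spaces have moved
             * past anything the old connection could still emit. */
            return (SEQ_GT(new_iss, tw->snd_nxt) &&
                SEQ_GT(new_irs, tw->rcv_nxt));
    }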
o change os glue API to be compatible with Linux so hal.o's can
be used on any system
o add ABI version to catch driver-HAL mismatches
o move hal version information from ah_osdep.c to binary component
o remove ath_hal_wait os glue component
o assign constant values to all enums to avoid potential compiler
incompatibilities
o add support for 3Com badged cards (PCI vendor ID)
o add support for IBM mini-pci cards (PCI device ID)
o expose MAC, PHY, and radio hardware revisions
o support for big-endian platforms
o new method to set slot time in us
o bug fix for 5211: beacon timers not setup correctly
o bug fix for 5212: don't crash when handed a 5112 radio
Requested by: jhb
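The enum item above is the usual trick for a binary-only component;
purely illustrative (these names and values are not from the HAL):

    /* Illustrative: explicit enumerator values keep the ABI of a
     * binary-only HAL identical no matter which compiler built the
     * driver against it. */
    typedef enum {
            HAL_STATUS_OK     = 0,
            HAL_STATUS_ENOMEM = 1,
            HAL_STATUS_EIO    = 2
    } HAL_STATUS_EXAMPLE;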
Initialize the real mode stack. This is needed at least for the return
address from the lcall.
Requested by: takawata
Fix style bugs in acpi_wakecode.S
Requested by: bde
Remove the kernel option now that we have the tunable.
to use the direct mapped KVA at KERNBASE to service the request. This also
allows pmap_mapdev() to be used for such addresses very early during the
boot process and might provide some small savings on KVA.
Reviewed by: peter
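A sketch of the short-circuit; the KERNLOAD bound and the slow-path
helper are assumptions, not the exact pmap.c code:

    void *
    pmap_mapdev(vm_paddr_t pa, vm_size_t size)
    {
            /* Sketch: physical addresses the kernel already maps 1:1
             * at KERNBASE need no new KVA (bound is an assumption). */
            if (pa + size <= KERNLOAD)
                    return ((void *)(KERNBASE + pa));
            /* Otherwise allocate KVA and enter the mappings. */
            return (mapdev_slow(pa, size));      /* hypothetical helper */
    }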
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo
device creation and configuration semantics.
Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
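The new fields, roughly as they appear in struct ifnet (net/if_var.h):

    char    if_xname[IFNAMSIZ];  /* external name, e.g. "fxp0"; renamable */
    const char *if_dname;        /* driver name, e.g. "fxp" */
    int     if_dunit;            /* driver unit, or IF_DUNIT_NONE */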
mbuf cluster, copy the data to a separate mbuf that does not use a
cluster. This change reduces the possibility of packet loss
in the socket layer.
Obtained from: KAME
256 raw receive buffers (96 bytes each) fit into one page. This breaks the
limit imposed by the usage of a uint8_t for the buffer number. Restrict
the allocation size for buffers to a maximum of 8192.
- Add an IPI based mechanism for migrating kses. This mechanism is
broken down into several components. This is intended to reduce cache
thrashing by eliminating most cases where one cpu touches another's
run queues.
- kseq_notify() appends a kse to a lockless singly linked list and
conditionally sends an IPI to the target processor. Right now this is
protected by sched_lock but at some point I'd like to get rid of the
global lock. This is why I used something more complicated than a
standard queue.
- kseq_assign() processes our list of kses that have been assigned to us
by other processors. This simply calls sched_add() for each item on the
list after clearing the new KEF_ASSIGNED flag. This flag is used to
indicate that we have been appended to the assigned queue but not
added to the run queue yet.
- In sched_add(), instead of adding a KSE to another processor's queue we
use kse_notify() so that we don't touch their queue. Also in sched_add(),
if KEF_ASSIGNED is already set, return immediately. This can happen if
a thread is removed and re-added so that the priority is recorded properly.
- In sched_rem(), return immediately if KEF_ASSIGNED is set. All callers
immediately re-add simply to adjust priorities, etc.
- In sched_choose(), if we're running an IDLE task or the per cpu idle thread
set our cpumask bit in 'kseq_idle' so that other processors may know that
we are idle. Before this, make a single pass through the run queues of
other processors so that we may find work more immediately if it is
available.
- In sched_runnable(), don't scan each processor's run queue, they will IPI
us if they have work for us to do.
- In sched_add(), if we're adding a thread that can be migrated and we have
plenty of work to do, try to migrate the thread to an idle kseq.
- Simplify the logic in sched_prio() and take the KEF_ASSIGNED flag into
consideration.
- No longer use kseq_choose() to steal threads, it can lose its last
argument.
- Create a new function runq_steal() which operates like runq_choose() but
skips threads based on some criteria. Currently it will not steal
PRI_ITHD threads. In the future this will be used for CPU binding.
- Create a kseq_steal() that checks each run queue with runq_steal(); use
kseq_steal() in the places where we previously used kseq_choose() to
steal.
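A sketch of the hand-off pair described above (field names schematic):

    /* Sketch: push ke onto the target cpu's lockless assigned list and
     * poke that cpu.  A CAS loop rather than a locked queue, so the
     * only remote cache line we touch is the list head. */
    static void
    kseq_notify(struct kse *ke, int cpu)
    {
            struct kseq *kseq = KSEQ_CPU(cpu);

            ke->ke_flags |= KEF_ASSIGNED;
            do {
                    ke->ke_assign = kseq->ksq_assigned;
            } while (!atomic_cmpset_ptr(&kseq->ksq_assigned,
                ke->ke_assign, ke));
            ipi_selected(1 << cpu, IPI_AST);
    }

    /* Sketch: drain the list on the receiving cpu and really enqueue. */
    static void
    kseq_assign(struct kseq *kseq)
    {
            struct kse *ke, *nke;

            do {
                    ke = kseq->ksq_assigned;
            } while (!atomic_cmpset_ptr(&kseq->ksq_assigned, ke, NULL));
            for (; ke != NULL; ke = nke) {
                    nke = ke->ke_assign;        /* save: sched_add() may reuse */
                    ke->ke_flags &= ~KEF_ASSIGNED;
                    sched_add(ke->ke_thread);
            }
    }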
the rstack functionality:
1. Fix a KASSERT that tests for the address to be above the upward
growable stack. Typically for rstack, the faulting address can be
identical to the recorded end of the upward growable entry, and
very likely is on ia64. The KASSERT tested for greater than, not
greater equal, so whenever the register stack had to be grown
the assertion fired.
2. When we grow the upward growable stack entry and adjust the
underlying object, don't forget to adjust the size of the VM map.
Not doing so would trigger an assert in vm_map_zdtor().
Pointy hat: marcel (for not testing with INVARIANTS).
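The assertion change, schematically (vm_map_growstack()):

    /* Old: fired when addr == stack_entry->end, which is legal (and on
     * ia64 typical) for an upward growable register stack: */
    KASSERT(addr > stack_entry->end, ("fault below grow-up stack"));
    /* New: */
    KASSERT(addr >= stack_entry->end, ("fault below grow-up stack"));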
those cylinder groups that have at least 75% of the average free
space per cylinder group for that file system are considered as
candidates for the creation of a new directory. The previous formula
for minbfree would set it to zero if the file system was more than
75% full, which allowed cylinder groups with no free space at all
to be chosen as candidates for directory creation, which resulted
in an expensive search for free blocks for each file that was
subsequently created in that directory.
Modify the calculation of minifree in the same way.
Decrease maxcontigdirs as the file system fills to decrease the
likelihood that a cluster of directories will overflow the available
space in a cylinder group.
Reviewed by: mckusick
Tested by: kmarx@vicor.com
MFC after: 2 weeks
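The new thresholds, roughly as computed in ffs_dirpref() (ffs_alloc.c):

    /* Sketch: demand 75% of the per-cg average, but never zero, so a
     * completely full cg cannot qualify as a directory candidate. */
    avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
    minbfree = avgbfree - avgbfree / 4;
    if (minbfree < 1)
            minbfree = 1;
    avgifree = fs->fs_cstotal.cs_nifree / fs->fs_ncg;
    minifree = avgifree - avgifree / 4;
    if (minifree < 1)
            minifree = 1;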
that libtool-using packages seem to love using this flag.
/usr/include/sys/cdefs.h:184:5: warning: "__STDC_VERSION__" is not defined
/usr/include/sys/cdefs.h:372:5: warning: "_POSIX_C_SOURCE" is not defined
/usr/include/sys/cdefs.h:378:5: warning: "_POSIX_C_SOURCE" is not defined
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE). This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).
Supported by: FreeBSD Foundation
are now in the header of the external buffer itself which allows us
to manipulate them in the free routine without having to lock the softc
structure or the free list. To get space for these flags the chunk number
is reduced to 8 bit which amounts to a maximum of 256 chunks per allocated
page. This restriction is now enforced by a CTASSERT.
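The compile-time check is the standard CTASSERT pattern; the constant
names here are illustrative, not the driver's exact ones:

    /* Illustrative: the chunk number must fit the 8-bit field, so no
     * buffer page may be carved into more than 256 chunks. */
    CTASSERT(MBUF_ALLOC_SIZE / MBUF_CHUNK_SIZE <= 256);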
structures come out the right size.
Fix the ones that broke. stat32 had some missing fields from the end
and statfs32 was broken due to the strange definition of MNAMELEN
(which depends on sizeof(long)).
I'm not sure if this fixes any actual problems or not.
the module. Previously we grabbed the mutex used by the callouts,
then stopped the callout with callout_stop, but if the callout
was already active and blocked by the mutex then it would continue
later and reference the mutex after it was destroyed. Instead,
stop the callout first, then lock.
Supported by: FreeBSD Foundation
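The fixed ordering, as a generic sketch (sc->ch and sc->mtx are
placeholders, not the module's actual names):

    /* Wrong: stopping the callout while holding the mutex lets a
     * handler already blocked on sc->mtx run after mtx_destroy().
     * Right: cancel first, then take and destroy the mutex. */
    callout_stop(&sc->ch);
    mtx_lock(&sc->mtx);
    /* ... detach work ... */
    mtx_unlock(&sc->mtx);
    mtx_destroy(&sc->mtx);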
o add some more debugging help for figuring out why folks are
getting complaints about releasing routing table entries with
a zero refcnt
o fix comment that talked about spl's
o remove duplicate define of DUMMYNET_DEBUG
Supported by: FreeBSD Foundation
direct dispatch) to avoid extensive kernel stack usage and to
avoid directly re-entering the network stack. The latter causes
locking problems when, for example, a complete TCP handshake
happens w/o a context switch.
clobbers this variable. Long ago, when the idle loop wasn't in a
process, it set switchtime.tv_sec to zero to indicate that the time
needs to be read after the idle loop finishes. The special case for
this isn't needed now that there is an idle process (for each CPU).
The time is read in the normal way when the idle process is switched
away from. The seconds component of the time is only zero for the
first second after the uptime is set, and the mostly-dead code was only
executed during this time. (This was slightly broken by using uptimes
instead of times relative to the Epoch -- in the original version the
seconds component of the time was only 0 for the first second after
the Epoch.)
In mi_switch(), moved the setting of switchticks to just after the
first (and now only) setting of switchtime. This setting used to be
delayed since a late setting was needed for the idle case and an early
setting was not needed. Now the early setting is needed so that
fork_exit() doesn't need to set either switchtime or switchticks.
Removed now-completely-rotted comment attached to this. Most of the
code described by the comment had already moved to sched_switch().
very first cell in the mbuf should have a cell header word (of which
everything except the payload type and the CLP bit is ignored). All
other cells should be 48 bytes and get the same header as the first cell.
This fixes a problem with sending more than 120000 raw cells/sec through
an HE155. The card seems to need 2 cell times to DMA the transmit buffer
ready queue entry and the transmit buffer descriptor so at 1/3 the
link rate the transmit buffer ready queue starts to fill up. Even with this
patch it's obviously impossible to send raw cells at link rate.
- implement the tunnel egress rule in ip_ecn_egress() in ip_ecn.c.
make ip{,6}_ecn_egress() return integer to tell the caller that
this packet should be dropped.
- handle ECN at fragment reassembly in ip_input.c and frag6.c.
Obtained from: KAME
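A simplified sketch of the egress rule; the real ip_ecn_egress() also
takes an ECN mode argument, the constants are from netinet/ip.h:

    /* Sketch: returns 0 when the caller must drop the packet. */
    static int
    ecn_egress_sketch(u_int8_t *outer, u_int8_t *inner)
    {
            if ((*outer & IPTOS_ECN_MASK) == IPTOS_ECN_CE) {
                    if ((*inner & IPTOS_ECN_MASK) == IPTOS_ECN_NOTECT)
                            return (0);  /* CE arrived for a not-ECT packet */
                    *inner |= IPTOS_ECN_CE;  /* propagate the congestion mark */
            }
            return (1);
    }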
begin with sched_lock held but not recursed, so this variable was
always 0.
Removed fixup of sched_lock.mtx_recurse after context switches in
sched_switch(). Context switches always end with this variable in the
same state that it began in, so there is no need to fix it up. Only
sched_lock.mtx_lock really needs a fixup.
Replaced fixup of sched_lock.mtx_recurse in fork_exit() by an assertion
that sched_lock is owned and not recursed after it is fixed up. This
assertion must match the one in mi_switch(), and if sched_lock were
recursed then a non-null fixup of sched_lock.mtx_recurse would probably
be needed again, unlike in sched_switch(), since fork_exit() doesn't
return to its caller in the normal way.
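The fork_exit() check reduces to a single assertion, sketched:

    /* After the hand-rolled fixup: held by us, exactly once. */
    mtx_assert(&sched_lock, MA_OWNED | MA_NOTRECURSED);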
Correct a bug when the number of pages for external mbufs was
very large. In this case the page number could overflow into the large
buffer flag. Make this less likely by moving that flag further away.
is returned from the card to the driver. Add a counter that shows
how many times this allocation has failed. Note that we could even
further delay the allocation of the mbuf until we know that we need it
(there are no receive errors and the connection is open). This will be done
in a later commit.
Print the new statistics field in atmconfig.
atomic instructions instead. Remove the stuff used to track
whether an external mbuf travels through the system. This is
temporary only and will come back soon.
code has the typical branch prediction detour, which creates cross-
section branches. A LINT kernel is apparently large enough nowadays
that the .text and .text2 sections cannot always be laid out so that
branches between them reach.
The fix is to stop using the alpha-specific bitops and instead use
the portable implementation used by all platforms other than alpha
and i386.
with an mbuf until it is reclaimed. This is in contrast to tags that
vanish when an mbuf chain passes through an interface. Persistent tags
are used, for example, by MAC labels.
Add an m_tag_delete_nonpersistent function to strip non-persistent tags
from mbufs and use it to strip such tags from packets as they pass through
the loopback interface and when turned around by icmp. This fixes problems
with "tag leakage".
Pointed out by: Jonathan Stone
Reviewed by: Robert Watson
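A sketch of the strip routine, close to what uipc_mbuf2.c does
(MTAG_PERSISTENT is the flag carried in the tag id):

    /* Sketch: delete every tag that isn't marked persistent. */
    void
    m_tag_delete_nonpersistent(struct mbuf *m)
    {
            struct m_tag *p, *q;

            for (p = SLIST_FIRST(&m->m_pkthdr.tags); p != NULL; p = q) {
                    q = SLIST_NEXT(p, m_tag_link);  /* save: p is freed */
                    if ((p->m_tag_id & MTAG_PERSISTENT) == 0)
                            m_tag_delete(m, p);
            }
    }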
reset in a child process after a fork(). Currently, however, only the
real timer is cleared while the virtual and profiling timers are inherited.
The realtimer is cleared because it lives directly in struct proc in
p_realtimer. It is in the zero'd section of struct proc. The other timers
live in the p_timer[] array in struct pstats. These timers are copied on
fork() rather than zero'd. The fix is to move p_timer[] to the zero'd
part of struct pstats so that they are zero'd instead of copied on fork().
Note: Since at least FreeBSD 2.0 (and possibly earlier) we've had storage
for two real interval timers. Now that the uarea is less important,
perhaps we could move all of p_timer[] over to struct proc and drop the
p_realtimer special case to fix that.
PR: kern/58647
Reported by: Dan Nelson <dnelson@allantgroup.com>
MFC after: 1 week
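The fix is purely a layout move within struct pstats
(sys/resourcevar.h), sketched here in its post-fix form:

    struct pstats {
    #define pstat_startzero p_cru
            struct rusage    p_cru;       /* child rusage: zeroed on fork */
            struct itimerval p_timer[3];  /* moved here: zeroed, not copied */
    #define pstat_endzero   pstat_startcopy
    #define pstat_startcopy p_prof
            struct uprof     p_prof;      /* profile state: copied on fork */
    #define pstat_endcopy   p_start
            struct timeval   p_start;     /* starting time */
    };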
the RNAT bit index constant. The net effect of this is that there's
no discontinuity WRT NaT collections which greatly simplifies certain
operations. The cost of this is that there can be up to 504 bytes of
unused stack between the true base of the kernel stack and the start
of the RSE backing store. The cost of adjusting the backing store
pointer to keep the RNAT bit index constant, for each kernel entry,
is negligible.
The primary reasons for this change are:
1. Asynchronous contexts in KSE processes have the disadvantage of
having to copy the dirty registers from the kernel stack onto the
user stack. The implementation we had so far copied the registers
one at a time without calculating NaT collection values. A process
that used speculation would not work. Now that the RNAT bit index
is constant, we can block-copy the registers from the kernel stack
to the user stack without having to worry about NaT collections.
They will be in the right place on the user stack.
2. The ndirty field in the trapframe is now also usable in userland.
This was previously not the case because ndirty also includes the
space occupied by NaT collections. The value could be off by 8,
depending on the discontinuity. Now that the RNAT bit index is
constant, we have exactly the same number of NaT collection points
on the kernel stack as we would have had on the user stack if we
didn't switch backing stores.
3. Debuggers and other applications that use ptrace(2) can now copy
the dirty registers from the kernel stack (using ptrace(2)) and
copy them wherever they want them (onto the user stack of the
inferior as might be the case for gdb) without having to worry
about NaT collections in the same way the kernel doesn't have to
worry about them.
There's a second order effect caused by the randomization of the
base of the backing store, for it depends on the number of dirty
registers the processor happened to have at the time of entry into
the kernel. The second order effect is that the RSE will have a
better cache utilization as compared to having the backing store
always aligned at page boundaries. This has not been measured and
may be in practice only minimally beneficial, if at all measurable.
license. Only clause 3 has been revoked. Restore the fourth clause
as clause 3.
Pointed out by: das@
Remove my name as a copyright holder since I don't use a BSD license
compatible or comparable to the UCB license. I choose not to add a
complete second license for my work for aesthetic reasons, nor to
replace the UCB license on grounds of rewriting more than 90% of the
source files. The rewrite can also be seen as an enhancement and since
the files were practically empty, it's rather trivial to have changed
90% of the files.
the maximum number of pages for buffers) return -1 instead of 0.
This fixes a panic under conditions when many mbufs are needed.
Update the head pointer of the receive buffer pool queue even when
we could not supply a buffer to the chip. Otherwise the chip will
not re-interrupt us for another try. A better strategy would probably
be to remember this condition and to supply buffers without an interrupt
as soon as buffers become available.
Contributed by: Thomaswuerfl@gmx.de
- In sched_prio(), adjust the run queue for threads which may need to move
to the current queue due to priority propagation.
- In sched_switch(), fix style bug introduced when the KSE support went in.
Columns are 80 chars wide, not 90.
- In sched_switch(), fix style bug introduced when the KSE support went in.
re-initialize the runq in the not propagated case.
- Remove dead code in sched_clock().
- In sched_clock(), if we're an IDLE class td, set NEEDRESCHED so that threads
that have become runnable will get a chance to run.
- In sched_runnable(), if we're not the IDLETD, we should not consider
curthread when examining the load. This mimics the 4BSD behavior of
returning 0 when the only runnable thread is running.
- In sched_userret(), remove the code for setting NEEDRESCHED entirely.
This is not necessary and is not implemented in 4BSD.
- Use the correct comparison in sched_add() when checking to see if an idle
prio task has had its priority temporarily elevated.
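The sched_runnable() change above, schematically (details illustrative):

    /* Sketch: report work only if something besides the already-running
     * thread is queued -- unless we *are* the idle thread, in which
     * case any load at all counts. */
    if (curthread == PCPU_GET(idlethread))
            return (kseq->ksq_load > 0);
    return (kseq->ksq_load > 1);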
instead of retrying them blindly.
This should fix some of the problems people have been having with cdrom
drives taking a long time to probe. This should also eliminate the need
for the initial TUR in cdsize().
cam_periph.c: Don't keep retrying if the error we get back is a fatal
error. This should help us detect the transition from
"Logical unit not ready, cause not reportable" to "Medium
not present" in the "TUR many" handler. (The TUR many
handler gets triggered for Logical unit not ready, cause
not reportable errors.)
scsi_cd.c: Remove the initial test unit ready in cdsize(). Hopefully
it isn't necessary after the above change.
Submitted by: gibbs (mostly)
Tested by: peter
MFC after: 2 weeks
added for XFree86. There are 2 reasons for doing this with sysarch():
1. The memory mapped I/O space is not at a fixed physical address. An
application has to use some interface to get the base address. It
gets worse if the machine has multiple memory mapped I/O spaces.
2. Access to the memory mapped I/O space needs to happen through a
translation that is flagged as uncachable. There's no interface
that allows a process to do uncached memory I/O, other than through
/dev/mem (possibly).
So, until we either disallow direct access to I/O or bus space from
userland or have a better way of doing this, sysarch() has the least
negative impact on existing interfaces.
of lock acquires and releases performed.
- Move an assertion from vm_object_collapse() to vm_object_zdtor()
because it applies to all cases of object destruction.
Make the pccard attachment work with NEWCARD
Start locking of the driver, but only the macros are defined right now
Tested on: Megahertz CC10BT/2
# (These cards are very popular on ebay these days, and run < $10 including
# shipping from some sellers).
routines. Otherwise we run into trouble with speculative tlb preloads
on SMP systems. This effectively defeats Jeff's revision 1.438
optimization (for his pentium4-M laptop) in the SMP case. It breaks
other systems, particularly Athlon MPs.
if we do acquire an advisory lock, great! We'll release it later.
However, if we fail to acquire a lock, we perform the coredump
anyway. This problem became particularly visible with NFS after
the introduction of rpc.lockd: if the lock manager isn't running,
then locking calls will fail, aborting the core dump (resulting in
a zero-byte dump file).
Reported by: Yogeshwar Shenoy <ynshenoy@alumni.cs.ucsb.edu>
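A sketch of the relaxed sequence in the coredump path (kern_sig.c style;
error handling elided, details approximate):

    struct flock lf;
    int locked;

    lf.l_whence = SEEK_SET;
    lf.l_start = 0;
    lf.l_len = 0;
    lf.l_type = F_WRLCK;
    /* Best effort: note whether we got the lock, but dump either way. */
    locked = (VOP_ADVLOCK(vp, (caddr_t)p, F_SETLK, &lf, F_FLOCK) == 0);

    /* ... write the core file ... */

    if (locked) {
            lf.l_type = F_UNLCK;
            VOP_ADVLOCK(vp, (caddr_t)p, F_UNLCK, &lf, F_FLOCK);
    }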
type, rather than "object_label" as the first argument. This reduces
complexity a little for the consumer, and also makes it easier for
us to rename the underlying entry points in struct mac_policy_obj.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories