in the log message for kern_sched.c 1.83 (which should have been
repo-copied to preserve history for this file), the (4BSD) scheduler
algorithm only works right if stathz is nearly 128 Hz. The old
commit lock said 64 Hz; the scheduler actually wants nearly 16 Hz
but there was a scale factor of 4 to give the requirement of 64 Hz,
and rev.1.83 changed the scale factor so that the requirement became
128 Hz. The change of the scale factor was incomplete in the SMP
case. Then scheduling ticks are provided by smp_ncpu CPUs, and the
scheduler cannot tell the difference between this and 1 CPU providing
scheduling ticks smp_ncpu times faster, so we need another scale
factor of smp_ncp or an algorithm change.
This quick fix uses the scale factor without even trying to optimize
the runtime divisions required for this as is done for the other
scale factor.
The main algorithmic problem is the clamp on the scheduling tick counts.
This was 295; it is now approximately 295 * smp_ncpu. When the limit
is reached, threads get free timeslices and scheduling becomes very
unfair to the threads that don't hit the limit. The limit can be
reached and maintained in the worst case if the load average is larger
than (limit / effective_stathz - 1) / 2 = 0.65 now (was just 0.08 with
2 CPUs before this change), so there are algorithmic problems even for
a load average of 1. Fortunately, the worst case isn't common enough
for the problem to be very noticeable (it is mainly for niced CPU hogs
competing with less nice CPU hogs).
thread being waken up. The thread waken up can run at a priority as
high as after tsleep().
- Replace selwakeup()s with selwakeuppri()s and pass appropriate
priorities.
- Add cv_broadcastpri() which raises the priority of the broadcast
threads. Used by selwakeuppri() if collision occurs.
Not objected in: -arch, -current
that msync(2) is its only caller.
- Migrate the parts of the old vm_map_clean() that examined the internals
of a vm object to a new function vm_object_sync() that is implemented in
vm_object.c. At the same, introduce the necessary vm object locking so
that vm_map_sync() and vm_object_sync() can be called without Giant.
Reviewed by: tegge
are zx1 based machines and they don't particularly like it when we
poke at them with PC legacy code. The atkbd and psm devices were
disabled in the hints file so that one could enable them on machines
that support legacy devices, but that's not really something you can
expect from a first-time installer. This still leaves syscons (sc)
and the vga device, which were enabled by default and wrecking havoc
anyway. We could disable them by default like the atkbd and psm
devices, but there's really no point in pretending we're in a better
shape that way.
o pickup Giant in divert_packet to protect sbappendaddr since it
can be entered through MPSAFE callouts or through ip_input when
mpsafenet is 1
o add missing locking on output
o add locking to abort and shutdown
o add a ctlinput handler to invalidate held routing table references
on an ICMP redirect (may not be needed)
Supported by: FreeBSD Foundation
o add assertions in tcp_respond to validate inpcb locking assumptions
o use local variable instead of chasing pointers in tcp_respond
Supported by: FreeBSD Foundation
whether or not the isr needs to hold Giant when running; Giant-less
operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive
Supported by: FreeBSD Foundation
PALM_4 initialisation hack. I've not confirmed it myself, but
seeing as we already don't use it for the Sony Clie_41, let's drop
it from the Clie_40 also and see what happens.
(Question: What about the Clie_S360 and Clie_NX60 devices? Do we
need to drop Palm4 from those as well? Possibly, but I've not had
any reports about those so I don't know.)
PR: kern/56575
MFC after: 3 days
is highly MD in an emulation environment since it operates on the host
environment. Although the setregs functions are really for exec support
rather than signals, they deal with the same sorts of context and include
files. So I put it there rather than create yet another file.
since there is no direct association between M:N thread and kse,
sometimes, a thread does not have a kse, in that case, return a pctcpu
from its last kse, it is not perfect, but gives a good number to be
displayed.
pmap_pte() and pmap_pte_quick(). The distinction being based upon the
locks that are held by the caller. When the given pmap is not the
current pmap, pmap_pte() should be used when Giant is held and
pmap_pte_quick() should be used when the vm page queues lock is held.
- When assigning to PMAP1 or PMAP2, include PG_A anf PG_M.
- Reenable the inlining of pmap_is_current().
In collaboration with: tegge
has a LOR between IPFW inpcb locks but I'm committing it now as the
lesser of two evils (the other being unlocked use of in_pcblookup).
Supported by: FreeBSD Foundation
successfully initialized in the label as a socket peer label, not a
socket label. For current policy modules, this didn't make a
difference, but if a policy module had label data in the peer label
that was to be GC'd in a different way than the normal socket label,
it might have been a problem.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
that it was obsoleted. it is better to fail than just hiding use
of ipsec_gethist() at build.
Sugessted by: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
Removed banal comments about ELAN*. Complain about ELAN* being misnamed
instead (so that these options are not obviously related to a CPU and
don't sort with CPU_ELAN).
Complain about CPU_DISABLE_CMPXCHG being in the wrong namespace.
to a routing table entry w/o bumping the reference count or locking
against the entry being free'd. This caused major havoc (for some
reason it appeared most frequently for folks running natd). Fix
is to bump the reference count whenever we copy the route cache
contents into a private copy so the entry cannot be reclaimed out
from under us. This is a short term fix as the forthcoming routing
table changes will eliminate this cache entirely.
Supported by: FreeBSD Foundation
Instead, let the vm objects be lazily instantiated at fault time. This
results in the allocation of fewer vm objects and vm map entries due to
aggregation in the vm system.
sure we handle stacked registers properly by taking into account
that:
1. bspstore points after the frame (due to cover),
2. we need to adjust for intermediate NaT collections.
not in scheduler specific data because eventually it will be required by
all schedulers.
- Implement sched_pin and unpin as an inline for now. If a scheduler needs
to do something more complicated than adjusting the pinned count we can
move this into a function later in an api compatible way.
o move it from subr_bus.c to netisr.c where it more properly belongs
o add NET_PICKUP_GIANT and NET_DROP_GIANT macros that will be used to
grab Giant as needed when MPSAFE operation is enabled
Supported by: FreeBSD Foundation
pin that is used by the default identity mapping if it still maps to the
old vector. The ACPI case might need some tweaking for the SCI interrupt
case since ACPI likes to address the intpin using both the IRQ remapped to
it as well as the previous existing PCI IRQ mapped to it.
Reported by: kan
on ia64. The bug is present in i386 as well but didn't show up due to
more relaxed page protections. This fix has been submitted to the vendor.
Submitted by: marcel
rather than signed. This fixes some cosmetics such as verbose printf's
for IRQs greater than 127.
- The calculation for next_ioapic_base was also adjusted so that it will
only complain once for each hole in the IRQs provided by ACPI for IO
APICs.
Reported by: Michal Mertl <mime@traveller.cz>
Don't put the name of the file in a comment. $FreeBSD$ gives more than
enough about the file's pathname.
Fixed misdescription of the file. It isn't the whole unified Makefile...
Moved the settings of WERROR and of the standard extra CFLAGS
-finline-limit and -fno-strict-aliasing to a less wrong place. They
were in the section for profiling.
of ffs_reload()'s mountp parameter to mp in rev.1.28 of ffs_vnops.c
had not been merged here.
ext2fs_reload() is still missing locking from not merging other changes
to ffs_reload(), but none of these is related to recent locking changes.
but can be enabled by setting hw.atm.hatmN.mpsafe in the kernel
environment to a non-zero value before loading the driver. When
the problems with network MPSAFEty have been sorted out this will
be removed and the driver will default to MPSAFE.
We put them directly onto the free list instead of calling the
external mbuf free routine (that routine would have cleaned the flag).
This fixes a bug which manifests itself in falsely reporting a lot of used
buffers when configuring the interface down.
conservative lock. The problem with the lock-less algorithm is that
it suffers from the ABA problem. Running an application with funnels
a couple of 100kpkts/s through the netgraph system on a dual CPU system
with MPSAFE drivers will panic almost immediatly with the old algorithm.
It may be possible to eliminate the contention between threads that insert
free items into the list and those that get free items by using the
Michael/Scott queue algorithm that has two locks.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.
Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.
Discussed with: jeff
1.36 +73 -60 src/sys/compat/linux/linux_ipc.c
1.83 +102 -48 src/sys/kern/sysv_shm.c
1.8 +4 -0 src/sys/sys/syscallsubr.h
That change was intended to support vmware3, but
wantrem parameter is useless because vmware3 uses SYSV shared memory
to talk with X server and X server is native application.
The patch worked because check for wantrem was not valid
(wantrem and SHMSEG_REMOVED was never checked for SHMSEG_ALLOCATED segments).
Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which when set
to 1 allows to return removed segments in
shm_find_segment_by_shmid() and shm_find_segment_by_shmidx().
MFC after: 1 week
the amd64 implementation of the pcpu macros is even more verbose than on
i386 and that causes gcc to way overestimate the complexity of this
2-instruction macro. The other platforms can probably lower their
default values.
from the xe driver. Should probably be removed when current probe/attach
problems with the driver are fixed, but is useful now when requesting
diagnostic information from users.
Reviewed by: imp (mentor)
isa_device pointer as its argument and uses that to call the driver's
interrupt handler passing the unit number as its argument. This should
fix COMPAT_OLDISA devices with a unit number of 0.
Reviewed by: peter
Reported by: bde
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all. secpolicy no longer contain
spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy. assign ID field to
all SPD entries. make it possible for racoon to grab SPD entry on
pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header. a mode is always needed
to compare them.
- fixed that the incorrect time was set to
sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- an user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
because key_spdacquire() had been implemented imcopletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
XXX in theory refcnt should do the right thing, however, we have
"spdflush" which would touch all SPs. another solution would be to
de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion. ipsec_*_policy ->
ipsec_*_pcbpolicy.
- avoid variable name confusion.
(struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
"src" of the spidx specifies ICMP type, and the port field in "dst"
of the spidx specifies ICMP code.
- avoid from applying IPsec transport mode to the packets when the
kernel forwards the packets.
Tested by: nork
Obtained from: KAME
- Add the following functions to the api: sched_bind(), sched_unbind(),
sched_pin(), and sched_unpin(). Bind/unbind are used for traditional
cpu binding. Pin and unpin are meant to allow the kernel to hold a thread
on a particular cpu so that it may cache per-cpu data without fear of
being migrated.
no matter where in the directory structure it may be. Use this and the "-k"
flag in the generated gdbinit files so that the "getsyms" function in gdb
requires no user intervention to run and will find every module if they're
in the kernel build's module directory. This is still quite useful for
cases where gdb knows that the path for some modules is /boot/kernel and
others are in the object directory for /usr/src/sys/$ARCH/compile/kernel.
Approved by: grog
waitrunningbufspace() calls so that they are always able to
proceed and clean up buffer space.
Submitted by: Brian Fundakowski Feldman <green@freebsd.org>
o Fix minor type problems
o Fix minor problem with a couple debug printfs.
o Default to a sane media type when none is reported.
o Minor style changes
The PR complains this will fix the IBM 300GL cards.
Submitted by: Max Gotlib
PR: 11462
Use zpfind() to see if the process became a zombie if pfind() doesn't find it
and if the caller wants to know about process death, so that the caller knows
the process died even if it happened before the kevent was actually registered.
MFC after: 1 week
This fixes a race condition (specifically with signal events) that could
lead to the kn being re-inserted into the list after it has been destroyed,
which is not something we want to happen.
PR: kern/58258
operating mode to HostAP, the card will lock up indefinitely (but
the wi(4) driver can recover if you eject the card). The problem is
that the card needs to be "reset" in a way before you even change the
media to hostap. In practice this isn't as noticeable because you
probably do some operation beforehand which prevents the lock-up
before you enable hostap mode.
e.g.:
"ifconfig wi0 up media autoselect mediaopt hostap" will lock up
(if you just inserted the card).
"ifconfig wi0 up ssid foo media autoselect mediaopt hostap" won't lock up.
- Compile 'device acpi' into GENERIC by default as well. Note that
the beastie loader menu item to disable ACPI still works if ACPI is
compiled into the kernel.
let the MD code choose whether or not to implement such a policy. The new
i386 interrupt code allows multiple FAST handlers for a given source for
example. However, the code does not allow FAST and non-FAST handlers to be
mixed.
we would manage this better by having the interrupt code add each
interrupt vector to the resource map when each source is registered.
- Use the new interrupt code API for registering and tearing down interrupt
handlers.
- The MP code no longer knows anything specific about an MP Table.
Instead, the local APIC code adds CPUs via the cpu_add() function when
a local APIC is enumerated by an APIC enumerator.
- Don't divide the argument to mp_bootaddress() by 1024 just so that we
can turn around and mulitply it by 1024 again.
- We no longer panic if SMP is enabled but we are booted on a UP machine.
- init_secondary(), the asm code between init_secondary() and ap_init()
in mpboot.s and ap_init() have all been merged together in C into
init_secondary().
- We now use the cpuid feature bits to determine if we should enable
PSE, PGE, or VME on each AP.
- Due to the change in the implementation of critical sections, acquire
the SMP TLB mutex around a slightly larger chunk of code for TLB
shootdowns.
- Remove some of the debug code from the original SMP implementation
that is no longer used or no longer applies to the new APIC code.
- Use a temporary hack to disable the ACPI module until the SMP code has
been further reorganized to allow ACPI to work as a module again.
- Add a DDB command to dump the interesting contents of the IDT.
devices claiming resources that they don't actually use. The PIC drivers
only register valid interrupt sources, so we don't need to rely on these
drivers to claim invalid IRQs to prevent their use by other drivers.
slave pin on the master PIC in the !APIC_IO case. The PIC drivers now
manage these details internally.
- Remove an spl0() that hasn't done anything since SMPng was first
committed.
- Update some comments that have rotted since SMPng.
- Use intr_suspend/resume() callouts to the interrupt code layer which
suspends and resumes all the known interrupt sources instead of calling
icu_reinit() directly.
APIC Descriptor Table to enumerate both I/O APICs and local APICs. ACPI
does not embed PCI interrupt routing information in the MADT like the MP
Table does. Instead, ACPI stores the PCI interrupt routing information
in the _PRT object under each PCI bus device. The MADT table simply
provides hints about which interrupt vectors map to which I/O APICs. Thus
when using ACPI, the existing ACPI PCI bridge drivers are sufficient to
route PCI interrupts.
- The apic interrupt entry points have been rewritten so that each entry
point can serve 32 different vectors. When the entry is executed, it
uses one of the 32-bit ISR registers to determine which vector in its
assigned range was triggered. Thus, the apic code can support 159
different interrupt vectors with only 5 entry points.
- We now always to disable the local APIC to work around an errata in
certain PPros and then re-enable it again if we decide to use the APICs
to route interrupts.
- We no longer map IO APICs or local APICs using special page table
entries. Instead, we just use pmap_mapdev(). We also no longer
export the virtual address of the local APIC as a global symbol to
the rest of the system, but only in local_apic.c. To aid this, the
APIC ID of each CPU is exported as a per-CPU variable.
- Interrupt sources are provided for each intpin on each IO APIC.
Currently, each source is given a unique interrupt vector meaning that
PCI interrupts are not shared on most machines with an I/O APIC.
That mapping for interrupt sources to interrupt vectors is up to the
APIC enumerator driver however.
- We no longer probe to see if we need to use mixed mode to route IRQ 0,
instead we always use mixed mode to route IRQ 0 for now. This can be
disabled via the 'NO_MIXED_MODE' kernel option.
- The npx(4) driver now always probes to see if a built-in FPU is present
since this test can now be performed with the new APIC code. However,
an SMP kernel will panic if there is more than one CPU and a built-in
FPU is not found.
- PCI interrupts are now properly routed when using APICs to route
interrupts, so remove the hack to psuedo-route interrupts when the
intpin register was read.
- The apic.h header was moved to apicreg.h and a new apicvar.h header
that declares the APIs used by the new APIC code was added.
default we provide 16 interrupt sources for IRQs 0 through 15. However,
if the I/O APIC driver has already registered sources for any of those IRQs
then we will silently fail to register our own source for that IRQ.
Note that i386/isa/icu.h is now specific to the 8259A and no longer
contains any info relevant to APICs. Also note that fast interrupts no
longer use a separate entry point. Instead, both fast and threaded
interrupts share the same entry point which merely looks up the appropriate
source and passes control to intr_execute_handlers().
that provides methods via a PIC driver to do things like mask a source,
unmask a source, enable it when the first interrupt handler is added, etc.
The interrupt code provides a table of interrupt sources indexed by IRQ
numbers, or vectors. These vectors are what new-bus uses for its IRQ
resources and for bus_setup_intr()/bus_teardown_intr(). The interrupt
code then maps that vector a given interrupt source object. When an
interrupt comes in, the low-level interrupt code looks up the interrupt
source for the source that triggered the interrupt and hands it off to
this code to execute the appropriate handlers.
By having an interrupt source abstraction, this allows us to have different
types of interrupt source providers within the shared IRQ address space.
For example, IRQ 0 may map to pin 0 of the master 8259A PIC, IRQs 1
through 60 may map to pins on various I/O APICs, and IRQs 120 through
128 may map to MSI interrupts for various PCI devices.
the root path. This is reported to make non-PXE netbooting, such as
is used on sparc64 systems, work correctly when the TFTP server is
not the same as the root server.
PR: kern/57328
Submitted by: Per Kristian Hove <Per.Hove@math.ntnu.no>
header copy made on input path: this is now handled differently.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
is really EtherExpress or EEPro or what, but it does appear in a
couple of ethernet cards that have appeared recently on ebay. Silicom
appears to make these cards, and they have the 82595TX chipset in
them, and sometimes uarts. The ex driver needs some work to support
these cards, but I thought I'd get the device into pccarddevs.
The hardware driver decides the name under /dev/led and provides
the function to turn the lamp on/off.
All leds are serviced by a single timeout which runs at a basic rate
of hz/10.
The LED is controlled by ascii strings as follows.
0 Turn off.
1 Turn on.
f Flash: _-
f2 Flash: __--
f3 Flash: ___---
f4...f9 etc.
d%d Digits. "d12": -__________-_-______________________________
s%s String, roll your own:
'a-j' gives on for (1...10)/10 sec.
'A-J' gives on for (1...10)/10 sec.
'sAaAbBa': _-_--__-
m%s Morse
'.' dot
'-' dash
' ' letter space
'\n' word space
My mdoc skills do not reach to express that.
Add a sysctl declaration for hw.ata.atapi_dma, which had gone MIA (though
setting it in loader.conf still worked, it was not visible at runtime)
Approved by: sos
to the pci attachment. Cardbus is a derived class of pci so all pci
drivers are automatically available for matching against cardbus devices.
Reviewed by: imp
message encoding and decoding stuff into the base module. All of this
is accessed by several of the NgATM modules and putting this into
atmbase reduceds the memory footprint.
cr.isr sanity check. We actually encounter insanities, which very
likely means that the insanity check itself is insane. Remove an empty
comment while I'm at it.
directly on the radix tree and does not hold any routing table refernces.
This fixes the reference counting problems that manifested itself as a
panic during unmount of filesystems that were mounted by NFS over an
interface that had been removed.
Supported by: FreeBSD Foundation
idle. They figure out that we're idle fast enough that the cache pollution
introduces by scanning their run queue is more expensive than waiting
a little longer.
- Add kseq_setidle() to mark us as being idle. Use this in place of
kseq_find().
- Remove kseq_load_highest(), kseq_find() was the only consumer of this
interface. kseq_balance() has it's own customized version that finds the
lowest and highest loads simultaneously.
Continuously told that this would be faster by: terry
GEOM was not designed to handle media that does not have
a size. Blank CD's are of that type, so cheat and set the
media size to -1. This allows burning to work, but makes
GEOM issue outofrange reads that makes the ATAPI subsystem
spew out a few warnings. GEOM should be tought about this.
GEOM was not designed to handle changing the sectorsize
between opens. Writing multitack CD's with both audio and
data tracks needs to change sector size on the fly. We
cheat here and stuff the current sectorsize into GEOM
private internals. GEOM should grow some clean way for this.
o Fix MFC cards. We were bogusly setting CCR_IOBASE[01] and CCR_IOLIMIT.
now when we activate the resource, we adjust these for MFC cards, per the
spec.
o Change type of pf_mfc_* to be bus_addr_t, which is more correct than
long.
This makes my 3C362D/3C363D and 3CXEM556 cards work! Woo Hoo!