little/big endian fashion, so that network drivers can just reference
the standard implementation and don't have to bring their own.
As discussed on arch@.
Obtained from: NetBSD
protect the registers so it was trivially possible for a sync command and
i/o command to fight each other and confuse the controller. Make the
sync fib alloc/release functions inline and remove the somewhat worthless
AAC_SYNC_LOCK_FORCE flag. Thanks to Adil Katchi for helping me to track
this down in RELENG_4.
sched_clock() rather than using callouts. This means we no longer have to
take the load of the callout thread into consideration while balancing and
should make the balancing decisions simpler and more accurate.
Tested on: x86/UP, amd64/SMP
global mutex, accept_mtx, which serializes access to the following
fields across all sockets:
so_qlen so_incqlen so_qstate
so_comp so_incomp so_list
so_head
While providing only coarse granularity, this approach avoids lock
order issues between sockets by avoiding ownership of the fields
by a specific socket and its per-socket mutexes.
While here, rewrite soclose(), sofree(), soaccept(), and
sonewconn() to add assertions, close additional races and address
lock order concerns. In particular:
- Reorganize the optimistic concurrency behavior in accept1() to
always allocate a file descriptor with falloc() so that if we do
find a socket, we don't have to encounter the "Oh, there wasn't
a socket" race that can occur if falloc() sleeps in the current
code, which broke inbound accept() ordering, not to mention
requiring backing out socket state changes in a way that raced
with the protocol level. We may want to add a lockless read of
the queue state if polling of empty queues proves to be important
to optimize.
- In accept1(), soref() the socket while holding the accept lock
so that the socket cannot be free'd in a race with the protocol
layer. Likewise in netgraph equivilents of the accept1() code.
- In sonewconn(), loop waiting for the queue to be small enough to
insert our new socket once we've committed to inserting it, or
races can occur that cause the incomplete socket queue to
overfill. In the previously implementation, it was sufficient
to simply tested once since calling soabort() didn't release
synchronization permitting another thread to insert a socket as
we discard a previous one.
- In soclose()/sofree()/et al, it is the responsibility of the
caller to remove a socket from the incomplete connection queue
before calling soabort(), which prevents soabort() from having
to walk into the accept socket to release the socket from its
queue, and avoids races when releasing the accept mutex to enter
soabort(), permitting soabort() to avoid lock ordering issues
with the caller.
- Generally cluster accept queue related operations together
throughout these functions in order to facilitate locking.
Annotate new locking in socketvar.h.
It was based on the pty(4) driver which as a tty side an a non-tty side.
Nmdm(4) seems to have inherited two symmetric sides from pty but
unfortunately they are not quite ttys. Running a getty one one
side and tip on the other failed to produce NL->CRNL mapping for
instance.
Rip out the basically bogus cdevsw->{read,write} functions and rely
on ttyread() and ttywrite() which does the same thing.
Use taskqueue_swi_giant to run a task for either side to do what
needs to be done. (Direct calling is not an option as it leads to
recursion.) Trigger the task from the t_oproc and t_stop methods.
Default the ports to not ECHO. Since we neither rate limiting nor
emulation, two ports echoing each other is a really bad idea, which
can only be properly mitigated by rate limiting, rate emulation or
intelligent detection. Rate emulation would be a neat feature.
Ditch the modem-line emulation, if needed for some app, it needs
to be thought much more about how it interacts with the open/close
logic.
APIC interrupt pin based on the settings in the corresponding interrupt
source structure.
- Use ioapic_program_intpin() in place of manual frobbing of the intpin
configuration in ioapic_program_destination() and ioapic_register().
- Use ioapic_program_intpin() to implement suspend/resume support for I/O
APICs.
descriptors out of fdrop_locked() and into vn_closefile(). This
removes all knowledge of vnodes from fdrop_locked(), since the lock
behavior was specific to vnodes. This also removes the specific
requirement for Giant in fdrop_locked(), it's now only required by
code that it calls into.
Add GIANT_REQUIRED to vn_closefile() since VFS requires Giant.
nextpkt within the m_hdr was not being initialized to NULL for
!M_PKTHDR cases. *Maybe* this will fix weird socket buffer
inconsistency panics, but we'll see.
every iteration of aac_startio(). This ensures that a command that is
deferred for lack of resources doesn't immediately get retried in the
aac_startio() loop. This avoids an almost certain livelock.
the socket is on an accept queue of a listen socket. This change
renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new
state field on the socket, so_qstate, as the locking for these flags
is substantially different for the locking on the remainder of the
flags in so_state.
It doesn't take 'align' and 'flags' but 'master' instead, which is
a reference to the Master Zone, containing the backing Keg.
Pointed out by: Tim Robbins (tjr)
them to behave the same as if the SS_NBIO socket flag had been set
for this call. The SS_NBIO flag for ordinary sockets is set by
fcntl(fd, F_SETFL, O_NONBLOCK).
Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag
on the underlying socket for each I/O operation. The O_NONBLOCK
flag is a property of the descriptor, and unlike ordinary sockets,
fifos may be referenced by multiple descriptors.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.
Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and lookup the corresponding refcnt.
mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer to how stats will work
within the modified framework.
From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.
Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.
This change removes more code than it adds.
A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.
Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)
Add two additional pairs of assertions, one at the end of the NFS
server event loop, and one one exit from the NFS daemon, that
assert that if debug.mpsafenet is enabled, Giant is not held, and
that if it is not enabled, Giant will be held. This is intended
to support debugging scenarios where Giant is "leaked" during NFS
processing.
my Elektor card. Note that the hints are necessary to specify the
IO base of the pcf chip. This enables to check the IO base when the
probe routine is called during ISA enumeration.
The interrupt driven code is mixed with polled mode, which is wrong
and produces supposed spurious interrupts at each access. I still have
to work on it.
install nfssvc(). It also updates the argument count, but did so
without setting SYF_MPSAFE, effectively removing the MPSAFE flag even
when syscalls.master indicates it doesn't require Giant. This change
forces the modevent to set MPSAFE as a flag to its internal notion of
an argument coutn.
Note: this duplication of information is a bad thing, but is a more
general problem I'm not currently willing to address.
kernel). No other sys/*.h file requires machine/foo.h to be included
before it. In addition, all the files that include rman.h would need
to include those two anyway. From these two perspectives, it is
traditional to include things like this.
This lets us stop treating sys/rman.h specially in every bus frontend
file.
a vnode. Not bumped into with asserts in the main tree because we
run the NFS server with Giant by default. Discovered by inspection.
Complete annotations of Giant acquisition/release to note that it's
only because of VFS that we acquire Giant in most places in the NFS
server.
long time, i.e., since the cleanup of the VM Page-queues code done two
years ago.
Reviewed by: Alan Cox <alc at freebsd.org>,
Matthew Dillon <dillon at backplane.com>
This is part 2/2 of fixing autonegotiation on hme(4) using DP83840A PHYs.
It appears to also fix the occasional problems to establish a link on
hme(4) using LU6612 PHYs and shouldn't hurt on those using QS6612 PHYs.
Obtained from: NetBSD
properly. This causes the autonegotiation to e.g. never establish a
100baseTX full-duplex link. The solution to this problem is to manually
write the capabilities from the BMSR to the ANAR every time a media
change occurs, even when already in autonegotiation mode.
The NetBSD way of doing this is to set their MIIF_FORCEANEG flag in the
NIC driver. This causes mii_phy_setmedia() to call mii_phy_auto() (which
will set the ANAR according to the BMSR) even when the PHY alread is in
autonegotiation mode. However, while doing the same on FreeBSD (which
involves porting the MIIF_FORCEANEG flag and converting nsphy.c to use
mii_phy_setmedia()) fixes autonegotiation, using mii_phy_setmedia()
causes this driver to no longer work properly in the other modes.
Another drawback of that approach is that this will also force writing
the ANAR on other PHYs whose drivers use mii_phy_setmedia() and which
are used with a NIC whose driver sets MIIF_FORCEANEG (e.g. hme(4) is
known to be used together with 3 different PHYs while only the DP83840A
require this workaround).
So instead of moving to MIIF_FORCEANEG, just call mii_phy_auto() in
nsphy_service() unconditionally when hanging off of a hme(4) and serving
a media change
This is part 1/2 of fixing autonegotiation on hme(4) using DP83840A PHYs.
pipes, since open pipes are linked off a usbd_interface structure
that is free()'d when the configuration index is changed. Attempting
to close or use such pipes later would access freed memory and
usually crash the system.
The only driver that is known to trigger this problem is if_axe,
which is itself at fault, but it is worth detecting the situation
to avoid the obscure crashes that result from this type of easily
made driver mistakes.
behaviour lost in the change from 4.x style netgraph tee nodes.
Alter the tee node to use the new method. Document the behaviour.
Step the ABI version number... old netgraph klds will refuse to load.
Better than just crashing.
Submitted by: Gleb Smirnoff <glebius@cell.sick.ru>
make the key name matching case-insensitive. There are some drivers
and .inf files that have mismatched cases, e.g. the driver will look
for "AdhocBand" whereas the .inf file specifies a registry key to be
created called "AdHocBand." The mismatch is probably a typo that went
undetected (so much for QA), but since Windows seems to be case-insensitive,
we should be too.
In if_ndis.c, initialize rates and channels correctly so that specify
frequences correctly when trying to set channels in the 5Ghz band, and
so that 802.11b rates show up for some a/b/g cards (which otherwise
appear to have no 802.11b modes).
Also, when setting OID_802_11_CONFIGURATION in ndis_80211_setstate(),
provide default values for the beacon interval, ATIM window and dwelltime.
The Atheros "Aries" driver will crash if you try to select ad-hoc mode
and leave the beacon interval set to 0: it blindly uses this value and
does a division by 0 in the interrupt handler, causing an integer
divide trap.
of this micro-optimization occurs when we call pmap_enter() to wire an
already mapped page. Because of the micro-optimization, we fail to
mark the PTE as wired. Later, on teardown of the address space,
pmap_remove_pages() destroys the PTE before vm_fault_unwire() has
unwired the page. (pmap_remove_pages() is not supposed to destroy
wired PTEs. They are destroyed by a later call to pmap_remove().)
Thus, the page becomes lost.
Note: The page is not lost if the application called munlock(2), only
if it relies on teardown of the address space to unwire its pages.
For the historically inclined, this bug was introduced by a
megacommit, revision 1.182, roughly six years ago.
Leak observed by: green@ and dillon independently
Patch submitted by: dillon at backplane dot com
Reviewed by: tegge@
MFC after: 1 week
correct. Instead, check it against the possible settings (_PRS) when
the link is probed. This is important when using APIC mode but link
devices still have PIC mode settings. This is also what Linux does.
Additional prodding by: Len Brown len dot brown at intel dot com
based on the destination sleep state. Add a method to restore the old
state on resume. This is needed for the case of suspending to a very low
state disabling a GPE (i.e. S4), resuming, and then suspending to a higher
state (i.e. S3). This case should now keep the proper GPEs enabled.
device can wake the system. For example:
dev.root0.nexus0.acpi0.acpi_lid0.wake: 1
dev.root0.nexus0.acpi0.acpi_button0.wake: 1
dev.root0.nexus0.acpi0.pcib0.wake: 0
dev.root0.nexus0.acpi0.sio0.wake: 0
been developed for use with FreeBSD, version 4.8 and later.
Submitted by: Hema Joyce
Reviewed by: Prafulla Deuskar
Approved by: Prafulla Deuskar
MFC after: 1 week
subsystem lock to avoid tripping over an assertion regarding whether
the lock is held or not. This is likely to be the cause of a panic
tripped over by Andrea Campi.
acpi_wake_init:
Evaluate _PRW and set the GPE type
acpi_wake_set_enable:
Enable or disable a device's GPE.
acpi_wake_sleep_prep:
Perform any last-minute changes to the device to prepare it for
entering the given sleep state.
Also, walk the entire namespace when transitioning to a sleep state,
disabling any GPEs which aren't appropriate for the given state. Transition
acpi_lid and acpi_button to the new API.
This clears the way for non-ACPI-aware devices to wake the system (i.e.
modems) and fixes a problem where systems power up after shutdown when a
GPE is triggered.
In particular, disabling it was likely to break configurations
involving ng_vlan(4) since the latter couldn't control
the parent's VLAN_MTU in the way vlan(4) did.
Pointed out by: ru
does not reliably prevent the triggering of interrupts for all supported
configurations. Thus, the FIFO size probe could cause an interrupt,
which could lead to an interrupt storm in the shared interrupt case.
To prevent this, change ns8250_bus_probe() to use the overflow bit in
the line status register instead of the RX ready bit in the interrupt
identification register to detect whether the FIFO has filled up.
This allows us to clear all bits in the interrupt enable register during
the probe, which should prevent interrupts reliably.
Additionally, the detected FIFO size may be a bit more accurate, because
the overflow bit is only set when the FIFO did actually fill up, while
interrupts would trigger a bit early.
Reviewed and tested on a lot of hardware by: marcel
struct vmspace is freed from cpu_sched_exit() to pmap_release().
This has the advantage of being able to rely on MI code to decide
when a free should occur, instead of having to inspect the reference
count ourselves.
At the same time, turn the per-CPU vmspace pointer into a pmap pointer,
so that pmap_release() can deal with pmaps exclusively.
Reviewed (and embrassing bug spotted) by: jake
gmon and struct gmonhdr was originally just to represent the kernel
(profiling) clock frequency and it remains poorly suited to representing
the frequencies of fast counters like the TSC. It broke a year or two
ago. This quick fix keeps it working for another year or month or two
until TSC frequencies can exceed 2^32, by dividing the frequency by 2.
Dividing the frequency by 4 would work for a little longer but would
lose a little too much precision.
Fixed profiling of trap, syscall and interrupt handlers and some
ordinary functions, essentially by backing out half of rev.1.106 of
i386/exception.s. The handlers must be between certain labels for
the purposes of profiling, and this was broken by scattering them in
separately compiled .s files, especially for ordinary functions that
ended up between the labels. Merge the files by #including them as
before, except with different pathnames and better comments and
organization. Changes to the scattered files are minimal -- just
move the labels to the file that does the #includes.
This also partly fixes profiling of IPIs -- all IPI handlers are now
correctly classified as interrupt handlers, but many are still missing
mcount calls.
vm86bios.s is included as before, but it is now between the labels for
interrupt handlers again, which seems to be wrong since half of it is
for a non-interrupt handler.
converts miidevs to a .h file, so rename to reflect that.
The usb and pccard versions have also been renamed and will be hooked
into the build system shortly (I've made the conversion in my p4
tree).
process is a non-prison root. The security.jail.allow_raw_sockets
sysctl variable is disabled by default, however if the user enables
raw sockets in prisons, prison-root should not be able to interact
with firewall rule sets.
Approved by: rwatson, bmilekic (mentor)
WRT manipulating capabilities of the parent interface:
- use ioctl(SIOCSIFCAP) to toggle VLAN_MTU (the way that was done
before was just wrong);
- use the right order of conditional clauses to set the MTU fudge
(that is logically independent from toggling VLAN_MTU.)
This splits the driver into a bus-independant backend, plus bus-specific
frontends. The old pcf(4) (i386/ISA) frontend is now in pcf_isa.c, the
frontend in envctrl.c is for sparc64/Ebus2 (Sun device name: SUNW,envctrl
from Sun E450 machines). More frontends are expected to appear in future.
This is not yet ready for public consumption, but it basically works.
Nicolas will bring over his ISA-specific fixes soon.
Reviewed by: nsouch