calling VOP_LOOKUP(). Rather than having each filesystem check the
LOCKPARENT flag, we simply check it once here and unlock as required.
The only unusual case is ISDOTDOT, where we require an unlocked vnode
on return. Relocking this vnode with the child locked is allowed since
the child is actually its parent.
- Add a few asserts for some unusual conditions that I do not believe can
happen. These will later go away and turn into implementations for these
conditions.
Sponsored by: Isilon Systems, Inc.
case where filesystems legitimately need to unlock the directory vp is
in the DOTDOT case, which we can explicitly check for in lookup().
Furthermore, allowing filesystems to unlock dvp can lead to lock order
reversals in lookup() when we vrele the dvp while the child is still
locked.
Sponsored by: Isilon Systems, Inc.
acpi_bus_alloc_gas() to delete the resource it set if alloc fails. Then,
change acpi_perf to delete the resource after releasing it if alloc fails.
This should make probe and attach both fully restartable if either fails.
and AMD Cool&Quiet PowerNow! (k8) cpufreq control. This driver is enabled
for both i386 and amd64 architectures. It has both acpi and legacy BIOS
attachments. Thanks to Bruno Ducrot for writing this driver and Jung-uk
Kim for testing.
Submitted by: Bruno Ducrot (ducrot:poupinou.org)
may help with various interdependencies between subsystems. More testing
is needed to understand what the underlying issues are here.
Tested by: Juho Vuori
MFC after: 2 days
variables in internal blocks.
Also, go ahead and fail if we can't load the firmware. It should have
failed like this, but never did (firmware loads generally don't fail).
some of which are rather serious:
- Use the device sysctl tree instead of rolling our own.
- Don't create a bus_dmamap_t to pass to bus_dmamem_alloc(), it is
bus_dmamem_alloc() that creates it itself. The DMA map created
by the driver was overwritten and its memory was leaked.
- Fix resource handling bugs in the error path of ixgb_dma_alloc().
- Don't use vtophys() to get the base address of the TX and RX rings
when busdma already gave us the correct address to use!
- Remove now useless includes and the alpha_XXX_dmamap() hack.
- Don't initialize if_output to ether_output(), ether_ifattach() does
it for us already.
- Add proper module dependencies on ether and pci.
Unfortunately, I'm not lucky enough to own an ixgb(4) card, nor a
machine with a bus where to plug it in and I couldn't find anyone able
to test these patches, so they are only build-tested and I won't MFC
them for 5.4-RELEASE.
This ensures that we explore EHCI busses before their companion
controllers' busses, so that ports connected to full/low speed
devices will be properly routed to the companion controllers by the
time the OHCI/UHCI exploration occurs.
work on SMP" saga. After several weeks and much gnashing of teeth,
I have finally tracked down all the problems, despite their best
efforts to confound and annoy me.
Problem nunmber one: the Atheros windows driver is _NOT_ a de-serialized
miniport! It used to be that NDIS drivers relied on the NDIS library
itself for all their locking and serialization needs. Transmit packet
queues were all handled internally by NDIS, and all calls to
MiniportXXX() routines were guaranteed to be appropriately serialized.
This proved to be a performance problem however, and Microsoft
introduced de-serialized miniports with the NDIS 5.x spec. Microsoft
still supports serialized miniports, but recommends that all new drivers
written for Windows XP and later be deserialized. Apparently Atheros
wasn't listening when they said this.
This means (among other things) that we have to serialize calls to
MiniportSendPackets(). We also have to serialize calls to MiniportTimer()
that are triggered via the NdisMInitializeTimer() routine. It finally
dawned on me why NdisMInitializeTimer() takes a special
NDIS_MINIPORT_TIMER structure and a pointer to the miniport block:
the timer callback must be serialized, and it's only by saving the
miniport block handle that we can get access to the serialization
lock during the timer callback.
Problem number two: haunted hardware. The thing that was _really_
driving me absolutely bonkers for the longest time is that, for some
reason I couldn't understand, my test machine would occasionally freeze
or more frustratingly, reset completely. That's reset and in *pow!*
back to the BIOS startup. No panic, no crashdump, just a reset. This
appeared to happen most often when MiniportReset() was called. (As
to why MiniportReset() was being called, see problem three below.)
I thought maybe I had created some sort of horrible deadlock
condition in the process of adding the serialization, but after three
weeks, at least 6 different locking implementations and heroic efforts
to debug the spinlock code, the machine still kept resetting. Finally,
I started single stepping through the MiniportReset() routine in
the driver using the kernel debugger, and this ultimately led me to
the source of the problem.
One of the last things the Atheros MiniportReset() routine does is
call NdisReadPciSlotInformation() several times to inspect a portion
of the device's PCI config space. It reads the same chunk of config
space repeatedly, in rapid succession. Presumeably, it's polling
the hardware for some sort of event. The reset occurs partway through
this process. I discovered that when I single-stepped through this
portion of the routine, the reset didn't occur. So I inserted a 1
microsecond delay into the read loop in NdisReadPciSlotInformation().
Suddenly, the reset was gone!!
I'm still very puzzled by the whole thing. What I suspect is happening
is that reading the PCI config space so quickly is causing a severe
PCI bus error. My test system is a Sun w2100z dual Opteron system,
and the NIC is a miniPCI card mounted in a miniPCI-to-PCI carrier card,
plugged into a 100Mhz PCI slot. It's possible that this combination of
hardware causes a bus protocol violation in this scenario which leads
to a fatal machine check. This is pure speculation though. Really all I
know for sure is that inserting the delay makes the problem go away.
(To quote Homer Simpson: "I don't know how it works, but fire makes
it good!")
Problem number three: NdisAllocatePacket() needs to make sure to
initialize the npp_validcounts field in the 'private' section of
the NDIS_PACKET structure. The reason if_ndis was calling the
MiniportReset() routine in the first place is that packet transmits
were sometimes hanging. When sending a packet, an NDIS driver will
call NdisQueryPacket() to learn how many physical buffers the packet
resides in. NdisQueryPacket() is actually a macro, which traverses
the NDIS_BUFFER list attached to the NDIS_PACKET and stashes some
of the results in the 'private' section of the NDIS_PACKET. It also
sets the npp_validcounts field to TRUE To indicate that the results are
now valid. The problem is, now that if_ndis creates a pool of transmit
packets via NdisAllocatePacketPool(), it's important that each time
a new packet is allocated via NdisAllocatePacket() that validcounts
be initialized to FALSE. If it isn't, and a previously transmitted
NDIS_PACKET is pulled out of the pool, it may contain stale data
from a previous transmission which won't get updated by NdisQueryPacket().
This would cause the driver to miscompute the number of fragments
for a given packet, and botch the transmission.
Fixing these three problems seems to make the Atheros driver happy
on SMP, which hopefully means other serialized miniports will be
happy too.
And there was much rejoicing.
Other stuff fixed along the way:
- Modified ndis_thsuspend() to take a mutex as an argument. This
allows KeWaitForSingleObject() and KeWaitForMultipleObjects() to
avoid any possible race conditions with other routines that
use the dispatcher lock.
- Fixed KeCancelTimer() so that it returns the correct value for
'pending' according to the Microsoft documentation
- Modfied NdisGetSystemUpTime() to use ticks and hz rather than
calling nanouptime(). Also added comment that this routine wraps
after 49.7 days.
- Added macros for KeAcquireSpinLock()/KeReleaseSpinLock() to hide
all the MSCALL() goop.
- For x86, KeAcquireSpinLockRaiseToDpc() needs to be a separate
function. This is because it's supposed to be _stdcall on the x86
arch, whereas KeAcquireSpinLock() is supposed to be _fastcall.
On amd64, all routines use the same calling convention so we can
just map KeAcquireSpinLockRaiseToDpc() directly to KfAcquireSpinLock()
and it will work. (The _fastcall attribute is a no-op on amd64.)
- Implement and use IoInitializeDpcRequest() and IoRequestDpc() (they're
just macros) and use them for interrupt handling. This allows us to
move the ndis_intrtask() routine from if_ndis.c to kern_ndis.c.
- Fix the MmInitializeMdl() macro so that is uses sizeof(vm_offset_t)
when computing mdl_size instead of uint32_t, so that it matches the
MmSizeOfMdl() routine.
- Change a could of M_WAITOKs to M_NOWAITs in the unicode routines in
subr_ndis.c.
- Use the dispatcher lock a little more consistently in subr_ntoskrnl.c.
- Get rid of the "wait for link event" hack in ndis_init(). Now that
I fixed NdisReadPciSlotInformation(), it seems I don't need it anymore.
This should fix the witness panic a couple of people have reported.
- Use MSCALL1() when calling the MiniportHangCheck() function in
ndis_ticktask(). I accidentally missed this one when adding the
wrapping for amd64.
count of valid frequencies and use that as the final package count, don't
give up when the first invalid state is found. Also, add 0x9999 and expand
our upper check to >= 0xffff Mhz [2].
Submitted by: Bruno Ducrot, Jung-uk Kim [2]
succeed if there was no media in the drive.
This was broken in rev 1.72 when the media check was added to cdioctl().
For now, check the ioctl group to decide whether to check for media or not.
(We only need to check for media on CD-specific ioctls.)
Reported by: bland
MFC after: 3 days
found it guilty in putting the card into unusable state after UP->DOWN->UP
media status change.
Looks like some of register writes in this functions mess up PHY interface.
No visible regressions has been found after commenting this code out -
the card properly handles forceful local mode changes and auto-detects changes
made remotely (tested with Auto, 10HD, 10FD, 100HD, 100FD).
Sponsored by: PBXpress Inc.
MFC after: 3 days
add more work are forced to process two worklist items first.
However, processing an item may generate additional work, causing the
unlucky thread to recursively process the worklist. Add a per-thread
flag to detect this situation and avoid the recursion. This should
fix the stack overflows that could occur while removing large
directory trees.
Tested by: kris
Reviewed by: mckusick
results in connectivty to MacOSX hosts via fwip.
Thanks to Apple's Arulchandran Paramasivam <arulchandranp@apple.com> for
letting us know what we were doing wrong.
Reviewed by: dfr
MFC After: 7 days
now always allocates a new vnode.
- Define a new function, vnlru_free, which frees vnodes from the free list.
It takes as a parameter the number of vnodes to free, which is
wantfreevnodes - freevnodes when called from vnlru_proc or 1 when
called from getnewvnode(). For now, getnewvnode() still tries to reclaim
a free vnode before creating a new one when we are near the limit.
- Define a function, vdestroy, which handles the actual release of memory
and teardown of locks, etc. This could become a uma_dtor() routine.
- Get rid of minvnodes. Now wantfreevnodes is 1/4th the max vnodes. This
keeps more unreferenced vnodes around so that files which have only
been stat'd are less likely to be kicked out of the system before we
have a chance to read them, etc. These vnodes may still be freed via
the normal vnlru_proc() routines which may some day become a real lru.
of the device id.
- Use BAR2 rather than BAR0 for the Rocketport UPCI 8O card. I suspect
that other UPCI cards might need to use BAR2 as well.
Tested by: wsk at gddsn dot org dot cn
MFC after: 1 week
actual root file system is mounted, the first entry on the mountlist
is not the root file system and the timestamp for that entry is
typically 0. Passing that to inittodr() caused annoying errors on
alpha and ia64.
So, call inittodr() for all file systems on mountlist, but only when
the timestamp (mnt_time) is non-zero.
with the wrong language parameter when retrieving the device serial
number. This invalid request caused some devices not to work at
all.
PR: usb/79190
Submitted by: Hans Petter Selasky <hselasky@c2i.net>
Instead, explicitly enable them when we setup the interrupt handler.
Also, move the setting of stathz and profhz down to the same place so
that the code flow is simpler and easier to follow.
- Don't setup an interrupt handler for IRQ0 if we are using the lapic timer
as it doesn't do anything productive in that case.
Replace a KASSERT of LINUX_IFNAMSIZ == IFNAMSIZ with a preprocessor
check and #error message. This will prevent nasty suprises if users
change IFNAMSIZ without updating the linux code appropriatly.
lockmgr locks that this thread owns. This is complicated due to
LK_KERNPROC and because lockmgr tolerates unlocking an unlocked lock.
Sponsored by: Isilon Systes, Inc.
these filesystems will support shared locks until they are explicitly
modified to do so. Careful review must be done to ensure that this
is safe for each individual filesystem.
Sponsored by: Isilon Systems, Inc.
these filesystems will support shared locks until they are explicitly
modified to do so. Careful review must be done to ensure that this
is safe for each individual filesystem.
Sponsored by: Isilon Systems, Inc.
lookup to do shared locks on the root. Filesystems are free to ignore
flags and instead acquire an exclusive lock if they do not support
shared locks.
Sponsored by: Isilon Systems, Inc.
before it can call VOP_INACTIVE(). This must use the EXCLUPGRADE path
because we may violate some lock order with another locked vnode if
we drop and reacquire the lock. If EXCLUPGRADE fails, we mark the
vnode with VI_OWEINACT. This case should be very rare.
- Clear VI_OWEINACT in vinactive() and vbusy().
- If VI_OWEINACT is set in vgone() do the VOP_INACTIVE call here as well.
Sponsored by: Isilon Systems, Inc.
necessary since we disable the shared locks in vfs_cache, but it is
prefered that the option not leak out into filesystems when it is
disabled.
Sponsored by: Isilon Systems, Inc.
config option have now been fixed. All filesystems are properly locked
and checked via DEBUG_VFS_LOCKS. Remove the workaround code.
Sponsored by: Isilon Systems, Inc.
non-maskable).
- The NFS client needs to guard against spurious wakeups
while waiting for the response. ltrace causes the process
under question to wakeup (possibly from ptrace()), which
causes NFS to wakeup from tsleep without the response being
delivered.
Submitted by: Mohan Srinivasan
disables tag queuing temporarily in order to allow controllers a window
to safely perform transfer negotiation with non-compliant devices. Before
this change, CAM would restore the queue depth to the controller specified
maximum or device quirk level rather than any depth determined by reactions
to QUEUE FULL/BUSY events or an explicit user setting.
During device probe, initialize the flags field for XPT_SCAN_BUS.
The uninitialized value often confused CAM into not bothering to
issue an AC_FOUND_DEVICE async event for new devices. The reason
this bug wasn't reported earlier is that CAM manually announces
devices after the initial system bus scans.
MFC: 3 days
svr4_do_getmsg(). In principle this bug could disclose data from
kernel memory, but in practice, the SVR4 emulation layer is probably
not functional enough to cause the relevant code path to be executed.
In any case, the emulator has been disconnected from the build since
5.0-RELEASE.
Found by: Coverity Prevent analysis tool
to kmem_alloc(). Failure to do this made it possible for user
processes to cause a hard lock on i386 kernels. I believe this only
affects 6-CURRENT on or after 2005-01-26.
Found by: Coverity Prevent analysis tool
Security: Local DOS
with the IP_HDRINCL option set. Without this change, a Linux process
with access to a raw socket could cause a kernel panic. Raw sockets
must be created by root, and are generally not consigned to untrusted
applications; hence, the security implications of this bug are
minimal. I believe this only affects 6-CURRENT on or after 2005-01-30.
Found by: Coverity Prevent analysis tool
Security: Local DOS
validation error in procfs/linprocfs that can be exploited by local
users to cause a kernel panic. All versions of FreeBSD with the patch
referenced in SA-04:17.procfs have this bug, but versions without that
patch have a more serious bug instead. This problem only affects
systems on which procfs or linprocfs is mounted.
Found by: Coverity Prevent analysis tool
Security: Local DOS
FreeBSD based on aue(4) it was picked by OpenBSD, then from OpenBSD ported
to NetBSD and finally NetBSD version merged with original one goes into
FreeBSD.
Obtained from: http://www.gank.org/freebsd/cdce/
NetBSD
OpenBSD
be pass-thru mode, when traffic is not copied by ng_tee, but passed thru
ng_netflow.
Changes made:
- In ng_netflow_rcvdata() do all necessary pulluping: Ethernet header,
IP header, and TCP/UDP header.
- Pass only pointer to struct ip to ng_netflow_flow_add(). Any TCP/UDP
headers are guaranteed to by after it.
- Merge make_flow_rec() function into ng_netflow_flow_add().
be pass-thru mode, when traffic is not copied by ng_tee, but passed thru
ng_netflow.
Changes made:
- In ng_netflow_rcvdata() do all necessary pulluping: Ethernet header,
IP header, and TCP/UDP header.
- Pass only pointer to struct ip to ng_netflow_flow_add(). Any TCP/UDP
headers are guaranteed to by after it.
- Merge make_flow_rec() function into ng_netflow_flow_add().
modern CPUs that have multiple VID#s that aren't detectable via public
methods. We use the control value from acpi_perf as the id16 for setting
a given frequency.
Add two another workarounds for carp(4) interfaces:
- do not add connected route when address is assigned to carp(4) interface
- do not add connected route when other interface goes down
Embrace workarounds with #ifdef DEV_CARP
(like an EC/SMbus controller) to access the EC address space. Access
is synchronized by the EcLock/Unlock routines in EcSpaceHandler().
Tested by: Hans Petter Selasky
configure_final(), assert that "cold" is true in usb_cold_explore()
when there are busses to explore. When USB is kldloaded after boot,
usb_cold_explore() will still get invoked but the list of busses
to explore in that case should always be empty.
transfer, which lead to panics or page faults. For example if a
transfer timed out, another thread could come along and attempt to
abort the same transfer while the timeout task was sleeping in
the *_abort_xfer() function.
Add an "aborting" flag to the private transfer state in each host
controller driver and use this to ensure that the abort is only
executed once. Also prioritise normal abort requests over timeouts
so that the callback is always given a status of USB_CANCELLED even
if the timeout-initiated abort began first.
The crashes caused by this bug were mainly reported in connection
with lpd printing to a USB printer.
PR: usb/78208, usb/78986
inevitable component in Sun Exx00 machines and provides serial ports,
NVRAM and TOD amongst others which are handled by uart(4) and eeprom(4)
respectively). This driver currently only prints out information about
the chassis on attach and allows to blink the 'Cycling' LED (which is
duplicated on the front panel) of the clock board just like fhc(4) does
for the other boards. The device name for the LED is /dev/led/clockboard.
Obtained from: OpenBSD
Tested by: joerg
bus_generic_rl_release_resource() for the bus_release_resource() method
instead of a local copy.
- Correctly handle pass-through allocations in fhc_alloc_resource().
- In case the board model can't be determined just print "unknown model"
so the physical slot number is reported in any case.
- Add support for blinking the 'Cycling' LED of boards on a fhc(4) hanging
of off the nexus (i.e. all boards except the clock board) via led(4).
All boards have at least 3 controllable status LEDs, 'Power', 'Failure'
and 'Cycling'. While the 'Cycling' LED is suitable for signaling from
the OS the others are better off being controlled by the firmware.
The device name for the 'Cycling' LED of each board is /dev/led/boardX
where X is the physical slot number of the board. [1]
Obtained from: OpenBSD [1]
Tested by: joerg [1]
bus_generic_rl_release_resource() for the bus_release_resource() method
instead of a local copy.
- Correctly handle pass-through allocations in central_alloc_resource().
This is mentioned in the Handbook but it is not as obvious to new
users why bpf is needed compared to the other largely self-explanatory
items in GENERIC.
PR: conf/40855
MFC after: 1 week
last in the list rather than first.
This makes the resouces print in the 4.x order rather than the 5.x order
(eg fdc0 at 0x3f0-0x3f5,0x3f7 is 4.x, but 0x3f7,0x3f0-0x3f5 is 5.x). This
also means that the pci code will once again print the resources in BAR
ascending order.
w/o problems than I was before... This simply brings back the knote_delete
as knlist_delete which will also drop the knote's, instead of just clearing
the list and seeing _ONESHOT...
Fix a race where if a note was _INFLUX and _DETACHED, it could end up being
modified... whoopse..
MFC after: 1 week
Prodded by: ambrisko and dwhite
same value as the previous ioctls so no binary change. Also, make a few
style changes to reduce diffs to my tree.
Loosely based on code from: Hans Petter Selasky
system have been attached, but no later. This ensures that we do
not explore ohci or uhci busses before the companion echi controller
has been initialised, so it should fix the problem of multi-speed
USB devices getting attached as USB 1 devices first and then
re-attached as USB 2.
Some further changes are needed on architectures that do not currently
allow hooks to be inserted before configure_final() - alpha, ia64,
powerpc and sparc64. On these architectures the exploration will
now be delayed until the usb kthread runs.
alignment restrictive, and help performance on some ethernet cards which
currently copy the entire packet a couple bytes to get the packet aligned
properly...
Wordsmithing by: dwhite
Obtained from: NetBSD (code only)
I'll clean it up later: rwatson
in the window between the beginning of panic() and entering the debugger,
it's possible to receive interrupts. If we receive an interrupt, don't
preempt if panicstr != NULL, as the system is in the process of failing, and
the preempting thread is likely to stumble over the failure. The typical
scenario is during the printf() in panic() prior to entering the debugger,
but when running with a slower console type such as serial console.
It could be that the panic string should be passed to the debugger to print,
so that it can run from the debugger's environment rather than a regular
kernel printf.
Glanced at by: jhb
Do our best to plug some memory leaks (VPD data, jumbo memory buffer,...).
Log if we cannot free because memory still in use[1].
Change locking to avoid ''acquiring duplicate lock of same
type: "network driver"'' and potential deadlock. Also seems to fix LOR #063.
[1] This change does not solve problems if buffers are still in use when
unloading if_sk.ko. There is ongoing work which will address jumbogram
allocations in a more general way.
PR: kern/75677 (with changes, no mii fixes in here)
Tested by: net, Antoine Brodin (slightly different version)
Approved by: rwatson (mentor)
MFC after: 5 days
Obtained from: NetBSD if_sk.c rev. 1.11
* Take PHY out of reset for Yukon Lite Rev. A3.
Submitted by: postings on net@ in thread "skc0: no PHY found", 2005-02-22
Tested by: net
Approved by: rwatson (mentor)
MFC after: 5 days
if the interface is marked RUNNING.
Obtained from: NetBSD if_sk.c rev. 1.12
* Don't initialize the card (and start an autonegotiation) every time the IP
address changes. Makes 'dhclient sk0' invocations way faster and more
consistant. i.e. one DHCPREQUEST elicits the DHCPACK.
Obtained from: OpenBSD if_sk.c rev. 1.56
* Additional locking changes in sk_ioctl.
PR: kern/61296 should see improvements by the last two.
Approved by: rwatson (mentor)
MFC after: 5 days
doing that in bfe_stop().
This should fix a panic recently reported on -current occuring when taking
device down then up. In the original implementation, an "ifconfig bfe0 down"
triggers bfe_stop(), which also destroys all TX/RX descriptor dmamaps. Hence
the subsequent "ifconfig bfe0 up" would force the device to use those
already-released dmamap and thus panic the kernel.
PR: kern/77804
Submitted by: Frank Mayhar <frank at exit dot com>
Reviewed by: dmlb, sam (mentor)
Tested by: Phil <pcasidy at casidy dot com>, myself
MFC after: 1 week
session in tprintf(). SESSRELE() needs to properly dispose of the
sessions mutex.
Add sessrele() which does the proper cleanup and have SESSRELE() call it.
Use SESSRELE also in pgdelete().
Found by: Coverity (ID:526)
a given page and, if the pmap is the current pmap, write back the associated
cache line.
Use pmap_wb_page in pmap_qenter() instead of inconditionally
write back/invalidating the data cache.
o Use IP_NPX in preference to hard coded value to write 0 to clear busy#
o Use md macro for a full reset of the npx
o Use IRQ_NPX in preference to hard coded value for each platform.
# The other two ifdefs in this file are hard to remove
rman_resource_resournce_bound wrt end parameter. The end parameter
here was the same as the start. However, it should be start + count -
1, so make it that instead.
it to get better hashing in vfs_hash.
In case of an insert collision in vfs_hash_insert(), put the loosing vnode
on a special list so that vfs_hash_remove() can just assume that it is on
a list.
Drop the VI_HASHED flag.
where having this disabled was actually hurting us, since so many
BIOSes include legacy USB emulation that takes control of all usb
ports and only the ehci driver knows how to disable it.
instead of failing.
When looking for a region to allocate, we used to check to see if the
start address was < end. In the case where A..B is allocated already,
and one wants to allocate A..C (B < C), then this test would
improperly fail (which means we'd examine that region as a possible
one), and we'd return the region B+1..C+(B-A+1) rather than NULL.
Since C+(B-A+1) is necessarily larger than C (end argument), this is
incorrect behavior for rman_reserve_resource_bound().
The fix is to exclude those regions where r->r_start + count - 1 > end
rather than r->r_start > end. This bug has been in this code for a
very long time. I believe that all other tests against end are
correctly done.
This is why sio0 generated a message about interrupts not being
enabled properly for the device. When fdc had a bug that allocated
from 0x3f7 to 0x3fb, sio0 was then given 0x3fc-0x404 rather than the
0x3f8-0x3ff that it wanted. Now when fdc has the same bug, sio0 fails
to allocate its ports, which is the proper behavior. Since the probe
failed, we never saw the messed up resources reported.
I suspect that there are other places in the tree that have weird
looping or other odd work arounds to try to cope with the observed
weirdness this bug can introduce. These workarounds should be located
and eliminated.
Minor debug write fix to match the above test done as well.
'nice' by: mdodd
Sponsored by: timing solutions (http://www.timing.com/)
I think all we really need is -fno-sse2.
I really don't like cluttering up the compiler invocation,
but this bigger hammer will fix reported problems for now.
to mistakes from day 1, it has always had semantics inconsistent with
SVR4 and its successors. In particular, given argument M:
- On Solaris and FreeBSD/{alpha,sparc64}, it clobbers the old flags
and *sets* the new flag word to M. (NetBSD, too?)
- On FreeBSD/{amd64,i386}, it *clears* the flags that are specified in M
and leaves the remaining flags unchanged (modulo a small bug on amd64.)
- On FreeBSD/ia64, it is not implemented.
There is no way to fix fpsetsticky() to DTRT for both old FreeBSD apps
and apps ported from other operating systems, so the best approach
seems to be to kill the function and fix any apps that break. I
couldn't find any ports that use it, and any such ports would already
be broken on FreeBSD/ia64 and Linux anyway.
By the way, the routine has always been undocumented in FreeBSD,
except for an MLINK to a manpage that doesn't describe it. This
manpage has stated since 5.3-RELEASE that the functions it describes
are deprecated, so that must mean that functions that it is *supposed*
to describe but doesn't are even *more* deprecated. ;-)
Note that fpresetsticky() has been retained on FreeBSD/i386. As far
as I can tell, no other operating systems or ports of FreeBSD
implement it, so there's nothing for it to be inconsistent with.
PR: 75862
Suggested by: bde
- Move VSHOULDBUSY, VSHOULDFREE, and VTRYRECYCLE into vfs_subr.c so
no one else attempts to grow a dependency on them.
- Now that objects with pages hold the vnode we don't have to do unlocked
checks for the page count in the vm object in VSHOULDFREE. These three
macros could simply check for holdcnt state transitions to determine
whether the vnode is on the free list already, but the extra safety
the flag affords us is probably worth the minimal cost.
- The leafonly sysctl and code have been dead for several years now,
remove the sysctl and the code that employed it from vtryrecycle().
- vtryrecycle() also no longer has to check the object's page count as
the object holds the vnode until it reaches 0.
Sponsored by: Isilon Systems, Inc.
is inserted.
- In vm_page_remove() drop the backing vnode when the last page
is removed.
- Don't check the vnode to see if it must be reclaimed on every
call to vm_page_free_toq() as we only check it now when it is
actually required. This saves us two lock operations per call.
Sponsored by: Isilon Systems, Inc.
that they set v->v_vnlock. This is true for all filesystems in the
tree.
- Remove all uses of LK_THISLAYER. If the lower layer is locked, the
null layer is locked. We only use vget() to get a reference now.
null essentially does no locking. This fixes LOOKUP_SHARED with
nullfs.
- Remove the special LK_DRAIN considerations, I do not believe this is
needed now as LK_DRAIN doesn't destroy the lower vnode's lock, and
it's hardly used anymore.
- Add one well commented hack to prevent the lowervp from going away
while we're in it's VOP_LOCK routine. This can only happen if we're
forcibly unmounted while some callers are waiting in the lock. In
this case the lowervp could be recycled after we drop our last ref
in null_reclaim(). Prevent this with a vhold().
that we free that resource. All the other resources are freed in
their own routine, but since we haven't saved a pointer to this one,
it is leaked. This is the failure case that lead to the sio ports
that weren't working, I think.
sys/bus_dma.h instead of being copied in every single arch. This slightly
reorders a flag that was specific to AXP and thus changes the ABI there.
The interface still relies on bus_space definitions found in <machine/bus.h>
so it cannot be included on its own yet, but that will be fixed at a later
date. Add an MD <machine/bus_dma.h> for ever arch for consistency and to
allow for future MD augmentation of the API. sparc64 makes heavy use of
this right now due to its different bus_dma implemenation.
as suggested by Matt's comment. Also fix some style and paranoia issues.
The entire function could benefit from review by a VFS guru.
MFC after: 6 weeks
Since we used an sbuf of size resid to accumulate dirents, we would end
up returning one byte short when we had enough dirents to fill or exceed
the size of the sbuf (the last byte being lost to bogus NUL termination)
causing the next call to return EINVAL due to an unaligned offset. This
went undetected for a long time because I did most of my testing in
single-user mode, where there are rarely enough processes to fill the
4096-byte buffer ls(1) uses. The most common symptom of this bug is that
tab completion of /proc or /compat/linux/proc does not work properly when
many processes are running.
Also, a check near the top would return EINVAL if resid was smaller than
PFS_DELEN, even if it was 0, which is frequently the case and perfectly
allowable. Change the test so that it returns 0 if resid is 0.
MFC after: 2 weeks
to get from (mount + inode) to vnode. These tables are mostly
copy&pasted from UFS, sized based on desiredvnodes and therefore
quite large (128K-512K). Several filesystems are buggy enough that
they allocate the hash table even before they know if they will
ever be used or not.
Add "vfs_hash", a system wide hash table, which will replace all
the per-filesystem hash-tables.
The fields we add to struct vnode will more or less be saved in
the respective filesystems inodes.
Having one central implementation will save code and will allow us
to justify the complexity of code to dynamically (re)size the hash
at a later point.
to use only the holdcnt to determine whether a vnode may be recycled,
simplifying the V* macros as well as vtryrecycle(), etc.
Sponsored by: Isilon Systems, Inc.
vtryrecycle(). All obj refs also ref the vnode.
- Consistently use v_incr_usecount() to increment the usecount. This will
be more important later.
Sponsored by: Isilon Systems, Inc.
cleared if the host controller retries the transfer and is successful,
but we were interpreting these bits as indicating a fatal error.
Ignore these error bits, and instead use the HALTED bit to determine
if the transfer failed. Also update the USBD_STALLED detection to
ignore these bits.
Obtained from: OpenBSD
the filesystem. Check that rather than VI_XLOCK.
- VOP_INACTIVE should no longer drop the vnode lock.
- The vnode lock is required around calls to vrecycle() and vgone().
Sponsored by: Isilon Systems, Inc.
vnode lock. Remove the c_lock and use the vn lock in its place.
- Keep the coda lock functions so that the debugging information is
preserved, but call directly to the vop_std*lock routines for the real
functionality.
Sponsored by: Isilon Systems, Inc.
the filesystem. Check that rather than VI_XLOCK.
- Shorten ffs_reload by one step. The old check for an inactive vnode
was slightly racey, and the code which deals with still active vnodes
is not much more expensive.
Sponsored by: Isilon Systems, Inc.
required.
- In ufs_close(), don't do the EAGAIN vrele hack, the top layer now calls
vn_start_write before the lock is acquired as it should.
Sponsored by: Isilon Systems, Inc.
- Also in ufs_inactive, don't acquire the vnode interlock where it isn't
strictly needed. Also owning the vnode interlock while calling vprint()
will cause locking assertions to trip.
Sponsored by: Isilon Systems, Inc.
on an unlinked file. We can't know if this is the case until after we
have the lock.
- Lock the vnode in vn_close, many filesystems had code which was unsafe
without the lock held, and holding it greatly simplifies vgone().
- Adjust vn_lock() to check for the VI_DOOMED flag where appropriate.
Sponsored by: Isilon Systems, Inc.
- Add a vn_start_write/vn_finished_write around vlrureclaim so we don't do
writing ops without suspending. This could suspend the vlruproc which
should not be a problem under normal circumstances.
- Manually implement VMIGHTFREE in vlrureclaim as this was the only instance
where it was used.
- Acquire a lock before calling vgone() as it now requires it.
- Move the acquisition of the vnode interlock from vtryrecycle() to
getnewvnode() so that if it fails we don't drop and reacquire the
vnode_free_list_mtx.
- Check for a usecount or holdcount at the end of vtryrecycle() in case
someone grabbed a ref while we were recycling. Abort the recycle, and
on the final ref drop this vnode will be placed on the head of the free
list.
- Move the redundant VOP_INACTIVE protection code into the local
vinactive() routine to avoid code bloat.
- Keep the vnode lock held across calls to vgone() in several places.
- vgonel() no longer uses XLOCK, instead callers must hold an exclusive
vnode lock. The VI_DOOMED flag is set to allow other threads to detect
a vnode which is no longer valid. This flag is set until the last
reference is gone, and there are no chances for a new ref. vgonel()
holds this lock across the entire function, which greatly simplifies
logic.
_ Only vfree() in one place in vgone() not three.
- Adjust vget() to check the VI_DOOMED flag prior to waiting on the lock
in the LK_NOWAIT case. In other cases, check after we have slept and
acquired an exlusive lock. This will simulate the old vx_wait()
behavior.
Sponsored by: Isilon Systems, Inc.
against recycling.
- Modify VSHOULDFREE, VCANRECYCLE, etc. now that certain flags are no
longer important. Remove VMIGHTFREE as it is only used in one place.
Sponsored by: Isilon Systems, Inc.
between passes over a QH. Previously the accesses to a QH were
bunched together in time, so the interval was often much longer
than intended. This now appears to match the diagrams in the EHCI
spec, so remove the XXX comment.
to stll be able to mount NFS root as prescribed by DCHP configuration. Since
pxeboot is using TFTP to get to the files, pxeboot can not rely on NFS to
provide it a root directory hande as a side effect. pxeboot has to make RPC
mount call itself.
a serial console anyway because input-device is set to keyboard and
output-device is set to screen but no keyboard is plugged in don't
assume that a device node for the input-device alias exists. While
this is true for RS232 keyboards (the node of the SCC and UART
respectively which controls the keyboard doesn't disappear when no
keyboard is plugged in) this assumption breaks for USB keyboards.
It's most likely also not true for PS/2 keyboards but OFW doesn't
reliably switch to a serial console when the potential keyboard is
a PS/2 one which isn't plugged in so this couldn't be verified
properly.
Reported by: Will Andrews <will@csociety.org>, obrien
MFC after: 1 week
so that the socket lock is held over the test-and-set removal of the
accept filter option during connect, and the two socket mutex regions
(transition to connected, perform accept filter) are combined.
from uipc_socket.c to uipc_accf.c in do_getopt_accept_filter(), so that it
now matches do_setopt_accept_filter(). Slightly reformulate the logic to
match the optimistic allocation of storage for the argument in advance,
and slightly expand the coverage of the socket lock.
OR the physical address with alpha_XXX_dmamap_or to get the DMA address,
like the name of the variable suggests. However, while we were doing
this correctly in the alpha_XXX_dmamap() macro, the busdma code added
the variable to the physical address instead of or'ing it. Fortunately
and if my math is not entirely wrong, you would need more than 128GB of
RAM and a device able to do DMA in 64bits to experience the bug.
Spotted by: cognet
long filename. Each substring is indexed by the windows ID, a
sequential one-based value. The previous code was extremely slow,
doing a malloc/strcpy/free for each substring.
This code optimizes these routines with this in mind, using the ID
to index into a single array and concatenating each WIN_CHARS chunk
at once. (The last chunk is variable-length.)
This code has been tested as working on an FS with difficult filename
sizes (255, 13, 26, etc.) It gives a 77.1% decrease in profiled
time (total across all functions) and a 73.7% decrease in wall time.
Test was "ls -laR > /dev/null".
Per-function time savings:
mbnambuf_init: -90.7%
mbnambuf_write: -18.7%
mbnambuf_flush: -67.1%
MFC after: 1 month
socket lock around knlist_init(), so don't.
Hard code the setting of the socket reference count to 1 rather than
using soref() to avoid asserting the socket lock, since we've not yet
exposed the socket to other threads.
This removes two mutex operations from each socket allocation.
so that the socket does not generate SIGPIPE, only EPIPE, when a write
is attempted after socket shutdown. When the option was introduced in
2002, this required the logic for determining whether SIGPIPE was
generated to be pushed down from dofilewrite() to the socket layer so
that the socket options could be considered. However, the change in
2002 omitted modification to soo_write() required to add that logic,
resulting in SIGPIPE not being generated even without SO_NOSIGPIPE when
the socket was written to using write() or related generic system calls.
This change adds the EPIPE logic to soo_write(), generating a SIGPIPE
signal to the process associated with the passed uio in the event that
the SO_NOSIGPIPE option is not set.
Notes:
- The are upsides and downsides to placing this logic in the socket
layer as opposed to the file descriptor layer. This is really fd
layer logic, but because we need so_options, we have a choice of
layering violations and pick this one.
- SIGPIPE possibly should be delivered to the thread performing the
write, not the process performing the write.
- uio->uio_td and the td argument to soo_write() might potentially
differ; we use the thread in the uio argument.
- The "sigpipe" regression test in src/tools/regression/sockets/sigpipe
tests for the bug.
Submitted by: Mikko Tyolajarvi <mbsd at pacbell dot net>
Talked with: glebius, alfred
PR: 78478
MFC after: 1 week
to syncrhonize access to the data as a result. This makes the pps
less likely to miss the 1ms pulse that I'm feeding it, but not
entirely reliable yet on my 133MHz P5.
Reviewed by: phk
unknown (since my sony vaio didn't :-(.
Instead, fix the problem described by 1.49 in a different way: just
add the two calls I'd hoped I'd avoid in 1.49 by doing the (wrong)
gymnastics there. While 1.49 is a good direction to go in, each step
of the way should work :-(.
resources. When allocating 6 ports for a 4 port range isa code
returns an error. I'm not sure yet why this is the case, but suspect
it is just a non-regularity in how the resource allocation code works
which should be corrected. Use 1 as the ports size in this case.
However, in the hints case, we have to specify the length, so use 6 in
that case. I believe that this is also acpi friendly.
Also, complain when we can't allocate FDOUT register space. Right now we
silently fail when we can't. This failure is referred to above.
When there's no resource for FDCTL, go ahead and allocate one by hand.
Many PNPBIOS tables don't list this resource, and our hints mechanism also
doesn't cover that range. If we can't allocate it, whine, but fake up
something. Before, we were always bogusly faking it and no one noticed
the sham (save the original author who has now fixed his private shame).
per-connection and globally. This eliminates potential DoS attacks
where SACK scoreboard elements tie up too much memory.
Submitted by: Raja Mukerji (raja at moselle dot com).
Reviewed by: Mohan Srinivasan (mohans at yahoo-inc dot com).
asks that each buffer be (2048 * 256) bytes long. I suspect that alignment
isn't a real requirement since busdma only recently started honoring it. The
size is also bogus. Fix both of these and stop busdma from trying to
exhaust the system memory pool with bounce pages.
Submitted by: Kevin Oberman
MFC After: 7 days
- Fix a bug in the same condition where we forgot to drop the ACPI pcib
lock. This fixes hangs after the pcib0 attach on some machines.
Tested by: sos (2)
SIGPIPE signal for the duration of the sento-family syscalls. Use it to
replace previously added hack in Linux layer based on temporarily setting
SO_NOSIGPIPE flag.
Suggested by: alfred
Add support for passing in a mutex. If NULL is passed a global
subr_unit mutex is used.
Add alloc_unrl() which expects the mutex to be held.
Allocating a unit will never sleep as it does not need to allocate
memory.
Cut possible range in half so we can use -1 to mean "out of number".
Collapse first and last runs into the head by means of counters.
This saves memory in the common case(s).
ever working correctly: the code was linking the QHs together but
then immediately overwriting the "next" pointers. Oops. Also
initialise qh_endphub, since the EHCI spec says that we should
always set the pipe multiplier field to something sensible.
This appears to make basic split transactions work, so enable split
transactions for control, bulk and interrupt pipes (split isochronous
transfers are not yet implemented). It should now be possible to
use USB1 devices even when they are connected through a USB2 hub.
seem to be necessary anymore, and it prevents tasting a valid drive
when booting with geom_vinum already loaded, since SCSI disks set their
sectorsize not until first opening them.
from an mbuf into the fxp_encap() function, as done in other drivers.
- Don't waste time calling bus_dmamap_load_mbuf() if we know the mbuf
chain is too long to fit in a TX descriptor, call m_defrag() first.
- Convert fxp(4) to use bus_dmamap_load_mbuf_sg().
for the duration of the send() call. Such approach may be less than ideal
in threading environment, when several threads share the same socket and it
might happen that several of them are calling linux_send() at the same time
with and without SO_NOSIGPIPE set.
However, such race condition is very unlikely in practice, therefore this
change provides practical improvement compared to the previous behaviour.
PR: kern/76426
Submitted by: Steven Hartland <killing@multiplay.co.uk>
MFC after: 3 days
at some point result in a status event being triggered (it should
be a link down event: the Microsoft driver design guide says you
should generate one when the NIC is initialized). Some drivers
generate the event during MiniportInitialize(), such that by the
time MiniportInitialize() completes, the NIC is ready to go. But
some drivers, in particular the ones for Atheros wireless NICs,
don't generate the event until after a device interrupt occurs
at some point after MiniportInitialize() has completed.
The gotcha is that you have to wait until the link status event
occurs one way or the other before you try to fiddle with any
settings (ssid, channel, etc...). For the drivers that set the
event sycnhronously this isn't a problem, but for the others
we have to pause after calling ndis_init_nic() and wait for the event
to arrive before continuing. Failing to wait can cause big trouble:
on my SMP system, calling ndis_setstate_80211() after ndis_init_nic()
completes, but _before_ the link event arrives, will lock up or
reset the system.
What we do now is check to see if a link event arrived while
ndis_init_nic() was running, and if it didn't we msleep() until
it does.
Along the way, I discovered a few other problems:
- Defered procedure calls run at PASSIVE_LEVEL, not DISPATCH_LEVEL.
ntoskrnl_run_dpc() has been fixed accordingly. (I read the documentation
wrong.)
- Similarly, the NDIS interrupt handler, which is essentially a
DPC, also doesn't need to run at DISPATCH_LEVEL. ndis_intrtask()
has been fixed accordingly.
- MiniportQueryInformation() and MiniportSetInformation() run at
DISPATCH_LEVEL, and each request must complete before another
can be submitted. ndis_get_info() and ndis_set_info() have been
fixed accordingly.
- Turned the sleep lock that guards the NDIS thread job list into
a spin lock. We never do anything with this lock held except manage
the job list (no other locks are held), so it's safe to do this,
and it's possible that ndis_sched() and ndis_unsched() can be
called from DISPATCH_LEVEL, so using a sleep lock here is
semantically incorrect. Also updated subr_witness.c to add the
lock to the order list.
- "options" is followed by the characters \040\011, not \011\011.
Correct both my own sins and those of others.
- Comment blocks start and end with an empty line ^#$.
- Remove non-standard comments added in my last commit.
Requested by: njl
Correctness confirmed by: bde
since there are often significant holes in the memory map due to the
kernel, loader and OFW data structures not being included: Maxmem is
the highest available, so can be misleading.