Commit Graph

114114 Commits

Author SHA1 Message Date
Bill Paul
7c1968ad82 Finally bring an end to the great "make the Atheros NDIS driver
work on SMP" saga. After several weeks and much gnashing of teeth,
I have finally tracked down all the problems, despite their best
efforts to confound and annoy me.

Problem nunmber one: the Atheros windows driver is _NOT_ a de-serialized
miniport! It used to be that NDIS drivers relied on the NDIS library
itself for all their locking and serialization needs. Transmit packet
queues were all handled internally by NDIS, and all calls to
MiniportXXX() routines were guaranteed to be appropriately serialized.
This proved to be a performance problem however, and Microsoft
introduced de-serialized miniports with the NDIS 5.x spec. Microsoft
still supports serialized miniports, but recommends that all new drivers
written for Windows XP and later be deserialized. Apparently Atheros
wasn't listening when they said this.

This means (among other things) that we have to serialize calls to
MiniportSendPackets(). We also have to serialize calls to MiniportTimer()
that are triggered via the NdisMInitializeTimer() routine. It finally
dawned on me why NdisMInitializeTimer() takes a special
NDIS_MINIPORT_TIMER structure and a pointer to the miniport block:
the timer callback must be serialized, and it's only by saving the
miniport block handle that we can get access to the serialization
lock during the timer callback.

Problem number two: haunted hardware. The thing that was _really_
driving me absolutely bonkers for the longest time is that, for some
reason I couldn't understand, my test machine would occasionally freeze
or more frustratingly, reset completely. That's reset and in *pow!*
back to the BIOS startup. No panic, no crashdump, just a reset. This
appeared to happen most often when MiniportReset() was called. (As
to why MiniportReset() was being called, see problem three below.)
I thought maybe I had created some sort of horrible deadlock
condition in the process of adding the serialization, but after three
weeks, at least 6 different locking implementations and heroic efforts
to debug the spinlock code, the machine still kept resetting. Finally,
I started single stepping through the MiniportReset() routine in
the driver using the kernel debugger, and this ultimately led me to
the source of the problem.

One of the last things the Atheros MiniportReset() routine does is
call NdisReadPciSlotInformation() several times to inspect a portion
of the device's PCI config space. It reads the same chunk of config
space repeatedly, in rapid succession. Presumeably, it's polling
the hardware for some sort of event. The reset occurs partway through
this process. I discovered that when I single-stepped through this
portion of the routine, the reset didn't occur. So I inserted a 1
microsecond delay into the read loop in NdisReadPciSlotInformation().
Suddenly, the reset was gone!!

I'm still very puzzled by the whole thing. What I suspect is happening
is that reading the PCI config space so quickly is causing a severe
PCI bus error. My test system is a Sun w2100z dual Opteron system,
and the NIC is a miniPCI card mounted in a miniPCI-to-PCI carrier card,
plugged into a 100Mhz PCI slot. It's possible that this combination of
hardware causes a bus protocol violation in this scenario which leads
to a fatal machine check. This is pure speculation though. Really all I
know for sure is that inserting the delay makes the problem go away.
(To quote Homer Simpson: "I don't know how it works, but fire makes
it good!")

Problem number three: NdisAllocatePacket() needs to make sure to
initialize the npp_validcounts field in the 'private' section of
the NDIS_PACKET structure. The reason if_ndis was calling the
MiniportReset() routine in the first place is that packet transmits
were sometimes hanging. When sending a packet, an NDIS driver will
call NdisQueryPacket() to learn how many physical buffers the packet
resides in. NdisQueryPacket() is actually a macro, which traverses
the NDIS_BUFFER list attached to the NDIS_PACKET and stashes some
of the results in the 'private' section of the NDIS_PACKET. It also
sets the npp_validcounts field to TRUE To indicate that the results are
now valid. The problem is, now that if_ndis creates a pool of transmit
packets via NdisAllocatePacketPool(), it's important that each time
a new packet is allocated via NdisAllocatePacket() that validcounts
be initialized to FALSE. If it isn't, and a previously transmitted
NDIS_PACKET is pulled out of the pool, it may contain stale data
from a previous transmission which won't get updated by NdisQueryPacket().
This would cause the driver to miscompute the number of fragments
for a given packet, and botch the transmission.

Fixing these three problems seems to make the Atheros driver happy
on SMP, which hopefully means other serialized miniports will be
happy too.

And there was much rejoicing.

Other stuff fixed along the way:

- Modified ndis_thsuspend() to take a mutex as an argument. This
  allows KeWaitForSingleObject() and KeWaitForMultipleObjects() to
  avoid any possible race conditions with other routines that
  use the dispatcher lock.

- Fixed KeCancelTimer() so that it returns the correct value for
  'pending' according to the Microsoft documentation

- Modfied NdisGetSystemUpTime() to use ticks and hz rather than
  calling nanouptime(). Also added comment that this routine wraps
  after 49.7 days.

- Added macros for KeAcquireSpinLock()/KeReleaseSpinLock() to hide
  all the MSCALL() goop.

- For x86, KeAcquireSpinLockRaiseToDpc() needs to be a separate
  function. This is because it's supposed to be _stdcall on the x86
  arch, whereas KeAcquireSpinLock() is supposed to be _fastcall.
  On amd64, all routines use the same calling convention so we can
  just map KeAcquireSpinLockRaiseToDpc() directly to KfAcquireSpinLock()
  and it will work. (The _fastcall attribute is a no-op on amd64.)

- Implement and use IoInitializeDpcRequest() and IoRequestDpc() (they're
  just macros) and use them for interrupt handling. This allows us to
  move the ndis_intrtask() routine from if_ndis.c to kern_ndis.c.

- Fix the MmInitializeMdl() macro so that is uses sizeof(vm_offset_t)
  when computing mdl_size instead of uint32_t, so that it matches the
  MmSizeOfMdl() routine.

- Change a could of M_WAITOKs to M_NOWAITs in the unicode routines in
  subr_ndis.c.

- Use the dispatcher lock a little more consistently in subr_ntoskrnl.c.

- Get rid of the "wait for link event" hack in ndis_init(). Now that
  I fixed NdisReadPciSlotInformation(), it seems I don't need it anymore.
  This should fix the witness panic a couple of people have reported.

- Use MSCALL1() when calling the MiniportHangCheck() function in
  ndis_ticktask(). I accidentally missed this one when adding the
  wrapping for amd64.
2005-03-27 10:14:36 +00:00
Poul-Henning Kamp
3b73a3c079 Remove another ';' after if().
Also spotted by:	bz
2005-03-27 07:53:13 +00:00
Poul-Henning Kamp
2d8dfb2836 Remove extra ; at end of if().
Found by:	bz
2005-03-27 07:52:12 +00:00
Nate Lawson
43ce1c7762 If a device_add_child fails (i.e. low memory situation), be sure to free
the unused ivars also.

Submitted by:	pjd
Obtained from:	Coverity Prevent analysis
2005-03-27 03:37:43 +00:00
Sam Leffler
4a8bef25fe check copyin+copyout return values when processing TWA_IOCTL_GET_LOCK
Noticed by:	Coverity Prevent analysis tool
2005-03-27 00:29:37 +00:00
Sam Leffler
155fb57323 purge dead code
Noticed by:	Coverity Prevent analysis tool
2005-03-26 23:51:39 +00:00
Sam Leffler
f6ef5ddaa4 correct logic so we recognize timeout on alloc
Noticed by:	Coverity Prevent analysis tool
2005-03-26 23:43:54 +00:00
Sam Leffler
83888a7f30 purge dead code
Noticed by:	Coverity Prevent analysis tool
2005-03-26 23:37:54 +00:00
Sam Leffler
52c94c38dc deal with malloc failure when setting up the multicast filter
Noticed by:	Coverity Prevent analysis tool
2005-03-26 23:26:49 +00:00
Sam Leffler
af5691cdd5 handle malloc failure and sk_vpd_prodname potentially being null for
other reasons

Noticed by:	Coverity Prevent analysis tool
Reviewed by:	bz, jmg
2005-03-26 22:57:28 +00:00
Sam Leffler
5309f84168 deal with malloc failures
Noticed by:	Coverity Prevent analysis tool
Together with:	mdodd
2005-03-26 22:20:22 +00:00
John-Mark Gurney
1a82818b98 fix a copy/paste typo for scanner/gameport...
Spotted by:	Michal Mertl <mime@traveller.cz>
2005-03-26 22:17:48 +00:00
Poul-Henning Kamp
23db907c0f Don't call mlx_free() i mlx_attach() in case of failure. Doing so
in mlx_attach_pci() is much cleaner.

Inspired by:	Coverity
2005-03-26 21:58:09 +00:00
Sam Leffler
7a7fa27b23 rt_newaddrmsg will blow up if given something other than RTM_ADD
or RTM_DELETE; add an assertion, may want to do something more
heavyhanded in the future

Noticed by:	Coverity Prevent analysis tool
Reviewed by:	mdodd
2005-03-26 21:49:43 +00:00
Sam Leffler
2f83086184 deal with malloc failure
Noticed by:	Coverity Prevent analysis tool
2005-03-26 21:34:12 +00:00
Sam Leffler
4204b6b9a0 deal with failed malloc calls
Noticed by:	Coverity Prevent analysis tool
Glanced at by:	mdodd
2005-03-26 21:30:49 +00:00
Poul-Henning Kamp
9bb329f4e5 fix a "modify after free" bug which is practically impossible to
experience.

Found by:	Coverity (id #540 #541)
2005-03-26 21:07:35 +00:00
John-Mark Gurney
4ea3b0e7e2 add some additional pci classes and sub-classes..
Reviewed by:	imp (almost 6 months ago)
2005-03-26 20:31:09 +00:00
Ruslan Ermilov
77d23c9bc3 xl(4) meets polling(4). Hardware for this work kindly provided by
Eric Masson.

MFC after:	3 weeks
2005-03-26 20:22:58 +00:00
Poul-Henning Kamp
4a650cc291 Make (some) serial ports implement the PPS-API again. This change
appearantly fell out during the tty code cleanup.
2005-03-26 20:12:39 +00:00
Colin Percival
c0be525bed netstart is now obsoleted by /etc/rc.d/*, not by /etc/rc.network.
Reported by:	Martin Jakob, on freebsd-stable@
MFC after:	1 month
2005-03-26 20:10:24 +00:00
Poul-Henning Kamp
f83856243d s/ENOTTY/ENOIOCTL/ 2005-03-26 20:04:28 +00:00
Sam Leffler
2307fc820c deref correct mbuf ptr to collect any vlan tag
Noticed by:	Coverity Prevent analysis tool
2005-03-26 18:47:17 +00:00
Sam Leffler
05aabdfcf1 eliminate double free when sym_cam_attach fails
Noticed by:	Coverity Prevent analysis tool
2005-03-26 18:17:58 +00:00
Sam Leffler
ca640ca965 check copyin return values when loading pallete
Noticed by:	Coverity Prevent analysis tool
2005-03-26 18:01:35 +00:00
Nate Lawson
55fa5feab7 Check for invalid frequencies after parsing the package. Keep a running
count of valid frequencies and use that as the final package count, don't
give up when the first invalid state is found.  Also, add 0x9999 and expand
our upper check to >= 0xffff Mhz [2].

Submitted by:	Bruno Ducrot, Jung-uk Kim [2]
2005-03-26 17:30:34 +00:00
Pawel Jakub Dawidek
34cb151796 If an error occurs, clean up before returning from g_raid3_connect_disk(). 2005-03-26 17:24:19 +00:00
Pawel Jakub Dawidek
c2ca10933d Make the code more obvious - when an error occurs in g_mirror_connect_disk(),
detach and destroy consumer before returning.
2005-03-26 17:23:01 +00:00
Pawel Jakub Dawidek
cc6aa917b9 Check for return values.
Submitted by:	sam
Found by:	Coverity Prevent analysis tool
2005-03-26 16:51:19 +00:00
Xin LI
a431ada687 Do not do write gathering for NFSv3, since it makes no sense unless
the client is broken and does sync writes all the time.

Obtained from:	NetBSD (sys/nfs/nfs_syscalls.c,v 1.44)
Reviewed by:	-arch (bde)
2005-03-26 11:29:02 +00:00
David Schultz
b4a12fe13b Teach libstdc++ about frexpl() and ldexpl(). 2005-03-26 08:27:53 +00:00
Sam Leffler
c9a4bb99b0 when WPA is enabled discard association requests w/o a WPA ie
Submitted by:	Divy Le Ray
2005-03-26 07:15:34 +00:00
Sam Leffler
d6ec172c3a don't include wme ie in probe request frames; it was meant for probe response
frames--move it there

Noticed by:	Ghislain Mary
Submitted by:	Michael Wong
2005-03-26 07:11:31 +00:00
Kenneth D. Merry
53372d3a6d Fix a problem with the cd(4) driver -- the CAMGETPASSTHRU ioctl wouldn't
succeed if there was no media in the drive.

This was broken in rev 1.72 when the media check was added to cdioctl().

For now, check the ioctl group to decide whether to check for media or not.
(We only need to check for media on CD-specific ioctls.)

Reported by:	bland
MFC after:	3 days
2005-03-26 06:05:06 +00:00
Kenneth D. Merry
e9e4d3e4d4 Add "report only" functionality to 'camcontrol format', so users can get a
report on the status of a format already running on a drive.

Fix status reporting for 'camcontrol format'.  This was broken in rev 1.34
of camcontrol.c, almost 4 years ago!

Submitted by:	joerg (most of the reportonly changes)
MFC after:	3 days
2005-03-26 05:34:54 +00:00
Colin Percival
6e34b4796d When executing mount_foo, pass "mount_foo" as argv[0] instead of "foo".
This unbreaks "/rescue/mount -t foo" -- previously it was necessary to
explicitly call "/rescue/mount_foo".

Hints from:	gordon
X-MFC after:	3 days (if approved by re@)
2005-03-26 04:45:53 +00:00
Kenneth D. Merry
b3b73720f2 Fix a place where we were referencing a pointer after it had been freed.
Submitted by:	"Henry Miller" <hank@blackhole.com>
MFC after:	3 days
2005-03-26 04:21:11 +00:00
Brooks Davis
6c6faad6dd Remove bogus (but harmless) -I.. from CFLAGS. It makes no difference to
.depends other then the commant line.

Also remove -g from CFLAGS.  The user should add it to CFLAGS if they
desire debug support.

Reviewed by:	ru (in concept)
MFC After:	7 days
2005-03-25 22:08:59 +00:00
Maxim Sobolev
e33b116b78 Comment out rue_miibus_statchg() function. Using trial-and-error approach I
found it guilty in putting the card into unusable state after UP->DOWN->UP
media status change.

Looks like some of register writes in this functions mess up PHY interface.

No visible regressions has been found after commenting this code out -
the card properly handles forceful local mode changes and auto-detects changes
made remotely (tested with Auto, 10HD, 10FD, 100HD, 100FD).

Sponsored by:	PBXpress Inc.
MFC after:	3 days
2005-03-25 20:19:18 +00:00
David Schultz
188f6433f6 When the softupdates worklist gets too long, threads that attempt to
add more work are forced to process two worklist items first.
However, processing an item may generate additional work, causing the
unlucky thread to recursively process the worklist.  Add a per-thread
flag to detect this situation and avoid the recursion.  This should
fix the stack overflows that could occur while removing large
directory trees.

Tested by:	kris
Reviewed by:	mckusick
2005-03-25 17:30:31 +00:00
Warner Losh
aad0ac34fe Revert bogus += -g change. I needed it to debug the problem.
Noticed by: njl, Andrej Tobola
2005-03-25 17:30:20 +00:00
John-Mark Gurney
a517cb3040 remove unimplemented part of the interface..
MFC after:	3 days
2005-03-25 16:23:48 +00:00
Andrew Gallatin
f83935f874 Zero the reserved fields of the header, as per rfc 2734. This change
results in connectivty to MacOSX hosts via fwip.

Thanks to Apple's Arulchandran Paramasivam <arulchandranp@apple.com> for
letting us know what we were doing wrong.

Reviewed by: dfr
MFC After: 7 days
2005-03-25 16:05:42 +00:00
John Baldwin
5165a17df5 Add code to read the primary PCI bus number out of the Compaq/HP 6010
hotplug Host to PCI bridge.  This is only needed for the non-ACPI case
as the BIOS includes a proper _BBN method in ACPI.
2005-03-25 14:18:50 +00:00
Maxim Sobolev
ddbcc78273 Add /* _FOO_H_ */ after the final #endif to make danfe happy. 2005-03-25 13:22:58 +00:00
Maxim Sobolev
8eadbd831e Fix identation. 2005-03-25 12:55:06 +00:00
Maxim Sobolev
f1cc4b713e Add missed KUE_UNLOCK(). This is NOOP yet, but may be handy later on. 2005-03-25 12:53:26 +00:00
Maxim Sobolev
5135f8e687 Fix breakage in the previous commit caused by the last-minute change. 2005-03-25 12:50:57 +00:00
Maxim Sobolev
87fcbb8fe0 Protect against multiple inclusions. 2005-03-25 12:49:26 +00:00
Maxim Sobolev
d521dc21e7 Move Rx/Tx lists management routines into central location. 2005-03-25 12:42:30 +00:00