for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.
Assign approximate why values to all current consumers of the
kdb_enter() interface.
January 1, 1601. The 1601 - 1970 period was in seconds rather than 100ns
units.
Remove duplication by having NdisGetCurrentSystemTime call ntoskrnl_time.
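
To illustrate the epoch arithmetic, here is a minimal userland sketch of
converting a Unix time into Windows system time (100ns units since
January 1, 1601). The function and constant names are hypothetical and
this is not the actual ntoskrnl_time() code; it only shows why the
1601 - 1970 offset has to be scaled along with everything else.

```c
#include <stdint.h>

/* Seconds between the Windows epoch (1601) and the Unix epoch (1970). */
#define SECS_1601_TO_1970	11644473600ULL
/* Number of 100ns intervals per second. */
#define HNSEC_PER_SEC		10000000ULL

/*
 * Convert seconds/nanoseconds since 1970 into 100ns units since 1601.
 * The epoch offset is added while still in seconds and the whole sum is
 * then scaled, so the 1601 - 1970 period ends up in 100ns units too.
 */
uint64_t
unix_to_windows_time(uint64_t sec, uint32_t nsec)
{
	return ((sec + SECS_1601_TO_1970) * HNSEC_PER_SEC + nsec / 100);
}
```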
to kproc_xxx as they actually make whole processes.
This makes way for us to add REAL kthread_create() and friends
that actually make threads. It turns out that most of these
calls actually end up being moved back to the thread version
when it's added, but we need to make this cosmetic change first.
I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.
be woken up by kthread_exit. This is racy and in some cases the kthread will
exit before ndis gets around to sleep so it will be stuck indefinitely. This
change reuses the kq_exit variable to indicate that the thread has gone and
will loop on tsleep with a timeout waiting for it. If the kthread has already
exited then it will not sleep at all.
Approved by: re (rwatson)
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
for a Windows ISR is 'BOOLEAN isrfunc(KINTERRUPT *, void *)' meaning
the ISR gets a pointer to the interrupt object and a context pointer,
and returns TRUE if the ISR determines the interrupt was really generated
by the associated device, or FALSE if not.
I had mistakenly used 'void isrfunc(void *)' instead. It happens the
only thing this affects is the internal ndis_intr() ISR in subr_ndis.c,
but it should be fixed just in case we ever need to register a real
Windows ISR via IoConnectInterrupt().
For NDIS miniports that provide a MiniportISR() method, the 'is_our_intr'
value returned by the method serves as the return value from ndis_isr(),
and 'call_isr' is used to decide whether or not to schedule the interrupt
handler via DPC. For drivers that only supply MiniportEnableInterrupt()
and MiniportDisableInterrupt() methods, call_isr is always TRUE and
is_our_intr is always FALSE.
In the end, there should be no functional changes, except that now
ntoskrnl_intr() can terminate early once it finds the ISR that wants
to service the interrupt.
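
The early-termination behavior described above can be sketched roughly
as follows. This is a simplified userland model, not the code in
subr_ntoskrnl.c: the pared-down struct kinterrupt, its field names,
dispatch_intr() and example_isr() are all assumptions made for
illustration. Only the ISR prototype follows the Windows convention.

```c
#include <stddef.h>

typedef unsigned char BOOLEAN;
#define TRUE	1
#define FALSE	0

struct kinterrupt;
/* Windows-style ISR: interrupt object plus context, returns a BOOLEAN. */
typedef BOOLEAN (*isr_func)(struct kinterrupt *, void *);

/* Hypothetical, pared-down interrupt object. */
struct kinterrupt {
	isr_func		ki_svcfunc;	/* ISR to call */
	void			*ki_svcctx;	/* context pointer for the ISR */
	struct kinterrupt	*ki_next;	/* next entry in dispatch list */
};

/*
 * Walk the dispatch list and stop at the first ISR that claims the
 * interrupt. Returns the claiming interrupt object, or NULL if nobody
 * wanted it.
 */
struct kinterrupt *
dispatch_intr(struct kinterrupt *head)
{
	struct kinterrupt *ki;

	for (ki = head; ki != NULL; ki = ki->ki_next) {
		if (ki->ki_svcfunc(ki, ki->ki_svcctx) == TRUE)
			return (ki);
	}
	return (NULL);
}

/* Example ISR: claims the interrupt when its context flag is non-zero. */
BOOLEAN
example_isr(struct kinterrupt *iobj, void *ctx)
{
	(void)iobj;
	return (*(int *)ctx != 0 ? TRUE : FALSE);
}
```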
Intel's web site requires some minor tweaks to get it to work:
- The driver seems to have been released with full WMI tracing enabled,
and makes references to some WMI APIs, namely IoWMIRegistrationControl(),
WmiQueryTraceInformation() and WmiTraceMessage(). Only the first
one is ever called (during initialization). These have been implemented
as do-nothing stubs for now. Also added a definition for STATUS_NOT_FOUND
to ntoskrnl_var.h, which is used as a return code for one of the WMI
routines.
- The driver references KeRaiseIrqlToDpcLevel() and KeLowerIrql()
(the latter as a function, which is unusual because normally
KeLowerIrql() is a macro in the Windows DDK that calls KfLowerIrql()).
I'm not sure why these are being called since they're not really
part of WDM. Presumably they're being used for backwards
compatibility with old versions of Windows. These have been
implemented in subr_hal.c. (Note that they're _stdcall routines
instead of _fastcall.)
- When querying the OID_802_11_BSSID_LIST OID to get a BSSID list,
you don't know ahead of time how many networks the NIC has found
during scanning, so you're allowed to pass 0 as the list length.
This should cause the driver to return an 'insufficient resources'
error and set the length to indicate how many bytes are actually
needed. However for some reason, the Intel driver does not honor
this convention: if you give it a length of 0, it returns some
other error and doesn't tell you how much space is really needed.
To get around this, if using a length of 0 yields anything besides
the expected error case, we arbitrarily assume a length of 64K.
This is similar to the hack that wpa_supplicant uses when doing
a BSSID list query.
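
The length-probing fallback from the last item can be sketched like
this. The status codes, the query_fn callback type and the two fake
drivers are hypothetical stand-ins; the real code goes through the
miniport's query method and uses NDIS status values.

```c
#include <stddef.h>

#define STATUS_OK		0	/* hypothetical status codes */
#define STATUS_TOO_SHORT	1	/* the expected "need more room" error */
#define STATUS_OTHER_ERROR	2

#define BSSID_FALLBACK_LEN	(64 * 1024)

/* Hypothetical query hook: on a short buffer it updates *len. */
typedef int (*query_fn)(void *buf, size_t *len);

/*
 * Probe with a zero length. If the driver answers with the expected
 * error, trust the length it reported; on any other result, fall back
 * to an arbitrary 64K.
 */
size_t
bssid_list_len(query_fn query)
{
	size_t len = 0;
	int error;

	error = query(NULL, &len);
	if (error != STATUS_TOO_SHORT)
		len = BSSID_FALLBACK_LEN;
	return (len);
}

/* A well-behaved driver reports the space it needs. */
int
good_driver(void *buf, size_t *len)
{
	(void)buf;
	if (*len < 1200) {
		*len = 1200;
		return (STATUS_TOO_SHORT);
	}
	return (STATUS_OK);
}

/* A driver that ignores the convention, as described above. */
int
odd_driver(void *buf, size_t *len)
{
	(void)buf;
	(void)len;
	return (STATUS_OTHER_ERROR);
}
```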
for code to start out on one CPU when thunking into Windows
mode in ctxsw_utow(), and then be pre-empted and migrated to another
CPU before thunking back to UNIX mode in ctxsw_wtou(). This is
bad, because then we can end up looking at the wrong 'thread environment
block' when trying to come back to UNIX mode. To avoid this, we now
pin ourselves to the current CPU when thunking into Windows code.
A few other cleanups, since I'm here:
- Get rid of the ndis_isr(), ndis_enable_interrupt() and
ndis_disable_interrupt() wrappers from kern_ndis.c and just invoke
the miniport's methods directly in the interrupt handling routines
in subr_ndis.c. We may as well lose the function call overhead,
since we don't need to export these things outside of ndis.ko
now anyway.
- Remove call to ndis_enable_interrupt() from ndis_init() in if_ndis.c.
We don't need to do it there anyway (the miniport init routine handles
it, if needed).
- Fix the logic in NdisWriteErrorLogEntry() a little.
- Change some NDIS_STATUS_xxx codes in subr_ntoskrnl.c into STATUS_xxx
codes.
- Handle kthread_create() failure correctly in PsCreateSystemThread().
and ndis_halt_nic(). It's been disabled for some time anyway, and
it turns out there's a possible deadlock in NdisMInitializeTimer() when
acquiring the miniport block lock to modify the timer list: it's
possible for a driver to call NdisMInitializeTimer() when the miniport
block lock has already been acquired by an earlier piece of code. You
can't acquire the same spinlock twice, so this can deadlock.
Also, implement MmMapIoSpace() and MmUnmapIoSpace(), and make
NdisMMapIoSpace() and NdisMUnmapIoSpace() use them. There are some
drivers that want MmMapIoSpace() and MmUnmapIoSpace() so that they can
map arbitrary register spaces not directly associated with their
device resources. For example, there's an Atheros driver for
a miniPci card (0x168C:0x1014) on the IBM Thinkpad x40 that wants
to map some I/O spaces at 0xF00000 and 0xE00000 which are held by
the acpi0 device. I don't know what it wants these ranges for,
but if it can't map and access them, the MiniportInitialize() method
fails.
This avoids the need for sched_bind() in the default case so that you
can start up the NDIS subsystem at boot time when only CPU 0 is running.
There are potentially ways to fix it so that the DPC threads aren't
started until after the other CPUs are launched, but doing it correctly
is tricky. You need to defer the startup of the ntoskrnl subsystem
(ntoskrnl_libinit()), not just defer ndis_attach().
For now, I don't think it will make much difference having just the
single DPC thread (I started out with just one anyway). Note that this
turns the KeSetTargetProcessorDpc() routine into a no-op, since the
CPU number in struct kdpc is now ignored.
is KeRaiseIrql(newirql, &oldirql), not oldirql = KeRaiseIrql(newirql).
(The macro ultimately translates to KfRaiseIrql() which does use
the latter API, so this has no effect on generated code.)
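
A minimal sketch of the two calling styles, assuming a single global
standing in for the per-CPU IRQL state. The function bodies here are
illustrative only; the point is just that the two-argument macro form
expands to the value-returning function form.

```c
#include <stdint.h>

#define PASSIVE_LEVEL	0
#define DISPATCH_LEVEL	2

uint8_t cur_irql = PASSIVE_LEVEL;	/* stand-in for per-CPU state */

/* Function form: returns the previous IRQL. */
uint8_t
KfRaiseIrql(uint8_t newirql)
{
	uint8_t oldirql = cur_irql;

	if (newirql > cur_irql)
		cur_irql = newirql;
	return (oldirql);
}

/* Function form of lowering: restores a previously saved IRQL. */
void
KfLowerIrql(uint8_t oldirql)
{
	cur_irql = oldirql;
}

/* Macro forms, matching the two-argument calling style. */
#define KeRaiseIrql(a, b)	(*(b) = KfRaiseIrql(a))
#define KeLowerIrql(a)		KfLowerIrql(a)
```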
Also, wait for thread termination the right way: kthread_exit()
will ultimately do a wakeup(td->td_proc). This is the event we
should wait on. Eliminate the previous synchronization machinery
for this since it was never guaranteed to work correctly.
processor, to ensure DPC thread 0 runs on CPU0, DPC thread 1 runs on
CPU1, and so on.
Elevate the priority of the workitem threads, though don't use as
high a priority as the DPC threads.
- Change ndis_return() from a DPC to a workitem so that it doesn't
run at DISPATCH_LEVEL (with the dispatcher lock held).
- In if_ndis.c, submit packets to the stack via (*ifp->if_input)() in
a workitem instead of doing it directly in ndis_rxeof(), because
ndis_rxeof() runs in a DPC, and hence at DISPATCH_LEVEL. This
implies that the 'dispatch level' mutex for the current CPU is
being held, and we don't want to call if_input while holding
any locks.
- Reimplement IoConnectInterrupt()/IoDisconnectInterrupt(). The original
approach I used to track down the interrupt resource (by scanning
the device tree starting at the nexus) is prone to problems when
two devices share an interrupt. (E.g., removing ndis1 might disable
interrupts for ndis0.) The new approach is to multiplex all the
NDIS interrupts through a common internal dispatcher (ntoskrnl_intr())
and allow IoConnectInterrupt()/IoDisconnectInterrupt() to add or
remove interrupts from the dispatch list.
- Implement KeAcquireInterruptSpinLock() and KeReleaseInterruptSpinLock().
- Change the DPC and workitem threads to use the KeXXXSpinLock
API instead of mtx_lock_spin()/mtx_unlock_spin().
- Simplify the NdisXXXPacket routines by creating an actual
packet pool structure and using the InterlockedSList routines
to manage the packet queue.
- Only honor the value returned by OID_GEN_MAXIMUM_SEND_PACKETS
for serialized drivers. For deserialized drivers, we now create
a packet array of 64 entries. (The Microsoft DDK documentation
says that for deserialized miniports, OID_GEN_MAXIMUM_SEND_PACKETS
is ignored, and the driver for the Marvell 8335 chip, which is
a deserialized miniport, returns 1 when queried.)
- Clean up timer handling in subr_ntoskrnl.
- Add the following conditional debugging code:
NTOSKRNL_DEBUG_TIMERS - add debugging and stats for timers
NDIS_DEBUG_PACKETS - add extra sanity checking for NdisXXXPacket API
NTOSKRNL_DEBUG_SPINLOCKS - add test for spinning too long
- In kern_ndis.c, always start the HAL first and shut it down last,
since Windows spinlocks depend on it. Ntoskrnl should similarly be
started second and shut down next to last.
First and most importantly, I threw out the thread priority-twiddling
implementation of KeRaiseIrql()/KeLowerIrql()/KeGetCurrentIrql() in
favor of a new scheme that uses sleep mutexes. The old scheme was
really very naughty and sought to provide the same behavior as
Windows spinlocks (i.e. blocking pre-emption) but in a way that
wouldn't raise the ire of WITNESS. The new scheme represents
'DISPATCH_LEVEL' as the acquisition of a per-cpu sleep mutex. If
a thread on cpu0 acquires the 'dispatcher mutex,' it will block
any other thread on the same processor that tries to acquire it,
in effect only allowing one thread on the processor to be at
'DISPATCH_LEVEL' at any given time. It can then do the 'atomic sit
and spin' routine on the spinlock variable itself. If a thread on
cpu1 wants to acquire the same spinlock, it acquires the 'dispatcher
mutex' for cpu1 and then it too does an atomic sit and spin to try
acquiring the spinlock.
Unlike real spinlocks, this does not disable pre-emption of all
threads on the CPU, but it does put any threads involved with
the NDISulator to sleep, which is just as good for our purposes.
This means I can now play nice with WITNESS, and I can safely do
things like call malloc() when I'm at 'DISPATCH_LEVEL,' which
you're allowed to do in Windows.
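
The scheme can be modelled in userland roughly as follows, with
pthread mutexes standing in for the per-cpu sleep mutexes and a C11
atomic standing in for the spinlock word. All names are hypothetical,
the CPU number is passed in by hand, and nested acquisition (taking a
second spinlock while already at 'DISPATCH_LEVEL') is deliberately not
handled, so treat this as a sketch of the idea rather than the
implementation.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NCPU		2
#define PASSIVE_LEVEL	0
#define DISPATCH_LEVEL	2

/* One sleepable 'dispatcher mutex' per CPU. */
pthread_mutex_t disp_lock[NCPU] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};
int irql[NCPU];		/* modelled per-CPU IRQL */

/* A Windows-style spinlock is just a word that gets atomically swapped. */
typedef atomic_int kspin_lock;

void
acquire_spinlock(int cpu, kspin_lock *lock)
{
	/* Being at DISPATCH_LEVEL means owning this CPU's dispatcher mutex. */
	pthread_mutex_lock(&disp_lock[cpu]);
	irql[cpu] = DISPATCH_LEVEL;
	/* The 'atomic sit and spin' on the lock word itself. */
	while (atomic_exchange(lock, 1) != 0)
		;
}

void
release_spinlock(int cpu, kspin_lock *lock)
{
	atomic_store(lock, 0);
	irql[cpu] = PASSIVE_LEVEL;
	pthread_mutex_unlock(&disp_lock[cpu]);
}
```

Two threads on the same CPU serialize on the dispatcher mutex (and
sleep while waiting); threads on different CPUs only contend on the
lock word.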
Next, I completely re-wrote most of the event/timer/mutex handling
and wait code. KeWaitForSingleObject() and KeWaitForMultipleObjects()
have been re-written to use condition variables instead of msleep().
This allows us to use the Windows convention whereby thread A can
tell thread B "wake up with a boosted priority." (With msleep(), you
instead have thread B saying "when I get woken up, I'll use this
priority here," and thread A can't tell it to do otherwise.) The
new KeWaitForMultipleObjects() has been better tested and better
duplicates the semantics of its Windows counterpart.
I also overhauled the IoQueueWorkItem() API and underlying code.
Like KeInsertQueueDpc(), IoQueueWorkItem() must ensure that the
same work item isn't put on the queue twice. ExQueueWorkItem(),
which in my implementation is built on top of IoQueueWorkItem(),
was also modified to perform a similar test.
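
The "don't queue it twice" test amounts to something like the sketch
below. The structures and queue_work_item() are hypothetical and the
locking that the real code needs around the queue is omitted.

```c
#include <stddef.h>

struct work_item {
	struct work_item	*wi_next;
	void			(*wi_func)(void *);
	void			*wi_ctx;
};

struct work_queue {
	struct work_item	*wq_head;
	struct work_item	*wq_tail;
};

/* Returns 1 if the item was queued, 0 if it was already pending. */
int
queue_work_item(struct work_queue *wq, struct work_item *wi)
{
	struct work_item *cur;

	/* Refuse to insert an item that is already on the queue. */
	for (cur = wq->wq_head; cur != NULL; cur = cur->wi_next) {
		if (cur == wi)
			return (0);
	}
	wi->wi_next = NULL;
	if (wq->wq_tail != NULL)
		wq->wq_tail->wi_next = wi;
	else
		wq->wq_head = wi;
	wq->wq_tail = wi;
	return (1);
}
```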
I renamed the doubly-linked list macros to give them the same names
as their Windows counterparts and fixed RemoveListTail() and
RemoveListHead() so they properly return the removed item.
I also corrected the list handling code in ntoskrnl_dpc_thread()
and ntoskrnl_workitem_thread(). I realized that the original logic
did not correctly handle the case where a DPC callout tries to
queue up another DPC. It works correctly now.
I implemented IoConnectInterrupt() and IoDisconnectInterrupt() and
modified NdisMRegisterInterrupt() and NdisMDisconnectInterrupt() to
use them. I also tried to duplicate the interrupt handling scheme
used in Windows. The interrupt handling is now internal to ndis.ko,
and the ndis_intr() function has been removed from if_ndis.c. (In
the USB case, interrupt handling isn't needed in if_ndis.c anyway.)
NdisMSleep() has been rewritten to use a KeWaitForSingleObject()
and a KeTimer, which is how it works in Windows. (This is mainly
to ensure that the NDISulator uses the KeTimer API so I can spot
any problems with it that may arise.)
KeCancelTimer() has been changed so that it only cancels timers, and
does not attempt to cancel a DPC if the timer managed to fire and
queue one up before KeCancelTimer() was called. The Windows DDK
documentation seems to imply that KeCancelTimer() will also call
KeRemoveQueueDpc() if necessary, but it really doesn't.
The KeTimer implementation has been rewritten to use the callout API
directly instead of timeout()/untimeout(). I still cheat a little in
that I have to manage my own small callout timer wheel, but the timer
code works more smoothly now. I discovered a race condition using
timeout()/untimeout() with periodic timers where untimeout() fails
to actually cancel a timer. I don't quite understand where the race
is, but using callout_init()/callout_reset()/callout_stop() directly
seems to fix it.
I also discovered and fixed a bug in winx32_wrap.S related to
translating _stdcall calls. There are a couple of routines
(i.e. the 64-bit arithmetic intrinsics in subr_ntoskrnl) that
return 64-bit quantities. On the x86 arch, 64-bit values are
returned in the %eax and %edx registers. However, it happens
that the ctxsw_utow() routine uses %edx as a scratch register,
and x86_stdcall_wrap() and x86_stdcall_call() were only preserving
%eax before branching to ctxsw_utow(). This means %edx was getting
clobbered in some cases. Curiously, the most noticeable effect of this
bug is that the driver for the TI AXC110 chipset would constantly drop
and reacquire its link for no apparent reason. Both %eax and %edx
are preserved on the stack now. The _fastcall and _regparm
wrappers already handled everything correctly.
I changed if_ndis to use IoAllocateWorkItem() and IoQueueWorkItem()
instead of the NdisScheduleWorkItem() API. This is to avoid possible
deadlocks with any drivers that use NdisScheduleWorkItem() themselves.
The unicode/ansi conversion handling code has been cleaned up. The
internal routines have been moved to subr_ntoskrnl and the
RtlXXX routines have been exported so that subr_ndis can call them.
This removes the incestuous relationship between the two modules
regarding this code and fixes the implementation so that it honors
the 'maxlen' fields correctly. (Previously it was possible for
NdisUnicodeStringToAnsiString() to clobber memory it didn't
own, which was causing many mysterious crashes in the Marvell 8335
driver.)
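
Honoring 'maxlen' boils down to clamping the copy to the destination's
capacity, along these lines. This is a simplified sketch with
hypothetical structure and function names; it replaces non-ASCII
characters with '?' instead of doing a real code page conversion.

```c
#include <stddef.h>
#include <stdint.h>

/* Pared-down counted strings in the Windows style (lengths in bytes). */
struct unicode_string {
	uint16_t	us_len;		/* bytes in use */
	uint16_t	us_maxlen;	/* bytes available in us_buf */
	uint16_t	*us_buf;
};

struct ansi_string {
	uint16_t	as_len;		/* bytes in use */
	uint16_t	as_maxlen;	/* bytes available in as_buf */
	char		*as_buf;
};

/*
 * Narrow a UTF-16 string to 8-bit characters without writing past
 * as_maxlen. Returns 0 on success, -1 if the output was truncated.
 */
int
unicode_to_ansi(struct ansi_string *dst, const struct unicode_string *src)
{
	size_t nchars = src->us_len / sizeof(uint16_t);
	size_t i, n;
	uint16_t c;

	n = nchars < dst->as_maxlen ? nchars : dst->as_maxlen;
	for (i = 0; i < n; i++) {
		c = src->us_buf[i];
		dst->as_buf[i] = c < 0x80 ? (char)c : '?';
	}
	dst->as_len = (uint16_t)n;
	/* NUL-terminate only if there is room for it. */
	if (n < dst->as_maxlen)
		dst->as_buf[n] = '\0';
	return (n == nchars ? 0 : -1);
}
```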
The registry handling code (NdisOpen/Close/ReadConfiguration()) has
been fixed to allocate memory for all the parameters it hands out to
callers and delete them when NdisCloseConfiguration() is called.
(Previously, it would secretly use a single static buffer.)
I also substantially updated if_ndis so that the source can now be
built on FreeBSD 7, 6 and 5 without any changes. On FreeBSD 5, only
WEP support is enabled. On FreeBSD 6 and 7, WPA-PSK support is enabled.
The original WPA code has been updated to fit in more cleanly with
the net80211 API, and to eliminate the use of magic numbers. The
ndis_80211_setstate() routine now sets a default authmode of OPEN
and initializes the RTS threshold and fragmentation threshold.
The WPA routines were changed so that the authentication mode is
always set first, followed by the cipher. Some drivers depend on
the operations being performed in this order.
I also added passthrough ioctls that allow application code to
directly call the MiniportSetInformation()/MiniportQueryInformation()
methods via ndis_set_info() and ndis_get_info(). The ndis_linksts()
routine also caches the last 4 events signalled by the driver via
NdisMIndicateStatus(), and they can be queried by an application via
a separate ioctl. This is done to allow wpa_supplicant to directly
program the various crypto and key management options in the driver,
allowing things like WPA2 support to work.
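
Caching the last 4 events is essentially a small ring buffer. A
minimal sketch, assuming a producer/consumer counter pair and
hypothetical names (the real code keeps this state in the softc and
hands events out through an ioctl):

```c
#include <stdint.h>

#define NDIS_EVENTS	4	/* number of cached status events */

struct event_cache {
	uint32_t	ec_status[NDIS_EVENTS];
	unsigned int	ec_prod;	/* total events recorded */
	unsigned int	ec_cons;	/* total events handed out */
};

/* Record a status event, overwriting the oldest one when full. */
void
event_record(struct event_cache *ec, uint32_t status)
{
	ec->ec_status[ec->ec_prod % NDIS_EVENTS] = status;
	ec->ec_prod++;
	/* If the reader fell behind, skip the entries that were lost. */
	if (ec->ec_prod - ec->ec_cons > NDIS_EVENTS)
		ec->ec_cons = ec->ec_prod - NDIS_EVENTS;
}

/* Fetch the oldest cached event. Returns 1 on success, 0 if empty. */
int
event_fetch(struct event_cache *ec, uint32_t *status)
{
	if (ec->ec_cons == ec->ec_prod)
		return (0);
	*status = ec->ec_status[ec->ec_cons % NDIS_EVENTS];
	ec->ec_cons++;
	return (1);
}
```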
Whew.
as a part of the GENERIC kernel with INVARIANT* and WITNESS*
turned off.
(For non GENERIC kernel KTR and MUTEX_PROFILING should be also
off).
Submitted by: Eygene A. Ryabinkin <rea at rea dot mbslab dot kiae dot ru>
Approved by: re (scottl)
PR: 81767
works again.
This driver uses NdisScheduleWorkItem(), and we have to take special steps
to ensure that its workitems don't collide with any of the other workitems
used by the NDISulator. In particular, if one of the driver's work jobs
blocks, it can prevent NdisMAllocateSharedMemoryAsync() from completing
when expected.
The original hack to fix this was to have NdisMAllocateSharedMemoryAsync()
defer its work to the DPC queue instead of the general task queue. To
fix it now, I decided to add some additional workitem threads. (There's
supposed to be a pool of worker threads in Windows anyway.) Currently,
there are 4. There should be at least 2. One is reserved for the legacy
ExQueueWorkItem() API, while the others are used in round-robin by the
IoQueueWorkItem() API. NdisMAllocateSharedMemoryAsync() uses the latter
API while NdisScheduleWorkItem() uses the former, so the deadlock is
avoided.
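
The queue selection might look something like this. Which index is
reserved for the legacy API, and the names used here, are assumptions
made for illustration; only the thread count and the reserved-plus-
round-robin split come from the description above.

```c
#define WORKITEM_THREADS	4	/* total worker threads */
#define WORKITEM_LEGACY_IDX	0	/* reserved for ExQueueWorkItem() */

int wq_idx = 0;		/* round-robin cursor for the other threads */

/* ExQueueWorkItem() always lands on the reserved thread. */
int
pick_legacy_queue(void)
{
	return (WORKITEM_LEGACY_IDX);
}

/* IoQueueWorkItem() cycles through threads 1 .. WORKITEM_THREADS - 1. */
int
pick_io_queue(void)
{
	int idx;

	idx = 1 + wq_idx;
	wq_idx = (wq_idx + 1) % (WORKITEM_THREADS - 1);
	return (idx);
}
```

Because the two APIs never share a thread, a blocked job submitted
through one cannot stall work submitted through the other.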
Fixed NdisMRegisterDevice()/NdisMDeregisterDevice() to work a little
more sensibly with the new driver_object/device_object framework. It
doesn't really register a working user-mode interface, but the existing
code was completely wrong for the new framework.
Fixed a couple of bugs dealing with the cancellation of events and
DPCs. When cancelling an event that's still on the timer queue (i.e.
hasn't expired yet), reset dh_inserted in its dispatch header to FALSE.
Previously, it was left set to TRUE, which would make a cancelled
timer appear to have not been cancelled. Also, when removing a DPC
from a queue, reset its list pointers, otherwise a cancelled DPC
might mistakenly be treated as still pending.
Lastly, fix the behavior of ntoskrnl_wakeup() when dealing with
objects that have nobody waiting on them: sync event objects get
their signalled state reset to FALSE, but notification objects
should still be set to TRUE.
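
A sketch of the wakeup rule, using a pared-down dispatch header with
hypothetical fields. The no-waiter branch follows the description
above; the branch with waiters follows the usual Windows event
semantics (sync events release one waiter and reset, notification
events release everyone and stay signalled) and is an assumption
about the surrounding code.

```c
#define EVENT_TYPE_NOTIFY	0
#define EVENT_TYPE_SYNC		1

struct dispatch_header {
	int	dh_type;	/* EVENT_TYPE_NOTIFY or EVENT_TYPE_SYNC */
	int	dh_sigstate;	/* signalled state */
	int	dh_waiters;	/* number of threads waiting */
};

/* Signal an object; returns the number of waiters released. */
int
object_wakeup(struct dispatch_header *dh)
{
	int woken;

	dh->dh_sigstate = 1;
	if (dh->dh_waiters == 0) {
		/* Nobody waiting: only notification objects stay signalled. */
		if (dh->dh_type == EVENT_TYPE_SYNC)
			dh->dh_sigstate = 0;
		return (0);
	}
	if (dh->dh_type == EVENT_TYPE_SYNC) {
		/* Release one waiter and auto-reset. */
		woken = 1;
		dh->dh_sigstate = 0;
	} else {
		/* Release everyone; the object stays signalled. */
		woken = dh->dh_waiters;
	}
	dh->dh_waiters -= woken;
	return (woken);
}
```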
routines (_alldiv(), _allmul(), _alludiv(), _aullmul(), etc...)
that use the _stdcall calling convention.
These routines all take two arguments, but the arguments are 64 bits wide.
On the i386 this means they each consume two 32-bit slots on the stack.
Consequently, when we specify the argument count in the IMPORT_SFUNC()
macro, we have to lie and claim there are 4 arguments instead of two.
This will cause the resulting i386 assembly wrapper to push the right
number of longwords onto the stack.
This fixes a crash I discovered with the RealTek 8180 driver, which
uses these routines a lot during initialization.
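
The arithmetic behind the "lie" is simple enough to spell out. The
ARG_SLOTS() macro and helper functions below are hypothetical and only
illustrate how two 64-bit arguments turn into four 32-bit stack slots
on i386; they are not part of the IMPORT_SFUNC() machinery.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit stack slots occupied by an argument of a given type. */
#define ARG_SLOTS(type) \
	((sizeof(type) + sizeof(uint32_t) - 1) / sizeof(uint32_t))

/* Slot count for a routine shaped like int64_t f(int64_t a, int64_t b). */
size_t
two_quad_arg_slots(void)
{
	return (ARG_SLOTS(int64_t) + ARG_SLOTS(int64_t));
}

/* Bytes a _stdcall callee pops from the stack when it returns. */
size_t
stdcall_pop_bytes(size_t slots)
{
	return (slots * sizeof(uint32_t));
}
```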
Remove unused fields from ndis_miniport_block.
Fix a bug in KeFlushQueuedDpcs() (we weren't calculating the kq pointer
correctly).
In if_ndis.c, clear the IFF_RUNNING flag before calling ndis_halt_nic().
Add some guards in kern_ndis.c to avoid letting anyone invoke ndis_get_info()
or ndis_set_info() if the NIC isn't fully initialized. Apparently, mdnsd
will sometimes try to invoke the ndis_ioctl() routine at exactly the
wrong moment (to futz with its multicast filters) when the interface
comes up, and can trigger a crash unless we guard against it.
- Remove the old task threads from kern_ndis.c and reimplement them in
subr_ntoskrnl.c, in order to more properly emulate the Windows DPC
API. Each CPU gets its own DPC queue/thread, and each queue can
have low, medium and high importance DPCs. New APIs implemented:
KeSetTargetProcessorDpc(), KeSetImportanceDpc() and KeFlushQueuedDpcs().
(This is the biggest change.)
- Fix a bug in NdisMInitializeTimer(): the k_dpc pointer in the
nmt_timer embedded in the ndis_miniport_timer struct must be set
to point to the DPC, also embedded in the struct. Failing to do
this breaks dequeueing of DPCs submitted via timers, and in turn
breaks cancelling timers.
- Fix a bug in KeCancelTimer(): if the timer is inserted in the timer
queue (i.e. the timeout callback is still pending), we have to both
untimeout() the timer _and_ call KeRemoveQueueDpc() to nuke the DPC
that might be pending. Failing to do this breaks cancellation of
periodic timers, which always appear to be inserted in the timer queue.
- Make use of the nmt_nexttimer field in ndis_miniport_timer: keep a
queue of pending timers and cancel them all in ndis_halt_nic(), prior
to calling MiniportHalt(). Also call KeFlushQueuedDpcs() to make sure
any DPCs queued by the timers have expired.
- Modify NdisMAllocateSharedMemory() and NdisMFreeSharedMemory() to keep
track of both the virtual and physical addresses of the shared memory
buffers that get handed out. The AirGo MIMO driver appears to have a bug
in it: for one of the segments it allocates, it returns the wrong
virtual address. This would confuse NdisMFreeSharedMemory() and cause
a crash. Why it doesn't crash Windows too I have no idea (from reading
the documentation for NdisMFreeSharedMemory(), it appears to be a violation
of the API).
- Implement strstr(), strchr() and MmIsAddressValid().
- Implement IoAllocateWorkItem(), IoFreeWorkItem(), IoQueueWorkItem() and
ExQueueWorkItem(). (This is the second biggest change.)
- Make NdisScheduleWorkItem() call ExQueueWorkItem(). (Note that the
ExQueueWorkItem() API is deprecated by Microsoft, but NDIS still uses
it, since NdisScheduleWorkItem() is incompatible with the IoXXXWorkItem()
API.)
- Change if_ndis.c to use the NdisScheduleWorkItem() interface for scheduling
tasks.
With all these changes and fixes, the AirGo MIMO driver for the Belkin
F5D8010 Pre-N card now works. Special thanks to Paul Robinson
(paul dawt robinson at pwermedia dawt net) for the loan of a card
for testing.
here on in, if_ndis.ko will be pre-built as a module, and can be built
into a static kernel (though it's not part of GENERIC). Drivers are
created using the new ndisgen(8) script, which uses ndiscvt(8) under
the covers, along with a few other tools. The result is a driver module
that can be kldloaded into the kernel.
A driver with foo.inf and foo.sys files will be converted into
foo_sys.ko (and foo_sys.o, for those who want/need to make static
kernels). This module contains all of the necessary info from the
.INF file and the driver binary image, converted into an ELF module.
You can kldload this module (or add it to /boot/loader.conf) to have
it loaded automatically. Any required firmware files can be bundled
into the module as well (or converted/loaded separately).
Also, add a workaround for a problem in NdisMSleep(). During system
bootstrap (cold == 1), msleep() always returns 0 without actually
sleeping. The Intel 2200BG driver uses NdisMSleep() to wait for
the NIC's firmware to come to life, and fails to load if NdisMSleep()
doesn't actually delay. As a workaround, if msleep() (and hence
ndis_thsuspend()) returns 0, use a hard DELAY() to sleep instead.
This is not really the right thing to do, but we can't really do much
else. At the very least, this makes the Intel driver happy.
There are probably other drivers that fail in this way during bootstrap.
Unfortunately, the only workaround for those is to avoid pre-loading
them and kldload them once the system is running instead.
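
The NdisMSleep() workaround has roughly the following shape. This is a
userland model with made-up stand-ins (model_thsuspend() for the
msleep()-based path, model_delay() for DELAY(), and counters instead of
real waiting), so it only demonstrates the control flow.

```c
#include <errno.h>
#include <stdint.h>

int		cold = 1;	/* non-zero while the system is booting */
uint32_t	slept_usecs;	/* time spent in the normal sleep path */
uint32_t	spun_usecs;	/* time spent busy-waiting */

/* Stand-in for the sleep path: a no-op returning 0 while cold. */
static int
model_thsuspend(uint32_t usecs)
{
	if (cold)
		return (0);
	slept_usecs += usecs;
	return (EWOULDBLOCK);	/* the timeout expired normally */
}

/* Stand-in for DELAY(): a hard busy-wait. */
static void
model_delay(uint32_t usecs)
{
	spun_usecs += usecs;
}

/* Sleep, falling back to a busy-wait if the sleep returned 0. */
void
model_ndis_sleep(uint32_t usecs)
{
	if (model_thsuspend(usecs) == 0)
		model_delay(usecs);
}
```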
layer, but with a twist.
The twist has to do with the fact that Microsoft supports structured
exception handling in kernel mode. On the i386 arch, exception handling
is implemented by hanging an exception registration list off the
Thread Environment Block (TEB), and the TEB is accessed via the %fs
register. The problem is, we use %fs as a pointer to the pcpu structure,
which means any driver that tries to write through %fs:0 will overwrite
the curthread pointer and make a serious mess of things.
To get around this, Project Evil now creates a special entry in
the GDT on each processor. When we call into Windows code, a context
switch routine will fix up %fs so it points to our new descriptor,
which in turn points to a fake TEB. When the Windows code returns,
or calls out to an external routine, we swap %fs back again. Currently,
Project Evil makes use of GDT slot 7, which is all 0s by default.
I fully expect someone to jump up and say I can't do that, but I
couldn't find any code that makes use of this entry anywhere. Sadly,
this was the only method I could come up with that worked on both
UP and SMP. (Modifying the LDT works on UP, but becomes incredibly
complicated on SMP.) If necessary, the context switching stuff can
be yanked out while preserving the calling convention wrappers.
(Fortunately, it looks like Microsoft uses some special epilog/prolog
code on amd64 to implement exception handling, so the same nastiness
won't be necessary on that arch.)
The advantages are:
- Any driver that uses %fs as though it were a TEB pointer won't
clobber pcpu.
- All the __stdcall/__fastcall/__regparm stuff that's specific to
gcc goes away.
Also, while I'm here, switch NdisGetSystemUpTime() back to using
nanouptime() again. It turns out nanouptime() is way more accurate
than just using ticks. On slower machines, the Atheros drivers
I tested seem to take a long time to associate due to the loss
in accuracy.
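
To make the accuracy difference concrete, here is a small comparison
of the two ways of deriving a millisecond uptime. The function names
are hypothetical and the inputs are passed in rather than read from
the kernel. With hz set to 100, the tick-based value only moves in
10ms steps. Either way, a 32-bit millisecond counter wraps after about
49.7 days.

```c
#include <stdint.h>
#include <time.h>

/* Milliseconds from a tick counter: resolution is 1000 / hz ms. */
uint32_t
uptime_ms_from_ticks(uint32_t ticks, uint32_t hz)
{
	return ((uint32_t)((uint64_t)ticks * 1000 / hz));
}

/* Milliseconds from a timespec, as nanouptime() would fill in. */
uint32_t
uptime_ms_from_timespec(const struct timespec *ts)
{
	return ((uint32_t)((uint64_t)ts->tv_sec * 1000 +
	    (uint64_t)ts->tv_nsec / 1000000));
}
```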
- On amd64, InterlockedPushEntrySList() and InterlockedPopEntrySList()
are mapped to ExpInterlockedPushEntrySList() and
ExpInterlockedPopEntrySList() via macros (which do the same thing).
Add IMPORT_FUNC_MAP()s for these.
- Implement ExQueryDepthSList().
alloc and free routine pointers in the lookaside list with pointers
to ExAllocatePoolWithTag() and ExFreePool() (in the case where the
driver does not provide its own alloc and free routines). For amd64,
this is wrong: we have to use pointers to the wrapped versions of these
functions, not the originals.
nll_obsoletelock field in the lookaside list structure is only defined
for the i386 arch. For amd64, the field is gone, and different list
update routines are used which do their locking internally. Apparently
the Inprocomm amd64 driver uses lookaside lists. I'm not positive this
will make it work yet since I don't have an Inprocomm NIC to test, but
this needs to be fixed anyway.
work on SMP" saga. After several weeks and much gnashing of teeth,
I have finally tracked down all the problems, despite their best
efforts to confound and annoy me.
Problem number one: the Atheros Windows driver is _NOT_ a de-serialized
miniport! It used to be that NDIS drivers relied on the NDIS library
itself for all their locking and serialization needs. Transmit packet
queues were all handled internally by NDIS, and all calls to
MiniportXXX() routines were guaranteed to be appropriately serialized.
This proved to be a performance problem however, and Microsoft
introduced de-serialized miniports with the NDIS 5.x spec. Microsoft
still supports serialized miniports, but recommends that all new drivers
written for Windows XP and later be deserialized. Apparently Atheros
wasn't listening when they said this.
This means (among other things) that we have to serialize calls to
MiniportSendPackets(). We also have to serialize calls to MiniportTimer()
that are triggered via the NdisMInitializeTimer() routine. It finally
dawned on me why NdisMInitializeTimer() takes a special
NDIS_MINIPORT_TIMER structure and a pointer to the miniport block:
the timer callback must be serialized, and it's only by saving the
miniport block handle that we can get access to the serialization
lock during the timer callback.
Problem number two: haunted hardware. The thing that was _really_
driving me absolutely bonkers for the longest time is that, for some
reason I couldn't understand, my test machine would occasionally freeze
or, more frustratingly, reset completely. That's reset as in *pow!*
back to the BIOS startup. No panic, no crashdump, just a reset. This
appeared to happen most often when MiniportReset() was called. (As
to why MiniportReset() was being called, see problem three below.)
I thought maybe I had created some sort of horrible deadlock
condition in the process of adding the serialization, but after three
weeks, at least 6 different locking implementations and heroic efforts
to debug the spinlock code, the machine still kept resetting. Finally,
I started single stepping through the MiniportReset() routine in
the driver using the kernel debugger, and this ultimately led me to
the source of the problem.
One of the last things the Atheros MiniportReset() routine does is
call NdisReadPciSlotInformation() several times to inspect a portion
of the device's PCI config space. It reads the same chunk of config
space repeatedly, in rapid succession. Presumably, it's polling
the hardware for some sort of event. The reset occurs partway through
this process. I discovered that when I single-stepped through this
portion of the routine, the reset didn't occur. So I inserted a 1
microsecond delay into the read loop in NdisReadPciSlotInformation().
Suddenly, the reset was gone!!
I'm still very puzzled by the whole thing. What I suspect is happening
is that reading the PCI config space so quickly is causing a severe
PCI bus error. My test system is a Sun w2100z dual Opteron system,
and the NIC is a miniPCI card mounted in a miniPCI-to-PCI carrier card,
plugged into a 100MHz PCI slot. It's possible that this combination of
hardware causes a bus protocol violation in this scenario which leads
to a fatal machine check. This is pure speculation though. Really all I
know for sure is that inserting the delay makes the problem go away.
(To quote Homer Simpson: "I don't know how it works, but fire makes
it good!")
Problem number three: NdisAllocatePacket() needs to make sure to
initialize the npp_validcounts field in the 'private' section of
the NDIS_PACKET structure. The reason if_ndis was calling the
MiniportReset() routine in the first place is that packet transmits
were sometimes hanging. When sending a packet, an NDIS driver will
call NdisQueryPacket() to learn how many physical buffers the packet
resides in. NdisQueryPacket() is actually a macro, which traverses
the NDIS_BUFFER list attached to the NDIS_PACKET and stashes some
of the results in the 'private' section of the NDIS_PACKET. It also
sets the npp_validcounts field to TRUE to indicate that the results are
now valid. The problem is, now that if_ndis creates a pool of transmit
packets via NdisAllocatePacketPool(), it's important that each time
a new packet is allocated via NdisAllocatePacket() that validcounts
be initialized to FALSE. If it isn't, and a previously transmitted
NDIS_PACKET is pulled out of the pool, it may contain stale data
from a previous transmission which won't get updated by NdisQueryPacket().
This would cause the driver to miscompute the number of fragments
for a given packet, and botch the transmission.
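
The stale-cache effect is easy to reproduce in a toy model. The
structures below are hypothetical simplifications of NDIS_PACKET and
NDIS_BUFFER, packet_query() plays the role of the NdisQueryPacket()
macro, and packet_alloc_init() shows the reset that has to happen
every time a packet comes out of the pool.

```c
#include <stddef.h>

struct nbuf {
	struct nbuf	*nb_next;
	unsigned int	nb_len;
};

struct npacket {
	struct nbuf	*np_head;	/* buffer chain */
	unsigned int	np_count;	/* cached buffer count */
	unsigned int	np_totlen;	/* cached total length */
	int		np_validcounts;	/* cached values are valid */
};

/* Walk the chain once and cache the results, like the query macro. */
void
packet_query(struct npacket *p, unsigned int *count, unsigned int *totlen)
{
	struct nbuf *b;

	if (!p->np_validcounts) {
		p->np_count = 0;
		p->np_totlen = 0;
		for (b = p->np_head; b != NULL; b = b->nb_next) {
			p->np_count++;
			p->np_totlen += b->nb_len;
		}
		p->np_validcounts = 1;
	}
	*count = p->np_count;
	*totlen = p->np_totlen;
}

/* Hand out a packet from the pool; the cache must be invalidated. */
void
packet_alloc_init(struct npacket *p, struct nbuf *chain)
{
	p->np_head = chain;
	p->np_validcounts = 0;
}
```

If a recycled packet gets a new buffer chain without the reset, the
query keeps returning the counts from the previous transmission.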
Fixing these three problems seems to make the Atheros driver happy
on SMP, which hopefully means other serialized miniports will be
happy too.
And there was much rejoicing.
Other stuff fixed along the way:
- Modified ndis_thsuspend() to take a mutex as an argument. This
allows KeWaitForSingleObject() and KeWaitForMultipleObjects() to
avoid any possible race conditions with other routines that
use the dispatcher lock.
- Fixed KeCancelTimer() so that it returns the correct value for
'pending' according to the Microsoft documentation.
- Modified NdisGetSystemUpTime() to use ticks and hz rather than
calling nanouptime(). Also added comment that this routine wraps
after 49.7 days.
- Added macros for KeAcquireSpinLock()/KeReleaseSpinLock() to hide
all the MSCALL() goop.
- For x86, KeAcquireSpinLockRaiseToDpc() needs to be a separate
function. This is because it's supposed to be _stdcall on the x86
arch, whereas KeAcquireSpinLock() is supposed to be _fastcall.
On amd64, all routines use the same calling convention so we can
just map KeAcquireSpinLockRaiseToDpc() directly to KfAcquireSpinLock()
and it will work. (The _fastcall attribute is a no-op on amd64.)
- Implement and use IoInitializeDpcRequest() and IoRequestDpc() (they're
just macros) and use them for interrupt handling. This allows us to
move the ndis_intrtask() routine from if_ndis.c to kern_ndis.c.
- Fix the MmInitializeMdl() macro so that it uses sizeof(vm_offset_t)
when computing mdl_size instead of uint32_t, so that it matches the
MmSizeOfMdl() routine.
- Change a couple of M_WAITOKs to M_NOWAITs in the unicode routines in
subr_ndis.c.
- Use the dispatcher lock a little more consistently in subr_ntoskrnl.c.
- Get rid of the "wait for link event" hack in ndis_init(). Now that
I fixed NdisReadPciSlotInformation(), it seems I don't need it anymore.
This should fix the witness panic a couple of people have reported.
- Use MSCALL1() when calling the MiniportHangCheck() function in
ndis_ticktask(). I accidentally missed this one when adding the
wrapping for amd64.
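The 49.7 day figure in the NdisGetSystemUpTime() item above follows
from a 32-bit millisecond counter: 2^32 ms is roughly 49.71 days. A
minimal sketch of the arithmetic, assuming uptime is derived as
ticks * 1000 / hz (the sketch_* helpers are hypothetical and are not
the code in subr_ndis.c):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a tick count to milliseconds of uptime,
 * truncated to 32 bits the way a uint32_t millisecond counter would be.
 */
uint32_t
sketch_uptime_ms(uint64_t ticks, uint32_t hz)
{
	assert(hz != 0);
	return ((uint32_t)(ticks * 1000 / hz));
}

/* Days (fractional) before a 32-bit millisecond counter wraps. */
double
sketch_wrap_days(void)
{
	/* 2^32 ms divided by the number of ms in a day */
	return (4294967296.0 / (1000.0 * 60.0 * 60.0 * 24.0));
}
```

With hz set to 1000, tick number 2^32 maps back to an uptime of 0 ms,
which is the wrap the comment warns about.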
at some point result in a status event being triggered (it should
be a link down event: the Microsoft driver design guide says you
should generate one when the NIC is initialized). Some drivers
generate the event during MiniportInitialize(), such that by the
time MiniportInitialize() completes, the NIC is ready to go. But
some drivers, in particular the ones for Atheros wireless NICs,
don't generate the event until after a device interrupt occurs
at some point after MiniportInitialize() has completed.
The gotcha is that you have to wait until the link status event
occurs one way or the other before you try to fiddle with any
settings (ssid, channel, etc...). For the drivers that set the
event synchronously this isn't a problem, but for the others
we have to pause after calling ndis_init_nic() and wait for the event
to arrive before continuing. Failing to wait can cause big trouble:
on my SMP system, calling ndis_setstate_80211() after ndis_init_nic()
completes, but _before_ the link event arrives, will lock up or
reset the system.
What we do now is check to see if a link event arrived while
ndis_init_nic() was running, and if it didn't we msleep() until
it does.
Along the way, I discovered a few other problems:
- Deferred procedure calls run at PASSIVE_LEVEL, not DISPATCH_LEVEL.
ntoskrnl_run_dpc() has been fixed accordingly. (I read the documentation
wrong.)
- Similarly, the NDIS interrupt handler, which is essentially a
DPC, also doesn't need to run at DISPATCH_LEVEL. ndis_intrtask()
has been fixed accordingly.
- MiniportQueryInformation() and MiniportSetInformation() run at
DISPATCH_LEVEL, and each request must complete before another
can be submitted. ndis_get_info() and ndis_set_info() have been
fixed accordingly.
- Turned the sleep lock that guards the NDIS thread job list into
a spin lock. We never do anything with this lock held except manage
the job list (no other locks are held), so it's safe to do this,
and it's possible that ndis_sched() and ndis_unsched() can be
called from DISPATCH_LEVEL, so using a sleep lock here is
semantically incorrect. Also updated subr_witness.c to add the
lock to the order list.
that describe a buffer of variable size). The problem is, allocating
MDLs off the heap is slow, and it can happen that drivers will allocate
lots and lots of MDLs as they run.
As a compromise, we now do the following: we pre-allocate a zone for
MDLs big enough to describe any buffer with 16 or fewer pages. If
IoAllocateMdl() needs an MDL for a buffer with 16 or fewer pages, we'll
allocate it from the zone. Otherwise, we allocate it from the heap.
MDLs allocated from the zone have a flag set in their mdl_flags field.
When the MDL is released, IoFreeMdl() will uma_zfree() the MDL if
it has the MDL_ZONE_ALLOCED flag set, otherwise it will release it
to the heap.
The assumption is that 16 pages is a "big number" and we will rarely
need MDLs larger than that.
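The zone-or-heap decision can be sketched in userland as follows. This
is an illustration only: every sk_* name is hypothetical, malloc() and
free() stand in for both uma_zalloc()/uma_zfree() and the kernel heap,
a 4096-byte page is assumed, and the struct is not the real mdl
layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_PAGE_SIZE		4096u	/* assumed page size */
#define SK_ZONE_PAGES		16u	/* largest buffer served by the zone */
#define SK_MDL_ZONE_ALLOCED	0x1u	/* stand-in for MDL_ZONE_ALLOCED */

/* Hypothetical, pared-down MDL header. */
struct sk_mdl {
	uint16_t flags;
	uint16_t size;		/* header plus one page entry per page */
};

/* Number of pages touched by the range [va, va + len). */
size_t
sk_span_pages(uintptr_t va, size_t len)
{
	return (((va & (SK_PAGE_SIZE - 1)) + len + SK_PAGE_SIZE - 1) /
	    SK_PAGE_SIZE);
}

/* Bytes needed to describe the buffer: header plus per-page entries. */
size_t
sk_mdl_size(uintptr_t va, size_t len)
{
	return (sizeof(struct sk_mdl) +
	    sizeof(uintptr_t) * sk_span_pages(va, len));
}

/* Small buffers get a fixed-size "zone" item, large ones a heap block. */
struct sk_mdl *
sk_alloc_mdl(uintptr_t va, size_t len)
{
	size_t zone_item = sizeof(struct sk_mdl) +
	    sizeof(uintptr_t) * SK_ZONE_PAGES;
	struct sk_mdl *m;

	if (sk_span_pages(va, len) <= SK_ZONE_PAGES) {
		m = malloc(zone_item);		/* uma_zalloc() in spirit */
		if (m != NULL)
			m->flags = SK_MDL_ZONE_ALLOCED;
	} else {
		m = malloc(sk_mdl_size(va, len));
		if (m != NULL)
			m->flags = 0;
	}
	if (m != NULL)
		m->size = (uint16_t)sk_mdl_size(va, len);
	return (m);
}

/* Returns 1 if the MDL would go back to the zone, 0 if to the heap. */
int
sk_free_mdl(struct sk_mdl *m)
{
	int zoned;

	assert(m != NULL);
	zoned = (m->flags & SK_MDL_ZONE_ALLOCED) != 0;
	free(m);	/* the real code picks uma_zfree() or the heap here */
	return (zoned);
}
```

The point of the flag is that the release path needs no knowledge of
the original buffer size: it only has to look at the MDL itself.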
- Moved the ndis_buffer zone to subr_ntoskrnl.c from kern_ndis.c
and named it mdl_zone.
- Modified IoAllocateMdl() and IoFreeMdl() to use uma_zalloc() and
uma_zfree() if necessary.
- Made ndis_mtop() use IoAllocateMdl() instead of calling uma_zalloc()
directly.
Inspired by: discussion with Giridhar Pemmasani
- In kern_ndis.c:ndis_unload_driver(), test that ndis_block->nmb_rlist
is not NULL before trying to free() it.
- In subr_pe.c:pe_get_import_descriptor(), do a case-insensitive
match on the import module name. Most drivers I have encountered
link against "ntoskrnl.exe" but the ASIX USB ethernet driver I'm
testing with wants "NTOSKRNL.EXE".
- In subr_ntoskrnl.c:IoAllocateIrp(), return a pointer to the IRP
instead of NULL. (Stub code leftover.)
- Also in subr_ntoskrnl.c, add ExAllocatePoolWithTag() and ExFreePool()
to the function table list so they'll get exported to drivers properly.
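For the import module name item above, a case-insensitive comparison
can be as small as the sketch below. sk_module_name_match() is a
hypothetical, portable stand-in written for illustration; it is not
the routine used in subr_pe.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if two module names match ignoring case. */
int
sk_module_name_match(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	while (*a != '\0' && *b != '\0') {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return (0);
		a++;
		b++;
	}
	/* Both strings must end at the same position. */
	return (*a == '\0' && *b == '\0');
}
```

With this, "ntoskrnl.exe" and "NTOSKRNL.EXE" resolve to the same
import descriptor, while a different module name or a truncated name
still fails to match.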
and a machine-independent though inefficient InterlockedExchange().
In Windows, InterlockedExchange() appears to be implemented in header
files via inline assembly. I would prefer using an atomic.h macro for
this, but there doesn't seem to be one that just does a plain old
atomic exchange (as opposed to compare and exchange). Also implement
IoSetCancelRoutine(), which is just a macro that uses InterlockedExchange().
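One machine-independent way to get a plain exchange when only compare
and exchange is available is to loop until the swap succeeds. The
sketch below uses C11 <stdatomic.h> in userland and is an assumption
about the general technique, not the code committed here;
sk_interlocked_exchange() is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: atomic exchange built from compare-and-exchange only. */
uint32_t
sk_interlocked_exchange(_Atomic uint32_t *dst, uint32_t val)
{
	uint32_t old;

	assert(dst != NULL);
	old = atomic_load(dst);
	/* On failure, 'old' is refreshed with the current contents. */
	while (!atomic_compare_exchange_weak(dst, &old, val))
		;
	return (old);
}
```

The loop is what makes this less efficient than a native exchange
instruction: under contention it may retry several times before the
store lands.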
Fill in IoBuildSynchronousFsdRequest(), IoBuildAsynchronousFsdRequest()
and IoBuildDeviceIoControlRequest() so that they do something useful,
and add a bunch of #defines to ntoskrnl_var.h to help make these work.
These may require some tweaks later.
for now) exactly the same as KfAcquireSpinLock() and KfReleaseSpinLock().
I implemented the former as small routines in subr_ntoskrnl.c that just
turned around and invoked the latter. But I don't really need the wrapper
routines: I can just create entries in the ntoskrnl func table that
map KeAcquireSpinLockRaiseToDpc() and KeReleaseSpinLock() to
KfAcquireSpinLock() and KfReleaseSpinLock() directly. This means
the stubs can go away.
Ville-Pertti Keinonen (will at exomi dot comohmygodnospampleasekthx)
deserves a big thanks for submitting initial patches to make it
work. I have mangled his contributions appropriately.
The main gotcha with Windows/x86-64 is that Microsoft uses a different
calling convention than everyone else. The standard ABI requires using
6 registers for argument passing, with other arguments on the stack.
Microsoft uses only 4 registers, and requires the caller to leave room
on the stack for the register arguments in case the callee needs to
spill them. Unlike x86, where Microsoft uses a mix of _cdecl, _stdcall
and _fastcall, all routines on Windows/x86-64 use the same convention.
This unfortunately means that all the functions we export to the
driver require an intermediate translation wrapper. Similarly, we have
to wrap all calls back into the driver binary itself.
The original patches provided macros to wrap every single routine at
compile time, providing a secondary jump table with a customized
wrapper for each exported routine. I decided to use a different approach:
the call wrapper for each function is created from a template at
runtime, and the routine to jump to is patched into the wrapper as
it is created. The subr_pe module has been modified to patch in the
wrapped function instead of the original. (On x86, the wrapping
routine is a no-op.)
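The template-and-patch idea can be shown without executing anything.
In the sketch below the template is a hypothetical two-instruction
trampoline (load a 64-bit address into %rax, then jump to it); the
real wrapper also has to shuffle arguments between the two calling
conventions, which is omitted here. All sk_* names are made up for
illustration, and the buffer comes from malloc(), so it is not
executable memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical template: "movabs $imm64, %rax; jmp *%rax".
 * The eight zero bytes are the slot the target address is patched into.
 */
static const uint8_t sk_template[] = {
	0x48, 0xb8, 0, 0, 0, 0, 0, 0, 0, 0,	/* movabs $0, %rax */
	0xff, 0xe0				/* jmp *%rax */
};
#define SK_ADDR_OFFSET	2	/* offset of the imm64 slot */

/* Copy the template and patch in the routine to jump to. */
uint8_t *
sk_make_wrapper(uint64_t target)
{
	uint8_t *w = malloc(sizeof(sk_template));

	if (w == NULL)
		return (NULL);
	memcpy(w, sk_template, sizeof(sk_template));
	memcpy(w + SK_ADDR_OFFSET, &target, sizeof(target));
	return (w);
}

/* Read back the patched address (for inspection only). */
uint64_t
sk_wrapper_target(const uint8_t *w)
{
	uint64_t target;

	assert(w != NULL);
	memcpy(&target, w + SK_ADDR_OFFSET, sizeof(target));
	return (target);
}
```

Building wrappers this way keeps a single template in the source
instead of one hand-written wrapper per exported routine; the cost is
one small allocation per function at load time.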
There are some minor API differences that had to be accounted for:
- KeAcquireSpinLock() is a real function on amd64, not a macro wrapper
around KfAcquireSpinLock()
- NdisFreeBuffer() is actually IoFreeMdl(). I had to change the whole
NDIS_BUFFER API a bit to accommodate this.
Bugs fixed along the way:
- IoAllocateMdl() always returned NULL
- kern_windrv.c:windrv_unload() wasn't releasing private driver object
extensions correctly (found thanks to memguard)
This has only been tested with the driver for the Broadcom 802.11g
chipset, which was the only Windows/x86-64 driver I could find.
Windows DRIVER_OBJECT and DEVICE_OBJECT mechanism so that we can
simulate driver stacking.
In Windows, each loaded driver image is attached to a DRIVER_OBJECT
structure. Windows uses the registry to match up a given vendor/device
ID combination with a corresponding DRIVER_OBJECT. When a driver image
is first loaded, its DriverEntry() routine is invoked, which sets up
the AddDevice() function pointer in the DRIVER_OBJECT and creates
a dispatch table (based on IRP major codes). When a Windows bus driver
detects a new device, it creates a Physical Device Object (PDO) for
it. This is a DEVICE_OBJECT structure, with semantics analogous to
that of a device_t in FreeBSD. The Windows PNP manager will invoke
the driver's AddDevice() function and pass it pointers to the DRIVER_OBJECT
and the PDO.
The AddDevice() function then creates a new DEVICE_OBJECT structure of
its own. This is known as the Functional Device Object (FDO) and
corresponds roughly to a private softc instance. The driver uses
IoAttachDeviceToDeviceStack() to add this device object to the
driver stack for this PDO. Subsequent drivers (called filter drivers
in Windows-speak) can be loaded which add themselves to the stack.
When someone issues an IRP to a device, it travels along the stack
passing through several possible filter drivers until it reaches
the functional driver (which actually knows how to talk to the hardware)
at which point it will be completed. This is how Windows achieves
driver layering.
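The stacking described above can be modelled with a couple of
pointers per device. This is a rough sketch: struct sk_device and the
sk_* functions are hypothetical, the 'lower' pointer is kept in the
device itself purely for convenience (a Windows driver would normally
remember the device below it in its own private data), and no IRPs
are actually dispatched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down device object. */
struct sk_device {
	const char *name;
	struct sk_device *attached;	/* device stacked directly above */
	struct sk_device *lower;	/* sketch only: device directly below */
};

/* Top of the stack that 'dev' belongs to, in the spirit of
 * IoGetAttachedDevice(). */
struct sk_device *
sk_get_attached(struct sk_device *dev)
{
	assert(dev != NULL);
	while (dev->attached != NULL)
		dev = dev->attached;
	return (dev);
}

/* Put 'src' on top of the stack containing 'target'; returns the old top,
 * in the spirit of IoAttachDeviceToDeviceStack(). */
struct sk_device *
sk_attach(struct sk_device *src, struct sk_device *target)
{
	struct sk_device *top = sk_get_attached(target);

	top->attached = src;
	src->attached = NULL;
	src->lower = top;
	return (top);
}

/* Number of devices a request entering at the top would pass through. */
int
sk_stack_depth(struct sk_device *any)
{
	int n = 0;

	for (struct sk_device *d = sk_get_attached(any); d != NULL;
	    d = d->lower)
		n++;
	return (n);
}
```

Attaching an FDO to a PDO and then a filter to the same stack yields
a depth of three, with the filter on top and the PDO at the bottom.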
Project Evil now simulates most of this. if_ndis now has a modevent
handler which will use MOD_LOAD and MOD_UNLOAD events to drive the
creation and destruction of DRIVER_OBJECTs. (The load event also
does the relocation/dynalinking of the image.) We don't have a registry,
so the DRIVER_OBJECTS are stored in a linked list for now. Eventually,
the list entry will contain the vendor/device ID list extracted from
the .INF file. When ndis_probe() is called and detects a supported
device, it will create a PDO for the device instance and attach it
to the DRIVER_OBJECT just as in Windows. ndis_attach() will then call
our NdisAddDevice() handler to create the FDO. The NDIS miniport block
is now a device extension hung off the FDO, just as it is in Windows.
The miniport characteristics table is now an extension hung off the
DRIVER_OBJECT as well (the characteristics are the same for all devices
handled by a given driver, so they don't need to be per-instance.)
We also do an IoAttachDeviceToDeviceStack() to put the FDO on the
stack for the PDO. There are a couple of fake bus drivers created
for the PCI and pccard buses. Eventually, there will be one for USB,
which will actually accept USB IRPs.
Things should still work just as before, only now we do things in
the proper order and maintain the correct framework to support passing
IRPs between drivers.
Various changes:
- corrected the comments about IRQL handling in subr_hal.c to more
accurately reflect reality
- update ndiscvt to make the drv_data symbol in ndis_driver_data.h a
global so that if_ndis_pci.o and/or if_ndis_pccard.o can see it.
- Obtain the softc pointer from the miniport block by referencing
the PDO rather than a private pointer of our own (nmb_ifp is no
longer used)
- implement IoAttachDeviceToDeviceStack(), IoDetachDevice(),
IoGetAttachedDevice(), IoAllocateDriverObjectExtension(),
IoGetDriverObjectExtension(), IoCreateDevice(), IoDeleteDevice(),
IoAllocateIrp(), IoReuseIrp(), IoMakeAssociatedIrp(), IoFreeIrp(),
IoInitializeIrp()
- fix a few mistakes in the driver_object and device_object definitions
- add a new module, kern_windrv.c, to handle the driver registration
and relocation/dynalinking duties (which don't really belong in
kern_ndis.c).
- made ndis_block and ndis_chars in the ndis_softc structure pointers
and modified all references to them
- fixed NdisMRegisterMiniport() and NdisInitializeWrapper() so they
work correctly with the new driver_object mechanism
- changed ndis_attach() to call NdisAddDevice() instead of ndis_load_driver()
(which is now deprecated)
- used ExAllocatePoolWithTag()/ExFreePool() in lookaside list routines
instead of kludged up alloc/free routines
- added kern_windrv.c to sys/modules/ndis/Makefile and files.i386.
attributes in casts (i.e. foo = (__stdcall sometype)bar). This only
happens in two places where we need to set up function pointers, so
work around the problem with some void pointer magic.
USB device support):
- Convert all of my locally chosen function names to their actual
Windows equivalents, where applicable. This is a big no-op change
since it doesn't affect functionality, but it helps avoid a bit
of confusion (it's now a lot easier to see which functions are
emulated Windows API routines and which are just locally defined).
- Turn ndis_buffer into an mdl, like it should have been. The structure
is the same, but now it belongs to the subr_ntoskrnl module.
- Implement a bunch of MDL handling macros from Windows and use them where
applicable.
- Correct the implementation of IoFreeMdl().
- Properly implement IoAllocateMdl() and MmBuildMdlForNonPagedPool().
- Add the definitions for struct irp and struct driver_object.
- Add IMPORT_FUNC() and IMPORT_FUNC_MAP() macros to make formatting
the module function tables a little cleaner. (Should also help
with AMD64 support later on.)
- Fix if_ndis.c to use KeRaiseIrql() and KeLowerIrql() instead of
the previous calls to hal_raise_irql() and hal_lower_irql() which
have been renamed.
The function renaming generated a lot of churn here, but there should
be very little operational effect.
calls MiniportQueryInformation(), it will return NDIS_STATUS_PENDING.
When this happens, ndis_get_info() will sleep waiting for a completion
event. If two threads call ndis_get_info() and both end up having to
sleep, they will both end up waiting on the same wait channel, which
can cause a panic in sleepq_add() if INVARIANTS are turned on.
Fix this by having ndis_get_info() use a common mutex rather than
using the process mutex with PROC_LOCK(). Also do the same for
ndis_set_info(). Note that Pierre's original patch also made ndis_thsuspend()
use the new mutex, but ndis_thsuspend() shouldn't need this since
it will make each thread that calls it sleep on a unique wait channel.
Also, it occurred to me that we probably don't want to enter
MiniportQueryInformation() or MiniportSetInformation() from more
than one thread at any given time, so now we acquire a Windows
spinlock before calling either of them. The Microsoft documentation
says that MiniportQueryInformation() and MiniportSetInformation()
are called at DISPATCH_LEVEL, and previously we would call
KeRaiseIrql() to set the IRQL to DISPATCH_LEVEL before entering
either routine, but this only guarantees mutual exclusion on
uniprocessor machines. To make it SMP safe, we need to use a real
spinlock. For now, I'm abusing the spinlock embedded in the
NDIS_MINIPORT_BLOCK structure for this purpose. (This may need to be
applied to some of the other routines in kern_ndis.c at a later date.)
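To illustrate why a real spinlock is needed on SMP: raising the IRQL
only keeps the current CPU from being preempted, whereas a spinlock
makes every other CPU wait on a shared flag. Below is a minimal
userland sketch with C11 atomics; struct sk_spinlock and the sk_lock_*
functions are hypothetical and do not reflect the KSPIN_LOCK layout or
the routines in subr_ntoskrnl.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical spin lock built on a single atomic flag. */
struct sk_spinlock {
	atomic_flag held;
};

void
sk_lock_init(struct sk_spinlock *l)
{
	assert(l != NULL);
	atomic_flag_clear(&l->held);
}

/* Returns true if the lock was free and is now owned by the caller. */
bool
sk_lock_try(struct sk_spinlock *l)
{
	return (!atomic_flag_test_and_set_explicit(&l->held,
	    memory_order_acquire));
}

/* Spin until the flag is observed clear and claimed by this caller. */
void
sk_lock_acquire(struct sk_spinlock *l)
{
	while (!sk_lock_try(l))
		;
}

void
sk_lock_release(struct sk_spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller would bracket the call into the miniport routine with
sk_lock_acquire() and sk_lock_release(), so that a second thread on
another CPU spins until the first request has been submitted.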
Export ntoskrnl_init_lock() (KeInitializeSpinlock()) from subr_ntoskrnl.c
since we need to use it in kern_ndis.c, and since it's technically part
of the Windows kernel DDK API along with the other spinlock routines. Use
it in subr_ndis.c too rather than frobbing the spinlock directly.