I would have preferred a solution that didn't move
knowledge of this into the pci layer. However, this is literally the
only exception that's listed in the PCI standard to the usual way of
decoding BARs. atapci devices in legacy mode now ignore the first 4
bars and hard code the values to the legacy ide values (well, for each
of the controllers that are in legacy mode). The 5th bar is handled
normally.
Remove the zero bar handling. Zero bars should be ignored at all
other times, and since we handle that specially, we don't need the
older workaround.
what the ACPI-safe workaround is intended to fix. Requested by phk.
Set the bushandle and tag when attaching the timer, don't do it each time
in read_counter(). Pointed out by bde.
Move test_counter() to the end. Staticize acpi_timer_reg.
Clearly comment the assumptions on the structure of keys (addresses)
and masks, and introduce a macro, LEN(p), to extract the size of these
objects instead of using *(u_char *)p which might be confusing.
Comment the confusion in the types used to pass around pointers
to keys and masks, as a reminder to fix that at some point.
Add a few comments on what some functions do.
Comment a probably inefficient (but still correct) section of code
in rn_walktree_from()
The object code generated after this commit is the same as before.
At some point we should also change some variable identifiers such
as "t, tt, ttt" to fancier names such as "root, left, right" (just
in case someone wants to understand the code!), replace misspellings
of NULL as 0, and remove 'register' declarations that make little sense
these days.
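A minimal sketch of the LEN(p) idea, assuming (as described above) that the first byte of a key or mask holds its length. The helper keys_equal_prefix() is hypothetical and only shows the macro in use; it is not code from radix.c.

```c
#include <stddef.h>

/* Keys and masks are assumed to be byte strings whose first byte is
 * their own length, as the commit message describes. */
#define LEN(p) (*(const unsigned char *)(p))

/* Hypothetical helper: compare two keys over the shorter of the two
 * lengths, skipping the length byte itself. Returns 1 if they match. */
static int
keys_equal_prefix(const void *a, const void *b)
{
	const unsigned char *pa = a, *pb = b;
	size_t n = LEN(pa) < LEN(pb) ? LEN(pa) : LEN(pb);
	size_t i;

	for (i = 1; i < n; i++)
		if (pa[i] != pb[i])
			return (0);
	return (1);
}
```

Reading the length through a named macro makes the "first byte is the length" assumption visible at every use, which a bare cast does not.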
like "the foo(4) manual page" to "foo(4)". Uniformized the remaining
instances of "manual page" and "manpage" to "man page". Uniformized
some nearby sentence breaks. Reformatted the whole paragraph containing
these changes only for DUMMYNET.
of you with other cards, please do review and test the drivers for
MP-safety and disable Giant in the interrupt routines when you are
sure of proper functionality.
wireless ever since I added the new spinlock code. Previously, I added
a special ndis_rxeof_serial() function to ensure that when we receive
a packet, we never end up calling the MiniportReturnPacket() routine
until after the receive handler has finished. I set things up so that
ndis_rxeof_serial() would only be used for serialized miniports since
they depend on this property. Well, it turns out deserialized miniports
depend on a similar property: you can't let MiniportReturnPacket() be
called from the same context as the receive handler at all. The 2100B
driver happens to use a single spinlock for all of its synchronization,
and it tries to acquire it both while in MiniportHandleInterrupt() and
in MiniportReturnPacket(), so if we call MiniportReturnPacket() from
the MiniportHandleInterrupt() context, we will end up trying to acquire
the spinlock recursively, which you can't do.
To fix this, I made the ndis_rxeof_serial() handler the default. An
alternate solution would be to make ndis_return_packet() submit
the call to MiniportReturnPacket() to the NDIS task queue thread.
I may do that in the future, after I've tested things a bit more.
supported. Symptoms of this bug included unnecessary use of ACPI-safe
and a dmesg that has deltas of about 2^24:
ACPI timer looks BAD min = 2, max = 16777206, width = 16777204
ACPI timer looks BAD min = 2, max = 7, width = 5
ACPI timer looks GOOD min = 4, max = 5, width = 1
ACPI timer looks BAD min = 2, max = 16777206, width = 16777204
ACPI timer looks BAD min = 2, max = 7, width = 5
ACPI timer looks BAD min = 2, max = 16777210, width = 16777208
ACPI timer looks BAD min = 4, max = 16777189, width = 16777185
ACPI timer looks GOOD min = 4, max = 5, width = 1
ACPI timer looks BAD min = 2, max = 7, width = 5
ACPI timer looks BAD min = 4, max = 16777189, width = 16777185
To fix this:
* Use a 32 bit timecounter mask when the timer is 32 bits.
* In test_counter(), use the acpi_TimerDelta function which handles 24/32
bit timers and wraparound.
Miscellaneous fixes:
* Use C99 initializers for timecounter struct.
* Use u_int and uint32_t where appropriate instead of unsigned.
* Remove whitespace-only lines
* Remove the old PIIX4 PCI workaround. The timecounter testing code has
been in use for long enough to prove it's functional.
globally available. acpi_TimerDelta() subtracts two readings from the
ACPI PM timer and returns the difference. It properly distinguishes between
24-bit and 32-bit timers and handles wraparound.
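A rough sketch of a wraparound-safe delta for a 24-bit or 32-bit counter, in the spirit of what is described for acpi_TimerDelta(). The name pm_timer_delta() and its is_32bit parameter are assumptions made for illustration; this is not the driver's code.

```c
#include <stdint.h>

/* Hypothetical stand-in: `end` and `start` are raw counter readings,
 * `is_32bit` says which width the timer implements. */
static uint32_t
pm_timer_delta(uint32_t end, uint32_t start, int is_32bit)
{
	uint32_t mask = is_32bit ? 0xffffffffu : 0x00ffffffu;

	/* Unsigned subtraction is modulo 2^32; masking reduces it to the
	 * timer width, so a single wraparound still yields a small delta. */
	return ((end - start) & mask);
}
```

With a 24-bit timer, readings of 0xfffffe followed by 2 give a delta of 4 rather than a value near 2^24.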
2. Document that this means that kernel modules must be rebuilt.
3. While I'm here, fix my sorting error in callout.h
Requested by: many [1], scottl [2], bde [3]
it checked for rt == NULL after dereferencing the pointer).
We never check for those events elsewhere, so probably these checks
might go away here as well.
Slightly simplify (and document) the logic for memory allocation
in rt_setgate().
The rest is mostly style changes -- replace 0 with NULL where appropriate,
remove the macro SA() that was only used once, remove some useless
debugging code in rt_fixchange, explain some odd-looking casts.
implementation taken directly from OpenBSD.
I've resisted committing this for quite some time because of concern over
TIME_WAIT recycling breakage (sequential allocation ensures that there is a
long time before ports are recycled), but recent testing has shown me that
my fears were unwarranted.
TIME_WAIT recycling cases I was able to generate with http testing tools.
In short, as the old algorithm relied on ticks to create the time offset
component of an ISN, two connections with the exact same host, port pair
that were generated between timer ticks would have the exact same sequence
number. As a result, the second connection would fail to pass the TIME_WAIT
check on the server side, and the SYN would never be acknowledged.
I've "fixed" this by adding random positive increments to the time component
between clock ticks so that ISNs will *always* be increasing, no matter how
quickly the port is recycled.
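The idea of adding random positive increments between clock ticks can be sketched as below. Everything here is an assumption for illustration: the struct isn_state, the ISN_PER_TICK scale, the toy generator and the function names are hypothetical and do not reflect the actual ISN code.

```c
#include <stdint.h>

#define ISN_PER_TICK 4096u	/* hypothetical ISN units per clock tick */

struct isn_state {
	uint32_t last_tick;	/* tick at which offset was last reset */
	uint32_t offset;	/* increments accumulated within this tick */
	uint32_t seed;		/* state of the toy generator below */
};

/* Toy linear congruential generator; a real stack would use a proper
 * random source. */
static uint32_t
toy_rand(uint32_t *seed)
{
	*seed = *seed * 1103515245u + 12345u;
	return ((*seed >> 16) & 0x7fffu);
}

/* Time component that strictly increases between calls, even when
 * several calls land in the same tick. */
static uint32_t
isn_time_component(struct isn_state *st, uint32_t ticks)
{
	if (ticks != st->last_tick) {
		st->last_tick = ticks;
		st->offset = 0;
	}
	/* Always advance by at least 1, at most 16. */
	st->offset += 1 + toy_rand(&st->seed) % 16;
	return (ticks * ISN_PER_TICK + st->offset);
}
```

In this sketch the ordering only holds while the accumulated offset stays below ISN_PER_TICK (fewer than 256 calls per tick) and before the 32-bit value wraps.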
Except in such contrived benchmarking situations, this problem should never
come up in normal usage... until networks get faster.
No MFC planned, 4.x is missing other optimizations that are needed to even
create the situation in which such quick port recycling will occur.
a NULL crsbuf pointer. This shouldn't happen if it returns AE_OK. We'll
figure out why this is happening later.
Submitted by: Bruno Ducrot <ducrot@poupinou.org>
routine since the error will be reported back to the user buffer.
This will quiet down the bootverbose case when using an ACU which
does brute force discovery of the physical and logical devices.
the same process as the current thread it makes absolutely
no sense to lock the parent process through the pointer in
said thread.
Submitted by: pho (with minor correction)
Pointy Hat To: mtm
this patch were submitted by Maurycy Pawlowski-Wieronski. In addition
to Maurycy's change, break out softc tear down from ppp_clone_destroy()
into ppp_destroy() rather than performing a convoluted series of
extraction casts and indirections during tear down at mod unload.
Submitted by: Maurycy Pawlowski-Wieronski <maurycy@fouk.org>
of the struct, so that a placeholder for it (or unportable C99
initializers) are not needed for entries that don't use it. Use a C99
initializer for the 1 entry that uses it. Removed 91 placeholders.
This also restores API compatibility with NetBSD and RELENG_4 for most
entries.
+ remove useless wrappers around bcmp(), bcopy(), bzero().
The code assumes that bcmp() returns 0 if the size is 0, but
this is true for both the libc and the libkern versions.
+ nuke Bcmp, Bzero, Bcopy from radix.h now that nobody uses them anymore.
Removed the requirement for a particular subvendor/subproduct in
rev.1.26 (VScom PCI-800L card). While the BARs, etc., may depend on
the sub-ids, this is not known to be so, and I think it is better to
guess that they don't. The decision to check sub-ids in this
file is apparently random; for VScom cards they were checked in 3 of
8 cases.
Reviewed by: timeout by committer (joerg) after 6 months
there so there are no ABI changes);
+ replace 5 redefinitions of the IPF2AC macro with one in if_arp.h
Eventually (but before freezing the ABI) we need to get rid of
struct arpcom (initially with the help of some smart #defines
to avoid having to touch each and every driver, see below).
Apart from the struct ifnet, struct arpcom now only stores a copy
of the MAC address (ac_enaddr, but we already have another copy in
the struct ifnet -- if_addrhead), and a netgraph-specific field
which is _always_ accessed through the ifp, so it might well go
into the struct ifnet too (where, besides, there is already an entry
for AF_NETGRAPH data...)
Too bad ac_enaddr is widely referenced by all drivers. But
this can be fixed as follows:
#define ac_enaddr ac_if.the_original_ac_enaddr_in_struct_ifnet
(note that the right hand side would likely be a pointer rather than
the base address of an array.)
+ replace 0 with NULL where appropriate (not complete)
+ remove register declaration while there
+ add argument names to function prototypes to have a better idea of
what they are used for
+ add 'const' qualifiers in 3 places
Nehemiah chip, but the work is all done in hardware.
There are three opportunities to add other entropy; the Data
Buffer, the Cipher's IV and the Cipher's key. A future commit
will exploit these opportunities.
+ remove a partly incorrect comment that i introduced in the last commit;
+ deal with the correct part of the above comment by cleaning up the
updates of 'info' -- rti_addrs need not be updated,
rti_info[RTAX_IFP] can be set once outside the loop.
While at it, correct a few misspellings of NULL as 0, but there are
way too many in this file, and i did not want to clutter the
important part of this commit.
Logical volumes on these devices show up as LUNs behind another
controller (also known as proxy controller). In order to issue
firmware commands for a volume on a proxy controller, they must be
targeted at the address of the proxy controller it is attached to,
not the Host/PCI controller.
A proxy controller is defined as a device listed in the INQUIRY
PHYSICAL LUNS command whose L2 and L3 SCSI addresses are zero. The
corresponding address returned defines which "bus" the controller
lives on and we use this to create a virtual CAM bus.
The first byte of a logical volume's address defines the logical drive
number. The second byte defines the bus that it is attached to,
which corresponds to the BUS of one of the proxy controllers found or
the Host/PCI controller.
Change event notification to be handled in its own kernel thread.
This is needed since some events may require the driver to sleep
on some operations and this cannot be done during interrupt context.
With this change, it is now possible to create and destroy logical
volumes from FreeBSD, but it requires a native application to
construct the proper firmware commands which is not publicly
available.
Special thanks to John Cagle @ HP for providing remote access to
all the hardware and beating on the storage engineers at HP to
answer my questions.
Specifically, we used to enable the source after locking sched_lock
and just before we had already decided to do a context switch.
This meant that an ithread could never process more than one interrupt
per context switch. Enabling earlier in the loop before sched_lock is
acquired allows an ithread to handle multiple interrupts per context
switch if interrupts fire very rapidly. For the case of heavy interrupt
load this can reduce the number of context switches (and thus overhead)
as well as reduce interrupt latency.
- Now that we can handle multiple interrupts per context switch, add simple
interrupt storm protection to threaded interrupts. If X number of
consecutive interrupts are triggered before the ithread voluntarily
yields to another thread, then the interrupt thread will sleep with the
associated interrupt source disabled (masked) for 1/10th of a second.
The default value of X is 500, but it can be tweaked via the tunable/
sysctl hw.intr_storm_threshold. If an interrupt storm is detected, then
a message is output to the kernel console on the first occurrence per
interrupt thread. Interrupt storm protection can be disabled completely
by setting this value to 0. There is no scientific reasoning for the
1/10th of a second or 500 interrupts values, so they may require tweaking
at some point in the future.
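The storm check described above can be sketched as a simple counter. The struct ithread_sim and both functions are hypothetical names for illustration; the real code also masks the source and sleeps, which is only represented here by the return value.

```c
/* Hypothetical per-thread bookkeeping for the sketch. */
struct ithread_sim {
	unsigned int count;	/* consecutive interrupts since last yield */
	int warned;		/* set once a storm has been reported */
};

/* Called once per handled interrupt. Returns 1 when the thread should
 * sleep with its source masked, 0 otherwise. */
static int
storm_check(struct ithread_sim *it, unsigned int threshold)
{
	if (threshold == 0)
		return (0);	/* a threshold of 0 disables protection */
	if (++it->count < threshold)
		return (0);
	it->count = 0;
	it->warned = 1;		/* message is printed only the first time */
	return (1);
}

/* Called when the thread voluntarily yields to another thread. */
static void
storm_yield(struct ithread_sim *it)
{
	it->count = 0;
}
```

With a threshold of 500, the 500th consecutive call is the first to return 1; any yield in between resets the count.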
Tested by: rwatson (an earlier version w/o the storm protection)
Tested by: mux (reportedly made a machine with two PCI interrupts
storming usable rather than hard locked)
Reviewed by: imp
different BIOSs use the same exact settings to mean two very different and
incompatible things for the SCI. Thus, if the SCI is remapped to a PCI
interrupt, we now trust the trigger/polarity that the MADT provides by
default. However, the SCI can be forced to level/lo as 1.10 did by setting
the tunable "hw.acpi.force_sci_lo" to a non-zero value from the loader.
Thus, if rev 1.10 caused an interrupt storm, it should now fix your
machine. If rev 1.10 fixed an interrupt storm on your machine, you
probably need to set the aforementioned tunable in /boot/loader.conf to
prevent the interrupt storm.
The more general problem of getting the SCI's trigger/polarity programmed
"correctly" (for some value of correctly meaning several workarounds for
broken BIOSs and inconsistent "implementations" of the ACPI standard) is
going to require more work, but this band-aid should improve the current
situation somewhat.
Requested by: njl
uiomove(9) is not properly locked. So, return to NEEDGIANT
mode. Later, when uiomove is finely locked, I'll revisit.
While I'm here, provide some temporary debugging output to
help catch blocking startups.
unconditionally initialize the mbuf header even if cluster allocation
failed, which could result in a NULL pointer dereference in low-memory
conditions.
PR: kern/65548
Submitted by: Stephan Uphoff <ups@tree.com>
the TAILQ_FOREACH() form.
Comment the need to store the same info (mac address for ethernet-type
devices) in two different places.
No functional changes. Even the compiler output should be unmodified
by this change.
of an interface. No functional change.
In passing, comment a useless invocation of TAILQ_INIT(&ifp->if_addrhead)
which could probably be removed in the interest of clarity.
of an interface. No functional change.
In passing, comment a likely bug in net/rtsock.c:sysctl_ifmalist()
which, if confirmed, would deserve to be fixed and MFC'ed
ntoskrnl_unlock_dpc().
- hal_raise_irql(), hal_lower_irql() and hal_irql() didn't work right
on SMP (priority inheritance makes things... interesting). For now,
use only two states: DISPATCH_LEVEL (PI_REALTIME) and PASSIVE_LEVEL
(everything else). Tested on a dual PIII box.
- Use ndis_thsuspend() in ndis_sleep() instead of tsleep(). (I added
ndis_thsuspend() and ndis_thresume() to replace kthread_suspend()
and kthread_resume(); the former will preserve a thread's priority
when it wakes up, the latter will not.)
- Change use of tsleep() in ndis_stop_thread() to prevent priority
change on wakeup.
if the link-level address has been initialized already.
The majority of modern drivers never do this and work fine, which
makes me think that the check is totally unnecessary and a residue
of cut&paste from other drivers.
This change is done to simplify locking because now almost none of the
drivers uses this field. The exceptions are "ct" "ctau" and "cx"
where i am not sure if i can remove that part.
This avoids presenting invalid data to the client's applications
when the file is modified, and then extended within the window of
the resolution of the modification timestamp.
Reviewed By: iedowse
PR: kern/64091
because they bogusly check for defined(INTR_MPSAFE) -- something which
never was a #define. Correct the definitions.
This makes INTR_TYPE_AV finally get used instead of the lower-priority
INTR_TYPE_TTY, so it's quite possible some improvement will be had
on sound driver performance. It would also make all the drivers
marked INTR_MPSAFE actually run without Giant (which does seem to
work for me), but:
INTR_MPSAFE HAS BEEN REMOVED FROM EVERY SOUND DRIVER!
It needs to be re-added on a case-by-case basis since there is no one
who will vouch for which sound drivers, if any, will actually operate
correctly without Giant, since there hasn't been testing because of
this bug disabling INTR_MPSAFE.
Found by: "Yuriy Tsibizov" <Yuriy.Tsibizov@gfk.ru>
attempting to duplicate Windows spinlocks. Windows spinlocks differ
from FreeBSD spinlocks in the way they block preemption. FreeBSD
spinlocks use critical_enter(), which masks off _all_ interrupts.
This prevents any other threads from being scheduled, but it also
prevents ISRs from running. In Windows, preemption is achieved by
raising the processor IRQL to DISPATCH_LEVEL, which prevents other
threads from preempting you, but does _not_ prevent device ISRs
from running. (This is essentially what Solaris calls dispatcher
locks.) The Windows spinlock itself (kspin_lock) is just an integer
value which is atomically set when you acquire the lock and atomically
cleared when you release it.
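The "integer that is atomically set and cleared" part can be sketched with C11 atomics. The type and function names below are hypothetical, and the IRQL raise to DISPATCH_LEVEL that accompanies a real acquire is not modelled at all.

```c
#include <stdatomic.h>

/* Hypothetical lock word: 0 means free, 1 means held. */
typedef atomic_int kspin_lock_sketch;

/* Try once to take the lock; returns 1 on success, 0 if already held. */
static int
sketch_try_lock(kspin_lock_sketch *l)
{
	int expected = 0;

	return (atomic_compare_exchange_strong(l, &expected, 1));
}

/* Spin until the lock word has been atomically set by us. */
static void
sketch_lock(kspin_lock_sketch *l)
{
	while (!sketch_try_lock(l))
		;	/* busy-wait */
}

/* Atomically clear the lock word. */
static void
sketch_unlock(kspin_lock_sketch *l)
{
	atomic_store(l, 0);
}
```

A second acquire attempt from the same context never succeeds, which is the kind of recursion a single-spinlock driver has to avoid.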
FreeBSD doesn't have IRQ levels, so we have to cheat a little by
using thread priorities: normal thread priority is PASSIVE_LEVEL,
lowest interrupt thread priority is DISPATCH_LEVEL, highest thread
priority is DEVICE_LEVEL (PI_REALTIME) and critical_enter() is
HIGH_LEVEL. In practice, only PASSIVE_LEVEL and DISPATCH_LEVEL
matter to us. The immediate benefit of all this is that I no
longer have to rely on a mutex pool.
Now, I'm sure many people will be seized by the urge to criticize
me for doing an end run around our own spinlock implementation, but
it makes more sense to do it this way. Well, it does to me anyway.
Overview of the changes:
- Properly implement hal_lock(), hal_unlock(), hal_irql(),
hal_raise_irql() and hal_lower_irql() so that they more closely
resemble their Windows counterparts. The IRQL is determined by
thread priority.
- Make ntoskrnl_lock_dpc() and ntoskrnl_unlock_dpc() do what they do
in Windows, which is to atomically set/clear the lock value. These
routines are designed to be called from DISPATCH_LEVEL, and are
actually half of the work involved in acquiring/releasing spinlocks.
- Add FASTCALL1(), FASTCALL2() and FASTCALL3() macros/wrappers
that allow us to call a _fastcall function in spite of the fact
that our version of gcc doesn't support __attribute__((__fastcall__))
yet. The macros take 1, 2 or 3 arguments, respectively. We need
to call hal_lock(), hal_unlock() etc... ourselves, but can't really
invoke the function directly. I could have just made the underlying
functions native routines and put _fastcall wrappers around them for
the benefit of Windows binaries, but that would create needless bloat.
- Remove ndis_mtxpool and all references to it. We don't need it
anymore.
- Re-implement the NdisSpinLock routines so that they use hal_lock()
and friends like they do in Windows.
- Use the new spinlock methods for handling lookaside lists and
linked list updates in place of the mutex locks that were there
before.
- Remove mutex locking from ndis_isr() and ndis_intrhand() since they're
already called with ndis_intrmtx held in if_ndis.c.
- Put ndis_destroy_lock() code under explicit #ifdef notdef/#endif.
It turns out there are some drivers which stupidly free the memory
in which their spinlocks reside before calling ndis_destroy_lock()
on them (touch-after-free bug). The ADMtek wireless driver
is guilty of this faux pas. (Why this doesn't clobber Windows I
have no idea.)
- Make NdisDprAcquireSpinLock() and NdisDprReleaseSpinLock() into
real functions instead of aliasing them to NdisAcquireSpinLock()
and NdisReleaseSpinLock(). The Dpr routines use
KeAcquireSpinLockAtDpcLevel() and KeReleaseSpinLockFromDpcLevel(),
which acquire and release the lock without twiddling the IRQL.
- In ndis_linksts_done(), do _not_ call ndis_80211_getstate(). Some
drivers may call the status/status done callbacks as the result of
setting an OID: ndis_80211_getstate() gets OIDs, which means we
might cause the driver to recursively access some of its internal
structures unexpectedly. The ndis_ticktask() routine will call
ndis_80211_getstate() for us eventually anyway.
- Fix the channel setting code a little in ndis_80211_setstate(),
and initialize the channel to IEEE80211_CHAN_ANYC. (The Microsoft
spec says you're not supposed to twiddle the channel in BSS mode;
I may need to enforce this later.) This fixes the problems I was
having with the ADMtek adm8211 driver: we were setting the channel
to a non-standard default, which would cause it to fail to associate
in BSS mode.
- Use hal_raise_irql() to raise our IRQL to DISPATCH_LEVEL when
calling certain miniport routines, per the Microsoft documentation.
I think that's everything. Hopefully, other than fixing the ADMtek
driver, there should be no apparent change in behavior.
* In the resume path, give up after waiting for a while
for WAK_STS to be set. Some BIOSs never set it.
* Allow access to the field if it is within the region size rounded
up to a multiple of the access byte width. This overcomes "off-by-one"
programming errors in the AML often found in Toshiba laptops.
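The relaxed bounds check in the second item can be sketched as follows. The function name and parameters are assumptions for illustration and do not come from the ACPI code; very large region lengths that would overflow the rounding are not handled.

```c
#include <stdint.h>

/* Returns 1 if an access of `width` bytes at `offset` fits inside the
 * region once its length is rounded up to a multiple of `width`. */
static int
field_access_ok(uint64_t offset, uint64_t width, uint64_t region_len)
{
	uint64_t rounded;

	if (width == 0)
		return (0);
	rounded = (region_len + width - 1) / width * width;
	/* Written this way to avoid overflow in offset + width. */
	return (offset <= rounded && width <= rounded - offset);
}
```

For a 7-byte region accessed 4 bytes at a time, the access at offset 4 is allowed because the region is treated as 8 bytes long; an access at offset 8 is still rejected.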
in favour of rtalloc_ign(), which is what would end up being called
anyways.
There are 25 more instances of rtalloc() in net*/ and
about 10 instances of rtalloc_ign()
change the video output but use a separate device with a DSSX method
and a HID of "TOS6201" instead. We use a pseudo-driver to get the handle
for this object and pass it to the acpi_toshiba driver.
This is untested but seems to match the Linux Toshiba driver.
same problems as their Hurricane 575* brethren in that one could set
the memory mapped port, but that has no effect. Add a quirk for this.
# I'll have to see if I can dig up documentation on these parts to see
# if there's some way software can know this other than a table...
the sense that any write to them reads back as a 0. This presents a
problem to our resource allocation scheme. If we encounter such BARs,
the code now treats them as special, allowing any allocation against
them to succeed. I've not seen anything in the standard to clarify
what host software should do when it encounters these sorts of BARs.
Also cleaned up some output while I'm here and added commented out
bootverbose lines until I'm ready to reduce the verbosity of boot
messages.
This gets a number of south bridges and ata controllers made mostly by
VIA, AMD and nVidia working again. Thanks to Soren Schmidt for his
help in coming up with this patch.
the space occupied by a struct sockaddr when passed through a
routing socket.
Use it to replace the macro ROUNDUP(int), that does the same but
is redefined by every file which uses it, courtesy of
the School of Cut'n'Paste Programming(TM).
(partial) userland changes to follow.
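A sketch of the rounding that such a size macro performs, assuming the conventional routing-socket rule: lengths are rounded up to a multiple of sizeof(long), and a zero length still occupies one slot. The function name sockaddr_space() is hypothetical.

```c
#include <stddef.h>

/* Space taken by a sockaddr of length `len` in a routing message,
 * under the assumed rule described above. */
static size_t
sockaddr_space(size_t len)
{
	size_t align = sizeof(long);

	if (len == 0)
		return (align);
	return ((len + align - 1) & ~(align - 1));
}
```

Defining this once in a shared header means every consumer steps through the packed sockaddrs with the same arithmetic.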
controllers (PDC203** PDC206**).
This also adds preliminary support for the Promise SX4/SX4000 but *only*
as a "normal" Promise ATA controller (ATA RAIDs are supported, though,
but only RAID0, RAID1 and RAID0+1).
This cuts off yet another 5-8% of the command overhead on promise controllers,
making them the fastest we have ever had support for.
Work is now continuing to add support for this in ATA RAID, to accelerate
ATA RAID quite a bit on these controllers, and especially the SX4/SX4000
series as they have quite a few tricks in there..
This commit also adds a few fixes to the SATA code needed for proper support.
Alignment for pccards should also be treated in a similar way that
we treat it for cardbus cards.
Remove bogus debugs while I'm here.
# This is also necessary to make the CIS reading work.
Submitted by: Carlos Velasco
(1) Align to 64k for the CIS. Some cards don't like it when we aren't
aligned to a 64k boundary. I can't find anything in the standard
that requires this, but I have 1/2 dozen cards that won't work at
all unless I enable this.
(2) Sleep 1s before scanning the CIS. This may be a nop, but has little
harm.
(3) The CIS can be up to 4k in some weird, odd-ball edge cases. Since we
have limiters for when that's not the case, it does no harm to increase
it to 4k.
#1 was submitted, in a different form, by Carlos Velasco.
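Item (1) amounts to rounding a candidate address up to a 64k boundary, which can be sketched as below. CIS_ALIGN and align_up() are hypothetical names, and the sketch ignores wraparound near the top of the address space.

```c
#include <assert.h>
#include <stdint.h>

#define CIS_ALIGN 0x10000u	/* 64k, as in item (1) above */

/* Round `addr` up to the next multiple of `align` (a power of two). */
static uint32_t
align_up(uint32_t addr, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((addr + align - 1) & ~(align - 1));
}
```

An address that is already aligned is returned unchanged, so applying the rounding unconditionally is harmless for cards that do not need it.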
a LOR against sleepq. Fix the comment, and fix ptracestop() to pick up
sched_lock after stop() rather than before.
Reported by: Scott Sipe <cscotts@mindspring.com>
Reviewed by: rwatson, jhb
I'm not sure this is completely correct but at least this
is consistent with the accounting of incoming broadcasts.
PR: kern/65273
Submitted by: David J Duchscher <daved@tamu.edu>
FreeBSD, we can have a negative available space value, but the
corresponding fields in the NFS protocol are unsigned. So
truncate the value to 0 if it's negative, so that the client
doesn't receive absurdly high values.
Tested by: cognet
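The clamping described above can be sketched in a few lines. The name clamp_avail() is hypothetical; the point is only that a negative signed value must be clamped before it is stored in an unsigned protocol field.

```c
#include <stdint.h>

/* Convert a possibly negative available-space value to the unsigned
 * form used on the wire, clamping negatives to 0. */
static uint64_t
clamp_avail(int64_t avail)
{
	return (avail < 0 ? 0 : (uint64_t)avail);
}
```

Without the clamp, a value of -1 would be reinterpreted as 2^64 - 1, which is the absurdly high figure the client used to see.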