to a label is inside an #ifdef block, then the label should *also* be
inside an #ifdef block. Hide the "done:" label which is only used if
DEVICE_POLLING is enabled under #ifdef DEVICE_POLLING.
Non-SMP, i386-only, no polling in the idle loop at the moment.
To use this code you must compile a kernel with
options DEVICE_POLLING
and at runtime enable polling with
sysctl kern.polling.enable=1
The percentage of CPU reserved to userland can be set with
sysctl kern.polling.user_frac=NN (default is 50)
while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.
Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at
http://info.iet.unipi.it/~luigi/polling/
and also supports polling in the idle loop.
NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.
NOTE to other developers:
sure some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.
Quick description of files touched by this commit:
sys/conf/files.i386
new file kern/kern_poll.c
sys/conf/options.i386
new option
sys/i386/i386/trap.c
poll in trap (disabled by default)
sys/kern/kern_clock.c
initialization and hardclock hooks.
sys/kern/kern_intr.c
minor swi_net changes
sys/kern/kern_poll.c
the bulk of the code.
sys/net/if.h
new flag
sys/net/if_var.h
declaration for functions used in device drivers.
sys/net/netisr.h
NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
device driver modifications
Introduce an additional device flag for those NICs which require the
transmit buffers to be aligned to 32-bit boundaries.
(the equivalen fix for STABLE is slightly simpler because there are
no supported chips which require this alignment there.)
The reason we are required to commit to -current first is so that later
MFC's do not risk the loss of existing bug fixes. Even if this was not
strictly required in -current, it should still be fixed there too.
mbuf allocation fails, and fix (i hope) a couple of style bugs.
I believe these printf() are extremely dangerous because now they can
occur on every incoming packet and are not rate limited. They were
meant to warn the sysadmin about lack of resources, but now they
can become a nice way to panic your system under load.
Other drivers (e.g. the fxp driver) have nothing like this.
There is a pending discussion on putting this kind of warnings
elsewhere, and I hope we can fix this soon.
underlying unaligned bcopy) on incoming packets that are already
available (albeit unaligned) in a buffer.
The performance improvement varies, depending on CPU and memory
speed, but can be quite large especially on slow CPUs. I have seen
over 50% increase on forwarding speed on the sis driver for the
486/133 (embedded systems), which does exactly the same thing.
The behaviour is controlled by a sysctl variable, hw.dc_quick which
defaults to 1. Set it to 0 to restore the old behaviour.
After running a few experiments (in userland, though) I am convinced
that doing the m_devget() is detrimental to performance in almost
all cases.
Even if your CPU has degraded performance with misaligned data,
the bcopy() in the driver has the same overhead due to misaligment
as the one that you save in the uiomove(), plus you do one extra
copy and pollute the cache.
But more often than not, you do not even have to touch the payload,
e.g. when you are forwarding packets, and even in the often-cited
case of NFS, you often end up passing a pointer to the payload to
the disk controller.
In any case, you can play with the sysctl variable to toggle between
the two behaviours, and see if it makes a difference.
MFC-after: 3 days
in the 21143, instead of giving priority to the receive unit.
This gives a 10-15% performance improvement in the forwarding rate
under heavy load.
Reviewed-by: Bill Paul
. Make internal service routines static.
. Use a consistent ordering of checks in MII_TICK. Do the work in the
mii_phy_tick() subroutine if appropriate.
. Call mii_phy_update() to trigger the callbacks.
laptops with this chip should test this and report back as I don't have
access to this hardware myself. People with -stable systems should try
the patch at:
http://www.freebsd.org/~wpaul/conexant.patch.gz
Submitted by: Phil Kernick <Phil@Kernick.org>
a bunch of frames. In this case, the dc_link flag is cleared, and dc_start()
stops draining the if_snd send queue, which results in lots of 'no buffers
available' errors being reported to applications. The whole idea behind
not draining the send queue until the link comes up was to avoid having
the gratuitous ARP being lost while we're waiting for autoneg to complete
after the interface is first brought up. As an optimization, change the
test in dc_start() so that we only bail if dc_link is not set _and_ there
are less than 10 packets in the send queue. If the queue has many frames
in it, we need to drain them. If the queue has a small number of frames
in it, we can hold off on sending them until the link comes up.
MFC after: 1 week
something: offset into the first mbuf of the target chain before copying
the source data over.
Make drivers using m_devget() with a first argument "data - ETHER_ALIGN"
to use the offset argument to pass ETHER_ALIGN in. The way it was previously
done is potentially dangerous if the source data was at the top of a page
and the offset caused the previous page to be copied (if the
previous page has not yet been appropriately mapped).
The old `offset' argument in m_devget() is not used anywhere (it's always
0) and dates back to ~1995 (and earlier?) when support for ethernet trailers
existed. With that support gone, it was merely collecting dust.
Tested on alpha by: jlemon
Partially submitted by: jlemon
Reviewed by: jlemon
MFC after: 3 weeks
- Use pci_get_powerstate()/pci_set_powerstate() in all the other drivers
that need them so we don't have to fiddle with the PCI power management
registers directly.
- Use pci_enable_busmaster()/pci_enable_io() to turn on busmastering and
PIO/memory mapped accesses.
- Add support to the RealTek driver for the D-Link DFE-530TX+ which has
a RealTek 8139 with its own PCI ID. (Submitted by Jason Wright)
- Have the SiS 900/National DP83815 driver be sure to disable PME
mode in sis_reset(). This apparently fixes a problem on some
motherboards where the DP83815 chip fails to receive packets.
(Submitted by Chuck McCrobie <mccrobie@cablespeed.com>)
case there is nothing to do. This happens normally when the card shares
the interrupt line with other devices.
This code saves a couple of microseconds per interrupt even on a
fast CPU. You normally would not care, except under heavy tinygram
traffic where you can have some 50-100.000 interrupts per second...
On passing, correct a spelling error.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
if_vr: handle the case where vr_encap() returns failure: bust out of the
packet sending loop instead of panicking. Also add some missing
newlines to some printf()s.
if_dc: The miibus_read and miibus_write methods keep swapping in and
out of MII mode by fiddling with CSR6 for cards with MII PHYs.
This is a hack to support the original Macronix 98713 card which
has built-in NWAY that uses an MII-like management interface
even though it uses serial transceivers. Conditionalize this
so that we only do this on 98713 chips, since it does bad things
to genuine tulip chips (and maybe other clones).
All calls to mtx_init() for mutexes that recurse must now include
the MTX_RECURSE bit in the flag argument variable. This change is in
preparation for an upcoming (further) mutex API cleanup.
The witness code will call panic() if a lock is found to recurse but
the MTX_RECURSE bit was not set during the lock's initialization.
The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to
MTX_RECURSED, which is more appropriate given its meaning.
The following locks have been made "recursive," thus far:
eventhandler, Giant, callout, sched_lock, possibly some others declared
in the architecture-specific code, all of the network card driver locks
in pci/, as well as some other locks in dev/ stuff that I've found to
be recursive.
Reviewed by: jhb
PCI code. This saves each driver from having to grovel around looking
for the right registers to twiddle.
I should eventually convert the other PCI drivers to do this; for now,
these three are ones which I know need power state handling.
the interface to use callout_* instead of timeout(). Also add an
IS_MPSAFE #define (currently off) which will mark the driver as mpsafe
to the upper layers.
them. If we leave garbage in them, the dc_apply_fixup() routine may
try to follow bogus pointers when applying the reset fixup.
Noticed by: Andrew Gallatin
a RealTek 8139 cardbus device. Unfortunately it doesn't quite work yet
because the CIS parser barfs on it.
Submitted by msmith, with some small tweaks by me.
the PCI latency timer value to 0x80. Davicom's Linux driver does this,
and it drastically reduces the number of TX underruns in my tests. (Note:
this is done only for the Davicom chips. I'm not sure it's a good idea to
do it for all of them.)
Again, still waiting on confirmation before merging to stable.
DM9100/DM9102 chips. Do not set DC_TX_ONE. The DC_TX_USE_TX_INTR flag
causes dc_encap() to set the 'interrupt on TX completion' bit only
once every 64 packets. This is an attempt to reduce the number
of interrupts generated by the chip. You're supposed to get a 'no more
TX buffers left' interrupt once you hit the last packet whether you
ask for one or not, however it seems the Davicom chip doesn't generate
this interrupt, or at least it doesn't generate it under the same
circumstances. The result is that if you transmit n packets, where
n is less than 64, and then wait 5 seconds, you'll get a watchdog
timeout whether you want one or not. The DC_TX_INTR_ALWAYS causes
dc_encap() to request an interrupt for every frame.
I'm still waiting on confirmation from a couple of users to see if this
fixes their problems with the Davicom DM9102 before I merge this into
-stable, but this fixed the problem for me in my own testing so I'm
willing to make the change to -current right away.
This commit adds support for Xircom X3201 based cardbus cards.
Support for the TDK 78Q2120 MII is also added.
IBM Etherjet, Intel and Xircom cards uses these chips.
Note that as a result of this commit, some Intel/DEC 21143 based cardbus
cards will also attach, but not get link. That is being looked at.
takes care of all the 10/100 and gigE PCI drivers that I've done.
Next will be the wireless drivers, then the USB ones. I may pick up
some stragglers along the way. I'm sort of playing this by ear: if
anyone spots any places where I've screwed up horribly, please let me
know.
adapters. This is necessary in order to make this driver work with
the built-in ethernet on the alpha Miata machines. These systems
have a 21143-PC chip on-board and optional daughtercards with either
a 10/100 MII transceiver or a 10baseT/10base2 transceiver. In both
cases, you need to twiddle the GPIO bits on the controller in order
to turn the transceivers on, and you have to read the media info
from the SROM in order to find out what bits to twiddle.
on the NEC VersaPro NoteBook PC. This 21143 implementation has no LEDs,
and flipping the LED control bits somehow stops it from establishing
a link. We check the subsystem ID and don't flip the LED control
bits for the NEC NIC.
the link and activity LED control bits in CSR15 in order for the
controller to drive the LEDs correctly. This was largely done for the
ZNYX multiport cards, but should also work with the DEC DE500-BA
and other non-MII cards.
with LEDs on some cards being stomped on when clearing the "jabber disable"
bit. Using DC_SETBIT() has an unwanted side effect of setting a write enable
bit in the watchdog timer register which we really want to be cleared when
we do a write.
3.3volt PCI/cardbus chipsets similar to the 98715 (and they have
512-bit hash tables). Also update the man page to mention the 98727/98732
and the SOHOware SFA110A Rev B4 card with the 98715AEC-C chip.
which differ slightly from the Macronix MX98715AEC chip on the sample
adapter that I have in that the multicast hash table is only 128 bits
wide instead of 512. New adapters are popping up with this chip, and
due to improper handling of the smaller hash table, broadcast packets
were not being received correctly.
ether_ifdetach().
The former consolidates the operations of if_attach(), ng_ether_attach(),
and bpfattach(). The latter consolidates the corresponding detach operations.
Reviewed by: julian, freebsd-net
21143 chips, I accidentally removed the DC_MII_REDUCED_POLL flag
for all 21143 cards. This caused problems with timer-instigated
TCP retransmits, which happened to occur at the same time as an
MII poll tick on MII-based cards (e.g. D-Link DFE-570TX). Fixed this,
plus made some other cleanups. The autoneg fixes for the non-MII
cards still work. Also tested the PNIC II now that I have one again.
workalike chips (Macronix 98713A/98715 and PNIC II). Timing is somewhat
critical: you need to bring the link as soon as possible after NWAY
is done, and the old one second polling interval was too long. Now
we poll every 10th of a second until NWAY completes (at which point
we return to the 1 second interval again to keep an eye on the link
state).
I tested all the other cards I had on hand to make sure I didn't bust
any of them and they seem to work (including the MII-based 21143 card).
This should fix some autoneg problems with DE500-BA cards and the
built-in 10/100 ethernet on some alpha systems.
(Now before anyone asks why I never noticed this before, the old code
worked just find with the Intel swich I used for testing back in NY.
Apparently not all switches are as picky about the timing.)
of the individual drivers and into the common routine ether_input().
Also, remove the (incomplete) hack for matching ethernet headers
in the ip_fw code.
The good news: net result of 1016 lines removed, and this should make
bridging now work with *all* Ethernet drivers.
The bad news: it's nearly impossible to test every driver, especially
for bridging, and I was unable to get much testing help on the mailing
lists.
Reviewed by: freebsd-net
Note that if_aue doesn't strictly depend on usb because it uses the
method interface for calls rather than using internal symbols, and
because it's a child driver of usb and therefore will not try and do
anything unless the parent usb code is loaded at some point. if_aue does
strictly depend on miibus as it will fail to link if it is missing.
This is just to make sure we initialize the chip correctly: we need to
make the sure the port select bit in CSR6 is set properly so that we
use the internal PHY for 10/100 support. (The eval boards I have also
include an external HomePNA PHY, but I need to play with that more
before I can support it.)
packets into a single buffer, and set the DC_TX_COALESCE flag for the
Davicom DM9102 chip. I thought I had escaped this problem, but... This
chip appears to silently corrupt or discard transmitted frames when
using scatter/gather DMA (i.e. DMAing each packet fragment in place
with a separate descriptor). The only way to insure reliable transmission
is to coalesce transmitted packets into a single cluster buffer. (There
may also be an alignment constraint here, but mbuf cluster buffers are
naturally aligned on 2K boundaries, which seems to be good enough.)
The DM9102 driver for Linux written by Davicom also uses this workaround.
Unfortunately, the Davicom datasheet has no errata section describing
this or any other apparently known defect.
Problem noted by: allan_chou@davicom.com.tw
down, the dc driver and receiver can fall out of sync with one another,
resulting in a condition where the chip continues to receive packets
but the driver never notices. Normally, the receive handler checks each
descriptor starting from the current producer index to see if the chip
has relinquished ownership, indicating that a packet has been received.
The driver hands the packet off to ether_input() and then prepares the
descriptor to receive another frame before moving on to the next
descriptor in the ring. But sometimes, the chip appears to skip a
descriptor. This leaves the driver testing the status word in a descriptor
that never gets updated. The driver still gets "RX done" interrupts but
never advances further into the RX ring, until the ring fills up and the
chip interrupts again to signal an error condition. Sometimes, the
driver will remain in this desynchronized state, resulting in spotty
performance until the interface is reset.
Fortunately, it's fairly simple to detect this condition: if we call
the rxeof routine but the number of received packets doesn't increase,
we suspect that there could be a problem. In this case, we call a new
routine called dc_rx_resync(), which scans ahead in the RX ring to see
if there's a frame waiting for us somewhere beyond that the driver thinks
is the current producer index. If it finds one, it bumps up the index
and calls the rxeof handler again to snarf up the packet and bring the
driver back in sync with the chip. (It may actually do this several times
in the event that there's more than one "hole" in the ring.)
So far the only card supported by if_dc which has exhibited this problem
is a LinkSys LNE100TX v2.0 (82c115 PNIC II), and it only seems to happen
on one particular system, however the fix is general enough and has low
enough overhead that we may as well apply it for all supported chipsets.
I also implemented the same fix for the 3Com xl driver, which is apparently
vulnerable to the same problem.
Problem originally noted and patch tested by: Matt Dillon
- Add a flag DC_TX_INTR_ALWAYS which causes the transmit code to
request a TX done interrupt for every packet. The PNIC seems to need
this to insure that the sent TX buffers get reaped in a timely fashion.
- Try to unreset the SIA as soon as possible after resetting the whole
chip.
- Change dcphy to support either 10/100 or 10Mbps only NICs. The
built-in 21143 ethernet in Compaq Presario machines is 10Mbps only
and it doesn't work right if we try to advertise 100Mbps modes during
autoneg. When restricted to only 10mbps modes, it works fine.
Note that for now, I detect this condition by checking the PCI
subsystem ID on this NIC (which has a Compaq vendor/device ID).
Yes, I know that's what the SROM is supposed to be for. I'm deliberately
ignoring the SROM wherever possible. Sue me.
The latter two fixes allow if_dc to work correctly with the built-in
ethernet on certain Compaq Presario boxes. There are liable to be quite
a few people using these as their home systems who might want to try
FreeBSD; may as well be nice to them.
Now if anybody out there has an Alpha miata with 10Mbps ethernet and
can show me the output from pciconf -l on their system, I'd be grateful.
case. The idea is to reduce how often we call mii_tick(), however currently
it may not be called often enough, which prevents autonegotiation from
being driven correctly.
This should improve the chances of successfully autonegotiating media
settings on non-MII 21143 NICs. (Still waiting for confirmation from
some testers, but the code is clearly wrong in any case.)
which it replaces. The new driver supports all of the chips supported
by the ones it replaces, as well as many DEC/Intel 21143 10/100 cards.
This also completes my quest to convert things to miibus and add
Alpha support.