This patch is rather big because I had to significantly redesign
the driver to make the busdma conversion possible. Most notably,
hardware and software structures were carefully splitted to get
rid of all the structs overlapping evilness.
Special thanks to phk and Richard Puga <puga@mauibuilt.com> for
providing me with fxp(4) hardware to do this work.
Thanks to marcel for testing this on ia64, and to Fred Clift
<fclift@verio.net> for testing this on alpha.
Tested on: i386, ia64, alpha
the fxp driver. This is enabled only for the 82550/82551 chips
(PCI revision code 12 or 13). RX and TX checksum offload are
both supported. Transmit offload is limited to TCP and UDP only
right now: there seems to be a problem with IP header checksumming
on transmit in some cases.
This chip has hardware VLAN support as well. I hope to enable
support for this eventually.
o don't strip the Ethernet header from inbound packets; pass packets
up the stack intact (required significant changes to some drivers)
o reference common definitions in net/ethernet.h (e.g. ETHER_ALIGN)
o track ether_ifattach/ether_ifdetach API changes
o track bpf changes (use BPF_TAP and BPF_MTAP)
o track vlan changes (ifnet capabilities, revised processing scheme, etc.)
o use if_input to pass packets "up"
o call ether_ioctl for default handling of ioctls
Reviewed by: many
Approved by: re
just limited to the DEVICE_POLLING case. This removes the FXP_RFA_RNRMARK
hack, and replaces it with a softc flag that is used to record when
the handling of a no-resource condition was deferred due to running
out of DEVICE_POLLING cycles. This was tested on -stable, but the
code is essentially the same as in -current. It should only affect
the case where DEVICE_POLLING is defined.
The details of the mechanism behind the crashes are still uncertain
but the most likely cause seems to be some kind of hardware confusion
when the no-resource recovery code is accidentally invoked while
the receiver is still active. This could have happened if the
hardware left the 0x4000 bit of the RFA status word set. The comments
in the commit log for revision 1.142 stating that the driver could
clash with the hardware writing to this status word were not correct.
Tested by: Guy Helmer <ghelmer@palisadesys.com>
behaviour of the hardware: a possibly reserved bit of the receive
descriptor (RFA) `status' field is borrowed to record no-resource
(RNR) events, and the same status field is read and written to at
a time that may clash with the hardware updating this field.
There is no hardware documentation available to determine if these
things are safe to do; the second issue almost certainly isn't, and
the first is only safe if there is documentation saying that this
bit is free to be used by the driver. The PR referenced below
provides extremely convincing evidence that the changes cause random
crashes on some (unusual) hardware.
Since these features are only required by the DEVICE_POLLING case,
this commit makes their use conditional on that option. It does not
change the DEVICE_POLLING case, but at least people with the rare
hardware on which this code causes problems can now avoid the crashes
by not enabling DEVICE_POLLING.
PR: kern/42260
Reviewed by: luigi
Problem revision found by: Pawel Malachowski <pawmal@unia.3lo.lublin.pl>
Tested by: Pawel Malachowski <pawmal@unia.3lo.lublin.pl>
MFC after: 1 week
Linux driver defines 0x103[B-E] so add those as well.
Obtained from: Intel Linux e100 driver
MFC: Immediately if re@ allows it, otherwise after 4.7-RELEASE
Also take this chance to cleanup the code in fxp_intr_body.
Add a missing block of code to disable interrupts when
reinitializing the interface while doing polling (the RELENG_4
version was correct).
MFC after: 3 days
1.131 is slightly broken, and I would commit the fix to that here, but it
has been reported that any deviation from the original code is causing
problems with some 82557 chips, causing them to lock hard.
Until those issues have been figured out, going back to the original
code is the best plan.
Frustrated: Silby
are packets queued for transmission.
This driver is strange -- it never sets IFF_OACTIVE, so all
transmissions always cause a call to fxp_start. However, if the
link gets stuck, there was nothing to reset it, so there was still
a possibility of lockups.
MFC after: 3 days
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
instead of relying on the previous filters to be present.
Back out r1.125, as a reset is needed to unload any existing microcode,
(which clears the multicast addresses), as it is superceded by this change.
Non-SMP, i386-only, no polling in the idle loop at the moment.
To use this code you must compile a kernel with
options DEVICE_POLLING
and at runtime enable polling with
sysctl kern.polling.enable=1
The percentage of CPU reserved to userland can be set with
sysctl kern.polling.user_frac=NN (default is 50)
while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.
Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at
http://info.iet.unipi.it/~luigi/polling/
and also supports polling in the idle loop.
NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.
NOTE to other developers:
sure some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.
Quick description of files touched by this commit:
sys/conf/files.i386
new file kern/kern_poll.c
sys/conf/options.i386
new option
sys/i386/i386/trap.c
poll in trap (disabled by default)
sys/kern/kern_clock.c
initialization and hardclock hooks.
sys/kern/kern_intr.c
minor swi_net changes
sys/kern/kern_poll.c
the bulk of the code.
sys/net/if.h
new flag
sys/net/if_var.h
declaration for functions used in device drivers.
sys/net/netisr.h
NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
device driver modifications
and packet bundling. Make the microcode settings controllable via sysctl
and loader tunables.
Submitted by: Marko Zec <zec@tel.fer.hr>
(with some munging and dynamic sysctl support by me)
Also extend the workaround for Dynamic Standby mode to later '559 chips,
not just the ICH2 variants.
This is taken verbatim from the Intel's e100-1.6.22 release, with
the addition of their LICENSE file at the top.
Submitted by: Marko Zec <zec@tel.fer.hr>
the chip can cause a PCI protocol violation in under certain scenarios.
The workaround is to rewrite the EEPROM to disable Dynamic Standby Mode.
Once the EEPROM is rewritten, the system needs to be rebooted in order
to pick up the new settings.
This has been tested on several ICH2/ICH2-M systems, found in 815E based
boards, and usually identified by the presence of the 82562 ET/EM PHY.
Thanks to: Mike Tansca, Paul Saab for samples of the problematic boards.
of " &= ". Also change the MII PHY device mask to check the correct bits.
Cookie to: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
Pointy hat to: me
a 82557 (e.g.: a newer chip) then:
+ enable MWI, if the PCI configuration indicates the system supports it
+ enable usage of extended TxCB, for better performance
+ enable hardware flow control. FC frames will be passed up to the
host only if promiscuous mode is enabled.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
an override as a loader settable variable (fxp_iomap). fxp_iomap is
a bitmap of fxp units that should be configured to use PCI I/O space
in stead of PCI Memory space.
Reviewed by: Kees Jan Koster <dutchman@tccn.cs.kun.nl>, dg@freebsd.org
All calls to mtx_init() for mutexes that recurse must now include
the MTX_RECURSE bit in the flag argument variable. This change is in
preparation for an upcoming (further) mutex API cleanup.
The witness code will call panic() if a lock is found to recurse but
the MTX_RECURSE bit was not set during the lock's initialization.
The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to
MTX_RECURSED, which is more appropriate given its meaning.
The following locks have been made "recursive," thus far:
eventhandler, Giant, callout, sched_lock, possibly some others declared
in the architecture-specific code, all of the network card driver locks
in pci/, as well as some other locks in dev/ stuff that I've found to
be recursive.
Reviewed by: jhb
claimed that their Intel NIC is comatose after a warm boot from Windoze.
This is most likely due to the card getting put in the D3 state. This
should bring it back to life.
is enabling as all entries are still called with Giant being held.
Maintaining compatability with NetBSD makes what should be very simple
kinda ugly.
Reviewed by: Jason Evans
understand exactly what it is about SMPng that tickles this bug. What I
do know is that the foo_init() routine in most drivers is often called
twice when an interface is brought up. One time is due to the ifconfig(8)
command calling the SIOCSIFFLAGS ioctl to set the IFF_UP flag, and another
is probably due to the kernel calling ifp->if_init at some point. In any
case, the SMPng changes seem to affect the timing of these two events in
such a way that there is a significant delay before any packets are sent
onto the wire after the interface is first brought up. This manifested
itself locally as an SMPng test machine which failed to obtain an address
via DHCP when booting up.
It looks like the second call to fxp_init() is happening faster now than
it did before, and I think it catches the chip while it's in the process
of dealing with the configuration command from the first call. Whatever
the case, a FXP_CSR_SCB_CNA interrupt event is now generated shortly after
the second fxp_init() call. (This interrupt is apparently never generated
by a non-SMPng kernel, so nobody noticed.)
There are two problems with this: first, fxp_intr() does not handle the
FXP_CSR_SCB_CNA interrupt event (it never tests for it or does anything
to deal with it), and second, the meaning of FXP_CSR_SCB_CNA is not
documented in the driver. (Apparently it means "command unit not active.")
Bad coder. No biscuit.
The fix is to have the FXP_CSR_SCB_CNA interrupt handled just like the
FXP_SCB_STATACK_CXTNO interrupt. This prevents the state machine for
the configuration/RX filter programming stuff from getting wedged for
several seconds and preventing packet transmission.
Noticed by: jhb
lock up under moderate to heavy load.
The status & command fields share a 32-bit longword. The programming
API of the eepro apparently requires that you update the command field
of a transmit slot that you've already given to the card. This means
the card could be updating the status field of the same longword at
the same time. Since alphas can only operate on 32-bit chunks of
memory, both the status & command fields are loaded from memory &
operated on in registers when the following line of C is executed:
sc->cbl_last->cb_command &= ~FXP_CB_COMMAND_S;
The race is caused by the card DMA'ing up the status at just the wrong
time -- after it has been loaded into a register & before it has been
written back. The old value of the status is written back, clobbering
the status the card just DMA'ed up. The fact that the card has sent
this frame is missed & the transmit engine appears to hang.
Luckily, as numerous people on the freebsd-alpha list pointed out, the
load-locked/store-conditional instructions used by the atomic
functions work with respect changes in memory due to I/O devices. We
now use them to safely update the command field.
Tested by: Bernd Walter <ticso@mail.cicely.de>
ether_ifdetach().
The former consolidates the operations of if_attach(), ng_ether_attach(),
and bpfattach(). The latter consolidates the corresponding detach operations.
Reviewed by: julian, freebsd-net
of the individual drivers and into the common routine ether_input().
Also, remove the (incomplete) hack for matching ethernet headers
in the ip_fw code.
The good news: net result of 1016 lines removed, and this should make
bridging now work with *all* Ethernet drivers.
The bad news: it's nearly impossible to test every driver, especially
for bridging, and I was unable to get much testing help on the mailing
lists.
Reviewed by: freebsd-net
address size that is different than the standard 6bits. This fixes
support for the Compaq NC3121 card, certain newer Intel Pro/100+
cards, and should also fix integrated NICs on SuperMicro and Compaq
motherboards.
The auto-sizing algorithm was taken from NetBSD (thanks!), which I
think got it from Linux originally.
Thanks also to Andrew Sparrow <spadger@best.com> and Joe Moore
<jomor@ahpcns.com> for supplying me with unworking Compaq and Intel
cards to develop and test the fixes with.
and/or when using the card.
o Convert the driver to using bus_space. This allows alphas with
fxp's to boot, rather than panic'ing because rman_get_virtual()
doesn't really return a virtual address on alphas.
o Fix an alpha unaligned access error caused by some misfeature of
gcc/egcs: if link_addr & rbd_addr in the fxp_rfa struct are 32 bit
quantities, egcs will assume they are naturally aligned. So it will do
a ldl & some shifty/masky to twiddle 16 bit values in fxp_lwcopy().
However, if they are 16-bit aligned, the ldl will actually be done on
a 16-bit aligned value & we will panic with an unaligned access
error... Changing their definition to an array of chars seems to fix
this. I obtained this from NetBSD.
I've tested this on both i386 & alpha.
This means that we will not have to have a bpf and a non-bpf version
of our driver modules.
This does not open any security hole, because the bpf core isn't loadable
The drivers left unchanged are the "cross platform" drivers where the respective
maintainers are urged to DTRT, whatever that may be.
Add a couple of missing FreeBSD tags.
declaration for the interface driver from "foo" to "if_foo" but leave the
declaration for the miibus attached to the interface driver alone. This
lets the internal module name be "if_foo" while still allowing the miibus
instances to attach to "foo."
This should allow ifconfig to autoload driver modules again without
breaking the miibus attach.
This whole idea isn't going to work until somebody makes the bus/kld
code smarter. The idea here is to change the module's internal name
from "foo" to "if_foo" so that ifconfig can tell a network driver from
a non-network one. However doing this doesn't work correctly no matter
how you slice it. For everything to work, you have to change the name
in both the driver_t struct and the DRIVER_MODULE() declaration. The
problems are:
- If you change the name in both places, then the kernel thinks that
the device's name is now "if_foo", so you get things like:
if_foo0: <FOO ethernet> irq foo at device foo on pcifoo
if_foo0: Ethernet address: foo:foo:foo:foo:foo:foo
This is bogus. Now the device name doesn't agree with the logical
interface name. There's no reason for this, and it violates the
principle of least astonishment.
- If you leave the name in the driver_t struct as "foo" and only
change the names in the DRIVER_MODULE() declaration to "if_foo" then
attaching drivers to child devices doesn't work because the names don't
agree. This breaks miibus: drivers that need to have miibuses and PHY
drivers attached never get them.
In other words: damned if you do, damned if you don't.
This needs to be thought through some more. Since the drivers that
use miibus are broken, I have to change these all back in order to
make them work again. Yes this will stop ifconfig from being able
to demand load driver modules. On the whole, I'd rather have that
than having the drivers not work at all.
I have an 82559 card with the same id as the other 8255[78] chips, but
that was made with a date code of 0699 (June 99). The submitter shows
this working with the probe etc, but doesn't actually say it works as
on the ethernet. :-) Assuming it does, this is a RELENG_3 merge candidate.
Submitted by: Steven E Lumos <slumos@sam.ISRI.UNLV.EDU>
i386 platform boots, it is no longer ISA-centric, and is fully dynamic.
Most old drivers compile and run without modification via 'compatability
shims' to enable a smoother transition. eisa, isapnp and pccard* are
not yet using the new resource manager. Once fully converted, all drivers
will be loadable, including PCI and ISA.
(Some other changes appear to have snuck in, including a port of Soren's
ATA driver to the Alpha. Soren, back this out if you need to.)
This is a checkpoint of work-in-progress, but is quite functional.
The bulk of the work was done over the last few years by Doug Rabson and
Garrett Wollman.
Approved by: core
Now should be able to report speed for cards using NatSemi PHY.
(if you have one please let me know if it works as I
only have the Intel version)
Reviewed by: David Greenman <dg@root.com>