Blade 2500, Fire V210 and probably some other sparc64 machines.
These chips are typically not fitted with an EEPROM which means
that we have to obtain the MAC address via OFW and that some chip
tests will just always fail.
These changes are based on the respective code found in OpenBSD
with some additional info obtained from OpenSolaris and some style
suggestions by jkim@. They also have the desired side-effect of
respecting the 'local-mac-address?' system configuration variable
for the affected BGEs.
- In bge_attach() factor out calling bge_release_resources() before
going to the fail label into the fail label as well as replace a
magic 6 with ETHER_ADDR_LEN.
Reviewed by: yongari (before style changes), jkim
- Remove some excessive parentheses around shift operators.
- Use macro instead of magic number where it is applicable.
- Change lower-case hexdecimals to upper cases to match wpaul's style.
- Revert some unnecessary line wraps and changes from the previous commit.
Pointed out by: bde
- Move some PHY bug detections from brgphy.c to if_bge.c.
- Do not penalize working PHYs.
- Re-arrange bge_flags roughly by their categories.
- Fix minor style(9) nits.
PR: kern/107257
Obtained from: OpenBSD
Tested by: Mike Hibler <mike at flux dot utah dot edu>
bge_intr(). Some of them are used in bge_poll(). Simplify by only
initializing these for polling mode and not toggling them when switching
modes. This also fixes missing synchronization with the coalescing
engine in the toggling.
been handled instead of when at least one descriptor was just handled.
For bge, it is normal to get a txeof when only a small fraction of the
queued tx descriptors have been handled, so the bug broke the watchdog
in a usual case.
- moved the synchronizing bus read to after the bus write for the first
interrupt ack so that it actually synchronizes everything necessary.
We were acking not only the status update that triggered the interrupt
together with any status updates that occurred before we got around
to the bus write for the ack, but also any status updates that occur
after we do the bus write but before the write reaches the device.
The corresponding race for the second interrupt ack resulted in
sometimes returning from the interrupt handler with acked but
unserviced interrupt events. Such events then remain unserviced
until further events cause another interrupt or the watchdog times
out.
The race was often lost on my 5705, apparently since my 5705 has broken
event coalescing which causes a status update for almost every packet,
so another status update is quite likely to occur while the interrupt
handler is running. Watchdog timeouts weren't very noticeable,
apparently because bge_txeof() has one of the usual bugs resetting the
watchdog.
- don't disable device interrupts while bge_intr() is running. Doing this
just had the side effects of:
- entering a device mode in which different coalescing parameters apply.
Different coalescing parameters can be used to either inhibit or
enhance the chance of getting another status update while in the
interrupt handler. This feature is useless with the current
organization of the interrupt handler but might be useful with a
taskqueue handler.
- giving a race for ack+reenable/return. This cannot be handled
by simply rearranging the order of bus accesses like the race for
ack+keepenable/entry. It is necessary to sync the ack and then
check for new events.
- taking longer, especially with the extra code to avoid the race on
ack+reenable/return.
Reviewed by: ru, gleb, scottl
- Do not repeatedly read vendor/device IDs while probing.
- Remove redundant bzero(3) for softc. device_get_softc(9) does it for free[1].
Reviewed by: glebius
Suggested by: glebius[1]
have been added erroneously, and it causes problems on some chips. A larger
change is needed to do this write at a more appropriate place, but that
change requires reworking the ASF logic. That will be worked on in the
future.
Submitted by: Bruce Evans
- Use the appropriate register writing method when reseting the chip
- Program the descriptor DMA engine correctly.
- More reliably detect certain chips and their features.
Also add some low-level debugging tools to help future work on this driver.
Submitted by: David Christenson (proof of concept changes)
Sponsored by: www.UIA.net
- Correct RX packet drop counter for BCM5705+. This register is read/clear
and it wraps very quickly under heavy packet drops because only the lower
ten bits are valid according to the documentation. However, it seems few
more bits are actually valid and the rest bits are always zeros[1].
Therefore, we don't mask them off here. To get accurate packet drop count,
we need to check the register from bge_rxeof(). It is commented out for now,
not to penalize normal operation. Actual performance impact should be
measured later.
- Correct integer casting from u_long to uint32_t. Casting is not really
needed for all supported platforms but we better do this correctly[2].
Tested by: bde[1]
Suggested by: bde[2]
discarded RX packets to input error for BCM5705 or newer chipset as the others.
Unfortunately we cannot do the same for output errors because ifOutDiscards
equivalent register does not exist. While I am here, replace misleading and
wrong BGE_RX_STATS/BGE_TX_STATS with BGE_MAC_STATS. They were reversed but
worked accidently.
m_pkthdr.ether_vlan. The presence of the M_VLANTAG flag on the mbuf
signifies the presence and validity of its content.
Drivers that support hardware VLAN tag stripping fill in the received
VLAN tag (containing both vlan and priority information) into the
ether_vtag mbuf packet header field:
m->m_pkthdr.ether_vtag = vlan_id; /* ntohs()? */
m->m_flags |= M_VLANTAG;
to mark the packet m with the specified VLAN tag.
On output the driver should check the mbuf for the M_VLANTAG flag to
see if a VLAN tag is present and valid:
if (m->m_flags & M_VLANTAG) {
... = m->m_pkthdr.ether_vtag; /* htons()? */
... pass tag to hardware ...
}
VLAN tags are stored in host byte order. Byte swapping may be necessary.
(Note: This driver conversion was mechanic and did not add or remove any
byte swapping in the drivers.)
Remove zone_mtag_vlan UMA zone and MTAG_VLAN definition. No more tag
memory allocation have to be done.
Reviewed by: thompsa, yar
Sponsored by: TCP/IP Optimization Fundraise 2005
if_watchdog, etc., or in functions used only in these methods.
In all other functions in the driver use device_printf().
- Use __func__ instead of typing function name.
Submitted by: Alex Lyashkov <umka sevcity.net>
to it. Try to co-operate with the IPMI/ASF firmware accessing the PHY.
One we get link we don't mess with the PHY. If we do then over time
the NIC will go off line. It would be nice if we could tell if IPMI
was enabled on the chip but I can't figure out a reliable way to do
that. The scheme I tried worked on a Dell PE850 but not on an HP machine.
So we assume any NIC that has ASF capability needs to deal with it.
The code was inspired by the support in Linux from kernel.org and Broadcom.
Broadcom did give me some info. but it is rather limited and is mostly
just what is in the Linux driver. Thanks to the numerous people that
helped debug the many prior versions and that I didn't break other
bge(4) HW.
Reviewed by: several people
Tested by: even more
required by arches like sparc64 (not yet implemented) and sun4v where there
are seperate IOMMU's for each PCI bus... For all other arches, it will
end up returning NULL, which makes it a no-op...
Convert a few drivers (the ones we've been working w/ on sun4v) to the
new convection... Eventually all drivers will need to replace the parent
tag of NULL, w/ bus_get_dma_tag(dev), though dev is usually different for
each driver, and will require hand inspection...
Reviewed by: scottl (earlier version)
Following issues should be resolved:
- random watchdog timeouts (caused by concurrent phy access)
- some link state issues
- non working TX if media type was set explicitly
PR: kern/98738
Approved by: glebius (mentor)
MFC after: 2 weeks
BCM5787 based NICs.
- Recognize BCM5703 B0 ASIC.
- Rewrite the jumbo capability matching macro, so that chips known
to work are listed there. [*]
[*] I'm still not sure about this. Probably more corrections
will be done to this macro after discussion with davidch@
and brad@OpenBSD.
Obtained from: OpenBSD (brad)
- Add more device IDs, ASIC revisions and chip IDs.
- Rewrite a bit code that picks the description for device.
- Introduce several macros to shorten quirks for bugs and
features.[*]
- Use some magic values, that OpenBSD has successfully
possessed from Linux (Broadcom supplied) driver.
- Remove disabled code that tried to access VPD.
[*] The macro that matches Jumbo capable NICs is
rewritten to preserve our current behavior. I
need clarify whether our or theirs is correct.
PR: 68351 (and may be others)
Obtained from: OpenBSD, brad@ mostly
This allows one to change the behavior of the driver pre-boot.
NOTE: This patch was made for DragonFly BSD by Sepherosa Ziehau.
PR: kern/94833
Submitted by: Devon H. O'Dell
Obtained from: DragonFly
MFC after: 1 month
should fix strange link state behaviour reported for bcm5721 & bcm5704c
2) Clear bge_link flag in bge_stop()
3) Force link state check after bge_ifmedia_upd(). Otherwise we can miss link
event if PHY changes it's state fast enough.
Tested by: phk (bcm5704c)
Approved by: glebius (mentor)
MFC after: 1 week
with pseudo header for tcp/udp packets). This could save one in_pseudo() call
per incoming tcp/udp packet.
Approved by: glebius (mentor)
MFC after: 3 weeks
to process. It could give us [significant?] perfomance increase if there is big
difference between RX/TX flows.
Submitted by: Mihail Balikov <mihail.balikov AT interbgc DOT com>
Approved by: glebius (mentor)
MFC after: 3 days
synchronized on every call of bge_poll_locked().
Suggested by: Mihail Balikov <mihail.balikov AT interbgc DOT com>
Approved by: glebius (mentor)
MFC after: 3 days
2) add missing bus_dmamap_sync() call in bge_intr()
Tested by: Husnu Demir <hdemir AT metu DOT edu DOT tr>
Approved by: glebius (mentor)
MFC after: 3 days
as input/output interface errors.
- Keep values of rx/tx discards & tx collisions inside struct bge_softc.
So we can keep statistic across ifconfig down/up runs (cause bringing
bge up will reset chip).
Approved by: glebius (mentor)
MFC after: 1 week
2) use more robust way of link state handling for BCM5700 rev.B2 chip
3) workaround bug of some BCM570x chips which cause spurious "link up" messages
4) fix bug: some BCM570x chips was unable to detect link state changes after
ifconfig down/up sequence until any 'non-link related' interrupt generated.
(this happened due to pending internal link state attention which blocked
interrupt generation)
Approved by: glebius (mentor)
MFC after: 1 week
except for BGE_CHIPID_BCM5700_B0, which is buggy.
- All bge(4) supported hardware, has a bug that produces incorrect checksums
on Ethernet runts. However, in case of a transmitted packet, the latter can
be padded with zeroes, and the checksum would be correct. (Probably chip
includes the pad data into checksum). In case of receive, we just don't
trust checksum data in received runts.
Obtained from: NetBSD (jonathan) via Mihail Balikov
Previously it always returned 0 which means success regardless of
EEPROM status.
While here, add a check whether EEPROM read is successful.
Submitted by: jkim
- removed unused funtion bge_handle_events().
- removed bus_dmamap_destroy(9) calls for DMA maps created by
bus_dmamem_alloc(9). This should fix panics seen on sparc64
in device detach.
- added check for parent DMA tag creation.
- switched to use __NO_STRICT_ALIGNMENT as bge(4) supports all
architectures.
- added missing bus_dmamap_sync(9) in bge_txeof().
- added missing bus_dmamap_sync(9) in bge_encap().
- corrected memory synchronization operation on status block.
As the driver just read status block that was DMAed by NIC it
should use BUS_DMASYNC_POSTREAD. Likewise the driver does not
need to write status block back, so remove unnecessary
bus_dmamap_sync(9) calls in bge_intr().
- corrected memory synchronization operation on RX return ring.
The driver only read the block so remove unnecessary
bus_dmamap_sync(9) in bge_rxeof().
- force bus_dmamap_sync(9) for only modified descriptors. Blindly
synching all desciptor rings would reduce performance.
- call bus_dmamap_sync(9) for DMA maps that were modified in bge_rxeof().
Reviewed by: jkim(initial version)
Tested by: glebius(i386), jkim(amd64 initial version)
we can cache its value in the softc. Eliminates one PCI register
write per call to bge_start().
A 1.8% speedup for UDP_RR test on my old box.
Obtained from: NetBSD(jonathan) via delphij
case if memory allocation failed.
- Remove fourth argument from VLAN_INPUT_TAG(), that was used
incorrectly in almost all drivers. Indicate failure with
mbuf value of NULL.
In collaboration with: yongari, ru, sam
- Give up endianess support and switch to native-endian format for
accessing hardware structures. In fact embedded processor for
BCM57xx is big-endian architure(MIPS) and it requires native-endian
format for NIC structures.The NIC performs necessary byte/word
swapping depending on programmed endian type.
- With above changes all htole16/htole32 calls were gone.
- Remove bge_vhandle member in softc and changed to use explicit
register access. This may add additional performance penalty
that than that of previous memory access. But most of the access
is performed on initialization phase(e.g. RCB setup), it would be
negligible.
Due to incorrect use of bus_dma(9) in bge(4) it still panics sparc64
system in device detach path. The issue would be fixed in next patch.
Reviewed by: jkim (initial version)
Silence from: ps
Tested by: glebius
Obtained from: NetBSD via OpenBSD
cluster allocator, that wasn't MPSAFE. Instead, utilize our new generic
UMA jumbo cluster allocator. Since UMA gives us a 9k piece that is contigous
in virtual memory, but isn't contigous in physical memory we need to handle
a few segments. To deal with this we utilize Tigon chip feature - extended
RX descriptors, that can handle up to four DMA segments for one frame.
Details:
o Remove bge_alloc_jumbo_mem(), bge_free_jumbo_mem(),
bge_jalloc(), bge_jfree() functions.
o Remove SLIST heads, bge_jumbo_tag, bge_jumbo_map from softc.
o Use extended RX BDs for Jumbo receive producer ring, and
initialize it appropriately.
o New bge_newbuf_jumbo():
- Allocate an mbuf with Jumbo cluster with help of m_cljget().
- Load the cluster for DMA with help of bus_dmamap_load_mbuf_sg().
- Assert that we got 3 segments in the DMA mapping.
- Fill in these 3 segments into the extended RX descriptor.
2) rework link state detection code & use it in POLLING mode
3) fix 2 bugs in link state detection code:
a) driver unable to detect link loss on bcm5721
b) on bcm570x chips (tested on bcm5700 bcm5701 bcm5702) driver fails
to detect link loss with probability 1/6 (solved in brgphy.c)
Devices working in TBI mode should not be affected by this change.
Approved by: glebius (mentor)
MFC after: 1 month
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.
- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.
I'm able to suspend/resume my laptop without this change, but then I need
to wait for the watchdog to reset the card.
With this change, it is ready immediately.
Glanced at by: glebius
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.
Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.
Reviewed by: pjd, bz
MFC after: 7 days
over iteration of their multicast address lists when synchronizing the
hardware address filter with the network stack-maintained list.
Problem reported by: Ed Maste (emaste at phaedrus dot sandvine dot ca>
MFC after: 1 week
a problem with one particular switch module. Create a kernel option
BGE_FAKE_AUTONEG that restores the 5.4 behavior, which should make the DNLK
switch module work. IBM/Intel blades with Intel or AD switch modules should
work without patching or kernel options with this commit.
Hardware for testing provided by several folks, including
Danny Braniss <danny@cs.huji.ac.il>, Achim Patzner <ap@bnc.net>,
and OffMyServer.
Approved by: re
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.
This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.
Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.
Reviewed by: sobomax, sam
entered the interface start function, no packets were actually dequeued.
Therefore, keep a count of how many packets we really added onto the tx
chain, and initiate a transmit only if the count is non-zero.
frames. BGE hardware with the rx alignment bug will still be handled by the
calls to m_adj() that already exist. m_adj() is probably better suited for
this task anyways. Just as with if_em, this saves a malloc + several locks
per packet and prevents unneeded data copying within busdma.
I have from Broadcom does not give much information on these devices,
so the Broadcom Linux driver was used for clues to what these chips
support. It turns out they are similar to the 5705 with the 5751
being the PCI-Express version and needing special work-arounds and
settings.
copper NICs, a link change event is posted whenever MII autopolling is
toggled off and on, which happens whenever someone calls
bge_miibus_readreg() or bge_miibus_writereg() to access the PHY
registers. This means anytime someone called the SIOCGIFMEDIA ioctl
on a bge interface, the link would reset. Even a simple "ifconfig bge0"
would do it, though other apps like dhclient or the PPPoE daemon could
trigger it as well. An obvious symptom of this problem is lots of
"bgeX: gigabit link up" messages appearing on the console for no
apparent reason.
Through experimentation, I determined that when a real link change
event occurs, the BGE_MIMODE_AUTOPOLL in the BGE_MI_MODE register
is always set, so now if we have a copper NIC and an link change
event occurs and the BGE_MIMODE_AUTOPOLL bit is clear, we ignore
the event.
Note that this does not apply to the original BCM5700 chip since we
use a different method for sensing link changes with that chip (the
status block method was broken), nor to fiber optic NICs since they
don't use the GMII PHY access registers.
(in particular, bge(4) hasn't supported rxcsum since if_bge.c#1.5)
Clean up some aspects of capabilities usage, i.e. stop using
if_hwassist to see whether we are doing offload now because if_hwassist
is for TCP/IP layer and it is subordinate to if_capenable.
Thanks to: Aled Morris for donating a nice bge(4) NIC to me
Reviewed by: -net, -hackers (silence)
mode. The 5704 apparently has some s00p3r s33kr1t registers for setting
the advertisement of pause frame ability (i.e flow control) when in
autoneg mode. If we don't set these registers correctly, we may not
be able to negotiate a proper link with some switches. (Symptom is that
the NIC reports the link as up (PCS synched) but no traffic can be
exchanged.)
PR: kern/67598
multicast hash are written. There are still two distinct algorithms used,
and there actually isn't any reason each driver should have its own copy
of this function as they could all share one copy of it (if it grew an
additional argument).
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.
Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
These are 10/100 only NICs found on the IBM Thinkpad R40E and
G40. These seem to be based on the BCM5705 MAC but with a PHY
that doesn't support 1000Mbps modes.
Submitted by: Igor Sviridov <sia@nest.org>
to configure this correctly yields many watchdog timeouts even on lightly
loaded machines. This is a common complaint from users with Dell 1750
servers with built-in dual 5704 NICs.
BGE_MACSTAT_MI_COMPLETE bit in the MAC status register as a link
change indicator. We turn this bit on now because some of the newer
chips need it, but it usually just means that reading/writing
an MII/GMII register has completed, not that a link change has
occured.
larger than normal frames, to account for the case where a bge(4) NIC
is used with VLANs. Since we set the IFCAP_VLAN_MTU flag, we must allow
reception of frames up to 1522 bytes in size rather than 1518.
Note that it is possible to work around this bug by doing:
# ifconfig bge0 mtu 1504
prior to configuring any VLAN interfaces.
- 5705 doesn't support jumbo frames
- Statistics must be read from registers
- RX return ring must be capped at 512 entries
- Omit initialization of certain device blocks
- Acknowledge link change interrupts by setting the 'link changed'
bit in the status register (used to have no effect)
- Remember to toggle the MI completion bit too
- Set the mbuf low watermark differently (on-chip memory buffers,
not BSD mbufs)
- Don't enable Ethernet@WireSpeed feature for certain 5705 chip revs
- Add additional PCI IDs for 5705 and 5782 parts
- Add a forgotten 5704 PCI ID
Most changes ripped kicking and screaming from the Broadcom linux driver.
Thanks to Paul Saab for sanity testing. (My lack of sanity has been
confirmed.)
(mainly the 3Com 3c996B/BCM5701).
For some reason that I don't fully understand, the 5701 signals PCS
encoding errors as though they were link change events, i.e. the 'link
state changed' bit in the status word of the status block is updated
and an interrupt is generated. This would cause the bge_tick() function
to be invoked and a "gigabit link up" message to be printed on the console.
To avoid this, the interrupt handler now checks the MAC status register
when a link change interrupt is triggered, and it will only call the
bge_tick() function if the 'PCS encoding error detected' bit is clear.
(This change should have no effect on copper NICs since this bit can
only ever be set in TBI mode. I do not know how it affects 5704 NICs
with a BCM8002 SERDES PHY.)
Special thanks to: Sherry Rogers at UCB for allowing me access to one
of their traffic monitor boxes so I could diagnose this problem.
ASIC revision is really the major number of the CHIPID. Also store
the chipid, asic rev and chip revision in the softc for later use.
- The write twice to send producer index workaround only applies to
the 5700_BX chips, so only do it there.
Requested by: jdp
- Do not initalize the LED's to 0x00. The default configuration
the chip comes up in should yeild proper operation of the LED's.
Confirmed by: John Cagle <john.cagle@hp.com>
Approved by: re (blanket)