Unlike other controllers which have more advanced jumbo support,
these controllers have one send ring, one standard receive producer
ring and one receive return ring. In order to receive jumbo frames
on the controllers, driver now will increase Rx buffer size to 9k.
Two Rx modes are supported on these controllers and I chose
standard Rx BDs over extended Rx BDs. The extended Rx BD mode
allows up to 4 segmentations for each Rx BDs such that kernel does
not have to allocate large buffer of contiguous memory for
receiving. The extended Rx BD mode is already used on controllers
that have separate jumbo receive ring. However, using extended Rx
BDs on BCM5714/BCM5715/BCM5780 reduces the number of Rx BDs to 256
entries which in turn may reduce the performance. Also UMA backed
page allocator for jumbo frame returns contiguous memory so using
extended Rx BD has no advantage on FreeBSD unless highly customized
local allocator implemented in driver is used.
To use jumbo buffers in standard receive ring, Rx buffer allocation
handler was changed to allocate MJUM9BYTES sized mbuf.
PR: kern/155192
Tested by: Vijay Singh <vijju.singh <> gmail dot com>
Submitted by: mjacob (initial version)
DMA boundary bug and runs with PCI-X mode. watchdog timeout was
observed on BCM5704 which lives behind certain PCI-X bridge(e.g.
AMD 8131 PCI-X bridge). It's still not clear whether the root
cause came from that PCI-X bridge or not. The watchdog timeout
indicates the issue is in TX path. If the bridge reorders TX
mailbox write accesses it would generate all kinds of problems but
I'm not sure. This should be revisited.
Tested by: Michael L. Squires (mikes <> siralan dot org)
issue seen on PCIX BCM5704 controller. r216970 fixed the issue but
the DMA address space restriction was applied to all bge(4)
controllers such that it caused unnecessary performance degradation
for controllers that have no such issues.
bus_dma(9)'s capability which honors boundary restrictions of DMA
tag for dynamic buffers. However it seems this does not work well
and it triggered watchodg timeouts on controller that has the
hardware bug. It's not clear whether there is still another
hardware bug not mentioned in errata. This should be revisited
since this change shall make use of bounce buffers which in turn
reduces performance a lot on systems that have more than 4GB
memory.
Reported by: Michael L. Squires (mikes <> siralan dot org)
Tested by: Michael L. Squires (mikes <> siralan dot org)
MFC after: 3 days
versions of FreeBSD. In fact we are already missing a lot of conditional
code necessary to support older versions of FreeBSD, including alternatives
for vital functionality not yet provided by the respective subsystem back
then (see for example r199663). So this change shouldn't actually break
this driver on versions of FreeBSD that were supported before. Besides,
this driver also isn't maintained as an multi-release version outside of
the main repository, so removing the conditional code shouldn't be a
problem in that regard either.
- Sprinkle some more const on tables.
of certain MAC models from brgphy(4) to bge(4) where it belongs. While at it,
update the list of models having that restriction to what OpenBSD uses, which
in turn seems to have obtained that information from the Linux tg3 driver.
support in mii(4):
- Merge generic flow control advertisement (which can be enabled by
passing by MIIF_DOPAUSE to mii_attach(9)) and parsing support from
NetBSD into mii_physubr.c and ukphy_subr.c. Unlike as in NetBSD,
IFM_FLOW isn't implemented as a global option via the "don't care
mask" but instead as a media specific option this. This has the
following advantages:
o allows flow control advertisement with autonegotiation to be
turned on and off via ifconfig(8) with the default typically
being off (though MIIF_FORCEPAUSE has been added causing flow
control to be always advertised, allowing to easily MFC this
changes for drivers that previously used home-grown support for
flow control that behaved that way without breaking POLA)
o allows to deal with PHY drivers where flow control advertisement
with manual selection doesn't work or at least isn't implemented,
like it's the case with brgphy(4), e1000phy(4) and ip1000phy(4),
by setting MIIF_NOMANPAUSE
o the available combinations of media options are readily available
from the `ifconfig -m` output
- Add IFM_FLOW to IFM_SHARED_OPTION_DESCRIPTIONS and IFM_ETH_RXPAUSE
and IFM_ETH_TXPAUSE to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so
these are understood by ifconfig(8).
o Make the master/slave support in mii(4) actually usable:
- Change IFM_ETH_MASTER from being implemented as a global option via
the "don't care mask" to a media specific one as it actually is only
applicable to IFM_1000_T to date.
- Let mii_phy_setmedia() set GTCR_MAN_MS in IFM_1000_T slave mode to
actually configure manually selected slave mode (like we also do in
the PHY specific implementations).
- Add IFM_ETH_MASTER to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so it
is understood by ifconfig(8).
o Switch bge(4), bce(4), msk(4), nfe(4) and stge(4) along with brgphy(4),
e1000phy(4) and ip1000phy(4) to use the generic flow control support
instead of home-grown solutions via IFM_FLAGs. This includes changing
these PHY drivers and smcphy(4) to no longer unconditionally advertise
support for flow control but only if the selected media has IFM_FLOW
set (or MIIF_FORCEPAUSE is set) and implemented for these media variants,
i.e. typically only for copper.
o Switch brgphy(4), ciphy(4), e1000phy(4) and ip1000phy(4) to report and
set IFM_1000_T master mode via IFM_ETH_MASTER instead of via IFF_LINK0
and some IFM_FLAGn.
o Switch brgphy(4) to add at least the the supported copper media based on
the contents of the BMSR via mii_phy_add_media() instead of hardcoding
them. The latter approach seems to have developed historically, besides
causing unnecessary code duplication it was also undesirable because
brgphy_mii_phy_auto() already based the capability advertisement on the
contents of the BMSR though.
o Let brgphy(4) set IFM_1000_T master mode on all supported PHY and not
just BCM5701. Apparently this was a misinterpretation of a workaround
in the Linux tg3 driver; BCM5701 seem to require RGPHY_1000CTL_MSE and
BRGPHY_1000CTL_MSC to be set when configuring autonegotiation but
this doesn't mean we can't set these as well on other PHYs for manual
media selection.
o Let ukphy_status() report IFM_1000_T master mode via IFM_ETH_MASTER so
IFM_1000_T master mode support now is generally available with all PHY
drivers.
o Don't let e1000phy(4) set master/slave bits for IFM_1000_SX as it's
not applicable there.
Reviewed by: yongari (plus additional testing)
Obtained from: NetBSD (partially), OpenBSD (partially)
MFC after: 2 weeks
the dual port BCM5717 and BCM5718 devices which are intended for
mainstream workstation and entry-level server designs and
represents the twelfth generation of NetXtreme Ethernet controllers.
This family is the successor to the BCM5714/BCM5715 family and
supports IPv4/IPv6 checksum offloading, TSO, VLAN hardware tagging,
jumbo frames, MSI/MSIX, IOV, RSS and TSS.
This change set supports all hardware features except IOV and
RSS/TSS. Unlike its predecessors, only extended RX buffer
descriptors can be posted to the jumbo producer ring. Single RX
buffer descriptors for jumbo frame are not supported. RSS requires
a more substantial set of changes and will apply to a larger set
of NetXtreme devices so RSS/TSS multi-queue support will be
implemented in a future releases.
Special thanks to Broadcom who kindly sent a sample board to me
and to davidch who gave provided the initial support code.
Submitted by: davidch (initial version)
HW donated by: Broadcom
to BCM6906 A0/A2. This should fix a long standing BCM5906 A2 lockup
issues. Data sheet explicitly mentions BCM5906 A0, A1 and A2 use
de-pipelined mode on these revisions.
Special thanks to Buganini who tried all combinations of
experimental patches for more than 10 days.
Tested by: Buganini <buganini <> gmail dot com >
auto-negotiation results in half-duplex operation, excess collision
on the ethernet link may cause internal chip delays that may result
in subsequent valid frames being dropped due to insufficient
receive buffer resources. The workaround is to choose de-pipeline
method as a flow control decision for SDI. De-pipeline method
allows only 1 data in TxMbuf at a time such that a request to RDMA
from SDI is made only when TxMbuf is empty. Thanks for david for
providing detailed errata information.
receive two back-to-back send BDs with less than or equal to 8
total bytes then the device may hang. The two back-to-back send
BDs must be in the same frame for this failure to occur.
Thanks to davidch for detailed errata information.
Reviewed by: davidch
the NIC drivers as well as the PHY drivers to take advantage of the
mii_attach() introduced in r213878 to get rid of certain hacks. For
the most part these were:
- Artificially limiting miibus_{read,write}reg methods to certain PHY
addresses; we now let mii_attach() only probe the PHY at the desired
address(es) instead.
- PHY drivers setting MIIF_* flags based on the NIC driver they hang
off from, partly even based on grabbing and using the softc of the
parent; we now pass these flags down from the NIC to the PHY drivers
via mii_attach(). This got us rid of all such hacks except those of
brgphy() in combination with bce(4) and bge(4), which is way beyond
what can be expressed with simple flags.
While at it, I took the opportunity to change the NIC drivers to pass
up the error returned by mii_attach() (previously by mii_phy_probe())
and unify the error message used in this case where and as appropriate
as mii_attach() actually can fail for a number of reasons, not just
because of no PHY(s) being present at the expected address(es).
Reviewed by: jhb, yongari
header parser uses m_pullup(9) to get access to mbuf chain.
m_pullup(9) can allocate new mbuf chain and free old one if the
space left in the mbuf chain is not enough to hold requested
contiguous bytes. Previously drivers can use stale ip/tcp header
pointer if m_pullup(9) returned new mbuf chain.
Reported by: Andrew Boyer (aboyer <> averesystems dot com)
MFC after: 10 days
auto polling such that it made all controllers obtain link status
information from the state of the LNKRDY input signal. Broadcom
recommends disabling auto polling such that driver should rely on
PHY interrupts for link status change indications. Unfortunately it
seems some controllers(BCM5703, BCM5704 and BCM5705) have PHY
related issues so Linux took other approach to workaround it.
bge(4) didn't follow that and it used to enable auto polling to
workaround it. Restore this old behavior for BCM5700 family
controllers and BCM5705 to use auto polling. For BCM5700 and
BCM5701, it seems it does not need to enable auto polling but I
restored it for safety.
Special thanks to marius who tried lots of patches with patience.
Reported by: marius
Tested by: marius
Link UP state could be reported first before actual completion of
auto-negotiation. This change makes bge(4) reprogram BGE_MAC_MODE,
BGE_TX_MODE and BGE_RX_MODE register only after controller got a
valid link.
receive producer ring only for BCM5700. It was believed that
BCM5700 with external SSRAM is the only controller that supports
mini ring but it seems all BCM570[0-4] requires to disable mini
receive producer ring. Otherwise, it caused unexpected RX DMA
error or watchdog timeouts.
Reported by: marius, Steve Kargl <sgk <> troutmask dot apl dot washington dot edu>
Tested by: marius, Steve Kargl <sgk <> troutmask dot apl dot washington dot edu>
before setting the flag, interrupt was already enabled such that
interrupt handler could be run before setting IFF_DRV_RUNNING flag.
This can lose initial link state change interrupt which in turn
make bge(4) think that it still does not have valid link. Fix this
race by protecting the taskqueue with a driver lock.
While I'm here move reenabling interrupt code after handling of link
state chage.
Reviewed by: davidch
Previously bge(4) always enabled auto polling for non-BGE_FLAG_TBI
controllers. With this change, auto polling is not used anymore so
polling through mii(4) was introduced.
Reviewed by: davidch
as 5788. This caused BGE_MISC_LOCAL_CTL register is used to
generate link state change interrupt for non-5788 controllers. The
interrupt handler may or may not detect link state attention as
status block wouldn't be updated when an interrupt was generated
with BGE_MISC_LOCAL_CTL register. All controllers except 5700 and
5788 should use host coalescing mode register to trigger an
interrupt.
versions of controller support different number of ring control
blocks such that adjust code a bit to access known number of
send/receive ring control blocks. Previously bge(4) blindly
accessed 16 send/receive RCBs. Also move initializing standard
receive producer ring producer index, jumbo receive producer ring
producer index and mini receive producer ring producer index to
the end of each receive producer ring initialization.
Do not assume mini receive producer ring is available only when
controller has jumbo frame capability, instead explicitly check
ASIC version BCM5700 to disable mini receive producer ring.
Additionally always enable send ring 0 regardless of controller
versions. Previously bge(4) didn't enable send ring 0 if controller
is BGE_IS_5705_PLUS. Becase bge(4) need 1 send ring to send frames
at least, I have no idea how it would have worked so far.
Submitted by: davidch
BGE_MI_MODE register accesses. Previously bge(4) used to read
BGE_MI_MODE register to detect whether it needs to disable
autopolling feature or not. Because we don't touch autopolling in
other part of driver there is no reason to read BGE_MI_MODE
register given that we know default value in advance. In order to
achieve the goal, check whether the controller has CPMU(Central
Power Mangement Unit) capability. If controller has CPMU feature,
use 500KHz MII management interface(mdio/mdc) frequency regardless
core clock frequency. Otherwise use default MII clock. While I'm
here, add CPMU register definition.
In bge_miibus_readreg(), rearrange code a bit and remove goto
statement. In bge_miibus_writereg(), make sure to restore
autopolling even if MII write failed. The delay time inserted after
accessing BGE_MI_MODE register increased from 40us to 80us.
The default PHY address is now stored in softc. All PHYs supported
by bge(4) currently uses PHY address 1 but it will be changed when
we add newer controllers. This change will make it easier to change
default PHY address depending on PHY models.
Submitted by: davidch
these names are used in data sheet. Also use UnicastPkts,
MulticastPkts and BroadcastPkts instead of UcastPkts, McastPkts
and BcastPkts to clarify its meaning.
Suggested by: bde
controllers. bge(4) exported MAC statistics on controllers that
maintain the statistics in the NIC's internal memory. Newer
controllers require register access to fetch these values. These
counters provide useful information to diagnose driver issues.
parent driver. Use that information to configure flow-control.
One drawback is there is no way to disable flow-control as we still
don't have proper way to not advertise RX/TX pause capability to
link partner. But I don't think it would cause severe problems and
users can selectively disable flow-control in switch port.
has reached. This reduced number of dropped frames when
flow-control is enabled. Previously it dropped incoming frames once
RX MBUF low watermark has reached. The value used in MAC RX MBUF
low watermark is greater than or equal to 4 so receiving two more
RX frames should not be a problem.
Obtained from: OpenBSD
too many bge(4) controllers there and model name does not
necessarily match asic/chip revision. Relying on VPD string made
it hard to identify exact asic/chip revision so the first step to
debug bge(4) was getting exact asic/chip information with verbose
boot which may not be available on production server.
already updated after allocating mbuf so driver had to use the last
index instead of using next producer index. This should fix driver
hang which may happen under high network load.
Reported by: Igor Sysoev <is <> rambler-co dot ru>, Vlad Galu <dudu <> dudu dot ro>
Tested by: Igor Sysoev <is <> rambler-co dot ru>, Vlad Galu <dudu <> dudu dot ro>
MFC after: 10 days
or not by comparing reported TX consumer index with saved index. So
remove unnecessary check done after freeing transmitted mbufs.
While I'm here nuke unnecessary variable initializations.
tag. All controllers that are not BCM5755 or higher have 4GB
boundary DMA bug. Previously bge(4) used 32bit DMA address to
workaround the bug(r199670). However this caused the use of bounce
buffers such that it resulted in poor performance for systems which
have more than 4GB memory. Because bus_dma(9) honors boundary
restriction requirement of DMA tag for dynamic buffers, having a
separate TX/RX mbuf DMA tag will greatly reduce the possibility of
using bounce buffers. For DMA buffers allocated with
bus_dmamem_alloc(9), now bge(4) explicitly checks whether the
requested memory region crossed the boundary or not.
With this change, only the DMA buffer that crossed the boundary
will use 32bit DMA address. Other DMA buffers are not affected as
separate DMA tag is created for each DMA buffer.
Even if 32bit DMA address space is used for a buffer, the chance to
use bounce buffer is still very low as the size of buffer is small.
This change should eliminate most usage of bounce buffers on
systems that have more than 4GB memory.
More correct fix would be teaching bus_dma(9) to honor boundary
restriction for buffers created with bus_dmamem_alloc(9) but it
seems that is not easy.
While I'm here cleanup bge_dma_map_addr() and remove unnecessary
member variables in bge_dmamap_arg structure.
Tested by: marcel
datagrams with checksum value 0 when TX UDP checksum offloading is
enabled. Generating UDP checksum value 0 is RFC 768 violation.
Even though the probability of generating such UDP datagrams is
low, I don't want to see FreeBSD boxes to inject such datagrams
into network so disable UDP checksum offloading by default. Users
still override this behavior by setting a sysctl variable or loader
tunable, dev.bge.%d.forced_udpcsum.
I have no idea why this issue was not reported so far given that
bge(4) is one of the most commonly used controller on high-end
server class systems. Thanks to andre@ who passed the PR to me.
PR: kern/104826
r165114 added that code and that change ignored the same logic
committed in r135772. In addition, data FIFO protection should be
selectively enabled instead of applying to all PCIe devices.
While I'm here add BCM5785 to devices that do not require this
fix.
configuration to get IPv4 TSO work on BCM57780. While I'm here
apply the same fix to BCM5785 which shares similar hardware feature
of BCM57780. This change makes TSO work on BCM57780.
Tested by: Tong Liu <nemoliu <> gmail dot com>
buffers it should also reinitialize RX descriptors otherwise some
stale data could be passed to controller. This could end up with
mbuf double free or unexpected NULL pointer dereference in upper
stack. To fix the issue, save loaded buffer's length and
reinitialize RX descriptors with the saved value whenever bge(4)
reuses the loaded RX buffers.
While I'm here, increase the number of RX buffers to 512 from 256.
This simplifies RX buffer handling as well as giving more RX
buffers. Controller supports just fixed number of RX buffers
(i.e. 512) and bge(4) used to rely on hope that our CPU is fast
enough to keep up with the controller. With this change, bge(4)
will use 1MB for RX buffers but I don't think it would cause
problems in these days.
Reported by: marcel
Tested by: marcel
index of status block is read first before acknowledging the
interrupts. Otherwise bge(4) may get stale status block as
acknowledging an interrupt may yield another status block update.
Reviewed by: marius
Also disable relaxed ordering as recommended by data sheet for
PCI-X devices. For PCI-X BCM5704, set maximum outstanding split
transactions to 0 as indicated by data sheet.
For BCM5703 in PCI-X mode, DMA read watermark should be less than
or equal to maximum read byte count configuration. Enforce this
limitation in DMA read watermark configuration.
the issue. I still have no idea why TSO does not work on this
controller. davidch@ also confirmed there is no known TSO related
issues for this controller.
not support TSO over VLAN if VLAN hardware tagging is disabled so
there is no need to check VLAN here.
While I'm here make sure to pullup IP/TCP headers in the first
buffer.
to make TSO work on VLAN. So if VLAN hardware tagging is disabled
explicitly clear TSO on VLAN. While I'm here remove duplicated
VLAN_CAPABILITIES call.
The softc obtained in device probe wouldn't be the same one used in
device attach. Drivers should not assume any values stored in softc
structure in probe routine will be available for its attach routine.
no effect. Make sure to clear error bits by writing 1. [1]
While I'm here use predefined value instead of hardcodig magic
vlaue.
Submitted by: msaitoh at NetBSD [1]
implementation of heartbeat interval was 2 but there was typo which
caused the heartbeat is sent approximately every 5 seconds. This
caused unintended controller reset by firmware because firmware
thought OS was crashed.
Submitted by: Floris Bos < info <> je-eigen-domein dot nl >
Tested by: Andrzej Tobola < ato <> iem dot pw dot edu dot pl >
chains. This part of code is to enhance performance so failing the
collapsing should not free TX frames. Otherwise bge(4) will
unnecessarily drop frames which in turn can freeze the network
connection.
Reported by: Igor Sysoev (is <> rambler-co dot ru)
Tested by: Igor Sysoev (is <> rambler-co dot ru)
over GMII, make sure to enable GMII. With this change brgphy(4) is
used to handle the dual mode PHY. Since we still don't have a sane
way to pass PHY specific information to mii(4) layer special
handling is needed in brgphy(4) to determine which mode of PHY was
configured in parent interface.
This change make BCM5715S work.
Tested by: olli
Obtained from: OpenBSD
MFC after: 1 week
o Don't enable BGE_FLAG_BER_BUG on both 5722 and 5756, and based
on their PCI IDs rather than their chip IDs.
Reported by: several PC-BSD users via kmoore
Reviewed by: yongari, imp, jhb, davidch
Sponsored by: iXsystems, Inc.
MFC after: 2 weeks
same ASIC ID of BCM5758 such that r198318 incorecctly enabled TSO
on BCM5754.BCM5754M controllers. BCM5754/BCM5754M needs a special
firmware to enable TSO and bge(4) does not support firmware based
TSO.
Reported by: ed
Tested by: ed
hw.bge.forced_collapse. hw.bge.forced_collapse affects all bge(4)
controllers on system which may not desirable behavior of the
sysctl node. Also allow the sysctl node could be modified at any
time.
Reviewed by: bde (initial version)
feature. These registers are reserved on controllers that have no
support for jumbo frame.
Only BCM5700 has mini ring so do not poke mini ring related
registers if controller is not BCM5700.
Reviewed by: marius
handler in brgphy(4) does not exist and brgphy(4) just resets the
PHY and returns EINVAL as it has no isolation handler. I also agree
on Marius's opinion that stop handler of every NIC driver seems to
be the wrong place for implementing PHY isolate/power down.
If we need PHY isolate/power down it should be implemented in
brgphy(4) and users should administratively down the PHY.
Reviewed by: marius
single outstanding DMA read operation. Most controllers targeted to
client with PCIe bus interface(e.g. BCM5761) may have this
limitation. All controllers for servers does not have this
limitation.
Collapsing mbuf chains to reduce number of memory reads before
transmitting was most effective way to workaround this. I got about
940Mbps from 850Mbps with mbuf collapsing on BCM5761. However it
takes a lot of CPU cycles to collapse mbuf chains so add tunable to
control the number of allowed TX buffers before collapsing. The
default value is 0 which effectively disables the forced collapsing.
For most cases 2 would yield best performance(about 930Mbps)
without much sacrificing CPU cycles.
Note the collapsing is only activated when the controller is on
PCIe bus and the frame does not need TSO operation. TSO does not
seem to suffer from the hardware limitation because the payload
size is much bigger than normal IP datagram.
Thanks to davidch@ who told me the limitation of client controllers
and actually gave possible workarounds to mitigate the limitation.
Reviewed by: davidch, marius
Tx/Rx/Rx return ring such that large part of status block was not
used at all. All bge(4) controllers except BCM5700 AX/BX has a
feature to control the size of status block. So use minimum status
block size allowed in controller. This reduces number of DMAed
status block size to 32 bytes from 80 bytes.
seem to require a special firmware to use TSO. But the firmware is
not available to FreeBSD and Linux claims that the TSO performed by
the firmware is slower than hardware based TSO. Moreover the
firmware based TSO has one known bug which can't handle TSO if
ethernet header + IP/TCP header is greater than 80 bytes. The
workaround for the TSO bug exist but it seems it's too expensive
than not using TSO at all. Some hardwares also have the TSO bug so
limit the TSO to the controllers that are not affected TSO issues
(e.g. 5755 or higher).
While I'm here set VLAN tag bit to all descriptors that belengs to
a frame instead of the first descriptor of a frame. The datasheet
is not clear how to handle VLAN tag bit but it worked either way in
my testing. This makes it simplify TSO configuration a little bit.
Big thanks to davidch@ who sent me detailed TSO information.
Without this I was not able to implement it.
Tested by: current
have a DMA bug when buffer address crosses a multiple of the 4GB
boundary(e.g. 4GB, 8GB, 12GB etc). Limit DMA address to be within
4GB address for these controllers. The second DMA bug limits DMA
address to be within 40bit address space. This bug applies to
BCM5714 and BCM5715 and 5708(bce(4) controller). This is not
actually a MAC controller bug but an issue with the embedded PCIe
to PCI-X bridge in the device. So for BCM5714/BCM5715 controllers
also limit the DMA address to be within 40bit address space.
Special thanks to davidch@ who gave me detailed errata information.
I think this change will fix long standing bge(4) instability
issues on systems with more than 4GB memory.
Reviewed by: davidch
PCI flush to get correct status block update. Add an optimized
interrupt handler that is activated for MSI case. Actual interrupt
handling is done by taskqueue such that the handler does not
require driver lock for Rx path. The MSI capable bge(4) controllers
automatically disables further interrupt once it enters interrupt
state so we don't need PIO access to disable interrupt in interrupt
handler.
update and then clear status block. Previously it used to access
these index without synchronization which may cause problems when
bounce buffers are used. Also add missing bus_dmamap_sync(9) in
polling handler. Since we now update status block in driver, adjust
bus_dmamap_sync(9) for status block.
checking IFF_DRV_RUNNING and IFF_DRV_OACTIVE flags. Also if we
have less than 16 free send BDs set IFF_DRV_OACTIVE and try it
later. Previously bge(4) used to reserve 16 free send BDs after
loading dma maps but hardware just need one reserved send BD. If
prouder index has the same value of consumer index it means the Tx
queue is empty.
While I'm here check IFQ_DRV_IS_EMPTY first to save one lock
operation.
directly access them at fixed address. While I'm here don't touch
other bits of PCIe device control register except max payload size.
Reviewed by: marius
Revision 1.158 says only lower ten bits of
BGE_RXLP_LOCSTAT_IFIN_DROPS register is valid. For BCM5761 case it
seems the controller maintains 16bits value for the register.
However 16bits are still too small to count all dropped packets
happened in a second. To get a correct counter we have to read the
register in bge_rxeof() which would be too expensive.
Pointed out by: bde
MAC in bge_tick. Previously it used to show more number of input
errors. I noticed actual input errors were less than 8% even for
64 bytes UDP frames generated by netperf.
Since we always access BGE_RXLP_LOCSTAT_IFIN_DROPS register in
bge_tick, remove useless code protected by #ifdef notyet.
initializes it to ETHER_HDR_LEN so we have to override it after
calling ether_ifattch().
While I'm here remove setting if_mtu value, it's initialized in
ether_ifattach().
Introduce two spare dma maps for standard buffer and jumbo buffer
respectively. If loading a dma map failed reuse previously loaded
dma map. This should fix unloaded dma map is used in case of dma
map load failure. Also don't blindly unload dma map and defer
dma map sync and unloading operation until we know dma map for new
buffer is successfully loaded. This change saves unnecessary dma
load/unload operation. Previously bge(4) tried to reuse mbuf
with unloaded dma map which is really bad thing in bus_dma(9)
perspective.
While I'm here update if_iqdrops if we can't allocate Rx buffers.
standard buffer size. If controller is not capable of handling
jumbo frame, interface MTU couldn't be larger than standard MTU
which in turn the received should be fit in standard buffer. This
fixes bus_dmamap_sync call for jumbo ring is called even if
interface is configured to use standard MTU.
Also if total frame size could be fit into standard buffer don't
use jumbo buffers.
for buffer allocation. If driver know we are out of Rx buffers let
controller stop. This should fix panic when interface is run even
if it had no configured Rx buffers.
and Rx DMA tag separately. Previously it used a common mbuf DMA tag
for both Tx and Rx path but Rx buffer(standard ring case) should
have a single DMA segment and maximum buffer size of the segment
should be less than or equal to MCLBYTES. This change also make it
possible to add TSO with minor changes.
bge_newbuf_std still has a bug for handling dma map load failure
under high network load. Just reusing mbuf is not enough as driver
already unloaded the dma map of the mbuf. Graceful recovery needs
more work.
Ideally we can just update dma address part of a Rx descriptor
because the controller never overwrite the Rx descriptor. This
requires some Rx initialization code changes and it would be done
later after fixing other incorrect bus_dma(9) usages.
instead of POSTREAD: the hardware do not touch this memory (CPU
updates it). It is already synchronized as PREWRITE after the
processing is done.
- Synchronize RX return ring memory in rx_eof. This is needed
as the deviced updates this memory when receives packets.
- Decouple the synchronization of BGE status block in the interrupt
service routine: perfrom PREREAD synchronization only all accesses
to this block are finished. This seems to be more natural.
Reviewed by: yongari, marius
MFC after: 2 weeks
to the lock we hold, disable interrupts, and announce to the firmware
that we are shutting down. Especially do this before disabling blocks.
This makes some types of machines with asf enabled no longer hang upon
boot, when we start configuring the interface.
PR: i386/96382, kern/100410, kern/122252, kern/116328
Reported by: erwin
Hardware provided by: TDC A/S
Reviewed by: stas
Tested by: stas
BGE_PCI_PRODID_ASICREV register to store the chip identifier and its revision.
- Add new grouping macro for 7575+ chips (BGE_IS_5755_PLUS).
- Add IDs for Fujitsu-branded Broadcom adapters.
PR: kern/127587
Tested by: Thomas Quinot <thomas@quinot.org> (BCM7561 A0)
MFC after: 2 weeks
Obtained from: OpenBSD
loop iteration as it can be updated by the card while we
process the RX ring forcing us to process RX descriptors
for which DMA synchronisation operation has not been
performed. This fixes the bug when bge(4) drops packets
under high load.
Discussed with: yongari, marius
Approved by: re (kib)
MFC after: 1 week
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs. This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.
For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.
Approved by: re (kib)
MFC after: 6 weeks
CPU for too long period than necessary. Additively, interfaces are kept
polled (in the tick) even if no more packets are available.
In order to avoid such situations a new generic mechanism can be
implemented in proactive way, keeping track of the time spent on any
packet and fragmenting the time for any tick, stopping the processing
as soon as possible.
In order to implement such mechanism, the polling handler needs to
change, returning the number of packets processed.
While the intended logic is not part of this patch, the polling KPI is
broken by this commit, adding an int return value and the new flag
IFCAP_POLLING_NOCOUNT (which will signal that the return value is
meaningless for the installed handler and checking should be skipped).
Bump __FreeBSD_version in order to signal such situation.
Reviewed by: emaste
Sponsored by: Sandvine Incorporated
drops and re-grabs the softc mutex in the middle, resulting in kernel
trap 12. This may happen when a lot of traffic is being hammered on
one bge(4) interface while the system is shutting down.
Reported by: Alexander Sack <pisymbol gmail com>
PR: kern/134548
MFC After: 2 weeks
quirk requiring it to be enabled even when using MSI. This makes
the latter work again after r189285.
- Remove a comment which no longer applies since r190194.
- If boot verbose, print asicrev, chiprev and bus type on attach.
- For PCI Express devices:
1) Adjust max read request size to 4Kbytes
2) Turn on FIFO_LONG_BURST in RDMA during bge_blockinit()
Though 1) does not seem to have much to do with the poor TX performance
observed on PCI Express bge(4), 2) does fix the problem. [1]
- Nuke the RX CPU self-diag, which prevents working cards from working
(Linux tg3 does not have this diag neither does OpenBSD's bge(4)).
The increasing of the firmware handshaking timeout to 20000 retries
done as part of the original commit isn't merged as way already have a
way higher BGE_TIMEOUT of 100000.
PR: 119361 [1]
Obtained from: tg3 via DragonflyBSD [1], DragonflyBSD
causes data corruption in combination with certain bridges.
Information about this problem was kindly provided by davidch. [1]
- As BGE_FLAG_PCIX is meant to indicate that the controller is in
PCI-X mode, revert to the pre __FreeBSD_version 602101 method of
reading the bus mode register rather than checking the mere
existence of a PCI-X capability, which is also there when the
NIC f.e. is put into a 32-bit slot causing it not to be in PCI-X
mode. Setting BGE_FLAG_PCIX inappropriately could cause the NIC
to be tuned incorrectly.
PR: 128833 [1]
Reviewed by: jhb
MFC after: 3 days
for the BCM5714 revision A0 when in a multi-port configuration
and unconditionally for the remainder of the class of BCM575X
and beyond chips.
This was prodded by mav and is based on a suggestion and a
patch submitted by jhb.
Reviewed by: jhb
MFC after: 2 months
containing an Ethernet address fitted as this is yet another thing
that fails in that case in order to avoid the one second delay
until pci_read_vpd_reg() times out.
- Const'ify the bge_devs array.
- Rename BGE_FLAG_EEPROM to BGE_FLAG_EADDR to underline it's absence means
"there's no chip containing an Ethernet address fitted to the BGE chip
so we have to get it from the firmware instead" rather than "there's no
EEPROM, but maybe NVRAM or something else".
- Don't treat BCM5906[M] generally like chips w/o BGE_FLAG_EADDR set, just
in the two cases really necessary. This gets us line with the original
patch for DragonFlyBSD.
- For sparc64 restore the intended behavior of obtaining the Ethernet
address from the firmware in case BGE_FLAG_EADDR is not set, even for
BCM5906[M].
- Fix some style(9) bugs introduced with rev. 1.208 of if_bge.c
Approved by: jhb
Additional testing by: Thomas Nystroem (BCM5906)
all cards/modes.
In addition to the intr forcing added with rev. 1.205 adopt the other
places to use the same logic.
We need to exclude a few chips/revisions (5700, 5788) from using the
enhanced version and fall back to the old way as that is the only
method they support.
Tested by: phk
Suggested by: davidch, Broadcom (thanks a lot for the help!)
MFC after: 16 days
10/100 operation and place the mailbox registers at a different offset.
They also do not have an EEPROM, so the MAC address must be read from
NVRAM instead.
MFC after: 1 month
PR: kern/118975
Submitted by: benjsc, Thomas Nyström thn at saeab dot se
Submitted by: sephe (original patch for DragonflyBSD)
when creating the parent bus DMA tag. While at it correct the style
and a nearby comment.
- Take advantage of m_collapse(9) for performance reasons.
MFC after: 2 weeks
Because of this we were not getting further interrupts for link state
changes, thus never went into iface UP state and thus could not transmit.
The only way out of this was an incoming packet generating an rx interrupt
and making us call into bge_link_upd.
Up to rev. 1.101, in bge_start_locked, we only returned instantly
if there was 'no link AND nothing queued for tx'. So with a packet queued
for tx, we hit the register scrubbing at the end of bge_start_locked
and were out fine. We simply lost a packet or two but got the interrupts
need to get into UP state.
With rev. 1.102 this was turned into 'if there is no link OR there is
nothing to send' (correct behaviour) and as long as there is no link
we never hit the register scrubbing and consequently never got the link UP.
What we do now is force an interrupt at the end of bge_ifmedia_upd_locked
so we will call bge_link_upd, clear the link state attention and get
further interrupts.
This helps to get the iface UP on an idle network or at least to get
it UP faster not depending on an rx intr anymore.
In case you could not get a DHCP lease or it took very long,
it was because of this.
It is unknown which chips are affected by this. ASIC rev. 0x2003 was the
most popular trouble candidate.
At least the fiber cards should have been working fine.
Which register to scrub is currently under discussion. The comitted
solution was tested and found to work for a lot of setups. It might
not help with MSI.
The reason why we end up in such a situation is entirely unknown.
PR: kern/111804
Tested by: phk, scottl at Y!
MFC after: 14 days
The register layout is little different from memory-mapped stats
in the previous generation chips. In fact, it is bad because
registers in this range are cleared after reading them.
Reviewed by: scottl
MFC after: 3 days
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.
Suggested by: jhb
Reviewed by: grehan, jhb, marcel
Approved by: re (kensmith), jhb (PCI maintainer hat)