out 32 is not enough to support a full sized TSO packet.
While I'm here fix a long standing bug introduced in r169632 in
bce(4) where it didn't include L2 header length of TSO packet in
the maximum DMA segment size calculation.
In collaboration with: rmacklem
MFC after: 2 weeks
bit25 of rxMode MAC register of 5762 needs to be set for rx mgmt
filter to work correctly when processing match for UDP header
fields. Otherwise false positive can occur which causes IPv4
fragment to be received by APE instead of host.
Reported by: Geans Pin <geanspin@broadcom.com>
new 1Gb server controller chip that will be going into production
soon.
BCM5725 combines MAC with triple-speed PHY, a Network Controller
Sideband Interface (NC-SI) and on-chip memory buffer in a single
device. BCM5725 has an Application Processing Engine (APE) that is
capable of on-chip management and offloading features. BCM5725
supports high-precision clock, time stamp registers for
receive/transmit packets and programmable trigger inputs and
watchdog timeouts. These new features are not yet supported by
bge(4).
Many thanks to Broadcom for continuing to support FreeBSD!
Submitted by: Geans Pin geanspin@Broacom (initial version)
Reviewed by: Geans Pin geanspin@Broacom
H/W donated by: Broadcom
The read DMA request logic operation is based on having sufficient
available space in the transmit data buffer (TXMBUF) before a read
DMA can be requested. There are four read DMA channels that use
the TXMBUF, and the logic checks if the available free space in the
TXMBUF is large enough for all the data in the four Send Buffers
for which buffer descriptors have been fetched. The Enable_Request
signal is asserted only if the free TXMBUF space is larger than the
sum of the four DMA length registers. The power-up default value
of BGE_RDMA_LSO_CRPTEN_CTRL register bit 25 (bit 21 on BCM5720) is
zero, which selects the DMA length registers to connect to the
input of the adder block. The DMA length registers are
asynchronously reset following BCM5719/BCM5720 power-up, and due to
the lack of synchronous deassertion of the length registers reset
signal these resisters may contain uninitialized values following
the reset deassertion.
In the case of the failure the uninitialized DMA length register
values added up to more than the TXMBUF size, which prevented the
assertion of the Enable_Request signal and any subsequent read DMA
to start. This lockup condition is the root cause of failing to
generate any transmit traffic.
To workaround the issue, select alternate output of multiplexers
and transmit the first four Ethernet frames. This overwrites the
DMA length registers with valid values.
Reported by: Geans Pin <geanspin@broadcom.com>
Reviewed by: Geans Pin <geanspin@broadcom.com>
implemented as a 10 bits linear feedback shift register so only
lower 10 bits are valid.
Because this register is used to initialize random backoff interval
register only when resolved duplex is half-duplex, it wouldn't have
caused issues in these days.
Submitted by: Masanobu SAITOH <msaitoh@NetBSD.org>
This change will enable IPMI access on 5717/5718/5719/5720 and 5761
controllers. Because ASF is not available when APE firmware is
present, bge_allow_asf tunable is ignored when driver detects APE
firmware. Also bge(4) no longer performs two resets(one blind
reset and the other reset with firmware in mind) in device attach.
Now bge(4) performs a reset with enough information in bge_reset().
The APE firmware also needs special handling to make suspend/resume
work but it was not implemented yet.
With this change, bge(4) should work on any 5717/5718/5719/5720
controllers. Special thanks to Mike Hibler at Emulab who setup
remote debugging on Dell R820. Without his help I couldn't be able
to address several issues happened on Dell Rx20 systems. And many
thanks to Broadcom for continuing to support FreeBSD!
Submitted by: davidch (initial version)
H/W donated by: Broadcom
Tested by: many
Tested on: Del R820/R720/R620/R420/R320 and HP Proliant DL 360 G8
BGE_PCI_PCISTATE register before issuing global reset. After
issuing reset, it reads BGE_PCI_PCISTATE register again and
compares the saved register value and current value. It was used to
know whether the global reset operation was completed or not.
Unfortunately, this logic caused several issues on recent BCM5717/
5718/5719 and BCM5720 controllers. It seems APE firmware accesses
some registers while global reset is in progress such that reading
BGE_PCI_PCISTATE register after reset does not yield old pre-reset
state value. This resulted in consuming too much time in global
reset and sometimes it couldn't successfully complete reset.
The BGE_MISCCFG_RESET_CORE_CLOCKS of BGE_MISC_CFG register is
self-clearing bit so driver is able to know the reset completion.
But the core-lock reset will disable indirect/flat/standard access
modes such that driver cannot poll BGE_MISCCFG_RESET_CORE_CLOCKS
bit of BGE_MISC_CFG register. So just wait enough time for
core-clock reset to complete.
Data sheet says driver should wait 100us for PCI/PCI-X devices and
100ms for PCIe devices. I chose 1ms for PCI/PCI-X since this value
was used for many years in bge(4). For PCIe devices, use 100ms as
recommended by data sheet.
bge_chipinit() also cleared BGE_MAC_MODE register which shall clear
firmware configured mode information. I think this will result in
losing ASF/IPMI link in device attachment. Let bge_reset() honor
firmware configured BGE_MAC_MODE register and don't announce driver
is UP in bge_reset(). Firmware should have control over driver until
it's fully initialized by driver.
While I'm here, enable workaround for PCI-X BCM5704 A0 in
bge_reset(). This will prevent internal arbitration logic from
switching to the other DMA engine after a retry cycle.
water mark to 256 bytes. Otherwise controller will encounter DMA
write under run errors and would result in RX DMA hang. If the
maximum payload size is 128 bytes, the water mark is set to 128
bytes as usual.
While here, set maximum read request size to 2048 for BCM5719/BCM5720.
For other PCIe devices, use 4096. And reprogram the maximum read
request size whenever device reset is performed.
updated.
o Number of times NIC ran out of RX buffer descriptors
o Number of inbound packet errors
o Number of inbound packets that were chosen to be discarded
Previously only the discarded packet counter was used to update
if_ierrors. This change fixes wrong if_ierrors counter on
BCM570[0-4] controllers. For BCM5705 and later controllers bge(4)
already correctly counted it.
Reported by: Eugene Grosbein <egrosbein <> rdtc dot ru>
device in device attach. This would help to narrow down issue to a
specific controller and operating mode of the controller.
While I'm here rename BGE_MISCCFG_BOARD_ID with
BGE_MISCCFG_BOARD_ID_MASK.
AMD-8131 PCI-X bridge. The bridge seems to reorder write access to
mailbox registers such that it caused watchdog timeouts by
out-of-order TX completions.
Tested by: Michael L. Squires <mikes <> siralan dot org >
Reviewed by: jhb
bge(4) sends BGE_FW_CMD_DRV_ALIVE command to firmware every 2
seconds. BGE_FW_CMD_DRV_ALIVE command requires 4 bytes data. This
data contains timeout value in seconds until the next
BGE_FW_CMD_DRV_ALIVE command.
Broadcom recommends driver set the value 3 times longer than the
interval that it sends BGE_FW_CMD_DRV_ALIVE. Currently bge(4) uses
3 seconds so probably we have to increase it in future and use
different ALIVE command(e.g. BGE_FW_CMD_DRV_ALIVE3).
No functional changes.
This bit(SW event 7 in publicly available data sheet) is used to
make RX CPU handle a firmware command and the bit is automatically
cleared after RX CPU completed the command.
Generally firmware command takes the following steps.
1. Write BGE_SRAM_FW_CMD_MB with a command.
2. Write BGE_SRAM_FW_CMD_LEN_MB with the length of the command in bytes.
3. Write BGE_SRAM_FW_CMD_DATA_MB with actual command data.
4. Generate BGE_RX_CPU_EVENT and let firmware handle the command.
5. Wait for the ACK of the firmware command.
No functional changes.
about the various driver events like load, unload, reset, suspend,
restart, and ioctl operations.
Define driver's event rather than using hard-coded values. We don't
still send suspend/resume event to firmware.
Previously bge(4) used BGE_SDI_STATUS to send events. Because driver
has to access firmware mail box to inform current state, using
BGE_SDI_STATUS register was wrong. The end result was the same as
BGE_SDI_STATUS is 0x0C04.
No functional changes.
The origin of GENCOMM seems to come from Alteon Tigon Host/NIC
interface definition where it defines general communications region
which is active when firmware is loaded and running. This region
was used in communication between the host and processor internal
to the Tigon chip.
Broadcom data sheet also defines the region as 'Software Gencomm'
in NetXtreme memory map but lacks detailed description of its
interface so it was hard to know which ones are used for which
interface.
This change shall slightly enhance readability.
No functional changes.
larger than 4KB in size. However the maximum DMA segment size
created in DMA tag is 4KB, so we wouldn't encounter the issue here.
Just record this issue such that let developers not to create a DMA
segment that is larger than 4KB for BCM5719. It's possible to split
a DMA segment into multiple smaller ones in run time but I believe
it's not worth to implement that.
disabled for BCM5719 A0 revision due to known hardware errata.
Many thanks to Broadcom for continuing support of FreeBSD.
Submitted by: Geans Pin at Broadcom
have similar hardware features of BCM5718 family except the number
of receive return ring is 4. The BCM57765 family is known to
support IEEE 802.3az EEE(Energy Efficient Ethernet) but this change
does not include EEE support code. I hope EEE is implemented in
near future.
This change will support BCM57761, BCM57765, BCM57781, BCM57785,
BCM57791 and BCM57795. All hardware offloading features are
supported and suspend/resume also should work.
Many thanks to Broadcom for continuing support of FreeBSD.
Tested by: Paul Thornton (prt <> prt dot org)
HW donated by: Broadcom
Unlike other controllers which have more advanced jumbo support,
these controllers have one send ring, one standard receive producer
ring and one receive return ring. In order to receive jumbo frames
on the controllers, driver now will increase Rx buffer size to 9k.
Two Rx modes are supported on these controllers and I chose
standard Rx BDs over extended Rx BDs. The extended Rx BD mode
allows up to 4 segmentations for each Rx BDs such that kernel does
not have to allocate large buffer of contiguous memory for
receiving. The extended Rx BD mode is already used on controllers
that have separate jumbo receive ring. However, using extended Rx
BDs on BCM5714/BCM5715/BCM5780 reduces the number of Rx BDs to 256
entries which in turn may reduce the performance. Also UMA backed
page allocator for jumbo frame returns contiguous memory so using
extended Rx BD has no advantage on FreeBSD unless highly customized
local allocator implemented in driver is used.
To use jumbo buffers in standard receive ring, Rx buffer allocation
handler was changed to allocate MJUM9BYTES sized mbuf.
PR: kern/155192
Tested by: Vijay Singh <vijju.singh <> gmail dot com>
Submitted by: mjacob (initial version)
the dual port BCM5717 and BCM5718 devices which are intended for
mainstream workstation and entry-level server designs and
represents the twelfth generation of NetXtreme Ethernet controllers.
This family is the successor to the BCM5714/BCM5715 family and
supports IPv4/IPv6 checksum offloading, TSO, VLAN hardware tagging,
jumbo frames, MSI/MSIX, IOV, RSS and TSS.
This change set supports all hardware features except IOV and
RSS/TSS. Unlike its predecessors, only extended RX buffer
descriptors can be posted to the jumbo producer ring. Single RX
buffer descriptors for jumbo frame are not supported. RSS requires
a more substantial set of changes and will apply to a larger set
of NetXtreme devices so RSS/TSS multi-queue support will be
implemented in a future releases.
Special thanks to Broadcom who kindly sent a sample board to me
and to davidch who gave provided the initial support code.
Submitted by: davidch (initial version)
HW donated by: Broadcom
to BCM6906 A0/A2. This should fix a long standing BCM5906 A2 lockup
issues. Data sheet explicitly mentions BCM5906 A0, A1 and A2 use
de-pipelined mode on these revisions.
Special thanks to Buganini who tried all combinations of
experimental patches for more than 10 days.
Tested by: Buganini <buganini <> gmail dot com >
auto-negotiation results in half-duplex operation, excess collision
on the ethernet link may cause internal chip delays that may result
in subsequent valid frames being dropped due to insufficient
receive buffer resources. The workaround is to choose de-pipeline
method as a flow control decision for SDI. De-pipeline method
allows only 1 data in TxMbuf at a time such that a request to RDMA
from SDI is made only when TxMbuf is empty. Thanks for david for
providing detailed errata information.
receive two back-to-back send BDs with less than or equal to 8
total bytes then the device may hang. The two back-to-back send
BDs must be in the same frame for this failure to occur.
Thanks to davidch for detailed errata information.
Reviewed by: davidch
the NIC drivers as well as the PHY drivers to take advantage of the
mii_attach() introduced in r213878 to get rid of certain hacks. For
the most part these were:
- Artificially limiting miibus_{read,write}reg methods to certain PHY
addresses; we now let mii_attach() only probe the PHY at the desired
address(es) instead.
- PHY drivers setting MIIF_* flags based on the NIC driver they hang
off from, partly even based on grabbing and using the softc of the
parent; we now pass these flags down from the NIC to the PHY drivers
via mii_attach(). This got us rid of all such hacks except those of
brgphy() in combination with bce(4) and bge(4), which is way beyond
what can be expressed with simple flags.
While at it, I took the opportunity to change the NIC drivers to pass
up the error returned by mii_attach() (previously by mii_phy_probe())
and unify the error message used in this case where and as appropriate
as mii_attach() actually can fail for a number of reasons, not just
because of no PHY(s) being present at the expected address(es).
Reviewed by: jhb, yongari
BGE_MI_MODE register accesses. Previously bge(4) used to read
BGE_MI_MODE register to detect whether it needs to disable
autopolling feature or not. Because we don't touch autopolling in
other part of driver there is no reason to read BGE_MI_MODE
register given that we know default value in advance. In order to
achieve the goal, check whether the controller has CPMU(Central
Power Mangement Unit) capability. If controller has CPMU feature,
use 500KHz MII management interface(mdio/mdc) frequency regardless
core clock frequency. Otherwise use default MII clock. While I'm
here, add CPMU register definition.
In bge_miibus_readreg(), rearrange code a bit and remove goto
statement. In bge_miibus_writereg(), make sure to restore
autopolling even if MII write failed. The delay time inserted after
accessing BGE_MI_MODE register increased from 40us to 80us.
The default PHY address is now stored in softc. All PHYs supported
by bge(4) currently uses PHY address 1 but it will be changed when
we add newer controllers. This change will make it easier to change
default PHY address depending on PHY models.
Submitted by: davidch
controllers. bge(4) exported MAC statistics on controllers that
maintain the statistics in the NIC's internal memory. Newer
controllers require register access to fetch these values. These
counters provide useful information to diagnose driver issues.
has reached. This reduced number of dropped frames when
flow-control is enabled. Previously it dropped incoming frames once
RX MBUF low watermark has reached. The value used in MAC RX MBUF
low watermark is greater than or equal to 4 so receiving two more
RX frames should not be a problem.
Obtained from: OpenBSD
tag. All controllers that are not BCM5755 or higher have 4GB
boundary DMA bug. Previously bge(4) used 32bit DMA address to
workaround the bug(r199670). However this caused the use of bounce
buffers such that it resulted in poor performance for systems which
have more than 4GB memory. Because bus_dma(9) honors boundary
restriction requirement of DMA tag for dynamic buffers, having a
separate TX/RX mbuf DMA tag will greatly reduce the possibility of
using bounce buffers. For DMA buffers allocated with
bus_dmamem_alloc(9), now bge(4) explicitly checks whether the
requested memory region crossed the boundary or not.
With this change, only the DMA buffer that crossed the boundary
will use 32bit DMA address. Other DMA buffers are not affected as
separate DMA tag is created for each DMA buffer.
Even if 32bit DMA address space is used for a buffer, the chance to
use bounce buffer is still very low as the size of buffer is small.
This change should eliminate most usage of bounce buffers on
systems that have more than 4GB memory.
More correct fix would be teaching bus_dma(9) to honor boundary
restriction for buffers created with bus_dmamem_alloc(9) but it
seems that is not easy.
While I'm here cleanup bge_dma_map_addr() and remove unnecessary
member variables in bge_dmamap_arg structure.
Tested by: marcel