socket-buffer implementations, introduce a return value for MCLGET()
(and m_cljget() that underlies it) to allow the caller to avoid testing
M_EXT itself. Update all callers to use the return value.
With this change, very few network device drivers remain aware of
M_EXT; the primary exceptions lie in mbuf-chain pretty printers for
debugging, and in a few cases, custom mbuf and cluster allocation
implementations.
NB: This is a difficult-to-test change as it touches many drivers for
which I don't have physical devices. Instead we've gone for intensive
review, but further post-commit review would definitely be appreciated
to spot errors where changes could not easily be made mechanically,
but were largely mechanical in nature.
Differential Revision: https://reviews.freebsd.org/D1440
Reviewed by: adrian, bz, gnn
Sponsored by: EMC / Isilon Storage Division
for the lookup.
- For devices affected by PCI_QUIRK_MSI_INTX_BUG, ensure PCIM_CMD_INTxDIS
is cleared when using MSI/MSI-X.
- Employ PCI_QUIRK_MSI_INTX_BUG for BCM5714(S)/BCM5715(S)/BCM5780(S) rather
than clearing PCIM_CMD_INTxDIS unconditionally for all devices in bge(4).
MFC after: 3 days
- Do not ever set a counter to a value. For those counters
that we don't increment, but return directly from hardware
create cases in if_get_counter() method.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
and keep both converted to drvapi and non-converted drivers
compilable.
o Make if_t typedef to struct ifnet *.
o Remove shim functions.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
These changes prevent sysctl(8) from returning proper output,
such as:
1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
In particular, don't check the value of the bus_dma map against NULL
to determine if either bus_dmamem_alloc() or bus_dmamap_load() succeeded.
Instead, assume that bus_dmamap_load() succeeeded (and thus that
bus_dmamap_unload() should be called) if the bus address for a resource
is non-zero, and assume that bus_dmamem_alloc() succeeded (and thus
that bus_dmamem_free() should be called) if the virtual address for a
resource is not NULL.
In many cases these bugs could result in leaks when a driver was detached.
Reviewed by: yongari
MFC after: 2 weeks
fiddle with the MSI count and pci_release_msi(9) is smart enough to just
do nothing in case of INTx.
- Don't allocate MSI as RF_SHAREABLE.
MFC after: 1 week
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
bit25 of rxMode MAC register of 5762 needs to be set for rx mgmt
filter to work correctly when processing match for UDP header
fields. Otherwise false positive can occur which causes IPv4
fragment to be received by APE instead of host.
Reported by: Geans Pin <geanspin@broadcom.com>
cross into regions which are within MSS bytes of a 4GB boundary.
If we encounter the condition, drop the packet.
Reviewed by: Geans Pin geanspin@Broacom
new 1Gb server controller chip that will be going into production
soon.
BCM5725 combines MAC with triple-speed PHY, a Network Controller
Sideband Interface (NC-SI) and on-chip memory buffer in a single
device. BCM5725 has an Application Processing Engine (APE) that is
capable of on-chip management and offloading features. BCM5725
supports high-precision clock, time stamp registers for
receive/transmit packets and programmable trigger inputs and
watchdog timeouts. These new features are not yet supported by
bge(4).
Many thanks to Broadcom for continuing to support FreeBSD!
Submitted by: Geans Pin geanspin@Broacom (initial version)
Reviewed by: Geans Pin geanspin@Broacom
H/W donated by: Broadcom
The read DMA request logic operation is based on having sufficient
available space in the transmit data buffer (TXMBUF) before a read
DMA can be requested. There are four read DMA channels that use
the TXMBUF, and the logic checks if the available free space in the
TXMBUF is large enough for all the data in the four Send Buffers
for which buffer descriptors have been fetched. The Enable_Request
signal is asserted only if the free TXMBUF space is larger than the
sum of the four DMA length registers. The power-up default value
of BGE_RDMA_LSO_CRPTEN_CTRL register bit 25 (bit 21 on BCM5720) is
zero, which selects the DMA length registers to connect to the
input of the adder block. The DMA length registers are
asynchronously reset following BCM5719/BCM5720 power-up, and due to
the lack of synchronous deassertion of the length registers reset
signal these resisters may contain uninitialized values following
the reset deassertion.
In the case of the failure the uninitialized DMA length register
values added up to more than the TXMBUF size, which prevented the
assertion of the Enable_Request signal and any subsequent read DMA
to start. This lockup condition is the root cause of failing to
generate any transmit traffic.
To workaround the issue, select alternate output of multiplexers
and transmit the first four Ethernet frames. This overwrites the
DMA length registers with valid values.
Reported by: Geans Pin <geanspin@broadcom.com>
Reviewed by: Geans Pin <geanspin@broadcom.com>
one. This change also fixes non-working traffic LED on BCM57780.
Submitted by: Masanobu SAITOH <msaitoh@NetBSD.org>
Tested by: Alexander Milanov <a@amilanov.com>
implemented as a 10 bits linear feedback shift register so only
lower 10 bits are valid.
Because this register is used to initialize random backoff interval
register only when resolved duplex is half-duplex, it wouldn't have
caused issues in these days.
Submitted by: Masanobu SAITOH <msaitoh@NetBSD.org>
Reporting link status in driver has a side-effect that makes mii(4)
check current link status. mii(4) will call link status change
callback when it sees link state change. Normally this wouldn't
have problems. However, ASF/IPMI firmware can actively access PHY
regardless of driver's running state such that reporting link
status for not-running interface can generate meaningless link
UP/DOWN messages.
This change also makes dhclient think driver got a valid link
regardless of link establishment so it will bypass dhclient's
initial link status check. I think that wouldn't be issue
though.
Tested by: Daniel Braniss <danny@cs.huji.ac.il>
Fix the IPMI regression by sending BGE_FW_DRV_STATE_UNLOAD to
ASF/IPMI firmware in driver attach phase. Sending heartheat to
ASF/IPMI is enabled only after upping interface so
setting driver state to BGE_FW_DRV_STATE_START in attach phase
broke IPMI access.
While I'm here, add NVRAM arbitration lock before performing
controller reset. ASF/IPMI firmware may be able to access the NVRAM
while controller reset is in progress. Without the arbitration
lock before resetting the controller, ASF/IPMI may not initialize
properly.
Special thanks to Miroslav Lachman who provided full remote
debugging environments.
- Make bge_lookup_{rev,vendor}() static.
- Factor out chip identification rather than duplicating the code.
- Sanitize bge_probe() a bit (don't hardcode buffer sizes, allow
bge_lookup_vendor() to return NULL so the excessive panic() three
can be removed there, etc.) and return BUS_PROBE_DEFAULT rather than
hardcoding 0.
- According to the Linux tg3 driver, BCM57791 and BCM57795 aren't
capable of Gigabit Ethernet.
- Check the return value of taskqueue_start_threads().
them, please let me know if not). Most of these are of the form:
static const struct bzzt_type {
[...list of members...]
} const bzzt_devs[] = {
[...list of initializers...]
};
The second const is unnecessary, as arrays cannot be modified anyway,
and if the elements are const, the whole thing is const automatically
(e.g. it is placed in .rodata).
I have verified this does not change the binary output of a full kernel
build (except for build timestamps embedded in the object files).
Reviewed by: yongari, marius
MFC after: 1 week
removed in r99417. bge(4) controllers can do TCP checksum offload
for IP fragmented datagrams but unlike ti(4), it lacks UDP checksum
offloading for IP fragmented datagrams. The problem was bge(4)
blindly requested TCP/UDP checksum for IP fragmented datagrams such
that it resulted in corrupted UDP datagrams before r99417.
Remove remaining code for TCP checksum offloading for IP fragmented
datagrams which should have been removed in r99417.
link at a lower speed so enabling it for fiber adapters is wrong.
Fix the issue by setting BGE_PHY_NO_WIRESPEED such that brgphy(4)
wouldn't enable the feature.
While I'm here move PHY specific feature/bug configuration to new
location(just before mii attach) for readability.
This change will enable IPMI access on 5717/5718/5719/5720 and 5761
controllers. Because ASF is not available when APE firmware is
present, bge_allow_asf tunable is ignored when driver detects APE
firmware. Also bge(4) no longer performs two resets(one blind
reset and the other reset with firmware in mind) in device attach.
Now bge(4) performs a reset with enough information in bge_reset().
The APE firmware also needs special handling to make suspend/resume
work but it was not implemented yet.
With this change, bge(4) should work on any 5717/5718/5719/5720
controllers. Special thanks to Mike Hibler at Emulab who setup
remote debugging on Dell R820. Without his help I couldn't be able
to address several issues happened on Dell Rx20 systems. And many
thanks to Broadcom for continuing to support FreeBSD!
Submitted by: davidch (initial version)
H/W donated by: Broadcom
Tested by: many
Tested on: Del R820/R720/R620/R420/R320 and HP Proliant DL 360 G8
BGE_PCI_PCISTATE register before issuing global reset. After
issuing reset, it reads BGE_PCI_PCISTATE register again and
compares the saved register value and current value. It was used to
know whether the global reset operation was completed or not.
Unfortunately, this logic caused several issues on recent BCM5717/
5718/5719 and BCM5720 controllers. It seems APE firmware accesses
some registers while global reset is in progress such that reading
BGE_PCI_PCISTATE register after reset does not yield old pre-reset
state value. This resulted in consuming too much time in global
reset and sometimes it couldn't successfully complete reset.
The BGE_MISCCFG_RESET_CORE_CLOCKS of BGE_MISC_CFG register is
self-clearing bit so driver is able to know the reset completion.
But the core-lock reset will disable indirect/flat/standard access
modes such that driver cannot poll BGE_MISCCFG_RESET_CORE_CLOCKS
bit of BGE_MISC_CFG register. So just wait enough time for
core-clock reset to complete.
Data sheet says driver should wait 100us for PCI/PCI-X devices and
100ms for PCIe devices. I chose 1ms for PCI/PCI-X since this value
was used for many years in bge(4). For PCIe devices, use 100ms as
recommended by data sheet.
bge_chipinit() also cleared BGE_MAC_MODE register which shall clear
firmware configured mode information. I think this will result in
losing ASF/IPMI link in device attachment. Let bge_reset() honor
firmware configured BGE_MAC_MODE register and don't announce driver
is UP in bge_reset(). Firmware should have control over driver until
it's fully initialized by driver.
While I'm here, enable workaround for PCI-X BCM5704 A0 in
bge_reset(). This will prevent internal arbitration logic from
switching to the other DMA engine after a retry cycle.
that requires 10ms delay after device reset. Because that code was
there from day 1, I guess it was added to give enough settlement
time after updating BGE_MAC_MODE register.
The recommended delay time for BGE_MAC_MODE after updating is 40us
and it was already done in r241219.
The VCPU(Virtual CPU) of BCM5906 is used to provide a mechanism to
control the bootcode execution and to pick up configuration data
stored inside the EEPROM.
The bootcode of BCM5906 will check the BGE_VCPU_STATUS_DRV_RESET
bit to decide which booting procedure to choose.
Data sheet indicates the VCPU of BCM5906 should set
BGE_VCPU_STATUS_DRV_RESET bit *before* VCPU reset or global reset.
water mark to 256 bytes. Otherwise controller will encounter DMA
write under run errors and would result in RX DMA hang. If the
maximum payload size is 128 bytes, the water mark is set to 128
bytes as usual.
While here, set maximum read request size to 2048 for BCM5719/BCM5720.
For other PCIe devices, use 4096. And reprogram the maximum read
request size whenever device reset is performed.