interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.
o Remove the if_baudrate_pf crutch.
o Make all fields of struct if_data fixed machine independent size. The
notion of data (packet counters, etc) are by no means MD. And it is a
bug that on amd64 we've got a 64-bit counters, while on i386 32-bit,
which at modern speeds overflow within a second.
This also removes quite a lot of COMPAT_FREEBSD32 code.
o Give 16 bit for the ifi_datalen field. This field was provided to
make future changes to if_data less ABI breaking. Unfortunately the
8 bit size of it had effectively limited sizeof if_data to 256 bytes.
o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.
__FreeBSD_version bumped.
Discussed with: emax
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
shifts into the sign bit. Instead use (1U << 31) which gets the
expected result.
This fix is not ideal as it assumes a 32 bit int, but does fix the issue
for most cases.
A similar change was made in OpenBSD.
Discussed with: -arch, rdivacky
Reviewed by: cperciva
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
command register. The lazy BAR allocation code in FreeBSD sometimes
disables this bit when it detects a range conflict, and will re-enable
it on demand when a driver allocates the BAR. Thus, the bit is no longer
a reliable indication of capability, and should not be checked. This
results in the elimination of a lot of code from drivers, and also gives
the opportunity to simplify a lot of drivers to use a helper API to set
the busmaster enable bit.
This changes fixes some recent reports of disk controllers and their
associated drives/enclosures disappearing during boot.
Submitted by: jhb
Reviewed by: jfv, marius, achadd, achim
MFC after: 1 day
- Remove vestigial null pointer tests after malloc(..., M_WAITOK).
- Remove vestigal qualhack union
- Use strlcpy() instead of the error-prone strncpy() when parsing
EEPROM and copying strings
- Check the MAC address in the EEPROM strings more strictly.
- Expand the macro MXGE_NEXT_STRING() at its only user. Due to a typo,
the macro was very confusing.
- Remove unnecessary buffer limit check. The buffer is double-NUL
terminated per construction.
PR: kern/176369
Submitted by: Christoph Mallon <christoph.mallon gmx.de>
- Some mxge nics may store the serial number in the SN2 field of the
EEPROM. These will also have an SN=0 field, so parse the SN2 field,
and give it precedence.
- Skip MXGEFW_CMD_UNALIGNED_TEST on mxge nics which do not require it.
This saves roughly 10ms per port at device attach time.
Sponsored by: Myricom
MFC After: 7 days
- Re-fix build by restoring local removed in r247151, but protected
by #if defined(INET) || defined(INET6) so that the compile
succeeds in the !(INET||INET6) case.
- Protect call to in_pseudo() with an #ifdef INET, to allow
a kernel to link with mxge when INET is not compiled in.
- Also remove an errant (improperly commented) obsolete debugging printf
Thanks to Glebius for pointing out the !(INET||INET6) build issue.
Sponsored by: Myricom
MFC After: 7 days
- Add support for IPv6 rx csum offload
- Finally switch mxge from using its own driver lro, to
using tcp_lro
MFC after: 7 days
Sponsored by: Myricom Inc.
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.
o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
ALTQ queueing or buf_ring(9) queueing.
- It doesn't touch the buf_ring stats any more.
- It doesn't touch ifnet stats anymore.
- drbr_stats_update() no longer exists.
o buf_ring(9) handles its stats itself:
- It handles br_drops itself.
- br_prod_bytes stats are dropped. Rationale: no one ever
reads them but update of a common counter on every packet
negatively affects performance due to excessive cache
invalidation.
- buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
we no longer account bytes.
o Drivers handle their stats theirselves: if_obytes, if_omcasts.
o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer
use drbr_stats_update(), and update ifnet stats theirselves.
o bxe(4) was the most correct driver, it didn't call
drbr_stats_update(), thus it was the only driver accurate under
moderate load. Now it also maintains stats itself.
o ixgbe(4) had already taken stats from hardware, so just
- drop software stats updating.
- take multicast packet count from hardware as well.
o mxge(4) just no longer needs NO_SLOW_STATS define.
o cxgb(4), cxgbe(4) need no change, since they obtain stats
from hardware.
Reviewed by: jfv, gnn
It seems strchr() and strrchr() are used more often than index() and
rindex(). Therefore, simply migrate all kernel code to use it.
For the XFS code, remove an empty line to make the code identical to
the code in the Linux kernel.
This fixes a long standing bug in mxge(4) where "ifconfig mxge0 $IP"
did not bring the interface into a RUNNING state, like it does on
most (all?) other FreeBSD NIC drivers.
Thanks to gnn for mentioning the bug, and yongari for pointing out that
ether_ioctl() invokes ifp->if_init() in SIOCSIFADDR.
MFC after: 7 days
The Myri10GE NIC will assume all TSO frames contain partial checksum,
and will emit TSO segments with bad TCP checksums if a TSO frame
contains a full checksum. The mxge driver takes care to make sure
that TSO is disabled when checksum offload is disabled for this
reason. However, modules that modify packet contents (like pf) may
end up completing a checksum on a TSO frame, leading to the NIC emitting
TSO segments with bad checksums.
To workaround this, restore the partial checksum in the mxge driver
when we're fed a TSO frame with a full checksum.
Reported by: Bob Healey
MFC after: 3 days
- Re-probe xfp / sfp+ socket on link events, in case user
has changed transceiver
- correctly report current media to avoid confusing lagg (reported by Panasas)
- Report link speed (submitted by yongari)
Reviewed by: yongari (earlier version)
MFC after: 7 days
- Don't leak slice resources when mxge_alloc_rings() fails
- Start taskq threads only after we know attach will succeed. At
boot time, taskqueue_terminate() will loop infinately, waiting
for the threads to exit, and hang the system.
Submitted by: Panasas
MFC After: 3 days
- introduce drbr_needs_enqueue that returns whether the interface/br needs
an enqueue operation: returns true if altq is enabled or there are
already packets in the ring (as we need to maintain packet order)
- update all drbr consumers
- fix drbr_flush
- avoid using the driver queue (IFQ_DRV_*) in the altq case as the
multiqueue consumer does not provide enough protection, serialize altq
interaction with the main queue lock
- make drbr_dequeue_cond work with altq
Discussed with: kmacy, yongari, jfv
MFC after: 4 weeks
Pertinant highlights from Myricom CHANGES file include:
- Make sure invalid external smbus activity cannot affect performance
- Fix to avoid a bug where the link could sometimes stay reported as
up on after unplugging the cable.
- For 8B NIC, make smbus connection passive at init to avoid
possible address conflicts
- Increase number of slices to 17 for multi-slice fw
- Fix a bug where packets dropped because of link_overflow could
be occasionally reported as bad_crc32
- Add selectable failover strategy for dual-port chip: symmetric or primary/backup
- On failover, send RARP broadcast to make the change immediately
known to the network
- Change endianess for PCI Device Serial Number
- For dual-port NICs, time to failover is now a few microsecs
instead of a few millisecs.
MFC after: 3 days
by checking PCI config space when the NIC is not
transmitting. Previously, a h/w fault would not have been
detected if the NIC was down, or handling an RX only
workload.
1) Restore the PCI Express control register after a watchdog
reset. This is required because the device will come out
of watchdog reset with the pectl reg at its default state,
and important BIOS configuration (like max payload size)
could be lost.
2) Call mxge_start_locked() for every tx queue before dropping
the lock in the watchdog handler. This is required, as
the queue's buf ring may have filled during the reset.
- Mark the link as down, so if watchdog reset fails, link watching
failover software can notice it
- Don't send MXGEFW_CMD_ETHERNET_DOWN if the NIC has been reset, it is
not needed, and will fail on a freshly reset NIC.
- Ensure the transmit routines aren't attempting to PIO write to doorbells
while the NIC is being reset.
- Download the correct f/w, rather than using the EEPROM f/w after reset.
- Export a count of the number of watchdog resets via sysctl
- Zero all f/w stats at reset. This will lead to less confusing
diagnostic output when investigating NIC failures.
MFC after: 3 days
loader, because it uses a reserved suffix (_type). Fix
this by removing the "_" and renaming the tunable to
hw.mxge.rss_hashtype. The old (rss_hash_type) tunable is
still fetched, in case people load the driver via scripts.
When both are present in the kernel environment,
the new value (hw.mxge.rss_hashtype) overrides the old
value.
Approved by: re (kib)
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs. This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.
For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.
Approved by: re (kib)
MFC after: 6 weeks
I tried re-ordering ether_ifdetach(), but this created a new race
where sometimes, when under heavy receive load (>1Mpps) and running
tcpdump, the machine would panic. At panic, the ithread was still in
the original (not dead) if_input() path, and was accessing stale BPF
data structs. By using a dying flag, I can close the interface prior
to if_detach() to be certain the interface cannot send packets up in
the middle of ether_ifdetach.