come up as 't6nex' nexus devices with 'cc' ports hanging off them.
The T6 firmware and configuration files will be added as soon as they
are released. For now the driver will try to work with whatever
firmware and configuration is on the card's flash.
Sponsored by: Chelsio Communications
The cxgbev/cxlv driver supports Virtual Function devices for Chelsio
T4 and T4 adapters. The VF devices share most of their code with the
existing PF4 driver (cxgbe/cxl) and as such the VF device driver
currently depends on the PF4 driver.
Similar to the cxgbe/cxl drivers, the VF driver includes a t4vf/t5vf
PCI device driver that attaches to the VF device. It then creates
child cxgbev/cxlv devices representing ports assigned to the VF.
By default, the PF driver assigns a single port to each VF.
t4vf_hw.c contains VF-specific routines from the shared code used to
fetch VF-specific parameters from the firmware.
t4_vf.c contains the VF-specific PCI device driver and includes its
own attach routine.
VF devices are required to use a different firmware request when
transmitting packets (which in turn requires a different CPL message
to encapsulate messages). This alternate firmware request does not
permit chaining multiple packets in a single message, so each packet
results in a firmware request. In addition, the different CPL message
requires more detailed information when enabling hardware checksums,
so parse_pkt() on VF devices must examine L2 and L3 headers for all
packets (not just TSO packets) for VF devices. Finally, L2 checksums
on non-UDP/non-TCP packets do not work reliably (the firmware trashes
the IPv4 fragment field), so IPv4 checksums for such packets are
calculated in software.
Most of the other changes in the non-VF-specific code are to expose
various variables and functions private to the PF driver so that they
can be used by the VF driver.
Note that a limited subset of cxgbetool functions are supported on VF
devices including register dumps, scheduler classes, and clearing of
statistics. In addition, TOE is not supported on VF devices, only for
the PF interfaces.
Reviewed by: np
MFC after: 2 months
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7599
routines available in t4_tom to manage the iSCSI DDP page pod region.
This adds the ability to use multiple DDP page sizes to the iSCSI
driver, among other improvements.
Sponsored by: Chelsio Communications
Chelsio NICs are a bit unique compared to some other NICs in that they
expose different functionality on different physical functions. In
particular, PF4 is used to manage the NIC interfaces ('t4nex' and 't5nex').
However, PF4 is not able to create VF devices. Instead, VFs are only
supported by physical functions 0 through 3. This commit adds 't4iov'
and 't5iov' drivers that attach to PF0-3.
One extra wrinkle is that the iov devices cannot enable SR-IOV until the
firwmare has been initialized by the main PF4 driver. To handle this
case, a new t4_if kobj interface has been added to permit cross-calls
between the PF drivers. The PF4 driver notifies sibling drivers when it
is fully attached. It also requests sibling drivers to detach before it
detaches. Sibling drivers query the PF4 driver during their attach
routine to see if it is attached. If not, the sibling drivers defer
their attach actions until the PF4 driver informs them it is attached.
VF devices are associated with a single port on the NIC. VF devices
created from PF0 are associated with the first port on the NIC, VFs
from PF1 are associated with the second port, etc. VF devices can
only be created from a PF device that has an associated port. Thus,
on a 2-port card, VFs are only supported on PF0 and PF1.
Reviewed by: np (earlier versions)
MFC after: 1 month
Sponsored by: Chelsio Communications
These firmwares were obtained from the "Chelsio T5/T4 Unified Wire
v2.12.0.3 for Linux" release. Changes since 1.14.4.0 (which is the
firmware in -STABLE branches) are in the "Release Notes" accompanying
the Unified Wire release and are copy-pasted here as well.
22.1. T5 Firmware
+++++++++++++++++++++++++++++++++
Version : 1.15.37.0
Date : 04/27/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where the default ingress
queue was ignored.
- Fixed an issue where adapter failed to load fw by adjusting DRAM frequency.
- Fixed an issue in watchdog which was causing VM bring-up failure after reboot.
- Fixed 40G link failures with some switches when auto-negotiation enabled.
- Fixed to improve on link bring-up time.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where bogus d3hot bits were set causing traffic stall.
- Fixed an issue where sometimes adapter was not seen after reboot.
- Fixed an issue where iWARP was crashing in conjunction with traffic management.
- Fixed an issue where link failed to come up after removing twinax cable and
inserting optical module.
ETH
- Fixed a link flap issue on T580-CR.
OFLD
- Fixed a potential iSCSI data corruption issue by disabling RxFragEn flag.
FOiSCSI
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP was not been provisioned yet and driver tried to
use given inerface.
- Fixed an issue where fw was sending ENETUNREACH event for normal tcp
disconnection.
DCBX
- Fixed an issue where iscsi tlv is sent incorrectly to host. (DCBX CEE)
- Fixed an issue where apply bit set for APP id was affecting the ETS and PFC
settings.(DCBX IEEE)
- Fixed an issue where app priority values are not handled correctly in fw.
(DCBX IEEE)
- Fixed an issue where enable/disable dcbx can cause crash. (DCBX CEE,DCBX IEEE)
FOFCoE
- Removed BB6 support.
ENHANCEMENTS
------------
BASE:
- Added new interface to program DCA settings in SGE contexts; allow 32-byte
IQE size
- Added PTP interface fw_ptp_ts to support PTP Frequeny and Offset adjustment.
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
OFLD:
- WR opcode is returned to host in cqe error response.
22.2. T4 Firmware
+++++++++++++++++
Version : 1.15.37.0
Date : 04/27/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where default ingress queue
was ignored.
- Fixed an issue in watchdog which was causing VM bring-up failure after reboot.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where iWARP was crashing in conjunction with traffic management.
FOiSCSI:
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP had not been provisioned yet and driver tried to
use given inerface.
DCBX
- Fixed an issue where iscsi tlv is sent incorrectly to host.(DCBX CEE)
- Fixed an issue where enable/disable dcbx can cause crash in firmware.(DCBX CEE)
FOiSCSI
- Fixes an issue where fw was sending ENETUNREACH event for normal tcp
disconnection.
FOFCoE
- Removed BB6 support.
ENHANCEMENTS
------------
BASE:
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
================================================================================
Obtained from: Chelsio Communications
MFC after: 6 weeks
Relnotes: yes
Sponsored by: Chelsio Communications
These firmwares were obtained from the beta "Chelsio T5/T4 Unified Wire
v2.12.0.2 for Linux" release. Changes since last release are listed in the
"Release Notes" accompanying the beta release and are copy-pasted here as well.
The plan is to have only GA'd firmwares in any -STABLE FreeBSD branch so I'll
MFC this (after 2 months) only if it ends up in a GA release.
================================================================================
================================================================================
22.1. T5 Firmware
+++++++++++++++++++++++++++++++++
Version : 1.15.28.0
Date : 02/29/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where the default ingress
queue was ignored.
- Fixed an issue where adapter failed to load fw by adjusting DRAM frequency.
- Fixed an issue in watchdog which was causing VM bring-up failure after
reboot.
- Fixed 40G link failures with some switches when auto-negotiation enabled.
- Fixed to improve on link bring-up time.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where bogus d3hot bits were set causing traffic stall.
- Fixed an issue where sometimes adapter was not seen after reboot.
- Fixed an issue where iWARP was crashing in conjunction with traffic
management.
- Fixed an issue where link failed to come up after removing twinax cable and
inserting optical module.
OFLD
- Fixed a potential iSCSI data corruption issue by disabling RxFragEn flag.
FOiSCSI
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP was not been provisioned yet and driver tried to
use given inerface.
ENHANCEMENTS
------------
BASE:
- Added new interface to program DCA settings in SGE contexts; allow 32-byte
IQE size
- Added PTP interface fw_ptp_ts to support PTP Frequeny and Offset adjustment.
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
OFLD:
- WR opcode is returned to host in cqe error response.
================================================================================
================================================================================
22.2. T4 Firmware
+++++++++++++++++
Version : 1.15.28.0
Date : 02/29/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where default ingress queue
was ignored.
- Fixed an issue in watchdog which was causing VM bring-up failure after
reboot.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where iWARP was crashing in conjunction with traffic
management.
FOiSCSI:
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP had not been provisioned yet and driver tried to
use given inerface.
ENHANCEMENTS
------------
BASE:
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
================================================================================
Obtained from: Chelsio Communications
MFC after: 2 months
Sponsored by: Chelsio Communications
cards supported by cxgbe(4).
On the host side this driver interfaces with the storage stack via the
ICL (iSCSI Common Layer) in the kernel. On the wire the traffic is
standard iSCSI (SCSI over TCP as per RFC 3720/7143 etc.) that
interoperates with all other standards compliant implementations. The
driver is layered on top of the TOE driver (t4_tom) and promotes
connections being handled by t4_tom to iSCSI ULP (Upper Layer Protocol)
mode. Hardware assistance in this mode includes:
- Full TCP processing.
- iSCSI PDU identification and recovery within the TCP stream.
- Header and/or data digest insertion (tx) and verification (rx).
- Zero copy (both tx and rx).
Man page will follow in a separate commit in a couple of weeks.
Relnotes: Yes
Sponsored by: Chelsio Communications
- Move all files related to the LinuxKPI into sys/compat/linuxkpi and
its subfolders.
- Update sys/conf/files and some Makefiles to use new file locations.
- Added description of COMPAT_LINUXKPI to sys/conf/NOTES which in turn
adds the LinuxKPI to all LINT builds.
- The LinuxKPI can be added to the kernel by setting the
COMPAT_LINUXKPI option. The OFED kernel option no longer builds the
LinuxKPI into the kernel. This was done to keep the build rules for
the LinuxKPI in sys/conf/files simple.
- Extend the LinuxKPI module to include support for USB by moving the
Linux USB compat from usb.ko to linuxkpi.ko.
- Bump the FreeBSD_version.
- A universe kernel build has been done.
Reviewed by: np @ (cxgb and cxgbe related changes only)
Sponsored by: Mellanox Technologies
changes in the firmwares since 1.11.27.0 are listed here (straight copy-paste
from the "Release Notes.txt" accompanying the Chelsio Unified Wire 2.11.1.0
release on the website).
22.1. T5 Firmware
+++++++++++++++++++++++++++++++++
Version : 1.14.4.0
Date : 08/05/2015
================================================================================
FIXES
-----
BASE:
- Fixes a potential data path hang by properly programming PMTX congestion
threshold settings.
- Fixes a potential initialization error when accessing a configuration file
stored on the flash.
- Fixes a regression where SGE resources can be miss-sized if iWARP is disabled.
ETH:
- Fixes a timing issue that would prevent CR4 links from coming up with some
switches.
FOFCoE:
- Defers fcoe linkdown mailbox command handling till LOGO is sent.
- Updates vlan prio for all outstanding IOs during dcbx update.
ENHANCEMENTS
------------
BASE:
- Adds support for PAUSE OFF watchdog.
- Reports devlog access information in PCIE_FW_PF register 7.
ETH:
- Enhances segmentation offload to include VxLAN and Geneve.
- Adds PTP support.
- Adds new interface to allow the driver to query the VI rss table base
addresses.
- Allows the driver to program the SGE ingrext contxt CongDrop field.
OFLD:
- Adds new interface for the driver to specify offloaded connections TCP snd
and rcv scale factors.
iSCSI:
- Adds support for iscsi segmentatation offload (ISO).
- Adds support for iscsi t10-dif offload.
FOiSCSI:
- Sets FORCE_BIT for cut through processing for FOiSCSI.
FOFCoE:
- Adds support for FCoE BB6.
- Improves WRITE performance.
================================================================================
================================================================================
Version : 1.13.32.0
Date : 03/25/2015
================================================================================
FIXES
-----
BASE:
- Fixes FW_CAPS_CONFIG_CMD return value on error (was positive instead of
negative)
- Fixes FW_PARAMS_PARAM_DEV_FLOWC_BUFFIFO_SZ indication (was wrong on certain
adapter configurations)
- Fixes config file based PL_TIMEOUT register programming
ETH:
- Fixes a potential EO UDP SEG header corruption
- Fixes an issue where 1000Base-X was not enabled correctly when using QSA
modules
OFLD:
- Fixes timeout issue with half-open connections
- Fixes FW_FLOWC_WR processing when state is set to finwait1
FOFCoE:
- Fixes fcoe xchg leaks in linkdown/peer down path
- Fixes cleanup in FCoE linkdown and fixed buf timer flowid abuse
- Fixes fw crash by clearing fcf flowc during bye
FOiSCSI:
- Don't create a new tcp socket if ERL0 attempt has timed out.
ENHANCEMENTS
------------
BASE:
- Adds support for VFs on PFs 4 to 7
- Adds support for QPs/CQs on any physical and virtual function
ETH:
- Stops sending LACP frames on loopback interface
- Adds an AUTOEQU indication to CPL_SGE_EGR_UPDATE
- Adds support for CR4 links (BEAN/AEC on 40G TwinAx cables)
OFLD:
- Improves default settings of LAN and CLUSTER TCP timer settings
- Sends Negative Advice CPLs to software
FOISCSI:
- Adds IPv6 support for foiscsi. Keeps backward compatibility with
old foiscsi drivers which doesn't support ipv6.
FOFCoE:
- Added fcoe debug support in flowc dump
================================================================================
================================================================================
Version : 1.12.25.0
Date : 10/22/2014
================================================================================
FIXES
-----
BASE:
- Improves precision of the Weight Round Robing Traffic Management Algorithm
- Fixes an issue where the link would intermittently fail to come up
- Fixes an issue where adapters with an external PHY couldn't run at 100Mbps
- Fixes an issue where active optical cables were not recognized
- Fixes link advertising issues on T520-BT (speed and pause frames) that would
cause the link to negotiate unexpected settings
- Forces link restart when auto-negotiation is disabled
- Fix an issue where pause frames wouldn't be fully disabled even if requested
ETH:
- Fixes NVGRE Segmentation Offload network header generation.
DCBX:
- Fixes an issue where some settings were not being sent to the switch
correctly
- Fixes an issue where back-to-back DCBX port updates could get overwritten by
FW
- Fixes a firmware crash on DCBX APP information request before link up
FOiSCSI:
- Fixes abort task leak in tmf response handling
- Fixes TCP RST handling while in iSCSI ERL0
- Fixes a firmware crash on BYE without INIT
ENHANCEMENTS
-------------
BASE:
- Adds link partner settings reporting when available
- Adds QSA support (in conjunction with QSA VPD)
- Adds T520-BT LED support
- Reports NOTSUPPORTED for modules with an unhandled identifier
DCBX:
- Adds version reporting (indicating which version FW is trying to negotiate)
- Adds IEEE support
- Reports LLDP time outs
FOiSCSI:
- Add support for multiple iSCSI DDP client
- Sends DHCP renew request when lease expires
================================================================================
22.2. T4 Firmware
+++++++++++++++++
Version : 1.14.4.0
Date : 08/05/2015
================================================================================
FIXES
-----
BASE:
- Fixes a potential initialization error when accessing a configuration file
stored on the flash.
- Initialize PCIE_DBG_INDIR_REQ.Enable to 0, as hardware failed to do so and
register dumps could result in errors.
ETH:
- Fixes an issue that sometimes prevented the link from coming up in CR adapters.
ENHANCEMENTS
------------
BASE:
- Adds support for PAUSE OFF watchdog.
- Reports devlog access information in PCIE_FW_PF register 7.
ETH:
- Adds new interface to allow the driver to query the VI rss table base
addresses.
OFLD:
- Adds new interface for the driver to specify offloaded connections TCP snd
and rcv scale factors.
================================================================================
================================================================================
Version : 1.13.32.0
Date : 03/25/2015
================================================================================
FIXES
-----
BASE:
- Fixes FW_CAPS_CONFIG_CMD return value on error (was positive instead of
negative)
- Fixes FW_PARAMS_PARAM_DEV_FLOWC_BUFFIFO_SZ indication (was wrong on certain
adapter configurations)
- Fixes config file based PL_TIMEOUT register programming
ETH:
- Fixes a potential EO UDP SEG header corruption
OFLD:
- Fixes timeout issue with half-open connections
- Fixes FW_FLOWC_WR processing when state is set to finwait1
FOiSCSI:
- Don't create a new tcp socket if ERL0 attempt has timed out.
ENHANCEMENTS
------------
ETH:
- Stops sending LACP frames on loopback interface
- Adds an AUTOEQU indication to CPL_SGE_EGR_UPDATE
OFLD:
- Improves default settings of LAN and CLUSTER TCP timer settings
- Sends Negative Advice CPLs to software
================================================================================
================================================================================
Version : 1.12.25.0
Date : 10/22/2014
================================================================================
FIXES
-----
BASE:
- Improves precision of the Weight Round Robing Traffic Management Algorithm
- Forces link restart when auto-negotiation is disabled
- Fix an issue where pause frames wouldn't be fully disabled even if requested
DCBX:
- Fixes an issue where some settings were not being sent to the switch
correctly
- Fixes an issue where back-to-back DCBX port updates could get overwritten by
FW
- Fixes a firmware crash on DCBX APP information request before link up
FOiSCSI:
- Fixes abort task leak in tmf response handling
- Fixes TCP RST handling while in iSCSI ERL0
- Fixes a firmware crash on BYE without INIT
ENHANCEMENTS
------------
BASE:
- Adds link partner settings reporting when available
- Firmware now reports NOTSUPPORTED for modules with an unhandled identifier
DCBX:
- Adds version reporting (indicating which version FW is trying to negotiate)
- Adds IEEE support
- Reports LLDP time outs
FOiSCSI:
- Adds support for multiple iSCSI DDP clients
- Sends DHCP renew request when lease expires
================================================================================
Obtained from: Chelsio Communications
MFC after: 2 weeks
Sponsored by: Chelsio Communications
this option from all modules that enable it theirselves.
In C mode -fms-extensions option enables anonymous structs and unions,
allowing us to use this C11 feature in kernel. Of course, clang supports
it without any extra options.
Reviewed by: dim
a dependency. This ensures "ifconfig cxl<n> ..." does the right thing
even when it's run with no driver loaded.
if_cxl.ko is the tiniest module in /boot/kernel.
MFC after: 2 weeks
a) Front load as much work as possible in if_transmit, before any driver
lock or software queue has to get involved.
b) Replace buf_ring with a brand new mp_ring (multiproducer ring). This
is specifically for the tx multiqueue model where one of the if_transmit
producer threads becomes the consumer and other producers carry on as
usual. mp_ring is implemented as standalone code and it should be
possible to use it in any driver with tx multiqueue. It also has:
- the ability to enqueue/dequeue multiple items. This might become
significant if packet batching is ever implemented.
- an abdication mechanism to allow a thread to give up writing tx
descriptors and have another if_transmit thread take over. A thread
that's writing tx descriptors can end up doing so for an unbounded
time period if a) there are other if_transmit threads continuously
feeding the sofware queue, and b) the chip keeps up with whatever the
thread is throwing at it.
- accurate statistics about interesting events even when the stats come
at the expense of additional branches/conditional code.
The NIC txq lock is uncontested on the fast path at this point. I've
left it there for synchronization with the control events (interface
up/down, modload/unload).
c) Add support for "type 1" coalescing work request in the normal NIC tx
path. This work request is optimized for frames with a single item in
the DMA gather list. These are very common when forwarding packets.
Note that netmap tx in cxgbe already uses these "type 1" work requests.
d) Do not request automatic cidx updates every 32 descriptors. Instead,
request updates via bits in individual work requests (still every 32
descriptors approximately). Also, request an automatic final update
when the queue idles after activity. This means NIC tx reclaim is still
performed lazily but it will catch up quickly as soon as the queue
idles. This seems to be the best middle ground and I'll probably do
something similar for netmap tx as well.
e) Implement a faster tx path for WRQs (used by TOE tx and control
queues, _not_ by the normal NIC tx). Allow work requests to be written
directly to the hardware descriptor ring if room is available. I will
convert t4_tom and iw_cxgbe modules to this faster style gradually.
MFC after: 2 months
options into kern.opts.mk and change all the places where we use
src.opts.mk to pull in the options. Conditionally define SYSDIR and
use SYSDIR/conf/kern.opts.mk instead of a CURDIR path. Replace all
instances of CURDIR/../../etc with STSDIR, but only in the affected
files.
As a special compatibility hack, include bsd.owm.mk at the top of
kern.opts.mk to allow the bare build of sys/modules to work on older
systems. If the defaults ever change between 9.x, 10.x and current for
these options, however, you'll wind up with the host OS' defaults
rather than the -current defaults. This hack will be removed when
we no longer need to support this build scenario.
Reviewed by: jhb
Differential Revision: https://phabric.freebsd.org/D529
opt_inet6.h into kmod.mk by forcing almost everybody to eat the same
dogfood. While at it, consolidate the opt_bpf.h and opt_mroute.h
targets here too.
Netmap gets its own hardware-assisted virtual interface and won't take
over or disrupt the "normal" interface in any way. You can use both
simultaneously.
For kernels with DEV_NETMAP, cxgbe(4) carves out an ncxl<N> interface
(note the 'n' prefix) in the hardware to accompany each cxl<N>
interface. These two ifnet's per port share the same wire but really
are separate interfaces in the hardware and software. Each gets its own
L2 MAC addresses (unicast and multicast), MTU, checksum caps, etc. You
should run netmap on the 'n' interfaces only, that's what they are for.
With this, pkt-gen is able to transmit > 45Mpps out of a single 40G port
of a T580 card. 2 port tx is at ~56Mpps total (28M + 28M) as of now.
Single port receive is at 33Mpps but this is very much a work in
progress. I expect it to be closer to 40Mpps once done. In any case
the current effort can already saturate multiple 10G ports of a T5 card
at the smallest legal packet size. T4 gear is totally untested.
trantor:~# ./pkt-gen -i ncxl0 -f tx -D 00:07:43🆎cd:ef
881.952141 main [1621] interface is ncxl0
881.952250 extract_ip_range [275] range is 10.0.0.1:0 to 10.0.0.1:0
881.952253 extract_ip_range [275] range is 10.1.0.1:0 to 10.1.0.1:0
881.962540 main [1804] mapped 334980KB at 0x801dff000
Sending on netmap:ncxl0: 4 queues, 1 threads and 1 cpus.
10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> 00:07:43🆎cd:ef)
881.962562 main [1882] Sending 512 packets every 0.000000000 s
881.962563 main [1884] Wait 2 secs for phy reset
884.088516 main [1886] Ready...
884.088535 nm_open [457] overriding ifname ncxl0 ringid 0x0 flags 0x1
884.088607 sender_body [996] start
884.093246 sender_body [1064] drop copy
885.090435 main_thread [1418] 45206353 pps (45289533 pkts in 1001840 usec)
886.091600 main_thread [1418] 45322792 pps (45375593 pkts in 1001165 usec)
887.092435 main_thread [1418] 45313992 pps (45351784 pkts in 1000834 usec)
888.094434 main_thread [1418] 45315765 pps (45406397 pkts in 1002000 usec)
889.095434 main_thread [1418] 45333218 pps (45378551 pkts in 1001000 usec)
890.097434 main_thread [1418] 45315247 pps (45405877 pkts in 1002000 usec)
891.099434 main_thread [1418] 45326515 pps (45417168 pkts in 1002000 usec)
892.101434 main_thread [1418] 45333039 pps (45423705 pkts in 1002000 usec)
893.103434 main_thread [1418] 45324105 pps (45414708 pkts in 1001999 usec)
894.105434 main_thread [1418] 45318042 pps (45408723 pkts in 1002001 usec)
895.106434 main_thread [1418] 45332430 pps (45377762 pkts in 1001000 usec)
896.107434 main_thread [1418] 45338072 pps (45383410 pkts in 1001000 usec)
...
Relnotes: Yes
Sponsored by: Chelsio Communications.
all T4 and T5 based cards and is useful for analyzing TSO, LRO, TOE, and
for general purpose monitoring without tapping any cxgbe or cxl ifnet
directly.
Tracers on the T4/T5 chips provide access to Ethernet frames exactly as
they were received from or transmitted on the wire. On transmit, a
tracer will capture a frame after TSO segmentation, hw VLAN tag
insertion, hw L3 & L4 checksum insertion, etc. It will also capture
frames generated by the TCP offload engine (TOE traffic is normally
invisible to the kernel). On receive, a tracer will capture a frame
before hw VLAN extraction, runt filtering, other badness filtering,
before the steering/drop/L2-rewrite filters or the TOE have had a go at
it, and of course before sw LRO in the driver.
There are 4 tracers on a chip. A tracer can trace only in one direction
(tx or rx). For now cxgbetool will set up tracers to capture the first
128B of every transmitted or received frame on a given port. This is a
small subset of what the hardware can do. A pseudo ifnet with the same
name as the nexus driver (t4nex0 or t5nex0) will be created for tracing.
The data delivered to this ifnet is an additional copy made inside the
chip. Normal delivery to cxgbe<n> or cxl<n> will be made as usual.
/* watch cxl0, which is the first port hanging off t5nex0. */
# cxgbetool t5nex0 tracer 0 tx0 (watch what cxl0 is transmitting)
# cxgbetool t5nex0 tracer 1 rx0 (watch what cxl0 is receiving)
# cxgbetool t5nex0 tracer list
# tcpdump -i t5nex0 <== all that cxl0 sees and puts on the wire
If you were doing TSO, a tcpdump on cxl0 may have shown you ~64K
"frames" with no L3/L4 checksum but this will show you the frames that
were actually transmitted.
/* all done */
# cxgbetool t5nex0 tracer 0 disable
# cxgbetool t5nex0 tracer 1 disable
# cxgbetool t5nex0 tracer list
# ifconfig t5nex0 destroy
includes support for the NIC and TOE features of the 40G, 10G, and
1G/100M cards based on the T5.
The ASIC is mostly backward compatible with the Terminator 4 so cxgbe(4)
has been updated instead of writing a brand new driver. T5 cards will
show up as cxl (short for cxlgb) ports attached to the t5nex bus driver.
Sponsored by: Chelsio
- Add full support for IPv6 addresses.
- Read the size of the L2 table during attach. Do not assume that PCIe
physical function 4 of the card has all of the table to itself.
- Use FNV instead of Jenkins to hash L3 addresses and drop the private
copy of jhash.h from the driver.
MFC after: 1 week
Basically, this is automatic rx zero copy when feasible. TCP payload is
DMA'd directly into the userspace buffer described by the uio submitted
in soreceive by an application.
- Works with sockets that are being handled by the TCP offload engine
of a T4 chip (you need t4_tom.ko module loaded after cxgbe, and an
"ifconfig +toe" on the cxgbe interface).
- Does not require any modification to the application.
- Not enabled by default. Use hw.t4nex.<X>.toe.ddp="1" to enable it.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.
- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the
works and will follow soon.
Build-tested with make universe.
30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE
Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe
Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe
Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)
- Device configuration via plain text config file. Also able to operate
when not attached to the chip as the master driver.
- Generic "work request" queue that serves as the base for both ctrl and
ofld tx queues.
- Generic interrupt handler routine that can process any event on any
kind of ingress queue (via a dispatch table).
- A couple of new driver ioctls. cxgbetool can now install a firmware
to the card ("loadfw" command) and can read the card's memory
("memdump" and "tcb" commands).
- Lots of assorted information within dev.t4nex.X.misc.* This is
primarily for debugging and won't show up in sysctl -a.
- Code to manage the L2 tables on the chip.
- Updates to cxgbe(4) man page to go with the tunables that have changed.
- Updates to the shared code in common/
- Updates to the driver-firmware interface (now at fw 1.4.16.0)
MFC after: 1 month
filters working. (All other filters - switch without L2 info rewrite,
steer, and drop - were already fully-functional).
Some contrived examples of "switch" filters with L2 rewriting:
# cxgbetool t4nex0 iport 0 dport 80 action switch vlan +9 eport 3
Intercept all packets received on physical port 0 with TCP port 80 as
destination, insert a vlan tag with VID 9, and send them out of port 3.
# cxgbetool t4nex0 sip 192.168.1.1/32 ivlan 5 action switch \
vlan =9 smac aa:bb:cc:dd:ee:ff eport 0
Intercept all packets (received on any port) with source IP address
192.168.1.1 and VLAN id 5, rewrite the VLAN id to 9, rewrite source mac
to aa:bb:cc:dd:ee:ff, and send it out of port 0.
MFC after: 1 week
Reference code that shows how to get a packet's timestamp out of
cxgbe(4). Disabled by default because we don't have a standard way
today to pass this information up the stack.
The timestamp is 60 bits wide and each increment represents 1 tick of
the T4's core clock. As an example, the timestamp granularity is ~4.4ns
for this card:
# sysctl dev.t4nex.0.core_clock
dev.t4nex.0.core_clock: 228125
MFC after: 1 week