The TCB is read using a memory window right now. A better alternate to
get self-consistent, uncached information would be to use a GET_TCB
request but waiting for a reply from hw while holding non-sleepable
locks is quite inconvenient.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14817
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size). This is believed to be sufficent to
fully support ifconfig on 32-bit systems.
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14900
Requests to modify the state of TLS connections need to be sent on the
same queue as TLS record transmit requests to ensure ordering.
However, in order to use the offload transmit queue in t4_set_tcb_field(),
the function needs to be updated to do proper flow control / credit
management when queueing a request to an offload queue. This required
passing a pointer to the toepcb itself to this function, so while here
remove the 'tid' and 'iqid' parameters and obtain those values from the
toepcb in t4_set_tcb_field() itself.
Submitted by: Harsh Jain @ Chelsio (original version)
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14871
This fixes an avoidable EINVAL when the user tries to disable AN after
the port is initialized but l1cfg doesn't have a valid speed to use.
MFC after: 1 week
Sponsored by: Chelsio Communications
Compare sbavail() with the cached sb_off of already-sent data instead of
always comparing with zero. This will correctly close the connection and
send the FIN if the socket buffer contains some previously-sent data but
no unsent data.
Reported by: Harsh Jain @ Chelsio
Sponsored by: Chelsio Communications
- Remove the one use of is_tls_offload() and the function. AIO special
handling only needs to be disabled when a TOE socket is actively doing
TLS offload on transmit. The TOE socket's mode (which affects receive
operation) doesn't matter, so remove the check for the socket's mode and
only check if a TOE socket has TLS transmit keys configured to determine
if an AIO write request should fall back to the normal socket handling
instead of the TOE fast path.
- Move can_tls_offload() into t4_tls.c. It is not used in critical paths,
so inlining isn't that important. Change return type to bool while here.
Sponsored by: Chelsio Communications
The TOE engine in Chelsio T6 adapters supports offloading of TLS
encryption and TCP segmentation for offloaded connections. Sockets
using TLS are required to use a set of custom socket options to upload
RX and TX keys to the NIC and to enable RX processing. Currently
these socket options are implemented as TCP options in the vendor
specific range. A patched OpenSSL library will be made available in a
port / package for use with the TLS TOE support.
TOE sockets can either offload both transmit and reception of TLS
records or just transmit. TLS offload (both RX and TX) is enabled by
setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be
enabled on the relevant interface. Transmit offload can be used on
any "normal" or TLS TOE socket by using the custom socket option to
program a transmit key. This permits most TOE sockets to
transparently offload TLS when applications use a patched SSL library
(e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL
library). Receive offload can only be used with TOE sockets using the
TLS mode. The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a
list of TCP port numbers. Any connection with either a local or
remote port number in that list will be created as a TLS socket rather
than a plain TOE socket. Note that although this sysctl accepts an
arbitrary list of port numbers, the sysctl(8) tool is only able to set
sysctl nodes to a single value. A TLS socket will hang without
receiving data if used by an application that is not using a patched
SSL library. Thus, the tls_rx_ports node should be used with care.
For a server mostly concerned with offloading TLS transmit, this node
is not needed as plain TOE sockets will fall back to software crypto
when using an unpatched SSL library.
New per-interface statistics nodes are added giving counts of TLS
packets and payload bytes (payload bytes do not include TLS headers or
authentication tags/MACs) offloaded via the TOE engine, e.g.:
dev.cc.0.stats.rx_tls_octets: 149
dev.cc.0.stats.rx_tls_records: 13
dev.cc.0.stats.tx_tls_octets: 26501823
dev.cc.0.stats.tx_tls_records: 1620
TLS transmit work requests are constructed by a new variant of
t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.
TLS transmit work requests require a buffer containing IVs. If the
IVs are too large to fit into the work request, a separate buffer is
allocated when constructing a work request. This buffer is associated
with the transmit descriptor and freed when the descriptor is ACKed by
the adapter.
Received TLS frames use two new CPL messages. The first message is a
CPL_TLS_DATA containing the decryped payload of a single TLS record.
The handler places the mbuf containing the received payload on an
mbufq in the TOE pcb. The second message is a CPL_RX_TLS_CMP message
which includes a copy of the TLS header and indicates if there were
any errors. The handler for this message places the TLS header into
the socket buffer followed by the saved mbuf with the payload data.
Both of these handlers are contained in tom/t4_tls.c.
A few routines were exposed from t4_cpl_io.c for use by t4_tls.c
including send_rx_credits(), a new send_rx_modulate(), and
t4_close_conn().
TLS keys for both transmit and receive are stored in onboard memory
in the NIC in the "TLS keys" memory region.
In some cases a TLS socket can hang with pending data available in the
NIC that is not delivered to the host. As a workaround, TLS sockets
are more aggressive about sending CPL_RX_DATA_ACK messages anytime that
any data is read from a TLS socket. In addition, a fallback timer will
periodically send CPL_RX_DATA_ACK messages to the NIC for connections
that are still in the handshake phase. Once the connection has
finished the handshake and programmed RX keys via the socket option,
the timer is stopped.
A new function select_ulp_mode() is used to determine what sub-mode a
given TOE socket should use (plain TOE, DDP, or TLS). The existing
set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and
handles initialization of TLS-specific state when necessary in
addition to DDP-specific state.
Since TLS sockets do not receive individual TCP segments but always
receive full TLS records, they can receive more data than is available
in the current window (e.g. if a 16k TLS record is received but the
socket buffer is itself 16k). To cope with this, just drop the window
to 0 when this happens, but track the overage and "eat" the overage as
it is read from the socket buffer not opening the window (or adding
rx_credits) for the overage bytes.
Reviewed by: np (earlier version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14529
- Change t4_ddp_mod_load() to return void instead of always returning
success. This avoids having to pretend to have proper support for
unloading when only part of t4_tom_mod_load() has run.
- If t4_register_uld() fails, don't invoke t4_tom_mod_unload() directly.
The module handling code in the kernel invokes MOD_UNLOAD on a module
whose MOD_LOAD fails with an error already.
Reviewed by: np (part of a larger patch)
MFC after: 1 month
Sponsored by: Chelsio Communications
Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.
It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.
This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.
This commit combines the following Linux upstream commits:
IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah
MFC after: 1 week
Sponsored by: Mellanox Technologies
After the auth key is copied into the ipad[] array, any remaining bytes
are cleared to zero (in case the key is shorter than one block size).
The full block size was used as the length of the zero rather than the
size of the remaining ipad[]. In practice this overflow was harmless as
it could only clear bytes in the following opad[] array which is
initialized with a copy of ipad[] in the next statement.
Sponsored by: Chelsio Communications
The parameters describe how much of the adapter's memory is reserved for
storing TLS keys. The 'meminfo' sysctl now lists this region of adapter
memory as 'TLS keys' if present.
Sponsored by: Chelsio Communications
This consolidates all of the DDP state in one place. Also, the code has
now been fixed to ensure that DDP state is only accessed for DDP
connections. This should not be a functional change but makes it cleaner
and easier to add state for other TOE socket modes in the future.
MFC after: 1 month
Sponsored by: Chelsio Communications
This used to work by accident with ld.bfd even though always_keepalive
was marked as static. LLD honors static more correctly, so export this
variable properly (including moving it into the tcp_* namespace).
Reviewed by: bz, emaste
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14129
Properly honor the lack of the CRD_F_IV_PRESENT flag in the GCM
software fallback case for encryption requests.
Submitted by: Harsh Jain @ Chelsio
Sponsored by: Chelsio Communications
In particular, this avoids edge cases where a generated IV might be
written into the output buffer even though the request is failed with
an error.
Sponsored by: Chelsio Communications
- Extend ccr_gcm_soft() to handle requests with a non-empty payload.
While here, switch to allocating the GMAC context instead of placing
it on the stack since it is over 1KB in size.
- Allow ccr_gcm() to return a special error value (EMSGSIZE) which
triggers a fallback to ccr_gcm_soft(). Move the existing empty
payload check into ccr_gcm() and change a few other cases
(e.g. large AAD) to fallback to software via EMSGSIZE as well.
- Add a new 'sw_fallback' stat to count the number of requests
processed via the software fallback.
Submitted by: Harsh Jain @ Chelsio (original version)
Sponsored by: Chelsio Communications
This works around an issue in the T6 that can result in DMA engine
stalls if an error occurs while processing a DSGL entry with a length
larger than 2KB.
Submitted by: Harsh Jain @ Chelsio
Sponsored by: Chelsio Communications
Most crypto requests will not trigger this condition, but a request
with a highly-fragmented data buffer (and a resulting "large" S/G
list) could trigger it.
Sponsored by: Chelsio Communications
The T6 can hang when processing certain AEAD requests if the request
sets a flag asking the crypto engine to discard the input IV and AAD
rather than copying them into the output buffer. The existing driver
always discards the IV and AAD as we do not need it. As a workaround,
allocate a single "dummy" buffer when the ccr driver attaches and
change all AEAD requests to write the IV and AAD to this scratch
buffer. The contents of the scratch buffer are never used (similar to
"bogus_page"), and it is ok for multiple in-flight requests to share
this dummy buffer.
Submitted by: Harsh Jain @ Chelsio (original version)
Sponsored by: Chelsio Communications
The T6 crypto engine's control messages only support a total AAD
length (including the prefixed IV) of 511 bytes. Reject requests with
large AAD rather than returning incorrect results.
Sponsored by: Chelsio Communications
Combined authentication-encryption and GCM requests already stored the
IV in the immediate explicitly. This extends this behavior to block
cipher requests to work around a firmware bug. While here, simplify
the AEAD and GCM handlers to not include always-true conditions.
Submitted by: Harsh Jain @ Chelsio
Sponsored by: Chelsio Communications
Uses of mallocarray(9).
The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.
Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.
Reported by: wosch
PR: 225197
Focus on code where we are doing multiplications within malloc(9). None of
these is likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.
This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.
Unconditional 32-bit shift is not endianness-safe.
Modify the logic to work both on LE and BE.
Submitted by: Wojciech Macek <wma@freebsd.org>
Reviewed by: np
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13102
leaves the firmware event queue (fwq) as the only queue that can take
interrupts for others.
This simplifies cfg_itype_and_nqueues and queue allocation in the driver
at the cost of a little (never?) used configuration. It also allows
service_iq to be split into two specialized variants in the future.
MFC after: 2 months
Sponsored by: Chelsio Communications
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
tunables. Add num_vis to the intrs_and_queues structure as it affects
the number of interrupts requested and queues created. In future
cfg_itype_and_nqueues might lower it incrementally instead of going
straight to 1 when enough interrupts aren't available.
Sponsored by: Chelsio Communications
The setbit/clearbit pair casts the bitfield pointer
to uint8_t* which effectively treats its contents as
little-endian variable. The ffs() function accepts int as
the parameter, which is big-endian. Use uint8_t here to
avoid mismatch, as we have only 4 doorbells.
Submitted by: Wojciech Macek <wma@freebsd.org>
Reviewed by: np
Obtained from: Semihalf
Sponsored by: QCM Technologies
Differential revision: https://reviews.freebsd.org/D13084
different from hardware defaults. The congestion channel map, which is
still fixed, needs to be tracked separately now. Change the congestion
setting for TOE rx queues to match the drivers on other OSes while here.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
iWarp and RoCE in ibcore. The selection of RDMA_PS_TCP can not be used
to indicate iWarp protocol use. Backport the proper IB device
capabilities from Linux upstream to distinguish between iWarp and
RoCE. Only allocate the additional socket required for iWarp for RDMA
IDs when at least one iWarp device present. This resolves
interopability issues between iWarp and RoCE in ibcore
Reviewed by: np @
Differential Revision: https://reviews.freebsd.org/D12563
Sponsored by: Mellanox Technologies
MFC after: 3 days
r324539 gathered some vnet decls into netinet/tcp_var.h, so that they
are now redundant in dev/cxgbe/tom/{t4_cpl_io.c,t4_ddp.c}. This triggers
gcc -Wredundant-decls.
Reviewed by: np
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12674
All of these arguments are stored in m_ext, so there is no reason
to pass them in the argument list. Not all functions need the second
argument, some don't even need the first one. The second argument
lives in next cache line, so not dereferencing it is a performance
gain. This was discovered in sendfile(2), which will be covered by
next commits.
The second goal of this commit is to bring even more flexibility
to m_ext mbufs, allowing to create more fields in m_ext, opaque to
the generic mbuf code, and potentially set and dereferenced by
subsystems.
Reviewed by: gallatin, kbowling
Differential Revision: https://reviews.freebsd.org/D12615
Changes since 1.16.26.0 for all three firmwares are listed below. This
list was obtained from the Release Notes of the Chelsio Unified Wire
v3.5.05 release for Linux.
T6 Firmware
++++++++++++
================================================================================
Version : 1.16.63.0
Date : 09/29/2017
================================================================================
Fixes
-----
BASE:
- Fixed a fw crash when configured traffic rate limit is less than 10kbps.
- Fixed traffic rate limiting for smaller traffic rate value.
ETH:
- Fixed 40G link failure when interface is toggled.
- Fixed adapter crash when interface is toggled during traffic.
- Fixed 25G link failure when PEER only supports consortium mode autoneg
for 25G.
- Fixed 100G optics link failure when cable is plugged in after bringing up
the interface.
- Enable RS FEC as default if speed is 100G.
- Fixed DCBX configuration refresh failure.
OFLD
- Fixed 0B iWARP ingress read failure.
- Fixed iWARP SRQ reuse failure.
FOiSCSI:
- Fixed vlan interface ping failure.
- Fixed target discovery failures.
- Fixed mutual chap login failure.
================================================================================
Version : 1.16.59.0
Date : 09/05/2017
================================================================================
FIXES
-----
BASE:
- Fixed fw crash caused by MC parity error in SO adapters.
- Generate Timer0Int interrupt if fw crashes due to unaligned access error. Host
driver must look into PCIE_FW register to see if any fw fatal error has
encountered. If PCIE_FW doesn't indicate any error then driver must ignore this
interrupt.
- Fixed receive buffer threshold settings which was resulting in error frames on
receive side.
ETH:
- Fixed an issue in connection traffic shaping when
FLOWC_WR->FW_FLOWC_MNEM_SCHEDCLASS is not received in first WR on the connection.
- Fixed link failure when speed is changed from 10G-1G-10G due to incorrect flag
check.
- Fixed improper LED behaviour for blink test and when traffic is running.
- Removed storage of previous fec settings from fw. Driver needs to pass the user
settings whenever a new module is plugged in as fw resets these when a module is
unplugged.
OFLD
- OVS offload: TP cache is flushed periodically to get the accuate filters stats
(hit count).
ENHANCEMENTS
------------
BASE:
- Ring backbone feature added. New FW_PARAMS_PARAM_DEV_RING_BACKBONE param type
added to query and enable ring backbone support.
- VNI support added for filtering. New entry_type FW_VI_MAC_TYPE_EXACTMAC_VNI
added to FW_VI_MAC_CMD.
- Added new API FW_PARAM_PARAM_DEV_MPSBGMAP to read the priority to buffer group
mapping for the ports.
- FW_PARAMS_PARAM_DEV_TPCHMAP API added to read the port to channel mapping.
- HMA (Host memory access) support added. New FW_HMA_CMD and
FW_PARAMS_PARAM_DEV_HMA_SIZE added to query and configure the HMA. It
enables the memfree support (256 connections) for iwarp.
- PTP support enabled.
ETH:
- Added consortium mode 50G support.
- Added the ability to allow only selected speeds to be advertised during auto
negotiation.
- Increased port capability from 16 to 32 bits to support more speeds.
FW_PARAMS_PARAM_PFVF_PORT_CAPS32 added to query whether fw supports 16 or 32
bit port capability.
OFLD:
- RDMA Write with immediate support added (iwarp 2.0 feature)
- FW_TLS_KEYCTX_TX_WR removed and security key management moved to driver.
- 256 offloaded connections support for iwarp on SO adapters.
iSCSI:
- New param FW_PARAMS_PARAM_DEV_PPOD_EDRAM added for iscsi ppod configuration
in EDRAM (performance improvement).
FOiSCSI:
- iSCSI Command offload target support added.
FOFCoE:
- FCoE support enabled.
================================================================================
Version : 1.16.43.0
Date : 05/05/2017
================================================================================
FIXES
-----
BASE:
- Fixed default DCB mode to AUTO.
- Fixed DCBX bugs when AUTO mode is configured in config file.
- Fixed an issue where even after removing PFC from switch, PFC wasn't getting
reset.
- Fixed DDR3/DDR4 ECC errors.
- Fixed an FLR issue where FLR completion was going to host before FLR
processing is finished in fw.
ETH:
- Fixed bug in writing multi-bytes using i2c interface.
- Fixed the link failure when optical cable is inserted into the QSA module
after loading the driver.
- Fixed false link up when peer interface was brought down.
- Enabling RS FEC by default for 100Gbase-SR4 according to 802.3BJ standard.
- Fixed bugs related to negotiated fec based local/peer fec ability and request.
- Fixed auto-neg failure with few switches.
- T6 Performance improvement fixes.
OFLD
- Fixed an extra credit issue if FW_RI_TYPE_FINI is delayed in fw due to
backpressure.
- Added a new queue type FW_IQ_TYPE_VF_CQ to handle the FW_PARAMS_PARAM_DMAQ*
commands. queue type will be part of the FW_PARAMS_PARAM_DMAQ_IQ_INTIDX
value. Used in guest RDMA (RDMA from VM/VF) usecase.
- T6 Crypto Coprocessor mode bug fixes.
- T6 Crypto TLS-inline mode bug fixes.
ENHANCEMENTS
------------
BASE:
- Added new API FW_PARAM_PARAM_DEV_MPSBGMAP to read the priority to buffer
group mapping for the ports.
ETH:
- Added broadcom consortium next page support for 25G CR.
This can be enabled using flags=an_brcm option in the t6-config.txt file.
- Added spider mode support.
- Added support for 10G-BaseT converter sfp+ module.
- Added support for additional 25G/100G cables.
- Added support to enable/disable auto-neg using ethtool.
================================================================================
Version : 1.16.33.0
Date : 02/24/2017
================================================================================
Fixes
-----
BASE:
- Fixed DDR4 uncorrectable errors.
ETH:
- Enabled link auto negotiation (AN) by default in config file.
- Added AN and FEC control api. Host driver and application can enable/disable
AN and FEC.
ENHANCEMENTS
------------
BASE:
- Enabled High priorty filter.
- Added T6425 adapter support.
ETH:
- Added new workrequest ETH_TX_PKTS2_WR (see fw api document for more details).
================================================================================
Version : 1.16.29.0
Date : 01/27/2017
================================================================================
FIXES
-----
BASE:
- Set multiple fec values only if AN is enabled in config file and when module
is connected.
- Fixed intermittent DDR3/4 ECC errors.
- max number of ethctrl queue in VF set to 2 (reverted the last change
because it causes problem in VF drivers).
ETH:
- Made devlog more verbose by printing cable information in redable form.
- Updated AN settings to work with more 25G/100G switches.
- Added support for more SFP28/QSFP28 cables.
- Fixed an issue of link going down after few hours of idle time.
OFLD:
- Fixed an issue in TLS which was causing fw crash on running TLS traffic.
FOiSCSI:
- Fixed the failure of PXE boot OS install on an iscsi lun.
ENHANCEMENTS
------------
OFLD:
- Added filtering support for NAT. New WR FW_FILTER2_WR and
FW_PARAMS_PARAM_DEV_FILTER2_WR added for the same.
- Added RDMA guest mode (mode 3 or RDMA from VF) support.
================================================================================
T5 Firmware
++++++++++++
================================================================================
Version : 1.16.63.0
Date : 09/29/2017
================================================================================
Fixes
-----
BASE:
- Fixed offload memory overcommit in case of SO adapter.
ETH:
- Fixed DCBX configuration refresh failure.
OFLD
- Fixed 0B iWARP ingress read failure.
FOiSCSI:
- Fixed vlan interface ping failure.
================================================================================
Version : 1.16.59.0
Date : 09/05/2017
================================================================================
FIXES
-----
BASE:
- Fixed an FLR issue which was causing error when VF attached VM was powered on.
ETH:
- Fixed an issue in connection traffic shaping when
FLOWC_WR->FW_FLOWC_MNEM_SCHEDCLASS is not received in first WR on the connection.
- Fixed link failure when speed is changed from 10G-1G-10G due to incorrect flag
check.
- Fixed T580 link failure with few switches which take more time for
establishing link.
ENHANCEMENTS
------------
BASE:
- Ring backbone feature added. New FW_PARAMS_PARAM_DEV_RING_BACKBONE param type
added to query and enable ring backbone support.
- Added new API FW_PARAM_PARAM_DEV_MPSBGMAP to read the priority to buffer group
mapping for the ports.
- FW_PARAMS_PARAM_DEV_TPCHMAP API added to read the port to channel mapping.
FOiSCSI:
- iSCSI Command offload target support added.
================================================================================
Version : 1.16.43.0
Date : 05/05/2017
================================================================================
FIXES
-----
BASE:
- Fixed default DCB mode to AUTO.
- Fixed DCBX bugs when AUTO mode is configured in config file.
- Fixed an issue where even after removing PFC from switch, PFC wasn't getting
reset.
ETH:
- Fixed bug in writing multi-bytes using i2c interface.
- Fixed the link failure when optical cable is inserted into the QSA module
after loading the driver.
OFLD
- Fixed an extra credit issue if FW_RI_TYPE_FINI is delayed in fw due to
backpressure.
- Added a new queue type FW_IQ_TYPE_VF_CQ to handle the FW_PARAMS_PARAM_DMAQ*
commands. queue type will be part of the FW_PARAMS_PARAM_DMAQ_IQ_INTIDX
value. Used in guest RDMA (RDMA from VM/VF) usecase.
ENHANCEMENTS
------------
BASE:
- Added new API FW_PARAM_PARAM_DEV_MPSBGMAP to read the priority to buffer
group mapping for the ports.
================================================================================
Version : 1.16.33.0
Date : 02/24/2017
================================================================================
ENHANCEMENTS
------------
ETH:
- Added new workrequest ETH_TX_PKTS2_WR (see fw api document for more details).
================================================================================
Version : 1.16.29.0
Date : 01/27/2017
================================================================================
FIXES
-----
BASE:
- max number of ethctrl queue in VF set to 2 (reverted the last change
because it causes problem in VF drivers).
FOiSCSI:
- Fixed the failure of PXE boot OS install on an iscsi lun.
ENHANCEMENTS
------------
OFLD:
- Added filtering support for NAT. New WR FW_FILTER2_WR and
FW_PARAMS_PARAM_DEV_FILTER2_WR added for the same.
- Added RDMA guest mode (mode 3 or RDMA from VF) support.
================================================================================
T4 Firmware
+++++++++++
================================================================================
Version : 1.16.63.0
Date : 09/29/2017
================================================================================
Fixes
-----
ETH:
- Fixed DCBX configuration refresh failure.
FOiSCSI:
- Fixed vlan interface ping failure.
================================================================================
Version : 1.16.59.0
Date : 09/05/2017
================================================================================
FIXES
-----
ETH:
- Fixed an issue in connection traffic shaping when
FLOWC_WR->FW_FLOWC_MNEM_SCHEDCLASS is not received in first WR on the connection.
ENHANCEMENTS
------------
BASE:
- FW_PARAMS_PARAM_DEV_TPCHMAP API added to read the port to channel mapping.
================================================================================
Version : 1.16.43.0
Date : 05/05/2017
================================================================================
FIXES
-----
BASE:
- Fixed default DCB mode to AUTO.
- Fixed DCBX bugs when AUTO mode is configured in config file.
- Fixed an issue where even after removing PFC from switch, PFC wasn't getting
reset.
ETH:
- Fixed bug in writing multi-bytes using i2c interface.
OFLD
- Fixed an extra credit issue if FW_RI_TYPE_FINI is delayed in fw due to
backpressure.
- Added a new queue type FW_IQ_TYPE_VF_CQ to handle the FW_PARAMS_PARAM_DMAQ*
commands. queue type will be part of the FW_PARAMS_PARAM_DMAQ_IQ_INTIDX
value. Used in guest RDMA (RDMA from VM/VF) usecase.
ENHANCEMENTS
------------
BASE:
- Added new API FW_PARAM_PARAM_DEV_MPSBGMAP to read the priority to buffer
group mapping for the ports.
Obtained from: Chelsio Communications
MFC after: 2 weeks
Sponsored by: Chelsio Communications
To optimize the case of ping-ponging between two buffers, the DDP code
caches the last two buffers used keeping the pages wired and page pods
stored in the NIC's RAM. If a new aio_read() request uses one of the
same buffers, then the work of holding pages, etc. can be avoided.
However, the starting virtual address of an aio buffer was not saved,
only the page count, length, and initial page offset. Thus, an
aio_read() request could match a different buffer in the address
space. (Earlier during development vm_fault_hold_quick_pages() was
always called and the vm_page_t values were compared, but that was
eventually removed without being adequately replaced.) Fix by storing
the starting virtual address and comparing that (along with other
fields) to determine if a buffer can be reused.
MFC after: 3 days
Sponsored by: Chelsio Communications
The bad_session, sglist_error, and process_error sysctl nodes were
returning the value of the pad_error node instead of the appropriate
error counters.
Sponsored by: Chelsio Communications
- start_wrq_wr must not drain the wr_list if there are incomplete_wrs
pending. This can happen when a t4_wrq_tx runs between two
start_wrq_wr.
- commit_wrq_wr must examine the cookie's pidx and ndesc with the
queue's lock held. Otherwise there is a bad race when incomplete WRs
are being completed and commit_wrq_wr for the WR that is ahead in the
queue updates the next incomplete WR's cookie's pidx/ndesc but the
commit_wrq_wr for the second one is using stale values that it read
without the lock.
MFC after: 1 week
Sponsored by: Chelsio Communications
t4_tom picks it up right away. This is less work than waiting for
the connection to be established before applying the setting.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
used by the TOE hardware for fully offloaded connections. The knob
affects new connections only.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
install after full initialization, and another to disable the TCB
cache (T6+). The latter works as a tunable only.
Note that debug_flags are for debugging only and should not be set
normally.
MFC after: 1 week
Sponsored by: Chelsio Communications
These firmwares come from a pre-release snapshot. The final firmwares
in this Chelsio release cycle will likely be .61.0 or later and those
will be the next "long lived" firmwares in FreeBSD head and stable
branches. .59 is being provided in head (only) for wider test exposure.
Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications
creating hardware VIs.
This fixes a bad race on systems with hw.cxgbe.num_vis > 1.
Reported by: olivier@
MFC after: 1 week
Sponsored by: Chelsio Communications
the current state to determine whether to generate a link-state change
notification. This fixes a bug introduced in r321063 that caused the
driver to sometimes skip these notifications.
Reported by: Jason Eggleston @ LLNW
MFC after: 3 days
Sponsored by: Chelsio Communications
precision. These timers are already displayed in microseconds in the
sysctl MIB. Add variables to track these tunables while here.
MFC after: 3 days
Sponsored by: Chelsio Communications
debug (cudbg) code, hooked up to the main driver via an ioctl.
The ioctl can be used to collect the chip's internal state in a
compressed dump file. These dumps can be decoded with the "view"
component of cudbg.
Obtained from: Chelsio Communications
MFC after: 2 months
Sponsored by: Chelsio Communications
and keepalive in the sysctl MIB. Provide tunables to change some of
these parameters. These are supposed to be setup by the firmware so
these tunables are for experimentation only.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
doesn't seem to have one. This lets the driver recover automatically
from incomplete firmware upgrades (panic, reboot, power loss, etc. in
the middle of an upgrade).
MFC after: 2 weeks
Sponsored by: Chelsio Communications
- Deal with changes to port_type, and not just port_mod when a
transceiver is changed. This fixes hot swapping of transceivers of
different types (QSFP+ or QSA or QSFP28 in a QSFP28 port, SFP+ or
SFP28 in a SFP28 port, etc.).
- Always refresh media information for ifconfig if the port is down.
The firmware does not generate tranceiver-change interrupts unless at
least one VI is enabled on the physical port. Before this change
ifconfig diplayed potentially stale information for ports that were
administratively down.
- Always recalculate and reapply L1 config on a transceiver change.
- Display PAUSE settings in ifconfig. The driver sysctls for this
continue to work as well.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
the "effective MSS" for the connection. The chip expects it this way.
Submitted by: Krishnamraju Eraparaju @ Chelsio
MFC after: 3 days
Sponsored by: Chelsio Communications
Do not attempt to initialize netmap queues that are already initialized
or aren't supposed to be initialized. Similarly, do not free queues
that are not initialized or aren't supposed to be freed.
PR: 217156
Sponsored by: Chelsio Communications
- For HMAC requests, construct a special input buffer to request an empty
hash result.
- For plain cipher requests and requests that chain an AES cipher with an
HMAC, fail with EINVAL if there is no cipher payload. If needed in
the future, chained requests that only contain AAD could be serviced as
HMAC-only requests.
- For GCM requests, the hardware does not support generating the tag for
an AAD-only request. Instead, complete these requests synchronously
in software on the assumption that such requests are rare.
Sponsored by: Chelsio Communications
The adapter firmware in general does not accept PDUs larger than 64k - 1
bytes in size. Sending crypto requests larger than this size result in
hangs or incorrect output, so reject them with EFBIG. For requests
chaining an AES cipher with an HMAC, the firmware appears to require
slightly smaller requests (around 512 bytes).
Sponsored by: Chelsio Communications
The latest firmware has a number of link related fixes, support for a
new custom card, and the fix for a bug that affected rate limiting on
FreeBSD.
Obtained from: Chelsio Communications
MFC after: 1 week
Sponsored by: Chelsio Communications
The ccr(4) driver supports use of the crypto accelerator engine on
Chelsio T6 NICs in "lookaside" mode via the opencrypto framework.
Currently, the driver supports AES-CBC, AES-CTR, AES-GCM, and AES-XTS
cipher algorithms as well as the SHA1-HMAC, SHA2-256-HMAC, SHA2-384-HMAC,
and SHA2-512-HMAC authentication algorithms. The driver also supports
chaining one of AES-CBC, AES-CTR, or AES-XTS with an authentication
algorithm for encrypt-then-authenticate operations.
Note that this driver is still under active development and testing and
may not yet be ready for production use. It does pass the tests in
tests/sys/opencrypto with the exception that the AES-GCM implementation
in the driver does not yet support requests with a zero byte payload.
To use this driver currently, the "uwire" configuration must be used
along with explicitly enabling support for lookaside crypto capabilities
in the cxgbe(4) driver. These can be done by setting the following
tunables before loading the cxgbe(4) driver:
hw.cxgbe.config_file=uwire
hw.cxgbe.cryptocaps_allowed=-1
MFC after: 1 month
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D10763
Invoke any identify routines of child drivers during attach before attaching
children, and delete any remaining devices after deleting ports.
MFC after: 1 month
Sponsored by: Chelsio Communications
- Do not leak the adapter lock in sysctl_autoneg.
- Accept only 0 or 1 as valid settings for autonegotiation.
- A fixed speed must be requested by the driver when autonegotiation is
disabled otherwise the firmware will reject the l1cfg command. Use
the top speed supported by the port for now.
MFC after: 3 days
Sponsored by: Chelsio Communications
the TOE. For now this capability is always enabled in kernels with
options RATELIMIT. t4_tom will check if_capenable once the base driver
gets code to support rate limiting for any socket (TOE or not).
This was tested with iperf3 and netperf ToT as they already support
SO_MAX_PACING_RATE sockopt. There is a bug in firmwares prior to
1.16.45.0 that affects the BSD driver only and results in rate-limiting
at an incorrect rate. This will resolve by itself as soon as 1.16.45.0
or later firmware shows up in the driver.
Relnotes: Yes
Sponsored by: Chelsio Communications
- Create a new file, t4_sched.c, and move all of the code related to
traffic management from t4_main.c and t4_sge.c to this file.
- Track both Channel Rate Limiter (ch_rl) and Class Rate Limiter (cl_rl)
parameters in the PF driver.
- Initialize all the cl_rl limiters with somewhat arbitrary default
rates and provide routines to update them on the fly.
- Provide routines to reserve and release traffic classes.
MFC after: 1 month
Sponsored by: Chelsio Communications
available starting with T6. The values in the timer holdoff registers
are multiplied by the scaling factor before use.
dev.<nexus>.<n>.holdoff_timers shows the final values of the
timers in microseconds.
MFC after: 1 week
Sponsored by: Chelsio Communications
(and accurate).
T4 and later have an extra bit for page shift so the maximum page size
is 8TB (shift of 12 + 31) instead of 128MB (12 + 15). This saves space
in the chip's PBL (physical buffer list) when registering very large
memory regions.
MFC after: 3 days
Sponsored by: Chelsio Communications
the thread that deals with socket state changes. This eliminates
various bad races with the ithread.
Submitted by: KrishnamRaju ErapaRaju @ Chelsio (original version)
MFC after: 3 days
Sponsored by: Chelsio Communications
inpcb. t4_tom detaches the inpcb from the toepcb as soon as the
hardware is done with the connection (in final_cpl_received) but the
socket is around as long as the cm_id and the rest of iWARP state is.
This fixes an intermittent NULL dereference during abort.
Submitted by: KrishnamRaju ErapaRaju @ Chelsio
MFC after: 3 days
Sponsored by: Chelsio Communications
ULPs can set a qp's state to ERROR and then post a work request on the
sq and/or rq. When the reply for that work request comes back it is
guaranteed that all previous work requests posted on that queue have
been drained.
Obtained from: Chelsio Communications
MFC after: 3 days
Sponsored by: Chelsio Communications
registered. T4/5/6 have no internal limit on this size. This is
probably a copy paste from the T3 iw_cxgb driver.
MFC after: 3 days
Sponsored by: Chelsio Communications
Sockets representing the TCP endpoints for iWARP connections are
allocated by the ibcore module. Before this revision they were closed
either by the ibcore module or the iw_cxgbe hardware driver depending on
the state transitions during connection teardown. This is error prone
and there were cases where both iw_cxgbe and ibcore closed the socket
leading to double-free panics. The fix is to let ibcore close the
sockets it creates and never do it in the driver.
- Use sodisconnect instead of soclose (preceded by solinger = 0) in the
driver to tear down an RDMA connection abruptly. This does what's
intended without releasing the socket's fd reference.
- Close the socket in ibcore when the iWARP iw_cm_id is destroyed. This
works for all kinds of sockets: clients that initiate connections,
listeners, and sockets accepted off of listeners.
Reviewed by: Steve Wise @ Open Grid Computing, hselasky@
MFC after: 3 days
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D9796
'-n' to tell the driver to create _up to_ 'n' queues if enough cores are
available. For example, setting hw.cxgbe.nrxq10g="-32" will result in
16 queues if the system has 16 cores, 32 if it has 32.
There is no change in the default number of queues of any type.
MFC after: 1 week
Sponsored by: Chelsio Communications
- Check for Chelsio vendor ID in probe routines.
- Fail attach instead of faulting if pci_find_dbsf() doesn't find a
device.
PR: 216539
Reported by: asomers
Tested by: Dave Baukus <daveb@spectralogic.com>
MFC after: 3 days
Sponsored by: Chelsio Communications
undo_offload_socket() is only called by t4_connect() during a connection
setup failure, but t4_connect() still owns the TOE PCB and frees ita
after undo_offload_socket() returns. Release a reference in
undo_offload_socket() resulted in a double-free which panicked when
t4_connect() performed the second free. The reference release was
added to undo_offload_socket() incorrectly in r299210.
MFC after: 1 week
Sponsored by: Chelsio Communications
Active Open:
- Save the socket's vnet at the time of the active open (t4_connect) and
switch to it when processing the reply (do_act_open_rpl or
do_act_establish).
Passive Open:
- Save the listening socket's vnet in the driver's listen_ctx and switch
to it when processing incoming SYNs for the socket.
- Reject SYNs that arrive on an ifnet that's not in the same vnet as the
listening socket.
CLIP (Compressed Local IPv6) table:
- Add only those IPv6 addresses to the CLIP that are in a vnet
associated with one of the card's ifnets.
Misc:
- Set vnet from the toepcb when processing TCP state transitions.
- The kernel sets the vnet when calling the driver's output routine
so t4_push_frames runs in proper vnet context already. One exception
is when incoming credits trigger tx within the driver's ithread. Set
the vnet explicitly in do_fw4_ack for that case.
MFC after: 3 days
Sponsored by: Chelsio Communications
a linuxkpi style device is expected. If OFED/linuxkpi actually starts
using this field then we'll have to figure out whether to create fake
devices for these drivers or have linuxkpi deal with NULL device.
This mismatch was first reported as part of D6585.
all public firmwares for all chips since the last release (1.15.37.0)
follows (it's a straight copy-paste from the Release Notes for the
12/30/2016 Unified Wire release on Chelsio's website).
T6 Firmware
++++++++++++
Version : 1.16.26.0
Date : 12/28/2016
Fixes
-----
BASE:
- Max number of egress and control queues adjusted to accomodate
co-processor mode queues.
- Fixed intermittent DDR3/4 ECC errors.
- Fixed a traffic stall when ETS BW is configured as 0%.
- Max number of ethctrl queue in VF set to 1.
ETH:
- Added a new config file option 'speed' under port section to set the
port speed. Use only when auto negotiation is off.
- FEC option removed from firmware config file. cxgbtool can be used to
change the fec setting.
- CPL_TX_TNL_LSO cpl handling added in ETH_TX_PKT_VM handler. This fixes
large tunnel tcp packet support for VxLAN.
Version : 1.16.22.0
Date : 12/05/2016
Fixes
-----
BASE:
- fw_port_type updated in fw API to match kernel.org definitions.
- Saved power by disaling unused MAC lanes.
- Configures correct power bin.
- Enhanced DDR4 performance.
- Enabled interrupts.
- Fixed an issue where filter rule for 'unicast hash' is not working.
ETH:
- Disabled auto negotiation by default because most of 100G switches do
not support AN as of today.
- Fixed flow control not getting disabled problem.
- Fixed an issue where port0 doesn't come up sometimes.
- Fixed 10G link not coming up issue.
- Fixed an issue with promiscuous mode when dcbx disabled.
OFLD:
- Fixed a connection stuck issue when abort is received during out of tx
pages backpressure.
ENHANCEMENTS
------------
BASE:
- Added inline TLS mode support.
Version : 1.16.12.0
Date : 11/11/2016
ENHANCEMENTS
------------
BASE:
- Added T6 support.
- Added T6 1G/10G/25G/40G/100G link speeds.
- Added T6 co-processor mode crypto support.
- Added facility to increase link AN+AEC timeout.
OFLD:
- Added support for all T5 offload protocols except FCoE.
iSCSI:
- iscsi completion moderation enabled.
=======================================================================
T5 Firmware
++++++++++++
Version : 1.16.26.0
Date : 12/28/2016
FIXES
-----
BASE:
- Max number of ethctrl queue in VF set to 1.
Version : 1.16.22.0
Date : 12/05/2016
FIXES
-----
BASE:
- Fixed an issue where filter rule for 'unicast hash' is not working.
ETH:
- Fixed an issue with promiscuous mode when dcbx disabled.
ENHANCEMENTS
------------
ETH:
- Added 40G-KR support.
Version : 1.16.12.0
Date : 11/11/2016
FIXES
-----
BASE:
- Fixed multiple issues related with VFs FLR processing.
- Fixed channel assignment based on number of ports in adapter.
- Fixed a crash when VM having PF assigned as passthrough mode is
rebooted.
- Handled 2nd HELLO command from the same PF without seeing BYE from the
same PF and if that is the only PF.
- A warning is printed in firmware log if PCI-E cookie generation is
enabled in serial initialization file.
- Fixed multiple issues related with Filtering.
- Enabled DSGL memory write for iscsi and rdma.
- Added new FW_PARAMS_CMD[DEV] options to retrieve Serial Configuration
and VPD version numbers.
- Fixed an issue where LVDS output was not getting enabled using vpd.
DCBX:
- Fixed DCBX CEE Incorrect class to pririty mapping.
- Fixed incorrect interpretation of DCBX IEEE PFC.
ETH:
- Adjusted the link related delay timings according to the QSFP spec.
- Improved 40G link bringup time with few switches.
OFLD:
- Do not reserve qp/cq if rdma capability is not enabled.
- Fixed an issue where approx 1600+ TOE connections were causing a
firmware fatal error.
FOiSCSI:
- Fixed an issue where unloading foiscsi driver causes mailbox timeout.
ENHANCEMENTS
------------
BASE:
- Added 10G KR/KX support.
- Added T540-BT adapter support.
- Added 4 new rss key modes for PFs and VFs.
OFLD:
- Added new WR FW_RI_FR_NSMR_TPTE_WR to improve fast MR write
performance in RDMA.
Version : 1.16.5.0
Date : 10/26/2016
FIXES
-----
BASE:
- Fixed multiple issues where FLR from multiple VFs can cause firmware
crash.
- Fixed channel assignment based on number of ports in adapter.
- Fixed the HELLO command master force api to handle the 2nd HELLO
correctly without getting BYE from the PF driver.
- Added facility to retrieve Serial configuration and VPD version. Two
new FW_PARAMS_CMD[DEV] options added to retrieve these values.
- Fixed multiple issues where FLR from multiple VFs are not completing.
- Added new RSS hash secret key modes.
- Fixed an issue where LVDS output was not getting enabled using vpd.
DCBX:
- Fixed an issue where iscsi tlv is sent incorrectly to host (DCBX CEE).
- Fixed an issue where app priority values are not handled correctly
in fw (DCBX IEEE).
ETH:
- Adjusts the link related delay timings according to the QSFP spec.
- Changed 2.5G mac speed bit to 25G mac speed bit in fw API.
- Improvement in 40G link bringup time with few switches.
OFLD:
- Do not reserve qp/cq if rdma capability is not enabled.
- Fixed an issue where approx 1600+ TOE connections were causing a
firmware fatal error.
- Fixed DSGL memory write in T5. Now iwarp and iscsi can use DSGL to do
memory write.
- Fixed multiple issues in hash filter mode where incorrect protocol
mask was getting used and affecting hash filter functionality.
- New fastpath WR FW_RI_FR_NSMR_TPTE_WR (with fully populated TPTE) is
added for small REG_MR operations.
FOiSCSI:
- Fixed an issue in foiscsi recovery path.
- Fixed an issue where foiscsi (in VM in PCIE passthrough mode) didn't
come up after VM FLR.
ENHANCEMENTS
------------
ETH:
- Implemented 1G/10G KR/KX ability.
- Implemented T540-BT adapter support.
=======================================================================
T4 Firmware
+++++++++++
Version : 1.16.12.0
Date : 11/11/2016
FIXES
-----
BASE:
- Fixed an issue where reading temperature sesors using ldst command
causes mailbox timeout.
- Added new FW_PARAMS_CMD[DEV] options to retrieve Serial Configuration
and VPD version numbers.
ETH:
- Fixed DCBX CEE Incorrect class to pririty mapping.
FOiSCSI:
- Fixed an issue where unloading foiscsi driver causes mailbox timeout.
MFC after: 3 days
Sponsored by: Chelsio Communications
- Update struct link_settings and associated shared code.
- Add tunables to control FEC and autonegotiation. All ports inherit
these values as their initial settings.
hw.cxgbe.fec
hw.cxgbe.autoneg
- Add per-port sysctls to control FEC and autonegotiation. These can be
modified at any time.
dev.<port>.<n>.fec
dev.<port>.<n>.autoneg
MFC after: 3 days
Sponsored by: Chelsio Communications
- Disable features that are not supported or not used on FreeBSD.
- Increase the RSS table slice per interface.
- Increase the share of the TCAM reserved for filtering.
MFH: 2 weeks
Sponsored by: Chelsio Communications
There are 4 independent knobs in T5+ chips to include or exclude PAUSE
frames from the "total frames" and "multicast frames" counters in either
direction. This change lets the driver deal with any combination of
these settings.
contiguous regions in an mbuf chain.
If the payload of an mbuf ends at a page boundary count_mbuf_nsegs would
incorrectly consider the next mbuf's payload physically contiguous based
solely on a KVA comparison.
MFC after: 1 week
Sponsored by: Chelsio Communications
come up as 't6nex' nexus devices with 'cc' ports hanging off them.
The T6 firmware and configuration files will be added as soon as they
are released. For now the driver will try to work with whatever
firmware and configuration is on the card's flash.
Sponsored by: Chelsio Communications
The cxgbev/cxlv driver supports Virtual Function devices for Chelsio
T4 and T4 adapters. The VF devices share most of their code with the
existing PF4 driver (cxgbe/cxl) and as such the VF device driver
currently depends on the PF4 driver.
Similar to the cxgbe/cxl drivers, the VF driver includes a t4vf/t5vf
PCI device driver that attaches to the VF device. It then creates
child cxgbev/cxlv devices representing ports assigned to the VF.
By default, the PF driver assigns a single port to each VF.
t4vf_hw.c contains VF-specific routines from the shared code used to
fetch VF-specific parameters from the firmware.
t4_vf.c contains the VF-specific PCI device driver and includes its
own attach routine.
VF devices are required to use a different firmware request when
transmitting packets (which in turn requires a different CPL message
to encapsulate messages). This alternate firmware request does not
permit chaining multiple packets in a single message, so each packet
results in a firmware request. In addition, the different CPL message
requires more detailed information when enabling hardware checksums,
so parse_pkt() on VF devices must examine L2 and L3 headers for all
packets (not just TSO packets) for VF devices. Finally, L2 checksums
on non-UDP/non-TCP packets do not work reliably (the firmware trashes
the IPv4 fragment field), so IPv4 checksums for such packets are
calculated in software.
Most of the other changes in the non-VF-specific code are to expose
various variables and functions private to the PF driver so that they
can be used by the VF driver.
Note that a limited subset of cxgbetool functions are supported on VF
devices including register dumps, scheduler classes, and clearing of
statistics. In addition, TOE is not supported on VF devices, only for
the PF interfaces.
Reviewed by: np
MFC after: 2 months
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7599
If a packet contains the Ethernet header (14 bytes) in the first mbuf
and the payload (IP + UDP + data) in the second mbuf, then the attempt
to fetch the l3hdr will return a NULL pointer. The first loop iteration
will drop len to zero and exit the loop without setting 'p'. However,
the desired data is at the start of the second mbuf, so the correct
behavior is to loop around and let the conditional set 'p' to m_data of
the next mbuf (and leave offset as 0).
Reviewed by: np
Sponsored by: Chelsio Communications
transfers.
The Initiator and Target both perform zero copy receive for transfers
greater than or equal to this threshold.
Sponsored by: Chelsio Communications
routines available in t4_tom to manage the iSCSI DDP page pod region.
This adds the ability to use multiple DDP page sizes to the iSCSI
driver, among other improvements.
Sponsored by: Chelsio Communications
The device quiet flag is not automatically reset on detach, so it is
inherited by other device drivers (e.g. when switching a device driver
over to ppt for PCI pass through). Cope with this behavior by explicitly
marking the device verbose during detach so that the next driver can make
its own decision.
Sponsored by: Chelsio Communications
- Return appropriate error code instead of ENOMEM when sosend() fails in
send_mpa_req.
- Fix for problematic race during destroy_qp.
- Abortive close in the failure of send_mpa_reject() instead of normal close.
- Remove the unnecessary doorbell flowcontrol logic.
Submitted by: Krishnamraju Eraparaju at Chelsio
MFC after: 1 month
Sponsored by: Chelsio communications
hardware send and receive PDU limits. Report these limits to ICL and
take them into account when setting the socket's send and receive buffer
sizes. The driver used a single hardcoded limit everywhere prior to
this change.
Sponsored by: Chelsio Communications
Decouple the send and receive limits on the amount of data in a single
iSCSI PDU. MaxRecvDataSegmentLength is declarative, not negotiated, and
is direction-specific so there is no reason for both ends to limit
themselves to the same min(initiator, target) value in both directions.
Allow iSCSI drivers to report their send, receive, first burst, and max
burst limits explicitly instead of using hardcoded values or trying to
derive all of them from the receive limit (which was the only limit
reported by the drivers prior to this change).
Display the send and receive limits separately in the userspace iSCSI
utilities.
Reviewed by: jpaetzel@ (earlier version), trasz@
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7279
This permits a single early return for VF devices in the routines that
add sysctl nodes.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7512
Specifically, the FW_PORT_CMD may or may not work for a VF (the PF
driver can choose whether or not to permit access to this command),
so don't attempt to fetch port information on a VF if permission is
denied by the PF.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7511
While here, mark which parameters are PF-specific and which are
VF-specific.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7508
- Add handling of VF register sets to t4_get_regs_len() and t4_get_regs().
- While here, use t4_get_regs_len() in the ioctl handler for regdump
instead of inlining it.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7484
- Use alternate register locations for the data and control registers for
VFs.
- Do a dummy read to force the writes to the mailbox data registers to
post before the write to the control register on VFs.
- Do not check the PCI-e firmware register for errors on VFs.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7483
Add fields to hold the SGE control register and free list buffer sizes to
the sge_params structure. Populate these new fields in
t4_init_sge_params() for PF devices and change t4_read_chip_settings() to
pull these values out of the params structure instead of reading
registers directly. This will permit t4_read_chip_settings() to be reused
for VF devices which cannot read SGE registers directly.
While here, move the call to t4_init_sge_params() to
get_params__post_init(). The VF driver will populate the SGE parameters
structure via a different method before calling t4_read_chip_settings().
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7476
Use this to map an absolute queue ID to a logical queue ID in interrupt
handlers. For the regular cxgbe/cxl drivers this should be a no-op as
the base absolute ID should be zero. VF devices have a non-zero base
absolute ID and require this change. While here, export the absolute ID
of egress queues via a sysctl.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7446
Chelsio T4/T5 adapters are multifunction cards. The main driver uses
physical function 4 (PF4). However, VF devices for SR-IOV are only
supported on physical functions 0 through 3, where PF0 creates VFs tied
to port 0, etc. The t4iov/t5iov driver was previously added to
create VF devices for ports that are present on each adapter. This
change uses the recently added pci_iov_attach_name() function to
name the character device in /dev/iov after the associated port on
the card (e.g. /dev/iov/cxl0 is used to create VFs that share the
cxl0 port). With this in place, mark the t4iov/t5iov devices quiet
to prevent them from cluttering dmesg.
Reviewed by: rstone
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7402
VF devices use a different register layout than PF devices. Storing
the offset in a value in the softc allows code to be shared between the
PF and VF drivers.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7389
- Remove null open/close methods.
- Don't set d_flags to 0 explicitly.
- Remove t5_cdevsw as the .d_name member isn't really used and doesn't
warrant a separate cdevsw just for the name.
- Use ENOTTY as the error value for an unknown ioctl request.
- Use make_dev_s() to close race with setting si_drv1.
Sponsored by: Chelsio Communications
AIO write requests for a TOE socket on a Chelsio T4+ adapter can now
DMA directly from the user-supplied buffer. This is implemented by
wiring the pages backing the user-supplied buffer and queueing special
mbufs backed by raw VM pages to the socket buffer. The TOE code
recognizes these special mbufs and builds a sglist from the VM page
array associated with the mbuf when queueing a work request to the TOE.
Because these mbufs do not have an associated virtual address, m_data
is not valid. Thus, the AIO handler does not invoke sosend() directly
for these mbufs but instead inlines portions of sosend_generic() and
tcp_usr_send().
An aiotx_buffer structure is used to describe the user buffer (e.g.
it holds the array of VM pages and a reference to the AIO job). The
special mbufs reference this structure via m_ext. Note that a single
job might be split across multiple mbufs (e.g. if it is larger than
the socket buffer size). The 'ext_arg2' member of each mbuf gives an
offset relative to the backing aiotx_buffer. The AIO job associated
with an aiotx_buffer structure is completed when the last reference to
the structure is released.
Zero-copy aio_write()'s for connections associated with a given
adapter can be enabled/disabled at runtime via the
'dev.t[45]nex.N.toe.tx_zcopy' sysctl.
MFC after: 1 month
Relnotes: yes
Sponsored by: Chelsio Communications
returning EAGAIN if they aren't available when the user tries to program
a filter. Do this after validating the filter so that the driver
doesn't bring up the queues if it doesn't have to.
Chelsio NICs are a bit unique compared to some other NICs in that they
expose different functionality on different physical functions. In
particular, PF4 is used to manage the NIC interfaces ('t4nex' and 't5nex').
However, PF4 is not able to create VF devices. Instead, VFs are only
supported by physical functions 0 through 3. This commit adds 't4iov'
and 't5iov' drivers that attach to PF0-3.
One extra wrinkle is that the iov devices cannot enable SR-IOV until the
firwmare has been initialized by the main PF4 driver. To handle this
case, a new t4_if kobj interface has been added to permit cross-calls
between the PF drivers. The PF4 driver notifies sibling drivers when it
is fully attached. It also requests sibling drivers to detach before it
detaches. Sibling drivers query the PF4 driver during their attach
routine to see if it is attached. If not, the sibling drivers defer
their attach actions until the PF4 driver informs them it is attached.
VF devices are associated with a single port on the NIC. VF devices
created from PF0 are associated with the first port on the NIC, VFs
from PF1 are associated with the second port, etc. VF devices can
only be created from a PF device that has an associated port. Thus,
on a 2-port card, VFs are only supported on PF0 and PF1.
Reviewed by: np (earlier versions)
MFC after: 1 month
Sponsored by: Chelsio Communications
If a driver sends an malformed or disallowed work request, the firmware
responds with a work request error. Previously the driver treated this is
as an unexpected message and panicked. Now it decodes the error message
to aid in debugging.
Reviewed by: np (older version)
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D6950
related to "shared" CPLs.
a) Combine t4_set_tcb_field and t4_set_tcb_field_rpl into a single
function. Allow callers to direct the response to any iq. Tidy up
set_ulp_mode_iscsi while there to use names from t4_tcb.h instead of
magic constants.
b) Remove all CPL handler tables from struct adapter. This reduces its
size by around 2KB. All handlers are now registered at MOD_LOAD instead
of attach or some kind of initialization/activation. The registration
functions do not need an adapter parameter any more.
c) Add per-iq handlers to deal with CPLs whose destination cannot be
determined solely from the opcode. There are 2 such CPLs in use right
now: SET_TCB_RPL and L2T_WRITE_RPL. The base driver continues to send
filter and L2T_WRITEs over the mgmtq and solicits the reply on fwq.
t4_tom (including the DDP code) now uses the port's ctrlq to send
L2T_WRITEs and SET_TCB_FIELDs and solicits the reply on an ofld_rxq.
fwq and ofld_rxq have different handlers that know what kind of tid to
expect in the reply. Update t4_write_l2e and callers to to support any
wrq/iq combination.
Approved by: re@ (kib@)
Sponsored by: Chelsio Communications
The interface's queues are functional after VI_INIT_DONE (which is short
of interface-up) and that's all that's needed for t4_tom to communicate
with the chip.
Approved by: re@ (gjb@)
Sponsored by: Chelsio Communications
vcxgbe/vcxl interfaces and retire the 'n' interfaces. The main
cxgbe/cxl interfaces and tunables related to them are not affected by
any of this and will continue to operate as usual.
The driver used to create an additional 'n' interface for every
cxgbe/cxl interface if "device netmap" was in the kernel. The 'n'
interface shared the wire with the main interface but was otherwise
autonomous (with its own MAC address, etc.). It did not have normal
tx/rx but had a specialized netmap-only data path. r291665 added
another set of virtual interfaces (the 'v' interfaces) to the driver.
These had normal tx/rx but no netmap support.
This revision consolidates the features of both the interfaces into the
'v' interface which now has a normal data path, TOE support, and native
netmap support. The 'v' interfaces need to be created explicitly with
the hw.cxgbe.num_vis tunable. This means "device netmap" will not
result in the automatic creation of any virtual interfaces.
The following tunables can be used to override the default number of
queues allocated for each 'v' interface. nofld* = 0 will disable TOE on
the virtual interface and nnm* = 0 to will disable native netmap
support.
# number of normal NIC queues
hw.cxgbe.ntxq_vi
hw.cxgbe.nrxq_vi
# number of TOE queues
hw.cxgbe.nofldtxq_vi
hw.cxgbe.nofldrxq_vi
# number of netmap queues
hw.cxgbe.nnmtxq_vi
hw.cxgbe.nnmrxq_vi
hw.cxgbe.nnm{t,r}xq{10,1}g tunables have been removed.
--- tl;dr version ---
The workflow for netmap on cxgbe starting with FreeBSD 11 is:
1) "device netmap" in the kernel config.
2) "hw.cxgbe.num_vis=2" in loader.conf. num_vis > 2 is ok too, you'll
end up with multiple autonomous netmap-capable interfaces for every
port.
3) "dmesg | grep vcxl | grep netmap" to verify that the interface has
netmap queues.
4) Use any of the 'v' interfaces for netmap. pkt-gen -i vcxl<n>... .
One major improvement is that the netmap interface has a normal data
path as expected.
5) Just ignore the cxl interfaces if you want to use netmap only. No
need to bring them up. The vcxl interfaces are completely independent
and everything should just work.
---------------------
Approved by: re@ (gjb@)
Relnotes: Yes
Sponsored by: Chelsio Communications
File and disk-backed I/O requests store counts of read/written disk
blocks in each AIO job so that they can be charged to the thread that
completes an AIO request via aio_return() or aio_waitcomplete(). This
change extends AIO jobs to store counts of received/sent messages and
updates socket backends to set these counts accordingly. Note that
the socket backends are careful to only charge a single messages for
each AIO request even though a single request on a blocking socket might
invoke sosend or soreceive multiple times. This is to mimic the
resource accounting of synchronous read/write.
Adjust the UNIX socketpair AIO test to verify that the message resource
usage counts update accordingly for aio_read and aio_write.
Approved by: re (hrs)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D6911
Inserting a full mbuf with an external cluster into the socket buffer
resulted in sbspace() returning -MLEN. However, since sb_hiwat is
unsigned, the -MLEN value was converted to unsigned in comparisons. As a
result, the socket buffer was never autosized. Note that sb_lowat is signed
to permit direct comparisons with sbspace(), but sb_hiwat is unsigned.
Follow suit with what tcp_output() does and compare the value of sbused()
with sb_hiwat instead.
Approved by: re (gjb)
Sponsored by: Chelsio Communications
This reduces the size of kaiocb slightly. I've also added some generic
fields that other backends can use in place of the BIO-specific fields.
Change the socket and Chelsio DDP backends to use 'backend3' instead of
abusing _aiocb_private.status directly. This confines the use of
_aiocb_private to the AIO internals in vfs_aio.c.
Reviewed by: kib (earlier version)
Approved by: re (gjb)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D6547
Release the hold on ep->com immediately after sending the RST. This
fixes a bug that sometimes leaves userspace iWARP tools hung when the
user presses ^C.
Submitted by: Krishnamraju Eraparaju @ Chelsio
Approved by: re (gjb@)
Sponsored by: Chelsio Communications
- Validate the scheduling class against the actual limit (which is chip
specific) instead of a magic number.
- Return an error if an attempt is made to manipulate the tx queues of a
VI that hasn't been initialized.
Sponsored by: Chelsio Communications
comp_handler_lock but c4iw_destroy_cq has already freed the CQ memory
(which is where the lock resides).
Submitted by: Krishnamraju Eraparaju @ Chelsio
Sponsored by: Chelsio Communications
by default. This is a workaround for a too simplistic ICL module
choosing mechanism. To use it, specify offload in ctl.conf
or iscsi.conf.
This fixes a problem where "kldload cxgbei" wedges the iSCSI stack,
if you don't have a Chelsio card installed, or the endpoints of the
iSCSI session are not reachable through addresses configured
on that interface.
Reviewed by: np@
MFC after: 1 month
avoid panicking debug kernels.
t4_tom does not keep track of a connection once it switches to ULP mode
iWARP. If the connection falls out of ULP mode the driver/hardware seq#
etc. are out of sync. A better fix would be to figure out what the
current seq# are, update the driver's state, and perform all sanity
checks as usual.
If the parent is DEAD or connect_request_upcall() fails, the parent
mutex is left locked. This leads to a hang when process_mpa_request()
is called again for another child of the listening endpoint.
Submitted by: Krishnamraju Eraparaju @ Chelsio
Obtained from: upstream iw_cxgb4
Sponsored by: Chelsio Communications
tunable SYSCTL's. Linux module parameters are associated with the
module they belong to. FreeBSD does not share this concept of a parent
module. Instead add macros which define the prefix to use for the
module parameters in the LinuxKPI consumers.
While at it convert all "bool" LinuxKPI module parameters to "byte"
type, because we don't have a "bool" type of SYSCTL in FreeBSD.
Sponsored by: Mellanox Technologies
MFC after: 1 week
method. This is required for upcoming iSER support.
Obtained from: Mellanox Technologies (earlier version)
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
These firmwares were obtained from the "Chelsio T5/T4 Unified Wire
v2.12.0.3 for Linux" release. Changes since 1.14.4.0 (which is the
firmware in -STABLE branches) are in the "Release Notes" accompanying
the Unified Wire release and are copy-pasted here as well.
22.1. T5 Firmware
+++++++++++++++++++++++++++++++++
Version : 1.15.37.0
Date : 04/27/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where the default ingress
queue was ignored.
- Fixed an issue where adapter failed to load fw by adjusting DRAM frequency.
- Fixed an issue in watchdog which was causing VM bring-up failure after reboot.
- Fixed 40G link failures with some switches when auto-negotiation enabled.
- Fixed to improve on link bring-up time.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where bogus d3hot bits were set causing traffic stall.
- Fixed an issue where sometimes adapter was not seen after reboot.
- Fixed an issue where iWARP was crashing in conjunction with traffic management.
- Fixed an issue where link failed to come up after removing twinax cable and
inserting optical module.
ETH
- Fixed a link flap issue on T580-CR.
OFLD
- Fixed a potential iSCSI data corruption issue by disabling RxFragEn flag.
FOiSCSI
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP was not been provisioned yet and driver tried to
use given inerface.
- Fixed an issue where fw was sending ENETUNREACH event for normal tcp
disconnection.
DCBX
- Fixed an issue where iscsi tlv is sent incorrectly to host. (DCBX CEE)
- Fixed an issue where apply bit set for APP id was affecting the ETS and PFC
settings.(DCBX IEEE)
- Fixed an issue where app priority values are not handled correctly in fw.
(DCBX IEEE)
- Fixed an issue where enable/disable dcbx can cause crash. (DCBX CEE,DCBX IEEE)
FOFCoE
- Removed BB6 support.
ENHANCEMENTS
------------
BASE:
- Added new interface to program DCA settings in SGE contexts; allow 32-byte
IQE size
- Added PTP interface fw_ptp_ts to support PTP Frequeny and Offset adjustment.
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
OFLD:
- WR opcode is returned to host in cqe error response.
22.2. T4 Firmware
+++++++++++++++++
Version : 1.15.37.0
Date : 04/27/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where default ingress queue
was ignored.
- Fixed an issue in watchdog which was causing VM bring-up failure after reboot.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where iWARP was crashing in conjunction with traffic management.
FOiSCSI:
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP had not been provisioned yet and driver tried to
use given inerface.
DCBX
- Fixed an issue where iscsi tlv is sent incorrectly to host.(DCBX CEE)
- Fixed an issue where enable/disable dcbx can cause crash in firmware.(DCBX CEE)
FOiSCSI
- Fixes an issue where fw was sending ENETUNREACH event for normal tcp
disconnection.
FOFCoE
- Removed BB6 support.
ENHANCEMENTS
------------
BASE:
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
================================================================================
Obtained from: Chelsio Communications
MFC after: 6 weeks
Relnotes: yes
Sponsored by: Chelsio Communications
Other structures needed by prototypes in t4_tom.h are explicitly
declared in this file, so adding the prototype here seems most
consistent with existing code.
Chelsio's TCP offload engine supports direct DMA of received TCP payload
into wired user buffers. This feature is known as Direct-Data Placement.
However, to scale well the adapter needs to prepare buffers for DDP
before data arrives. aio_read() is more amenable to this requirement than
read() as applications often call read() only after data is available in
the socket buffer.
When DDP is enabled, TOE sockets use the recently added pru_aio_queue
protocol hook to claim aio_read(2) requests instead of letting them use
the default AIO socket logic. The DDP feature supports scheduling DMA
to two buffers at a time so that the second buffer is ready for use
after the first buffer is filled. The aio/DDP code optimizes the case
of an application ping-ponging between two buffers (similar to the
zero-copy bpf(4) code) by keeping the two most recently used AIO buffers
wired. If a buffer is reused, the aio/DDP code is able to reuse the
vm_page_t array as well as page pod mappings (a kind of MMU mapping the
Chelsio NIC uses to describe user buffers). The generation of the
vmspace of the calling process is used in conjunction with the user
buffer's address and length to determine if a user buffer matches a
previously used buffer. If an application queues a buffer for AIO that
does not match a previously used buffer then the least recently used
buffer is unwired before the new buffer is wired. This ensures that no
more than two user buffers per socket are ever wired.
Note that this feature is best suited to applications sending a steady
stream of data vs short bursts of traffic.
Discussed with: np
Relnotes: yes
Sponsored by: Chelsio Communications
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.
This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
The size of the reply can be different from the size of the command in
case a debug firmware asserts. fw_asrt() needs the entire reply in
order to decode the location of the assert.
Sponsored by: Chelsio Communications
This fixes a conflict with the M_B macro in powerpc's
<machine/db_machdep.h> exposed by the recent addition of DDB commands
to the cxgbe driver.
Discussed with: np
Reported by: bz
Sponsored by: Chelsio Communications
And factor out tcp_lro_rx_done, which deduplicates the same logic with
netinet/tcp_lro.c
Reviewed by: gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5725
This provides a constant ABI and layout for these structures (especially
struct adapter) avoiding some foot shooting.
Discussed with: np
Sponsored by: Chelsio Communications
Figure out if the chip is counting PAUSE frames in the "normal" stats
and take them out if it is. This fixes a bug in the tx stats because
the default hardware behavior is different for Tx and Rx but the driver
was treating both the same way. The result was that OPACKETS, OBYTES,
and OMCASTS were under-reported (if tx_pause > 0) before this change.
Note that the mac_stats sysctl still gives you the raw value of these
statistics straight from the device registers.
defined:
sys/dev/cxgbe/t4_main.c:7474: warning: 'sysctl_tp_tick' defined but not used
sys/dev/cxgbe/t4_main.c:7505: warning: 'sysctl_tp_dack_timer' defined but not used
sys/dev/cxgbe/t4_main.c:7519: warning: 'sysctl_tp_timer' defined but not used
This just adds a bunch of #ifdef TCP_OFFLOAD in the right places.
Reviewed by: np
Differential Revision: https://reviews.freebsd.org/D5620
- Query the location of the log very early during attach. Refresh the
location later after establishing contact with the firmware.
- Save the log's location as a flat address in devlog_params.
- Use a memory window instead of backdoor access to the EDC/MC to read
the log.
which is responsible for filtering and RSS.
Add the ability to use filters that match on PF/VF (aka "VNIC id") while
here. This is mutually exclusive with filtering on outer VLAN tag with
Q-in-Q.
Sponsored by: Chelsio Communications
Move the code that reads all the parameters to t4_init_sge_params in the
shared code. Use these per-adapter values instead of globals.
Sponsored by: Chelsio Communications
- Get the list of registers to read during a regdump from the shared
code instead of the OS specific code. This follows a similar move
internally. The shared code includes the list for T6.
- Update cxgbetool to be able to decode T5 VF, T6, and T6 VF register
dumps (and catch up with some updates to T4 and T5 register decode).
Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications
update to the latest internal shared code.
- Add a chip_params structure to keep track of hardware constants for
all generations of Terminators handled by cxgbe.
- Update t4_hw_pci_read_cfg4 to work with T6.
- Update the hardware debug sysctls (hidden within dev.<tNnex>.<n>.misc.*) to
work with T6. Most of the changes are in the decoders for the CIM
logic analyzer and the MPS TCAM.
- Acquire the regwin lock around indirect register accesses.
Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications
code:
- Rename some CamelCase variables.
- s/t4_link_start/t4_link_l1cfg/g
- Pull in t4_get_port_type_description.
- Move t4_wait_op_done to t4_hw.c.
- Flip the order of the RDMA stats.
- Remove unsused function t4_iq_start_stop.
- Move t4_wait_op_done and t4_wait_op_done_val to t4_hw.c
Obtained from: Chelsio Communications
These firmwares were obtained from the beta "Chelsio T5/T4 Unified Wire
v2.12.0.2 for Linux" release. Changes since last release are listed in the
"Release Notes" accompanying the beta release and are copy-pasted here as well.
The plan is to have only GA'd firmwares in any -STABLE FreeBSD branch so I'll
MFC this (after 2 months) only if it ends up in a GA release.
================================================================================
================================================================================
22.1. T5 Firmware
+++++++++++++++++++++++++++++++++
Version : 1.15.28.0
Date : 02/29/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where the default ingress
queue was ignored.
- Fixed an issue where adapter failed to load fw by adjusting DRAM frequency.
- Fixed an issue in watchdog which was causing VM bring-up failure after
reboot.
- Fixed 40G link failures with some switches when auto-negotiation enabled.
- Fixed to improve on link bring-up time.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where bogus d3hot bits were set causing traffic stall.
- Fixed an issue where sometimes adapter was not seen after reboot.
- Fixed an issue where iWARP was crashing in conjunction with traffic
management.
- Fixed an issue where link failed to come up after removing twinax cable and
inserting optical module.
OFLD
- Fixed a potential iSCSI data corruption issue by disabling RxFragEn flag.
FOiSCSI
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP was not been provisioned yet and driver tried to
use given inerface.
ENHANCEMENTS
------------
BASE:
- Added new interface to program DCA settings in SGE contexts; allow 32-byte
IQE size
- Added PTP interface fw_ptp_ts to support PTP Frequeny and Offset adjustment.
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
OFLD:
- WR opcode is returned to host in cqe error response.
================================================================================
================================================================================
22.2. T4 Firmware
+++++++++++++++++
Version : 1.15.28.0
Date : 02/29/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where default ingress queue
was ignored.
- Fixed an issue in watchdog which was causing VM bring-up failure after
reboot.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where iWARP was crashing in conjunction with traffic
management.
FOiSCSI:
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP had not been provisioned yet and driver tried to
use given inerface.
ENHANCEMENTS
------------
BASE:
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
================================================================================
Obtained from: Chelsio Communications
MFC after: 2 months
Sponsored by: Chelsio Communications
The iWARP Connection Manager (CM) on FreeBSD creates a TCP socket to
represent an iWARP endpoint when the connection is over TCP. For
servers the current approach is to invoke create_listen callback for
each iWARP RNIC registered with the CM. This doesn't work too well for
INADDR_ANY because a listen on any TCP socket already notifies all
hardware TOEs/RNICs of the new listener. This patch fixes the server
side of things for FreeBSD. We've tried to keep all these modifications
in the iWARP/TCP specific parts of the OFED infrastructure as much as
possible.
Submitted by: Krishnamraju Eraparaju @ Chelsio (with design inputs from Steve Wise)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D4801
a) Look for the CPL in the payload buffer instead of the descriptor.
b) Retrieve the socket associated with the tid with the inpcb lock held.
Submitted by: Krishnamraju Eraparaju @ Chelsio
- Add optimizing LRO wrapper which pre-sorts all incoming packets
according to the hash type and flowid. This prevents exhaustion of
the LRO entries due to too many connections at the same time.
Testing using a larger number of higher bandwidth TCP connections
showed that the incoming ACK packet aggregation rate increased from
~1.3:1 to almost 3:1. Another test showed that for a number of TCP
connections greater than 16 per hardware receive ring, where 8 TCP
connections was the LRO active entry limit, there was a significant
improvement in throughput due to being able to fully aggregate more
than 8 TCP stream. For very few very high bandwidth TCP streams, the
optimizing LRO wrapper will add CPU usage instead of reducing CPU
usage. This is expected. Network drivers which want to use the
optimizing LRO wrapper needs to call "tcp_lro_queue_mbuf()" instead
of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of
"tcp_lro_flush()". Further the LRO control structure must be
initialized using "tcp_lro_init_args()" passing a non-zero number
into the "lro_mbufs" argument.
- Make LRO statistics 64-bit. Previously 32-bit integers were used for
statistics which can be prone to wrap-around. Fix this while at it
and update all SYSCTL's which expose LRO statistics.
- Ensure all data is freed when destroying a LRO control structures,
especially leftover LRO entries.
- Reduce number of memory allocations needed when setting up a LRO
control structure by precomputing the total amount of memory needed.
- Add own memory allocation counter for LRO.
- Bump the FreeBSD version to force recompilation of all KLDs due to
change of the LRO control structure size.
Sponsored by: Mellanox Technologies
Reviewed by: gallatin, sbruno, rrs, gnn, transport
Tested by: Netflix
Differential Revision: https://reviews.freebsd.org/D4914
and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned
up after T/TCP removal. After all permutations over the years the result is
that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum
protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if
timestamps are in action, or is equal to t_maxopd otherwise. That's a very
rough estimate of MSS reduced by options length. Throughout the code it
was used in places, where preciseness was not important, like cwnd or
ssthresh calculations.
With this change:
- t_maxopd goes away.
- t_maxseg now stores MSS not adjusted by options.
- new function tcp_maxseg() is provided, that calculates MSS reduced by
options length. The functions gives a better estimate, since it takes
into account SACK state as well.
Reviewed by: jtl
Differential Revision: https://reviews.freebsd.org/D3593
The fd is closed later in this case. This fixes a "SS_NOFDREF on enter"
panic.
Submitted by: Krishnamraju Eraparaju @ Chelsio
Reviewed by: Steve Wise @ Open Grid Computing
Add if_requestencap() interface method which is capable of calculating
various link headers for given interface. Right now there is support
for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
Other types are planned to support more complex calculation
(L2 multipath lagg nexthops, tunnel encap nexthops, etc..).
Reshape 'struct route' to be able to pass additional data (with is length)
to prepend to mbuf.
These two changes permits routing code to pass pre-calculated nexthop data
(like L2 header for route w/gateway) down to the stack eliminating the
need for other lookups. It also brings us closer to more complex scenarios
like transparently handling MPLS nexthops and tunnel interfaces.
Last, but not least, it removes layering violation introduced by flowtable
code (ro_lle) and simplifies handling of existing if_output consumers.
ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
record. Interface link address change are handled by re-calculating
headers for all lles based on if_lladdr event. After these changes,
arpresolve()/nd6_resolve() returns full pre-calculated header for
supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
compat versions to return link addresses instead of pre-calculated data.
BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
difference is that interface source mac has to be filled by OS for
AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
BPF and not pollute if_output() routines. Convert BPF to pass prepend data
via new 'struct route' mechanism. Note that it does not change
non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
It is not needed for ethernet anymore. The only remaining FDDI user is
dev/pdq mostly untouched since 2007. FDDI support was eliminated from
OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).
Flowtable changes:
Flowtable violates layering by saving (and not correctly managing)
rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
header data from that lle.
Differential Revision: https://reviews.freebsd.org/D4102
cards supported by cxgbe(4).
On the host side this driver interfaces with the storage stack via the
ICL (iSCSI Common Layer) in the kernel. On the wire the traffic is
standard iSCSI (SCSI over TCP as per RFC 3720/7143 etc.) that
interoperates with all other standards compliant implementations. The
driver is layered on top of the TOE driver (t4_tom) and promotes
connections being handled by t4_tom to iSCSI ULP (Upper Layer Protocol)
mode. Hardware assistance in this mode includes:
- Full TCP processing.
- iSCSI PDU identification and recovery within the TCP stream.
- Header and/or data digest insertion (tx) and verification (rx).
- Zero copy (both tx and rx).
Man page will follow in a separate commit in a couple of weeks.
Relnotes: Yes
Sponsored by: Chelsio Communications
Each virtual interface has its own MAC address, queues, and statistics.
The dedicated netmap interfaces (ncxgbeX / ncxlX) were already implemented
as additional VIs on each port. This change allows additional non-netmap
interfaces to be configured on each port. Additional virtual interfaces
use the naming scheme vcxgbeX or vcxlX.
Additional VIs are enabled by setting the hw.cxgbe.num_vis tunable to a
value greater than 1 before loading the cxgbe(4) or cxl(4) driver.
NB: The first VI on each port is the "main" interface (cxgbeX or cxlX).
T4/T5 NICs provide a limited number of MAC addresses for each physical port.
As a result, a maximum of six VIs can be configured on each port (including
the "main" interface and the netmap interface when netmap is enabled).
One user-visible result is that when netmap is enabled, packets received
or transmitted via the netmap interface are no longer counted in the stats
for the "main" interface, but are not accounted to the netmap interface.
The netmap interfaces now also have a new-bus device and export various
information sysctl nodes via dev.n(cxgbe|cxl).X.
The cxgbetool 'clearstats' command clears the stats for all VIs on the
specified port along with the port's stats. There is currently no way to
clear the stats of an individual VI.
Reviewed by: np
MFC after: 1 month
Sponsored by: Chelsio
the TOE for LAN operation. It is possible to set this to other values
(cluster for networks with little loss and really tight RTTs, and wan
for relatively large RTTs and/or lossy networks) depending on the
environment in which the TOE is being used.
None of this affects plain NIC operation in any way.
MFC after: 1 week
attributes when replying to a TLP from a Root Port. As a workaround,
disable No Snoop and Relaxed Ordering in the Root Port of each T5 adapter
during attach so that CPU-initiated requests do not contain these flags.
Note that this affects CPU-initiated requests to all devices under this
root port.
Reviewed by: np
MFC after: 1 week
Sponsored by: Chelsio
- Define the kref structure identical to the one found in Linux.
- Update clients referring inside the kref structure.
- Implement kref_sub() for FreeBSD.
Reviewed by: np @
Sponsored by: Mellanox Technologies
This is roughly the iw_cxgbe equivalent of
be13b2dff8
-----------------
RDMA/cxgb4: Connect_request_upcall fixes
When processing an MPA Start Request, if the listening endpoint is
DEAD, then abort the connection.
If the IWCM returns an error, then we must abort the connection and
release resources. Also abort_connection() should not post a CLOSE
event, so clean that up too.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-----------------
Submitted by: Krishnamraju Eraparaju at chelsio dot com.
changes in the firmwares since 1.11.27.0 are listed here (straight copy-paste
from the "Release Notes.txt" accompanying the Chelsio Unified Wire 2.11.1.0
release on the website).
22.1. T5 Firmware
+++++++++++++++++++++++++++++++++
Version : 1.14.4.0
Date : 08/05/2015
================================================================================
FIXES
-----
BASE:
- Fixes a potential data path hang by properly programming PMTX congestion
threshold settings.
- Fixes a potential initialization error when accessing a configuration file
stored on the flash.
- Fixes a regression where SGE resources can be miss-sized if iWARP is disabled.
ETH:
- Fixes a timing issue that would prevent CR4 links from coming up with some
switches.
FOFCoE:
- Defers fcoe linkdown mailbox command handling till LOGO is sent.
- Updates vlan prio for all outstanding IOs during dcbx update.
ENHANCEMENTS
------------
BASE:
- Adds support for PAUSE OFF watchdog.
- Reports devlog access information in PCIE_FW_PF register 7.
ETH:
- Enhances segmentation offload to include VxLAN and Geneve.
- Adds PTP support.
- Adds new interface to allow the driver to query the VI rss table base
addresses.
- Allows the driver to program the SGE ingrext contxt CongDrop field.
OFLD:
- Adds new interface for the driver to specify offloaded connections TCP snd
and rcv scale factors.
iSCSI:
- Adds support for iscsi segmentatation offload (ISO).
- Adds support for iscsi t10-dif offload.
FOiSCSI:
- Sets FORCE_BIT for cut through processing for FOiSCSI.
FOFCoE:
- Adds support for FCoE BB6.
- Improves WRITE performance.
================================================================================
================================================================================
Version : 1.13.32.0
Date : 03/25/2015
================================================================================
FIXES
-----
BASE:
- Fixes FW_CAPS_CONFIG_CMD return value on error (was positive instead of
negative)
- Fixes FW_PARAMS_PARAM_DEV_FLOWC_BUFFIFO_SZ indication (was wrong on certain
adapter configurations)
- Fixes config file based PL_TIMEOUT register programming
ETH:
- Fixes a potential EO UDP SEG header corruption
- Fixes an issue where 1000Base-X was not enabled correctly when using QSA
modules
OFLD:
- Fixes timeout issue with half-open connections
- Fixes FW_FLOWC_WR processing when state is set to finwait1
FOFCoE:
- Fixes fcoe xchg leaks in linkdown/peer down path
- Fixes cleanup in FCoE linkdown and fixed buf timer flowid abuse
- Fixes fw crash by clearing fcf flowc during bye
FOiSCSI:
- Don't create a new tcp socket if ERL0 attempt has timed out.
ENHANCEMENTS
------------
BASE:
- Adds support for VFs on PFs 4 to 7
- Adds support for QPs/CQs on any physical and virtual function
ETH:
- Stops sending LACP frames on loopback interface
- Adds an AUTOEQU indication to CPL_SGE_EGR_UPDATE
- Adds support for CR4 links (BEAN/AEC on 40G TwinAx cables)
OFLD:
- Improves default settings of LAN and CLUSTER TCP timer settings
- Sends Negative Advice CPLs to software
FOISCSI:
- Adds IPv6 support for foiscsi. Keeps backward compatibility with
old foiscsi drivers which doesn't support ipv6.
FOFCoE:
- Added fcoe debug support in flowc dump
================================================================================
================================================================================
Version : 1.12.25.0
Date : 10/22/2014
================================================================================
FIXES
-----
BASE:
- Improves precision of the Weight Round Robing Traffic Management Algorithm
- Fixes an issue where the link would intermittently fail to come up
- Fixes an issue where adapters with an external PHY couldn't run at 100Mbps
- Fixes an issue where active optical cables were not recognized
- Fixes link advertising issues on T520-BT (speed and pause frames) that would
cause the link to negotiate unexpected settings
- Forces link restart when auto-negotiation is disabled
- Fix an issue where pause frames wouldn't be fully disabled even if requested
ETH:
- Fixes NVGRE Segmentation Offload network header generation.
DCBX:
- Fixes an issue where some settings were not being sent to the switch
correctly
- Fixes an issue where back-to-back DCBX port updates could get overwritten by
FW
- Fixes a firmware crash on DCBX APP information request before link up
FOiSCSI:
- Fixes abort task leak in tmf response handling
- Fixes TCP RST handling while in iSCSI ERL0
- Fixes a firmware crash on BYE without INIT
ENHANCEMENTS
-------------
BASE:
- Adds link partner settings reporting when available
- Adds QSA support (in conjunction with QSA VPD)
- Adds T520-BT LED support
- Reports NOTSUPPORTED for modules with an unhandled identifier
DCBX:
- Adds version reporting (indicating which version FW is trying to negotiate)
- Adds IEEE support
- Reports LLDP time outs
FOiSCSI:
- Add support for multiple iSCSI DDP client
- Sends DHCP renew request when lease expires
================================================================================
22.2. T4 Firmware
+++++++++++++++++
Version : 1.14.4.0
Date : 08/05/2015
================================================================================
FIXES
-----
BASE:
- Fixes a potential initialization error when accessing a configuration file
stored on the flash.
- Initialize PCIE_DBG_INDIR_REQ.Enable to 0, as hardware failed to do so and
register dumps could result in errors.
ETH:
- Fixes an issue that sometimes prevented the link from coming up in CR adapters.
ENHANCEMENTS
------------
BASE:
- Adds support for PAUSE OFF watchdog.
- Reports devlog access information in PCIE_FW_PF register 7.
ETH:
- Adds new interface to allow the driver to query the VI rss table base
addresses.
OFLD:
- Adds new interface for the driver to specify offloaded connections TCP snd
and rcv scale factors.
================================================================================
================================================================================
Version : 1.13.32.0
Date : 03/25/2015
================================================================================
FIXES
-----
BASE:
- Fixes FW_CAPS_CONFIG_CMD return value on error (was positive instead of
negative)
- Fixes FW_PARAMS_PARAM_DEV_FLOWC_BUFFIFO_SZ indication (was wrong on certain
adapter configurations)
- Fixes config file based PL_TIMEOUT register programming
ETH:
- Fixes a potential EO UDP SEG header corruption
OFLD:
- Fixes timeout issue with half-open connections
- Fixes FW_FLOWC_WR processing when state is set to finwait1
FOiSCSI:
- Don't create a new tcp socket if ERL0 attempt has timed out.
ENHANCEMENTS
------------
ETH:
- Stops sending LACP frames on loopback interface
- Adds an AUTOEQU indication to CPL_SGE_EGR_UPDATE
OFLD:
- Improves default settings of LAN and CLUSTER TCP timer settings
- Sends Negative Advice CPLs to software
================================================================================
================================================================================
Version : 1.12.25.0
Date : 10/22/2014
================================================================================
FIXES
-----
BASE:
- Improves precision of the Weight Round Robing Traffic Management Algorithm
- Forces link restart when auto-negotiation is disabled
- Fix an issue where pause frames wouldn't be fully disabled even if requested
DCBX:
- Fixes an issue where some settings were not being sent to the switch
correctly
- Fixes an issue where back-to-back DCBX port updates could get overwritten by
FW
- Fixes a firmware crash on DCBX APP information request before link up
FOiSCSI:
- Fixes abort task leak in tmf response handling
- Fixes TCP RST handling while in iSCSI ERL0
- Fixes a firmware crash on BYE without INIT
ENHANCEMENTS
------------
BASE:
- Adds link partner settings reporting when available
- Firmware now reports NOTSUPPORTED for modules with an unhandled identifier
DCBX:
- Adds version reporting (indicating which version FW is trying to negotiate)
- Adds IEEE support
- Reports LLDP time outs
FOiSCSI:
- Adds support for multiple iSCSI DDP clients
- Sends DHCP renew request when lease expires
================================================================================
Obtained from: Chelsio Communications
MFC after: 2 weeks
Sponsored by: Chelsio Communications
- The existing TCP INP_INFO lock continues to protect the global inpcb list
stability during full list traversal (e.g. tcp_pcblist()).
- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
and free) and inpcb global counters.
It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.
PR: 183659
Differential Revision: https://reviews.freebsd.org/D2599
Reviewed by: jhb, adrian
Tested by: adrian, nitroboost-gmail.com
Sponsored by: Verisign, Inc.
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.
Reviewed by: gnn (previous version)
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3149
save it for later. This enables direct manipulation of the indirection
tables (although the stock driver doesn't do that right now).
MFC after: 1 month
This commit contains large contributions from Giuseppe Lettieri and
Stefano Garzarella, is partly supported by grants from Verisign and Cisco,
and brings in the following:
- fix zerocopy monitor ports and introduce copying monitor ports
(the latter are lower performance but give access to all traffic
in parallel with the application)
- exclusive open mode, useful to implement solutions that recover
from crashes of the main netmap client (suggested by Patrick Kelsey)
- revised memory allocator in preparation for the 'passthrough mode'
(ptnetmap) recently presented at bsdcan. ptnetmap is described in
S. Garzarella, G. Lettieri, L. Rizzo;
Virtual device passthrough for high speed VM networking,
ACM/IEEE ANCS 2015, Oakland (CA) May 2015
http://info.iet.unipi.it/~luigi/research.html
- fix rx CRC handing on ixl
- add module dependencies for netmap when building drivers as modules
- minor simplifications to device-specific routines (*txsync, *rxsync)
- general code cleanup (remove unused variables, introduce macros
to access rings and remove duplicate code,
Applications do not need to be recompiled, unless of course
they want to use the new features (monitors and exclusive open).
Those willing to try this code on stable/10 can just update the
sys/dev/netmap/*, sys/net/netmap* with the version in HEAD
and apply the small patches to individual device drivers.
MFC after: 1 month
Sponsored by: (partly) Verisign, Cisco
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.
Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks
handle_ddp_close() function in t4_ddp.c as the logic is similar
to handle_ddp_data(). This allows all knowledge of the special
DDP mbufs to be private to t4_ddp.c as well.
strings returned to userland include the nulterm byte.
Some uses of sbuf_new_for_sysctl() write binary data rather than strings;
clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in
those cases. (Note that the sbuf code still automatically adds a nulterm
byte in sbuf_finish(), but since it's not included in the length it won't
get copied to userland along with the binary data.)
Remove explicit adding of a nulterm byte in a couple places now that it
gets done automatically by the sbuf drain code.
PR: 195668
netmap rx queues, and the "batchiness" of rx updates sent to the chip.
These knobs will probably become per-rxq in the near future and will be
documented only after their final form is decided.
MFC after: 1 month
It is disabled by default but users can set IFCAP_TXCSUM on the
netmap ifnet (ifconfig ncxl0 txcsum) to override netmap and force
the hardware to calculate and insert proper IP and L4 checksums in
outbound frames.
MFC after: 2 weeks
Highlights:
- Multiple verbs API updates
- Support for RoCE, RDMA over ethernet
All hardware drivers depending on the common infiniband stack has been
updated aswell.
Discussed with: np @
Sponsored by: Mellanox Technologies
MFC after: 1 month
Drivers (ULDs) and the base if_cxgbe driver.
Track the per-adapter activation of ULDs in a new "active_ulds" field.
This was done pretty arbitrarily before this change -- via TOM_INIT_DONE
in adapter->flags for TOM, and the (1 << MAX_NPORTS) bit in
adapter->offload_map for iWARP.
iWARP and hw-accelerated iSCSI rely on the TOE (supported by the TOM
ULD). The rules are:
a) If the iWARP and/or iSCSI ULDs are available when TOE is enabled then
iWARP and/or iSCSI are enabled too.
b) When the iWARP and iSCSI modules are loaded they go looking for
adapters with TOE enabled and enable themselves on that adapter.
c) You cannot deactivate or unload the TOM module from underneath iWARP
or iSCSI. Any such attempt will fail with EBUSY.
MFC after: 2 weeks
a dependency. This ensures "ifconfig cxl<n> ..." does the right thing
even when it's run with no driver loaded.
if_cxl.ko is the tiniest module in /boot/kernel.
MFC after: 2 weeks
the no-longer existant sb_cc sockbuf member.
- Use sbavail() instead of sbused() in t4_soreceive_ddp() to match the
usage in soreceive_stream() on which it is based.
Discussed with: glebius (2)