Commit Graph

1044 Commits

Author SHA1 Message Date
Navdeep Parhar
4a4e9c516c cxgbe(4): Fix an assertion that is not valid during attach.
Firmware access from t4_attach takes place without any synchronization.
The driver should not panic (debug kernels) if something goes wrong in
early communication with the firmware.  It should still load so that
it's possible to poke around with cxgbetool.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-03-05 11:28:18 -08:00
Navdeep Parhar
dfff1de729 cxgbe(4): Read the rx 'c' channel for a port and make it available.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-02-25 23:46:14 -08:00
Navdeep Parhar
0460a45062 cxgbe(4): Use the correct filter width for T5+.
T5 and above have extra bits for the optional filter fields.  This is a
correctness issue and not just a waste because a filter mode valid on a
T4 (36b) may not be valid on a T5+ (40b).

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-02-19 14:23:58 -08:00
Navdeep Parhar
c91dda5ad9 cxgbe(4): Add a driver ioctl to set the filter mask.
Allow the filter mask (aka the hashfilter mode when hashfilters are
in use) to be set any time it is safe to do so.  The requested mask
must be a subset of the filter mode already.  The driver will not change
the mode or ingress config just to support a new mask.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-02-19 14:23:58 -08:00
Navdeep Parhar
7ac8040a99 cxgbe(4): Use firmware commands to get/set filter configuration.
1. Query the firmware for filter mode, mask, and related ingress config
   instead of trying to figure them out from hardware registers.  Read
   configuration from the registers only when the firmware does not
   support this query.

2. Use the firmware to set the filter mode.  This is the correct way to
   do it and is more flexible as well.  The filter mode (and associated
   ingress config) can now be changed any time it is safe to do so.

   The user can specify a subset of a valid mode and the driver will
   enable enough bits to make sure that the mode is maxed out -- that
   is, it is not possible to set another bit without exceeding the
   total width for optional filter fields.  This is a hardware
   requirement that was not enforced by the driver previously.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-02-19 14:23:58 -08:00
Navdeep Parhar
fae028dd97 cxgbe(4): Break up t4_read_chip_settings.
Read the PF-only hardware settings directly in get_params__post_init.
Split the rest into two routines used by both the PF and VF drivers: one
that reads the SGE rx buffer configuration and another that verifies
miscellaneous hardware configuration.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-02-18 01:22:42 -08:00
John Baldwin
1deaad9364 Handle negative return values from syncache_expand().
These errors do not clear so to NULL, so the existing check was
treating these failures as success.  The rest of do_pass_establish()
then tried to use the listen socket as if it was a connection socket
newly created by syncache_expand().

In addition, for negative return values, do not send a RST to the
peer.

Reported by:	Sony Arpita Das @ Chelsio
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D28243
2021-02-17 13:28:04 -08:00
Alexander Motin
294e62bebf cxgbe(4): Save proper zone index on low memory in refill_fl().
When refill_fl() fails to allocate large (9/16KB) mbuf cluster, it
falls back to safe (4KB) ones.  But it still saved into sd->zidx
the original fl->zidx instead of fl->safe_zidx.  It caused problems
with the later use of that cluster, including memory and/or data
corruption.

While there, make refill_fl() to use the safe zone for all following
clusters for the call, since it is unlikely that large succeed.

MFC after:	3 days
Sponsored by:	iXsystems, Inc.
Reviewed by:	np, jhb
Differential Revision:	https://reviews.freebsd.org/D28716
2021-02-16 21:15:28 -05:00
Navdeep Parhar
3447df8bc5 cxgbe(4): Fixes to tx coalescing.
- The behavior implemented in r362905 resulted in delayed transmission
  of packets in some cases, causing performance issues.  Use a different
  heuristic to predict tx requests.

- Add a tunable/sysctl (hw.cxgbe.tx_coalesce) to disable tx coalescing
  entirely.  It can be changed at any time.  There is no change in
  default behavior.
2021-02-01 03:00:09 -08:00
Gleb Smirnoff
3f43ada98c Catch up with 6edfd179c8: mechanically rename IFCAP_NOMAP to IFCAP_MEXTPG.
Originally IFCAP_NOMAP meant that the mbuf has external storage pointer
that points to unmapped address.  Then, this was extended to array of
such pointers.  Then, such mbufs were augmented with header/trailer.
Basically, extended mbufs are extended, and set of features is subject
to change.  The new name should be generic enough to avoid further
renaming.
2021-01-29 11:46:24 -08:00
Mateusz Guzik
6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
John Baldwin
6727847500 Don't try to adjust a TLS TOE socket that has been closed.
The handshake timer can race with another thread sending a FIN or RST
to close a TOE TLS socket.  Just bail from the timer without
rescheduling if the connection is closed when the timer fires.

Reported by:	Sony Arpita Das @ Chelsio QA
Reviewed by:	np
Differential Revision:	https://reviews.freebsd.org/D27583
2020-12-30 09:56:24 -08:00
Toomas Soome
40c4557bee cxgbe: replace zero sized array by flexible array
The issue was found while building cxgbe with gcc 10 (in illumos),
the array subscription check is warning us about outside the bounds
access.

See also: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
2020-12-29 23:09:15 +02:00
John Baldwin
0082e479ef Clear TLS offload mode if a TLS socket hangs without receiving data.
By default, if a TOE TLS socket stops receiving data for more than 5
seconds, revert the connection back to plain TOE mode.  This provides
a fallback if the userland SSL library does not support KTLS.  In
addition, for client TLS 1.3 sockets using connect(), the TOE socket
blocks before the handshake has completed since the socket option is
only invoked for the final handshake.

The timeout defaults to 5 seconds, but can be changed at boot via the
hw.cxgbe.toe.tls_rx_timeout tunable or for an individual interface via
the dev.<nexus>.toe.tls_rx_timeout sysctl.

Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27470
2020-12-03 22:06:08 +00:00
Navdeep Parhar
180c2dca4e cxgbe(4): Fix vertical alignment in sysctl_cpl_stats.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-12-03 22:04:23 +00:00
John Baldwin
99963f5343 Don't transmit mbufs that aren't yet ready on TOE sockets.
This includes mbufs waiting for data from sendfile() I/O requests, or
mbufs awaiting encryption for KTLS.

Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27469
2020-12-03 22:01:13 +00:00
Navdeep Parhar
dbc5c85c66 cxgbe(4): two new debug sysctls.
dev.<nexus>.<instance>.misc.tid_stats
dev.<nexus>.<instance>.misc.tnl_stats

MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-12-03 22:00:41 +00:00
John Baldwin
a42f096821 Clear TLS offload mode for unsupported cipher suites and versions.
If TOE TLS is requested for an unsupported cipher suite or TLS
version, disable TLS processing and fall back to plain TOE.  In
addition, if an error occurs when saving the decryption keys in the
card's memory, disable TLS processing and fall back to plain TOE.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27468
2020-12-03 21:59:47 +00:00
John Baldwin
05d5675520 Fix downgrading of TOE TLS sockets to plain TOE.
If a TOE TLS socket ends up using an unsupported TLS version or
ciphersuite, it must be downgraded to a "plain" TOE socket with TLS
encryption/decryption performed on the host.  The previous
implementation of this fallback was incomplete and resulted in hung
connections.

Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27467
2020-12-03 21:49:20 +00:00
Navdeep Parhar
8eba75ed68 cxgbe(4): Stop but don't free netmap queues when netmap is switched off.
It is common for freelists to be starving when a netmap application
stops.  Mailbox commands to free queues can hang in such a situation.
Avoid that by not freeing the queues when netmap is switched off.
Instead, use an alternate method to stop the queues without releasing
the context ids.  If netmap is enabled again later then the same queue
is reinitialized for use.  Move alloc_nm_rxq and txq to t4_netmap.c
while here.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-12-03 08:30:29 +00:00
Navdeep Parhar
f42f3b2955 cxgbe(4): Revert r367917.
r367917 fixed the backpressure on the netmap rxq being stopped but that
doesn't help if some other netmap rxq is starved (because it is stopping
too although the driver doesn't know this yet) and blocks the pipeline.
An alternate fix that works in all cases will be checked in instead.

Sponsored by:	Chelsio Communications
2020-12-02 20:54:03 +00:00
Navdeep Parhar
b3718e2d7e cxgbe(4): Catch up with in-flight netmap rx before destroying queues.
The netmap application using the driver is responsible for replenishing
the receive freelists and they may be totally depleted when the
application exits.  Packets in flight, if any, might block the pipeline
in case there aren't enough buffers left in the freelist.  Avoid this by
filling up the freelists with a driver allocated buffer.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-11-21 03:27:32 +00:00
Navdeep Parhar
bdabd00d65 cxgbe/t4_tom: Handle VXLAN-encapsulated SYNs correctly.
TCP SYNs in inner traffic will hit hardware listeners when VXLAN/NVGRE
rx parsing is enabled in the chip.  t4_tom should pass on these SYNs to
the kernel and let it deal with them as if they arrived on the non-TOE
path.

Reported by:	Sony at Chelsio
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-11-12 20:02:48 +00:00
Navdeep Parhar
f14d7c9516 cxgbev(4): Make sure that the iq/eq map sizes are correct for VFs.
This should have been part of r366929.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-11-12 01:18:05 +00:00
John Baldwin
b3ceca0c80 Clear tp->tod in t4_pcb_detach().
Otherwise, a socket can have a non-NULL tp->tod while TF_TOE is clear.
In particular, if a newly accepted socket falls back to non-TOE due to
an active open failure, the non-TOE socket will still have tp->tod set
even though TF_TOE is clear.

Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27028
2020-11-10 19:54:39 +00:00
Navdeep Parhar
de0a3472d8 cxgbe(4): Allow the PF driver to set a VF's MAC address.
The MAC address can be set with the optional mac-addr property in the VF
section of the iovctl.conf(5) used to instantiate the VFs.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2020-11-09 00:08:35 +00:00
Navdeep Parhar
dc0800a9ad cxgbev(4): Use the MAC address set by the the PF if there is one.
Query the firmware for the MAC address set by the PF for the VF and use
it instead of the firmware generated MAC if it's available.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2020-11-09 00:01:13 +00:00
Navdeep Parhar
76b976ad98 cxgbe(4): Add the firmware binaries missing in r367428.
Obtained from:	Chelsio Communications
MFC after:	5 days
Sponsored by:	Chelsio Communications
2020-11-08 22:30:13 +00:00
Navdeep Parhar
890efa1ab9 cxgbe(4): Update firmwares to 1.25.0.40.
This fixes a potential crash in firmware 1.25.0.0 on the passive open
side during TOE operation.

Obtained from:	Chelsio Communications
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-11-06 19:04:20 +00:00
Mark Johnston
f7db0c9532 vmspace: Convert to refcount(9)
This is mostly mechanical except for vmspace_exit().  There, use the new
refcount_release_if_last() to avoid switching to vmspace0 unless other
processes are sharing the vmspace.  In that case, upon switching to
vmspace0 we can unconditionally release the reference.

Remove the volatile qualifier from vm_refcnt now that accesses are
protected using refcount(9) KPIs.

Reviewed by:	alc, kib, mmel
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27057
2020-11-04 16:30:56 +00:00
Navdeep Parhar
e2e43aafd7 cxgbe(4): Fix min/max typo in r366958. 2020-10-23 02:24:43 +00:00
Navdeep Parhar
b8b01d9be8 cxgbe(4): refine the values reported in if_ratelimit_query.
- Get the number of classes from chip_params.
- Get the number of ethofld tids from the firmware.
- Do not let tcp_ratelimit allocate all traffic classes.

Sponsored by:	Chelsio Communications
2020-10-23 01:36:54 +00:00
John Baldwin
8a82be5044 Handle CPL_RX_DATA on active TLS sockets.
In certain edge cases, the NIC might have only received a partial TLS
record which it needs to return to the driver.  For example, if the
local socket was closed while data was still in flight, a partial TLS
record might be pending when the connection is closed.  Receiving a
RST in the middle of a TLS record is another example.  When this
happens, the firmware returns the the partial TLS record as plain TCP
data via CPL_RX_DATA.  Handle these requests by returning an error to
OpenSSL (via so_error for KTLS or via an error TLS record header for
the older Chelsio OpenSSL interface).

Reported by:	Sony Arpita Das @ Chelsio
Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	Revision: https://reviews.freebsd.org/D26800
2020-10-23 00:23:54 +00:00
Navdeep Parhar
b20b25e744 cxgbe(4): fix the size of the iq/eq maps.
The firmware can allocate ingress and egress context ids anywhere from
its configured range.  Size the iq/eq maps to match the entire range
instead of assuming that the firmware always allocates the first
available context id.

Reported by:	Baptiste Wicht @ Verisign
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-22 08:40:25 +00:00
Navdeep Parhar
37d411338e cxgbe(4): display correct tid range for T6 based -SO cards.
Reported by:	Chelsio QA
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-21 20:42:29 +00:00
Navdeep Parhar
ae5da4e14d cxgbe(4): Updates to the drop features from r366532.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-19 21:11:49 +00:00
John Baldwin
6b7ecdcd9d Re-enable receive flow control for TOE TLS sockets.
Flow control was disabled during initial TOE TLS development to
workaround a hang (and to match the Linux TOE TLS support for T6).
The rest of the TOE TLS code maintained credits as if flow control was
enabled which was inherited from before the workaround was added with
the exception that the receive window was allowed to go negative.
This negative receive window handling (rcv_over) was because I hadn't
realized the full implications of disabling flow control.

To clean this up, re-enable flow control on TOE TLS sockets.  The
existing TPF_FORCE_CREDITS workaround is sufficient for the original
hang.  Now that flow control is enabled, remove the rcv_over
workaround and instead assert that the receive window never goes
negative matching plain TCP TOE sockets.

Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D26799
2020-10-19 20:08:50 +00:00
Navdeep Parhar
3f3e04a062 cxgbe(4): Fix page fault in t4_get_lb_stats with 2 port T5 cards.
PR:		250449
Reported by:	freqlabs@
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-19 20:08:47 +00:00
Navdeep Parhar
472d183268 cxgbe(4): Do not request FEC when requesting speeds that don't have FEC.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-14 10:12:39 +00:00
Navdeep Parhar
6cc4520b0a cxgbe(4): unimplemented cudbg routines should return the correct
internal error code and not an errno.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-14 08:04:39 +00:00
Navdeep Parhar
31deb3cc76 cxgbe(4): More fixes for the T6 FCS error counter.
r365732 was the first attempt to get an accurate count but it was
writing to some read-only registers to clear them and that obviously
didn't work.  Instead, note the counter's value when it is supposed to
be cleared and subtract it from future readings.

dev.<port>.stats.rx_fcs_error should not be serviced from the MPS
register for T6.

The stats.* sysctls should all use T5_PORT_REG for T5 and above.  This
must have been missed in the initial T5 support years ago.  Fix it while
here.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-10-09 22:23:39 +00:00
Navdeep Parhar
77af2b2c85 cxgbe(4): knobs to drop various kinds of undesirable frames on ingress.
These kind of drops come for free in the sense that they do not use the
filter TCAM or any other resource that wouldn't normally be used during
rx.  Frames dropped by the hardware get counted in the MAC's rx stats
but are not delivered to the driver.

hw.cxgbe.attack_filter
Set to 1 to enable the "attack filter".  Default is 0.  The attack
filter will drop an incoming frame if any of these conditions is true:
src ip/ip6 == dst ip/ip6; tcp and src/dst ip is not unicast; src/dst ip
is loopback (127.x.y.z); src ip6 is not unicast; src/dst ip6 is loopback
(::1/128) or unspecified (::/128); tcp and src/dst ip6 is mcast
(ff00::/8).

hw.cxgbe.drop_ip_fragments
Set to 1 to drop all incoming IP fragments.  Default is 0.  Note that
this drops valid frames.

hw.cxgbe.drop_pkts_with_l2_errors
Set to 1 to drop incoming frames with Layer 2 length or checksum errors.
Default is 1.

hw.cxgbe.drop_pkts_with_l3_errors
Set to 1 to drop incoming frames with IP version, length, or checksum
errors.  Default is 0.

hw.cxgbe.drop_pkts_with_l4_errors
Set to 1 to drop incoming frames with Layer 4 length, checksum, or other
errors.  Default is 0.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2020-10-08 10:00:13 +00:00
John Baldwin
56fb710f1b Store the send tag type in the common send tag header.
Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with
their own identical headers that stored the type that the
type-specific tag structures inherited from, so in practice it seems
drivers need this in the tag anyway.  This permits removing these
extra header indirections (struct cxgbe_snd_tag and struct
mlx5e_snd_tag).

In addition, this permits driver-independent code to query the type of
a tag, e.g. to know what type of tag is being queried via
if_snd_query.

Reviewed by:	gallatin, hselasky, np, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26689
2020-10-06 17:58:56 +00:00
Navdeep Parhar
8741306b3b cxgbe(4) sysctls do not need Giant.
Sponsored by:	Chelsio Communications
2020-10-05 22:18:04 +00:00
Navdeep Parhar
73f6606b47 cxgbe(4): set up the firmware flowc for the tid before send_abort_rpl.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-10-02 23:48:57 +00:00
Navdeep Parhar
7676c62aa3 cxgbe(4): validate largest_rx_cluster and safest_rx_cluster.
These tunables can only be set to a valid cluster size (2K, 4K, 9K, or
16K) as documented in the man page.  Anything else could lead to a
panic on interface up.

Reported by:	mav@
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-02 05:59:55 +00:00
John Baldwin
0e99339684 Fallback to software for more GCM and CCM requests.
ccr(4) uses software to handle GCM and CCM requests not supported by
the crypto engine (e.g. with only AAD and no payload).  This change
adds a fallback for a few more requests such as those with more SGL
entries than can fit in a work request (this can happen for GCM when
decrypting a TLS record split across 15 or more packets).

Reported by:	Chelsio QA
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D26582
2020-09-29 21:51:32 +00:00
Navdeep Parhar
822967e7e5 cxgbe(4): Avoid unnecessary work in the firmware during netmap tx.
Bind the netmap tx queues to a special '0xff' scheduling class which
makes the firmware skip some processing related to rate limiting on the
outgoing traffic.  Future firmwares will do this automatically.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-09-29 09:25:52 +00:00
Navdeep Parhar
7efe256233 Remove duplicate line. 2020-09-29 09:11:51 +00:00
Navdeep Parhar
15ca0766ed cxgbe(4): adjust the doorbell threshold for netmap freelists to match the
maximum burst size used when fetching descriptors from the list.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-09-29 07:51:06 +00:00