Commit Graph

201 Commits

Author SHA1 Message Date
John Baldwin
4b6ed0758d cxgbe: Make the TOE ISCSI RX stats per-queue instead of per adapter.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29903
2021-05-14 12:16:33 -07:00
Navdeep Parhar
83b5cda106 cxgbe(4): Add support for NIC suspend/resume and live reset.
Add suspend/resume callbacks to the driver and a live reset built around
them.  This commit covers the basic NIC and future commits will expand
this functionality to other stateful parts of the chip.  Suspend and
resume operate on the chip (the t?nex nexus device) and affect all its
ports.  It is not possible to suspend/resume or reset individual ports.
All these operations can be performed on a running NIC.  A reset will
look like a link bounce to the networking stack.

Here are some ways to exercise this functionality:

 /* Manual suspend and resume. */
 # devctl suspend t6nex0
 # devctl resume t6nex0

 /* Manual reset. */
 # devctl reset t6nex0

 /* Manual reset with driver sysctl. */
 # sysctl dev.t6nex.0.reset=1

 /* Automatic adapter reset on any fatal error. */
 # hw.cxgbe.reset_on_fatal_err=1

Suspend disables the adapter (DMA, interrupts, and the port PHYs) and
marks the hardware as unavailable to the driver.  All ifnets associated
with the adapter are still visible to the kernel but operations that
require hardware interaction will fail with ENXIO.  All ifnets report
link-down while the adapter is suspended.

Resume will reattach to the card, reconfigure it as before, and recreate
the queues servicing the existing ifnets.  The ifnets are able to send
and receive traffic as soon as the link comes back up.

Reset is roughly the same as a suspend and a resume with at least one of
these events in between: D0->D3Hot->D0, FLR, PCIe link retrain.

MFC after:	1 month
Relnotes:	yes
Sponsored by:	Chelsio Communications
2021-04-27 22:48:51 -07:00
Navdeep Parhar
43bbae1948 cxgbe(4): Separate the sw- and hw-specific parts of resource allocations
The driver uses both software resources (locks, callouts, memory for
descriptors and for bookkeeping, sysctls, etc.) and hardware resources
(VIs, DMA queues, TCAM entries, etc.) to operate the NIC.  This commit
splits the single *_ALLOCATED flag used to track all these resources
into separate *_SW_ALLOCATED and *_HW_ALLOCATED flags.

This is the simplified pseudocode that now applies to most queues (foo
can be ctrlq/txq/rxq/ofld_txq/ofld_rxq):

/* Idempotent */
alloc_foo
{
	if (!SW_ALLOCATED)
		init_iq/init_eq/init_fl		no-fail sw init
		alloc_iq_fl/alloc_eq/alloc_wrq	may-fail sw alloc
		add_foo_sysctls, etc.		no-fail post-alloc items
	if (!HW_ALLOCATED)
		alloc_iq_fl_hwq/alloc_eq_hwq	hw resource allocation
}

/* Idempotent */
free_foo
{
	if (!HW_ALLOCATED)
		free_iq_fl_hwq/free_eq_hwq	release hw resources
	if (!SW_ALLOCATED)
		free_iq_fl/free_eq/free_wrq	release sw resources
}

The routines that take the driver to FULL_INIT_DONE and VI_INIT_DONE and
back are now all idempotent.  The quiesce routines pay attention to the
HW_ALLOCATED flag and will not wait on the hardware for pidx/cidx
updates and other completions if this flag is not set.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2021-04-26 14:09:59 -07:00
Navdeep Parhar
b47b28e5b2 cxgbe(4): Add flag to reliably stop the driver from accessing hw stats.
There are two kinds of routines in the driver that read statistics from
the hardware: the cxgbe_* variants read the per-port MPS/MAC registers
and the vi_* variants read the per-VI registers.  They can be called
from the 1Hz callout or if_get_counter.  All stats collection now takes
place under the callout lock and there is a new flag to indicate that
these routines should not access any hardware register.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-04-22 17:45:52 -07:00
John Baldwin
568e69e4eb cxgbe: Add counters for iSCSI PDUs transmitted via TOE.
Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29297
2021-04-12 13:57:45 -07:00
Navdeep Parhar
516fe911a6 cxgbe(4): Always use the per-VI callout to read interface stats.
There is no change in the source of the stats (t4_get_port_stats or
t4_get_vi_stats) but the per-port callout is gone.

Sponsored by:	Chelsio Communications
Reviewed by:	jhb@
Differential Revision:	https://reviews.freebsd.org/D29527
2021-04-01 14:24:29 -07:00
John Baldwin
fe496dc02a cxgbe: Make the TOE TLS stats per-queue instead of per-port.
This avoids some atomics by using counter_u64 for TX and relying on
existing single-threading (single ithread per rxq) for RX.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29383
2021-03-26 15:19:58 -07:00
John Baldwin
077ba6a845 cxgbe: Add a struct sge_ofld_txq type.
This type mirrors struct sge_ofld_rxq and holds state for TCP offload
transmit queues.  Currently it only holds a work queue but will
include additional state in future changes.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29382
2021-03-26 15:19:58 -07:00
Navdeep Parhar
15f3355567 cxgbe(4): Allow a T6 adapter to switch between TOE and NIC TLS mode.
The hw.cxgbe.kern_tls tunable was used for this in the past and if it
was set then all T6 adapters would be configured for NIC TLS operation
and could not be reconfigured for TOE without a reload.  With this
change ifconfig can be used to manipulate toe and txtls caps like any
other caps.  hw.cxgbe.kern_tls continues to work as usual but its
effects are not permanent any more.

* Enable nic_ktls_ofld in the default configuration file and use the
  firmware instead of direct register manipulation to apply/rollback
  NIC TLS configuration.  This allows the driver to switch the hardware
  between TOE and NIC TLS mode in a safe manner.  Note that the
  configuration is adapter-wide and not per-port.

* Remove the kern_tls config file as it works with 100G T6 cards only
  and leads to firmware crashes with 25G cards.  The configurations
  included with the driver (with the exception of the FPGA configs) are
  supposed to work with all adapters.

Reported by:	Veeresh U.K. at Chelsio
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Reviewed by:	jhb@
Differential Revision: https://reviews.freebsd.org/D29291
2021-03-25 12:39:41 -07:00
Navdeep Parhar
473f6163e3 cxgbe(4): use standard sysctl routines to deal with 16b values.
These routines to handle 8b and 16b types were added in r289773 5+ years
ago.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-03-19 10:56:24 -07:00
Navdeep Parhar
765d623d60 cxgbe(4): Remove extra blank line.
No functional change.
2021-03-05 12:48:39 -08:00
Navdeep Parhar
dfff1de729 cxgbe(4): Read the rx 'c' channel for a port and make it available.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-02-25 23:46:14 -08:00
Navdeep Parhar
c91dda5ad9 cxgbe(4): Add a driver ioctl to set the filter mask.
Allow the filter mask (aka the hashfilter mode when hashfilters are
in use) to be set any time it is safe to do so.  The requested mask
must be a subset of the filter mode already.  The driver will not change
the mode or ingress config just to support a new mask.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-02-19 14:23:58 -08:00
Navdeep Parhar
fae028dd97 cxgbe(4): Break up t4_read_chip_settings.
Read the PF-only hardware settings directly in get_params__post_init.
Split the rest into two routines used by both the PF and VF drivers: one
that reads the SGE rx buffer configuration and another that verifies
miscellaneous hardware configuration.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-02-18 01:22:42 -08:00
Navdeep Parhar
3447df8bc5 cxgbe(4): Fixes to tx coalescing.
- The behavior implemented in r362905 resulted in delayed transmission
  of packets in some cases, causing performance issues.  Use a different
  heuristic to predict tx requests.

- Add a tunable/sysctl (hw.cxgbe.tx_coalesce) to disable tx coalescing
  entirely.  It can be changed at any time.  There is no change in
  default behavior.
2021-02-01 03:00:09 -08:00
Navdeep Parhar
8eba75ed68 cxgbe(4): Stop but don't free netmap queues when netmap is switched off.
It is common for freelists to be starving when a netmap application
stops.  Mailbox commands to free queues can hang in such a situation.
Avoid that by not freeing the queues when netmap is switched off.
Instead, use an alternate method to stop the queues without releasing
the context ids.  If netmap is enabled again later then the same queue
is reinitialized for use.  Move alloc_nm_rxq and txq to t4_netmap.c
while here.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-12-03 08:30:29 +00:00
Navdeep Parhar
f42f3b2955 cxgbe(4): Revert r367917.
r367917 fixed the backpressure on the netmap rxq being stopped but that
doesn't help if some other netmap rxq is starved (because it is stopping
too although the driver doesn't know this yet) and blocks the pipeline.
An alternate fix that works in all cases will be checked in instead.

Sponsored by:	Chelsio Communications
2020-12-02 20:54:03 +00:00
Navdeep Parhar
b3718e2d7e cxgbe(4): Catch up with in-flight netmap rx before destroying queues.
The netmap application using the driver is responsible for replenishing
the receive freelists and they may be totally depleted when the
application exits.  Packets in flight, if any, might block the pipeline
in case there aren't enough buffers left in the freelist.  Avoid this by
filling up the freelists with a driver allocated buffer.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-11-21 03:27:32 +00:00
Navdeep Parhar
b20b25e744 cxgbe(4): fix the size of the iq/eq maps.
The firmware can allocate ingress and egress context ids anywhere from
its configured range.  Size the iq/eq maps to match the entire range
instead of assuming that the firmware always allocates the first
available context id.

Reported by:	Baptiste Wicht @ Verisign
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-22 08:40:25 +00:00
Navdeep Parhar
31deb3cc76 cxgbe(4): More fixes for the T6 FCS error counter.
r365732 was the first attempt to get an accurate count but it was
writing to some read-only registers to clear them and that obviously
didn't work.  Instead, note the counter's value when it is supposed to
be cleared and subtract it from future readings.

dev.<port>.stats.rx_fcs_error should not be serviced from the MPS
register for T6.

The stats.* sysctls should all use T5_PORT_REG for T5 and above.  This
must have been missed in the initial T5 support years ago.  Fix it while
here.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-10-09 22:23:39 +00:00
John Baldwin
56fb710f1b Store the send tag type in the common send tag header.
Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with
their own identical headers that stored the type that the
type-specific tag structures inherited from, so in practice it seems
drivers need this in the tag anyway.  This permits removing these
extra header indirections (struct cxgbe_snd_tag and struct
mlx5e_snd_tag).

In addition, this permits driver-independent code to query the type of
a tag, e.g. to know what type of tag is being queried via
if_snd_query.

Reviewed by:	gallatin, hselasky, np, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26689
2020-10-06 17:58:56 +00:00
Navdeep Parhar
15ca0766ed cxgbe(4): adjust the doorbell threshold for netmap freelists to match the
maximum burst size used when fetching descriptors from the list.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-09-29 07:51:06 +00:00
Navdeep Parhar
30e3f2b4ea cxgbe(4): let the PF driver use VM work requests for transmit.
This allows the PF interfaces to communicate with the VF interfaces over
the internal switch in the ASIC.  Fix the GL limits for VM work requests
while here.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-09-22 04:16:40 +00:00
Navdeep Parhar
a4a4ad2dd9 cxgbe(4): add support for stateless offloads for VXLAN traffic.
Hardware assistance includes checksumming (tx and rx), TSO, and RSS on
the inner traffic in a VXLAN tunnel.

Relnotes:	Yes
Sponsored by:	Chelsio Communications
2020-09-18 03:01:47 +00:00
Navdeep Parhar
d735920d33 cxgbe(4): changes in the Tx path to help increase tx coalescing.
- Ask the firmware for the number of frames that can be stuffed in one
  work request.

- Modify mp_ring to increase the likelihood of tx coalescing when there
  are just one or two threads that are doing most of the tx.  Add teeth
  to the abdication mechanism by pushing the consumer lock into mp_ring.
  This reduces the likelihood that a consumer will get stuck with all
  the work even though it is above its budget.

- Add support for coalesced tx WR to the VF driver.  This, with the
  changes above, results in a 7x improvement in the tx pps of the VF
  driver for some common cases.  The firmware vets the L2 headers
  submitted by the VF driver and it's a big win if the checks are
  performed for a batch of packets and not each one individually.

Reviewed by:	jhb@
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25454
2020-07-03 04:44:23 +00:00
Navdeep Parhar
7c228be30b cxgbe(4): Add a pointer to the adapter softc in vi_info.
There were quite a few places where port_info was being accessed only to
get to the adapter.

Reviewed by:	jhb@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25432
2020-06-25 17:04:22 +00:00
Navdeep Parhar
0cadedfc46 cxgbe(4): Add a tx_len16_to_desc helper.
No functional change.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-06-23 07:33:29 +00:00
Navdeep Parhar
b0dede77b1 cxgbe/iw_cxgbe: Add an async callback to notify iw_cxgbe in case of a
fatal error.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2020-05-19 16:28:20 +00:00
John Baldwin
94fad5ffc6 Use both crypto engines on a T6.
A T6 adapter contains two crypto engines on separate channels.  This
commit distributes sessions between the two engines.  Previously, only
the first engine was used.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24347
2020-04-10 22:27:45 +00:00
John Baldwin
c034143269 Refactor driver and consumer interfaces for OCF (in-kernel crypto).
- The linked list of cryptoini structures used in session
  initialization is replaced with a new flat structure: struct
  crypto_session_params.  This session includes a new mode to define
  how the other fields should be interpreted.  Available modes
  include:

  - COMPRESS (for compression/decompression)
  - CIPHER (for simply encryption/decryption)
  - DIGEST (computing and verifying digests)
  - AEAD (combined auth and encryption such as AES-GCM and AES-CCM)
  - ETA (combined auth and encryption using encrypt-then-authenticate)

  Additional modes could be added in the future (e.g. if we wanted to
  support TLS MtE for AES-CBC in the kernel we could add a new mode
  for that.  TLS modes might also affect how AAD is interpreted, etc.)

  The flat structure also includes the key lengths and algorithms as
  before.  However, code doesn't have to walk the linked list and
  switch on the algorithm to determine which key is the auth key vs
  encryption key.  The 'csp_auth_*' fields are always used for auth
  keys and settings and 'csp_cipher_*' for cipher.  (Compression
  algorithms are stored in csp_cipher_alg.)

- Drivers no longer register a list of supported algorithms.  This
  doesn't quite work when you factor in modes (e.g. a driver might
  support both AES-CBC and SHA2-256-HMAC separately but not combined
  for ETA).  Instead, a new 'crypto_probesession' method has been
  added to the kobj interface for symmteric crypto drivers.  This
  method returns a negative value on success (similar to how
  device_probe works) and the crypto framework uses this value to pick
  the "best" driver.  There are three constants for hardware
  (e.g. ccr), accelerated software (e.g. aesni), and plain software
  (cryptosoft) that give preference in that order.  One effect of this
  is that if you request only hardware when creating a new session,
  you will no longer get a session using accelerated software.
  Another effect is that the default setting to disallow software
  crypto via /dev/crypto now disables accelerated software.

  Once a driver is chosen, 'crypto_newsession' is invoked as before.

- Crypto operations are now solely described by the flat 'cryptop'
  structure.  The linked list of descriptors has been removed.

  A separate enum has been added to describe the type of data buffer
  in use instead of using CRYPTO_F_* flags to make it easier to add
  more types in the future if needed (e.g. wired userspace buffers for
  zero-copy).  It will also make it easier to re-introduce separate
  input and output buffers (in-kernel TLS would benefit from this).

  Try to make the flags related to IV handling less insane:

  - CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv'
    member of the operation structure.  If this flag is not set, the
    IV is stored in the data buffer at the 'crp_iv_start' offset.

  - CRYPTO_F_IV_GENERATE means that a random IV should be generated
    and stored into the data buffer.  This cannot be used with
    CRYPTO_F_IV_SEPARATE.

  If a consumer wants to deal with explicit vs implicit IVs, etc. it
  can always generate the IV however it needs and store partial IVs in
  the buffer and the full IV/nonce in crp_iv and set
  CRYPTO_F_IV_SEPARATE.

  The layout of the buffer is now described via fields in cryptop.
  crp_aad_start and crp_aad_length define the boundaries of any AAD.
  Previously with GCM and CCM you defined an auth crd with this range,
  but for ETA your auth crd had to span both the AAD and plaintext
  (and they had to be adjacent).

  crp_payload_start and crp_payload_length define the boundaries of
  the plaintext/ciphertext.  Modes that only do a single operation
  (COMPRESS, CIPHER, DIGEST) should only use this region and leave the
  AAD region empty.

  If a digest is present (or should be generated), it's starting
  location is marked by crp_digest_start.

  Instead of using the CRD_F_ENCRYPT flag to determine the direction
  of the operation, cryptop now includes an 'op' field defining the
  operation to perform.  For digests I've added a new VERIFY digest
  mode which assumes a digest is present in the input and fails the
  request with EBADMSG if it doesn't match the internally-computed
  digest.  GCM and CCM already assumed this, and the new AEAD mode
  requires this for decryption.  The new ETA mode now also requires
  this for decryption, so IPsec and GELI no longer do their own
  authentication verification.  Simple DIGEST operations can also do
  this, though there are no in-tree consumers.

  To eventually support some refcounting to close races, the session
  cookie is now passed to crypto_getop() and clients should no longer
  set crp_sesssion directly.

- Assymteric crypto operation structures should be allocated via
  crypto_getkreq() and freed via crypto_freekreq().  This permits the
  crypto layer to track open asym requests and close races with a
  driver trying to unregister while asym requests are in flight.

- crypto_copyback, crypto_copydata, crypto_apply, and
  crypto_contiguous_subsegment now accept the 'crp' object as the
  first parameter instead of individual members.  This makes it easier
  to deal with different buffer types in the future as well as
  separate input and output buffers.  It's also simpler for driver
  writers to use.

- bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer.
  This understands the various types of buffers so that drivers that
  use DMA do not have to be aware of different buffer types.

- Helper routines now exist to build an auth context for HMAC IPAD
  and OPAD.  This reduces some duplicated work among drivers.

- Key buffers are now treated as const throughout the framework and in
  device drivers.  However, session key buffers provided when a session
  is created are expected to remain alive for the duration of the
  session.

- GCM and CCM sessions now only specify a cipher algorithm and a cipher
  key.  The redundant auth information is not needed or used.

- For cryptosoft, split up the code a bit such that the 'process'
  callback now invokes a function pointer in the session.  This
  function pointer is set based on the mode (in effect) though it
  simplifies a few edge cases that would otherwise be in the switch in
  'process'.

  It does split up GCM vs CCM which I think is more readable even if there
  is some duplication.

- I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC
  as an auth algorithm and updated cryptocheck to work with it.

- Combined cipher and auth sessions via /dev/crypto now always use ETA
  mode.  The COP_F_CIPHER_FIRST flag is now a no-op that is ignored.
  This was actually documented as being true in crypto(4) before, but
  the code had not implemented this before I added the CIPHER_FIRST
  flag.

- I have not yet updated /dev/crypto to be aware of explicit modes for
  sessions.  I will probably do that at some point in the future as well
  as teach it about IV/nonce and tag lengths for AEAD so we can support
  all of the NIST KAT tests for GCM and CCM.

- I've split up the exising crypto.9 manpage into several pages
  of which many are written from scratch.

- I have converted all drivers and consumers in the tree and verified
  that they compile, but I have not tested all of them.  I have tested
  the following drivers:

  - cryptosoft
  - aesni (AES only)
  - blake2
  - ccr

  and the following consumers:

  - cryptodev
  - IPsec
  - ktls_ocf
  - GELI (lightly)

  I have not tested the following:

  - ccp
  - aesni with sha
  - hifn
  - kgssapi_krb5
  - ubsec
  - padlock
  - safe
  - armv8_crypto (aarch64)
  - glxsb (i386)
  - sec (ppc)
  - cesa (armv7)
  - cryptocteon (mips64)
  - nlmsec (mips64)

Discussed with:	cem
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D23677
2020-03-27 18:25:23 +00:00
Navdeep Parhar
aa301e5ffe cxgbe(4): Split sge_nm_rxq into three cachelines.
This reduces the lines bouncing around between the driver rx ithread and
the netmap rxsync thread.  There is no net change in the size of the
struct (it continues to waste a lot of space).

This kind of split was originally proposed in D17869 by Marc De La
Gueronniere @ Verisign, Inc.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-03-20 05:12:16 +00:00
John Baldwin
6d44e8e6b5 Rename TOE TLS stats from [rt]x_tls_* to [rt]x_toe_tls_*.
This more clearly differentiates TLS records encrypted and decrypted
in TOE connections from those encrypted via NIC TLS.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-02-28 00:42:27 +00:00
John Baldwin
ca3b3c573e Remove the per-TXQ tls_wrs stat.
It duplicated the kern_tls_records stat and was not conditional on NIC
TLS being enabled.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D23670
2020-02-13 22:55:45 +00:00
Navdeep Parhar
87bbb3338e cxgbe(4): Add pfil(9) hooks to the driver's rx.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-02-04 01:09:02 +00:00
Navdeep Parhar
46e1e307ed cxgbe(4): Retire the allow_mbufs_in_cluster optimization.
This simplifies the driver's rx fast path as well as the bookkeeping
code that tracks various rx buffer sizes and layouts.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-02-04 00:51:10 +00:00
Navdeep Parhar
d6f79b2710 cxgbe(4): Avoid ext_arg2 in rxb_free.
ext_arg2 is the only item in the third cacheline in an mbuf and could be
cold by the time rxb_free runs.  Put the information needed by rxb_free
in the same line as the refcount, which is very likely to be hot given
that rxb_free runs when the refcount is decremented and reaches 0.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-02-03 23:50:29 +00:00
Navdeep Parhar
a9c4062a9a cxgbe(4): Initialize the rx buffer's metadata on first-use and not on
allocation.

refill_fl doesn't touch any part of a freshly allocated cluster after
this change.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-02-03 23:25:12 +00:00
Navdeep Parhar
aa7bdbc00c cxgbe(4): Use TX_PKTS2 work requests in netmap Tx if it's available.
TX_PKTS2 is more efficient within the firmware and this improves netmap
Tx by a few Mpps in some common scenarios.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-12-10 08:16:19 +00:00
Navdeep Parhar
515a40d5d9 cxgbe(4): sysctl to reset the temperature/voltage sensor.
# sysctl dev.<nexus>.<inst>.reset_sensor=1
# sysctl dev.t6nex.0.reset_sensor=1

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-11-24 16:40:54 +00:00
John Baldwin
bddf73433e NIC KTLS for Chelsio T6 adapters.
This adds support for ifnet (NIC) KTLS using Chelsio T6 adapters.
Unlike the TOE-based KTLS in r353328, NIC TLS works with non-TOE
connections.

NIC KTLS on T6 is not able to use the normal TSO (LSO) path to segment
the encrypted TLS frames output by the crypto engine.  Instead, the
TOE is placed into a special setup to permit "dummy" connections to be
associated with regular sockets using KTLS.  This permits using the
TOE to segment the encrypted TLS records.  However, this approach does
have some limitations:

1) Regular TOE sockets cannot be used when the TOE is in this special
   mode.  One can use either TOE and TOE-based KTLS or NIC KTLS, but
   not both at the same time.

2) In NIC KTLS mode, the TOE is only able to accept a per-connection
   timestamp offset that varies in the upper 4 bits.  Put another way,
   only connections whose timestamp offset has the 28 lower bits
   cleared can use NIC KTLS and generate correct timestamps.  The
   driver will refuse to enable NIC KTLS on connections with a
   timestamp offset with any of the lower 28 bits set.  To use NIC
   KTLS, users can either disable TCP timestamps by setting the
   net.inet.tcp.rfc1323 sysctl to 0, or apply a local patch to the
   tcp_new_ts_offset() function to clear the lower 28 bits of the
   generated offset.

3) Because the TCP segmentation relies on fields mirrored in a TCB in
   the TOE, not all fields in a TCP packet can be sent in the TCP
   segments generated from a TLS record.  Specifically, for packets
   containing TCP options other than timestamps, the driver will
   inject an "empty" TCP packet holding the requested options (e.g. a
   SACK scoreboard) along with the segments from the TLS record.
   These empty TCP packets are counted by the
   dev.cc.N.txq.M.kern_tls_options sysctls.

Unlike TOE TLS which is able to buffer encrypted TLS records in
on-card memory to handle retransmits, NIC KTLS must re-encrypt TLS
records for retransmit requests as well as non-retransmit requests
that do not include the start of a TLS record but do include the
trailer.  The T6 NIC KTLS code tries to optimize some of the cases for
requests to transmit partial TLS records.  In particular it attempts
to minimize sending "waste" bytes that have to be given as input to
the crypto engine but are not needed on the wire to satisfy mbufs sent
from the TCP stack down to the driver.

TCP packets for TLS requests are broken down into the following
classes (with associated counters):

- Mbufs that send an entire TLS record in full do not have any waste
  bytes (dev.cc.N.txq.M.kern_tls_full).

- Mbufs that send a short TLS record that ends before the end of the
  trailer (dev.cc.N.txq.M.kern_tls_short).  For sockets using AES-CBC,
  the encryption must always start at the beginning, so if the mbuf
  starts at an offset into the TLS record, the offset bytes will be
  "waste" bytes.  For sockets using AES-GCM, the encryption can start
  at the 16 byte block before the starting offset capping the waste at
  15 bytes.

- Mbufs that send a partial TLS record that has a non-zero starting
  offset but ends at the end of the trailer
  (dev.cc.N.txq.M.kern_tls_partial).  In order to compute the
  authentication hash stored in the trailer, the entire TLS record
  must be sent as input to the crypto engine, so the bytes before the
  offset are always "waste" bytes.

In addition, other per-txq sysctls are provided:

- dev.cc.N.txq.M.kern_tls_cbc: Count of sockets sent via this txq
  using AES-CBC.

- dev.cc.N.txq.M.kern_tls_gcm: Count of sockets sent via this txq
  using AES-GCM.

- dev.cc.N.txq.M.kern_tls_fin: Count of empty FIN-only packets sent to
  compensate for the TOE engine not being able to set FIN on the last
  segment of a TLS record if the TLS record mbuf had FIN set.

- dev.cc.N.txq.M.kern_tls_records: Count of TLS records sent via this
  txq including full, short, and partial records.

- dev.cc.N.txq.M.kern_tls_octets: Count of non-waste bytes (TLS header
  and payload) sent for TLS record requests.

- dev.cc.N.txq.M.kern_tls_waste: Count of waste bytes sent for TLS
  record requests.

To enable NIC KTLS with T6, set the following tunables prior to
loading the cxgbe(4) driver:

hw.cxgbe.config_file=kern_tls
hw.cxgbe.kern_tls=1

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D21962
2019-11-21 19:30:31 +00:00
John Baldwin
a1b2b6e184 Create a file to hold shared routines for dealing with T6 key contexts.
ccr(4) and TLS support in cxgbe(4) construct key contexts used by the
crypto engine in the T6.  This consolidates some duplicated code for
helper functions used to build key contexts.

Reviewed by:	np
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D22156
2019-11-13 00:53:45 +00:00
John Baldwin
e38a50e8b6 Split Chelsio send tags into a generic base tag and a ratelimit tag.
NIC KTLS will add a new TLS send tag type in cxgbe(4) that is a
distinct tag from a ratelimit tag.  To support this, refactor
cxgbe_snd_tag to be a simple send tag with a type and convert the
existing ratelimit tag to a new cxgbe_rate_tag structure.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D22072
2019-10-22 20:41:54 +00:00
John Baldwin
866a7f286f Always allocate the atid table during attach.
Previously the table was allocated on first use by TOE and the
ratelimit code.  The forthcoming NIC KTLS code also uses this table.
Allocate it unconditionally during attach to simplify consumers.

Reviewed by:	np
Differential Revision:	https://reviews.freebsd.org/D22028
2019-10-22 20:01:47 +00:00
Randall Stewart
20abea6663 This adds the third step in getting BBR into the tree. BBR and
an updated rack depend on having access to the new
ratelimit api in this commit.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D20953
2019-08-01 14:17:31 +00:00
Navdeep Parhar
dd3b96ecec cxgbe(4): Count and clear interrupts generated at the software's request.
An interrupt can be requested by setting the F_SWINT bit in PL_PF_CTL.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-03-28 21:22:28 +00:00
Navdeep Parhar
edb518f44d cxgbe(4): Treat the viid as an opaque identifier.
Recent firmwares prefer to use a different format for viid internally
and this change allows them to do so.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-03-20 17:27:11 +00:00
Navdeep Parhar
644b22ae36 cxgbe(4): Auto-dump the CIM block's logic analyzer on a TIMER0 interrupt.
Sponsored by:	Chelsio Communications
2019-02-07 05:40:51 +00:00
Navdeep Parhar
286fd42ba6 cxgbe(4): Auto-dump the device log on a mailbox timeout or when the
firmware reports an error in pcie_fw.

Sponsored by:	Chelsio Communications
2019-02-07 05:06:29 +00:00
Navdeep Parhar
cb7c3f124a cxgbe(4): Improved error reporting and diagnostics.
"slow" interrupt handler:
- Expand the list of INT_CAUSE registers known to the driver.
- Add decode information for many more bits but decouple it from the
  rest of intr_info so that it is entirely optional.
- Call t4_fatal_err exactly once, and from the top level PL intr handler.

t4_fatal_err:
- Use t4_shutdown_adapter from the common code to stop the adapter.
- Stop servicing slow interrupts after the first fatal one.

Driver/firmware interaction:
- CH_DUMP_MBOX: note whether the mailbox being dumped is a command or a
  reply or something else.
- Log the raw value of pcie_fw for some errors.
- Use correct log levels (debug vs. error).

Sponsored by:	Chelsio Communications
2019-02-01 20:42:49 +00:00
Navdeep Parhar
f02cc9b2a8 cxgbe(4): Fall back to a basic configuration in case of any error during
card initialization.  This is an expanded version of r333682.

Break up prep_firmware into simpler routines while here.  Load the
firmware/config KLD only if needed.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-12-06 06:18:21 +00:00