Commit Graph

866 Commits

Author SHA1 Message Date
Navdeep Parhar
ebb8639822 cxgbe/t4_tom: adjust the hardware receive window to match changes to the
receive sockbuf's high water mark.

Calculate rx credits on the spot instead of tracking sbused/sb_cc and
rx_credits in the toepcb.  The previous method worked when the high
water mark changed due to SB_AUTOSIZE but not when it was adjusted
directly (for example, by the soreserve in nfsrvd_addsock).

This fixes a connection hang while running iozone over an NFS mounted
share where nfsd's TCP sockets are being handled by t4_tom.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2019-06-01 03:03:48 +00:00
Navdeep Parhar
35c0026f42 cxgbe/t4_tom: Do not attempt to look up entries in the TCB history if
it hasn't been initialized.

This fixes a bug in r346570 that could cause a panic when servicing
TCP_INFO for offloaded connections.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2019-05-30 17:27:40 +00:00
Alexey Dokuchaev
0a16ee7544 Fix two errors reported by PVS Studio: V646 Consider inspecting the
application's logic.  It's possible that 'else' keyword is missing.

Reviewed by:	gallatin, np, pfg
Approved by:	pfg
Differential Revision:	https://reviews.freebsd.org/D20396
2019-05-26 12:41:03 +00:00
John Baldwin
fb3bc59600 Restructure mbuf send tags to provide stronger guarantees.
- Perform ifp mismatch checks (to determine if a send tag is allocated
  for a different ifp than the one the packet is being output on), in
  ip_output() and ip6_output().  This avoids sending packets with send
  tags to ifnet drivers that don't support send tags.

  Since we are now checking for ifp mismatches before invoking
  if_output, we can now try to allocate a new tag before invoking
  if_output sending the original packet on the new tag if allocation
  succeeds.

  To avoid code duplication for the fragment and unfragmented cases,
  add ip_output_send() and ip6_output_send() as wrappers around
  if_output and nd6_output_ifp, respectively.  All of the logic for
  setting send tags and dealing with send tag-related errors is done
  in these wrapper functions.

  For pseudo interfaces that wrap other network interfaces (vlan and
  lagg), wrapper send tags are now allocated so that ip*_output see
  the wrapper ifp as the ifp in the send tag.  The if_transmit
  routines rewrite the send tags after performing an ifp mismatch
  check.  If an ifp mismatch is detected, the transmit routines fail
  with EAGAIN.

- To provide clearer life cycle management of send tags, especially
  in the presence of vlan and lagg wrapper tags, add a reference count
  to send tags managed via m_snd_tag_ref() and m_snd_tag_rele().
  Provide a helper function (m_snd_tag_init()) for use by drivers
  supporting send tags.  m_snd_tag_init() takes care of the if_ref
  on the ifp meaning that code alloating send tags via if_snd_tag_alloc
  no longer has to manage that manually.  Similarly, m_snd_tag_rele
  drops the refcount on the ifp after invoking if_snd_tag_free when
  the last reference to a send tag is dropped.

  This also closes use after free races if there are pending packets in
  driver tx rings after the socket is closed (e.g. from tcpdrop).

  In order for m_free to work reliably, add a new CSUM_SND_TAG flag in
  csum_flags to indicate 'snd_tag' is set (rather than 'rcvif').
  Drivers now also check this flag instead of checking snd_tag against
  NULL.  This avoids false positive matches when a forwarded packet
  has a non-NULL rcvif that was treated as a send tag.

- cxgbe was relying on snd_tag_free being called when the inp was
  detached so that it could kick the firmware to flush any pending
  work on the flow.  This is because the driver doesn't require ACK
  messages from the firmware for every request, but instead does a
  kind of manual interrupt coalescing by only setting a flag to
  request a completion on a subset of requests.  If all of the
  in-flight requests don't have the flag when the tag is detached from
  the inp, the flow might never return the credits.  The current
  snd_tag_free command issues a flush command to force the credits to
  return.  However, the credit return is what also frees the mbufs,
  and since those mbufs now hold references on the tag, this meant
  that snd_tag_free would never be called.

  To fix, explicitly drop the mbuf's reference on the snd tag when the
  mbuf is queued in the firmware work queue.  This means that once the
  inp's reference on the tag goes away and all in-flight mbufs have
  been queued to the firmware, tag's refcount will drop to zero and
  snd_tag_free will kick in and send the flush request.  Note that we
  need to avoid doing this in the middle of ethofld_tx(), so the
  driver grabs a temporary reference on the tag around that loop to
  defer the free to the end of the function in case it sends the last
  mbuf to the queue after the inp has dropped its reference on the
  tag.

- mlx5 preallocates send tags and was using the ifp pointer even when
  the send tag wasn't in use.  Explicitly use the ifp from other data
  structures instead.

- Sprinkle some assertions in various places to assert that received
  packets don't have a send tag, and that other places that overwrite
  rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer.

Reviewed by:	gallatin, hselasky, rgrimes, ae
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20117
2019-05-24 22:30:40 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Andrew Gallatin
50575ce11c Track TCP connection's NUMA domain in the inpcb
Drivers can now pass up numa domain information via the
mbuf numa domain field.  This information is then used
by TCP syncache_socket() to associate that information
with the inpcb. The domain information is then fed back
into transmitted mbufs in ip{6}_output(). This mechanism
is nearly identical to what is done to track RSS hash values
in the inp_flowid.

Follow on changes will use this information for lacp egress
port selection, binding TCP pacers to the appropriate NUMA
domain, etc.

Reviewed by:	markj, kib, slavash, bz, scottl, jtl, tuexen
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20028
2019-04-25 15:37:28 +00:00
John Baldwin
6b0451d603 Add support for AES-CCM to ccr(4).
This is fairly similar to the AES-GCM support in ccr(4) in that it will
fall back to software for certain cases (requests with only AAD and
requests that are too large).

Tested by:	cryptocheck, cryptotest.py
MFC after:	1 month
Sponsored by:	Chelsio Communications
2019-04-24 23:31:46 +00:00
John Baldwin
a2ad169e61 Fix requests for "plain" SHA digests of an empty buffer.
To workaround limitations in the crypto engine, empty buffers are
handled by manually constructing the final length block as the payload
passed to the crypto engine and disabling the normal "final" handling.
For HMAC this length block should hold the length of a single block
since the hash is actually the hash of the IPAD digest, but for
"plain" SHA the length should be zero instead.

Reported by:	NIST SHA1 test failure
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2019-04-24 23:18:10 +00:00
Andrew Gallatin
7687707dd4 Track device's NUMA domain in ifnet & alloc ifnet from NUMA local memory
This commit adds new if_alloc_domain() and if_alloc_dev() methods to
allocate ifnets.  When called with a domain on a NUMA machine,
ifalloc_domain() will record the NUMA domain in the ifnet, and it will
allocate the ifnet struct from memory which is local to that NUMA
node.  Similarly, if_alloc_dev() is a wrapper for if_alloc_domain
which uses a driver supplied device_t to call ifalloc_domain() with
the appropriate domain.

Note that the new if_numa_domain field fits in an alignment pad in
struct ifnet, and so does not alter the size of the structure.

Reviewed by:	glebius, kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19930
2019-04-22 19:24:21 +00:00
Navdeep Parhar
61e02298ce cxgbe/t4_tom: Add a "TCB history" feature that samples hardware state
for a tid and maintains a running history of some interesting events.

Service TCP_INFO queries from the history when the tid is being tracked
there.
2019-04-22 17:48:10 +00:00
Navdeep Parhar
be7eaf979e cxgbe(4): Make sure bundled_fw is always initialized before use.
This fixes a bug that prevented the driver from auto-flashing the
firmware when it didn't see one on the card.  This feature was
introduced in r321390 and this bug was introduced in r343269.

Reported by:	gallatin@
MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-04-22 17:00:30 +00:00
Navdeep Parhar
1b3fc371b7 cxgbe(4): Add a flag to indicate that bits in interrupt cause but not in
interrupt enable are not fatal.

The firmware sets up all the interrupt enables based on run time
configuration, which means the information in the enables is more
accurate than what's compiled into the driver.  This change also allows
the fatal bits to be updated without any changes in the driver in some
cases.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-04-02 18:50:33 +00:00
Navdeep Parhar
be09e82abb cxgbe/t4_tom: Catch up with r344433, which removed tcb_autorcvbuf_inc.
The declaration in tcp_var.h is still around so t4_tom continued to
compile but wouldn't load.  A separate commit will fix tcp_var.h

Reported By: Dustin Marquess (dmarquess at gmail)

Sponsored by:	Chelsio Communications
2019-03-29 16:43:24 +00:00
Navdeep Parhar
dd3b96ecec cxgbe(4): Count and clear interrupts generated at the software's request.
An interrupt can be requested by setting the F_SWINT bit in PL_PF_CTL.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-03-28 21:22:28 +00:00
Navdeep Parhar
edb518f44d cxgbe(4): Treat the viid as an opaque identifier.
Recent firmwares prefer to use a different format for viid internally
and this change allows them to do so.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-03-20 17:27:11 +00:00
Navdeep Parhar
5e2b3494c7 iw_cxgbe: Remove unused smac_idx from the ep structure.
Submitted by:	Krishnamraju Eraparaju @ Chelsio
2019-03-19 19:11:44 +00:00
Navdeep Parhar
4a21f4c606 cxgbe(4): Update T4/5/6 firmwares to 1.23.0.0.
Obtained from:	Chelsio Communications
MFC after:	1 month
Sponsored by:	Chelsio Communications
2019-03-13 06:46:15 +00:00
Navdeep Parhar
b43e2d7de6 cxgbev(4): Enable 32b port capabilities in the VF driver.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-03-02 04:39:59 +00:00
Navdeep Parhar
41dda0d9eb cxgbe(4): Don't forget to report link state to the kernel if the link is
already up at attach.

Reported by:	Fabrice Bruel @ Orange Business Service
MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-03-01 02:43:30 +00:00
John Baldwin
d18e541983 Don't assume all children of a nexus are ports.
Specifically, ccr(4) devices are also children of cxgbe nexus devices.
Rather than making assumptions about the child device's softc, walk
the list of ports from the nexus' softc to determine if a child is a
port in t4_child_location_str().  This fixes a panic when detaching a
ccr device.

Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D19399
2019-02-28 22:10:19 +00:00
Navdeep Parhar
a098959032 cxgbe(4): Request high priority filter support explicitly, as required
by recent firmwares.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-02-28 05:45:14 +00:00
Navdeep Parhar
305e7e925f cxgbe(4): Updates to the default and hashfilter configurations.
- Do not use nvf = 4 as it is not really supported by the firmware.
  Firmwares 1.23.3.0 and above will ignore it silently.
- Increase PF4's share of the VIs and let it use all of the RSS table.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2019-02-25 16:28:13 +00:00
Navdeep Parhar
d18c10d066 cxgbe(4): Use correct port_info in the call to is_bt().
This fixes a panic during configuration if the tx channel of a port
isn't the same as its port id.

Reported by:	Fabrice Bruel
MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-02-25 15:47:22 +00:00
Navdeep Parhar
3c25d4ea3c cxgbe(4): Ignore unused interrupts.
Sponsored by:	Chelsio Communications
2019-02-10 19:20:03 +00:00
Navdeep Parhar
a71c41ccc4 cxgbe(4): Delay the panic due to a fatal error by 30s.
This lets information logged by the interrupt handler reach the system
log before the system goes down.
2019-02-09 01:49:53 +00:00
Navdeep Parhar
c0a248ef93 cxgbev(4): Initialize debug_flags from the environment like in the PF driver. 2019-02-08 03:31:38 +00:00
Navdeep Parhar
644b22ae36 cxgbe(4): Auto-dump the CIM block's logic analyzer on a TIMER0 interrupt.
Sponsored by:	Chelsio Communications
2019-02-07 05:40:51 +00:00
Navdeep Parhar
286fd42ba6 cxgbe(4): Auto-dump the device log on a mailbox timeout or when the
firmware reports an error in pcie_fw.

Sponsored by:	Chelsio Communications
2019-02-07 05:06:29 +00:00
Navdeep Parhar
cb7c3f124a cxgbe(4): Improved error reporting and diagnostics.
"slow" interrupt handler:
- Expand the list of INT_CAUSE registers known to the driver.
- Add decode information for many more bits but decouple it from the
  rest of intr_info so that it is entirely optional.
- Call t4_fatal_err exactly once, and from the top level PL intr handler.

t4_fatal_err:
- Use t4_shutdown_adapter from the common code to stop the adapter.
- Stop servicing slow interrupts after the first fatal one.

Driver/firmware interaction:
- CH_DUMP_MBOX: note whether the mailbox being dumped is a command or a
  reply or something else.
- Log the raw value of pcie_fw for some errors.
- Use correct log levels (debug vs. error).

Sponsored by:	Chelsio Communications
2019-02-01 20:42:49 +00:00
Navdeep Parhar
3496224a96 cxgbe/iw_cxgbe: Fix an address calculation in the memory registration code that
was added in r342266.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
2019-01-30 05:39:47 +00:00
Navdeep Parhar
ef96741259 cxgbe(4): Add adapter information to messages logged by the OS-agnostic
code in t4_hw.c.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-01-29 00:49:12 +00:00
John Baldwin
cecf8bebe7 Fix a few more places to handle ofld tx queues for RATELIMIT.
- Drain offload transmit queues when RATELIMIT is enabled but
  TCP_OFFLOAD is not.
- Expose the per-VI nofldtxq and first_ofld_txq sysctls when
  RATELIMIT is enabled but TCP_OFFLOAD is not.
- Clear offload transmit queue stats as part of a 'cxgbetool clearstats'
  request when RATELIMIT is enabled but TCP_OFFLOAD is not.

Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D18966
2019-01-25 20:54:18 +00:00
Navdeep Parhar
2a857082dc cxgbe(4): Allow negative values in hw.cxgbe.fw_install and take them to
mean that the driver should taste the firmware in the KLD and use that
firmware's version for all its fw_install checks.

The driver gets firmware version information from compiled-in values by
default and this change allows custom (or older/newer) firmware modules
to be used with the stock driver.

There is no change in default behavior.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-01-21 18:42:16 +00:00
Navdeep Parhar
8d5106760c cxgbe(4): Use a truncated firmware header for version checks. All the
version numbers are towards the begining of the header.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-01-21 17:58:06 +00:00
Navdeep Parhar
6baf1e4803 cxgbe(4): Clear the reply-pending status of a hashfilter when the reply
indicates an error.  Also, do not remove it twice from the hf list in
this case.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
MFC after:	1 week
Sponsored by:	Chelsio Communicatons
2019-01-20 23:30:16 +00:00
John Baldwin
475d54fac3 Reject new sessions if the necessary queues aren't initialized.
ccr reuses the control queue and first rx queue from the first port on
each adapter.  The driver cannot send requests until those queues are
initialized.  Refuse to create sessions for now if the queues aren't
ready.  This is a workaround until cxgbe allocates one or more
dedicated queues for ccr.

PR:		233851
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D18478
2019-01-15 18:53:45 +00:00
Navdeep Parhar
1dca7005b1 cxgbe(4): Move some INTx specific code to a more appropriate place. 2019-01-12 04:44:25 +00:00
Navdeep Parhar
1a35ae9353 cxgbe(4): Clear FW_OK if the firmware reports an error.
Sponsored by:	Chelsio Communications
2019-01-04 04:15:17 +00:00
Navdeep Parhar
450ffb7cb2 cxgbe(4): Attach to two T540 variants.
MFC after:	1 week
2018-12-30 01:57:11 +00:00
Navdeep Parhar
6c5c0137a9 Remove unused macros from t4_tom.h. 2018-12-21 20:46:45 +00:00
Navdeep Parhar
ad025209ba cxgbe/iw_cxgbe: Remove redundant CTRs from c4iw_alloc/c4iw_rdev_open.
This information is readily available elsewhere.

Sponsored by:	Chelsio Communications
2018-12-20 22:39:58 +00:00
Navdeep Parhar
6bb034658d cxgbe/iw_cxgbe: Do not terminate CTRx messages with \n. 2018-12-20 22:31:07 +00:00
Navdeep Parhar
9877f73541 cxgbe(4): Make sure the rx queues start off with the correct timestamp
settings on initialization.

Sponsored by:	Chelsio Communications
2018-12-20 20:34:21 +00:00
Navdeep Parhar
8953e80f5e cxgbe/iw_cxgbe: Use -ve errno when interfacing with linuxkpi/OFED.
Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
2018-12-20 01:35:45 +00:00
Navdeep Parhar
b562884d63 cxgbe/iw_cxgbe: Add a knob for testing that lets iWARP connections cycle
through 4-tuples quickly.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
2018-12-20 01:00:21 +00:00
Navdeep Parhar
121684b714 cxgbe/iw_cxgbe: Use DSGLs to write to card's memory when appropriate.
Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
2018-12-19 23:29:01 +00:00
Navdeep Parhar
377328701d cxgbe(4): Do not issue mbox commands after t4_fw_bye.
Sponsored by:	Chelsio Communications
2018-12-19 19:21:29 +00:00
Navdeep Parhar
b156a400a6 cxgbe/t4_tom: fixes for issues on the passive open side.
- Fix PR 227760 by getting the TOE to respond to the SYN after the call
  to toe_syncache_add, not during it.  The kernel syncache code calls
  syncache_respond just before syncache_insert.  If the ACK to the
  syncache_respond is processed in another thread it may run before the
  syncache_insert and won't find the entry.  Note that this affects only
  t4_tom because it's the only driver trying to insert and expand
  syncache entries from different threads.

- Do not leak resources if an embryonic connection terminates at
  SYN_RCVD because of L2 lookup failures.

- Retire lctx->synq and associated code because there is never a need to
  walk the list of embryonic connections associated with a listener.
  The per-tid state is still called a synq entry in the driver even
  though the synq itself is now gone.

PR:		227760
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2018-12-19 01:37:00 +00:00
Navdeep Parhar
9b11a65d1c cxgbe(4): Get Linux cxgb4vf working in bhyve VMs with VFs passed
through.

cxgb4vf doesn't own the buffer size list but still expects the first two
entries to be 4K and some power of 2 respectively.  The BSD cxgbe
doesn't care where its preferred buffer sizes are as long as they're in
the list somewhere, so just move its entries towards the end as a
workaround.

MFC after:	1 month
Sponsored by:	Chelsio Communicatons
2018-12-06 21:33:08 +00:00
Navdeep Parhar
f02cc9b2a8 cxgbe(4): Fall back to a basic configuration in case of any error during
card initialization.  This is an expanded version of r333682.

Break up prep_firmware into simpler routines while here.  Load the
firmware/config KLD only if needed.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-12-06 06:18:21 +00:00