Commit Graph

187 Commits

Author SHA1 Message Date
Li-Wen Hsu
57f0337a57 Fix gcc build for cxgbe(4)
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20879
2019-07-08 19:59:15 +00:00
Mark Johnston
eeacb3b02f Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics.  The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead.  Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended.  __FreeBSD_version is bumped.

Reviewed by:	alc, kib
Discussed with:	jeff
Discussed with:	jhb, np (cxgbe)
Tested by:	pho (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19247
2019-07-08 19:46:20 +00:00
John Baldwin
7b17c92129 Use unmapped (M_NOMAP) mbufs for zero-copy AIO writes via TOE.
Previously the TOE code used its own custom unmapped mbufs via
EXT_FLAG_VENDOR1.  The old version always wired the entire AIO request
buffer first for the duration of the AIO operation and constructed
multiple mbufs which used the wired buffer as an external buffer.

The new version determines how much room is available in the socket
buffer and only wires the pages needed for the available room building
chains of M_NOMAP mbufs.  This means that a large AIO write will now
limit the amount of wired memory it uses to the size of the socket
buffer.

Reviewed by:	gallatin, np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20839
2019-07-03 16:06:11 +00:00
John Baldwin
d76bbe175a Add support for IFCAP_NOMAP to cxgbe(4).
Since cxgbe(4) uses sglist instead of bus_dma, this required updates
to the code that generates scatter/gather lists for packets.  Also,
unmapped mbufs are always sent via DMA and never as immediate data in
the payload of a work request.

Submitted by:	gallatin (earlier version)
Reviewed by:	gallatin, hselasky, rrs
Discussed with:	np
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20616
2019-06-29 00:52:21 +00:00
Navdeep Parhar
8674e626c6 cxgbe/t4_tom: Tweaks to some of the AIO related CTRs.
Reviewed by:	jhb@
MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-06-28 19:57:42 +00:00
Navdeep Parhar
74a155edb0 cxgbe/t4_tom: the AIO tx job queue must be empty by the time the driver
releases the offload resources associated with the tid.

Reviewed by:	jhb@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20798
2019-06-28 19:27:45 +00:00
Navdeep Parhar
d49be2a696 cxgbe/t4_tom: Mark the socket's receive as done before calling
handle_ddp_close.

This eliminates a bad race where an aio_ddp_requeue that happened to run
after handle_ddp_close could bump up the active count.

Discussed with:	jhb@
MFC after:	3 days
Sponsored by:	Chelsio Communications
2019-06-28 04:02:56 +00:00
Navdeep Parhar
b7acf27c2e cxgbe/t4_tom: Fix regression in t_maxseg usage within t4_tom.
t_maxseg was changed in r293284 to not have any adjustment for TCP
timestamps.  t4_tom inadvertently went back to pre-r293284 semantics
in r332506.

Sponsored by:	Chelsio Communications
2019-06-28 02:41:17 +00:00
John Baldwin
7f63b888c7 Hold an explicit reference on the socket for the aiotx task.
Previously, the aiotx task relied on the aio jobs in the queue to hold
a reference on the socket.  However, when the last job is completed,
there is nothing left to hold a reference to the socket buffer lock
used to check if the queue is empty.  In addition, if the last job on
the queue is cancelled, the task can run with no queued jobs holding a
reference to the socket buffer lock the task uses to notice the queue
is empty.

Fix these races by holding an explicit reference on the socket when
the task is queued and dropping that reference when the task
completes.

Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20539
2019-06-27 19:36:30 +00:00
Navdeep Parhar
17795d8234 cxgbe/t4_tom: DDP_DEAD is a ddp flag and not a toepcb flag.
The driver was in effect setting TPF_ABORT_SHUTDOWN on the toepcb
instead of what was intended.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-06-20 20:06:19 +00:00
John Baldwin
5f37b74d5d Fix debug trace after removal of pdu_overhead.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-06-07 21:30:11 +00:00
Navdeep Parhar
ebb8639822 cxgbe/t4_tom: adjust the hardware receive window to match changes to the
receive sockbuf's high water mark.

Calculate rx credits on the spot instead of tracking sbused/sb_cc and
rx_credits in the toepcb.  The previous method worked when the high
water mark changed due to SB_AUTOSIZE but not when it was adjusted
directly (for example, by the soreserve in nfsrvd_addsock).

This fixes a connection hang while running iozone over an NFS mounted
share where nfsd's TCP sockets are being handled by t4_tom.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2019-06-01 03:03:48 +00:00
Navdeep Parhar
35c0026f42 cxgbe/t4_tom: Do not attempt to look up entries in the TCB history if
it hasn't been initialized.

This fixes a bug in r346570 that could cause a panic when servicing
TCP_INFO for offloaded connections.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2019-05-30 17:27:40 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Navdeep Parhar
61e02298ce cxgbe/t4_tom: Add a "TCB history" feature that samples hardware state
for a tid and maintains a running history of some interesting events.

Service TCP_INFO queries from the history when the tid is being tracked
there.
2019-04-22 17:48:10 +00:00
Navdeep Parhar
be09e82abb cxgbe/t4_tom: Catch up with r344433, which removed tcb_autorcvbuf_inc.
The declaration in tcp_var.h is still around so t4_tom continued to
compile but wouldn't load.  A separate commit will fix tcp_var.h

Reported By: Dustin Marquess (dmarquess at gmail)

Sponsored by:	Chelsio Communications
2019-03-29 16:43:24 +00:00
Navdeep Parhar
edb518f44d cxgbe(4): Treat the viid as an opaque identifier.
Recent firmwares prefer to use a different format for viid internally
and this change allows them to do so.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-03-20 17:27:11 +00:00
Navdeep Parhar
6c5c0137a9 Remove unused macros from t4_tom.h. 2018-12-21 20:46:45 +00:00
Navdeep Parhar
b156a400a6 cxgbe/t4_tom: fixes for issues on the passive open side.
- Fix PR 227760 by getting the TOE to respond to the SYN after the call
  to toe_syncache_add, not during it.  The kernel syncache code calls
  syncache_respond just before syncache_insert.  If the ACK to the
  syncache_respond is processed in another thread it may run before the
  syncache_insert and won't find the entry.  Note that this affects only
  t4_tom because it's the only driver trying to insert and expand
  syncache entries from different threads.

- Do not leak resources if an embryonic connection terminates at
  SYN_RCVD because of L2 lookup failures.

- Retire lctx->synq and associated code because there is never a need to
  walk the list of embryonic connections associated with a listener.
  The per-tid state is still called a synq entry in the driver even
  though the synq itself is now gone.

PR:		227760
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2018-12-19 01:37:00 +00:00
John Baldwin
78afed1396 Move CLIP table handling out of TOM and into the base driver.
- Store the clip table in 'struct adapter' instead of in the TOM softc.
- Init the clip table during attach and teardown during detach.
- While here, add a dev.<nexus>.<unit>.misc.clip sysctl to dump the
  CLIP table.

This does mean that we update the clip table even if TOE is not enabled,
but non-TOE things need the CLIP table anyway.

Reviewed by:	np, Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D18010
2018-11-29 01:15:53 +00:00
John Baldwin
d09389fd05 Consolidate on a single set of constants for SCMD fields.
Both ccr(4) and the TOE TLS code had separate sets of constants for
fields in SCMD messages.

Sponsored by:	Chelsio Communications
2018-11-16 19:08:52 +00:00
John Baldwin
f0aefccb70 Restore the <sys/vmem.h> header to fix build of cxgbe(4) TOM.
vmem's are not just used for TLS memory in TOM and the #include actually
predates the TLS code so should not have been removed when the TLS vmem
moved in r340466.

Pointy hat to:	jhb
Sponsored by:	Chelsio Communications
2018-11-16 01:27:24 +00:00
John Baldwin
47c64f9e3e Remove bogus roundup2() of the key programming work request header.
The key context is always placed immediately after the work request
header.  The total work request length has to be rounded up by 16
however.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-11-15 23:31:04 +00:00
John Baldwin
bc13c69bef Move the TLS key map into the adapter softc so non-TOE code can use it.
Sponsored by:	Chelsio Communications
2018-11-15 23:00:30 +00:00
John Baldwin
c15600b71a Use sbsndptr_adv() instead of sbsndptr() for TOE TLS.
For TOE TLS, we just want to advance the send pointer to skip over the
record just sent to the TOE.  The recently added sbsndptr_adv() is
sufficient for that and is cheaper.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-11-15 22:47:47 +00:00
John Baldwin
fe03ca08a6 Use tcp_state_change() in the cxgbe(4) TOE module.
r254889 added tcp_state_change() as a centralized place to log state
changes in TCP connections for DTrace.  r294869 and r296881 took
advantage of this central location to manage per-state counters.
However, TOE sockets were still performing some (but not all) state
change updates via direct assignments to t_state.  This resulted in
state counters underflowing when TOE was in use.  Fix by using
tcp_state_change() when changing a TOE connection's state.

Reviewed by:	np, markj
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D17915
2018-11-09 21:16:45 +00:00
Navdeep Parhar
ea710848dc cxgbe(4): Link related changes.
- Switch to using 32b port/link capabilities in the driver.  The 32b
  format is used internally by firmwares > 1.16.45.0 and the driver will
  now interact with the firmware in its native format, whether it's 16b
  or 32b.  Note that the 16b format doesn't have room for 50G, 200G, or
  400G speeds.

- Add a bit in the pause_settings knobs to allow negotiated PAUSE
  settings to override manual settings.

- Ensure that manual link settings persist across an administrative
  down/up as well as transceiver unplug/replug.

- Remove unused is_*G_port() functions.

Approved by:	re@ (gjb@)
MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-09-25 05:52:42 +00:00
Navdeep Parhar
d6ddb0848c cxgbe/tom: Unregister shared CPL handlers on module unload. This fixes
a panic with INVARIANTS that occurs when t4_tom is unloaded and reloaded.

Approved by:	re@ (kib@)
2018-08-28 18:16:02 +00:00
Navdeep Parhar
24bc8671f9 cxgbe/tom: Make sure 'matched' is always initialized before use.
Reported by:	Coverity (CID 1390894)
MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-08-21 22:19:34 +00:00
Navdeep Parhar
7576fe761e cxgbe/tom: Provide the hardware tid in tcp_info.
Submitted by:	marius@
2018-08-20 21:40:14 +00:00
Navdeep Parhar
72049e7395 cxgbe/tom: Put the ifnet or VLAN's PCP value in the 802.1Q tag of frames
generated by the TOE.  Works with vid 0 (no VLAN, just priority) too.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-08-17 19:22:46 +00:00
Navdeep Parhar
9f78434942 cxgbe(4): Use VLAN_TRUNKDEV instead of private cookie to figure out the
parent of a VLAN ifnet.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-08-15 21:24:05 +00:00
Navdeep Parhar
408954013a Whitespace nit in t4_tom.h 2018-08-13 19:21:28 +00:00
Navdeep Parhar
5fc0f72f3b cxgbe(4): Add support for high priority filters on T6+. They have their
own region in the TCAM starting with T6, unlike previous chips where
they were in the same region as normal filters.

These filters "hit" before anything else in the LE's lookup.  The exact
order is:
a) High priority filters
b) TOE's active region (TCAM and/or hash)
c) Servers (TOE hw listeners)
d) Normal filters

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-08-09 14:19:47 +00:00
Navdeep Parhar
1979b51141 cxgbe(4): Allow user-configured and driver-configured traffic classes to
be used simultaneously.  Move sysctl_tc and sysctl_tc_params to
t4_sched.c while here.

MFC after:	3 weeks
Sponsored by:	Chelsio Communications
2018-08-06 23:21:13 +00:00
Navdeep Parhar
564ec04ea8 Fix typo in cxgbe/t4_tom. 2018-08-06 19:09:55 +00:00
Matt Macy
6573d7580b epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
  there's no longer any benefit to dropping the pcbinfo lock
  and trying to do so just adds an error prone branchfest to
  these functions
- Remove cases of same function recursion on the epoch as
  recursing is no longer free.
- Remove the the TAILQ_ENTRY and epoch_section from struct
  thread as the tracker field is now stack or heap allocated
  as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
2018-07-04 02:47:16 +00:00
Navdeep Parhar
c90a8cf80a cxgbe/t4_tom: ABORT_RPL_RSS is a shared CPL and t4_tom shouldn't remove
the global handler when it's being unloaded.
2018-05-24 08:32:02 +00:00
Navdeep Parhar
9c707b3287 cxgbe(4): Make FW4_ACK a shared CPL. ETHOFLD in the base driver will
use it for per-flow rate limiting.

Sponsored by:	Chelsio Communications
2018-05-24 08:21:43 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
John Baldwin
24ddd0ec9c Be more robust against garbage input on a TOE TLS TX socket.
If a socket is closed or shutdown and a partial record (or what
appears to be a partial record) is waiting in the socket buffer,
discard the partial record and close the connection rather than
waiting forever for the rest of the record.

Reported by:	Harsh Jain @ Chelsio
Sponsored by:	Chelsio Communications
2018-05-18 19:09:11 +00:00
Navdeep Parhar
b3daa684d8 cxgbe(4): Filtering related features and fixes.
- Driver support for hardware NAT.
- Driver support for swapmac action.
- Validate a request to create a hashfilter against the filter mask.
- Add a hashfilter config file for T5.

Sponsored by:	Chelsio Communications
2018-05-15 04:24:38 +00:00
Navdeep Parhar
89f651e704 cxgbe(4): Add support for hash filters.
These filters reside in the card's memory instead of its TCAM and can be
configured via a new "hashfilter" subcommand in cxgbetool.  Hash and
normal TCAM filters can be used together.  The hardware does an
exact-match of packet fields for hash filters, unlike the masked match
performed for TCAM filters.  Any T5/T6 card with memory can support at
least half a million hash filters.  The sample config file with the
driver configures 512K of these, it is possible to double this to 1
million+ in some cases.

The chip does an exact-match of fields of incoming datagrams with hash
filters and performs the action configured for the filter if it matches.
The fields to match are specified in a "filter mask" in the firmware
config file.  The filter mask always includes the 5-tuple (sip, dip,
sport, dport, ipproto).  It can, optionally, also include any subset of
the filter mode (see filterMode and filterMask in the firmware config
file).

For example:
filterMode = fragmentation, mpshittype, protocol, vlan, port, fcoe
filterMask = protocol, port, vlan

Exact values of the 5-tuple, the physical port, and VLAN tag would have
to be provided while setting up a hash filter with the chip
configuration above.

Hash filters support all actions supported by TCAM filters.  A packet
that hits a hash filter can be dropped, let through (with optional
steering to a specific queue or RSS region), switched out of another
port (with optional L2 rewrite of DMAC, SMAC, VLAN tag), or get NAT'ed.
(Support for some of these will show up in the driver in a follow-up
commit very shortly).

Sponsored by:	Chelsio Communications
2018-05-09 04:09:49 +00:00
Navdeep Parhar
111638bf68 cxgbe(4): Convert ACT_OPEN_RPL to a shared CPL.
Reserve 3b in the 14b atid to identify the owner and use it to dispatch
the CPL.  This allows all CPLs that use an atid to be used as shared
CPLs, although ACT_OPEN_RPL is the only one being converted in this
revision.

Sponsored by:	Chelsio Communications
2018-04-30 21:47:30 +00:00
Navdeep Parhar
f8bc08fa57 cxgbe/t4_tom: Use appropriate macros instead of magic math while
constructing the atid of an active open work request.

Sponsored by:	Chelsio Communications
2018-04-30 17:33:44 +00:00
Navdeep Parhar
4535e8046f cxgbe(4): Use opaque cookies or tid range-checks to determine the
intended recipient of a CPL when it can't be determined solely from the
opcode.  Retire the per-queue handlers for such CPLs in favor of the new
scheme.

Sponsored by:	Chelsio Communications
2018-04-30 15:18:38 +00:00
John Baldwin
4217c685b1 Use the correct key address when renegotiating the transmit key.
Previously, get_keyid() was returning the address of the receive key
instead of the transmit key when renegotiating the transmit key.  This
could either hang the card (if a connection was only offloading TLS TX
and thus had a receive key address of -1) or cause the connection to
fail by overwriting the wrong key (if both RX and TX TLS were
offloaded).

Submitted by:	Harsh Jain @ Chelsio
Sponsored by:	Chelsio Communications
2018-04-27 17:20:23 +00:00
Navdeep Parhar
8896672a77 cxgbe(4): Move release_tid to the base NIC driver for future consumers.
Sponsored by:	Chelsio Communications.
2018-04-26 22:04:21 +00:00
Navdeep Parhar
3747c1ffc7 cxgbe(4): Break up alloc_tid_tabs and move the atid routines to the base
NIC driver.  The atid services will be used by new features (hashfilters
and inline TLS) that do not involve TOE.

Sponsored by:	Chelsio Communications
2018-04-26 19:00:35 +00:00
Navdeep Parhar
8aa1c1d8b4 cxgbe(4): Fix bugs in the handling of COP rules that match on VLAN tag.
Retrieve the tag from the correct ifnet and use the provided tag
(instead of hardcoded 0xffff, implying no tag) in the routines that
process offload policy.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
2018-04-19 18:10:44 +00:00