Commit Graph

37 Commits

Author SHA1 Message Date
Navdeep Parhar
72049e7395 cxgbe/tom: Put the ifnet or VLAN's PCP value in the 802.1Q tag of frames
generated by the TOE.  Works with vid 0 (no VLAN, just priority) too.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-08-17 19:22:46 +00:00
Navdeep Parhar
9f78434942 cxgbe(4): Use VLAN_TRUNKDEV instead of private cookie to figure out the
parent of a VLAN ifnet.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-08-15 21:24:05 +00:00
Matt Macy
6573d7580b epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
  there's no longer any benefit to dropping the pcbinfo lock
  and trying to do so just adds an error prone branchfest to
  these functions
- Remove cases of same function recursion on the epoch as
  recursing is no longer free.
- Remove the the TAILQ_ENTRY and epoch_section from struct
  thread as the tracker field is now stack or heap allocated
  as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
2018-07-04 02:47:16 +00:00
Navdeep Parhar
89f651e704 cxgbe(4): Add support for hash filters.
These filters reside in the card's memory instead of its TCAM and can be
configured via a new "hashfilter" subcommand in cxgbetool.  Hash and
normal TCAM filters can be used together.  The hardware does an
exact-match of packet fields for hash filters, unlike the masked match
performed for TCAM filters.  Any T5/T6 card with memory can support at
least half a million hash filters.  The sample config file with the
driver configures 512K of these, it is possible to double this to 1
million+ in some cases.

The chip does an exact-match of fields of incoming datagrams with hash
filters and performs the action configured for the filter if it matches.
The fields to match are specified in a "filter mask" in the firmware
config file.  The filter mask always includes the 5-tuple (sip, dip,
sport, dport, ipproto).  It can, optionally, also include any subset of
the filter mode (see filterMode and filterMask in the firmware config
file).

For example:
filterMode = fragmentation, mpshittype, protocol, vlan, port, fcoe
filterMask = protocol, port, vlan

Exact values of the 5-tuple, the physical port, and VLAN tag would have
to be provided while setting up a hash filter with the chip
configuration above.

Hash filters support all actions supported by TCAM filters.  A packet
that hits a hash filter can be dropped, let through (with optional
steering to a specific queue or RSS region), switched out of another
port (with optional L2 rewrite of DMAC, SMAC, VLAN tag), or get NAT'ed.
(Support for some of these will show up in the driver in a follow-up
commit very shortly).

Sponsored by:	Chelsio Communications
2018-05-09 04:09:49 +00:00
Navdeep Parhar
111638bf68 cxgbe(4): Convert ACT_OPEN_RPL to a shared CPL.
Reserve 3b in the 14b atid to identify the owner and use it to dispatch
the CPL.  This allows all CPLs that use an atid to be used as shared
CPLs, although ACT_OPEN_RPL is the only one being converted in this
revision.

Sponsored by:	Chelsio Communications
2018-04-30 21:47:30 +00:00
Navdeep Parhar
f8bc08fa57 cxgbe/t4_tom: Use appropriate macros instead of magic math while
constructing the atid of an active open work request.

Sponsored by:	Chelsio Communications
2018-04-30 17:33:44 +00:00
Navdeep Parhar
3747c1ffc7 cxgbe(4): Break up alloc_tid_tabs and move the atid routines to the base
NIC driver.  The atid services will be used by new features (hashfilters
and inline TLS) that do not involve TOE.

Sponsored by:	Chelsio Communications
2018-04-26 19:00:35 +00:00
Navdeep Parhar
8aa1c1d8b4 cxgbe(4): Fix bugs in the handling of COP rules that match on VLAN tag.
Retrieve the tag from the correct ifnet and use the provided tag
(instead of hardcoded 0xffff, implying no tag) in the routines that
process offload policy.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
2018-04-19 18:10:44 +00:00
Navdeep Parhar
1131c927c4 cxgbe(4): Add support for Connection Offload Policy (aka COP).
COP allows fine-grained control on whether to offload a TCP connection
using t4_tom, and what settings to apply to a connection selected for
offload.  t4_tom must still be loaded and IFCAP_TOE must still be
enabled for full TCP offload to take place on an interface.  The
difference is that IFCAP_TOE used to be the only knob and would enable
TOE for all new connections on the inteface, but now the driver will
also consult the COP, if any, before offloading to the hardware TOE.

A policy is a plain text file with any number of rules, one per line.
Each rule has a "match" part consisting of a socket-type (L = listen,
A = active open, P = passive open, D = don't care) and a pcap-filter(7)
expression, and a "settings" part that specifies whether to offload the
connection or not and the parameters to use if so.  The general format
of a rule is: [socket-type] expr => settings

Example.  See cxgbetool(8) for more information.
[L] ip && port http => offload
[L] port 443 => !offload
[L] port ssh => offload
[P] src net 192.168/16 && dst port ssh => offload !nagle !timestamp cong newreno
[P] dst port ssh => offload !nagle ecn cong tahoe
[P] dst port http => offload
[A] dst port 443 => offload tls
[A] dst net 192.168/16 => offload !timestamp cong highspeed

The driver processes the rules for each new listen, active open, or
passive open and stops at the first match.  There is an implicit rule at
the end of every policy that prohibits offload when no rule in the
policy matches:
[D] all => !offload

This is a reworked and expanded version of a patch submitted by
Krishnamraju Eraparaju @ Chelsio.

Sponsored by:	Chelsio Communications
2018-04-14 19:07:56 +00:00
John Baldwin
1e9538d253 Support for TLS offload of TOE connections on T6 adapters.
The TOE engine in Chelsio T6 adapters supports offloading of TLS
encryption and TCP segmentation for offloaded connections.  Sockets
using TLS are required to use a set of custom socket options to upload
RX and TX keys to the NIC and to enable RX processing.  Currently
these socket options are implemented as TCP options in the vendor
specific range.  A patched OpenSSL library will be made available in a
port / package for use with the TLS TOE support.

TOE sockets can either offload both transmit and reception of TLS
records or just transmit.  TLS offload (both RX and TX) is enabled by
setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be
enabled on the relevant interface.  Transmit offload can be used on
any "normal" or TLS TOE socket by using the custom socket option to
program a transmit key.  This permits most TOE sockets to
transparently offload TLS when applications use a patched SSL library
(e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL
library).  Receive offload can only be used with TOE sockets using the
TLS mode.  The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a
list of TCP port numbers.  Any connection with either a local or
remote port number in that list will be created as a TLS socket rather
than a plain TOE socket.  Note that although this sysctl accepts an
arbitrary list of port numbers, the sysctl(8) tool is only able to set
sysctl nodes to a single value.  A TLS socket will hang without
receiving data if used by an application that is not using a patched
SSL library.  Thus, the tls_rx_ports node should be used with care.
For a server mostly concerned with offloading TLS transmit, this node
is not needed as plain TOE sockets will fall back to software crypto
when using an unpatched SSL library.

New per-interface statistics nodes are added giving counts of TLS
packets and payload bytes (payload bytes do not include TLS headers or
authentication tags/MACs) offloaded via the TOE engine, e.g.:

dev.cc.0.stats.rx_tls_octets: 149
dev.cc.0.stats.rx_tls_records: 13
dev.cc.0.stats.tx_tls_octets: 26501823
dev.cc.0.stats.tx_tls_records: 1620

TLS transmit work requests are constructed by a new variant of
t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.

TLS transmit work requests require a buffer containing IVs.  If the
IVs are too large to fit into the work request, a separate buffer is
allocated when constructing a work request.  This buffer is associated
with the transmit descriptor and freed when the descriptor is ACKed by
the adapter.

Received TLS frames use two new CPL messages.  The first message is a
CPL_TLS_DATA containing the decryped payload of a single TLS record.
The handler places the mbuf containing the received payload on an
mbufq in the TOE pcb.  The second message is a CPL_RX_TLS_CMP message
which includes a copy of the TLS header and indicates if there were
any errors.  The handler for this message places the TLS header into
the socket buffer followed by the saved mbuf with the payload data.
Both of these handlers are contained in tom/t4_tls.c.

A few routines were exposed from t4_cpl_io.c for use by t4_tls.c
including send_rx_credits(), a new send_rx_modulate(), and
t4_close_conn().

TLS keys for both transmit and receive are stored in onboard memory
in the NIC in the "TLS keys" memory region.

In some cases a TLS socket can hang with pending data available in the
NIC that is not delivered to the host.  As a workaround, TLS sockets
are more aggressive about sending CPL_RX_DATA_ACK messages anytime that
any data is read from a TLS socket.  In addition, a fallback timer will
periodically send CPL_RX_DATA_ACK messages to the NIC for connections
that are still in the handshake phase.  Once the connection has
finished the handshake and programmed RX keys via the socket option,
the timer is stopped.

A new function select_ulp_mode() is used to determine what sub-mode a
given TOE socket should use (plain TOE, DDP, or TLS).  The existing
set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and
handles initialization of TLS-specific state when necessary in
addition to DDP-specific state.

Since TLS sockets do not receive individual TCP segments but always
receive full TLS records, they can receive more data than is available
in the current window (e.g. if a 16k TLS record is received but the
socket buffer is itself 16k).  To cope with this, just drop the window
to 0 when this happens, but track the overage and "eat" the overage as
it is read from the socket buffer not opening the window (or adding
rx_credits) for the overage bytes.

Reviewed by:	np (earlier version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14529
2018-03-13 23:05:51 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 14:52:40 +00:00
Navdeep Parhar
3ef7429927 cxgbe/t4_tom: Add a knob to select the congestion control algorigthm
used by the TOE hardware for fully offloaded connections.  The knob
affects new connections only.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-08-31 20:33:22 +00:00
Navdeep Parhar
eaf5669483 cxgbe/t4_tom: Fix CLIP entry refcounting on the passive side. Every
IPv6 connection being handled by the TOE should have a reference on its
CLIP entry.

Sponsored by:	Chelsio Communications
2017-02-06 17:48:25 +00:00
John Baldwin
fbb779cca9 Unregister CPL handlers for TOE-related messages when unloading TOM.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-01-27 23:08:30 +00:00
Navdeep Parhar
a342904bb5 cxgbe/tom: Add VIMAGE support to the TOE driver.
Active Open:
- Save the socket's vnet at the time of the active open (t4_connect) and
  switch to it when processing the reply (do_act_open_rpl or
  do_act_establish).

Passive Open:
- Save the listening socket's vnet in the driver's listen_ctx and switch
  to it when processing incoming SYNs for the socket.
- Reject SYNs that arrive on an ifnet that's not in the same vnet as the
  listening socket.

CLIP (Compressed Local IPv6) table:
- Add only those IPv6 addresses to the CLIP that are in a vnet
  associated with one of the card's ifnets.

Misc:
- Set vnet from the toepcb when processing TCP state transitions.
- The kernel sets the vnet when calling the driver's output routine
  so t4_push_frames runs in proper vnet context already.  One exception
  is when incoming credits trigger tx within the driver's ithread.  Set
  the vnet explicitly in do_fw4_ack for that case.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-01-11 23:48:17 +00:00
Navdeep Parhar
d663d5ca23 cxgbe/t4_tom: Fix tid accounting. An offloaded IPv6 connection uses 2
tids, not 1, in the hardware.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-01-07 20:26:19 +00:00
Navdeep Parhar
e4612db2cd Fix comment in t4_tom. No functional change.
MFC after:	3 days
2017-01-07 00:08:55 +00:00
Navdeep Parhar
efe00fd923 cxgbe/t4_tom: Update the active/passive open code to support T6. Data
path works as-is.

Sponsored by:	Chelsio Communications
2016-09-17 23:08:49 +00:00
Navdeep Parhar
5aaa3bc3b9 cxgbe/t4_tom: toepcb should be all-zero on allocation because the code
that cleans up on failure assumes that non-NULL values indicate
initialized items.

Sponsored by:	Chelsio Communications
2016-09-05 19:37:47 +00:00
Navdeep Parhar
671bf2b8b2 cxgbe(4): Changes to the CPL-handler registration mechanism and code
related to "shared" CPLs.

a) Combine t4_set_tcb_field and t4_set_tcb_field_rpl into a single
function.  Allow callers to direct the response to any iq.  Tidy up
set_ulp_mode_iscsi while there to use names from t4_tcb.h instead of
magic constants.

b) Remove all CPL handler tables from struct adapter.  This reduces its
size by around 2KB.  All handlers are now registered at MOD_LOAD instead
of attach or some kind of initialization/activation.  The registration
functions do not need an adapter parameter any more.

c) Add per-iq handlers to deal with CPLs whose destination cannot be
determined solely from the opcode.  There are 2 such CPLs in use right
now: SET_TCB_RPL and L2T_WRITE_RPL.  The base driver continues to send
filter and L2T_WRITEs over the mgmtq and solicits the reply on fwq.
t4_tom (including the DDP code) now uses the port's ctrlq to send
L2T_WRITEs and SET_TCB_FIELDs and solicits the reply on an ofld_rxq.
fwq and ofld_rxq have different handlers that know what kind of tid to
expect in the reply.  Update t4_write_l2e and callers to to support any
wrq/iq combination.

Approved by:	re@ (kib@)
Sponsored by:	Chelsio Communications
2016-07-05 01:29:24 +00:00
Navdeep Parhar
40bf7442fa cxgbe: catch up with the latest hardware-related definitions.
Obtained from:	Chelsio Communications
Sponsored by:	Chelsio Communications
2016-02-19 00:29:16 +00:00
Gleb Smirnoff
f353ae1c62 More fixes to the build. 2016-01-27 05:15:53 +00:00
John Baldwin
fe2ebb7644 Add support for configuring additional virtual interfaces (VIs) on a port.
Each virtual interface has its own MAC address, queues, and statistics.
The dedicated netmap interfaces (ncxgbeX / ncxlX) were already implemented
as additional VIs on each port.  This change allows additional non-netmap
interfaces to be configured on each port.  Additional virtual interfaces
use the naming scheme vcxgbeX or vcxlX.

Additional VIs are enabled by setting the hw.cxgbe.num_vis tunable to a
value greater than 1 before loading the cxgbe(4) or cxl(4) driver.
NB: The first VI on each port is the "main" interface (cxgbeX or cxlX).

T4/T5 NICs provide a limited number of MAC addresses for each physical port.
As a result, a maximum of six VIs can be configured on each port (including
the "main" interface and the netmap interface when netmap is enabled).

One user-visible result is that when netmap is enabled, packets received
or transmitted via the netmap interface are no longer counted in the stats
for the "main" interface, but are not accounted to the netmap interface.

The netmap interfaces now also have a new-bus device and export various
information sysctl nodes via dev.n(cxgbe|cxl).X.

The cxgbetool 'clearstats' command clears the stats for all VIs on the
specified port along with the port's stats.  There is currently no way to
clear the stats of an individual VI.

Reviewed by:	np
MFC after:	1 month
Sponsored by:	Chelsio
2015-12-03 00:02:01 +00:00
Julien Charbon
ff9b006d61 Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:
- The existing TCP INP_INFO lock continues to protect the global inpcb list
  stability during full list traversal (e.g. tcp_pcblist()).

- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
  and free) and inpcb global counters.

It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.

PR:			183659
Differential Revision:	https://reviews.freebsd.org/D2599
Reviewed by:		jhb, adrian
Tested by:		adrian, nitroboost-gmail.com
Sponsored by:		Verisign, Inc.
2015-08-03 12:13:54 +00:00
Navdeep Parhar
19abdd0654 cxgbe/tom: don't leak resources tied to an active open request that
cannot be sent to the chip because a prerequisite L2 resolution
failed.

Submitted by:	Hariprasad at chelsio dot com (original version)
MFC after:	2 weeks.
2014-10-07 21:26:22 +00:00
Gleb Smirnoff
66e01d73cd - Provide necessary includes.
- Remove unnecessary includes.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-29 11:17:49 +00:00
Navdeep Parhar
6eb3180fb2 - Make note of interface MTU change if the rx queues exist, and not just
when the interface is up.
- Add a tunable to control the TOE's rx coalesce feature (enabled by
  default as it always has been).  Consider the interface MTU or the
  coalesce size when deciding which cluster zone to use to fill the
  offload rx queue's free list.  The tunable is:
  dev.{t4nex,t5nex}.<N>.toe.rx_coalesce

MFC after:	1 day
2013-07-04 21:19:01 +00:00
Navdeep Parhar
054a2dc11c The T5 allows the driver to specify the ISS. Do so; use the ISS picked
by the kernel.

MFC after:	1 day
2013-07-04 18:41:21 +00:00
Navdeep Parhar
c337fa30af - Read all TP parameters in one place.
- Read the filter mode, calculate various shifts, and use them
  properly during active open (in select_ntuple).

MFC after:	1 day
2013-07-04 17:55:52 +00:00
Navdeep Parhar
b7a7c6d0c3 cxgbe/tom: Slight simplification of code that calculates options2.
MFC after:	3 days
2013-04-11 21:36:01 +00:00
Navdeep Parhar
d14b0ac129 cxgbe(4): Add support for Chelsio's Terminator 5 (aka T5) ASIC. This
includes support for the NIC and TOE features of the 40G, 10G, and
1G/100M cards based on the T5.

The ASIC is mostly backward compatible with the Terminator 4 so cxgbe(4)
has been updated instead of writing a brand new driver.  T5 cards will
show up as cxl (short for cxlgb) ports attached to the t5nex bus driver.

Sponsored by:	Chelsio
2013-03-30 02:26:20 +00:00
Navdeep Parhar
dfd1b3a02f Add a couple of missing error codes. Treat CPL_ERR_KEEPALV_NEG_ADVICE as
negative advice and not a fatal error.

MFC after:	3 days
2013-01-26 03:01:51 +00:00
Navdeep Parhar
8be0815671 cxgbe/tom: Add support for fully offloaded TCP/IPv6 connections (active open).
MFC after:	1 week
2013-01-15 18:38:51 +00:00
Navdeep Parhar
c91bcaaab5 Minor cleanup: use bitwise ops instead of pointless wrappers around
setbit/clrbit.
2012-08-21 18:30:16 +00:00
Navdeep Parhar
06fd9875aa Correctly handle the case where an inp has already been dropped by the time
the TOE driver reports that an active open failed.  toe_connect_failed is
supposed to handle this but it should be provided the inpcb instead of the
tcpcb which may no longer be around.
2012-08-21 18:09:33 +00:00
Navdeep Parhar
e682d02e12 Support for TCP DDP (Direct Data Placement) in the T4 TOE module.
Basically, this is automatic rx zero copy when feasible.  TCP payload is
DMA'd directly into the userspace buffer described by the uio submitted
in soreceive by an application.

- Works with sockets that are being handled by the TCP offload engine
  of a T4 chip (you need t4_tom.ko module loaded after cxgbe, and an
  "ifconfig +toe" on the cxgbe interface).
- Does not require any modification to the application.
- Not enabled by default.  Use hw.t4nex.<X>.toe.ddp="1" to enable it.
2012-08-17 00:49:29 +00:00
Navdeep Parhar
09fe63205c - Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
  These are available as t3_tom and t4_tom modules that augment cxgb(4)
  and cxgbe(4) respectively.  The cxgb/cxgbe drivers continue to work as
  usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs).  T4 iWARP in the
  works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload?  Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded?  Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by:	bz, gnn
Sponsored by:	Chelsio communications.
MFC after:	~3 months (after 9.1, and after ensuring MFC is feasible)
2012-06-19 07:34:13 +00:00