Commit Graph

68 Commits

Author SHA1 Message Date
John Baldwin
8a82be5044 Handle CPL_RX_DATA on active TLS sockets.
In certain edge cases, the NIC might have only received a partial TLS
record which it needs to return to the driver.  For example, if the
local socket was closed while data was still in flight, a partial TLS
record might be pending when the connection is closed.  Receiving a
RST in the middle of a TLS record is another example.  When this
happens, the firmware returns the the partial TLS record as plain TCP
data via CPL_RX_DATA.  Handle these requests by returning an error to
OpenSSL (via so_error for KTLS or via an error TLS record header for
the older Chelsio OpenSSL interface).

Reported by:	Sony Arpita Das @ Chelsio
Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	Revision: https://reviews.freebsd.org/D26800
2020-10-23 00:23:54 +00:00
John Baldwin
8cce4145fa Add support for KTLS RX over TOE to T6.
This largely reuses the TLS TOE support added in r330884.  However,
this uses the KTLS framework in upstream OpenSSL rather than requiring
Chelsio-specific patches to OpenSSL.  As with the existing TLS TOE
support, use of RX offload requires setting the tls_rx_ports sysctl.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24453
2020-04-27 23:59:42 +00:00
Alexander V. Chernikov
8d6708ba80 Convert TOE routing lookups to the new routing KPI.
Reviewed by:	np
Differential Revision:	https://reviews.freebsd.org/D24388
2020-04-22 07:53:43 +00:00
John Baldwin
708652acc4 Set inp_flowid's for TOE connections.
KTLS uses the flowid to distribute software encryption tasks among its
pool of worker threads.  Without this change, all software KTLS
requests for TOE sockets ended up on the first worker thread.

Note that the flowid for TOE sockets created via connect() is not a
hash of the 4-tuple, but is instead the id of the TOE pcb (tid).  The
flowid of TOE sockets created from TOE listen sockets do use the
4-tuple RSS hash as the flowid since the firmware provides the hash in
the message containing the original SYN.

Reviewed by:	np (earlier version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24348
2020-04-15 19:28:51 +00:00
Navdeep Parhar
843b264a85 cxgbe(4): Make sure 'flags' is at the same offset in structs toepcb and
synq_entry.  TAILQ_ENTRY isn't always the same size as two pointers.

Reported by:	rmacklem@
MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-04-13 20:12:47 +00:00
Navdeep Parhar
7ba6f5493d cxgbe/t4_tom: Do not uninitialize a toepcb that has not been initialized.
This fixes the following panic:
--- trap 0xc, rip = 0xffffffff80c00411, rsp = 0xfffffe0025192840, rbp = 0xfffffe0025192860 ---
vmem_xfree() at vmem_xfree+0xd1/frame 0xfffffe0025192860
tls_uninit_toep() at tls_uninit_toep+0x78/frame 0xfffffe0025192880
free_toepcb() at free_toepcb+0x32/frame 0xfffffe00251928a0
t4_connect() at t4_connect+0x3be/frame 0xfffffe0025192950
tcp_offload_connect() at tcp_offload_connect+0xa4/frame 0xfffffe0025192990
tcp_usr_connect() at tcp_usr_connect+0xec/frame 0xfffffe00251929f0
soconnect() at soconnect+0xae/frame 0xfffffe0025192a30
kern_connectat() at kern_connectat+0xe2/frame 0xfffffe0025192a90
sys_connect() at sys_connect+0x75/frame 0xfffffe0025192ad0
amd64_syscall() at amd64_syscall+0x137/frame 0xfffffe0025192bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0025192bf0
--- syscall (98, FreeBSD ELF64, sys_connect), rip = 0x8008e9d8a, rsp = 0x7fffffffc0f8, rbp = 0x7fffffffc130 ---

Reviewed by:	jhb@
MFC after:	3 days
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D23989
2020-03-06 19:56:12 +00:00
John Baldwin
4f13842f75 Add support for KTLS in the Chelsio TOE module.
This adds a TOE hook to allocate a KTLS session.  It also recognizes
TLS mbufs in the socket buffer and sends those to the NIC using a TLS
work request to encrypt the record before segmenting it.

TOE TLS support must be enabled via the dev.t6nex.<N>.tls sysctl in
addition to enabling KTLS.

Reviewed by:	np, gallatin
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D21891
2019-10-08 21:40:42 +00:00
Navdeep Parhar
c537e887ac cxgbe/t4_tom: Initialize all TOE connection parameters in one place.
Remove now-redundant items from toepcb and synq_entry and the code to
support them.

Let the driver calculate tx_align, rx_coalesce, and sndbuf by default.

Reviewed by:	jhb@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D21387
2019-08-27 04:19:40 +00:00
Mark Johnston
eeacb3b02f Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics.  The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead.  Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended.  __FreeBSD_version is bumped.

Reviewed by:	alc, kib
Discussed with:	jeff
Discussed with:	jhb, np (cxgbe)
Tested by:	pho (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19247
2019-07-08 19:46:20 +00:00
John Baldwin
7b17c92129 Use unmapped (M_NOMAP) mbufs for zero-copy AIO writes via TOE.
Previously the TOE code used its own custom unmapped mbufs via
EXT_FLAG_VENDOR1.  The old version always wired the entire AIO request
buffer first for the duration of the AIO operation and constructed
multiple mbufs which used the wired buffer as an external buffer.

The new version determines how much room is available in the socket
buffer and only wires the pages needed for the available room building
chains of M_NOMAP mbufs.  This means that a large AIO write will now
limit the amount of wired memory it uses to the size of the socket
buffer.

Reviewed by:	gallatin, np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20839
2019-07-03 16:06:11 +00:00
Navdeep Parhar
b7acf27c2e cxgbe/t4_tom: Fix regression in t_maxseg usage within t4_tom.
t_maxseg was changed in r293284 to not have any adjustment for TCP
timestamps.  t4_tom inadvertently went back to pre-r293284 semantics
in r332506.

Sponsored by:	Chelsio Communications
2019-06-28 02:41:17 +00:00
John Baldwin
7f63b888c7 Hold an explicit reference on the socket for the aiotx task.
Previously, the aiotx task relied on the aio jobs in the queue to hold
a reference on the socket.  However, when the last job is completed,
there is nothing left to hold a reference to the socket buffer lock
used to check if the queue is empty.  In addition, if the last job on
the queue is cancelled, the task can run with no queued jobs holding a
reference to the socket buffer lock the task uses to notice the queue
is empty.

Fix these races by holding an explicit reference on the socket when
the task is queued and dropping that reference when the task
completes.

Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20539
2019-06-27 19:36:30 +00:00
Navdeep Parhar
ebb8639822 cxgbe/t4_tom: adjust the hardware receive window to match changes to the
receive sockbuf's high water mark.

Calculate rx credits on the spot instead of tracking sbused/sb_cc and
rx_credits in the toepcb.  The previous method worked when the high
water mark changed due to SB_AUTOSIZE but not when it was adjusted
directly (for example, by the soreserve in nfsrvd_addsock).

This fixes a connection hang while running iozone over an NFS mounted
share where nfsd's TCP sockets are being handled by t4_tom.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2019-06-01 03:03:48 +00:00
Navdeep Parhar
61e02298ce cxgbe/t4_tom: Add a "TCB history" feature that samples hardware state
for a tid and maintains a running history of some interesting events.

Service TCP_INFO queries from the history when the tid is being tracked
there.
2019-04-22 17:48:10 +00:00
Navdeep Parhar
6c5c0137a9 Remove unused macros from t4_tom.h. 2018-12-21 20:46:45 +00:00
Navdeep Parhar
b156a400a6 cxgbe/t4_tom: fixes for issues on the passive open side.
- Fix PR 227760 by getting the TOE to respond to the SYN after the call
  to toe_syncache_add, not during it.  The kernel syncache code calls
  syncache_respond just before syncache_insert.  If the ACK to the
  syncache_respond is processed in another thread it may run before the
  syncache_insert and won't find the entry.  Note that this affects only
  t4_tom because it's the only driver trying to insert and expand
  syncache entries from different threads.

- Do not leak resources if an embryonic connection terminates at
  SYN_RCVD because of L2 lookup failures.

- Retire lctx->synq and associated code because there is never a need to
  walk the list of embryonic connections associated with a listener.
  The per-tid state is still called a synq entry in the driver even
  though the synq itself is now gone.

PR:		227760
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2018-12-19 01:37:00 +00:00
John Baldwin
78afed1396 Move CLIP table handling out of TOM and into the base driver.
- Store the clip table in 'struct adapter' instead of in the TOM softc.
- Init the clip table during attach and teardown during detach.
- While here, add a dev.<nexus>.<unit>.misc.clip sysctl to dump the
  CLIP table.

This does mean that we update the clip table even if TOE is not enabled,
but non-TOE things need the CLIP table anyway.

Reviewed by:	np, Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D18010
2018-11-29 01:15:53 +00:00
John Baldwin
f0aefccb70 Restore the <sys/vmem.h> header to fix build of cxgbe(4) TOM.
vmem's are not just used for TLS memory in TOM and the #include actually
predates the TLS code so should not have been removed when the TLS vmem
moved in r340466.

Pointy hat to:	jhb
Sponsored by:	Chelsio Communications
2018-11-16 01:27:24 +00:00
John Baldwin
bc13c69bef Move the TLS key map into the adapter softc so non-TOE code can use it.
Sponsored by:	Chelsio Communications
2018-11-15 23:00:30 +00:00
Navdeep Parhar
408954013a Whitespace nit in t4_tom.h 2018-08-13 19:21:28 +00:00
Navdeep Parhar
4535e8046f cxgbe(4): Use opaque cookies or tid range-checks to determine the
intended recipient of a CPL when it can't be determined solely from the
opcode.  Retire the per-queue handlers for such CPLs in favor of the new
scheme.

Sponsored by:	Chelsio Communications
2018-04-30 15:18:38 +00:00
Navdeep Parhar
8896672a77 cxgbe(4): Move release_tid to the base NIC driver for future consumers.
Sponsored by:	Chelsio Communications.
2018-04-26 22:04:21 +00:00
Navdeep Parhar
1131c927c4 cxgbe(4): Add support for Connection Offload Policy (aka COP).
COP allows fine-grained control on whether to offload a TCP connection
using t4_tom, and what settings to apply to a connection selected for
offload.  t4_tom must still be loaded and IFCAP_TOE must still be
enabled for full TCP offload to take place on an interface.  The
difference is that IFCAP_TOE used to be the only knob and would enable
TOE for all new connections on the inteface, but now the driver will
also consult the COP, if any, before offloading to the hardware TOE.

A policy is a plain text file with any number of rules, one per line.
Each rule has a "match" part consisting of a socket-type (L = listen,
A = active open, P = passive open, D = don't care) and a pcap-filter(7)
expression, and a "settings" part that specifies whether to offload the
connection or not and the parameters to use if so.  The general format
of a rule is: [socket-type] expr => settings

Example.  See cxgbetool(8) for more information.
[L] ip && port http => offload
[L] port 443 => !offload
[L] port ssh => offload
[P] src net 192.168/16 && dst port ssh => offload !nagle !timestamp cong newreno
[P] dst port ssh => offload !nagle ecn cong tahoe
[P] dst port http => offload
[A] dst port 443 => offload tls
[A] dst net 192.168/16 => offload !timestamp cong highspeed

The driver processes the rules for each new listen, active open, or
passive open and stops at the first match.  There is an implicit rule at
the end of every policy that prohibits offload when no rule in the
policy matches:
[D] all => !offload

This is a reworked and expanded version of a patch submitted by
Krishnamraju Eraparaju @ Chelsio.

Sponsored by:	Chelsio Communications
2018-04-14 19:07:56 +00:00
John Baldwin
edf95feba4 Use the offload transmit queue to set flags on TLS connections.
Requests to modify the state of TLS connections need to be sent on the
same queue as TLS record transmit requests to ensure ordering.

However, in order to use the offload transmit queue in t4_set_tcb_field(),
the function needs to be updated to do proper flow control / credit
management when queueing a request to an offload queue.  This required
passing a pointer to the toepcb itself to this function, so while here
remove the 'tid' and 'iqid' parameters and obtain those values from the
toepcb in t4_set_tcb_field() itself.

Submitted by:	Harsh Jain @ Chelsio (original version)
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14871
2018-03-27 20:54:57 +00:00
John Baldwin
02d2bcfaba Remove TLS-related inlines from t4_tom.h to fix iw_cxgbe(4) build.
- Remove the one use of is_tls_offload() and the function.  AIO special
  handling only needs to be disabled when a TOE socket is actively doing
  TLS offload on transmit.  The TOE socket's mode (which affects receive
  operation) doesn't matter, so remove the check for the socket's mode and
  only check if a TOE socket has TLS transmit keys configured to determine
  if an AIO write request should fall back to the normal socket handling
  instead of the TOE fast path.
- Move can_tls_offload() into t4_tls.c.  It is not used in critical paths,
  so inlining isn't that important.  Change return type to bool while here.

Sponsored by:	Chelsio Communications
2018-03-14 20:46:25 +00:00
John Baldwin
1e9538d253 Support for TLS offload of TOE connections on T6 adapters.
The TOE engine in Chelsio T6 adapters supports offloading of TLS
encryption and TCP segmentation for offloaded connections.  Sockets
using TLS are required to use a set of custom socket options to upload
RX and TX keys to the NIC and to enable RX processing.  Currently
these socket options are implemented as TCP options in the vendor
specific range.  A patched OpenSSL library will be made available in a
port / package for use with the TLS TOE support.

TOE sockets can either offload both transmit and reception of TLS
records or just transmit.  TLS offload (both RX and TX) is enabled by
setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be
enabled on the relevant interface.  Transmit offload can be used on
any "normal" or TLS TOE socket by using the custom socket option to
program a transmit key.  This permits most TOE sockets to
transparently offload TLS when applications use a patched SSL library
(e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL
library).  Receive offload can only be used with TOE sockets using the
TLS mode.  The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a
list of TCP port numbers.  Any connection with either a local or
remote port number in that list will be created as a TLS socket rather
than a plain TOE socket.  Note that although this sysctl accepts an
arbitrary list of port numbers, the sysctl(8) tool is only able to set
sysctl nodes to a single value.  A TLS socket will hang without
receiving data if used by an application that is not using a patched
SSL library.  Thus, the tls_rx_ports node should be used with care.
For a server mostly concerned with offloading TLS transmit, this node
is not needed as plain TOE sockets will fall back to software crypto
when using an unpatched SSL library.

New per-interface statistics nodes are added giving counts of TLS
packets and payload bytes (payload bytes do not include TLS headers or
authentication tags/MACs) offloaded via the TOE engine, e.g.:

dev.cc.0.stats.rx_tls_octets: 149
dev.cc.0.stats.rx_tls_records: 13
dev.cc.0.stats.tx_tls_octets: 26501823
dev.cc.0.stats.tx_tls_records: 1620

TLS transmit work requests are constructed by a new variant of
t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.

TLS transmit work requests require a buffer containing IVs.  If the
IVs are too large to fit into the work request, a separate buffer is
allocated when constructing a work request.  This buffer is associated
with the transmit descriptor and freed when the descriptor is ACKed by
the adapter.

Received TLS frames use two new CPL messages.  The first message is a
CPL_TLS_DATA containing the decryped payload of a single TLS record.
The handler places the mbuf containing the received payload on an
mbufq in the TOE pcb.  The second message is a CPL_RX_TLS_CMP message
which includes a copy of the TLS header and indicates if there were
any errors.  The handler for this message places the TLS header into
the socket buffer followed by the saved mbuf with the payload data.
Both of these handlers are contained in tom/t4_tls.c.

A few routines were exposed from t4_cpl_io.c for use by t4_tls.c
including send_rx_credits(), a new send_rx_modulate(), and
t4_close_conn().

TLS keys for both transmit and receive are stored in onboard memory
in the NIC in the "TLS keys" memory region.

In some cases a TLS socket can hang with pending data available in the
NIC that is not delivered to the host.  As a workaround, TLS sockets
are more aggressive about sending CPL_RX_DATA_ACK messages anytime that
any data is read from a TLS socket.  In addition, a fallback timer will
periodically send CPL_RX_DATA_ACK messages to the NIC for connections
that are still in the handshake phase.  Once the connection has
finished the handshake and programmed RX keys via the socket option,
the timer is stopped.

A new function select_ulp_mode() is used to determine what sub-mode a
given TOE socket should use (plain TOE, DDP, or TLS).  The existing
set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and
handles initialization of TLS-specific state when necessary in
addition to DDP-specific state.

Since TLS sockets do not receive individual TCP segments but always
receive full TLS records, they can receive more data than is available
in the current window (e.g. if a 16k TLS record is received but the
socket buffer is itself 16k).  To cope with this, just drop the window
to 0 when this happens, but track the overage and "eat" the overage as
it is read from the socket buffer not opening the window (or adding
rx_credits) for the overage bytes.

Reviewed by:	np (earlier version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14529
2018-03-13 23:05:51 +00:00
John Baldwin
9689995d23 Simplify error handling in t4_tom.ko module loading.
- Change t4_ddp_mod_load() to return void instead of always returning
  success.  This avoids having to pretend to have proper support for
  unloading when only part of t4_tom_mod_load() has run.
- If t4_register_uld() fails, don't invoke t4_tom_mod_unload() directly.
  The module handling code in the kernel invokes MOD_UNLOAD on a module
  whose MOD_LOAD fails with an error already.

Reviewed by:	np (part of a larger patch)
MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-03-13 21:42:38 +00:00
John Baldwin
125d42fe81 Move DDP PCB state into a helper structure.
This consolidates all of the DDP state in one place.  Also, the code has
now been fixed to ensure that DDP state is only accessed for DDP
connections.  This should not be a functional change but makes it cleaner
and easier to add state for other TOE socket modes in the future.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-02-22 01:50:30 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 14:52:40 +00:00
John Baldwin
91a65e2f6b Avoid reusing the wrong buffer for a DDP AIO request.
To optimize the case of ping-ponging between two buffers, the DDP code
caches the last two buffers used keeping the pages wired and page pods
stored in the NIC's RAM.  If a new aio_read() request uses one of the
same buffers, then the work of holding pages, etc. can be avoided.
However, the starting virtual address of an aio buffer was not saved,
only the page count, length, and initial page offset.  Thus, an
aio_read() request could match a different buffer in the address
space.  (Earlier during development vm_fault_hold_quick_pages() was
always called and the vm_page_t values were compared, but that was
eventually removed without being adequately replaced.)  Fix by storing
the starting virtual address and comparing that (along with other
fields) to determine if a buffer can be reused.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-09-15 22:40:57 +00:00
Navdeep Parhar
6790499792 cxgbe/t4_tom: Per-connection rate limiting for TCP sockets handled by
the TOE.  For now this capability is always enabled in kernels with
options RATELIMIT.  t4_tom will check if_capenable once the base driver
gets code to support rate limiting for any socket (TOE or not).

This was tested with iperf3 and netperf ToT as they already support
SO_MAX_PACING_RATE sockopt.  There is a bug in firmwares prior to
1.16.45.0 that affects the BSD driver only and results in rate-limiting
at an incorrect rate.  This will resolve by itself as soon as 1.16.45.0
or later firmware shows up in the driver.

Relnotes:	Yes
Sponsored by:	Chelsio Communications
2017-05-05 20:06:49 +00:00
Navdeep Parhar
eaf5669483 cxgbe/t4_tom: Fix CLIP entry refcounting on the passive side. Every
IPv6 connection being handled by the TOE should have a reference on its
CLIP entry.

Sponsored by:	Chelsio Communications
2017-02-06 17:48:25 +00:00
John Baldwin
fbb779cca9 Unregister CPL handlers for TOE-related messages when unloading TOM.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-01-27 23:08:30 +00:00
Navdeep Parhar
a342904bb5 cxgbe/tom: Add VIMAGE support to the TOE driver.
Active Open:
- Save the socket's vnet at the time of the active open (t4_connect) and
  switch to it when processing the reply (do_act_open_rpl or
  do_act_establish).

Passive Open:
- Save the listening socket's vnet in the driver's listen_ctx and switch
  to it when processing incoming SYNs for the socket.
- Reject SYNs that arrive on an ifnet that's not in the same vnet as the
  listening socket.

CLIP (Compressed Local IPv6) table:
- Add only those IPv6 addresses to the CLIP that are in a vnet
  associated with one of the card's ifnets.

Misc:
- Set vnet from the toepcb when processing TCP state transitions.
- The kernel sets the vnet when calling the driver's output routine
  so t4_push_frames runs in proper vnet context already.  One exception
  is when incoming credits trigger tx within the driver's ithread.  Set
  the vnet explicitly in do_fw4_ack for that case.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-01-11 23:48:17 +00:00
Navdeep Parhar
d663d5ca23 cxgbe/t4_tom: Fix tid accounting. An offloaded IPv6 connection uses 2
tids, not 1, in the hardware.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-01-07 20:26:19 +00:00
Navdeep Parhar
a9feb2cdbb cxgbe/t4_tom: Two new routines to allocate and write page pods for a
buffer in the kernel's address space.
2016-09-01 00:51:59 +00:00
Navdeep Parhar
968267fdb8 cxgbe/t4_tom: Add general purpose routines to deal with page pod regions
and allocations within them.  Switch to these routines to manage the TOE
DDP region.

Sponsored by:	Chelsio Communications
2016-08-31 23:23:46 +00:00
Navdeep Parhar
515b36c5b5 cxgbe/t4_tom: Read the chip's DDP page sizes and save them in a
per-adapter data structure.  This replaces a global array with hardcoded
page sizes.

Sponsored by:	Chelsio Communications
2016-08-02 23:54:21 +00:00
John Baldwin
07159830be Add support for zero-copy aio_write() on TOE sockets.
AIO write requests for a TOE socket on a Chelsio T4+ adapter can now
DMA directly from the user-supplied buffer.  This is implemented by
wiring the pages backing the user-supplied buffer and queueing special
mbufs backed by raw VM pages to the socket buffer.  The TOE code
recognizes these special mbufs and builds a sglist from the VM page
array associated with the mbuf when queueing a work request to the TOE.

Because these mbufs do not have an associated virtual address, m_data
is not valid.  Thus, the AIO handler does not invoke sosend() directly
for these mbufs but instead inlines portions of sosend_generic() and
tcp_usr_send().

An aiotx_buffer structure is used to describe the user buffer (e.g.
it holds the array of VM pages and a reference to the AIO job).  The
special mbufs reference this structure via m_ext.  Note that a single
job might be split across multiple mbufs (e.g. if it is larger than
the socket buffer size).  The 'ext_arg2' member of each mbuf gives an
offset relative to the backing aiotx_buffer.  The AIO job associated
with an aiotx_buffer structure is completed when the last reference to
the structure is released.

Zero-copy aio_write()'s for connections associated with a given
adapter can be enabled/disabled at runtime via the
'dev.t[45]nex.N.toe.tx_zcopy' sysctl.

MFC after:	1 month
Relnotes:	yes
Sponsored by:	Chelsio Communications
2016-07-27 18:29:35 +00:00
Navdeep Parhar
671bf2b8b2 cxgbe(4): Changes to the CPL-handler registration mechanism and code
related to "shared" CPLs.

a) Combine t4_set_tcb_field and t4_set_tcb_field_rpl into a single
function.  Allow callers to direct the response to any iq.  Tidy up
set_ulp_mode_iscsi while there to use names from t4_tcb.h instead of
magic constants.

b) Remove all CPL handler tables from struct adapter.  This reduces its
size by around 2KB.  All handlers are now registered at MOD_LOAD instead
of attach or some kind of initialization/activation.  The registration
functions do not need an adapter parameter any more.

c) Add per-iq handlers to deal with CPLs whose destination cannot be
determined solely from the opcode.  There are 2 such CPLs in use right
now: SET_TCB_RPL and L2T_WRITE_RPL.  The base driver continues to send
filter and L2T_WRITEs over the mgmtq and solicits the reply on fwq.
t4_tom (including the DDP code) now uses the port's ctrlq to send
L2T_WRITEs and SET_TCB_FIELDs and solicits the reply on an ofld_rxq.
fwq and ofld_rxq have different handlers that know what kind of tid to
expect in the reply.  Update t4_write_l2e and callers to to support any
wrq/iq combination.

Approved by:	re@ (kib@)
Sponsored by:	Chelsio Communications
2016-07-05 01:29:24 +00:00
John Baldwin
dc9643853d Use DDP to implement zerocopy TCP receive with aio_read().
Chelsio's TCP offload engine supports direct DMA of received TCP payload
into wired user buffers.  This feature is known as Direct-Data Placement.
However, to scale well the adapter needs to prepare buffers for DDP
before data arrives.  aio_read() is more amenable to this requirement than
read() as applications often call read() only after data is available in
the socket buffer.

When DDP is enabled, TOE sockets use the recently added pru_aio_queue
protocol hook to claim aio_read(2) requests instead of letting them use
the default AIO socket logic.  The DDP feature supports scheduling DMA
to two buffers at a time so that the second buffer is ready for use
after the first buffer is filled.  The aio/DDP code optimizes the case
of an application ping-ponging between two buffers (similar to the
zero-copy bpf(4) code) by keeping the two most recently used AIO buffers
wired.  If a buffer is reused, the aio/DDP code is able to reuse the
vm_page_t array as well as page pod mappings (a kind of MMU mapping the
Chelsio NIC uses to describe user buffers).  The generation of the
vmspace of the calling process is used in conjunction with the user
buffer's address and length to determine if a user buffer matches a
previously used buffer.  If an application queues a buffer for AIO that
does not match a previously used buffer then the least recently used
buffer is unwired before the new buffer is wired.  This ensures that no
more than two user buffers per socket are ever wired.

Note that this feature is best suited to applications sending a steady
stream of data vs short bursts of traffic.

Discussed with:	np
Relnotes:	yes
Sponsored by:	Chelsio Communications
2016-05-07 00:33:35 +00:00
Navdeep Parhar
9eb533d3b4 cxgbe(4): Updates to the base NIC driver and t4_tom to support the iSCSI
offload driver.  These changes come from projects/cxl_iscsi.
2015-12-26 00:26:02 +00:00
John Baldwin
fe2ebb7644 Add support for configuring additional virtual interfaces (VIs) on a port.
Each virtual interface has its own MAC address, queues, and statistics.
The dedicated netmap interfaces (ncxgbeX / ncxlX) were already implemented
as additional VIs on each port.  This change allows additional non-netmap
interfaces to be configured on each port.  Additional virtual interfaces
use the naming scheme vcxgbeX or vcxlX.

Additional VIs are enabled by setting the hw.cxgbe.num_vis tunable to a
value greater than 1 before loading the cxgbe(4) or cxl(4) driver.
NB: The first VI on each port is the "main" interface (cxgbeX or cxlX).

T4/T5 NICs provide a limited number of MAC addresses for each physical port.
As a result, a maximum of six VIs can be configured on each port (including
the "main" interface and the netmap interface when netmap is enabled).

One user-visible result is that when netmap is enabled, packets received
or transmitted via the netmap interface are no longer counted in the stats
for the "main" interface, but are not accounted to the netmap interface.

The netmap interfaces now also have a new-bus device and export various
information sysctl nodes via dev.n(cxgbe|cxl).X.

The cxgbetool 'clearstats' command clears the stats for all VIs on the
specified port along with the port's stats.  There is currently no way to
clear the stats of an individual VI.

Reviewed by:	np
MFC after:	1 month
Sponsored by:	Chelsio
2015-12-03 00:02:01 +00:00
John Baldwin
b12c0a9eeb Move special DDP handling for closing a connection into a new
handle_ddp_close() function in t4_ddp.c as the logic is similar
to handle_ddp_data().  This allows all knowledge of the special
DDP mbufs to be private to t4_ddp.c as well.
2015-03-16 15:56:06 +00:00
Navdeep Parhar
db8bcd1b21 cxgbe/tom: allocate page pod addresses instead of ppod#.
MFC after:	2 weeks
2015-01-07 06:20:33 +00:00
Navdeep Parhar
f8c479085f cxgbe/tom: use vmem(9) as the DDP page pod allocator.
MFC after:	1 month
2015-01-06 01:30:32 +00:00
Navdeep Parhar
a7570ee305 Move KTR_CXGBE from t4_tom.h to adapter.h so that the base if_cxgbe
code can use it too.

MFC after:	1 week
2014-12-12 21:54:59 +00:00
Navdeep Parhar
19abdd0654 cxgbe/tom: don't leak resources tied to an active open request that
cannot be sent to the chip because a prerequisite L2 resolution
failed.

Submitted by:	Hariprasad at chelsio dot com (original version)
MFC after:	2 weeks.
2014-10-07 21:26:22 +00:00
Navdeep Parhar
0fe982772d Some hooks in cxgbe(4) for the offloaded iSCSI driver.
(I'm committing this on behalf of my colleagues in the Storage team
at Chelsio).

Submitted by:	Sreenivasa Honnur <shonnur at chelsio dot com>
Sponsored by:	Chelsio Communications.
2014-07-24 18:39:08 +00:00
Navdeep Parhar
88bb82e511 Do not create a hardware IPv6 server if the listen address is not
in6addr_any and is not in the CLIP table either.  This fixes a reported
TOE+IPv6 NULL-dereference panic in do_pass_open_rpl().

While here, stop creating hardware servers for any loopback address.
It's just a waste of server tids.

MFC after:	1 week
2013-12-17 21:41:23 +00:00