Commit Graph

992 Commits

Author SHA1 Message Date
Conrad Meyer
1b0909d51a OpenCrypto: Convert sessions to opaque handles instead of integers
Track session objects in the framework, and pass handles between the
framework (OCF), consumers, and drivers.  Avoid redundancy and complexity in
individual drivers by allocating session memory in the framework and
providing it to drivers in ::newsession().

Session handles are no longer integers with information encoded in various
high bits.  Use of the CRYPTO_SESID2FOO() macros should be replaced with the
appropriate crypto_ses2foo() function on the opaque session handle.

Convert OCF drivers (in particular, cryptosoft, as well as myriad others) to
the opaque handle interface.  Discard existing session tracking as much as
possible (quick pass).  There may be additional code ripe for deletion.

Convert OCF consumers (ipsec, geom_eli, krb5, cryptodev) to handle-style
interface.  The conversion is largely mechnical.

The change is documented in crypto.9.

Inspired by
https://lists.freebsd.org/pipermail/freebsd-arch/2018-January/018835.html .

No objection from:	ae (ipsec portion)
Reported by:	jhb
2018-07-18 00:56:25 +00:00
Navdeep Parhar
069262a734 Fix vertical whitespace nit in cxgbe. 2018-07-10 06:09:25 +00:00
Navdeep Parhar
82df14c3ab cxgbe(4): Add a sysctl to report the chip's microprocessor's load
averages.  This works with debug or custom firmwares only.

sysctl dev.<nexus>.<instance>.loadavg
sysctl dev.t6nex.0.loadavg

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-07-10 03:03:10 +00:00
Navdeep Parhar
4d96a1b772 cxgbe(4): Assume that any unknown flash on the card is 4MB and has 64KB
sectors, instead of refusing to attach to the card.

Submitted by:	Casey Leedom @ Chelsio
MFC after:	3 days
Sponsored by:	Chelsio Communications
2018-07-06 19:33:58 +00:00
Matt Macy
6573d7580b epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
  there's no longer any benefit to dropping the pcbinfo lock
  and trying to do so just adds an error prone branchfest to
  these functions
- Remove cases of same function recursion on the epoch as
  recursing is no longer free.
- Remove the the TAILQ_ENTRY and epoch_section from struct
  thread as the tracker field is now stack or heap allocated
  as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
2018-07-04 02:47:16 +00:00
Navdeep Parhar
d207040226 cxgbe/cxgbei: Fix harmful typo in the iSCSI offload driver.
Reported by:	gcc8 (via mmacy@)
MFC after:	3 days
Sponsored by:	Chelsio Communications
2018-06-27 14:29:13 +00:00
Navdeep Parhar
af8854fdc1 cxgbe(4): Do not leak the filters in the hashfilter table on module
unload.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-06-27 01:51:17 +00:00
Navdeep Parhar
b605d9cd51 cxgbe(4): Some mailbox commands require access to the Tx pipeline and
can time out if it's backed up due to a non-stop deluge of PAUSE frames
from a misbehaving peer.  Detect this situation and toggle MPS TxEn
to allow forward progress.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2018-06-19 00:50:27 +00:00
Navdeep Parhar
0afe96c7bf cxgbe(4): Add a hw.cxgbe.starve_fl sysctl that can be used to starve the
freelists of netmap receive queues.  This is primarily to test various
congestion scenarios in the chip.

Sponsored by:	Chelsio Communications
2018-06-15 23:42:22 +00:00
Navdeep Parhar
31f494cdf5 cxgbe(4): Track the number of received frames separately from the number
of descriptors processed.  Add the ability to gather a certain maximum
number of frames in the driver's rx before waking up netmap rx.  If
there aren't enough frames then netmap rx will be woken up as usual.

hw.cxgbe.nm_rx_nframes

Sponsored by:	Chelsio Communications
2018-06-15 21:23:03 +00:00
Navdeep Parhar
6ddad9de86 cxgbe(4): sysctls to display the local and intr CPUs for the adapter.
The driver assumes the list can change (even though it does't right now)
and queries it every time the sysctl runs.

sysctl dev.<nexus>.<inst>.local_cpus
sysctl dev.<nexus>.<inst>.intr_cpus

sysctl dev.t6nex.0.local_cpus
sysctl dev.t6nex.0.intr_cpus

Sponsored by:	Chelsio Communications
2018-06-15 18:04:44 +00:00
Navdeep Parhar
2ea5b0f54a cxgbe(4): Catch up with recent changes in the kernel -- it no longer
holds non-sleepable locks around any of the driver ioctls.

Sponsored by:	Chelsio Communications
2018-06-14 01:27:35 +00:00
Navdeep Parhar
1bb577b4c2 cxgbe(4): Remove homemade version of htobe32 from the driver.
It was needed only for ia64 where it was implemented as a call to
bswapXX, which was always a real function.  htobeXX with a constant
argument is calculated at compile-time everywhere else.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-06-12 06:46:03 +00:00
Navdeep Parhar
c27fcc70cc cxgbe(4): Include full duplex mediaopt in media that can be reported as
active.  Always report full duplex in active media.

Sponsored by:	Chelsio Communications
2018-06-01 16:46:29 +00:00
Navdeep Parhar
b9330ed7a2 cxgbe(4): Retire an old check. 2018-06-01 01:05:34 +00:00
Navdeep Parhar
2c87bdc706 cxgbe(4): Add support for SMAC-rewriting filters.
Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
2018-05-31 21:56:57 +00:00
Navdeep Parhar
2dae2a7487 cxgbe(4): Add code to deal with the chip's source MAC table (aka SMT).
Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
2018-05-31 21:31:08 +00:00
Navdeep Parhar
1e3e6b634e cxgbe(4): Use ifm for ifmedia just like the rest of the kernel.
No functional change.
2018-05-31 02:22:40 +00:00
Navdeep Parhar
7cff4fd2d7 cxgbe(4): Implement ifm_change callback.
Sponsored by:	Chelsio Communications
2018-05-31 02:10:50 +00:00
Navdeep Parhar
56226f5673 cxgbe(4): Consider all supported speeds when building the ifmedia list
for a port.  Fix other related issues while here:
- Require port lock for access to link_config.
- Allow 100Mbps operation by tracking the speed in Mbps.  Yes, really.
- New port flag to indicate that the media list is immutable.  It will
  be used in future refinements.

This also fixes a bug where the driver reports incorrect media with
recent firmwares.

MFC after:	2 days
Sponsored by:	Chelsio Communications
2018-05-30 22:36:09 +00:00
Navdeep Parhar
c3fce948fb cxgbe(4): Suppress a warning about code that is used only with options
RATELIMIT.

Reported by:	mmacy@
2018-05-25 18:57:41 +00:00
Navdeep Parhar
475d42db4a cxgbe(4): Report IFCAP_TXRTLMT to kernels built with RATELIMIT if the
firmware has provisioned resources for this feature.

Sponsored by:	Chelsio Communications
2018-05-24 10:55:26 +00:00
Navdeep Parhar
786099de5e cxgbe(4): Data path for rate-limited tx.
This is hardware support for the SO_MAX_PACING_RATE sockopt (see
setsockopt(2)), which is available in kernels built with "options
RATELIMIT".

Relnotes:	Yes
Sponsored by:	Chelsio Communications
2018-05-24 10:18:14 +00:00
Navdeep Parhar
c90a8cf80a cxgbe/t4_tom: ABORT_RPL_RSS is a shared CPL and t4_tom shouldn't remove
the global handler when it's being unloaded.
2018-05-24 08:32:02 +00:00
Navdeep Parhar
9c707b3287 cxgbe(4): Make FW4_ACK a shared CPL. ETHOFLD in the base driver will
use it for per-flow rate limiting.

Sponsored by:	Chelsio Communications
2018-05-24 08:21:43 +00:00
Navdeep Parhar
1dd95f641e cxgbe(4): Fix range checks in is_etid. 2018-05-24 08:02:11 +00:00
Navdeep Parhar
a6a8ff351d cxgbe(4): Slightly simpler needs_<foo> functions. 2018-05-24 07:38:46 +00:00
Navdeep Parhar
2e09fe9116 cxgbe(4): Make sure that the egress queue's cidx is updated periodically
when the driver is writing WRs using start_wrq_wr/commit_wrq_wr all the
time.

Sponsored by:	Chelsio Communications
2018-05-24 06:44:06 +00:00
Navdeep Parhar
80259b6c12 cxgbe(4): Only valid filters are expected to have a valid tid. 2018-05-22 16:23:14 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
John Baldwin
24ddd0ec9c Be more robust against garbage input on a TOE TLS TX socket.
If a socket is closed or shutdown and a partial record (or what
appears to be a partial record) is waiting in the socket buffer,
discard the partial record and close the connection rather than
waiting forever for the rest of the record.

Reported by:	Harsh Jain @ Chelsio
Sponsored by:	Chelsio Communications
2018-05-18 19:09:11 +00:00
Navdeep Parhar
67e071128d cxgbe(4): Implement ifnet callbacks that deal with send tags.
An etid (ethoffload tid) is allocated for a send tag and it acquires a
reference on the traffic class that matches the send parameters
associated with the tag.

Sponsored by:	Chelsio Communications
2018-05-18 06:09:15 +00:00
Navdeep Parhar
d88a0442bb cxgbe(4): Fix s->neq miscalculation in r333698. 2018-05-17 06:04:50 +00:00
Navdeep Parhar
eff62dba61 cxgbe(4): Allocate offload Tx queues when a card has resources
provisioned for NIC_ETHOFLD and the kernel has option RATELIMIT.

It is possible to use the chip's offload queues for normal NIC Tx and
not just TOE Tx.  The difference is that these queues support out of
order processing of work requests and have a per-"flowid" mechanism for
tracking credits between the driver and hardware.  This allows Tx for
any number of flows bound to different rate limits to be submitted to a
single Tx queue and the work requests for slow flows won't cause HOL
blocking for the rest.

Sponsored by:	Chelsio Communications
2018-05-17 01:42:18 +00:00
Navdeep Parhar
93c0bfb85b cxgbe(4): Add NIC_ETHOFLD to the NIC capabilities allowed by the driver
by default.

This is the first of a series of commits that will add support for
RATELIMIT kernel option to the base if_cxgbe driver, for use with
ordinary NIC traffic "flows".  RATELIMIT is already supported by t4_tom
for the fully-offloaded TCP connections that it handles.

Sponsored by:	Chelsio Communications
2018-05-17 00:52:48 +00:00
Navdeep Parhar
47ae7a7e4f cxgbe(4): Fall back to a failsafe configuration built into the firmware
if an error is reported while pre-processing the configuration file that
the driver attempted to use.

Also, allow the user to explicitly use the built-in configuration with
hw.cxgbe.config_file="built-in"

MFC after:	2 days
Sponsored by:	Chelsio Communications
2018-05-16 17:55:16 +00:00
Navdeep Parhar
b6ce1d5b1d cxgbe(4): Add support for two more flash parts.
Obtained from:	Chelsio Communications
MFC after:	2 days
Sponsored by:	Chelsio Communications
2018-05-15 22:26:09 +00:00
Navdeep Parhar
40f242e440 cxgbe(4): Claim some more T5 and T6 boards.
MFC after:	2 days
Sponsored by:	Chelsio Communications
2018-05-15 21:54:59 +00:00
Navdeep Parhar
b3daa684d8 cxgbe(4): Filtering related features and fixes.
- Driver support for hardware NAT.
- Driver support for swapmac action.
- Validate a request to create a hashfilter against the filter mask.
- Add a hashfilter config file for T5.

Sponsored by:	Chelsio Communications
2018-05-15 04:24:38 +00:00
Navdeep Parhar
f348cdad1a cxgbe(4): Add fields to support configuration of hardware NAT and
swapmac (SMAC/DMAC switcheroo) from userspace.

Sponsored by:	Chelsio Communications
2018-05-10 20:39:04 +00:00
Navdeep Parhar
f7a203bc21 cxgbe(4): Disable write-combined doorbells by default.
This had been the default behavior but was changed accidentally as part
of the recent iw_cxgbe+OFED overhaul.  Fix another bug in that change
while here: the global knob affects all the adapters in the system and
should be left alone by per-adapter code.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2018-05-10 06:33:54 +00:00
Navdeep Parhar
5174205de5 cxgbe(4): Determine whether the firmware supports the FILTER2 work
request, which can be used to configure hardware NAT and swapmac.

All firmwares released after Jan 2017 support this work request.

Sponsored by:	Chelsio Communications
2018-05-10 00:04:14 +00:00
Navdeep Parhar
89f651e704 cxgbe(4): Add support for hash filters.
These filters reside in the card's memory instead of its TCAM and can be
configured via a new "hashfilter" subcommand in cxgbetool.  Hash and
normal TCAM filters can be used together.  The hardware does an
exact-match of packet fields for hash filters, unlike the masked match
performed for TCAM filters.  Any T5/T6 card with memory can support at
least half a million hash filters.  The sample config file with the
driver configures 512K of these, it is possible to double this to 1
million+ in some cases.

The chip does an exact-match of fields of incoming datagrams with hash
filters and performs the action configured for the filter if it matches.
The fields to match are specified in a "filter mask" in the firmware
config file.  The filter mask always includes the 5-tuple (sip, dip,
sport, dport, ipproto).  It can, optionally, also include any subset of
the filter mode (see filterMode and filterMask in the firmware config
file).

For example:
filterMode = fragmentation, mpshittype, protocol, vlan, port, fcoe
filterMask = protocol, port, vlan

Exact values of the 5-tuple, the physical port, and VLAN tag would have
to be provided while setting up a hash filter with the chip
configuration above.

Hash filters support all actions supported by TCAM filters.  A packet
that hits a hash filter can be dropped, let through (with optional
steering to a specific queue or RSS region), switched out of another
port (with optional L2 rewrite of DMAC, SMAC, VLAN tag), or get NAT'ed.
(Support for some of these will show up in the driver in a follow-up
commit very shortly).

Sponsored by:	Chelsio Communications
2018-05-09 04:09:49 +00:00
Navdeep Parhar
b6f2c452cb cxgbe(4): Update all firmwares to 1.19.1.0.
These firmwares and the following list of changes are from the public
ChelsioUwire-3.7.1.0 release.

T6 Firmware
================================================================================
Version : 1.19.1.0
Date    : 04/23/2018
================================================================================

Fixes
-----

BASE:
- Fixed traffic stall when rate-limit is modified while running traffic.
- Fixes a firmware crash in FW_ETH_TX_EO_WR handling.
- Fixes host DCB support when FW_PORT_CMD is used.

ETH:
- Exit Auto-Negotiation if we don't receive base page from peer within 10s.
  This fixes some cases where in we keep on restarting auto negotiation without
  ever exiting, resulting in link failure.
- Fixes an issue where VF packets counter were not increasing if VF packets
  coalesced WR is used by driver.

OFLD:
- Kernel and user mode NVMEoF performance enhancements.

FOiSCSI:
- Fixes fw crash when trying to connect to non-existence IPv6 iSNS target.

================================================================================
Version : 1.18.9.0
Date    : 03/27/2018
================================================================================

Fixes
-----

BASE:
- For Ethernet frames less than 64B, pad them with zero bytes as per IEEE spec
  (RFC 894).
- Added a new parameter iqtype to FW_IQ_CMD to identify the ingress NIC or offload
  queues. This fixes an issue where driver was receiving interrupt with no new
  messages in queue.
- FW_PARAMS_CMD processes all the valaid paramaters and returns value 0UL for
  any unknown parameter.

OFLD:
- Fixes connection failure during SRQ reuse.
- Fixes incorrect cqe in case of WRITE with immediate operation.

FOiSCSI:
- Fixes a fw crash when wrong node-id is passed to FW_FOISCSI_CTRL_WR.

FOFCoE:
- Fixes a fw hang while creating NPIV.

Enhancements
------------

ETH:
- A new WR FW_ETH_TX_PKTS_VM_WR added to support VM packet coalescing.

================================================================================
Version : 1.18.4.0
Date    : 02/28/2018
================================================================================

Fixes
-----

BASE:
- Fixed Rate limiting not working for 101Mbps<=rate limit<=163Mbps range.
- Fixed starting more than 32 VMs on PF4 causing firmware hang.

ETH:
- Fixed link failure due to FEC mismatch with optics.
- Fixed link failure with link toggle stress tests.
- Only BaseR FEC is supported for 50G.
- Fixed a bug in next page handling which sometimes causes link down.
- Fixed port down due to failre to read eeprom contents of some modules.
- Fixed a bug causing adapter to fail with spider configuration.

FOiSCSI:
- Fixed a bug causing login failure when connecting to multiple targets.

Enhancements
------------

BASE:
- Added a new firmware API to retrieve the maximum temperaturethreshold for
  the chip (FW_PARAM_DEV_DIAG_MAXTMPTHRESH).

ETH:
- Added support for user to contol pause negotiation during auto negotiation.

FOiSCSI:
- Added a new facility to redirect few fw events to offload rx queue
  (based on driver's configration)
- Driver can ignore providing ipv6 prefix len during ipv6 address configuration.

================================================================================
Version : 1.17.14.0
Date    : 12/27/2017
================================================================================

FIXES
-----

BASE:
- Fixed an FLR failure during simulteneous power up of VM.
- Fixed an issue in vlan acl which was limiting vlan range to 1024.

ETH:
- Enabled RS-FEC for 25G active copper cable and 25GBASE-SR.
- When auto negotiation is enabled, final pause settings are resolved
  based on local and peer pause settings.
- Handle NACK for an I2C access.

OFLD
- Fixed rdma connection cleanup in SO adpater.
- Fixed rdma connections during read invalidate.
- Fixed the crash when invalid BW rate is passed to fw.
- Fixed the traffic hang when BW allocation is changed from switch during traffic.

FOFCoE:
- Fixed an issue where initiator remains logged-in even after LLDP is disabled
  on switch.

ENHANCEMENTS
------------

BASE:
- Added support for 248 VFs.
- Added fw driver periodic calibration for MC.

ETH:
- Added XLAUI port type support.
- Added raw mac entry deletion support (FW_VI_MAC_ID_BASED_FREE).

OFLD:
- Inline IPSec support added (flag F_FW_ULPTX_WR_DATA indicates the inline
  IPSec WR).
- New work request FW_RI_RDMA_WRITE_CMPL_WR (write with completion) added to

T5 Firmware
================================================================================
Version : 1.19.1.0
Date    : 04/23/2018
================================================================================

Fixes
-----

BASE:
- Fixes a firmware crash in FW_ETH_TX_EO_WR handling.
- Fixes host DCB support when FW_PORT_CMD is used.

ETH:
- Fixes an issue where VF packets counter were not increasing if VF packets
  coalesced WR is used by driver.

OFLD:
- Fixes an issue where fw hangs if max traffic rate passed is 0.

FOiSCSI:
-  Fixes fw crash when trying to connect to non-existence IPv6 iSNS target.

================================================================================
Version : 1.18.9.0
Date    : 03/27/2018
================================================================================

Fixes
-----

BASE:
- For Ethernet frames less than 64B, pad them with zero bytes as per IEEE spec
  (RFC 894).
- Added a new parameter iqtype to FW_IQ_CMD to identify the ingress NIC or offload
  queues. This fixes an issue where driver was receiving interrupt with no new
  messages in queue.

ETH:
- Pad the Ethernet packets of size less than 64B with zeros. This fixes the
  incorrect checksum generation of packets less then 64B.

FOiSCSI:
- Fixes a fw crash when wrong node-id is passed to FW_FOISCSI_CTRL_WR.

FOFCoE:
- Fixes a fw hang while creating NPIV.

Enhancements
------------

ETH:
- A new WR FW_ETH_TX_PKTS_VM_WR added to support VM packet coalescing.

================================================================================
Version : 1.18.4.0
Date    : 02/28/2018
================================================================================

Fixes
-----

BASE:
- Fixed starting more than 32 VMs on PF4 causing firmware hang.

FOiSCSI:
- Fixed a bug causing login failure when connecting to multiple targets.

Enhancements
------------

BASE:
- Added a new firmware API to retrieve the maximum temperaturethreshold for
  the chip (FW_PARAM_DEV_DIAG_MAXTMPTHRESH).

ETH:
- Added support for user to contol pause negotiation during auto negotiation.

FOiSCSI:
- Added a new facility to redirect few fw events to offload rx queue
  (based on driver's configration)
- Driver can ignore providing ipv6 prefix len during ipv6 address configuration.

================================================================================
Version : 1.17.14.0
Date    : 12/27/2017
================================================================================

FIXES
-----

BASE:
- Fixed an issue in vlan acl which was limiting vlan range to 1024.

ETH:
- Corrected lane inversion logic.
- Fixed improper LED behavior in T580 cards.
- When auto negotiation is enabled, final pause settings are resolved
  based on local and peer pause settings.
- Handle NACK for an I2C access.

OFLD
- Fixed rdma connections during read invalidate.

FOiSCSI:
- Fixed a connections hang when link is toggled frequently.

FOFCoE:
- Fixed an issue where initiator remains logged-in even after LLDP is disabled
  on switch.

ENHANCEMENTS
------------

BASE:
- Added support for 124 VFs.

ETH:
- Added XLAUI port type support.
- Added raw mac entry deletion support (FW_VI_MAC_ID_BASED_FREE).

OFLD:
- New work request FW_RI_RDMA_WRITE_CMPL_WR (write with completion) added to
  optimize NVMEoF write.

T4 Firmware
================================================================================
Version : 1.19.1.0
Date    : 04/23/2018
================================================================================

Fixes
-----

BASE:
- Fixes a firmware crash in FW_ETH_TX_EO_WR handling.
- Fixes host DCB support when FW_PORT_CMD is used.

FOiSCSI:
-  Fixes fw crash when trying to connect to non-existence IPv6 iSNS target.

================================================================================
Version : 1.18.9.0
Date    : 03/27/2018
================================================================================

Fixes
-----

BASE:
- Added a new paramter iqtype to FW_IQ_CMD to identify the ingress NIC or
  offload queues. This fixes an issue where driver was receiving interrupt with
  no new messages in queue.

FOFCoE:
- Fixes a fw hang while creating NPIV.

Enhancements
------------

ETH:
- A new WR FW_ETH_TX_PKTS_VM_WR added to support VM packet coalescing.

================================================================================
Version : 1.18.4.0
Date    : 02/28/2018
================================================================================

Enhancements
------------

BASE:
- Added a new firmware API to retrieve the maximum temperaturethreshold for
  the chip (FW_PARAM_DEV_DIAG_MAXTMPTHRESH).

================================================================================
Version : 1.17.14.0
Date    : 12/27/2017
================================================================================

FIXES
-----

BASE:
- Fixed an issue in vlan acl which was limiting vlan range to 1024.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2018-05-05 20:16:08 +00:00
Navdeep Parhar
e1320420d5 cxgbe(4): Move all TCAM filter code into a separate file.
Sponsored by:	Chelsio Communications
2018-05-01 20:17:22 +00:00
Andrew Gallatin
a378e59420 Optionally panic when cxgbe encounters a fatal error
Sometimes it is better to panic than to leave a machine
unreachable.

Reviewed by:	np
Sponsored by:	Netflix
2018-05-01 15:33:21 +00:00
Navdeep Parhar
faf6d96b45 cxgbe(4): Destroy the cdev before disabling interrupts in driver detach.
Filter work requests are submitted in the nexus cdev's ioctl which then
blocks waiting for a reply.  If driver detach runs in this state and
disables interrupts the ioctl will never complete and detach will hang
in destroy_cdev.

Sponsored by:	Chelsio Communications
2018-05-01 14:59:38 +00:00
Navdeep Parhar
111638bf68 cxgbe(4): Convert ACT_OPEN_RPL to a shared CPL.
Reserve 3b in the 14b atid to identify the owner and use it to dispatch
the CPL.  This allows all CPLs that use an atid to be used as shared
CPLs, although ACT_OPEN_RPL is the only one being converted in this
revision.

Sponsored by:	Chelsio Communications
2018-04-30 21:47:30 +00:00
Navdeep Parhar
f8bc08fa57 cxgbe/t4_tom: Use appropriate macros instead of magic math while
constructing the atid of an active open work request.

Sponsored by:	Chelsio Communications
2018-04-30 17:33:44 +00:00
Navdeep Parhar
4535e8046f cxgbe(4): Use opaque cookies or tid range-checks to determine the
intended recipient of a CPL when it can't be determined solely from the
opcode.  Retire the per-queue handlers for such CPLs in favor of the new
scheme.

Sponsored by:	Chelsio Communications
2018-04-30 15:18:38 +00:00
John Baldwin
4217c685b1 Use the correct key address when renegotiating the transmit key.
Previously, get_keyid() was returning the address of the receive key
instead of the transmit key when renegotiating the transmit key.  This
could either hang the card (if a connection was only offloading TLS TX
and thus had a receive key address of -1) or cause the connection to
fail by overwriting the wrong key (if both RX and TX TLS were
offloaded).

Submitted by:	Harsh Jain @ Chelsio
Sponsored by:	Chelsio Communications
2018-04-27 17:20:23 +00:00
Navdeep Parhar
8896672a77 cxgbe(4): Move release_tid to the base NIC driver for future consumers.
Sponsored by:	Chelsio Communications.
2018-04-26 22:04:21 +00:00
Navdeep Parhar
3747c1ffc7 cxgbe(4): Break up alloc_tid_tabs and move the atid routines to the base
NIC driver.  The atid services will be used by new features (hashfilters
and inline TLS) that do not involve TOE.

Sponsored by:	Chelsio Communications
2018-04-26 19:00:35 +00:00
Navdeep Parhar
8aa1c1d8b4 cxgbe(4): Fix bugs in the handling of COP rules that match on VLAN tag.
Retrieve the tag from the correct ifnet and use the provided tag
(instead of hardcoded 0xffff, implying no tag) in the routines that
process offload policy.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
2018-04-19 18:10:44 +00:00
Navdeep Parhar
1131c927c4 cxgbe(4): Add support for Connection Offload Policy (aka COP).
COP allows fine-grained control on whether to offload a TCP connection
using t4_tom, and what settings to apply to a connection selected for
offload.  t4_tom must still be loaded and IFCAP_TOE must still be
enabled for full TCP offload to take place on an interface.  The
difference is that IFCAP_TOE used to be the only knob and would enable
TOE for all new connections on the inteface, but now the driver will
also consult the COP, if any, before offloading to the hardware TOE.

A policy is a plain text file with any number of rules, one per line.
Each rule has a "match" part consisting of a socket-type (L = listen,
A = active open, P = passive open, D = don't care) and a pcap-filter(7)
expression, and a "settings" part that specifies whether to offload the
connection or not and the parameters to use if so.  The general format
of a rule is: [socket-type] expr => settings

Example.  See cxgbetool(8) for more information.
[L] ip && port http => offload
[L] port 443 => !offload
[L] port ssh => offload
[P] src net 192.168/16 && dst port ssh => offload !nagle !timestamp cong newreno
[P] dst port ssh => offload !nagle ecn cong tahoe
[P] dst port http => offload
[A] dst port 443 => offload tls
[A] dst net 192.168/16 => offload !timestamp cong highspeed

The driver processes the rules for each new listen, active open, or
passive open and stops at the first match.  There is an implicit rule at
the end of every policy that prohibits offload when no rule in the
policy matches:
[D] all => !offload

This is a reworked and expanded version of a patch submitted by
Krishnamraju Eraparaju @ Chelsio.

Sponsored by:	Chelsio Communications
2018-04-14 19:07:56 +00:00
Vincenzo Maffione
2ff91c175e netmap: align codebase to the current upstream (commit id 3fb001303718146)
Changelist:
    - Turn tx_rings and rx_rings arrays into arrays of pointers to kring
      structs. This patch includes fixes for ixv, ixl, ix, re, cxgbe, iflib,
      vtnet and ptnet drivers to cope with the change.
    - Generalize the nm_config() callback to accept a struct containing many
      parameters.
    - Introduce NKR_FAKERING to support buffers sharing (used for netmap
      pipes)
    - Improved API for external VALE modules.
    - Various bug fixes and improvements to the netmap memory allocator,
      including support for externally (userspace) allocated memory.
    - Refactoring of netmap pipes: now linked rings share the same netmap
      buffers, with a separate set of kring pointers (rhead, rcur, rtail).
      Buffer swapping does not need to happen anymore.
    - Large refactoring of the control API towards an extensible solution;
      the goal is to allow the addition of more commands and extension of
      existing ones (with new options) without the need of hacks or the
      risk of running out of configuration space.
      A new NIOCCTRL ioctl has been added to handle all the requests of the
      new control API, which cover all the functionalities so far supported.
      The netmap API bumps from 11 to 12 with this patch. Full backward
      compatibility is provided for the old control command (NIOCREGIF), by
      means of a new netmap_legacy module. Many parts of the old netmap.h
      header has now been moved to netmap_legacy.h (included by netmap.h).

Approved by:	hrs (mentor)
2018-04-12 07:20:50 +00:00
Navdeep Parhar
de93353248 cxgbe(4): Always display an error message if SIOCSIFFLAGS will leave
IFF_UP and IFF_DRV_RUNNING out of sync.  ifhwioctl in the kernel pays no
attention to the return code from the driver ioctl during SIOCSIFFLAGS
so these messages are the only indication that the ioctl was called but
failed.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-04-04 22:52:24 +00:00
Navdeep Parhar
f8fea0d90e cxgbe: Implement tcp_info handler for connections handled by t4_tom.
The TCB is read using a memory window right now.  A better alternate to
get self-consistent, uncached information would be to use a GET_TCB
request but waiting for a reply from hw while holding non-sleepable
locks is quite inconvenient.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14817
2018-04-03 01:22:15 +00:00
Brooks Davis
541d96aaaf Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size).  This is believed to be sufficent to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
John Baldwin
edf95feba4 Use the offload transmit queue to set flags on TLS connections.
Requests to modify the state of TLS connections need to be sent on the
same queue as TLS record transmit requests to ensure ordering.

However, in order to use the offload transmit queue in t4_set_tcb_field(),
the function needs to be updated to do proper flow control / credit
management when queueing a request to an offload queue.  This required
passing a pointer to the toepcb itself to this function, so while here
remove the 'tid' and 'iqid' parameters and obtain those values from the
toepcb in t4_set_tcb_field() itself.

Submitted by:	Harsh Jain @ Chelsio (original version)
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14871
2018-03-27 20:54:57 +00:00
Navdeep Parhar
d57241d2e7 cxgbe(4): Always initialize requested_speed to a valid value.
This fixes an avoidable EINVAL when the user tries to disable AN after
the port is initialized but l1cfg doesn't have a valid speed to use.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-03-24 01:07:58 +00:00
Navdeep Parhar
1b4df78b42 cxgbe(4): Do not read MFG diags information from custom boards.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-03-22 04:42:29 +00:00
Navdeep Parhar
5401e09688 cxgbe(4): Tunnel congestion drops on a port should be cleared when the
stats for that port are cleared.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-03-22 02:04:57 +00:00
John Baldwin
6b02bb1aa7 Fix the check for an empty send socket buffer on a TOE TLS socket.
Compare sbavail() with the cached sb_off of already-sent data instead of
always comparing with zero.  This will correctly close the connection and
send the FIN if the socket buffer contains some previously-sent data but
no unsent data.

Reported by:	Harsh Jain @ Chelsio
Sponsored by:	Chelsio Communications
2018-03-14 20:49:51 +00:00
John Baldwin
02d2bcfaba Remove TLS-related inlines from t4_tom.h to fix iw_cxgbe(4) build.
- Remove the one use of is_tls_offload() and the function.  AIO special
  handling only needs to be disabled when a TOE socket is actively doing
  TLS offload on transmit.  The TOE socket's mode (which affects receive
  operation) doesn't matter, so remove the check for the socket's mode and
  only check if a TOE socket has TLS transmit keys configured to determine
  if an AIO write request should fall back to the normal socket handling
  instead of the TOE fast path.
- Move can_tls_offload() into t4_tls.c.  It is not used in critical paths,
  so inlining isn't that important.  Change return type to bool while here.

Sponsored by:	Chelsio Communications
2018-03-14 20:46:25 +00:00
John Baldwin
1e9538d253 Support for TLS offload of TOE connections on T6 adapters.
The TOE engine in Chelsio T6 adapters supports offloading of TLS
encryption and TCP segmentation for offloaded connections.  Sockets
using TLS are required to use a set of custom socket options to upload
RX and TX keys to the NIC and to enable RX processing.  Currently
these socket options are implemented as TCP options in the vendor
specific range.  A patched OpenSSL library will be made available in a
port / package for use with the TLS TOE support.

TOE sockets can either offload both transmit and reception of TLS
records or just transmit.  TLS offload (both RX and TX) is enabled by
setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be
enabled on the relevant interface.  Transmit offload can be used on
any "normal" or TLS TOE socket by using the custom socket option to
program a transmit key.  This permits most TOE sockets to
transparently offload TLS when applications use a patched SSL library
(e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL
library).  Receive offload can only be used with TOE sockets using the
TLS mode.  The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a
list of TCP port numbers.  Any connection with either a local or
remote port number in that list will be created as a TLS socket rather
than a plain TOE socket.  Note that although this sysctl accepts an
arbitrary list of port numbers, the sysctl(8) tool is only able to set
sysctl nodes to a single value.  A TLS socket will hang without
receiving data if used by an application that is not using a patched
SSL library.  Thus, the tls_rx_ports node should be used with care.
For a server mostly concerned with offloading TLS transmit, this node
is not needed as plain TOE sockets will fall back to software crypto
when using an unpatched SSL library.

New per-interface statistics nodes are added giving counts of TLS
packets and payload bytes (payload bytes do not include TLS headers or
authentication tags/MACs) offloaded via the TOE engine, e.g.:

dev.cc.0.stats.rx_tls_octets: 149
dev.cc.0.stats.rx_tls_records: 13
dev.cc.0.stats.tx_tls_octets: 26501823
dev.cc.0.stats.tx_tls_records: 1620

TLS transmit work requests are constructed by a new variant of
t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.

TLS transmit work requests require a buffer containing IVs.  If the
IVs are too large to fit into the work request, a separate buffer is
allocated when constructing a work request.  This buffer is associated
with the transmit descriptor and freed when the descriptor is ACKed by
the adapter.

Received TLS frames use two new CPL messages.  The first message is a
CPL_TLS_DATA containing the decryped payload of a single TLS record.
The handler places the mbuf containing the received payload on an
mbufq in the TOE pcb.  The second message is a CPL_RX_TLS_CMP message
which includes a copy of the TLS header and indicates if there were
any errors.  The handler for this message places the TLS header into
the socket buffer followed by the saved mbuf with the payload data.
Both of these handlers are contained in tom/t4_tls.c.

A few routines were exposed from t4_cpl_io.c for use by t4_tls.c
including send_rx_credits(), a new send_rx_modulate(), and
t4_close_conn().

TLS keys for both transmit and receive are stored in onboard memory
in the NIC in the "TLS keys" memory region.

In some cases a TLS socket can hang with pending data available in the
NIC that is not delivered to the host.  As a workaround, TLS sockets
are more aggressive about sending CPL_RX_DATA_ACK messages anytime that
any data is read from a TLS socket.  In addition, a fallback timer will
periodically send CPL_RX_DATA_ACK messages to the NIC for connections
that are still in the handshake phase.  Once the connection has
finished the handshake and programmed RX keys via the socket option,
the timer is stopped.

A new function select_ulp_mode() is used to determine what sub-mode a
given TOE socket should use (plain TOE, DDP, or TLS).  The existing
set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and
handles initialization of TLS-specific state when necessary in
addition to DDP-specific state.

Since TLS sockets do not receive individual TCP segments but always
receive full TLS records, they can receive more data than is available
in the current window (e.g. if a 16k TLS record is received but the
socket buffer is itself 16k).  To cope with this, just drop the window
to 0 when this happens, but track the overage and "eat" the overage as
it is read from the socket buffer not opening the window (or adding
rx_credits) for the overage bytes.

Reviewed by:	np (earlier version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14529
2018-03-13 23:05:51 +00:00
John Baldwin
9689995d23 Simplify error handling in t4_tom.ko module loading.
- Change t4_ddp_mod_load() to return void instead of always returning
  success.  This avoids having to pretend to have proper support for
  unloading when only part of t4_tom_mod_load() has run.
- If t4_register_uld() fails, don't invoke t4_tom_mod_unload() directly.
  The module handling code in the kernel invokes MOD_UNLOAD on a module
  whose MOD_LOAD fails with an error already.

Reviewed by:	np (part of a larger patch)
MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-03-13 21:42:38 +00:00
Hans Petter Selasky
1456d97c01 Optimize ibcore RoCE address handle creation from user-space.
Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.

It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.

This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.

This commit combines the following Linux upstream commits:

IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 14:34:52 +00:00
John Baldwin
6b554d26ed Move #include for rijndael.h out of x86-specific region.
The #include was added inside of the conditional by accident and the lack
of it broke non-x86 builds.

Reported by:	lwhsu (jenkins), andrew
2018-02-27 17:51:58 +00:00
John Baldwin
db631975fe Don't overflow the ipad[] array when clearing the remainder.
After the auth key is copied into the ipad[] array, any remaining bytes
are cleared to zero (in case the key is shorter than one block size).
The full block size was used as the length of the zero rather than the
size of the remaining ipad[].  In practice this overflow was harmless as
it could only clear bytes in the following opad[] array which is
initialized with a copy of ipad[] in the next statement.

Sponsored by:	Chelsio Communications
2018-02-26 22:17:27 +00:00
John Baldwin
52f8c52677 Move ccr_aes_getdeckey() from ccr(4) to the cxgbe(4) driver.
This routine will also be used by the TOE module to manage TLS keys.

Sponsored by:	Chelsio Communications
2018-02-26 22:12:31 +00:00
John Baldwin
198729ea7d Fetch TLS key parameters from the firmware.
The parameters describe how much of the adapter's memory is reserved for
storing TLS keys.  The 'meminfo' sysctl now lists this region of adapter
memory as 'TLS keys' if present.

Sponsored by:	Chelsio Communications
2018-02-26 21:56:06 +00:00
John Baldwin
6619d9fb70 Bring in additional constants and message fields for TLS-related messages.
Sponsored by:	Chelsio Communications
2018-02-22 02:02:31 +00:00
John Baldwin
125d42fe81 Move DDP PCB state into a helper structure.
This consolidates all of the DDP state in one place.  Also, the code has
now been fixed to ensure that DDP state is only accessed for DDP
connections.  This should not be a functional change but makes it cleaner
and easier to add state for other TOE socket modes in the future.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-02-22 01:50:30 +00:00
Wojciech Macek
19a5b68236 CXGBE: implement prefetch on non-Intel architectures
Submitted by:          Michal Stanek <mst@semihalf.com>
Obtained from:         Semihalf
Reviewed by:           np, pdk@semihalf.com
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14452
2018-02-21 08:05:56 +00:00
Navdeep Parhar
7cb7c6e37a Catch up with the removal of nktr_slot_flags from upstream netmap. No
functional impact intended.

Submitted by:	Vincenzo Maffione <v.maffione@gmail.com>
2018-02-20 21:42:45 +00:00
Navdeep Parhar
919da4ceff iw_cxgbe: Remove declaration of a function that no longer exists. 2018-02-07 20:13:08 +00:00
John Baldwin
f17985319d Export tcp_always_keepalive for use by the Chelsio TOM module.
This used to work by accident with ld.bfd even though always_keepalive
was marked as static. LLD honors static more correctly, so export this
variable properly (including moving it into the tcp_* namespace).

Reviewed by:	bz, emaste
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14129
2018-01-30 23:01:37 +00:00
Navdeep Parhar
ea3c5e5d04 cxgbe(4): Accept old names of a couple of tunables. 2018-01-26 00:45:40 +00:00
Navdeep Parhar
39e7cb038b cxgbe(4): Do not display harmless warning in non-debug builds.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2018-01-26 00:03:14 +00:00
John Baldwin
c0154062c7 Store IV in output buffer in GCM software fallback when requested.
Properly honor the lack of the CRD_F_IV_PRESENT flag in the GCM
software fallback case for encryption requests.

Submitted by:	Harsh Jain @ Chelsio
Sponsored by:	Chelsio Communications
2018-01-24 20:16:48 +00:00
John Baldwin
2bc40b6ca9 Don't read or generate an IV until all error checking is complete.
In particular, this avoids edge cases where a generated IV might be
written into the output buffer even though the request is failed with
an error.

Sponsored by:	Chelsio Communications
2018-01-24 20:15:49 +00:00
John Baldwin
04043b3dcd Expand the software fallback for GCM to cover more cases.
- Extend ccr_gcm_soft() to handle requests with a non-empty payload.
  While here, switch to allocating the GMAC context instead of placing
  it on the stack since it is over 1KB in size.
- Allow ccr_gcm() to return a special error value (EMSGSIZE) which
  triggers a fallback to ccr_gcm_soft().  Move the existing empty
  payload check into ccr_gcm() and change a few other cases
  (e.g. large AAD) to fallback to software via EMSGSIZE as well.
- Add a new 'sw_fallback' stat to count the number of requests
  processed via the software fallback.

Submitted by:	Harsh Jain @ Chelsio (original version)
Sponsored by:	Chelsio Communications
2018-01-24 20:14:57 +00:00
John Baldwin
bf5b662033 Clamp DSGL entries to a length of 2KB.
This works around an issue in the T6 that can result in DMA engine
stalls if an error occurs while processing a DSGL entry with a length
larger than 2KB.

Submitted by:	Harsh Jain @ Chelsio
Sponsored by:	Chelsio Communications
2018-01-24 20:13:07 +00:00
John Baldwin
f7b61e2fcc Fail crypto requests when the resulting work request is too large.
Most crypto requests will not trigger this condition, but a request
with a highly-fragmented data buffer (and a resulting "large" S/G
list) could trigger it.

Sponsored by:	Chelsio Communications
2018-01-24 20:12:00 +00:00
John Baldwin
5929c9fb13 Don't discard AAD and IV output data for AEAD requests.
The T6 can hang when processing certain AEAD requests if the request
sets a flag asking the crypto engine to discard the input IV and AAD
rather than copying them into the output buffer.  The existing driver
always discards the IV and AAD as we do not need it.  As a workaround,
allocate a single "dummy" buffer when the ccr driver attaches and
change all AEAD requests to write the IV and AAD to this scratch
buffer.  The contents of the scratch buffer are never used (similar to
"bogus_page"), and it is ok for multiple in-flight requests to share
this dummy buffer.

Submitted by:	Harsh Jain @ Chelsio (original version)
Sponsored by:	Chelsio Communications
2018-01-24 20:11:00 +00:00
John Baldwin
acaabdbbee Reject requests with AAD and IV larger than 511 bytes.
The T6 crypto engine's control messages only support a total AAD
length (including the prefixed IV) of 511 bytes.  Reject requests with
large AAD rather than returning incorrect results.

Sponsored by:	Chelsio Communications
2018-01-24 20:08:10 +00:00
John Baldwin
020ce53af3 Always set the IV location to IV_NOP.
The firmware ignores this field in the FW_CRYPTO_LOOKASIDE_WR work
request.

Submitted by:	Harsh Jain @ Chelsio
Sponsored by:	Chelsio Communications
2018-01-24 20:06:02 +00:00
John Baldwin
d3f25aa152 Always store the IV in the immediate portion of a work request.
Combined authentication-encryption and GCM requests already stored the
IV in the immediate explicitly.  This extends this behavior to block
cipher requests to work around a firmware bug.  While here, simplify
the AEAD and GCM handlers to not include always-true conditions.

Submitted by:	Harsh Jain @ Chelsio
Sponsored by:	Chelsio Communications
2018-01-24 20:04:08 +00:00
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Pedro F. Giffuni
26c1d774b5 dev: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these is likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.
2018-01-13 22:30:30 +00:00
Navdeep Parhar
e62fb4b142 cxgbe/iw_cxgbe: Remove duplicates to fix compilation with recent gcc. 2018-01-13 00:04:11 +00:00
Wojciech Macek
f078ecf642 CXGBE: fix get_filt to be endianness-aware
Unconditional 32-bit shift is not endianness-safe.
Modify the logic to work both on LE and BE.

Submitted by:          Wojciech Macek <wma@freebsd.org>
Reviewed by:           np
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13102
2018-01-11 09:17:02 +00:00
Navdeep Parhar
218c5b2115 cxgbe(4): Add a knob to enable/disable PCIe relaxed ordering. Disable it by
default when running on Intel CPUs.

This is a crude fix for the performance issues alluded to in these Linux commits:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=87e09cdec4dae08acdb4aa49beb793c19d73e73e
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a99b646afa8a02571ea298bedca6592d818229cd

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-01-03 19:24:57 +00:00
Navdeep Parhar
348694dada cxgbe(4): Reduce duplication by consolidating minor variations of the
same code into a single routine.

Sponsored by:	Chelsio Communications
2017-12-29 02:30:21 +00:00
Navdeep Parhar
adbb3637ec cxgbe/iw_cxgbe: Fix iWARP over VLANs (catch up with r326169).
Submitted by:	KrishnamRaju ErapaRaju @ Chelsio
Sponsored by:	Chelsio Communications
2017-12-27 22:44:50 +00:00
Navdeep Parhar
f549e3521d cxgbe(4): Do not forward interrupts to queues with freelists. This
leaves the firmware event queue (fwq) as the only queue that can take
interrupts for others.

This simplifies cfg_itype_and_nqueues and queue allocation in the driver
at the cost of a little (never?) used configuration.  It also allows
service_iq to be split into two specialized variants in the future.

MFC after:	2 months
Sponsored by:	Chelsio Communications
2017-12-22 19:10:19 +00:00
Navdeep Parhar
426b6bd53c cxgbe(4): Read the MFG diags version from the VPD and make it available
in the sysctl MIB.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-12-21 15:19:43 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 14:52:40 +00:00
Hans Petter Selasky
82725ba9bf Merge ^/head r325999 through r326131. 2017-11-23 14:28:14 +00:00
Navdeep Parhar
8ee789e9fa cxgbe(4): Fix unsafe mailbox access in cudbg.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-11-21 01:18:58 +00:00
Navdeep Parhar
8e628d6d65 cxgbe(4): Add a custom board to the device id list.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-11-20 19:50:48 +00:00
Hans Petter Selasky
937d37fc6c Merge ^/head r325842 through r325998. 2017-11-19 12:36:03 +00:00
Navdeep Parhar
879462e916 cxgbe(4): Add core Vdd to the sysctl MIB.
Sponsored by:	Chelsio Communications
2017-11-17 23:22:39 +00:00
Navdeep Parhar
d3d5e96893 cxgbe(4): Remove rsrv_noflowq from intrs_and_queues structure as it does
not influence or get affected by the number of interrupts or queues.

Sponsored by:	Chelsio Communications
2017-11-16 02:42:37 +00:00
Navdeep Parhar
d406c47d71 cxgbe(4): Sanitize t4_num_vis during MOD_LOAD like all other t4_*
tunables.  Add num_vis to the intrs_and_queues structure as it affects
the number of interrupts requested and queues created.  In future
cfg_itype_and_nqueues might lower it incrementally instead of going
straight to 1 when enough interrupts aren't available.

Sponsored by:	Chelsio Communications
2017-11-16 01:33:53 +00:00
Navdeep Parhar
8c61c6bbda cxgbe(4): Combine all _10g and _1g tunables and drop the suffix from
their names.  The finer-grained knobs weren't practically useful.

Sponsored by:	Chelsio Communications
2017-11-15 23:48:02 +00:00
Hans Petter Selasky
55b1c6e7e4 Merge ^/head r325663 through r325841. 2017-11-15 11:28:11 +00:00
Wojciech Macek
ec7f8d58b9 CXGBE: fix big-endian behaviour
The setbit/clearbit pair casts the bitfield pointer
to uint8_t* which effectively treats its contents as
little-endian variable. The ffs() function accepts int as
the parameter, which is big-endian. Use uint8_t here to
avoid mismatch, as we have only 4 doorbells.

Submitted by:          Wojciech Macek <wma@freebsd.org>
Reviewed by:           np
Obtained from:         Semihalf
Sponsored by:          QCM Technologies
Differential revision: https://reviews.freebsd.org/D13084
2017-11-15 06:45:33 +00:00
Navdeep Parhar
f93039d9de Fix iw_cxgbe build in the projects branch.
Submitted by:	Krishnamraju Eraparaju @ Chelsio
2017-11-14 07:04:06 +00:00
Navdeep Parhar
28fe4b67af cxgbe(4): Excluce mdi from the check against port capabilities.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-11-10 20:30:10 +00:00
Hans Petter Selasky
f819030092 Merge ^/head r325505 through r325662. 2017-11-10 14:46:50 +00:00
Navdeep Parhar
0cf77199b2 cxgbe(4): Do not request settings not supported by the port.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-11-09 15:35:51 +00:00
Navdeep Parhar
5c2bacde58 Update the iw_cxgbe bits in the projects branch.
Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
2017-11-07 23:52:14 +00:00
Navdeep Parhar
5bcae8ddfa cxgbe(4): Read the MPS buffer group map from the firmware as it could be
different from hardware defaults.  The congestion channel map, which is
still fixed, needs to be tracked separately now.  Change the congestion
setting for TOE rx queues to match the drivers on other OSes while here.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-10-24 05:41:48 +00:00
Hans Petter Selasky
d05554bb99 The remote DMA TCP portspace selector, RDMA_PS_TCP, is used for both
iWarp and RoCE in ibcore. The selection of RDMA_PS_TCP can not be used
to indicate iWarp protocol use. Backport the proper IB device
capabilities from Linux upstream to distinguish between iWarp and
RoCE. Only allocate the additional socket required for iWarp for RDMA
IDs when at least one iWarp device present. This resolves
interopability issues between iWarp and RoCE in ibcore

Reviewed by:		np @
Differential Revision:	https://reviews.freebsd.org/D12563
Sponsored by:		Mellanox Technologies
MFC after:		3 days
2017-10-20 08:20:15 +00:00
Ryan Libby
b541ba195c cxgbe: delete now-redundant vnet decls
r324539 gathered some vnet decls into netinet/tcp_var.h, so that they
are now redundant in dev/cxgbe/tom/{t4_cpl_io.c,t4_ddp.c}. This triggers
gcc -Wredundant-decls.

Reviewed by:	np
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12674
2017-10-17 20:37:31 +00:00
Gleb Smirnoff
e8fd18f306 Shorten list of arguments to mbuf external storage freeing function.
All of these arguments are stored in m_ext, so there is no reason
to pass them in the argument list.  Not all functions need the second
argument, some don't even need the first one.  The second argument
lives in next cache line, so not dereferencing it is a performance
gain.  This was discovered in sendfile(2), which will be covered by
next commits.

The second goal of this commit is to bring even more flexibility
to m_ext mbufs, allowing to create more fields in m_ext, opaque to
the generic mbuf code, and potentially set and dereferenced by
subsystems.

Reviewed by:	gallatin, kbowling
Differential Revision:	https://reviews.freebsd.org/D12615
2017-10-09 20:35:31 +00:00
Navdeep Parhar
714d3ee05b cxgbe(4): Update T6, T5, and T4 firmwares to 1.16.63.0.
Changes since 1.16.26.0 for all three firmwares are listed below.  This
list was obtained from the Release Notes of the Chelsio Unified Wire
v3.5.05 release for Linux.

T6 Firmware
++++++++++++
================================================================================
Version : 1.16.63.0
Date    : 09/29/2017
================================================================================

Fixes
-----

BASE:
- Fixed a fw crash when configured traffic rate limit is less than 10kbps.
- Fixed traffic rate limiting for smaller traffic rate value.

ETH:
- Fixed 40G link failure when interface is toggled.
- Fixed adapter crash when interface is toggled during traffic.
- Fixed 25G link failure when PEER only supports consortium mode autoneg
  for 25G.
- Fixed 100G optics link failure when cable is plugged in after bringing up
  the interface.
- Enable RS FEC as default if speed is 100G.
- Fixed DCBX configuration refresh failure.

OFLD
- Fixed 0B iWARP ingress read failure.
- Fixed iWARP SRQ reuse failure.

FOiSCSI:
- Fixed vlan interface ping failure.
- Fixed target discovery failures.
- Fixed mutual chap login failure.

================================================================================
Version : 1.16.59.0
Date    : 09/05/2017
================================================================================

FIXES
-----

BASE:
- Fixed fw crash caused by MC parity error in SO adapters.
- Generate Timer0Int interrupt if fw crashes due to unaligned access error. Host
  driver must look into PCIE_FW register to see if any fw fatal error has
  encountered. If PCIE_FW doesn't indicate any error then driver must ignore this
  interrupt.
- Fixed receive buffer threshold settings which was resulting in error frames on
  receive side.

ETH:
- Fixed an issue in connection traffic shaping when
  FLOWC_WR->FW_FLOWC_MNEM_SCHEDCLASS is not received in first WR on the connection.
- Fixed link failure when speed is changed from 10G-1G-10G due to incorrect flag
  check.
- Fixed improper LED behaviour for blink test and when traffic is running.
- Removed storage of previous fec settings from fw. Driver needs to pass the user
  settings whenever a new module is plugged in as fw resets these when a module is
  unplugged.

OFLD
- OVS offload: TP cache is flushed periodically to get the accuate filters stats
  (hit count).

ENHANCEMENTS
------------

BASE:
- Ring backbone feature added. New FW_PARAMS_PARAM_DEV_RING_BACKBONE param type
  added to query and enable ring backbone support.
- VNI support added for filtering. New entry_type FW_VI_MAC_TYPE_EXACTMAC_VNI
  added to FW_VI_MAC_CMD.
- Added new API FW_PARAM_PARAM_DEV_MPSBGMAP to read the priority to buffer group
  mapping for the ports.
- FW_PARAMS_PARAM_DEV_TPCHMAP API added to read the port to channel mapping.
- HMA (Host memory access) support added. New FW_HMA_CMD and
  FW_PARAMS_PARAM_DEV_HMA_SIZE added to query and configure the HMA. It
  enables the memfree support (256 connections) for iwarp.
- PTP support enabled.

ETH:
- Added consortium mode 50G support.
- Added the ability to allow only selected speeds to be advertised during auto
  negotiation.
- Increased port capability from 16 to 32 bits to support more speeds.
  FW_PARAMS_PARAM_PFVF_PORT_CAPS32 added to query whether fw supports 16 or 32
  bit port capability.

OFLD:
- RDMA Write with immediate support added (iwarp 2.0 feature)
- FW_TLS_KEYCTX_TX_WR removed and security key management moved to driver.
- 256 offloaded connections support for iwarp on SO adapters.

iSCSI:
- New param FW_PARAMS_PARAM_DEV_PPOD_EDRAM added for iscsi ppod configuration
  in EDRAM (performance improvement).

FOiSCSI:
- iSCSI Command offload target support added.

FOFCoE:
- FCoE support enabled.

================================================================================
Version : 1.16.43.0
Date    : 05/05/2017
================================================================================

FIXES
-----

BASE:
- Fixed default DCB mode to AUTO.
- Fixed DCBX bugs when AUTO mode is configured in config file.
- Fixed an issue where even after removing PFC from switch, PFC wasn't getting
  reset.
- Fixed DDR3/DDR4 ECC errors.
- Fixed an FLR issue where FLR completion was going to host before FLR
  processing is finished in fw.

ETH:
- Fixed bug in writing multi-bytes using i2c interface.
- Fixed the link failure when optical cable is inserted into the QSA module
  after loading the driver.
- Fixed false link up when peer interface was brought down.
- Enabling RS FEC by default for 100Gbase-SR4 according to 802.3BJ standard.
- Fixed bugs related to negotiated fec based local/peer fec ability and request.
- Fixed auto-neg failure with few switches.
- T6 Performance improvement fixes.

OFLD
- Fixed an extra credit issue if FW_RI_TYPE_FINI is delayed in fw due to
  backpressure.
- Added a new queue type FW_IQ_TYPE_VF_CQ to handle the FW_PARAMS_PARAM_DMAQ*
  commands. queue type will be part of the FW_PARAMS_PARAM_DMAQ_IQ_INTIDX
  value. Used in guest RDMA (RDMA from VM/VF) usecase.
- T6 Crypto Coprocessor mode bug fixes.
- T6 Crypto TLS-inline mode bug fixes.

ENHANCEMENTS
------------

BASE:
- Added new API FW_PARAM_PARAM_DEV_MPSBGMAP to read the priority to buffer
  group mapping for the ports.

ETH:
- Added broadcom consortium next page support for 25G CR.
  This can be enabled using flags=an_brcm option in the t6-config.txt file.
- Added spider mode support.
- Added support for 10G-BaseT converter sfp+ module.
- Added support for additional 25G/100G cables.
- Added support to enable/disable auto-neg using ethtool.

================================================================================
Version : 1.16.33.0
Date    : 02/24/2017
================================================================================

Fixes
-----

BASE:
- Fixed DDR4 uncorrectable errors.

ETH:
- Enabled link auto negotiation (AN) by default in config file.
- Added AN and FEC control api. Host driver and application can enable/disable
  AN and FEC.

ENHANCEMENTS
------------

BASE:
- Enabled High priorty filter.
- Added T6425 adapter support.

ETH:
- Added new workrequest ETH_TX_PKTS2_WR (see fw api document for more details).

================================================================================
Version : 1.16.29.0
Date    : 01/27/2017
================================================================================

FIXES
-----

BASE:
- Set multiple fec values only if AN is enabled in config file and when module
  is connected.
- Fixed intermittent DDR3/4 ECC errors.
- max number of ethctrl queue in VF set to 2 (reverted the last change
  because it causes problem in VF drivers).

ETH:
- Made devlog more verbose by printing cable information in redable form.
- Updated AN settings to work with more 25G/100G switches.
- Added support for more SFP28/QSFP28 cables.
- Fixed an issue of link going down after few hours of idle time.

OFLD:
- Fixed an issue in TLS which was causing fw crash on running TLS traffic.

FOiSCSI:
- Fixed the failure of PXE boot OS install on an iscsi lun.

ENHANCEMENTS
------------

OFLD:
- Added filtering support for NAT. New WR FW_FILTER2_WR and
  FW_PARAMS_PARAM_DEV_FILTER2_WR added for the same.
- Added RDMA guest mode (mode 3 or RDMA from VF) support.

================================================================================

T5 Firmware
++++++++++++
================================================================================
Version : 1.16.63.0
Date    : 09/29/2017
================================================================================

Fixes
-----

BASE:
- Fixed offload memory overcommit in case of SO adapter.

ETH:
- Fixed DCBX configuration refresh failure.

OFLD
- Fixed 0B iWARP ingress read failure.

FOiSCSI:
- Fixed vlan interface ping failure.

================================================================================
Version : 1.16.59.0
Date    : 09/05/2017
================================================================================

FIXES
-----

BASE:
- Fixed an FLR issue which was causing error when VF attached VM was powered on.

ETH:
- Fixed an issue in connection traffic shaping when
  FLOWC_WR->FW_FLOWC_MNEM_SCHEDCLASS is not received in first WR on the connection.
- Fixed link failure when speed is changed from 10G-1G-10G due to incorrect flag
  check.
- Fixed T580 link failure with few switches which take more time for
  establishing link.

ENHANCEMENTS
------------

BASE:
- Ring backbone feature added. New FW_PARAMS_PARAM_DEV_RING_BACKBONE param type
  added to query and enable ring backbone support.
- Added new API FW_PARAM_PARAM_DEV_MPSBGMAP to read the priority to buffer group
  mapping for the ports.
- FW_PARAMS_PARAM_DEV_TPCHMAP API added to read the port to channel mapping.

FOiSCSI:
- iSCSI Command offload target support added.

================================================================================
Version : 1.16.43.0
Date    : 05/05/2017
================================================================================

FIXES
-----

BASE:
- Fixed default DCB mode to AUTO.
- Fixed DCBX bugs when AUTO mode is configured in config file.
- Fixed an issue where even after removing PFC from switch, PFC wasn't getting
  reset.

ETH:
- Fixed bug in writing multi-bytes using i2c interface.
- Fixed the link failure when optical cable is inserted into the QSA module
  after loading the driver.

OFLD
- Fixed an extra credit issue if FW_RI_TYPE_FINI is delayed in fw due to
  backpressure.
- Added a new queue type FW_IQ_TYPE_VF_CQ to handle the FW_PARAMS_PARAM_DMAQ*
  commands. queue type will be part of the FW_PARAMS_PARAM_DMAQ_IQ_INTIDX
  value. Used in guest RDMA (RDMA from VM/VF) usecase.

ENHANCEMENTS
------------

BASE:
- Added new API FW_PARAM_PARAM_DEV_MPSBGMAP to read the priority to buffer
  group mapping for the ports.

================================================================================
Version : 1.16.33.0
Date    : 02/24/2017
================================================================================

ENHANCEMENTS
------------

ETH:
- Added new workrequest ETH_TX_PKTS2_WR (see fw api document for more details).

================================================================================
Version : 1.16.29.0
Date    : 01/27/2017
================================================================================

FIXES
-----

BASE:
- max number of ethctrl queue in VF set to 2 (reverted the last change
  because it causes problem in VF drivers).

FOiSCSI:
- Fixed the failure of PXE boot OS install on an iscsi lun.

ENHANCEMENTS
------------

OFLD:
- Added filtering support for NAT. New WR FW_FILTER2_WR and
  FW_PARAMS_PARAM_DEV_FILTER2_WR added for the same.
- Added RDMA guest mode (mode 3 or RDMA from VF) support.

================================================================================

T4 Firmware
+++++++++++
================================================================================
Version : 1.16.63.0
Date    : 09/29/2017
================================================================================

Fixes
-----

ETH:
- Fixed DCBX configuration refresh failure.

FOiSCSI:
- Fixed vlan interface ping failure.

================================================================================
Version : 1.16.59.0
Date    : 09/05/2017
================================================================================

FIXES
-----

ETH:
- Fixed an issue in connection traffic shaping when
  FLOWC_WR->FW_FLOWC_MNEM_SCHEDCLASS is not received in first WR on the connection.

ENHANCEMENTS
------------

BASE:
- FW_PARAMS_PARAM_DEV_TPCHMAP API added to read the port to channel mapping.

================================================================================
Version : 1.16.43.0
Date    : 05/05/2017
================================================================================

FIXES
-----

BASE:
- Fixed default DCB mode to AUTO.
- Fixed DCBX bugs when AUTO mode is configured in config file.
- Fixed an issue where even after removing PFC from switch, PFC wasn't getting
  reset.

ETH:
- Fixed bug in writing multi-bytes using i2c interface.

OFLD
- Fixed an extra credit issue if FW_RI_TYPE_FINI is delayed in fw due to
  backpressure.
- Added a new queue type FW_IQ_TYPE_VF_CQ to handle the FW_PARAMS_PARAM_DMAQ*
  commands. queue type will be part of the FW_PARAMS_PARAM_DMAQ_IQ_INTIDX
  value. Used in guest RDMA (RDMA from VM/VF) usecase.

ENHANCEMENTS
------------

BASE:
- Added new API FW_PARAM_PARAM_DEV_MPSBGMAP to read the priority to buffer
  group mapping for the ports.

Obtained from:	Chelsio Communications
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-10-07 17:24:23 +00:00
Navdeep Parhar
08cd1f11bd cxgbe(4): Provide knobs to set the holdoff parameters of TOE rx queues
separately from NIC rx queues instead of using the same parameters for
both types of queues.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-10-05 07:18:16 +00:00
John Baldwin
4fedff3ca1 Enable support for lookaside crypto operations by default.
This permits ccr(4) to be used with the default firmware configuration
file.

Discussed with:	np
Sponsored by:	Chelsio Communications
2017-09-18 23:50:34 +00:00
John Baldwin
91a65e2f6b Avoid reusing the wrong buffer for a DDP AIO request.
To optimize the case of ping-ponging between two buffers, the DDP code
caches the last two buffers used keeping the pages wired and page pods
stored in the NIC's RAM.  If a new aio_read() request uses one of the
same buffers, then the work of holding pages, etc. can be avoided.
However, the starting virtual address of an aio buffer was not saved,
only the page count, length, and initial page offset.  Thus, an
aio_read() request could match a different buffer in the address
space.  (Earlier during development vm_fault_hold_quick_pages() was
always called and the vm_page_t values were compared, but that was
eventually removed without being adequately replaced.)  Fix by storing
the starting virtual address and comparing that (along with other
fields) to determine if a buffer can be reused.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-09-15 22:40:57 +00:00
John Baldwin
2bd1e600e3 Fix some incorrect sysctl pointers for some error stats.
The bad_session, sglist_error, and process_error sysctl nodes were
returning the value of the pad_error node instead of the appropriate
error counters.

Sponsored by:	Chelsio Communications
2017-09-14 21:06:08 +00:00
Navdeep Parhar
efeb46889f cxgbe(4): Ignore capabilities that depend on TOE when the firmware
reports TOE is not available.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-09-13 06:07:02 +00:00
Navdeep Parhar
8d6ae10af6 cxgbe(4): Fix a couple of problems in the sge_wrq data path.
- start_wrq_wr must not drain the wr_list if there are incomplete_wrs
  pending.  This can happen when a t4_wrq_tx runs between two
  start_wrq_wr.

- commit_wrq_wr must examine the cookie's pidx and ndesc with the
  queue's lock held.  Otherwise there is a bad race when incomplete WRs
  are being completed and commit_wrq_wr for the WR that is ahead in the
  queue updates the next incomplete WR's cookie's pidx/ndesc but the
  commit_wrq_wr for the second one is using stale values that it read
  without the lock.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-09-09 05:12:14 +00:00
Navdeep Parhar
95bb5b694e cxgbe/iw_cxgbe: Set TCP_NODELAY before initiating connection so that
t4_tom picks it up right away.  This is less work than waiting for
the connection to be established before applying the setting.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-09-01 01:34:12 +00:00
Navdeep Parhar
0a3bf7fb7e cxgbe/t4_tom: There may not be a tid to update if the connection isn't
established.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-08-31 23:34:08 +00:00
Navdeep Parhar
3ef7429927 cxgbe/t4_tom: Add a knob to select the congestion control algorigthm
used by the TOE hardware for fully offloaded connections.  The knob
affects new connections only.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-08-31 20:33:22 +00:00
Navdeep Parhar
2f318252cb cxgbe(4): Add two new debug flags -- one to allow manual firmware
install after full initialization, and another to disable the TCB
cache (T6+).  The latter works as a tunable only.

Note that debug_flags are for debugging only and should not be set
normally.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-08-30 23:41:04 +00:00
Navdeep Parhar
e3d00a3e03 cxgbe(4): Zero out the memory allocated for the debug dump.
cudbg_collect seems to expect it this way.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-08-30 18:46:38 +00:00
Navdeep Parhar
fc740a161b cxgbe(4): Update T6/T5/T4 firmwares to 1.16.59.0.
These firmwares come from a pre-release snapshot.  The final firmwares
in this Chelsio release cycle will likely be .61.0 or later and those
will be the next "long lived" firmwares in FreeBSD head and stable
branches.  .59 is being provided in head (only) for wider test exposure.

Obtained from:	Chelsio Communications
Sponsored by:	Chelsio Communications
2017-08-29 23:37:26 +00:00
Navdeep Parhar
1ba8c29ca8 cxgbe(4): Do not access the mailbox without appropriate locks while
creating hardware VIs.

This fixes a bad race on systems with hw.cxgbe.num_vis > 1.

Reported by:	olivier@
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-08-28 22:41:15 +00:00
Navdeep Parhar
7023d9d4c6 cxgbe(4): Maintain one ifmedia per physical port instead of one per
Virtual Interface (VI).  All autonomous VIs that share a port share the
same media.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-08-28 21:44:25 +00:00
Navdeep Parhar
573443c542 cxgbe(4): vi_mac_funcs should include the base Ethernet function. It is
already used in the driver as if it does.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-08-28 07:50:54 +00:00
Navdeep Parhar
358346b456 cxgbe(4): Remove write only variable from t4_port_init.
MFC after:	3 days
2017-08-28 04:06:40 +00:00
Navdeep Parhar
0fa7560dfd cxgbe(4): Fix some assertions during driver detach. The netmap queues
can't be initialized if the VI isn't.

MFC after:	3 days
2017-08-28 03:25:41 +00:00
Navdeep Parhar
f6d9d14b93 cxgbe(4): Verify that the driver accesses the firmware mailbox in a
thread-safe manner.

MFC after:	3 days
2017-08-28 03:13:16 +00:00
Navdeep Parhar
7cb574039b cxgbe(4): Dump the mailbox contents in the same format as CH_DUMP_MBOX.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-08-25 23:31:15 +00:00
Navdeep Parhar
b58fd65490 cxgbe/t4_tom: Use correct name for the ISS-valid bit in options2.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-08-15 19:21:27 +00:00
Navdeep Parhar
5d973bad2a cxgbe(4): Save the last reported link parameters and compare them with
the current state to determine whether to generate a link-state change
notification.  This fixes a bug introduced in r321063 that caused the
driver to sometimes skip these notifications.

Reported by:	Jason Eggleston @ LLNW
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-08-12 14:02:19 +00:00
Navdeep Parhar
cc2050c5eb cxgbe(4): Avoid a NULL dereference that would occur during module unload
if there were problems earlier during attach.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-08-06 19:45:59 +00:00
Navdeep Parhar
019c1a0111 cxgbe(4): Allow the TOE timer tunables to be set with microsecond
precision.  These timers are already displayed in microseconds in the
sysctl MIB.  Add variables to track these tunables while here.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-08-04 15:57:10 +00:00
Navdeep Parhar
6320b0f850 cxgbe(4): Always use the first and not the last virtual interface
associated with a port in begin_synchronized_op.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-08-04 01:28:06 +00:00
Navdeep Parhar
f856f099cb cxgbe(4): Initial import of the "collect" component of Chelsio unified
debug (cudbg) code, hooked up to the main driver via an ioctl.

The ioctl can be used to collect the chip's internal state in a
compressed dump file.  These dumps can be decoded with the "view"
component of cudbg.

Obtained from:	Chelsio Communications
MFC after:	2 months
Sponsored by:	Chelsio Communications
2017-08-03 14:43:30 +00:00
Navdeep Parhar
f8d0488ec6 cxgbe/iw_cxgbe: Log the end point's history and flags to the trace
buffer just before it's freed.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-07-28 22:28:45 +00:00
Navdeep Parhar
c45b18681f cxgbe(4): Some updates to the common code.
- Updated register ranges.
- Helper routines for access to TP registers.
- Updated routine to read flash parameters.

Obtained from:	Chelsio Communications
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-07-26 20:20:58 +00:00
Navdeep Parhar
dcbe705600 cxgbe(4): Display some more TOE parameters related to retransmission
and keepalive in the sysctl MIB.  Provide tunables to change some of
these parameters.  These are supposed to be setup by the firmware so
these tunables are for experimentation only.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-07-24 19:17:13 +00:00
Navdeep Parhar
98d227f3e8 cxgbe(4): Install the firmware bundled with the driver to the card if it
doesn't seem to have one.  This lets the driver recover automatically
from incomplete firmware upgrades (panic, reboot, power loss, etc. in
the middle of an upgrade).

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-07-23 18:10:47 +00:00
Navdeep Parhar
089400e72d cxgbe/t4_tom: Log more details about the newly ESTABLISHED tid to the
trace buffer.

MFC after:	3 days
2017-07-19 01:49:01 +00:00
Navdeep Parhar
1e5f943084 cxgbe(4): New ioctls to flash bootrom and boot config to the card.
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-07-18 00:50:58 +00:00
Navdeep Parhar
01285747aa cxgbe(4): Various link/media related improvements.
- Deal with changes to port_type, and not just port_mod when a
  transceiver is changed.  This fixes hot swapping of transceivers of
  different types (QSFP+ or QSA or QSFP28 in a QSFP28 port, SFP+ or
  SFP28 in a SFP28 port, etc.).

- Always refresh media information for ifconfig if the port is down.
  The firmware does not generate tranceiver-change interrupts unless at
  least one VI is enabled on the physical port.  Before this change
  ifconfig diplayed potentially stale information for ports that were
  administratively down.

- Always recalculate and reapply L1 config on a transceiver change.

- Display PAUSE settings in ifconfig.  The driver sysctls for this
  continue to work as well.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-07-17 00:42:13 +00:00
Navdeep Parhar
3a5426dd3c cxgbe/t4_tom: Do not include space taken by the TCP timestamp option in
the "effective MSS" for the connection.  The chip expects it this way.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-06-27 22:05:06 +00:00
Navdeep Parhar
ea168fbc64 cxgbe/iw_cxgbe: Disable debug output by default. The help text for the sysctl
already says that the default is 0.

Sponsored by:	Chelsio Communications
2017-06-27 17:48:11 +00:00
Navdeep Parhar
edc9c9cd54 cxgbe/iw_cxgbe: Catch up with r319722. The socket lock is not the same as the
lock for the receive buffer any more.

Sponsored by:	Chelsio Communications
2017-06-27 17:45:47 +00:00
Navdeep Parhar
98d391a0d3 cxgbe/t4_tom: sbspace on listening sockets is no longer supported (as of
r319722), use sol_sbrcv_hiwat instead.

Sponsored by:	Chelsio Communications
2017-06-27 17:43:28 +00:00
Navdeep Parhar
a8c4fcb9c7 cxgbe(4): Fix per-queue netmap operation.
Do not attempt to initialize netmap queues that are already initialized
or aren't supposed to be initialized.  Similarly, do not free queues
that are not initialized or aren't supposed to be freed.

PR:		217156
Sponsored by:	Chelsio Communications
2017-06-15 19:56:59 +00:00
Navdeep Parhar
7127f6f455 cxgbe(4): Do not request an FEC setting that the port does not support.
MFC after:	3 days.
Sponsored by:	Chelsio Communications
2017-06-12 20:55:20 +00:00
John Baldwin
1496376fee Fix the software fallback for GCM to validate the existing tag for decrypts.
Sponsored by:	Chelsio Communications
2017-06-08 21:33:10 +00:00
John Baldwin
4623e047a7 Add explicit handling for requests with an empty payload.
- For HMAC requests, construct a special input buffer to request an empty
  hash result.
- For plain cipher requests and requests that chain an AES cipher with an
  HMAC, fail with EINVAL if there is no cipher payload.  If needed in
  the future, chained requests that only contain AAD could be serviced as
  HMAC-only requests.
- For GCM requests, the hardware does not support generating the tag for
  an AAD-only request.  Instead, complete these requests synchronously
  in software on the assumption that such requests are rare.

Sponsored by:	Chelsio Communications
2017-06-08 21:06:18 +00:00
Navdeep Parhar
a59a14773a cxgbe(4): Update the statistics for compound tx work requests once per
work request, not once per frame.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-06-02 17:57:27 +00:00
John Baldwin
d68990a14c Fail large requests with EFBIG.
The adapter firmware in general does not accept PDUs larger than 64k - 1
bytes in size.  Sending crypto requests larger than this size result in
hangs or incorrect output, so reject them with EFBIG.  For requests
chaining an AES cipher with an HMAC, the firmware appears to require
slightly smaller requests (around 512 bytes).

Sponsored by:	Chelsio Communications
2017-05-26 20:20:40 +00:00
Navdeep Parhar
27bdfd5a8a cxgbe/iw_cxgbe: sodisconnect failures are harmless and should not be
treated as fatal errors.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-05-24 04:48:09 +00:00
Navdeep Parhar
7c0cad38c7 cxgbe(4): Update the T4, T5, and T6 firmwares to 1.16.45.0.
The latest firmware has a number of link related fixes, support for a
new custom card, and the fix for a bug that affected rate limiting on
FreeBSD.

Obtained from:	Chelsio Communications
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-05-23 23:40:17 +00:00
John Baldwin
5033c43b7a Add a driver for the Chelsio T6 crypto accelerator engine.
The ccr(4) driver supports use of the crypto accelerator engine on
Chelsio T6 NICs in "lookaside" mode via the opencrypto framework.

Currently, the driver supports AES-CBC, AES-CTR, AES-GCM, and AES-XTS
cipher algorithms as well as the SHA1-HMAC, SHA2-256-HMAC, SHA2-384-HMAC,
and SHA2-512-HMAC authentication algorithms.  The driver also supports
chaining one of AES-CBC, AES-CTR, or AES-XTS with an authentication
algorithm for encrypt-then-authenticate operations.

Note that this driver is still under active development and testing and
may not yet be ready for production use.  It does pass the tests in
tests/sys/opencrypto with the exception that the AES-GCM implementation
in the driver does not yet support requests with a zero byte payload.

To use this driver currently, the "uwire" configuration must be used
along with explicitly enabling support for lookaside crypto capabilities
in the cxgbe(4) driver.  These can be done by setting the following
tunables before loading the cxgbe(4) driver:

    hw.cxgbe.config_file=uwire
    hw.cxgbe.cryptocaps_allowed=-1

MFC after:	1 month
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D10763
2017-05-17 22:13:07 +00:00
John Baldwin
720e7bb69c Add support for child devices that aren't ports.
Invoke any identify routines of child drivers during attach before attaching
children, and delete any remaining devices after deleting ports.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-05-16 23:18:50 +00:00
Navdeep Parhar
3f1466a535 cxgbe(4): Avoid an out of bounds access when an attempt to unbind a tx
queue from a traffic class fails.

Reported by:	x ksi <s3810 at pjwstk edu pl>
MFC after:	3 days
2017-05-15 18:18:32 +00:00
Navdeep Parhar
c8da9163bf cxgbe(4): netmap-only interrupts for a VI do not have an associated rxq
or ofld_rxq and should be ignored by vi_intr_iq.

MFC after:	3 days.
Sponsored by:	Chelsio Communications
2017-05-14 09:07:13 +00:00
Navdeep Parhar
b2d8f4934e Adjust whitespace and fix a comment. No functional change.
MFC after:	3 days
2017-05-10 00:42:28 +00:00
Navdeep Parhar
1404daa76c cxgbe(4): Do not assume that if_qflush is always followed by inteface-down.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-05-09 18:33:41 +00:00
Navdeep Parhar
e006d2a6fd cxgbe(4): Fixes related to the knob that controls link autonegotiation.
- Do not leak the adapter lock in sysctl_autoneg.
- Accept only 0 or 1 as valid settings for autonegotiation.
- A fixed speed must be requested by the driver when autonegotiation is
  disabled otherwise the firmware will reject the l1cfg command.  Use
  the top speed supported by the port for now.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-05-09 08:08:28 +00:00
Navdeep Parhar
6790499792 cxgbe/t4_tom: Per-connection rate limiting for TCP sockets handled by
the TOE.  For now this capability is always enabled in kernels with
options RATELIMIT.  t4_tom will check if_capenable once the base driver
gets code to support rate limiting for any socket (TOE or not).

This was tested with iperf3 and netperf ToT as they already support
SO_MAX_PACING_RATE sockopt.  There is a bug in firmwares prior to
1.16.45.0 that affects the BSD driver only and results in rate-limiting
at an incorrect rate.  This will resolve by itself as soon as 1.16.45.0
or later firmware shows up in the driver.

Relnotes:	Yes
Sponsored by:	Chelsio Communications
2017-05-05 20:06:49 +00:00
Navdeep Parhar
1dc02549c3 cxgbe(4): The Tx scheduler initialization either works or doesn't. It
doesn't need a refresh in either case.

Sponsored by:	Chelsio Communications
2017-05-05 19:34:05 +00:00
Navdeep Parhar
49c0beb6f5 cxgbe(4): Update the VF device ids too. This should have been part
of r317820.

Reported by:	jhb@
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-05-05 16:52:25 +00:00
Navdeep Parhar
63febe64f1 cxgbe(4): Update the list of PCIe devices claimed by the driver. At
this point any board with a T6 should just work.

Obtained from:	Chelsio Communications
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-05-05 00:54:23 +00:00
Navdeep Parhar
2204b42716 cxgbe(4): Support routines for Tx traffic scheduling.
- Create a new file, t4_sched.c, and move all of the code related to
  traffic management from t4_main.c and t4_sge.c to this file.
- Track both Channel Rate Limiter (ch_rl) and Class Rate Limiter (cl_rl)
  parameters in the PF driver.
- Initialize all the cl_rl limiters with somewhat arbitrary default
  rates and provide routines to update them on the fly.
- Provide routines to reserve and release traffic classes.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-05-02 20:38:10 +00:00
Navdeep Parhar
034b4dcfa8 cxgbe/iw_cxgbe: Pull in some updates to c4iw_wait_for_reply from the
iw_cxgb4 Linux driver.

Obtained from:	Chelsio Communications
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-04-25 16:54:27 +00:00
Alexander Motin
5a5b6d1979 Use proper alignment constant for uma_zcreate().
Previous code panicked on KASSERT with INVARIANTS enabled.

MFC after:	2 weeks
2017-04-24 08:44:51 +00:00
Navdeep Parhar
46f48ee519 cxgbe: Add tunables to control the number of LRO entries and the number
of rx mbufs that should be presorted before LRO.  There is no change in
default behavior.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-04-17 09:00:20 +00:00
Navdeep Parhar
d491f8ca2f cxgbe: Add a tunable to configure the SGE time scaler, which is
available starting with T6.  The values in the timer holdoff registers
are multiplied by the scaling factor before use.

dev.<nexus>.<n>.holdoff_timers shows the final values of the
timers in microseconds.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-04-15 17:00:50 +00:00
Navdeep Parhar
e3951def25 cxgbe/iw_cxgbe: Report the actual values of various parameters as
configured by the firmware.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-04-14 20:15:17 +00:00
Navdeep Parhar
7f77a37048 cxgbe/iw_cxgbe: Report accurate page_size_cap in ib_query_device.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-04-14 19:18:50 +00:00
Navdeep Parhar
080491b13c cxgbe/iw_cxgbe: hw supports 64K (not 32K) Protection Domains.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-04-14 19:15:31 +00:00
Navdeep Parhar
0a600b6312 cxgbe: Query some more RDMA related parameters from the firmware.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-04-13 17:16:36 +00:00
Navdeep Parhar
1c7d0de794 cxgbe/iw_cxgbe: Remove another bad cast. This should have been
included in r316571.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-04-06 16:19:19 +00:00
Navdeep Parhar
ba81aae287 cxgbe/iw_cxgbe: Replace a magic constant with something more readable
(and accurate).

T4 and later have an extra bit for page shift so the maximum page size
is 8TB (shift of 12 + 31) instead of 128MB (12 + 15).  This saves space
in the chip's PBL (physical buffer list) when registering very large
memory regions.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-04-06 14:50:15 +00:00
Navdeep Parhar
870b2660d4 cxgbe/iw_cxgbe: Remove bad cast that resulted in incorrect length for
memory regions larger than 4GB.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-04-06 13:58:59 +00:00
Navdeep Parhar
3128b167ed cxgbe(4): Program the global RSS key once instead of once per ifnet.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-04-04 23:14:03 +00:00
Navdeep Parhar
a6bef5c275 cxgbe: Don't call t4_edc_err_read for errors not related to the EDCs.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-03-29 19:25:31 +00:00
Navdeep Parhar
ce22fbcf83 cxgbe/iw_cxgbe: T6 has no limit on the amount of memory that can be
registered in one ib_reg_phys_mr.
2017-03-28 23:39:11 +00:00
Navdeep Parhar
74308c6816 cxgbe/iw_cxgbe: Defer the handling of error CQEs and RDMA_TERMINATE to
the thread that deals with socket state changes.  This eliminates
various bad races with the ithread.

Submitted by:	KrishnamRaju ErapaRaju @ Chelsio (original version)
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-03-27 22:00:03 +00:00
Navdeep Parhar
dee33b4bd5 cxgbe/iw_cxgbe: Remove unused code.
MFC after:	3 days
2017-03-27 03:11:51 +00:00
Navdeep Parhar
4aff1c38d1 cxgbe/iw_cxgbe: allocations that use GFP_KERNEL (which is M_WAITOK on
FreeBSD) cannot fail.

MFC after:	3 days
2017-03-25 02:28:21 +00:00
Navdeep Parhar
d4ce83cd36 cxgbe/iw_cxgbe: alloc_ep expects a gfp_t, and it's always ok to sleep during
alloc_ep.
2017-03-25 01:45:04 +00:00
Navdeep Parhar
a8d61a0a7b cxgbe/iw_cxgbe: c4iw_connect should always returns a -ve errno on failure.
MFC after:	3 days
2017-03-25 01:38:17 +00:00
Navdeep Parhar
94036cfff0 cxgbe/iw_cxgbe: Use the socket and not the toepcb to reach for the
inpcb.  t4_tom detaches the inpcb from the toepcb as soon as the
hardware is done with the connection (in final_cpl_received) but the
socket is around as long as the cm_id and the rest of iWARP state is.

This fixes an intermittent NULL dereference during abort.

Submitted by:	KrishnamRaju ErapaRaju @ Chelsio
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-03-15 19:10:04 +00:00
Navdeep Parhar
73832aae54 cxgbe(4): Fix an always-true assertion (reported by PVS-Studio).
sys/dev/cxgbe/t4_main.c: PVS-Studio: Expression is Always True (CWE-571) (3)

PR:		217746
Submitted by:	Svyatoslav (razmyslov at viva64 com)
MFC after:	1 week
2017-03-13 17:16:29 +00:00
Navdeep Parhar
1081f354af cxgbe/iw_cxgbe: Abort connection if there is an error during c4iw_modify_qp.
Submitted by:	KrishnamRaju ErapaRaju @ Chelsio
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-03-07 01:13:26 +00:00
Navdeep Parhar
401032c67d cxgbe/iw_cxgbe: Implement sq/rq drain operation.
ULPs can set a qp's state to ERROR and then post a work request on the
sq and/or rq.  When the reply for that work request comes back it is
guaranteed that all previous work requests posted on that queue have
been drained.

Obtained from:	Chelsio Communications
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-03-03 03:07:54 +00:00
Navdeep Parhar
f52a45c98e cxgbe/iw_cxgbe: Do not check the size of the memory region being
registered.  T4/5/6 have no internal limit on this size.  This is
probably a copy paste from the T3 iw_cxgb driver.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-03-01 20:23:21 +00:00
Navdeep Parhar
017296dbb6 cxgbe/iw_cxgbe: fix various double-close panics with iWARP sockets.
Sockets representing the TCP endpoints for iWARP connections are
allocated by the ibcore module.  Before this revision they were closed
either by the ibcore module or the iw_cxgbe hardware driver depending on
the state transitions during connection teardown.  This is error prone
and there were cases where both iw_cxgbe and ibcore closed the socket
leading to double-free panics.  The fix is to let ibcore close the
sockets it creates and never do it in the driver.

- Use sodisconnect instead of soclose (preceded by solinger = 0) in the
  driver to tear down an RDMA connection abruptly.  This does what's
  intended without releasing the socket's fd reference.

- Close the socket in ibcore when the iWARP iw_cm_id is destroyed.  This
  works for all kinds of sockets: clients that initiate connections,
  listeners, and sockets accepted off of listeners.

Reviewed by:	Steve Wise @ Open Grid Computing, hselasky@
MFC after:	3 days
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D9796
2017-02-28 19:27:41 +00:00
Navdeep Parhar
3ae36eee9c cxgbe/iw_cxgbe: Minor changes for T6.
Submitted by:	Krishnamraju Eraparaju at Chelsio
Sponsored by:	Chelsio Communications
2017-02-23 19:02:40 +00:00
Navdeep Parhar
eaf5669483 cxgbe/t4_tom: Fix CLIP entry refcounting on the passive side. Every
IPv6 connection being handled by the TOE should have a reference on its
CLIP entry.

Sponsored by:	Chelsio Communications
2017-02-06 17:48:25 +00:00
Navdeep Parhar
987258d00f cxgbe(4): Allow tunables that control the number of queues to be set to
'-n' to tell the driver to create _up to_ 'n' queues if enough cores are
available.  For example, setting hw.cxgbe.nrxq10g="-32" will result in
16 queues if the system has 16 cores, 32 if it has 32.

There is no change in the default number of queues of any type.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-02-06 05:19:29 +00:00
John Baldwin
0ed5eff947 Fix a couple of issues with t4iov probe and attach.
- Check for Chelsio vendor ID in probe routines.
- Fail attach instead of faulting if pci_find_dbsf() doesn't find a
  device.

PR:		216539
Reported by:	asomers
Tested by:	Dave Baukus <daveb@spectralogic.com>
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-01-31 18:54:13 +00:00
John Baldwin
fbb779cca9 Unregister CPL handlers for TOE-related messages when unloading TOM.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-01-27 23:08:30 +00:00
John Baldwin
b67915014d Don't drop a reference to the TOE PCB in undo_offload_socket().
undo_offload_socket() is only called by t4_connect() during a connection
setup failure, but t4_connect() still owns the TOE PCB and frees ita
after undo_offload_socket() returns.  Release a reference in
undo_offload_socket() resulted in a double-free which panicked when
t4_connect() performed the second free.  The reference release was
added to undo_offload_socket() incorrectly in r299210.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-01-27 23:03:28 +00:00
Navdeep Parhar
ff2c4d3f79 cxgbe/tom: Fix a case where do_pass_accept_req wasn't properly restoring
the VNET.  This should have been in r311949.

MFC after:	2 days
2017-01-18 03:35:42 +00:00
Navdeep Parhar
a342904bb5 cxgbe/tom: Add VIMAGE support to the TOE driver.
Active Open:
- Save the socket's vnet at the time of the active open (t4_connect) and
  switch to it when processing the reply (do_act_open_rpl or
  do_act_establish).

Passive Open:
- Save the listening socket's vnet in the driver's listen_ctx and switch
  to it when processing incoming SYNs for the socket.
- Reject SYNs that arrive on an ifnet that's not in the same vnet as the
  listening socket.

CLIP (Compressed Local IPv6) table:
- Add only those IPv6 addresses to the CLIP that are in a vnet
  associated with one of the card's ifnets.

Misc:
- Set vnet from the toepcb when processing TCP state transitions.
- The kernel sets the vnet when calling the driver's output routine
  so t4_push_frames runs in proper vnet context already.  One exception
  is when incoming credits trigger tx within the driver's ithread.  Set
  the vnet explicitly in do_fw4_ack for that case.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-01-11 23:48:17 +00:00
Navdeep Parhar
c70c4d71de The iw_cxgb and iw_cxgbe drivers should not use a FreeBSD device_t where
a linuxkpi style device is expected.  If OFED/linuxkpi actually starts
using this field then we'll have to figure out whether to create fake
devices for these drivers or have linuxkpi deal with NULL device.

This mismatch was first reported as part of D6585.
2017-01-10 18:39:53 +00:00
Navdeep Parhar
e8257dbe43 cxgbe(4): Attach to the 2x25 debug card. This is for internal use only.
MFC after:	3 days
2017-01-10 01:36:50 +00:00
Navdeep Parhar
b91f227f60 cxgbe(4): Refresh t4_msg.h, mainly for definitions related to the crypto
engine.

Obtained from:	Chelsio Communications
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-01-10 01:30:41 +00:00
Navdeep Parhar
87b027ba69 cxgbe(4): Enable automatic cidx flush for all control queues.
MFC after:	3 days
2017-01-09 22:20:09 +00:00
Navdeep Parhar
f50c49cca7 cxgbe(4): The wraparound logic in start_wrq_wr() should not get involved
in work requests that end at the end of the descriptor ring, even though
the pidx wraps around to 0.

MFC after:	3 days
2017-01-09 22:18:08 +00:00
Navdeep Parhar
d663d5ca23 cxgbe/t4_tom: Fix tid accounting. An offloaded IPv6 connection uses 2
tids, not 1, in the hardware.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-01-07 20:26:19 +00:00
Navdeep Parhar
e4612db2cd Fix comment in t4_tom. No functional change.
MFC after:	3 days
2017-01-07 00:08:55 +00:00
Navdeep Parhar
c88fa71928 cxgbe(4): Update T4, T5 and T6 firmwares to 1.16.26.0. Changelog for
all public firmwares for all chips since the last release (1.15.37.0)
follows (it's a straight copy-paste from the Release Notes for the
12/30/2016 Unified Wire release on Chelsio's website).

T6 Firmware
++++++++++++

Version : 1.16.26.0
Date    : 12/28/2016

Fixes
-----

BASE:
- Max number of egress and control queues adjusted to accomodate
  co-processor mode queues.
- Fixed intermittent DDR3/4 ECC errors.
- Fixed a traffic stall when ETS BW is configured as 0%.
- Max number of ethctrl queue in VF set to 1.

ETH:
- Added a new config file option 'speed' under port section to set the
  port speed.  Use only when auto negotiation is off.
- FEC option removed from firmware config file. cxgbtool can be used to
  change the fec setting.
- CPL_TX_TNL_LSO cpl handling added in ETH_TX_PKT_VM handler. This fixes
  large tunnel tcp packet support for VxLAN.

Version : 1.16.22.0
Date    : 12/05/2016

Fixes
-----

BASE:
- fw_port_type updated in fw API to match kernel.org definitions.
- Saved power by disaling unused MAC lanes.
- Configures correct power bin.
- Enhanced DDR4 performance.
- Enabled interrupts.
- Fixed an issue where filter rule for 'unicast hash' is not working.

ETH:
- Disabled auto negotiation by default because most of 100G switches do
  not support AN as of today.
- Fixed flow control not getting disabled problem.
- Fixed an issue where port0 doesn't come up sometimes.
- Fixed 10G link not coming up issue.
- Fixed an issue with promiscuous mode when dcbx disabled.

OFLD:
- Fixed a connection stuck issue when abort is received during out of tx
  pages backpressure.

ENHANCEMENTS
------------

BASE:
- Added inline TLS mode support.

Version : 1.16.12.0
Date    : 11/11/2016

ENHANCEMENTS
------------

BASE:
- Added T6 support.
- Added T6 1G/10G/25G/40G/100G link speeds.
- Added T6 co-processor mode crypto support.
- Added facility to increase link AN+AEC timeout.

OFLD:
- Added support for all T5 offload protocols except FCoE.

iSCSI:
- iscsi completion moderation enabled.

=======================================================================

T5 Firmware
++++++++++++

Version : 1.16.26.0
Date    : 12/28/2016

FIXES
-----

BASE:
- Max number of ethctrl queue in VF set to 1.

Version : 1.16.22.0
Date    : 12/05/2016

FIXES
-----

BASE:
- Fixed an issue where filter rule for 'unicast hash' is not working.

ETH:
- Fixed an issue with promiscuous mode when dcbx disabled.

ENHANCEMENTS
------------

ETH:
- Added 40G-KR support.

Version : 1.16.12.0
Date    : 11/11/2016

FIXES
-----

BASE:
- Fixed multiple issues related with VFs FLR processing.
- Fixed channel assignment based on number of ports in adapter.
- Fixed a crash when VM having PF assigned as passthrough mode is
  rebooted.
- Handled 2nd HELLO command from the same PF without seeing BYE from the
  same PF and if that is the only PF.
- A warning is printed in firmware log if PCI-E cookie generation is
  enabled in serial initialization file.
- Fixed multiple issues related with Filtering.
- Enabled DSGL memory write for iscsi and rdma.
- Added new FW_PARAMS_CMD[DEV] options to retrieve Serial Configuration
  and VPD version numbers.
- Fixed an issue where LVDS output was not getting enabled using vpd.

DCBX:
- Fixed DCBX CEE Incorrect class to pririty mapping.
- Fixed incorrect interpretation of DCBX IEEE PFC.

ETH:
- Adjusted the link related delay timings according to the QSFP spec.
- Improved 40G link bringup time with few switches.

OFLD:
- Do not reserve qp/cq if rdma capability is not enabled.
- Fixed an issue where approx 1600+ TOE connections were causing a
  firmware fatal error.

FOiSCSI:
- Fixed an issue where unloading foiscsi driver causes mailbox timeout.

ENHANCEMENTS
------------

BASE:
- Added 10G KR/KX support.
- Added T540-BT adapter support.
- Added 4 new rss key modes for PFs and VFs.

OFLD:
- Added new WR FW_RI_FR_NSMR_TPTE_WR to improve fast MR write
  performance in RDMA.

Version : 1.16.5.0
Date    : 10/26/2016

FIXES
-----

BASE:
- Fixed multiple issues where FLR from multiple VFs can cause firmware
  crash.
- Fixed channel assignment based on number of ports in adapter.
- Fixed the HELLO command master force api to handle the 2nd HELLO
  correctly without getting BYE from the PF driver.
- Added facility to retrieve Serial configuration and VPD version. Two
  new FW_PARAMS_CMD[DEV] options added to retrieve these values.
- Fixed multiple issues where FLR from multiple VFs are not completing.
- Added new RSS hash secret key modes.
- Fixed an issue where LVDS output was not getting enabled using vpd.

DCBX:
- Fixed an issue where iscsi tlv is sent incorrectly to host (DCBX CEE).
- Fixed an issue where app priority values are not handled correctly
  in fw (DCBX IEEE).

ETH:
- Adjusts the link related delay timings according to the QSFP spec.
- Changed 2.5G mac speed bit to 25G mac speed bit in fw API.
- Improvement in 40G link bringup time with few switches.

OFLD:
- Do not reserve qp/cq if rdma capability is not enabled.
- Fixed an issue where approx 1600+ TOE connections were causing a
  firmware fatal error.
- Fixed DSGL memory write in T5. Now iwarp and iscsi can use DSGL to do
  memory write.
- Fixed multiple issues in hash filter mode where incorrect protocol
  mask was getting used and affecting hash filter functionality.
- New fastpath WR FW_RI_FR_NSMR_TPTE_WR (with fully populated TPTE) is
  added for small REG_MR operations.

FOiSCSI:
- Fixed an issue in foiscsi recovery path.
- Fixed an issue where foiscsi (in VM in PCIE passthrough mode) didn't
  come up after VM FLR.

ENHANCEMENTS
------------

ETH:
- Implemented 1G/10G KR/KX ability.
- Implemented T540-BT adapter support.

=======================================================================

T4 Firmware
+++++++++++

Version : 1.16.12.0
Date    : 11/11/2016

FIXES
-----

BASE:
- Fixed an issue where reading temperature sesors using ldst command
  causes mailbox timeout.
- Added new FW_PARAMS_CMD[DEV] options to retrieve Serial Configuration
  and VPD version numbers.

ETH:
- Fixed DCBX CEE Incorrect class to pririty mapping.

FOiSCSI:
- Fixed an issue where unloading foiscsi driver causes mailbox timeout.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-01-03 22:05:07 +00:00
Navdeep Parhar
358bca3bc6 cxgbe(4): Updates to link configuration.
- Update struct link_settings and associated shared code.

- Add tunables to control FEC and autonegotiation.  All ports inherit
  these values as their initial settings.
  hw.cxgbe.fec
  hw.cxgbe.autoneg

- Add per-port sysctls to control FEC and autonegotiation.  These can be
  modified at any time.
  dev.<port>.<n>.fec
  dev.<port>.<n>.autoneg

MFC after:	3 days
Sponsored by:	Chelsio Communications
2016-12-30 08:59:49 +00:00
Navdeep Parhar
5a8b662ea6 cxgbe(4): Fix typo in an unused macro.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2016-12-16 06:30:07 +00:00
Navdeep Parhar
07b5fdfd01 cxgbe(4): Changes to the default T6 firmware configuration file.
- Disable features that are not supported or not used on FreeBSD.
- Increase the RSS table slice per interface.
- Increase the share of the TCAM reserved for filtering.

MFH:		2 weeks
Sponsored by:	Chelsio Communications
2016-12-16 06:25:51 +00:00
Navdeep Parhar
1de8c69de7 cxgbe(4): Deal with compressed error vectors.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2016-12-15 02:05:29 +00:00
Navdeep Parhar
ca276276f1 cxgbe(4): Fix the tid range shown for T6 cards in misc.tids.
MFC after:	3 days
2016-12-14 07:36:36 +00:00
Navdeep Parhar
1521ca71d3 cxgbe(4): Retire t4_bus_space_read_8 and t4_bus_space_write_8.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2016-12-13 20:35:57 +00:00
Navdeep Parhar
b8c1ffef80 cxgbe(4): netmap does not set IFCAP_NETMAP in an ifnet's if_capabilities
any more (since r307394).  Do it in the driver instead.

MFC after:	1 week
2016-12-09 02:21:27 +00:00
Navdeep Parhar
aa7792f2f6 cxgbe(4): unsigned short isn't large enough to store link speed (which
is in Mbps) for 100Gbps links.

MFC after:	3 days
2016-12-07 04:23:08 +00:00
Navdeep Parhar
3cbaf64f2e cxgbe(4): Update firmwares from version 1.16.12.0 to 1.16.22.0.
Obtained from:	Chelsio Communications
MFC after:	3 days
Sponsored by:	Chelsio Communications
2016-12-06 12:43:07 +00:00
Navdeep Parhar
a10443e8ba cxgbe(4): Include firmware for T6 cards in the driver. Update all
firmwares to 1.16.12.0.

Obtained from:	Chelsio Communications
MFC after:	3 days
Sponsored by:	Chelsio Communications
2016-11-30 00:26:35 +00:00
Navdeep Parhar
041e5aed63 cxgbe(4): Accurate statistics for all chip settings.
There are 4 independent knobs in T5+ chips to include or exclude PAUSE
frames from the "total frames" and "multicast frames" counters in either
direction.  This change lets the driver deal with any combination of
these settings.
2016-10-28 23:01:11 +00:00
Navdeep Parhar
77e9044c47 cxgbe(4): Fix bug in the calculation of the number of physically
contiguous regions in an mbuf chain.

If the payload of an mbuf ends at a page boundary count_mbuf_nsegs would
incorrectly consider the next mbuf's payload physically contiguous based
solely on a KVA comparison.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2016-10-24 19:09:56 +00:00
Navdeep Parhar
aaa4ddd9d0 cxgbe(4): Dump any mailbox command that times out. 2016-10-22 00:48:58 +00:00
Navdeep Parhar
b806e571bf cxgbe(4): Adjust whitespace to line up the column titles in cim_qcfg
with the values displayed.
2016-10-17 20:57:54 +00:00
Navdeep Parhar
721f5406c8 cxgbe(4): Allow the interface MTU to be set as high as the actual
hardware limit.

Submitted by:	jpaetzel@
Differential Revision:	https://reviews.freebsd.org/D8237
2016-10-13 19:40:21 +00:00
Navdeep Parhar
35b5ef914c cxgbe(4): Add an ioctl to copy a firmware config file to the card's flash. 2016-10-07 19:02:39 +00:00
Navdeep Parhar
d4d953bf37 cxgbe(4): Fix whitespace in the pm_stats display. 2016-10-06 21:25:17 +00:00
Navdeep Parhar
4e6b9efc86 cxgbe(4): Claim the T6 -DBG card. 2016-09-30 00:16:54 +00:00
Eitan Adler
eb1c1a7f24 Remove a a duplicated word. 2016-09-29 13:59:14 +00:00
Navdeep Parhar
788f3c06f6 cxgbe(4): Use the port's top speed to figure out whether it is "high
speed" or not (for the purpose of calculating the number of queues etc.)
This does the right thing for 25Gbps and 100Gbps ports.
2016-09-24 19:03:05 +00:00
Navdeep Parhar
d44268d135 cxgbe(4): Support SIOGIFXMEDIA so that ifconfig displays correct media
for 25Gbps and 100Gbps ports.   This should have been part of r305713,
which is when the driver first started reporting extended media types.
2016-09-24 13:23:47 +00:00
Navdeep Parhar
aa93b99aa0 cxgbe(4): Make the location/length of all descriptor rings available in
the sysctl MIB.
2016-09-23 20:03:28 +00:00
Navdeep Parhar
3cdfcb51aa cxgbe(4): Fix netmap with T6, which doesn't encapsulate SGE_EGR_UPDATE
message inside a FW_MSG.  The base NIC already deals with updates in
either form.

Sponsored by:	Chelsio Communications
2016-09-23 17:24:06 +00:00
Navdeep Parhar
de58013128 cxgbe(4): Fix the output of the "tids" sysctl on T6. 2016-09-22 21:19:25 +00:00
Navdeep Parhar
d4759c89e5 cxgbe(4): Catch up with the different layout of WHOAMI in T6.
Note that the code moved below t4_prep_adapter() as part of this change
because now it needs a working chip_id().
2016-09-22 18:47:07 +00:00
Navdeep Parhar
8c0ca00b72 cxgbe(4): Setup congestion response for T6 rx queues. 2016-09-21 00:50:22 +00:00
Navdeep Parhar
6a0cae68b4 cxgbe(4): Show wcwr_stats for T6 cards. 2016-09-21 00:46:08 +00:00
Navdeep Parhar
0459a175eb cxgbe(4): Fixes to wrq stats.
- Increment tx_wrs_copied in the correct place.
- Add tx_wrs_sspace to the sysctl MIB.

Sponsored by:	Chelsio Communications
2016-09-19 17:16:51 +00:00
Navdeep Parhar
efe00fd923 cxgbe/t4_tom: Update the active/passive open code to support T6. Data
path works as-is.

Sponsored by:	Chelsio Communications
2016-09-17 23:08:49 +00:00
Navdeep Parhar
b0c554c3a5 cxgbe/t4_tom: The SMAC entry for a VI is at a different location in the T6.
Sponsored by:	Chelsio Communications
2016-09-17 22:13:03 +00:00
Navdeep Parhar
e6b81479f9 cxgbe(4): Attach to cards with the Terminator 6 ASIC. T6 cards will
come up as 't6nex' nexus devices with 'cc' ports hanging off them.

The T6 firmware and configuration files will be added as soon as they
are released.  For now the driver will try to work with whatever
firmware and configuration is on the card's flash.

Sponsored by:	Chelsio Communications
2016-09-16 00:08:37 +00:00
Navdeep Parhar
83a202cac5 Whitespace nits. 2016-09-15 22:31:49 +00:00
Navdeep Parhar
97f2919d54 cxgbe(4): Use the interface's viid to calculate the PF/VF/VFValid fields
to use in tx work requests.
2016-09-15 08:30:47 +00:00
John Baldwin
1fedfdd5c1 Remove explicit device_verbose() from the t4iov driver detach routine
now that this case is handled generically.
2016-09-12 18:07:06 +00:00
Navdeep Parhar
4cf3aa135b cxgbe(4): Catch up with the rename of tlscaps -> cryptocaps. TLS is one
of the capabilities of the crypto engine in T6.

Sponsored by:	Chelsio Communications
2016-09-12 00:15:40 +00:00
Navdeep Parhar
9113e53d54 cxgbe(4): Add support for additional port types and link speeds.
Sponsored by:	Chelsio Communications.
2016-09-11 23:08:57 +00:00
Navdeep Parhar
769ef07a38 cxgbe(4): Rename the debug_flags driver tunable/sysctl to dflags.
Tunables that end with _flags are special.

Sponsored by:	Chelsio Communications
2016-09-11 18:05:37 +00:00
Navdeep Parhar
82c1d6b762 cxgbe(4): Deal with the slightly different SGE_STAT_CFG in T6.
Sponsored by:	Chelsio Communications
2016-09-11 17:57:53 +00:00
Navdeep Parhar
ed7e5640a5 cxgbe(4): Use smaller min/max bursts for fl descriptors with a T6.
Sponsored by:	Chelsio Communications
2016-09-11 17:51:17 +00:00
Navdeep Parhar
0dbc6cfd75 cxgbe(4): Update the pad_boundary calculation for T6, which has a
different range of boundaries.

Sponsored by:	Chelsio Communications
2016-09-11 17:22:54 +00:00
Navdeep Parhar
472a6004cf cxgbe(4): Use correct macro for header length with T6 ASICs. This
affects the transmit of the VF driver only.

Sponsored by:	Chelsio Communications
2016-09-11 16:11:51 +00:00
Navdeep Parhar
774168be39 cxgbe(4): Set up fl_starve_threshold2 accurately for T6.
Sponsored by:	Chelsio Communications
2016-09-11 16:06:17 +00:00
Navdeep Parhar
3aea32935c cxgbe(4): Avoid a NULL dereference in the clearstats ioctl handler.
Port softc's are not initialized when the adapter is in recovery mode.
2016-09-09 17:15:16 +00:00
Navdeep Parhar
9e7cb06c17 cxgbe(4): Do not prescreen frames before attempting LRO.
Sponsored by:	Chelsio Communications
2016-09-09 07:34:14 +00:00
John Baldwin
6af45170c1 Chelsio T4/T5 VF driver.
The cxgbev/cxlv driver supports Virtual Function devices for Chelsio
T4 and T4 adapters.  The VF devices share most of their code with the
existing PF4 driver (cxgbe/cxl) and as such the VF device driver
currently depends on the PF4 driver.

Similar to the cxgbe/cxl drivers, the VF driver includes a t4vf/t5vf
PCI device driver that attaches to the VF device.  It then creates
child cxgbev/cxlv devices representing ports assigned to the VF.
By default, the PF driver assigns a single port to each VF.

t4vf_hw.c contains VF-specific routines from the shared code used to
fetch VF-specific parameters from the firmware.

t4_vf.c contains the VF-specific PCI device driver and includes its
own attach routine.

VF devices are required to use a different firmware request when
transmitting packets (which in turn requires a different CPL message
to encapsulate messages).  This alternate firmware request does not
permit chaining multiple packets in a single message, so each packet
results in a firmware request.  In addition, the different CPL message
requires more detailed information when enabling hardware checksums,
so parse_pkt() on VF devices must examine L2 and L3 headers for all
packets (not just TSO packets) for VF devices.  Finally, L2 checksums
on non-UDP/non-TCP packets do not work reliably (the firmware trashes
the IPv4 fragment field), so IPv4 checksums for such packets are
calculated in software.

Most of the other changes in the non-VF-specific code are to expose
various variables and functions private to the PF driver so that they
can be used by the VF driver.

Note that a limited subset of cxgbetool functions are supported on VF
devices including register dumps, scheduler classes, and clearing of
statistics.  In addition, TOE is not supported on VF devices, only for
the PF interfaces.

Reviewed by:	np
MFC after:	2 months
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7599
2016-09-07 18:13:57 +00:00
John Baldwin
e06ab612d2 Don't break out of the m_advance() loop if len drops to zero.
If a packet contains the Ethernet header (14 bytes) in the first mbuf
and the payload (IP + UDP + data) in the second mbuf, then the attempt
to fetch the l3hdr will return a NULL pointer.  The first loop iteration
will drop len to zero and exit the loop without setting 'p'.  However,
the desired data is at the start of the second mbuf, so the correct
behavior is to loop around and let the conditional set 'p' to m_data of
the next mbuf (and leave offset as 0).

Reviewed by:	np
Sponsored by:	Chelsio Communications
2016-09-07 18:08:43 +00:00
Navdeep Parhar
5aaa3bc3b9 cxgbe/t4_tom: toepcb should be all-zero on allocation because the code
that cleans up on failure assumes that non-NULL values indicate
initialized items.

Sponsored by:	Chelsio Communications
2016-09-05 19:37:47 +00:00
Navdeep Parhar
5a63431265 Use correct CTR<n> variant. 2016-09-03 18:54:26 +00:00
Navdeep Parhar
af4251f3a1 cxgbe/cxgbei: Provide a knob to set the DDP threshold for iSCSI
transfers.

The Initiator and Target both perform zero copy receive for transfers
greater than or equal to this threshold.

Sponsored by:	Chelsio Communications
2016-09-02 00:21:24 +00:00
Navdeep Parhar
22536e1556 cxgbe/cxgbei: Count various events related to iSCSI operation and make
these counters available in the sysctl MIB.

Sponsored by:	Chelsio Communications
2016-09-01 23:58:36 +00:00
Navdeep Parhar
fec7f83342 cxgbe/cxgbei: Minor changes in the iSCSI CPL handlers.
- Use m_copydata instead of bcopy.
- Add new assertions.
- Log some more information.

Sponsored by:	Chelsio Communications
2016-09-01 22:40:55 +00:00
Navdeep Parhar
7cba15b16e cxgbe/cxgbei: Retire all DDP related code from cxgbei and switch to
routines available in t4_tom to manage the iSCSI DDP page pod region.

This adds the ability to use multiple DDP page sizes to the iSCSI
driver, among other improvements.

Sponsored by:	Chelsio Communications
2016-09-01 20:43:01 +00:00
Navdeep Parhar
a9feb2cdbb cxgbe/t4_tom: Two new routines to allocate and write page pods for a
buffer in the kernel's address space.
2016-09-01 00:51:59 +00:00
Navdeep Parhar
968267fdb8 cxgbe/t4_tom: Add general purpose routines to deal with page pod regions
and allocations within them.  Switch to these routines to manage the TOE
DDP region.

Sponsored by:	Chelsio Communications
2016-08-31 23:23:46 +00:00
John Baldwin
bc32f05443 Use device_verbose() to undo device_quiet() when detaching from t[45]iovX.
The device quiet flag is not automatically reset on detach, so it is
inherited by other device drivers (e.g. when switching a device driver
over to ppt for PCI pass through).  Cope with this behavior by explicitly
marking the device verbose during detach so that the next driver can make
its own decision.

Sponsored by:	Chelsio Communications
2016-08-29 22:47:14 +00:00
Navdeep Parhar
e25621e5ea cxgbe(4): Provide more details about the card in the sysctl MIB.
dev.t5nex.0.%desc: Chelsio T580-CR
dev.t5nex.0.hw_revision: 1
dev.t5nex.0.sn: PT13140042
dev.t5nex.0.pn: 110117150A0
dev.t5nex.0.ec: 0000000000000000
dev.t5nex.0.na: 0007432AF490
dev.t5nex.0.vpd_version: 3
dev.t5nex.0.scfg_version: 53255
dev.t5nex.0.bs_version: 1.1.0.0
dev.t5nex.0.er_version: 1.0.0.68
dev.t5nex.0.tp_version: 0.1.4.9
dev.t5nex.0.firmware_version: 1.16.2.0

Sponsored by:	Chelsio Communications
2016-08-27 00:13:41 +00:00
Navdeep Parhar
7df135c055 cxgbe/iw_cxgbe: Various fixes to the iWARP driver.
- Return appropriate error code instead of ENOMEM when sosend() fails in
  send_mpa_req.
- Fix for problematic race during destroy_qp.
- Abortive close in the failure of send_mpa_reject() instead of normal close.
- Remove the unnecessary doorbell flowcontrol logic.

Submitted by:	Krishnamraju Eraparaju at Chelsio
MFC after:	1 month
Sponsored by:	Chelsio communications
2016-08-26 17:38:13 +00:00
Navdeep Parhar
c149da1661 cxgbe/cxgbei: There is no need for multiple modules in the KLD.
Sponsored by:	Chelsio Communications
2016-08-26 01:28:31 +00:00
Navdeep Parhar
a95e0bfb6c cxgbe/cxgbei: Convert the driver-private PDU flags to enums and replace
pdu_ prefix with icp_ in struct icl_cxgbei_pdu.

Sponsored by:	Chelsio Communications
2016-08-25 23:06:12 +00:00
Navdeep Parhar
5d83392ac5 cxgbe/cxgbei: Read the chip's configuration to determine the actual
hardware send and receive PDU limits.  Report these limits to ICL and
take them into account when setting the socket's send and receive buffer
sizes.  The driver used a single hardcoded limit everywhere prior to
this change.

Sponsored by:	Chelsio Communications
2016-08-25 21:55:17 +00:00
Navdeep Parhar
97b84d344d Make the iSCSI parameter negotiation more flexible.
Decouple the send and receive limits on the amount of data in a single
iSCSI PDU.  MaxRecvDataSegmentLength is declarative, not negotiated, and
is direction-specific so there is no reason for both ends to limit
themselves to the same min(initiator, target) value in both directions.

Allow iSCSI drivers to report their send, receive, first burst, and max
burst limits explicitly instead of using hardcoded values or trying to
derive all of them from the receive limit (which was the only limit
reported by the drivers prior to this change).

Display the send and receive limits separately in the userspace iSCSI
utilities.

Reviewed by:	jpaetzel@ (earlier version), trasz@
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7279
2016-08-25 05:22:53 +00:00
John Baldwin
bd6ff0807e Reorder sysctls so that nodes shared with the VF driver are added first.
This permits a single early return for VF devices in the routines that
add sysctl nodes.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7512
2016-08-19 17:54:51 +00:00
John Baldwin
301abe0f3f Adjust t4_port_init() to work with VF devices.
Specifically, the FW_PORT_CMD may or may not work for a VF (the PF
driver can choose whether or not to permit access to this command),
so don't attempt to fetch port information on a VF if permission is
denied by the PF.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7511
2016-08-19 17:52:48 +00:00
John Baldwin
577422fc53 Add structures for VF-specific adapter parameters.
While here, mark which parameters are PF-specific and which are
VF-specific.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7508
2016-08-19 17:49:49 +00:00
John Baldwin
8b2246b349 Add support for register dumps on VF devices.
- Add handling of VF register sets to t4_get_regs_len() and t4_get_regs().
- While here, use t4_get_regs_len() in the ioctl handler for regdump
  instead of inlining it.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7484
2016-08-15 17:42:54 +00:00
John Baldwin
e5e9897c5f Update mailbox writes to work with VF devices.
- Use alternate register locations for the data and control registers for
  VFs.
- Do a dummy read to force the writes to the  mailbox data registers to
  post before the write to the control register on VFs.
- Do not check the PCI-e firmware register for errors on VFs.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7483
2016-08-15 17:41:34 +00:00
John Baldwin
59c1e950b9 Make SGE parameter handling more VF-friendly.
Add fields to hold the SGE control register and free list buffer sizes to
the sge_params structure.  Populate these new fields in
t4_init_sge_params() for PF devices and change t4_read_chip_settings() to
pull these values out of the params structure instead of reading
registers directly.  This will permit t4_read_chip_settings() to be reused
for VF devices which cannot read SGE registers directly.

While here, move the call to t4_init_sge_params() to
get_params__post_init().  The VF driver will populate the SGE parameters
structure via a different method before calling t4_read_chip_settings().

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7476
2016-08-15 17:40:05 +00:00
John Baldwin
ec55567ce6 Track the base absolute ID of ingress and egress queues.
Use this to map an absolute queue ID to a logical queue ID in interrupt
handlers.  For the regular cxgbe/cxl drivers this should be a no-op as
the base absolute ID should be zero.  VF devices have a non-zero base
absolute ID and require this change.  While here, export the absolute ID
of egress queues via a sysctl.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7446
2016-08-09 17:49:42 +00:00
John Baldwin
a56745092c Reserve an adapter flag IS_VF to mark VF devices vs PF devices.
Sponsored by:	Chelsio Communications
2016-08-08 21:45:39 +00:00
John Baldwin
8f6690d385 Fix a typo. 2016-08-08 21:28:02 +00:00
John Baldwin
f454e7ebf5 Add __printflike() to bus_describe_intr() to enable -Wformat checks.
Fix a few places that were passing a raw string as the format to use
a "%s" format string instead.

MFC after:	2 months
2016-08-04 18:29:16 +00:00
Navdeep Parhar
9217931fb4 cxgbe/t4_tom: The page pod arena allocates from pod address space and
not index space.  The minimum valid allocation out of this arena is the
size of a single page pod.

Sponsored by:	Chelsio Communications
2016-08-04 17:29:42 +00:00
John Baldwin
847bfa8eb7 Use the port device name for the iov device for Chelsio T4/T5 cards.
Chelsio T4/T5 adapters are multifunction cards.  The main driver uses
physical function 4 (PF4).  However, VF devices for SR-IOV are only
supported on physical functions 0 through 3, where PF0 creates VFs tied
to port 0, etc.  The t4iov/t5iov driver was previously added to
create VF devices for ports that are present on each adapter.  This
change uses the recently added pci_iov_attach_name() function to
name the character device in /dev/iov after the associated port on
the card (e.g. /dev/iov/cxl0 is used to create VFs that share the
cxl0 port).  With this in place, mark the t4iov/t5iov devices quiet
to prevent them from cluttering dmesg.

Reviewed by:	rstone
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7402
2016-08-03 17:11:08 +00:00
Navdeep Parhar
515b36c5b5 cxgbe/t4_tom: Read the chip's DDP page sizes and save them in a
per-adapter data structure.  This replaces a global array with hardcoded
page sizes.

Sponsored by:	Chelsio Communications
2016-08-02 23:54:21 +00:00
John Baldwin
315048f2ad Store the offset of the KDOORBELL and GTS registers in the softc.
VF devices use a different register layout than PF devices.  Storing
the offset in a value in the softc allows code to be shared between the
PF and VF drivers.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7389
2016-08-01 22:39:51 +00:00
John Baldwin
8dd07eab0e Various fixes to the t4/5nex character device.
- Remove null open/close methods.
- Don't set d_flags to 0 explicitly.
- Remove t5_cdevsw as the .d_name member isn't really used and doesn't
  warrant a separate cdevsw just for the name.
- Use ENOTTY as the error value for an unknown ioctl request.
- Use make_dev_s() to close race with setting si_drv1.

Sponsored by:	Chelsio Communications
2016-07-29 22:11:29 +00:00
John Baldwin
29c229e9fc Mark spg_len and fl_pktshift static.
These variables are no longer exported to t4_netmap.c after r296478.
2016-07-28 17:37:12 +00:00
John Baldwin
07159830be Add support for zero-copy aio_write() on TOE sockets.
AIO write requests for a TOE socket on a Chelsio T4+ adapter can now
DMA directly from the user-supplied buffer.  This is implemented by
wiring the pages backing the user-supplied buffer and queueing special
mbufs backed by raw VM pages to the socket buffer.  The TOE code
recognizes these special mbufs and builds a sglist from the VM page
array associated with the mbuf when queueing a work request to the TOE.

Because these mbufs do not have an associated virtual address, m_data
is not valid.  Thus, the AIO handler does not invoke sosend() directly
for these mbufs but instead inlines portions of sosend_generic() and
tcp_usr_send().

An aiotx_buffer structure is used to describe the user buffer (e.g.
it holds the array of VM pages and a reference to the AIO job).  The
special mbufs reference this structure via m_ext.  Note that a single
job might be split across multiple mbufs (e.g. if it is larger than
the socket buffer size).  The 'ext_arg2' member of each mbuf gives an
offset relative to the backing aiotx_buffer.  The AIO job associated
with an aiotx_buffer structure is completed when the last reference to
the structure is released.

Zero-copy aio_write()'s for connections associated with a given
adapter can be enabled/disabled at runtime via the
'dev.t[45]nex.N.toe.tx_zcopy' sysctl.

MFC after:	1 month
Relnotes:	yes
Sponsored by:	Chelsio Communications
2016-07-27 18:29:35 +00:00
Navdeep Parhar
17146cd543 cxgbe(4): Initialize the adapter queues (fwq and mgmtq) instead of
returning EAGAIN if they aren't available when the user tries to program
a filter.  Do this after validating the filter so that the driver
doesn't bring up the queues if it doesn't have to.
2016-07-26 23:29:37 +00:00
John Baldwin
f91fca5ba7 Add a driver to create VF devices on Chelsio T4/T5 NICs.
Chelsio NICs are a bit unique compared to some other NICs in that they
expose different functionality on different physical functions.  In
particular, PF4 is used to manage the NIC interfaces ('t4nex' and 't5nex').
However, PF4 is not able to create VF devices.  Instead, VFs are only
supported by physical functions 0 through 3.  This commit adds 't4iov'
and 't5iov' drivers that attach to PF0-3.

One extra wrinkle is that the iov devices cannot enable SR-IOV until the
firwmare has been initialized by the main PF4 driver.  To handle this
case, a new t4_if kobj interface has been added to permit cross-calls
between the PF drivers.  The PF4 driver notifies sibling drivers when it
is fully attached.  It also requests sibling drivers to detach before it
detaches.  Sibling drivers query the PF4 driver during their attach
routine to see if it is attached.  If not, the sibling drivers defer
their attach actions until the PF4 driver informs them it is attached.

VF devices are associated with a single port on the NIC.  VF devices
created from PF0 are associated with the first port on the NIC, VFs
from PF1 are associated with the second port, etc.  VF devices can
only be created from a PF device that has an associated port.  Thus,
on a 2-port card, VFs are only supported on PF0 and PF1.

Reviewed by:	np (earlier versions)
MFC after:	1 month
Sponsored by:	Chelsio Communications
2016-07-22 22:46:41 +00:00
John Baldwin
069af0eb14 Install a handler for firmware work request error messages.
If a driver sends an malformed or disallowed work request, the firmware
responds with a work request error.  Previously the driver treated this is
as an unexpected message and panicked.  Now it decodes the error message
to aid in debugging.

Reviewed by:	np (older version)
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D6950
2016-07-22 21:52:07 +00:00
Enji Cooper
092af585e1 Remove redundant declaration for tcp_dooptions, similar to r302576
netinet/tcp_var.h already defines this function

Differential Revision:	https://reviews.freebsd.org/D7189
MFC after:	1 week
PR:	209920
Reported by:	Mark Millard <markmi@dsl-only.net>
Reviewed by:	np
Tested with:	clang 3.8.0, gcc 4.2.1, gcc 5.3.0
Sponsored by:	EMC / Isilon Storage Division
2016-07-11 17:11:18 +00:00
Navdeep Parhar
a33b046750 cxgbe(4): Add sysctl to display the RSS indirection table size for an
interface.

dev.cxl.<n>.rss_size
dev.vcxl.<n>.rss_size

MFC after:	3 days
2016-07-08 18:13:23 +00:00
Navdeep Parhar
671bf2b8b2 cxgbe(4): Changes to the CPL-handler registration mechanism and code
related to "shared" CPLs.

a) Combine t4_set_tcb_field and t4_set_tcb_field_rpl into a single
function.  Allow callers to direct the response to any iq.  Tidy up
set_ulp_mode_iscsi while there to use names from t4_tcb.h instead of
magic constants.

b) Remove all CPL handler tables from struct adapter.  This reduces its
size by around 2KB.  All handlers are now registered at MOD_LOAD instead
of attach or some kind of initialization/activation.  The registration
functions do not need an adapter parameter any more.

c) Add per-iq handlers to deal with CPLs whose destination cannot be
determined solely from the opcode.  There are 2 such CPLs in use right
now: SET_TCB_RPL and L2T_WRITE_RPL.  The base driver continues to send
filter and L2T_WRITEs over the mgmtq and solicits the reply on fwq.
t4_tom (including the DDP code) now uses the port's ctrlq to send
L2T_WRITEs and SET_TCB_FIELDs and solicits the reply on an ofld_rxq.
fwq and ofld_rxq have different handlers that know what kind of tid to
expect in the reply.  Update t4_write_l2e and callers to to support any
wrq/iq combination.

Approved by:	re@ (kib@)
Sponsored by:	Chelsio Communications
2016-07-05 01:29:24 +00:00