Commit Graph

514 Commits

Author SHA1 Message Date
Navdeep Parhar
dd991bd5a1 cxgbe(4): Update T5 and T4 firmwares to 1.15.28.0.
These firmwares were obtained from the beta "Chelsio T5/T4 Unified Wire
v2.12.0.2 for Linux" release.  Changes since last release are listed in the
"Release Notes" accompanying the beta release and are copy-pasted here as well.

The plan is to have only GA'd firmwares in any -STABLE FreeBSD branch so I'll
MFC this (after 2 months) only if it ends up in a GA release.

================================================================================
================================================================================

22.1. T5 Firmware
+++++++++++++++++++++++++++++++++

Version : 1.15.28.0
Date    : 02/29/2016
================================================================================

FIXES
-----

BASE:
 - Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where the default ingress
   queue was ignored.
 - Fixed an issue where adapter failed to load fw by adjusting DRAM frequency.
 - Fixed an issue in watchdog which was causing VM bring-up failure after
   reboot.
 - Fixed 40G link failures with some switches when auto-negotiation enabled.
 - Fixed to improve on link bring-up time.
 - Per port buffer groups size doubled to improve performance.
 - Fixed an issue where bogus d3hot bits were set causing traffic stall.
 - Fixed an issue where sometimes adapter was not seen after reboot.
 - Fixed an issue where iWARP was crashing in conjunction with traffic
   management.
 - Fixed an issue where link failed to come up after removing twinax cable and
   inserting optical module.

OFLD
 - Fixed a potential iSCSI data corruption issue by disabling RxFragEn flag.

FOiSCSI
 - Fixed an issue in recovery path where connection was getting closed before
   recovery processing was done.
 - Fixed an issue in TCP port reuse.
 - Fixed an issue in recovery path when large number (>64) of iSCSI connections
   were in use.
 - Returned ENETUNREACH if IP was not been provisioned yet and driver tried to
   use given inerface.

ENHANCEMENTS
------------

BASE:
 - Added new interface to program DCA settings in SGE contexts; allow 32-byte
   IQE size
 - Added PTP interface fw_ptp_ts to support PTP Frequeny and Offset adjustment.
 - Added MPS raw interface.

ETH:
 - New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.

OFLD:
 - WR opcode is returned to host in cqe error response.

================================================================================
================================================================================

22.2. T4 Firmware
+++++++++++++++++

Version : 1.15.28.0
Date    : 02/29/2016
================================================================================

FIXES
-----

BASE:
 - Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where default ingress queue
   was ignored.
 - Fixed an issue in watchdog which was causing VM bring-up failure after
   reboot.
 - Per port buffer groups size doubled to improve performance.
 - Fixed an issue where iWARP was crashing in conjunction with traffic
   management.

FOiSCSI:
 - Fixed an issue in recovery path where connection was getting closed before
   recovery processing was done.
 - Fixed an issue in TCP port reuse.
 - Fixed an issue in recovery path when large number (>64) of iSCSI connections
   were in use.
 - Returned ENETUNREACH if IP had not been provisioned yet and driver tried to
   use given inerface.

ENHANCEMENTS
------------

BASE:
 - Added MPS raw interface.

ETH:
 - New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.

================================================================================

Obtained from:	Chelsio Communications
MFC after:	2 months
Sponsored by:	Chelsio Communications
2016-03-01 02:36:50 +00:00
Navdeep Parhar
e8c6ba7265 cxgbe(4): Add a sysctl to retrieve the maximum speed/bandwidth supported by a
port.

dev.cxgbe.<n>.max_speed
dev.cxl.<n>.max_speed

Sponsored by:	Chelsio Communications
2016-02-25 01:10:56 +00:00
Navdeep Parhar
40bf7442fa cxgbe: catch up with the latest hardware-related definitions.
Obtained from:	Chelsio Communications
Sponsored by:	Chelsio Communications
2016-02-19 00:29:16 +00:00
Navdeep Parhar
748d440809 Remove duplicate definition (CPL_TRACE_PKT_T5). 2016-02-12 20:14:03 +00:00
Gleb Smirnoff
b4b12e52fb Garbage collect unused arguments of m_init(). 2016-02-10 18:54:18 +00:00
Gleb Smirnoff
f353ae1c62 More fixes to the build. 2016-01-27 05:15:53 +00:00
Gleb Smirnoff
57a78e3bae Augment struct tcpstat with tcps_states[], which is used for book-keeping
the amount of TCP connections by state.  Provides a cheap way to get
connection count without traversing the whole pcb list.

Sponsored by:	Netflix
2016-01-27 00:45:46 +00:00
Navdeep Parhar
097f289f25 Fix for iWARP servers that listen on INADDR_ANY.
The iWARP Connection Manager (CM) on FreeBSD creates a TCP socket to
represent an iWARP endpoint when the connection is over TCP. For
servers the current approach is to invoke create_listen callback for
each iWARP RNIC registered with the CM. This doesn't work too well for
INADDR_ANY because a listen on any TCP socket already notifies all
hardware TOEs/RNICs of the new listener. This patch fixes the server
side of things for FreeBSD. We've tried to keep all these modifications
in the iWARP/TCP specific parts of the OFED infrastructure as much as
possible.

Submitted by:	Krishnamraju Eraparaju @ Chelsio (with design inputs from Steve Wise)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D4801
2016-01-22 23:33:34 +00:00
Navdeep Parhar
a34ec16adb iw_cxgbe: fix a couple of problems int the RDMA_TERMINATE handler.
a) Look for the CPL in the payload buffer instead of the descriptor.
b) Retrieve the socket associated with the tid with the inpcb lock held.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
2016-01-21 00:42:48 +00:00
Hans Petter Selasky
e936121d31 Add optimizing LRO wrapper:
- Add optimizing LRO wrapper which pre-sorts all incoming packets
  according to the hash type and flowid. This prevents exhaustion of
  the LRO entries due to too many connections at the same time.
  Testing using a larger number of higher bandwidth TCP connections
  showed that the incoming ACK packet aggregation rate increased from
  ~1.3:1 to almost 3:1. Another test showed that for a number of TCP
  connections greater than 16 per hardware receive ring, where 8 TCP
  connections was the LRO active entry limit, there was a significant
  improvement in throughput due to being able to fully aggregate more
  than 8 TCP stream. For very few very high bandwidth TCP streams, the
  optimizing LRO wrapper will add CPU usage instead of reducing CPU
  usage. This is expected. Network drivers which want to use the
  optimizing LRO wrapper needs to call "tcp_lro_queue_mbuf()" instead
  of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of
  "tcp_lro_flush()". Further the LRO control structure must be
  initialized using "tcp_lro_init_args()" passing a non-zero number
  into the "lro_mbufs" argument.

- Make LRO statistics 64-bit. Previously 32-bit integers were used for
  statistics which can be prone to wrap-around. Fix this while at it
  and update all SYSCTL's which expose LRO statistics.

- Ensure all data is freed when destroying a LRO control structures,
  especially leftover LRO entries.

- Reduce number of memory allocations needed when setting up a LRO
  control structure by precomputing the total amount of memory needed.

- Add own memory allocation counter for LRO.

- Bump the FreeBSD version to force recompilation of all KLDs due to
  change of the LRO control structure size.

Sponsored by:	Mellanox Technologies
Reviewed by:	gallatin, sbruno, rrs, gnn, transport
Tested by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D4914
2016-01-19 15:33:28 +00:00
Navdeep Parhar
5725f0e490 cxgbe: bind the ithreads that handle NIC rx to the correct CPU if the kernel
is built with option RSS.
2016-01-11 17:52:42 +00:00
Alexander V. Chernikov
8a9f7532b0 Convert cxgb/cxgbe to the new routing API.
Discussed with:		np
2016-01-07 08:07:17 +00:00
Gleb Smirnoff
0c39d38d21 Historically we have two fields in tcpcb to describe sender MSS: t_maxopd,
and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned
up after T/TCP removal. After all permutations over the years the result is
that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum
protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if
timestamps are in action, or is equal to t_maxopd otherwise. That's a very
rough estimate of MSS reduced by options length. Throughout the code it
was used in places, where preciseness was not important, like cwnd or
ssthresh calculations.

With this change:

- t_maxopd goes away.
- t_maxseg now stores MSS not adjusted by options.
- new function tcp_maxseg() is provided, that calculates MSS reduced by
  options length. The functions gives a better estimate, since it takes
  into account SACK state as well.

Reviewed by:	jtl
Differential Revision:	https://reviews.freebsd.org/D3593
2016-01-07 00:14:42 +00:00
Navdeep Parhar
9f6b62e791 iw_cxgbe: Shut down the socket but do not close the fd in case of error.
The fd is closed later in this case.  This fixes a "SS_NOFDREF on enter"
panic.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
Reviewed by:	Steve Wise @ Open Grid Computing
2016-01-05 01:32:40 +00:00
Alexander V. Chernikov
4fb3a8208c Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating
  various link headers for given interface. Right now there is support
  for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
  Other types are planned to support more complex calculation
  (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
  to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
  (like L2 header for route w/gateway) down to the stack eliminating the
  need for other lookups. It also brings us closer to more complex scenarios
  like transparently handling MPLS nexthops and tunnel interfaces.
  Last, but not least, it removes layering violation introduced by flowtable
  code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
  record. Interface link address change are handled by re-calculating
  headers for all lles based on if_lladdr event. After these changes,
  arpresolve()/nd6_resolve() returns full pre-calculated header for
  supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
  returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
  compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
  difference is that interface source mac has to be filled by OS for
  AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
  BPF and not pollute if_output() routines. Convert BPF to pass prepend data
  via new 'struct route' mechanism. Note that it does not change
  non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
  It is not needed for ethernet anymore. The only remaining FDDI user is
  dev/pdq mostly untouched since 2007. FDDI support was eliminated from
  OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing)
  rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
  header data from that lle.

Differential Revision:	https://reviews.freebsd.org/D4102
2015-12-31 05:03:27 +00:00
Navdeep Parhar
e3148e46b2 cxgbei: Hardware accelerated iSCSI target and initiator for TOE capable
cards supported by cxgbe(4).

On the host side this driver interfaces with the storage stack via the
ICL (iSCSI Common Layer) in the kernel.  On the wire the traffic is
standard iSCSI (SCSI over TCP as per RFC 3720/7143 etc.) that
interoperates with all other standards compliant implementations.  The
driver is layered on top of the TOE driver (t4_tom) and promotes
connections being handled by t4_tom to iSCSI ULP (Upper Layer Protocol)
mode.  Hardware assistance in this mode includes:

- Full TCP processing.
- iSCSI PDU identification and recovery within the TCP stream.
- Header and/or data digest insertion (tx) and verification (rx).
- Zero copy (both tx and rx).

Man page will follow in a separate commit in a couple of weeks.

Relnotes:	Yes
Sponsored by:	Chelsio Communications
2015-12-26 06:05:21 +00:00
Navdeep Parhar
9eb533d3b4 cxgbe(4): Updates to the base NIC driver and t4_tom to support the iSCSI
offload driver.  These changes come from projects/cxl_iscsi.
2015-12-26 00:26:02 +00:00
Navdeep Parhar
a89012b294 Fix RSS build.
Reported by:	arybchik@
2015-12-05 10:10:18 +00:00
Konstantin Belousov
f28b929a9e Fix build for !TCP_OFFLOAD case. 2015-12-03 10:33:57 +00:00
John Baldwin
fe2ebb7644 Add support for configuring additional virtual interfaces (VIs) on a port.
Each virtual interface has its own MAC address, queues, and statistics.
The dedicated netmap interfaces (ncxgbeX / ncxlX) were already implemented
as additional VIs on each port.  This change allows additional non-netmap
interfaces to be configured on each port.  Additional virtual interfaces
use the naming scheme vcxgbeX or vcxlX.

Additional VIs are enabled by setting the hw.cxgbe.num_vis tunable to a
value greater than 1 before loading the cxgbe(4) or cxl(4) driver.
NB: The first VI on each port is the "main" interface (cxgbeX or cxlX).

T4/T5 NICs provide a limited number of MAC addresses for each physical port.
As a result, a maximum of six VIs can be configured on each port (including
the "main" interface and the netmap interface when netmap is enabled).

One user-visible result is that when netmap is enabled, packets received
or transmitted via the netmap interface are no longer counted in the stats
for the "main" interface, but are not accounted to the netmap interface.

The netmap interfaces now also have a new-bus device and export various
information sysctl nodes via dev.n(cxgbe|cxl).X.

The cxgbetool 'clearstats' command clears the stats for all VIs on the
specified port along with the port's stats.  There is currently no way to
clear the stats of an individual VI.

Reviewed by:	np
MFC after:	1 month
Sponsored by:	Chelsio
2015-12-03 00:02:01 +00:00
Navdeep Parhar
00cc2faabb cxgbe/t4_tom: add a knob to the default configuration file to tune
the TOE for LAN operation.  It is possible to set this to other values
(cluster for networks with little loss and really tight RTTs, and wan
for relatively large RTTs and/or lossy networks) depending on the
environment in which the TOE is being used.

None of this affects plain NIC operation in any way.

MFC after:	1 week
2015-11-10 02:29:19 +00:00
John Baldwin
02da5bb12a Chelsio T5 chips do not properly echo the No Snoop and Relaxed Ordering
attributes when replying to a TLP from a Root Port.  As a workaround,
disable No Snoop and Relaxed Ordering in the Root Port of each T5 adapter
during attach so that CPU-initiated requests do not contain these flags.

Note that this affects CPU-initiated requests to all devices under this
root port.

Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio
2015-11-05 21:33:15 +00:00
Navdeep Parhar
baa7d0bf9d cxgbe/tom: decide whether to shove segments or not only if there is
payload to transmit.

MFC after:	1 week
2015-10-30 01:18:07 +00:00
Hans Petter Selasky
2da3897d01 Rename linuxapi[.ko] into linuxkpi[.ko], to reflect that it is a
kernel programming interface module, KPI, to avoid confusion with the
existing Linux userspace binary compatibility shims. Bump the
FreeBSD_version number.

Reviewed by:	np @
Suggested by:	dumbbell @
Sponsored by:	Mellanox Technologies
2015-10-22 09:50:45 +00:00
Hans Petter Selasky
f556cede8a Merge LinuxKPI changes from DragonflyBSD:
- Define the kref structure identical to the one found in Linux.
- Update clients referring inside the kref structure.
- Implement kref_sub() for FreeBSD.

Reviewed by:	np @
Sponsored by:	Mellanox Technologies
2015-10-19 12:26:38 +00:00
Navdeep Parhar
07d684f712 cxgbe(4): support for the kernel RSS option.
You need PCBGROUP and RSS in the kernel config to use this.

Relnotes:	Yes
Sponsored by:	Chelsio Communications
2015-10-16 01:19:55 +00:00
Navdeep Parhar
66eb134d76 iw_cxgbe: use correct RFC number. 2015-10-14 23:29:19 +00:00
Navdeep Parhar
ca7c0f2e2d iw_cxgbe: MPA v2 is always available.
Submitted by:	Krishnamraju Eraparaju at chelsio dot com
Reviewed by:	Steve Wise at opengridcomputing dot com
2015-10-13 01:04:38 +00:00
Navdeep Parhar
0d18010dec iw_cxgbe: fix for page fault in cm_close_handler().
This is roughly the iw_cxgbe equivalent of
be13b2dff8
-----------------
RDMA/cxgb4: Connect_request_upcall fixes

When processing an MPA Start Request, if the listening endpoint is
DEAD, then abort the connection.

If the IWCM returns an error, then we must abort the connection and
release resources.  Also abort_connection() should not post a CLOSE
event, so clean that up too.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-----------------

Submitted by:	Krishnamraju Eraparaju at chelsio dot com.
2015-10-10 01:41:07 +00:00
John Baldwin
8fb15ddb00 Add a comment that to clarify how to determine the amount of received DDP
data.

Reviewed by:	np
Differential Revision:	https://reviews.freebsd.org/D3619
2015-09-10 21:41:11 +00:00
Navdeep Parhar
8faf57012b cxgbe(4): Save the flags for the last adapter-wide synchronized
operation that was initiated successfully.  (The caller and thread are
already recorded).

MFC after:	1 week
2015-08-19 15:40:03 +00:00
Navdeep Parhar
d94b89b915 cxgbe(4): Update T5 and T4 firmwares bundled with the driver to 1.14.4.0. The
changes in the firmwares since 1.11.27.0 are listed here (straight copy-paste
from the "Release Notes.txt" accompanying the Chelsio Unified Wire 2.11.1.0
release on the website).

22.1. T5 Firmware
+++++++++++++++++++++++++++++++++

Version : 1.14.4.0
Date    : 08/05/2015
================================================================================

FIXES
-----

BASE:
- Fixes a potential data path hang by properly programming PMTX congestion
  threshold settings.
- Fixes a potential initialization error when accessing a configuration file
  stored on the flash.
- Fixes a regression where SGE resources can be miss-sized if iWARP is disabled.

ETH:
- Fixes a timing issue that would prevent CR4 links from coming up with some
  switches.

FOFCoE:
- Defers fcoe linkdown mailbox command handling till LOGO is sent.
- Updates vlan prio for all outstanding IOs during dcbx update.

ENHANCEMENTS
------------

BASE:
- Adds support for PAUSE OFF watchdog.
- Reports devlog access information in PCIE_FW_PF register 7.

ETH:
- Enhances segmentation offload to include VxLAN and Geneve.
- Adds PTP support.
- Adds new interface to allow the driver to query the VI rss table base
  addresses.
- Allows the driver to program the SGE ingrext contxt CongDrop field.

OFLD:
- Adds new interface for the driver to specify offloaded connections TCP snd
  and rcv scale factors.

iSCSI:
- Adds support for iscsi segmentatation offload (ISO).
- Adds support for iscsi t10-dif offload.

FOiSCSI:
- Sets FORCE_BIT for cut through processing for FOiSCSI.

FOFCoE:
- Adds support for FCoE BB6.
- Improves WRITE performance.

================================================================================
================================================================================

Version : 1.13.32.0
Date    : 03/25/2015
================================================================================

FIXES
-----

BASE:
- Fixes FW_CAPS_CONFIG_CMD return value on error (was positive instead of
   negative)
- Fixes FW_PARAMS_PARAM_DEV_FLOWC_BUFFIFO_SZ indication (was wrong on certain
   adapter configurations)
- Fixes config file based PL_TIMEOUT register programming

ETH:
- Fixes a potential EO UDP SEG header corruption
- Fixes an issue where 1000Base-X was not enabled correctly when using QSA
   modules

OFLD:
- Fixes timeout issue with half-open connections
- Fixes FW_FLOWC_WR processing when state is set to finwait1

FOFCoE:
- Fixes fcoe xchg leaks in linkdown/peer down path
- Fixes cleanup in FCoE linkdown and fixed buf timer flowid abuse
- Fixes fw crash by clearing fcf flowc during bye

FOiSCSI:
- Don't create a new tcp socket if ERL0 attempt has timed out.

ENHANCEMENTS
------------

BASE:
- Adds support for VFs on PFs 4 to 7
- Adds support for QPs/CQs on any physical and virtual function

ETH:
- Stops sending LACP frames on loopback interface
- Adds an AUTOEQU indication to CPL_SGE_EGR_UPDATE
- Adds support for CR4 links (BEAN/AEC on 40G TwinAx cables)

OFLD:
- Improves default settings of LAN and CLUSTER TCP timer settings
- Sends Negative Advice CPLs to software

FOISCSI:
- Adds IPv6 support for foiscsi. Keeps backward compatibility with
   old foiscsi drivers which doesn't support ipv6.

FOFCoE:
- Added fcoe debug support in flowc dump

================================================================================
================================================================================

Version : 1.12.25.0
Date    : 10/22/2014
================================================================================

FIXES
-----

BASE:
- Improves precision of the Weight Round Robing Traffic Management Algorithm
- Fixes an issue where the link would intermittently fail to come up
- Fixes an issue where adapters with an external PHY couldn't run at 100Mbps
- Fixes an issue where active optical cables were not recognized
- Fixes link advertising issues on T520-BT (speed and pause frames) that would
  cause the link to negotiate unexpected settings
- Forces link restart when auto-negotiation is disabled
- Fix an issue where pause frames wouldn't be fully disabled even if requested

ETH:
- Fixes NVGRE Segmentation Offload network header generation.

DCBX:
- Fixes an issue where some settings were not being sent to the switch
  correctly
- Fixes an issue where back-to-back DCBX port updates could get overwritten by
  FW
- Fixes a firmware crash on DCBX APP information request before link up

FOiSCSI:
- Fixes abort task leak in tmf response handling
- Fixes TCP RST handling while in iSCSI ERL0
- Fixes a firmware crash on BYE without INIT

ENHANCEMENTS
-------------

BASE:
- Adds link partner settings reporting when available
- Adds QSA support (in conjunction with QSA VPD)
- Adds T520-BT LED support
- Reports NOTSUPPORTED for modules with an unhandled identifier

DCBX:
- Adds version reporting (indicating which version FW is trying to negotiate)
- Adds IEEE support
- Reports LLDP time outs

FOiSCSI:
- Add support for multiple iSCSI DDP client
- Sends DHCP renew request when lease expires

================================================================================

22.2. T4 Firmware
+++++++++++++++++

Version : 1.14.4.0
Date    : 08/05/2015
================================================================================

FIXES
-----

BASE:
- Fixes a potential initialization error when accessing a configuration file
  stored on the flash.
- Initialize PCIE_DBG_INDIR_REQ.Enable to 0, as hardware failed to do so and
  register dumps could result in errors.

ETH:
- Fixes an issue that sometimes prevented the link from coming up in CR adapters.

ENHANCEMENTS
------------

BASE:
- Adds support for PAUSE OFF watchdog.
- Reports devlog access information in PCIE_FW_PF register 7.

ETH:
- Adds new interface to allow the driver to query the VI rss table base
  addresses.

OFLD:
- Adds new interface for the driver to specify offloaded connections TCP snd
  and rcv scale factors.

================================================================================
================================================================================

Version : 1.13.32.0
Date    : 03/25/2015
================================================================================

FIXES
-----

BASE:
- Fixes FW_CAPS_CONFIG_CMD return value on error (was positive instead of
    negative)
- Fixes FW_PARAMS_PARAM_DEV_FLOWC_BUFFIFO_SZ indication (was wrong on certain
    adapter configurations)
- Fixes config file based PL_TIMEOUT register programming

ETH:
- Fixes a potential EO UDP SEG header corruption

OFLD:
- Fixes timeout issue with half-open connections
- Fixes FW_FLOWC_WR processing when state is set to finwait1

FOiSCSI:
- Don't create a new tcp socket if ERL0 attempt has timed out.

ENHANCEMENTS
------------

ETH:
- Stops sending LACP frames on loopback interface
- Adds an AUTOEQU indication to CPL_SGE_EGR_UPDATE

OFLD:
- Improves default settings of LAN and CLUSTER TCP timer settings
- Sends Negative Advice CPLs to software

================================================================================
================================================================================

Version : 1.12.25.0
Date    : 10/22/2014
================================================================================

FIXES
-----

BASE:
- Improves precision of the Weight Round Robing Traffic Management Algorithm
- Forces link restart when auto-negotiation is disabled
- Fix an issue where pause frames wouldn't be fully disabled even if requested

DCBX:
- Fixes an issue where some settings were not being sent to the switch
  correctly
- Fixes an issue where back-to-back DCBX port updates could get overwritten by
  FW
- Fixes a firmware crash on DCBX APP information request before link up

FOiSCSI:
- Fixes abort task leak in tmf response handling
- Fixes TCP RST handling while in iSCSI ERL0
- Fixes a firmware crash on BYE without INIT

ENHANCEMENTS
------------

BASE:
- Adds link partner settings reporting when available
- Firmware now reports NOTSUPPORTED for modules with an unhandled identifier

DCBX:
- Adds version reporting (indicating which version FW is trying to negotiate)
- Adds IEEE support
- Reports LLDP time outs

FOiSCSI:
- Adds support for multiple iSCSI DDP clients
- Sends DHCP renew request when lease expires

================================================================================

Obtained from:	Chelsio Communications
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2015-08-05 19:45:11 +00:00
Julien Charbon
ff9b006d61 Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:
- The existing TCP INP_INFO lock continues to protect the global inpcb list
  stability during full list traversal (e.g. tcp_pcblist()).

- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
  and free) and inpcb global counters.

It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.

PR:			183659
Differential Revision:	https://reviews.freebsd.org/D2599
Reviewed by:		jhb, adrian
Tested by:		adrian, nitroboost-gmail.com
Sponsored by:		Verisign, Inc.
2015-08-03 12:13:54 +00:00
Navdeep Parhar
3d3169c858 cxgbe(4): initialize debug_flags from the kernel environment.
MFC after:	3 days
2015-07-31 04:50:47 +00:00
Andrey V. Elsukov
cc0a3c8ca4 Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by:	gnn (previous version)
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3149
2015-07-29 08:12:05 +00:00
Navdeep Parhar
a1ed88571f cxgbe(4): Ask the firmware for the start of the RSS slice for a port and
save it for later.  This enables direct manipulation of the indirection
tables (although the stock driver doesn't do that right now).

MFC after:	1 month
2015-07-17 06:46:18 +00:00
Navdeep Parhar
c7dbd80213 cxgbe(4): Update T4 and T5 firmwares to 1.14.2.0.
Obtained from:	Chelsio Communications
MFC after:	3 days
2015-07-14 08:02:05 +00:00
Luigi Rizzo
847bf38369 Sync netmap sources with the version in our private tree.
This commit contains large contributions from Giuseppe Lettieri and
Stefano Garzarella, is partly supported by grants from Verisign and Cisco,
and brings in the following:

- fix zerocopy monitor ports and introduce copying monitor ports
  (the latter are lower performance but give access to all traffic
  in parallel with the application)

- exclusive open mode, useful to implement solutions that recover
  from crashes of the main netmap client (suggested by Patrick Kelsey)

- revised memory allocator in preparation for the 'passthrough mode'
  (ptnetmap) recently presented at bsdcan. ptnetmap is described in
        S. Garzarella, G. Lettieri, L. Rizzo;
        Virtual device passthrough for high speed VM networking,
        ACM/IEEE ANCS 2015, Oakland (CA) May 2015
        http://info.iet.unipi.it/~luigi/research.html

- fix rx CRC handing on ixl

- add module dependencies for netmap when building drivers as modules

- minor simplifications to device-specific routines (*txsync, *rxsync)

- general code cleanup (remove unused variables, introduce macros
  to access rings and remove duplicate code,

Applications do not need to be recompiled, unless of course
they want to use the new features (monitors and exclusive open).

Those willing to try this code on stable/10 can just update the
sys/dev/netmap/*, sys/net/netmap* with the version in HEAD
and apply the small patches to individual device drivers.

MFC after:	1 month
Sponsored by:	(partly) Verisign, Cisco
2015-07-10 05:51:36 +00:00
Navdeep Parhar
9af71ab3bc cxgbe(4): Add a new knob that controls the congestion response of netmap
rx queues.  The default is to drop rather than backpressure.

This decouples the congestion settings of NIC and netmap rx queues.

MFC after:	3 days
2015-07-06 20:56:59 +00:00
Navdeep Parhar
41f7622b64 cxgbe(4): Do not override the the global defaults for congestion drops.
The hw.cxgbe.cong_drop knob is not affected by this change because the
driver sets up congestion drop on a per-queue basis.

MFC after:	3 days
2015-07-06 20:28:42 +00:00
Navdeep Parhar
fd215e45eb cxgbe(4): request an automatic tx update when a netmap tx queue idles.
The NIC tx queues already do this.

MFC after:	1 week
Differential Revision:
2015-07-01 00:34:14 +00:00
Navdeep Parhar
dbbf46c40c cxgbe: get_fl_payload returns a header mbuf when successful.
MFC after:	3 days
2015-06-23 05:55:13 +00:00
Navdeep Parhar
0e4cd4a2e0 cxgbe(4): Add the ability to dump mailbox commands and replies. It is
enabled/disabled via bit 0 of adapter->debug_flags (which is available
at dev.t5nex.<n>.debug_flags).

MFC after:	1 week
2015-06-16 12:36:29 +00:00
Navdeep Parhar
a40200ca80 cxgbe: set the minimum burst size when fetching fl buffers to 128B for
netmap rx queues too.  This should have gone in as part of r283858.
2015-06-05 00:37:46 +00:00
Navdeep Parhar
a89409b817 cxgbe: no need to display the per-lane GT/s rating of the pcie link.
MFC after:	1 week
2015-06-01 03:24:39 +00:00
Navdeep Parhar
6af2071b47 cxgbe: set minimum burst size when fetching freelist buffers to 128B.
MFC after:	3 days
2015-06-01 00:55:15 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Gleb Smirnoff
a5e0fa4063 Don't use ifm_data. It was used only for self checking debug.
Reviewed by:	np
2015-04-26 21:31:30 +00:00
Gleb Smirnoff
28ebe80cab Provide functions to determine presence of a given address
configured on a given interface.

Discussed with:	np
Sponsored by:	Nginx, Inc.
2015-04-17 11:57:06 +00:00
Navdeep Parhar
7ef00d7884 cxgbe/tom: return rx credits promptly if the socket buffer's low water
mark cannot be reached because the window advertised to the peer isn't
wide enough.  While here, tweak the normal credit return too.

MFC after:	1 month
2015-03-31 01:22:20 +00:00
Navdeep Parhar
70ca622987 cxgbe(4): provide the exact RSS hash type instead of a catch-all value
to the upper layers.
2015-03-26 18:45:51 +00:00
Navdeep Parhar
bb481a8edd cxgbe(4): Do not call sbuf_trim on an sbuf with a drain function.
MFC after:	1 week
2015-03-23 23:06:32 +00:00
John Baldwin
b12c0a9eeb Move special DDP handling for closing a connection into a new
handle_ddp_close() function in t4_ddp.c as the logic is similar
to handle_ddp_data().  This allows all knowledge of the special
DDP mbufs to be private to t4_ddp.c as well.
2015-03-16 15:56:06 +00:00
Ian Lepore
1eafc07856 Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl
strings returned to userland include the nulterm byte.

Some uses of sbuf_new_for_sysctl() write binary data rather than strings;
clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in
those cases.  (Note that the sbuf code still automatically adds a nulterm
byte in sbuf_finish(), but since it's not included in the length it won't
get copied to userland along with the binary data.)

Remove explicit adding of a nulterm byte in a couple places now that it
gets done automatically by the sbuf drain code.

PR:		195668
2015-03-14 17:08:28 +00:00
Ian Lepore
b36424bd4b Revert r279934, r279938; this is going to be fixed in sbuf instead.
PR:		195668
2015-03-14 13:04:39 +00:00
Navdeep Parhar
070d262707 cxgbe(4): fix if_media handling for T520-BT cards. 1Gbps and 100Mbps
are valid for this card.

MFC after:	1 week
2015-03-14 00:02:53 +00:00
Ian Lepore
2f01da7886 Fix a paste-o, sb is already a pointer in this one. 2015-03-12 23:31:29 +00:00
Ian Lepore
9bc58e3d33 Nullterminate strings returned via sysctl.
PR:		195668
2015-03-12 18:22:20 +00:00
John Baldwin
69a088631e Resize receive socket buffers that support autosizing when receiving
TCP data via direct data placement.

Sponsored by:	Chelsio
MFC after:	1 week
2015-03-11 17:35:07 +00:00
Navdeep Parhar
08aeb15136 cxgbe(4): experimental rx packet sink for netmap queues. This is not
intended for general use.

MFC after:	1 month
2015-03-06 20:41:28 +00:00
Navdeep Parhar
1cdfce07df cxgbe(4): knobs to experiment with the interrupt coalescing timer for
netmap rx queues, and the "batchiness" of rx updates sent to the chip.

These knobs will probably become per-rxq in the near future and will be
documented only after their final form is decided.

MFC after:	1 month
2015-03-06 20:39:19 +00:00
Navdeep Parhar
7f93d696bf cxgbe(4): provide the correct size of freelists associated with netmap
rx queues to the chip.  This will fix many problems with native netmap
rx on ncxl/ncxgbe interfaces.

MFC after:	1 week
2015-03-06 16:05:20 +00:00
Navdeep Parhar
1c468ed975 cxgbe(4): allow tx hardware checksumming on the netmap interface.
It is disabled by default but users can set IFCAP_TXCSUM on the
netmap ifnet (ifconfig ncxl0 txcsum) to override netmap and force
the hardware to calculate and insert proper IP and L4 checksums in
outbound frames.

MFC after:	2 weeks
2015-02-24 21:31:13 +00:00
Navdeep Parhar
1605bac6fb cxgbe(4): set up congestion management for netmap rx queues.
The hw.cxgbe.cong_drop knob controls the response of the chip when
netmap queues are congested.
2015-02-24 18:40:10 +00:00
Navdeep Parhar
b91cbdda76 cxgbe(4): do not set the netmap rxq interrupts on a hair-trigger.
MFC after:	2 weeks
2015-02-24 18:32:17 +00:00
Navdeep Parhar
b46fad7b6a cxgbe(4): wait for the hardware to catch up before destroying a netmap txq.
MFC after:	2 weeks
2015-02-24 18:22:24 +00:00
Navdeep Parhar
f8e30f9cf1 cxgbe(4): request an automatic tx update when a netmap txq idles.
MFC after:	2 weeks
2015-02-24 18:19:25 +00:00
Navdeep Parhar
c5bb375553 cxgbe(4): there is no need to force an "unimplemented" panic needlessly.
The calls to free_nm_txq and free_nm_rxq are made just a few lines prior
to the panic.
2015-02-20 22:57:54 +00:00
Hans Petter Selasky
b5c1e0cb8d Update the infiniband stack to Mellanox's OFED version 2.1.
Highlights:
 - Multiple verbs API updates
 - Support for RoCE, RDMA over ethernet

All hardware drivers depending on the common infiniband stack has been
updated aswell.

Discussed with:	np @
Sponsored by:	Mellanox Technologies
MFC after:	1 month
2015-02-17 08:40:27 +00:00
Navdeep Parhar
d5d9fbbae2 cxgbe(4): allow the SET_FILTER_MODE ioctl to change the mode when it's
safe to do so.

MFC after:	1 month
2015-02-10 01:16:43 +00:00
Navdeep Parhar
b3d44a6800 cxgbe(4): tidy up some of the interaction between the Upper Layer
Drivers (ULDs) and the base if_cxgbe driver.

Track the per-adapter activation of ULDs in a new "active_ulds" field.
This was done pretty arbitrarily before this change -- via TOM_INIT_DONE
in adapter->flags for TOM, and the (1 << MAX_NPORTS) bit in
adapter->offload_map for iWARP.

iWARP and hw-accelerated iSCSI rely on the TOE (supported by the TOM
ULD).  The rules are:
a) If the iWARP and/or iSCSI ULDs are available when TOE is enabled then
   iWARP and/or iSCSI are enabled too.
b) When the iWARP and iSCSI modules are loaded they go looking for
   adapters with TOE enabled and enable themselves on that adapter.
c) You cannot deactivate or unload the TOM module from underneath iWARP
   or iSCSI.  Any such attempt will fail with EBUSY.

MFC after:	2 weeks
2015-02-08 09:28:55 +00:00
Navdeep Parhar
319f290030 cxgbe(4): adapter_full_init is always a synchronized operation.
MFC after:	1 week
2015-02-08 08:52:18 +00:00
Navdeep Parhar
d86a5ff917 cxgbe(4): a change to the synchronization rules within the the driver.
This is purely cosmetic because the new rules are already followed.

MFC after:	1 week
2015-02-08 08:42:45 +00:00
Navdeep Parhar
cb6c101a86 cxgbe(4): fix a test made while enabling TOE.
MFC after:	1 week
2015-02-07 01:50:32 +00:00
Navdeep Parhar
85dc477798 cxgbe(4): Add a minimal if_cxl module that pulls in the real driver as
a dependency.  This ensures "ifconfig cxl<n> ..." does the right thing
even when it's run with no driver loaded.

if_cxl.ko is the tiniest module in /boot/kernel.

MFC after:	2 weeks
2015-02-06 01:10:04 +00:00
Navdeep Parhar
a2355dd909 cxgbe(4): reserve id for iSCSI upper layer driver. 2015-02-05 08:52:20 +00:00
John Baldwin
86f05ea6cf Lock the socket buffer before jumping to the 'out' label if sblock()
fails in t4_soreceive_ddp().
2015-01-26 16:32:41 +00:00
John Baldwin
de5a10ecbc - Update a disabled KASSERT() to use sbused() instead of accessing
the no-longer existant sb_cc sockbuf member.
- Use sbavail() instead of sbused() in t4_soreceive_ddp() to match the
  usage in soreceive_stream() on which it is based.

Discussed with:	glebius (2)
2015-01-26 16:29:14 +00:00
John Baldwin
4f621933a5 Fix a couple of panics when detaching from a cxgbe/cxl interface that was
never brought up:
- Allow NULL to be passed to sglist_free().
- Don't try to stop an interface that was never fully initialized.

Reviewed by:	np
2015-01-26 16:26:28 +00:00
Hans Petter Selasky
d39d7c8636 Add missing linuxapi module dependencies and always use the FreeBSD
"MODULE_VERSION" macro definition. Remove the redefinition of the
"MODULE_VERSION" macro from the Linux kernel compatibility API.

MFC after:	1 month
Reported by:	np@
Sponsored by:	Mellanox Technologies
2015-01-19 21:53:00 +00:00
Navdeep Parhar
88d7f6bddf Allow cxgbe(4) to be built on i386. Driver attach will succeed only on a subset
of i386 systems.
2015-01-16 01:32:40 +00:00
Navdeep Parhar
e503548810 cxgbe/iw_cxgbe: fix whitespace nit in r277102.
Reported by:	stefanf@
2015-01-13 16:18:31 +00:00
Navdeep Parhar
b3e112f962 cxgbe/iw_cxgbe: allow any size during the initial MPA exchange.
MFC after:	1 month
2015-01-13 01:40:12 +00:00
Navdeep Parhar
db8bcd1b21 cxgbe/tom: allocate page pod addresses instead of ppod#.
MFC after:	2 weeks
2015-01-07 06:20:33 +00:00
Navdeep Parhar
f8c479085f cxgbe/tom: use vmem(9) as the DDP page pod allocator.
MFC after:	1 month
2015-01-06 01:30:32 +00:00
Navdeep Parhar
008015d2f4 cxgbe(4): fix the description of a strange bunch of counters.
MFC after:	1 week
2015-01-05 23:43:24 +00:00
Navdeep Parhar
79b93bf6a3 cxgbe/tom: do not engage the TOE's payload chopper for payload < 2 MSS
or for 10Gbps ports.

MFC after:	2 weeks
2015-01-03 00:09:21 +00:00
Navdeep Parhar
402873f32a cxgbe/tom: fix the MSS calculation for IPv6 connections handled by the TOE.
MFC after:	1 week
2015-01-02 21:13:24 +00:00
Navdeep Parhar
dd1be4d418 cxgbe/tom: log some more details in send_flowc_wr.
MFC after:	1 week
2015-01-02 20:52:51 +00:00
Navdeep Parhar
255155fc91 cxgbe(4): remove buf_ring specific restriction on the txq size.
MFC after:	2 months
2015-01-01 09:33:46 +00:00
Navdeep Parhar
7951040f8a cxgbe(4): major tx rework.
a) Front load as much work as possible in if_transmit, before any driver
lock or software queue has to get involved.

b) Replace buf_ring with a brand new mp_ring (multiproducer ring).  This
is specifically for the tx multiqueue model where one of the if_transmit
producer threads becomes the consumer and other producers carry on as
usual.  mp_ring is implemented as standalone code and it should be
possible to use it in any driver with tx multiqueue.  It also has:
- the ability to enqueue/dequeue multiple items.  This might become
  significant if packet batching is ever implemented.
- an abdication mechanism to allow a thread to give up writing tx
  descriptors and have another if_transmit thread take over.  A thread
  that's writing tx descriptors can end up doing so for an unbounded
  time period if a) there are other if_transmit threads continuously
  feeding the sofware queue, and b) the chip keeps up with whatever the
  thread is throwing at it.
- accurate statistics about interesting events even when the stats come
  at the expense of additional branches/conditional code.

The NIC txq lock is uncontested on the fast path at this point.  I've
left it there for synchronization with the control events (interface
up/down, modload/unload).

c) Add support for "type 1" coalescing work request in the normal NIC tx
path.  This work request is optimized for frames with a single item in
the DMA gather list.  These are very common when forwarding packets.
Note that netmap tx in cxgbe already uses these "type 1" work requests.

d) Do not request automatic cidx updates every 32 descriptors.  Instead,
request updates via bits in individual work requests (still every 32
descriptors approximately).  Also, request an automatic final update
when the queue idles after activity.  This means NIC tx reclaim is still
performed lazily but it will catch up quickly as soon as the queue
idles.  This seems to be the best middle ground and I'll probably do
something similar for netmap tx as well.

e) Implement a faster tx path for WRQs (used by TOE tx and control
queues, _not_ by the normal NIC tx).  Allow work requests to be written
directly to the hardware descriptor ring if room is available.  I will
convert t4_tom and iw_cxgbe modules to this faster style gradually.

MFC after:	2 months
2014-12-31 23:19:16 +00:00
John Baldwin
5ad25ceb41 Check for SS_NBIO in so->so_state instead of sb->sb_flags in
soreceive_stream().

Differential Revision:	https://reviews.freebsd.org/D1299
Reviewed by:	bz, gnn
MFC after:	1 week
2014-12-15 17:52:08 +00:00
Navdeep Parhar
a7570ee305 Move KTR_CXGBE from t4_tom.h to adapter.h so that the base if_cxgbe
code can use it too.

MFC after:	1 week
2014-12-12 21:54:59 +00:00
Navdeep Parhar
b741402c40 cxgbe(4): allow the driver to use rx buffers that do not end on a pack
boundary.

MFC after:	2 weeks
2014-12-06 01:47:38 +00:00
Navdeep Parhar
e3207e1973 cxgbe(4): Allow for different pad and pack boundaries for different
adapters.  Set the pack boundary for T5 cards to be the same as the
PCIe max payload size.  The chip likes it this way.

In this revision the driver allocate rx buffers that align on both
boundaries.  This is not a strict requirement and a followup commit
will switch the driver to a more relaxed allocation strategy.

MFC after:	2 weeks
2014-12-06 00:13:56 +00:00
Hans Petter Selasky
c25290420e Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
Gleb Smirnoff
651e4e6a30 Merge from projects/sendfile: extend protocols API to support
sending not ready data:
o Add new flag to pru_send() flags - PRUS_NOTREADY.
o Add new protocol method pru_ready().

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-11-30 13:24:21 +00:00
Gleb Smirnoff
0f9d0a73a4 Merge from projects/sendfile:
o Introduce a notion of "not ready" mbufs in socket buffers.  These
mbufs are now being populated by some I/O in background and are
referenced outside.  This forces following implications:
- An mbuf which is "not ready" can't be taken out of the buffer.
- An mbuf that is behind a "not ready" in the queue neither.
- If sockbet buffer is flushed, then "not ready" mbufs shouln't be
  freed.

o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc.
  The sb_ccc stands for ""claimed character count", or "committed
  character count".  And the sb_acc is "available character count".
  Consumers of socket buffer API shouldn't already access them directly,
  but use sbused() and sbavail() respectively.
o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones
  with M_BLOCKED.
o New field sb_fnrdy points to the first not ready mbuf, to avoid linear
  search.
o New function sbready() is provided to activate certain amount of mbufs
  in a socket buffer.

A special note on SCTP:
  SCTP has its own sockbufs.  Unfortunately, FreeBSD stack doesn't yet
allow protocol specific sockbufs.  Thus, SCTP does some hacks to make
itself compatible with FreeBSD: it manages sockbufs on its own, but keeps
sb_cc updated to inform the stack of amount of data in them.  The new
notion of "not ready" data isn't supported by SCTP.  Instead, only a
mechanical substitute is done: s/sb_cc/sb_ccc/.
  A proper solution would be to take away struct sockbuf from struct
socket and allow protocols to implement their own socket buffers, like
SCTP already does.  This was discussed with rrs@.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-30 12:52:33 +00:00
Navdeep Parhar
729fee332b cxgbe(4): figure out the max payload size and save it for later.
MFC after:	1 week
2014-11-19 20:16:56 +00:00
Navdeep Parhar
aa8d1792d1 iw_cxgbe: don't forget to close the socket in c4iw_connect if soconnect
fails.

Submitted by:	hariprasad at chelsio dot com
2014-11-13 03:59:36 +00:00
Navdeep Parhar
05c4567dd9 Fix some bad interaction between cxgbe(4) and lacp lagg(4) that could
leave a port permanently disabled when a copper cable is unplugged and
then plugged right back in.

lacp_linkstate goes looking for the current ifmedia on a link state
change and it could get stale information from cxgbe(4) on a module
unplug followed by replug.  The fix is to process module events before
link-state events within the driver, and to always rebuild the ifmedia
list on a module change event (instead of rebuilding it lazily).

Thanks to asomers@ for the problem report and detailed analysis to go
with it.

MFC after:	1 week
2014-11-12 23:29:22 +00:00
Gleb Smirnoff
cfa6009e36 In preparation of merging projects/sendfile, transform bare access to
sb_cc member of struct sockbuf to a couple of inline functions:

sbavail() and sbused()

Right now they are equal, but once notion of "not ready socket buffer data",
will be checked in, they are going to be different.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-12 09:57:15 +00:00
John Baldwin
7bfc98355a Add device ID for the T502-BT (dual-port 1G) adapter.
Reviewed by:	np
MFC after:	1 week
2014-11-11 20:05:50 +00:00
Navdeep Parhar
62fc63abfb cxgbe(4): adjust PMRX and PMTX parameters.
MFC after:	1 week
2014-11-10 19:45:28 +00:00
Navdeep Parhar
527e4e62ac Always request a completion for every work request for iWARP. The
initial MPA exchange must be tracked this way so that t4_tom's state for
the tid is all clean at the time the tid transitions to RDMA mode.  Once
it does, t4_tom is out of the way and iw_cxgbe uses the qp endpoints
directly.

Sponsored by:	Chelsio Communications
2014-10-28 18:10:57 +00:00
Navdeep Parhar
3030c8bce9 iwcm_event status needs to be populated for close_complete_upcall
Submitted by:	Hariprasad at Chelsio dot com
Sponsored by:	Chelsio Communications
2014-10-27 23:11:48 +00:00
Navdeep Parhar
d25d06afc0 Some cxgbe/iw_cxgbe fixes:
- Free rt in c4iw_connect only if it is allocated.
- Call soclose instead of so_shutdown if there is an abort from the peer.
- Close socket and return failure if TOE is not enabled.

Submitted by:	Hariprasad at Chelsio dot com
Sponsored by:	Chelsio Communications
2014-10-27 22:22:46 +00:00
Navdeep Parhar
1284501329 cxgbe(4): bump up PF4's share of some global resources.
This increases the size of the per-port RSS slice and also allows the
driver to use a larger number of tx and rx queues.

MFC after:	2 weeks
2014-10-25 00:14:44 +00:00
Navdeep Parhar
53f49a7bd8 cxgbe/iw_cxgbe: wake up waiters after flushing the qp.
Obtained from:	Chelsio
2014-10-22 18:55:44 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Hans Petter Selasky
2c6eb461a7 Update the OFED Linux compatibility layer and
Mellanox hardware driver(s):

- Properly name an inclusion guard
- Fix compile warnings regarding unsigned enums
- Add two new sysctl nodes
- Remove all empty linux header files
- Make an error printout more verbose
- Use "mod_delayed_work()" instead of
  cancelling and starting a timeout.
- Implement more Linux scatterlist
  functions.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-15 13:40:29 +00:00
Navdeep Parhar
19abdd0654 cxgbe/tom: don't leak resources tied to an active open request that
cannot be sent to the chip because a prerequisite L2 resolution
failed.

Submitted by:	Hariprasad at chelsio dot com (original version)
MFC after:	2 weeks.
2014-10-07 21:26:22 +00:00
Navdeep Parhar
2d8910854b cxgbe(4): implement if_get_counter. 2014-09-27 05:50:31 +00:00
Navdeep Parhar
acc45299f5 cxgbe(4): explicitly set various if_hw_tso* values.
MFC after:	3 days
2014-09-26 22:21:02 +00:00
Navdeep Parhar
db25c97a1a Make sure the adapter's management queue and the event queue are
available before any uppper layer driver (TOE, iWARP, or iSCSI)
registers with the base cxgbe(4) driver.

Submitted by:	Hariprasad at chelsio dot com
Reviewed by:	np@
2014-09-26 18:53:00 +00:00
Navdeep Parhar
3a260c2a19 Update comment (missed this bit in r272079). 2014-09-24 20:08:43 +00:00
Navdeep Parhar
13251b21e1 cxgbe/tom: Catch up with r271119, syncache_add doesn't need tcbinfo lock. 2014-09-24 20:04:11 +00:00
Navdeep Parhar
1dee8327d4 cxgbe(4): Verify that the addresses in if_multiaddrs really are multicast
addresses.  (The chip doesn't really care, it's just that it needs to be
told explicitly if unicast DMACs are checked for "hits" in the hash that
is used after the TCAM entries are all used up).
2014-09-23 22:57:11 +00:00
Navdeep Parhar
8374717dc0 cxgbe(4): add support for the SIOCGI2C ioctl. 2014-09-12 21:56:57 +00:00
Navdeep Parhar
3eb2c201a6 cxgbe(4): knobs to enable/disable PAUSE frame based flow control.
MFC after:	1 week
2014-09-12 05:25:56 +00:00
Robert Watson
7524f39b9f Add new a M_START() mbuf macro that returns a pointer to the start of
an mbuf's storage (internal or external).

Add a new M_SIZE() mbuf macro that returns the size of an mbuf's
storage (internal or external).

These contrast with m_data and m_len, which are with respect to data
in the buffer, rather than the buffer itself.

Rewrite M_LEADINGSPACE() and M_TRAILINGSPACE() in terms of M_START()
and M_SIZE().

This is done as we currently have many instances of using mbuf flags
to generate pointers or lengths for internal storage in header and
regular mbufs, as well as to external storage. Rather than replicate
this logic throughout the network stack, centralising the
implementation will make it easier for us to refine mbuf storage.
This should also help reduce bugs by limiting the amount of
mbuf-type-specific pointer arithmetic.  Followup changes will
propagate use of the macros throughout the stack.

M_SIZE() conflicts with one macro in the Chelsio driver; rename that
macro in a slightly unsatisfying way to eliminate the collision.

MFC after:	3 days
Obtained from:	jeff (with enhancements)
Sponsored by:	EMC / Isilon Storage Division
Reviewed by:	bz, glebius, np
Differential Revision:	https://reviews.freebsd.org/D753
2014-09-11 07:16:15 +00:00
Navdeep Parhar
4309e7b020 Whitespace nit.
MFC after:	1 week
2014-09-09 18:36:00 +00:00
Hans Petter Selasky
c7818b48b6 - Update the OFED Linux Emulation layer as a preparation for a
hardware driver update from Mellanox Technologies.
- Remove empty files from the OFED Linux Emulation layer.
- Fix compile warnings related to printf() and the "%lld" and "%llx"
format specifiers.
- Add some missing 2-clause BSD copyrights.
- Add "Mellanox Technologies, Ltd." to list of copyright holders.
- Add some new compatibility files.
- Fix order of uninit in the mlx4ib module to avoid crash at unload
using the new module_exit_order() function.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2014-08-27 13:21:53 +00:00
Luigi Rizzo
4bf50f18eb Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
  Under bhyve and with a netmap backend [2] we reach over 1Mpps
  with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
  better partition physical and virtual interfaces giving access
  to separate users. The most visible effect is one additional
  argument to the various kernel functions to compute buffer
  addresses. All netmap-supported drivers are affected, but changes
  are mechanical and trivial

3. (kernel) simplify the prototype for *txsync() and *rxsync()
  driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
  port on a physical switch: a netmap monitor port replicates traffic
  present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
  experimental and disabled by default.
  Most of these are described in our ANCS'13 paper [1].
  Paravirtualized support in netmap mode is new, and beats the
  numbers in the paper by a large factor (under qemu-kvm,
  we measured gues-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after:	3 days.
2014-08-16 15:00:01 +00:00
Navdeep Parhar
6b6e7079b3 cxgbe(4): Do not poke T4-only registers on a T5 (and vice versa).
Obtained from:	Chelsio Communications
MFC after:	1 week
2014-08-08 18:36:53 +00:00
Navdeep Parhar
bc22dc708f cxgbe(4): Let caller specify whether it's ok to sleep in
t4_sched_config and t4_sched_params.

MFC after:	2 weeks
2014-08-06 19:38:03 +00:00
Navdeep Parhar
46a646940f cxgbe(4): Do not run any sleepable code in the SIOCSIFFLAGS handler when
IFF_PROMISC or IFF_ALLMULTI is being flipped.  bpf(4) holds its global
mutex around ifpromisc in at least the bpf_dtor path.

MFC after:	3 days
2014-08-04 22:32:16 +00:00
Navdeep Parhar
b2c5bf0de2 cxgbe(4): Remove an unused version of t4_enable_vi.
MFC after:	2 weeks
2014-08-02 18:37:22 +00:00
Navdeep Parhar
4d6db4e0f7 cxgbe(4): some optimizations in freelist handling.
MFC after:	2 weeks.
2014-08-02 06:55:36 +00:00
Navdeep Parhar
f10405b396 cxgbe(4): Fix an off by one error when looking for the BAR2 doorbell
address of an egress queue.

MFC after:	2 weeks
2014-08-02 01:48:25 +00:00
Navdeep Parhar
b2daa9a9cd cxgbe(4): minor optimizations in ingress queue processing.
Reorganize struct sge_iq.  Make the iq entry size a compile time
constant.  While here, eliminate RX_FL_ESIZE and use EQ_ESIZE directly.

MFC after:	2 weeks
2014-08-02 00:56:34 +00:00
Navdeep Parhar
0fe982772d Some hooks in cxgbe(4) for the offloaded iSCSI driver.
(I'm committing this on behalf of my colleagues in the Storage team
at Chelsio).

Submitted by:	Sreenivasa Honnur <shonnur at chelsio dot com>
Sponsored by:	Chelsio Communications.
2014-07-24 18:39:08 +00:00
Navdeep Parhar
82eff304b6 cxgbe(4): Keep track of the clusters that have to be freed by the
custom free routine (rxb_free) in the driver.  Fail MOD_UNLOAD with
EBUSY if any such cluster has been handed up to the kernel but hasn't
been freed yet.  This prevents a panic later when the cluster finally
needs to be freed but rxb_free is gone from the kernel.

MFC after:	1 week
2014-07-23 22:29:22 +00:00
Navdeep Parhar
c086e3d1b7 Add missing newline to an error message.
MFC after:	3 days
2014-07-22 19:48:21 +00:00
Navdeep Parhar
c3fb772502 Simplify r267600, there's no need to distinguish between allocated and
inlined mbufs.

MFC after:	1 week
2014-07-22 02:02:39 +00:00
Navdeep Parhar
bae4e5af99 cxgbe(4): Display CF facility correctly in the device log.
MFC after:	3 days
2014-07-15 18:24:41 +00:00
Navdeep Parhar
44eb893659 Allow multi-byte reads in the private CHELSIO_T4_GET_I2C ioctl. The
firmware allows up to 48B to be read this way but the driver limits
itself to 8B at a time to remain compatible with old cxgbetool
binaries.

MFC after:	1 week
2014-07-15 01:03:29 +00:00
Navdeep Parhar
30f337891d cxgbe(4): Add an iSCSI softc to the adapter structure. 2014-07-11 21:02:54 +00:00
Gleb Smirnoff
15c28f87b8 All mbuf external free functions never fail, so let them be void.
Sponsored by:	Nginx, Inc.
2014-07-11 13:58:48 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Navdeep Parhar
327235b3d6 cxgbe(4): Update the bundled T4 and T5 firmwares to versions 1.11.27.0.
Obtained from:	Chelsio
MFC after:	3 days
2014-06-22 23:40:20 +00:00
Navdeep Parhar
0835ddc766 Consider the total number of descriptors available (and not just those
that are ready to be reclaimed) when deciding whether to resume tx after
a stall.

MFC after:	3 days
2014-06-20 20:28:46 +00:00
Navdeep Parhar
ccc69b2fa9 cxgbe(4): Fix bug in the fast rx buffer recycle path. In some cases rx
buffers were getting recycled when they should have been left alone.

MFC after:	3 days
2014-06-18 00:16:35 +00:00
Attilio Rao
3ae10f7477 - Modify vm_page_unwire() and vm_page_enqueue() to directly accept
the queue where to enqueue pages that are going to be unwired.
- Add stronger checks to the enqueue/dequeue for the pagequeues when
  adding and removing pages to them.

Of course, for unmanaged pages the queue parameter of vm_page_unwire() will
be ignored, just as the active parameter today.
This makes adding new pagequeues quicker.

This change effectively modifies the KPI.  __FreeBSD_version will be,
however, bumped just when the full cache of free pages will be
evicted.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2014-06-16 18:15:27 +00:00
Navdeep Parhar
861e42b209 cxgbe(4): Properly account for the freelist buffers used when returning
early from service_iq due to a budget restriction.  This fixes a potential
rx hang when using INTx.

MFC after:	3 days
2014-06-05 00:38:32 +00:00
Navdeep Parhar
368541ba1e cxgbe(4): Fix a NULL dereference when the very first call to
get_scatter_segment() in get_fl_payload() fails.  While here,
fix the code to adjust fl_bufs_used when a failure occurs for
any other scatter segment.

MFC after:	3 days
2014-05-30 22:59:45 +00:00
Navdeep Parhar
298d969c53 cxgbe(4): netmap support for Terminator 5 (T5) based 10G/40G cards.
Netmap gets its own hardware-assisted virtual interface and won't take
over or disrupt the "normal" interface in any way.  You can use both
simultaneously.

For kernels with DEV_NETMAP, cxgbe(4) carves out an ncxl<N> interface
(note the 'n' prefix) in the hardware to accompany each cxl<N>
interface.  These two ifnet's per port share the same wire but really
are separate interfaces in the hardware and software.  Each gets its own
L2 MAC addresses (unicast and multicast), MTU, checksum caps, etc.  You
should run netmap on the 'n' interfaces only, that's what they are for.

With this, pkt-gen is able to transmit > 45Mpps out of a single 40G port
of a T580 card.  2 port tx is at ~56Mpps total (28M + 28M) as of now.
Single port receive is at 33Mpps but this is very much a work in
progress.  I expect it to be closer to 40Mpps once done.  In any case
the current effort can already saturate multiple 10G ports of a T5 card
at the smallest legal packet size.  T4 gear is totally untested.

trantor:~# ./pkt-gen -i ncxl0 -f tx -D 00:07:43🆎cd:ef
881.952141 main [1621] interface is ncxl0
881.952250 extract_ip_range [275] range is 10.0.0.1:0 to 10.0.0.1:0
881.952253 extract_ip_range [275] range is 10.1.0.1:0 to 10.1.0.1:0
881.962540 main [1804] mapped 334980KB at 0x801dff000
Sending on netmap:ncxl0: 4 queues, 1 threads and 1 cpus.
10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> 00:07:43🆎cd:ef)
881.962562 main [1882] Sending 512 packets every  0.000000000 s
881.962563 main [1884] Wait 2 secs for phy reset
884.088516 main [1886] Ready...
884.088535 nm_open [457] overriding ifname ncxl0 ringid 0x0 flags 0x1
884.088607 sender_body [996] start
884.093246 sender_body [1064] drop copy
885.090435 main_thread [1418] 45206353 pps (45289533 pkts in 1001840 usec)
886.091600 main_thread [1418] 45322792 pps (45375593 pkts in 1001165 usec)
887.092435 main_thread [1418] 45313992 pps (45351784 pkts in 1000834 usec)
888.094434 main_thread [1418] 45315765 pps (45406397 pkts in 1002000 usec)
889.095434 main_thread [1418] 45333218 pps (45378551 pkts in 1001000 usec)
890.097434 main_thread [1418] 45315247 pps (45405877 pkts in 1002000 usec)
891.099434 main_thread [1418] 45326515 pps (45417168 pkts in 1002000 usec)
892.101434 main_thread [1418] 45333039 pps (45423705 pkts in 1002000 usec)
893.103434 main_thread [1418] 45324105 pps (45414708 pkts in 1001999 usec)
894.105434 main_thread [1418] 45318042 pps (45408723 pkts in 1002001 usec)
895.106434 main_thread [1418] 45332430 pps (45377762 pkts in 1001000 usec)
896.107434 main_thread [1418] 45338072 pps (45383410 pkts in 1001000 usec)
...

Relnotes:	Yes
Sponsored by:	Chelsio Communications.
2014-05-27 18:18:41 +00:00
Bjoern A. Zeeb
255cd9fd58 Move the tcp_fields_to_host() and tcp_fields_to_net() (inline)
functions to the tcp_var.h header file in order to avoid further
duplication with upcoming commits.

Reviewed by:	np
MFC after:	2 weeks
2014-05-23 20:15:01 +00:00
Navdeep Parhar
7a5b897dfe cxgbe(4): Remove stray if_up from the code that creates the tracing ifnet. 2014-05-23 01:45:44 +00:00
Maksim Yevmenkin
080a4b9b1c use correct (integer) type for the temperature sysctl
Reviewed by:	np, scottl
Obtained from:	Netflix
MFC after:	3 days
2014-04-17 19:29:15 +00:00
Navdeep Parhar
8b3f42d52d cxgbe(4): Recognize the "spider" configuration where a T5 card's 40G
QSFP port is presented as 4 distinct 10G SFP+ ports to the driver.

MFC after:	2 weeks
2014-03-21 00:56:56 +00:00
Navdeep Parhar
65bd4d1cb4 cxgbe(4): Use ifi_oqdrops in if_data to count drops in the tx path. 2014-03-20 02:28:05 +00:00
Navdeep Parhar
475992bdfb cxgbe(4): if_iqdrops statistic should include tunnel congestion drops.
MFC after:	1 week
2014-03-20 01:58:04 +00:00
Navdeep Parhar
38035ed6dc cxgbe(4): significant rx rework.
- More flexible cluster size selection, including the ability to fall
  back to a safe cluster size (PAGE_SIZE from zone_jumbop by default) in
  case an allocation of a larger size fails.
- A single get_fl_payload() function that assembles the payload into an
  mbuf chain for any kind of freelist.  This replaces two variants: one
  for freelists with buffer packing enabled and another for those without.
- Buffer packing with any sized cluster.  It was limited to 4K clusters
  only before this change.
- Enable buffer packing for TOE rx queues as well.
- Statistics and tunables to go with all these changes.  The driver's
  man page will be updated separately.

MFC after:	5 weeks
2014-03-18 20:14:13 +00:00
Dimitry Andric
e9e21b6e41 In cxgbe, conditionalize the t4_pgprot_wc() function, since it is only
used when DOT5 is defined.

Reviewed by:	np
MFC after:	3 days
2014-02-14 23:38:42 +00:00
Scott Long
f7a74e061b Add a new sysctl, dev.cxgbe.N.rsrv_noflow, and a companion tunable,
hw.cxgbe.rsrv_noflow.  When set, queue 0 of the port is reserved for
TX packets without a flowid.  The hash value of packets with a flowid
is bumped up by 1.  The intent is to provide a private queue for
link-level packets like LACP that is unlikely to overflow or suffer
deep queue latency.

Reviewed by:	np
Obtained from:	Netflix
MFC after:	3 days
2014-02-06 18:40:38 +00:00
Navdeep Parhar
e46dcc5670 cxgbe(4): Use the rx channel map (instead of the tx channel map) as the
congestion channel map.

MFC after:	1 week
2014-02-06 03:30:12 +00:00
Navdeep Parhar
7293a15f54 cxgbe(4): The T5 allows for a different freelist starvation threshold
for queues with buffer packing.  Use the correct value to calculate a
freelist's low water mark.

MFC after:	1 week
2014-02-06 03:21:43 +00:00
Navdeep Parhar
454813ff9c cxgbe(4): Use the port's tx channel to identify it to t4_clr_port_stats.
MFC after:	3 days
2014-02-06 02:34:29 +00:00
Adrian Chadd
3af0f449ae Add an option to enable or disable the small RX packet copying that
is done to improve performance of small frames.

When doing RX packing, the RX copying isn't necessarily required.

Reviewed by:	np
2014-01-02 23:23:33 +00:00
Navdeep Parhar
88bb82e511 Do not create a hardware IPv6 server if the listen address is not
in6addr_any and is not in the CLIP table either.  This fixes a reported
TOE+IPv6 NULL-dereference panic in do_pass_open_rpl().

While here, stop creating hardware servers for any loopback address.
It's just a waste of server tids.

MFC after:	1 week
2013-12-17 21:41:23 +00:00
Navdeep Parhar
93e9cae3fa Read card capabilities after firmware initialization, instead of setting
them up as part of firmware initialization (which the driver gets to do
only if it's the master driver).

Read the range of tids available for the ETHOFLD functionality if it's
enabled.

New is_ftid() and is_etid() functions to test whether a tid falls within
the range of filter tids or ETHOFLD tids respectively.

MFC after:	2 weeks
2013-12-14 03:08:03 +00:00
Adrian Chadd
ac68deae6d Print out the full PCIe link negotiation during dmesg.
I found this useful when checking whether a NIC is in a PCIE 3.0 8x slot
or not.

Reviewed by:	np
Sponsored by:	Netflix, inc.
2013-12-10 00:07:04 +00:00
Navdeep Parhar
d419aaa126 Unstaticize t4_list and t4_uld_list. This works around a clang
annoyance[1] and allows kgdb to find these symbols.

[1] http://lists.freebsd.org/pipermail/freebsd-hackers/2012-November/041166.html

MFC after:	3 days
2013-12-09 23:33:57 +00:00
Navdeep Parhar
273ef9912d cxgbe(4): save a copy of the RSS map for each port for the driver's use. 2013-12-08 17:47:37 +00:00
Navdeep Parhar
05337b80ee cxgbe(4): T4_SET_SCHED_CLASS and T4_SET_SCHED_QUEUE ioctls to program
scheduling classes in the chip and to bind tx queue(s) to a scheduling
class respectively.  These can be used for various kinds of tx traffic
throttling (to force selected tx queues to drain at a fixed Kbps rate,
or a % of the port's total bandwidth, or at a fixed pps rate, etc.).

Obtained from:	Chelsio
2013-12-03 18:34:52 +00:00
Navdeep Parhar
2471928bf8 Disable an assertion that relies on some code[1] that isn't in HEAD yet.
[1] http://lists.freebsd.org/pipermail/freebsd-net/2013-August/036573.html
2013-11-27 19:54:19 +00:00
Navdeep Parhar
245a0bd40a cxgbe(4): update the internal list of device features.
MFC after:	3 days
2013-11-21 20:07:58 +00:00
Navdeep Parhar
1192eeb8a3 cxgbe(4): Tidy up the display for payload memory statistics (pm_stats).
# sysctl -n dev.t4nex.0.misc.pm_stats
# sysctl -n dev.t5nex.0.misc.pm_stats

MFC after:	1 week
2013-11-07 00:25:49 +00:00
Navdeep Parhar
be2c01211c cxgbe(4): Exclude MPS_RPLC_MAP_CTL (0x11114) from the register dump. Turns
out it's a write-only register with strange side effects on read.

Submitted by:	gnn
MFC after:	3 days
2013-11-04 21:06:21 +00:00
Gleb Smirnoff
66e01d73cd - Provide necessary includes.
- Remove unnecessary includes.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-29 11:17:49 +00:00
Gleb Smirnoff
c3322cb91c Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Navdeep Parhar
87f804e879 Fix typo in previous commit. 2013-10-18 00:00:08 +00:00
Navdeep Parhar
02318fcf86 iw_cxgbe should have a dependency on t4nex.
Reported by:	trasz@
2013-10-17 23:57:17 +00:00
Navdeep Parhar
fb93f5c47f iw_cxgbe: iWARP driver for Chelsio T4/T5 chips. This is a straight port
of the iw_cxgb4 found in OFED distributions.

Obtained from:	Chelsio
2013-10-17 18:37:25 +00:00
Navdeep Parhar
b3eda7872d cxgbe(4): Store the log2 of the # of doorbells per BAR2 page for both
ingress and egress queues, and for both T4 and T5.  These values are
used by the T4/T5 iWARP driver.
2013-10-14 23:32:56 +00:00
Navdeep Parhar
48d05478bf cxgbe(4): Update T4 and T5 firmwares to 1.9.12.0 2013-10-14 21:25:07 +00:00
Gleb Smirnoff
4cdc1f5421 There are some high performance NICs that count statistics in hardware,
and there are ifnets, that do that via counter(9). Provide a flag that
would skip cache line trashing '+=' operation in ether_input().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
Reviewed by:	melifaro, adrian
Approved by:	re (marius)
2013-10-09 19:04:40 +00:00
Dimitry Andric
64db896617 Fix kernel build on amd64 after r256118, since the machine/md_var.h
header is not implicitly included there.  So include it explicitly.

Approved by:	re (delphij)
Pointy hat to:	dim
MFC after:	3 days
X-MFC-With:	r256118
2013-10-07 22:30:03 +00:00
Dimitry Andric
42355a4ff6 Remove redundant declaration of cpu_clflush_line_size in
sys/dev/cxgbe/t4_sge.c, to silence a gcc warning.

Approved by:	re (gjb)
MFC after:	3 days
2013-10-07 16:56:56 +00:00
Navdeep Parhar
eb22728291 Rework the tx credit mechanism between the cxgbe/tom driver
and the card.  This helps smooth out some burstiness in the
exchange.

Approved by:	re (glebius)
2013-09-09 04:38:57 +00:00
Navdeep Parhar
c81d56a0aa Fix a miscalculation that caused cxgbe/tom to auto-increment
a TOE socket's tx buffer size too aggressively.

Approved by:	re (delphij)
2013-09-09 00:16:59 +00:00
Navdeep Parhar
4f641559c7 For TOE connections, the window scale factor in CPL_PASS_ACCEPT_REQ is
set to 15 to indicate that the peer did not send a window scale option
with its SYN.  Do not send a window scale option in the SYN|ACK reply
in that case.
2013-09-03 23:34:04 +00:00
Navdeep Parhar
32e9219012 Fix the sysctl that displays whether buffer packing is enabled
or not.
2013-08-30 02:13:36 +00:00
Navdeep Parhar
1458bff9a4 Implement support for rx buffer packing. Enable it by default for T5
cards.

This is a T4 and T5 chip feature which lets the chip deliver multiple
Ethernet frames in a single buffer.  This is more efficient within the
chip, in the driver, and reduces wastage of space in rx buffers.

- Always allocate rx buffers from the jumbop zone, no matter what the
  MTU is.  Do not use the normal cluster refcounting mechanism.
- Reserve space for an mbuf and a refcount in the cluster itself and let
  the chip DMA multiple frames in the rest.
- Use the embedded mbuf for the first frame and allocate mbufs on the
  fly for any additional frames delivered in the cluster.  Each of these
  mbufs has a reference on the underlying cluster.
2013-08-30 01:45:36 +00:00
Navdeep Parhar
480e603c79 Merge r254386 from user/np/cxl_tuning. Add an INET|INET6 check missing
in said revision.

r254386:
Flush inactive LRO entries periodically.
2013-08-29 06:26:22 +00:00
Navdeep Parhar
c93d567412 Whitespace nit. 2013-08-28 23:15:05 +00:00
Navdeep Parhar
319a31ea18 Change t4_list_lock and t4_uld_list_lock from mutexes to sx'es.
- tom_uninit had to be reworked not to hold the adapter lock (a mutex)
  around t4_deactivate_uld, which acquires the uld_list_lock.
- the ifc_match for the interface cloner that creates the tracer ifnet
  had to be reworked as the kernel calls ifc_match with the global
  if_cloners_mtx held.
2013-08-28 20:59:22 +00:00
Navdeep Parhar
9800517691 Add hooks in base cxgbe(4) for the iWARP upper-layer driver. Update a
couple of assertions in the TOE driver as well.
2013-08-28 20:45:45 +00:00
Navdeep Parhar
8a59745fca Use correct mailbox and PCIe PF number when querying RDMA parameters. 2013-08-26 19:02:52 +00:00
Navdeep Parhar
aa9a5cc05a There is no need to hold the freelist lock around alloc/free of
software descriptors.  This also silences WITNESS warnings when
the software descriptors are allocated with M_WAITOK.

MFC after:	1 week
2013-08-23 18:03:18 +00:00
Navdeep Parhar
2485eeee37 Display P/N information in the description.
Submitted by:	gnn
MFC after:	3 days
2013-08-20 18:22:04 +00:00
Navdeep Parhar
82342de26d Display temperature sensor data. Shows -1 if sensor not
available on the card.

# sysctl dev.t4nex.0.temperature
# sysctl dev.t5nex.0.temperature
2013-08-02 18:05:42 +00:00
Navdeep Parhar
73cd922046 Fix previous commit (r253873). "cong" has one bit per channel but the
congestion channel map has 1 nibble per channel.  So bits wxyz need to
be blown up into 000w000x000y000z.
2013-08-02 17:44:19 +00:00
Navdeep Parhar
ba41ec4848 Set up congestion manager context properly for T5 based cards.
MFC after:	3 days (will check with re@)
2013-08-01 23:38:30 +00:00
Navdeep Parhar
6e22f9f3da Display SGE tunables in the sysctl tree.
dev.t5nex.0.fl_pktshift: payload DMA offset in rx buffer (bytes)
dev.t5nex.0.fl_pad: payload pad boundary (bytes)
dev.t5nex.0.spg_len: status page size (bytes)
dev.t5nex.0.cong_drop: congestion drop setting

Discussed with:	scottl
2013-07-31 05:12:51 +00:00
Navdeep Parhar
2393220538 Display a string instead of a numeric code in the linkdnrc sysctl.
Submitted by:	gnn@
2013-07-27 07:43:43 +00:00