Commit Graph

398 Commits

Author SHA1 Message Date
Navdeep Parhar
79bb86ac2e cxgbe/iw_cxgbe: Fix for stray "start_ep_timer timer already started!"
messages.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
2016-03-29 00:10:47 +00:00
Navdeep Parhar
78552b23a5 cxgbe(4): Be consistent and call ETHER_BPF_MTAP before writing anything
to the descriptor ring no matter what path the frame takes within the
driver's tx.
2016-03-22 18:56:23 +00:00
Navdeep Parhar
8d814a458c iw_cxgbe/libcxgb4: Pull in many applicable fixes from the upstream Linux
iWARP driver and userspace library to the FreeBSD iw_cxgbe and libcxgb4.

This commit includes internal changesets 6785 8111 8149 8478 8617 8648
8650 9110 9143 9440 9511 9894 10164 10261 10450 10980 10981 10982 11730
11792 12218 12220 12222 12223 12225 12226 12227 12228 12229 12654.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
Sponsored by:	Chelsio Communications
2016-03-21 00:29:45 +00:00
Navdeep Parhar
784a631dc5 cxgbe(4): Tidy up PAUSE frame accounting.
Figure out if the chip is counting PAUSE frames in the "normal" stats
and take them out if it is.  This fixes a bug in the tx stats because
the default hardware behavior is different for Tx and Rx but the driver
was treating both the same way.  The result was that OPACKETS, OBYTES,
and OMCASTS were under-reported (if tx_pause > 0) before this change.

Note that the mac_stats sysctl still gives you the raw value of these
statistics straight from the device registers.
2016-03-17 01:15:16 +00:00
Navdeep Parhar
460f25e5ca cxgbe(4): Enable PFs 0-3, and allow creation of SR-IOV VFs on these PFs
in the default configuration files.
2016-03-16 19:46:22 +00:00
Navdeep Parhar
fc2aa1fc0b cxgbe(4): Enable additional capabilities in the default configuration
files.  All features with FreeBSD drivers of some kind are now in the
default configuration.
2016-03-16 19:43:44 +00:00
Navdeep Parhar
c47c408a38 cxgbe(4): Update some register settings in the default configuration
files to match the "uwire" configuration.
2016-03-16 19:41:00 +00:00
Navdeep Parhar
78aa2c994c cxgbe(4): Remove a couple of pointless assignments in sysctl_meminfo.
Do not display range if start = stop (this is a workaround for some
unused regions).
2016-03-16 19:36:11 +00:00
Dimitry Andric
353cae4976 Fix the following gcc warnings on sparc64, when TCP_OFFLOAD is not
defined:

    sys/dev/cxgbe/t4_main.c:7474: warning: 'sysctl_tp_tick' defined but not used
    sys/dev/cxgbe/t4_main.c:7505: warning: 'sysctl_tp_dack_timer' defined but not used
    sys/dev/cxgbe/t4_main.c:7519: warning: 'sysctl_tp_timer' defined but not used

This just adds a bunch of #ifdef TCP_OFFLOAD in the right places.

Reviewed by:	np
Differential Revision: https://reviews.freebsd.org/D5620
2016-03-12 18:38:51 +00:00
Navdeep Parhar
537e5f9a74 cxgbe(4): Fix typo in previous commit. 2016-03-12 03:02:33 +00:00
Navdeep Parhar
0f2f53efd2 cxgbe(4): Catch up with the latest list of card capabilities as reported
by the firmware.
2016-03-12 02:54:55 +00:00
Navdeep Parhar
959824daba cxgbe(4): sysctls to display the TOE's TCP timers.
cask:~# sysctl -d dev.t5nex.0.toe
dev.t5nex.0.toe.finwait2_timer: FINWAIT2 timer (us)
dev.t5nex.0.toe.initial_srtt: Initial SRTT (us)
dev.t5nex.0.toe.keepalive_intvl: Keepidle interval (us)
dev.t5nex.0.toe.keepalive_idle: Keepidle idle timer (us)
dev.t5nex.0.toe.persist_max: Persist timer max (us)
dev.t5nex.0.toe.persist_min: Persist timer min (us)
dev.t5nex.0.toe.rexmt_max: Retransmit max (us)
dev.t5nex.0.toe.rexmt_min: Retransmit min (us)
dev.t5nex.0.toe.dack_timer: DACK timer (us)
dev.t5nex.0.toe.dack_tick: DACK tick (us)
dev.t5nex.0.toe.timestamp_tick: TCP timestamp tick (us)
dev.t5nex.0.toe.timer_tick: TP timer tick (us)
...

cask:~# sysctl dev.t5nex.0.toe
dev.t5nex.0.toe.finwait2_timer: 9765440
dev.t5nex.0.toe.initial_srtt: 244128
dev.t5nex.0.toe.keepalive_intvl: 73240800
dev.t5nex.0.toe.keepalive_idle: 7031116800
dev.t5nex.0.toe.persist_max: 9765440
dev.t5nex.0.toe.persist_min: 976544
dev.t5nex.0.toe.rexmt_max: 9765440
dev.t5nex.0.toe.rexmt_min: 244128
dev.t5nex.0.toe.dack_timer: 19520
dev.t5nex.0.toe.dack_tick: 32.768
dev.t5nex.0.toe.timestamp_tick: 1048.576
dev.t5nex.0.toe.timer_tick: 32.768
...
2016-03-11 23:24:04 +00:00
Navdeep Parhar
9945ceb857 cxgbe(4): Add sysctls to display the TP microcode version and the
expansion rom version (if there's one).

trantor:~# sysctl dev.t4nex dev.t5nex | grep _version
dev.t4nex.0.firmware_version: 1.15.28.0
dev.t4nex.0.tp_version: 0.1.9.4
dev.t5nex.0.firmware_version: 1.15.28.0
dev.t5nex.0.exprom_version: 1.0.0.68
dev.t5nex.0.tp_version: 0.1.4.9
2016-03-11 03:15:17 +00:00
Navdeep Parhar
5849e3bbcd cxgbe(4): Add a sysctl for the event capture mask of the TP block's
logic analyzer.

dev.t5nex.<n>.misc.tp_la_mask
dev.t4nex.<n>.misc.tp_la_mask
2016-03-11 01:54:43 +00:00
Navdeep Parhar
98790d33c6 cxgbe(4): Improvements to the code that deals with the firmware's log.
- Query the location of the log very early during attach.  Refresh the
  location later after establishing contact with the firmware.
- Save the log's location as a flat address in devlog_params.
- Use a memory window instead of backdoor access to the EDC/MC to read
  the log.
2016-03-10 23:17:26 +00:00
Navdeep Parhar
30d816a7b5 cxgbe(4): Fix bug in r296603. The memory window needs to be
repositioned if the start address isn't in the window already.  One
of the bounds check used the end address instead.
2016-03-10 20:36:32 +00:00
Navdeep Parhar
c912289045 cxgbe(4): Add general purpose routines that offer safe access to the
chip's memory windows.  Convert existing users of these windows to the
new routines.
2016-03-10 06:15:31 +00:00
Navdeep Parhar
f808381742 cxgbe(4): Allow the addr/len pair that is being validated in
validate_mem_range to span multiple memory types.  Update
validate_mt_off_len to use validate_mem_range.
2016-03-10 02:43:10 +00:00
Navdeep Parhar
4d131308f3 cxgbe(4): Rename regwin_lock to reg_lock. It is used to protect access
to indirect registers only.
2016-03-08 22:23:30 +00:00
Navdeep Parhar
40f1f5c6c4 cxgbe(4): Reshuffle and rototill t4_hw.c, solely to reduce diffs with
the internal shared code.

Obtained from:	Chelsio Communications
2016-03-08 19:34:56 +00:00
Navdeep Parhar
fa6d184d93 cxgbe(4): Minor updates to the shared routines that deal with firmware images. 2016-03-08 10:07:40 +00:00
Navdeep Parhar
89308a40ba cxgbe(4): Fix t4_tp_get_rdma_stats. 2016-03-08 09:40:45 +00:00
Navdeep Parhar
a5eff82112 cxgbe(4): Many new functions in the shared code, unused at this time.
Obtained from:	Chelsio Communications
2016-03-08 09:34:56 +00:00
Navdeep Parhar
f799998f84 cxgbe(4): Use t4_link_down_rc_str in shared code to decode the reason
the link is down, instead of doing it in OS specific code.
2016-03-08 08:59:34 +00:00
Navdeep Parhar
9b6d39a019 cxgbe(4): Updates to shared routines that get/set various parameters via
the firmware.

Obtained from:	Chelsio Communications
2016-03-08 08:39:53 +00:00
Navdeep Parhar
05e2c36c20 cxgbe(4): Remove __devinit and SPEED_<foo> as part of catch up with
internal shared code.

Obtained from:	Chelsio Communications
2016-03-08 08:13:37 +00:00
Navdeep Parhar
b3500921c4 cxgbe(4): Updates to the shared routines that deal with the serial EEPROM,
flash, and VPD.

Obtained from:	Chelsio Communications
2016-03-08 07:48:55 +00:00
Navdeep Parhar
948d0ec074 cxgbe(4): Updates to mailbox routines in the shared code.
Obtained from:	Chelsio Communications
2016-03-08 06:27:47 +00:00
Navdeep Parhar
207d9fea78 cxgbe(4): Update the interrupt handlers for hardware errors.
Obtained from:	Chelsio Communications
2016-03-08 02:44:32 +00:00
Navdeep Parhar
700cfba72d cxgbe(4): Overhaul the shared code that deals with the chip's TP block,
which is responsible for filtering and RSS.

Add the ability to use filters that match on PF/VF (aka "VNIC id") while
here.  This is mutually exclusive with filtering on outer VLAN tag with
Q-in-Q.

Sponsored by:	Chelsio Communications
2016-03-08 02:04:05 +00:00
Navdeep Parhar
90e7434a6d cxgbe(4): Add a struct sge_params to store per-adapter SGE parameters.
Move the code that reads all the parameters to t4_init_sge_params in the
shared code.  Use these per-adapter values instead of globals.

Sponsored by:	Chelsio Communications
2016-03-08 00:23:56 +00:00
Navdeep Parhar
4f8a1fd8cd cxgbe(4): Updated register dumps.
- Get the list of registers to read during a regdump from the shared
  code instead of the OS specific code.  This follows a similar move
  internally.  The shared code includes the list for T6.

- Update cxgbetool to be able to decode T5 VF, T6, and T6 VF register
  dumps (and catch up with some updates to T4 and T5 register decode).

Obtained from:	Chelsio Communications
Sponsored by:	Chelsio Communications
2016-03-07 21:11:35 +00:00
Navdeep Parhar
d1205d093d cxgbe(4): Very basic T6 awareness. This is part of ongoing work to
update to the latest internal shared code.

- Add a chip_params structure to keep track of hardware constants for
  all generations of Terminators handled by cxgbe.
- Update t4_hw_pci_read_cfg4 to work with T6.
- Update the hardware debug sysctls (hidden within dev.<tNnex>.<n>.misc.*) to
  work with T6.  Most of the changes are in the decoders for the CIM
  logic analyzer and the MPS TCAM.
- Acquire the regwin lock around indirect register accesses.

Obtained from:	Chelsio Communications
Sponsored by:	Chelsio Communications
2016-03-04 13:11:13 +00:00
Navdeep Parhar
ecdff8fa76 cxgbe(4): First of many changes to reduce diffs with internal shared
code:

- Rename some CamelCase variables.
- s/t4_link_start/t4_link_l1cfg/g
- Pull in t4_get_port_type_description.
- Move t4_wait_op_done to t4_hw.c.
- Flip the order of the RDMA stats.
- Remove unsused function t4_iq_start_stop.
- Move t4_wait_op_done and t4_wait_op_done_val to t4_hw.c

Obtained from:	Chelsio Communications
2016-03-03 01:41:53 +00:00
Navdeep Parhar
dd991bd5a1 cxgbe(4): Update T5 and T4 firmwares to 1.15.28.0.
These firmwares were obtained from the beta "Chelsio T5/T4 Unified Wire
v2.12.0.2 for Linux" release.  Changes since last release are listed in the
"Release Notes" accompanying the beta release and are copy-pasted here as well.

The plan is to have only GA'd firmwares in any -STABLE FreeBSD branch so I'll
MFC this (after 2 months) only if it ends up in a GA release.

================================================================================
================================================================================

22.1. T5 Firmware
+++++++++++++++++++++++++++++++++

Version : 1.15.28.0
Date    : 02/29/2016
================================================================================

FIXES
-----

BASE:
 - Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where the default ingress
   queue was ignored.
 - Fixed an issue where adapter failed to load fw by adjusting DRAM frequency.
 - Fixed an issue in watchdog which was causing VM bring-up failure after
   reboot.
 - Fixed 40G link failures with some switches when auto-negotiation enabled.
 - Fixed to improve on link bring-up time.
 - Per port buffer groups size doubled to improve performance.
 - Fixed an issue where bogus d3hot bits were set causing traffic stall.
 - Fixed an issue where sometimes adapter was not seen after reboot.
 - Fixed an issue where iWARP was crashing in conjunction with traffic
   management.
 - Fixed an issue where link failed to come up after removing twinax cable and
   inserting optical module.

OFLD
 - Fixed a potential iSCSI data corruption issue by disabling RxFragEn flag.

FOiSCSI
 - Fixed an issue in recovery path where connection was getting closed before
   recovery processing was done.
 - Fixed an issue in TCP port reuse.
 - Fixed an issue in recovery path when large number (>64) of iSCSI connections
   were in use.
 - Returned ENETUNREACH if IP was not been provisioned yet and driver tried to
   use given inerface.

ENHANCEMENTS
------------

BASE:
 - Added new interface to program DCA settings in SGE contexts; allow 32-byte
   IQE size
 - Added PTP interface fw_ptp_ts to support PTP Frequeny and Offset adjustment.
 - Added MPS raw interface.

ETH:
 - New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.

OFLD:
 - WR opcode is returned to host in cqe error response.

================================================================================
================================================================================

22.2. T4 Firmware
+++++++++++++++++

Version : 1.15.28.0
Date    : 02/29/2016
================================================================================

FIXES
-----

BASE:
 - Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where default ingress queue
   was ignored.
 - Fixed an issue in watchdog which was causing VM bring-up failure after
   reboot.
 - Per port buffer groups size doubled to improve performance.
 - Fixed an issue where iWARP was crashing in conjunction with traffic
   management.

FOiSCSI:
 - Fixed an issue in recovery path where connection was getting closed before
   recovery processing was done.
 - Fixed an issue in TCP port reuse.
 - Fixed an issue in recovery path when large number (>64) of iSCSI connections
   were in use.
 - Returned ENETUNREACH if IP had not been provisioned yet and driver tried to
   use given inerface.

ENHANCEMENTS
------------

BASE:
 - Added MPS raw interface.

ETH:
 - New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.

================================================================================

Obtained from:	Chelsio Communications
MFC after:	2 months
Sponsored by:	Chelsio Communications
2016-03-01 02:36:50 +00:00
Navdeep Parhar
e8c6ba7265 cxgbe(4): Add a sysctl to retrieve the maximum speed/bandwidth supported by a
port.

dev.cxgbe.<n>.max_speed
dev.cxl.<n>.max_speed

Sponsored by:	Chelsio Communications
2016-02-25 01:10:56 +00:00
Navdeep Parhar
40bf7442fa cxgbe: catch up with the latest hardware-related definitions.
Obtained from:	Chelsio Communications
Sponsored by:	Chelsio Communications
2016-02-19 00:29:16 +00:00
Navdeep Parhar
748d440809 Remove duplicate definition (CPL_TRACE_PKT_T5). 2016-02-12 20:14:03 +00:00
Gleb Smirnoff
b4b12e52fb Garbage collect unused arguments of m_init(). 2016-02-10 18:54:18 +00:00
Gleb Smirnoff
f353ae1c62 More fixes to the build. 2016-01-27 05:15:53 +00:00
Gleb Smirnoff
57a78e3bae Augment struct tcpstat with tcps_states[], which is used for book-keeping
the amount of TCP connections by state.  Provides a cheap way to get
connection count without traversing the whole pcb list.

Sponsored by:	Netflix
2016-01-27 00:45:46 +00:00
Navdeep Parhar
097f289f25 Fix for iWARP servers that listen on INADDR_ANY.
The iWARP Connection Manager (CM) on FreeBSD creates a TCP socket to
represent an iWARP endpoint when the connection is over TCP. For
servers the current approach is to invoke create_listen callback for
each iWARP RNIC registered with the CM. This doesn't work too well for
INADDR_ANY because a listen on any TCP socket already notifies all
hardware TOEs/RNICs of the new listener. This patch fixes the server
side of things for FreeBSD. We've tried to keep all these modifications
in the iWARP/TCP specific parts of the OFED infrastructure as much as
possible.

Submitted by:	Krishnamraju Eraparaju @ Chelsio (with design inputs from Steve Wise)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D4801
2016-01-22 23:33:34 +00:00
Navdeep Parhar
a34ec16adb iw_cxgbe: fix a couple of problems int the RDMA_TERMINATE handler.
a) Look for the CPL in the payload buffer instead of the descriptor.
b) Retrieve the socket associated with the tid with the inpcb lock held.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
2016-01-21 00:42:48 +00:00
Hans Petter Selasky
e936121d31 Add optimizing LRO wrapper:
- Add optimizing LRO wrapper which pre-sorts all incoming packets
  according to the hash type and flowid. This prevents exhaustion of
  the LRO entries due to too many connections at the same time.
  Testing using a larger number of higher bandwidth TCP connections
  showed that the incoming ACK packet aggregation rate increased from
  ~1.3:1 to almost 3:1. Another test showed that for a number of TCP
  connections greater than 16 per hardware receive ring, where 8 TCP
  connections was the LRO active entry limit, there was a significant
  improvement in throughput due to being able to fully aggregate more
  than 8 TCP stream. For very few very high bandwidth TCP streams, the
  optimizing LRO wrapper will add CPU usage instead of reducing CPU
  usage. This is expected. Network drivers which want to use the
  optimizing LRO wrapper needs to call "tcp_lro_queue_mbuf()" instead
  of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of
  "tcp_lro_flush()". Further the LRO control structure must be
  initialized using "tcp_lro_init_args()" passing a non-zero number
  into the "lro_mbufs" argument.

- Make LRO statistics 64-bit. Previously 32-bit integers were used for
  statistics which can be prone to wrap-around. Fix this while at it
  and update all SYSCTL's which expose LRO statistics.

- Ensure all data is freed when destroying a LRO control structures,
  especially leftover LRO entries.

- Reduce number of memory allocations needed when setting up a LRO
  control structure by precomputing the total amount of memory needed.

- Add own memory allocation counter for LRO.

- Bump the FreeBSD version to force recompilation of all KLDs due to
  change of the LRO control structure size.

Sponsored by:	Mellanox Technologies
Reviewed by:	gallatin, sbruno, rrs, gnn, transport
Tested by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D4914
2016-01-19 15:33:28 +00:00
Navdeep Parhar
5725f0e490 cxgbe: bind the ithreads that handle NIC rx to the correct CPU if the kernel
is built with option RSS.
2016-01-11 17:52:42 +00:00
Alexander V. Chernikov
8a9f7532b0 Convert cxgb/cxgbe to the new routing API.
Discussed with:		np
2016-01-07 08:07:17 +00:00
Gleb Smirnoff
0c39d38d21 Historically we have two fields in tcpcb to describe sender MSS: t_maxopd,
and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned
up after T/TCP removal. After all permutations over the years the result is
that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum
protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if
timestamps are in action, or is equal to t_maxopd otherwise. That's a very
rough estimate of MSS reduced by options length. Throughout the code it
was used in places, where preciseness was not important, like cwnd or
ssthresh calculations.

With this change:

- t_maxopd goes away.
- t_maxseg now stores MSS not adjusted by options.
- new function tcp_maxseg() is provided, that calculates MSS reduced by
  options length. The functions gives a better estimate, since it takes
  into account SACK state as well.

Reviewed by:	jtl
Differential Revision:	https://reviews.freebsd.org/D3593
2016-01-07 00:14:42 +00:00
Navdeep Parhar
9f6b62e791 iw_cxgbe: Shut down the socket but do not close the fd in case of error.
The fd is closed later in this case.  This fixes a "SS_NOFDREF on enter"
panic.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
Reviewed by:	Steve Wise @ Open Grid Computing
2016-01-05 01:32:40 +00:00
Alexander V. Chernikov
4fb3a8208c Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating
  various link headers for given interface. Right now there is support
  for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
  Other types are planned to support more complex calculation
  (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
  to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
  (like L2 header for route w/gateway) down to the stack eliminating the
  need for other lookups. It also brings us closer to more complex scenarios
  like transparently handling MPLS nexthops and tunnel interfaces.
  Last, but not least, it removes layering violation introduced by flowtable
  code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
  record. Interface link address change are handled by re-calculating
  headers for all lles based on if_lladdr event. After these changes,
  arpresolve()/nd6_resolve() returns full pre-calculated header for
  supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
  returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
  compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
  difference is that interface source mac has to be filled by OS for
  AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
  BPF and not pollute if_output() routines. Convert BPF to pass prepend data
  via new 'struct route' mechanism. Note that it does not change
  non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
  It is not needed for ethernet anymore. The only remaining FDDI user is
  dev/pdq mostly untouched since 2007. FDDI support was eliminated from
  OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing)
  rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
  header data from that lle.

Differential Revision:	https://reviews.freebsd.org/D4102
2015-12-31 05:03:27 +00:00
Navdeep Parhar
e3148e46b2 cxgbei: Hardware accelerated iSCSI target and initiator for TOE capable
cards supported by cxgbe(4).

On the host side this driver interfaces with the storage stack via the
ICL (iSCSI Common Layer) in the kernel.  On the wire the traffic is
standard iSCSI (SCSI over TCP as per RFC 3720/7143 etc.) that
interoperates with all other standards compliant implementations.  The
driver is layered on top of the TOE driver (t4_tom) and promotes
connections being handled by t4_tom to iSCSI ULP (Upper Layer Protocol)
mode.  Hardware assistance in this mode includes:

- Full TCP processing.
- iSCSI PDU identification and recovery within the TCP stream.
- Header and/or data digest insertion (tx) and verification (rx).
- Zero copy (both tx and rx).

Man page will follow in a separate commit in a couple of weeks.

Relnotes:	Yes
Sponsored by:	Chelsio Communications
2015-12-26 06:05:21 +00:00