Commit Graph

302 Commits

Author SHA1 Message Date
Navdeep Parhar
1c468ed975 cxgbe(4): allow tx hardware checksumming on the netmap interface.
It is disabled by default but users can set IFCAP_TXCSUM on the
netmap ifnet (ifconfig ncxl0 txcsum) to override netmap and force
the hardware to calculate and insert proper IP and L4 checksums in
outbound frames.

MFC after:	2 weeks
2015-02-24 21:31:13 +00:00
Navdeep Parhar
1605bac6fb cxgbe(4): set up congestion management for netmap rx queues.
The hw.cxgbe.cong_drop knob controls the response of the chip when
netmap queues are congested.
2015-02-24 18:40:10 +00:00
Navdeep Parhar
b91cbdda76 cxgbe(4): do not set the netmap rxq interrupts on a hair-trigger.
MFC after:	2 weeks
2015-02-24 18:32:17 +00:00
Navdeep Parhar
b46fad7b6a cxgbe(4): wait for the hardware to catch up before destroying a netmap txq.
MFC after:	2 weeks
2015-02-24 18:22:24 +00:00
Navdeep Parhar
f8e30f9cf1 cxgbe(4): request an automatic tx update when a netmap txq idles.
MFC after:	2 weeks
2015-02-24 18:19:25 +00:00
Navdeep Parhar
c5bb375553 cxgbe(4): there is no need to force an "unimplemented" panic needlessly.
The calls to free_nm_txq and free_nm_rxq are made just a few lines prior
to the panic.
2015-02-20 22:57:54 +00:00
Hans Petter Selasky
b5c1e0cb8d Update the infiniband stack to Mellanox's OFED version 2.1.
Highlights:
 - Multiple verbs API updates
 - Support for RoCE, RDMA over ethernet

All hardware drivers depending on the common infiniband stack has been
updated aswell.

Discussed with:	np @
Sponsored by:	Mellanox Technologies
MFC after:	1 month
2015-02-17 08:40:27 +00:00
Navdeep Parhar
d5d9fbbae2 cxgbe(4): allow the SET_FILTER_MODE ioctl to change the mode when it's
safe to do so.

MFC after:	1 month
2015-02-10 01:16:43 +00:00
Navdeep Parhar
b3d44a6800 cxgbe(4): tidy up some of the interaction between the Upper Layer
Drivers (ULDs) and the base if_cxgbe driver.

Track the per-adapter activation of ULDs in a new "active_ulds" field.
This was done pretty arbitrarily before this change -- via TOM_INIT_DONE
in adapter->flags for TOM, and the (1 << MAX_NPORTS) bit in
adapter->offload_map for iWARP.

iWARP and hw-accelerated iSCSI rely on the TOE (supported by the TOM
ULD).  The rules are:
a) If the iWARP and/or iSCSI ULDs are available when TOE is enabled then
   iWARP and/or iSCSI are enabled too.
b) When the iWARP and iSCSI modules are loaded they go looking for
   adapters with TOE enabled and enable themselves on that adapter.
c) You cannot deactivate or unload the TOM module from underneath iWARP
   or iSCSI.  Any such attempt will fail with EBUSY.

MFC after:	2 weeks
2015-02-08 09:28:55 +00:00
Navdeep Parhar
319f290030 cxgbe(4): adapter_full_init is always a synchronized operation.
MFC after:	1 week
2015-02-08 08:52:18 +00:00
Navdeep Parhar
d86a5ff917 cxgbe(4): a change to the synchronization rules within the the driver.
This is purely cosmetic because the new rules are already followed.

MFC after:	1 week
2015-02-08 08:42:45 +00:00
Navdeep Parhar
cb6c101a86 cxgbe(4): fix a test made while enabling TOE.
MFC after:	1 week
2015-02-07 01:50:32 +00:00
Navdeep Parhar
85dc477798 cxgbe(4): Add a minimal if_cxl module that pulls in the real driver as
a dependency.  This ensures "ifconfig cxl<n> ..." does the right thing
even when it's run with no driver loaded.

if_cxl.ko is the tiniest module in /boot/kernel.

MFC after:	2 weeks
2015-02-06 01:10:04 +00:00
Navdeep Parhar
a2355dd909 cxgbe(4): reserve id for iSCSI upper layer driver. 2015-02-05 08:52:20 +00:00
John Baldwin
86f05ea6cf Lock the socket buffer before jumping to the 'out' label if sblock()
fails in t4_soreceive_ddp().
2015-01-26 16:32:41 +00:00
John Baldwin
de5a10ecbc - Update a disabled KASSERT() to use sbused() instead of accessing
the no-longer existant sb_cc sockbuf member.
- Use sbavail() instead of sbused() in t4_soreceive_ddp() to match the
  usage in soreceive_stream() on which it is based.

Discussed with:	glebius (2)
2015-01-26 16:29:14 +00:00
John Baldwin
4f621933a5 Fix a couple of panics when detaching from a cxgbe/cxl interface that was
never brought up:
- Allow NULL to be passed to sglist_free().
- Don't try to stop an interface that was never fully initialized.

Reviewed by:	np
2015-01-26 16:26:28 +00:00
Hans Petter Selasky
d39d7c8636 Add missing linuxapi module dependencies and always use the FreeBSD
"MODULE_VERSION" macro definition. Remove the redefinition of the
"MODULE_VERSION" macro from the Linux kernel compatibility API.

MFC after:	1 month
Reported by:	np@
Sponsored by:	Mellanox Technologies
2015-01-19 21:53:00 +00:00
Navdeep Parhar
88d7f6bddf Allow cxgbe(4) to be built on i386. Driver attach will succeed only on a subset
of i386 systems.
2015-01-16 01:32:40 +00:00
Navdeep Parhar
e503548810 cxgbe/iw_cxgbe: fix whitespace nit in r277102.
Reported by:	stefanf@
2015-01-13 16:18:31 +00:00
Navdeep Parhar
b3e112f962 cxgbe/iw_cxgbe: allow any size during the initial MPA exchange.
MFC after:	1 month
2015-01-13 01:40:12 +00:00
Navdeep Parhar
db8bcd1b21 cxgbe/tom: allocate page pod addresses instead of ppod#.
MFC after:	2 weeks
2015-01-07 06:20:33 +00:00
Navdeep Parhar
f8c479085f cxgbe/tom: use vmem(9) as the DDP page pod allocator.
MFC after:	1 month
2015-01-06 01:30:32 +00:00
Navdeep Parhar
008015d2f4 cxgbe(4): fix the description of a strange bunch of counters.
MFC after:	1 week
2015-01-05 23:43:24 +00:00
Navdeep Parhar
79b93bf6a3 cxgbe/tom: do not engage the TOE's payload chopper for payload < 2 MSS
or for 10Gbps ports.

MFC after:	2 weeks
2015-01-03 00:09:21 +00:00
Navdeep Parhar
402873f32a cxgbe/tom: fix the MSS calculation for IPv6 connections handled by the TOE.
MFC after:	1 week
2015-01-02 21:13:24 +00:00
Navdeep Parhar
dd1be4d418 cxgbe/tom: log some more details in send_flowc_wr.
MFC after:	1 week
2015-01-02 20:52:51 +00:00
Navdeep Parhar
255155fc91 cxgbe(4): remove buf_ring specific restriction on the txq size.
MFC after:	2 months
2015-01-01 09:33:46 +00:00
Navdeep Parhar
7951040f8a cxgbe(4): major tx rework.
a) Front load as much work as possible in if_transmit, before any driver
lock or software queue has to get involved.

b) Replace buf_ring with a brand new mp_ring (multiproducer ring).  This
is specifically for the tx multiqueue model where one of the if_transmit
producer threads becomes the consumer and other producers carry on as
usual.  mp_ring is implemented as standalone code and it should be
possible to use it in any driver with tx multiqueue.  It also has:
- the ability to enqueue/dequeue multiple items.  This might become
  significant if packet batching is ever implemented.
- an abdication mechanism to allow a thread to give up writing tx
  descriptors and have another if_transmit thread take over.  A thread
  that's writing tx descriptors can end up doing so for an unbounded
  time period if a) there are other if_transmit threads continuously
  feeding the sofware queue, and b) the chip keeps up with whatever the
  thread is throwing at it.
- accurate statistics about interesting events even when the stats come
  at the expense of additional branches/conditional code.

The NIC txq lock is uncontested on the fast path at this point.  I've
left it there for synchronization with the control events (interface
up/down, modload/unload).

c) Add support for "type 1" coalescing work request in the normal NIC tx
path.  This work request is optimized for frames with a single item in
the DMA gather list.  These are very common when forwarding packets.
Note that netmap tx in cxgbe already uses these "type 1" work requests.

d) Do not request automatic cidx updates every 32 descriptors.  Instead,
request updates via bits in individual work requests (still every 32
descriptors approximately).  Also, request an automatic final update
when the queue idles after activity.  This means NIC tx reclaim is still
performed lazily but it will catch up quickly as soon as the queue
idles.  This seems to be the best middle ground and I'll probably do
something similar for netmap tx as well.

e) Implement a faster tx path for WRQs (used by TOE tx and control
queues, _not_ by the normal NIC tx).  Allow work requests to be written
directly to the hardware descriptor ring if room is available.  I will
convert t4_tom and iw_cxgbe modules to this faster style gradually.

MFC after:	2 months
2014-12-31 23:19:16 +00:00
John Baldwin
5ad25ceb41 Check for SS_NBIO in so->so_state instead of sb->sb_flags in
soreceive_stream().

Differential Revision:	https://reviews.freebsd.org/D1299
Reviewed by:	bz, gnn
MFC after:	1 week
2014-12-15 17:52:08 +00:00
Navdeep Parhar
a7570ee305 Move KTR_CXGBE from t4_tom.h to adapter.h so that the base if_cxgbe
code can use it too.

MFC after:	1 week
2014-12-12 21:54:59 +00:00
Navdeep Parhar
b741402c40 cxgbe(4): allow the driver to use rx buffers that do not end on a pack
boundary.

MFC after:	2 weeks
2014-12-06 01:47:38 +00:00
Navdeep Parhar
e3207e1973 cxgbe(4): Allow for different pad and pack boundaries for different
adapters.  Set the pack boundary for T5 cards to be the same as the
PCIe max payload size.  The chip likes it this way.

In this revision the driver allocate rx buffers that align on both
boundaries.  This is not a strict requirement and a followup commit
will switch the driver to a more relaxed allocation strategy.

MFC after:	2 weeks
2014-12-06 00:13:56 +00:00
Hans Petter Selasky
c25290420e Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
Gleb Smirnoff
651e4e6a30 Merge from projects/sendfile: extend protocols API to support
sending not ready data:
o Add new flag to pru_send() flags - PRUS_NOTREADY.
o Add new protocol method pru_ready().

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-11-30 13:24:21 +00:00
Gleb Smirnoff
0f9d0a73a4 Merge from projects/sendfile:
o Introduce a notion of "not ready" mbufs in socket buffers.  These
mbufs are now being populated by some I/O in background and are
referenced outside.  This forces following implications:
- An mbuf which is "not ready" can't be taken out of the buffer.
- An mbuf that is behind a "not ready" in the queue neither.
- If sockbet buffer is flushed, then "not ready" mbufs shouln't be
  freed.

o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc.
  The sb_ccc stands for ""claimed character count", or "committed
  character count".  And the sb_acc is "available character count".
  Consumers of socket buffer API shouldn't already access them directly,
  but use sbused() and sbavail() respectively.
o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones
  with M_BLOCKED.
o New field sb_fnrdy points to the first not ready mbuf, to avoid linear
  search.
o New function sbready() is provided to activate certain amount of mbufs
  in a socket buffer.

A special note on SCTP:
  SCTP has its own sockbufs.  Unfortunately, FreeBSD stack doesn't yet
allow protocol specific sockbufs.  Thus, SCTP does some hacks to make
itself compatible with FreeBSD: it manages sockbufs on its own, but keeps
sb_cc updated to inform the stack of amount of data in them.  The new
notion of "not ready" data isn't supported by SCTP.  Instead, only a
mechanical substitute is done: s/sb_cc/sb_ccc/.
  A proper solution would be to take away struct sockbuf from struct
socket and allow protocols to implement their own socket buffers, like
SCTP already does.  This was discussed with rrs@.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-30 12:52:33 +00:00
Navdeep Parhar
729fee332b cxgbe(4): figure out the max payload size and save it for later.
MFC after:	1 week
2014-11-19 20:16:56 +00:00
Navdeep Parhar
aa8d1792d1 iw_cxgbe: don't forget to close the socket in c4iw_connect if soconnect
fails.

Submitted by:	hariprasad at chelsio dot com
2014-11-13 03:59:36 +00:00
Navdeep Parhar
05c4567dd9 Fix some bad interaction between cxgbe(4) and lacp lagg(4) that could
leave a port permanently disabled when a copper cable is unplugged and
then plugged right back in.

lacp_linkstate goes looking for the current ifmedia on a link state
change and it could get stale information from cxgbe(4) on a module
unplug followed by replug.  The fix is to process module events before
link-state events within the driver, and to always rebuild the ifmedia
list on a module change event (instead of rebuilding it lazily).

Thanks to asomers@ for the problem report and detailed analysis to go
with it.

MFC after:	1 week
2014-11-12 23:29:22 +00:00
Gleb Smirnoff
cfa6009e36 In preparation of merging projects/sendfile, transform bare access to
sb_cc member of struct sockbuf to a couple of inline functions:

sbavail() and sbused()

Right now they are equal, but once notion of "not ready socket buffer data",
will be checked in, they are going to be different.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-12 09:57:15 +00:00
John Baldwin
7bfc98355a Add device ID for the T502-BT (dual-port 1G) adapter.
Reviewed by:	np
MFC after:	1 week
2014-11-11 20:05:50 +00:00
Navdeep Parhar
62fc63abfb cxgbe(4): adjust PMRX and PMTX parameters.
MFC after:	1 week
2014-11-10 19:45:28 +00:00
Navdeep Parhar
527e4e62ac Always request a completion for every work request for iWARP. The
initial MPA exchange must be tracked this way so that t4_tom's state for
the tid is all clean at the time the tid transitions to RDMA mode.  Once
it does, t4_tom is out of the way and iw_cxgbe uses the qp endpoints
directly.

Sponsored by:	Chelsio Communications
2014-10-28 18:10:57 +00:00
Navdeep Parhar
3030c8bce9 iwcm_event status needs to be populated for close_complete_upcall
Submitted by:	Hariprasad at Chelsio dot com
Sponsored by:	Chelsio Communications
2014-10-27 23:11:48 +00:00
Navdeep Parhar
d25d06afc0 Some cxgbe/iw_cxgbe fixes:
- Free rt in c4iw_connect only if it is allocated.
- Call soclose instead of so_shutdown if there is an abort from the peer.
- Close socket and return failure if TOE is not enabled.

Submitted by:	Hariprasad at Chelsio dot com
Sponsored by:	Chelsio Communications
2014-10-27 22:22:46 +00:00
Navdeep Parhar
1284501329 cxgbe(4): bump up PF4's share of some global resources.
This increases the size of the per-port RSS slice and also allows the
driver to use a larger number of tx and rx queues.

MFC after:	2 weeks
2014-10-25 00:14:44 +00:00
Navdeep Parhar
53f49a7bd8 cxgbe/iw_cxgbe: wake up waiters after flushing the qp.
Obtained from:	Chelsio
2014-10-22 18:55:44 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Hans Petter Selasky
2c6eb461a7 Update the OFED Linux compatibility layer and
Mellanox hardware driver(s):

- Properly name an inclusion guard
- Fix compile warnings regarding unsigned enums
- Add two new sysctl nodes
- Remove all empty linux header files
- Make an error printout more verbose
- Use "mod_delayed_work()" instead of
  cancelling and starting a timeout.
- Implement more Linux scatterlist
  functions.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-15 13:40:29 +00:00
Navdeep Parhar
19abdd0654 cxgbe/tom: don't leak resources tied to an active open request that
cannot be sent to the chip because a prerequisite L2 resolution
failed.

Submitted by:	Hariprasad at chelsio dot com (original version)
MFC after:	2 weeks.
2014-10-07 21:26:22 +00:00