155 Commits

Author SHA1 Message Date
np
92e828b60a cxgbe: no need to display the per-lane GT/s rating of the pcie link.
MFC after:	1 week
2015-06-01 03:24:39 +00:00
jkim
318c4f97e6 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
glebius
8777a4e32f Don't use ifm_data. It was used only for self checking debug.
Reviewed by:	np
2015-04-26 21:31:30 +00:00
np
265b7b076b cxgbe(4): Do not call sbuf_trim on an sbuf with a drain function.
MFC after:	1 week
2015-03-23 23:06:32 +00:00
ian
0dd684d23f Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl
strings returned to userland include the nulterm byte.

Some uses of sbuf_new_for_sysctl() write binary data rather than strings;
clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in
those cases.  (Note that the sbuf code still automatically adds a nulterm
byte in sbuf_finish(), but since it's not included in the length it won't
get copied to userland along with the binary data.)

Remove explicit adding of a nulterm byte in a couple places now that it
gets done automatically by the sbuf drain code.

PR:		195668
2015-03-14 17:08:28 +00:00
ian
470fe38d1f Revert r279934, r279938; this is going to be fixed in sbuf instead.
PR:		195668
2015-03-14 13:04:39 +00:00
np
43c1b0e92f cxgbe(4): fix if_media handling for T520-BT cards. 1Gbps and 100Mbps
are valid for this card.

MFC after:	1 week
2015-03-14 00:02:53 +00:00
ian
d59418210a Nullterminate strings returned via sysctl.
PR:		195668
2015-03-12 18:22:20 +00:00
np
0dc33bb531 cxgbe(4): allow the SET_FILTER_MODE ioctl to change the mode when it's
safe to do so.

MFC after:	1 month
2015-02-10 01:16:43 +00:00
np
f975fdf494 cxgbe(4): tidy up some of the interaction between the Upper Layer
Drivers (ULDs) and the base if_cxgbe driver.

Track the per-adapter activation of ULDs in a new "active_ulds" field.
This was done pretty arbitrarily before this change -- via TOM_INIT_DONE
in adapter->flags for TOM, and the (1 << MAX_NPORTS) bit in
adapter->offload_map for iWARP.

iWARP and hw-accelerated iSCSI rely on the TOE (supported by the TOM
ULD).  The rules are:
a) If the iWARP and/or iSCSI ULDs are available when TOE is enabled then
   iWARP and/or iSCSI are enabled too.
b) When the iWARP and iSCSI modules are loaded they go looking for
   adapters with TOE enabled and enable themselves on that adapter.
c) You cannot deactivate or unload the TOM module from underneath iWARP
   or iSCSI.  Any such attempt will fail with EBUSY.

MFC after:	2 weeks
2015-02-08 09:28:55 +00:00
np
c3605e4a62 cxgbe(4): adapter_full_init is always a synchronized operation.
MFC after:	1 week
2015-02-08 08:52:18 +00:00
np
dda106938e cxgbe(4): a change to the synchronization rules within the the driver.
This is purely cosmetic because the new rules are already followed.

MFC after:	1 week
2015-02-08 08:42:45 +00:00
np
4542b7bb6f cxgbe(4): fix a test made while enabling TOE.
MFC after:	1 week
2015-02-07 01:50:32 +00:00
jhb
c5ac2eb628 Fix a couple of panics when detaching from a cxgbe/cxl interface that was
never brought up:
- Allow NULL to be passed to sglist_free().
- Don't try to stop an interface that was never fully initialized.

Reviewed by:	np
2015-01-26 16:26:28 +00:00
np
8ad5f160bf Allow cxgbe(4) to be built on i386. Driver attach will succeed only on a subset
of i386 systems.
2015-01-16 01:32:40 +00:00
np
3b561af844 cxgbe(4): fix the description of a strange bunch of counters.
MFC after:	1 week
2015-01-05 23:43:24 +00:00
np
06bf9630b6 cxgbe/tom: do not engage the TOE's payload chopper for payload < 2 MSS
or for 10Gbps ports.

MFC after:	2 weeks
2015-01-03 00:09:21 +00:00
np
448a336731 cxgbe(4): remove buf_ring specific restriction on the txq size.
MFC after:	2 months
2015-01-01 09:33:46 +00:00
np
b2f095aaa6 cxgbe(4): major tx rework.
a) Front load as much work as possible in if_transmit, before any driver
lock or software queue has to get involved.

b) Replace buf_ring with a brand new mp_ring (multiproducer ring).  This
is specifically for the tx multiqueue model where one of the if_transmit
producer threads becomes the consumer and other producers carry on as
usual.  mp_ring is implemented as standalone code and it should be
possible to use it in any driver with tx multiqueue.  It also has:
- the ability to enqueue/dequeue multiple items.  This might become
  significant if packet batching is ever implemented.
- an abdication mechanism to allow a thread to give up writing tx
  descriptors and have another if_transmit thread take over.  A thread
  that's writing tx descriptors can end up doing so for an unbounded
  time period if a) there are other if_transmit threads continuously
  feeding the sofware queue, and b) the chip keeps up with whatever the
  thread is throwing at it.
- accurate statistics about interesting events even when the stats come
  at the expense of additional branches/conditional code.

The NIC txq lock is uncontested on the fast path at this point.  I've
left it there for synchronization with the control events (interface
up/down, modload/unload).

c) Add support for "type 1" coalescing work request in the normal NIC tx
path.  This work request is optimized for frames with a single item in
the DMA gather list.  These are very common when forwarding packets.
Note that netmap tx in cxgbe already uses these "type 1" work requests.

d) Do not request automatic cidx updates every 32 descriptors.  Instead,
request updates via bits in individual work requests (still every 32
descriptors approximately).  Also, request an automatic final update
when the queue idles after activity.  This means NIC tx reclaim is still
performed lazily but it will catch up quickly as soon as the queue
idles.  This seems to be the best middle ground and I'll probably do
something similar for netmap tx as well.

e) Implement a faster tx path for WRQs (used by TOE tx and control
queues, _not_ by the normal NIC tx).  Allow work requests to be written
directly to the hardware descriptor ring if room is available.  I will
convert t4_tom and iw_cxgbe modules to this faster style gradually.

MFC after:	2 months
2014-12-31 23:19:16 +00:00
hselasky
12fec3618b Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
np
b2c3a88be4 cxgbe(4): figure out the max payload size and save it for later.
MFC after:	1 week
2014-11-19 20:16:56 +00:00
np
4ccd7dfa2c Fix some bad interaction between cxgbe(4) and lacp lagg(4) that could
leave a port permanently disabled when a copper cable is unplugged and
then plugged right back in.

lacp_linkstate goes looking for the current ifmedia on a link state
change and it could get stale information from cxgbe(4) on a module
unplug followed by replug.  The fix is to process module events before
link-state events within the driver, and to always rebuild the ifmedia
list on a module change event (instead of rebuilding it lazily).

Thanks to asomers@ for the problem report and detailed analysis to go
with it.

MFC after:	1 week
2014-11-12 23:29:22 +00:00
jhb
b3923faaf8 Add device ID for the T502-BT (dual-port 1G) adapter.
Reviewed by:	np
MFC after:	1 week
2014-11-11 20:05:50 +00:00
hselasky
49c137f7be Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
np
d8abaa7d21 cxgbe(4): implement if_get_counter. 2014-09-27 05:50:31 +00:00
np
43fb052c69 cxgbe(4): explicitly set various if_hw_tso* values.
MFC after:	3 days
2014-09-26 22:21:02 +00:00
np
f3631462c4 Make sure the adapter's management queue and the event queue are
available before any uppper layer driver (TOE, iWARP, or iSCSI)
registers with the base cxgbe(4) driver.

Submitted by:	Hariprasad at chelsio dot com
Reviewed by:	np@
2014-09-26 18:53:00 +00:00
np
c72cd1f162 cxgbe(4): Verify that the addresses in if_multiaddrs really are multicast
addresses.  (The chip doesn't really care, it's just that it needs to be
told explicitly if unicast DMACs are checked for "hits" in the hash that
is used after the TCAM entries are all used up).
2014-09-23 22:57:11 +00:00
np
ee3fde2213 cxgbe(4): add support for the SIOCGI2C ioctl. 2014-09-12 21:56:57 +00:00
np
cc95d45388 cxgbe(4): knobs to enable/disable PAUSE frame based flow control.
MFC after:	1 week
2014-09-12 05:25:56 +00:00
np
a24889f942 cxgbe(4): Let caller specify whether it's ok to sleep in
t4_sched_config and t4_sched_params.

MFC after:	2 weeks
2014-08-06 19:38:03 +00:00
np
adf16c3a4a cxgbe(4): Do not run any sleepable code in the SIOCSIFFLAGS handler when
IFF_PROMISC or IFF_ALLMULTI is being flipped.  bpf(4) holds its global
mutex around ifpromisc in at least the bpf_dtor path.

MFC after:	3 days
2014-08-04 22:32:16 +00:00
np
ff70733f8a Some hooks in cxgbe(4) for the offloaded iSCSI driver.
(I'm committing this on behalf of my colleagues in the Storage team
at Chelsio).

Submitted by:	Sreenivasa Honnur <shonnur at chelsio dot com>
Sponsored by:	Chelsio Communications.
2014-07-24 18:39:08 +00:00
np
153afd26ad cxgbe(4): Keep track of the clusters that have to be freed by the
custom free routine (rxb_free) in the driver.  Fail MOD_UNLOAD with
EBUSY if any such cluster has been handed up to the kernel but hasn't
been freed yet.  This prevents a panic later when the cluster finally
needs to be freed but rxb_free is gone from the kernel.

MFC after:	1 week
2014-07-23 22:29:22 +00:00
np
a4e0ce3808 cxgbe(4): Display CF facility correctly in the device log.
MFC after:	3 days
2014-07-15 18:24:41 +00:00
np
d6f3d65931 Allow multi-byte reads in the private CHELSIO_T4_GET_I2C ioctl. The
firmware allows up to 48B to be read this way but the driver limits
itself to 8B at a time to remain compatible with old cxgbetool
binaries.

MFC after:	1 week
2014-07-15 01:03:29 +00:00
np
6e9f44d85d cxgbe(4): netmap support for Terminator 5 (T5) based 10G/40G cards.
Netmap gets its own hardware-assisted virtual interface and won't take
over or disrupt the "normal" interface in any way.  You can use both
simultaneously.

For kernels with DEV_NETMAP, cxgbe(4) carves out an ncxl<N> interface
(note the 'n' prefix) in the hardware to accompany each cxl<N>
interface.  These two ifnet's per port share the same wire but really
are separate interfaces in the hardware and software.  Each gets its own
L2 MAC addresses (unicast and multicast), MTU, checksum caps, etc.  You
should run netmap on the 'n' interfaces only, that's what they are for.

With this, pkt-gen is able to transmit > 45Mpps out of a single 40G port
of a T580 card.  2 port tx is at ~56Mpps total (28M + 28M) as of now.
Single port receive is at 33Mpps but this is very much a work in
progress.  I expect it to be closer to 40Mpps once done.  In any case
the current effort can already saturate multiple 10G ports of a T5 card
at the smallest legal packet size.  T4 gear is totally untested.

trantor:~# ./pkt-gen -i ncxl0 -f tx -D 00:07:43🆎cd:ef
881.952141 main [1621] interface is ncxl0
881.952250 extract_ip_range [275] range is 10.0.0.1:0 to 10.0.0.1:0
881.952253 extract_ip_range [275] range is 10.1.0.1:0 to 10.1.0.1:0
881.962540 main [1804] mapped 334980KB at 0x801dff000
Sending on netmap:ncxl0: 4 queues, 1 threads and 1 cpus.
10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> 00:07:43🆎cd:ef)
881.962562 main [1882] Sending 512 packets every  0.000000000 s
881.962563 main [1884] Wait 2 secs for phy reset
884.088516 main [1886] Ready...
884.088535 nm_open [457] overriding ifname ncxl0 ringid 0x0 flags 0x1
884.088607 sender_body [996] start
884.093246 sender_body [1064] drop copy
885.090435 main_thread [1418] 45206353 pps (45289533 pkts in 1001840 usec)
886.091600 main_thread [1418] 45322792 pps (45375593 pkts in 1001165 usec)
887.092435 main_thread [1418] 45313992 pps (45351784 pkts in 1000834 usec)
888.094434 main_thread [1418] 45315765 pps (45406397 pkts in 1002000 usec)
889.095434 main_thread [1418] 45333218 pps (45378551 pkts in 1001000 usec)
890.097434 main_thread [1418] 45315247 pps (45405877 pkts in 1002000 usec)
891.099434 main_thread [1418] 45326515 pps (45417168 pkts in 1002000 usec)
892.101434 main_thread [1418] 45333039 pps (45423705 pkts in 1002000 usec)
893.103434 main_thread [1418] 45324105 pps (45414708 pkts in 1001999 usec)
894.105434 main_thread [1418] 45318042 pps (45408723 pkts in 1002001 usec)
895.106434 main_thread [1418] 45332430 pps (45377762 pkts in 1001000 usec)
896.107434 main_thread [1418] 45338072 pps (45383410 pkts in 1001000 usec)
...

Relnotes:	Yes
Sponsored by:	Chelsio Communications.
2014-05-27 18:18:41 +00:00
emax
7967459eb0 use correct (integer) type for the temperature sysctl
Reviewed by:	np, scottl
Obtained from:	Netflix
MFC after:	3 days
2014-04-17 19:29:15 +00:00
np
220e798d63 cxgbe(4): Recognize the "spider" configuration where a T5 card's 40G
QSFP port is presented as 4 distinct 10G SFP+ ports to the driver.

MFC after:	2 weeks
2014-03-21 00:56:56 +00:00
np
6b585f6346 cxgbe(4): Use ifi_oqdrops in if_data to count drops in the tx path. 2014-03-20 02:28:05 +00:00
np
9ce120962b cxgbe(4): if_iqdrops statistic should include tunnel congestion drops.
MFC after:	1 week
2014-03-20 01:58:04 +00:00
np
6e8d0a1f82 cxgbe(4): significant rx rework.
- More flexible cluster size selection, including the ability to fall
  back to a safe cluster size (PAGE_SIZE from zone_jumbop by default) in
  case an allocation of a larger size fails.
- A single get_fl_payload() function that assembles the payload into an
  mbuf chain for any kind of freelist.  This replaces two variants: one
  for freelists with buffer packing enabled and another for those without.
- Buffer packing with any sized cluster.  It was limited to 4K clusters
  only before this change.
- Enable buffer packing for TOE rx queues as well.
- Statistics and tunables to go with all these changes.  The driver's
  man page will be updated separately.

MFC after:	5 weeks
2014-03-18 20:14:13 +00:00
scottl
afffae9118 Add a new sysctl, dev.cxgbe.N.rsrv_noflow, and a companion tunable,
hw.cxgbe.rsrv_noflow.  When set, queue 0 of the port is reserved for
TX packets without a flowid.  The hash value of packets with a flowid
is bumped up by 1.  The intent is to provide a private queue for
link-level packets like LACP that is unlikely to overflow or suffer
deep queue latency.

Reviewed by:	np
Obtained from:	Netflix
MFC after:	3 days
2014-02-06 18:40:38 +00:00
np
2bbe4479e0 cxgbe(4): Use the port's tx channel to identify it to t4_clr_port_stats.
MFC after:	3 days
2014-02-06 02:34:29 +00:00
adrian
aa67a2f114 Add an option to enable or disable the small RX packet copying that
is done to improve performance of small frames.

When doing RX packing, the RX copying isn't necessarily required.

Reviewed by:	np
2014-01-02 23:23:33 +00:00
np
0a7078d856 Read card capabilities after firmware initialization, instead of setting
them up as part of firmware initialization (which the driver gets to do
only if it's the master driver).

Read the range of tids available for the ETHOFLD functionality if it's
enabled.

New is_ftid() and is_etid() functions to test whether a tid falls within
the range of filter tids or ETHOFLD tids respectively.

MFC after:	2 weeks
2013-12-14 03:08:03 +00:00
adrian
a960b4929d Print out the full PCIe link negotiation during dmesg.
I found this useful when checking whether a NIC is in a PCIE 3.0 8x slot
or not.

Reviewed by:	np
Sponsored by:	Netflix, inc.
2013-12-10 00:07:04 +00:00
np
3bf38b9b87 Unstaticize t4_list and t4_uld_list. This works around a clang
annoyance[1] and allows kgdb to find these symbols.

[1] http://lists.freebsd.org/pipermail/freebsd-hackers/2012-November/041166.html

MFC after:	3 days
2013-12-09 23:33:57 +00:00
np
6666e32367 cxgbe(4): save a copy of the RSS map for each port for the driver's use. 2013-12-08 17:47:37 +00:00
np
6221ad9050 cxgbe(4): T4_SET_SCHED_CLASS and T4_SET_SCHED_QUEUE ioctls to program
scheduling classes in the chip and to bind tx queue(s) to a scheduling
class respectively.  These can be used for various kinds of tx traffic
throttling (to force selected tx queues to drain at a fixed Kbps rate,
or a % of the port's total bandwidth, or at a fixed pps rate, etc.).

Obtained from:	Chelsio
2013-12-03 18:34:52 +00:00