229 Commits

Author SHA1 Message Date
np
a4e0ce3808 cxgbe(4): Display CF facility correctly in the device log.
MFC after:	3 days
2014-07-15 18:24:41 +00:00
np
d6f3d65931 Allow multi-byte reads in the private CHELSIO_T4_GET_I2C ioctl. The
firmware allows up to 48B to be read this way but the driver limits
itself to 8B at a time to remain compatible with old cxgbetool
binaries.

MFC after:	1 week
2014-07-15 01:03:29 +00:00
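The 8-byte per-call limit described in d6f3d65931 means a longer read has to be split by the caller. A minimal sketch of that splitting follows; it does not reproduce the driver or cxgbetool code. The `i2c_read_fn` callback, `i2c_read_all()`, and the fake device are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define I2C_CHUNK_MAX 8	/* per-call limit quoted in the commit message */

/* Hypothetical low-level read: at most I2C_CHUNK_MAX bytes per call. */
typedef int (*i2c_read_fn)(void *ctx, unsigned offset, uint8_t *buf, size_t len);

/* Read len bytes starting at offset by issuing chunks of at most 8 bytes. */
static int
i2c_read_all(i2c_read_fn rd, void *ctx, unsigned offset, uint8_t *buf, size_t len)
{
	while (len > 0) {
		size_t n = len < I2C_CHUNK_MAX ? len : I2C_CHUNK_MAX;
		int rc = rd(ctx, offset, buf, n);

		if (rc != 0)
			return (rc);
		offset += (unsigned)n;
		buf += n;
		len -= n;
	}
	return (0);
}

/* Fake device used only to exercise the loop above. */
struct fake_dev {
	uint8_t mem[64];
	unsigned calls;
};

static int
fake_read(void *ctx, unsigned offset, uint8_t *buf, size_t len)
{
	struct fake_dev *d = ctx;

	/* Reject anything a real 8-byte-limited interface would reject. */
	if (len == 0 || len > I2C_CHUNK_MAX || offset + len > sizeof(d->mem))
		return (-1);
	memcpy(buf, d->mem + offset, len);
	d->calls++;
	return (0);
}
```

A 20-byte read through this helper issues three calls (8 + 8 + 4 bytes).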
np
be7d2f5cfc cxgbe(4): Add an iSCSI softc to the adapter structure.
2014-07-11 21:02:54 +00:00
glebius
0236597739 All mbuf external free functions never fail, so let them be void.
Sponsored by:	Nginx, Inc.
2014-07-11 13:58:48 +00:00
hselasky
35b126e324 Pull in r267961 and r267973 again. Fix for issues reported will follow.
2014-06-28 03:56:17 +00:00
gjb
fc21f40567 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
hselasky
bd1ed65f0f Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating the children of a static/extern SYSCTL
node. This operation should probably be factored out into a common
macro, since some device drivers use it. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection with removing unneeded
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, since malloc()/free() are
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
np
d532d18d9f cxgbe(4): Update the bundled T4 and T5 firmwares to versions 1.11.27.0.
Obtained from:	Chelsio
MFC after:	3 days
2014-06-22 23:40:20 +00:00
np
f8549e28ab Consider the total number of descriptors available (and not just those
that are ready to be reclaimed) when deciding whether to resume tx after
a stall.

MFC after:	3 days
2014-06-20 20:28:46 +00:00
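The distinction drawn in f8549e28ab can be sketched as two predicates. This is an illustration under assumed names, not the driver's code: `struct tx_ring_state`, its fields, and the threshold parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of a stalled tx ring. */
struct tx_ring_state {
	unsigned free_descs;	/* descriptors already free for new work */
	unsigned reclaimable;	/* completed by hardware, not yet reclaimed */
};

/* Older style of check: looks only at what could be reclaimed right now. */
static bool
resume_reclaimable_only(const struct tx_ring_state *r, unsigned threshold)
{
	return (r->reclaimable >= threshold);
}

/* Check described in the commit: count everything that is available. */
static bool
resume_total_available(const struct tx_ring_state *r, unsigned threshold)
{
	return (r->free_descs + r->reclaimable >= threshold);
}
```

With 20 free descriptors, 4 reclaimable ones, and a threshold of 16, the first predicate keeps the queue stalled while the second allows it to resume.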
np
b95e9c279b cxgbe(4): Fix bug in the fast rx buffer recycle path. In some cases rx
buffers were getting recycled when they should have been left alone.

MFC after:	3 days
2014-06-18 00:16:35 +00:00
attilio
2802c525ad - Modify vm_page_unwire() and vm_page_enqueue() to directly accept
the queue on which to enqueue pages that are going to be unwired.
- Add stronger checks to the enqueue/dequeue for the pagequeues when
  adding and removing pages to them.

Of course, for unmanaged pages the queue parameter of vm_page_unwire() will
be ignored, just as the active parameter is today.
This makes adding new pagequeues quicker.

This change effectively modifies the KPI.  __FreeBSD_version will,
however, be bumped only when the full cache of free pages is
evicted.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2014-06-16 18:15:27 +00:00
np
2eb27b8587 cxgbe(4): Properly account for the freelist buffers used when returning
early from service_iq due to a budget restriction.  This fixes a potential
rx hang when using INTx.

MFC after:	3 days
2014-06-05 00:38:32 +00:00
np
851231354f cxgbe(4): Fix a NULL dereference when the very first call to
get_scatter_segment() in get_fl_payload() fails.  While here,
fix the code to adjust fl_bufs_used when a failure occurs for
any other scatter segment.

MFC after:	3 days
2014-05-30 22:59:45 +00:00
np
6e9f44d85d cxgbe(4): netmap support for Terminator 5 (T5) based 10G/40G cards.
Netmap gets its own hardware-assisted virtual interface and won't take
over or disrupt the "normal" interface in any way.  You can use both
simultaneously.

For kernels with DEV_NETMAP, cxgbe(4) carves out an ncxl<N> interface
(note the 'n' prefix) in the hardware to accompany each cxl<N>
interface.  These two ifnet's per port share the same wire but really
are separate interfaces in the hardware and software.  Each gets its own
L2 MAC addresses (unicast and multicast), MTU, checksum caps, etc.  You
should run netmap on the 'n' interfaces only, that's what they are for.

With this, pkt-gen is able to transmit > 45Mpps out of a single 40G port
of a T580 card.  2 port tx is at ~56Mpps total (28M + 28M) as of now.
Single port receive is at 33Mpps but this is very much a work in
progress.  I expect it to be closer to 40Mpps once done.  In any case
the current effort can already saturate multiple 10G ports of a T5 card
at the smallest legal packet size.  T4 gear is totally untested.

trantor:~# ./pkt-gen -i ncxl0 -f tx -D 00:07:43:ab:cd:ef
881.952141 main [1621] interface is ncxl0
881.952250 extract_ip_range [275] range is 10.0.0.1:0 to 10.0.0.1:0
881.952253 extract_ip_range [275] range is 10.1.0.1:0 to 10.1.0.1:0
881.962540 main [1804] mapped 334980KB at 0x801dff000
Sending on netmap:ncxl0: 4 queues, 1 threads and 1 cpus.
10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> 00:07:43:ab:cd:ef)
881.962562 main [1882] Sending 512 packets every  0.000000000 s
881.962563 main [1884] Wait 2 secs for phy reset
884.088516 main [1886] Ready...
884.088535 nm_open [457] overriding ifname ncxl0 ringid 0x0 flags 0x1
884.088607 sender_body [996] start
884.093246 sender_body [1064] drop copy
885.090435 main_thread [1418] 45206353 pps (45289533 pkts in 1001840 usec)
886.091600 main_thread [1418] 45322792 pps (45375593 pkts in 1001165 usec)
887.092435 main_thread [1418] 45313992 pps (45351784 pkts in 1000834 usec)
888.094434 main_thread [1418] 45315765 pps (45406397 pkts in 1002000 usec)
889.095434 main_thread [1418] 45333218 pps (45378551 pkts in 1001000 usec)
890.097434 main_thread [1418] 45315247 pps (45405877 pkts in 1002000 usec)
891.099434 main_thread [1418] 45326515 pps (45417168 pkts in 1002000 usec)
892.101434 main_thread [1418] 45333039 pps (45423705 pkts in 1002000 usec)
893.103434 main_thread [1418] 45324105 pps (45414708 pkts in 1001999 usec)
894.105434 main_thread [1418] 45318042 pps (45408723 pkts in 1002001 usec)
895.106434 main_thread [1418] 45332430 pps (45377762 pkts in 1001000 usec)
896.107434 main_thread [1418] 45338072 pps (45383410 pkts in 1001000 usec)
...

Relnotes:	Yes
Sponsored by:	Chelsio Communications.
2014-05-27 18:18:41 +00:00
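The "smallest legal packet size" claim in 6e9f44d85d can be checked with standard Ethernet arithmetic: a 64-byte minimum frame plus 20 bytes of per-frame wire overhead (8 bytes of preamble and start-of-frame delimiter, 12 bytes of inter-frame gap). The helper below is a sketch; `line_rate_pps()` is a name made up for this example and ignores any encoding overhead beyond those 20 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Preamble + SFD (8 bytes) and inter-frame gap (12 bytes). */
#define ETH_WIRE_OVERHEAD 20

/* Theoretical maximum frames per second on a link of link_bps bits/s. */
static uint64_t
line_rate_pps(uint64_t link_bps, unsigned frame_bytes)
{
	assert(frame_bytes >= 64);	/* minimum Ethernet frame, FCS included */
	return (link_bps / ((uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD) * 8));
}
```

Under these assumptions a 10G port tops out near 14.88 Mpps and a 40G port near 59.52 Mpps at 64-byte frames, so the roughly 45 Mpps in the pkt-gen output corresponds to about three 10G ports of minimum-size traffic.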
bz
c4e312930b Move the tcp_fields_to_host() and tcp_fields_to_net() (inline)
functions to the tcp_var.h header file in order to avoid further
duplication with upcoming commits.

Reviewed by:	np
MFC after:	2 weeks
2014-05-23 20:15:01 +00:00
np
ba421a2e25 cxgbe(4): Remove stray if_up from the code that creates the tracing ifnet.
2014-05-23 01:45:44 +00:00
emax
7967459eb0 use correct (integer) type for the temperature sysctl
Reviewed by:	np, scottl
Obtained from:	Netflix
MFC after:	3 days
2014-04-17 19:29:15 +00:00
np
220e798d63 cxgbe(4): Recognize the "spider" configuration where a T5 card's 40G
QSFP port is presented as 4 distinct 10G SFP+ ports to the driver.

MFC after:	2 weeks
2014-03-21 00:56:56 +00:00
np
6b585f6346 cxgbe(4): Use ifi_oqdrops in if_data to count drops in the tx path.
2014-03-20 02:28:05 +00:00
np
9ce120962b cxgbe(4): if_iqdrops statistic should include tunnel congestion drops.
MFC after:	1 week
2014-03-20 01:58:04 +00:00
np
6e8d0a1f82 cxgbe(4): significant rx rework.
- More flexible cluster size selection, including the ability to fall
  back to a safe cluster size (PAGE_SIZE from zone_jumbop by default) in
  case an allocation of a larger size fails.
- A single get_fl_payload() function that assembles the payload into an
  mbuf chain for any kind of freelist.  This replaces two variants: one
  for freelists with buffer packing enabled and another for those without.
- Buffer packing with any sized cluster.  It was limited to 4K clusters
  only before this change.
- Enable buffer packing for TOE rx queues as well.
- Statistics and tunables to go with all these changes.  The driver's
  man page will be updated separately.

MFC after:	5 weeks
2014-03-18 20:14:13 +00:00
dim
9a614d1229 In cxgbe, conditionalize the t4_pgprot_wc() function, since it is only
used when DOT5 is defined.

Reviewed by:	np
MFC after:	3 days
2014-02-14 23:38:42 +00:00
scottl
afffae9118 Add a new sysctl, dev.cxgbe.N.rsrv_noflow, and a companion tunable,
hw.cxgbe.rsrv_noflow.  When set, queue 0 of the port is reserved for
TX packets without a flowid.  The hash value of packets with a flowid
is bumped up by 1.  The intent is to provide a private queue for
link-level packets like LACP that is unlikely to overflow or suffer
deep queue latency.

Reviewed by:	np
Obtained from:	Netflix
MFC after:	3 days
2014-02-06 18:40:38 +00:00
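The queue selection described in afffae9118 can be sketched as follows. This is an illustration rather than the driver's transmit path: `select_txq()` and its parameters are hypothetical, and spreading flows with a modulo over the remaining queues is an assumption about how "bumped up by 1" is realized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Pick a tx queue index in [0, ntxq).
 * With rsrv_noflow set, queue 0 carries only packets without a flowid
 * and hashed traffic is shifted up by one queue.
 */
static unsigned
select_txq(bool has_flowid, uint32_t flowid, unsigned ntxq, bool rsrv_noflow)
{
	unsigned reserved = rsrv_noflow ? 1 : 0;

	assert(ntxq > reserved);	/* need at least one queue for flows */
	if (!has_flowid)
		return (0);
	return (flowid % (ntxq - reserved) + reserved);
}
```

With four queues and the reservation enabled, a packet without a flowid lands on queue 0 and hashed packets are spread over queues 1 through 3, so link-level traffic such as LACP never shares a queue with bulk flows.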
np
34a66678cc cxgbe(4): Use the rx channel map (instead of the tx channel map) as the
congestion channel map.

MFC after:	1 week
2014-02-06 03:30:12 +00:00
np
bbaa0236fd cxgbe(4): The T5 allows for a different freelist starvation threshold
for queues with buffer packing.  Use the correct value to calculate a
freelist's low water mark.

MFC after:	1 week
2014-02-06 03:21:43 +00:00
np
2bbe4479e0 cxgbe(4): Use the port's tx channel to identify it to t4_clr_port_stats.
MFC after:	3 days
2014-02-06 02:34:29 +00:00
adrian
aa67a2f114 Add an option to enable or disable the small RX packet copying that
is done to improve performance of small frames.

When doing RX packing, the RX copying isn't necessarily required.

Reviewed by:	np
2014-01-02 23:23:33 +00:00
np
b2e6c8c6a2 Do not create a hardware IPv6 server if the listen address is not
in6addr_any and is not in the CLIP table either.  This fixes a reported
TOE+IPv6 NULL-dereference panic in do_pass_open_rpl().

While here, stop creating hardware servers for any loopback address.
It's just a waste of server tids.

MFC after:	1 week
2013-12-17 21:41:23 +00:00
np
0a7078d856 Read card capabilities after firmware initialization, instead of setting
them up as part of firmware initialization (which the driver gets to do
only if it's the master driver).

Read the range of tids available for the ETHOFLD functionality if it's
enabled.

New is_ftid() and is_etid() functions to test whether a tid falls within
the range of filter tids or ETHOFLD tids respectively.

MFC after:	2 weeks
2013-12-14 03:08:03 +00:00
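The range tests named in 0a7078d856 amount to a bounds check against a base and a count. A minimal sketch, assuming a hypothetical `struct tid_ranges`; the function names come from the commit message, but the signatures and field names here are illustrative and may not match the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of the tid ranges read from the card. */
struct tid_ranges {
	unsigned ftid_base;	/* first filter tid */
	unsigned nftids;	/* number of filter tids */
	unsigned etid_base;	/* first ETHOFLD tid */
	unsigned netids;	/* number of ETHOFLD tids */
};

/* True if tid lies in [base, base + count). */
static bool
tid_in_range(unsigned tid, unsigned base, unsigned count)
{
	return (tid >= base && tid - base < count);
}

static bool
is_ftid(const struct tid_ranges *t, unsigned tid)
{
	return (tid_in_range(tid, t->ftid_base, t->nftids));
}

static bool
is_etid(const struct tid_ranges *t, unsigned tid)
{
	return (tid_in_range(tid, t->etid_base, t->netids));
}
```

Writing the check as `tid - base < count` avoids overflow in `base + count`, and a count of zero (feature disabled) makes the test false for every tid.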
adrian
a960b4929d Print out the full PCIe link negotiation during dmesg.
I found this useful when checking whether a NIC is in a PCIE 3.0 8x slot
or not.

Reviewed by:	np
Sponsored by:	Netflix, inc.
2013-12-10 00:07:04 +00:00
np
3bf38b9b87 Unstaticize t4_list and t4_uld_list. This works around a clang
annoyance[1] and allows kgdb to find these symbols.

[1] http://lists.freebsd.org/pipermail/freebsd-hackers/2012-November/041166.html

MFC after:	3 days
2013-12-09 23:33:57 +00:00
np
6666e32367 cxgbe(4): save a copy of the RSS map for each port for the driver's use.
2013-12-08 17:47:37 +00:00
np
6221ad9050 cxgbe(4): T4_SET_SCHED_CLASS and T4_SET_SCHED_QUEUE ioctls to program
scheduling classes in the chip and to bind tx queue(s) to a scheduling
class respectively.  These can be used for various kinds of tx traffic
throttling (to force selected tx queues to drain at a fixed Kbps rate,
or a % of the port's total bandwidth, or at a fixed pps rate, etc.).

Obtained from:	Chelsio
2013-12-03 18:34:52 +00:00
np
7183522461 Disable an assertion that relies on some code[1] that isn't in HEAD yet.
[1] http://lists.freebsd.org/pipermail/freebsd-net/2013-August/036573.html
2013-11-27 19:54:19 +00:00
np
3736475696 cxgbe(4): update the internal list of device features.
MFC after:	3 days
2013-11-21 20:07:58 +00:00
np
df7d7bf4a5 cxgbe(4): Tidy up the display for payload memory statistics (pm_stats).
# sysctl -n dev.t4nex.0.misc.pm_stats
# sysctl -n dev.t5nex.0.misc.pm_stats

MFC after:	1 week
2013-11-07 00:25:49 +00:00
np
938566d71a cxgbe(4): Exclude MPS_RPLC_MAP_CTL (0x11114) from the register dump. Turns
out it's a write-only register with strange side effects on read.

Submitted by:	gnn
MFC after:	3 days
2013-11-04 21:06:21 +00:00
glebius
9e01f79e97 - Provide necessary includes.
- Remove unnecessary includes.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-29 11:17:49 +00:00
glebius
f469ae1d45 Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
glebius
ff6e113f1b r48589 promised to remove the implicit inclusion of if_var.h soon. Prepare
for this event by adding if_var.h to files that do need it. Also, include
all headers that are currently pulled in only through implicit pollution via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
np
a3b6ec336d Fix typo in previous commit.
2013-10-18 00:00:08 +00:00
np
f8660440c3 iw_cxgbe should have a dependency on t4nex.
Reported by:	trasz@
2013-10-17 23:57:17 +00:00
np
188c164b5a iw_cxgbe: iWARP driver for Chelsio T4/T5 chips. This is a straight port
of the iw_cxgb4 found in OFED distributions.

Obtained from:	Chelsio
2013-10-17 18:37:25 +00:00
np
39c3801927 cxgbe(4): Store the log2 of the # of doorbells per BAR2 page for both
ingress and egress queues, and for both T4 and T5.  These values are
used by the T4/T5 iWARP driver.
2013-10-14 23:32:56 +00:00
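Storing the log2 of a per-page count, as 39c3801927 does, lets a consumer locate a queue's doorbell with a shift and a mask instead of a division. A minimal sketch, assuming doorbells are laid out contiguously across BAR2 pages; `struct db_loc`, `db_locate()`, and the parameter names are hypothetical and not taken from the driver or from iw_cxgbe.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result: which BAR2 page, and which doorbell slot in it. */
struct db_loc {
	uint32_t page;
	uint32_t slot;
};

/* Split a queue id using the stored log2 of doorbells per page. */
static struct db_loc
db_locate(uint32_t qid, unsigned log2_per_page)
{
	struct db_loc loc;

	assert(log2_per_page < 32);
	loc.page = qid >> log2_per_page;
	loc.slot = qid & ((UINT32_C(1) << log2_per_page) - 1);
	return (loc);
}
```

For example, with 8 doorbells per page (log2 of 3), queue 37 maps to page 4, slot 5.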
np
04edd1ec6f cxgbe(4): Update T4 and T5 firmwares to 1.9.12.0
2013-10-14 21:25:07 +00:00
glebius
75528d8e36 There are some high performance NICs that count statistics in hardware,
and there are ifnets that do that via counter(9). Provide a flag that
would skip the cache line thrashing '+=' operation in ether_input().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
Reviewed by:	melifaro, adrian
Approved by:	re (marius)
2013-10-09 19:04:40 +00:00
dim
2b5c0ebf70 Fix kernel build on amd64 after r256118, since the machine/md_var.h
header is not implicitly included there.  So include it explicitly.

Approved by:	re (delphij)
Pointy hat to:	dim
MFC after:	3 days
X-MFC-With:	r256118
2013-10-07 22:30:03 +00:00
dim
1b35fd5d4c Remove redundant declaration of cpu_clflush_line_size in
sys/dev/cxgbe/t4_sge.c, to silence a gcc warning.

Approved by:	re (gjb)
MFC after:	3 days
2013-10-07 16:56:56 +00:00
np
a9b6160aa1 Rework the tx credit mechanism between the cxgbe/tom driver
and the card.  This helps smooth out some burstiness in the
exchange.

Approved by:	re (glebius)
2013-09-09 04:38:57 +00:00
np
279a0ac6ad Fix a miscalculation that caused cxgbe/tom to auto-increment
a TOE socket's tx buffer size too aggressively.

Approved by:	re (delphij)
2013-09-09 00:16:59 +00:00