207 Commits

Author SHA1 Message Date
np
9e6f3aa5f7 MFC r266571, r266757, r268536, r269076, r269364, r269366, r269411,
r269413, r269428, r269440, r269537, r269644, r269731, and the cxgbe
portion of r270063.

r266571:
cxgbe(4): Remove stray if_up from the code that creates the tracing ifnet.

r266757:
cxgbe(4): netmap support for Terminator 5 (T5) based 10G/40G cards.
Netmap gets its own hardware-assisted virtual interface and won't take
over or disrupt the "normal" interface in any way.  You can use both
simultaneously.

For kernels with DEV_NETMAP, cxgbe(4) carves out an ncxl<N> interface
(note the 'n' prefix) in the hardware to accompany each cxl<N>
interface.  These two ifnets per port share the same wire but really
are separate interfaces in the hardware and software.  Each gets its own
L2 MAC addresses (unicast and multicast), MTU, checksum caps, etc.  You
should run netmap on the 'n' interfaces only; that's what they are for.

With this, pkt-gen is able to transmit > 45Mpps out of a single 40G port
of a T580 card.  2 port tx is at ~56Mpps total (28M + 28M) as of now.
Single port receive is at 33Mpps but this is very much a work in
progress.  I expect it to be closer to 40Mpps once done.  In any case
the current effort can already saturate multiple 10G ports of a T5 card
at the smallest legal packet size.  T4 gear is totally untested.

trantor:~# ./pkt-gen -i ncxl0 -f tx -D 00:07:43:ab:cd:ef
881.952141 main [1621] interface is ncxl0
881.952250 extract_ip_range [275] range is 10.0.0.1:0 to 10.0.0.1:0
881.952253 extract_ip_range [275] range is 10.1.0.1:0 to 10.1.0.1:0
881.962540 main [1804] mapped 334980KB at 0x801dff000
Sending on netmap:ncxl0: 4 queues, 1 threads and 1 cpus.
10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> 00:07:43:ab:cd:ef)
881.962562 main [1882] Sending 512 packets every  0.000000000 s
881.962563 main [1884] Wait 2 secs for phy reset
884.088516 main [1886] Ready...
884.088535 nm_open [457] overriding ifname ncxl0 ringid 0x0 flags 0x1
884.088607 sender_body [996] start
884.093246 sender_body [1064] drop copy
885.090435 main_thread [1418] 45206353 pps (45289533 pkts in 1001840 usec)
886.091600 main_thread [1418] 45322792 pps (45375593 pkts in 1001165 usec)
887.092435 main_thread [1418] 45313992 pps (45351784 pkts in 1000834 usec)
888.094434 main_thread [1418] 45315765 pps (45406397 pkts in 1002000 usec)
889.095434 main_thread [1418] 45333218 pps (45378551 pkts in 1001000 usec)
890.097434 main_thread [1418] 45315247 pps (45405877 pkts in 1002000 usec)
891.099434 main_thread [1418] 45326515 pps (45417168 pkts in 1002000 usec)
892.101434 main_thread [1418] 45333039 pps (45423705 pkts in 1002000 usec)
893.103434 main_thread [1418] 45324105 pps (45414708 pkts in 1001999 usec)
894.105434 main_thread [1418] 45318042 pps (45408723 pkts in 1002001 usec)
895.106434 main_thread [1418] 45332430 pps (45377762 pkts in 1001000 usec)
896.107434 main_thread [1418] 45338072 pps (45383410 pkts in 1001000 usec)
...
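
As a sanity check on the figures above, the following Python sketch recomputes the rate from one pkt-gen sample line and compares it with the theoretical Ethernet line rate. The 64-byte minimum frame and the 20 bytes of per-frame wire overhead are standard Ethernet values, not something stated in this commit. The function names are illustrative, and the rounding in measured_pps() is an assumption that happens to match the sample lines checked.

```python
# Per-frame overhead on the wire: 7B preamble + 1B SFD + 12B inter-frame gap.
WIRE_OVERHEAD_BYTES = 20
MIN_FRAME_BYTES = 64  # smallest legal Ethernet frame, including the FCS


def line_rate_pps(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Theoretical maximum frames per second for a link speed in bits/s."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def measured_pps(pkts, usec):
    """Packets per second from a (packets, microseconds) sample.

    Rounds to the nearest integer using integer arithmetic only.
    """
    return (pkts * 1_000_000 + usec // 2) // usec


if __name__ == "__main__":
    sample = measured_pps(45289533, 1001840)  # first main_thread line above
    per_10g = line_rate_pps(10e9)
    print(f"measured:        {sample} pps")
    print(f"10G line rate:   {per_10g / 1e6:.2f} Mpps")
    print(f"40G line rate:   {line_rate_pps(40e9) / 1e6:.2f} Mpps")
    print(f"10G equivalents: {sample / per_10g:.2f}")
```

The comparison shows why roughly 45Mpps is enough to fill several 10G ports with minimum-size frames while still falling short of the 40G line rate.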

r268536:
cxgbe(4): Add an iSCSI softc to the adapter structure.

r269076:
Some hooks in cxgbe(4) for the offloaded iSCSI driver.

r269364:
Improve compliance with style.Makefile(5).

r269366:
List one file per line in the Makefiles.  This makes it easier to read
diffs when a file is added or removed.

r269411:
cxgbe(4): minor optimizations in ingress queue processing.

Reorganize struct sge_iq.  Make the iq entry size a compile time
constant.  While here, eliminate RX_FL_ESIZE and use EQ_ESIZE directly.

r269413:
cxgbe(4):  Fix an off-by-one error when looking for the BAR2 doorbell
address of an egress queue.

r269428:
cxgbe(4):  some optimizations in freelist handling.

r269440:
cxgbe(4): Remove an unused version of t4_enable_vi.

r269537:
cxgbe(4): Do not run any sleepable code in the SIOCSIFFLAGS handler when
IFF_PROMISC or IFF_ALLMULTI is being flipped.  bpf(4) holds its global
mutex around ifpromisc in at least the bpf_dtor path.

r269644:
cxgbe(4):  Let caller specify whether it's ok to sleep in
t4_sched_config and t4_sched_params.

r269731:
cxgbe(4): Do not poke T4-only registers on a T5 (and vice versa).

Relnotes:	Yes (native netmap support for Chelsio T4/T5 cards)
2014-08-21 19:54:02 +00:00
bz
6f5b0b94bc MFC r266596:
Move the tcp_fields_to_host() and tcp_fields_to_net() (inline)
 functions to the tcp_var.h header file in order to avoid further
 duplication with upcoming commits.

 Reviewed by:	np
2014-08-16 13:50:15 +00:00
np
072ac2f2da MFC r268971 and r269032.
r268971:
Simplify r267600, there's no need to distinguish between allocated and
inlined mbufs.

r269032:
cxgbe(4):  Keep track of the clusters that have to be freed by the
custom free routine (rxb_free) in the driver.  Fail MOD_UNLOAD with
EBUSY if any such cluster has been handed up to the kernel but hasn't
been freed yet.  This prevents a panic later when the cluster finally
needs to be freed but rxb_free is gone from the kernel.
2014-07-31 23:04:41 +00:00
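
The bookkeeping described in r269032 above can be pictured as a simple outstanding-buffer counter. This is a minimal Python sketch of the idea, not the driver's code: the class and method names are hypothetical, and only rxb_free, MOD_UNLOAD, and EBUSY come from the commit message.

```python
import errno


class RxClusterTracker:
    """Counts clusters handed up to the kernel that still need rxb_free."""

    def __init__(self):
        self.outstanding = 0

    def handed_up(self):
        """A cluster that needs the custom free routine left the driver."""
        self.outstanding += 1

    def rxb_free(self):
        """The kernel released a cluster through the custom free routine."""
        assert self.outstanding > 0, "free without a matching hand-up"
        self.outstanding -= 1

    def mod_unload(self):
        """Return 0 if unloading is safe, EBUSY while clusters are in flight."""
        return errno.EBUSY if self.outstanding > 0 else 0
```

Refusing the unload while the count is non-zero keeps the free routine in memory until the last cluster has come back.
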
np
103bd6e167 MFC r268640 and r268989.
r268640:
Allow multi-byte reads in the private CHELSIO_T4_GET_I2C ioctl.  The
firmware allows up to 48B to be read this way but the driver limits
itself to 8B at a time to remain compatible with old cxgbetool
binaries.
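
Given the 8-byte limit per call, a caller that wants a longer range has to split it into several reads. The Python sketch below shows one way to do that split. It is an illustration only: the helper name is hypothetical and it does not describe how cxgbetool issues the ioctl.

```python
DRIVER_MAX_READ = 8  # bytes per call, the driver limit named above


def split_i2c_read(offset, length, max_read=DRIVER_MAX_READ):
    """Return (offset, length) pairs covering [offset, offset + length)."""
    if max_read <= 0:
        raise ValueError("max_read must be positive")
    chunks = []
    while length > 0:
        n = min(length, max_read)
        chunks.append((offset, n))
        offset += n
        length -= n
    return chunks
```

For example, a 20-byte read starting at offset 0 becomes three calls of 8, 8, and 4 bytes.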

r268989:
Add missing newline to an error message.
2014-07-25 00:30:55 +00:00
np
46a564acf9 MFC r268706:
cxgbe(4): Display CF facility correctly in the device log.
2014-07-18 00:31:06 +00:00
np
810c9620cc MFC r267757:
cxgbe(4): Update the bundled T4 and T5 firmwares to version 1.11.27.0.

Obtained from:	Chelsio
2014-06-25 02:14:55 +00:00
np
e77b6c9934 MFC r267689:
Consider the total number of descriptors available (and not just those
that are ready to be reclaimed) when deciding whether to resume tx after
a stall.
2014-06-23 05:39:10 +00:00
np
a63ff939c2 MFC r267600:
cxgbe(4):  Fix bug in the fast rx buffer recycle path.  In some cases rx
buffers were getting recycled when they should have been left alone.
2014-06-21 00:30:51 +00:00
np
fcef7f1441 MFC r267082:
cxgbe(4):  Properly account for the freelist buffers used when returning
early from service_iq due to a budget restriction.  This fixes a potential
rx hang when using INTx.
2014-06-08 23:22:25 +00:00
np
7bdc535062 MFC r266908:
cxgbe(4): Fix a NULL dereference when the very first call to
get_scatter_segment() in get_fl_payload() fails.  While here,
fix the code to adjust fl_bufs_used when a failure occurs for
any other scatter segment.
2014-06-02 05:01:08 +00:00
np
ccdc23542c MFC r259382:
Read card capabilities after firmware initialization, instead of setting
them up as part of firmware initialization (which the driver gets to do
only if it's the master driver).

Read the range of tids available for the ETHOFLD functionality if it's
enabled.

New is_ftid() and is_etid() functions to test whether a tid falls within
the range of filter tids or ETHOFLD tids respectively.
2014-05-06 07:21:50 +00:00
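
The is_ftid() and is_etid() tests mentioned in ccdc23542c above are range checks. The Python sketch below assumes each range is described by a base and a count; the field names are hypothetical and the real structures live in the C driver.

```python
from dataclasses import dataclass


@dataclass
class TidRanges:
    """Hypothetical description of the filter and ETHOFLD tid ranges."""
    ftid_base: int
    nftids: int
    etid_base: int
    netids: int


def is_ftid(t, tid):
    """True if tid falls within the range of filter tids."""
    return t.nftids > 0 and t.ftid_base <= tid < t.ftid_base + t.nftids


def is_etid(t, tid):
    """True if tid falls within the range of ETHOFLD tids."""
    return t.netids > 0 and t.etid_base <= tid < t.etid_base + t.netids
```

A count of zero stands for a feature that is not enabled, in which case no tid matches.
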
np
ebd722c9b4 MFC r263317, r263412, and r263451.
r263317:
cxgbe(4): significant rx rework.

- More flexible cluster size selection, including the ability to fall
  back to a safe cluster size (PAGE_SIZE from zone_jumbop by default) in
  case an allocation of a larger size fails.
- A single get_fl_payload() function that assembles the payload into an
  mbuf chain for any kind of freelist.  This replaces two variants: one
  for freelists with buffer packing enabled and another for those without.
- Buffer packing with any sized cluster.  It was limited to 4K clusters
  only before this change.
- Enable buffer packing for TOE rx queues as well.
- Statistics and tunables to go with all these changes.  The driver's
  man page will be updated separately.
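
The fallback in the first item of the list above can be sketched as follows. This is a simplified Python illustration under stated assumptions: the allocator callback, the function names, and the 4096-byte page size are hypothetical stand-ins, and the driver's real selection logic is more involved.

```python
PAGE_SIZE = 4096  # assumed page size for the safe fallback cluster


def alloc_rx_cluster(preferred_size, try_alloc, safe_size=PAGE_SIZE):
    """Try the preferred cluster size first, then fall back to the safe size.

    try_alloc(size) is a hypothetical allocator returning a buffer or None.
    Returns (buffer, size_used), or (None, 0) if both attempts fail.
    """
    buf = try_alloc(preferred_size)
    if buf is not None:
        return buf, preferred_size
    if safe_size != preferred_size:
        buf = try_alloc(safe_size)
        if buf is not None:
            return buf, safe_size
    return None, 0


def small_only_allocator(size):
    """Stand-in allocator that can only satisfy requests up to one page."""
    return bytearray(size) if size <= PAGE_SIZE else None
```

With small_only_allocator, a request for a 16K cluster comes back as a page-sized buffer instead of failing outright.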

r263412:
cxgbe(4): if_iqdrops statistic should include tunnel congestion drops.

r263451:
cxgbe(4): man page updates.
2014-05-06 06:49:39 +00:00
np
c4599dcf44 MFC r260210 (by adrian@):
Add an option to enable or disable the small RX packet copying that
is done to improve performance of small frames.

When doing RX packing, the RX copying isn't necessarily required.
2014-05-06 04:22:06 +00:00
np
3942b0b24c MFC r261533, r261536, r261537, and r263457.
r261533:
cxgbe(4): Use the port's tx channel to identify it to t4_clr_port_stats.

r261536:
cxgbe(4): The T5 allows for a different freelist starvation threshold
for queues with buffer packing.  Use the correct value to calculate a
freelist's low water mark.

r261537:
cxgbe(4): Use the rx channel map (instead of the tx channel map) as the
congestion channel map.

r263457:
cxgbe(4):  Recognize the "spider" configuration where a T5 card's 40G
QSFP port is presented as 4 distinct 10G SFP+ ports to the driver.
2014-05-06 02:22:52 +00:00
emax
e8052c13e8 MFC r264621
use correct (integer) type for the temperature sysctl

Reviewed by:	np, scottl
Obtained from:	Netflix
2014-04-21 17:17:23 +00:00
scottl
7d93aa7db1 MFC r261558
Add a new sysctl, dev.cxgbe.N.rsrv_noflow, and a companion tunable,
hw.cxgbe.rsrv_noflow.  When set, queue 0 of the port is reserved for
TX packets without a flowid.  The hash value of packets with a flowid
is bumped up by 1.  The intent is to provide a private queue for
link-level packets like LACP that is unlikely to overflow or suffer
deep queue latency.
2014-04-15 08:08:44 +00:00
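
One way to picture the rsrv_noflow behavior described in 7d93aa7db1 above is the Python sketch below. It assumes the flowid is mapped to a queue with a modulo over the unreserved queues, which the commit message does not spell out; the function and parameter names are illustrative.

```python
def select_txq(flowid, ntxq, rsrv_noflow):
    """Pick a tx queue index in [0, ntxq).

    flowid is None for packets without a flowid (for example LACP frames).
    """
    if not rsrv_noflow or ntxq == 1:
        return 0 if flowid is None else flowid % ntxq
    if flowid is None:
        return 0  # the reserved queue
    # Hash over the remaining queues, then bump up by 1 to skip queue 0.
    return 1 + flowid % (ntxq - 1)
```

With the option set and more than one queue, packets that carry a flowid never land on queue 0, so link-level traffic does not share a queue with bulk flows.
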
dim
547b2fb503 MFC r261907:
In cxgbe, conditionalize the t4_pgprot_wc() function, since it is only
used when DOT5 is defined.

Reviewed by:	np
2014-02-17 20:45:39 +00:00
np
30cdcf32e9 MFC r259527:
Do not create a hardware IPv6 server if the listen address is not
in6addr_any and is not in the CLIP table either.  This fixes a reported
TOE+IPv6 NULL-dereference panic in do_pass_open_rpl().

While here, stop creating hardware servers for any loopback address.
It's just a waste of server tids.
2013-12-24 02:10:12 +00:00
np
346b651e7c MFC r259145:
Unstaticize t4_list and t4_uld_list.  This works around a clang
annoyance[1] and allows kgdb to find these symbols.

[1] http://lists.freebsd.org/pipermail/freebsd-hackers/2012-November/041166.html
2013-12-12 00:27:27 +00:00
np
7777b8aff6 MFC r257654, r257772, r258441, r258689, r258698, r258879, r259048, and
r259103.

r257654:
cxgbe(4): Exclude MPS_RPLC_MAP_CTL (0x11114) from the register dump.  Turns
out it's a write-only register with strange side effects on read.

r257772:
cxgbe(4): Tidy up the display for payload memory statistics (pm_stats).

r258441:
cxgbe(4): update the internal list of device features.

r258689:
Disable an assertion that relies on some code[1] that isn't in HEAD yet.

r258698:
cxgbetool: "modinfo" command to display SFP+ module information.

r258879:
cxgbe(4):  T4_SET_SCHED_CLASS and T4_SET_SCHED_QUEUE ioctls to program
scheduling classes in the chip and to bind tx queue(s) to a scheduling
class respectively.  These can be used for various kinds of tx traffic
throttling (to force selected tx queues to drain at a fixed Kbps rate,
or a % of the port's total bandwidth, or at a fixed pps rate, etc.).

r259048:
Two new cxgbetool subcommands to set up scheduler classes and to bind
them to NIC queues.

r259103:
cxgbe(4): save a copy of the RSS map for each port for the driver's use.
2013-12-09 22:40:22 +00:00
np
01cd6364a8 MFC r256694, r256713, r256714.
r256694:
iw_cxgbe: iWARP driver for Chelsio T4/T5 chips.  This is a straight port
of the iw_cxgb4 found in OFED distributions.

r256713:
iw_cxgbe should have a dependency on t4nex.

r256714:
Fix typo in previous commit.

Approved by:	re (hrs)
2013-10-21 01:10:37 +00:00
np
b36ea75245 MFC r256477:
cxgbe(4): Store the log2 of the # of doorbells per BAR2 page for both
ingress and egress queues, and for both T4 and T5.  These values are
used by the T4/T5 iWARP driver.

Approved by:	re (glebius)
2013-10-20 16:45:01 +00:00
np
6de0bdec97 MFC r256459.
cxgbe(4): Update T4 and T5 firmwares to 1.9.12.0

Approved by:	re (glebius)
2013-10-20 15:24:44 +00:00
glebius
75528d8e36 Some high-performance NICs count statistics in hardware, and
some ifnets do so via counter(9).  Provide a flag that skips the
cache-line-thrashing '+=' operation in ether_input().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
Reviewed by:	melifaro, adrian
Approved by:	re (marius)
2013-10-09 19:04:40 +00:00
dim
2b5c0ebf70 Fix kernel build on amd64 after r256118, since the machine/md_var.h
header is not implicitly included there.  So include it explicitly.

Approved by:	re (delphij)
Pointy hat to:	dim
MFC after:	3 days
X-MFC-With:	r256118
2013-10-07 22:30:03 +00:00
dim
1b35fd5d4c Remove redundant declaration of cpu_clflush_line_size in
sys/dev/cxgbe/t4_sge.c, to silence a gcc warning.

Approved by:	re (gjb)
MFC after:	3 days
2013-10-07 16:56:56 +00:00
np
a9b6160aa1 Rework the tx credit mechanism between the cxgbe/tom driver
and the card.  This helps smooth out some burstiness in the
exchange.

Approved by:	re (glebius)
2013-09-09 04:38:57 +00:00
np
279a0ac6ad Fix a miscalculation that caused cxgbe/tom to auto-increment
a TOE socket's tx buffer size too aggressively.

Approved by:	re (delphij)
2013-09-09 00:16:59 +00:00
np
297fdff2ee For TOE connections, the window scale factor in CPL_PASS_ACCEPT_REQ is
set to 15 to indicate that the peer did not send a window scale option
with its SYN.  Do not send a window scale option in the SYN|ACK reply
in that case.
2013-09-03 23:34:04 +00:00
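
The rule in 297fdff2ee above amounts to treating 15 as a sentinel. The Python sketch below illustrates that decision; the constant and function names are hypothetical, and only the value 15 and its meaning come from the commit message.

```python
NO_WSCALE = 15  # value in CPL_PASS_ACCEPT_REQ meaning "peer sent no option"


def synack_wscale(peer_wscale, our_wscale):
    """Return the window scale to put in the SYN|ACK, or None to omit it."""
    if peer_wscale == NO_WSCALE:
        return None  # the peer did not offer window scaling in its SYN
    return our_wscale
```

Omitting the option in this case keeps the reply consistent with a SYN that did not offer window scaling.
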
np
52fb8d39b9 Fix the sysctl that displays whether buffer packing is enabled
or not.
2013-08-30 02:13:36 +00:00
np
053a3cacff Implement support for rx buffer packing. Enable it by default for T5
cards.

This is a T4 and T5 chip feature which lets the chip deliver multiple
Ethernet frames in a single buffer.  This is more efficient within the
chip, in the driver, and reduces wastage of space in rx buffers.

- Always allocate rx buffers from the jumbop zone, no matter what the
  MTU is.  Do not use the normal cluster refcounting mechanism.
- Reserve space for an mbuf and a refcount in the cluster itself and let
  the chip DMA multiple frames in the rest.
- Use the embedded mbuf for the first frame and allocate mbufs on the
  fly for any additional frames delivered in the cluster.  Each of these
  mbufs has a reference on the underlying cluster.
2013-08-30 01:45:36 +00:00
np
a52be00b7f Merge r254386 from user/np/cxl_tuning. Add an INET|INET6 check missing
in said revision.

r254386:
Flush inactive LRO entries periodically.
2013-08-29 06:26:22 +00:00
np
749ce4a06c Whitespace nit.
2013-08-28 23:15:05 +00:00
np
609f43ab7f Change t4_list_lock and t4_uld_list_lock from mutexes to sx'es.
- tom_uninit had to be reworked not to hold the adapter lock (a mutex)
  around t4_deactivate_uld, which acquires the uld_list_lock.
- the ifc_match for the interface cloner that creates the tracer ifnet
  had to be reworked as the kernel calls ifc_match with the global
  if_cloners_mtx held.
2013-08-28 20:59:22 +00:00
np
5147205a55 Add hooks in base cxgbe(4) for the iWARP upper-layer driver. Update a
couple of assertions in the TOE driver as well.
2013-08-28 20:45:45 +00:00
np
9cfbf27301 Use correct mailbox and PCIe PF number when querying RDMA parameters.
2013-08-26 19:02:52 +00:00
np
cde487f27a There is no need to hold the freelist lock around alloc/free of
software descriptors.  This also silences WITNESS warnings when
the software descriptors are allocated with M_WAITOK.

MFC after:	1 week
2013-08-23 18:03:18 +00:00
np
d221f4d62a Display P/N information in the description.
Submitted by:	gnn
MFC after:	3 days
2013-08-20 18:22:04 +00:00
np
5750092edd Display temperature sensor data. Shows -1 if sensor not
available on the card.

# sysctl dev.t4nex.0.temperature
# sysctl dev.t5nex.0.temperature
2013-08-02 18:05:42 +00:00
np
aa66c86ef9 Fix previous commit (r253873). "cong" has one bit per channel but the
congestion channel map has 1 nibble per channel.  So bits wxyz need to
be blown up into 000w000x000y000z.
2013-08-02 17:44:19 +00:00
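
The expansion described in aa66c86ef9 above, where bits wxyz become 000w000x000y000z, can be written as a short loop. This Python sketch only illustrates the bit manipulation; the function name and the default of four channels are assumptions, and the driver itself is written in C.

```python
def cong_to_channel_map(cong, nchan=4):
    """Spread bit i of cong into the lowest bit of nibble i."""
    chan_map = 0
    for i in range(nchan):
        if cong & (1 << i):
            chan_map |= 1 << (4 * i)
    return chan_map


if __name__ == "__main__":
    for cong in (0b0001, 0b1010, 0b1111):
        print(f"{cong:04b} -> {cong_to_channel_map(cong):016b}")
```

Each input bit keeps its channel position but moves to a 4-bit boundary, so the result has at most one bit set per nibble.
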
np
1c55a6beb9 Set up congestion manager context properly for T5 based cards.
MFC after:	3 days (will check with re@)
2013-08-01 23:38:30 +00:00
np
4b06c1288c Display SGE tunables in the sysctl tree.
dev.t5nex.0.fl_pktshift: payload DMA offset in rx buffer (bytes)
dev.t5nex.0.fl_pad: payload pad boundary (bytes)
dev.t5nex.0.spg_len: status page size (bytes)
dev.t5nex.0.cong_drop: congestion drop setting

Discussed with:	scottl
2013-07-31 05:12:51 +00:00
np
040053e564 Display a string instead of a numeric code in the linkdnrc sysctl.
Submitted by:	gnn@
2013-07-27 07:43:43 +00:00
np
1bcee0582a Expand the list of devices claimed by cxgbe(4).
2013-07-27 00:53:07 +00:00
np
0a25bc10f1 Add support for packet-sniffing tracers to cxgbe(4). This works with
all T4 and T5 based cards and is useful for analyzing TSO, LRO, TOE, and
for general purpose monitoring without tapping any cxgbe or cxl ifnet
directly.

Tracers on the T4/T5 chips provide access to Ethernet frames exactly as
they were received from or transmitted on the wire.  On transmit, a
tracer will capture a frame after TSO segmentation, hw VLAN tag
insertion, hw L3 & L4 checksum insertion, etc.  It will also capture
frames generated by the TCP offload engine (TOE traffic is normally
invisible to the kernel).  On receive, a tracer will capture a frame
before hw VLAN extraction, runt filtering, other badness filtering,
before the steering/drop/L2-rewrite filters or the TOE have had a go at
it, and of course before sw LRO in the driver.

There are 4 tracers on a chip.  A tracer can trace only in one direction
(tx or rx).  For now cxgbetool will set up tracers to capture the first
128B of every transmitted or received frame on a given port.  This is a
small subset of what the hardware can do.  A pseudo ifnet with the same
name as the nexus driver (t4nex0 or t5nex0) will be created for tracing.
The data delivered to this ifnet is an additional copy made inside the
chip.  Normal delivery to cxgbe<n> or cxl<n> will be made as usual.

/* watch cxl0, which is the first port hanging off t5nex0. */
# cxgbetool t5nex0 tracer 0 tx0  (watch what cxl0 is transmitting)
# cxgbetool t5nex0 tracer 1 rx0  (watch what cxl0 is receiving)
# cxgbetool t5nex0 tracer list
# tcpdump -i t5nex0   <== all that cxl0 sees and puts on the wire

If you were doing TSO, a tcpdump on cxl0 may have shown you ~64K
"frames" with no L3/L4 checksum but this will show you the frames that
were actually transmitted.

/* all done */
# cxgbetool t5nex0 tracer 0 disable
# cxgbetool t5nex0 tracer 1 disable
# cxgbetool t5nex0 tracer list
# ifconfig t5nex0 destroy
2013-07-26 22:04:11 +00:00
np
45f9b76a04 Reserve room for ioctls that aren't in this copy of the driver yet.
2013-07-26 20:54:33 +00:00
np
6a81f43eb6 Specify a timeout for the PL block.
MFC after:	3 days
2013-07-17 02:37:40 +00:00
np
06a7597ad3 Attach to the 4x10G T540-CR card.
2013-07-11 19:09:31 +00:00
np
5b502b74ad - Show the reason why link is down if this information is available.
- Display the temperature and PHY firmware version of the BT PHY.

MFC after:	1 day
2013-07-05 01:53:51 +00:00
np
0b91a49dc3 - Make note of interface MTU change if the rx queues exist, and not just
when the interface is up.
- Add a tunable to control the TOE's rx coalesce feature (enabled by
  default as it always has been).  Consider the interface MTU or the
  coalesce size when deciding which cluster zone to use to fill the
  offload rx queue's free list.  The tunable is:
  dev.{t4nex,t5nex}.<N>.toe.rx_coalesce

MFC after:	1 day
2013-07-04 21:19:01 +00:00