Commit Graph

123 Commits

Author SHA1 Message Date
Gleb Smirnoff
b4b12e52fb Garbage collect unused arguments of m_init(). 2016-02-10 18:54:18 +00:00
Gleb Smirnoff
8ee1c08783 Fix inverse logic. If this is zone_pack, then we shouldn't free the
cluster ourselves.

Found by review. Since this code is !386 and !amd64 and is executed
on error path, pretty sure no one ever executed it.
2016-02-03 20:39:52 +00:00
Hans Petter Selasky
e936121d31 Add optimizing LRO wrapper:
- Add optimizing LRO wrapper which pre-sorts all incoming packets
  according to the hash type and flowid. This prevents exhaustion of
  the LRO entries due to too many connections at the same time.
  Testing using a larger number of higher bandwidth TCP connections
  showed that the incoming ACK packet aggregation rate increased from
  ~1.3:1 to almost 3:1. Another test showed that for a number of TCP
  connections greater than 16 per hardware receive ring, where 8 TCP
  connections was the LRO active entry limit, there was a significant
  improvement in throughput due to being able to fully aggregate more
  than 8 TCP stream. For very few very high bandwidth TCP streams, the
  optimizing LRO wrapper will add CPU usage instead of reducing CPU
  usage. This is expected. Network drivers which want to use the
  optimizing LRO wrapper needs to call "tcp_lro_queue_mbuf()" instead
  of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of
  "tcp_lro_flush()". Further the LRO control structure must be
  initialized using "tcp_lro_init_args()" passing a non-zero number
  into the "lro_mbufs" argument.

- Make LRO statistics 64-bit. Previously 32-bit integers were used for
  statistics which can be prone to wrap-around. Fix this while at it
  and update all SYSCTL's which expose LRO statistics.

- Ensure all data is freed when destroying a LRO control structures,
  especially leftover LRO entries.

- Reduce number of memory allocations needed when setting up a LRO
  control structure by precomputing the total amount of memory needed.

- Add own memory allocation counter for LRO.

- Bump the FreeBSD version to force recompilation of all KLDs due to
  change of the LRO control structure size.

Sponsored by:	Mellanox Technologies
Reviewed by:	gallatin, sbruno, rrs, gnn, transport
Tested by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D4914
2016-01-19 15:33:28 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Ian Lepore
1eafc07856 Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl
strings returned to userland include the nulterm byte.

Some uses of sbuf_new_for_sysctl() write binary data rather than strings;
clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in
those cases.  (Note that the sbuf code still automatically adds a nulterm
byte in sbuf_finish(), but since it's not included in the length it won't
get copied to userland along with the binary data.)

Remove explicit adding of a nulterm byte in a couple places now that it
gets done automatically by the sbuf drain code.

PR:		195668
2015-03-14 17:08:28 +00:00
Gleb Smirnoff
c578b6aca0 Provide a set of inline functions to manage simple mbuf(9) queues, based
on queue(3)'s STAILQ.  Utilize them in cxgb(4) and Xen, deleting home
grown implementations.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-02-19 01:19:42 +00:00
Hans Petter Selasky
c25290420e Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Navdeep Parhar
e26e6373c8 cxgb(4): implement if_get_counter. 2014-09-27 18:35:16 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Gleb Smirnoff
c3322cb91c Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
Andre Oppermann
1b4381afbb Restructure the mbuf pkthdr to make it fit for upcoming capabilities and
features.  The changes in particular are:

o Remove rarely used "header" pointer and replace it with a 64bit protocol/
  layer specific union PH_loc for local use.  Protocols can flexibly overlay
  their own 8 to 64 bit fields to store information while the packet is
  worked on.

o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc
  instead of pkthdr.header.

o Extend csum_flags to 64bits to allow for additional future offload
  information to be carried (e.g. iSCSI, IPsec offload, and others).

o Move the RSS hash type enumerator from abusing m_flags to its own 8bit
  rsstype field.  Adjust accessor macros.

o Add cosqos field to store Class of Service / Quality of Service information
  with the packet.  It is not yet supported in any drivers but allows us to
  get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with
  a modernized ALTQ.

o Add four 8 bit fields l[2-5]hlen to store the relative header offsets
  from the start of the packet.  This is important for various offload
  capabilities and to relieve the drivers from having to parse the packet
  and protocol headers to find out location of checksums and other
  information.  Header parsing in drivers is a lot of copy-paste and
  unhandled corner cases which we want to avoid.

o Add another flexible 64bit union to map various additional persistent
  packet information, like ether_vtag, tso_segsz and csum fields.
  Depending on the csum_flags settings some fields may have different usage
  making it very flexible and adaptable to future capabilities.

o Restructure the CSUM flags to better signify their outbound (down the
  stack) and inbound (up the stack) use.  The CSUM flags used to be a bit
  chaotic and rather poorly documented leading to incorrect use in many
  places.  Bring clarity into their use through better naming.
  Compatibility mappings are provided to preserve the API.  The drivers
  can be corrected one by one and MFC'd without issue.

o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures).

Sponsored by:	The FreeBSD Foundation
2013-08-24 19:51:18 +00:00
Andre Oppermann
3b460852c4 Remove unnecessary setup of the m->pkthdr.header pointer.
Sponsored by:	The FreeBSD Foundation
2013-08-24 17:14:14 +00:00
Gleb Smirnoff
c6499eccad Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags in sys/dev.
2012-12-04 09:32:43 +00:00
Navdeep Parhar
fef542fe1c Initialize the response queue mutex a bit earlier to avoid a panic that
occurs if t3_sge_alloc_qset fails and then t3_free_qset attempts to
destroy an uninitialized mutex.

Submitted by:	Vijay Singh <vijju dot singh at gmail>
MFC after:	3 days
2012-10-25 18:11:04 +00:00
Navdeep Parhar
0a7049095f cxgb(4): IPv6 rx/tx hw checksum, IPv6 TSO and LRO too.
(Some parts already worked, this makes it complete).
2012-06-30 02:11:53 +00:00
Navdeep Parhar
09fe63205c - Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
  These are available as t3_tom and t4_tom modules that augment cxgb(4)
  and cxgbe(4) respectively.  The cxgb/cxgbe drivers continue to work as
  usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs).  T4 iWARP in the
  works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload?  Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded?  Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by:	bz, gnn
Sponsored by:	Chelsio communications.
MFC after:	~3 months (after 9.1, and after ensuring MFC is feasible)
2012-06-19 07:34:13 +00:00
Bjoern A. Zeeb
47cfa99a50 MFp4 bz_ipv6_fast:
Allow LRO to work on IPv6 as well.
  Fix the module Makefile to at least properly inlcude opt_inet6.h
  and allow builds without INET or INET6.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-25 03:00:34 +00:00
Scott Long
b6f97155cc Convert a number of drivers to obtaining their parent DMA tag from their
PCI device attachment.
2012-03-12 08:03:51 +00:00
Navdeep Parhar
3e7cc3cab3 Add IPv6 TSO (including TSO+VLAN) support to cxgb(4).
If an IPv6 packet has extension headers the kernel needs to deal with it
itself.  For the rest it can set various CSUM_XXX flags and the driver
will act on them.
2012-02-09 23:19:09 +00:00
Navdeep Parhar
65d43cc6e7 Remove if_start from cxgb and cxgbe.
Submitted by:	jhb
MFC after:	3 days
2012-02-07 07:32:39 +00:00
Navdeep Parhar
7eeb16cee7 t3_free_sge_resources should be given the number of qsets it needs to free.
MFC after:	1 week
2011-03-24 01:16:48 +00:00
Matthew D Fleming
00f0e671ff Explicitly wire the user buffer rather than doing it implicitly in
sbuf_new_for_sysctl(9).  This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.

Inspired by:	rwatson
2011-01-27 00:34:12 +00:00
Matthew D Fleming
cbc134ad03 Introduce signed and unsigned version of CTLTYPE_QUAD, renaming
existing uses.  Rename sysctl_handle_quad() to sysctl_handle_64().
2011-01-19 23:00:25 +00:00
Matthew D Fleming
f8e4b4ef49 sysctl(8) should use the CTLTYPE to determine the type of data when
reading.  (This was already done for writing to a sysctl).  This
requires all SYSCTL setups to specify a type.  Most of them are now
checked at compile-time.

Remove SYSCTL_*X* sysctl additions as the print being in hex should be
controlled by the -x flag to sysctl(8).

Succested by:	bde
2011-01-19 17:04:07 +00:00
Matthew D Fleming
deceab8792 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the cxgb driver piece.
2011-01-12 19:53:44 +00:00
Matthew D Fleming
4e6571599b Re-add r212370 now that the LOR in powerpc64 has been resolved:
Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing
NUL byte.  This behaviour was preserved, though it should not be
necessary.

Reviewed by:    phk (original patch)
2010-09-16 16:13:12 +00:00
Matthew D Fleming
404a593e28 Revert r212370, as it causes a LOR on powerpc. powerpc does a few
unexpected things in copyout(9) and so wiring the user buffer is not
sufficient to perform a copyout(9) while holding a random mutex.

Requested by: nwhitehorn
2010-09-13 18:48:23 +00:00
Matthew D Fleming
dd67e2103c Add a drain function for struct sysctl_req, and use it for a variety of
handlers, some of which had to do awkward things to get a large enough
FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing NUL
byte.  This behaviour was preserved, though it should not be necessary.

Reviewed by:	phk
2010-09-09 18:33:46 +00:00
Navdeep Parhar
2c32b50248 Eliminate ext_intr_task. The "slow" interrupt handler is already
running on the adapter's task queue.  Just do what the task does
instead of enqueueing it.

MFC after:	3 days
2010-07-09 00:36:35 +00:00
Navdeep Parhar
27d1c65e8e cxgb(4): add knob to get packet timestamps from the hardware.
The T3 ASIC can provide an incoming packet's timestamp instead of its RSS hash.
The timestamp is just a counter running off the card's clock.  With a 175MHz
clock an increment represents ~5.7ns and the 32 bit value wraps around in ~25s.

# sysctl -d dev.cxgbc.0.pkt_timestamp
dev.cxgbc.0.pkt_timestamp: provide packet timestamp instead of connection hash

# sysctl -d dev.cxgbc.0.core_clock
dev.cxgbc.0.core_clock: core clock frequency (in KHz)
# sysctl dev.cxgbc.0.core_clock
dev.cxgbc.0.core_clock: 175000
2010-06-12 22:33:04 +00:00
Navdeep Parhar
1d4942f42f Don't ring the tx doorbell for every frame when we know more frames
will follow.  Adjust the freelist and response queue doorbells too.

Discussed with:	kmacy
2010-05-05 22:52:06 +00:00
Navdeep Parhar
489ca05be7 Increase response queue size to avoid starvation, add a counter
to track it when it does occur.
2010-04-02 17:50:52 +00:00
Navdeep Parhar
97ae3bc359 Multiple fixes related to queue set sizing and resources:
- Only the tunnelq (TXQ_ETH) requires a buf_ring, an ifq, and the watchdog/timer
  callouts.  Do not allocate these for the other tx queues.

- Use 16k jumbo clusters only on offload capable cards by default.

- Do not allocate a full tx ring for the offload queue if the card is not
  offload capable.

- Slightly better freelist size calculation.

- Fix nmbjumbo4 typo, remove unneeded global variables.

MFC after:	3 days
2010-03-31 00:27:49 +00:00
Navdeep Parhar
d228596091 Fix signed/unsigned mix-up that allowed txq->in_use to grow beyond txq->size. 2010-03-31 00:26:35 +00:00
Navdeep Parhar
92f61ecb4b Fix tx drop statistics.
MFC after:	3 days
2010-03-31 00:26:02 +00:00
Navdeep Parhar
63122c7e1d Fix build with "nooptions INET"
Requested by:	bz
MFC after:	3 days
2010-03-31 00:24:44 +00:00
Navdeep Parhar
f9c6e16451 Support IFCAP_VLANHWTSO in cxgb(4). It works with or without vlanhwtag.
While here, remove old DPRINTFs and tidy up the capability code a bit.
2010-02-26 07:08:44 +00:00
Navdeep Parhar
e83ec3e5c3 There is no need to test __FreeBSD_version for features that have
been around for a long time now (7.1-ish or even earlier); assume
they are present.  These includes MSI, TSO, LRO, VLAN, INTR_FILTERS,
FIRMWARE, etc.

Also, eliminate some dead code and clean up in other places as part
of this quick once-over.

MFC after:	1 week
2010-02-24 10:16:18 +00:00
Navdeep Parhar
be688bde90 Accessing an mbuf after it has been handed off to the hardware is a bad
race as it could already have been tx'd and freed by that time.  Place
the bpf tap just _before_ writing the gen bit.

This fixes a panic when running tcpdump on a cxgb interface.
2010-02-24 01:44:39 +00:00
Max Laier
193cbc4d24 Fix drbr and altq interaction:
- introduce drbr_needs_enqueue that returns whether the interface/br needs
   an enqueue operation: returns true if altq is enabled or there are
   already packets in the ring (as we need to maintain packet order)
 - update all drbr consumers
 - fix drbr_flush
 - avoid using the driver queue (IFQ_DRV_*) in the altq case as the
   multiqueue consumer does not provide enough protection, serialize altq
   interaction with the main queue lock
 - make drbr_dequeue_cond work with altq

Discussed with:		kmacy, yongari, jfv
MFC after:		4 weeks
2010-02-13 16:04:58 +00:00
Navdeep Parhar
1299e07187 Complain if freelist queue sizes are significantly less than desired.
MFC after:	1 day
2010-01-20 07:28:14 +00:00
Martin Blapp
c2ede4b379 Remove extraneous semicolons, no functional changes.
Submitted by:	Marc Balmer <marc@msys.ch>
MFC after:	1 week
2010-01-07 21:01:37 +00:00
Navdeep Parhar
81af4f18d6 There is no need to log anything for a ctrlq stall or restart. These are
normal events.

Approved by:	gnn (mentor)
MFC after:	1 month
2009-09-09 18:55:18 +00:00
Navdeep Parhar
adb1423aa6 Fix cxgb(4) panic with jumbo frames.
Reviewed by:	kmacy
Approved by:	re (kib), gnn (mentor)
2009-07-09 19:27:58 +00:00
Navdeep Parhar
fd3e790228 mvec routines should have no knowledge of the SG engine.
Reviewed by:	kmacy
Approved by:	gnn (mentor)
2009-06-25 21:50:15 +00:00
Kip Macy
8b6dccee61 fix typo in conditional 2009-06-20 19:09:41 +00:00
Kip Macy
75417d6de3 - fix dma map handling for !x86 case
- fix allocation failure handing in refill_fl
2009-06-20 18:57:14 +00:00