Also return error if TSO is requested without Tx checksum offload.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D2564
It just affects capabilities of the created VLAN interface.
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D2563
If TxQ lock is obtained, deferred packet list shold be serviced even if
the packet addition fails because of overflow.
Without the patch freeze happens if:
- queue is not blocked (i.e. completion does not trigger unblock and service)
- put-list overflow (1024 entries)
- sfxge_tx_packet_add() acquires TxQ lock just as it is released it in
sfxge_tx_qdpl_service() on the second CPU but before pending check
- sfxge_tx_packet_add() swizzles put-list to get-list, fails because of
non-tcp get-list overflow and returns without packet list service
- sfxge_tx_qdpl_service() on the second CPU checks that there are no
pending packets in the put-list and returns
Other possible solution is to guaranee that maximum length of the put-list
is less than maximum length of any get-list.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D2562
The driver uses ifm_data to save capabilities mask calculated during
initialization when supported phy modes are discovered.
The patch simply calculates it when either media or options are changed.
Reviewed by: glebius
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D2540
It is required for the next patch which adds dependency of TSO
capabilities from Tx checksum offloads.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D2553
It is a preparation to the next patch which will service packet queue even
if packet addtion fails.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D2552
Now each branch has one and only one possible TxQ lock state.
It simplifies understanding of the code.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D2551
We can't disable it in HW, but we can ignore result.
Discard Rx descriptor checksum flags if Rx checksum offload is off.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D2544
It simplifies understanding of the sfxge_tx_packet_add() logic and
avoids passing of 'locked' to called function.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D2547
It is simply not required since the kernel checks corresponding
IFCAP_TSOx capability and CSUM_TSO in hw-assisted offloads.
Note that CSUM_TSO is two bits (CSUM_IP_TSO|CSUM_IP6_TSO) and both bits
are set in IPv4 and IPv6 mbufs.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D2546
Also it is cheaper to check Rx descriptor flags than TCP protocol in IP
header.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D2542
Tx checksum offload may be enabled/disabled.
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D2543
Split IFCAP_HWCSUM to IFCAP_RXCSUM and IFCAP_TXCSUM to highlight Tx and Rx.
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D2541
This is a temporary workaround until we determine a reliable sequence
of operations for detecting MC reboots.
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D2084
bus_space_*_8() are not always macros, so it is not correct to use
#ifndef.
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D2083
Tx multi queue is added in FreeBSD 8.0. So, the changeset drops earlier
versions support.
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D2081
In theory the barriers are required to cope with write combining and
reordering. Two barriers are added (sometimes merged to one):
1. Before the first write to guarantee that previous writes to the region
have been done
2. Before the last write to guarantee that write to the last dword/qword is
done after previous writes
Barriers are inserted before in the assumption that it is better to
postpone barriers as much as it is possible (more chances that the
operation has already been already done and barrier does not stall CPU).
On x86 and amd64 bus space write barriers are just compiler memory barriers
which are definitely required.
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D2077
ioctl to put interface down sets ifp->if_flags which holds the intended
administratively defined state and calls driver callback to apply it.
When everything is done, driver updates internal copy of
interface flags sc->if_flags which holds the operational state.
So, transmit from Rx path is possible when interface is intended to be
administratively down in accordance with ifp->if_flags, but not applied
yet and the operational state is up in accordance with sc->if_flags.
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D2075
Drops are observed under multi-stream TCP traffic due to put-list
overflow with limit equal to 64.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Transmit may be called when TxQ is not started yet (i.e. txq->common is
invalid). TxQ state is checked below when mbuf is processed and dropped
if TxQ is not started.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
The information is required for NIC update and config tools.
Submitted by: Artem V. Andreev <Artem.Andreev at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
It is done to structure sysctl and do not mix with Tx queue statistics
to be added.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
With the patch applied the number of instruction events is 1% less and
number of mispredicted branch events is 5% less under multistream TCP
traffic load close to line rate.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
To delay packets from a particular TX queue by a particular time, write a value
into the TX Pace table s.t. pace time <= TX Pace Clock Period * (2 ^ pace value)
- the TX pace clock is 1/13 of the system clock, so its period should be 104 or
52 ns depending on whether turbo mode is active.
EFX_TX_PACE_CLOCK_BASE added by me.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
It solves locking problem when EFSYS_MEM_ALLOC is called in
the context holding a mutex (not allowed to sleep).
E.g. on interface bring up or multicast addresses addition.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
It is one more place missed in the previous fix.
Most likely is was just memory leak on the error handling path since
typically efsys_mem_t is filled in by zeros on allocation.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Host-bus byte order translation is not requred.
Submitted by: Artem V. Andreev <Artem.Andreev at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Use remaining number of DMA segment instead of maximum number in mapping
when checking space for one more TSO segment packet.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Sync endif comment with conditional.
BOOTROM and SIENA_BOOTROM are the same, but highlight that it is Siena.
Restore commented out assertion.
Sync comments with out-of-tree driver.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Before the common code had hard coded limits on the IDs RXQs and TXQs could
be created with which were suited for the Windows driver with VMQ, and so
would prevent queues with IDs greater than or equal to 259 (for TXQs) or 768
(for RXQs) from being created. This change allows the limits to be set in
efsys.h, so that all 1024 queues can be created during new manftest tests.
Also, the descriptor cache sizes were also hard coded to values suited to
the smaller queue counts, and so it was necessary to make them configurable
as well.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Caught when efx_filter_init() failed and called efx_filter_fini() in the
teardown path.
Submitted by: Andrew Lee <alee at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Previously the driver's view was the expected outcome of any
reconfiguration even if that reconfiguration failed.
Submitted by: Ben Horgan
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
We must not enable RX queues with random parameters when they are
mapped into a VF with an untrusted driver. It's probably not a good
idea to do this anyway, so take this bit out of the table test masks.
Submitted by: Ben Hutchings
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Otherwise when processing finally comes to efx_tx_qdesc_post() it could
be insufficient space between reaped and added to post pending
descriptors.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
efx_ev_mcdi() does not assert or check that all event handlers it
calls are non-null. Add assertions at the top for all required
event handlers, as some events (in the case of this bug, monitor
events) are rare.
Submitted by: Ben Hutchings
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Use nitem() to get number of array elements.
Remove unused define.
Use TAB to indent.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
TCP header is fragmented in the case of VLAN tagged IPv6 traffic without
HW VLAN tagging.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Allocate the minimum or maximum response length for GET_BOARD_CFG as
appropriate. When looking up firmware subtypes by partition ID,
check the ID against the actual response length.
Merge of the patch made by Ben Hutchings in 2011.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Mainly to unify with similar member for transmit and receive queues.
It will be used in the future for resources allocation processing.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
It avoids access to m_pkthdr when TSO packet is started and also makes
tso_start_new_packet() function smaller.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Now compiler does not need any help.
The patch does not change generated code.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor), glebius
It makes sfxge_tso_state smaller and even makes tso_start_new_packet()
few bytes smaller. Data used to calculate packet size are used nearby,
so it should be no problems with cache etc.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor), glebius
Lock name should include interface name.
Tx queue and event queue lock name should include queue number.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Allow access to statistics data not only from sysctl handlers.
Submitted by: Boris Misenov <Boris.Misenov at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Add separate software Tx queue limit for non-TCP traffic to make total
limit higher and avoid local drops of TCP packets because of no
backpressure.
There is no point to make non-TCP limit high since without backpressure
UDP stream easily overflows any sensible limit.
Split early drops statistics since it is better to have separate counter
for each drop reason to make it unabmiguous.
Add software Tx queue high watermark. The information is very useful to
understand how big queues grow under traffic load.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Most likely is was just memory leak on the error handling path since
typically efsys_mem_t is filled in by zeros on allocation.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
sfxge_dma_alloc() calls bus_dmamem_alloc() with BUS_DMA_ZERO flag, so
allocated memory is already filled in by zeros
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Remove the first member alignment to cacheline since it is nop.
Use __aligned() for the whole structure to make sure that the structure
size is cacheline aligned.
Remove lock alignment to make the structure smaller and fit all members
used on event queue processing into one cacheline (128 bytes) on x86-64.
The lock is obtained as well from different context when event queue
statistics are retrived from sysctl context, but it is infrequent.
Reorder members to avoid padding and go in usage order on event
processing.
As the result all structure members used on event queue processing fit
into exactly one cacheline (128 byte) now.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
In fact the pointer is used only if more than one TXQ is processed in
one interrupt.
It is used (read-write) on completion path only.
Also it makes the first part of the structure smaller and it fits now
into one 128byte cache line. So, TXQ structure becomes 128 bytes smaller.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.
This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.
"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.
Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.
MFC after: 1 month
Sponsored by: Mellanox Technologies
Also increase default for Tx queue get-list limit.
Too small limit results in TCP packets drops especiall when many
streams are running simultaneously.
Put list may be kept small enough since it is just a temporary
location if transmit function can't get Tx queue lock.
Submitted by: Andrew Rybchenko <arybchenko at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Required size of event queue is calculated now.
Submitted by: Andrew Rybchenko <arybchenko at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Remove trailing whitespaces and tabs.
Enclose value in return statements in parentheses.
Use tabs after #define.
Do not skip comparison with 0/NULL in boolean expressions.
Submitted by: Andrew Rybchenko <arybchenko at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
The MAC filter set may be called without softc_lock held in the case of
SIOCADDMULTI and SIOCDELMULTI ioctls. The ioctl handler checks IFF_DRV_RUNNING
flag which implies port started, but it is not guaranteed to remain.
softc_lock shared lock can't be held in the case of these ioctls processing,
since it results in failure where kernel complains that non-sleepable
lock is held in sleeping thread.
Both problems are repeatable on LAG with LACP proto bring up.
Submitted by: Andrew Rybchenko <Andrew.Rybchenko at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 weeks
this set of patches fixes support for systems with > 32 cores.
Details include
sfxge: RXQ index (not label) comes from FW in flush done/failed events
Change the second argument name of the efx_rxq_flush_done_ev_t and
efx_rxq_flush_failed_ev_t prototypes to highlight that RXQ index (not label)
comes from FW in flush done and failed events.
sfxge: TXQ index (not label) comes from FW in flush done event
Change the second argument name of the efx_txq_flush_done_ev_t prototype to
highlight that TXQ index (not label) comes from FW in flush done event.
sfxge: use TXQ type as label to support more than 32 TXQs
There are 3 TXQs in event queue 0 and 1 TXQ (with TCP/UDP checksum offload)
in all other event queues.
Submitted by: Andrew Rybchenko <Andrew.Rybchenko at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
Previous implementation limits put queue size only (when Tx lock can't
be acquired), but get queue may grow unboundedly which results in mbuf
pools exhaustion and latency growth.
Submitted by: Andrew Rybchenko <Andrew.Rybchenko at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
mbuf should be owned by if_transmit function in any case.
Submitted-by: Andrew Rybchenko <Andrew.Rybchenko at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
option, unbreak the lock tracing release semantic by embedding
calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
version of the releasing functions for mutex, rwlock and sxlock.
Failing to do so skips the lockstat_probe_func invokation for
unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
kernel compiled without lock debugging options, potentially every
consumer must be compiled including opt_kdtrace.h.
Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
is linked there and it is only used as a compile-time stub [0].
[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested. As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while. Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].
Sponsored by: EMC / Isilon storage division
Discussed with: rstone
[0] Reported by: rstone
[1] Discussed with: philip
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
functions instead of using the IF_ADDR_LOCK directly. The wrapper
functions are the supported interface for device drivers.
Reviewed by: bz, philip
MFC after: 1 week
one. Interestingly, these are actually the default for quite some time
(bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
since r52045) but even recently added device drivers do this unnecessarily.
Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
Discussed with: jhb
- Also while at it, use __FBSDID.
It's not currently used; it didn't build on 32-bit and the previous build fix
is incorrect. If we really implement self-tests we can do this again
properly.
Submitted by: Ben Hutchings <bwh -at- solarflare.com>
MFC after: 3 weeks
This field is supposed to be set to the interface bit rate, but for some
reason I thought it was denominated in kilobits. Multiply the values up
accordingly, taking care to saturate rather than overflow on 32-bit
architectures.
Submitted by: Ben Hutchings <bwh -at- solarflare.com>
MFC after: 3 weeks
based on Solarflare SFC9000 family controllers. The driver supports jumbo
frames, transmit/receive checksum offload, TCP Segmentation Offload (TSO),
Large Receive Offload (LRO), VLAN checksum offload, VLAN TSO, and Receive Side
Scaling (RSS) using MSI-X interrupts.
This work was sponsored by Solarflare Communications, Inc.
My sincere thanks to Ben Hutchings for doing a lot of the hard work!
Sponsored by: Solarflare Communications, Inc.
MFC after: 3 weeks