Add a much-simplified handler that works when all offloads are
disabled, except mbuf fast free. When compared against the default
handler, under ideal conditions, cycles per packet drop by 60+%.
The driver tries to use the simple handler first.
The idea of using specialized/simplified handlers is from the Intel
and Mellanox drivers.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
The NIC has one configurable VXLAN port, which is set to the default
4789 upon vNIC reset. Adding a non-default port replaces this single
VXLAN port. Deleting the previously added non-default port restores
the VXLAN port to the hardware default.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Add a new devarg "ig-vlan-rewrite" to allow the user to set
non-default rewrite mode. The UCS VIC may add/remove/modify the VLAN
header of an ingress packet depending on the ingress VLAN rewrite
mode.
By default, the driver sets the pass-through mode, which tells the NIC
"do not touch VLAN header and preserve it as is". This mode is usually
sufficient, but can complicate deployments for certain environments.
For example, OVS-DPDK in UCS blade environments may want to use "untag
default VLAN mode", which removes the VLAN header from an ingress
packet if it matches vNIC's default VLAN.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
The UDP RSS interface has changed in the release firmware for 100G VIC
adapters. The capability bit is now in NIC_CFG. Also the driver is
supposed to use CMD_NIC_CFG_CHK and check if RSS config is
successful. No more changes are expected with respect to UDP RSS API.
Fixes: 94c3518958 ("net/enic: update UDP RSS controls")
Cc: stable@dpdk.org
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Recent NIC models support overlay offload. The overlay offload
feature enables the following on the NIC.
- Rx/Tx checksum offloads for both inner and outer packets.
- Rx inner packet type classification.
- TSO.
- Inner RSS.
TX descriptors do not require any changes, except the header length
for TSO. The NIC parses outer/inner packets and performs offloads on
them as necessary. The header length for tunneled TSO includes both
inner and outer headers.
The NIC actually parses and performs the above for NVGRE as well. DPDK
currently has no offload flags for NVGRE, and the hardware has no
controls to individually enable tunnel types either. So do nothing for
now.
The driver enables overlay offload by default. Add a devargs
'disable-overlay=<0|1>' to allow the app to disable it.
Also update the enic guide doc.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Modified enic_del_mac_address() to get a return value from the vnic layer.
Reused the .mac_addr_add and .mac_addr_del callbacks code to implement
primary mac address handler.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
1330 and 1400 series adapters support the drop action. Check for its
availability and set the necessary flag when creating NIC filters.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Enable rx queue interrupts if the app requests them, and vNIC has
enough interrupt resources. Use interrupt vector 0 for link status and
errors. Use vector 1 for rx queue 0, vector 2 for rx queue 1, and so
on. So, with n rx queues, vNIC needs to have at n + 1 interrupts.
For VIC, enabling and disabling rx queue interrupts are simply
mask/unmask operations. VIC's credit based interrupt moderation is not
used, as the app wants to explicitly control when to enable/disable
interrupts.
This version requires MSI-X (vfio-pci). Sharing one interrupt for link
status and rx queues is possible, but is rather complex and has no
user demands.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Currently, enic completely ignores the requested max Rx packet size
(rxmode.max_rx_pkt_len). The desired behavior is that the NIC hardware
drops packets larger than the requested size, even though they are
still smaller than MTU.
Cisco VIC does not have such a feature. But, we can accomplish a
similar (not same) effect by reducing the size of posted receive
buffers. Packets larger than the posted size get truncated, and the
receive handler drops them. This is also how the kernel enic driver
enforces the Rx side MTU.
This workaround works only when scatter mode is *not* used. When
scatter is used, there is currently no way to support
rxmode.max_rx_pkt_len, as the NIC always receives packets up to MTU.
For posterity, add a copious amount of comments regarding the
hardware's drop/receive behavior with respect to max/current MTU.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Currently, when more than 1 receive queues are configured, the driver
always enables RSS with the driver's own default hash type, key, and
RETA. The user is unable to change any of the RSS settings. Address
this by implementing the ethdev RSS API as follows.
Correctly report the RETA size, key size, and supported hash types
through rte_eth_dev_info.
During dev_configure(), initialize RSS according to the device's
mq_mode and rss_conf. Start with the default RETA, and use the default
key unless a custom key is provided.
Add the RETA and rss_conf query/set handlers to let the user change
RSS settings after the initial configuration. The hardware is able to
change hash type, key, and RETA individually. So, the handlers change
only the affected settings.
Refactor/rename several functions in order to make their intentions
clear. For example, remove all traces of RSS from
enicpmd_vlan_offload_set() as it is confusing.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Like most NICs, this hardware (Cisco VIC) also requires partial
checksum in the packet for checksum offload and TSO. So, add
the tx_pkt_prepare handler like other PMDs do.
Technically, VIC has an offload mode that does not require partial
checksum for non-TSO packets. But, it has no such mode for TSO
packets, making tx_pkt_prepare unavoidable.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
ENIC_CQ_MAX, ENIC_WQ_MAX and others are arbitrary values that
prevent the app from using more queues when they are available on
hardware. Remove them and dynamically allocate vnic_cq and such
arrays to accommodate all available hardware queues.
As a side effect of removing ENIC_CQ_MAX, this commit fixes a segfault
that would happen when the app requests more than 16 CQs, because
enic_set_vnic_res() does not consider ENIC_CQ_MAX. For example, the
following command causes a crash.
testpmd -- --rxq=16 --txq=16
Fixes: ce93d3c36d ("net/enic: fix resource check failures when bonding devices")
Cc: stable@dpdk.org
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
enic is currently using BSD-2-Clause, whereas the DPDK approved
license is BSD-3-Clause. So replace license text with BSD-3-Clause.
Remove LICENSE as it is redundant.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Header splitting has been disabled at least since the following
commit. Remove the remaining code to avoid confusion.
commit 947d860c82 ("enic: improve Rx performance")
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
The stats_get dev op API doesn't include return value, so PMD cannot
return an error in case of failure at stats getting process time.
Since PCI devices can be removed and there is a time between the
physical removal to the RMV interrupt, the user may get invalid stats
without any indication.
This patch changes the stats_get API return value to be int instead of
void.
All the net PMDs stats_get dev ops are adjusted by this patch.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The functions here aren't called anywhere in code, at least according to
both the compiler, and some greps.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: John Daley <johndale@cisco.com>
Flow support for 1300 series adapters with the 'Advanced Filter'
mode enabled via the UCS management interface. This enables:
Attributes: ingress
Items: Outer eth, ipv4, ipv6, udp, sctp, tcp, vxlan. Inner eth, ipv4,
ipv6, udp, tcp.
Actions: queue, and void
Selectors: 'is', 'spec' and 'mask'. 'last' is not supported
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Nelson Escobar <neescoba@cisco.com>
Stub callbacks for the generic flow API and a new FLOW debug define.
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Nelson Escobar <neescoba@cisco.com>
remove __rte_unused instances that are not required.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Some customers find adding MAC addr to VF sometimes can fail,
but it is still stored in dev->data->mac_addrs[ ]. So this
can lead to some errors that assumes the non-zero entry in
dev->data->mac_addrs[ ] is valid.
Following acknowledgements are from specific NIC PMD
maintainer for their managing part.
This patch changes the ethdev internal API, it should not be
backported to a stable/LTS release so far.
Fixes: af75078fec ("first public release")
Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
If a packet send is attempted with a packet larger than the NIC
is capable of processing (9208) it will be dropped with no
completion descriptor returned or completion index update, which
will lead to an mbuf leak and eventual hang.
Drop and count oversized Tx packets in the Tx burst function and
dereference/free the mbuf without sending it to the NIC.
Since the maximum Rx and Tx packet sizes are different on enic
and are now both being used, make the define ENIC_DEFAULT_MAX_PKT_SIZE
be 2 defines, one for Rx and one for Tx.
Fixes: fefed3d1e6 ("enic: new driver")
Cc: stable@dpdk.org
Signed-off-by: John Daley <johndale@cisco.com>
The mac_addr_add callback function was simply replacing the primary MAC
address instead of adding new ones and the mac_addr_remove callback would
only remove the primary MAC form the adapter. Fix the functions to add or
remove new address. Allow up to 64 MAC addresses per port.
Fixes: fefed3d1e6 ("enic: new driver")
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Nelson Escobar <neescoba@cisco.com>
Remove __rte_unused attributes in function declaration when
the parameters really are used.
Fixes: dfbd6a9cb5 ("net/enic: extend flow director support for 1300 series")
Signed-off-by: John Daley <johndale@cisco.com>
The rx_free_thresh was not being initialized and left at 0
on 1/2 of the RQs which could lead to poor multi-queue
performance.
Fixes: 856d7ba7ed ("net/enic: support scattered Rx")
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Nelson Escobar <neescoba@cisco.com>
The function names for converting between RQ indexes known to
the RTE code and internal RQ indexes for primary Start of Packet
(SOP) queues and spill-over (Data) queues was unclear and
confusing.
Clarify with more explicit function names.
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Nelson Escobar <neescoba@cisco.com>
The incorrect completion queue corresponding to an RQ would be
freed if multiple Rx queues are in use and the MTU is changed,
or an Rx queue is released. This could lead to a segmentation fault
when the device is disabled or even in the Rx or Tx paths.
The index of the completion queue corresponding to a RQ needed
to be adjusted after Rx scatter was introduced.
Fixes: 856d7ba7ed ("net/enic: support scattered Rx")
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Nelson Escobar <neescoba@cisco.com>
1300 series Cisco adapter firmware version 2.0(13) for UCS
C-series servers and 3.1(2) for blade servers supports more
filtering capabilities. The feature can be enabled via Cisco
CIMC or USCM with the 'advanced filters' radio button. When
enabled, the these additional flow director modes are available:
RTE_ETH_FLOW_NONFRAG_IPV4_OTHER
RTE_ETH_FLOW_NONFRAG_IPV4_SCTP
RTE_ETH_FLOW_NONFRAG_IPV6_UDP
RTE_ETH_FLOW_NONFRAG_IPV6_TCP
RTE_ETH_FLOW_NONFRAG_IPV6_SCTP
RTE_ETH_FLOW_NONFRAG_IPV6_OTHER
Changes:
- Detect and set an 'advanced filters' flag dependent on the adapter
capability.
- Implement RTE_ETH_FILTER_INFO filter op to return the flow types
available dependent on whether advanced filters are enabled.
- Use a function pointer to select how filters are added to the adapter:
copy_fltr_v1() for older firmware/adapters or copy_fltr_v2() for
adapters which support advanced filters.
- Apply fdir global masks to filters when in advanced filter mode.
- Update documentation.
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Nelson Escobar <neescoba@cisco.com>
Re-initialize Rq's when MTU is changed. This allows for more
efficient use of mbufs when moving from an MTU that is greater
than the mbuf size to one that is less. Also move to using Rx
scatter mode when moving from an MTU less than the mbuf size
to one that is greater.
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Move link check code to a new function so that it can be reused
by the interrupt handler.
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
The enic PMD code has diverged from code that was once
shared with the enic kernel mode driver for performance
reasons. It is confusing and misleading to print the
internal version number. Remove it.
Fixes: fefed3d1e6 ("enic: new driver")
Signed-off-by: John Daley <johndale@cisco.com>
The enic PMD was using the same variables in the enic structure to
track two different things. Initially rq_count, wq_count, cq_count,
and intr_count were set to the values obtained from the VIC adapters
as the maximum resources allocated on the VIC, then in
enic_set_vnic_res(), they were set to the counts of resources actually
used, discarding the initial values. The checks in enic_set_vnic_res()
were technically incorrect if it is called more than once on a port,
which happens when using bonding, but were harmless in practice as the
checks couldn't fail on the second call.
The enic rx-scatter patch misunderstood the subtleties of
enic_set_vnic_res(), and naively added a multiply by two to the
rq_count check. This resulted in the rq_count check failing when
enic_set_vnic_res() was called a second time, ie when using bonding.
This patch adds new variables to the enic structure to track the
maximum resources the VIC is configured to provide so that the
information isn't later lost and calls to enic_set_vnic_res() do
the expected thing.
Fixes: 856d7ba7ed ("net/enic: support scattered Rx")
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
The macro ENIC_ASSERT does the same thing as RTE_ASSERT,
thus it can be removed.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: John Daley <johndale@cisco.com>
The Rx scatter patch failed to make a few changes and resulted in
problems when using multiple receive queues (RQs) in DPDK (ie RSS)
since the wrong adapter resources were being used.
- get and use the correct completion queue index associated with a
receive queue.
- set the correct receive queue index when using RSS
Fixes: 856d7ba7ed ("net/enic: support scattered Rx")
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Provide an update MTU callback. The function returns -ENOTSUP
if Rx scatter is enabled. Updating the MTU to be greater than
the value configured via the Cisco CIMC/UCSM management interface
is allowed provided it is still less than the maximum egress packet
size allowed by the NIC minus the size of the L2 header.
Signed-off-by: John Daley <johndale@cisco.com>
Pull in common VNIC code which enables querying for max egress
packet size with newer firmware via a device command. If the
field is non-zero, it is the max egress packet size. If it is
0, the default value (9022) can safely be assumed. The value
for 1300 series VICS using firmware versions >= 3.1.2 for blade
series and >= 2.0.13 for rack series servers is 9208.
Tx buffers can be emitted only if they are less than the max egress
packet size regardless of the MTU setting (the MTU is advisory).
The max egress packet size can used to determine the upper limit
of the MTU since the enic can also receive packets of size greater
than max egress packet size. A max_mtu variable is added with
a value of max egress packet size minus L2 header size.
The default MTU is set via the CIMC/UCSM management interface and
currently allows value up to 9000. If the value is changed, the
host must be reboot. To avoid the reboot and allow MTU values
up to the max capability of the NIC, MTU update capability will
be added with a max value capped by max_mtu.
Signed-off-by: John Daley <johndale@cisco.com>
enic_alloc_consistent() allocated memory, but enic_free_consistent()
was an empty function, so allocated memory was never freed.
This commit adds a list and lock to the enic structure to keep track
of the memzones allocated in enic_alloc_consistent(), and
enic_free_consistent() uses that information to properly free memory.
Fixes: fefed3d1e6 ("enic: new driver")
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
For performance reasons, this patch uses 2 VIC RQs per RQ presented to
DPDK.
The VIC requires that each descriptor be marked as either a start of
packet (SOP) descriptor or a non-SOP descriptor. A one RQ solution
requires skipping descriptors when receiving small packets and results
in bad performance when receiving many small packets.
The 2 RQ solution makes use of the VIC feature that allows a receive
on primary queue to 'spill over' into another queue if the receive is
too large to fit in the buffer assigned to the descriptor on the
primary queue. This means that there is no skipping of descriptors
when receiving small packets and results in much better performance.
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Private/conflicting ol_flags where used to enable UDP/TCP Tx
offloads. Use the common flags in PKT_TX_L4_MASK to support them.
When updating flags, also do some minor code rearranging for
slightly better performane.
Fixes: fefed3d1e6 ("enic: new driver")
Signed-off-by: John Daley <johndale@cisco.com>
Add an ASSERT macro for the enic driver which is enabled when the log
level is >= RTE_LOG_DEBUG. Assert that number of mbufs to return to
the pool in the Tx function is never greater than the max allowed.
Signed-off-by: John Daley <johndale@cisco.com>
The list of mbufs held by the driver on Tx was allocated in chunks
(a hold-over from the enic kernel mode driver). The structure used
next pointers across chunks which led to cache misses.
Allocate the array used to hold mbufs in flight on Tx with
rte_zmalloc_socket(). Remove unnecessary fields from the structure
and use head and tail pointers instead of next pointers.
Signed-off-by: John Daley <johndale@cisco.com>
Functions existed which were never called. Removed them. Also
rename the 'pmd' from the name of the Tx function to improve clarity.
Signed-off-by: John Daley <johndale@cisco.com>
The Tx functions were in enic_ethdev.c and enic_main.c - files in which
they did not logically belong. To make things consistent with most
other drivers, we therefore extract them and place them with the equivalent
Rx functions into a file called enic_rxtx.c.
Signed-off-by: John Daley <johndale@cisco.com>
Truncated packets occur on enic if an mbuf is not big enough to
receive it or there aren't enough mbufs if rx scatter is in use.
They show up as error packets but unlike other error packets (like
packets bad FCS) there are no nic drop counts incremented for them.
Truncated packets are calculated by subtracting hardware errors from
software errors. Note: this causes transient inaccuracies in the
ipackets count. Also, the length of truncated packets are counted
in ibytes even though truncated packets are dropped which can make
ibytes be slightly higher than it should be.
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
rx_no_bufs is a hardware counter of packets dropped on the
interface due to no host buffers and should be used to update
r_stats->imissed counter instead of rx_nombuf.
Include rx_drop in ierrors. rx_drop is incremented if packets
arrive when the receive queue is disabled.
Add a structure and functions for initializing and clearing
software counters. Add count of Rx mbuf allocation failures
(rx_nombuf) as the first counter.
Fixes: fefed3d1e6 ("enic: new driver")
Signed-off-by: John Daley <johndale@cisco.com>
The macro RTE_VERIFY always checks a condition.
It is optimized with "unlikely" hint.
While this macro is well suited for test applications, it is preferred
in libraries and examples to enable such check in debug mode.
That's why the macro RTE_ASSERT is introduced to call RTE_VERIFY only
if built with debug logs enabled.
A lot of assert macros were duplicated and enabled with a specific flag.
Removing these #ifdef allows to test these code branches more easily
and avoid dead code pitfalls.
The ENA_ASSERT is kept (in debug mode only) because it has more
parameters to log.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This is a wholesale replacement of the Enic PMD receive path in order
to improve performance and code clarity. The changes are:
- Simplify and reduce code path length of receive function.
- Put most of the fast-path receive functions in one file.
- Reduce the number of posted_index updates (pay attention to
rx_free_thresh)
- Remove the unneeded container structure around the RQ mbuf ring
- Prefetch next Mbuf and descriptors while processing the current one
- Use a lookup table for converting CQ flags to mbuf flags.
Signed-off-by: John Daley <johndale@cisco.com>