The v2.1.0 is refactoring Tx and Rx paths, including few bug fixes and
is also adding a new features which are going to be available with the
newest hardware.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
Some ENA devices can pass to the driver descriptor with length 0. To
avoid extra allocation, the descriptor can be reused by simply putting
it back to the device.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
The original Tx function was very long and was containing both cleanup
and the sending sections. Because of that it was having a lot of local
variables, big indentation and was hard to read.
This function was split into 2 sections:
* Sending - which is responsible for preparing the mbuf, mapping it
to the device descriptors and finally, sending packet to the HW
* Cleanup - which is releasing packets sent by the HW. Loop which was
releasing packets was reworked a bit, to make intention more visible
and aligned with other parts of the driver.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
To improve code readability, abstraction was added for operating on IO
rings indexes.
Driver was defining local variable for ring mask in each function that
needed to operate on the ring indexes. Now it is being stored in the
ring as this value won't change unless size of the ring will change and
macros for advancing indexes using the mask has been added.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
Divider used for both Tx and Rx cleanup/refill threshold can cause too
big delay in case of the really big rings - for example if the 8k Rx
ring will be used, the refill won't trigger unless 1024 threshold will
be reached. It will also cause driver to try to allocate that much
descriptors.
Limiting it by fixed value - 256 in that case, would limit maximum
time spent in repopulate function.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
ena_com API should be preferred for getting number of used/available
descriptors unless extra calculation needs to be performed.
Some helper variables were added for storing values that are later
reused. Moreover, for limiting the value of sent/received packets to
the number of available descriptors, the RTE_MIN is used instead of
if function, which was doing similar thing but was less descriptive.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
* Split main Rx function into multiple ones - the body of the main
was very big and further there were 2 nested loops, which were
making the code hard to read
* Rework how the Rx mbuf chains are being created - Instead of having
while loop which has conditional check if it's first segment, handle
this segment outside the loop and if more fragments are existing,
process them inside.
* Initialize Rx mbuf using simple function - it's the common thing for
the 1st and next segments.
* Create structure for Rx buffer to align it with Tx path, other ENA
drivers and to make the variable name more descriptive - on DPDK, Rx
buffer must hold only mbuf, so initially array of mbufs was used as
the buffers. However, it was misleading, as it was named
"rx_buffer_info". To make it more clear, the structure holding mbuf
pointer was added and now there is possibility to expand it in the
future without reworking the driver.
* Remove redundant variables and conditional checks.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
In the LLQ (Low-latency queue) mode, the device can indicate that meta
data descriptor caching is disabled. In that case the driver should send
valid meta descriptor on every Tx packet.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
ENA device can report in the AENQ handler amount of Tx packets that were
dropped and not sent.
This statistic is showing global value for the device and because
rte_eth_stats is missing field that could indicate this value (it
isn't the Tx error), it is being presented as a extended statistic.
As the current design of extended statistics prevents tx_drops from
being an atomic variable and both tx_drops and rx_drops are only updated
from the AENQ handler, both were set as non-atomic for the alignment.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
The doorbell code is already issuing the doorbell by using rte_write.
Because of that, there is no need to do that before calling the
function.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
Default LLQ (Low-latency queue) maximum header size is 96 bytes and can
be too small for some types of packets - like IPv6 packets with multiple
extension. This can be fixed, by using large LLQ headers.
If the device supports larger LLQ headers, the user can activate them by
using device argument 'large_llq_hdr' with value '1'.
If the device isn't supporting this feature, the default value (96B)
will be used.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
Reading values from the device is about the maximum capabilities of the
device. Because of that, the names of the fields storing those values,
functions and temporary variables, should be more descriptive in order
to improve self documentation of the code.
In connection with this, the way of getting maximum queue size could be
simplified - no hardcoded values are needed, as the device is going to
send it's capabilities anyway.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
IO rings were configured with the maximum allowed size for the Tx/Rx
rings. However, the application could decide to create smaller rings.
This patch is using value stored in the ring instead of the value from
the adapter which is indicating the maximum allowed value.
Fixes: df238f84c0 ("net/ena: recreate HW IO rings on start and stop")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
The current ena_com version was generated on 25.09.2019.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
As the alignment of the defines wasn't valid, it was removed at all, so
instead of using multiple spaces or tabs, the single space after define
name is being used.
Fixes: 99ecfbf845 ("ena: import communication layer")
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
Because ena_com is being used by multiple platforms which are using
different C versions, PRIu64 cannot be used directly and must be defined
in the platform file.
Fixes: b2b02edeb0 ("net/ena/base: upgrade HAL for new HW features")
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
ENA device is using 48-bit memory for IO. Because of that, the upper
limit had to be updated.
From the driver perspective, it's just a cosmetic change to make
definition of the structure 'ena_common_mem_addr' more descriptive and
the address value was verified anyway for the valid range in the
function 'ena_com_mem_addr_set()'.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
To make the debugging easier, the error logs were added in the Tx path.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
The spaces instead of tabs were used for the indent.
Fixes: 3adcba9a89 ("net/ena: update HAL to the newer version")
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
The documentation format was aligned and few typos were fixed.
Fixes: 99ecfbf845 ("ena: import communication layer")
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
In order to use the accelerated LLQ (Low-lateny queue) mode, the driver
must limit the Tx burst and be aware that the device has the meta
caching disabled. In that situation, the meta descriptor must be valid
on each Tx packet.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
This buffer was never used by the ENA PMD. It could be used for
debugging, but it's presence is redundant now.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
This feature allows for adaptive interrupt moderation. It's not used by
the DPDK PMD, but is a part of the newest HAL version.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
After the indirection table is being saved in the device, there is no
need to convert it back, as it's already saved in host_rss_ind_tbl
array.
As a result, the call to the ena_com_ind_tbl_convert_from_device() is
not needed.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
There was a bug in ena_com_fill_hash_function(), which was causing bit to
be shifted left one bit too much.
To fix that, the ENA_FFS macro is being used (returning the location of
the first bit set), hash_function value is being subtracted by 1 if any
hash function is supported by the device and BIT macro is used for
shifting for better verbosity.
Fixes: 99ecfbf845 ("ena: import communication layer")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Although the RSS key still cannot be set, it is now being generated
every time the driver is being initialized.
Multiple devices can still have the same key if they're used by the same
driver.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
rte_memzone_reserve() will reserve the biggest contiguous memzone
available if received 0 as size param.
Fixes: 9ba7981ec9 ("ena: add communication layer for DPDK")
Cc: stable@dpdk.org
Signed-off-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
Memory allocation region id could possibly be non-unique
due to non-atomic increment, causing allocation failure.
Fixes: 9ba7981ec9 ("ena: add communication layer for DPDK")
Cc: stable@dpdk.org
Signed-off-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
Some of the ENA devices can't handle buffers which are smaller than a
1400B. Because of this limitation, size of the buffer is being checked
and limited during the Rx queue setup.
If it's below the allowed value, PMD won't finish it's configuration
successfully..
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
Update a switch rule' action from "to VSI" to "to VSI List"
should only happen when the same rule has been programmed with
a different fwd destination. This is already handled by below
code block:
m_entry = ice_find_adv_rule_entry(...)
if (m_entry) {
...
ice_adv_add_update_vsi_list(...)
}
The following ice_update_pkt_fwd_rule is unnecessary and should be
removed due to:
1) If a switch rule's action is still to VSI, which means, it is
the first time be issued, we don't need to update it "to VSI
List."
2) Actually the implementation does not match the comment, it still
update the rule with "to VSI" action.
Fixes: fed0c5ca5f ("net/ice/base: support programming a new switch recipe")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Xiaolong Ye <xiaolong.ye@intel.com>
This patch is intended to add iavf_dev_reset ops, enable iavf to support
"port reset all".
Signed-off-by: Lunyuan Cui <lunyuanx.cui@intel.com>
Tested-by: Zhaoyan Chen <zhaoyan.chen@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Xiaolong Ye <xiaolong.ye@intel.com>
Enable source MAC address and destination MAC address as FDIR's
input set for ipv4-other, ipv4-udp and ipv4-tcp. When OVS-DPDK is
working as a pure L2 switch, enable MAC address as FDIR input set
with Mark+RSS action would help the performance speed up. And FVL
FDIR supports to change input set with MAC address.
Signed-off-by: Lunyuan Cui <lunyuanx.cui@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
This patch moved the RSS initialization from dev start to dev configure
to fix RSS advanced rule invalid issue after running port stop and port
start.
Fixes: 5ad3db8d4b ("net/ice: enable advanced RSS")
Cc: stable@dpdk.org
Signed-off-by: Junyu Jiang <junyux.jiang@intel.com>
Tested-by: Zhiwei He <zhiwei.he@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
When nfp_pf_create_dev() is cleaning up, it does not correctly set
the dev_private variable to NULL, which will lead to a double free.
Fixes: ef28aa96e5 ("net/nfp: support multiprocess")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Heinrich Kuhn <heinrich.kuhn@netronome.com>
Changing format specifier for the 'size_t' as '%z' and for 'off_t' as
'%jd'.
Also this fix enables compiling PMD for 32bit architecture.
Fixes: 29a62d1476 ("net/nfp: add CPP bridge as service")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Heinrich Kuhn <heinrich.kuhn@netronome.com>
Since the ring buffer with host is shared for both transmit
completions and receive packets, it is possible that transmitter
could get starved if receive ring gets full.
Better to process all outstanding events which frees up transmit
buffer slots, even if means dropping some packets.
Fixes: 7e6c824307 ("net/netvsc: avoid over filling Rx descriptor ring")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The transmit need signal function can avoid an unnecessary
dereference by passing the right pointer. This also makes
code better match FreeBSD driver.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
If tx_free_thresh is quite low, it is possible that we need to
cleanup based on burst size.
Fixes: fc30efe3a2 ("net/netvsc: change Rx descriptor setup and sizing")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Remove unlocked check for data in receive ring.
This check is not safe because of missing barriers etc.
Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The netvsc PMD was putting the mac address in private data but the
core rte_ethdev doesn't allow that it. It has to be in rte_malloc'd
memory or a message will be printed on shutdown/close.
EAL: Invalid memory
Fixes: f8279f47dd ("net/netvsc: fix crash in secondary process")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The VMBus has reserved transmit area (per device) and transmit
descriptors (per queue). The previous code was always having a 1:1
mapping between send buffers and descriptors.
This can lead to one queue starving another and also buffer bloat.
Change to working more like FreeBSD where there is a pool of transmit
descriptors per queue. If send buffer is not available then no
aggregation happens but the queue can still drain.
Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
It is possible for a packet to arrive during the configuration
process when setting up multiple queue mode. This would cause
configure to fail; fix by just ignoring receive packets while
waiting for control commands.
Use the receive ring lock to avoid possible races between
oddly behaved applications doing rx_burst and control operations
concurrently.
Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
If application cares about descriptor limits, the netvsc device
values should reflect those of the VF as well.
Fixes: dc7680e859 ("net/netvsc: support integrated VF")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
After VF reset, VF's VSI number may be changed,
the switch rule which forwards packet to the old
VSI number should be redirected to the new VSI
number.
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Enable flow redirect on switch, currently only
support VSI redirect.
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
The input set for inner type of vlan item should
be ICE_INSET_ETHERTYPE, not ICE_INSET_VLAN_OUTER.
This mac vlan filter is also part of DCF switch filter.
Fixes: 47d460d632 ("net/ice: rework switch filter")
Cc: stable@dpdk.org
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
This patch add switch filter permission stage support
for more flow pattern in pf only pipeline mode.
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
This patch add switch filter support for IPv6 NAT-T packets,
it enable switch filter to direct IPv6 packets with
NAT-T payload to specific action.
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
This patch add switch filter support for PFCP packets,
it enable switch filter to direct IPv4/IPv6 packets with
PFCP session or node payload to specific action.
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>