Sketching algorithm provide high-fidelity approximate measurements and
appears as a promising alternative to traditional approaches such as
packet sampling.
NitroSketch [1] is a software sketching framework that optimizes
performance, provides accuracy guarantees, and supports a variety of
sketches.
This commit adds a new data structure called sketch into
membership library. This new data structure is an efficient
way to profile the traffic for heavy hitters. Also use min-heap
structure to maintain the top-k flow keys.
[1] Zaoxing Liu, Ran Ben-Basat, Gil Einziger, Yaron Kassner, Vladimir
Braverman, Roy Friedman, Vyas Sekar, "NitroSketch: Robust and General
Sketch-based Monitoring in Software Switches", in ACM SIGCOMM 2019.
https://dl.acm.org/doi/pdf/10.1145/3341302.3342076
Signed-off-by: Alan Liu <zaoxingliu@gmail.com>
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Signed-off-by: Leyi Rong <leyi.rong@intel.com>
Tested-by: Yu Jiang <yux.jiang@intel.com>
Add support for protocol based buffer split in normal Rx
data paths. When the Rx queue is configured with specific protocol type,
packets received will be directly split into protocol header and
payload parts. And the two parts will be put into different mempools.
Currently, protocol based buffer split is not supported in vectorized
paths.
A new API ice_buffer_split_supported_hdr_ptypes_get() has been
introduced, it will return the supported header protocols of ice PMD
to app for splitting.
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Add command line parameter:
--rxhdrs=eth[,ipv4]
Set the protocol_hdr of segments to scatter packets on receiving if
split feature is engaged. And the queues with BUFFER_SPLIT flag.
Add interactive mode command:
testpmd>set rxhdrs eth,ipv4,ipv4-udp
(protocol sequence should be valid)
The protocol split feature is off by default. To enable protocol split,
you need:
1. Start testpmd with multiple mempools. E.g. --mbuf-size=2048,2048
2. Configure Rx queue with rx_offload buffer split on.
3. Set the protocol type of buffer split. E.g. set rxhdrs eth,eth-ipv4
(default protocols of testpmd : eth|ipv4|ipv6|ipv4-tcp|ipv6-tcp|
ipv4-udp|ipv6-udp|ipv4-sctp|ipv6-sctp|grenat|inner-eth|
inner-ipv4|inner-ipv6|inner-ipv4-tcp|inner-ipv6-tcp|
inner-ipv4-udp|inner-ipv6-udp|inner-ipv4-sctp|inner-ipv6-sctp)
Above protocols can be configured in testpmd. But the configuration can
only be applied when it is supported by specific pmd.
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.
However, length based buffer split is not suitable for NICs that do split
based on protocol headers. Given an arbitrarily variable length in Rx
packet segment, it is almost impossible to pass a fixed protocol header to
driver. Besides, the existence of tunneling results in the composition of
a packet is various, which makes the situation even worse.
This patch extends current buffer split to support protocol header based
buffer split. A new proto_hdr field is introduced in the reserved field
of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
field defines the split position of packet, splitting will always happen
after the protocol header defined in the Rx packet segment. When Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
protocol header is configured, driver will split the ingress packets into
multiple segments.
Examples for proto_hdr field defines:
To split after ETH-IPV4-UDP, it should be defined as
proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
RTE_PTYPE_L4_UDP
For inner ETH-IPV4-UDP, it should be defined as
proto_hdr = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP
If the protocol header is repeated with the previously defined one,
the repeated part should be omitted. For example, split after ETH, ETH-IPV4
and ETH-IPV4-UDP, it should be defined as
proto_hdr0 = RTE_PTYPE_L2_ETHER
proto_hdr1 = RTE_PTYPE_L3_IPV4_EXT_UNKNOWN
proto_hdr2 = RTE_PTYPE_L4_UDP
If protocol header split can be supported by a PMD, the
rte_eth_buffer_split_get_supported_hdr_ptypes function can
be used to obtain a list of these protocol headers.
For example, let's suppose we configured the Rx queue with the
following segments:
seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
off0=2B
seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
seg2 - pool2, proto_hdr2=0, off1=0B
The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
following:
seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
seg1 - udp header @ 128 in mbuf from pool1
seg2 - payload @ 0 in mbuf from pool2
Now buffer split can be configured in two modes. User can choose length
or protocol header to configure buffer split according to NIC's
capability. For length based buffer split, the mp, length, offset field
in Rx packet segment should be configured, while the proto_hdr field
must be 0. For protocol header based buffer split, the mp, offset,
proto_hdr field in Rx packet segment should be configured, while the
length field must be 0.
Note: When protocol header split is enabled, NIC may receive packets
which do not match all the protocol headers within the Rx segments.
At this point, NIC will have two possible split behaviors according to
matching results, one is exact match, another is longest match.
The split result of NIC must belong to one of them.
The exact match means NIC only do split when the packets exactly match all
the protocol headers in the segments. Otherwise, the whole packet will be
put into the last valid mempool. The longest match means NIC will do split
until packets mismatch the protocol header in the segments. The rest will
be put into the last valid pool.
Pseudo-code for exact match:
FOR each seg in segs except last one
IF proto_hdr is not matched THEN
BREAK
END IF
END FOR
IF loop breaked THEN
put whole pkt in last seg
ELSE
put protocol header in each seg
put everything else in last seg
END IF
Pseudo-code for longest match:
FOR each seg in segs except last one
IF proto_hdr is matched THEN
put protocol header in seg
ELSE
BREAK
END IF
END FOR
put everything else in last seg
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Add a new ethdev API to retrieve supported protocol headers
of a PMD, which helps to configure protocol header based buffer split.
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
When creating flow subscription pattern that it might cause a
memory leak.
This patch fix the error by adding a free memory code.
And some typos have also been fixed.
Coverity issue: 381130
Fixes: 6d42380e59 ("net/iavf: add flow subscrption supported pattern")
Fixes: 7b902af499 ("net/iavf: support flow subscription rule")
Signed-off-by: Jie Wang <jie1x.wang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
In the following two cases, tcp_hdr + sizeof(*tcp_hdr) == pkt_end,
and the TCP port is not taken into account in calculating the HASH
value of TCP packets. TCP connections with the same source and
destination IP addresses will be hashed to the same slave port,
which may cause load imbalance.
1. TCP Pure ACK packets with no options, The header length is 20
and there is no data.
2. A TCP packet contains data, but the first seg of the mbuf
contains only the header information (ETH, IP, TCP), and the
data is in subsequent segs, which is usually the case in the
indirect mbuf used for zero-copy.
Fixes: 726158060d ("net/bonding: fix potential out of bounds read")
Cc: stable@dpdk.org
Signed-off-by: Jun Qiu <jun.qiu@jaguarmicro.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
Exclude CRC fields, the minimum Ethernet packet
length is 60 bytes. When the actual packet length
is less than 60 bytes, padding is added to the tail.
When GRO is performed on a packet containing a padding
field, mbuf->pkt_len is the one that contains the
padding field, which leads to the error of thinking
of the padding field as the actual content of the packet.
We need to trim away this extra padding field during
GRO processing.
Fixes: 0d2cbe59b7 ("lib/gro: support TCP/IPv4")
Cc: stable@dpdk.org
Signed-off-by: Jun Qiu <jun.qiu@jaguarmicro.com>
Acked-by: Jiayu Hu <Jiayu.hu@intel.com>
These actions have been deprecated since DPDK 21.11 as
ambiguous and hard-to-use, but their removal might not
be popular because net drivers i40e, ixgbe and txgbe
employ these actions in complicated "PF/VF + QUEUE"
tunnel rule support. Maintainers of these drivers
should voice their attitude to the said problem.
For now, document the status in deprecation notes.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
In commit [1], It was announced to remove the DPAA2 cmdif
raw driver as there was no active user known at that time.
But now, one of the DPAA2 user has objected this driver
removal so in this patch, removing the deprecation notice
for the driver.
[1] commit 10f0e51554 ("doc: announce removal of DPAA2 cmdif raw driver")
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
The IPsec SA expiry events were added as per below patch,
but the deprecation notice was not removed. This patch removed it.
Fixes: d1ce79d14b ("ethdev: add IPsec SA expiry event subtypes")
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Free mbufs from event vector list when enqueue operation fails
and during event port flush for cleanup.
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Add fixed point multiplication for EC curve in CNXK.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
In asym op, while parsing test interim info, existing buffer of size
256 bytes is not sufficient, hence setting it to maximum that a test
would need.
Fixes: 58cc98801e ("examples/fips_validation: add JSON parsing")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Brian Dooley <brian.dooley@intel.com>
Added function to calculate hash size for a given SHA hash algorithm.
Fixes: d5c247145c ("examples/fips_validation: add parsing for SHA")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Brian Dooley <brian.dooley@intel.com>
Asym tests need a callback to write interim info in expected output.
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Brian Dooley <brian.dooley@intel.com>
If a test group does not have expected key, it should not crash.
This patch fixes parsing test group info to continue further
when a key does not exist (as in asym tests).
Fixes: 58cc98801e ("examples/fips_validation: add JSON parsing")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Brian Dooley <brian.dooley@intel.com>
In case of FIPS 140-2 format of test vectors in MCT test, msg is
not given in the test vector, hence pt will be NULL which test
function has to handle correctly.
Fixes: d5c247145c ("examples/fips_validation: add parsing for SHA")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Tested-by: Brian Dooley <brian.dooley@intel.com>
Added function to parse algorithm for TDES CBC and ECB tests in JSON.
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Brian Dooley <brian.dooley@intel.com>
Make use of key param in test callbacks so that,
test callback can be shared with multiple keys.
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Brian Dooley <brian.dooley@intel.com>
Store SHA test type in its own interim info struct instead of AES.
Fixes: d5c247145c ("examples/fips_validation: add parsing for SHA")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Tested-by: Brian Dooley <brian.dooley@intel.com>
Instead of allocating memory in every external iteration, do once
in the beginning of AES MCT tests and free at the end.
Fixes: 8b8546aaed ("examples/fips_validation: add parsing for AES-CBC")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Brian Dooley <brian.dooley@intel.com>
Code clean up due to if-check not required
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Added parameters in rte_bbdev_queue_data to expose information
with regards to any queue related failure and warning
which cannot be supported in existing API.
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Extended bbdev operations to support FFT based operations.
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Add support in existing bbdev PMDs for the explicit number of queues
and priority for each operation type configured on the device.
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Added more options in the API to expose the number
of queues exposed and related priority.
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Added device status information, so that the PMD can
expose information related to the underlying accelerator device status.
Minor order change in structure to fit into padding hole.
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Mingshan Zhang <mingshan.zhang@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Updated the enum for rte_bbdev_op_type
to allow to keep ABI compatible for enum insertion
while adding padded maximum value for array need.
Removing RTE_BBDEV_OP_TYPE_COUNT and instead exposing
RTE_BBDEV_OP_TYPE_SIZE_MAX.
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Added check so user gets error if they try to configure the
nb_max_matches value when using rte_regexdev_configure().
Signed-off-by: Gerry Gribbon <ggribbon@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Added support to allow parsing of a combined ROF file to
locate compatible binary ROF data for the Bluefield hardware
being run on.
Signed-off-by: Gerry Gribbon <ggribbon@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Allows application to query maximum number of mbuf segments that can
be chained together.
Signed-off-by: Gerry Gribbon <ggribbon@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Enabled software PMDs in IOVA as PA disabled build
as they work with IOVA as VA.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Enabled the flag pmd_supports_disable_iova_as_pa in cnxk driver build
files as they work with IOVA as VA. Updated cn9k and cn10k soc build
configurations to disable the IOVA as PA build by default.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Swapped position of mbuf next pointer and second dynamic field (dynfield2)
if the build is configured to disable IOVA as PA.
This is to move the mbuf next pointer to first cache line.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
If IOVA as PA is disabled during build, mbuf physical address field is
undefined. This space is used to add the second dynamic field.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
IOVA mode in DPDK is either PA or VA.
The new build option enable_iova_as_pa configures the mode to PA
at compile time.
By default, this option is enabled.
If the option is disabled, only drivers which support it are enabled.
Supported driver can set the flag pmd_supports_disable_iova_as_pa
in its build file.
mbuf structure holds the physical (PA) and virtual address (VA).
If IOVA as PA is disabled at compile time, PA field (buf_iova)
of mbuf is redundant as it is the same as VA
and is replaced by a dummy field.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Used rte_mbuf_data_iova API to get the physical address of mbuf data.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Added APIs rte_mbuf_iova_set and rte_mbuf_iova_get to set and get the
physical address of an mbuf respectively. Updated applications and
library to use the same.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
A flush threshold for the mempool cache was introduced in DPDK version
1.3, but rte_mempool_do_generic_get() was not completely updated back
then, and some inefficiencies were introduced.
Fix the following in rte_mempool_do_generic_get():
1. The code that initially screens the cache request was not updated
with the change in DPDK version 1.3.
The initial screening compared the request length to the cache size,
which was correct before, but became irrelevant with the introduction of
the flush threshold. E.g. the cache can hold up to flushthresh objects,
which is more than its size, so some requests were not served from the
cache, even though they could be.
The initial screening has now been corrected to match the initial
screening in rte_mempool_do_generic_put(), which verifies that a cache
is present, and that the length of the request does not overflow the
memory allocated for the cache.
This bug caused a major performance degradation in scenarios where the
application burst length is the same as the cache size. In such cases,
the objects were not ever fetched from the mempool cache, regardless if
they could have been.
This scenario occurs e.g. if an application has configured a mempool
with a size matching the application's burst size.
2. The function is a helper for rte_mempool_generic_get(), so it must
behave according to the description of that function.
Specifically, objects must first be returned from the cache,
subsequently from the backend.
After the change in DPDK version 1.3, this was not the behavior when
the request was partially satisfied from the cache; instead, the objects
from the backend were returned ahead of the objects from the cache.
This bug degraded application performance on CPUs with a small L1 cache,
which benefit from having the hot objects first in the returned array.
(This is probably also the reason why the function returns the objects
in reverse order, which it still does.)
Now, all code paths first return objects from the cache, subsequently
from the backend.
The function was not behaving as described (by the function using it)
and expected by applications using it. This in itself is also a bug.
3. If the cache could not be backfilled, the function would attempt
to get all the requested objects from the backend (instead of only the
number of requested objects minus the objects available in the backend),
and the function would fail if that failed.
Now, the first part of the request is always satisfied from the cache,
and if the subsequent backfilling of the cache from the backend fails,
only the remaining requested objects are retrieved from the backend.
The function would fail despite there are enough objects in the cache
plus the common pool.
4. The code flow for satisfying the request from the cache was slightly
inefficient:
The likely code path where the objects are simply served from the cache
was treated as unlikely. Now it is treated as likely.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Presently, HW is programmed only to receive packets from LPB pool.
Making all packets received from LPB pool.
But, CNXK HW supports two pools,
- SPB -> packets with smaller size (less than 4K)
- LPB -> packets with bigger size (greater than 4K)
Patch enables multiple mempool capability, pool is selected based
on the packet's length. So, basically, PMD programs HW for receiving
packets from both SPB and LPB pools based on the packet's length.
This is achieved by enabling rx multiple mempool offload,
RTE_ETH_RX_OFFLOAD_MUL_MEMPOOL. This allows the application to send
more than one pool(in our case two) to the driver, with different
segment(packet) lengths, which helps the driver to configure both
pools based on segment lengths.
This is often useful for saving the memory where the application
can create a different pool to steer the specific size of the
packet, thus enabling effective use of memory.
Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>
Some of the HW has support for choosing memory pools based on the
packet's size.
This is often useful for saving the memory where the application
can create a different pool to steer the specific size of the
packet, thus enabling more efficient usage of memory.
For example, let's say HW has a capability of three pools,
- pool-1 size is 2K
- pool-2 size is > 2K and < 4K
- pool-3 size is > 4K
Here,
pool-1 can accommodate packets with sizes < 2K
pool-2 can accommodate packets with sizes > 2K and < 4K
pool-3 can accommodate packets with sizes > 4K
With multiple mempool capability enabled in SW, an application may
create three pools of different sizes and send them to PMD. Allowing
PMD to program HW based on the packet lengths. So that packets with
less than 2K are received on pool-1, packets with lengths between 2K
and 4K are received on pool-2 and finally packets greater than 4K
are received on pool-3.
Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
This patch adds the hairpin-conf command line parameter to flow-perf
application. hairpin-conf parameter takes a hexadecimal bitmask with
bits having the following meaning:
- Bit 0 - Force memory settings of hairpin RX queue.
- Bit 1 - Force memory settings of hairpin TX queue.
- Bit 4 - Use locked device memory for hairpin RX queue.
- Bit 5 - Use RTE memory for hairpin RX queue.
- Bit 8 - Use locked device memory for hairpin TX queue.
- Bit 9 - Use RTE memory for hairpin TX queue.
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>