When calling rte_pktmbuf_alloc_bulk, if there are
not enough objects in the mempool, it returns
a negative value, which should be reflected
in the Doxygen comments.
Fixes: 9ec201f5d6e7 ("mbuf: provide bulk allocation")
Cc: stable@dpdk.org
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Some PMDs need to know the tunnel type in order to handle advance TX
features. This patch adds a new TX offload flag for MPLS-in-UDP packets.
Signed-off-by: Harish Patil <harish.patil@cavium.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
rte_pktmbuf_headroom() and rte_pktmbuf_tailroom() should be usable
with any segment, not only with headered ones, so is_header should be 0
when we call for sanity check inside them.
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Different drivers use internal macros like force_inline for compiler
always inline feature.
Standardizing it through __rte_always_inline macro.
Verified the change by comparing the output binary file.
No difference found in the output binary file with this change.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The debug assertions when allocating a raw mbuf are not correct since
commit 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool"),
which triggers a panic when using this function in debug mode
Change the expected number of segments to 1 instead of 0, and
factorize these sanity checks.
Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
Signed-off-by: Gregory Etelson <gregory@weka.io>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
With GCC 7 we need to explicitly document when we are falling through from
one switch case to another.
Fixes: af75078fece3 ("first public release")
Fixes: 8bae1da2afe0 ("hash: fallback to software CRC32 implementation")
Fixes: 9ec201f5d6e7 ("mbuf: provide bulk allocation")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
On i686 builds, the uin64_t type is 64-bits in size but is aligned to
32-bits only. This causes mbuf fields for rearm_data to not be 16-byte
aligned on 32-bit builds, which causes errors with some vector PMDs which
expect the rearm data to be aligned as on 64-bit.
Given that we cannot use the extra space in the data structures anyway, as
it's already used on 64-bit builds, we can just force alignment of the
physical address in the mbuf to 8-bytes in all cases. This has no effect on
64-bit systems, but fixes the updated PMDs on 32-bit.
Fixes: f4356d7ca168 ("net/i40e: eliminate mbuf write on rearm")
Fixes: f160666a1073 ("net/ixgbe: eliminate mbuf write on rearm")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Move this field in the second cache line, since no driver use it
in Rx path. The freed space will be used by a timestamp in next
commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Change the size of m->port and m->nb_segs to 16 bits. It is now possible
to reference a port identifier larger than 256 and have a mbuf chain
larger than 256 segments.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
To avoid multiple stores on fast path, Ethernet drivers
aggregate the writes to data_off, refcnt, nb_segs and port
to an uint64_t data and write the data in one shot
with uint64_t* at &mbuf->rearm_data address.
Some of the non-IA platforms have store operation overhead
if the store address is not naturally aligned.This patch
fixes the performance issue on those targets.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Set the value of m->refcnt to 1, m->nb_segs to 1 and m->next
to NULL when the mbuf is stored inside the mempool (unused).
This is done in rte_pktmbuf_prefree_seg(), before freeing or
recycling a mbuf.
Before this patch, the value of m->refcnt was expected to be 0
while in pool.
The objectives are:
- to avoid drivers to set m->next to NULL in the early Rx path, since
this field is in the second 64B of the mbuf and its access could
trigger a cache miss
- rationalize the behavior of raw_alloc/raw_free: one is now the
symmetric of the other, and refcnt is never changed in these functions.
To optimize the freeing of the segments, we try try to only update
m->refcnt, m->next, and m->nb_segs when it's required (idea from
Konstantin Ananyev <konstantin.ananyev@intel.com>).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Rename __rte_mbuf_raw_free() as rte_mbuf_raw_free() and make
it public. The old function is kept for compat but is marked as
deprecated.
The next commit changes the behavior of rte_mbuf_raw_free() to
make it more consistent with rte_mbuf_raw_alloc().
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Document the function and make it public, since it is used at several
places in the drivers. The old one is marked as deprecated.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
When possible, replace the uses of rte_mempool_create() with
the helper provided in librte_mbuf: rte_pktmbuf_pool_create().
This is the preferred way to create a mbuf pool.
This also updates the documentation.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
mi->next will be assigned to NULL few lines later, trivial patch
Fixes: ea672a8b1655 ("mbuf: remove the rte_pktmbuf structure")
Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This patch adds function rte_pktmbuf_linearize to let crypto PMD coalesce
chained mbuf before crypto operation and extend their capabilities to
support segmented mbufs when device cannot handle them natively.
Included unit tests for rte_pktmbuf_linearize functionality:
1) Creates banch of segmented mbufs with different size and number of
segments.
2) Fills noncontigouos mbuf with sequential values.
3) Uses rte_pktmbuf_linearize to coalesce segmented buffer into one
contiguous.
4) Verifies data in linearized buffer.
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add a new Tx flag in mbuf, that can be set by applications to
enable the MACsec offload for a packet to be transmitted.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Added API for `rte_eth_tx_prepare`
uint16_t rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
Added fields to the `struct rte_eth_desc_lim`:
uint16_t nb_seg_max;
/**< Max number of segments per whole packet. */
uint16_t nb_mtu_seg_max;
/**< Max number of segments per one MTU */
These fields can be used to create valid packets according to the
following rules:
* For non-TSO packet, a single transmit packet may span up to
"nb_mtu_seg_max" buffers.
* For TSO packet the total number of data descriptors is "nb_seg_max",
and each segment within the TSO may span up to "nb_mtu_seg_max".
Added functions:
int
rte_validate_tx_offload(struct rte_mbuf *m)
to validate general requirements for tx offload set in mbuf of packet
such a flag completness. In current implementation this function is
called optionaly when RTE_LIBRTE_ETHDEV_DEBUG is enabled.
int rte_net_intel_cksum_prepare(struct rte_mbuf *m)
to prepare pseudo header checksum for TSO and non-TSO tcp/udp packets
before hardware tx checksum offload.
- for non-TSO tcp/udp packets full pseudo-header checksum is
counted and set.
- for TSO the IP payload length is not included.
int
rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, uint64_t ol_flags)
this function uses same logic as rte_net_intel_cksum_prepare, but
allows application to choose which offloads should be taken into
account, if full preparation is not required.
PERFORMANCE TESTS
-----------------
This feature was tested with modified csum engine from test-pmd.
The packet checksum preparation was moved from application to Tx
preparation step placed before burst.
We may expect some overhead costs caused by:
1) using additional callback before burst,
2) rescanning burst,
3) additional condition checking (packet validation),
4) worse optimization (e.g. packet data access, etc.)
We tested it using ixgbe Tx preparation implementation with some parts
disabled to have comparable information about the impact of different
parts of implementation.
IMPACT:
1) For unimplemented Tx preparation callback the performance impact is
negligible,
2) For packet condition check without checksum modifications (nb_segs,
available offloads, etc.) is 14626628/14252168 (~2.62% drop),
3) Full support in ixgbe driver (point 2 + packet checksum
initialization) is 14060924/13588094 (~3.48% drop)
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Fixes typos present in the documentation and code comments.
Signed-off-by: Alain Leon <xerebz@gmail.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Previous patch updated the functions without updating all the comments.
Fixes: 591a9d7985c1 ("add FILE argument to debug functions")
Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: John McNamara <john.mcnamara@intel.com>
When receiving coalesced packets in virtio, the original size of the
segments is provided. This is a useful information because it allows to
resegment with the same size.
Add a RX new flag in mbuf, that can be set when packets are coalesced by
a hardware or virtual driver when the m->tso_segsz field is valid and is
set to the segment size of original packets.
This flag is used in next commits in the virtio pmd.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Following discussions in [1] and [2], introduce a new bit to
describe the Rx checksum status in mbuf.
Before this patch, only one flag was available:
PKT_RX_L4_CKSUM_BAD: L4 cksum of RX pkt. is not OK.
And same for L3:
PKT_RX_IP_CKSUM_BAD: IP cksum of RX pkt. is not OK.
This had 2 issues:
- it was not possible to differentiate "checksum good" from
"checksum unknown".
- it was not possible for a virtual driver to say "the checksum
in packet may be wrong, but data integrity is valid".
This patch tries to solve this issue by having 4 states (2 bits)
for the IP and L4 Rx checksums. New values are:
- PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
-> the application should verify the checksum by sw
- PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
-> the application can drop the packet without additional check
- PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
-> the application can accept the packet without verifying the
checksum by sw
- PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
data, but the integrity of the L4 data is verified.
-> the application can process the packet but must not verify the
checksum by sw. It has to take care to recalculate the cksum
if the packet is transmitted (either by sw or using tx offload)
And same for L3 (replace L4 by IP in description above).
This commit tries to be compatible with existing applications that
only check the existing flag (CKSUM_BAD).
[1] http://dpdk.org/ml/archives/dev/2016-May/039920.html
[2] http://dpdk.org/ml/archives/dev/2016-June/040007.html
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The functions rte_get_rx_ol_flag_name() and rte_get_tx_ol_flag_name()
can dump one flag, or set of flag that are part of the same mask (ex:
PKT_TX_UDP_CKSUM, part of PKT_TX_L4_MASK). But they are not designed to
dump the list of flags contained in mbuf->ol_flags.
This commit introduce new functions to do that. Similarly to the packet
type dump functions, the goal is to factorize the code that could be
used in several applications and reduce the risk of desynchronization
between the flags and the dump functions.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The file rte_mbuf.h starts to be quite big, and next commits
will introduce more functions related to packet types. Let's
move them in a new file.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Introduce a new function to read the packet data from an mbuf chain. It
linearizes the data if required, and also ensures that the mbuf is large
enough.
This function is used in next commits that add a software parser to
retrieve the packet type.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
To support tunneling packet offload capabilities on Tx side, PMDs
(e.g., i40e) need to know what kind of tunneling type of this packet.
Instead of analyzing the packet itself, we depend on applications to
correctly set the tunneling type. These flags are defined inside
rte_mbuf.ol_flags.
Signed-off-by: Zhe Tao <zhe.tao@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Some application use rte_mbuf_raw_alloc() function to improve
performance by not resetting mbuf's fields to their default state.
This can be however problematic for mbuf consumers that need some
headroom, meaning that data_off field gets decremented after
allocation. When the mbuf is re-used afterwards, there might not
be enough room for the consumer to prepend anything, if the data_off
field is not reset to its default value.
This patch adds a new rte_pktmbuf_reset_headroom() function that
applications can call to reset the data_off field.
This patch also replaces current data_off affectations in the mbuf
lib with a call to this function.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Exported header files used by applications should allow the strictest
compiler flags. Language extensions used in many places must be explicitly
marked to avoid warnings and compilation failures.
Unnamed structs/unions are allowed since C11, however many compiler
versions do not use this mode by default.
This commit prevents the following errors:
error: ISO C99 doesn't support unnamed structs/unions
error: struct has no named members
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Exported header files used by applications should allow the strictest
compiler flags. Language extensions used in many places must be explicitly
marked or removed to avoid warnings and compilation failures.
This commit prevents the following errors:
error: type of bit-field `[...]' is a GCC extension
Note: the standard does not require implementations to issue a diagnostic
message with these, and such errors do not occur with recent GCC or clang
versions. However, GCC 4.7 is still common and using the extension keyword
is easier than checking compiler version.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Exported header files used by applications should allow the strictest
compiler flags. Language extensions used in many places must be explicitly
marked or removed to avoid warnings and compilation failures.
The extension keyword is used whenever the C99 syntax cannot do it.
This commit prevents the following errors:
error: ISO C forbids zero-size array `[...]'
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The function __rte_mbuf_raw_alloc was reserved for internal use and
has been deprecated in favor of the public function rte_mbuf_raw_alloc.
It can be safely removed now.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Following the discussions from:
http://dpdk.org/ml/archives/dev/2015-July/021721.htmlhttp://dpdk.org/ml/archives/dev/2016-April/038143.html
The value of these flags is 0, making them useless. Today, no example
application checks them on Rx, and only few drivers sets them and
silently give wrong packets to the application, which should not happen.
This patch removes the unused flags from rte_mbuf and their use in the
drivers. The i40e and fm10k are kept as they are today and should be
fixed to drop bad packets. The enic driver is managed by its maintainer
in another patch.
Fixes: c22265f6 ("mbuf: add new packet flags for i40e")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The behavior of PKT_RX_VLAN_PKT was not very well defined, resulting in
PMDs not advertising the same flags in similar conditions.
Following discussion in [1], introduce 2 new flags PKT_RX_VLAN_STRIPPED
and PKT_RX_QINQ_STRIPPED that are better defined:
PKT_RX_VLAN_STRIPPED: a vlan has been stripped by the hardware and its
tci is saved in mbuf->vlan_tci. This can only happen if vlan stripping
is enabled in the RX configuration of the PMD.
For now, the old flag PKT_RX_VLAN_PKT is kept but marked as deprecated.
It should be removed from applications and PMDs in a future revision.
This patch also updates the drivers. For PKT_RX_VLAN_PKT:
- e1000, enic, i40e, mlx5, nfp, vmxnet3: done, PKT_RX_VLAN_PKT already
had the same meaning than PKT_RX_VLAN_STRIPPED, minor update is
required.
- fm10k: done, PKT_RX_VLAN_PKT already had the same meaning than
PKT_RX_VLAN_STRIPPED, and vlan stripping is always enabled on fm10k.
- ixgbe: modification done (vector and normal), the old flag was set
when a vlan was recognized, even if vlan stripping was disabled.
- the other drivers do not support vlan stripping.
For PKT_RX_QINQ_PKT, it was only supported on i40e, and the behavior was
already correct, so we can reuse the same bit value for
PKT_RX_QINQ_STRIPPED.
[1] http://dpdk.org/ml/archives/dev/2016-April/037837.html,
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Some architectures (ex: Power8) have a cache line size of 128 bytes,
so the drivers should not expect that prefetching the second part of
the mbuf with rte_prefetch0(&m->cacheline1) is valid.
This commit add helpers that can be used by drivers to prefetch the
rx or tx part of the mbuf, whatever the cache line size.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The rte_pktmbuf_detach() function should decrease refcnt on a direct
buffer as stated in doc/guides/prog_guide/mbuf_lib.rst:
"whenever the indirect buffer is detached, the reference counter on the
direct buffer is decremented."
Signed-off-by: Hiroyuki Mikita <h.mikita89@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Many drivers provide their own implementation of rte_mbuf_raw_alloc(),
duplicating the code. Introduce a new public function in rte_mbuf to
allocate a raw mbuf (uninitialized).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
The macro RTE_VERIFY always checks a condition.
It is optimized with "unlikely" hint.
While this macro is well suited for test applications, it is preferred
in libraries and examples to enable such check in debug mode.
That's why the macro RTE_ASSERT is introduced to call RTE_VERIFY only
if built with debug logs enabled.
A lot of assert macros were duplicated and enabled with a specific flag.
Removing these #ifdef allows to test these code branches more easily
and avoid dead code pitfalls.
The ENA_ASSERT is kept (in debug mode only) because it has more
parameters to log.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Some flags were poisoned after having been removed from EAL and mbuf
in releases 1.8 (b10eef348d, 62814bc2e9) and 2.0 (4769bc5a27cc).
After several releases, they have probably disappeared from all
applications going to upgrade to DPDK 16.07.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
X550 will do VxLAN & NVGRE RX checksum off-load automatically.
This patch exposes the result of the checksum off-load.
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
As cryptodev library does not depend on mbuf_offload library
any longer, this patch removes it.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>
Macros RTE_MBUF_DATA_DMA_ADDR and RTE_MBUF_DATA_DMA_ADDR_DEFAULT
are defined in each PMD driver file. Convert macros to inline
functions and move them to common lib/librte_mbuf/rte_mbuf.h file.
PMD drivers include rte_mbuf.h file directly/indirectly hence no
additioanl header file inclusion is necessary.
Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs.
There is related thread about this bulk API.
http://dpdk.org/dev/patchwork/patch/4718/
Thanks to Konstantin's loop unrolling.
Attached the wiki page about duff's device. It explains the performance
optimization through loop unwinding, and also the most dramatic use of
case label fall-through.
https://en.wikipedia.org/wiki/Duff%27s_device
In this implementation, while() loop is used because we could not assume
count is strictly positive. Using while() loop saves one line of check.
Signed-off-by: Gerald Rogers <gerald.rogers@intel.com>
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
No need to split mbuf structure to two cache lines for 128-byte cache
line size targets as it can fit on a single 128-byte cache line.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This library add support for adding a chain of offload operations to a
mbuf. It contains the definition of the rte_mbuf_offload structure as
well as helper functions for attaching offloads to mbufs and a mempool
management functions.
This initial implementation supports attaching multiple offload
operations to a single mbuf, but only a single offload operation of a
specific type can be attach to that mbuf.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Increase the number of possible subports per port to allow up to 16 bits.
It is still possible that this will require excessive RAM.
Although mbuf structure is changed, it is ABI compatiable since it
just expands existing sched part of structure to overlap pre-existing hole
in the hash element of structure.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Chaining/segmenting mbufs can be useful in many places, so make it
global.
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Signed-off-by: Johan Faltstrom <johan.faltstrom@netinsight.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>