The behavior of PKT_RX_VLAN_PKT was not very well defined, resulting in
PMDs not advertising the same flags in similar conditions.
Following discussion in [1], introduce 2 new flags PKT_RX_VLAN_STRIPPED
and PKT_RX_QINQ_STRIPPED that are better defined:
PKT_RX_VLAN_STRIPPED: a vlan has been stripped by the hardware and its
tci is saved in mbuf->vlan_tci. This can only happen if vlan stripping
is enabled in the RX configuration of the PMD.
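As an illustration only, here is a minimal sketch of how an application can
consume the new flag (rx_vlan_tci() is a hypothetical helper, not part of
this patch):

#include <rte_mbuf.h>

/* Return the VLAN TCI saved by the PMD, or 0 if no VLAN was stripped. */
static inline uint16_t
rx_vlan_tci(const struct rte_mbuf *m)
{
    if (m->ol_flags & PKT_RX_VLAN_STRIPPED)
        return m->vlan_tci;   /* TCI saved by the hardware/PMD */
    return 0;                 /* the VLAN header, if any, is still in the frame */
}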
For now, the old flag PKT_RX_VLAN_PKT is kept but marked as deprecated.
It should be removed from applications and PMDs in a future revision.
This patch also updates the drivers. For PKT_RX_VLAN_PKT:
- e1000, enic, i40e, mlx5, nfp, vmxnet3: done, PKT_RX_VLAN_PKT already
had the same meaning as PKT_RX_VLAN_STRIPPED, only a minor update is
required.
- fm10k: done, PKT_RX_VLAN_PKT already had the same meaning as
PKT_RX_VLAN_STRIPPED, and vlan stripping is always enabled on fm10k.
- ixgbe: modification done (vector and normal), the old flag was set
when a vlan was recognized, even if vlan stripping was disabled.
- the other drivers do not support vlan stripping.
For PKT_RX_QINQ_PKT, it was only supported on i40e, and the behavior was
already correct, so we can reuse the same bit value for
PKT_RX_QINQ_STRIPPED.
[1] http://dpdk.org/ml/archives/dev/2016-April/037837.html
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Some architectures (ex: Power8) have a cache line size of 128 bytes,
so the drivers should not expect that prefetching the second part of
the mbuf with rte_prefetch0(&m->cacheline1) is valid.
This commit adds helpers that can be used by drivers to prefetch the
RX or TX part of the mbuf, whatever the cache line size.
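For illustration, a driver RX path could use the new helpers roughly as
below (a sketch, assuming the helpers are named rte_mbuf_prefetch_part1()
and rte_mbuf_prefetch_part2()):

#include <rte_mbuf.h>

static inline void
prefetch_rx_mbuf(struct rte_mbuf *m)
{
    /* First part: fields needed to start RX processing. */
    rte_mbuf_prefetch_part1(m);
    /* Second part: pool, next, ... Becomes a no-op when the whole mbuf
     * already fits in a single cache line (e.g. 128-byte cache lines). */
    rte_mbuf_prefetch_part2(m);
}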
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The rte_pktmbuf_detach() function should decrease refcnt on a direct
buffer as stated in doc/guides/prog_guide/mbuf_lib.rst:
"whenever the indirect buffer is detached, the reference counter on the
direct buffer is decremented."
Signed-off-by: Hiroyuki Mikita <h.mikita89@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Many drivers provide their own implementation of rte_mbuf_raw_alloc(),
duplicating the code. Introduce a new public function in rte_mbuf to
allocate a raw mbuf (uninitialized).
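A minimal usage sketch of the new function in a PMD RX refill path
(refill_one() is a hypothetical wrapper):

#include <rte_mbuf.h>

/* Refill one RX slot with an uninitialized (raw) mbuf; the PMD is
 * responsible for initializing the fields it needs. */
static struct rte_mbuf *
refill_one(struct rte_mempool *mp)
{
    struct rte_mbuf *m = rte_mbuf_raw_alloc(mp);

    if (m == NULL)
        return NULL;   /* pool exhausted */
    return m;
}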
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
The macro RTE_VERIFY always checks a condition.
It is optimized with an "unlikely" hint.
While this macro is well suited for test applications, it is preferred
in libraries and examples to enable such checks only in debug mode.
That's why the macro RTE_ASSERT is introduced to call RTE_VERIFY only
if built with debug logs enabled.
A lot of assert macros were duplicated and enabled with a specific flag.
Removing these #ifdefs makes it easier to test these code branches
and avoids dead-code pitfalls.
The ENA_ASSERT is kept (in debug mode only) because it has more
parameters to log.
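A minimal sketch of the intended usage, assuming both macros come from
rte_debug.h as described above:

#include <stddef.h>
#include <rte_debug.h>

static void
check_args(const void *ctx, const void *obj)
{
    RTE_VERIFY(ctx != NULL);   /* always enforced, even in release builds */
    RTE_ASSERT(obj != NULL);   /* compiled in only when debug logs are enabled */
}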
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Some flags were poisoned after having been removed from EAL and mbuf
in releases 1.8 (b10eef348d, 62814bc2e9) and 2.0 (4769bc5a27).
After several releases, they have probably disappeared from all
applications going to upgrade to DPDK 16.07.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
X550 will do VxLAN & NVGRE RX checksum off-load automatically.
This patch exposes the result of the checksum off-load.
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
As the cryptodev library no longer depends on the mbuf_offload
library, this patch removes it.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>
Macros RTE_MBUF_DATA_DMA_ADDR and RTE_MBUF_DATA_DMA_ADDR_DEFAULT
are defined in each PMD driver file. Convert the macros to inline
functions and move them to the common lib/librte_mbuf/rte_mbuf.h file.
PMD drivers include rte_mbuf.h directly or indirectly, hence no
additional header file inclusion is necessary.
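For reference, the semantics of the former per-PMD macros are roughly the
following (a sketch; the exact function names introduced by this patch are
an assumption here):

#include <rte_mbuf.h>

static inline phys_addr_t
mbuf_data_dma_addr(const struct rte_mbuf *mb)
{
    return mb->buf_physaddr + mb->data_off;          /* current data start */
}

static inline phys_addr_t
mbuf_data_dma_addr_default(const struct rte_mbuf *mb)
{
    return mb->buf_physaddr + RTE_PKTMBUF_HEADROOM;  /* default data start */
}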
Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs.
There is a related thread about this bulk API:
http://dpdk.org/dev/patchwork/patch/4718/
Thanks to Konstantin's loop unrolling.
The Wikipedia page about Duff's device is linked below. It explains the
performance optimization through loop unwinding, and also the most
dramatic use of case-label fall-through.
https://en.wikipedia.org/wiki/Duff%27s_device
In this implementation, a while() loop is used because we cannot assume
count is strictly positive. Using a while() loop saves one line of check.
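A short usage sketch of the new API (BURST and get_burst() are examples,
not part of this patch):

#include <rte_mbuf.h>

#define BURST 32

/* Allocate a burst of initialized packet mbufs in one call. */
static int
get_burst(struct rte_mempool *mp, struct rte_mbuf *pkts[BURST])
{
    /* Returns 0 on success; on failure no mbuf is allocated. */
    if (rte_pktmbuf_alloc_bulk(mp, pkts, BURST) != 0)
        return -1;
    return 0;
}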
Signed-off-by: Gerald Rogers <gerald.rogers@intel.com>
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
There is no need to split the mbuf structure across two cache lines on
targets with a 128-byte cache line size, as it fits in a single
128-byte cache line.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This library adds support for adding a chain of offload operations to
an mbuf. It contains the definition of the rte_mbuf_offload structure as
well as helper functions for attaching offloads to mbufs and mempool
management functions.
This initial implementation supports attaching multiple offload
operations to a single mbuf, but only a single offload operation of a
specific type can be attached to that mbuf.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Increase the number of possible subports per port to allow up to 16 bits.
It is still possible that this will require excessive RAM.
Although the mbuf structure is changed, it is ABI compatible since it
just expands the existing sched part of the structure to overlap a
pre-existing hole in the hash element of the structure.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Chaining/segmenting mbufs can be useful in many places, so make it
global.
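A minimal usage sketch, assuming the helper made global here is
rte_pktmbuf_chain():

/* Append one mbuf chain to another, keeping pkt_len/nb_segs consistent. */
static int
append_segment(struct rte_mbuf *head, struct rte_mbuf *tail)
{
    /* Returns 0 on success, negative if the segment limit is exceeded. */
    return rte_pktmbuf_chain(head, tail);
}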
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Signed-off-by: Johan Faltstrom <johan.faltstrom@netinsight.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The extended unified packet type is now part of the standard ABI.
As mbuf struct is changed, the mbuf library version is incremented.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
When DPDK is compiled in a C++ project using g++, an
'invalid conversion from' error appears. An explicit
typecast was added on the function return to get rid of the error.
Fixes: 7755baae83 ("mbuf: silence warning on pointer arithmetic")
Signed-off-by: Sergey Balabanov <balabanovsv@ecotelecom.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
It looks better to have a data buffer address that is aligned to
8 bytes. This is the case when there is no mbuf private area, but
if there is one, the alignment depends on the size of this area
that is located between the mbuf structure and the data buffer.
Indeed, some drivers expect the buffer address to be aligned to an even
address, and moreover an unaligned buffer may impact performance when
accessing network headers.
Add a check in rte_pktmbuf_pool_create() to verify the alignment
constraint before creating the mempool. For applications that use
the alternative way (direct call to rte_mempool_create), also
add an assertion in rte_pktmbuf_init().
By the way, also add the MBUF log type.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
A packet is tunnelled if the tunnel type is identified or if it has
an inner part.
Also fix some typos in the RTE_PTYPE_INNER_L3_MASK and IP comments.
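The corrected detection logic amounts to something like the following
sketch (mirroring what the library macro, presumably RTE_ETH_IS_TUNNEL_PKT,
does after this change):

/* A packet is considered tunnelled when a tunnel type is set or when any
 * inner layer type is present. */
#define IS_TUNNEL_PKT(ptype) ((ptype) & \
    (RTE_PTYPE_TUNNEL_MASK | RTE_PTYPE_INNER_L2_MASK | \
     RTE_PTYPE_INNER_L3_MASK | RTE_PTYPE_INNER_L4_MASK))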
Fixes: f295a00a2b ("mbuf: add definitions of unified packet types")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The comment for TX offload flags stated that those flags started at bit
55 and then were added to the right of that, leaving 8 bits reserved for
generic mbuf (i.e. non-offload) use. This comment may not have been
clear, as 5 of the 8 bits which were reserved have now been used for TX
offloads.
This patch:
* updates the description so that it now reflects the reality that
only three flags are available for generic mbuf use
* reserves the final generic flag so that it can't be taken over for TX
offloads in future
* clarifies the comment for TX flags to indicate that they should be
counted downwards, not upwards.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
As unified packet types are used instead, those old bit masks and
the relevant macros for packet type indication need to be removed.
To avoid breaking ABI compatibility, all the changes are enabled only
by RTE_NEXT_ABI, which is disabled by default.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
There are only 6 bit flags in ol_flags for indicating packet
types, which is not enough to describe all the possible packet
types hardware can recognize. For example, i40e hardware can
recognize more than 150 packet types. The unified packet type is
composed of L2 type, L3 type, L4 type, tunnel type, inner L2 type,
inner L3 type and inner L4 type fields, and can be stored in the
32-bit 'packet_type' field of 'struct rte_mbuf'.
To avoid breaking ABI compatibility, all the changes are enabled only
by RTE_NEXT_ABI.
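As an illustration only, an application can classify a received mbuf from
the unified packet type like this (a sketch using the L3/L4 masks):

#include <rte_mbuf.h>

/* Match plain IPv4 (no options/extensions) carrying TCP. */
static int
is_ipv4_tcp(const struct rte_mbuf *m)
{
    uint32_t ptype = m->packet_type;

    return (ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4 &&
           (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP;
}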
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
In order to unify the packet type, the 'packet_type' field in
'struct rte_mbuf' needs to be extended from 16 to 32 bits.
Accordingly, some fields in 'struct rte_mbuf' are re-organized to support
this change for Vector PMD.
As 'struct rte_kni_mbuf' for KNI must map exactly onto
'struct rte_mbuf', it is modified accordingly.
In ixgbe PMD driver, corresponding changes are added for the mbuf changes,
especially the bit masks of packet type for 'ol_flags' are replaced by
unified packet type. In addition, more packet types (UDP, TCP and SCTP)
are supported in vectorized ixgbe PMD.
To avoid breaking ABI compatibility, all the changes are enabled only by
RTE_NEXT_ABI.
Note that a performance drop of around 2% (64B packets) was observed when
doing IO forwarding on 4 ports (1 port per 82599 card) on the same SNB core.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add mbuf field to store the IEEE1588 RX register index and other flags.
The size of the structure is not changed (still 2 cache lines).
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Use the reserved 16 bits in the rte_mbuf structure for the outer VLAN,
and add QinQ offloading flags for both RX and TX sides.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
The new version nodes inherit from the previous ones, which
already include a default catch-all line for non-exported symbols.
Reported-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
There are a number of instances in the code where rte_pktmbuf_mtod()
is used to get the mbuf data pointer, only to add an offset before
casting the result to some other header type. This patch adds a new
rte_pktmbuf_mtod_offset() macro to eliminate these awful double cast
situations.
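For illustration, a before/after sketch (ip_hdr_of() is a hypothetical
helper; struct names follow the DPDK headers of that time):

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ip.h>

/* Before: rte_pktmbuf_mtod() plus a manual offset and a second cast.
 * After: a single macro doing both the offset and the cast. */
static struct ipv4_hdr *
ip_hdr_of(struct rte_mbuf *m)
{
    return rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
                                   sizeof(struct ether_hdr));
}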
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Translating from an mbuf element to the mbuf pointer does not break alignment
constraints. However, the compiler is unaware of this fact and complains on
-Wcast-align. This patch modifies the code to use RTE_PTR_SUB(), thereby
silencing the compiler by casting through (void *).
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
In __rte_pktmbuf_prefree_seg(), there was an optimization to avoid using
a costly atomic operation when updating the mbuf reference counter if
its value is 1. Indeed, it means that we are the only owner of the mbuf,
and therefore nobody can change it at the same time.
We can generalize this optimization directly in rte_mbuf_refcnt_update()
so the other callers of this function, like rte_pktmbuf_attach(), can
also take advantage of this optimization.
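A sketch of the generalized logic (not the exact library code): when the
reference count is 1, the caller is the only owner, so a plain non-atomic
update is sufficient; otherwise fall back to the atomic add.

#include <rte_mbuf.h>
#include <rte_atomic.h>
#include <rte_branch_prediction.h>

static inline uint16_t
refcnt_update_sketch(struct rte_mbuf *m, int16_t value)
{
    if (likely(rte_mbuf_refcnt_read(m) == 1)) {
        rte_mbuf_refcnt_set(m, (uint16_t)(1 + value));
        return (uint16_t)(1 + value);
    }
    return (uint16_t)rte_atomic16_add_return(&m->refcnt_atomic, value);
}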
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Latest mbuf changes (priv_size addition and related fixes)
exposed a small problem with testpmd and a few other sample apps:
when the mbuf size is exactly 2KB or less, the
ixgbe PMD selects scattered RX even for configs with a 'normal'
max packet length (max_rx_pkt_len == ETHER_MAX_LEN).
To overcome that problem and unify the code, a new macro was created
to represent the recommended minimal buffer length for an mbuf.
Where appropriate, the samples are updated to use that macro.
Fixes: dfb03bbe2b ("app/testpmd: use standard functions to initialize
mbufs and mbuf pool")
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Remove one limitation of rte_pktmbuf_attach(): "mbuf we're attaching to
must be direct".
Now, when we attach to an indirect mbuf:
- copy all the relevant fields (addr, len, offload, ...) as before
- get the pointer to the mbuf that embeds the data buffer (direct mbuf),
and increase the reference counter of this one.
When detaching the mbuf, we can retrieve this direct mbuf as the pointer
is determined from the buffer address.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add a new priv_size field in the mbuf structure that should
be initialized at mbuf pool creation. This field contains the
size of the application private data in mbufs.
Introduce new static inline functions rte_mbuf_from_indirect()
and rte_mbuf_to_baddr() to replace the existing macros, which
take the private size into account when attaching and detaching
mbufs.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add a new wrapper to rte_mempool_create() to simplify the creation
of a packet mbuf pool.
This wrapper can be used if there are no specific mempool flags and
no specific mbuf or pool constructor function, which covers most
use cases.
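A short usage sketch (pool name and sizes are examples only):

#include <rte_mbuf.h>

static struct rte_mempool *
create_pkt_pool(int socket_id)
{
    /* 8192 mbufs, per-core cache of 256, no application private area,
     * default 2KB data room plus headroom. */
    return rte_pktmbuf_pool_create("pkt_pool", 8192, 256,
                                   0 /* priv size */,
                                   2048 + RTE_PKTMBUF_HEADROOM,
                                   socket_id);
}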
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Allow the user to use the default rte_pktmbuf_init() function even
if the mbuf private size is not 0.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The code retrieving the pool private area is duplicated in many
places; we can use a function for it.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Derive the mbuf data room size from mempool->elt_size and priv_size,
instead of using a hardcoded value that is not related to the real
buffer size.
To use rte_pktmbuf_pool_init(), the user can either:
- give a NULL parameter to rte_pktmbuf_pool_init(): in this case, the
private size is assumed to be 0, and the room size is
mp->elt_size - sizeof(struct rte_mbuf).
- give a pointer to a rte_pktmbuf_pool_private structure filled with
appropriate data_room_size and priv_size values.
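A sketch of the second option (the private structure field names,
mbuf_data_room_size and mbuf_priv_size, are assumed here; sizes are
examples only):

#include <rte_mbuf.h>
#include <rte_mempool.h>

static struct rte_mempool *
create_pool_explicit(unsigned n, uint16_t priv_size, int socket_id)
{
    struct rte_pktmbuf_pool_private priv = {
        .mbuf_data_room_size = 2048 + RTE_PKTMBUF_HEADROOM,
        .mbuf_priv_size = priv_size,
    };
    unsigned elt_size = sizeof(struct rte_mbuf) + priv_size +
        priv.mbuf_data_room_size;

    return rte_mempool_create("pkts", n, elt_size, 256,
                              sizeof(priv), rte_pktmbuf_pool_init, &priv,
                              rte_pktmbuf_init, NULL, socket_id, 0);
}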
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
CONFIG_RTE_MBUF_SCATTER_GATHER was renamed to CONFIG_RTE_MBUF_REFCNT
by commit 62814bc2e9 and removed by commit 4769bc5a27.
Some traces remain because of delayed patches.
It can also be removed from doxygen config.
It is now poisoned in rte_mbuf.h to warn any misuse.
Fixes: d0dff9ba44 ("doc: sample application user guide")
Fixes: fc1f2750a3 ("doc: programmers guide")
Fixes: 4769bc5a27 ("mbuf: remove build option to disable refcnt")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
The logic used in the condition check before freeing an mbuf is
sometimes confusing, so explain it in a proper comment.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Fix a couple of doxygen comments in the mbuf structure:
- seqn had no doxygen syntax.
- usr was not generating a proper link to the function.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch removes all references to RTE_MBUF_REFCNT, setting the refcnt
field in the mbuf struct permanently.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Currently, for mbufs with refcnt, we cannot free mbufs with external memory
buffers (i.e. vhost zero copy), as they are recognized as indirect
attached mbufs, and therefore we free the direct mbuf they point to,
resulting in an error in the case of external memory buffers.
We solve the issue by introducing the IND_ATTACHED_MBUF flag, which indicates
that the mbuf is an indirect attached mbuf pointing to another mbuf.
When we free an mbuf, we only free the direct mbuf if the flag is set.
Freeing an mbuf with external buffer is the same as freeing a non attached mbuf.
The flag is set during attach and cleared on detach.
So in the case of vhost zero copy where we have mbufs with external
buffers, by default we just free the mbuf and it is up to the user to deal with
the external buffer.
This patch would allow the removal of the RTE_MBUF_REFCNT config option,
setting refcnt for all mbufs permanently.
The patch also modifies the vhost example as it was using the
RTE_MBUF_INDIRECT macro to detect if it was an mbuf with external buffer.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This library provides reordering capability for out of order mbufs based
on a sequence number in the mbuf structure.
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Signed-off-by: Richardson Bruce <bruce.richardson@intel.com>
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Since the previous commit, the flag PKT_TX_UDP_TUNNEL_PKT is not used by any
PMD, so remove it from the mbuf API and from csumonly (testpmd). In csumonly, the
PKT_TX_OUTER_IP_CKSUM flag is already set for vxlan checksum, providing
enough information to the underlying driver.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
The alias PKT_TX_IPV4_CSUM is only used in one place of the i40e driver.
Remove it and only keep the legacy flag PKT_TX_IP_CSUM.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
To differentiate libraries that break ABI, we add a library version number
suffix to the library, which must be incremented when a given library's ABI is
broken. This patch enforces that addition, sets the initial ABI SONAME
extension to 1 for each library and creates a symlink to the base SONAME so that
the test applications will link properly.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Add linker version script files to each DPDK library to put a stake in the
ground from which we can start cleaning up APIs.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Include rte_memory.h for lib files that use __rte_cache_aligned
attribute.
Consider the following code:
struct per_core_foo {
    ...
} __rte_cache_aligned;

struct global_foo {
    struct per_core_foo foo[RTE_MAX_CORE];
};
If __rte_cache_aligned is not defined (rte_memory.h is not included),
the code compiles but the structure is not aligned... it defines the
structure and creates a global variable called __rte_cache_aligned.
And this can lead to really bad things if this code is in a .h that
is included by files that may or may not include rte_memory.h
Signed-off-by: Jia Yu <jyu@vmware.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The template mbuf_initializer is hard-coded with a buflen which
might have been set differently by the application at the time of
mbuf pool creation.
- move the buf_len field out of the rearm_data marker.
- make ixgbe_recv_pkts_vec() not touch the buf_len field at all
(as all other RX functions do).
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
Replace the inner_l2_len and inner_l3_len fields with the
outer_l2_len and outer_l3_len fields, and rework the csum forward engine
and the i40e PMD due to these changes.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Replace PKT_TX_VXLAN_CKSUM with PKT_TX_UDP_TUNNEL_PKT in order to indicate
that a packet is a UDP tunneling packet, and introduce 3 TX offload flags for
outer IP TX checksum: PKT_TX_OUTER_IP_CKSUM, PKT_TX_OUTER_IPV4
and PKT_TX_OUTER_IPV6.
Rework the csum forward engine and the i40e PMD due to these changes.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The reason for redefining PKT_TX_IPV4 and PKT_TX_IPV6 is listed below.
It avoids sending a packet with bad info:
- we receive an Ether/IP6/IP4/L4/data packet
- the driver sets PKT_RX_IPV6_HDR
- the stack decapsulates IP6
- the stack sends the packet: it has the PKT_TX_IPV6 flag but it is an IPv4 packet.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Implement TSO (TCP segmentation offload) in ixgbe driver. The driver is
now able to use PKT_TX_TCP_SEG mbuf flag and mbuf hardware offload infos
(l2_len, l3_len, l4_len, tso_segsz) to configure the hardware support of
TCP segmentation.
In ixgbe, when doing TSO, the IP length must not be included in the TCP
pseudo header checksum. A new function ixgbe_fix_tcp_phdr_cksum() is
used to fix the pseudo header checksum of the packet before giving it to
the hardware.
In the patch, the tx_desc_cksum_flags_to_olinfo() and
tx_desc_ol_flags_to_cmdtype() functions have been reworked to make them
clearer. This should not impact performance as gcc (version 4.8 in my
case) is smart enough to convert the tests into code that does not
contain any branch instruction.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Some of the NICs supported by DPDK can accelerate TCP
traffic by using segmentation offload. The application prepares a packet
with a valid TCP header of size up to 64K and delegates the
segmentation to the NIC.
Implement the generic part of TCP segmentation offload in rte_mbuf. It
introduces 2 new fields in rte_mbuf: l4_len (length of L4 header in bytes)
and tso_segsz (MSS of packets).
To delegate the TCP segmentation to the hardware, the user has to (a
minimal sketch follows this list):
- set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
PKT_TX_TCP_CKSUM)
- set the flag PKT_TX_IPV4 or PKT_TX_IPV6
- set PKT_TX_IP_CKSUM if it's IPv4, and set the IP checksum to 0 in
the packet
- fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
- calculate the pseudo header checksum without taking ip_len into account,
and set it in the TCP header, for instance by using
rte_ipv4_phdr_cksum(ip_hdr, ol_flags)
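The sketch below puts these steps together for an IPv4/TCP packet (header
lengths and the MSS value are examples only):

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_tcp.h>

static void
prepare_tso(struct rte_mbuf *m, struct ipv4_hdr *ip, struct tcp_hdr *tcp)
{
    m->ol_flags |= PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_IP_CKSUM;
    m->l2_len = sizeof(struct ether_hdr);
    m->l3_len = sizeof(struct ipv4_hdr);
    m->l4_len = sizeof(struct tcp_hdr);   /* no TCP options in this example */
    m->tso_segsz = 1460;                  /* MSS */

    ip->hdr_checksum = 0;                 /* hardware will fill it in */
    /* Pseudo header checksum, computed without the IP length. */
    tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);
}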
The API is inspired by the ixgbe hardware (the next commit adds the
support for ixgbe), but it seems generic enough to be used for other
hw/drivers in the future.
This commit also reworks the way l2_len and l3_len are used in the igb
and ixgbe drivers, as l2_l3_len is no longer available in the mbuf.
Signed-off-by: Mirek Walukiewicz <miroslaw.walukiewicz@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Introduce new functions to calculate checksums. These new functions
are derived from the ones provided in csumonly.c but slightly reworked.
There is still some room for future optimization of these functions
(maybe SSE/AVX, ...).
This API will be modified in the next commits by the introduction of
TSO, which requires a different pseudo header checksum to be set in the
packet.
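A short software-checksum sketch, assuming the new helpers include
rte_ipv4_cksum() and rte_ipv4_udptcp_cksum():

#include <rte_ip.h>
#include <rte_tcp.h>

static void
sw_checksums(struct ipv4_hdr *ip, struct tcp_hdr *tcp)
{
    ip->hdr_checksum = 0;
    ip->hdr_checksum = rte_ipv4_cksum(ip);

    tcp->cksum = 0;
    /* Covers the IPv4 pseudo header plus the TCP header and payload
     * (the payload must follow the TCP header contiguously). */
    tcp->cksum = rte_ipv4_udptcp_cksum(ip, tcp);
}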
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
The issue is that the list of flags in the application has to be
synchronized with the flags defined in rte_mbuf.h.
This patch introduces 2 new functions, rte_get_rx_ol_flag_name()
and rte_get_tx_ol_flag_name(), that return the name of a flag from
its mask. It also fixes rxonly.c to use these new functions and to
display the proper flags.
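A minimal sketch of how an application can use the new functions
(dump_rx_flags() is a hypothetical helper):

#include <stdio.h>
#include <rte_mbuf.h>

static void
dump_rx_flags(const struct rte_mbuf *m)
{
    unsigned bit;

    for (bit = 0; bit < 64; bit++) {
        uint64_t mask = 1ULL << bit;

        if (m->ol_flags & mask)
            printf("  %s\n", rte_get_rx_ol_flag_name(mask));
    }
}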
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
This definition is specific to Intel PMD drivers and its definition
"indicate what bits required for building TX context" shows that it
should not be in the generic rte_mbuf.h but in the PMD driver.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Describe how to use the hardware checksum API.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The TX mbuf flags are now ordered from the lowest value to the
highest. Add comments to explain where to add new flags.
By the way, move PKT_TX_VXLAN_CKSUM to the right place.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The fdir field in rte_mbuf is extended to support flex bytes reported on an
fdir match. A maximum of 8 flex bytes can be reported.
The reported flex bytes are part of the flexible payload.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This field is added for librte_distributor. Users of librte_distributor
are advised to set the value of mbuf->hash.usr before calling
rte_distributor_process. The value of usr is the tag which serves as the
identifier of the flow.
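A minimal sketch of that usage (flow_id_of() is a hypothetical classifier,
not part of this patch):

#include <rte_mbuf.h>
#include <rte_distributor.h>

extern uint32_t flow_id_of(const struct rte_mbuf *m);

static void
distribute_burst(struct rte_distributor *d, struct rte_mbuf **bufs,
                 unsigned n)
{
    unsigned i;

    for (i = 0; i < n; i++)
        bufs[i]->hash.usr = flow_id_of(bufs[i]);  /* flow tag */

    rte_distributor_process(d, bufs, n);
}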
Signed-off-by: Qinglai Xiao <jigsaw@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Implement the configuration API of VXLAN destination UDP port,
and add new Rx offload flags for supporting VXLAN packet offload.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Yong Liu <yong.liu@intel.com>
Replace the "reserved2" field with the "packet_type" field
and add the "inner_l2_l3_len" field in the rte_mbuf structure.
The "packet_type" field is used to indicate ordinary packet format and also
tunneling packet format such as IP in IP, IP in GRE, MAC in GRE and MAC in UDP.
The "inner_l2_len" and the "inner_l3_len" fields are added
in the second cache line, they use 2 bytes for TX offloading of tunnels.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Yong Liu <yong.liu@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Add in a doxygen comment for the ctrl mbuf flag definition.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Update the format of the RX flags to match that of the TX flags. In
general the flags are now specified as "1ULL << X", with a few
exceptions.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch takes the existing TX flags defined for the mbuf and shifts
each uniquely defined one left so that additional RX flags can be
defined without having RX and TX flags mixed together. Under the new
scheme, RX flags start at bit 0 and work left, TX flags start at bit 55
and work right, and bits 56-63 are reserved for generic mbuf use, not
for offloads.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Move the vlan_tci field up by two bytes in the mbuf data structure. This
has two effects:
* Ensures that the ixgbe vector driver places the vlan tag in the correct
place in the mbuf.
* Allows a second vlan tag field, if one is added in the future, to be
placed after the existing vlan field, rather than before.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
While some applications may store metadata about packets in the packet
mbuf headroom, this is not a workable solution for packet metadata which:
* is larger than the headroom (or the headroom is needed for adding packet
headers)
* needs to be shared or copied among packets
To support these use cases in applications, we reserve a general
"userdata" pointer field inside the second cache-line of the mbuf. This
is better than having the application store the pointer to the external
metadata in the packet headroom, as it saves an additional cache-line
from being used.
Apart from storing metadata, this field also provides a general 8-byte
scratch space inside the mbuf for any other application uses that are
applicable.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The receive functions for packets do not modify the next pointer so
the next pointer should always be cleared on mbuf free, just in case.
The slow-path TX needs to clear it, and the standard mbuf free function
also needs to clear it. Fast-path TX does not handle chained mbufs, so it
is unaffected.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The l2_len and l3_len fields are used for TX offloads and so should be
put on the second cache line, along with the other fields only used on
TX.
The l2 and l3 lengths can be accessed as a single uint16_t for
performance, as well as individually.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This change splits the mbuf in two to move the pool and next pointers to
the second cache line. This frees up 16 bytes in the first cache line.
The reason for this change is that we believe that there is no possible
way that we can ever fit all the fields we need to fit into a 64-byte
mbuf, and so we need to start looking at a 128-byte mbuf instead. Examples
of new fields that need to fit in include:
* 32-bits more for filter information for support for the new filters in
the i40e driver (and possibly other future drivers)
* an additional 2-4 bytes for storing info on a second vlan tag to allow
drivers to support double VLAN/QinQ
* 4-bytes for storing a sequence number to enable out of order packet
processing and subsequent packet reordering
as well as potentially a number of other fields or splitting out fields
that are superimposed over each other right now, e.g. for the qos scheduler.
We also want to allow space for use by other non-Intel NIC drivers that may
be open-sourced to dpdk.org in the future too, where they support fields
and offloads that currently supported hardware doesn't.
If we accept the fact of a 2-cache-line mbuf, then the issue becomes
how to rework things so that we spread our fields over the two
cache lines while causing the lowest slow-down possible. The general
approach that we are looking to take is to focus the first cache
line on fields that are updated on RX, so that receive only deals
with one cache line. The second cache line can be used for application
data and information that will only be used on the TX leg. This would
allow us to work on the first cache line in RX as now, and have the
second cache line being prefetched in the background so that it is
available when necessary. Hardware prefetches should help us out
here. We may also move rarely used or slow-path RX fields, e.g. those
for chained mbufs with jumbo frames, to the second
cache line, depending upon the performance impact and bytes savings
achieved.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Add markers or "labels" at given points inside the mbuf which can be
used instead of individual fields to identify the start of logical
sections inside the mbuf.
The use of typedefs and dummy fields was chosen over using unions
because of a couple of reasons:
* unions cause an extra level of indentation (more likely two levels as
a union containing a struct for multiple fields would be needed). This
makes the lines longer than they need to be and increases the need for
wrapping. [This was the main reason]
* with markers, you can apply multiple markers at the same point if
wanted.
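A rough sketch of the approach (illustrative struct and field names; the
zero-size array typedef matches the style used in rte_mbuf.h):

typedef void *MARKER[0];

struct example_mbuf_layout {
    MARKER cacheline0;              /* start of the first cache line */
    void *buf_addr;
    /* ... */
    MARKER rx_descriptor_fields1;   /* fields filled on RX from the descriptor */
    uint32_t pkt_len;
    /* ... */
};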
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The metadata macros are only used by libs and apps using the rte_port
packet framework library, so move them to a header file there.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Removed the explicit zero-sized metadata definition at the end of the
mbuf data structure. Updated the metadata macros to take account of this
change so that all existing code which uses those macros still works.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
* Ensure comments line up correctly
* Simplify the #ifdefs around the refcnt fields to make them clearer
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Since the flags field is now 64 bits, we can allow one bit to be used to
indicate a control (i.e. non-packet) mbuf. Dedicate the high bit (bit 63)
for this purpose and add in a utility macro to test if a given mbuf has
the bit set or not.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The offload flags field (ol_flags) was 16-bits and had no further room
for expansion. This patch increases the field size to 64-bits, using up
the remaining reserved space in the single-cache-line mbuf.
NOTE: none of the values for existing flags have been changed, i.e. no
new numbers have been explicitly reserved between existing flag
definitions.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
* Reorder the fields in the mbuf so that fields that are used
together sit side-by-side in the structure. This means that we have a
contiguous block of 8 bytes in the mbuf which is used to reset an mbuf
on descriptor rearm, and a block of 16 bytes of data (excluding flags)
which is set on RX from the received packet descriptor.
* Use dummy fields as appropriate to ensure alignment or to reserve gaps
for later field additions.
* Place most items which are not used by fast-path RX separately at the end
of the structure so they can later be moved to a separate cache line.
[The l2/l3 length fields are not moved at this stage as doing so will
cause overflow to the next cache line].
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The mbuf structure already contains a pointer to the beginning of the
buffer (m->buf_addr). It is not necessary to use 8 bytes again to store
another pointer to the beginning of the data.
Using a 16-bit unsigned integer is enough as we know that an mbuf is
never longer than 64KB. We gain 6 bytes in the structure thanks to
this modification.
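The data pointer is then recomputed from the buffer address, roughly as in
this sketch (MBUF_DATA is a local name mirroring what rte_pktmbuf_mtod()
does after this change):

#define MBUF_DATA(m, t) ((t)((char *)(m)->buf_addr + (m)->data_off))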
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
* Updated to apply to latest on mainline.
* Disabled vector PMD in config as it relies heavily on the mbuf layout.
This will be re-enabled in a subsequent commit once vPMD has been
reworked to take account of the mbuf changes.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The vlan_macip structure combined a vlan tag id with the l2 and l3 header
lengths for tracking offloads. However, this structure was only used as
a unit by the e1000 and ixgbe drivers, not generally.
This patch removes the structure from the mbuf header and places the
fields into the mbuf structure directly at the required point, without
any net effect on the structure layout. This allows us to treat the vlan
tags and header length fields as separate for future mbuf changes. The
drivers which were written to use the combined structure still do so,
using a driver-local definition of it.
Reduce the perf regression caused by splitting the vlan_macip field. This is
done by providing a single uint16_t value to allow writing/clearing
the l2 and l3 lengths together. There is still a small perf hit to the
slow path TX due to the reads from vlan_tci and l2/l3 lengths being
separated. (<5% in my tests with testpmd with no extra params).
Unfortunately, this cannot be eliminated without restoring the vlan
tags and l2/l3 lengths as a combined 32-bit field. This would prevent
us from ever looking to move those fields about and is an artificial tie
that applies only for performance in igb and ixgbe drivers. Therefore,
this patch keeps the vlan_tci field separate from the lengths as the
best solution going forward.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
In some cases we may want to tag a packet for a particular destination
or output port, so rename the "in_port" field in the mbuf to just "port"
so that it can be re-used for this purpose if an application needs it.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The rte_pktmbuf structure was initially included in the rte_mbuf
structure. This was needed when there were 2 types of mbuf (ctrl and
packet). As the control mbuf has been removed, we can merge the
rte_pktmbuf into the rte_mbuf structure.
Advantages of doing this:
- the access to mbuf fields is easier (ex: m->data instead of m->pkt.data)
- it makes the structure more consistent: for instance, there was no reason
to have the ol_flags field in rte_mbuf
- it will allow a deeper reorganization of the rte_mbuf structure in the
next commits, gaining several bytes in it
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
[Bruce: updated for latest code and new example apps]
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The initial role of rte_ctrlmbuf is to carry generic messages (data
pointer + data length), but it is not used by DPDK or its applications.
Keeping it implies:
- losing 1 byte in the rte_mbuf structure
- having some dead code in rte_mbuf.[ch]
This patch removes this feature. Thanks to it, it is now possible to
simplify the rte_mbuf structure by merging the rte_pktmbuf structure
into it. This is done in the next commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
* Updated patch to HEAD.
* Modified patch to retain the old function names for ctrl mbufs as
macros. This helps with app compatibility, and allows the concept
of a control mbuf to be reintroduced via a single-bit flag in
a future change.
* Updated the packet framework ip_pipeline example application to
work following this change.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
It seems that RTE_MBUF_SCATTER_GATHER is not the proper name for the
feature it provides. "Scatter gather" means that data is stored using
several buffers. RTE_MBUF_REFCNT seems to be a better name for that
feature as it provides a reference counter for mbufs.
The macro RTE_MBUF_SCATTER_GATHER is poisoned to ensure this
modification is seen by drivers or applications using it.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
New packet flags for both RX and TX have been added to support i40e.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Jing Chen <jing.d.chen@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Heqing Zhu <heqing.zhu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Added a zero-size field (offset in the data structure) to specify the beginning
of packet meta-data in the packet buffer just after the mbuf.
The size of the packet meta-data is application specific and the packet
meta-data is managed by the application.
The packet meta-data should always be accessed through the provided macros.
This is used by the Packet Framework libraries (port, table, pipeline).
There is absolutely no performance impact due to this mbuf field, as it
does not take any space in the mbuf structure (zero-size field).
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
This commit removes trailing whitespace from lines in files. Almost all
files are affected, as the BSD license copyright header had trailing
whitespace on 4 lines in it [hence the number of files reporting 8 lines
changed in the diffstat].
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
[Thomas: remove spaces before tabs in libs]
[Thomas: remove more trailing spaces in non-C files]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The DPDK dump functions are useful for remote debugging of an
application. But when the application runs as a daemon, stdout
is typically routed to /dev/null.
Change all these functions to take a stdio FILE * handle
instead. An application can then use open_memstream() to capture
the output.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
[Thomas: fix quota_watermark example]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
rte_pktmbuf_attach copies the packet metadata but does not
copy the offload flags. This means that cloned packets lose
their offload settings such as the vlan tag.
Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Allow poll-mode drivers to maintain their own caches of mbufs, by allowing them
to check if it's ok to free an mbuf (to their local cache) without actually
freeing it back to the memory pool itself.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>