In some situations, we receive several IP fragments whose total
data length is less than min_ip_len(64) and which are padded with zeros.
We simulated intermediate fragments by modifying the MTU.
To illustrate the problem, we simplify the packet format and
ignore the impact of the packet header. In namespace2 (ns2),
a packet whose data length is 1520 is sent.
When the packet passes tap2, the packet is divided into two
fragments: fragment A and B, similar to (1520 = 1510 + 10).
When the packet passes tap3, the larger fragment packet A is
divided into two fragments A1 and A2, similar to (1510 = 1500 + 10).
Finally, the bond interface receives three fragments:
A1, A2, and B (1520 = 1500 + 10 + 10).
One fragment, A2, is smaller than the minimum Ethernet
frame length, so it needs to be padded.
|---------------------------------------------------|
|                       HOST                        |
| |--------------|   |----------------------------| |
| |      ns2     |   |      |--------------|      | |
| |  |--------|  |   |  |--------|    |--------|  | |
| |  |  tap1  |  |   |  |  tap2  | ns1|  tap3  |  | |
| |  |mtu=1510|  |   |  |mtu=1510|    |mtu=1500|  | |
| |--|1.1.1.1 |--|   |--|1.1.1.2 |----|2.1.1.1 |--| |
|    |--------|         |--------|    |--------|    |
|         |                 |             |         |
|         |-----------------|             |         |
|                                         |         |
|                                      |--------|   |
|                                      |  bond  |   |
|--------------------------------------|mtu=1500|---|
                                       |--------|
When processing the packets above,
DPDK aggregates fragments A2 and B.
The result is a malformed packet in which the zero padding
appears in the middle of the payload.
A2 + B:
0000 fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
0050 dc dd de df e0 e1 e2 e3 e4 e5 e6
To fix this, calculate the length of the padding and remove
it from pkt_len and data_len before aggregation.
The fix is applied to both IPv4 and IPv6.
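The trimming step can be sketched as follows. This is a minimal
illustration, not the library code: struct pkt_meta and
trim_l2_padding() are hypothetical names, a single-segment packet is
assumed, and the numbers in the usage note are read off the dump above
(14-byte Ethernet header, IP total length 28, 18 bytes of padding).

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the mbuf length fields. */
struct pkt_meta {
	uint32_t pkt_len;  /* bytes in the whole frame, padding included */
	uint16_t data_len; /* bytes in this segment, padding included */
	uint16_t l2_len;   /* Ethernet header length */
};

/*
 * Remove trailing L2 padding from pkt_len and data_len and return the
 * number of bytes trimmed. ip_total_len is the length of the IP packet
 * in host byte order (IPv4 Total Length; for IPv6, payload length plus
 * the 40-byte fixed header). Assumes a single-segment packet.
 */
static uint32_t
trim_l2_padding(struct pkt_meta *m, uint16_t ip_total_len)
{
	uint32_t wire_len = (uint32_t)m->l2_len + ip_total_len;
	uint32_t trim;

	if (m->pkt_len <= wire_len)
		return 0; /* no padding present */

	trim = m->pkt_len - wire_len;
	if (trim > m->data_len)
		return 0; /* inconsistent lengths, leave untouched */

	m->pkt_len -= trim;
	m->data_len = (uint16_t)(m->data_len - trim);
	return trim;
}
```

For fragment A2 in the dump, a 60-byte frame with l2_len 14 and IP
total length 28 would be trimmed by 18 bytes, down to 42.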
Fixes: 7f0983ee33 ("ip_frag: check fragment length of incoming packet")
Cc: stable@dpdk.org
Signed-off-by: Yicai Lu <luyicai@huawei.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Since each version map file is contained in the subdirectory of the library
it refers to, there is no need to include the library name in the filename.
This makes things simpler in case of library renaming.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Applications handling fragmented IPv6 packets need to match on IPv6
fragment extension header, in order to identify the fragments order
and location in the packet.
This patch introduces the IPv6 fragment extension header item,
proposed in [1].
Relevant definitions are moved from lib/librte_ip_frag/rte_ip_frag.h
to lib/librte_net/rte_ip.h, as they are needed for IPv6 header handling.
struct ipv6_extension_fragment renamed to rte_ipv6_fragment_ext to
adapt it to the common naming convention.
Default mask is not defined, since all fields are optional.
[1] http://mails.dpdk.org/archives/dev/2020-March/160255.html
Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
A decision was made [1] to no longer support Make in DPDK. This patch
removes all Makefiles that do not make use of pkg-config, along with
the mk directory previously used by make.
[1] https://mails.dpdk.org/archives/dev/2020-April/162839.html
Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Start a new release cycle with empty release notes.
The ABI version becomes 21.0.
The ABI major is back to normal, having only one number (21 vs 20.0).
The map files are updated to the new ABI major number (21).
The ABI exceptions are dropped.
Travis ABI check is disabled because compatibility is not preserved.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
In addition, do a formal parameter check.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Do a formal parameter check of the MTU length, and
check the various inputs for validity. If any
aren't acceptable, we bail.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
There is a common macro __rte_packed for packing structs,
which is now used where appropriate for consistency.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Merge all versions in linker version script files to DPDK_20.0.
This commit was generated by running the following command:
:~/DPDK$ buildtools/update-abi.sh 20.0
Signed-off-by: Pawel Modrak <pawelx.modrak@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Since the library versioning for both stable and experimental ABIs is
now managed globally, the LIBABIVER and version variables no longer
serve any useful purpose, and can be removed.
The replacement in Makefiles was done using the following regex:
^(#.*\n)?LIBABIVER\s*:=\s*\d+\n(\s*\n)?
(LIBABIVER := numbers, optionally preceded by a comment and optionally
succeeded by an empty line)
The replacement for meson files was done using the following regex:
^(#.*\n)?version\s*=\s*\d+\n(\s*\n)?
(version = numbers, optionally preceded by a comment and optionally
succeeded by an empty line)
[David]: those variables are manually removed for the files:
- drivers/common/qat/Makefile
- lib/librte_eal/meson.build
[David]: the LIBABIVER is restored for the external ethtool example
library.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Currently PKT_TX_IP_CKSUM is being set into mbuf->ol_flags during
fragmentation operation implicitly by the library. Because of this,
application is forced to use checksum offload whether it is supported
by platform or not.
Also, the documentation does not specify the expected value of ol_flags in
the returned fragmented mbufs, so the application cannot know which
offloads are enabled. Transmission may therefore fail on platforms which
do not support checksum offload.
So remove the mentioned flag from the library.
This change is part of http://patches.dpdk.org/patch/53475.
The changes for the reassembly operation are already accepted. This patch
set implements a similar change for the fragmentation operation.
Fixes: e29fc44370 ("ip_frag: remove IP checkum offload flag")
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Currently PKT_TX_IP_CKSUM is being set into mbuf->ol_flags
during fragmentation and reassembly operations implicitly.
Because of this, the application is forced to use checksum offload
whether it is supported by the platform or not.
Also, the documentation does not specify the expected value of ol_flags
in the returned mbuf (reassembled or fragmented), so the application cannot
know which offloads are enabled. Transmission may therefore fail
on platforms which do not support checksum offload.
Also, the IPv6 header does not contain any checksum field, so setting
PKT_TX_IP_CKSUM in mbuf->ol_flags is itself invalid.
So remove the mentioned flag from the library.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Putting a '__attribute__((deprecated))' in the middle of a function
prototype does not produce the expected result with gcc (while clang
is fine with this syntax).
$ cat deprecated.c
void * __attribute__((deprecated)) incorrect() { return 0; }
__attribute__((deprecated)) void *correct(void) { return 0; }
int main(int argc, char *argv[]) { incorrect(); correct(); return 0; }
$ gcc -o deprecated.o -c deprecated.c
deprecated.c: In function ‘main’:
deprecated.c:3:1: warning: ‘correct’ is deprecated (declared at
deprecated.c:2) [-Wdeprecated-declarations]
int main(int argc, char *argv[]) { incorrect(); correct(); return 0; }
^
Move the tag to a separate line and make it the first thing in function
prototypes.
This is not perfect but we will trust reviewers to catch the other not
so easy to detect patterns.
sed -i \
-e '/^\([^#].*\)\?__rte_experimental */{' \
-e 's//\1/; s/ *$//; i\' \
-e __rte_experimental \
-e '/^$/d}' \
$(git grep -l __rte_experimental -- '*.h')
Special mention for rte_mbuf_data_addr_default():
There is either a bug or a (not yet understood) issue with gcc.
gcc won't drop this inline when unused and rte_mbuf_data_addr_default()
calls rte_mbuf_buf_addr() which itself is experimental.
This results in a build warning when not accepting experimental apis
from sources just including rte_mbuf.h.
For this specific case, we hide the call to rte_mbuf_buf_addr() under
the ALLOW_EXPERIMENTAL_API flag.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
We had some inconsistencies between function prototypes and actual
definitions.
Let's avoid this by only adding the experimental tag to the prototypes.
Tests with gcc and clang show it is enough.
git grep -l __rte_experimental |grep \.c$ |while read file; do
sed -i -e '/^__rte_experimental$/d' $file;
sed -i -e 's/ *__rte_experimental//' $file;
sed -i -e 's/__rte_experimental *//' $file;
done
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Take into account IPv6 fragment extension header when
calculating data size for each fragment.
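As a rough sketch of the arithmetic (not the library code), the data
size per fragment is what remains of the MTU after the 40-byte IPv6
header and the 8-byte fragment extension header, rounded down to a
multiple of 8. The function and macro names are hypothetical.

```c
#include <stdint.h>

#define IPV6_HDR_LEN      40u /* fixed IPv6 header */
#define IPV6_FRAG_EXT_LEN  8u /* IPv6 fragment extension header */

/*
 * Payload bytes that fit in one non-final IPv6 fragment for a given
 * MTU. Returns 0 when the MTU cannot carry at least 8 payload bytes.
 */
static uint32_t
ipv6_frag_data_size(uint32_t mtu)
{
	uint32_t overhead = IPV6_HDR_LEN + IPV6_FRAG_EXT_LEN;

	if (mtu <= overhead)
		return 0;

	/* fragment offsets are expressed in 8-byte units */
	return (mtu - overhead) & ~7u;
}
```

With an MTU of 1500 this gives 1448 data bytes per fragment, rather
than the 1456 obtained when the extension header is ignored.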
Fixes: 7a838c8798 ("ip_frag: fix IPv6 when MTU sizes not aligned to 8 bytes")
Fixes: 0aa31d7a59 ("ip_frag: add IPv6 fragmentation support")
Cc: stable@dpdk.org
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The same issue was fixed for the IPv4 version of this routine in
commit 8d4d3a4f73 ("ip_frag: handle MTU sizes not aligned to 8 bytes").
Briefly, the size of an ipv6 header is always 40 bytes. With an MTU of
1500, this will never produce a multiple of 8 bytes for the frag_size
and this routine can never succeed. Since RTE_ASSERTS are disabled by
default, this failure is typically ignored.
To fix this, round down to the nearest 8 bytes and use this when
producing the fragments.
Fixes: 0aa31d7a59 ("ip_frag: add IPv6 fragmentation support")
Cc: stable@dpdk.org
Signed-off-by: Chas Williams <chas3@att.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Right now the reassembly code relies on src_dst[] being all zeroes to
determine whether an entry in the fragments table is free or occupied.
This is suboptimal and error prone - user can crash DPDK ip_reassembly
app by something like the following scapy script:
x=Ether(src=...,dst=...)/IP(dst='0.0.0.0',src='0.0.0.0',id=0)/('X'*1000)
frags=fragment(x, fragsize=500)
sendp(frags, iface=...)
To overcome that issue and reduce the overhead of the
'key invalidate' and 'key is empty' operations,
add key_len to the key comparison procedure.
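A minimal sketch of the idea, assuming a simplified key layout: the
struct and helper names below are illustrative and are not the
library's definitions. An entry is empty when key_len is zero, so a
packet with all-zero addresses and id no longer looks like a free slot.

```c
#include <stdint.h>

/* Simplified, hypothetical key layout for illustration only. */
struct frag_key {
	uint64_t src_dst[4]; /* source/destination addresses */
	uint32_t id;         /* IP identification field */
	uint32_t key_len;    /* src_dst words in use; 0 means empty */
};

/* An entry is free when no key words are in use. */
static int
frag_key_is_empty(const struct frag_key *k)
{
	return k->key_len == 0;
}

/* Invalidating a key is a single store instead of clearing src_dst[]. */
static void
frag_key_invalidate(struct frag_key *k)
{
	k->key_len = 0;
}

/* Keys match only if id, key_len and all used words are equal. */
static int
frag_key_equal(const struct frag_key *a, const struct frag_key *b)
{
	uint32_t i;

	if (a->id != b->id || a->key_len != b->key_len)
		return 0;
	for (i = 0; i < a->key_len; i++)
		if (a->src_dst[i] != b->src_dst[i])
			return 0;
	return 1;
}
```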
Fixes: 4f1a8f6338 ("ip_frag: add IPv6 reassembly")
Cc: stable@dpdk.org
Reported-by: Ryan E Hall <ryan.e.hall@intel.com>
Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Under some conditions ill-formed fragments might cause
reassembly code to corrupt mbufs and/or crash.
For example, the following fragment sequence:
<ofs=0,len=100, flags=MF>
<ofs=96,len=100, flags=MF>
<ofs=200,len=0,flags=MF>
<ofs=200,len=100,flags=0>
can trigger the problem.
To handle this situation, add a check that the length
of each incoming fragment is greater than zero.
Fixes: 601e279df0 ("ip_frag: move fragmentation/reassembly headers into a library")
Fixes: 4f1a8f6338 ("ip_frag: add IPv6 reassembly")
Cc: stable@dpdk.org
Reported-by: Ryan E Hall <ryan.e.hall@intel.com>
Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
In struct ip_frag_key, the type of src_dst[] is uint64_t,
but "val", which stores the calculated result, is uint32_t,
so the high 32 bits of the key may be lost. Also, the function's
return type is int, but it never returns a value < 0.
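The truncation can be reproduced with a small standalone sketch. Both
helpers are hypothetical and only mirror the described pattern of
accumulating 64-bit key differences in a variable named "val".

```c
#include <stdint.h>

/* Buggy pattern: 64-bit differences accumulated in a 32-bit value. */
static uint32_t
key_diff_truncated(const uint64_t *a, const uint64_t *b, unsigned int n)
{
	uint32_t val = 0;
	unsigned int i;

	for (i = 0; i < n; i++)
		val |= (uint32_t)(a[i] ^ b[i]); /* high 32 bits dropped */
	return val;
}

/* Fixed pattern: the accumulator is as wide as the key words. */
static uint64_t
key_diff_wide(const uint64_t *a, const uint64_t *b, unsigned int n)
{
	uint64_t val = 0;
	unsigned int i;

	for (i = 0; i < n; i++)
		val |= a[i] ^ b[i];
	return val;
}
```

Two keys that differ only in the upper 32 bits of a word compare as
equal (result 0) with the truncated variant and as different with the
wide one.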
Signed-off-by: Li Han <han.li1@zte.com.cn>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
A fragmented packet is supposed to live no longer than max_cycles,
but the lib deletes an expired packet only occasionally when it scans
a bucket to find an empty slot while adding a new packet.
Therefore a fragment might sit in the table forever.
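One way to picture a remedy is a periodic sweep that drops every entry
older than max_cycles, independent of insertions. The sketch below
assumes a flat table and hypothetical names (frag_entry,
sweep_expired); it does not reflect the library's table layout or API.

```c
#include <stdint.h>

/* Hypothetical, flattened view of a fragment table entry. */
struct frag_entry {
	uint64_t start; /* timestamp of the first fragment */
	int in_use;     /* non-zero while the entry holds fragments */
};

/*
 * Mark every entry older than max_cycles as free and return how many
 * were released. A real table would also free the queued mbufs.
 */
static unsigned int
sweep_expired(struct frag_entry *tbl, unsigned int n,
	      uint64_t max_cycles, uint64_t now)
{
	unsigned int i, released = 0;

	for (i = 0; i < n; i++) {
		if (tbl[i].in_use && now - tbl[i].start > max_cycles) {
			tbl[i].in_use = 0;
			released++;
		}
	}
	return released;
}
```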
Signed-off-by: Alex Kiselev <alex@therouter.net>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
In ip_frag_process, some IP_FRAG_LOG content is wrong.
Fixes: 4f1a8f6338 ("ip_frag: add IPv6 reassembly")
Cc: stable@dpdk.org
Signed-off-by: Li Han <han.li1@zte.com.cn>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The first mbuf and the last mbuf to be visited in the preceding loop
are not set to NULL in the fragmentation table. This creates the
possibility of a double free when the fragmentation table is later freed
with rte_ip_frag_table_destroy().
Fixes: 95908f5239 ("ip_frag: free mbufs on reassembly table destroy")
Cc: stable@dpdk.org
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add non-EAL libraries to DPDK build. The compat lib is a special case,
along with the previously-added EAL, but all other libs can be built using
the same set of commands, where the individual meson.build files only need
to specify their dependencies, source files, header files and ABI versions.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Many exported headers rely on definitions found in rte_config.h without
including it, as shown by the following command:
grep -L '^#include <rte_config.h>' -- \
$(grep -Rl \
$(sed -n '/^#define \([^ ]\+\).*$/{s//\1/;H;};${x;s/\n//;s/\n/\\|/g;p;}' \
build/include/rte_config.h) \
-- build/include/)
We cannot assume external applications will include rte_config.h on their
own, neither directly nor through a -include parameter like DPDK does
internally.
This not only causes obvious compilation failures that can be reproduced
with check-includes.sh such as:
[...]/rte_memory.h:88:43: error: ‘RTE_CACHE_LINE_SIZE’ was not declared in
this scope
#define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
^
It also results in less visible issues, for instance rte_hash_crc.h relying
on RTE_ARCH_X86_64's presence to provide dedicated inline functions.
This patch partially reverts the commit below and adds missing include
lines to the remaining files.
Fixes: f1a7a5c5f4 ("remove include of generated config header")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Replace the BSD license header with the SPDX tag for files
with only an Intel copyright on them.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The list of libraries in LDLIBS was generated from the DEPDIRS-xyz
variable. This is valid when the subdirectory name matches the library
name, but it's not always the case, especially for PMDs.
This patch removes this feature and explicitly adds the proper
libraries in LDLIBS.
Some DEPDIRS-xyz variables become useless, so remove them.
Reported-by: Gage Eads <gage.eads@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
The filenames of the linker map files for DPDK libraries all follow a
standard format: rte_<libname>_version.map. The ip_frag version, however,
was missing an underscore in the name, so was non-standard. By changing
this, we no longer need the build system to explicitly be given the name of
the mapfile, as it can determine it from the directory/library name.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Since SSE4 is now part of the minimum requirements for DPDK, we don't need
to check for its presence any more.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The rte_ipv4_fragment_packet API expects that the link/interface MTU value
passed in be divisible by 8 bytes. Given the name of the parameter is
"mtu" rather than "frag_size" it is not necessarily the case that it will
be divisible by 8. An MTU of 1500 happens to produce a max fragment size
of 1480 (1500 - sizeof(ipv4_hdr)) which is divisible by 8 but other MTU
values such as 1600 or 9000 do not produce values that are divisible by 8.
Unfortunately, the API checks that the frag_size value produced is
divisible by 8 with a call to RTE_ASSERT which is only enabled when the
RTE_LOG_LEVEL >= RTE_LOG_DEBUG. In cases where the log level is set
normally the code silently continues and produces IP fragments that have
invalid fragment offset values.
An application may not have control over what MTU a user selects and rather
than have each application adjust the MTU to pass a suitable value to the
fragmentation API this change modifies the fragmentation API to handle
cases where the "mtu" argument is not divisible by 8 and automatically
adjust the internal "frag_size".
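The adjustment amounts to rounding the usable payload down to a
multiple of 8. The sketch below assumes a 20-byte IPv4 header without
options and uses a hypothetical function name; it illustrates the
arithmetic only.

```c
#include <stdint.h>

#define IPV4_HDR_LEN 20u /* IPv4 header without options (assumed) */

/*
 * Largest per-fragment payload for a given MTU, rounded down to a
 * multiple of 8 because the IPv4 fragment offset field counts 8-byte
 * units. Returns 0 when the MTU cannot carry 8 payload bytes.
 */
static uint32_t
ipv4_frag_size(uint32_t mtu)
{
	if (mtu <= IPV4_HDR_LEN)
		return 0;
	return (mtu - IPV4_HDR_LEN) & ~7u;
}
```

For the MTU values mentioned above, 1500 yields 1480 unchanged, while
1600 and 9000 yield 1576 and 8976 instead of the unaligned 1580 and
8980.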
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The rte_ip_frag_table_destroy procedure simply releases the memory for the
table without freeing the packet buffers that may be referenced in the hash
table for in-flight or incomplete packet reassembly operations. To prevent
leaked mbufs go through the list of fragments and free each one
individually.
Fixes: 416707812c ("ip_frag: refactor reassembly code into a proper library")
Cc: stable@dpdk.org
Reported-by: Matt Peters <matt.peters@windriver.com>
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Fixing typos across dpdk source code using codespell utility.
Skipped the ethdev driver's base code fixes to keep the base
code intact.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Before this patch, the management of dependencies between directories
had several issues:
- the generation of .depdirs, done at configuration is slow: it can take
more than one minute on some slow targets (usually ~10s on a standard
PC without -j).
- for instance, it is possible to express a dependency like:
- app/foo depends on lib/librte_foo
- and lib/librte_foo depends on app/bar
But this won't work because the directories are traversed with a
depth-first algorithm, so we have to choose between doing 'app' before
or after 'lib'.
- the script depdirs-rule.sh is too complex.
- we cannot use "make -d" for debug, because the output of make is used for
the generation of .depdirs.
This patch moves the DEPDIRS-* variables in the upper Makefile, making
the dependencies much easier to calculate. A DEPDIRS variable is still
used to process library dependencies in LDLIBS.
After this commit, "make config" is almost immediate.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Robin Jarry <robin.jarry@6wind.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
After changing pkt[0] to pkt[], the example IP reassembly is not working.
It's weird because this change is fine. There should be no difference
between them.
As a workaround, revert this change.
Fixes: 347a1e037f ("lib: use C99 syntax for zero-size arrays")
Reported-by: Huilong Xu <huilongx.xu@intel.com>
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Not sure what exactly changed and where, but I've started getting
build failures on Fedora rawhide i386:
lib/librte_ip_frag/ip_frag_internal.c:36:23: fatal error:
rte_jhash.h: No such file or directory
#include <rte_jhash.h>
^
Looking at librte_ip_frag, it clearly depends on librte_hash so
it's probably more a question of something commonly masking the issue.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Exported header files used by applications should allow the strictest
compiler flags. Language extensions used in many places must be explicitly
marked or removed to avoid warnings and compilation failures.
The extension keyword is used whenever the C99 syntax cannot do it.
This commit prevents the following errors:
error: ISO C forbids zero-size array `[...]'
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Some libraries were missing their dependency on eal, mbuf, mempool,
ring and kvargs.
It is revealed by the linker option "-z defs".
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch adds missing DEPDIRS to avoid any library referring to
symbols it is not linked against.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
The macro RTE_VERIFY always checks a condition.
It is optimized with "unlikely" hint.
While this macro is well suited for test applications, it is preferred
in libraries and examples to enable such check in debug mode.
That's why the macro RTE_ASSERT is introduced to call RTE_VERIFY only
if built with debug logs enabled.
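The relationship between the two macros can be sketched with
stand-ins. MY_VERIFY, MY_ASSERT and MY_DEBUG are hypothetical names
chosen for this illustration; the actual definitions and the condition
that enables the debug variant live in the DPDK headers.

```c
#include <stdio.h>
#include <stdlib.h>

/* Always-on check: report the failed expression and abort. */
#define MY_VERIFY(exp) do {                                        \
	if (!(exp)) {                                              \
		fprintf(stderr, "line %d\tassert \"%s\" failed\n", \
			__LINE__, #exp);                           \
		abort();                                           \
	}                                                          \
} while (0)

/* Debug-only check: compiles to nothing unless MY_DEBUG is defined. */
#ifdef MY_DEBUG
#define MY_ASSERT(exp) MY_VERIFY(exp)
#else
#define MY_ASSERT(exp) do { } while (0)
#endif
```

In a non-debug build, MY_ASSERT(0) has no effect, while MY_VERIFY(0)
would still abort.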
A lot of assert macros were duplicated and enabled with a specific flag.
Removing these #ifdef blocks makes it easier to test these code branches
and avoids dead code pitfalls.
The ENA_ASSERT is kept (in debug mode only) because it has more
parameters to log.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
If any fragment hole is found in ipv4_frag_reassemble() or
ipv6_frag_reassemble(), all ip_frag_pkt mbufs are moved to the death-row.
Any mbufs already chained to another mbuf are freed multiple times as
they are still in the ip_frag_pkt array.
Signed-off-by: Chaeyong Chong <cychong@gmail.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
fix the error reported by checkpatch:
"ERROR: return is not a function, parentheses are not required"
remove parentheses in return like:
"return (logical expressions)"
remove parentheses in return a function like:
"return (rte_mempool_lookup(...))"
Fixes: 6307b909b8 ("lib: remove extra parenthesis after return")
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Chaining/segmenting mbufs can be useful in many places, so make it
global.
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Signed-off-by: Johan Faltstrom <johan.faltstrom@netinsight.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Previous implementation won't work on every environment. The order of
allocation of bit-fields within a unit (high-order to low-order or
low-order to high-order) is implementation-defined.
Solution: use bytes instead of bit-fields.
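As an illustration of the byte-based approach, the sketch below packs
and unpacks a 16-bit field with explicit shifts. The layout (13-bit
offset in 8-byte units, 2 reserved bits, 1 more-fragments bit, network
byte order) is that of the IPv6 fragment header and is an assumption
here, since the message does not name the affected structure. The
helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Store a 13-bit fragment offset (in 8-byte units) and the
 * more-fragments flag into two bytes in network byte order, without
 * relying on compiler-specific bit-field ordering.
 */
static void
pack_frag_data(uint8_t out[2], uint16_t ofs_units, int more_frags)
{
	uint16_t v = (uint16_t)(((ofs_units & 0x1fffu) << 3) |
				(more_frags ? 1u : 0u));

	out[0] = (uint8_t)(v >> 8);
	out[1] = (uint8_t)(v & 0xffu);
}

/* Recover the offset in 8-byte units. */
static uint16_t
unpack_frag_ofs(const uint8_t in[2])
{
	uint16_t v = (uint16_t)(((uint16_t)in[0] << 8) | in[1]);

	return (uint16_t)(v >> 3);
}

/* Recover the more-fragments flag. */
static int
unpack_more_frags(const uint8_t in[2])
{
	return in[1] & 1;
}
```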
Signed-off-by: Piotr Azarewicz <piotrx.t.azarewicz@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>