Currently the IPv4 header checksum is calculated including its
current value, which can be a valid checksum or just garbage.
In any case, if the original value is not zero, then the result
is always wrong.
The IPv4 checksum is defined in RFC791, page 14 says:
Header Checksum: 16 bits
The checksum algorithm is:
The checksum field is the 16 bit one's complement of the one's
complement sum of all 16 bit words in the header. For purposes of
computing the checksum, the value of the checksum field is zero.
Thus force the csum field to always be zero.
Fixes: b08b8cfeb2 ("vhost: fix IP checksum")
Cc: stable@dpdk.org
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If linear buffers requested and external buffers are not, vhost
will not be able to receive any buffer that doesn't fit in a
single mbuf. Moreover, if such a buffer will appear in a vring
it will never be dequeued and the whole vring will become dead
breaking the network connection.
Disable segmentation offloading from the host side to avoid
having such a big buffers.
Fixes: c3ff0ac70a ("vhost: improve performance by supporting large buffer")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
mbuf allocation failure is a hard failure that highlights some
significant issues with memory pool size or a mbuf leak.
We still have the message for subsequent chained mbufs, but not
for the first one. It was removed while introducing extbuf
support for large buffers. But it was useful for catching
mempool issues and needs to be returned back.
Fixes: c3ff0ac70a ("vhost: improve performance by supporting large buffer")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
When VIRTIO_F_IN_ORDER feature is negotiated, vhost can optimize dequeue
function by only update first used descriptor.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Optimize vhost device packed ring dequeue function by splitting batch
and single functions. No-chained and direct descriptors will be handled
by batch and other will be handled by single as before.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add vhost packed ring zero copy batch and single dequeue functions like
normal dequeue path.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Optimize vhost device packed ring enqueue function by splitting batch
and single functions. Packets can be filled into one desc will be
handled by batch and others will be handled by single as before.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Buffer used ring updates as many as possible in vhost dequeue function
for coordinating with virtio driver. For supporting buffer, shadow used
ring element should contain descriptor's flags. First shadowed ring
index was recorded for calculating buffered number.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Flush used elements when batched enqueue function is finished.
Descriptor's flags are pre-calculated as they will be reset by vhost.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Buffer vhost packed ring enqueue updates, flush ring descs if buffered
content filled up one cacheline. Thus virtio can receive packets at a
faster frequency.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add batch dequeue function like enqueue function for packed ring, batch
dequeue function will not support chained descriptors, single packet
dequeue function will handle it.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add vhost single packet dequeue function for packed ring and meanwhile
left space for shadow used ring update function.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Batch enqueue function will first check whether descriptors are cache
aligned. It will also check prerequisites in the beginning. Batch
enqueue function do not support chained mbufs, single packet enqueue
function will handle it.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Create macro for adding unroll pragma before for each loop. Batch
functions will be contained of several small loops which can be
optimized by compilers' loop unrolling pragma.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add vhost enqueue function for single packet and meanwhile left space
for flush used ring function.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When enqueuing or dequeuing, the virtqueue's local available and used
indexes are increased.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The VXLAN related definitions and structures are moved from
rte_ether.h to a new header file: rte_xvlan.h.
Also introducing a new define macro for VXLAN default port id:
RTE_VXLAN_DEFAULT_PORT
Signed-off-by: Flavia Musatescu <flavia.musatescu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Raslan Darawsheh <rasland@mellanox.com>
No need to let those (non RTE_ prefixed) defines public.
Hide them where we use them.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Those two defines have been missed.
Fixes: 35b2d13fd6 ("net: add rte prefix to ether defines")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The include for rte_ether.h in each of these files should not use
quotes, as the header file is not in the librte_ethdev directory.
These are now updated to use <> symbols, to search directories
pre-designated by the compiler.
Fixes: 57668ed7bc ("net: move ethernet definitions to the net library")
Cc: stable@dpdk.org
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Enable testpmd to forward GTP packet in csum fwd mode.
A GTP header structure (without optional fields and extension header)
is defined in new rte_gtp.h.
A parser function in testpmd is added. GTPU and GTPC packets are both
supported, with respective UDP destination port and GTP message type.
Signed-off-by: Ting Xu <ting.xu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Many features require to store data inside the mbuf. As the room in mbuf
structure is limited, it is not possible to have a field for each
feature. Also, changing fields in the mbuf structure can break the API
or ABI.
This commit addresses these issues, by enabling the dynamic registration
of fields or flags:
- a dynamic field is a named area in the rte_mbuf structure, with a
given size (>= 1 byte) and alignment constraint.
- a dynamic flag is a named bit in the rte_mbuf structure.
The typical use case is a PMD that registers space for an offload
feature, when the application requests to enable this feature. As
the space in mbuf is limited, the space should only be reserved if it
is going to be used (i.e when the application explicitly asks for it).
The registration can be done at any moment, but it is not possible
to unregister fields or flags.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Currently, mem config will be mapped without using the virtual
area reservation infrastructure, which means it will be mapped
at an arbitrary location. This may cause failures to map the
shared config in secondary process due to things like PCI
whitelist arguments allocating memory in a space where the
primary has allocated the shared mem config.
Fix this by using virtual area reservation to reserve space for
the mem config, thereby avoiding the problem and reserving the
shared config (hopefully) far away from any normal memory
allocations.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Not all OS's follow Linux's memory layout, which may lead to
problems following the suggested common address hint absent
of a base-virtaddr flag. Make this address hint OS-specific.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Moving RTE_CPU* definitions from the common code to the Linux and
FreeBSD rte_os.h file to avoid #ifdef clutter.
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Signed-off-by: Antara Ganesh Kolar <antara.ganesh.kolar@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Fix off-by-one error in 64bit reciprocal division when divisor is 32bit.
Caught with the unit test:
RTE>>reciprocal_division
Validating unsigned 32bit division.
Validating unsigned 64bit division.
Validating unsigned 64bit division with 32bit divisor.
Division failed, 16983222950483802557/819 = expected 20736535959076681
result 20736535959076682
Validating division by power of 2.
Test Failed
Fixes: 6d45659eac ("eal: add u64-bit variant for reciprocal divide")
Cc: stable@dpdk.org
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Right now inclusion of rte_mbuf.h header can cause inclusion of
some arch/os specific headers.
That prevents it to be included directly by some
non-DPDK (but related) entities: KNI, BPF programs, etc.
To overcome that problem usually a separate definitions of rte_mbuf
structure is created within these entities.
That aproach has a lot of drawbacks: code duplication, error prone, etc.
This patch moves rte_mbuf structure definition (and some related macros)
into a separate file that can be included by both rte_mbuf.h and
other non-DPDK entities.
Note that it doesn't introduce any change for current DPDK code.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Michel Machado <michel@digirati.com.br>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Right now RTE_CACHE_ and IOVA definitions are located inside rte_memory.h
That might cause an unwanted inclusions of arch/os specific header files.
See [1] for particular problem example.
Probably the simplest way to deal with such problems -
move these definitions into rte_commmon.h
Note that this move doesn't introduce any change in functionality.
[1] https://bugs.dpdk.org/show_bug.cgi?id=321
Suggested-by: Vipin Varghese <vipin.varghese@intel.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Michel Machado <michel@digirati.com.br>
Adding a new port type called eventdev to the
rte_port library.
Signed-off-by: Rahul Shah <rahul.r.shah@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
To support high bandwidth network interfaces, all rates (port,
subport level token bucket and traffic class rates, pipe level
token bucket and traffic class rates) and stats counters defined
in public data structures (rte_sched.h) are modified to support
64 bit counters.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Remove redundant data structure fields from port level data
structures and update the release notes.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Modify pipe queue stats read function to allow different subports
of the same port to have different configuration in terms of number
of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Modify scheduler packet dequeue operation to allow different
subports of the same port to have different configuration in terms
of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Modify packet grinder functions of the schedule to allow different
subports of the same port to have different configuration in terms
of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Update memory footprint compute function for allowing subports of
the same port to have different configuration in terms of number of
pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Modify scheduler packet enqueue operation of the scheduler to allow
different subports of the same port to have different configuration
in terms of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Modify pipe level functions to allow different subports of the same
port to have different configuration in terms of number of pipes,
pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Add pipes configuration from the port level to allow different
subports of the same port to have different configuration in terms
of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Remove pipes configuration from the port level to allow different
subports of the same port to have different configuration in terms
of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Update internal structures related to port and subport to allow
different subports of the same port to have different configuration
in terms of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Add pipe configuration parameters to subport level structure to
allow different subports of the same port to have different
configuration in terms of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Add GTP tunnel type flag in mbuf for future use in GTP
Tx checksum offload.
Signed-off-by: Ting Xu <ting.xu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add new rte_flow_item_higig2_hdr in order to match higig2 header.
It is a layer 2.5 protocol and used in Broadcom switches.
Header format is based on the following document.
http://read.pudn.com/downloads558/doc/comm/2301468/HiGig_protocol.pdf
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The promiscuous enable and disable functions now check the
promiscuous state of the device before checking if the dev_ops
function exists for the device.
This change is necessary to allow sample applications run on
virtual PMDs, as previously -ENOTSUP returned when the promiscuous
enable function was called. This caused the sample application to
fail unnecessarily.
Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
OVS currently maintains a copy of those headers with the right endianness
annotations so that sparse checks can pass.
We introduced rte_beXX_t for better readibility in v17.08.
Let's make use of them, OVS then only needs to override those rte_beXX_t
types by exposing a tweaked rte_byteorder.h header.
Other existing dpdk users won't be affected since rte_beXX_t types are
mapped to uintXX_t types.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This patch reserves several bits as input set selection from the
high end of the 64 bits. It is combined with exisiting ETH_RSS_*
to represent RSS types. This patch also checks the simultaneous
use of SRC_ONLY and DST_ONLY of the same level.
Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
This patch decouples RTE_ETH_FLOW_* and ETH_RSS_*. The former defines
flow types and the latter defines RSS offload types.
Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
The rte_vhost_dequeue_burst supports two ways of dequeuing data.
If the data fits into a buffer, then all data is copied and a
single linear buffer is returned. Otherwise it allocates
additional mbufs and chains them together to return a multiple
segments mbuf.
While that covers most use cases, it forces applications that
need to work with larger data sizes to support multiple segments
mbufs. The non-linear characteristic brings complexity and
performance implications to the application.
To resolve the issue, add support to attach external buffer
to a pktmbuf and let the host provide during registration if
attaching an external buffer to pktmbuf is supported and if
only linear buffer are supported.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>