An optimization was done to take the IOTLB cache lock only once per
packet burst instead of once per IOVA translation.
With this, IOTLB miss requests are sent to Qemu with the lock held,
which can cause a deadlock if the socket buffer is full and Qemu is
waiting for an IOTLB update to complete.
Holding the lock is not necessary when sending an IOTLB miss request,
as doing so does not manipulate the IOTLB cache list, which is what
the lock protects. Let's just release the lock while sending the
IOTLB miss.
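A minimal sketch of the change (helper names follow the vhost IOTLB
code and may differ slightly from the actual patch):

    /* Translation miss: ask Qemu for the entry, but drop the cache
     * lock first so Qemu can update the cache even if the miss
     * request blocks on a full socket buffer.
     */
    vhost_user_iotlb_rd_unlock(vq);

    vhost_user_iotlb_pending_insert(vq, iova, perm);
    vhost_user_iotlb_miss(dev, iova, perm);

    vhost_user_iotlb_rd_lock(vq);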
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Compiler error:
irte_efd.o: In function `rte_efd_lookup':
rte_efd.c:(.text+0x6d6e): undefined reference to `efd_lookup_internal_avx2'
rte_efd.o: In function `rte_efd_lookup_bulk':
rte_efd.c:(.text+0x87d4): undefined reference to `efd_lookup_internal_avx2'
This can be observed with a compiler that doesn't support AVX2, in a
shared build.
Fixes: 86d8989688 ("efd: add AVX2 vector lookup function")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
MSI masks contain a 1 if the interrupt is masked, 0 if it is unmasked.
I got that wrong with the !!state calculation. For better
readability, the mask is now changed in the same way as in
igbuio_msi_mask_irq.
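A sketch of the corrected bit handling (variable names are
illustrative, not taken from the patch):

    /* 1 = masked, 0 = unmasked: set or clear the per-vector bit
     * explicitly instead of relying on a !!state calculation.
     */
    if (state != 0)
        mask_bits |= mask;    /* mask the interrupt */
    else
        mask_bits &= ~mask;   /* unmask the interrupt */

    if (mask_bits != old_mask_bits)
        pci_write_config_dword(pdev, mask_pos, mask_bits);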
Fixes: a8ea1e5fb6 ("igb_uio: fix unknown MSI symbols")
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Tested-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch partially reverts the commit d196343a25 and adds some
functions from Markus' previous version of the patch [1].
igb_uio uses the pci_msi_unmask_irq() and pci_msi_mask_irq() kernel
APIs when the kernel version is >= 3.19, because these APIs were
introduced in that Linux kernel version.
But these APIs are only exported starting from Linux kernel 4.5, so
on earlier kernels the igb_uio kernel module is not usable and gives
the following warnings:
"igb_uio: Unknown symbol pci_msi_unmask_irq"
"igb_uio: Unknown symbol pci_msi_mask_irq"
The kernel version requirement for these APIs is therefore raised to
Linux kernel >= 4.5.
For older Linux kernel versions, unmask_msi_irq() and mask_msi_irq()
would be used, but those functions are not exported at all.
Instead of these functions, switch back to the previous igb_uio
implementation for MSI-X, and use igbuio_msi_mask_irq() from [1] for
MSI.
[1]
http://dpdk.org/dev/patchwork/patch/28144/
Fixes: d196343a25 ("igb_uio: use kernel functions for masking MSI-X")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch makes the x86 EFD file compile only if the compiler
supports AVX2, since the AVX2 implementation is already selected at
run-time.
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
This patch dynamically selects the memcpy functions at run-time based
on the CPU flags that the current machine supports. It uses function
pointers which are bound to the appropriate functions at constructor
time. In addition, the AVX512 instruction set is compiled only if it
is enabled in the configuration and the compiler supports it.
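A simplified sketch of the constructor-time binding (the
rte_memcpy_* implementation names are illustrative):

    static void * (*rte_memcpy_ptr)(void *dst, const void *src, size_t n);

    static void __attribute__((constructor))
    rte_memcpy_init(void)
    {
        if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F))
            rte_memcpy_ptr = rte_memcpy_avx512f; /* only if built with AVX512 */
        else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
            rte_memcpy_ptr = rte_memcpy_avx2;
        else
            rte_memcpy_ptr = rte_memcpy_sse;
    }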
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
First, try to use the CPUID Time Stamp Counter and Nominal Core
Crystal Clock Information Leaf to determine the TSC hz on platforms
that support it (this does not require a privileged user).
If the CPUID leaf is not available, then try to determine the tsc hz by
reading the MSR 0xCE (requires privileged user).
Default to the tsc hz estimation if both methods fail.
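A sketch of the CPUID leaf 0x15 path (a hedged illustration, not the
exact patch code; the MSR and estimation fallbacks are handled
elsewhere):

    #include <stdint.h>
    #include <cpuid.h>

    static uint64_t
    get_tsc_freq_from_cpuid(void)
    {
        uint32_t a, b, c, d;

        /* Leaf 0x15: EBX/EAX = TSC to crystal-clock ratio,
         * ECX = nominal crystal clock frequency in Hz (may be 0).
         */
        if (__get_cpuid_max(0, NULL) < 0x15)
            return 0;
        __cpuid_count(0x15, 0, a, b, c, d);
        if (a == 0 || b == 0 || c == 0)
            return 0;
        return (uint64_t)c * b / a;   /* TSC hz */
    }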
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
On ppc_64, rte_rdtsc() returns the timebase register value, which
increments at an independent timebase frequency and is therefore
unrelated to the lcore CPU frequency used to derive the TSC hz.
Hence, we stick with the master lcore frequency.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Use cntvct_el0 system register to get the system counter frequency.
If the system is configured with RTE_ARM_EAL_RDTSC_USE_PMU, then
return 0 (letting the common code calibrate the TSC frequency).
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
When calibrating the TSC frequency, first probe the
architecture-specific function. If it is not available, use the
existing calibration scheme.
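The resulting calibration order, roughly (function names per the EAL
timer code; treat them as a hedged sketch):

    uint64_t freq;

    freq = get_tsc_freq_arch();       /* arch-specific, may return 0 */
    if (!freq)
        freq = get_tsc_freq();        /* existing platform-specific scheme */
    if (!freq)
        freq = estimate_tsc_freq();   /* last-resort estimation */

    eal_tsc_resolution_hz = freq;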
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Fix the xstats functions rte_eth_xstats_get_names_by_id() and
rte_eth_xstats_get_by_id(). In the current implementation, the ethdev
level reads all xstat values and filters out the ones requested by
the application. This behavior doesn't benefit from PMD ops and
doesn't provide the benefit the API was created for in the first
place. The APIs are also unnecessarily complicated, and the two APIs
return different results for the same parameters.
With this fix, instead of reading all the stats and finding the
requested values, drivers can provide ops to get selected xstats.
The APIs no longer crash with certain parameters:
rte_eth_xstats_get_by_id() returned a segfault with
"ids = NULL && values != NULL && n < max"
rte_eth_xstats_get_names_by_id() returned a segfault with
"ids = NULL && values != NULL && n = 0"
These now return the max number of stats available, matching the
other API.
rte_eth_xstats_get_by_id() returned a segfault with
"ids != NULL && values = NULL && n < max"
This now returns -EINVAL (-22).
Variable/parameter names are standardized between the two APIs.
Overall code complexity is reduced.
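A hedged usage sketch of the by-ID API after this fix (passing
ids == NULL returns the number of available xstats):

    uint64_t ids[2] = { 0, 1 };
    uint64_t values[2];
    int max, ret;

    /* ids == NULL: query how many xstats the port exposes */
    max = rte_eth_xstats_get_by_id(port_id, NULL, NULL, 0);
    if (max < 2)
        return;

    ret = rte_eth_xstats_get_by_id(port_id, ids, values, 2);
    if (ret == 2)
        printf("xstat[0]=%" PRIu64 " xstat[1]=%" PRIu64 "\n",
               values[0], values[1]);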
Fixes: 79c913a42f ("ethdev: retrieve xstats by ID")
Signed-off-by: Lee Daly <lee.daly@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
rte_flow_error_set() is a convenient helper to initialize error objects.
Since there is no fundamental reason to prevent applications from using it,
expose it through the public interface after modifying its return value
from positive to negative. This is done for consistency with the rest of
the public interface.
Documentation is updated accordingly.
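With the negative return value, a PMD (or application) can fill the
error object and propagate the failure in one statement, e.g. this
sketch (item_supported() is a hypothetical helper):

    /* Inside a flow validation routine */
    if (!item_supported(item))   /* hypothetical check */
        return rte_flow_error_set(error, ENOTSUP,
                                  RTE_FLOW_ERROR_TYPE_ITEM,
                                  item, "item not supported");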
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Some applied features were still developed against the older uint8_t
port_id, but the port_id range has since been increased to uint16_t.
This patch fixes the issue.
Fixes: f8244c6399 ("ethdev: increase port id range")
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
librte_sched uses rte_bitmap to manage large arrays of bits in an
optimized manner, so moving it to eal/common allows other libraries
and applications to use it.
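A brief usage sketch of the relocated API (a hedged example, as the
library is used by librte_sched today):

    uint32_t n_bits = 1024;
    uint32_t size = rte_bitmap_get_memory_footprint(n_bits);
    uint8_t *mem = rte_zmalloc(NULL, size, RTE_CACHE_LINE_SIZE);
    struct rte_bitmap *bmp = rte_bitmap_init(n_bits, mem, size);

    rte_bitmap_set(bmp, 42);
    if (rte_bitmap_get(bmp, 42))
        printf("bit 42 is set\n");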
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
In order to achieve fully reproducible builds, always use the same
inclusion order for headers in the Makefiles.
Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com>
With the introduction of IOVA mode, the only blocker to running with
4KB pages for NICs bound to vfio-pci is that RTE_BAD_PHYS_ADDR is not
a valid IOVA address.
We can refine this by using the VA as the IOVA when IOVA VA mode is
in use.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Add support to AES-CCM, for 128, 192 and 256-bit keys.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
The AES-CCM algorithm has some restrictions when
handling nonce (IV) and AAD information.
As the API states, the nonce needs to be placed 1 byte
after the start of the IV field. This field needs
to be 16 bytes long, regardless of the length of the nonce,
but it is important to clarify that the first byte
and the padding added after the nonce may be modified
by the PMDs using this algorithm.
The same happens with the AAD: it needs to be placed 18 bytes
after the start of the AAD field. The field also needs
to be a multiple of 16 bytes long, with all of its memory reserved
(the first bytes and the padding may be modified by the PMDs).
Lastly, the nonce does not need to be placed in the first 16 bytes
of the AAD, as the API previously stated, since that depends on the
PMD used, so the comment has been removed.
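An illustrative layout sketch (IV_OFFSET, nonce_len and the buffers
are placeholders, not part of the patch):

    /* IV field: 16 bytes; the nonce starts at byte 1. Byte 0 and the
     * padding after the nonce may be modified by the PMD.
     */
    uint8_t *iv = rte_crypto_op_ctod_offset(op, uint8_t *, IV_OFFSET);
    memcpy(iv + 1, nonce, nonce_len);

    /* AAD field: a multiple of 16 bytes, fully reserved; the real
     * AAD starts at byte 18. Leading bytes and padding may be
     * modified by the PMD.
     */
    memcpy(aad_buf + 18, aad, aad_len);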
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Currently, in order to get the name of a crypto device,
a user needs to access it through the crypto device structure.
It is better practice to have a function that retrieves this
name, given a device id.
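A minimal sketch of the intended usage (the function name follows
the cryptodev API; verify the exact prototype):

    const char *name = rte_cryptodev_name_get(dev_id);

    if (name != NULL)
        printf("crypto device %u is %s\n", dev_id, name);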
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
When registering a crypto driver, a cryptodev driver
structure was being allocated using malloc.
Since this call may fail, it is safer to allocate
this memory statically in each PMD, so that driver registration
never fails.
Coverity issue: 158645
Fixes: 7a364faef1 ("cryptodev: remove crypto device type enumeration")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Reviewed-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
This is not required to be printed for every mempool call.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
This reverts commit a1e7c17555.
Original commit assumes there is 1:1 mapping between physical device and
ethdev port, so that device name can be used per port instead of ethdev
name field.
But one physical device may have multiple ethdev ports and each port
needs its own unique name.
One issue reported here:
http://dpdk.org/ml/archives/users/2017-September/002484.html
So revert the commit to continue using the ethdev name field per
port.
Fixes: a1e7c17555 ("ethdev: use device name from device structure")
Cc: stable@dpdk.org
Reported-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The stats_get dev op API doesn't include a return value, so a PMD
cannot return an error in case of a failure while collecting
statistics.
Since PCI devices can be removed, and there is a window between the
physical removal and the RMV interrupt, the user may get invalid
stats without any indication.
This patch changes the stats_get API return value to be int instead of
void.
All the net PMDs stats_get dev ops are adjusted by this patch.
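A sketch of what a PMD can now do in its stats_get dev op (the
my_pmd_* names are hypothetical):

    static int
    my_pmd_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
    {
        /* e.g. the device was removed before the RMV interrupt */
        if (my_pmd_read_hw_counters(dev, stats) != 0)
            return -EIO;

        return 0;
    }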
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add a new offload capability flag for Rx HW timestamping, and allow
enabling/disabling it via rte_eth_rxmode.
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Reviewed-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Some devices do not support resetting the eth stats. An application
may need to know this, so that it does not clear its shadow stats
when the device cannot clear its own.
rte_eth_stats_reset is updated to return a code indicating whether
the device supports reset or not.
Signed-off-by: David Harton <dharton@cisco.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Allow sufficient space for UUID in string form (36+1).
Needed to use UUID with Hyper-V.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
We have changed the type of rx_adv_conf, so update its comment
accordingly.
Fixes: 4bdefaade6 ("ethdev: VMDQ enhancements")
Cc: stable@dpdk.org
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch adds GSO support for GRE-tunneled packets. Supported GRE
packets must contain an outer IPv4 header, and inner TCP/IPv4 headers.
They may also contain a single VLAN tag. GRE GSO doesn't check if all
input packets have correct checksums and doesn't update checksums for
output packets. Additionally, it doesn't process IP fragmented packets.
As with VxLAN GSO, GRE GSO uses a two-segment MBUF to organize each
output packet, which requires multi-segment mbuf support in the TX
functions of the NIC driver. Also, if a packet is GSOed, GRE GSO reduces
its MBUF refcnt by 1. As a result, when all of its GSOed segments are
freed, the packet is freed automatically.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This patch adds a framework that allows GSO on tunneled packets.
Furthermore, it leverages that framework to provide GSO support for
VxLAN-encapsulated packets.
Supported VxLAN packets must have an outer IPv4 header (prepended by an
optional VLAN tag), and contain an inner TCP/IPv4 packet (with an optional
inner VLAN tag).
VxLAN GSO doesn't check if input packets have correct checksums and
doesn't update checksums for output packets. Additionally, it doesn't
process IP fragmented packets.
As with TCP/IPv4 GSO, VxLAN GSO uses a two-segment MBUF to organize each
output packet, which mandates support for multi-segment mbufs in the TX
functions of the NIC driver. Also, if a packet is GSOed, VxLAN GSO
reduces its MBUF refcnt by 1. As a result, when all of its GSO'd segments
are freed, the packet is freed automatically.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This patch adds GSO support for TCP/IPv4 packets. Supported packets
may include a single VLAN tag. TCP/IPv4 GSO doesn't check if input
packets have correct checksums, and doesn't update checksums for
output packets (the responsibility for this lies with the application).
Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets.
TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indirect
MBUF, to organize an output packet. Note that we refer to these two
chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet
header, while the indirect mbuf simply points to a location within the
original packet's payload. Consequently, use of the GSO library requires
multi-segment MBUF support in the TX functions of the NIC driver.
If a packet is GSO'd, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a
result, when all of its GSOed segments are freed, the packet is freed
automatically.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Generic Segmentation Offload (GSO) is a SW technique to split large
packets into small ones. Akin to TSO, GSO enables applications to
operate on large packets, thus reducing per-packet processing overhead.
To give applications more flexibility, DPDK GSO is implemented
as a standalone library. Applications explicitly use the GSO library
to segment packets. Segmenting a packet requires two steps. The first
is to set the proper flags in mbuf->ol_flags, where the flags are the
same as those used for TSO. The second is to call the segmentation
API, rte_gso_segment(). This patch introduces the GSO API framework
to DPDK.
rte_gso_segment() splits an input packet into small ones in each
invocation. The GSO library refers to these small packets generated
by rte_gso_segment() as GSO segments. Each of the newly-created GSO
segments is organized as a two-segment MBUF, where the first segment is a
standard MBUF, which stores a copy of packet header, and the second is an
indirect MBUF which points to a section of data in the input packet.
rte_gso_segment() reduces the refcnt of the input packet by 1. Therefore,
when all GSO segments are freed, the input packet is freed automatically.
Additionally, since each GSO segment consists of multiple MBUFs
(i.e. 2 MBUFs), the driver of the interface to which the GSO segments
are sent should support transmitting multi-segment packets.
The GSO framework clears the PKT_TX_TCP_SEG flag for both the input
packet, and all produced GSO segments in the event of success, since
segmentation in hardware is no longer required at that point.
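A hedged usage sketch (GSO_MAX_SEGS and the pre-configured gso_ctx
are assumptions, not part of this patch):

    struct rte_mbuf *segs[GSO_MAX_SEGS];   /* GSO_MAX_SEGS is illustrative */
    int nb;

    /* Step 1: same offload flags as for TSO */
    pkt->ol_flags |= (PKT_TX_TCP_SEG | PKT_TX_IPV4);

    /* Step 2: segment; on success the refcnt of pkt is reduced by 1 */
    nb = rte_gso_segment(pkt, &gso_ctx, segs, GSO_MAX_SEGS);
    if (nb > 0)
        rte_eth_tx_burst(port_id, queue_id, segs, nb);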
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add a new wrapper function, with an rte_power_ prefix, for an
existing private (but until now unused) function.
The plan is to clean up all the header files in the next release so
that only the intended public functions are in the map file and only
the relevant headers have the rte_ prefix so that only they are
included in the documentation.
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Bus scan is responsible for finding devices over *all* buses.
Some of these buses might not be able to scan, but that should
not prevent other buses from being scanned.
The same is the case for probing. It is possible that some devices
which were scanned don't have a specific driver. That should not
prevent other buses from being probed.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
64-bit loads and stores are atomic operations on all 64-bit
processors.
Change RTE_ARCH_X86_64 to RTE_ARCH_64 to reflect this.
Fixes: 9b15ba895b ("timer: use a skip list")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The rte_timer_reset function should be able to register timers on service
lcores as they are EAL threads.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
This function can be used to check the role of a specific lcore.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
GCC does have the __get_cpuid_count built-in, which checks for the
maximum supported leaf, but implementations differ between CLANG and
GCC.
This change provides an implementation compatible with both GCC and
CLANG 3.4+.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
When compiling with clang extra warning flags, such as used by default with
meson, a warning is given in iotlb.c:
lib/librte_vhost/iotlb.c:318:6: warning:
variable 'socket' is used uninitialized whenever
'if' condition is false [-Wsometimes-uninitialized]
This is a false positive, as the socket value will be initialized by the
call to get_mempolicy in the case where the NUMA build-time flag is set,
and in cases where it is not set, "if (ret)" will always be true as ret is
initialized to -1 and never changed.
However, this is not immediately obvious, and is perhaps a little fragile,
as it will break if other code using ret is subsequently added above the
call to get_mempolicy by someone unaware of this subtle dependency.
Therefore, we can fix the warning and make the code more robust by
explicitly initializing socket to zero, and by moving the extra
condition check on the return from get_mempolicy() into the #ifdef.
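The fixed code, roughly (a hedged sketch; exact variable names may
differ):

    int socket = 0;   /* explicit init avoids -Wsometimes-uninitialized */

    #ifdef RTE_LIBRTE_VHOST_NUMA
        if (get_mempolicy(&socket, NULL, 0, vq,
                          MPOL_F_NODE | MPOL_F_ADDR) != 0)
            socket = 0;
    #endif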
Fixes: d012d1f293 ("vhost: add IOTLB helper functions")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The adapter implementation uses eventdev PMDs to configure the packet
transfer if HW support is available; if not, it uses an EAL service
function that reads packets from ethernet Rx queues and injects them
as events into the event device.
Signed-off-by: Gage Eads <gage.eads@intel.com>
Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Add the RTE_EVENT_TYPE_ETH_RX_ADAPTER event type. Certain platforms
(e.g., octeontx) need, in the event dequeue function, to identify
events injected from ethernet hardware into the eventdev so that the
DPDK mbuf can be populated from the HW descriptor.
Events injected from ethernet hardware would use an event type of
RTE_EVENT_TYPE_ETHDEV and events injected from the rx adapter service
function would use an event type of RTE_EVENT_TYPE_ETH_RX_ADAPTER to
help the event dequeue function differentiate between these two event
sources.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Add common APIs for configuring packet transfer from ethernet Rx
queues to event devices across HW & SW packet transfer mechanisms.
A detailed description of the adapter is contained in the header's
comments.
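A hedged usage sketch (structure and field names follow the adapter
API headers and may differ in detail; adapter_id, ev_dev_id,
eth_port, ev_queue_id and port_conf are placeholders):

    struct rte_event_eth_rx_adapter_queue_conf qconf = {
        .ev = {
            .queue_id = ev_queue_id,
            .sched_type = RTE_SCHED_TYPE_ATOMIC,
            .priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
        },
    };

    rte_event_eth_rx_adapter_create(adapter_id, ev_dev_id, &port_conf);
    /* -1: add all Rx queues of eth_port to the adapter */
    rte_event_eth_rx_adapter_queue_add(adapter_id, eth_port, -1, &qconf);
    rte_event_eth_rx_adapter_start(adapter_id);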
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The PMD callbacks are used by the rte_event_eth_rx_xxx() APIs to
configure and control the ethernet receive adapter if packet transfer
from the ethdev to the eventdev is implemented in hardware.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The caps API allows the application to retrieve the capability
information needed to configure the ethernet Rx adapter for a given
eventdev and ethdev pair.
For example, the ethdev/eventdev pairing may be such that all of the
ethdev Rx queues can only be connected to a single event queue; in
this case the application is required to pass -1 as the queue id
when adding a receive queue to the adapter.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This commit adds the new_event_threshold port attribute, so that the
entire port configuration structure passed to rte_event_port_setup
can be queried.
Signed-off-by: Gage Eads <gage.eads@intel.com>
This commit adds three new queue attributes, so that the entire queue
configuration structure passed to rte_event_queue_setup can be queried.
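A sketch of querying an attribute instead of caching the setup-time
configuration (attribute id name per the eventdev API):

    uint32_t prio;

    if (rte_event_queue_attr_get(dev_id, queue_id,
                                 RTE_EVENT_QUEUE_ATTR_PRIORITY,
                                 &prio) == 0)
        printf("queue %u priority: %u\n", queue_id, prio);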
Signed-off-by: Gage Eads <gage.eads@intel.com>
This commit bumps the library version to reflect the ABI change
caused by removing the individual rte_event_port_count, queue_count,
and other get functions. These functions are superseded by the
get-attribute style API, which allows fetching values without API/ABI
changes.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>