Add new Device ID's for backplane and QSFP+ adapters, and delete
deprecated one for backplane.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Modified driver and eal code to support I217 and I218 Intel NICs.
Compiled and tested (via testpmd) on Ubuntu 14.04 for target
x86_64-native-linuxapp-gcc
Compiled for target x86_64-native-linuxapp-clang
Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Currently, default values of kickfd and callfd are -1.
If the values are -1, current code guesses kickfd and callfd haven't
been initialized yet. Then vhost library will guess the virtqueue isn't
ready for processing.
But callfd and kickfd will be set as -1 when "--enable-kvm"
isn't specified in QEMU command line. It means we cannot treat -1 as
uninitialized state.
The patch defines -1 and -2 as VIRTIO_INVALID_EVENTFD and
VIRTIO_UNINITIALIZED_EVENTFD, and uses VIRTIO_UNINITIALIZED_EVENTFD for
the default values of kickfd and callfd.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
If a malicious guest forges a dead loop chain, it could lead to a dead
loop of copying the desc buf to mbuf, which results to all mbuf being
exhausted.
Add a var nr_desc to avoid such case.
Suggested-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
A malicious guest may easily forge some illegal vring desc buf.
To make our vhost robust, we need make sure desc->next will not
go beyond the vq->desc[] array.
Suggested-by: Rich Lane <rich.lane@bigswitch.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We need make sure that desc->len is bigger than the size of virtio net
header, otherwise, unexpected behaviour might happen due to "desc_avail"
would become a huge number with for following code:
desc_avail = desc->len - vq->vhost_hlen;
For dequeue code path, it will try to allocate enough mbuf to hold such
size of desc buf, which ends up with consuming all mbufs, leading to no
free mbuf is available. Therefore, you might see an error message:
Failed to allocate memory for mbuf.
Also, for both dequeue/enqueue code path, while it copies data from/to
desc buf, the big "desc_avail" would result to access memory not belong
the desc buf, which could lead to some potential memory access errors.
A malicious guest could easily forge such malformed vring desc buf. Every
time we restart an interrupted DPDK application inside guest would also
trigger this issue, as all huge pages are reset to 0 during DPDK re-init,
leading to desc->len being 0.
Therefore, this patch does a sanity check for desc->len, to make vhost
robust.
Reported-by: Rich Lane <rich.lane@bigswitch.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
VIRTIO_NET_F_MRG_RXBUF is a default feature supported by vhost.
Adding unlikely for VIRTIO_NET_F_MRG_RXBUF detection doesn't
make sense to me at all.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
First of all, rte_memcpy() is mostly useful for copying big packets
by leveraging hardware advanced instructions like AVX. But for virtio
net hdr, which is 12 bytes at most, invoking rte_memcpy() will not
introduce any performance boost.
And, to my suprise, rte_memcpy() is VERY huge. Since rte_memcpy()
is inlined, it increases the binary code size linearly every time
we call it at a different place. Replacing the two rte_memcpy()
with directly copy saves nearly 12K bytes of code size!
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Current virtio_dev_merge_rx() implementation just looks like the
old rte_vhost_dequeue_burst(), full of twisted logic, that you
can see same code block in quite many different places.
However, the logic of virtio_dev_merge_rx() is quite similar to
virtio_dev_rx(). The big difference is that the mergeable one
could allocate more than one available entries to hold the data.
Fetching all available entries to vec_buf at once makes the
difference a bit bigger then.
The refactored code looks like below:
while (mbuf_has_not_drained_totally || mbuf_has_next) {
if (this_desc_has_no_room) {
this_desc = fetch_next_from_vec_buf();
if (it is the last of a desc chain)
update_used_ring();
}
if (this_mbuf_has_drained_totally)
mbuf = fetch_next_mbuf();
COPY(this_desc, this_mbuf);
}
This patch reduces quite many lines of code, therefore, make it much
more readable.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This is a simple refactor, as there isn't any twisted logic in old
code. Here I just broke the code and introduced two helper functions,
reserve_avail_buf() and copy_mbuf_to_desc() to make the code more
readable.
Also, it saves nearly 1K bytes of binary code size.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The current rte_vhost_dequeue_burst() implementation is a bit messy
and logic twisted. And you could see repeat code here and there.
However, rte_vhost_dequeue_burst() acutally does a simple job: copy
the packet data from vring desc to mbuf. What's tricky here is:
- desc buff could be chained (by desc->next field), so that you need
fetch next one if current is wholly drained.
- One mbuf could not be big enough to hold all desc buff, hence you
need to chain the mbuf as well, by the mbuf->next field.
The simplified code looks like following:
while (this_desc_is_not_drained_totally || has_next_desc) {
if (this_desc_has_drained_totally) {
this_desc = next_desc();
}
if (mbuf_has_no_room) {
mbuf = allocate_a_new_mbuf();
}
COPY(mbuf, desc);
}
Note that the old patch does a special handling for skipping virtio
header. However, that could be simply done by adjusting desc_avail
and desc_offset var:
desc_avail = desc->len - vq->vhost_hlen;
desc_offset = vq->vhost_hlen;
This refactor makes the code much more readable (IMO), yet it reduces
binary code size.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The old code was doing a floating point divide for each rte_dequeue()
which is very expensive. Change to using fixed point scaled inverse
multiply. To maintain equivalent precision, scaled math is used.
The application ABI is the same.
This improved performance from 5Gbit/sec to 10 Gbit/sec when configured
for 10 Gbit/sec rate.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This adds (with permission of the original author)
reciprocal divide based on algorithm in Linux.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Add DT_NEEDED entries for librte_eal external dependencies.
Details between the platforms differ somewhat, and for static
builds they need to be handled from mk/exec-env still.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Add DT_NEEDED entries for external library dependencies which
are the most critical ones for sane operation.
Clean up vhost_cuse CFLAGS/LDFLAGS confusion while at it.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
There are two places that need -lm (test app and librte_sched) and
exactly one that needs -lrt (librte_sched). Add the relevant
DT_NEEDED entries to both, and eliminate the bogus discrepancy
between Linux and BSD EXECENV_LDLIBS wrt these libs.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Originally, sink ports in librte_port releases received mbufs back to
mempool. This patch adds optional packet dumping to PCAP feature in sink
port: the packets will be dumped to user defined PCAP file for storage or
debugging. The user may also choose the sink port's activity: either it
continuously dump the packets to the file, or stops at certain dumping
This feature shares same CONFIG_RTE_PORT_PCAP compiler option as source
port PCAP file support feature. Users can enable or disable this feature
by setting CONFIG_RTE_PORT_PCAP compiler option "y" or "n".
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Originally, source ports in librte_port is an input port used as packet
generator. Similar to Linux kernel /dev/zero character device, it
generates null packets. This patch adds optional PCAP file support to
source port: instead of sending NULL packets, the source port generates
packets copied from a PCAP file. To increase the performance, the packets
in the file are loaded to memory initially, and copied to mbufs in circular
manner. Users can enable or disable this feature by setting
CONFIG_RTE_PORT_PCAP compiler option "y" or "n".
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Change the fields of outer_mac and inner_mac in struct
rte_eth_tunnel_filter_conf from pointer to struct in order to
keep the code's readability.
Signed-off-by: Xutao Sun <xutao.sun@intel.com>
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
X550 will do VxLAN & NVGRE RX checksum off-load automatically.
This patch exposes the result of the checksum off-load.
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The names of function for tunnel port configuration are not
accurate. They're tunnel_add/del, better change them to
tunnel_port_add/del.
The old functions are directly replaced because the API and ABI
compatibility of ethdev are already broken in 16.04.
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add functions to support l2 tunnel configuration and operations.
1, L2 tunnel ether type modification.
It means modifying the ether type of a specific type of tunnel.
So the packet with this ether type will be parsed as this type
of tunnel.
2, Enabling/disabling l2 tunnel support.
It means enabling/disabling the ability of parsing the specific
type of tunnel. This ability should be enabled before we enable
filtering, forwarding, offloading for this specific type of
tunnel.
3, Insertion and stripping for l2 tunnel tag.
4, Forwarding the packets to a pool based on l2 tunnel tag.
Only support e-tag tunnel now.
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Tested-by: Yong Liu <yong.liu@intel.com>
In order to set ether type of VLAN for single VLAN, inner
and outer VLAN, the VLAN type as an input parameter is added
to 'rte_eth_dev_set_vlan_ether_type()'.
In addition, corresponding changes in e1000, ixgbe and i40e
are also added.
It is an ABI break but ethdev library is already bumped for 16.04.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Many sample apps include internal buffering for single-packet-at-a-time
operation. Since this is such a common paradigm, this functionality is
better suited to being implemented in the ethdev API.
The new APIs in the ethdev library are:
* rte_eth_tx_buffer_init - initialize buffer
* rte_eth_tx_buffer - buffer up a single packet for future transmission
* rte_eth_tx_buffer_flush - flush any unsent buffered packets
* rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
case transmitting a buffered burst fails. By default, we just free the
unsent packets.
As well as these, an additional reference callbacks are provided, which
frees the packets:
* rte_eth_tx_buffer_drop_callback - silently drop packets (default
behavior)
* rte_eth_tx_buffer_count_callback - drop and update user-provided counter
to track the number of dropped packets
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
vq is allocated on pairs, hence we should do pair reallocation
at numa_realloc() as well, otherwise an error like following
occurs while do numa reallocation:
VHOST_CONFIG: reallocate vq from 0 to 1 node
PANIC in rte_free():
Fatal error: Invalid memory
The reason we don't catch it is because numa_realloc() will
not take effect when RTE_LIBRTE_VHOST_NUMA is not enabled,
which is the default case.
Fixes: e049ca6d10 ("vhost-user: prepare multiple queue setup")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
We could first check if we need realloc vq or not, if so,
reallocate it. We then do similar to vhost dev realloc.
This could get rid of the tons of repeated "if (realloc_dev)"
and "if (realloc_vq)" statements, therefore, makes code
a bit more readable.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
While we use a single linked list to maintain all devices, we could
use a static array to achieve the same goal, just like what we did
to maintain the eth devices with rte_eth_devices array. This could
simplifies the code a bit.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
VIRTIO_NET_F_GUEST_ANNOUNCE is a new feature introduced since kernel
v3.5. For older kernels (or more precisely, old distributions), we
could simply define it manually, to fix the "macro not defined" error.
Fixes: d293dac8f3 ("vhost: claim support of guest announce")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Enabled CONFIG_RTE_LIBRTE_LPM, CONFIG_RTE_LIBRTE_TABLE,
CONFIG_RTE_LIBRTE_PIPELINE libraries for arm and arm64
TABLE, PIPELINE libraries were disabled due to LPM library dependency.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
-Used architecture agnostic xmm_t to represent 128 bit SIMD variable
-Introduced vect_* API abstraction in app/test to test rte_lpm_lookupx4
API in architecture agnostic way
-Moved rte_lpm_lookupx4 SSE implementation to architecture specific
rte_lpm_sse.h file to accommodate new rte_lpm_lookupx4 implementation
for a different architecture.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This patch add a mechanism for discovery of crypto device features and supported
crypto operations and algorithms. It also provides a method for a crypto PMD to
publish any data range limitations it may have for the operations and algorithms
it supports.
The parameter feature_flags added to rte_cryptodev struct is used to capture
features such as operations supported (symmetric crypto, operation chaining etc)
as well parameter such as whether the device is hardware accelerated or uses
SIMD instructions.
The capabilities parameter allows a PMD to define an array of supported operations
with any limitation which that implementation may have.
Finally the rte_cryptodev_info struct has been extended to allow retrieval of
these parameter using the existing rte_cryptodev_info_get() API.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
This patch provides the implementation of an AES-NI accelerated crypto PMD
which is dependent on Intel's multi-buffer library, see the white paper
"Fast Multi-buffer IPsec Implementations on Intel® Architecture Processors"
This PMD supports AES_GCM authenticated encryption and authenticated
decryption using 128-bit AES keys
The patch also contains the related unit tests functions
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: John Griffin <john.griffin@intel.com>
Wireless algorithms like Snow3G needs input in bits.
In this patch, changes have been made to incorporate this requirement
in both QAT and SW PMD.
Signed-off-by: Deepak Kumar Jain <deepak.k.jain@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Added new SW PMD which makes use of the libsso SW library,
which provides wireless algorithms SNOW 3G UEA2 and UIA2
in software.
This PMD supports cipher-only, hash-only and chained operations
("cipher then hash" and "hash then cipher") of the following
algorithms:
- RTE_CRYPTO_SYM_CIPHER_SNOW3G_UEA2
- RTE_CRYPTO_SYM_AUTH_SNOW3G_UIA2
The SNOW 3G hash and cipher algorithms, which are enabled
by this crypto PMD are implemented by Intel's libsso software
library. For library download and build instructions,
see the documentation included (doc/guides/cryptodevs/snow3g.rst)
The patch also contains the related unit tests function to test the PMD
supported operations.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>
As cryptodev library does not depend on mbuf_offload library
any longer, this patch removes it.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>
This patch modifies the crypto burst enqueue/dequeue APIs to operate on bursts
rte_crypto_op's rather than the current implementation which operates on
rte_mbuf bursts, this simplifies the burst processing in the crypto PMDs and the
use of crypto operations in general, including new functions for managing
rte_crypto_op pools.
These changes continues the separation of the symmetric operation parameters
from the more general operation parameters, which will simplify the integration
of asymmetric crypto operations in the future.
PMDs, unit tests and sample applications are also modified to work with the
modified and new API.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>
Remove unused phys_addr field from key in crypto_xform,
simplify struct and fix knock-on impacts in l2fwd-crypto app
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>
This patch splits symmetric specific definitions and
functions away from the common crypto APIs to facilitate the future extension
and expansion of the cryptodev framework, in order to allow asymmetric
crypto operations to be introduced at a later date, as well as to clean the
logical structure of the public includes. The patch also introduces the _sym
prefix to symmetric specific structure and functions to improve clarity in
the API.
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>
- Fixed >80char lines in test file
- Removed unused elements from stats struct
- Removed unused objects in rte_cryptodev_pmd.h
- Renamed variables
- Replaced leading spaces with tabs
- Improved performance results display in test
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>
Two new pipeline API functions have been added to the library. The packet
hijack API function can be called by any input/output port or table action
handler to remove selected packets from the burst of packets read from one
of the pipeline input ports and then either send these packets out through
any pipeline output port or drop them.
Another packet drop API function can be used by the pipeline action
handlers (port in/out, table) to drop the packets selected using packet
mask. This function updates the drop statistics counters correctly.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Currently, there is no mechanism that allows the pipeline ports (in/out)
and table action handlers to override the default forwarding decision
(as previously configured per input port or in the table entry). The port
(in/out) and table action handler prototypes have been changed to allow
pipeline action handlers (port in/out, table) to remove the selected
packets from the further pipeline processing and to take full ownership
for these packets. This feature will be helpful to implement functions
such as exception handling (e.g. TTL =0), load balancing etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
call pci_ioport_map (on x86) only if the pci device is not bound
to a kernel driver.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Use RTE_KDRV_NONE to indicate that kernel driver (other than VFIO/UIO) isn't
managing the device.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
positive return of devinit of pci driver means the driver doesn't support
this device.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
A new rte_lpm_config structure is used so LPM library will allocate
exactly the amount of memory which is necessary to hold application’s
rules.
Signed-off-by: Michal Kobylinski <michalx.kobylinski@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
This patch extend next_hop field from 8-bits to 24-bits in LPM library
for IPv4.
Added versioning symbols to functions and updated
library and applications that have a dependency on LPM library.
Signed-off-by: Michal Kobylinski <michalx.kobylinski@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
There was an ABI change in the release 16.04.
Fixes: fb76dd26a3 ("cmdline: increase command line buffer")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
There was an ABI change and more are coming in the release 16.04.
Fixes: a9963a86b2 ("ethdev: increase RETA entry size")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This patch adds a new function to the EAL API:
int rte_eal_primary_proc_alive(const char *path);
The function indicates if a primary process is alive right now.
This functionality is implemented by testing for a write-
lock on the config file, and the function tests for a lock.
The use case for this functionality is that a secondary
process can wait until a primary process starts by polling
the function and waiting. When the primary is running, the
secondary continues to poll to detect if the primary process
has quit unexpectedly, the secondary process can detect this.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Maryam Tahhan <maryam.tahhan@intel.com>
This patch fixes a race-condition when a primary and
secondary process simultaneously probe PCI devices.
This is implemented by moving the rte_eal_mcfg_complete()
function call in rte_eal_init() until after rte_eal_pci_probe().
The memory mapping of PCI device in the secondary process *must*
happen after the primary has finished doing the mapping as it
relies on information written by the primary.
The end result is that the secondary process waits longer,
until the primary has completed its PCI probing, and then
notifies the secondary process.
This race-condition became visible during the development of
a function that allows a secondary process to be polling until
a primary process exists. The secondary would then probe PCI
devices at the same time, causing an error during rte_eal_init()
Linux EAL:
Fixes: 916e4f4f4e ("memory: fix for multi process support")
BSD EAL:
Fixes: 764bf26873 ("add FreeBSD support")
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
It deprecates sys files of 'extended_tag' and
'max_read_request_size' which was not documented.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Remove pci configuration of 'extended tag' and 'max read request
size', as they are not required by all devices and it lets PMD to
configure them if necessary.
In addition, 'pci_config_space_set()' is deprecated.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch removes double newlines between functions
in keepalive.[hc] aligning it with the rest of DPDK.
Fixes: 75583b0d1e ("eal: add keep alive monitoring")
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
This patch sets a timestamp on each lcore when it is registered
for keepalive. This causes the first values read by the monitor
to show time since the core was registered, instead of the delta
between 0 and the timestamp counter.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
This was working fine because addresses of two structs are same:
struct A {
struct B b;
} a;
As above sample "a" and "b" has same address.
Now casting private data back to the correct struct type, to the one
stored.
Fixes: af75078fec ("first public release")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
armv8.1 adds support for new atomic instructions.
Linux kernel v4.3 onwards, the presence of atomic instruction
support can detect through HWCAP_ATOMICS
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>
CONFIG_RTE_LIBRTE_EAL_*APP can be replaced by CONFIG_RTE_EXEC_ENV_*APP.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Fixed issue of byte order in ethdev library that the structure
for setting fdir's mask and flow entry is inconsist and made
inputs of mask be in big endian.
Fixes: 2d4c1a9ea2 ("ethdev: add new flow director masks")
Fixes: 76c6f89e80 ("ixgbe: support new flow director masks")
Reported-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Zhe Tao <zhe.tao@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
The tbl8 and tbl24 structures were essentially identical except for
slightly different names for one or two fields. Merge these two
structures into a single structure definition.
Two fields have been renamed as part of this change: the
"ext_entry" field in the tbl24 has been renamed to "valid_group" to match
the tbl8 value to make the merge easier, and the "tbl8_gindex" field has
been renamed to "group_idx". The "valid_group" field now serves two
purposes: in a tbl8 it indicates if the group, i.e. the tbl8, is valid,
and in a tbl24, it indicates if the "group_idx" is valid, i.e. whether
the value is a next_hop or a tbl8 index. [The name "group_idx" was used
to make this latter link between the fields clearer]
Suggested-by: Vladimir Medvedkin <medvedkinv@gmail.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Macros RTE_MBUF_DATA_DMA_ADDR and RTE_MBUF_DATA_DMA_ADDR_DEFAULT
are defined in each PMD driver file. Convert macros to inline
functions and move them to common lib/librte_mbuf/rte_mbuf.h file.
PMD drivers include rte_mbuf.h file directly/indirectly hence no
additioanl header file inclusion is necessary.
Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
cmdline_parse_*.h headers use struct cmdline_token_hdr /
cmdline_parse_token_hdr_t which is defined in cmdline_parse.h, but
do not include it, forcing manual inclusion.
This commit includes cmdline_parse.h in all cmdline_parse_*.h.
Signed-off-by: Marc Sune <marcdevel@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Several NICs can handle 512 entries/queues in their RETA table,
an 8 bit field is not large enough for them.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Allow long command lines in testpmd (like flow director with IPv6, ...).
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
with only one hugepage or already sorted hugepage addresses, the sort
function called memcpy with same src and dst pointer. Debugging with
valgrind will issue a warning about overlapping area. This patch changes
the sort method to qsort to avoid this behavior. The separate sort
function is no longer necessary.
Suggested-by: Jay Rolette <rolette@infiniteio.com>
Signed-off-by: Ralf Hoffmann <ralf.hoffmann@allegro-packets.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Fix compile error when enable CONFIG_RTE_LIBEAL_USE_HPET.
Error messages:
lib/librte_eal/linuxapp/eal/eal_timer.c: In function ‘rte_eal_hpet_init’:
lib/librte_eal/linuxapp/eal/eal_timer.c:222:2: error:
implicit declaration of function ‘rte_thread_setname’
Fixes: badb3688ff ("eal/linux: fix build with glibc < 2.12")
Signed-off-by: Yi Lu <luyi68@live.com>
Acked-by: David Marchand <david.marchand@6wind.com>
The version 2.3 has been renamed 16.04.
Fixes: 6d7de6d2e3 ("version: switch to year.month numbers")
Reported-by: Panu Matilainen <pmatilai@redhat.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The physically linked-together combined library has been an increasing
source of problems, as was predicted when library and symbol versioning
was introduced. Replace the complex and fragile construction with a
simple linker script which achieves the same without all the problems,
remove the related kludges from eg mlx drivers.
Since creating the linker script is practically zero cost, remove the
config option and just create it always.
Based on a patch by Sergio Gonzales Monroy, linker script approach
initially suggested by Neil Horman.
Suggested-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Fix crc32c hash functions to return a valid crc32c value for
data lengths not multiple of 4 bytes.
ARM code is not tested.
Fixes: af75078fec ("first public release")
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs.
There is related thread about this bulk API.
http://dpdk.org/dev/patchwork/patch/4718/
Thanks to Konstantin's loop unrolling.
Attached the wiki page about duff's device. It explains the performance
optimization through loop unwinding, and also the most dramatic use of
case label fall-through.
https://en.wikipedia.org/wiki/Duff%27s_device
In this implementation, while() loop is used because we could not assume
count is strictly positive. Using while() loop saves one line of check.
Signed-off-by: Gerald Rogers <gerald.rogers@intel.com>
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Broadcast RARP packet by injecting it to receiving mbuf array at
rte_vhost_dequeue_burst().
Commit 33226236a3 ("vhost: handle request to send RARP") iterates
all host interfaces and then broadcast it by all of them. It did
notify the switches about the new location of the migrated VM, however,
the mac learning table in the target host is wrong (at least in my
test with OVS):
$ ovs-appctl fdb/show ovsbr0
port VLAN MAC Age
1 0 b6:3c:72:71:cd:4d 10
LOCAL 0 b6:3c:72:71:cd:4e 10
LOCAL 0 52:54:00:12:34:68 9
1 0 56:f6:64:2c:bc:c0 1
Where 52:54:00:12:34:68 is the mac of the VM. As you can see from the
above, the port learned is "LOCAL", which is the "ovsbr0" port. That
is reasonable, since we indeed send the pkt by the "ovsbr0" interface.
The wrong mac table lead all the packets to the VM go to the "ovsbr0"
in the end, which ends up with all packets being lost, until the guest
send a ARP quest (or reply) to refresh the mac learning table.
Jianfeng then came up with a solution I have thought of firstly but NAKed
by myself, concerning it has potential issues [0]. The solution is as title
stated: broadcast the RARP packet by injecting it to the receiving mbuf
arrays at rte_vhost_dequeue_burst(). The re-bring of that idea made me
think it twice; it looked like a false concern to me then. And I had done
a rough verification: it worked as expected.
[0]: http://dpdk.org/ml/archives/dev/2016-February/033527.html
Another note is that while preparing this version, I found that DPDK has
some ARP related structures and macros defined. So, use them instead of
the one from standard header files here.
Cc: Thibaut Collet <thibaut.collet@6wind.com>
Suggested-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
rte_get_log_type and rte_get_log_level functions has been available
for many versions. But they are missing from the shared library map
and therefore do not get exported correctly.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This is useful when sections have duplicate names.
Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
This patch adds new function rte_jobstats_abort.
It marks *job* as finished and time of this work will be add to management
time instead of execution time.
This function should be used instead of rte_jobstats_finish if condition
occurs, condition is defined by the application for example when receiving
n>0 packets.
Example of usage is added to the example l2fwd-jobstats.
At maximum load do-while loop inside Idle job will be execute once because
one or more jobs waiting to be executed, so this time should not be include
as the execution time by calling rte_jobstats_abort().
Signed-off-by: Marcin Kerlin <marcinx.kerlin@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
User should be able to configure ethdev with zero rx/tx queues,
but both should not be zero.
After above change, rte_eth_dev_tx_queue_config,
rte_eth_dev_rx_queue_config should allocate memory for rx/tx queues only
when number of rx/tx queues are nonzero.
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Macro RTE_PROC_PRIMARY_OR_ERR_RET blocking the secondary process from
API usage. API access should be given to both secondary and primary.
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Macros RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET
are blocking the secondary process from using the APIs.
API access should be given to both secondary and primary.
Reported-by: Sean Harte <sean.harte@intel.com>
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Include vfio map/rd/wr support for pci ioport.
Signed-off-by: Santosh Shukla <sshukla@mvista.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
vfio_pci_mmap() try to map all pci bars. ioport region are not mapped in
vfio/kernel so ignore mmaping for ioport.
Signed-off-by: Santosh Shukla <sshukla@mvista.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
iopl() syscall not supported in linux-arm/arm64 so always return 0 value.
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Santosh Shukla <sshukla@mvista.com>
Acked-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When built in a C++ application, the include fails for 2 reasons:
rte_mbuf_offload.h:128:24: error:
invalid conversion from ‘void*’ to ‘rte_pktmbuf_offload_pool_private*’ [-fpermissive]
rte_mempool_get_priv(mpool);
^
The cast must be explicit for C++.
rte_mbuf_offload.h:304:1: error: expected declaration before ‘}’ token
There was a closing brace for __cplusplus but not an opening one.
Fixes: 78c8709b5d ("mbuf_offload: introduce library to attach offloads to mbuf")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
When built in a C++ application, the jhash include fails:
rte_jhash.h:123:22: error:
invalid conversion from ‘const void*’ to ‘const uint32_t*’ [-fpermissive]
const uint32_t *k = key;
^
The cast must be explicit for C++.
Fixes: 8718219a87 ("hash: add new jhash functions")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
When built in a C++ application, the keepalive include fails:
rte_keepalive.h:142:41: error: ‘ALIVE’ was not declared in this scope
keepcfg->state_flags[rte_lcore_id()] = ALIVE;
^
C++ requires to use a scope operator to access an enum inside a struct.
There was also a namespace issue for the values (no RTE prefix).
The solution is to move the struct and related code out of the header file.
Fixes: 75583b0d1e ("eal: add keep alive monitoring")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Malfunctioning virtio clients may not send VHOST_USER_SET_MEM_TABLE for
some reason. This causes NULL dereference in qva_to_vva().
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The vhost_net_device_ops indirection is unnecessary because there is only
one implementation of the vhost common code.
Removing it makes the code more readable.
Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Since commits ff909fe21f and 4e32101f9b, it is now possible to free
memzones and rings.
The rte_mempool_create() should be modified to take advantage of this
and not leak memory when an allocation fails.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
The common vhost code only supported a single mmap per device. vhost-user
worked around this by saving the address/length/fd of each mmap after the end
of the rte_virtio_memory struct. This only works if the vhost-user code frees
dev->mem, since the common code is unaware of the extra info. The
VHOST_USER_RESET_OWNER message is one situation where the common code frees
dev->mem and leaks the fds and mappings. This happens every time I shut down a
VM.
The new code calls back into the implementation (vhost-user or vhost-cuse) to
clean up these resources.
The vhost-cuse changes are only compile tested.
Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
To claim that we support vhost-user live migration support:
SET_LOG_BASE request will be send only when this feature flag
is set.
Besides this flag, we actually need another feature flag set
to make vhost-user live migration work: VHOST_F_LOG_ALL.
Which, however, has been enabled long time ago.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
While in former patch we enabled GUEST_ANNOUNCE feature, so that the
guest OS will broadcast a GARP message after migration to notify the
switch about the new location of migrated VM, the thing is that
GUEST_ANNOUNCE is enabled since kernel v3.5 only. For older kernel,
VHOST_USER_SEND_RARP request comes to rescue.
The payload of this new request is the mac address of the migrated VM,
with that, we could construct a RARP message, and then broadcast it
to host interfaces.
That's how this patch works:
- list all interfaces, with the help of SIOCGIFCONF ioctl command
- construct an RARP message and broadcast it
Cc: Thibaut Collet <thibaut.collet@6wind.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
It's actually a feature already enabled in Linux kernel (since v3.5).
What we need to do is simply to claim that we support such feature,
and nothing else.
With that, the guest will send an ARP message after live migration
to notify the switches about the new location of migrated VM.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Every time we copy a buf to vring desc, we need to log it.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Introduce vhost_log_write() helper function to log the dirty pages we
touched. Page size is harded code to 4096 (VHOST_LOG_PAGE), and each
log is presented by 1 bit.
Therefore, vhost_log_write() simply finds the right bit for related
page we are gonna change, and set it to 1. dev->log_base denotes the
start of the dirty page bitmap.
Every time we update virtio used ring, we need to log it. And it's
been done by a new vhost_log_write() wrapper, vhost_log_used_vring().
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
VHOST_USER_SET_LOG_BASE request is used to tell the backend (dpdk
vhost-user) where we should log dirty pages, and how big the log
buffer is.
This request introduces a new payload:
typedef struct VhostUserLog {
uint64_t mmap_size;
uint64_t mmap_offset;
} VhostUserLog;
Also, a fd is delivered from QEMU by ancillary data.
With those info given, an area of memory is mmaped, assigned
to dev->log_base, for logging dirty pages.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Commit d0cf91303d added dependency on librte_net headers to vhost
but did not add this to the Makefile, which makes builds
non-deterministic. Curiously it is non-parallel build that is
consistently broken by this missing dependency, usually it's the other
way around, but trying to build without -j(n) fails with:
lib/librte_vhost/vhost_rxtx.c:41:20:
fatal error: rte_ip.h: No such file or directory
Fixes: d0cf91303d ("vhost: add Tx offload capabilities")
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Add guest offload setting in vhost lib.
Virtio 1.0 spec (5.1.6.4 Processing of Incoming Packets) says:
1. If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
VIRTIO_NET_HDR_F_NEEDS_CSUM bit in flags can be set: if so,
the packet checksum at offset csum_offset from csum_start
and any preceding checksums have been validated. The checksum
on the packet is incomplete and csum_start and csum_offset
indicate how to calculate it (see Packet Transmission point 1).
2. If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were
negotiated, then gso_type MAY be something other than
VIRTIO_NET_HDR_GSO_NONE, and gso_size field indicates the
desired MSS (see Packet Transmission point 2).
In order to support these features, the following changes are added,
1. Extend 'VHOST_SUPPORTED_FEATURES' macro to add the offload features negotiation.
2. Enqueue these offloads: convert some fields in mbuf to the fields in virtio_net_hdr.
There are more explanations for the implementation.
For VM2VM case, there is no need to do checksum, for we think the
data should be reliable enough, and setting VIRTIO_NET_HDR_F_NEEDS_CSUM
at RX side will let the TCP layer to bypass the checksum validation,
so that the RX side could receive the packet in the end.
In terms of us-vhost, at vhost RX side, the offload information is
inherited from mbuf, which is in turn inherited from TX side. If we
can still get those info at RX side, it means the packet is from
another VM at same host. So, it's safe to set the
VIRTIO_NET_HDR_F_NEEDS_CSUM, to skip checksum validation.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Add vhost TX offload (CSUM and TSO) support capabilities in vhost lib.
In order to support these features, and the following changes are added,
1. Extend 'VHOST_SUPPORTED_FEATURES' macro to add the offload features
negotiation.
2. Dequeue TX offload: convert the fileds in virtio_net_hdr to the
related fileds in mbuf.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Most of the code is inspired on virtio driver.
rte_pci_ioport structure is filled at map time with anything needed for later
read / write calls.
At the moment, base field is used to store a x86 ioport (uint16_t) and will
be reused for other arches.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Tested-by: Santosh Shukla <sshukla@mvista.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The compiler optimization was disabled a long time ago
without describing what was the exact issue.
Maybe it does not apply anymore.
As it looks unneeded, let's remove this strange pragma.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The structure feature_entry does not need leaf/subleaf
which were copied from x86 CPUID implementation.
On x86, a valid flag is detected with the non-zero leaf value.
This check is replaced by a check with a dummy "none" register.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The structure feature_entry does not need leaf/subleaf
which were copied from x86 CPUID implementation.
On x86, a valid flag is detected with the non-zero leaf value.
This check is replaced by a check with a dummy "none" register.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The patch c344eab3ee has moved the hardware definition of CPU flags.
Now the functions checking these hardware flags are also moved.
The function rte_cpu_get_flag_enabled() is no more inline.
The benefits are:
- remove rte_cpu_feature_table from the ABI (recently added)
- hide hardware details from the API
- allow to adapt structures per arch (done in next patch)
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The new function rte_cpu_get_flag_name() is added to the EAL API.
It is implemented (duplicated) in each arch because the next patch
will remove the public exposure of the feature tables.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
non-temporal/transient/stream version of rte_prefetch0()
The non-temporal prefetch is intended as a prefetch hint that processor
will use the prefetched data only once or short period,
unlike the rte_prefetch0() function which imply that
prefetched data to use repeatedly.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Jan Viktorin <viktorin@rehivetech.com>
slow-path data structures need not be 128-byte cache aligned.
Reduce the alignment to 64-byte to save the memory.
No behavior change for 64-byte cache aligned systems as minimum
cache line size as 64.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
existing rte_bitmap library implementation optimally configured to run on
64-bytes cache line, extending to 128-bytes cache line targets.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
No need to split mbuf structure to two cache lines for 128-byte cache
line size targets as it can fit on a single 128-byte cache line.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
- RTE_CACHE_LINE_MIN_SIZE(Supported minimum cache line size)
- __rte_cache_min_aligned(Force minimum cache line alignment)
- RTE_CACHE_LINE_SIZE_LOG2(Express cache line size in terms of log2)
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
by default, all the targets will be configured with the 64-byte cache line
size, targets which have different cache line size can be overridden
through target specific config file.
Selected ThunderX and power8 as CONFIG_RTE_CACHE_LINE_SIZE=128 targets
based on existing configuration.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The file rte_config.h is automatically generated and included.
No need to #include it.
The example performance-thread needs a makefile fix to avoid
overwriting the default cflags.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
As discussed on list, switch numbering scheme to be based on year/month.
Release 2.3 then becomes 16.04.
Ref: http://dpdk.org/ml/archives/dev/2015-December/030336.html
Also, added zero padding to the month so that it appear as 16.04 and
not 16.4 in "make showversion" and rte_version().
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
It was requested by Intel, more than one year ago, to replace the name
"Intel DPDK" by "DPDK".
Some references to the old name were still in some docs and code comments,
leading to confusion.
Fixes: ac8ada004c ("doc: remove Intel references from release notes")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
fix the error reported by checkpatch:
"ERROR: return is not a function, parentheses are not required"
remove parentheses in return like:
"return (logical expressions)"
remove parentheses in return a function like:
"return (rte_mempool_lookup(...))"
Fixes: 6307b909b8 ("lib: remove extra parenthesis after return")
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Currently rte_eal_check_module() detects Linux kernel modules via reading
/proc/modules. Built-in ones aren't listed there and therefore they are not
being found.
Add support for checking built-in modules with parsing the sysfs files
This commit obsoletes the /proc/modules parsing approach.
Signed-off-by: Kamil Rytarowski <kamil.rytarowski@caviumnetworks.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When configuring RTE_MACHINE to "default", rte_memcpy implementation
is the default one (old AVX).
In this code, clang raises a warning thanks to -Wsometimes-uninitialized:
rte_memcpy.h:838:6: error:
variable 'srcofs' is used uninitialized whenever 'if' condition is false
if (dstofss > 0) {
^~~~~~~~~~~
rte_memcpy.h:849:6: note: uninitialized use occurs here
if (srcofs == 0) {
^~~~~~
It is fixed by moving srcofs initialization out of the condition.
Also dstofss calculation is corrected.
Fixes: 1ae817f9f8 ("eal/x86: tune memcpy for platforms without AVX512")
Reported-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Normally we could set RTE_PCI_DRV_NEED_MAPPING flag so that eal will
invoke pci_map_device internally for us. From that point view, there
is no need to export pci_map_device.
However, for virtio pmd driver, which is designed to work without
binding UIO (or something similar first), pci_map_device() will fail,
which ends up with virtio pmd driver being skipped. Therefore, we can
not set RTE_PCI_DRV_NEED_MAPPING blindly at virtio pmd driver.
Therefore, this patch exports pci_map_device, and let virtio pmd call
it when necessary.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Santosh Shukla <sshukla@mvista.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
Reviewed-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Tested-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Move cpu_feature_table array from arch specific rte_cpuflags.h files to
new arch specific rte_cpuflags.c files.
Main motivation is to escape from static variable declarations in
header files. cpu_feature_table has many copies in final binary, even
exist in some object files that does not use this variable at all.
And this can be a sample to create architecture specific source files
and move some functions which are not performance sensitive from
architecture header files to source files.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
This commit is adding a generic mechanism to support multiple IOMMU
types. For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special
VFIO mode that doesn't use IOMMU at all), but it's easily extended
by adding necessary definitions to eal_vfio.h, and DMA mapping
functions to eal_pci_vfio.c.
Since type 1 IOMMU module is no longer necessary to have VFIO,
we fix the module check to check for vfio-pci instead. It's not
ideal and triggers VFIO checks more often (and thus produces more
error output, which was the reason behind the module check in the
first place), so we compensate for that by providing more verbose
logging, indicating whether VFIO initialization has succeeded or
failed.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Santosh Shukla <sshukla@mvista.com>
Tested-by: Santosh Shukla <sshukla@mvista.com>
In fedora 22 with GCC version 5.3.1, when compile,
will result an error:
include/rte_memcpy.h:309:7: error: "RTE_MACHINE_CPUFLAG_AVX2"
is not defined [-Werror=undef]
#elif RTE_MACHINE_CPUFLAG_AVX2
Fixes: 9484092baa ("eal/x86: optimize memcpy for AVX512 platforms")
Signed-off-by: Michael Qiu <michael.qiu@intel.com>
Acked-by: Zhihong Wang <zhihong.wang@intel.com>
For prior platforms, add condition for unalignment handling, to keep this
operation from interrupting the batch copy loop for aligned cases.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Implement AVX512 memcpy and choose the right implementation based on
predefined macros, to make full utilization of hardware resources and
deliver high performance.
In current DPDK, memcpy holds a large proportion of execution time in
libs like Vhost, especially for large packets, and this patch can bring
considerable benefits for AVX512 platforms.
The implementation is based on the current DPDK memcpy framework, some
background introduction can be found in these threads:
http://dpdk.org/ml/archives/dev/2014-November/008158.htmlhttp://dpdk.org/ml/archives/dev/2015-January/011800.html
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Move these error logs and checks on detach capabilities in a common place.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>
We are in static functions and those passed arguments can't be NULL.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>
The kernel fills new allocated (huge) pages with zeros.
DPDK just has to populate page tables to trigger the allocation.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Changing from 1/2 second to 1/10 doesn't compromise the precision,
and a 4/10 second is worth saving.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
The API has been removed but the symbols were still declared in the map.
Fixes: a421b86a4a ("ethdev: remove old flow director API")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
test_mp_secondary was initially added by mistake.
rte_snprintf has been removed.
Fixes: 9d41beed24 ("lib: provide initial versioning")
Fixes: 3185322809 ("eal: remove rte_snprintf")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
In eal_intr_proc_rxtx_intr, negative value may be used as argument to
a function expecting a positive value. If 'read' returns EAGAIN as
example, the bytes_read updates to a negative value which continue
be passed as argument for the next 'read'.
Coverity issue: 107115
Function read(fd, &buf, bytes_read) returns a negative number.
Assigning: signed variable bytes_read = read.
CID 107115 (#1 of 1): Argument cannot be negative
(NEGATIVE_RETURNS) bytes_read is passed to a parameter
that cannot be negative.
bytes_read = read(fd, &buf, bytes_read);
Fixes: c9f3ec1a0f ("eal/linux: add Rx interrupt control function")
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Cryptodev was marked experimental and mbuf_offload depends on it.
The mbuf_offload library is one of the crypto area which requires
some discussions before having a stable API.
The experimental mark is also added to rte_cryptodev_configure()
to be sure one cannot miss it.
Fixes: 66874e55f5 ("cryptodev: mark experimental state")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The current implementation of VFIO will not with the new no-IOMMU mode
in 4.4 kernel. The original code assumed that IOMMU group zero would
never be used. Group numbers are assigned starting at zero, and up
until now the group numbers came from the hardware which is likely
to use group 0 for system devices that are not used with DPDK.
The fix is to allow 0 as a valid group and rearrange code
to split the return value from the group value.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
When guest is booting (or any othertime guest is busy) it is possible
for the small receive ring (256) to get full. If this happens the
vhost library should just return normally. It's current behavior
of logging just creates massive log spew/overflow which could even
act as a DoS attack against host.
Reported-by: Nathan Law <nlaw@brocade.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch fixes the compile errors caused by lacking of "size_t"
definition in rte_hash.h.
The compile error exists on IBM POWER and ARM.
The errors are like:
In file included from app/test/test_hash_scaling.c:35:0:
rte_hash.h:70:70: error: unknown type name ‘size_t’
Fixes: 95da2f8e9c ("hash: customize compare function")
Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
So that we will not break ABI in future extension by adding few more
fields.
Struct vhost_virtqueue is reserved with 16 qwords (the later vhost-live
migration support would at least consume 3 of them), and struct virtio_net
is reserved with a bit more, 64 qwords, as there is only one instance for
a virtio nic instance.
Note that both reservation are not placed at the end of the struct, but
instead before the last field, since both the last field at the two struct
take a lot spaces. Putting the reservation after it would divide those
reserved fields to another cacheline. (we might need fix them in future, btw)
Suggested-by: Panu Matilainen <pmatilai@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
RHEL 7.2 contains additional backports from newer upstream kernels.
Add RHEL_RELEASE_CODE logic for RHEL_RELEASE_VERSION(7,2) to pick up
the changes to kernel functions.
Signed-off-by: Lee Roberts <lee.roberts@hpe.com>
Implement vqtbl1q_u8 intrinsic function, which is not supported in armv7-a.
Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
CONFIG_* from config files can not be used in code.
Fixes: 12f45fa7e2 ("eal/arm: read timer from PMU if enabled")
Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
Acked-by: Jan Viktorin <viktorin@rehivetech.com>
This is one of those trivial things git and other tools complain
about.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The CHANNEL_CMDS_MAX_VM_CHANNELS is duplicated in the channel_commands
header file. This commit removes that duplication.
Fixes: 210c383e24 ("power: packet format for vm power management")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The rx_mbuf_alloc_failed counter was only cleared by virtio driver.
Now it is cleared by common rte_eth_stats_reset function for all
drivers at once.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Fixes the mbuf allocation not initialized problem. This problem will cause
the mbufs not be able to freed back to mempool by rte_pktmbuf_free().
Fixes: ef3403fb6f ("port: source and sink")
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Give user a chance to customize the hash key compare function.
The default rte_hash_cmp_eq function is set in the rte_hash_create
function, but these builtin ones may not good enough, so the user
may call this to override the default one.
Signed-off-by: Yu Nemo Wenbin <yuwb_bjy@ctbri.com.cn>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
rte_hash_create function was accidentally duplicated in
DPDK_2.1 in rte_hash_version.map.
Fixes: 473d1beb ("hash: allow to store data in hash table")
Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
RTE_PMD_DEBUG_TRACE was changed also to support pedantic flag,
but not completely.
the macro changed only under the RTE_LIBRTE_ETHDEV_DEBUG define,
but when RTE_LIBRTE_ETHDEV_DEBUG is not defined the old format
was left.
fix the macro to support pedantic flag in any case.
Fixes: b974e4a40c ("ethdev: make error checking macros public")
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Irrelevant of the target, the preprocessor #ifdef SSE2 for the
grinder_pipe_exists function is inadequate since the __mm_testz_si128
function requires SSE4.1, PTEST instruction described in
https://en.wikipedia.org/wiki/SSE4#SSE4.1 (I do no have better spec
reference). I have bumped the preprocessor #ifdef to require SSE4.
The Atom N2600 does not have SSE4, http://ark.intel.com/products/58916,
and so I had trouble building rte_sched with optimized version of
grinder_pipe_exists, with following:
error: inlining failed in call to always_inline _mm_testz_si128’:
target specific option mismatch
GCC 4.9 correctly identifies my target as not having SSE4, and with
provided patch builds the non-optimized version of grinder_pipe_exists.
Signed-off-by: Mike Sowka <msowka@gmail.com>
The function rte_mempool_obj_iter used in mlx drivers
was not exported. So the driver loading was failing:
EAL: open shared lib librte_pmd_mlx4.so
EAL: x86_64-native-linuxapp-gcc/lib/librte_pmd_mlx4.so:
undefined symbol: rte_mempool_obj_iter
Fixes: 9d41beed24 ("lib: provide initial versioning")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
There is a new function in the EAL API for internal use.
It has neither a proper prefix nor a .map export:
libethdev.so: undefined reference to `is_xen_dom0_supported'
Fixes: 719dbebceb ("xen: allow determining DOM0 at runtime")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
gcc 4.3.4 does not include "immintrin.h", and will post below error:
lib/librte_sched/rte_sched.c:56:23: error:
immintrin.h: No such file or directory
This compiler issue is fixed with rte_vect.h
There is another issue, need SSE2 support
Fixes: 42ec27a017 ("sched: enable SSE optimizations in config")
Signed-off-by: Michael Qiu <michael.qiu@intel.com>
- Fix for build error caused by flexible array member in
struct rte_ccryptodev_session:
error: flexible array member in otherwise empty struct
- Change void** casting of sess parameter in
rte_cryptodev_session_create which causes a strict-aliasing error.
Fixes: d11b0f30df ("cryptodev: introduce API and framework for crypto devices")
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
This patch is to adding hash table operations for key signature
computed on lookup ("do-sig") for LRU hash tables and Extendible buckets.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
This patch relates to ABI change proposed for librte_table.
The key_mask parameter is added for 8-byte and 16-byte
key extendible bucket and LRU tables.The release notes
is updated and the deprecation notice is removed.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Added functions for adding/deleting multiple records to table owned by
pipeline. The LIBABIVER number is incremented.
Signed-off-by: Maciej Gajdzica <maciejx.t.gajdzica@intel.com>
Signed-off-by: Marcin Kerlin <marcinx.kerlin@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
New functions prototypes for bulk add/delete added to table API. New
functions allows adding/deleting multiple records with single function
call. For now those functions are implemented only for ACL table. For
other tables these function pointers are set to NULL.
Signed-off-by: Maciej Gajdzica <maciejx.t.gajdzica@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Bug fixes for ring ports with IPv4/IPv6 reassembly support.
Previous implementation can't work properly due to incorrect choosing
process function.
Also, assuming that, when processing ip packet, ip header is know we can
set l3_len parameter here.
Fix usage RTE_MBUF_METADATA_* macros due to redefinition the macros.
Fixes: 50f54a84df ("port: add IPv6 reassembly port")
Fixes: ba92d511dd ("port: move metadata offset reference at mbuf head")
Signed-off-by: Piotr Azarewicz <piotrx.t.azarewicz@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
ring_multi_reader input port (on top of multi consumer rte_ring)
ring_multi_writer output port (on top of multi producer rte_ring)
Signed-off-by: Piotr Azarewicz <piotrx.t.azarewicz@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
select hash function for cuckoo, fbk as rte_hash_crc_4byte
if arm64-CRC extension available
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
armv8-a has optional CRC32 extension, march=armv8-a+crc enables code
generation for the ARMv8-A architecture together with
the optional CRC32 extensions.
added RTE_MACHINE_CPUFLAG_CRC32 to detect the availability of
CRC32 extension in compile time. At run-time, The RTE_CPUFLAG_CRC32
can be used to find the availability.
armv8-a+crc target support added in GCC 4.9,
Used inline assembly and emulated __ARM_FEATURE_CRC32 to work
with tool-chain < 4.9
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The following measurements shows improvement over the default
libc memcmp function
Length(B) by X% over libc memcmp
16 149.57%
32 122.7%
48 104.96%
64 98.21%
80 93.75%
96 90.55%
112 110.48%
128 137.24%
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This fix is for IPv6 checksum offload error on RHEL65.
Any optimalisation above -O0 provide error in IPv6 checksum
flag "-fstrict-aliasing" is default for optimalisation above -O0.
Step 1: testpmd -c 0x6 -n 4 -- -i --portmask=0x3 --disable-hw-vlan
--enable-rx-cksum --crc-strip --txqflags=0
Step 2: settings and start
set verbose 1
set fwd csum
start
Step 3: send scapy with bad checksum IPv6/TCP packet
Ether(src="52:00:00:00:00:00",
dst="90:e2:ba:4a:33:5d")/IPv6(src="::1")/TCP(chksum=0xf)/("X"*46)
Step 4: Received packets:
RESULTS: IPv6/TCP': ['0xd41'] or other unexpected.
EXPECTED RESULTS: IPv6/TCP': ['0x9f5e']
Fixes: 2b039d5f20 ("net: fix build with gcc 4.4.7 and strict aliasing")
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The crypto API is in an early state.
It requires more discussions and experiments to declare it stable,
as discussed in http://dpdk.org/ml/archives/dev/2015-November/028634.html
A documentation section will be required in the guides.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This library add support for adding a chain of offload operations to a
mbuf. It contains the definition of the rte_mbuf_offload structure as
well as helper functions for attaching offloads to mbufs and a mempool
management functions.
This initial implementation supports attaching multiple offload
operations to a single mbuf, but only a single offload operation of a
specific type can be attach to that mbuf.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This patch contains the initial proposed APIs and device framework for
integrating crypto packet processing into DPDK.
features include:
- Crypto device configuration / management APIs
- Definitions of supported cipher algorithms and operations.
- Definitions of supported hash/authentication algorithms and
operations.
- Crypto session management APIs
- Crypto operation data structures and APIs allocation of crypto
operation structure used to specify the crypto operations to
be performed on a particular mbuf.
- Extension of mbuf to contain crypto operation data pointer and
extra flags.
- Burst enqueue / dequeue APIs for processing of crypto operations.
Signed-off-by: Des O Dea <des.j.o.dea@intel.com>
Signed-off-by: John Griffin <john.griffin@intel.com>
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Adding a new macro for specifying __aligned__ attribute, and updating the
current __rte_cache_aligned macro to use it.
Also adding a new macro to specify the __packed__ attribute
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
The functions rte_eth_rx_queue_count and rte_eth_descriptor_done are
supported by very few PMDs. Therefore, it is best to check for support
for the functions in the ethdev library, so as to avoid run-time crashes
at run-time if the application goes to use those APIs. Similarly, the
port parameter should also be checked for validity.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The functions for rx/tx burst, for rx_queue_count and descriptor_done in
the ethdev library all had two copies of the code. One copy in
rte_ethdev.h was inlined for performance, while a second was in
rte_ethdev.c for debugging purposes only. We can eliminate the second
copy of the functions by moving the additional debug checks into the
copies of the functions in the header file. [Any compilation for
debugging at optimization level 0 will not inline the function so the
result should be same as when the function was in the .c file.]
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Move the function pointer and port id checking macros to rte_ethdev and
rte_dev header files, so that they can be used in the static inline
functions there. Also replace the RTE_LOG call within
RTE_PMD_DEBUG_TRACE so this macro can be built with the -pedantic flag
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The macros to check that the function pointers and port ids are valid
for an ethdev are potentially useful to have in a common headers for
use with all PMDs. However, since they would then become externally
visible, we apply the RTE_ & RTE_ETH_ prefix to them as approtiate.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
pthread_setname_np() function added in glibc 2.12, using this function
in older glibc versions cause compile error:
error: implicit declaration of function "pthread_setname_np"
This patch adds "rte_thread_setname" macro and set it according
glibc >= 2.12 check, thread naming disabled for older glibc versions,
glibc versions that support "pthread_setname_np" will keep using this
function.
Fixes: 67b6d3039e ("eal: set name to threads")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Otherwise mbufs will leak when the port is destroyed. The
rte_sched_port_qbase() and rte_sched_port_qsize() functions are used
in free now, so move them up.
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Increase the number of possible subports per port to allow up to 16 bits.
It is still possible that this will require excessive RAM.
Although mbuf structure is changed, it is ABI compatiable since it
just expands existing sched part of structure to overlap pre-existing hole
in the hash element of structure.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Make rte_sched conform to kernel/DPDK coding style.
Fix missing whitespace and some of the excessively long lines.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Only use RTE_SCHED_PORT_N_GRINDERS from config.
Use RTE_BUILD_BUG_ON for errors.
The remaining implementation constants can be put together.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Drop conditional code which was for debugging credit checks.
It is hard to maintain code with any additional #ifdef baggage.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
All #ifdefs in code should be enabled/disabled via DPDK config
(or better yet removed all together).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Break block comments that exceed common practice for line length.
Shorten wording for obvious things.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The actual port_hierarchy was deprecated and hidden in 2.1
so drop it from view in DPDK 2.2.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
To get pci_dev and vf number from dev, benefit from
existing macros in pci.h
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
[Thomas note: it breaks the old 2.6.33 support]
Fixes following error when Ubuntu 12.04 uses kernel 3.13.0-30-generic,
since skb_set_hash() is implemented in the kernel from 3.13.0-30,
which is declared as UBUNTU_KERNEL_VERSION(3,13,0,30,0) and not
UBUNTU_KERNEL_VERSION(3,13,0,30,54)
In file included
from /usr/src/linux-headers-3.13.0-30-generic/include/linux/if_ether.h:23:0,
from /tmp/dpdk/lib/librte_eal/linuxapp/kni/ethtool/igb/e1000_osdep.h:39,
from /tmp/dpdk/lib/librte_eal/linuxapp/kni/ethtool/igb/e1000_hw.h:31,
from /tmp/dpdk/lib/librte_eal/linuxapp/kni/ethtool/igb/e1000_api.h:31,
from /tmp/dpdk/lib/librte_eal/linuxapp/kni/ethtool/igb/e1000_mbx.h:31,
from /tmp/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/e1000_mbx.c:28:
/usr/src/linux-headers-3.13.0-30-generic/include/linux/skbuff.h:740:1:
note: previous definition of ‘skb_set_hash’ was here
skb_set_hash(struct sk_buff *skb, __u32 hash, enum pkt_hash_types type)
^
Fixes: e88c3b0a ("kni: fix build on Ubuntu 12.04.5")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Problem:if I firstly insert my kmod_test.ko, then insert eventfd_link.ko,
error will happen with hint "Device or resource busy". This is because
the default minor device number, 0, has been occupied by my kmod_test.ko .
root@distro:~/test$ lsmod
Module Size Used by
kmod_test 927 0
vboxsf 35930 4
vboxguest 222130 1 vboxsf
microcode 10315 0
autofs4 25051 0
root@distro:~/test$ insmod ./eventfd_link.ko
insmod: ERROR: could not insert module ./eventfd_link.ko: Device or
resource busy
Explanation: For miscdevices, the major device_no is same, so the minor
device_no should be set to ditinguish different misc devices; if not set
the minor, it may fail while insmod due to the default minor value, 0, has
been used by other miscdevice. MISC_DYNAMIC_MINOR means to let Linux
kernel dynamically assign one minor devide number while loading.
Signed-off-by: Xiaobo Chi <xiaobo.chi@nokia.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The VHOST_USER_SET_VRING_ENABLE request was sent for each queue-pair.
However, it's changed to be sent per queue in the queue-pair by QEMU
commit dc3db6ad ("vhost-user: start/stop all rings"). The change
is reasonable, as we send all other request per queue, instead of
queue-pair.
Hence we should do proper changes to adapt to the QEMU change here.
Otherwise, a segfault will be triggered when last TX queue was enabled.
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch fixes a bug under lower version linux kernel, mmap()
fails when length is not aligned with hugepage size. mmap()
without flag of MAP_ANONYMOUS, should be called with length
argument aligned with hugepagesz at older longterm version
Linux, like 2.6.32 and 3.2.72, or mmap() will fail with EINVAL.
This bug was fixed in Linux kernel by commit:
dab2d3dc45ae7343216635d981d43637e1cb7d45
To avoid failure, make sure in caller to keep length aligned.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
The patch fixes reset_owner message handling not to clear callfd,
because callfd will be valid while connection is established.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Add a sanity check for number of tx descriptors requested during tx
queue setup. Return -EINVAL if the number requested does not meet
the tx descriptor requirements of the device.
Fixes: 80a1deb4c7 ("ethdev: add API to retrieve queue information")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
It prevents some drivers to load:
undefined symbol: rte_eth_dma_zone_reserve
Fixes: 719dbebceb ("xen: allow determining DOM0 at runtime")
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Not all filesystems supply struct dirent d_type field, in which case
everything in the specified directory would go ignored. One such
filesystem being XFS which RHEL 7 defaults to... stat() the entries
instead.
Fixes: 9f8eb1d9ca ("eal: support driver loading from directory")
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
The added error checking on plugin initialization in
commit 9f8eb1d9ca broke the ability of
loading plugins by their basename from default linker locations.
Only use stat() for directory discovery and leave error handling
to dlopen() to restore former behavior.
Fixes: 9f8eb1d9ca ("eal: support driver loading from directory")
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
No need for those forward declarations (which breaks build when asking for
C++11 or adding pedantic flag).
Signed-off-by: David Marchand <david.marchand@6wind.com>
It does not build with every C++ compilers.
Reverts the _UNDERLYING_TYPE workarounds to prepare
for another fix in the next patch.
Fixes: 621389bbbe ("eal: fix C++ app build")
Signed-off-by: David Marchand <david.marchand@6wind.com>
CLOCK_MONOTONIC_RAW added in glibc 2.12, using this define in older
glibc versions cause compile error:
'error: identifier "CLOCK_MONOTONIC_RAW" is undefined'
This patch replaces "CLOCK_MONOTONIC_RAW" with "CLOCK_MONOTONIC" for
older glibc versions, versions that support "CLOCK_MONOTONIC_RAW"
will keep using this clock type.
Fixes: d08d304508 ("eal/linux: make alarm not affected by system time jump")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
On HSW box with icc 16.0.0 build for x86_64-native-linuxapp-icc fails with:
icc: command line warning #10120: overriding '-march=native' with '-msse4.1'
...
dpdk.org/x86_64-native-linuxapp-icc/include/rte_memcpy.h(96): error: identifier "__m256i" is undefined
The reason is that icc treats "-march=native ... -msse4.1"
in a different way, then gcc and clang.
For icc it means override all flags enabled with
'-march=native' with '-msse4.1'.
Even when '-march=native' is a superset for '-msse4.1'.
To overcome the problem add a check is SSE4.1 compilation flag already enabled.
If yes, then no need to add '-msse4.1'
Similar change for avx2 compilation option.
Fixes: 074f54ad03 ("acl: fix build and runtime for default target")
Reported-by: Declan Doherty <declan.doherty@intel.com>
Reported-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Adds functions for detecting and reporting the live-ness of LCores,
the primary requirement of which is minimal overheads for the
core(s) being checked. Core failures are notified via an application
defined callback.
Signed-off-by: Remy Horton <remy.horton@intel.com>
It fixes the compile issue on kernel version 2.6.32 or old ones.
Error logs:
lib/librte_eal/linuxapp/kni/kni_misc.c:121: error: unknown field id specified in initializer
lib/librte_eal/linuxapp/kni/kni_misc.c:121: error: excess elements in struct initializer
lib/librte_eal/linuxapp/kni/kni_misc.c:121: error: (near initialization for kni_net_ops)
lib/librte_eal/linuxapp/kni/kni_misc.c:122: error: unknown field size specified in initializer
lib/librte_eal/linuxapp/kni/kni_misc.c:122: error: excess elements in struct initializer
lib/librte_eal/linuxapp/kni/kni_misc.c:122: error: (near initialization for kni_net_ops)
Fixes: 72a7a2b246 ("kni: allow per-net instances")
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
/proc/version_signature is the version for the host machine, but in
e.g., chroots, this does not necessarily match that DPDK is built
for. DPDK will then build for the wrong kernel version - that of the
server, and not that installed in the (build) chroot.
The patch uses utsrelease.h from the kernel sources instead and fakes
the upload version.
Tested on a server with Ubuntu 12.04, building in a chroot for Ubuntu
14.04.
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Signed-off-by: Johan Faltstrom <johan.faltstrom@netinsight.net>
Acked-by: Helin Zhang <helin.zhang@intel.com>
This commit introduce rte_smp_mb(), rte_smp_wmb() and rte_smp_rmb(), in
order to enable memory barriers between lcores.
The patch does not provide any functional change for IA, the goal is to
have infrastructure for weakly ordered machines like ARM to work on DPDK.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The implementation uses NEON gcc intrinsic.
Verified with testacl and acl_autotest applications on arm64 architecture.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
cntcvt_el0 ticks are not based on cpu clk unlike rdtsc in x86.
Its a fixed clock running based at constant speed.
Though its a armv8-a implementer choice, typically it runs at 50 or 100 MHz
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Based on the patch by David Hunt and Armuta Zende:
lib: added support for armv7 architecture
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Signed-off-by: Amruta Zende <amruta.zende@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
This implementation is based on IBM POWER version of
rte_cpuflags. We use software emulation of HW capability
registers, because those are usually not directly accessible
from userspace on ARM.
Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: David Marchand <david.marchand@6wind.com>
The GCC can be configured to avoid using NEON extensions.
For that purpose, we provide just the memcpy implementation
of the rte_memcpy.
Based on the patch by David Hunt and Armuta Zende:
lib: added support for armv7 architecture
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Signed-off-by: Amruta Zende <amruta.zende@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
This patch adds spinlock operations for ARM architecture.
We do not support HTM in spinlocks on ARM. Setting of the
RTE_FORCE_INTRINSICS=y is required.
Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: David Marchand <david.marchand@6wind.com>
This patch adds architecture specific atomic operation file
for ARM architecture. The RTE_FORCE_INTRINSICS=y is required.
Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: David Marchand <david.marchand@6wind.com>
This patch adds architecture specific prefetch operations
for ARM architecture. It utilizes the pld instruction that
starts filling the appropriate cache line without blocking.
Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Enable to choose a preferred way to read timer based on the
configuration entry CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU.
It requires a kernel module that is not included to work.
Based on the patch by David Hunt and Armuta Zende:
lib: added support for armv7 architecture
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Signed-off-by: Amruta Zende <amruta.zende@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
ARM architecture doesn't have a suitable source of CPU cycles. This
patch uses clock_gettime instead. The implementation should be improved
in the future.
Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: David Marchand <david.marchand@6wind.com>
This patch adds architecture specific byte order operations
for ARM. The architecture supports both big and little endian.
It requires RTE_FORCE_INTRINSICS=y.
Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Add common functions and structures to handle time, and cycle counts
which will be used for PTP processing.
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Reviewed-by: John McNamara <john.mcnamara@intel.com>
Add additional functions to support the existing IEEE1588
functionality.
* rte_eth_timesync_write_time(): set the device clock time.
* rte_eth_timesync_read_time(): get the device clock time.
* rte_eth_timesync_adjust_time(): adjust the device clock time.
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Reviewed-by: John McNamara <john.mcnamara@intel.com>
Use deprecated attribute to highlight any use of fields that
are marked as going away in the rte_ether device statistics.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This patch refactors the generic queue stats to be exposed
by rte_ethdev_xstats_get().
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Maryam Tahhan <maryam.tahhan@intel.com>
Add support for directories as arguments to -d for loading all drivers
from a given directory. Additionally a default driver directory can be
set in build-time configuration, in which case it will be always be used
when EAL is initialized.
This simplifies usage in shared library configuration significantly over
manually loading individual drivers with -d, and allows distros to
establish a drop-in driver directory for seamless integration
with 3rd party drivers etc.
Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Suggested-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: David Marchand <david.marchand@6wind.com>
There's no good reason to limit plugins to Linux, make it available
on FreeBSD too. Refactor the plugin code from Linux EAL to common
helper functions, also check for and fail on errors during initialization.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Currently, we reset all fields of a device to zero when reset
happens, which is wrong, since for some fields like device_fh,
ifname, and virt_qp_nb, they should be same and be kept after
reset until the device is removed. And this is what's the new
helper function reset_device() for.
And use rte_zmalloc() instead of rte_malloc, so that we could
avoid init_device(), which basically dose zero reset only so far.
Hence, init_device() is dropped in this patch.
This patch also removes a hack of using the offset a specific
field (which is virtqueue now) inside of `virtio_net' structure
to do reset, which could be broken easily if someone changed the
field order without caution.
Cc: Tetsuya Mukawa <mukawa@igel.co.jp>
Cc: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
QEMU sends VHOST_RESET_OWNER first when shutting down.
There was previously no way for the dataplane to know that the
virtio_net instance had become unusable and it would segfault
when trying to do RX/TX.
Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>