Introduce a new API to get the status of a descriptor.
For Rx, it is almost similar to rx_descriptor_done API, except it
differentiates "used" descriptors (which are hold by the driver and not
returned to the hardware).
For Tx, it is a new API.
The descriptor_done() API, and probably the rx_queue_count() API could
be replaced by this new API as soon as it is implemented on all PMDs.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Modify the enqueue and dequeue macros to support copying any type of
object by passing in the exact object type. Rather than using the "ring"
structure member of rte_ring, which is of type "array of void *", instead
have the macros take the start of the ring a a pointer value, thereby
leaving the rte_ring structure as purely a header value. This allows it
to be reused by other future ring types which can add on extra fields if
they want, or even to have the actual ring elements, of whatever type
stored separate from the ring header.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Both producer and consumer use the same logic for updating the tail
index so merge into a single function.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
We can write a single common function for head manipulation for enq
and a common one for deq, allowing us to have a single worker function
for enq and deq, rather than two of each. Update all other inline
functions to use the new functions.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The local variable i is only used for loop control so define it in
the enqueue and dequeue blocks directly, rather than at the function
level.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add an extra parameter to the ring dequeue burst/bulk functions so that
those functions can optionally return the amount of remaining objs in the
ring. This information can be used by applications in a number of ways,
for instance, with single-consumer queues, it provides a max
dequeue size which is guaranteed to work.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add an extra parameter to the ring enqueue burst/bulk functions so that
those functions can optionally return the amount of free space in the
ring. This information can be used by applications in a number of ways,
for instance, with single-producer queues, it provides a max
enqueue size which is guaranteed to work. It can also be used to
implement watermark functionality in apps, replacing the older
functionality with a more flexible version, which enables apps to
implement multiple watermark thresholds, rather than just one.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The bulk fns for rings returns 0 for all elements enqueued and negative
for no space. Change that to make them consistent with the burst functions
in returning the number of elements enqueued/dequeued, i.e. 0 or N.
This change also allows the return value from enq/deq to be used directly
without a branch for error checking.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Remove the watermark support. A future commit will add support for having
enqueue functions return the amount of free space in the ring, which will
allow applications to implement their own watermark checks, while also
being more useful to the app.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
There was a compile time setting to enable a ring to yield when
it entered a loop in mp or mc rings waiting for the tail pointer update.
Build time settings are not recommended for enabling/disabling features,
and since this was off by default, remove it completely. If needed, a
runtime enabled equivalent can be used.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The debug option only provided statistics to the user, most of
which could be tracked by the application itself. Remove this as a
compile time option, and feature, simplifying the code.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The size and mask fields are duplicated in both the producer and
consumer data structures. Move them out of that into the top level
structure so they are not duplicated.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
create a common structure to hold the metadata for the producer and
the consumer, since both need essentially the same information - the
head and tail values, the ring size and mask.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Users compiling DPDK should not need to know or care about the arrangement
of cachelines in the rte_ring structure. Therefore just remove the build
option and set the structures to be always split. On platforms with 64B
cachelines, for improved performance use 128B rather than 64B alignment
since it stops the producer and consumer data being on adjacent cachelines.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This is the main switch over between the legacy API and the new
burst API. We rename all the functions in rte_distributor.c to remove
the _v1705, and we add in _v20 in the rte_distributor_v20.c
We also rename the rte_distributor_next.h as rte_distributor.h, as
this is now the public header.
At the same time, we need the autotests and sample app to compile
properly, hence those changes are in this patch also.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Add an optimised version of the in-flight flow matching algorithm
using SIMD instructions. This should give up to 1.5x over the scalar
versions performance.
Falls back to scalar version if SSE4.2 not available
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This patch includes the code for new burst-capable distributor library.
It also includes the rte_distributor_next.h file which will
be used as the public header once we add in the symbol versioning
for v20 and v1705 APIs, at which stage we will rename it to
rte_distributor.h.
The new distributor code contains a very similar API to the legacy code,
but now sends bursts of up to 8 mbufs to each worker. Flow ID's are
reduced to 15 bits for an optimal flow matching algorithm.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
We'll be adding internal implementation definitions in here
that are common to both burst and legacy APIs.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Move files out of the way so that we can replace with new
versions of the distributor library. Files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Rather than querying the number of CPUs on the system multiple times, and
printing out the number each time, just query the value from sysctl once
and store it for future reuse.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Before this patch, the management of dependencies between directories
had several issues:
- the generation of .depdirs, done at configuration is slow: it can take
more than one minute on some slow targets (usually ~10s on a standard
PC without -j).
- for instance, it is possible to express a dependency like:
- app/foo depends on lib/librte_foo
- and lib/librte_foo depends on app/bar
But this won't work because the directories are traversed with a
depth-first algorithm, so we have to choose between doing 'app' before
or after 'lib'.
- the script depdirs-rule.sh is too complex.
- we cannot use "make -d" for debug, because the output of make is used for
the generation of .depdirs.
This patch moves the DEPDIRS-* variables in the upper Makefile, making
the dependencies much easier to calculate. A DEPDIRS variable is still
used to process library dependencies in LDLIBS.
After this commit, "make config" is almost immediate.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Robin Jarry <robin.jarry@6wind.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Add a new API to force free consumed buffers on Tx ring. API will return
the number of packets freed (0-n) or error code if feature not supported
(-ENOTSUP) or input invalid (-ENODEV).
Signed-off-by: Billy McFall <bmcfall@redhat.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
The rte_eal_init function will now pass failure reason hints to the
application. To help app developers decipher this, add some brief
information about what the codes are indicating.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
For now, exit the init. It's likely that even aborting the initialization
is premature in this case, as it may be possible to proceed even if one
bus or another is not available.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Even if one vdev should fail, there's no need to prevent further
processing. Log the error, and reflect it to the higher levels to
decide.
Seems like it's possible to continue. At least, the error is reflected
properly in the logs. A user could then go and correct or investigate
the situation.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Some devices may be inaccessible for a variety of reasons, or the
PCI-bus may be unavailable causing the whole thing to fail. Still,
better to continue attempts at probes.
Since PCI isn't neccessarily required, it may be possible to simply log
the error and continue on letting the user check the logs and restart
the application when things have failed.
This will usually be an issue because of permissions. However, it could
also be caused by OOM. In either case, errno will contain the
underlying cause.
For linux, it is safe to re-init the system here, so allow the
application to take corrective action and reinit.
For BSD, this is not the case, for other reasons, including hugepage
allocation has already happened, and needs to be properly uninitialized.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Plugins are useful and important. However, it seems crazy to abort
everything just because they don't initialize properly.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
There could be some confusion as to why the call failed - this change
will always reflect the value of the error in rte_error.
When initializing the interrupt thread, there are a number of possible
reasons for failure - some of which are correctable by the application.
Do not panic() needlessly, and give the application a change to reflect
this information to the user.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
After code inspection, there is no way for eal_timer_init() to fail. It
simply returns 0 in all cases. As such, this test could either go-away
or stay here as 'future-proofing'.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When log initialization fails, it's generally because the fopencookie
failed. While this is rare in practice, it could happen, and it is
likely because of memory pressure. So, flag the error, and allow the
user to retry.
Memory init can only fail when access to hugepages (either as primary or
secondary process) fails (and that is usually permissions). Since the
manner of failure is not reversible, we cannot allow retry.
There are some theoretical racy conditions in the system that _could_
cause early tailq init to fail; however, no need to panic the
application. While it can't continue using DPDK, it could make better
alerts to the user.
rte_eal_alarm_init() call uses the linux timerfd framework to create a
poll()-able timer using standard posix file operations. This could fail
for a few reasons given in the man-pages, but many could be
corrected by the user application. No need to panic.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When memzone initialization fails, report the error to the calling
application rather than panic(). Without a good way of detaching /
releasing hugepages, at this point the application will have to restart.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
It's possible that the application could take a corrective action here,
and either prompt the user for different arguments, or at least perform
a better logging. Exiting this early prevents any useful information
gathering from the application layer.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When attempting to scan hugepages, signal to the eal that an error has
occurred, rather than performing a panic.
If we fail to acquire hugepage information, simply signal an error to
the application. This clears the run_once counter, allowing the user or
application to take a corrective action and retry.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This adds a new API to check for the eal cpu versions.
It's now possible to gracefully exit the application, or for
applications which support non-dpdk datapaths working in concert with
DPDK datapaths, there no longer is the possibility of exiting for
unsupported CPUs.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
There may be no way to gracefully recover, but the application
should be notified that a failure happened, rather than completely
aborting. This allows the user to proceed with a "slow-path" type
solution.
After this change, the EAL CPU NUMA node resolution step can no longer
emit an rte_panic. This aligns with the code in rte_eal_init, which
expects failures to return an error code.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The FreeBSD implementation wasn't registering new devices
with the device framework on start up. However, common
code attempts to unregister them on shutdown which causes
a SEGFAULT. This fix makes the FreeBSD code do the same
thing as the Linux code for registration.
Fixes: 13a1317d3ba7 ("pci: create device list and fallback on its members")
Cc: stable@dpdk.org
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
This patch extend next_hop field from 8-bits to 21-bits in LPM library
for IPv6.
Added versioning symbols to functions and updated
library and applications that have a dependency on LPM library.
Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Allow the BAR setup to succeed if a device has at least 1 BAR region
defined. Previously, the device probe would only succeed if at least one
memory BAR existed, but there are devices that have only port I/O BARs.
For example, on Virtual Box a virtio device has only a single I/O BAR
because by default MSI-X is not enabled. While in qemu/kvm the virtio
device has MSI-X enabled and therefore has both an I/O and Memory BAR.
The following are excerpts from "lspci -nnvvvv -s 00:09.0" on both types of
systems.
Virtual Box:
Region 0: I/O ports at d260 [size=32]
Capabilities: [80] #00 [0000]
QEMU/KVM:
Region 0: I/O ports at c060 [size=32]
Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at feb80000 [disabled] [size=256K]
Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
Vector table: BAR=1 offset=00000000
PBA: BAR=1 offset=00000800
Signed-off-by: Matt Peters <matt.peters@windriver.com>
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
When possible, replace the uses of rte_mempool_create() with
the helper provided in librte_mbuf: rte_pktmbuf_pool_create().
This is the preferred way to create a mbuf pool.
This also updates the documentation.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The check of queue_id is done in all drivers implementing
rte_eth_rx_queue_count(). Factorize this check in the generic function.
Note that the nfp driver was doing the check differently, which could
induce crashes if the queue index was too big.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
The API comments are not consistent between each other.
The function rte_eth_rx_queue_count() returns the number of used
descriptors on a receive queue.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
For Linux kernel 4.0 and newer, the ability to obtain
physical page frame numbers for unprivileged users from
/proc/self/pagemap was removed. Instead, when an IOMMU
is present, simply choose our own DMA addresses instead.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Applications and other libraries should not be reading inside the
rte_ring structure directly to get the ring size. Instead add a fn
to allow it to be queried.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This adds a check to ensure that the container_of() macro is not used to
cast away (remove) constness.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This fixes the usage of structure members that are declared const to get
a pointer to the embedding parent structure.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Re-enable CONFIG_RTE_LIBRTE_SCHED, since it is needed to build
correctly.
Fix a few warnings when compiling mpipe_tilegx.c.
Remove an empty rte_cpu_feature_table[] array using a bogus type.
Properly set RTE_OBJCOPY_{TARGET,ARCH} in mk/arch/tile/rte.vars.mk.
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>