Introduce 2 new functions to support dynamic log types:
- rte_log_register(): register a log name, and return a log type id
- rte_log_set_level(): set the log level of a given log type
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
The reorganization of the mbuf structure induces an ABI breakage.
Bump the library version, and update the documentation accordingly.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Move this field in the second cache line, since no driver use it
in Rx path. The freed space will be used by a timestamp in next
commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Change the size of m->port and m->nb_segs to 16 bits. It is now possible
to reference a port identifier larger than 256 and have a mbuf chain
larger than 256 segments.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
To avoid multiple stores on fast path, Ethernet drivers
aggregate the writes to data_off, refcnt, nb_segs and port
to an uint64_t data and write the data in one shot
with uint64_t* at &mbuf->rearm_data address.
Some of the non-IA platforms have store operation overhead
if the store address is not naturally aligned.This patch
fixes the performance issue on those targets.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Set the value of m->refcnt to 1, m->nb_segs to 1 and m->next
to NULL when the mbuf is stored inside the mempool (unused).
This is done in rte_pktmbuf_prefree_seg(), before freeing or
recycling a mbuf.
Before this patch, the value of m->refcnt was expected to be 0
while in pool.
The objectives are:
- to avoid drivers to set m->next to NULL in the early Rx path, since
this field is in the second 64B of the mbuf and its access could
trigger a cache miss
- rationalize the behavior of raw_alloc/raw_free: one is now the
symmetric of the other, and refcnt is never changed in these functions.
To optimize the freeing of the segments, we try try to only update
m->refcnt, m->next, and m->nb_segs when it's required (idea from
Konstantin Ananyev <konstantin.ananyev@intel.com>).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Rename __rte_mbuf_raw_free() as rte_mbuf_raw_free() and make
it public. The old function is kept for compat but is marked as
deprecated.
The next commit changes the behavior of rte_mbuf_raw_free() to
make it more consistent with rte_mbuf_raw_alloc().
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Document the function and make it public, since it is used at several
places in the drivers. The old one is marked as deprecated.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
This commit documents two error return values for the
rte_event_dev_start() function.
-ESTALE indicates not all ports are configured
-ENOLINK indicates that not all queues are linked to ports. If an
application enqueues to such a queue it can lead to deadlock
Suggested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
This commit adds rte_errno return values to rte_event_enqueue_burst() and
rte_event_dequeue_burst().
These return values allows user software to differentiate between an
invalid argument (such as an invalid queue_id or sched_type in an enqueued
event) and backpressure from the event device.
The port and device ID checks are placed in RTE_LIBRTE_EVENTDEV_DEBUG
header guards to avoid the performance hit in non-debug execution.
Signed-off-by: Gage Eads <gage.eads@intel.com>
Add in APIs for extended stats so that eventdev implementations can report
out information on their internal state. The APIs are based on, but not
identical to, the equivalent ethdev functions.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
PMDs that only do a specific type of scheduling cannot provide
CFG_ALL_TYPES, so the Eventdev infrastructure should not demand
that every PMD supports CFG_ALL_TYPES.
By not overriding the default configuration of the queue as
suggested by the PMD, the eventdev_common unit tests can pass
on all PMDs, regardless of their capabilities.
RTE_EVENT_QUEUE_CFG_DEFAULT is no longer used by the eventdev layer
it can be removed now. Applications should use CFG_ALL_TYPES
if they require enqueue of all types a queue, or specify which
type of queue they require.
The CFG_DEFAULT value is changed to CFG_ALL_TYPES in event/skeleton,
to not break the compile.
A capability flag is added that indicates if the underlying PMD
supports creating queues of ALL_TYPES.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
eventdev driver may return error on dequeue timeout tick conversion.
Change the pmd callback interface to address the same.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
This patch initializes the links_map array entries to
EVENT_QUEUE_SERVICE_PRIORITY_INVALID, as expected by
rte_event_port_links_get(). This is necessary for the sw eventdev PMD,
which does not initialize links_map when rte_event_port_setup() calls
rte_event_port_unlink().
Fixes: 4f0804bbdf ("eventdev: implement the northbound APIs")
Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
rte_device is a generic device which is available to the applications
and EAL. This patch replaces rte_pci_device in 'struct rte_eventdev'
and in 'struct rte_event_dev_info' with common rte_device.
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Improve the documentation of the return values of the
rte_event_dequeue_timeout_ticks() function, adding a
-ENOTSUP value for eventdevs that do not support waiting.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Large port enqueue sizes were not supported as the value
it was stored in was a uint8_t. Using uint8_ts to save
space in config apis makes no sense - increasing the 3
instances of uint8_t enqueue / dequeue depths to more
appropriate values (based on the context around them).
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This commit clarifies the usage of nb_links and nb_unlinks when passing
a NULL pointer as the queues argument.
Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Updated the comments on 'nb_events_limit' of 'struct rte_event_dev_config'
and 'new_event_threshold' of 'struct rte_event_port_conf'.
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
On port_setup, the link_map is updated only
for configured number of event queues.
Limit the port_links_get scan only to configured number
of event queues. Also, Limit the port link and unlink queue
validation to configured number of event queues.
Fixes: 4f0804bbdf ("eventdev: implement the northbound APIs")
Reported-by: Nipun Gupta <nipun.gupta@nxp.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Added eventdev vdev uninit support to release the resources
allocated in eventdev vdev init.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
- Removed uninitialized max_devs value
- Corrected dev assignment
Fixes: 4f0804bbdf ("eventdev: implement the northbound APIs")
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Since eventdev uses event structures rather than working directly on
mbufs, there is no actual dependencies on the mbuf library. The
inclusion of an mbuf pointer element inside the event itself does not
require the inclusion of the mbuf header file. Similarly the pci
header is not needed, but following their removal, rte_memory.h is
needed for the definition of the __rte_cache_aligned macro.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Added a pointer to the rte_eventdev type in the event port
link and unlink callbacks. This device shall be used by some
of the event drivers to fetch queue related information.
Also, update the skeleton eventdev driver with corresponding changes.
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This patch adds infrastructure for registering the vdev or
the PCI based event device.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This patch implements northbound eventdev API interface using
southbond driver interface
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
In a polling model, lcores poll ethdev ports and associated
rx queues directly to look for packet. In an event driven model,
by contrast, lcores call the scheduler that selects packets for
them based on programmer-specified criteria. Eventdev library
adds support for event driven programming model, which offer
applications automatic multicore scaling, dynamic load balancing,
pipelining, packet ingress order maintenance and
synchronization services to simplify application packet processing.
By introducing event driven programming model, DPDK can support
both polling and event driven programming models for packet processing,
and applications are free to choose whatever model
(or combination of the two) that best suits their needs.
This patch adds the eventdev specification header file.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
It doesn't make any sense to invoke destroy_device() callback at
while handling SET_MEM_TABLE message.
From the vhost-user spec, it's the GET_VRING_BASE message indicates
the end of a vhost device: the destroy_device() should be invoked
from there (luckily, we already did that).
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
rte_mbuf struct is something more likely will be used only in vhost-user
net driver, while we have made vhost-user generic enough that it can
be used for implementing other drivers (such as vhost-user SCSI), they
have also include <rte_mbuf.h>. Otherwise, the build will be broken.
We could workaround it by using forward declaration, so that other
non-net drivers won't need include <rte_mbuf.h>.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Rename "rte_virtio_net.h" to "rte_vhost.h", to not let it be virtio
net specific.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
We used to use rte_vhost_driver_session_start() to trigger the vhost-user
session. It takes no argument, thus it's a global trigger. And it could
be problematic.
The issue is, currently, rte_vhost_driver_register(path, flags) actually
tries to put it into the session loop (by fdset_add). However, it needs
a set of APIs to set a vhost-user driver properly:
* rte_vhost_driver_register(path, flags);
* rte_vhost_driver_set_features(path, features);
* rte_vhost_driver_callback_register(path, vhost_device_ops);
If a new vhost-user driver is registered after the trigger (think OVS-DPDK
that could add a port dynamically from cmdline), the current code will
effectively starts the session for the new driver just after the first
API rte_vhost_driver_register() is invoked, leaving later calls taking
no effect at all.
To handle the case properly, this patch introduce a new API,
rte_vhost_driver_start(path), to trigger a specific vhost-user driver.
To do that, the rte_vhost_driver_register(path, flags) is simplified
to create the socket only and let rte_vhost_driver_start(path) to
actually put it into the session loop.
Meanwhile, the rte_vhost_driver_session_start is removed: we could hide
the session thread internally (create the thread if it has not been
created). This would also simplify the application.
NOTE: the API order in prog guide is slightly adjusted for showing the
correct invoke order.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Export few APIs for the vhost-user driver to log the guest memory writes,
which is a must for live migration support.
This patch basically moves vhost_log_write() and vhost_log_used_vring()
into vhost.h and then add an wrapper (the public API) to them.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Features could be changed after the feature negotiation. For example,
VHOST_F_LOG_ALL will be set/cleared at the start/end of live migration,
respecitively. Thus, we need a new callback to inform the application
on such change.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Rename "virtio-net" to "vhost" in the API comments and vhost prog guide.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
rename "virtio_net_device_ops" to "vhost_device_ops", to not let it
be virtio-net specific.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
They are virtio-net specific and should be defined inside the virtio-net
driver.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Currently, we check vq->desc, vq->kickfd and vq->callfd to know whether
a virtio device is ready or not. However, we only do it when handling
SET_VRING_KICK message, which could be wrong if a vhost-user frontend
send SET_VRING_KICK first and SET_VRING_CALL later.
To work for all possible vhost-user frontend implementations, we could
move the ready check at the end of vhost-user message handler.
Meanwhile, since we do the check more often than before, the "virtio
not ready" message is dropped, to not flood the screen.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
We used to use rte_vhost_get_queue_num() for telling how many vrings.
However, the return value is the number of "queue pairs", which is
very virtio-net specific. To make it generic, we should return the
number of vrings instead, and let the driver do the proper translation.
Say, virtio-net driver could turn it to the number of queue pairs by
dividing 2.
Meanwhile, mark rte_vhost_get_queue_num as deprecated.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The queue pair is very virtio-net specific, other devices don't have
such concept. To make it generic, we should log the number of vrings
instead of the number of queue pairs.
This patch just does a simple convert, a later patch would export the
number of vrings to applications.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Some vhost-user driver may need this info to setup its own page tables
for GPA (guest physical addr) to HPA (host physical addr) translation.
SPDK (Storage Performance Development Kit) is one example.
Besides, by exporting this memory info, we could also export the
gpa_to_vva() as an inline function, which helps for performance.
Otherwise, it has to be referenced indirectly by a "vid".
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Assume there is an application both support vhost-user net and
vhost-user scsi, the callback should be different. Making notify
ops per vhost driver allow application define different set of
callbacks for different driver.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Introduce few APIs to set/get/enable/disable driver features.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
A vhost-user server socket could have many connections, thus many connfd.
However, we currently just use one single int var to store it. Meaning,
it will get overwritten every time a new connection is created.
While this will not create fatal issue as it sounds (since the correct
connfd is closured to the event loop thread by fdset_add), it may cause
fd leaks if a user invokes rte_vhost_driver_unregister before shutting
down all connections: it just closes the recent connfd.
A simple example that should be able to reproduce this leaks issues is,
del the ovs vhost-user port while the connected VMs are still alive. (Note
that it's suggested to use one socket for one VM, which makes the issue
not that fatal as it sounds again).
Since we already use a struct "vhost_user_connection" to track all info
about one connection, it's obvious that we should put the connfd there.
Then we could build a connection list inside the vhost_user_socket struct,
to represent all connections belong that socket file.
Fixes: 164fd39678 ("vhost: fix unregistering in client mode")
Cc: stable@dpdk.org
Cc: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
A new interrupt type, RTE_INTR_HANDLE_VDEV, is added to support lsc and rxq
interrupt for vdev.
For lsc interrupt, except from original EPOLLIN events, we also listen for
socket peer closed connection event (EPOLLRDHUP and EPOLLHUP).
For rxq interrupt, add a precondition to avoid invoking any vfio and uio
code.
For intr_handle initialization, let each vdev driver to do that.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
The broadcast_rarp field in the virtio_net struct is checked in the
dequeue datapath regardless of whether descriptors are available or not.
As it is checked with cmpset leading to a write, false sharing on the
virtio_net struct can happen between enqueue and dequeue datapaths
regardless of whether a RARP is requested. In OVS, the issue can cause
a uni-directional performance drop of up to 15%.
Fix that by only performing the cmpset if a read of broadcast_rarp
indicates that the cmpset is likely to succeed.
Fixes: a66bcad322 ("vhost: arrange struct fields for better cache sharing")
Cc: stable@dpdk.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch implements the function for the application to
get the MTU value.
rte_vhost_get_mtu() fills the mtu parameter with the MTU value
set in QEMU if VIRTIO_NET_F_MTU has been negotiated and returns 0,
-ENOTSUP otherwise.
The function returns -EAGAIN if Virtio feature negotiation
didn't happened yet.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch adds a new status flag indicating the Virtio device
is ready to operate.
This is required to be able to call rte_vhost_mtu_get() in the
.new_device() callback, as rte_vhost_mtu_get needs that the
negotiation is done, but it is too early to rely on running status
flag, which is set just after .new_device() returns.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch implements the vhost-user MTU protocol feature support.
When VIRTIO_NET_F_MTU is negotiated, QEMU notifies the vhost-user
backend with the configured MTU if dedicated protocol feature is
supported.
The value can be used by the application to ensure consistency with
value set by the user.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch enables the new VIRTIO_NET_F_MTU feature,
which makes possible for the host to advise the guest
with its maximum supported MTU.
MTU value is set via QEMU parameters, either via Libvirt XML, or
directly in virtio-net device command line arguments.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We used to allocate queues based on the index from SET_VRING_CALL
request: if corresponding queue hasn't been allocated, allocate it.
Though it's pratically right (it's the first per-vring request we
will get from QEMU for vhost-user negotiation), but it's not technically
right: it's not documented in the vhost-user spec that it will always
be the first per-vring request. For example, SET_VRING_ADDR could also
be the first per-vring request.
Thus, we should not depend the SET_VRING_CALL on queue allocation.
Instead, we could catch all the per-vring messages at the entrance of
request handler, and allocate one if it hasn't been allocated before.
By that, we could remove a hack.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
0x8000 is the max virito-net queue pairs the virtio 1.0 spec claims to
support. While for vhost-user, it's a different story: the max vring
index could be passed by the vhost-user spec is 0xff, masked by the
VHOST_USER_VRING_IDX_MASK.
That said, the max queue pairs could vhost-user could supported is 0x80.
If user are asking more, I think the vhost-user need be extended.
Fixes: b09b198bfb ("vhost-user: announce queue number in message")
Cc: stable@dpdk.org
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Some macros (say VIRTIO_NET_F_MQ) are needed for enabling multiple queue,
however they are introduced since kernel v3.8, meaning build error happens
if we build DPDK vhost on those platforms.
71dfdbe66a ("vhost: fix build with kernel < 3.8") meant to fix it, but
in a wrong way: it completely disables the MQ features for those kernels.
However, the MQ feature doesn't depend on the kernel at all (except the
macros dependency stated above), that we could still enable the MQ feature
even the host kernel has no such support.
The right fix is to define the macro if it's not defined.
Fixes: 71dfdbe66a ("vhost: fix build with kernel < 3.8")
Cc: stable@dpdk.org
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Inability to connect to socket is a normal situation
in client mode because, in common case, server isn't
started yet. RTE_LOG_WARNING should be suitable for
the case of some unusual errors.
Message about reconnection is not an error at all.
Fixes: e623e0c6d8 ("vhost: add reconnect ability")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
fdset_add increments pfdset->num, but fdset_del doesn't decrement
pfdset->num, so if we call fdset_add then fdset_del in a loop without
calling fdset_shrink, we can easily exceed MAX_FDS with only a few
number of fds used.
So my solution is simply to call fdset_shrink in fdset_add when it
exceeds MAX_FDS.
Because fdset_shrink and fdset_add locks pfdset->fd_mutex we can't
call fdset_shrink inside fdset_add because that would cause a dead
lock, so this patch split fdset_shrink in two, fdset_shrink and
fdset_shrink_nolock.
Fixes: 59317cef24 ("vhost: allow many vhost-user ports")
Cc: stable@dpdk.org
Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
In rte_eth_check_reta_mask(), it is required to align the size of the RETA
table to RTE_RETA_GROUP_SIZE but as the size can be less than the limit,
this should be removed. The change is also applied to a command of testpmd.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
This patch adds MPLS and GRE items to generic rte flow.
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Prior to this patch only UIO/VFIO interrupt handlers types were supported.
This patch adds support for the external interrupt handler type, allowing
external drivers to set their own fds with specific interrupt handlers.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Add support for SLES12SP3, which uses kernel 4.4,
but backported features from newer kernels.
Signed-off-by: Nirmoy Das <ndas@suse.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
This commit adds support to the cfgfile library for parsing a key=value
line that has no value string specified (e.g., "key="). This can be used
to override a configuration attribute that has a default value or default
list of values to set it back to an undefined value to disable
functionality.
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
When parsing a ini file with a "key = value" line that has both "key" and
"value" sized to the maximum allowed length causes a parsing failure. The
internal "buffer" variable should be sized at least as large as the maximum
for both fields. This commit updates the local array to be sized to hold
the max name, max value, " = ", and the nul terminator.
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
The call to memchr() uses the absolute length of the string buffer instead
of the actual length of the string returned by fgets(). This causes the
search to go beyond the '\n' character and find ';' characters in random
garbage on the stack. This then causes the 'len' variable to be updated
and the subsequent search for the '=' character to potentially find one
beyond the first newline character.
Since this bug relies on ';' and '=' characters appearing in random places
in the 'buffer' variable it is intermittently reproducible at best.
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
The current cfgfile comment character is hardcoded to ';'. This commit a
new API to allow the user to specify which comment character to use while
parsing the file.
This is to ease adoption by applications that have an existing
configuration file which may use a different comment character. For
instance, an application may already have a configuration file that uses
the '#' as the comment character.
The approach of using a new API with an extensible parameters structure was
used rather than simply adding a new argument to the existing API to allow
for additional arguments to be introduced in the future.
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
The current implementation of the cfgfile library requires that all
key=value pairs be within [SECTION] definitions. The ini file standard
allows for key=value pairs in an unnamed section.
https://en.wikipedia.org/wiki/INI_file#Global_properties
This commit adds the capability of parsing key=value pairs from such an
unnamed section. The CFG_FLAG_GLOBAL_SECTION flag must be passed to the
rte_cfgfile_load() API to enable this functionality. Any key=value pairs
found before the first section can be accessed in the section named
"GLOBAL".
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
glibc 2.25 is warning about if applications depend on
sys/types.h for makedev macro, it expects to be included
from <sys/sysmacros.h>
Found this error while testing with GCC 6.3.1 on archlinux.
lib/librte_eal/linuxapp/eal/eal_pci_uio.c: In function ‘pci_mknod_uio_dev’:
lib/librte_eal/linuxapp/eal/eal_pci_uio.c:134:13:
error: In the GNU C Library, "makedev" is defined
by <sys/sysmacros.h>. For historical compatibility, it is
currently defined by <sys/types.h> as well, but we plan to
remove this soon. To use "makedev", include <sys/sysmacros.h>
directly. If you did not intend to use a system-defined macro
"makedev", you should undefine it after including <sys/types.h>. [-Werror]
dev = makedev(major, minor);
^~~~~~~~~~~~~~~~~
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
When loading nic_uio from /boot/loader.conf as specified in the Getting
Started Guide doc, the NIC devices were not bound at boot. Unloading the
nic_uio driver and reloading it would cause them to be bound, however.
The root cause appears to be the fact that when the module is loaded at
boot, the call to find the pci device when parsing the b:d:f parameter
fails to return the device. That means that later on when the device
is probed as part of a PCI scan, no action is taken as it's not recorded
as a device to be used.
We fix this by having the b:d:f string parsed again on probe if the
initial check to see if it's an already-known device fails. In my tests,
this causes the NIC devices to be successfully bound at boot time, as
well as leaving things working as before in the case the module is loaded
post-boot.
Fixes: 764bf26873 ("add FreeBSD support")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
When binding with vfio-pci, secondary process cannot be started with
an error message:
cannot find TAILQ entry for PCI device.
It's due to: struct rte_pci_addr is padded with 1 byte for alignment
by compiler. Then below comparison in commit 2f4adfad0a
("vfio: add multiprocess support") will fail if the last byte is not
initialized.
memcmp(&vfio_res->pci_addr, &dev->addr, sizeof(dev->addr)
And commit cdc242f260 ("eal/linux: support running as unprivileged user")
just triggers this bug by using a stack un-initialized variable.
The fix is to use rte_eal_compare_pci_addr() for pci addr comparison.
Fixes: 2f4adfad0a ("vfio: add multiprocess support")
Fixes: cdc242f260 ("eal/linux: support running as unprivileged user")
Cc: stable@dpdk.org
Reported-by: Pawel Rutkowski <pawelx.rutkowski@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Some compilers require definition of vfio_iommu_spapr_tce_ddw_info
before its use in vfio_iommu_spapr_tce_info, so move tce_info
definition below tce_ddw_info.
Fixes: 468f42cc26 ("vfio: fix build on old kernel")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Recently added "dma_zalloc_coherent()" call is causing build error
for Linux kernels < 3.2.
compile error:
lib/librte_eal/linuxapp/igb_uio/igb_uio.c:
In function ‘igbuio_pci_probe’:
lib/librte_eal/linuxapp/igb_uio/igb_uio.c:434:2:
error: implicit declaration of function ‘dma_zalloc_coherent’
[-Werror=implicit-function-declaration]
map_addr = dma_zalloc_coherent(&dev->dev, 1024,
^
dma_zalloc_coherent() introduced with Linux kernel 3.2, with commit
Linux: 842fa69f3e0c ("include/linux/dma-mapping.h: add dma_zalloc_coherent()")
Since it does not exist for older kernels, causing a build error.
Switched to dma_alloc_coherent() API to prevent build error.
Fixes: d287e4d41b ("igb_uio: map dummy DMA forcing IOMMU domain attachment")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Moved from lib/librte_mempool, stack mempool handler is an independent
driver.
Shared builds would now require to link in librte_mempool_stack for
"stack" mempool handler.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Moved from lib/librte_mempool, ring mempool is now an independent
driver.
Shared builds would now need to add librte_mempool_ring for:
* ring_mp_mc
* ring_sp_sc
* ring_sp_mc
* ring_mp_sc
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
In case the stack or ring mempool handler are compiled as shared
library and not linked in with test binary, segfault is reported.
This is because return value of rte_mempool_set_ops_byname is not
being checked in rte_mempool_ops_alloc.
This patch handles error returned from rte_mempool_set_ops_byname
when a mempool is not found.
Fixes: 449c49b93a ("mempool: support handler operations")
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Commit 30e6399892 ("mempool: support non-EAL thread") added the
capability for non-EAL threads to use the mempool library. This commit
removes the note indicating that the mempool library cannot be used safely
by non-EAL threads, and replaces it with a more up-to-date note.
Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This eliminates the overhead of a task switch when an interrupt arrives.
Signed-off-by: David Su <david.w.su@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
For using a DPDK app when iommu is enabled, it requires to
add iommu=pt to the kernel command line. But using igb_uio driver
makes DMAR errors because the device has not an IOMMU domain.
Since kernel 3.15, iommu=pt requires to use the internal kernel
DMA API for attaching the device to the IOMMU 1:1 mapping, aka
si_domain. Previous versions did attach the device to that
domain when intel iommu notifier was called.
This is not a problem if the driver does later some call to the
DMA API because the mapping can be done then. But DPDK apps do
not use that DMA API at all.
Doing this dma map and unmap is harmless even when iommu is not
enabled at all.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Current device hotplug is just supported by UIO managed devices.
This patch adds same functionality with VFIO.
It has been validated through tests using IOMMU and also with
VFIO and no-iommu mode.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The flags member of irq_set should be ORed with VFIO_IRQ_SET_ACTION_MASK
and not VFIO_IRQ_SET_ACTION_UNMASK. The bug was found by code inspection.
Fixes: 5c782b3928 ("vfio: interrupts")
Cc: stable@dpdk.org
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
compile error:
.../build/build/lib/librte_eal/linuxapp/kni/kni_net.c:124:6:
error: implicit declaration of function ‘signal_pending’
[-Werror=implicit-function-declaration]
if (signal_pending(current) || ret_val <= 0) {
^~~~~~~~~~~~~~
Linux 4.11 moves signal function declarations to its own header file:
Linux: 174cd4b1e5fb ("sched/headers: Prepare to move signal wakeup &
sigpending methods from <linux/sched.h> into <linux/sched/signal.h>")
Use new header file "linux/sched/signal.h" to fix the build error.
Cc: stable@dpdk.org
Reported-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Pankaj Gupta <pagupta@redhat.com>
In rte.lib.mk, the list of libraries passed to the link
command (LDLIBS) is generated from the DEPDIRS-xxx variables.
If a library is not compiled because it is disabled in
configuration, it should not appear in DEPDIRS-xxx.
- librte_port depends on librte_kni only if it is enabled.
- librte_table depends on librte_acl only if it is enabled.
Fixes: feb9f680cd ("mk: optimize directory dependencies")
Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Introduce a new API to get the status of a descriptor.
For Rx, it is almost similar to rx_descriptor_done API, except it
differentiates "used" descriptors (which are hold by the driver and not
returned to the hardware).
For Tx, it is a new API.
The descriptor_done() API, and probably the rx_queue_count() API could
be replaced by this new API as soon as it is implemented on all PMDs.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Modify the enqueue and dequeue macros to support copying any type of
object by passing in the exact object type. Rather than using the "ring"
structure member of rte_ring, which is of type "array of void *", instead
have the macros take the start of the ring a a pointer value, thereby
leaving the rte_ring structure as purely a header value. This allows it
to be reused by other future ring types which can add on extra fields if
they want, or even to have the actual ring elements, of whatever type
stored separate from the ring header.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Both producer and consumer use the same logic for updating the tail
index so merge into a single function.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
We can write a single common function for head manipulation for enq
and a common one for deq, allowing us to have a single worker function
for enq and deq, rather than two of each. Update all other inline
functions to use the new functions.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The local variable i is only used for loop control so define it in
the enqueue and dequeue blocks directly, rather than at the function
level.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add an extra parameter to the ring dequeue burst/bulk functions so that
those functions can optionally return the amount of remaining objs in the
ring. This information can be used by applications in a number of ways,
for instance, with single-consumer queues, it provides a max
dequeue size which is guaranteed to work.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add an extra parameter to the ring enqueue burst/bulk functions so that
those functions can optionally return the amount of free space in the
ring. This information can be used by applications in a number of ways,
for instance, with single-producer queues, it provides a max
enqueue size which is guaranteed to work. It can also be used to
implement watermark functionality in apps, replacing the older
functionality with a more flexible version, which enables apps to
implement multiple watermark thresholds, rather than just one.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The bulk fns for rings returns 0 for all elements enqueued and negative
for no space. Change that to make them consistent with the burst functions
in returning the number of elements enqueued/dequeued, i.e. 0 or N.
This change also allows the return value from enq/deq to be used directly
without a branch for error checking.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Remove the watermark support. A future commit will add support for having
enqueue functions return the amount of free space in the ring, which will
allow applications to implement their own watermark checks, while also
being more useful to the app.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
There was a compile time setting to enable a ring to yield when
it entered a loop in mp or mc rings waiting for the tail pointer update.
Build time settings are not recommended for enabling/disabling features,
and since this was off by default, remove it completely. If needed, a
runtime enabled equivalent can be used.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The debug option only provided statistics to the user, most of
which could be tracked by the application itself. Remove this as a
compile time option, and feature, simplifying the code.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The size and mask fields are duplicated in both the producer and
consumer data structures. Move them out of that into the top level
structure so they are not duplicated.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
create a common structure to hold the metadata for the producer and
the consumer, since both need essentially the same information - the
head and tail values, the ring size and mask.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Users compiling DPDK should not need to know or care about the arrangement
of cachelines in the rte_ring structure. Therefore just remove the build
option and set the structures to be always split. On platforms with 64B
cachelines, for improved performance use 128B rather than 64B alignment
since it stops the producer and consumer data being on adjacent cachelines.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This is the main switch over between the legacy API and the new
burst API. We rename all the functions in rte_distributor.c to remove
the _v1705, and we add in _v20 in the rte_distributor_v20.c
We also rename the rte_distributor_next.h as rte_distributor.h, as
this is now the public header.
At the same time, we need the autotests and sample app to compile
properly, hence those changes are in this patch also.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Add an optimised version of the in-flight flow matching algorithm
using SIMD instructions. This should give up to 1.5x over the scalar
versions performance.
Falls back to scalar version if SSE4.2 not available
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This patch includes the code for new burst-capable distributor library.
It also includes the rte_distributor_next.h file which will
be used as the public header once we add in the symbol versioning
for v20 and v1705 APIs, at which stage we will rename it to
rte_distributor.h.
The new distributor code contains a very similar API to the legacy code,
but now sends bursts of up to 8 mbufs to each worker. Flow ID's are
reduced to 15 bits for an optimal flow matching algorithm.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
We'll be adding internal implementation definitions in here
that are common to both burst and legacy APIs.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Move files out of the way so that we can replace with new
versions of the distributor library. Files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Rather than querying the number of CPUs on the system multiple times, and
printing out the number each time, just query the value from sysctl once
and store it for future reuse.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Before this patch, the management of dependencies between directories
had several issues:
- the generation of .depdirs, done at configuration is slow: it can take
more than one minute on some slow targets (usually ~10s on a standard
PC without -j).
- for instance, it is possible to express a dependency like:
- app/foo depends on lib/librte_foo
- and lib/librte_foo depends on app/bar
But this won't work because the directories are traversed with a
depth-first algorithm, so we have to choose between doing 'app' before
or after 'lib'.
- the script depdirs-rule.sh is too complex.
- we cannot use "make -d" for debug, because the output of make is used for
the generation of .depdirs.
This patch moves the DEPDIRS-* variables in the upper Makefile, making
the dependencies much easier to calculate. A DEPDIRS variable is still
used to process library dependencies in LDLIBS.
After this commit, "make config" is almost immediate.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Robin Jarry <robin.jarry@6wind.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Add a new API to force free consumed buffers on Tx ring. API will return
the number of packets freed (0-n) or error code if feature not supported
(-ENOTSUP) or input invalid (-ENODEV).
Signed-off-by: Billy McFall <bmcfall@redhat.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
The rte_eal_init function will now pass failure reason hints to the
application. To help app developers decipher this, add some brief
information about what the codes are indicating.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
For now, exit the init. It's likely that even aborting the initialization
is premature in this case, as it may be possible to proceed even if one
bus or another is not available.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Even if one vdev should fail, there's no need to prevent further
processing. Log the error, and reflect it to the higher levels to
decide.
Seems like it's possible to continue. At least, the error is reflected
properly in the logs. A user could then go and correct or investigate
the situation.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Some devices may be inaccessible for a variety of reasons, or the
PCI-bus may be unavailable causing the whole thing to fail. Still,
better to continue attempts at probes.
Since PCI isn't neccessarily required, it may be possible to simply log
the error and continue on letting the user check the logs and restart
the application when things have failed.
This will usually be an issue because of permissions. However, it could
also be caused by OOM. In either case, errno will contain the
underlying cause.
For linux, it is safe to re-init the system here, so allow the
application to take corrective action and reinit.
For BSD, this is not the case, for other reasons, including hugepage
allocation has already happened, and needs to be properly uninitialized.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Plugins are useful and important. However, it seems crazy to abort
everything just because they don't initialize properly.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
There could be some confusion as to why the call failed - this change
will always reflect the value of the error in rte_error.
When initializing the interrupt thread, there are a number of possible
reasons for failure - some of which are correctable by the application.
Do not panic() needlessly, and give the application a change to reflect
this information to the user.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
After code inspection, there is no way for eal_timer_init() to fail. It
simply returns 0 in all cases. As such, this test could either go-away
or stay here as 'future-proofing'.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When log initialization fails, it's generally because the fopencookie
failed. While this is rare in practice, it could happen, and it is
likely because of memory pressure. So, flag the error, and allow the
user to retry.
Memory init can only fail when access to hugepages (either as primary or
secondary process) fails (and that is usually permissions). Since the
manner of failure is not reversible, we cannot allow retry.
There are some theoretical racy conditions in the system that _could_
cause early tailq init to fail; however, no need to panic the
application. While it can't continue using DPDK, it could make better
alerts to the user.
rte_eal_alarm_init() call uses the linux timerfd framework to create a
poll()-able timer using standard posix file operations. This could fail
for a few reasons given in the man-pages, but many could be
corrected by the user application. No need to panic.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When memzone initialization fails, report the error to the calling
application rather than panic(). Without a good way of detaching /
releasing hugepages, at this point the application will have to restart.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
It's possible that the application could take a corrective action here,
and either prompt the user for different arguments, or at least perform
a better logging. Exiting this early prevents any useful information
gathering from the application layer.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When attempting to scan hugepages, signal to the eal that an error has
occurred, rather than performing a panic.
If we fail to acquire hugepage information, simply signal an error to
the application. This clears the run_once counter, allowing the user or
application to take a corrective action and retry.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This adds a new API to check for the eal cpu versions.
It's now possible to gracefully exit the application, or for
applications which support non-dpdk datapaths working in concert with
DPDK datapaths, there no longer is the possibility of exiting for
unsupported CPUs.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
There may be no way to gracefully recover, but the application
should be notified that a failure happened, rather than completely
aborting. This allows the user to proceed with a "slow-path" type
solution.
After this change, the EAL CPU NUMA node resolution step can no longer
emit an rte_panic. This aligns with the code in rte_eal_init, which
expects failures to return an error code.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The FreeBSD implementation wasn't registering new devices
with the device framework on start up. However, common
code attempts to unregister them on shutdown which causes
a SEGFAULT. This fix makes the FreeBSD code do the same
thing as the Linux code for registration.
Fixes: 13a1317d3b ("pci: create device list and fallback on its members")
Cc: stable@dpdk.org
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
This patch extend next_hop field from 8-bits to 21-bits in LPM library
for IPv6.
Added versioning symbols to functions and updated
library and applications that have a dependency on LPM library.
Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Allow the BAR setup to succeed if a device has at least 1 BAR region
defined. Previously, the device probe would only succeed if at least one
memory BAR existed, but there are devices that have only port I/O BARs.
For example, on Virtual Box a virtio device has only a single I/O BAR
because by default MSI-X is not enabled. While in qemu/kvm the virtio
device has MSI-X enabled and therefore has both an I/O and Memory BAR.
The following are excerpts from "lspci -nnvvvv -s 00:09.0" on both types of
systems.
Virtual Box:
Region 0: I/O ports at d260 [size=32]
Capabilities: [80] #00 [0000]
QEMU/KVM:
Region 0: I/O ports at c060 [size=32]
Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at feb80000 [disabled] [size=256K]
Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
Vector table: BAR=1 offset=00000000
PBA: BAR=1 offset=00000800
Signed-off-by: Matt Peters <matt.peters@windriver.com>
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
When possible, replace the uses of rte_mempool_create() with
the helper provided in librte_mbuf: rte_pktmbuf_pool_create().
This is the preferred way to create a mbuf pool.
This also updates the documentation.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The check of queue_id is done in all drivers implementing
rte_eth_rx_queue_count(). Factorize this check in the generic function.
Note that the nfp driver was doing the check differently, which could
induce crashes if the queue index was too big.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
The API comments are not consistent between each other.
The function rte_eth_rx_queue_count() returns the number of used
descriptors on a receive queue.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
For Linux kernel 4.0 and newer, the ability to obtain
physical page frame numbers for unprivileged users from
/proc/self/pagemap was removed. Instead, when an IOMMU
is present, simply choose our own DMA addresses instead.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Applications and other libraries should not be reading inside the
rte_ring structure directly to get the ring size. Instead add a fn
to allow it to be queried.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This adds a check to ensure that the container_of() macro is not used to
cast away (remove) constness.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This fixes the usage of structure members that are declared const to get
a pointer to the embedding parent structure.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Re-enable CONFIG_RTE_LIBRTE_SCHED, since it is needed to build
correctly.
Fix a few warnings when compiling mpipe_tilegx.c.
Remove an empty rte_cpu_feature_table[] array using a bogus type.
Properly set RTE_OBJCOPY_{TARGET,ARCH} in mk/arch/tile/rte.vars.mk.
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
It's trivial to directly invoke a read of the special-purpose
register that holds the clock cycle counter, so just do that.
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
As announced in the deprecation notice, remove the functions for
single/multi producer/consumer enqueue/dequeue.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
When removing log history functions, the map has not been updated.
Fixes: d7e61ad3ae ("log: remove deprecated history dump")
Reported-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Uninitialized scalar variable.
Using uninitialized value cfg->sections[curr_section]->num_entries
when calling rte_cfgfile_close.
And memory in variables cfg->sections[curr_section],
sect->entries[curr_entry] maybe not equal NULL.
We must decrement counters curr_section, curr_entry when failed to realloc.
Fixes: eaafbad419 ("cfgfile: library to interpret config files")
Signed-off-by: Dmitriy Yakovlev <bombermag@gmail.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>