Add a sched_yield() syscall if the thread spins for too long,
waiting other thread to finish its operations on the ring.
That gives pre-empted thread a chance to proceed and finish
with ring enqueue/dequeue operation.
The purpose is to reduce contention on the ring.
By ring_perf_test, it doesn't shows additional perf penalty.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
ring debug stat won't take care non-EAL thread.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
For non-EAL thread, bypass per lcore cache, directly use ring pool.
It allows using rte_mempool in either EAL thread or any user pthread.
As in non-EAL thread, it directly rely on rte_ring and it's none preemptive.
It doesn't suggest to run multi-pthread/cpu which compete the rte_mempool.
It will get bad performance and has critical risk if scheduling policy is RT.
Haven't found significant performance decrease by mempool_perf_test.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Allow to setup timers only for EAL (lcore) threads (__lcore_id < MAX_LCORE_ID).
E.g. – dynamically created thread will be able to reset/stop timer for lcore thread,
but it will be not allowed to setup timer for itself or another non-lcore thread.
rte_timer_manage() for non-lcore thread would simply do nothing and return straightway.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
In non-EAL thread, lcore_id always be LCORE_ID_ANY.
It can't be used as unique id for recursive spinlock.
Then use rte_gettid() to replace it.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
For those non-EAL thread, *_lcore_id* is invalid and probably larger than RTE_MAX_LCORE.
The patch adds the check and allows only EAL thread using EAL per thread log level and log type.
Others shares the global log level.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Set _lcore_id and _socket_id to (-1) by default.
For those non EAL thread, _lcore_id shall always be LCORE_ID_ANY.
The libraries using _lcore_id as index need to take care.
_socket_id always be SOCKET_ID_ANY until the thread changes the affinity
by rte_thread_set_affinity().
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add check for rte_socket_id(), avoid get unexpected return like (-1).
By using rte_malloc_socket(), socket id is assigned by socket_arg.
If socket_arg set to SOCKET_ID_ANY, it expects to use the socket id to which the current cores belongs.
As the thread may affinity on a cpuset, the cores in the cpuset may belongs to different NUMA nodes.
The value of _socket_id probably be SOCKET_ID_ANY(-1), the case is not expected in origin malloc_get_numa_socket().
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
EAL threads use assigned cpuset to set core affinity during startup.
It keeps 1:1 mapping, if no '--lcores' option is used.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
1. add two TLS *_socket_id* and *_cpuset*
2. add one internal API, eal_cpu_socket_id/eal_thread_dump_affinity
3. add two public API, rte_thread_set/get_affinity
4. update EAL version map for EAL public API
The API works for both EAL thread and non EAL thread.
When calling rte_thread_set_affinity, the *_socket_id* and
*_cpuset* of calling thread will be updated if the thread
successfully set the cpu affinity.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The rte_gettid() wraps the linux and freebsd syscall gettid().
It provides a persistent unique thread id for the calling thread.
It will save the unique id in TLS on the first time.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
It defines eal_cpu_socket_id() which exposing the origin private cpu_socket_id().
The function is only used inside EAL. It returns socket_id of the specified cpu_id.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The problem is that strnlen() here may return invalid value with 32bit icc.
(actually it returns it’s second parameter,e.g: sysconf(_SC_ARG_MAX)).
It starts to manifest hwen max_len parameter is > 2M and using icc –m32 –O2 (or above).
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
It supports one new eal long option '--lcores' for EAL thread cpuset assignment.
The format pattern:
--lcores='<lcores[@cpus]>[<,lcores[@cpus]>...]'
lcores, cpus could be a single digit/range or a group.
'(' and ')' are necessary if it's a group.
If not supply '@cpus', the value of cpus uses the same as lcores.
e.g. '1,2@(5-7),(3-5)@(0,2),(0,6),7-8' means starting 9 EAL thread as below
lcore 0 runs on cpuset 0x41 (cpu 0,6)
lcore 1 runs on cpuset 0x2 (cpu 1)
lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2)
lcore 6 runs on cpuset 0x41 (cpu 0,6)
lcore 7 runs on cpuset 0x80 (cpu 7)
lcore 8 runs on cpuset 0x100 (cpu 8)
Test report: http://dpdk.org/ml/archives/dev/2015-February/013383.html
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Qun Wan <qun.wan@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The patch adds 'cpuset' into per-lcore configure 'lcore_config[]',
as the lcore no longer always 1:1 pinning with physical cpu.
The lcore now stands for a EAL thread rather than a logical cpu.
It doesn't change the default behavior of 1:1 mapping, but allows to
affinity the EAL thread to multiple cpus.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Some macros already been defined by freebsd 'sys/param.h'.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Commit 71f0ab1849 broke compilation
on some versions of Debian and Ubuntu where gcc has been modified
to only emit MAJOR.MINOR part of the version from 'gcc -dumpversion'.
Drop the micro-version from gcc version comparisons to work around
this, it wasn't being used for anything anyway.
Fixes: 71f0ab1849 ("mk: rework gcc version detection to permit versions newer than 4.x")
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Help is printed with -h or --help.
Help is also printed for an unknown option.
This was broken since the rework of options.
Fixes: 489a9d6c9f ("merge bsd and linux common options parsing")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Options listing in usage help was a mess.
The main usage line is fixed and shorter.
The options in usage output are logically sorted (cpu/mem/dev/proc),
aligned and lightly reworded.
The options in declarations are alphabetically sorted.
Code in swith statement is not moved.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Drivers should be silent on boot.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Device driver should log via DPDK log, not to printf which is
sends to /dev/null in a daemon application.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David Marchand <david.marchand@6wind.com>
[Thomas: include rte_log.h]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Separately comparing major and minor versions becomes seriously clumsy
when with major version changes, convert the entire version string into
a numeric value (ie 4.6.0 becomes 460 and 5.0.0 becomes 500) and use
that for comparisons, eliminate unnecessary negations while at it.
This makes the comparisons simpler, more obvious and makes gcc 5.0
naturally recognized at least as capable as newest 4.x.
This three-digit scheme would run into trouble if gcc ever went to
two-digit version segments, but that hasn't happened in the last 10+
years so it seems like a safe assumption.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This commit fixes the following error which was reported when
compiling with clang by removing the option.
error: unknown warning option '-Wno-unused-but-set-variable'
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
This commit fixes the following error which was reported when
compiling with clang by moving the function inside an
RTE_LIBRTE_FM10K_DEBUG_RX ifdef block.
error: unused function 'dump_rxd'
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
There was no error checking after calling rte_reorder_create.
Move the creation of the reorder buffer before launching threads
in case of memory error.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Fix a couple of doxygen comments in mbuf structure:
- seqn had no doxygen syntax.
- usr was not generating proper link to function.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
In C++11 concatenated string literals need to have a space in between.
Found with clang++-3.4, IIRC g++-4.8 also complains about this.
Sample error message:
error: invalid suffix on literal; C++11 requires a space between literal
and identifier [-Wreserved-user-defined-literal]
Signed-off-by: Stefan Puiu <stefan.puiu@gmail.com>
Reviewed-by: John McNamara <john.mcnamara@intel.com>
The current implementation of rte_kni_rx_burst polls the fifo for buffers.
Irrespective of success or failure, it allocates the mbuf and try to put them into the alloc_q
if the buffers are not added to alloc_q, it frees them.
This waste lots of cpu cycles in allocating and freeing the buffers if alloc_q is full.
The logic has been changed to:
1. Initially allocand add buffer(burstsize) to alloc_q
2. Add buffers to alloc_q only when you are pulling out the buffers.
Signed-off-by: Hemant Agrawal <hemant@freescale.com>
Reviewed-by: Jay Rolette <rolette@infiniteio.com>
LPM table overflow may occur if table is full and added rule has
the biggest depth that already have some rules.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
* support calling rte_vhost_driver_register after rte_vhost_driver_session_start
* add mutext to protect fdset from concurrent access
* add busy flag in fdentry. this flag is set before cb and cleared after cb is finished.
mutex lock scenario in vhost:
* event_dispatch(in rte_vhost_driver_session_start) runs in a separate thread, infinitely
processing vhost messages through cb(callback).
* event_dispatch acquires the lock, get the cb and its context, mark the busy flag,
and releases the mutex.
* vserver_new_vq_conn cb calls fdset_add, which acquires the mutex and add new fd into fdset.
* vserver_message_handler cb frees data context, marks remove flag to request to delete
connfd(connection fd) from fdset.
* after cb returns, event_dispatch
1. clears busy flag.
2. if there is remove request, call fdset_del, which acquires mutex, checks busy flag, and
removes connfd from fdset.
* rte_vhost_driver_unregister(not implemented) runs in another thread, acquires the mutex,
calls fdset_del to remove fd(listenerfd) from fdset. Then it could free data context.
The above steps ensures fd data context isn't freed when cb is using.
VM(s) should have been shutdown before rte_vhost_driver_unregister.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
for vhost-cuse, ifname is the name of the tap device
for vhost-user, ifname is the name of the unix domain socket path
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
In rte_vhost_driver_register(), vhost unix domain socket listener fd is created
and added to polled(based on select) fdset.
In rte_vhost_driver_session_start(), fds in the fdset are checked for
processing. If there is new connection from qemu, connection fd accepted is
added to polled fdset. The listener and connection fds in the fdset are
then both checked. When there is message on the connection fd, its
callback vserver_message_handler is called to process vhost-user messages.
To support identifying which virtio is from which guest VM, we could call
rte_vhost_driver_register with different socket path. Virtio devices from
same VM will connect to VM specific socket. The socket path information is
stored in the virtio_net structure.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
remove set_memory_table ops
vhost-cuse or vhost-user will both implement their own set_memory_region handler.
In current vhost-cuse implementation, guest numa memory isn't supported.
Assume that guest memory is backed by only one file.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
This functions accepts a virtual address and pid(qemu), and maps it into
current process(vhost)'s address space.
The memory behind the virtual address should be backed by a file,
and virtual address should be the starting address.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
File descriptor is copied from qemu process into vhost process.
vhost-user doesn't need eventfd kernel module to copy fds between processes.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Rename vhost-net-cdev.h to vhost-net.h.
This file defines common operations provided by virtio-net(.c).
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse.
vhost-cuse driver will be divided into two parts: cuse driver specific message
handling(in cuse directory) and common message handling(in virtio-net.c).
vhost ioctl message is pre-processed in cuse and then sent to virtio-net
if is not terminated.
virtio-net.c provides common message handling for both vhost-cuse and vhost-user.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ.
Observed that virtio-net driver in guest would crash with only CTRL_RX enabled.
In virtnet_send_command:
/* Caller should know better */
BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
(out + in > VIRTNET_SEND_COMMAND_SG_MAX));
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>