This new parameter is needed to keep device type like PCI or virtual.
Port detaching processes are different between PCI device and virtual
device.
RTE_ETH_DEV_PCI indicates device type is PCI. RTE_ETH_DEV_VIRTUAL
indicates device is virtual.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
The patch adds function pointer to rte_pci_driver and eth_driver
structure. These function pointers are used when ports are detached.
Also, the patch adds rte_eth_dev_uninit(). So far, it's not called
by anywhere, but it will be called when port hotplug function is
implemented.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
This patch adds rte_eth_dev_release_port(). The function is used for
changing an attached status of the device that has specified name.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
To remove assumption, do like followings.
This patch adds "RTE_PCI_DRV_DETACHABLE" to drv_flags of rte_pci_driver
structure. The flags indicate the driver can detach devices at runtime.
Also, remove assumption that port will not be detached.
To remove the assumption.
- Add 'attached' member to rte_eth_dev structure.
This member is used for indicating the port is attached, or not.
DEV_ATTACHED indicates a port is attached.
DEV_DETACHED indicates a port is detached.
- Add rte_eth_dev_allocate_new_port().
This function is used for allocating new port.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The patch adds following functions.
- rte_eal_vdev_init();
- rte_eal_vdev_uninit();
- rte_eal_parse_devargs_str().
These functions are used for driver initialization and finalization.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
- Add pci_close_all_drivers()
The function tries to find a driver for the specified device, and
then close the driver.
- Add rte_eal_pci_probe_one() and rte_eal_pci_close_one()
The functions are used for probe and close a device.
First the function tries to find a device that has the specified
PCI address. Then, probe or close the device.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
The patch adds functions for unmapping igb_uio resources. The patch is only
for Linux and igb_uio environment. VFIO and BSD are not supported.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
This patch replaces pci_addr_comparison() and memcmp() of pci addresses by
rte_eal_compare_pci_addr().
To compare PCI addresses, rte_eal_compare_pci_addr() doesn't use memcmp().
This is because sizeof(struct rte_pci_addr) returns 6, but actually
this structure is like below.
struct rte_pci_addr {
uint16_t domain; /**< Device domain */
uint8_t bus; /**< Device bus */
uint8_t devid; /**< Device ID */
uint8_t function; /**< Device function. */
};
If the structure is dynamically allocated in a function without bzero,
last 1 byte may have value. As a result, memcmp may not work.
To avoid such a case, rte_eal_compare_pci_addr() compare following values.
dev_addr = (addr->domain << 24) | (addr->bus << 16) |
(addr->devid << 8) | addr->function;
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
With the driver type flag in struct rte_pci_dev, we do not need
to always map uio devices with vfio related function when
vfio enabled.
Signed-off-by: Michael Qiu <michael.qiu@intel.com>
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Currently, dpdk has no ability to know which type of driver(
vfio-pci/igb_uio/uio_pci_generic) the device used. It only can
check whether vfio is enabled or not statically.
It really useful to have the flag, because different type need to
handle differently in runtime. For example, pci memory map,
pot hotplug, and so on.
This patch add a flag field for pci device to solve above issue.
Signed-off-by: Michael Qiu <michael.qiu@intel.com>
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
gcc 5 supports a new logical-not-parentheses warning which
ixgbe_common.c triggers, causing build failure with -Werror.
Since this source must not be modified, silence the warning instead.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This PMD manages all variants of Mellanox ConnectX-3 (EN 40, EN 10, Pro EN
40) as well as their virtual functions in SR-IOV context through IB Verbs
(libibverbs) and the dedicated user-space driver (libmlx4).
It is disabled by default due to dependencies on these libraries and only
supports Linux userland at the moment partly because /sys (sysfs) support is
required.
Also claim responsibility in the MAINTAINERS file.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Main code changes:
1. Differentiate architectural features based on CPU flags
a. Implement separated move functions for SSE/AVX/AVX2 to make full utilization of cache bandwidth
b. Implement separated copy flow specifically optimized for target architecture
2. Rewrite the memcpy function "rte_memcpy"
a. Add store aligning
b. Add load aligning based on architectural features
c. Put block copy loop into inline move functions for better control of instruction order
d. Eliminate unnecessary MOVs
3. Rewrite the inline move functions
a. Add move functions for unaligned load cases
b. Change instruction order in copy loops for better pipeline utilization
c. Use intrinsics instead of assembly code
4. Remove slow glibc call for constant copies
Test report: http://dpdk.org/ml/archives/dev/2015-January/011848.html
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Tested-by: Jingguo Fu <jingguox.fu@intel.com>
Reviewed-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
- API rte_timer_reset() should return -1 when the timer is in the
RUNNING or CONFIG state. Instead, it ignores the return value of
internal function __rte_timer_reset() and always returns 0.
We change rte_timer_reset() to return the value returned by
__rte_timer_reset().
- Enhance timer stress test 2 to report how many timer reset
collisions occur, i.e., how many times rte_timer_reset() fails
due to a timer being in the CONFIG state.
Signed-off-by: Robert Sanford <rsanford2@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
In rte_timer_reset_sync(), insert rte_pause() into loop that waits
for rte_timer_reset() to succeed.
Signed-off-by: Robert Sanford <rsanford2@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
As per_lcore__socket_id and rte_sys_gettid are missing in version map,
it causes compiling error when CONFIG_RTE_BUILD_SHARED_LIB is enabled.
Fixes: ef76436c68 ("eal: get unique thread id")
Fixes: 9e29251b2a ("eal: thread affinity API")
Reported-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Tested-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: John McNamara <john.mcnamara@intel.com>
In pci_uio_map_resource we check that we are in a primary process
before calling pci_uio_set_bus_master. However, there is already
an earlier check which means that we are always in a primary instance
at this point in the code, so the check can be removed.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Rather than scanning the resource file in sysfs a second time, we
can pull the information on physical addresses of BARs from the
pci resource information already present in the dev structure.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Instead of distinguishing the BAR mappings via offset within a single
file, originally /dev/uioX, switch to mapping each individual bar via
the appropriately numbered resourceX file.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
This library provide API to measure time spend in particular parts of
code and to calculate optimal polling time.
To calculate a those statistics application code need to be divided into
parts (called jobs) that do something. It is up to application to decide
what is considered a job.
Series of jobs must be surrounded with the rte_jobstats_context_start()
and rte_jobstats_context_finish() calls. After that, jobs might be
started. Each job must be surrounded with rte_jobstats_start() and
rte_jobstats_finish() calls.
After job finishes its execution, period in which it should be called
again is adjusted. It might be used to minimize time wasted on
unnecessary polls/calls. Adjustment is based on data provided by job
itself (ex: number of packets it processed).
After all jobs in serie are executed fallowing statistics are updated
and might be used by application. Statistics can be reset. Some of
provided statistic data:
- total/min/max execution - time spent in executing jobs.
- total/min/max management - time spent outside execution area. This
value might be used to measure overhead of scheduling jobs. This time
also contains overhead of rte_jobstats library itself.
- number of loops that executed at least one job
- executed jobs
- time when statistics were reset.
Each job provide total/min/max execution time and execution count
statistics.
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Following commit c07691ae10, an implicit change has been done in the
devargs API.
This triggers problem in virtual pmds that did not check for parameters
validity as it was implicitely valid.
Fix this by restoring the empty argument as "" and add a note in the api.
Restore associated tests.
Fixes: c07691ae10 ("devargs: remove limit on parameters length")
Reported-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Tested-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Add a sched_yield() syscall if the thread spins for too long,
waiting other thread to finish its operations on the ring.
That gives pre-empted thread a chance to proceed and finish
with ring enqueue/dequeue operation.
The purpose is to reduce contention on the ring.
By ring_perf_test, it doesn't shows additional perf penalty.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
ring debug stat won't take care non-EAL thread.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
For non-EAL thread, bypass per lcore cache, directly use ring pool.
It allows using rte_mempool in either EAL thread or any user pthread.
As in non-EAL thread, it directly rely on rte_ring and it's none preemptive.
It doesn't suggest to run multi-pthread/cpu which compete the rte_mempool.
It will get bad performance and has critical risk if scheduling policy is RT.
Haven't found significant performance decrease by mempool_perf_test.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Allow to setup timers only for EAL (lcore) threads (__lcore_id < MAX_LCORE_ID).
E.g. – dynamically created thread will be able to reset/stop timer for lcore thread,
but it will be not allowed to setup timer for itself or another non-lcore thread.
rte_timer_manage() for non-lcore thread would simply do nothing and return straightway.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
In non-EAL thread, lcore_id always be LCORE_ID_ANY.
It can't be used as unique id for recursive spinlock.
Then use rte_gettid() to replace it.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
For those non-EAL thread, *_lcore_id* is invalid and probably larger than RTE_MAX_LCORE.
The patch adds the check and allows only EAL thread using EAL per thread log level and log type.
Others shares the global log level.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Set _lcore_id and _socket_id to (-1) by default.
For those non EAL thread, _lcore_id shall always be LCORE_ID_ANY.
The libraries using _lcore_id as index need to take care.
_socket_id always be SOCKET_ID_ANY until the thread changes the affinity
by rte_thread_set_affinity().
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add check for rte_socket_id(), avoid get unexpected return like (-1).
By using rte_malloc_socket(), socket id is assigned by socket_arg.
If socket_arg set to SOCKET_ID_ANY, it expects to use the socket id to which the current cores belongs.
As the thread may affinity on a cpuset, the cores in the cpuset may belongs to different NUMA nodes.
The value of _socket_id probably be SOCKET_ID_ANY(-1), the case is not expected in origin malloc_get_numa_socket().
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
EAL threads use assigned cpuset to set core affinity during startup.
It keeps 1:1 mapping, if no '--lcores' option is used.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
1. add two TLS *_socket_id* and *_cpuset*
2. add one internal API, eal_cpu_socket_id/eal_thread_dump_affinity
3. add two public API, rte_thread_set/get_affinity
4. update EAL version map for EAL public API
The API works for both EAL thread and non EAL thread.
When calling rte_thread_set_affinity, the *_socket_id* and
*_cpuset* of calling thread will be updated if the thread
successfully set the cpu affinity.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The rte_gettid() wraps the linux and freebsd syscall gettid().
It provides a persistent unique thread id for the calling thread.
It will save the unique id in TLS on the first time.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
It defines eal_cpu_socket_id() which exposing the origin private cpu_socket_id().
The function is only used inside EAL. It returns socket_id of the specified cpu_id.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The problem is that strnlen() here may return invalid value with 32bit icc.
(actually it returns it’s second parameter,e.g: sysconf(_SC_ARG_MAX)).
It starts to manifest hwen max_len parameter is > 2M and using icc –m32 –O2 (or above).
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
It supports one new eal long option '--lcores' for EAL thread cpuset assignment.
The format pattern:
--lcores='<lcores[@cpus]>[<,lcores[@cpus]>...]'
lcores, cpus could be a single digit/range or a group.
'(' and ')' are necessary if it's a group.
If not supply '@cpus', the value of cpus uses the same as lcores.
e.g. '1,2@(5-7),(3-5)@(0,2),(0,6),7-8' means starting 9 EAL thread as below
lcore 0 runs on cpuset 0x41 (cpu 0,6)
lcore 1 runs on cpuset 0x2 (cpu 1)
lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2)
lcore 6 runs on cpuset 0x41 (cpu 0,6)
lcore 7 runs on cpuset 0x80 (cpu 7)
lcore 8 runs on cpuset 0x100 (cpu 8)
Test report: http://dpdk.org/ml/archives/dev/2015-February/013383.html
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Qun Wan <qun.wan@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The patch adds 'cpuset' into per-lcore configure 'lcore_config[]',
as the lcore no longer always 1:1 pinning with physical cpu.
The lcore now stands for a EAL thread rather than a logical cpu.
It doesn't change the default behavior of 1:1 mapping, but allows to
affinity the EAL thread to multiple cpus.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Some macros already been defined by freebsd 'sys/param.h'.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Commit 71f0ab1849 broke compilation
on some versions of Debian and Ubuntu where gcc has been modified
to only emit MAJOR.MINOR part of the version from 'gcc -dumpversion'.
Drop the micro-version from gcc version comparisons to work around
this, it wasn't being used for anything anyway.
Fixes: 71f0ab1849 ("mk: rework gcc version detection to permit versions newer than 4.x")
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Help is printed with -h or --help.
Help is also printed for an unknown option.
This was broken since the rework of options.
Fixes: 489a9d6c9f ("merge bsd and linux common options parsing")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Options listing in usage help was a mess.
The main usage line is fixed and shorter.
The options in usage output are logically sorted (cpu/mem/dev/proc),
aligned and lightly reworded.
The options in declarations are alphabetically sorted.
Code in swith statement is not moved.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Drivers should be silent on boot.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Device driver should log via DPDK log, not to printf which is
sends to /dev/null in a daemon application.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David Marchand <david.marchand@6wind.com>
[Thomas: include rte_log.h]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Separately comparing major and minor versions becomes seriously clumsy
when with major version changes, convert the entire version string into
a numeric value (ie 4.6.0 becomes 460 and 5.0.0 becomes 500) and use
that for comparisons, eliminate unnecessary negations while at it.
This makes the comparisons simpler, more obvious and makes gcc 5.0
naturally recognized at least as capable as newest 4.x.
This three-digit scheme would run into trouble if gcc ever went to
two-digit version segments, but that hasn't happened in the last 10+
years so it seems like a safe assumption.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This commit fixes the following error which was reported when
compiling with clang by removing the option.
error: unknown warning option '-Wno-unused-but-set-variable'
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
This commit fixes the following error which was reported when
compiling with clang by moving the function inside an
RTE_LIBRTE_FM10K_DEBUG_RX ifdef block.
error: unused function 'dump_rxd'
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Fix a couple of doxygen comments in mbuf structure:
- seqn had no doxygen syntax.
- usr was not generating proper link to function.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
In C++11 concatenated string literals need to have a space in between.
Found with clang++-3.4, IIRC g++-4.8 also complains about this.
Sample error message:
error: invalid suffix on literal; C++11 requires a space between literal
and identifier [-Wreserved-user-defined-literal]
Signed-off-by: Stefan Puiu <stefan.puiu@gmail.com>
Reviewed-by: John McNamara <john.mcnamara@intel.com>
The current implementation of rte_kni_rx_burst polls the fifo for buffers.
Irrespective of success or failure, it allocates the mbuf and try to put them into the alloc_q
if the buffers are not added to alloc_q, it frees them.
This waste lots of cpu cycles in allocating and freeing the buffers if alloc_q is full.
The logic has been changed to:
1. Initially allocand add buffer(burstsize) to alloc_q
2. Add buffers to alloc_q only when you are pulling out the buffers.
Signed-off-by: Hemant Agrawal <hemant@freescale.com>
Reviewed-by: Jay Rolette <rolette@infiniteio.com>
LPM table overflow may occur if table is full and added rule has
the biggest depth that already have some rules.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
* support calling rte_vhost_driver_register after rte_vhost_driver_session_start
* add mutext to protect fdset from concurrent access
* add busy flag in fdentry. this flag is set before cb and cleared after cb is finished.
mutex lock scenario in vhost:
* event_dispatch(in rte_vhost_driver_session_start) runs in a separate thread, infinitely
processing vhost messages through cb(callback).
* event_dispatch acquires the lock, get the cb and its context, mark the busy flag,
and releases the mutex.
* vserver_new_vq_conn cb calls fdset_add, which acquires the mutex and add new fd into fdset.
* vserver_message_handler cb frees data context, marks remove flag to request to delete
connfd(connection fd) from fdset.
* after cb returns, event_dispatch
1. clears busy flag.
2. if there is remove request, call fdset_del, which acquires mutex, checks busy flag, and
removes connfd from fdset.
* rte_vhost_driver_unregister(not implemented) runs in another thread, acquires the mutex,
calls fdset_del to remove fd(listenerfd) from fdset. Then it could free data context.
The above steps ensures fd data context isn't freed when cb is using.
VM(s) should have been shutdown before rte_vhost_driver_unregister.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
for vhost-cuse, ifname is the name of the tap device
for vhost-user, ifname is the name of the unix domain socket path
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
In rte_vhost_driver_register(), vhost unix domain socket listener fd is created
and added to polled(based on select) fdset.
In rte_vhost_driver_session_start(), fds in the fdset are checked for
processing. If there is new connection from qemu, connection fd accepted is
added to polled fdset. The listener and connection fds in the fdset are
then both checked. When there is message on the connection fd, its
callback vserver_message_handler is called to process vhost-user messages.
To support identifying which virtio is from which guest VM, we could call
rte_vhost_driver_register with different socket path. Virtio devices from
same VM will connect to VM specific socket. The socket path information is
stored in the virtio_net structure.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
remove set_memory_table ops
vhost-cuse or vhost-user will both implement their own set_memory_region handler.
In current vhost-cuse implementation, guest numa memory isn't supported.
Assume that guest memory is backed by only one file.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
This functions accepts a virtual address and pid(qemu), and maps it into
current process(vhost)'s address space.
The memory behind the virtual address should be backed by a file,
and virtual address should be the starting address.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
File descriptor is copied from qemu process into vhost process.
vhost-user doesn't need eventfd kernel module to copy fds between processes.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Rename vhost-net-cdev.h to vhost-net.h.
This file defines common operations provided by virtio-net(.c).
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse.
vhost-cuse driver will be divided into two parts: cuse driver specific message
handling(in cuse directory) and common message handling(in virtio-net.c).
vhost ioctl message is pre-processed in cuse and then sent to virtio-net
if is not terminated.
virtio-net.c provides common message handling for both vhost-cuse and vhost-user.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ.
Observed that virtio-net driver in guest would crash with only CTRL_RX enabled.
In virtnet_send_command:
/* Caller should know better */
BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
(out + in > VIRTNET_SEND_COMMAND_SG_MAX));
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Add optional support for inline processing of packets inside the RX
or TX call. For an RX callback, what happens is that we get a set of
packets from the NIC and then pass them to a callback function, if
configured, to allow additional processing to be done on them, e.g.
filling in more mbuf fields, before passing back to the application.
On TX, the packets are similarly post-processed before being handed
to the NIC for transmission.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The 'callbacks' member of the rte_eth_dev structure has been renamed
to 'link_intr_cbs' to make it clear that it refers to callbacks from
NIC interrupts. This allows us to add other types of callbacks to
the structure without ambiguity.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch enables PF's internal switch by setting ALLOWLOOPBACK
flag when VEB is created. With this patch, traffic from PF can be
switched on the VEB.
Test report: http://www.dpdk.org/ml/archives/dev/2015-February/013237.html
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Tested-by: Min Cao <min.cao@intel.com>
In i40e_vsi_config_tc_queue_mapping, should add a flag to indicate
another valid setting by OR operation, but not set this flag to
valid_sections, otherwise it will overwrite the flags set before.
Test report: http://www.dpdk.org/ml/archives/dev/2015-February/013237.html
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Tested-by: Min Cao <min.cao@intel.com>
On XL710, performance number is far from the expectation on recent
firmware versions, if promiscuous mode is disabled, or promiscuous
mode is enabled and port MAC address is equal to the packet
destination MAC address. The fix for this issue may not be
integrated in the following firmware version. So the workaround in
software driver is needed. For XL710, it needs to modify the initial
values of 3 internal only registers, which are the same as X710.
Note that the values for X710 and XL710 registers could be different,
and the workaround can be removed when it is fixed in firmware in
the future.
Test report: http://www.dpdk.org/ml/archives/dev/2015-February/012749.html
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
While VFIO doesn't allow us to map complete BARs with MSI-X tables,
it does allow us to map around them in PAGE_SIZE granularity. There
might be adapters that provide their registers in the same BAR
but on a different page. For example, Intel's NVME adapter, though
not a network adapter, provides only one MMIO BAR that contains
the MSI-X table.
Signed-off-by: Dan Aloni <dan@kernelim.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch removes all references to RTE_MBUF_REFCNT, setting the refcnt
field in the mbuf struct permanently.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Currently for mbufs with refcnt, we cannot free mbufs with external memory
buffers (ie. vhost zero copy), as they are recognized as indirect
attached mbufs and therefore we free the direct mbuf it points to,
resulting in an error in the case of external memory buffers.
We solve the issue by introducing the IND_ATTACHED_MBUF flag, which indicates
that the mbuf is an indirect attached mbuf pointing to another mbuf.
When we free an mbuf, we only free the direct mbuf if the flag is set.
Freeing an mbuf with external buffer is the same as freeing a non attached mbuf.
The flag is set during attach and clear on detach.
So in the case of vhost zero copy where we have mbufs with external
buffers, by default we just free the mbuf and it is up to the user to deal with
the external buffer.
This patch would allow the removal of the RTE_MBUF_REFCNT config option,
setting refcnt for all mbufs permanently.
The patch also modifies the vhost example as it was using the
RTE_MBUF_INDIRECT macro to detect if it was an mbuf with external buffer.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This patch introduces CONFIG_RTE_KNI_PREEMPT_DEFAULT flag. When set to 'no',
KNI kernel thread(s) do not call schedule_timeout_interruptible(), which
improves overall KNI performance at the expense of CPU cycles (polling).
Default values is 'yes', maintaining the same behaviour as of now.
Signed-off-by: Marc Sune <marc.sune@bisdn.de>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Calculating hash for data of variable length is more efficient
when that data is sliced into 8-byte pieces. The rest part of data
is hashed using CRC32 functions with either 8 and 4 byte operands.
Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Initially, SSE4.2 support is detected via the constructor function.
Added rte_hash_crc_set_alg() function to detect and set CRC32
implementation if necessary. SSE4.2 is allowed by default.
rte_hash_crc_*byte() functions reworked so they choose available
CRC32 implementation in the runtime.
Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
SSE4.2 provides CRC32 intrinsic with 8-byte operand.
Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Give up using built-in intrinsics and use our own assembly
implementation. Remove #include entry as well.
Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Added:
- crc32c_sse42_u32() emits 'crc32l' asm instruction;
- crc32c_sse42_u64() emits 'crc32q' asm instruction;
- crc32c_sse42_u64_mimic(), wrapper in case of run on 32-bit platform.
Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Add lookup tables for CRC32 algorithm, crc32c_1word() and
crc32c_2words() functions returning hash of 32-bit and 64-bit
operand.
Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The filter types supported are listed below for NVGRE packet:
1. Inner MAC and Inner VLAN ID.
2. Inner MAC address, inner VLAN ID and tenant ID.
3. Inner MAC and tenant ID.
4. Inner MAC address.
5. Outer MAC address, tenant ID and inner MAC address.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Add an Ethernet type definition for Transparent Ethernet Bridging.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
RSS offload types were defined separately for 1/10G and 40G NICs,
and have no relationship with flow types. The modifications are to
unify all RSS offload types for all PMDs. Unified RSS offload types
have new and common names which can be used for any PMD or
applications, and decouple from specific hardwares.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
[Thomas: merge with fm10k]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Flow types was defined actually for i40e hardware specifically,
and wasn't able to be used for defining RSS offload types of all
PMDs. It removed the enum flow types, and uses macros instead
with new names. The new macros can be used for defining RSS
offload types later. Also modifications are made in i40e and
testpmd accordingly.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
[Thomas: merge with new flow director API]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
It wrongly calculates the size of the flow type mask array. The fix
is to align the flow type maximum index ID with the number of
element bit width, and then divide the number of element bit width.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
The old ethertype filter API was removed in commit 75db20648,
but was still in (newly integrated) version map for ABI.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Following structures are removed:
- rte_2tuple_filter
- rte_5tuple_filter
Following APIs are removed:
- rte_eth_dev_add_2tuple_filter
- rte_eth_dev_remove_2tuple_filter
- rte_eth_dev_get_2tuple_filter
- rte_eth_dev_add_5tuple_filter
- rte_eth_dev_remove_5tuple_filter
- rte_eth_dev_get_5tuple_filter
It also move macros TCP_*_FLAG to rte_eth_ctrl.h, and removes the macro
TCP_UGR_FLAG which is duplicated with TCP_URG_FLAG.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
[Thomas: remove also from version map]
This patch defines new functions dealing with ntuple filters which is
corresponding to 5tuple in HW.
It removes old functions which deal with 5tuple filters.
Ntuple filter is dealt with through entrance ixgbe_dev_filter_ctrl.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This patch defines new functions dealing with ntuple filters which is
corresponding to 2tuple filter for 82580 and i350 in HW, and to 5tuple
filter for 82576 in HW.
It removes old functions which deal with 2tuple and 5tuple filters in igb driver.
Ntuple filter is dealt with through entrance eth_igb_filter_ctrl.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This patch defines ntuple filter type RTE_ETH_FILTER_NTUPLE and its structure rte_eth_ntuple_filter.
It also corrects the typo TCP_UGR_FLAG to TCP_URG_FLAG
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Structure rte_syn_filter is removed.
Following APIs are removed:
- rte_eth_dev_add_syn_filter
- rte_eth_dev_remove_syn_filter
- rte_eth_dev_get_syn_filter
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
[Thomas: remove also from version map]
This patch defines new functions dealing with syn filter.
It removes old functions which deal with syn filter.
Syn filter is dealt with through entrance ixgbe_dev_filter_ctrl.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
This patch defines new functions dealing with syn filter.
It removes old functions of syn filter in igb driver.
Syn filter is dealt with through entrance eth_igb_filter_ctrl.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Structure rte_flex_filter is removed.
Following APIs are removed:
- rte_eth_dev_add_flex_filter
- rte_eth_dev_remove_flex_filter
- rte_eth_dev_get_flex_filter
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
[Thomas: remove also from version map]
This patch defines new functions dealing with flex filter.
It removes old functions of flex filter in igb driver.
Syn filter is dealt with through entrance eth_igb_filter_ctrl.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
This patch implement RTE_ETH_FILTER_FLUSH operation to delete all
flow director filters in ixgbe driver.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
This patch changes the get info operation to be implemented through
filter_ctrl API and RTE_ETH_FILTER_INFO/RTE_ETH_FILTER_STATS ops.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
This patch implement the mask configuration of flow director filter,
by using the mask defined in rte_fdir_conf instead of callback function
fdir_set_masks.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
This patch removes the flexbytes_offset from rte_fdir_conf, because
the flexible payload setting is done by flex_conf instead of flexbytes_offset.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
This patch implement the flexpayload configuration of flow director filter.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
This patch adds RTE_ETH_FLOW_TYPE_RAW and RTE_ETH_RAW_PAYLOAD to support the
flexible payload is started from the beginning of the packet.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
This patch changes the add/delete/update operations to be implemented through
filter_ctrl API and RTE_ETH_FILTER_ADD/RTE_ETH_FILTER_DELETE/RTE_ETH_FILTER_UPDATE ops.
It also removes the callback functions:
- ixgbe_eth_dev_ops.fdir_add_signature_filter
- ixgbe_eth_dev_ops.fdir_update_signature_filter
- ixgbe_eth_dev_ops.fdir_remove_signature_filter
- ixgbe_eth_dev_ops.fdir_add_perfect_filter
- ixgbe_eth_dev_ops.fdir_update_perfect_filter
- ixgbe_eth_dev_ops.fdir_remove_perfect_filter
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
enable/disable interrupt by manipulating a control bit of command
register on NIC's PCIe configuration space.
Signed-off-by: Danny Zhou <danny.zhou@intel.com>
Tested-by: Qun Wan <qun.wan@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Change the EAL PCI code so that it can work with both the
uio_pci_generic in-tree driver, as well as the igb_uio
DPDK-specific driver.
This involves changes to
1) Modify method of retrieving BAR resource mapping information
2) Mapping using resource files in /sys rather than /dev/uio*
2) Setup bus master bit in NIC's PCIe configuration space for
uio_pci_generic.
Signed-off-by: Danny Zhou <danny.zhou@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
This patch modify mode older name from
BONDING_MODE_ADAPTIVE_TRANSMIT_LOAD_BALANCING to BONDING_MODE_TLB
This patch also changes order of TEST_ASSERT macro in
test_tlb_verify_slave_link_status_change_failover.
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
This patch add some debug information when using link bonding mode 6.
It prints basic information about ARP packets on RX and TX (MAC, ip,
packet number, arp packet type).
If CONFIG_RTE_LIBRTE_BOND_DEBUG_ALB == y.
If CONFIG_RTE_LIBRTE_BOND_DEBUG_ALB_L1 is enabled instead of previous
one, use show command to see IPv4 balancing from clients.
Signed-off-by: Michal Jastrzebski <michalx.k.jastrzebski@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
This mode includes adaptive TLB and receive load balancing (RLB). In RLB
the bonding driver intercepts ARP replies send by local system and
overwrites its source MAC address, so that different peers send data to
the server on different slave interfaces. When local system sends ARP
request, it saves IP information from it. When ARP reply from that peer
is received, its MAC is stored, one of slave MACs assigned and ARP reply
send to that peer.
Signed-off-by: Maciej Gajdzica <maciejx.t.gajdzica@intel.com>
Signed-off-by: Michal Jastrzebski <michalx.k.jastrzebski@intel.com>
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Changed MAC address type from uint8_t[6] to struct ether_addr and IP
address type from uint8_t[4] to uint32_t to make it consistent with
other DPDK code using MAC and IP addresses. It allows us to use
is_same_ether_addr and ether_addr_copy functions on MAC addresses in ARP header. Also
removed union from arp_hdr struct to make calls to arp_data items
shorter. Updated test-pmd to match new arp_hdr version.
Signed-off-by: Maciej Gajdzica <maciejx.t.gajdzica@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
[Thomas: doxygenize comments]
Remove those hotspots which is unnecessary when early returning occurs.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Make virtio not require UIO for some security reasons, this is to match
6WIND's virtio-net-pmd.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
This makes virtio driver work like ixgbe. Transmit buffers are
held until a transmit threshold is reached. The previous behavior
was to hold mbuf's until the ring entry was reused which caused
more memory usage than needed.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Need to have do special things to set default mac address.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Change order of initialization to match Linux kernel.
Don't blow away control queue by doing reset when stopped.
Calling dev_stop then dev_start would not work.
Dev_stop was calling virtio reset and that would clear all queues
and clear all feature negotiation.
Resolved by only doing reset on device removal.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
To keep the consistent logic with normal Rx path, the mergeable
Rx path also needs software vlan strip/decap if it is enabled.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Implement VLAN stripping in software. This allows application
to be device independent.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
It is helpful to allow device drivers that don't support hardware
VLAN stripping to emulate this in software. This allows application
to be device independent.
Avoid discarding shared mbufs. Make a copy in rte_vlan_insert() of any
packet to be tagged that has a reference count > 1.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Since vq_alignment is constant (always 4K), it does not
need to be part of the vring struct.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Virtio has link state interrupt which can be used.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Starting driver with link down should be ok, it is with every
other driver. So just allow it.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Updating the vring descriptor index should be done before notifying host;
Remove 2 duplicated store memory barriers in both Rx and Tx path because there is
store memory barrier in vq_update_avail_idx function;
Notify the host only if packets actually transmitted(nb_tx > 0).
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
The DPDK driver only has to deal with the case of running on PCI
and with SMP. In this case, the code can use the weaker barriers
instead of using hard (fence) barriers. This will help performance.
The rationale is explained in Linux kernel virtio_ring.h.
To make it clearer that this is a virtio thing and not some generic
barrier, prefix the barrier calls with virtio_.
Add missing (and needed) barrier between updating ring data
structure and notifying host.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Better to check at compile time than fail at runtime.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Make vtpci_get_status a local function as it is used in one file.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
For clarity make the setup of PCI resources for Linux into a function rather
than block of code #ifdef'd in middle of dev_init.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Eliminate ambiguity in the condition which trips up a "logical not
is only applied to the left..." warning from gcc 5, causing build
failure with -Werror. Besides non-ambiguous, the condition is
far more obvious this way.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is a prerequisite
of CONFIG_RTE_IXGBE_INC_VECTOR.
Reported-by: Alexander Belyakov <abelyako@gmail.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
When the vector pmd was receiving a mix of packets of various sizes,
some of which were split across multiple mbufs, there was an issue
with reassembly of the jumbo frames. This was due to a skipped increment
when using "continue" in a while loop. Changing the loop to a "for"
loop fixes this problem, by ensuring the increment is always performed.
Reported-by: Prashant Upadhyaya <praupadhyaya@gmail.com>
Reported-by: Martin Weiser <martin.weiser@allegro-packets.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Martin Weiser <martin.weiser@allegro-packets.com>
ixgbe is little endian, but cpu maybe not.
add necessary conversions.
rte_cpu_to_le_32(...) for PCI write
rte_le_to_cpu_32(...) for PCI read.
Signed-off-by: Xuelin Shi <xuelin.shi@freescale.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
e1000 is little endian, but cpu maybe not.
add necessary conversions.
rte_cpu_to_le_32(...) for PCI write
rte_le_to_cpu_32(...) for PCI read.
Signed-off-by: Xuelin Shi <xuelin.shi@freescale.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Variables are unsigned int but format scans for signed int.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Building shared libraries and using virtio PMD results in undefined
reference to 'rte_eal_iopl_init'.
Add missing function to eal version map.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The argument ressize contains the size of the result buffer which
should be large enough to store the parsed result of a token. In
this case, it should be larger or equal to sizeof(cmdline_portlist_t)
(4 bytes), not PORTLIST_TOKEN_SIZE which is the max size of the token
string.
This is not a critical, it fixes cases where the total length of the
parsed instruction is greater than the maximum.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
rte_eth_bond_8023ad_setup and rte_eth_bond_8023ad_conf_get entries need
to be exported to be used by test application for shared libraries
compilation.
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Link up and down implementation.
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
rte_eth_stats_get is to get physical device statistics. Without
return status, caller does not know whether function fails or not
(i.e. invalid port_id, driver has no stats_get callback). Caller
cannot differiente normal 0 stats or error case. This fix adds
a return status to the function.
Signed-off-by: Jia Yu <jyu@vmware.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Since LSR interrupt is disabled by pmd drivers, link status in rte_eth_device is always down.
Bond slave_configure() enables LSR interrupt on devices to get notification if link status
changes. However, the LSC interrupt at device start time is still lost.
In this fix, call link_update to read link status from hardware
register at device start time.
Signed-off-by: Jia Yu <jyu@vmware.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
This library provides reordering capability for out of order mbufs based
on a sequence number in the mbuf structure.
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Signed-off-by: Richardson Bruce <bruce.richardson@intel.com>
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
As far as I know, there is no reason why we should have a limit on the length of
parameters that can be given for a device.
Remove this limit by using dynamic allocations.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Prepare for next commit.
Fix some indent issues, refactor error code.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
1. Add functions to enable PF/VF interrupt.
2. Add function to process error message passed from interrupt.
2. Add 2 interrupt handling functions, one for PF and one for VF.
2. Enable interrupt after completing initialization of NIC.
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
fm10k pmd driver will support both PF and VF device with single
copy of code. The reason is NIC maps registers with same
function in PF and VF to same PCI I/O address. Then, PF/VF drivers
use same address to access registers belonging to it, HW will
translate the request to correct units.
For some functionalities that are unique to PF, driver will check
current driver type and behave correctly.
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
1. Add fm10k_recv_scattered_pkts function to receive jumbo frame
and multi-segment packets.
2. Configure correct receive function in rx_init and dev_init.
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
1. Configure RSS in fm10k_dev_rx_init function.
2. Add fm10k_rss_hash_update and fm10k_rss_hash_conf_get to get
and inquery RSS configuration.
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
1. Add fm10k_recv_pkts and fm10k_xmit_pkts functions.
2. Link app function pointer to actual fm10k recv/xmit
functions.
3. Change Makefile to compile new file fm10k_rxtx.c
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
1. Add function to initialize RX queues.
2. Add function to initialize TX queues.
3. Add fm10k_dev_start, fm10k_dev_stop and fm10k_dev_close
functions.
4. Add function to close mailbox service.
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
1. Add 4 functions fm10k_dev_rx_queue_start,
fm10k_dev_rx_queue_stop, fm10k_dev_tx_queue_start,
and fm10k_dev_tx_queue_stop.
2. verify Rx packet buffer alignment is valid.
Hardware requires specific alignment for Rx packet buffers. At
least one of the following two conditions must be satisfied.
1) Address is 512B aligned
2) Address is 8B aligned and buffer does not cross 4K boundary.
Alignment is checked by the driver when the Rx queue is reset. It
is assumed that if an entire descriptor ring can be filled with
buffers containing valid alignment, then all buffers in that mempool
have valid address alignment. It is the responsibility of the user
to ensure all buffers have valid alignment, as it is the user who
creates the mempool.
It is assumed the buffer needs only to store a maximum size Ethernet
frame.
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
1. Add init function to scan and initialize fm10k PF device.
2. Add implementation to register fm10k pmd PF driver.
3. Add 3 functions fm10k_dev_configure, fm10k_stats_get and
fm10k_stats_get.
4. Add fm10k.h to define macros and basic data structure.
5. Add fm10k_logs.h to control log message output.
6. Change config/common_bsdapp and config/common_linuxapp, add
macros to control fm10k pmd driver compile for linux and bsd.
7. Add Makefile.
8. Change lib/Makefile to add fm10k driver into compile list.
9. Change mk/rte.app.mk to add fm10k lib into link.
10. Add ABI version of librte_pmd_fm10k
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Signed-off-by: Michael Qiu <michael.qiu@intel.com>
Add fm10k device ID list into rte_pci_dev_ids.h.
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Base driver is developed and maintained by Intel ND team, includes
basic functional service to Intel Ethernet Switch FM10000 Series
of silicons.
Any suggestion on bug fix and improvement within this directory is
welcome, but need this team to change and update.
Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
This could be useful to have this values for debug purposes.
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
When offloading the checksums of ipip tunnels, m->l2_len is set to 0
as there is no tunnel or inner l2 header. Since this is a valid value
remove the test.
By the way, also remove the same test with l3_len because at this
point, it is expected that the software provides proper values in the
mbuf. It should avoid a test in dataplane processing and therefore
slightly increase performance.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Advertise the DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM flag in the PMD
features. It means that the i40e PMD supports the offload of outer IP
checksum when transmitting tunneling packet.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
If the flag is advertised by a PMD, the NIC supports the outer IP
checksum TX offload of tunneling packets, therefore an application can
set the PKT_TX_OUTER_IP_CKSUM flag in mbufs when transmitting on this
port.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Since previous commit, the flag PKT_TX_UDP_TUNNEL_PKT is not used by any PMD,
remove it from mbuf API and from csumonly (testpmd). In csumonly, the
PKT_TX_OUTER_IP_CKSUM flag is already set for vxlan checksum, providing
enough information to the underlying driver.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
The definition of the flag PKT_TX_UDP_TUNNEL_PKT in rte_mbuf.h was:
TX packet is an UDP tunneled packet. It must be specified when using
outer checksum offload (PKT_TX_OUTER_IP_CKSUM)
This flag was used to tell the NIC that the offload type is UDP
(I40E_TXD_CTX_UDP_TUNNELING flag). In the datasheet, it says it's
required to specify the tunnel type in the register. However, some tests
(see [1]) showed that it also works without this flag.
Moreover, it is not explained how the hardware use this
information. From a network perspective, this information is useless for
calculating the outer IP checksum as it does not depend on the payload.
Having this flag in the API would force the application to specify the
tunnel type for something that looks only useful for this PMD. It will
limit the number of possible tunnel types (we would need a flag for each
tunnel type) and therefore prevent to support outer IP checksum for
proprietary tunnels.
Finally, if a hardware advertises "I support outer IP checksum", it must
be supported for any payload types.
This has been validated by [2], knowing that the ipip test case was fixed
after this test report [3].
[1] http://dpdk.org/ml/archives/dev/2015-January/011380.html
[2] http://dpdk.org/ml/archives/dev/2015-January/011475.html
[3] http://dpdk.org/ml/archives/dev/2015-January/011610.html
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
From i40e datasheet:
The IP header type and its offload. In case of tunneling, the IIPT
relates to the inner IP header. See also EIPT field for the outer
(External) IP header offload.
00 - non IP packet or packet type is not defined by software
01 - IPv6 packet
10 - IPv4 packet with no IP checksum offload
11 - IPv4 packet with IP checksum offload
Therefore it is not needed to fill the IIPT field if no offload is
requested (we can keep the value to 00). For instance, the linux driver
code does not set it when (skb->ip_summed != CHECKSUM_PARTIAL). We can
do the same in the dpdk driver.
The function i40e_txd_enable_checksum() that fills the offload registers
can only be called for packets requiring an offload.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
The alias PKT_TX_IPV4_CSUM is only used in one place of i40e driver.
Remove it and only keep the legacy flag PKT_TX_IP_CSUM.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
In loopback mode, it's expected force link up even when there's no cable connect.
But in codes, setup_sfp() rewrites the related register.
It causes in the case 'multispeed_fiber', it can't link up without cable connect.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Patrick Lu <patrick.lu@intel.com>
max_vfs will only be created by igb_uio driver, for other
drivers like vfio or pci_uio_generic, max_vfs will miss.
But sriov_numvfs is not driver related, just get the vf numbers
from that field.
Signed-off-by: Michael Qiu <michael.qiu@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
cmdline_token_portlist_ops fell through cracks in the initial symbol
versioning patch, breaking pktgen build.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
To differentiate libraries that break ABI, we add a library version number
suffix to the library, which must be incremented when a given libraries ABI is
broken. This patch enforces that addition, sets the initial abi soname
extension to 1 for each library and creates a symlink to the base SONAME so that
the test applications will link properly.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Add linker version script files to each DPDK library to put a stake in the
ground from which we can start cleaning up API's
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Add initial pass header files to support symbol versioning.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Hash filter control has been implemented for i40e. It includes
getting/setting,
- global hash configurations (hash function type, and symmetric
hash enable per flow type)
- symmetric hash enable per port
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
In order to support hash filter configuration, filter type of hash
is added, also the corresponding structures, macros and definitions
are added.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Calculating the default RSS hash keys at run time is not needed
at all, and may have race conditions. The alternative is to use
array of random values which were generated manually as the
default hash keys.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The Link bonding library is incorrectly using receive packet type flags
in the transmit policy hashing functions, which would cause packets
generated locally to be incorrectly distributed across the slave
devices. This patch completely removes the dependency on the packet
type flags and uses the ether_type from either the Ethernet header or
the VLAN headers for branching.
This patch also includes the associate changes in the test suite and in
the packet_burst_generator code to remove the dependences on the packet
type flags.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
This is a duplication of some EAL parts for a standalone packaging
which is not documented.
Packaging should be done outside of DPDK.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
If at build phase we don't make any trie splitting,
then temporary build structures and resulting RT structure might be
much bigger than current.
>From other side - having just one trie instead of multiple can speedup
search quite significantly.
>From my measurements on rule-sets with ~10K rules:
RT table up to 8 times bigger, classify() up to 80% faster
than current implementation.
To make it possible for the user to decide about performance/space trade-off -
new parameter for build config structure (max_size) is introduced.
Setting it to the value greater than zero, instructs rte_acl_build() to:
- make sure that size of RT table wouldn't exceed given value.
- attempt to minimise number of tries in the table.
Setting it to zero maintains current behaviour.
That introduces a minor change in the public API, but I think the possible
performance gain is too big to ignore it.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Vector code reorganisation/deduplication:
To avoid maintaining two nearly identical implementations of calc_addr()
(one for SSE, another for AVX2), replace it with a new macro that suits
both SSE and AVX2 code-paths.
Also remove no needed any more MM_* macros.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reorganise SSE code-path a bit by moving lo/hi dwords shuffle
out from calc_addr().
That allows to make calc_addr() for SSE and AVX2 practically identical
and opens opportunity for further code deduplication.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Previous improvements made scalar method the fastest one
for tiny bunch of packets (< 4).
That allows us to remove specific vector code-path for small number of packets
(search_sse_2) and always use scalar method for such cases.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Introduce new classify() method that uses AVX2 instructions.
>From my measurements:
On HSW boards when processing >= 16 packets per call,
AVX2 method outperforms it's SSE counterpart by 10-25%,
(depending on the ruleset).
When build with the compilers that don't support AVX2 instructions,
make rte_acl_classify_avx2() do nothing and return an error.
At runtime, if librte_acl was build with the compiler that supports AVX2,
this method is selected as default one on HW that supports AVX2.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
New data type to manipulate 256 bit AVX values.
Rename field in the rte_xmm to keep common naming across SSE/AVX fields.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Move common check for input parameters up into rte_acl_classify_alg().
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Make classify_scalar to behave in the same way as it's vector counterpart:
move match check out of the inner loop, etc.
That makes scalar and vector code look more identical.
Plus it improves scalar code performance.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Right now we allocate indexes for all types of nodes, except MATCH,
at 'gen final RT table' stage.
For MATCH type nodes we are doing it at building temporary tree stage.
This is totally unnecessary and makes code more complex and error prone.
Rework the code and make MATCH indexes being allocated at the same stage
as all others.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Introduced division of whole 256 child transition enties
into 4 sub-groups (64 kids per group).
So 2 groups within the same node with identical children,
can use one set of transition entries.
That allows to compact some DFA nodes and get space savings in the RT table,
without any negative performance impact.
>From what I've seen an average space savings: ~20%.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
There was a bug at build phase that can cause matches beeing overwritten.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Current rule-wildness based heuristics can cause unnecessary splits of
the ruleset.
That might have negative performance effect:
more tries to traverse, bigger RT tables.
After removing it, on some test-cases with big rulesets (~10K)
observed ~50% speedup.
No difference for smaller rulesets.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Make data_indexes long enough to survive idle transitions.
That allows to simplify match processing code.
Also fix incorrect size calculations for data indexes.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Fix compilation with RTE_LIBRTE_ACL_STANDALONE=y
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
For ixgbe, when queue start fails, for example, mbuf allocate
failure, the device will still start successfully, which could be
an issue.
Add return status check of queue start to avoid this issue.
Signed-off-by: Michael Qiu <michael.qiu@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
To follow up the comments from Pawel Wodkowski, remove this unnecessary check,
as check_mq_mode has already check the queue number in device configure stage,
if the queue number of vf is not correct, it will return error code and exit,
so it doesn't need check again here in device start stage (note: pf_host_configure
is called in device start stage).
Fixes: 42d2f78abc ("configure VF RSS")
Suggested-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
The link_status variable is not set when device is initialized.
This can lead to problems with link never being reported as up
if using some SFP modules where the link is instantly on.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The rte_eth_stats_get is the only API that should call the device
statistics function directly, and it already does a memset of the
resulting structure since commit 02331c16ec. Therefore doing
memset() in the driver is redundant and should be removed.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
[David: remove also in igbvf and pcap PMDs]
Acked-By: David Marchand <david.marchand@6wind.com>
Add missing extern 'C' decls in rte_ip_frag.h.
Fixes: 601e279df0 ("move fragmentation/reassembly headers into a library")
Signed-off-by: Marc Sune <marc.sune@bisdn.de>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
rte_power_freq_min function did not include "extern" keyword,
causing linking errors.
Fixes: 445c6528b5 ("power: common interface for guest and host")
Reported-by: Ildar Mustafin <imustafin@bk.ru>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ethernet device's data should contain the virtual device name for pcap port.
This name is correctly set by rte_eth_dev_allocate() at initialization time,
but it is directly lost.
Fixes: 83b4113693 ("ethdev: add unique name to devices")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
When using core list argument to define which core to enable (ie -l) the
core_num field of the rte configuration is not updated the same way as using
coremask. This causes rte_lcore_num() to yield different value from the one
using coremask.
Fixes: d888cb8b96 ("add core list input format")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Structure rte_ethertype_filter is removed.
Following APIs are removed:
- rte_eth_dev_add_ethertype_filter
- rte_eth_dev_remove_ethertype_filter
- rte_eth_dev_get_ethertype_filter
It is replaced by filter_ctrl API.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
This patch removes old functions which deal with ethertype filter in ixgbe driver.
It also defines ixgbe_dev_filter_ctrl which is binding to filter_ctrl API,
and ethertype filter can be dealt with through this new entrance.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
This patch removes old functions which deal with ethertype filter in igb driver.
It also defines eth_igb_filter_ctrl which is binding to filter_ctrl API,
and ethertype filter can be dealt with through this new entrance.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
In commit 2fc8d6d the behaviour of function rte_is_power_of_2 was
changed to not return true for 0. memzone_reserve_aligned_thread_unsafe
and rte_malloc_socket both make the assumption that for align = 0
!rte_is_power_of_2(align) will return false. This patch adds a check
that align parameter is non-zero before doing the power of 2 check.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
[Thomas: use && operator instead of ternary ?: and fix precedence with parens]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
It needs config RSS and IXGBE_MRQC and IXGBE_VFPSRTYPE to enable VF RSS.
The psrtype will determine how many queues the received packets will distribute to,
and the value of psrtype should depends on both facet: max VF rxq number which
has been negotiated with PF, and the number of rxq specified in config on guest.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Reviewed-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Check mq mode for VMDq RSS, handle it correctly instead of returning an error;
Also remove the limitation of per pool queue number has max value of 1, because
the per pool queue number could be 2 or 4 if it is VMDq RSS mode;
The number of rxq specified in config will determine the mq mode for VMDq RSS.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Reviewed-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Get the available Rx and Tx queue number when receiving IXGBE_VF_GET_QUEUES
message from VF.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Reviewed-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Negotiate API version with VF when receiving the IXGBE_VF_API_NEGOTIATE message.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Reviewed-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Put global register configuring out of loop for queue; also fix typo and indent.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Reviewed-by: Vlad Zolotarov <vladz@cloudius-systems.com>
The lack of result checking of fscanf function, breaks compilation
for default "-Werror=unused-result" flag.
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Replace d_thread_t with struct thread in nic_uio.
Ref: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196691
Quote:
"The d_thread_t typedef is a compat shim to support FreeBSD 4.x.
I'm planning to remove this shim from 11 and dpdk is very unlikely
to ever be ported to 4.x.
If it does it will need far more changes than just d_thread_t"
Reported-by: John Baldwin <jhb@freebsd.org>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
This patch contains a fix for link bonding handling of vlan tagged packets in mode 3 and 5.
Currently xmit_slave_hash function misinterprets the PKT_RX_VLAN_PKT flag to mean that
there is a vlan tag within the packet when in actually means that there is a valid entry
in the vlan_tci field in the mbuf.
- Fixed VLAN tag support in hashing functions.
- Adds support for TCP in layer 4 header hashing.
- Splits transmit hashing function into separate functions for each policy to
reduce branching and to make the code clearer.
- Fixed incorrect flag set in test application packet generator.
Test report: http://dpdk.org/ml/archives/dev/2015-January/010792.html
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Tested-by: SunX Jiajia <sunx.jiajia@intel.com>
When vfio module is not loaded when kernel support vfio feature,
the routine still try to open the container to get file
description.
This action is not safe, and of course got error messages:
EAL: Detected 40 lcore(s)
EAL: unsupported IOMMU type!
EAL: VFIO support could not be initialized
EAL: Setting up memory...
This may make user confuse, this patch make it reasonable
and much more smooth to user.
Signed-off-by: Michael Qiu <michael.qiu@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The read/seek/close stub functions are unnecessary on the
log stream. Per glibc fopencookie man page:
cookie_read_function_t *read
If *read is a null pointer, then reads from the custom stream
always return end of file.
cookie_seek_function_t *seek
If *seek is a null pointer, then it is not possible to perform
seek operations on the stream.
cookie_close_function_t *close
If *close is NULL, then no special action is performed when the
stream is closed.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
When scanning the hugetlbfs maps search only for the DPDK maps.
This will allow the application create its own hugetlbfs mappings
and use the DPDK facilities on the same hugetlbfs mount point.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
rte_is_power_of_2 returns true for 0 and 0 is not power_of_2.
Fix by checking for n.
Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
This patch fixes checking the link state of a virtual function. If the
state has already been checked, it does not need to be checked
again. Previously, get_link_status in the ixgbe_hw struct was used to
track if the information had already been retrieved, but this field
was always set to false (signifying that the information was
up-to-date). The problem was introduced by commit 8ef32003 which was
part of a patch set to update the ixgbe portion of the PMD. This patch
does not break consistency with the ixgbevf driver. Instead, it fixes
the problem at the level of DPDK.
Applications that rely on the reported link speed could fail without
this patch. The qos_sched example application provided with DPDK did
not run when virtual functions were used. The output for this example
application is shown below:
EAL: Error - exiting with code: 1
Cause: Unable to config sched subport 0, err=-2
The problem and the effect of the patch can been seen by running the
l2fwd example application using the following command:
sudo ./build/l2fwd -c 0x3 -n 4 -- -p 0x3 -T 0
Before the patch has been applied (with both links up):
...
Checking link statusdone
Port 0 Link Up - speed 100 Mbps - half-duplex
Port 1 Link Up - speed 100 Mbps - half-duplex
L2FWD: entering main loop on lcore 1
...
After the patch has been applied (with both links up):
...
Checking link statusdone
Port 0 Link Up - speed 10000 Mbps - full-duplex
Port 1 Link Up - speed 10000 Mbps - full-duplex
L2FWD: entering main loop on lcore 1
...
Before the patch has been applied (with link 0 down, link 1 up):
...
Checking link statusdone
Port 0 Link Up - speed 100 Mbps - half-duplex
Port 1 Link Up - speed 100 Mbps - half-duplex
L2FWD: entering main loop on lcore 1
...
After the patch has been applied (with link 0 down, link 1 up):
...
Checking link status............................................................
..............................done
Port 0 Link Down
Port 1 Link Up - speed 10000 Mbps - full-duplex
...
Signed-off-by: Balazs Nemeth <balazs.nemeth@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
EAL: probe driver: 8086:10fb rte_ixgbe_pmd
EAL: PCI memory mapped at 0x7f18c2a00000
EAL: PCI memory mapped at 0x7f18c2a80000
Segmentation fault (core dumped)
This is introduced by commit: 46bc9d75
ixgbe: fix multi-process support
When start primary process with command line:
./app/test/test -n 1 -c ffff -m 64
then start the second one:
./app/test/test -n 1 --proc-type=secondary --file-prefix=rte
This segment-fault will occur.
Root cause is test app on primary process only starts device, but
the queue need initialized by manually command line.
So the tx queue is still NULL when secondary process startup.
Reported-by: Yong Liu <yong.liu@intel.com>
Signed-off-by: Michael Qiu <michael.qiu@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This patch removes the interrupt registration code which was under the flag
VFIO_PRESENT and relies on the rte_lib code for the same.
This also ignores the initial trigger of ISR from the lib.
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
In rte_pmd_init_internals, we are mapping memory but not released
if error occurs it could produce memory leak.
Add unmmap function to release memory.
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Acked-by: Michal Jastrzebski <michalx.k.jastrzebski@intel.com>
Acked-by: John W. Linville <linville@tuxdriver.com>
In rte_eth_af_packet.c we are we are missing NULL pointer
checks after calls to allocate memory for queues.
Add checking NULL pointer and error handling.
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Back in commit aaa662e75c ("cmdline: fix overflow on bsd"),
the author failed to fixup a call to cmdline_parse_etheraddr in xenvirt.
This patch makes the needed correction to avoid a build break.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This patch fixes the issue whereby when using userspace vhost ports
in the context of vSwitching, the name provided to the hypervisor/QEMU
of the vhost tap device needs to be exposed in the library, in order
for the vSwitch to be able to direct packets to the correct device.
This patch introduces an 'ifname' member to the virtio-net structure
which is populated with the tap device name when QEMU is brought up
with a vhost device.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Anthony Fee <anthonyx.fee@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
From CentOS 6.6, function skb_set_hash is introduced, this breaks
the previous assumption. So modify RHEL_RELEASE_VERSION from 7.0
to 6.6 to fix build for rte_kni.ko.
Related mail from Barak Enat:
http://dpdk.org/ml/archives/dev/2014-December/010124.html
building error likes:
CC [M] lib/librte_eal/linuxapp/kni/e1000_82575.o
In file included from lib/librte_eal/linuxapp/kni/ethtool/igb/e1000_osdep.h:41,
from lib/librte_eal/linuxapp/kni/ethtool/igb/e1000_hw.h:31,
from lib/librte_eal/linuxapp/kni/ethtool/igb/e1000_api.h:31,
from lib/librte_eal/linuxapp/kni/e1000_82575.c:38:
lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h:3870: error: conflicting types for ‘skb_set_hash’
include/linux/skbuff.h:620: note: previous definition of ‘skb_set_hash’ was here
Reported-by: Barak Enat <barak@saguna.net>
Signed-off-by: Jincheng Miao <jincheng.miao@gmail.com>
Compile warning which is treated as error occurs on Oracle Linux
(kernel 2.6.39, gcc 4.4.7) as below, or RHEL, CentOS. Aliasing
'struct i40e_aqc_debug_reg_read_write' should be avoided. Use the
elements inside that structure directly can fix the issue.
lib/librte_pmd_i40e/i40e_ethdev.c: In function 'eth_i40e_dev_init':
lib/librte_pmd_i40e/i40e_ethdev.c:5318: error: dereferencing pointer
'cmd' does break strict-aliasing rules
lib/librte_pmd_i40e/i40e_ethdev.c:5314: note: initialized from here
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Seen on RHEL-6.5:
lib/librte_eal/linuxapp/kni/kni_vhost.c:222:
error: ‘struct socket’ has no member named ‘wq’
lib/librte_eal/linuxapp/kni/kni_vhost.c:313:
error: implicit declaration of function ‘sk_sleep’
lib/librte_eal/linuxapp/kni/kni_vhost.c:313:
error: passing argument 1 of ‘__wake_up’ makes pointer from integer without a cast
include/linux/wait.h:146: note: expected ‘struct wait_queue_head_t *’
but argument is of type ‘int’
lib/librte_eal/linuxapp/kni/kni_vhost.c:580:
error: assignment makes pointer from integer without a cast
RHEL6.5 kernel is based on 2.6.32. But there are two changing
from 2.6.35:
1. socket struct is changed
It wrappered previous wait_queue_head_t of socket to
struct socket_wq. So for the kernel older than 2.6.35, we should
directly use socket->wait instead.
2. new function sk_sleep()
This function is implemented from 2.6.35 to obtain wait queue
from struct sock. This patch adds a macro in kni/compat.h
to be compatible with older kernels.
Patch is tested in RHEL6.5 and RHEL7.0 with:
CONFIG_RTE_LIBRTE_KNI=y
CONFIG_RTE_KNI_KO_DEBUG=y
CONFIG_RTE_KNI_VHOST=y
CONFIG_RTE_KNI_VHOST_MAX_CACHE_SIZE=1024
CONFIG_RTE_KNI_VHOST_VNET_HDR_EN=y
CONFIG_RTE_KNI_VHOST_DEBUG_RX=y
CONFIG_RTE_KNI_VHOST_DEBUG_TX=y
Signed-off-by: Jincheng Miao <jmiao@redhat.com>
In commit 59d0ecdbf0 ("MTU accessors"),
max_frame_size was replaced with mtu.
Default size is ETHER_MTU = 1500.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The cleanup code on error checks for *internals being NULL only after
using the pointer to perform other cleanup. Fix this by moving the
clean-up based on the pointer inside the check for NULL.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Since commit fbde27f19a "get default Rx/Tx configuration from dev info",
a default RX/TX configuration can be used for all PMDs.
In case of vmxnet3, the whole structure was zeroed and not filled out.
The PMD does not support multi segments or offload functions,
so txq_flags should have those flags set.
Test report: http://dpdk.org/ml/archives/dev/2014-December/009933.html
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Tested-by: Xiaonan Zhang <xiaonanx.zhang@intel.com>
On X710, performance number is far from the expectation on recent
firmware versions. The fix for this issue may not be integrated in
the following firmware version. So the workaround in software driver
is needed. It needs to modify the initial values of 3 internal only
registers. Note that the workaround can be removed when it is fixed
in firmware in the future.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>
Add missing setup for X540 MAC type when setting up VF.
Additional check exists in Linux driver but not in DPDK.
Signed-off-by: Bill Hong <bhong@brocade.com>
Signed-off-by: Stephen Hemminger <shemming@brocade.com>