This script looks for types, macros and functions in header files using
compilation options found in the environment (CC, CFLAGS, CPPFLAGS) to
define feature macros in a generated header.
Useful in combination with external headers that do not provide such macros.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Main code changes:
1. Differentiate architectural features based on CPU flags
a. Implement separated move functions for SSE/AVX/AVX2 to make full utilization of cache bandwidth
b. Implement separated copy flow specifically optimized for target architecture
2. Rewrite the memcpy function "rte_memcpy"
a. Add store aligning
b. Add load aligning based on architectural features
c. Put block copy loop into inline move functions for better control of instruction order
d. Eliminate unnecessary MOVs
3. Rewrite the inline move functions
a. Add move functions for unaligned load cases
b. Change instruction order in copy loops for better pipeline utilization
c. Use intrinsics instead of assembly code
4. Remove slow glibc call for constant copies
Test report: http://dpdk.org/ml/archives/dev/2015-January/011848.html
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Tested-by: Jingguo Fu <jingguox.fu@intel.com>
Reviewed-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Main code changes:
1. Added more typical data points for a thorough performance test
2. Added unaligned test cases since it's common in DPDK usage
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Removed unnecessary test cases for base move functions
since the function "func_test" covers them all.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
VTA is for debugging only, it increases compile time and binary size,
especially when there're a lot of inlines.
So disable it since memcpy test contains a lot of inline calls.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
- API rte_timer_reset() should return -1 when the timer is in the
RUNNING or CONFIG state. Instead, it ignores the return value of
internal function __rte_timer_reset() and always returns 0.
We change rte_timer_reset() to return the value returned by
__rte_timer_reset().
- Enhance timer stress test 2 to report how many timer reset
collisions occur, i.e., how many times rte_timer_reset() fails
due to a timer being in the CONFIG state.
Signed-off-by: Robert Sanford <rsanford2@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
In rte_timer_reset_sync(), insert rte_pause() into loop that waits
for rte_timer_reset() to succeed.
Signed-off-by: Robert Sanford <rsanford2@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This app demonstrate usage of new rte_jobstats library.
It is basically the orginal l2fwd with following modifications to met
library requirements:
- main_loop() was split into two jobs: forward job and flush job. Logic
for those jobs is almost the same as in original application.
- stats is moved to rte_alarm callback to not introduce overhead of
printing.
- stats are expanded to show rte_jobstats statistics.
- added new parameter '-l' to automatic thousands separator.
Comparing original l2fwd and l2fwd-jobstats apps will show approach what
is needed to properly write own application with rte_jobstats
measurements.
New available statistics:
- Total and % of fwd and flush execution time
- management time - overhead of rte_timer + overhead of rte_jobstats
library
- Idle time and % of time spent waiting for fwd or flush to be ready to
execute.
- per job execution time and period.
Fixes: 2caeb8c014 ("examples/l2fwd-jobstats: new example")
[Thomas: files were missing in the previous commit]
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
As per_lcore__socket_id and rte_sys_gettid are missing in version map,
it causes compiling error when CONFIG_RTE_BUILD_SHARED_LIB is enabled.
Fixes: ef76436c68 ("eal: get unique thread id")
Fixes: 9e29251b2a ("eal: thread affinity API")
Reported-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Tested-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: John McNamara <john.mcnamara@intel.com>
Since DPDK now has support for the in-tree uio_pci_generic driver,
update the programmers guide document to reference this module, and to use it
in preference to the igb_uio driver, which is DPDK-specific.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Since DPDK now has support for the in-tree uio_pci_generic driver,
update the GSG document to reference this module, and to use it
in preference to the igb_uio driver, which is DPDK-specific.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
In pci_uio_map_resource we check that we are in a primary process
before calling pci_uio_set_bus_master. However, there is already
an earlier check which means that we are always in a primary instance
at this point in the code, so the check can be removed.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Rather than scanning the resource file in sysfs a second time, we
can pull the information on physical addresses of BARs from the
pci resource information already present in the dev structure.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Instead of distinguishing the BAR mappings via offset within a single
file, originally /dev/uioX, switch to mapping each individual bar via
the appropriately numbered resourceX file.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This app demonstrate usage of new rte_jobstats library.
It is basically the orginal l2fwd with following modifications to met
library requirements:
- main_loop() was split into two jobs: forward job and flush job. Logic
for those jobs is almost the same as in original application.
- stats is moved to rte_alarm callback to not introduce overhead of
printing.
- stats are expanded to show rte_jobstats statistics.
- added new parameter '-l' to automatic thousands separator.
Comparing original l2fwd and l2fwd-jobstats apps will show approach what
is needed to properly write own application with rte_jobstats
measurements.
New available statistics:
- Total and % of fwd and flush execution time
- management time - overhead of rte_timer + overhead of rte_jobstats
library
- Idle time and % of time spent waiting for fwd or flush to be ready to
execute.
- per job execution time and period.
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This library provide API to measure time spend in particular parts of
code and to calculate optimal polling time.
To calculate a those statistics application code need to be divided into
parts (called jobs) that do something. It is up to application to decide
what is considered a job.
Series of jobs must be surrounded with the rte_jobstats_context_start()
and rte_jobstats_context_finish() calls. After that, jobs might be
started. Each job must be surrounded with rte_jobstats_start() and
rte_jobstats_finish() calls.
After job finishes its execution, period in which it should be called
again is adjusted. It might be used to minimize time wasted on
unnecessary polls/calls. Adjustment is based on data provided by job
itself (ex: number of packets it processed).
After all jobs in serie are executed fallowing statistics are updated
and might be used by application. Statistics can be reset. Some of
provided statistic data:
- total/min/max execution - time spent in executing jobs.
- total/min/max management - time spent outside execution area. This
value might be used to measure overhead of scheduling jobs. This time
also contains overhead of rte_jobstats library itself.
- number of loops that executed at least one job
- executed jobs
- time when statistics were reset.
Each job provide total/min/max execution time and execution count
statistics.
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
In test_sched, we are missing NULL pointer checks after create_mempool()
and rte_pktmbuf_alloc(). Add in these checks using TEST_ASSERT_NOT_NULL macros.
VERIFY macro was removed and replaced by standard test ASSERTS from "test.h" header.
This provides additional information to track when the failure occurred.
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Following commit c07691ae10, an implicit change has been done in the
devargs API.
This triggers problem in virtual pmds that did not check for parameters
validity as it was implicitely valid.
Fix this by restoring the empty argument as "" and add a note in the api.
Restore associated tests.
Fixes: c07691ae10 ("devargs: remove limit on parameters length")
Reported-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Tested-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Add a sched_yield() syscall if the thread spins for too long,
waiting other thread to finish its operations on the ring.
That gives pre-empted thread a chance to proceed and finish
with ring enqueue/dequeue operation.
The purpose is to reduce contention on the ring.
By ring_perf_test, it doesn't shows additional perf penalty.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
ring debug stat won't take care non-EAL thread.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
For non-EAL thread, bypass per lcore cache, directly use ring pool.
It allows using rte_mempool in either EAL thread or any user pthread.
As in non-EAL thread, it directly rely on rte_ring and it's none preemptive.
It doesn't suggest to run multi-pthread/cpu which compete the rte_mempool.
It will get bad performance and has critical risk if scheduling policy is RT.
Haven't found significant performance decrease by mempool_perf_test.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Allow to setup timers only for EAL (lcore) threads (__lcore_id < MAX_LCORE_ID).
E.g. – dynamically created thread will be able to reset/stop timer for lcore thread,
but it will be not allowed to setup timer for itself or another non-lcore thread.
rte_timer_manage() for non-lcore thread would simply do nothing and return straightway.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
In non-EAL thread, lcore_id always be LCORE_ID_ANY.
It can't be used as unique id for recursive spinlock.
Then use rte_gettid() to replace it.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
For those non-EAL thread, *_lcore_id* is invalid and probably larger than RTE_MAX_LCORE.
The patch adds the check and allows only EAL thread using EAL per thread log level and log type.
Others shares the global log level.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Set _lcore_id and _socket_id to (-1) by default.
For those non EAL thread, _lcore_id shall always be LCORE_ID_ANY.
The libraries using _lcore_id as index need to take care.
_socket_id always be SOCKET_ID_ANY until the thread changes the affinity
by rte_thread_set_affinity().
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add check for rte_socket_id(), avoid get unexpected return like (-1).
By using rte_malloc_socket(), socket id is assigned by socket_arg.
If socket_arg set to SOCKET_ID_ANY, it expects to use the socket id to which the current cores belongs.
As the thread may affinity on a cpuset, the cores in the cpuset may belongs to different NUMA nodes.
The value of _socket_id probably be SOCKET_ID_ANY(-1), the case is not expected in origin malloc_get_numa_socket().
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
EAL threads use assigned cpuset to set core affinity during startup.
It keeps 1:1 mapping, if no '--lcores' option is used.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
1. add two TLS *_socket_id* and *_cpuset*
2. add one internal API, eal_cpu_socket_id/eal_thread_dump_affinity
3. add two public API, rte_thread_set/get_affinity
4. update EAL version map for EAL public API
The API works for both EAL thread and non EAL thread.
When calling rte_thread_set_affinity, the *_socket_id* and
*_cpuset* of calling thread will be updated if the thread
successfully set the cpu affinity.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The rte_gettid() wraps the linux and freebsd syscall gettid().
It provides a persistent unique thread id for the calling thread.
It will save the unique id in TLS on the first time.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
It defines eal_cpu_socket_id() which exposing the origin private cpu_socket_id().
The function is only used inside EAL. It returns socket_id of the specified cpu_id.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The problem is that strnlen() here may return invalid value with 32bit icc.
(actually it returns it’s second parameter,e.g: sysconf(_SC_ARG_MAX)).
It starts to manifest hwen max_len parameter is > 2M and using icc –m32 –O2 (or above).
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
It supports one new eal long option '--lcores' for EAL thread cpuset assignment.
The format pattern:
--lcores='<lcores[@cpus]>[<,lcores[@cpus]>...]'
lcores, cpus could be a single digit/range or a group.
'(' and ')' are necessary if it's a group.
If not supply '@cpus', the value of cpus uses the same as lcores.
e.g. '1,2@(5-7),(3-5)@(0,2),(0,6),7-8' means starting 9 EAL thread as below
lcore 0 runs on cpuset 0x41 (cpu 0,6)
lcore 1 runs on cpuset 0x2 (cpu 1)
lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2)
lcore 6 runs on cpuset 0x41 (cpu 0,6)
lcore 7 runs on cpuset 0x80 (cpu 7)
lcore 8 runs on cpuset 0x100 (cpu 8)
Test report: http://dpdk.org/ml/archives/dev/2015-February/013383.html
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Qun Wan <qun.wan@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The patch adds 'cpuset' into per-lcore configure 'lcore_config[]',
as the lcore no longer always 1:1 pinning with physical cpu.
The lcore now stands for a EAL thread rather than a logical cpu.
It doesn't change the default behavior of 1:1 mapping, but allows to
affinity the EAL thread to multiple cpus.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Some macros already been defined by freebsd 'sys/param.h'.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Commit 71f0ab1849 broke compilation
on some versions of Debian and Ubuntu where gcc has been modified
to only emit MAJOR.MINOR part of the version from 'gcc -dumpversion'.
Drop the micro-version from gcc version comparisons to work around
this, it wasn't being used for anything anyway.
Fixes: 71f0ab1849 ("mk: rework gcc version detection to permit versions newer than 4.x")
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Help is printed with -h or --help.
Help is also printed for an unknown option.
This was broken since the rework of options.
Fixes: 489a9d6c9f ("merge bsd and linux common options parsing")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Options listing in usage help was a mess.
The main usage line is fixed and shorter.
The options in usage output are logically sorted (cpu/mem/dev/proc),
aligned and lightly reworded.
The options in declarations are alphabetically sorted.
Code in swith statement is not moved.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Drivers should be silent on boot.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Device driver should log via DPDK log, not to printf which is
sends to /dev/null in a daemon application.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David Marchand <david.marchand@6wind.com>
[Thomas: include rte_log.h]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Separately comparing major and minor versions becomes seriously clumsy
when with major version changes, convert the entire version string into
a numeric value (ie 4.6.0 becomes 460 and 5.0.0 becomes 500) and use
that for comparisons, eliminate unnecessary negations while at it.
This makes the comparisons simpler, more obvious and makes gcc 5.0
naturally recognized at least as capable as newest 4.x.
This three-digit scheme would run into trouble if gcc ever went to
two-digit version segments, but that hasn't happened in the last 10+
years so it seems like a safe assumption.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>