numam-dpdk

Author	SHA1	Message	Date
Gage Eads	e75bc77f98	mempool/stack: add lock-free stack mempool handler This commit adds support for lock-free (linked list based) stack mempool handler. In mempool_perf_autotest the lock-based stack outperforms the lock-free handler for certain lcore/alloc count/free count combinations, however: - For applications with preemptible pthreads, a standard (lock-based) stack's worst-case performance (i.e. one thread being preempted while holding the spinlock) is much worse than the lock-free stack's. - Using per-thread mempool caches will largely mitigate the performance difference. Test setup: x86_64 build with default config, dual-socket Xeon E5-2699 v4, running on isolcpus cores with a tickless scheduler. The lock-based stack's rate_persec was 0.6x-3.5x the lock-free stack's. Signed-off-by: Gage Eads <gage.eads@intel.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>	2019-04-04 22:06:16 +02:00
Gage Eads	3340202f59	stack: add lock-free implementation This commit adds support for a lock-free (linked list based) stack to the stack API. This behavior is selected through a new rte_stack_create() flag, RTE_STACK_F_LF. The stack consists of a linked list of elements, each containing a data pointer and a next pointer, and an atomic stack depth counter. The lock-free push operation enqueues a linked list of pointers by pointing the tail of the list to the current stack head, and using a CAS to swing the stack head pointer to the head of the list. The operation retries if it is unsuccessful (i.e. the list changed between reading the head and modifying it), else it adjusts the stack length and returns. The lock-free pop operation first reserves num elements by adjusting the stack length, to ensure the dequeue operation will succeed without blocking. It then dequeues pointers by walking the list -- starting from the head -- then swinging the head pointer (using a CAS as well). While walking the list, the data pointers are recorded in an object table. This algorithm uses a 128-bit compare-and-swap instruction, which atomically updates the stack top pointer and a modification counter, to protect against the ABA problem. The linked list elements themselves are maintained in a lock-free LIFO list, and are allocated before stack pushes and freed after stack pops. Since the stack has a fixed maximum depth, these elements do not need to be dynamically created. Signed-off-by: Gage Eads <gage.eads@intel.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>	2019-04-04 22:06:16 +02:00
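The push/pop scheme described in the commit above can be sketched in portable C11. This is a simplified illustration, not the rte_stack code: it packs a 32-bit element index and a 32-bit modification counter into one 64-bit word instead of using a 128-bit CAS on a pointer/counter pair, it moves one pointer at a time rather than a batch, and it omits the stack-length reservation step. All names (lf_stack, list_push, list_pop, CAPACITY) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAPACITY 8u       /* hypothetical fixed maximum depth */
#define NIL UINT32_MAX    /* index meaning "no element" */

struct elem {
    void *data;               /* payload pointer */
    _Atomic uint32_t next;    /* index of the next element, or NIL */
};

/* List head: low 32 bits = top index, high 32 bits = modification count. */
struct lf_list {
    _Atomic uint64_t head;
};

struct lf_stack {
    struct elem elems[CAPACITY];
    struct lf_list used;      /* elements currently holding data */
    struct lf_list free;      /* spare elements (also a lock-free LIFO) */
};

static uint64_t pack(uint32_t idx, uint32_t cnt)
{
    return ((uint64_t)cnt << 32) | idx;
}

/* Point the element at the current head, then CAS the head to the element. */
static void list_push(struct lf_stack *s, struct lf_list *l, uint32_t idx)
{
    uint64_t old = atomic_load(&l->head);
    uint64_t desired;

    do {
        atomic_store_explicit(&s->elems[idx].next, (uint32_t)old,
                              memory_order_relaxed);
        desired = pack(idx, (uint32_t)(old >> 32) + 1);
    } while (!atomic_compare_exchange_weak(&l->head, &old, desired));
}

/* CAS the head to the top element's successor; retry if the head moved. */
static uint32_t list_pop(struct lf_stack *s, struct lf_list *l)
{
    uint64_t old = atomic_load(&l->head);
    uint64_t desired;
    uint32_t idx;

    do {
        idx = (uint32_t)old;
        if (idx == NIL)
            return NIL;
        /* A stale 'next' is harmless: the counter makes the CAS fail. */
        desired = pack(atomic_load_explicit(&s->elems[idx].next,
                                            memory_order_relaxed),
                       (uint32_t)(old >> 32) + 1);
    } while (!atomic_compare_exchange_weak(&l->head, &old, desired));
    return idx;
}

static void stack_init(struct lf_stack *s)
{
    assert(s != NULL);
    atomic_init(&s->used.head, pack(NIL, 0));
    atomic_init(&s->free.head, pack(NIL, 0));
    for (uint32_t i = 0; i < CAPACITY; i++) {
        atomic_init(&s->elems[i].next, NIL);
        list_push(s, &s->free, i);
    }
}

/* Take a spare element, fill it, publish it. Returns false when full. */
static bool stack_push(struct lf_stack *s, void *data)
{
    uint32_t idx = list_pop(s, &s->free);

    if (idx == NIL)
        return false;
    s->elems[idx].data = data;
    list_push(s, &s->used, idx);
    return true;
}

/* Unlink the top element, read it, recycle it. Returns false when empty. */
static bool stack_pop(struct lf_stack *s, void **data)
{
    uint32_t idx = list_pop(s, &s->used);

    if (idx == NIL)
        return false;
    *data = s->elems[idx].data;
    list_push(s, &s->free, idx);
    return true;
}
```

The modification counter is what guards against ABA here: even if an element is popped and pushed back between a thread's read of the head and its CAS, the counter has changed and the CAS fails. A 32-bit counter can wrap in principle, which is one reason a wider counter next to a full pointer, as the commit describes, is the more robust choice.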
Gage Eads	05d3b5283c	stack: introduce stack library The rte_stack library provides an API for configuration and use of a bounded stack of pointers. Push and pop operations are MT-safe, allowing concurrent access, and the interface supports pushing and popping multiple pointers at a time. The library's interface is modeled after another DPDK data structure, rte_ring, and its lock-based implementation is derived from the stack mempool handler. An upcoming commit will migrate the stack mempool handler to rte_stack. Signed-off-by: Gage Eads <gage.eads@intel.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>	2019-04-04 22:06:16 +02:00
Fan Zhang	3ed37e0934	doc: update supported algorithms in IPsec guide This patch updates the ipsec library programmer's guide with the additional algorithms which are now supported. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-04-03 13:50:58 +02:00
Dharmik Thakkar	f401363d98	hash: support lock-free extendable bucket This patch enables lock-free read-write concurrency support for extendable bucket feature. Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Yipeng Wang <yipeng1.wang@intel.com>	2019-04-03 20:52:35 +02:00
Anatoly Burakov	1e3380a2f4	mem: do not use lockfiles for single file segments mode Due to internal glibc limitations [1], DPDK may exhaust internal file descriptor limits when using smaller page sizes, which results in inability to use system calls such as select() by user applications. Single file segments option stores lock files per page to ensure that pages are deleted when there are no more users, however this is not necessary because the processes will be holding onto the pages anyway because of mmap(). Thus, removing pages from the filesystem is safe even though they may be used by some other secondary process. As a result, single file segments mode no longer stores inordinate amounts of segment fd's, and the above issue with fd limits is solved. However, this will not work for legacy mem mode. For that, simply document that using bigger page sizes is the only option. [1] https://mails.dpdk.org/archives/dev/2019-February/124386.html Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-04-02 16:07:25 +02:00
Shahaf Shuler	c33a675b62	bus: introduce device level DMA memory mapping The DPDK APIs expose 3 different modes to work with memory used for DMA: 1. Use the DPDK owned memory (backed by the DPDK provided hugepages). This memory is allocated by the DPDK libraries, included in the DPDK memory system (memseg lists) and automatically DMA mapped by the DPDK layers. 2. Use memory allocated by the user and registered to the DPDK memory system. Upon registration of memory, the DPDK layers will DMA map it to all needed devices. After registration, allocation of this memory will be done with rte_malloc APIs. 3. Use memory allocated by the user and not registered to the DPDK memory system. This is for users who want to have tight control on this memory (e.g. avoid the rte_malloc header). The user should create a memory, register it through rte_extmem_register API, and call DMA map function in order to register such memory to the different devices. The scope of the patch focuses on #3 above. Currently the only way to map external memory is through VFIO (rte_vfio_dma_map). While VFIO is common, there are other vendors which use different ways to map memory (e.g. Mellanox and NXP). The work in this patch moves the DMA mapping to vendor agnostic APIs. Device level DMA map and unmap APIs were added. Implementation of those APIs was done currently only for PCI devices. For PCI bus devices, the pci driver can expose its own map and unmap functions to be used for the mapping. In case the driver doesn't provide any, the memory will be mapped, if possible, to IOMMU through VFIO APIs. Application usage with those APIs is quite simple: * allocate memory. * call rte_extmem_register on the memory chunk. * take a device, and query its rte_device. * call the device specific mapping function for this device. Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap APIs, leaving the rte device APIs as the preferred option for the user.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2019-03-30 16:48:56 +01:00
Liron Himi	ff1e35fb5f	kni: calculate MTU from mbuf size - mbuf_size and mtu are now being calculated according to the given mb-pool. - max_mtu is now being set according to the given mtu. The above two changes provide the ability to work with jumbo frames. Signed-off-by: Liron Himi <lironh@marvell.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2019-03-30 00:59:59 +01:00
David Marchand	86dc5089e6	doc: fix examples in bonding guide Removed incorrect space character and fixed PCI addresses. Fixes: fc1f2750a3ec ("doc: programmers guide") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2019-03-21 21:09:16 +01:00
Pavan Nikhilesh	1534cc6ab1	doc: add notes about eventdev producer/consumer dependency EventDev, i.e. the consumer, needs to be started before starting the event producers. Update documentation of EventDev and EventDev adapters. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Reviewed-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com> Reviewed-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>	2019-03-15 06:46:50 +01:00
Bruce Richardson	218c4e68c1	mk: use linux and freebsd in config names Rather than using linuxapp and bsdapp everywhere, we can change things to use the, more readable, terms "linux" and "freebsd" in our build configs. Rather than renaming the configs we can just duplicate the existing ones with the new names using symlinks, and use the new names exclusively internally. ["make showconfigs" also only shows the new names to keep the list short] The result is that backward compatibility is kept fully but any new builds or development can be done using the newer names, i.e. both "make config T=x86_64-native-linuxapp-gcc" and "T=x86_64-native-linux-gcc" work. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2019-03-12 23:05:06 +01:00
Bruce Richardson	91d7846ce6	eal/linux: rename linuxapp to linux The term "linuxapp" is a legacy one, and calling the subdirectory "linux" is clearer for all concerned. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2019-03-12 17:31:13 +01:00
Bruce Richardson	25c99fbd68	eal/bsd: rename bsdapp to freebsd The term "bsdapp" is a legacy one, and calling the subdirectory "freebsd" is clearer for all concerned. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2019-03-12 17:30:20 +01:00
David Marchand	c3568ea376	eal: restrict control threads to startup CPU affinity Spawning the ctrl threads on anything that is not part of the eal coremask is not that polite to the rest of the system, especially when you took good care to pin your processes on cpu resources with tools like taskset (linux) / cpuset (freebsd). Rather than introduce yet another eal options to control on which cpu those ctrl threads are created, let's take the startup cpu affinity as a reference and remove the eal coremask from it. If no cpu is left, then we default to the master core. The cpuset is computed once at init before the original cpu affinity is lost. Introduced a RTE_CPU_AND macro to abstract the differences between linux and freebsd respective macros. Examples in a 4 cores FreeBSD vm: $ ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \ -- -i --total-num-mbufs=2048 $ procstat -S 1057 PID TID COMM TDNAME CPU CSID CPU MASK 1057 100131 testpmd - 2 1 2 1057 100140 testpmd eal-intr-thread 1 1 0-1 1057 100141 testpmd rte_mp_handle 1 1 0-1 1057 100142 testpmd lcore-slave-3 3 1 3 $ cpuset -l 1,2,3 ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \ -- -i --total-num-mbufs=2048 $ procstat -S 1061 PID TID COMM TDNAME CPU CSID CPU MASK 1061 100131 testpmd - 2 2 2 1061 100144 testpmd eal-intr-thread 1 2 1 1061 100145 testpmd rte_mp_handle 1 2 1 1061 100147 testpmd lcore-slave-3 3 2 3 $ cpuset -l 2,3 ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \ -- -i --total-num-mbufs=2048 $ procstat -S 1065 PID TID COMM TDNAME CPU CSID CPU MASK 1065 100131 testpmd - 2 2 2 1065 100148 testpmd eal-intr-thread 2 2 2 1065 100149 testpmd rte_mp_handle 2 2 2 1065 100150 testpmd lcore-slave-3 3 2 3 Fixes: d651ee4919cd ("eal: set affinity for control threads") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>	2019-03-07 19:21:28 +01:00
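The affinity rule described in the commit above (startup CPU set minus the EAL coremask, falling back to the master core when nothing is left) can be sketched with a plain 64-bit bitmask. This is an illustration under assumptions, not the EAL code: the real implementation works on cpu_set_t / cpuset_t through macros such as RTE_CPU_AND, and the names cpumask_t, cpu_bit and ctrl_thread_cpus are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU; a stand-in for cpu_set_t (Linux) / cpuset_t (FreeBSD). */
typedef uint64_t cpumask_t;

static cpumask_t cpu_bit(unsigned int cpu)
{
    assert(cpu < 64);
    return (cpumask_t)1 << cpu;
}

/*
 * Control threads may run on the CPUs the process was started on,
 * minus the CPUs claimed by EAL lcores. If no CPU is left, they
 * default to the master core.
 */
static cpumask_t ctrl_thread_cpus(cpumask_t startup, cpumask_t eal_cores,
                                  unsigned int master_core)
{
    cpumask_t left = startup & ~eal_cores;

    return left != 0 ? left : cpu_bit(master_core);
}
```

With the commit's 4-core FreeBSD example and `-l 2,3` (master core 2), a startup set of CPUs 0-3 leaves 0-1 for control threads, `cpuset -l 1,2,3` leaves CPU 1, and `cpuset -l 2,3` leaves nothing, so the control threads land on CPU 2.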
Thomas Monjalon	5a10413c58	doc: fix PCI whitelist typo in prog guide The placeholder for PCI address should be named DBDF which stands for Domain/Bus/Device/Function. Fixes: 33af337773ac ("ethdev: add common devargs parser") Cc: stable@dpdk.org Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Rami Rosen <ramirose@gmail.com>	2019-03-05 11:57:33 +00:00
Rami Rosen	f959f1148a	doc: remove reference to rte.doc.mk in programmers guide This patch removes the reference to rte.doc.mk in DPDK programmers guide. Fixes: ee801f6cc7b8 ("mk: clean dead doc rules") Cc: stable@dpdk.org Signed-off-by: Rami Rosen <ramirose@gmail.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2019-03-05 10:42:03 +00:00
Tiwei Bie	5c6c1480b3	doc: improve vhost zero copy guide Highlight that vhost zero copy mbufs should be consumed as soon as possible. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2019-03-01 18:17:36 +01:00
Bruce Richardson	a9de470cc7	test: move to app directory Since all other apps have been moved to the "app" folder, the autotest app remains alone in the test folder. Rather than having an entire top-level folder for this, we can move it back to where it all started in early versions of DPDK - the "app/" folder. This move has a couple of advantages: * This reduces clutter at the top level of the project, due to one less folder. * It eliminates the separate build task necessary for building the autotests using make "make test-build" which means that developers are less likely to miss something in their own compilation tests * It re-aligns the final location of the test binary in the app folder when building with make with its location in the source tree. For meson builds, the autotest app is different from the other apps in that it needs a series of different test cases defined for it for use by "meson test". Therefore, it does not get built as part of the main loop in the app folder, but gets built separately at the end. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2019-02-26 15:29:27 +01:00
Thomas Monjalon	0e0fb9431c	doc: add references to flow isolated mode in NICs guide Some drivers (mlx, mvpp2, sfc) support the flow isolated mode, but the feature was not advertised. A reference to the feature description is added for each driver. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-01-31 18:41:07 +01:00
Thomas Monjalon	610f8f441a	doc: remove useless anchor for flow API guide A doc page (.rst file) can be referenced with :doc: syntax instead of :ref: to .. anchor. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-01-31 18:41:07 +01:00
David Hunt	fa77f80f49	doc: fix references in power management guide In the References section in the Power Management overview, both links pointed to the same l3fwd-power app. Fix the links so that one points to l3fwd-power, and the other points to the vm_power_manager sample app. Signed-off-by: David Hunt <david.hunt@intel.com> Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>	2019-01-20 13:17:48 +01:00
David Marchand	e5062369c1	doc: remove file listings No need to keep those file listings, they are very likely to become outdated. Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>	2019-01-20 13:08:50 +01:00
Jiayu Hu	5bd5f7b3ae	doc: add GRO limitations in programmers guide This patch adds GRO limitations in the programmer guide. Fixes: 2c900d09055e ("doc: add GRO guide") Cc: stable@dpdk.org Signed-off-by: Jiayu Hu <jiayu.hu@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-01-17 22:44:06 +01:00
Yong Wang	5036034924	doc: fix a typo in power management guide This patch fixes a typo in programmer's guide. It should be Frequency, not Fequence. Fixes: 450f0791312c ("power: add traffic pattern aware power control") Cc: stable@dpdk.org Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>	2019-01-15 02:40:41 +01:00
Dekel Peled	e76d76b27b	doc: fix MAC address rewrite actions in prog guide This patch fixes a typo in SET_MAC_DST action description. It also adds restriction note for set MAC src/dst actions description. Fixes: 15dbcdaada77 ("ethdev: add generic MAC address rewrite actions") Cc: stable@dpdk.org Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Ori Kam <orika@mellanox.com>	2019-01-14 17:44:29 +01:00
Konstantin Ananyev	9ef6cb1a15	doc: add IPsec library guide Add IPsec library guide and update release notes. Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	b1d978fc7b	cryptodev: add opaque data field to symmetric session This patch adds an opaque data field to cryptodev symmetric session. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	5d6c73dd59	cryptodev: add reference count to session private data This patch adds a refcnt field to every session private data in the cryptodev symmetric session. The counter is used to prevent freeing a symmetric session blindly before it has been cleared by every type of crypto device in use. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	9e5f5ecb5e	cryptodev: add user data size to symmetric session This patch adds a user_data_sz field to cryptodev symmetric session. The field is used to check if reading or writing the session's user data field is eligible. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	e764cd72a9	cryptodev: update symmetric session structure This patch updates the rte_cryptodev_sym_session structure for cryptodev library. The updates include a changed session private data array and an added nb_drivers field. They are used to calculate the correct session header size and ensure safe access of the session private data. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	1d6f89885e	cryptodev: add sym session mempool create This patch adds a new API "rte_cryptodev_sym_session_pool_create()" to cryptodev library. All applications are required to use this API to create sym session mempool as it adds private data and nb_drivers information to the mempool private data. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	725d2a7fbf	cryptodev: change queue pair configure structure This patch changes the cryptodev queue pair configure structure to enable two mempools to be passed into the cryptodev PMD simultaneously. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Anatoly Burakov	41328c404f	doc: remove note on memory mode limitation in multi-process Memory mode flags are now shared between primary and secondary processes, so the note in the documentation about limitations is no longer necessary. Fixes: 64cdfc35aaad ("mem: store memory mode flags in shared config") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 23:13:58 +01:00
Anatoly Burakov	bed7941886	mem: allow usage of non-heap external memory in multiprocess Add multiprocess support for externally allocated memory areas that are not added to DPDK heap (and add relevant doc sections). Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-12-20 18:14:55 +01:00
Anatoly Burakov	950e8fb4e1	mem: allow registering external memory areas The general use-case of using external memory is well covered by existing external memory API's. However, certain use cases require manual management of externally allocated memory areas, so this memory should not be added to the heap. It should, however, be added to DPDK's internal structures, so that API's like ``rte_virt2memseg`` would work on such external memory segments. This commit adds such an API to DPDK. The new functions will allow to register and unregister externally allocated memory areas, as well as documentation for them. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-12-20 18:14:55 +01:00
Jim Harris	476c847ab6	malloc: add option --match-allocations SPDK uses the rte_mem_event_callback_register API to create RDMA memory regions (MRs) for newly allocated regions of memory. This is used in both the SPDK NVMe-oF target and the NVMe-oF host driver. DPDK creates internal malloc_elem structures for these allocated regions. As users malloc and free memory, DPDK will sometimes merge malloc_elems that originated from different allocations that were notified through the registered mem_event callback routine. This results in subsequent allocations that can span across multiple RDMA MRs. This requires SPDK to check each DPDK buffer to see if it crosses an MR boundary, and if so, would have to add considerable logic and complexity to describe that buffer before it can be accessed by the RNIC. It is somewhat analogous to rte_malloc returning a buffer that is not IOVA-contiguous. As a malloc_elem gets split and some of these elements get freed, it can also result in DPDK sending an RTE_MEM_EVENT_FREE notification for a subset of the original RTE_MEM_EVENT_ALLOC notification. This is also problematic for RDMA memory regions, since unregistering the memory region is all-or-nothing. It is not possible to unregister part of a memory region. To support these types of applications, this patch adds a new --match-allocations EAL init flag. When this flag is specified, malloc elements from different hugepage allocations will never be merged. Memory will also only be freed back to the system (with the requisite memory event callback) exactly as it was originally allocated. Since part of this patch is extending the size of struct malloc_elem, we also fix up the malloc autotests so they do not assume its size exactly fits in one cacheline. Signed-off-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 13:01:08 +01:00
Tiwei Bie	e9436f54af	pdump: remove deprecated APIs We already changed to use generic IPC in pdump since below commit: commit 660098d61f57 ("pdump: use generic multi-process channel") The `rte_pdump_set_socket_dir()`, the `path` parameter of `rte_pdump_init()` and the `enum rte_pdump_socktype` have been deprecated since then. This commit removes these deprecated APIs and also bumps the pdump ABI. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Acked-by: Reshma Pattan <reshma.pattan@intel.com>	2018-12-19 01:25:56 +01:00
Thomas Monjalon	43d162bc16	fix dpdk.org URLs The DPDK website has a new URL scheme since June 2018. Cc: stable@dpdk.org Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: John McNamara <john.mcnamara@intel.com>	2018-11-26 20:19:24 +01:00
Yipeng Wang	8747682a69	doc: improve hash library guide This commit improves the programmer guide of the hash library to be more accurate on new features introduced in 18.11. Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Signed-off-by: Sameh Gobriel <sameh.gobriel@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2018-11-25 11:09:03 +01:00
Thomas Monjalon	e3e363a2d1	doc: remove PCI-specific details from EAL guide The PCI bus is an independent driver and not part of EAL as it was in the early days. EAL must be understood as a generic layer. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: John McNamara <john.mcnamara@intel.com>	2018-11-23 02:57:20 +01:00
Thomas Monjalon	55a76e8261	doc: remove lists of figure and table references The references to the figures and tables in the index are not maintained. It is probably better to have no list than an incomplete list. Anyway the usage of such figures list is not obvious. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: John McNamara <john.mcnamara@intel.com>	2018-11-23 02:57:05 +01:00
Reshma Pattan	999aa0635c	doc: update timestamp validity for latency measurement Updated the doc on how packets are marked to identify their timestamp as valid and considered for latency measurement. Suggested-by: Bao-Long Tran <longtb5@viettel.com.vn> Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>	2018-11-19 01:35:17 +01:00
Eric Zhang	075b182b54	eal: force IOVA to a particular mode This patch uses EAL option "--iova-mode" to force the IOVA mode to a particular value. There exists virtual devices that are not directly attached to the PCI bus, and therefore the auto detection of the IOVA mode based on probing the PCI bus and IOMMU configuration may not report the required addressing mode. Using the EAL option permits the mode to be explicitly configured in this scenario. Signed-off-by: Eric Zhang <eric.zhang@windriver.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Marko Kovacevic <marko.kovacevic@intel.com>	2018-10-29 00:01:05 +01:00
Dekel Peled	1193403422	ethdev: fix metadata documentation Previous patch introduced the Tx metadata feature, with unnecessary restrictions on data entry. This fix updates the documentation, removing the data entry restrictions on metadata item. Fixes: 839b20be0e9b ("ethdev: support metadata as flow rule criteria") Acked-by: Ori Kam <orika@mellanox.com> Signed-off-by: Dekel Peled <dekelp@mellanox.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	c9cce42876	ethdev: remove deprecated attach/detach functions The hotplug attach/detach features are implemented in EAL layer. There is a new ethdev iterator to retrieve ports from ethdev layer. As announced earlier, the (buggy) ethdev functions are now removed. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Ori Kam	7307cf6333	ethdev: add raw encapsulation action Currently the encap/decap actions only support encapsulation of VXLAN and NVGRE L2 packets (L2 encapsulation is where the inner packet has a valid Ethernet header, while L3 encapsulation is where the inner packet doesn't have the Ethernet header). In addition, the parameter to the encap action is a list of rte items, which results in 2 extra translations: from the application to the action and from the action to the NIC. This has a negative impact on insertion performance. Looking forward, there is going to be a need to support many more tunnel encapsulations, for example MPLSoGRE and MPLSoUDP. Adding the new encapsulations will result in duplication of code. For example, the code for handling NVGRE and VXLAN is exactly the same, and each new tunnel will have the same exact structure. This patch introduces a raw encapsulation that can support L2 tunnel types and L3 tunnel types. In addition, the new encapsulation commands use a raw buffer in order to save the conversion time, both for the application and the PMD. In order to encapsulate an L3 tunnel type there is a need to use both actions in the same rule: the decap to remove the L2 of the original packet, and then the encap command to encapsulate the packet with the tunnel. For decap L3 there is also a need to use both commands in the same flow: first the decap command to remove the outer tunnel header and then encap to add the L2 header. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Dekel Peled	839b20be0e	ethdev: support metadata as flow rule criteria As described in [1], a new rte_flow item is added to support metadata to use as flow rule match pattern. The metadata is an opaque item, fully controlled by the application. The use of metadata is relevant for egress rules only. It can be set in the flow rule using the RTE_FLOW_ITEM_META. An additional member 'tx_metadata' is added in union with existing member 'hash' of struct 'rte_mbuf', located to avoid conflicts with existing fields. This additional member is used to carry the metadata item. Application should set the packet metadata in the mbuf dedicated field, and set the PKT_TX_METADATA flag in the mbuf->ol_flags. The NIC will use the packet metadata as match criteria for relevant flow rules. This patch introduces metadata item type for rte_flow RTE_FLOW_ITEM_META, along with corresponding struct rte_flow_item_meta and ol_flag PKT_TX_METADATA. [1] "[RFC,v2] ethdev: support metadata as flow rule criteria" Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Dan Gora	89397a01ce	kni: set default carrier state of interface Add module parameter 'carrier=on|off' to set the default carrier state for linux network interfaces created by the KNI module. The default carrier state is 'off'. For KNI interfaces which need to reflect the carrier state of a physical Ethernet port controlled by the DPDK application, the default carrier state should be left set to 'off'. The application can set the carrier state of the KNI interface to reflect the state of the physical Ethernet port using rte_kni_update_link(). For KNI interfaces which are purely virtual, the default carrier state can be set to 'on'. This enables the KNI interface to be used without having to explicitly set the carrier state to 'on' using rte_kni_update_link(). Signed-off-by: Dan Gora <dg@adax.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 19:46:20 +02:00
Honnappa Nagarahalli	e605a1d36c	hash: add lock-free r/w concurrency Add lock-free read-write concurrency. This is achieved by the following changes. 1) Add memory ordering to avoid race conditions. The only race condition that can occur is - using the key store element before the key write is completed. Hence, while inserting the element the release memory order is used. Any other race condition is caught by the key comparison. Memory orderings are added only where needed. For ex: reads in the writer's context do not need memory ordering as there is a single writer. key_idx in the bucket entry and pdata in the key store element are used for synchronisation. key_idx is used to release an inserted entry in the bucket to the reader. Use of pdata for synchronisation is required due to updation of an existing entry where-in only the pdata is updated without updating key_idx. 2) Reader-writer concurrency issue, caused by moving the keys to their alternative locations during key insert, is solved by introducing a global counter(tbl_chng_cnt) indicating a change in table. 3) Add the flag to enable reader-writer concurrency during run time. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 12:50:43 +02:00
Liang Ma	450f079131	power: add traffic pattern aware power control 1. Abstract For packet processing workloads such as DPDK, polling is continuous. This means CPU cores always show 100% busy independent of how much work those cores are doing. It is critical to accurately determine how busy a core is, for the following reasons: * No indication of overload conditions. * User does not know how much real load is on a system, resulting in wasted energy as no power management is utilized. Compared to the original l3fwd-power design, instead of going to sleep after detecting an empty poll, the new mechanism just lowers the core frequency. As a result, the application does not stop polling the device, which leads to improved handling of bursts of traffic. When the system becomes busy, the empty poll mechanism can also increase the core frequency (including turbo) to do best effort for intensive traffic. This gives us more flexible and balanced traffic awareness over the standard l3fwd-power application. 2. Proposed solution The proposed solution focuses on how many times empty polls are executed. A low number of empty polls means the current core is busy processing workload, therefore a higher frequency is needed. A high empty poll number indicates the current core is not doing any real work, therefore we can lower the frequency to save power. In the current implementation, each core has 1 empty-poll counter, which assumes 1 core is dedicated to 1 queue. This will need to be expanded in the future to support multiple queues per core. 2.1 Power state definition: LOW: Not currently used, reserved for future use. MED: the frequency is used to process modest traffic workload. HIGH: the frequency is used to process busy traffic workload. 2.2 There are two phases to establish the power management system: a. Initialization/Training phase. The training phase is necessary in order to figure out the system polling baseline numbers from idle to busy. 
The highest poll count will be during idle, where all polls are empty. These poll counts will be different between systems due to the many possible processor micro-arch, cache and device configurations, hence the training phase. In the training phase, traffic is blocked so the training algorithm can average the empty-poll numbers for the LOW, MED and HIGH power states in order to create a baseline. Each core's counters are collected every 10ms, and the training phase will take 2 seconds. Training is disabled in the default configuration, and the default parameters are applied. The sample app can still trigger training if needed. Once the training phase has been executed once on a system, the application can then be started with the relevant thresholds provided on the command line, allowing the application to start passing traffic immediately. b. Normal phase. Traffic starts immediately based on the default thresholds, or based on the user supplied thresholds via the command line parameters. The run-time poll counts are compared with the baseline and the decision will be taken to move to MED power state or HIGH power state. The counters are calculated every 10ms. 3. Proposed API 1. rte_power_empty_poll_stat_init(struct ep_params *eptr, uint8_t freq_tlb, struct ep_policy policy); which is used to initialize the power management system. 2. rte_power_empty_poll_stat_free(void); which is used to free the resources held by the power management system. 3. rte_power_empty_poll_stat_update(unsigned int lcore_id); which is used to update specific core empty poll counter, not thread safe 4. rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt); which is used to update specific core valid poll counter, not thread safe 5. rte_power_empty_poll_stat_fetch(unsigned int lcore_id); which is used to get specific core empty poll counter. 6. rte_power_poll_stat_fetch(unsigned int lcore_id); which is used to get specific core valid poll counter. 7. 
rte_empty_poll_detection(struct rte_timer tim, void *arg); which is used to detect empty poll state changes then take action. Signed-off-by: Liang Ma <liang.j.ma@intel.com> Reviewed-by: Lei Yao <lei.a.yao@intel.com> Acked-by: David Hunt <david.hunt@intel.com>	2018-10-26 01:55:07 +02:00


357 Commits