numam-dpdk

Author	SHA1	Message	Date
Cristian Dumitrescu	bacfdd908d	pipeline: fix memory leak Coverity issue: 362741 Fixes: `b32c0a2c5e` ("pipeline: add SWX table update high level API") Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-08 15:00:42 +02:00
Bruce Richardson	979e29ddbb	raw/ioat: rename functions to be operation-agnostic Since the hardware supported by the ioat driver is capable of operations other than just copies, we can rename the doorbell and completion-return functions to not have "copies" in their names. These functions are not copy-specific, and so would apply for other operations which may be added later to the driver. Also add a suitable warning using deprecation attribute for any code using the old functions names. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Radu Nicolau <radu.nicolau@intel.com>	2020-10-08 14:33:20 +02:00
Erik Gabriel Carrillo	0875ec4dd5	timer: add limitation note for sync stop and reset If a timer's callback function calls rte_timer_reset_sync() or rte_timer_stop_sync() on another timer that is in the RUNNING state and owned by the current lcore, the _sync() calls will loop indefinitely. Relatedly, if a timer's callback function calls _sync() on another timer that is in the RUNNING state and is owned by a different lcore, but a timer callback function runs on that different lcore and calls *_sync() on a timer that is in the RUNNING state and owned by the current lcore, the two lcores will loop indefinitely. Add a note in the rte_timer_stop_sync and rte_timer_reset_sync documentation that indicates that these APIs should not be used inside timer callback functions in order to avoid the hangs described above, and suggests an alternative. Bugzilla ID: 491 Cc: stable@dpdk.org Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>	2020-10-08 09:43:57 +02:00
Timothy McDaniel	7fde94b2df	trace: increase event CTF description buffer size The current buffer size is not big enough to register trace points for new additions in the eventdev subsystem. Increase TRACE_CTF_FIELD_SIZE by 64 bytes for now. Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com> Acked-by: Sunil Kumar Kori <skori@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2020-10-08 09:03:01 +02:00
Reshma Pattan	a3f9cca718	power: fix current frequency index During power initialization the pstate cpufreq api is not setting the initial curr_idx of pstate_power_info to corresponding current frequency index. Without this the idx is always 0, which is causing the below check to pass and returns without setting the initial min/max frequency to system max frequency and this leads to incorrect frequency settings when power_pstate_cpufreq_set_freq() is called in the apps. set_freq_internal(struct pstate_power_info pi, uint32_t idx) { ... / Check if it is the same as current */ if (idx == pi->curr_idx) return 0; ... } scenario 1: If system has starting scaling min/max: 1000/1000, and want to set this to 2200/2200, the max frequency gets updated but not min. scenario 2: If system has starting scaling min/max: 2200/1000, and want to set to 2200/2200, the max, min frequency was not updated. Since no change in max that should be ok, but min was also ignored, which will be fixed now with the new changes. Fixes: `e6c6dc0f` ("power: add p-state driver compatibility") Cc: stable@dpdk.org Signed-off-by: Reshma Pattan <reshma.pattan@intel.com> Reviewed-by: Liang Ma <liang.j.ma@intel.com>	2020-10-07 14:51:52 +02:00
Pavan Nikhilesh	ca32fa67a7	trace: add size_t as generic trace point Add size_t as a generic trace point. Also, update test_generic_trace_point() to validate size_t emitter. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Sunil Kumar Kori <skori@mavell.com>	2020-10-07 14:44:03 +02:00
Pavan Nikhilesh	2114521cff	trace: fix size_t field emitter Add size_t CTF format metadata, this is needed by CTF analyzers to parse the emitted CTF trace. Fixes: `262c4ee791` ("trace: add size_t field emitter") Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Sunil Kumar Kori <skori@mavell.com>	2020-10-07 14:40:31 +02:00
Hemant Agrawal	b24b892895	mempool: dump handler index and name Enhance the dump function to also print the ops index and associated mempool ops name Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2020-10-06 23:44:15 +02:00
Fan Zhang	9ef2627cf6	port: remove useless assignment This patch fixes an unused value in pcap source port by removing the setting to the value. Coverity issue: 362020 Fixes: `d4b42133d8` ("port: add pcap file source") Cc: stable@dpdk.org Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>	2020-10-06 23:39:34 +02:00
Ciara Power	083b0b310b	ethdev: add common stats for telemetry The ethdev library now registers a telemetry command for common ethdev statistics. An example usage is shown below: Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2 {"version": "DPDK 20.08.0-rc1", "pid": 14119, "max_output_len": 16384} --> /ethdev/stats,0 {"/ethdev/stats": {"ipackets": 0, "opackets": 0, "ibytes": 0, "obytes": \ 0, "imissed": 0, "ierrors": 0, "oerrors": 0, "rx_nombuf": 0, \ "q_ipackets": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \ "q_opackets": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \ "q_ibytes": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \ "q_obytes": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \ "q_errors": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}} Signed-off-by: Ciara Power <ciara.power@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2020-10-06 22:55:41 +02:00
Ciara Power	c933bb5177	telemetry: support array values in data object Arrays of type uint64_t/int/string can now be included within an array or dict. One level of embedded containers is supported. This is necessary to allow for instances such as the ethdev queue stats to be reported as a list of uint64_t values, rather than having multiple dict entries with one uint64_t value for each queue stat. The memory management APIs provided by telemetry simplify the memory allocation/free aspect of the embedded container. The rte_tel_data_alloc function is called in the library/app callback to return a pointer to a container that has been allocated memory. When adding this container to an array/dict, a parameter is passed to indicate if the memory should be freed by telemetry after use. This will allow reuse of the allocated memory if the library/app wishes to do so. Signed-off-by: Ciara Power <ciara.power@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2020-10-06 22:55:00 +02:00
Ciara Power	5514319e7b	telemetry: fix passing full params string to command Telemetry only passed the first param to the command handler if multiple were entered by the user, separated by commas. Telemetry is required to pass the full params string to the command, by splitting by a comma delimiter only once to remove the command part of the string. This will enable future commands to take multiple param values. Fixes: `b1ad0e1245` ("rawdev: add telemetry callbacks") Fixes: `c190daedb9` ("ethdev: add telemetry callbacks") Fixes: `6dd571fd07` ("telemetry: introduce new functionality") Cc: stable@dpdk.org Signed-off-by: Ciara Power <ciara.power@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2020-10-06 22:54:58 +02:00
Yi Yang	e2d8110636	gro: support VXLAN UDP/IPv4 VXLAN UDP/IPv4 GRO can help improve VM-to-VM UDP performance when UFO or GSO is enabled in VM, GRO must be supported if UFO or GSO is enabled, otherwise, performance can't get big improvement if only GSO is there. With this enabled in DPDK, OVS DPDK can leverage it to improve VM-to-VM UDP performance, it will reassemble VXLAN UDP/IPv4 fragments immediate after they are received from a physical NIC. It is very helpful in OVS DPDK VXLAN use case. Signed-off-by: Yi Yang <yangyi01@inspur.com> Acked-by: Jiayu Hu <jiayu.hu@intel.com>	2020-10-06 21:51:03 +02:00
Yi Yang	1ca5e67408	gro: support UDP/IPv4 UDP/IPv4 GRO can help improve VM-to-VM UDP performance when UFO or GSO is enabled in VM, GRO must be supported if UFO or GSO is enabled, otherwise, performance can't get big improvement if only GSO is there. With this enabled in DPDK, OVS DPDK can leverage it to improve VM-to-VM UDP performance, it will reassemble UDP fragments immediate after they are received from a physical NIC. It is very helpful in OVS DPDK VLAN use case. Signed-off-by: Yi Yang <yangyi01@inspur.com> Acked-by: Jiayu Hu <jiayu.hu@intel.com>	2020-10-06 21:51:03 +02:00
Michael Pfeiffer	ac0ad5eff8	net: calculate checksum of packet with IPv4 options Currently, rte_ipv4_cksum() and rte_ipv4_udptcp_cksum() assume all IPv4 headers have sizeof(struct rte_ipv4_hdr) bytes. This is not true for those (rare) packets with IPv4 options. Thus, both IPv4 and TCP/UDP checksums are calculated wrong. This patch fixes the issue by using the actual IPv4 header length from the packet's IHL field. Signed-off-by: Michael Pfeiffer <michael.pfeiffer@tu-ilmenau.de> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2020-10-06 17:10:02 +02:00
Olivier Matz	e4233e1d0a	mempool: promote some experimental functions as stable Move symbols introduced in version <= 19.11 in the stable ABI. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2020-10-06 12:19:21 +02:00
Olivier Matz	9b427d5229	mempool: remove v20 ABI compatibility Remove the deprecated v20 ABI of rte_mempool_populate_iova() and rte_mempool_populate_virt(). Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2020-10-06 12:19:21 +02:00
Joyce Kong	6ee0c53fda	rcu: promote library as stable RCU library supporting quiescent state was introduced in 19.05 release and has been around 4 releases, it should be mature enough to remove the experimental tag. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: David Christensen <drc@linux.vnet.ibm.com>	2020-10-06 10:31:13 +02:00
Joyce Kong	ebfe34c501	mcslock: promote as stable Since rte_mcslock APIs were introduced in 19.08 release, it is now possible to remove the experimental tag from: rte_mcslock_lock() rte_mcslock_unlock() rte_mcslock_trylock() rte_mcslock_is_locked() Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Phil Yang <phil.yang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: David Christensen <drc@linux.vnet.ibm.com>	2020-10-06 10:31:13 +02:00
Joyce Kong	d3a76807fc	ticketlock: promote as stable As rte_ticketlock was introduced in 19.05 release and there were no changes in its public API since 19.11 release, it should be mature enough to remove the experimental tag. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: David Christensen <drc@linux.vnet.ibm.com>	2020-10-06 10:31:13 +02:00
Joyce Kong	62867b7749	eal: promote wait until equal API as stable rte_wait_until_equal_xx APIs were introduced in 19.11 release and there were no changes in the public APIs since then, it should be mature enough to remove the experimental tag. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: David Christensen <drc@linux.vnet.ibm.com>	2020-10-06 10:31:13 +02:00
David Marchand	aa48ddf4f0	mem: fix allocation in container with SELinux This is something we encountered while working in an OpenShift environment with SELinux enabled. In this environment, a DPDK application could create/write to hugepage files but removing them was refused. This resulted in dirty files being reused when starting a new DPDK application and triggered random crashes / erratic behavior. Getting a SELinux setup can be a challenge, and even more if you add containers to the picture :-). So here is a reproducer for the interested testers: # cat >wrap.c <<EOF #define _GNU_SOURCE #include <dlfcn.h> #include <errno.h> #include <stdio.h> #include <string.h> #include <sys/stat.h> #include <sys/types.h> #include <unistd.h> int unlink(const char pathname) { static int (orig)(const char pathname) = NULL; struct stat st; if (orig == NULL) orig = dlsym(RTLD_NEXT, "unlink"); if (strstr(pathname, "rtemap_") != NULL && stat(pathname, &st) == 0) { fprintf(stderr, "### refused unlink for %s\n", pathname); errno = EACCES; return -1; } fprintf(stderr, "### called unlink for %s\n", pathname); return orig(pathname); } int unlinkat(int dirfd, const char pathname, int flags) { static int (orig)(int dirfd, const char pathname, int flags) = NULL; struct stat st; if (orig == NULL) orig = dlsym(RTLD_NEXT, "unlinkat"); if (strstr(pathname, "rtemap_") != NULL && fstatat(dirfd, pathname, &st, flags) == 0) { fprintf(stderr, "### refused unlinkat for %s\n", pathname); errno = EACCES; return -1; } fprintf(stderr, "### called unlinkat for %s\n", pathname); return orig(dirfd, pathname, flags); } EOF # gcc -fPIC -shared -o libwrap.so wrap.c -ldl # \rm /dev/hugepages/rtemap* # # First run is fine # LD_PRELOAD=libwrap.so dpdk-testpmd -w 0000:01:00.0 -- -i [...] Configuring Port 0 (socket 0) Port 0: 24:6E:96:3C:52:D8 Checking link statuses... Done testpmd> # # Second run we have dirty memory # LD_PRELOAD=libwrap.so dpdk-testpmd -w 0000:01:00.0 -- -i [...] ### refused unlinkat for rtemap_0 [...] Port 0 is now not stopped Please stop the ports first Done testpmd> Removing hugepage files is done in multiple places and the memory allocation code is complex. This fix tries to do the minimum and avoids touching other paths. If trying to remove the hugepage file before allocating a page fails, the error is reported to the caller and the user will see a memory allocation error log. Fixes: `582bed1e1d` ("mem: support mapping hugepages at runtime") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2020-10-06 01:11:45 +02:00
Sachin Saxena	0621033073	mempool: dump socket attribute Enhance the dump function to also print socket_id attribute passed at creation time. Signed-off-by: Sachin Saxena <sachin.saxena@oss.nxp.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2020-10-06 00:57:16 +02:00
Dmitry Kozlyuk	a6c824360a	rcu: avoid literal suffix warning in C++ mode Sequences like "value = %"PRIu64 (no space before PRIu64) are parsed as a single preprocessor token, user-defined-string-literal, in C++11 onwards. While modern compilers are smart enough to parse this properly, GCC 9.3.0 generates warnings like: rte_rcu_qsbr.h:555:26: warning: invalid suffix on literal; C++11 requires a space between literal and string macro [-Wliteral-suffix] Add spaces around format specifier macros to make public headers compatible with C++ without causing warnings. Make similar changes in C source for style consistency within the library. Fixes: `64994b56c` ("rcu: add RCU library supporting QSBR mechanism") Cc: stable@dpdk.org Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>	2020-10-06 00:45:37 +02:00
Gage Eads	100d9b8066	stack: promote library as stable The stack library was first released in 19.05, and its interfaces have been stable since their initial introduction. This commit promotes the full interface to stable, starting with the 20.11 major version. Signed-off-by: Gage Eads <gage.eads@intel.com> Acked-by: David Marchand <david.marchand@redhat.com>	2020-10-05 11:56:17 +02:00
Erik Gabriel Carrillo	99cdd4b5ef	timer: promote some experimental functions as stable Some new APIs were added to the timer library in the 19.05 release, and there have been no changes to their interfaces since then. These functions can be considered stable enough to remove their 'experimental' tag. Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2020-10-05 11:12:52 +02:00
Ferruh Yigit	61e436e1f3	meter: remove experimental alias Remove ABI versioning for APIs: 'rte_meter_trtcm_rfc4115_profile_config()' 'rte_meter_trtcm_rfc4115_config()' The alias was introduced in commit `60197bda97` ("meter: provide experimental alias for matured API") Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2020-10-05 11:11:59 +02:00
Yunjian Wang	1f16fa99aa	vfio: fix group descriptor check The issue is that a file descriptor at 0 is a valid one. Currently the file not found, the return value will be set to 0. As a result, it is impossible to distinguish between a correct descriptor and a failed return value. Fix it to return -ENOENT instead of 0. Fixes: `b758423bc4` ("vfio: fix race condition with sysfs") Fixes: `ff0b67d1c8` ("vfio: DMA mapping") Cc: stable@dpdk.org Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2020-10-05 10:08:57 +02:00
Ophir Munk	2d8eb20ad4	hash: build on Windows Build the lib for Windows. Export the needed function from eal. Signed-off-by: Ophir Munk <ophirmu@nvidia.com> Signed-off-by: Tal Shnaiderman <talshn@nvidia.com> Tested-by: Pallavi Kadam <pallavi.kadam@intel.com> Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>	2020-10-05 09:49:55 +02:00
Dmitry Kozlyuk	ff5db45d47	eal/windows: use bundled getopt with MinGW Clang builds use getopt.c in librte_eal while MinGW provides implementation as part of the toolchain. Statically linking librte_eal to an application that depends on getopt results in undefined reference errors with MinGW. There are no such errors with Clang, because with Clang librte_eal actually defines getopt functions. Use getopt.c in EAL with Clang and MinGW to get identical behavior. Adjust code for MinGW. Incidentally, this removes a bug when free() is called on uninitialized memory. Fixes: `5e373e456e` ("eal/windows: add getopt implementation") Cc: stable@dpdk.org Reported-by: Khoa To <khot@microsoft.com> Reported-by: Tal Shnaiderman <talshn@nvidia.com> Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Acked-by: Khoa To <khot@microsoft.com> Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>	2020-10-05 09:12:24 +02:00
David Marchand	2a8be2ff75	pipeline: fix build with glibc < 2.26 reallocarray has been introduced in glibc 2.26 but we still support glibc >= 2.7. Simply replace with realloc, as the considered sizes are unlikely to overflow. """ The reallocarray() function changes the size of the memory block pointed to by ptr to be large enough for an array of nmemb elements, each of which is size bytes. It is equivalent to the call realloc(ptr, nmemb * size); However, unlike that realloc() call, reallocarray() fails safely in the case where the multiplication would overflow. If such an over‐ flow occurs, reallocarray() returns NULL, sets errno to ENOMEM, and leaves the original block of memory unchanged. """ Fixes: `3ca60ceed7` ("pipeline: add SWX pipeline specification file") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-02 13:49:16 +02:00
Maxime Coquelin	cacf8267cc	vhost: remove dequeue zero-copy support Dequeue zero-copy removal was announced in DPDK v20.08. This feature brings constraints which makes the maintenance of the Vhost library difficult. Its limitations makes it also difficult to use by the applications (Tx vring starvation). Removing it makes it easier to add new features, and also remove some code in the hot path, which should bring a performance improvement for the standard path. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-09-30 23:16:56 +02:00
Maxime Coquelin	eca9a0d6c8	vhost: promote vDPA API as stable As announced in v20.08, this patch makes the vDPA and related Vhost API stable. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-09-30 23:16:55 +02:00
Thomas Monjalon	fbd1913561	ethdev: remove old close behaviour The temporary flag RTE_ETH_DEV_CLOSE_REMOVE is removed. It was introduced in DPDK 18.11 in order to give time for PMDs to migrate. The old behaviour was to free only queues when closing a port. The new behaviour is calling rte_eth_dev_release_port() which does three more tasks: - trigger event callback - reset state and few pointers - free all generic port resources The private port resources must be released in the .dev_close callback. The .remove callback should: - call .dev_close callback - call rte_eth_dev_release_port() - free multi-port device shared resources Despite waiting two years, some drivers have not migrated, so they may hit issues with the incompatible new behaviour. After sending emails, adding logs, and announcing the deprecation, the only last solution is to declare these drivers as unmaintained: ionic, liquidio, nfp Below is a summary of what to implement in those drivers. * The freeing of private port resources must be moved from the ".remove(device)" function to the ".dev_close(port)" function. * If a generic resource (.mac_addrs or .hash_mac_addrs) cannot be freed, it must be set to NULL in ".dev_close" function to protect from subsequent rte_eth_dev_release_port() freeing. * Note 1: The generic resources are freed in rte_eth_dev_release_port(), after ".dev_close" is called in rte_eth_dev_close(), but not when calling ".dev_close" directly from the ".remove" PMD function. That's why rte_eth_dev_release_port() must still be called explicitly from ".remove(device)" after calling the ".dev_close" PMD function. * Note 2: If a device can have multiple ports, the common resources must be freed only in the ".remove(device)" function. * Note 3: The port is supposed to be in a stopped state when it is closed. If it is not the case, it is free to the PMD implementation how to react when trying to close a non-stopped port: either try to stop it automatically or just return an error. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Liron Himi <lironh@marvell.com> Reviewed-by: Haiyue Wang <haiyue.wang@intel.com> Acked-by: Jeff Guo <jia.guo@intel.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>	2020-09-30 19:19:14 +02:00
Thomas Monjalon	b142387b07	ethdev: allow drivers to return error on close The device operation .dev_close was returning void. This driver interface is changed to return an int. Note that the API rte_eth_dev_close() is still returning void, although a deprecation notice is pending to change it as well. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Rosen Xu <rosen.xu@intel.com> Reviewed-by: Sachin Saxena <sachin.saxena@oss.nxp.com> Reviewed-by: Liron Himi <lironh@marvell.com> Reviewed-by: Haiyue Wang <haiyue.wang@intel.com> Acked-by: Jeff Guo <jia.guo@intel.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>	2020-09-30 19:19:13 +02:00
Thomas Monjalon	2c65898b48	ethdev: reset device and interrupt pointers on release The pointers .device and .intr_handle were already reset by the helper rte_eth_dev_pci_generic_remove(). It is now made part of rte_eth_dev_release_port(). It makes rte_eth_dev_pci_release() meaningless, so it is replaced with a call to rte_eth_dev_release_port(). Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>	2020-09-30 19:19:13 +02:00
Manish Chopra	92c6786e85	net/qede: define PCI config space specific osals This patch defines various PCI config space access APIs in order to read and find IOV specific PCI capabilities. With these definitions implemented, it enables the base driver to do SR-IOV specific initialization and HW specific configuration required from PF-PMD driver instance. Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Rasesh Mody <rmody@marvell.com>	2020-09-30 19:19:11 +02:00
Manish Chopra	e00d2b4cea	bus/pci: query PCI extended capabilities By adding generic API, this patch removes individual functions/defines implemented by drivers to find extended PCI capabilities. Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Reviewed-by: Gaetan Rivet <grive@u256.net> Reviewed-by: Jerin Jacob <jerinj@marvell.com>	2020-09-30 19:19:11 +02:00
Cristian Dumitrescu	d0a0096661	table: add exact match SWX table Add the exact match table type for the SWX pipeline. Used under the hood by the SWX pipeline table instruction. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:10 +02:00
Cristian Dumitrescu	394813eb60	port: add source and sink SWX ports Add the PCAP file-based source (input) and sink (output) port types for the SWX pipeline. The sink port is typically used to implement the packet drop pipeline action. Used under the hood by the pipeline rx and tx instructions. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:10 +02:00
Cristian Dumitrescu	ecc5f43396	port: add ethernet device SWX port Add the Ethernet device input/output port type for the SWX pipeline. Used under the hood by the pipeline rx and tx instructions. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	3ca60ceed7	pipeline: add SWX pipeline specification file Add support for building the SWX pipeline based on specification file with syntax aligned to the P4 language. The specification file may be generated by the P4C compiler in the future. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	b32c0a2c5e	pipeline: add SWX table update high level API High-level transaction-oriented API for SWX pipeline table updates. It supports multi-table atomic updates, i.e. multiple tables can be updated in a single step with only the before and after table set visible to the packets. Uses the lower-level table update mechanisms. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	e87cf5d85e	pipeline: add SWX pipeline flush Flush the packets currently buffered by the SWX pipeline output ports. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	393b96e2aa	pipeline: add SWX pipeline query API Query API to be used by the control plane to detect the configuration and state of the SWX pipeline and its internal objects. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	31035e87b2	pipeline: add SWX instruction optimizer Instruction optimizer. Detects frequent patterns and replaces them with some more powerful vector-like pipeline instructions without any user effort. Executes at instruction translation, not at run-time. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	75634474ca	pipeline: add SWX instruction verifier Instruction verifier. Executes at instruction translation time during SWX pipeline build, i.e. at initialization instead of run-time. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	11a86e2268	pipeline: add SWX instruction description Added SWX instruction set reference table. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	b3947e25be	pipeline: introduce SWX jump and return instructions The jump instructions are either unconditional (jmp) or conditional on positive/negative tests such as header validity (jmpv/jmpnv), table lookup hit/miss (jmph/jmpnh), executed action (jmpa/jmpna), equality (jmpeq/jmpneq), comparison result (jmplt/jmpgt). The return instruction resumes the pipeline execution after action subroutine. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	c6b752cdf2	pipeline: introduce SWX extern instruction The extern instruction calls one of the member functions of a given extern object or it calls the given extern function. The function arguments must be written in advance to the mailbox. The results are available in the same place after execution. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	6077ae611d	pipeline: introduce SWX table instruction The table instruction looks up the input key into the table and then it triggers the execution of the action found in the table entry. On lookup miss, the default table action is executed. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	e0f51638b7	pipeline: introduce SWX SHR instruction The shr (i.e. shift right) instruction source can be header field (H), meta-data field (M), extern object (E) or function (F) mailbox field, table entry action data field (T) or immediate value (I). The destination is HMEF. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	b09ba6d0a3	pipeline: introduce SWX SHL instruction The shl (i.e. shift left) instruction source can be header field (H), meta-data field (M), extern object (E) or function (F) mailbox field, table entry action data field (T) or immediate value (I). The destination is HMEF. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	b4e607f9fd	pipeline: introduce SWX XOR instruction The xor (i.e. bitwise exclusive or) instruction source can be header field (H), meta-data field (M), extern object (E) or function (F) mailbox field, table entry action data field (T) or immediate value (I). The destination is HMEF. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	8f796198dc	pipeline: introduce SWX or instruction The or (i.e. bitwise or) instruction source can be header field (H), meta-data field (M), extern object (E) or function (F) mailbox field, table entry action data field (T) or immediate value (I). The destination is HMEF. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	650195cf96	pipeline: introduce SWX and instruction The and (i.e. bitwise and) instruction source can be header field (H), meta-data field (M), extern object (E) or function (F) mailbox field, table entry action data field (T) or immediate value (I). The destination is HMEF. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	1e6bf5997c	pipeline: introduce SWX cksub instruction The cksub (i.e. checksum subtract) instruction is used to update the 1's complement sum commonly used by protocols such as IPv4, TCP or UDP. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	7aa4569822	pipeline: introduce SWX ckadd instruction The ckadd (i.e. checksum add) instruction is used to either compute, verify or update the 1's complement sum commonly used by protocols such as IPv4, TCP or UDP. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:08 +02:00
Cristian Dumitrescu	c88c629438	pipeline: introduce SWX subtract instruction The sub (i.e. subtract) instruction source can be header field (H), meta-data field (M), extern object (E) or function (F) mailbox field, table entry action data field (T) or immediate value (I). The destination is HMEF. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	baf7999303	pipeline: introduce SWX add instruction The add instruction source can be header field (H), meta-data field (M), extern object (E) or function (F) mailbox field, table entry action data field (T) or immediate value (I). The destination is HMEF. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	8319c47c5f	pipeline: add SWX DMA instruction The DMA instruction handles the bulk read transfer of one header from the table entry action data. Typically used to generate headers, i.e. headers that are not extracted from the input packet. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	7210349d5b	pipeline: add SWX move instruction The mov (i.e. move) instruction source can be header field (H), meta-data field (M), extern object (E) or function (F) mailbox field, table entry action data field (T) or immediate value (I). The destination is HMEF. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	27e48ec92f	pipeline: add header validate and invalidate SWX instructions Add instructions to flag a header as valid or invalid. This flag can be tested by the jmpv (jump if header valid) and jmpnv (jump if header not valid) instructions. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	99abed441d	pipeline: add SWX Tx and emit instructions Add header emit and packet transmission instructions. Emit adds to the output packet a header that is either generated (e.g. read from table entry by action) or extracted from the input packet. Tx ends the pipeline processing; discard is implemented by tx to special port. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	a1711f948d	pipeline: add SWX Rx and extract instructions Add packet reception and header extraction instructions. The Rx must be the first pipeline instruction. Each extracted header is logically removed from the packet, then it can be read/written by instructions, emitted into the outgoing packet or discarded. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	7d666cfa96	pipeline: add SWX pipeline instructions The SWX pipeline instructions represent the main program that defines the life of the packet. As packets go through tables that trigger action subroutines, the headers and meta-data get transformed along the way. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	e9d870dd93	pipeline: add SWX pipeline tables Add tables to the SWX pipeline. The match fields are flexibly selected from the headers and meta-data. The set of table actions is flexibly selected for each table from the set of pipeline actions. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	85af61ba62	pipeline: add SWX pipeline action Add SWX actions that are dynamically-defined through instructions as opposed to pre-defined. The actions are subroutines of the pipeline program that triggered by table lookup. The input arguments are the action data from the table entry (format defined by struct), the headers and meta-data are in/out. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	1e4c88caea	pipeline: add SWX extern objects and funcs Add extern objects and functions to plug into the SWX pipeline any functionality that cannot be efficiently implemented with existing instructions, e.g. special checksum/ECC, crypto, meters, stats arrays, heuristics, etc. In/out arguments are passed through mailbox with format defined by struct. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	e719fe9e95	pipeline: add SWX headers and meta-data Add support for dynamically-defined packet headers and meta-data to the SWX pipeline. The header and meta-data format are defined by the struct type they instantiate. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	99a0ba8f4c	pipeline: add SWX pipeline output port Add output ports to the newly introduced SWX pipeline type. Each port instantiates a port type that defines the port operations, e.g. ethdev port, PCAP port, etc. The TX interface is single packet, with packet batching internally for performance. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	6e0ca01c93	pipeline: add SWX pipeline input port Add input ports to the newly introduced SWX pipeline type. Each port instantiates a port type that defines the port operations, e.g. ethdev port, PCAP port, etc. The RX interface is single packet, with packet batching internally for performance. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:07 +02:00
Cristian Dumitrescu	56492fd536	pipeline: add new SWX pipeline type Add new improved Software Switch (SWX) pipeline type that supports dynamically-defined packet headers, meta-data, actions and pipelines. Actions and pipelines are defined through instructions. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-10-01 18:43:06 +02:00
Yunjian Wang	d1ca6cfd3c	stack: fix uninitialized variable This patch fixes an issue that uninitialized 'success' is used to be compared with '0'. Coverity issue: 337676 Fixes: `3340202f59` ("stack: add lock-free implementation") Cc: stable@dpdk.org Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2020-09-30 21:08:39 +02:00
Steven Lariau	4ee8dc7f66	stack: relax pop CAS ordering Replace the store-release by relaxed for the CAS success at the end of pop. Release isn't needed, because there is not write to data that need to be synchronized. The only preceding write is when the length is decreased, but the length CAS loop already ensures the right synchronization. The situation to avoid is when a thread sees the old length but the new list, that doesn't have enough items for pop to success. But the CAS success on length before the pop loop ensures any core reads and updates the latest length, preventing this situation. The store-release is also used to make sure that the items are read before the head is updated, in order to prevent a core in pop to read an incorrect value because another core rewrites it with push. But this isn't needed, because items are read only when removed from the used list. Right after this, they are pushed to the free list, and the store-release in push makes sure the items are read before they are visible in the free list. Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Gage Eads <gage.eads@intel.com>	2020-09-30 21:08:39 +02:00
Steven Lariau	18effad9cf	stack: reload head when pop fails List head must be loaded right before continue (when failed to find the new head). Without this, one thread might keep trying and failing to pop items without ever loading the new correct head. Fixes: `7e6e609939` ("stack: add C11 atomic implementation") Cc: gage.eads@intel.com Cc: stable@dpdk.org Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Gage Eads <gage.eads@intel.com>	2020-09-30 21:08:39 +02:00
Steven Lariau	2cdfe4e577	stack: remove redundant orderings on pop The load-acquire of list->len on pop function is redundant. Only the CAS success needs to be load-acquire. It synchronizes with the store release in push, to ensure that the updated head is visible when the new length is visible. Without this, one thread in pop could see the increased length but the old list, which doesn't have enough items yet for pop to succeed. Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Gage Eads <gage.eads@intel.com>	2020-09-30 21:08:39 +02:00
Steven Lariau	9e90bc9c71	stack: remove acquire fence on push An acquire fence is used to make sure loads after the fence can observe all store operations before a specific store-release. But push doesn't read any data, except for the head which is part of a CAS operation (the items on the list are not read). So there is no need for the acquire barrier. Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Gage Eads <gage.eads@intel.com>	2020-09-30 21:08:39 +02:00
Steven Lariau	0df8e2d2c9	stack: fix inconsistent weak/strong CAS Fix cmpexchange usage of weak / strong. The generated code is the same on x86 and ARM (there is no weak cmpexchange), but the old usage was inconsistent. For push and pop update size, weak is used because cmpexchange is inside a loop. For pop update root, strong is used even though cmpexchange is inside a loop, because there may be a lot of operations to do in a loop iteration (locate the new head). Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Gage Eads <gage.eads@intel.com>	2020-09-30 21:08:39 +02:00
David Marchand	dc3cdcd69d	ethdev: fix link speed helper documentation When generating the documentation, a new warning can be seen: .../dpdk/lib/librte_ethdev/rte_ethdev.h:2441: warning: argument 'link_speed' of command @param is not found in the argument list of rte_eth_link_speed_to_str(uint32_t speed_link) .../dpdk/lib/librte_ethdev/rte_ethdev.h:2455: warning: The following parameters of rte_eth_link_speed_to_str(uint32_t speed_link) are not documented: parameter 'speed_link' Align the function prototype to its doxygen description. Fixes: `fbf931c9c3` ("ethdev: format link status text") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2020-09-29 17:43:32 +02:00
Fan Zhang	2d962bb736	vhost/crypto: fix possible TOCTOU attack This patch fixes the possible time-of-check to time-of-use (TOCTOU) attack problem by copying request data and descriptor index to local variable prior to process. Also the original sequential read of descriptors may lead to TOCTOU attack. This patch fixes the problem by loading all descriptors of a request to local buffer before processing. CVE-2020-14375 Fixes: `3bb595ecd6` ("vhost/crypto: add request handler") Cc: stable@dpdk.org Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Chenbo Xia <chenbo.xia@intel.com>	2020-09-28 13:19:13 +02:00
Fan Zhang	e15b7c0112	vhost/crypto: fix data length check This patch fixes the incorrect data length check to vhost crypto. Instead of blindly accepting the descriptor length as data length, the change compare the request provided data length and descriptor length first. The security issue CVE-2020-14374 is not fixed alone by this patch, part of the fix is done through: "vhost/crypto: fix missed request check for copy mode". CVE-2020-14374 Fixes: `3c79609fda` ("vhost/crypto: handle virtually non-contiguous buffers") Cc: stable@dpdk.org Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Chenbo Xia <chenbo.xia@intel.com>	2020-09-28 13:19:13 +02:00
Fan Zhang	409c47c7c5	vhost/crypto: fix write back source This patch fixes vhost crypto library for the incorrect source and destination buffer calculation in the copy mode. Fixes: `cd1e8f03ab` ("vhost/crypto: fix packet copy in chaining mode") Cc: stable@dpdk.org Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Chenbo Xia <chenbo.xia@intel.com>	2020-09-28 13:19:02 +02:00
Fan Zhang	b2866f4733	vhost/crypto: fix missed request check for copy mode This patch fixes the missed request check to vhost crypto copy mode. CVE-2020-14376 CVE-2020-14377 Fixes: `3bb595ecd6` ("vhost/crypto: add request handler") Cc: stable@dpdk.org Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Chenbo Xia <chenbo.xia@intel.com>	2020-09-28 13:17:08 +02:00
Fan Zhang	5677e68c05	vhost/crypto: fix descriptor deduction This patch fixes the incorrect descriptor deduction for vhost crypto. CVE-2020-14378 Fixes: `16d2e718b8` ("vhost/crypto: fix possible out of bound access") Cc: stable@dpdk.org Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Chenbo Xia <chenbo.xia@intel.com>	2020-09-28 13:16:37 +02:00
Fan Zhang	57680e3449	vhost/crypto: fix pool allocation This patch fixes the missing iv space allocation in crypto operation mempool. Fixes: `709521f4c2` ("examples/vhost_crypto: support multi-core") Cc: stable@dpdk.org Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Chenbo Xia <chenbo.xia@intel.com>	2020-09-28 13:16:37 +02:00
Maxime Coquelin	09424c3f74	vhost: fix external backends readiness Commit `d0fcc38f5f` ("vhost: improve device readiness notifications") makes the assumption that every Virtio devices are considered ready for preocessing as soon as first queue pair is configured and enabled. While this is true for Virtio-net, it isn't for Virtio-scsi and Virtio-blk. This patch fixes this by only making this assumption for the builtin Virtio-net backend, and restores back to previous behaviour for other backends. Fixes: `d0fcc38f5f` ("vhost: improve device readiness notifications") Reported-by: Changpeng Liu <changpeng.liu@intel.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-09-28 13:16:37 +02:00
Phil Yang	f5d6738c2e	ethdev: use C11 atomics for link status Since rte_atomicXX APIs are not allowed to be used, use C11 atomic builtins for link status update. Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2020-09-25 15:42:34 +02:00
Phil Yang	e623c943eb	power: use C11 atomics for power state Since rte_atomicXX APIs are not allowed to be used, use C11 atomic builtins for power in use state update. Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: David Hunt <david.hunt@intel.com>	2020-09-25 15:42:29 +02:00
Phil Yang	b5bafde26c	bbdev: use C11 atomics for device processing counter Since rte_atomicXX APIs are not allowed to be used, use C11 atomic builtins for device processing counter. Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Nicolas Chautru <nicolas.chautru@intel.com>	2020-09-25 15:37:55 +02:00
Phil Yang	c5d6c47257	eal: use C11 atomics for initialization check Since rte_atomicXX APIs are not allowed to be used, use C11 builtins to check if EAL is already initialized. Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>	2020-09-25 15:36:17 +02:00
Radu Nicolau	84fb33fec1	build: remove deprecated cpuflag macros Replace use of RTE_MACHINE_CPUFLAG macros with regular compiler macros, which are more complete than those provided by DPDK, and as such it allows new instruction sets to be leveraged without having to do extra work to set them up in DPDK. Signed-off-by: Sean Morrissey <sean.morrissey@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Acked-by: David Marchand <david.marchand@redhat.com>	2020-09-25 11:13:57 +02:00
Phil Yang	f0f5d844d1	eal: remove deprecated coherent IO memory barriers Since the 20.08 release deprecated rte_cio_mb APIs because these APIs provide the same functionality as rte_io_mb APIs on all platforms, so remove them and use rte_io_*mb instead. Signed-off-by: Phil Yang <phil.yang@arm.com> Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: David Marchand <david.marchand@redhat.com>	2020-09-23 13:40:26 +02:00
Chengchang Tang	61efaf5b62	ethdev: support getting Rx buffer size in Rx queue info Add a field named rx_buf_size in rte_eth_rxq_info to indicate the buffer size used in receiving packets for HW. In this way, upper-layer users can get this information by calling rte_eth_rx_queue_info_get. Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-09-21 18:05:38 +02:00
Ivan Dyukov	fbf931c9c3	ethdev: format link status text There is new link_speed value introduced. It's INT_MAX value which means that speed is unknown. To simplify processing of the value in application, new function is added which convert link_speed to string. Also dpdk examples have many duplicated code which format entire link status structure to text. This commit adds two functions: * rte_eth_link_speed_to_str - format link_speed to string * rte_eth_link_to_str - convert link status structure to string Signed-off-by: Ivan Dyukov <i.dyukov@samsung.com> Acked-by: Morten Brørup <mb@smartsharesystems.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-09-21 18:05:37 +02:00
Eugenio Pérez	46d3f57537	vhost: fix IOTLB mempool single-consumer flag Control thread (which handles iotlb msg) and forwarding thread both use iotlb to translate address. The former may modify the same entry of mempool and may cause a loop in iotlb_pending_entries list. Bugzilla ID: 523 Fixes: `d012d1f293` ("vhost: add IOTLB helper functions") Cc: stable@dpdk.org Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2020-09-18 18:55:12 +02:00
Chenbo Xia	671cc679a5	vhost: add device reset status vhost lib now does not have definition of reset status. This patch adds the reset status definition and changes related log. Signed-off-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2020-09-18 18:55:12 +02:00
Kiran Kumar K	333a38bb2e	ethdev: support encapsulation level for RSS offload This patch reserves 2 bits as input selection to select inner and outer encapsulation level for RSS computation. It is combined with existing ETH_RSS_* to choose inner or outer layers. This functionality already exists in rte_flow through level parameter in RSS action configuration rte_flow_action_rss. Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>	2020-09-18 18:55:12 +02:00
Haiyue Wang	976277f9ce	net: adjust header length parse size Enlarge the L3 and tunnel header length from 8-bit to 16-bit to handle the bigger headers. And reorder the fields to avoid creating a structure hole. Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>	2020-09-18 18:55:12 +02:00
Yi Yang	b9b75d9b5c	gso: fix payload unit size for UDP Fragment offset of IPv4 header is measured in units of 8 bytes. Fragment offset of UDP fragments will be wrong after GSO if pyld_unit_size isn't multiple of 8. Say pyld_unit_size is 1500, fragment offset of the second UDP fragment will be 187 (i.e. 1500 / 8), which means 1496, and it will result in 4-byte data loss (1500 - 1496 = 4). So UDP GRO will reassemble out a wrong packet. Fixes: `b166d4f30b` ("gso: support UDP/IPv4 fragmentation") Cc: stable@dpdk.org Signed-off-by: Yi Yang <yangyi01@inspur.com> Acked-by: Jiayu Hu <jiayu.hu@intel.com>	2020-09-18 18:55:12 +02:00
Andrew Rybchenko	bf3785fbd8	net: check first segment length on SW VLAN insertion SW VLAN insertion relies on Ethernet addresses location in contiguous memory (do not split across mbuf segments). There is no any formal requirements on data location and mbuf structure which guarantee it. So, check it explicitly to avoid corrupted packets if the condition is violated. Typically software VLAN insertion is done on Tx prepare stage and application will get indication that the packet is invalid and cannot be transmitted. Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-09-18 18:55:10 +02:00
Nithin Dabilpuram	bf4e0faae9	ethdev: support TM for shaper config in packet mode Some NIC hardware support shaper to work in packet mode i.e shaping or ratelimiting traffic is in packets per second (PPS) as opposed to default bytes per second (BPS). Hence this patch adds support to configure shared or private shaper in packet mode, provide rate in PPS and add related tm capabilities in port/level/node capability structures. This patch also updates tm port/level/node capability structures with exiting features of scheduler wfq packet mode, scheduler wfq byte mode and private/shared shaper byte mode. SoftNIC PMD is also updated with new capabilities. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2020-09-18 18:55:10 +02:00
Ferruh Yigit	5723fbed4f	ethdev: remove underscore prefix from internal API '_rte_eth_dev_callback_process()' & '_rte_eth_dev_reset()' internal APIs has unconventional underscore ('_') prefix. Although this is not documented most probably this is to mark them as internal. Since we have '__rte_internal' flag to mark this, removing '_' from API names. For '_rte_eth_dev_reset()', there is already a public API named 'rte_eth_dev_reset()', so renaming '_rte_eth_dev_reset()' to 'rte_eth_dev_internal_reset'. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: David Marchand <david.marchand@redhat.com> Acked-by: Sachin Saxena <sachin.saxena@nxp.com>	2020-09-18 18:55:08 +02:00
Ferruh Yigit	8682e492ed	ethdev: use hairpin helper functions Hairpin helper functions were not used by drivers, but it was used only local to ethdev. They are: 'rte_eth_dev_is_rx_hairpin_queue()' 'rte_eth_dev_is_tx_hairpin_queue()' Exposing them as internal APIs and update mlx5 driver (only user of hairpin) to use them. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: David Marchand <david.marchand@redhat.com>	2020-09-18 18:55:08 +02:00
Ferruh Yigit	cb4115cb84	ethdev: mark internal functions Some ethdev functions are for drivers only, not for applications. Since we have '__rte_internal' tag available now, marking internal functions with it and moving functions to INTERNAL section in linker script. This is also good for documenting the internal functions. Some internal APIs seems marked as experimental, but it doesn't make sense to have internals APIs as experimental, updating their tag and doxygen comments. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: David Marchand <david.marchand@redhat.com>	2020-09-18 18:55:08 +02:00
Ferruh Yigit	47909357a0	ethdev: make device operations struct private Hiding the 'struct eth_dev_ops' from applications. Removing relevant deprecation notice. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: David Marchand <david.marchand@redhat.com>	2020-09-18 18:55:08 +02:00
Ferruh Yigit	cbfc6111b5	ethdev: move inline device operations This patch is a preparation to hide the 'struct eth_dev_ops' from applications by moving some device operations from 'struct eth_dev_ops' to 'struct rte_eth_dev'. Mentioned ethdev APIs are in the data path and implemented as inline because of performance reasons. Exposing 'struct eth_dev_ops' to applications is bad because it is a contract between ethdev and PMDs, not really needs to be known by applications, also changes in the struct causing ABI breakages which shouldn't. To be able to both keep APIs inline and hide the 'struct eth_dev_ops', moving device operations used in ethdev inline APIs to 'struct rte_eth_dev' to the same level with Rx/Tx burst functions. The list of dev_ops moved: eth_rx_queue_count_t rx_queue_count; eth_rx_descriptor_done_t rx_descriptor_done; eth_rx_descriptor_status_t rx_descriptor_status; eth_tx_descriptor_status_t tx_descriptor_status; Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: David Marchand <david.marchand@redhat.com> Acked-by: Sachin Saxena <sachin.saxena@nxp.com>	2020-09-18 18:55:08 +02:00
Ferruh Yigit	fa1f5fe4d8	ethdev: deprecate descriptor status check API Marking 'rte_eth_rx_descriptor_done()' API as deprecated. ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status`` APIs can be used as replacement. Plan is to remove the API on 21.11 release. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: David Marchand <david.marchand@redhat.com>	2020-09-18 18:55:08 +02:00
Ferruh Yigit	b895a521f9	ethdev: remove redundant license text Redundant BSD-3 license text removed, the licensing already documented by "SPDX-License-Identifier: BSD-3-Clause" SPDX tag. Cc: stable@dpdk.org Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2020-09-18 18:55:08 +02:00
Nithin Dabilpuram	5dfcb68984	ethdev: mark all traffic manager API as experimental This patch marks all traffic manager API as experimental as per deprecation notice[1] and discussion[2] mentioned in following threads. [1] https://mails.dpdk.org/archives/dev/2020-May/166221.html [2] https://mails.dpdk.org/archives/dev/2020-April/165364.html Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-09-18 18:55:08 +02:00
Thomas Monjalon	810b17d116	ethdev: allow unknown link speed When querying the link information, the link status is a mandatory major information. Other boolean values are supposed to be accurate: - duplex mode (half/full) - negotiation (auto/fixed) This API update is making explicit that the link speed information is optional. The value ETH_SPEED_NUM_NONE (0) was already part of the API. The value ETH_SPEED_NUM_UNKNOWN (infinite) is added to cover two different cases: - speed is not known by the driver - device is virtual Suggested-by: Morten Brørup <mb@smartsharesystems.com> Suggested-by: Benoit Ganne <bganne@cisco.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-09-18 18:55:07 +02:00
Morten Brørup	a6f67a3b64	net: optimize ethernet address functions * rte_is_broadcast_ether_addr(): Use binary logic instead of comparisons and boolean logic, thus reducing the number of branches. It now resembles rte_is_zero_ether_addr(). * rte_ether_addr_copy(): The source code modifications were discussed on the mailing list: http://mails.dpdk.org/archives/dev/2020-June/171584.html Remove obsolete ICC-specific code and related comment. Restrict pointer aliasing (suggested by Jerin Jacob). Remove superfluous "Fast" from function description headline; all DPDK data plane functions are supposed to be fast. Signed-off-by: Morten Brørup <mb@smartsharesystems.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2020-09-18 18:55:06 +02:00
Wei Hu (Xavier)	ba2fb4f022	ethdev: check if queue setup when getting queue info This patch adds checking whether the related Tx or Rx queue has been setup in the rte_eth_rx_queue_info_get and rte_eth_tx_queue_info_get API function to avoid illegal address access. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-09-18 18:55:06 +02:00
Harry van Haaren	8929de043e	service: retrieve lcore active state This commit adds a new experimental API which allows the user to retrieve the active state of an lcore. Knowing when the service lcore is completed its polling loop can be useful to applications to avoid race conditions when e.g. finalizing statistics. The service thread itself now has a variable to indicate if its thread is active. When zero the service thread has completed its service, and has returned from the service_runner_func() function. Suggested-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>	2020-09-21 16:37:11 +02:00
David Marchand	22a2f54f25	eal: hide internal device event structure This structure is not used in the public API. Fixes: `a753e53d51` ("eal: add device event monitor framework") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Jeff Guo <jia.guo@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2020-09-21 10:12:10 +02:00
David Marchand	e200535c1c	mem: drop mapping API workaround Now that the pci_map_resource API is private to the PCI bus, we can drop the compatibility workaround we had implemented in 20.08. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2020-09-21 10:12:10 +02:00
David Marchand	e1ece60956	pci: move resource mapping to the PCI bus As reported during 20.08 work for Windows, the pci_map_resource API was built with the assumption that its flags would be passed to mmap(). This introduced a regression when adding the rte_mem_map API as reported in the workaround commit `9d2b245937` ("pci: keep API compatibility with mmap values"). This API was only used in the PCI bus code, so move it there. There is no code change happening during the move. The only change is in the pci_map_resource description where the additional flags are now documented as rte_mem_map API flags: - * The additional flags for the mapping range. + * The additional rte_mem_map() flags for the mapping range. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2020-09-21 10:12:10 +02:00
David Marchand	7c0d798aab	bus/pci: switch to private kernel driver enum The rte_kernel_driver enum actually only pointed at PCI drivers and is only used in the PCI subsystem. Remove it from the generic device API and use a private enum in the PCI code. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2020-09-21 10:11:44 +02:00
David Marchand	dbd3809261	ethdev: remove unused kernel driver field This field was not generic as it was filled with PCI kernel drivers only. It has no known in-tree user (and I could not find opensource projects using it). Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2020-09-21 09:30:36 +02:00
Thomas Monjalon	4be717272e	mbuf: remove physical address alias Remove the deprecated buf_physaddr union field from rte_mbuf. It is replaced with buf_iova which is at the same offset. The single field buf_physaddr in rte_kni_mbuf is also renamed. This concludes a 3-year process of semantic change. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2020-09-19 00:25:37 +02:00
Thomas Monjalon	ce627d633b	mbuf: remove deprecated function and macro aliases Remove the deprecated functions - rte_mbuf_data_dma_addr - rte_mbuf_data_dma_addr_default which aliased the more recent functions - rte_mbuf_data_iova - rte_mbuf_data_iova_default Remove the deprecated macros - rte_pktmbuf_mtophys - rte_pktmbuf_mtophys_offset which aliased the more recent macros - rte_pktmbuf_iova - rte_pktmbuf_iova_offset Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2020-09-19 00:25:37 +02:00
Thomas Monjalon	28e3c8b286	mempool: remove physical address aliases Remove the deprecated unioned fields physaddr and phys_addr from the structures rte_mempool_objhdr and rte_mempool_memhdr. They are replaced with the fields iova which are at the same offsets. Remove the deprecated macro MEMPOOL_F_NO_PHYS_CONTIG which is an alias of the more recent MEMPOOL_F_NO_IOVA_CONTIG. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2020-09-19 00:25:37 +02:00
Thomas Monjalon	72f82c4324	mem: remove physical address aliases Remove the deprecated unioned fields phys_addr from the structures rte_memseg and rte_memzone. They are replaced with the fields iova which are at the same offsets. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2020-09-19 00:25:35 +02:00
Pawel Wodkowski	c23432a9c2	trace: fix C++ compilation trace_mem is declared as 'void ' which triggers following error: '...invalid conversion from ‘void’ to ‘__rte_trace_header’ [-fpermissive]...' Fix this by adding proper typecast to 'struct __rte_trace_header '. Fixes: `ebaee64097` ("trace: simplify trace point headers") Cc: stable@dpdk.org Signed-off-by: Pawel Wodkowski <pawelwod@gmail.com> Signed-off-by: Sunil Kumar Kori <skori@marvell.com> Acked-by: Nicolas Chautru <nicolas.chautru@intel.com>	2020-09-17 11:13:44 +02:00
Conor Walsh	dc18be1d8b	bpf: promote library as stable The BPF lib was introduced in 18.05. There were no changes in its public API since 19.11. It should be mature enough to remove its 'experimental' tag. RTE_BPF_XTYPE_NUM is also being dropped from rte_bpf_xtype to avoid possible ABI problems in the future. Signed-off-by: Conor Walsh <conor.walsh@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-09-16 18:52:55 +02:00
Stephen Hemminger	f484baeeb6	log: hide internal variable As announced in earlier releases, rte_logs can now be made internal to EAL. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Signed-off-by: David Marchand <david.marchand@redhat.com>	2020-09-16 18:37:11 +02:00
Stephen Hemminger	2adc121ef9	log: promote rte_log_get_stream as stable Applications will need to use this API now to get internal state of rte_log. Suggested-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David Marchand <david.marchand@redhat.com>	2020-09-16 18:37:11 +02:00
Stephen Hemminger	64201f1059	eal/linux: change udev debug message The debug message was poorly worded and did not include the part that would be useful. I.e it never said what was being ignored. Change it to print the message so that if udev changes format or other subsystems need to be added then the necessary information will be in the debug log. Fixes: `0d0f478d04` ("eal/linux: add uevent parse and process") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Jeff Guo <jia.guo@intel.com>	2020-09-16 18:14:35 +02:00
Phil Yang	e41d27a68d	mbuf: remove atomic reference counters Remove the deprecated refcnt_atomic union fields in rte_mbuf and rte_mbuf_ext_shared_info structures. Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2020-09-15 11:29:23 +02:00
Bruce Richardson	8c658761de	rawdev: mark start and stop functions optional Not all rawdevs will require a device start/stop function, so rather than requiring such drivers to provide dummy functions, just set the started/stopped rawdev flag from the rawdev layer and return success. Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Nipun Gupta <nipun.gupta@nxp.com>	2020-09-11 11:51:16 +02:00
Bruce Richardson	13f8e4a27e	rawdev: allow queue config query to return error The driver APIs for returning the queue default config can fail if the parameters are invalid, or other reasons, so allow them to return error codes to the rawdev layer and from hence to the app. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Nipun Gupta <nipun.gupta@nxp.com>	2020-09-11 11:51:15 +02:00
Bruce Richardson	f574ed8116	rawdev: add private data size to queue config inputs The queue setup and queue defaults query functions take a void * parameter as configuration data, preventing any compile-time checking of the parameters and limiting runtime checks. Adding in the length of the expected structure provides a measure of typechecking, and can also be used for ABI compatibility in future, since ABI changes involving structs almost always involve a change in size. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Nipun Gupta <nipun.gupta@nxp.com>	2020-09-11 11:51:03 +02:00
Bruce Richardson	8db9dce72d	rawdev: add private data size to config inputs Currently with the rawdev API there is no way to check that the structure passed in via the dev_private pointer in the structure passed to configure API is of the correct type - it's just checked that it is non-NULL. Adding in the length of the expected structure provides a measure of typechecking, and can also be used for ABI compatibility in future, since ABI changes involving structs almost always involve a change in size. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Rosen Xu <rosen.xu@intel.com> Acked-by: Nipun Gupta <nipun.gupta@nxp.com>	2020-09-11 11:50:55 +02:00
Bruce Richardson	f150dd8839	rawdev: allow drivers to return error from info query Since we now allow some parameter checking inside the driver info_get() functions, it makes sense to allow error return from those functions to the caller. Therefore we change the driver callback return type from void to int. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Rosen Xu <rosen.xu@intel.com> Acked-by: Nipun Gupta <nipun.gupta@nxp.com>	2020-09-11 11:50:54 +02:00
Bruce Richardson	10b71caecb	rawdev: add private data size to info query Currently with the rawdev API there is no way to check that the structure passed in via the dev_private pointer in the dev_info structure is of the correct type - it's just checked that it is non-NULL. Adding in the length of the expected structure provides a measure of typechecking, and can also be used for ABI compatibility in future, since ABI changes involving structs almost always involve a change in size. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Rosen Xu <rosen.xu@intel.com> Acked-by: Rosen Xu <rosen.xu@intel.com> Acked-by: Nipun Gupta <nipun.gupta@nxp.com>	2020-09-11 11:50:53 +02:00
Fady Bader	790defbdb6	ethdev: build on Windows Add ethdev and a missing dependency (meter) to the list of libraries built on Windows. Signed-off-by: Fady Bader <fady@mellanox.com> Acked-by: Narcisa Vasile <navasile@linux.microsoft.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Acked-by: Ranjit Menon <ranjit.menon@intel.com>	2020-09-11 01:55:39 +02:00
Fady Bader	0e9e9e7548	ethdev: remove structs from export map Some ethdev structs were present in .map export list. There structs are removed from the .map file. Signed-off-by: Fady Bader <fady@mellanox.com> Acked-by: Narcisa Vasile <navasile@linux.microsoft.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Acked-by: Ranjit Menon <ranjit.menon@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2020-09-11 01:55:39 +02:00
Fady Bader	5ae0d39d4a	telemetry: build stubs on Windows Telemetry didn't compile under Windows. Empty stubs are arranged, waiting for a proper implementation. Signed-off-by: Fady Bader <fady@mellanox.com> Acked-by: Narcisa Vasile <navasile@linux.microsoft.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Acked-by: Ranjit Menon <ranjit.menon@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2020-09-11 01:55:35 +02:00
Fady Bader	fc5ae478f8	eal/windows: update symbols export The .def file is a reduced copy of the .map file. In order to ease comparison, some lines are moved in the .def file to be in the same order as in the .map file. rte_eal_get_configuration is removed because it has been removed from the .map file in DPDK 19.11. Note: it had been removed and re-added by mistake in 20.08 .def file. Few functions are added in the .def file to allow ethdev on Windows. Signed-off-by: Fady Bader <fady@mellanox.com> Acked-by: Narcisa Vasile <navasile@linux.microsoft.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Acked-by: Ranjit Menon <ranjit.menon@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2020-09-11 01:40:36 +02:00
Fady Bader	f5192f9162	eal/windows: add stub for Rx interrupt control Interrupts are not implemented for Windows. In order to compile ethdev on Windows, an empty interrupt control function stub has to be added for Windows. Signed-off-by: Fady Bader <fady@mellanox.com> Acked-by: Narcisa Vasile <navasile@linux.microsoft.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Acked-by: Ranjit Menon <ranjit.menon@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2020-09-11 01:38:26 +02:00
Fady Bader	16f0d03098	net: build on Windows librte_net was not compiling under Windows. To solve this, needed header files are added. Signed-off-by: Fady Bader <fady@mellanox.com> Acked-by: Ranjit Menon <ranjit.menon@intel.com> Tested-by: Pallavi Kadam <pallavi.kadam@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2020-09-10 21:53:48 +02:00
Fady Bader	64fb21d86b	net: replace htons with constant endian swap htons is not defined in Windows with the MinGW compiler. htons is replaced with RTE_BE16 in order to compile under Windows. Signed-off-by: Fady Bader <fady@mellanox.com> Acked-by: Ranjit Menon <ranjit.menon@intel.com> Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>	2020-09-10 21:52:28 +02:00
Fady Bader	507e1ca07b	net: fix redefinition in Windows In Windows, s_addr is defined in winsock2.h which is included by windows.h. It is undefined in order to be defined as part of rte_ether_hdr. Signed-off-by: Fady Bader <fady@mellanox.com> Acked-by: Ranjit Menon <ranjit.menon@intel.com> Tested-by: Pallavi Kadam <pallavi.kadam@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2020-09-10 21:52:04 +02:00
Tal Shnaiderman	a4235b781f	eal/windows: probe vdev Add needed function calls in rte_eal_init to detect vdev PMD. eal_option_device_parse() rte_service_init() rte_bus_probe() Signed-off-by: Tal Shnaiderman <talshn@mellanox.com> Acked-by: Narcisa Vasile <navasile@linux.microsoft.com> Tested-by: Narcisa Vasile <navasile@linux.microsoft.com> Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>	2020-09-09 14:40:41 +02:00
Tal Shnaiderman	ec6d514636	bus/vdev: build on Windows current support will build vdev with empty MP functions currently unsupported for Windows. Signed-off-by: Tal Shnaiderman <talshn@mellanox.com> Acked-by: Narcisa Vasile <navasile@linux.microsoft.com> Tested-by: Narcisa Vasile <navasile@linux.microsoft.com> Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>	2020-09-09 14:39:37 +02:00
Ciara Power	ec260aa3ad	config: remove default configs used with make Make is not supported for compiling DPDK, the config files are no longer needed. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2020-09-08 00:11:30 +02:00
Ciara Power	3cc6ecfdfe	build: remove makefiles A decision was made [1] to no longer support Make in DPDK, this patch removes all Makefiles that do not make use of pkg-config, along with the mk directory previously used by make. [1] https://mails.dpdk.org/archives/dev/2020-April/162839.html Signed-off-by: Ciara Power <ciara.power@intel.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2020-09-08 00:09:50 +02:00
Thomas Monjalon	4f86c0ba19	version: 20.11-rc0 Start a new release cycle with empty release notes. The ABI version becomes 21.0. The ABI major is back to normal, having only one number (21 vs 20.0). The map files are updated to the new ABI major number (21). The ABI exceptions are dropped. Travis ABI check is disabled because compatibility is not preserved. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2020-08-12 11:32:16 +02:00
Stephen Hemminger	156055da95	ethdev: improve API comment for MAC address addition The comment used the term whitelist and was awkardly written. Replace it with simpler direct description of adding a new address. No code or API changes for this. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Luca Boccassi <bluca@debian.org> Acked-by: John McNamara <john.mcnamara@intel.com>	2020-08-07 13:02:10 +02:00
Stephen Hemminger	95a2e18dfb	kni: fix reference to master/slave process In DPDK, the correct terms for process are primary/secondary. This is bugfix, not a change in terms for new release. Fixes: `f2e7592c47` ("kni: fix multi-process support") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2020-08-07 13:01:54 +02:00
Thomas Monjalon	2fca871ce7	ethdev: remove device-specific comments from VLAN API Some confusing comments were still present from old days, when most drivers were from Intel. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-08-05 20:01:49 +02:00
Patrick Fu	6563cf9238	vhost: fix async copy on multi-page buffers Async copy fails when single ring buffer vector is split on multiple physical pages. This happens because current hpa address translation function doesn't handle multi-page buffers. A new gpa to hpa address conversion function, which returns the hpa on the first hitting host pages, is implemented in this patch. Async data path recursively calls this new function to construct a multi-segments async copy descriptor for ring buffers crossing physical page boundaries. Fixes: `cd6760da10` ("vhost: introduce async enqueue for split ring") Signed-off-by: Patrick Fu <patrick.fu@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2020-07-30 00:41:24 +02:00
Maxime Coquelin	b53a497294	vhost: fix guest notification setting If rte_vhost_enable_guest_notification is called before the virtqueue is ready, the configuration is lost. This patch fixes this by saving the guest notification enablement value requested by the application, and apply it before the virtqueue is made ready to the application. Fixes: `604052ae53` ("net/vhost: support queue update") Reported-by: Yinan Wang <yinan.wang@intel.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Yinan Wang <yinan.wang@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-07-30 00:41:24 +02:00
Yuying Zhang	4a67e71816	net: fix IPv6 checksum with TSO The ol_flags check lacks of flag for IPv6 which causes checksum flag configuration error while IPv6/TCP TSO packet is sent. This patch fixes the issue by adding PKT_TX_TCP_SEG flag. The rte_net_intel_cksum_flags_prepare() function prepares the pseudo header checksum in packet data when doing checksum or TSO offload. Fixes: `520059a41a` ("net: check fragmented headers in non-debug as well") Signed-off-by: Yuying Zhang <yuying.zhang@intel.com> Tested-by: Xi Zhang <xix.zhang@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2020-07-30 00:41:24 +02:00
Patrick Fu	819a716858	vhost: fix async callback return type The async copy device callbacks are used by async APIs to transfer data and check completion status. Async APIs return the number of packets successfully processed to the caller applications and no error (negative) value is allowed for API return value. Thus, negative return values from async device callbacks don't have meaningful usage, while adding overhead in checking the return value validity. This patch change the callback return values from "int" to "uint32_t" to get aligned with async API definition. Fixes: `78639d5456` ("vhost: introduce async enqueue registration API") Signed-off-by: Patrick Fu <patrick.fu@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2020-07-30 00:41:23 +02:00
Parav Pandit	21587b4921	eal: introduce macro for bit definition There are several drivers which duplicate bit generation macro. Introduce a generic bit macros so that such drivers avoid redefining same in multiple drivers. Signed-off-by: Parav Pandit <parav@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com> Acked-by: Morten Brørup <mb@smartsharesystems.com>	2020-07-28 18:27:46 +02:00
Yunjian Wang	a5f803c804	hash: fix out-of-memory handling in hash creation The function rte_zmalloc_socket() could return NULL, the return value need to be checked. Fixes: `5915699153` ("hash: fix scaling by reducing contention") Cc: stable@dpdk.org Reported-by: Bin Huang <brian.huangbin@huawei.com> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Yipeng Wang <yipeng1.wang@intel.com>	2020-07-27 12:53:40 +02:00
Anatoly Burakov	8b7b02f945	power: fix environment detection Anything coming from sysfs has a newline at the end. Cut it off before comparing the strings. Fixes: `20ab67608a` ("power: add environment capability probing") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: David Hunt <david.hunt@intel.com> Tested-by: Lihong Ma <lihongx.ma@intel.com> Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>	2020-07-22 01:35:39 +02:00
Zhike Wang	9dbe628a7b	mempool: fix allocation in memzone during retry If allocation is successful on the first attempt, typically there is no problem since we allocated everything required and we'll terminate the loop (if memory chunk is really sufficient to populate required number of mempool elements). If the first attempt fails, we try to allocate half of mem_size and it succeed, we'll have one more iteration of the for-loop to allocate memory for remaining elements and should not try the next time with quarter of the mem_size. It is wrong that max_alloc_size is divided by 2 in the case of successful allocation as well, or invalid memory can be allocated, and leads to population failure, then errno other than ENOMEM may be returned. Fixes: `3a3d0c75b4` ("mempool: fix slow allocation of large pools") Cc: stable@dpdk.org Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Signed-off-by: Zhike Wang <wangzhike@jd.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2020-07-22 01:27:10 +02:00
Nithin Dabilpuram	5b2655a693	node: add packet classifier This node classifies pkts based on packet type and sends them to appropriate next node. This is node helps in distribution of packets from ethdev_rx node to different next node with a constant overhead for all packet types. Currently all except non fragmented IPV4 packets are marked to be sent to "pkt_drop" node. Performance difference on ARM64 Octeontx2 is -4.9% due to addition of new node in the path. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>	2020-07-22 01:18:59 +02:00
Raslan Darawsheh	e4f9eab7d9	net: fix pedantic build when trying to compile rte_mpls with pedantic enabled, on old compilers like 4.8 it will complain about bit field definition. error: type of bit-field 'bs' is a GCC extension [-Werror=pedantic] error: type of bit-field 'tc' is a GCC extension [-Werror=pedantic] error: type of bit-field 'tag_lsb' is a GCC extension [-Werror=pedantic] This fixes the compilation error by adding extension to the header definition. Fixes: `e480cf487a` ("net: add MPLS header structure") Cc: stable@dpdk.org Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-07-21 17:36:54 +02:00
Patrick Fu	a608436b63	vhost: fix double-free with zero-copy zmbufs should be set to NULL when getting freed to avoid double free on the same buffer pointer Fixes: `b0a985d1f3` ("vhost: add dequeue zero copy") Cc: stable@dpdk.org Signed-off-by: Patrick Fu <patrick.fu@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2020-07-21 16:55:30 +02:00
Patrick Fu	47958f7cbf	vhost: fix async completion of multi-seg packets In async enqueue copy, a packet could be split into multiple copy segments. When polling the copy completion status, current async data path assumes the async device callbacks are aware of the packet boundary and return completed segments only if all segments belonging to the same packet are done. Such assumption are not generic to common async devices and may degrade the copy performance if async callbacks have to implement it in software manner. This patch adds tracking of the completed copy segments at vhost side. If async copy device reports partial completion of a packets, only vhost internal record is updated and vring status keeps unchanged until remaining segments of the packet are also finished. The async copy device is no longer necessary to care about the packet boundary. Fixes: `cd6760da10` ("vhost: introduce async enqueue for split ring") Signed-off-by: Patrick Fu <patrick.fu@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2020-07-21 16:54:58 +02:00
Patrick Fu	5c7ddd6b14	vhost: fix missing virtqueue status check in async path Vring should not be touched if vq is disabled. This patch adds the vq status check in async enqueue polling to avoid accessing to a disabled queue. Fixes: `cd6760da10` ("vhost: introduce async enqueue for split ring") Signed-off-by: Patrick Fu <patrick.fu@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2020-07-21 16:50:29 +02:00
Patrick Fu	6a82bceb56	vhost: fix missing device pointer validity check This patch adds the check of dev pointer in vhost async enqueue completion poll. If a NULL dev pointer detected, the poll function returns immediately. Coverity issue: 360839 Fixes: `cd6760da10` ("vhost: introduce async enqueue for split ring") Signed-off-by: Patrick Fu <patrick.fu@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2020-07-21 16:50:29 +02:00
Andrew Rybchenko	520059a41a	net: check fragmented headers in non-debug as well Pseudo-header checksum calculation requires contiguous headers. There is no any formal requirements on data location and mbuf structure which could be used by the application. Since commit `dfc6b2fd8d` ("mbuf: remove Intel offload checks from generic API") fragmented headers checks are done inside rte_net_intel_cksum_flags_prepare() in RTE_LIBRTE_ETHDEV_DEBUG build because it is moved from rte_validate_tx_offload() which is called under debug only. Make corresponding check to be done in non-debug build as well to avoid bad accesses, incorrect checksum calculation and to return appropriate error from Tx prepare. Make no-offloads check more precise and do it in non-debug build as well to avoid contiguous headers check and Tx prepare failure if it is not actually required. Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2020-07-21 13:54:54 +02:00
Ruifeng Wang	4cdd49f9b0	lpm: report error when defer queue overflows Coverity complains about unchecked return value of rte_rcu_qsbr_dq_enqueue. By default, defer queue size is big enough to hold all tbl8 groups. When enqueue fails, return error to the user to indicate system issue. Coverity issue: 360832 Fixes: `8a9f8564e9` ("lpm: implement RCU rule reclamation") Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2020-07-21 20:48:40 +02:00
Phil Yang	db48bae253	mbuf: use C11 atomic builtins for refcnt Use C11 atomic builtins with explicit ordering instead of rte_atomic ops which enforce unnecessary barriers on aarch64. Suggested-by: Olivier Matz <olivier.matz@6wind.com> Suggested-by: Dodji Seketeli <dodji@redhat.com> Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2020-07-21 10:30:35 +02:00
Ciara Power	2a7d0b872f	telemetry: add upper limit on connections This patch limits the number of client connections to the new telemetry socket. The limit is set to 10. Signed-off-by: Ciara Power <ciara.power@intel.com>	2020-07-19 15:36:37 +02:00
Phil Yang	672a150563	eal: add wrapper for C11 atomic thread fence Provide a wrapper for __atomic_thread_fence builtins to support optimized code for __ATOMIC_SEQ_CST memory order for x86 platforms. Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-17 16:00:30 +02:00
Ciara Power	9683022930	metrics: fix header installation with meson If Jansson was found, the headers list is overwritten when including rte_metrics_telemetry.h, which prevents rte_metrics.h from being installed. This is now fixed to add to headers, rather than overwrite, to allow both headers be installed when Jansson is present. Fixes: `c5b7197f66` ("telemetry: move some functions to metrics library") Cc: stable@dpdk.org Signed-off-by: Ciara Power <ciara.power@intel.com> Acked-by: David Marchand <david.marchand@redhat.com>	2020-07-17 16:00:30 +02:00
Honnappa Nagarahalli	8831678b51	eal: change the log level for test asserts Change the log level for RTE_TEST_ASSERT macro to error to help log errors while running test cases. Suggested-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>	2020-07-17 10:47:56 +02:00
Ferruh Yigit	353162537f	lpm: fix build dependency on RCU library 'librte_rcu' is now dependency to 'librte_lpm' library, this dependency should be reflected to build system. Fixes: `8a9f8564e9` ("lpm: implement RCU rule reclamation") Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2020-07-15 13:15:06 +02:00
Bing Zhao	d164c609e7	ethdev: add eCPRI key fields to flow API Add a new item "rte_flow_item_ecpri" in order to match eCRPI header. eCPRI is a packet based protocol used in the fronthaul interface of 5G networks. Header format definition could be found in the specification via the link below: https://www.gigalight.com/downloads/standards/ecpri-specification.pdf eCPRI message can be over Ethernet layer (.1Q supported also) or over UDP layer. Message header formats are the same in these two variants. Signed-off-by: Bing Zhao <bingz@mellanox.com> Acked-by: Ori Kam <orika@mellanox.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2020-07-13 02:11:30 +02:00
Renata Saiakhova	c9c74288f0	ethdev: add function to release HW rings Free previously allocated memzone for HW rings Signed-off-by: Renata Saiakhova <renata.saiakhova@ekinops.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2020-07-11 06:18:54 +02:00
Viacheslav Ovsiienko	9da82e8d8b	mbuf: introduce accurate packet Tx scheduling There is the requirement on some networks for precise traffic timing management. The ability to send (and, generally speaking, receive) the packets at the very precisely specified moment of time provides the opportunity to support the connections with Time Division Multiplexing using the contemporary general purpose NIC without involving an auxiliary hardware. For example, the supporting of O-RAN Fronthaul interface is one of the promising features for potentially usage of the precise time management for the egress packets. The main objective of this patchset is to specify the way how applications can provide the moment of time at what the packet transmission must be started and to describe in preliminary the supporting this feature from mlx5 PMD side [1]. The new dynamic timestamp field is proposed, it provides some timing information, the units and time references (initial phase) are not explicitly defined but are maintained always the same for a given port. Some devices allow to query rte_eth_read_clock() that will return the current device timestamp. The dynamic timestamp flag tells whether the field contains actual timestamp value. For the packets being sent this value can be used by PMD to schedule packet sending. The device clock is opaque entity, the units and frequency are vendor specific and might depend on hardware capabilities and configurations. If might (or not) be synchronized with real time via PTP, might (or not) be synchronous with CPU clock (for example if NIC and CPU share the same clock source there might be no any drift between the NIC and CPU clocks), etc. After PKT_RX_TIMESTAMP flag and fixed timestamp field supposed deprecation and obsoleting, these dynamic flag and field might be used to manage the timestamps on receiving datapath as well. Having the dedicated flags for Rx/Tx timestamps allows applications not to perform explicit flags reset on forwarding and not to promote received timestamps to the transmitting datapath by default. The static PKT_RX_TIMESTAMP is considered as candidate to become the dynamic flag and this move should be discussed. When PMD sees the "rte_dynfield_timestamp" set on the packet being sent it tries to synchronize the time of packet appearing on the wire with the specified packet timestamp. If the specified one is in the past it should be ignored, if one is in the distant future it should be capped with some reasonable value (in range of seconds). These specific cases ("too late" and "distant future") can be optionally reported via device xstats to assist applications to detect the time-related problems. There is no any packet reordering according timestamps is supposed, neither within packet burst, nor between packets, it is an entirely application responsibility to generate packets and its timestamps in desired order. The timestamps can be put only in the first packet in the burst providing the entire burst scheduling. PMD reports the ability to synchronize packet sending on timestamp with new offload flag: This is palliative and might be replaced with new eth_dev API about reporting/managing the supported dynamic flags and its related features. This API would break ABI compatibility and can't be introduced at the moment, so is postponed to 20.11. For testing purposes it is proposed to update testpmd "txonly" forwarding mode routine. With this update testpmd application generates the packets and sets the dynamic timestamps according to specified time pattern if it sees the "rte_dynfield_timestamp" is registered. The new testpmd command is proposed to configure sending pattern: set tx_times <burst_gap>,<intra_gap> <intra_gap> - the delay between the packets within the burst specified in the device clock units. The number of packets in the burst is defined by txburst parameter <burst_gap> - the delay between the bursts in the device clock units As the result the bursts of packet will be transmitted with specific delays between the packets within the burst and specific delay between the bursts. The rte_eth_read_clock is supposed to be engaged to get the current device clock value and provide the reference for the timestamps. [1] http://patches.dpdk.org/patch/73714/ Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-07-11 06:18:54 +02:00
Ivan Malov	5cf04fd15a	net: use named constants for deprecated QinQ TPIDs Add named constants for deprecated QinQ TPIDs. Update drivers which have already been using existing TPID named constants from librte_net to use the new named constants rather than magic numbers. Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-07-11 06:18:53 +02:00
Junfeng Guo	d9a8bc6570	ethdev: add RSS types for IPv6 prefix This patch defines new RSS offload types for IPv6 prefix with 32, 40, 48, 56, 64, 96 bits of both SRC and DST IPv6 address. Ref https://tools.ietf.org/html/rfc6052. Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-07-11 06:18:53 +02:00
Wei Hu (Xavier)	50ce3e7aec	ethdev: fix VLAN offloads set if no relative capabilities Currently, there is a potential problem that calling the API function rte_eth_dev_set_vlan_offload to start VLAN hardware offloads which the driver does not support. If the PMD driver does not support certain VLAN hardware offloads and does not check for it, the hardware setting will not change, but the VLAN offloads in dev->data->dev_conf.rxmode.offloads will be turned on. It is supposed to check the hardware capabilities to decide whether the relative callback needs to be called just like the behavior in the API function named rte_eth_dev_configure. And it is also needed to cleanup duplicated checks which are done in some PMDs. Also, note that it is behaviour change for some PMDs which simply ignore (with error/warning log message) unsupported VLAN offloads, but now it will fail. Fixes: `a4996bd89c` ("ethdev: new Rx/Tx offloads API") Fixes: `0ebce6129b` ("net/dpaa2: support new ethdev offload APIs") Fixes: `f9416bbafd` ("net/enic: remove VLAN filter handler") Fixes: `4f7d9e383e` ("fm10k: update vlan offload features") Fixes: `fdba3bf15c` ("net/hinic: add VLAN filter and offload") Fixes: `b96fb2f0d2` ("net/i40e: handle QinQ strip") Fixes: `d4a27a3b09` ("nfp: add basic features") Fixes: `56139e85ab` ("net/octeontx: support VLAN filter offload") Fixes: `ba1b3b081e` ("net/octeontx2: support VLAN offloads") Fixes: `d87246a437` ("net/qede: enable and disable VLAN filtering") Cc: stable@dpdk.org Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Hyong Youb Kim <hyonkim@cisco.com> Acked-by: Sachin Saxena <sachin.saxena@nxp.com> Acked-by: Xiaoyun Wang <cloud.wangxiaoyun@huawei.com> Acked-by: Harman Kalra <hkalra@marvell.com> Acked-by: Jeff Guo <jia.guo@intel.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-07-11 06:18:53 +02:00
Wei Hu (Xavier)	36fbaaf30d	ethdev: fix data room size verification in Rx queue setup In the rte_eth_rx_queue_setup API function, the local variable named mbp_buf_size, which is the data room size of the input parameter mp, is checked to guarantee that each memory chunk used for net device in the mbuf is bigger than the min_rx_bufsize. But if mbp_buf_size is less than RTE_PKTMBUF_HEADROOM, the value of the following statement will be a large number since the mbp_buf_size is a unsigned value. mbp_buf_size - RTE_PKTMBUF_HEADROOM As a result, it will cause a segment fault in this situation. This patch fixes it by modify the check condition to guarantee that the local variable named mbp_buf_size is bigger than RTE_PKTMBUF_HEADROOM. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Sachin Saxena <sachin.saxena@nxp.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-07-11 06:18:53 +02:00
Ferruh Yigit	cacd2bb786	ethdev: verify reserved HW ring Function 'rte_eth_dma_zone_reserve()' returns an existing memzone based on name match, but other requested attributes are discarded. This may cause driver using a memzone with wrong size or alignment. Verify size, alignment and socket_id for matched memzone, and do not use memzone if any one of the attributes are not justified. It is possible to free the existing memzone and allocate again with the requested attributes but it is better caller do the explicit free. Reported-by: Renata Saiakhova <renata.saiakhova@ekinops.com> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2020-07-11 06:18:52 +02:00
Adrian Moreno	2025f4fe6c	vhost: support virtio status message This patch adds support to the new Virtio device get status Vhost-user message. The driver can send this new message to read the device status. One of the uses of this message is to ensure the feature negotiation has succeeded. According to the virtio spec, after completing the feature negotiation, the driver sets the FEATURE_OK status bit and re-reads it to ensure the device has accepted the features. This patch also clears the FEATURE_OK status bit if the feature negotiation has failed to let the driver know about his failure. Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-07-11 06:18:52 +02:00
Maxime Coquelin	41d201804c	vhost: support virtio status This patch adds support to the new Virtio device status Vhost-user protocol feature. Getting such information in the backend helps to know when the driver is done with the device configuration and so makes the initialization phase more robust. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-07-11 06:18:52 +02:00
Maxime Coquelin	a15f9dbba0	vhost: check vDPA configuration succeed This patch checks whether vDPA device configuration succeed and does not set the CONFIGURED flag if it didn't. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-07-11 06:18:52 +02:00
Maxime Coquelin	b46a99c600	vhost: make some vDPA callbacks mandatory Some of the vDPA callbacks have to be implemented for vDPA to work properly. This patch marks them as mandatory in the API doc and simplify code calling these ops with removing unnecessary checks that are now done at registration time. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-07-11 06:18:52 +02:00
Maxime Coquelin	2ab58f20db	vhost: refactor virtio ready check This patch is a small refactoring, as preliminary work for adding support to Virtio status support. No functional change here. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-07-11 06:18:52 +02:00
Maxime Coquelin	1c3df72bda	vhost: fix virtio ready flag check Before checking whether the device is ready is done a check on whether the RUNNING flag is set. Then the READY flag is set if virtio_is_ready() returns true. While it seems to not cause any issue, it makes more sense to check whether the READY flag is set and not the RUNNING one. Fixes: `c0674b1bc8` ("vhost: move the device ready check at proper place") Cc: stable@dpdk.org Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-07-11 06:18:52 +02:00
David Marchand	7d1af09e98	eal/linux: truncate thread name pthread_setname_np refuses names larger than 16 bytes (\0 included). Rather than return an error, truncate the name to this limit in the rte_thread_setname helper. Caught with ixgbe which creates control thread with name "ixgbe-link-handler": Configuring Port 0 (socket 0) EAL: Cannot set name for ctrl thread ... EAL: Cannot set name for ctrl thread Port 0: link state change event ... EAL: Cannot set name for ctrl thread Port 0: link state change event Note: before this change, the thread would keep its original name, which meant in my test for the ixgbe handler either "dpdk-testpmd" or "eal-intr-thread". Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2020-07-11 15:03:47 +02:00
Ruifeng Wang	0f392d91b9	lpm: hide defer queue handle There is no need to return the defer queue handle in rte_lpm_rcu_qsbr_add, since enough flexibility has been provided to configure the defer queue. Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>	2020-07-11 14:35:04 +02:00
Anatoly Burakov	20ab67608a	power: add environment capability probing Currently, there is no way to know if the power management env is supported without trying to initialize it. The init API also does not distinguish between failure due to some error and failure due to power management not being available on the platform in the first place. Thus, add an API that provides capability of probing support for a specific power management API. Suggested-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2020-07-11 13:31:16 +02:00
Thomas Monjalon	9d2b245937	pci: keep API compatibility with mmap values The function pci_map_resource() returns MAP_FAILED in case of error. When replacing the call to mmap() by rte_mem_map(), the error code became NULL, breaking the API. This function is probably not used outside of DPDK, but it is still a problem for two reasons: - the deprecation process was not followed - the Linux function pci_vfio_mmap_bar() is broken for i40e The error code is reverted to the Unix value MAP_FAILED. Windows needs to define this special value (-1 as in Unix). After proper deprecation process, the API could be changed again if really needed. Because of the switch from mmap() to rte_mem_map(), another part of the API was changed: "int additional_flags" are defined as "additional flags for the mapping range" without mentioning it was directly used in mmap(). Currently it is directly used in rte_mem_map(), that's why the values rte_map_flags must be mapped (sic) on the mmap ones in case of Unix OS. These are side effects of a badly defined API using Unix values. Bugzilla ID: 503 Fixes: `2fd3567e54` ("pci: use OS generic memory mapping functions") Reported-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Tested-by: Lihong Ma <lihongx.ma@intel.com>	2020-07-11 11:48:13 +02:00
Harman Kalra	3596a037ab	eal: fix parentheses in alignment macros Found an issue while using RTE_ALIGN_MUL_NEAR with an expression, like as passed in estimate_tsc_freq(). RTE_ALIGN_MUL_FLOOR resulted in unexpected value as parathesis are required to evaluate an expression. Fixes: `5120203d75` ("eal: add macros to align value to multiple") Cc: stable@dpdk.org Signed-off-by: Harman Kalra <hkalra@marvell.com>	2020-07-11 11:41:33 +02:00
Dmitry Kozlyuk	7daf5bdb0f	eal/windows: detect insufficient privileges for hugepages AdjustTokenPrivileges() succeeds even if no requested privileges have been granted; this behavior is documented. Check last error code in addition to return value to detect such case. Make error messages more specific and add troubleshooting hint. Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Acked-by: Ranjit Menon <ranjit.menon@intel.com>	2020-07-11 00:45:20 +02:00
Hongzhi Guo	982bb68cab	net: fix checksum on big endian CPUs With current code, the checksum of odd-length buffers is wrong on big endian CPUs: the last byte is not properly summed to the accumulator. Fix this by left-shifting the remaining byte by 8. For instance, if the last byte is 0x42, we should add 0x4200 to the accumulator on big endian CPUs. This change is similar to what is suggested in Errata 3133 of RFC 1071. Fixes: 6006818cfb26("net: new checksum functions") Cc: stable@dpdk.org Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com> Reviewed-by: Morten Brørup <mb@smartsharesystems.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2020-07-11 00:45:20 +02:00
Hongzhi Guo	d5df2ae042	net: fix unneeded replacement of TCP checksum 0 Per RFC768: If the computed checksum is zero, it is transmitted as all ones. An all zero transmitted checksum value means that the transmitter generated no checksum. RFC793 for TCP has no such special treatment for the checksum of zero. Fixes: `6006818cfb` ("net: new checksum functions") Cc: stable@dpdk.org Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Morten Brørup <mb@smartsharesystems.com>	2020-07-11 00:45:20 +02:00
Joyce Kong	58902736a4	vhost: restrict pointer aliasing for packed ring Restrict pointer aliasing to allow the compiler to vectorize loop more aggressively. With this patch, a 9.6% improvement is observed in throughput for the packed virtio-net PVP case, and a 2.8% improvement in throughput for the packed virtio-user PVP case. All performance data are measured on ThunderX-2 platform under 0.001% acceptable packet loss with 1 core on both vhost and virtio side. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>	2020-07-10 15:43:41 +02:00
Joyce Kong	428e684795	introduce restricted pointer aliasing marker The 'restrict' keyword is recognized in C99, while type qualifier '__restrict' compiles ok in C with all language levels. This patch is to replace the existing 'restrict' with '__rte_restrict' which is a common wrapper supported by all compilers. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2020-07-10 15:35:32 +02:00
Ruifeng Wang	8a9f8564e9	lpm: implement RCU rule reclamation Currently, the tbl8 group is freed even though the readers might be using the tbl8 group entries. The freed tbl8 group can be reallocated quickly. This results in incorrect lookup results. RCU QSBR process is integrated for safe tbl8 group reclaim. Refer to RCU documentation to understand various aspects of integrating RCU library into other libraries. To avoid ABI breakage, a struct __rte_lpm is created for lpm library internal use. This struct wraps rte_lpm that has been exposed and also includes members that don't need to be exposed such as RCU related config. Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Ray Kinsella <mdr@ashroe.eu> Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>	2020-07-10 13:41:29 +02:00
Phil Yang	e0a439466b	eal/linux: use C11 atomics for interrupt status The event status is defined as a volatile variable and shared between threads. Use C11 atomic built-ins with explicit ordering instead of rte_atomic ops which enforce unnecessary barriers on aarch64. The event status has been cleaned up by the compare-and-swap operation when we free the event data, so there is no need to set it to invalid after that. Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Harman Kalra <hkalra@marvell.com>	2020-07-09 18:53:40 +02:00
Feifei Wang	2d6ed071a8	ring: use custom element for fixed size API Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation. This reduces code duplication and improves code maintenance. Tests done on Arm, x86 [1] and PPC [2] do not indicate performance degradation. [1] https://mails.dpdk.org/archives/dev/2020-July/173780.html [2] https://mails.dpdk.org/archives/dev/2020-July/173863.html Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Tested-by: David Christensen <drc@linux.vnet.ibm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-09 17:22:36 +02:00
Feifei Wang	019bffab51	ring: remove experimental flag from custom element API Remove the experimental tag for rte_ring_xxx_elem APIs that have been around for 2 releases. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-09 17:22:29 +02:00
Feifei Wang	2f86fb666e	ring: remove experimental flag from reset API Remove the experimental tag for rte_ring_reset API that have been around for 4 releases. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>	2020-07-09 17:21:54 +02:00
Levend Sayar	604d426de3	service: fix C++ linkage "extern C" define is added to rte_service_component.h file to be able to use in C++ context Fixes: `21698354c8` ("service: introduce service cores concept") Cc: stable@dpdk.org Signed-off-by: Levend Sayar <levendsayar@gmail.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2020-07-09 16:23:57 +02:00
Ferruh Yigit	f6ac2b06ec	ethdev: fix log type for some error messages Some log macros was using 'EAL' logtype, convert them to 'ethdev'. Also fix missing EOL and fix syntax for some logs. Fixes: `214ed1acd1` ("ethdev: add iterator to match devargs input") Fixes: `e489007a41` ("ethdev: add generic create/destroy ethdev APIs") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2020-07-07 23:38:28 +02:00
Patrick Fu	cd6760da10	vhost: introduce async enqueue for split ring This patch implements async enqueue data path for split ring. 2 new async data path APIs are defined, by which applications can submit and poll packets to/from async engines. The async engine is either a physical DMA device or it could also be a software emulated backend. The async enqueue data path leverages callback functions registered by applications to work with the async engine. Signed-off-by: Patrick Fu <patrick.fu@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-07-07 23:38:28 +02:00
Patrick Fu	78639d5456	vhost: introduce async enqueue registration API Performing large memory copies usually takes up a major part of CPU cycles and becomes the hot spot in vhost-user enqueue operation. To offload the large copies from CPU to the DMA devices, asynchronous APIs are introduced, with which the CPU just submits copy jobs to the DMA but without waiting for its copy completion. Thus, there is no CPU intervention during data transfer. We can save precious CPU cycles and improve the overall throughput for vhost-user based applications. This patch introduces registration/un-registration APIs for vhost async data enqueue operation. Together with the registration APIs implementations, data structures and the prototype of the async callback functions required for async enqueue data path are also defined. Signed-off-by: Patrick Fu <patrick.fu@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-07-07 23:38:28 +02:00
Simei Su	9a859b8c4a	ethdev: add PPPoE RSS offload types This patch defines new RSS offload types for PPPoE. Typically, session id would be the RSS input set for a PPPoE packet, but as a hint, each driver may have different default behaviors. Signed-off-by: Simei Su <simei.su@intel.com> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-07-07 23:38:27 +02:00
Lukasz Wojciechowski	048db4b6dc	service: fix core mapping reset The rte_service_lcore_reset_all function stops execution of services on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE. However the thread loop for slave lcores (eal_thread_loop) distincts these roles to set lcore state after processing delegated function. It sets WAIT state for ROLE_SERVICE, but FINISHED for ROLE_RTE. So changing the role to RTE before stopping work in slave lcores causes lcores to end in FINISHED state. That is why the rte_eal_lcore_wait must be run after rte_service_lcore_reset_all to bring back lcores to launchable (WAIT) state. This has been fixed in test app and clarified in API documentation. Setting the state to WAIT in rte_service_runner_func is premature as the rte_service_runner_func function is still a part of the lcore function delegated to slave lcore. The state is overwritten anyway in slave lcore thread loop. This premature setting state to WAIT might however cause rte_eal_lcore_wait, that was called by the application, to return before slave lcore thread set the FINISHED state. That's why it is removed from librte_eal rte_service_runner_func function. Bugzilla ID: 464 Fixes: `21698354c8` ("service: introduce service cores concept") Fixes: `f038a81e1c` ("service: add unit tests") Cc: stable@dpdk.org Reported-by: Sarosh Arif <sarosh.arif@emumba.com> Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2020-07-08 18:52:49 +02:00
Phil Yang	030c216411	eventdev: relax SMP barriers with C11 atomics The impl_opaque field is shared between the timer arm and cancel operations. Meanwhile, the state flag acts as a guard variable to make sure the update of impl_opaque is synchronized. The original code uses rte_smp barriers to achieve that. This patch uses C11 atomics with an explicit one-way memory barrier instead of full barriers rte_smp_w/rmb() to avoid the unnecessary barrier on aarch64. Since compilers can generate the same instructions for volatile and non-volatile variable in C11 __atomics built-ins, so remain the volatile keyword in front of state enum to avoid the ABI break issue. Cc: stable@dpdk.org Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>	2020-07-08 18:16:41 +02:00
Phil Yang	e84d9c62c6	eventdev: remove redundant reset on timer cancel There is no thread will access these impl_opaque data after timer canceled. When new timer armed, it got refilled. So the cleanup process is unnecessary. Cc: stable@dpdk.org Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>	2020-07-08 18:16:41 +02:00
Phil Yang	1028d63eb2	eventdev: use C11 atomics for lcore timer armed flag The in_use flag is a per core variable which is not shared between lcores in the normal case and the access of this variable should be ordered on the same core. However, if non-EAL thread pick the highest lcore to insert timers into, there is the possibility of conflicts on this flag between threads. Then the atomic compare-and-swap operation is needed. Use the C11 atomics instead of the generic rte_atomic operations to avoid the unnecessary barrier on aarch64. Cc: stable@dpdk.org Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>	2020-07-08 18:16:41 +02:00
Phil Yang	aceb737d6f	eventdev: fix race condition on timer list counter The n_poll_lcores counter and poll_lcore array are shared between lcores and the update of these variables are out of the protection of spinlock on each lcore timer list. The read-modify-write operations of the counter are not atomic, so it has the potential of race condition between lcores. Use c11 atomics with RELAXED ordering to prevent confliction. Fixes: `cc7b73ea9e` ("eventdev: add new software timer adapter") Cc: stable@dpdk.org Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>	2020-07-08 18:16:41 +02:00
Fiona Trahe	b69f927d52	cryptodev: add traces in multi-process path This patch adds traces to some Cryptodev functions that are used in primary/secondary context. Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2020-07-08 18:16:17 +02:00
Fiona Trahe	21b6a35171	cryptodev: add function to check queue pair status This patch adds function that can check if queue pair was already setup. This may be useful when dealing with multi process approach in cryptodev. Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2020-07-08 18:16:17 +02:00
David Coyle	15bb21aa60	cryptodev: add comments for DOCSIS protocol Add a note to the rte_crypto_sym_op->auth.data fields to state that for DOCSIS security protocol, these are used to specify the offset and length of data over which the CRC is calculated. Signed-off-by: David Coyle <david.coyle@intel.com> Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2020-07-08 00:15:35 +02:00
David Coyle	e44b3faf85	security: support DOCSIS protocol Add support for DOCSIS protocol to rte_security library. This support currently comprises the combination of Crypto and CRC operations. Signed-off-by: David Coyle <david.coyle@intel.com> Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2020-07-08 00:15:35 +02:00
Adam Dybkowski	53bf0ffd51	cryptodev: verify session mempool element size This patch adds the verification of the element size of the mempool provided for the session creation. Returns the error if the element size is too small to hold the session object. Signed-off-by: Adam Dybkowski <adamx.dybkowski@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2020-07-08 00:15:35 +02:00
David Marchand	d2fd16c8b0	eal: add multiprocess disable API The multiprocess feature has been implicitly enabled so far. Applications might want to explicitly disable like when using the non-EAL threads registration API. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-08 14:41:06 +02:00
David Marchand	b41befd3af	eal: add lcore iterators Add a helper to iterate all lcores. The iterator callback is read-only wrt the lcores list. Implement a dump function on top of this for debugging. Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-08 14:41:06 +02:00
David Marchand	61bb531295	eal: add lcore init callbacks DPDK components and applications can have their say when a new lcore is initialized. For this, they can register a callback for initializing and releasing their private data. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-08 14:41:06 +02:00
David Marchand	5c307ba2a5	eal: register non-EAL threads as lcores DPDK allows calling some part of its API from a non-EAL thread but this has some limitations. OVS (and other applications) has its own thread management but still want to avoid such limitations by hacking RTE_PER_LCORE(_lcore_id) and faking EAL threads potentially unknown of some DPDK component. Introduce a new API to register non-EAL thread and associate them to a free lcore with a new NON_EAL role. This role denotes lcores that do not run DPDK mainloop and as such prevents use of rte_eal_wait_lcore() and consorts. Multiprocess is not supported as the need for cohabitation with this new feature is unclear at the moment. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-08 14:41:05 +02:00
David Marchand	a837d5c598	eal: move lcore role code For consistency sake, move all lcore role code in the dedicated compilation unit / header. Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-08 14:39:26 +02:00
David Marchand	2ab55f78d1	eal: introduce thread uninit helper This is a preparation step for dynamically unregistering threads. Since we explicitly allocate a per thread trace buffer in __rte_thread_init, add an internal helper to free this buffer. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-08 14:39:26 +02:00
David Marchand	266b641ccf	eal: introduce thread init helper Introduce a helper responsible for initialising the per thread context. We can then have a unified context for EAL and non-EAL threads and remove copy/paste'd OS-specific helpers. Per EAL thread CPU affinity setting is separated from the thread init. It is to accommodate with Windows EAL where CPU affinity is not set at the moment. Besides, having affinity set by the master lcore in FreeBSD and Linux will make it possible to detect errors rather than panic in the child thread. But the cleanup when such an event happens is left for later. A side-effect of this patch is that control threads can now use recursive locks (rte_gettid() was not called before). Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-08 14:39:26 +02:00
David Marchand	3ef1a0b4a1	eal: fix multiple definition of per lcore thread id Because of the inline accessor + static declaration in rte_gettid(), we end up with multiple symbols for RTE_PER_LCORE(_thread_id). Each compilation unit will pay a cost when accessing this information for the first time. $ nm build/app/dpdk-testpmd \| grep per_lcore__thread_id 0000000000000054 d per_lcore__thread_id.5037 0000000000000040 d per_lcore__thread_id.5103 0000000000000048 d per_lcore__thread_id.5259 000000000000004c d per_lcore__thread_id.5259 0000000000000044 d per_lcore__thread_id.5933 0000000000000058 d per_lcore__thread_id.6261 0000000000000050 d per_lcore__thread_id.7378 000000000000005c d per_lcore__thread_id.7496 000000000000000c d per_lcore__thread_id.8016 0000000000000010 d per_lcore__thread_id.8431 Make it global as part of the DPDK_21 stable ABI. Fixes: `ef76436c68` ("eal: get unique thread id") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Ray Kinsella <mdr@ashroe.eu> Reviewed-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-08 14:39:26 +02:00
David Marchand	7afdfac8e6	eal: relocate per thread symbols to common We have per lcore thread symbols scattered in OS implementations but common code relies on them. Move all of them in common. RTE_PER_LCORE(_socket_id) and RTE_PER_LCORE(_cpuset) have public accessors and are not exported through the library map, they can be made static. Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-08 14:39:26 +02:00
Honnappa Nagarahalli	264f7f80e1	eal/arm: adjust memory barriers for IO on ARMv8 Change the barrier APIs for IO to reflect that Armv8-a is other-multi-copy atomicity memory model. Armv8-a memory model has been strengthened to require other-multi-copy atomicity. This property requires memory accesses from an observer to become visible to all other observers simultaneously [3]. This means a) A write arriving at an endpoint shared between multiple CPUs is visible to all CPUs b) A write that is visible to all CPUs is also visible to all other observers in the shareability domain This allows for using cheaper DMB instructions in the place of DSB for devices that are visible to all CPUs (i.e. devices that DPDK caters to). Please refer to [1], [2] and [3] for more information. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f [2] https://www.youtube.com/watch?v=i6DayghhA8Q [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/ Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>	2020-07-08 13:44:23 +02:00
Igor Romanov	f3c256b621	service: fix lcore iteration The service core list is populated, but not used. Incorrect lcore states are examined for a service. Use the populated list to iterate over service cores. Fixes: `e484ccddbe` ("service: avoid false sharing on core state") Cc: stable@dpdk.org Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>	2020-07-07 23:48:44 +02:00
Stephen Hemminger	b2a0b9f044	rib: add C++ include guard All include files should be safe from C++ Fixes: `5a5793a5ff` ("rib: add RIB library") Fixes: `f7e861e21c` ("rib: support IPv6") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>	2020-07-07 23:24:38 +02:00
Stephen Hemminger	58607c2e27	rib: check for negative maximum of nodes Max_nodes in config is signed, but a negative value makes no sense. Get rid of extra BSD style parens. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>	2020-07-07 23:22:18 +02:00
Stephen Hemminger	7bd83e60bf	rib: constify arguments The getter functions should take a constant pointer to make it clear that node is not modified. The rib create functions do not modify their config structure. Mark the config as constant so that programs can pass simple constant data. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>	2020-07-07 23:22:05 +02:00
Stephen Hemminger	041a3971c8	cfgfile: fix stack buffer underflow If cfgfile is give a line with comment character at the start of the line, it will dereference outside of the buffer. Detected with address sanitizer: SUMMARY: AddressSanitizer: stack-buffer-underflow lib/librte_cfgfile/rte_cfgfile.c:194 in rte_cfgfile_load_with_params Shadow bytes around the buggy address: 0x200fff79f6a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x200fff79f6b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x200fff79f6c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x200fff79f6d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x200fff79f6e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x200fff79f6f0: 00 00 00 00 f1 f1 f1[f1]00 00 00 00 00 00 00 00 0x200fff79f700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x200fff79f710: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x200fff79f720: 04 f2 f2 f2 f3 f3 f3 f3 00 00 00 00 00 00 00 00 0x200fff79f730: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f2 0x200fff79f740: f2 f2 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==2189==ABORTING Fixes: `a6a47ac9c2` ("cfgfile: rework load function") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>	2020-07-07 23:22:04 +02:00
Bruce Richardson	a8550b7731	rawdev: export dump function in map file The rte_rawdev_dump function was missing from the map file, meaning it was unavailable for use when linking dynamically. Fixes: `c88b3f2558` ("rawdev: introduce raw device library") Cc: stable@dpdk.org Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>	2020-07-07 19:18:04 +02:00
Bruce Richardson	2689221592	rawdev: fill NUMA socket ID in info The rawdev info struct has a socket_id field which was not filled in. We can also omit the checks for the parameter struct being null, since that is previously checked in the function. Fixes: `c88b3f2558` ("rawdev: introduce raw device library") Cc: stable@dpdk.org Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>	2020-07-07 19:18:04 +02:00
Bruce Richardson	201a68c678	rawdev: allow getting info for unknown device To call the rte_rawdev_info_get() function, the user currently has to know the underlying type of the device in order to pass an appropriate structure or buffer as the dev_private pointer in the info structure. By allowing a NULL value for this field, we can skip getting the device-specific info and just return the generic info - including the device name and driver, which can be used to determine the device type - to the user. This ensures that basic info can be get for all rawdevs, without knowing the type, and even if the info driver API call has not been implemented for the device. Cc: stable@dpdk.org Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>	2020-07-07 19:18:04 +02:00
Haiyue Wang	598be72395	vfio: support VF token The Linux kernel module vfio-pci introduces the VF token to enable SR-IOV support since 5.7. The VF token can be set by a vfio-pci based PF driver and must be known by the vfio-pci based VF driver in order to gain access to the device. Since the vfio-pci module uses the VF token as internal data to provide the collaboration between SR-IOV PF and VFs, so DPDK can use the same VF token for all PF devices by specifying the related EAL option. Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Tested-by: Harman Kalra <hkalra@marvell.com>	2020-07-07 14:06:49 +02:00
Haiyue Wang	edca6d883e	eal: fix uuid header dependencies Add the dependent header files explicitly, so that the user just needs to include the 'rte_uuid.h' header file directly to avoid compile error: (1). rte_uuid.h:97:55: error: unknown type name ‘size_t’ (2). rte_uuid.h:58:2: error: implicit declaration of function ‘memcpy’ Fixes: `6bc67c497a` ("eal: add uuid API") Cc: stable@dpdk.org Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Acked-by: David Marchand <david.marchand@redhat.com>	2020-07-07 14:06:49 +02:00
Stephen Hemminger	2cc263c3ff	vfio: lower the priority of startup messages The startup of VFIO is too noisy. Logging is expensive on some systems, and distracting to the user. It should not be logging at NOTICE level, reduce it to INFO level. It really should be DEBUG here but that would hide it by default. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2020-07-07 13:51:28 +02:00
Yunjian Wang	f4823a3982	vfio: remove unused variable The 'group_status' has never been used and can be removed. Fixes: `94c0776b1b` ("vfio: support hotplug") Cc: stable@dpdk.org Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2020-07-07 13:49:56 +02:00
Stephen Hemminger	67ae5936c4	eal: fix lcore accessors for non-EAL threads If rte_lcore_index() is asked to give the index of the current lcore (argument -1) and is called from a non-EAL thread then it would invalid result. The result would come lcore_config[-1].core_index which is some other data in the per-thread area. The resolution is to return -1 which is what rte_lcore_index() returns if handed an invalid lcore. Same issue existed with rte_lcore_to_cpu_id(). Bugzilla ID: 446 Fixes: `26cc3bbe4d` ("eal: add lcore accessors") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: David Marchand <david.marchand@redhat.com>	2020-07-07 13:45:50 +02:00
Honnappa Nagarahalli	e1c9850f55	eal/armv8: force inlining of timer API Change the inline functions to use __rte_always_inline to be consistent with rest of the inline functions. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2020-07-07 13:21:26 +02:00
Honnappa Nagarahalli	97c910139b	eal/armv8: fix timer frequency calibration with PMU get_tsc_freq uses 'nanosleep' system call to calculate the CPU frequency. However, 'nanosleep' results in the process getting un-scheduled. The kernel saves and restores the PMU state. This ensures that the PMU cycles are not counted towards a sleeping process. When RTE_ARM_EAL_RDTSC_USE_PMU is defined, this results in incorrect CPU frequency calculation. This logic is replaced with generic counter based loop. Bugzilla ID: 450 Fixes: `f91bcbb2d9` ("eal/armv8: use high-resolution cycle counter") Cc: stable@dpdk.org Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2020-07-07 13:20:50 +02:00
Jerin Jacob	a7551b6c60	log: remove unneeded logtype declaration RTE_LOG_REGISTER macro already declares the log type. Remove the unneeded log type declaration. Fixes: `9c99878aa1` ("log: introduce logtype register macro") Reported-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Gage Eads <gage.eads@intel.com>	2020-07-07 13:18:23 +02:00
Hemant Agrawal	c843fca96c	rawdev: remove remaining experimental tags The experimental tags were removed, but the comment is still having API classification as EXPERIMENTAL Fixes: `931cc531aa` ("rawdev: remove experimental tag") Cc: stable@dpdk.org Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: David Marchand <david.marchand@redhat.com>	2020-07-07 12:54:22 +02:00
David Marchand	74f4d6424d	lib: remind experimental status in headers The following libraries are experimental, all of their functions can be changed or removed: - librte_bbdev - librte_bpf - librte_compressdev - librte_fib - librte_flow_classify - librte_graph - librte_ipsec - librte_node - librte_rcu - librte_rib - librte_stack - librte_telemetry Their status is properly announced in MAINTAINERS. Remind this status in their headers in a common fashion (aligned to ABI docs). Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2020-07-07 12:49:10 +02:00
David Marchand	7762e0139b	build: remove special versioning for non stable libraries Having a special versioning for experimental/internal libraries put a additional maintenance cost while this status is already announced in MAINTAINERS and the library headers/documentation. Following discussions and vote at 05/20 TB meeting [1], use a single versioning for all libraries in DPDK. Note: for the ABI check, an exception [2] had been added when tweaking this special versioning [3]. Prefer explicit libabigail rules (which will be dropped in 20.11). 1: https://mails.dpdk.org/archives/dev/2020-May/168450.html 2: https://git.dpdk.org/dpdk/commit/?id=23d7ad5db41c 3: https://git.dpdk.org/dpdk/commit/?id=ec2b8cd7ed69 Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Ray Kinsella <mdr@ashroe.eu> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2020-07-07 12:48:25 +02:00
Tal Shnaiderman	f6d7f40576	mbuf: build on Windows Build the lib for Windows. Export needed EAL functions used by the lib. Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>	2020-07-07 01:47:24 +02:00
Tal Shnaiderman	9af385fd89	eal: support endianness detection on Windows Inclusion of the endian.h header is set only for Linux OS. Windows endianness will be determined by the predefined __BYTE_ORDER__ macro. Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>	2020-07-07 01:42:27 +02:00
Fady Bader	080bda1a0a	mempool: build on Windows Some EAL functions are used by mempool lib but not exported on Windows. The functions are exported. Added mempool to supported libraries for Windows compilation. Signed-off-by: Fady Bader <fady@mellanox.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2020-07-07 01:28:12 +02:00
Fady Bader	972848da5f	mempool: use generic memory syscall wrappers Using generic memory management calls instead of Unix memory management calls for mempool. Signed-off-by: Fady Bader <fady@mellanox.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>	2020-07-07 01:24:55 +02:00
Fady Bader	2f59f3b085	eal: disable function versioning on Windows Function versioning implementation is not supported by Windows. Function versioning is disabled on Windows. Signed-off-by: Fady Bader <fady@mellanox.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2020-07-07 01:23:29 +02:00
Alan Dewar	83415d4fd8	sched: fix port time rounding The QoS scheduler works off port time that is computed from the number of CPU cycles that have elapsed since the last time the port was polled. It divides the number of elapsed cycles to calculate how many bytes can be sent, however this division can generate rounding errors, where some fraction of a byte sent may be lost. Lose enough of these fractional bytes and the QoS scheduler underperforms. The problem is worse with low bandwidths. To compensate for this rounding error this fix doesn't advance the port's time_cpu_cycles by the number of cycles that have elapsed, but by multiplying the computed number of bytes that can be sent (which has been rounded down) by number of cycles per byte. This will mean that port's time_cpu_cycles will lag behind the CPU cycles momentarily. At the next poll, the lag will be taken into account. Fixes: `de3cfa2c98` ("sched: initial import") Cc: stable@dpdk.org Signed-off-by: Alan Dewar <alan.dewar@att.com> Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>	2020-07-07 00:58:31 +02:00
Ori Kam	f66967d218	regexdev: implement API functions This commit implements all the RegEx public API. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Guy Kaneti <guyk@marvell.com>	2020-07-07 00:24:52 +02:00
Ori Kam	b25246beae	regexdev: add core functions This commit introduce the API that is needed by the RegEx devices in order to work with the RegEX lib. During the probe of a RegEx device, the device should configure itself, and allocate the resources it requires. On completion of the device init, it should call the rte_regex_dev_register in order to register itself as a RegEx device. Signed-off-by: Ori Kam <orika@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Acked-by: Guy Kaneti <guyk@marvell.com>	2020-07-07 00:24:51 +02:00
Ori Kam	9c26771bc5	regexdev: add core structures This commit introduce the rte_regexdev_core.h file. This file holds internal structures and API that are used by the regexdev. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Guy Kaneti <guyk@marvell.com>	2020-07-07 00:24:39 +02:00
Jerin Jacob	bab9497ef7	regexdev: introduce API As RegEx usage become more used by DPDK applications, for example: * Next Generation Firewalls (NGFW) * Deep Packet and Flow Inspection (DPI) * Intrusion Prevention Systems (IPS) * DDoS Mitigation * Network Monitoring * Data Loss Prevention (DLP) * Smart NICs * Grammar based content processing * URL, spam and adware filtering * Advanced auditing and policing of user/application security policies * Financial data mining - parsing of streamed financial feeds * Application recognition. * Dmemory introspection. * Natural Language Processing (NLP) * Sentiment Analysis. * Big data database acceleration. * Computational storage. Number of PMD providers started to work on HW implementation, along side with SW implementations. This lib adds the support for those kind of devices. The RegEx Device API is composed of two parts: - The application-oriented RegEx API that includes functions to setup a RegEx device (configure it, setup its queue pairs and start it), update the rule database and so on. - The driver-oriented RegEx API that exports a function allowing a RegEx poll Mode Driver (PMD) to simultaneously register itself as a RegEx device driver. RegEx device components and definitions: +-----------------+ \| \| \| o---------+ rte_regexdev_[en\|de]queue_burst() \| PCRE based o------+ \| \| \| RegEx pattern \| \| \| +--------+ \| \| matching engine o------+--+--o \| \| +------+ \| \| \| \| \| queue \|<==o===>\|Core 0\| \| o----+ \| \| \| pair 0 \| \| \| \| \| \| \| \| +--------+ +------+ +-----------------+ \| \| \| ^ \| \| \| +--------+ \| \| \| \| \| \| +------+ \| \| +--+--o queue \|<======>\|Core 1\| Rule\|Database \| \| \| pair 1 \| \| \| +------+----------+ \| \| +--------+ +------+ \| Group 0 \| \| \| \| +-------------+ \| \| \| +--------+ +------+ \| \| Rules 0..n \| \| \| \| \| \| \|Core 2\| \| +-------------+ \| \| +--o queue \|<======>\| \| \| Group 1 \| \| \| pair 2 \| +------+ \| +-------------+ \| \| +--------+ \| \| Rules 0..n \| \| \| \| +-------------+ \| \| +--------+ \| Group 2 \| \| \| \| +------+ \| +-------------+ \| \| \| queue \|<======>\|Core n\| \| \| Rules 0..n \| \| +-------o pair n \| \| \| \| +-------------+ \| +--------+ +------+ \| Group n \| \| +-------------+ \|<-------rte_regexdev_rule_db_update() \| \| \| \|<-------rte_regexdev_rule_db_compile_activate() \| \| Rules 0..n \| \|<-------rte_regexdev_rule_db_import() \| +-------------+ \|------->rte_regexdev_rule_db_export() +-----------------+ RegEx: A regular expression is a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A common abbreviation for this is â~@~\RegExâ~@~]. RegEx device: A hardware or software-based implementation of RegEx device API for PCRE based pattern matching syntax and semantics. PCRE RegEx syntax and semantics specification: http://regexkit.sourceforge.net/Documentation/pcre/pcrepattern.html RegEx queue pair: Each RegEx device should have one or more queue pair to transmit a burst of pattern matching request and receive a burst of receive the pattern matching response. The pattern matching request/response embedded in rte_regex_ops structure. Rule: A pattern matching rule expressed in PCRE RegEx syntax along with Match ID and Group ID to identify the rule upon the match. Rule database: The RegEx device accepts regular expressions and converts them into a compiled rule database that can then be used to scan data. Compilation allows the device to analyze the given pattern(s) and pre-determine how to scan for these patterns in an optimized fashion that would be far too expensive to compute at run-time. A rule database contains a set of rules that compiled in device specific binary form. Match ID or Rule ID: A unique identifier provided at the time of rule creation for the application to identify the rule upon match. Group ID: Group of rules can be grouped under one group ID to enable rule isolation and effective pattern matching. A unique group identifier provided at the time of rule creation for the application to identify the rule upon match. Scan: A pattern matching request through enqueue API. It may possible that a given RegEx device may not support all the features of PCRE. The application may probe unsupported features through struct rte_regexdev_info::pcre_unsup_flags By default, all the functions of the RegEx Device API exported by a PMD are lock-free functions which assume to not be invoked in parallel on different logical cores to work on the same target object. For instance, the dequeue function of a PMD cannot be invoked in parallel on two logical cores to operates on same RegEx queue pair. Of course, this function can be invoked in parallel by different logical core on different queue pair. It is the responsibility of the upper level application to enforce this rule. In all functions of the RegEx API, the RegEx device is designated by an integer >= 0 named the device identifier dev_id At the RegEx driver level, RegEx devices are represented by a generic data structure of type rte_regexdev. RegEx devices are dynamically registered during the PCI/SoC device probing phase performed at EAL initialization time. When a RegEx device is being probed, a rte_regexdev structure and a new device identifier are allocated for that device. Then, the regexdev_init() function supplied by the RegEx driver matching the probed device is invoked to properly initialize the device. The role of the device init function consists of resetting the hardware or software RegEx driver implementations. If the device init operation is successful, the correspondence between the device identifier assigned to the new device and its associated rte_regexdev structure is effectively registered. Otherwise, both the rte_regexdev structure and the device identifier are freed. The functions exported by the application RegEx API to setup a device designated by its device identifier must be invoked in the following order: - rte_regexdev_configure() - rte_regexdev_queue_pair_setup() - rte_regexdev_start() Then, the application can invoke, in any order, the functions exported by the RegEx API to enqueue pattern matching job, dequeue pattern matching response, get the stats, update the rule database, get/set device attributes and so on If the application wants to change the configuration (i.e. call rte_regexdev_configure() or rte_regexdev_queue_pair_setup()), it must call rte_regexdev_stop() first to stop the device and then do the reconfiguration before calling rte_regexdev_start() again. The enqueue and dequeue functions should not be invoked when the device is stopped. Finally, an application can close a RegEx device by invoking the rte_regexdev_close() function. Each function of the application RegEx API invokes a specific function of the PMD that controls the target device designated by its device identifier. For this purpose, all device-specific functions of a RegEx driver are supplied through a set of pointers contained in a generic structure of type regexdev_ops. The address of the regexdev_ops structure is stored in the rte_regexdev structure by the device init function of the RegEx driver, which is invoked during the PCI/SoC device probing phase, as explained earlier. In other words, each function of the RegEx API simply retrieves the rte_regexdev structure associated with the device identifier and performs an indirect invocation of the corresponding driver function supplied in the regexdev_ops structure of the rte_regexdev structure. For performance reasons, the address of the fast-path functions of the RegEx driver is not contained in the regexdev_ops structure. Instead, they are directly stored at the beginning of the rte_regexdev structure to avoid an extra indirect memory access during their invocation. RTE RegEx device drivers do not use interrupts for enqueue or dequeue operation. Instead, RegEx drivers export Poll-Mode enqueue and dequeue functions to applications. The enqueue operation submits a burst of RegEx pattern matching request to the RegEx device and the dequeue operation gets a burst of pattern matching response for the ones submitted through enqueue operation. Typical application utilisation of the RegEx device API will follow the following programming flow. - rte_regexdev_configure() - rte_regexdev_queue_pair_setup() - rte_regexdev_rule_db_update() Needs to invoke if precompiled rule database not provided in rte_regexdev_config::rule_db for rte_regexdev_configure() and/or application needs to update rule database. - rte_regexdev_rule_db_compile_activate() Needs to invoke if rte_regexdev_rule_db_update function was used. - Create or reuse exiting mempool for rte_regex_ops objects. - rte_regexdev_start() - rte_regexdev_enqueue_burst() - rte_regexdev_dequeue_burst() Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Signed-off-by: Ori Kam <orika@mellanox.com>	2020-07-07 00:24:38 +02:00
David Marchand	0fc601af3a	trace: simplify trace point registration RTE_TRACE_POINT_DEFINE and RTE_TRACE_POINT_REGISTER must come in pairs. Merge them and let RTE_TRACE_POINT_REGISTER handle the constructor part. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2020-07-05 21:34:21 +02:00
Bruce Richardson	06c7871dde	eal: restrict default plugin path to shared lib mode When using statically linked DPDK binaries, the EAL checks the default PMD path and tries to load any drivers there, despite the fact that all drivers are normally linked into the binary. This behaviour can cause issues if the PMD path and lib dir is configured to a non-standard location which is not in the ld.so.conf paths, e.g. a build with prefix set to a home directory location. In a case such as this, EAL will try and (unnecessarily) load the .so driver files but that load will fail as their dependent libraries, such as ethdev, for example, will not be found. Because of this, it is better if statically linked DPDK apps do not load drivers from the standard paths automatically. The user can always have this behaviour by explicitly specifying the path using -d flag, if so desired. Not loading the libraries automatically can also prevent potential issues with a user building and running a statically-linked DPDK binary based off a private copy of DPDK, while there exists on the same machine a system-wide installation of DPDK in the default locations. Without this change, the system-installed drivers will be loaded to the binary alongside the statically-linked drivers, which is not what the user would have intended. To detect whether we are in a statically or dynamically linked binary, we can have EAL try to get a dlopen handle to its own shared library, by calling dlopen with the RTLD_NOLOAD flag. This will return NULL if there is no such shared lib loaded i.e. the code is executing from a static library, or a handle to the lib if it is loaded. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Tested-by: Sunil Pai G <sunil.pai.g@intel.com>	2020-07-05 21:32:40 +02:00
Bruce Richardson	e84921fb56	eal: cache last directory permissions checked When loading a directory of drivers, we check the same hierarchy multiple times. If we just cache the last directory checked, this avoids repeated checks of the same path, since all drivers in that path have been added to the list consecutively. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2020-07-05 21:32:40 +02:00
Bruce Richardson	7b69fd1e95	eal: forbid loading drivers from insecure paths Any paths on the system which are world-writable are insecure and should not be used for loading drivers. Therefore, whenever an absolute or relative driver path is passed to EAL, check for world-writability and don't load any drivers from that path if it is insecure. Drivers loaded from system locations i.e. those passed without any path info and found automatically by the loader, are excluded from these checks as system paths are assumed to be secure. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2020-07-05 21:32:40 +02:00
Bruce Richardson	49b536fc30	eal: load only shared libs from driver plugin directories When we pass a "-d" flag to EAL pointing to a directory, we attempt to load all files in that directory as driver plugins, irrespective of file type. This procludes using e.g. the build/drivers directory, as a driver source since it contains static libs and other files as well as the shared objects. By filtering out any files whose filename does not end in ".so", we can improve usability by allowing other non-driver files to be present in the driver directory. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2020-07-05 21:32:40 +02:00
Bruce Richardson	efa75d61d2	eal: remove unnecessary null-termination in plugin path Since strlcpy always null-terminates, and the buffer is zeroed before copy anyway, there is no need to explicitly zero the end of the character array, or to limit the bytes that strlcpy can write. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2020-07-05 21:32:40 +02:00
Thomas Monjalon	de321d5918	build: remove special handling for node library The node library had a need of being linked as a whole to make some constructors effective. Now that all libraries are linked with --whole-archive, there is no need to have this library separate. Fixes: `e2db26f766` ("build: always link whole DPDK static libraries") Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Tested-by: Jerin Jacob <jerinj@marvell.com>	2020-07-05 10:52:11 +02:00
Jerin Jacob	9c99878aa1	log: introduce logtype register macro Introduce the RTE_LOG_REGISTER macro to avoid the code duplication in the logtype registration process. It is a wrapper macro for declaring the logtype, registering it and setting its level in the constructor context. Signed-off-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com> Acked-by: Sachin Saxena <sachin.saxena@nxp.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2020-07-03 15:52:51 +02:00
Bruce Richardson	e2db26f766	build: always link whole DPDK static libraries To ensure all constructors are included in static build, we need to pass the --whole-archive flag when linking, which is used with the "link_whole" meson option. Since we use link_whole for all libs, we no longer need to track the lib as part of the static dependency, just the path to the headers for compiling. After this patch is applied, all DPDK .a files are inside --whole-archive/--no-whole-archive flags, but external dependencies and shared libs being linked against remain outside. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Tested-by: Andrzej Ostruszka <aostruszka@marvell.com> Acked-by: Luca Boccassi <bluca@debian.org> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2020-07-01 19:30:52 +02:00
Morten Brørup	4b7284a71f	ring: optimize empty test Testing if the ring is empty is as simple as comparing the producer and consumer pointers. In theory, this optimization reduces the number of potential cache misses from 3 to 2 by not having to read r->mask in rte_ring_count(). The modification of this function were also discussed in the RFC here: https://mails.dpdk.org/archives/dev/2020-April/165752.html Signed-off-by: Morten Brørup <mb@smartsharesystems.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-01 11:46:09 +02:00
Morten Brørup	36e3cfbbef	ring: cleanup coding style Fix coding style violations that checkpatch will complain about. Add missing "int" after "unsigned". Add missing spaces around "+=" and "+". Remove superfluous type cast of numerical constant. Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Signed-off-by: Morten Brørup <mb@smartsharesystems.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-01 11:46:09 +02:00
Feifei Wang	708d904451	ring: fix tail in peek API for ST mode The value of *tail should be the prod->tail not prod->head. After modification, it can record 'tail' so head/tail can be updated accordingly. Fixes: `664ff4b172` ("ring: introduce peek style API") Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-01 10:41:19 +02:00
Feifei Wang	c8d909aeb7	ring: fix bulk enqueue for HTS/RTS ring modes Remove the unwanted call to "_rte_ring_do_enqueue_elem" to allow for correct handling of RTS/HTS modes. Fixes: `e6ba4731c0` ("ring: introduce RTS ring mode") Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2020-07-01 10:41:19 +02:00
Matan Azrad	da2b788041	vhost: fix features definition location The vhost library provide an infrastructure in order to help the DPDK users to manage vhost devices. One of the infrastructure parts is the features enablement APIs. Some features bits may be defined only in the internal file vhost.h in case the kernel version doesn't include them. Hence, user running on old kernel may not be able to manage thus features. Move all the feature bits definitions to the API file rte_vhost.h. Fixes: `db69be54b6` ("vhost: hide internal code") Fixes: `8d286dbeb8` ("vhost: fix multiple queue not enabled for old kernels") Fixes: `3d3c6590b5` ("vhost: enable virtio MTU feature") Fixes: `704098fc47` ("vhost: fix build with old kernels") Cc: stable@dpdk.org Signed-off-by: Matan Azrad <matan@mellanox.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2020-06-30 14:52:31 +02:00
Matan Azrad	b213af9aa4	vhost: notify virtq file descriptor update When virtq call or kick file descriptors are changed in the device configuration when the queue is ready, the application and the vDPA driver should be notified to be aligned to the new file descriptors. Notify the state to be disabled before the file descriptor update and return it back to be enabled after the update. Signed-off-by: Matan Azrad <matan@mellanox.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-06-30 14:52:31 +02:00
Matan Azrad	127f9c6f7b	vhost: handle memory hotplug with vDPA devices Some vDPA drivers' basic configurations should be updated when the guest memory is hotplugged. Close vDPA device before hotplug operation and recreate it after the hotplug operation is done. Signed-off-by: Matan Azrad <matan@mellanox.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-06-30 14:52:30 +02:00
Matan Azrad	d0fcc38f5f	vhost: improve device readiness notifications Some guest drivers may not configure disabled virtio queues. In this case, the vhost management never notifies the application and the vDPA device readiness because it waits to the device to be ready. The current ready state means that all the virtio queues should be configured regardless the enablement status. In order to support this case, this patch changes the ready state: The device is ready when at least 1 queue pair is configured and enabled. So, now, the application and vDPA driver are notifies when the first queue pair is configured and enabled. Also the queue notifications will be triggered according to the new ready definition. Signed-off-by: Matan Azrad <matan@mellanox.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-06-30 14:52:30 +02:00
Matan Azrad	9f2016b2ce	vhost: skip access lock when vDPA is configured No need to take access lock in the vhost-user message handler when vDPA driver controls all the data-path of the vhost device. It allows the vDPA set_vring_state operation callback to configure guest notifications. Signed-off-by: Matan Azrad <matan@mellanox.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-06-30 14:52:30 +02:00
Matan Azrad	0329868d6a	vhost: support host notifier queue configuration As an arrangement to per queue operations in the vDPA device it is needed to change the next experimental API: The API ``rte_vhost_host_notifier_ctrl`` was changed to be per queue instead of per device. A `qid` parameter was added to the API arguments list. Setting the parameter to the value RTE_VHOST_QUEUE_ALL configures the host notifier to all the device queues as done before this patch. Signed-off-by: Matan Azrad <matan@mellanox.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2020-06-30 14:52:30 +02:00
Maxime Coquelin	a49f758d11	vhost: split vDPA header file This patch split the vDPA header file in two, making rte_vdpa_device structure opaque to the application. Applications should only include rte_vdpa.h, while drivers should include both rte_vdpa.h and rte_vdpa_dev.h. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>	2020-06-30 14:52:30 +02:00
Maxime Coquelin	e91ac959fa	vhost: remove vDPA device count API This API is no more useful, this patch removes it. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>	2020-06-30 14:52:30 +02:00
Maxime Coquelin	8d44fc3a81	vhost: introduce wrappers for some vDPA ops This patch is preliminary work to make the vDPA device structure opaque to the user application. Some callbacks of the vDPA devices are used to query capabilities before attaching to a Vhost port. This patch introduces wrappers for these ops. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>	2020-06-30 14:52:30 +02:00
Maxime Coquelin	08a4f9bab3	vhost: use linked list for vDPA devices There is no more notion of device ID outside of vdpa.c. We can now move from array to linked-list model for keeping track of the vDPA devices. There is no point in using array here, as all vDPA API are used from the control path, so no performance concerns. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>	2020-06-30 14:52:30 +02:00
Maxime Coquelin	f6d587754c	vhost: remove useless vDPA API vDPA is no more used outside of the vDPA internals, so remove rte_vdpa_get_device() API that is now useless. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>	2020-06-30 14:52:30 +02:00
Maxime Coquelin	0f700f90ad	vhost: replace device ID in applications This patch replaces the use of vDPA device ID with vDPA device pointer. The goals is to remove the vDPA device ID to avoid confusion with the Vhost ID. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>	2020-06-30 14:52:30 +02:00
Maxime Coquelin	2263f13941	vhost: replace vDPA device ID in Vhost This removes the notion of device ID in Vhost library as a preliminary step to get rid of the vDPA device ID. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>	2020-06-30 14:52:30 +02:00
Maxime Coquelin	81a6b7fe06	vhost: replace device ID in vDPA ops This patch is a preliminary step to get rid of the vDPA device ID. It makes vDPA callbacks to use the vDPA device struct as a reference instead of the ID. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>	2020-06-30 14:52:30 +02:00
Maxime Coquelin	38f8ab0bbc	vhost: make vDPA framework bus agnostic This patch makes the vDPA framework to no more support only PCI devices, but any devices by relying on the generic device name as identifier. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>	2020-06-30 14:52:30 +02:00
Maxime Coquelin	383fb5a9c7	vhost: introduce vDPA device class This patch introduces vDPA device class. It will enable application to iterate over the vDPA devices. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>	2020-06-30 14:52:30 +02:00
Ruifeng Wang	4eb25acdb7	eal/arm: add vcopyq intrinsic for aarch32 vcopyq_laneq_u32 should be implemented for aarch32 which doesn't have the intrinsic. This fixes build of examples/l3fwd for armv7. Fixes: `3c4b4024c2` ("arch/arm: add vcopyq_laneq_u32 for old gcc") Cc: stable@dpdk.org Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-06-30 14:52:30 +02:00
Matan Azrad	1cb4415751	vhost: introduce operation to get vDPA queue stats The vDPA device offloads all the datapath of the vhost device to the HW device. In order to expose to the user traffic information this patch introduces new 3 APIs to get traffic statistics, the device statistics name and to reset the statistics per virtio queue. The statistics are taken directly from the vDPA driver managing the HW device and can be different for each vendor driver. Signed-off-by: Matan Azrad <matan@mellanox.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2020-06-30 14:52:29 +02:00
Maxime Coquelin	d1c074bd76	vhost: enable reply-ack systematically As announced during v20.05 release cycle, this patch makes reply-ack protocol feature to be enabled unconditionally. This protocol feature makes the communication between the master and the slave more robust, avoiding for example possible undefined behaviour with VHOST_USER_SET_MEM_TABLE. Also, reply-ack support will be required for upcoming VHOST_USER_SET_STATUS request. Note that this protocol feature was disabled by default because Qemu version 2.7.0 to 2.9.0 had a bug causing a deadlock when reply-ack was negotiated and multiqueue enabled. These Qemu version are now very old and no more maintained, so we can reasonably consider we no more support them. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2020-06-30 14:52:29 +02:00
Fady Bader	8d05adbd54	ring: enable for Windows Building ring on Windows. Signed-off-by: Fady Bader <fady@mellanox.com>	2020-06-30 01:30:22 +02:00
Tasnim Bashar	5652a4ad24	eal/windows: fix thread handle Casting thread ID to handle is not accurate way to get thread handle. Need to use OpenThread function to get thread handle from thread ID. pthread_setaffinity_np and pthread_getaffinity_np functions for Windows are affected because of it. Signed-off-by: Tasnim Bashar <tbashar@mellanox.com>	2020-06-30 00:50:26 +02:00
Tal Shnaiderman	b762221ac2	bus/pci: support Windows with bifurcated drivers Uses SetupAPI.h functions to scan PCI tree. Uses DEVPKEY_Device_Numa_Node to get the PCI NUMA node. Uses SPDRP_BUSNUMBER and SPDRP_BUSNUMBER to get the BDF. scanning currently supports types RTE_KDRV_NONE. Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>	2020-06-30 00:02:54 +02:00
Tal Shnaiderman	33031608e8	bus/pci: introduce Windows support with stubs Addition of stub eal and bus/pci functions to compile bus/pci for Windows. Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>	2020-06-30 00:02:54 +02:00
Tal Shnaiderman	8517072c87	pci: fix address domain format size the struct rte_pci_addr defines domain as uint32_t variable however the PCI_PRI_FMT macro used for logging the struct sets the format of domain to uint16_t. The mismatch causes the following warning messages in Windows clang build: format specifies type 'unsigned short' but the argument has type 'uint32_t' (aka 'unsigned int') [-Wformat] Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>	2020-06-30 00:02:54 +02:00
Tal Shnaiderman	b137f95366	pci: build on Windows Added <sys/types.h> in rte_pci header file to include off_t type since it is missing for Windows. Define the implementation of the Linux function rte_pci_get_sysfs_path in pci_common.c for Linux OS only as it is unneeded for other OSs and to avoid the warning on deprecated call to getenv() on Windows: "warning: 'getenv' is deprecated: This function or variable may be unsafe. Consider using _dupenv_s instead." Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>	2020-06-30 00:02:54 +02:00
Tal Shnaiderman	2fd3567e54	pci: use OS generic memory mapping functions Changing all of PCIs Unix memory mapping to the new memory allocation API wrapper. Change all of PCI mapping function usage in bus/pci to support the new API. Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>	2020-06-30 00:02:54 +02:00
Tal Shnaiderman	2ab745bd8f	eal: move OS common options usage functions Move common functions between Unix and Windows to eal_common_options.c. Those functions are getter functions for rte_application_usage_hook. Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>	2020-06-30 00:02:53 +02:00
Tal Shnaiderman	57a2efb304	eal: move OS common config objects Move common functions between Unix and Windows to eal_common_config.c. Those functions are getter functions for IOVA, configuration, Multi-process. Move rte_config, internal_config, early_mem_config and runtime_dir to be defined in the common file with getter functions. Refactor the users of the config variables above to use the getter functions. Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>	2020-06-30 00:02:53 +02:00
Tal Shnaiderman	309bf90bf9	build: generate version map file for MinGW The MinGW build for Windows has special cases where exported function contain additional prefix: __emutls_v.per_lcore__* To avoid adding those prefixed functions to the version.map file the map_to_def.py script was modified to create a map file for MinGW with the needed changed. The file name was changed to map_to_win.py and lib/meson.build map output was unified with drivers/meson.build output Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>	2020-06-30 00:02:53 +02:00
Xiaolong Ye	6248368a9a	mbuf: add dump of free dynamic flags Add support to dump free_flags as below format: Free bit in mbuf->ol_flags (0 = occupied, 1 = free): 0000: 0 0 0 0 0 0 0 0 0008: 0 0 0 0 0 0 0 0 0010: 0 0 0 0 0 0 0 1 0018: 1 1 1 1 1 1 1 1 0020: 1 1 1 1 1 1 1 1 0028: 1 0 0 0 0 0 0 0 0030: 0 0 0 0 0 0 0 0 0038: 0 0 0 0 0 0 0 0 Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2020-06-25 23:03:18 +02:00
Xiaolong Ye	d27a4cb3ef	mbuf: fix dynamic field dump log For each mbuf byte, free_space[i] == 0 means the space is occupied, free_space[i] != 0 means space is free. Fixes: `4958ca3a44` ("mbuf: support dynamic fields and flags") Cc: stable@dpdk.org Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2020-06-25 23:03:18 +02:00

... 4 5 6 7 8 ...

6717 Commits