6608 Commits

Yunjian Wang
c2402fcaf9 efd: fix tailq entry leak in error path
In rte_efd_create(), memory is allocated for the tailq entry. It should
be freed when an error happens; otherwise it leads to a memory leak.

Fixes: 56b6ef874f80 ("efd: new Elastic Flow Distributor library")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
2020-10-22 22:07:15 +02:00
David Marchand
5a1c7b6ddd hash: use x86 common flag for jhash
jhash was forgotten when factorising the x86 arch check.

Fixes: dbf17d44f375 ("hash: use common x86 flag")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-22 22:07:15 +02:00
David Marchand
a5c369d486 bpf: use helper to install headers
Libraries can use the headers variable to install headers.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-22 14:15:19 +02:00
David Marchand
6b3848e211 build: fix version map file references in documentation
Fixes: 63b3907833d8 ("build: remove library name from version map file name")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-22 14:11:49 +02:00
Yunjian Wang
01072d52af eal/linux: fix memory leak in uevent handling
The memory for uevent.devname is allocated in dev_uev_parse(). It is
not freed when parsing the subsystem layer fails in dev_uev_parse(),
and it is also not freed before returning in dev_uev_handler(). This
causes a memory leak.

Fixes: 0d0f478d0483 ("eal/linux: add uevent parse and process")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-10-20 16:01:37 +02:00
Tal Shnaiderman
ddfaa718b7 eal/windows: add missing stdint include
Following the addition of the in_addr/in6_addr structs
to in.h, the header file must have stdint.h included
for the definitions of the uint8_t/uint32_t types used
within the new structs.

Not having it could result in the following errors
in places where in.h is included:

in.h:30:2: error: unknown type name 'uint32_t'
        uint32_t s_addr;

in.h:34:2: error: unknown type name 'uint8_t'
        uint8_t s6_addr[16];

Fixes: f40a74cfcf0 ("eal/windows: improve compatibility networking headers")

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-10-20 13:46:32 +02:00
Stephen Hemminger
cb056611a8 eal: rename lcore master and slave
Replace master lcore with main lcore and
replace slave lcore with worker lcore.

Keep the old functions and macros but mark them as deprecated
for this release.

The "--master-lcore" command line option is also deprecated
and any usage will print a warning and use "--main-lcore"
as replacement.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-10-20 13:17:08 +02:00
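
A minimal usage sketch of the renamed API, assuming the new names
rte_get_main_lcore(), RTE_LCORE_FOREACH_WORKER and SKIP_MAIN introduced
by this change (illustrative only, not taken from the patch):

#include <stdio.h>
#include <rte_eal.h>
#include <rte_lcore.h>
#include <rte_launch.h>

static int
worker_func(void *arg)
{
    (void)arg;
    printf("worker lcore %u running\n", rte_lcore_id());
    return 0;
}

int
main(int argc, char **argv)
{
    unsigned int lcore_id;

    if (rte_eal_init(argc, argv) < 0)
        return -1;

    printf("main lcore is %u\n", rte_get_main_lcore());

    /* formerly RTE_LCORE_FOREACH_SLAVE */
    RTE_LCORE_FOREACH_WORKER(lcore_id)
        printf("worker lcore %u is available\n", lcore_id);

    /* formerly SKIP_MASTER */
    rte_eal_mp_remote_launch(worker_func, NULL, SKIP_MAIN);
    rte_eal_mp_wait_lcore();

    return rte_eal_cleanup();
}
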
Stephen Hemminger
0303192581 eal: add macro to mark macros as deprecated
Add a macro that causes GCC and CLANG to emit a warning when
a deprecated macro is used.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-20 11:42:29 +02:00
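
A sketch of the underlying technique - a "GCC warning" pragma emitted
whenever the old macro expands. The macro names below are illustrative
and are not the exact definitions added by this patch:

/* Illustrative names; the real DPDK helper added here is not reproduced. */
#define DEPR_PRAGMA(text)  _Pragma(#text)
#define DEPR_WARN(msg)     DEPR_PRAGMA(GCC warning msg)

#define NEW_LIMIT 64
/* Old name kept for compatibility; expanding it emits a warning. */
#define OLD_LIMIT DEPR_WARN("OLD_LIMIT is deprecated, use NEW_LIMIT") NEW_LIMIT

static int buf_size = OLD_LIMIT; /* warning: OLD_LIMIT is deprecated... */
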
Gregory Etelson
174db36812 ethdev: rename tunnel flow offload callbacks
Rename new rte_flow ops callbacks to emphasize relation to tunnel
offload API.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-10-19 23:28:22 +02:00
Stephen Hemminger
5b183ff611 ipc: fix spelling in log and comment
Fix spelling in a comment and a log message about a thread error.
Found while looking at checkpatch complaints about "thead".

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-19 23:25:06 +02:00
Bruce Richardson
a8d0d473a0 build: replace use of old build macros
Use the newer macros defined by meson in all DPDK source code, to ensure
there are no errors when the old non-standard macros are removed.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-19 22:15:44 +02:00
Bruce Richardson
a20b2c01a7 build: standardize component names and defines
As discussed on the dpdk-dev mailing list[1], we can make some easy
improvements in standardizing the naming of the various components in DPDK,
and their associated feature-enabled macros.

Following this patch, each library will have the name in format,
'librte_<name>.so', and the macro indicating that library is enabled in the
build will have the form 'RTE_LIB_<NAME>'.

Similarly, for drivers, the equivalent name formats and macros are:
'librte_<class>_<name>.so' and 'RTE_<CLASS>_<NAME>', where class is the
device type taken from the relevant driver subdirectory name, i.e. 'net',
'crypto' etc.

To avoid too many changes at once for end applications, the old macro names
will still be provided in the build in this release, but will be removed
subsequently.

[1] http://inbox.dpdk.org/dev/ef7c1a87-79ab-e405-4202-39b7ad6b0c71@solarflare.com/t/#u

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Rosen Xu <rosen.xu@intel.com>
2020-10-19 22:15:34 +02:00
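
An application-side illustration of the new macro scheme; RTE_LIB_HASH
and RTE_NET_IXGBE below are examples that assume those components are
part of the build:

/* RTE_LIB_<NAME> for libraries, RTE_<CLASS>_<NAME> for drivers. */
#ifdef RTE_LIB_HASH         /* librte_hash.so is enabled */
#include <rte_hash.h>
#endif

#ifdef RTE_NET_IXGBE        /* drivers/net/ixgbe is enabled */
/* driver-specific tuning could go here */
#endif
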
Bruce Richardson
63b3907833 build: remove library name from version map file name
Since each version map file is contained in the subdirectory of the library
it refers to, there is no need to include the library name in the filename.
This makes things simpler in case of library renaming.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Rosen Xu <rosen.xu@intel.com>
2020-10-19 22:13:59 +02:00
Ciara Power
1e6a661302 acl: check max SIMD bitwidth
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
path. These checks are added in the check alg helper functions.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-19 16:45:02 +02:00
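
A sketch of the gating pattern such helper functions can apply, assuming
the rte_vect_get_max_simd_bitwidth() API and RTE_VECT_SIMD_256 value
described in the EAL patch further down this list:

#include <stdbool.h>
#include <rte_vect.h>
#include <rte_cpuflags.h>

/* Sketch (x86): take the AVX2 path only when both the CPU flag and the
 * configured max SIMD bitwidth allow it. */
static bool
avx2_path_allowed(void)
{
    return rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) &&
           rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256;
}
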
Ciara Power
13facf47d6 node: choose vector path at runtime
When choosing the vector path, max SIMD bitwidth is now checked to
ensure the vector path is suitable. To do this, the scalar function is
chosen by default in the struct, but at node initialisation time, this
function pointer is updated to the vector version if supported, and
if it is within the max SIMD bitwidth limit.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-10-19 16:45:02 +02:00
Ciara Power
209fd1984a net: check max SIMD bitwidth
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
path.

The vector path was initially chosen in RTE_INIT; however, this is no
longer suitable as we cannot check the max SIMD bitwidth at that time.
Default handlers are now chosen on initialisation; these default
handlers are used the first time the CRC calculation is called, and they
set the suitable handlers to be used going forward.

Suggested-by: Jasvinder Singh <jasvinder.singh@intel.com>
Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
2020-10-19 16:45:02 +02:00
Ciara Power
ad2ffbf6d4 efd: check max SIMD bitwidth
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
path.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
2020-10-19 16:45:02 +02:00
Ciara Power
c9cd5806f7 member: check max SIMD bitwidth
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU
enabled path.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
2020-10-19 16:45:02 +02:00
Ciara Power
eec67546aa distributor: check max SIMD bitwidth
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
path.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 16:45:02 +02:00
Ciara Power
580af30dd6 eal: control max SIMD bitwidth
This patch adds a max SIMD bitwidth EAL configuration. The API allows
for an app to set this value. It can also be set using EAL argument
--force-max-simd-bitwidth, which will lock the value and override any
modifications made by the app.

Each arch has a define for the default SIMD bitwidth value; this is used
on EAL init to set the config max SIMD bitwidth.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-10-19 16:45:02 +02:00
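
A hedged usage sketch, assuming the rte_vect_get_max_simd_bitwidth()/
rte_vect_set_max_simd_bitwidth() pair and the RTE_VECT_SIMD_* values
that this configuration exposes:

#include <stdio.h>
#include <rte_vect.h>

/* Sketch: cap vector paths at 256 bits, unless the user already locked a
 * value with the --force-max-simd-bitwidth EAL argument (in which case the
 * set call is assumed to fail and the locked value is kept). */
static void
limit_simd_width(void)
{
    if (rte_vect_set_max_simd_bitwidth(RTE_VECT_SIMD_256) != 0)
        printf("SIMD bitwidth locked by EAL, keeping %u bits\n",
               (unsigned int)rte_vect_get_max_simd_bitwidth());
}
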
Stephen Hemminger
17b347dab7 malloc: add alloc_size attribute to functions
By using the alloc_size() attribute the compiler can optimize
better and detect errors at compile time.

For example, GCC will reject one of the invalid allocation examples
in app/test/test_malloc.c because the allocation is outside the
limits of memory.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-19 16:25:43 +02:00
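
A sketch of the attribute on a hypothetical allocator (the DPDK wrapper
macro added by this patch is not reproduced here):

#include <stdlib.h>
#include <string.h>

/* alloc_size(1) tells GCC/Clang that the returned object is 'size' bytes,
 * so clearly out-of-bounds accesses on the result can be diagnosed at
 * compile time (e.g. by -Wstringop-overflow). */
__attribute__((malloc, alloc_size(1)))
static void *
my_malloc(size_t size)
{
    return malloc(size);
}

int
main(void)
{
    char *p = my_malloc(8);

    if (p != NULL)
        memset(p, 0, 8); /* a memset of 64 bytes here could be flagged */
    free(p);
    return 0;
}
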
Hemant Agrawal
f030bff72f bitrate: add free function
This patch adds a free function.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2020-10-19 16:08:36 +02:00
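
A hedged sketch of the create/free pairing, assuming the new function is
named rte_stats_bitrate_free() and releases what rte_stats_bitrate_create()
allocated:

#include <rte_bitrate.h>

/* Sketch only: the free function name is assumed from this patch's intent. */
static void
bitrate_lifecycle(void)
{
    struct rte_stats_bitrates *stats = rte_stats_bitrate_create();

    if (stats == NULL)
        return;
    /* ... register metrics, periodically call rte_stats_bitrate_calc() ... */
    rte_stats_bitrate_free(stats);
}
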
Stephen Hemminger
bb548625c6 eal/linux: add function to allow interruptible epoll
The existing definition of rte_epoll_wait retries if interrupted
by a signal. This behavior makes it hard to use rte_epoll_wait
in applications that want to use signals to do things like
exiting the polling loop and shutting down.

Since changing existing semantic might break applications, add
a new rte_epoll_wait_interruptible() function that does the
same thing as rte_epoll_wait but will return -1 and errno of EINTR
if it receives a signal.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Harman Kalra <hkalra@marvell.com>
2020-10-19 12:17:25 +02:00
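
A usage sketch, assuming the new call keeps the rte_epoll_wait()
parameter list (epfd, events, maxevents, timeout) and is visible through
the EAL interrupt header:

#include <errno.h>
#include <rte_interrupts.h> /* EAL epoll declarations on Linux (header assumed) */

/* Sketch: wait for events, but let a signal (e.g. SIGINT handled elsewhere)
 * break out of the wait so the caller can shut down. */
static int
wait_for_events(int epfd, struct rte_epoll_event *events, int maxevents)
{
    int n = rte_epoll_wait_interruptible(epfd, events, maxevents, -1);

    if (n < 0 && errno == EINTR)
        return 0; /* interrupted by a signal: caller decides to exit */
    return n;
}
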
Lukasz Wojciechowski
20fa39d230 distributor: fix clearing returns buffer
The patch clears the distributor's returns buffer
in clear_returns() by setting start and count to 0.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
91d6b8235e distributor: fix flushing in flight packets
rte_distributor_flush() uses the total_outstanding()
function to determine whether it should still wait
for packets being processed. However, in burst mode
only backlog packets were counted.

This patch fixes that issue by also counting in-flight
packets. There are also some fixes to properly keep
count of in-flight packets for each worker in bufs[].count.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
626ceefbf4 distributor: fix scalar matching
Fix improper indexes while comparing tags.
In the find_match_scalar() function:
* j iterates over the flow tags of the following packets;
* w iterates over the backlog or in-flight tag positions.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
ed4be82d9e distributor: fix API documentation
After introducing the burst API, some artefacts from the legacy
single API remained in the API documentation.
Also, the documented return values of rte_distributor_poll_pkt()
did not match the implementation.

Fixes: c0de0eb82e40 ("distributor: switch over to new API")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
f25fe0d5e3 distributor: fix return pkt calls in single mode
In the legacy single version of the distributor, synchronization
requires a continuous exchange of buffers between the distributor
and the workers. Empty buffers are sent if only handshake
synchronization is required.
However, calls to rte_distributor_return_pkt()
with 0 buffers in single mode were ignored and not passed to the
legacy algorithm implementation, causing a lack of synchronization.

This patch fixes the issue by passing NULL as the buffer, which is
a valid way of sending just synchronization handshakes
in single mode.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
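
With this fix, a worker in single mode that only needs the
synchronization handshake can return an empty set, e.g. (sketch):

#include <rte_distributor.h>

/* Sketch: a worker with no packets to hand back still performs the
 * handshake by returning zero buffers. */
static void
worker_sync_only(struct rte_distributor *d, unsigned int worker_id)
{
    rte_distributor_return_pkt(d, worker_id, NULL, 0);
}
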
Lukasz Wojciechowski
480d5a7c81 distributor: handle worker shutdown in burst mode
The burst version of the distributor implementation was missing proper
handling of worker shutdown. A worker processing packets received
from the distributor can call the rte_distributor_return_pkt() function
to inform the distributor that it wants no more packets. Subsequent calls
to rte_distributor_request_pkt() or rte_distributor_get_pkt(), however,
should inform the distributor that new packets are requested again.

Because of the missing implementation, even after a worker informed
the distributor that it was returning its last packets, new packets
were still sent to it, causing deadlocks as no one could get them on
the worker side.

This patch adds handling of worker shutdown in the following way:
1) It fixes the usage of the RTE_DISTRIB_VALID_BUF handshake flag. This
flag was formerly unused in the burst implementation and is now used
for marking valid packets in retptr64, replacing the invalid use
of the RTE_DISTRIB_RETURN_BUF flag.
2) It uses RTE_DISTRIB_RETURN_BUF as a worker-to-distributor handshake
in retptr64 to indicate that the worker has shut down.
3) A worker that shuts down also blocks bufptr for itself with the
RTE_DISTRIB_RETURN_BUF flag, allowing the distributor to retrieve any
in-flight packets.
4) When the distributor receives information about the shutdown of a
worker, it: marks the worker as not active; retrieves any in-flight and
backlog packets and redistributes them to other workers; and unlocks
bufptr64 by clearing the RTE_DISTRIB_RETURN_BUF flag, allowing future
use if the worker requests new packets again.
5) It does not allow sending packets to, or adding packets to the
backlog of, inactive workers. Such workers are also ignored when matching.
6) It adjusts the calls to handle_returns() and the tag matching
procedure to react to possible activation and deactivation of workers.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
6bd951b482 distributor: fix buffer use after free
rte_distributor_request_pkt and rte_distributor_get_pkt dereferenced
the oldpkt parameter in RTE_DIST_ALG_SINGLE mode even if the number
of buffers returned from worker to distributor was 0.

This patch passes NULL to the legacy API when the number of returned
buffers is 0. This allows passing NULL as the oldpkt parameter.

The distributor tests are also updated to pass NULL as oldpkt and
0 as the number of returned packets where packets are not returned.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
5acce079e7 distributor: fix handshake deadlock
Synchronization of data exchange between the distributor and worker cores
is based on 2 handshakes: retptr64 for returning mbufs from workers
to the distributor, and bufptr64 for passing mbufs to workers.

Without a proper order of verifying those 2 handshakes, a deadlock may
occur. This can happen when a worker core wants to return mbufs
and waits for the retptr handshake to be cleared, while the distributor
core waits for bufptr to send mbufs to the worker.

This can happen because a worker core first returns mbufs to the
distributor and later gets new mbufs, while the distributor first
releases mbufs to the worker and later handles the returned packets.

This patch removes the possibility of the deadlock by always taking care
of returning packets first on the distributor side, and by handling
returned packets while waiting to release new ones.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
bea84d5592 distributor: fix handshake synchronization
The rte_distributor_return_pkt function, which is run on worker cores,
must wait for the distributor core to clear the handshake on retptr64
before using those buffers. While the handshake is set, the distributor
core controls the buffers, and any operation on the worker side might
overwrite buffers which have not been read yet.
The same situation appears in the legacy single distributor. The function
rte_distributor_return_pkt_single shouldn't modify bufptr64 until the
handshake on it is cleared by the distributor lcore.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2020-10-19 10:57:17 +02:00
Akhil Goyal
e30b2833c4 security: update session create API
The API ``rte_security_session_create`` took only a single
mempool for the session and the session private data. So the
application needed to create a mempool for twice the number of
sessions needed, which also led to memory wastage as the
session private data needs more memory than the session itself.
Hence the API is modified to take two mempool pointers
- one for the session and one for the private data.
This is very similar to the crypto-based session create APIs.

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Reviewed-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Tested-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
2020-10-19 09:54:54 +02:00
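
A hedged sketch of the updated call, assuming the new prototype appends
a second mempool parameter for the session private data:

#include <rte_security.h>
#include <rte_mempool.h>

/* Sketch: both mempools are created by the application; the parameter order
 * (instance, conf, session mempool, private-data mempool) is assumed from
 * the description above. */
static struct rte_security_session *
create_session(struct rte_security_ctx *ctx,
               struct rte_security_session_conf *conf,
               struct rte_mempool *sess_mp,
               struct rte_mempool *priv_mp)
{
    return rte_security_session_create(ctx, conf, sess_mp, priv_mp);
}
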
Churchill Khangar
ccbbb7f23f pipeline: fix SWX jump instruction parsing
This patch fixes the parsing of the "jump if header not valid" instruction.

Fixes: b3947e25bed4 ("pipeline: introduce SWX jump and return instructions")

Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-19 09:20:25 +02:00
Venkata Suresh Kumar P
1e17748b0a pipeline: fix SWX jump instruction population
This patch fixes jump next instruction pointer population.

Fixes: b3947e25bed4 ("pipeline: introduce SWX jump and return instructions")

Signed-off-by: Venkata Suresh Kumar P <venkata.suresh.kumar.p@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-19 09:20:25 +02:00
Ferruh Yigit
3d98f921fb ethdev: unify prefix for static functions and variables
Prefix static functions and variables with 'eth_dev'.

For some, the 'rte_' prefix is dropped, and for others 'eth_dev' is added.
This is useful to differentiate public and static functions/variables.

The cleanup also provides consistent naming to guide the naming of new
additions.

No functional change, only naming.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Gaetan Rivet <grive@u256.net>
2020-10-17 01:14:50 +02:00
Andrew Rybchenko
f6c763fbed ethdev: unify error code if port ID is invalid
Use ENODEV as the error code if specified port ID is invalid.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-17 01:14:50 +02:00
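
A minimal sketch of the convention at an API entry point, using the
public port validity check:

#include <errno.h>
#include <rte_ethdev.h>

/* Sketch: invalid port IDs consistently map to -ENODEV. */
static int
do_something_on_port(uint16_t port_id)
{
    if (!rte_eth_dev_is_valid_port(port_id))
        return -ENODEV;
    /* ... proceed with a valid port ... */
    return 0;
}
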
Ferruh Yigit
a72cb3e765 doc: announce queue stats moving to xstats
Queue stats will be moved from basic stats to xstats.

It will be the PMDs' responsibility to fill queue stats based on the
number of queues they have.

Until all PMDs implement the xstats, a temporary
'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS' device flag is created. PMDs switched
to the xstats should clear this flag to bypass the ethdev-layer autofill
for queue stats.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-16 23:27:15 +02:00
Ferruh Yigit
f30e69b41f ethdev: add device flag to bypass auto-filled queue xstats
Queue stats are stored in 'struct rte_eth_stats' as array and array size
is defined by 'RTE_ETHDEV_QUEUE_STAT_CNTRS' compile time flag.

As a result of a technical board discussion, it was decided to remove
the queue statistics from 'struct rte_eth_stats' in the long term.

Instead, PMDs should represent the queue statistics via xstats; this
gives more flexibility on the number of queues supported.

Currently, queue stats in the xstats are filled by the ethdev layer using
some basic stats; when queue stats are removed from basic stats, the
responsibility to fill the relevant xstats will be pushed to the PMDs.

During the switch period, a temporary 'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS'
device flag is created. Initially all PMDs using xstats set this flag.
PMDs that implement queue stats in the xstats should clear the flag.

When all PMDs switch to the xstats for the queue stats, the queue stats
related fields will be removed from 'struct rte_eth_stats', as well as
the 'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS' flag.
Later, the 'RTE_ETHDEV_QUEUE_STAT_CNTRS' compile-time flag can also be
removed.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 23:27:15 +02:00
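
A hedged sketch of how a PMD that already fills queue xstats itself
might opt out, assuming the flag lives in the device data's dev_flags
field:

#include <rte_ethdev.h>

/* Sketch: opt out of the ethdev-layer queue-stats autofill. The dev_flags
 * location is an assumption for illustration; real drivers use the internal
 * driver headers. */
static void
pmd_disable_queue_xstats_autofill(struct rte_eth_dev *dev)
{
    dev->data->dev_flags &= ~RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
}
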
Ivan Ilchenko
62024eb827 ethdev: change stop operation callback to return int
Change the eth_dev_stop_t return value from void to int.
Make eth_dev_stop_t implementations across all drivers return
negative errno values in case of error conditions.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 22:26:41 +02:00
Ivan Ilchenko
58af59172b ethdev: allow stop function to return an error
Change rte_eth_dev_stop() return value from void to int
and return negative errno values in case of error conditions.
Also update the usage of the function in ethdev according to
the new return type.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 22:26:41 +02:00
Thomas Monjalon
8a5a0aad5d ethdev: allow close function to return an error
The API function rte_eth_dev_close() was returning void.
The return type is changed to int to allow notifying of errors.

If an error happens during a close operation,
the status of the port is undefined,
with as many resources as possible having been freed.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Liron Himi <lironh@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 22:26:41 +02:00
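
With stop and close both returning int now, teardown code can propagate
failures, e.g. (sketch):

#include <stdio.h>
#include <rte_ethdev.h>

/* Sketch: check the newly added return values during port teardown. */
static int
shutdown_port(uint16_t port_id)
{
    int ret = rte_eth_dev_stop(port_id);

    if (ret != 0)
        printf("stop of port %u failed: %d\n", (unsigned int)port_id, ret);

    ret = rte_eth_dev_close(port_id);
    if (ret != 0)
        printf("close of port %u failed: %d\n", (unsigned int)port_id, ret);
    return ret;
}
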
Thomas Monjalon
0607dadf98 ethdev: reset all when releasing a port
The function rte_eth_dev_release_port() is partially resetting
the struct rte_eth_dev. The drivers were completing this reset
with more pointers set to NULL in the close or remove operations.

More pointers are reset at ethdev level,
and some redundant assignments are removed from PMDs.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-16 22:26:41 +02:00
Thomas Monjalon
b8f5d2ae75 ethdev: remove forcing stopped state upon close
When closing a port, it is supposed to be already stopped,
and marked as such with "dev_started" state zeroed by the stop API.

Resetting "dev_started" before calling the driver close operation
was hiding the case of not properly stopped port being closed.
The flag "dev_started" is not changed anymore in "rte_eth_dev_close()".

In case the "dev_stop" function is called from "dev_close",
bypassing "rte_eth_dev_stop()" API,
the "dev_started" state must be explicitly reset in the PMD
in order to keep the same behaviour.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 22:26:41 +02:00
Viacheslav Ovsiienko
4ff702b5df ethdev: introduce Rx buffer split
The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects - the memory pools where segments
are allocated from, the segment lengths, the memory attributes
like external buffers registered for DMA, etc.

In the receiving direction, the datapath is much less flexible:
an application can only specify the memory pool to configure the
receiving queue, and nothing more. In order to extend the receiving
datapath capabilities, it is proposed to add a way to provide
extended information on how to split the packets being received.

The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in the device
capabilities is introduced as the way for a PMD to report to the
application that it supports splitting Rx packets into configurable
segments. Prior to invoking the rte_eth_rx_queue_setup() routine,
the application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.

The following structure is introduced to specify the Rx packet
segment for RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload:

struct rte_eth_rxseg_split {
    struct rte_mempool *mp; /* memory pool to allocate segment from */
    uint16_t length;   /* segment maximal data length, configures "split point" */
    uint16_t offset;   /* data offset from beginning of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The segment descriptions are added to the rte_eth_rxconf structure:
   rx_seg - pointer to the array of segment descriptions; each element
            describes the memory pool, maximal data length, and initial
            data offset from the beginning of the data buffer in the mbuf.
            This array allows specifying different settings for each
            segment individually.
   rx_nseg - number of elements in the array

If the extended segment descriptions are provided with these new
fields, the mp parameter of rte_eth_rx_queue_setup must be
specified as NULL to avoid ambiguity.

There are two options to specify the Rx buffer configuration:
- mp is not NULL, rx_conf.rx_nseg is zero: this is the compatible
  configuration, which follows the existing implementation and
  provides a single pool and no description for segment sizes
  and offsets.
- mp is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not
  zero: this provides the extended configuration, individually for
  each segment.

If the Rx queue is configured with the new settings, the packets being
received will be split into multiple segments pushed to mbufs
with the specified attributes. The PMD will split the received packets
into multiple segments according to the specification in the
description array.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=2
    seg1 - pool1, len1=20B, off1=128B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

The packet 46 bytes long will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B long @ 128 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

The packet 1500 bytes long will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B @ 128 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3

The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
configured to support the new buffer split feature (if rx_nseg
is greater than one).

The split limitations imposed by the underlying PMD are reported
in the newly introduced rte_eth_dev_info->rx_seg_capa field.

The new approach allows splitting the ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
external buffers attached to mbufs allocated from
different memory pools. The memory attributes for the split
parts may also differ - for example, the application data
may be pushed into external memory located on a dedicated
physical device, say a GPU or NVMe. This improves the
flexibility of the DPDK receiving datapath while preserving
compatibility with the existing API.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 22:26:40 +02:00
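
A hedged configuration sketch, using the field and flag names given
above and assuming rx_seg is carried as an array of union rte_eth_rxseg
entries wrapping rte_eth_rxseg_split:

#include <string.h>
#include <rte_ethdev.h>
#include <rte_mempool.h>

/* Sketch: split each received packet into a 64-byte part from hdr_pool and
 * the remainder from data_pool. The union wrapping of rte_eth_rxseg_split
 * and the per-queue offload handling are assumptions for illustration. */
static int
setup_split_queue(uint16_t port_id, uint16_t queue_id, uint16_t nb_desc,
                  struct rte_mempool *hdr_pool, struct rte_mempool *data_pool)
{
    union rte_eth_rxseg segs[2];
    struct rte_eth_rxconf rx_conf;

    memset(segs, 0, sizeof(segs));
    memset(&rx_conf, 0, sizeof(rx_conf));

    segs[0].split.mp = hdr_pool;
    segs[0].split.length = 64;     /* split point after the headers */
    segs[1].split.mp = data_pool;
    segs[1].split.length = 2048;   /* payload part */

    rx_conf.rx_seg = segs;
    rx_conf.rx_nseg = 2;
    rx_conf.offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT |
                       RTE_ETH_RX_OFFLOAD_SCATTER;

    /* mp must be NULL when the extended description is used */
    return rte_eth_rx_queue_setup(port_id, queue_id, nb_desc,
                                  rte_eth_dev_socket_id(port_id),
                                  &rx_conf, NULL);
}
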
Eli Britstein
9ec0f97e02 ethdev: add tunnel offload model
rte_flow API provides the building blocks for vendor-agnostic flow
classification offloads. The rte_flow "patterns" and "actions"
primitives are fine-grained, thus enabling DPDK applications the
flexibility to offload network stacks and complex pipelines.
Applications wishing to offload tunneled traffic are required to use
the rte_flow primitives, such as group, meta, mark, tag, and others to
model their high-level objects.  The hardware model design for
high-level software objects is not trivial.  Furthermore, an optimal
design is often vendor-specific.

When hardware offloads tunneled traffic in multi-group logic,
partially offloaded packets may arrive at the application after they
were modified in hardware. In this case, the application may need to
restore the original packet headers. Consider the following sequence:
The application decaps a packet in one group and jumps to a second
group where it tries to match on a 5-tuple, which will miss and send
the packet to the application. In this case, the application does not
receive the original packet but a modified one. Also, in this case,
the application cannot match on the outer header fields, such as VXLAN
vni and 5-tuple.

There are several possible ways to use rte_flow "patterns" and
"actions" to resolve the issues above. For example:
1 Mapping headers to hardware registers using the
rte_flow_action_mark/rte_flow_action_tag/rte_flow_set_meta objects.
2 Apply the decap only at the last offload stage after all the
"patterns" were matched and the packet will be fully offloaded.
Every approach has its pros and cons and is highly dependent on the
hardware vendor.  For example, some hardware may have a limited number
of registers, while other hardware may not support inner actions and
must decap before accessing inner headers.

The tunnel offload model resolves these issues. The model goals are:
1 Provide a unified application API to offload tunneled traffic that
is capable of matching on outer headers after decap.
2 Allow the application to restore the outer header of partially
offloaded packets.

The tunnel offload model does not introduce new elements to the
existing RTE flow model and is implemented as a set of helper
functions.

For the application to work with the tunnel offload API it
has to adjust flow rules in multi-table tunnel offload in the
following way:
1 Remove explicit call to decap action and replace it with PMD actions
obtained from rte_flow_tunnel_decap_and_set() helper.
2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
other rules in the tunnel offload sequence.

VXLAN Code example:

Assume application needs to do inner NAT on the VXLAN packet.
The first  rule in group 0:

flow create <port id> ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and
jumps to group 3.  In group 3 the packet will miss since there is no
flow to match and will be sent to the application.  Application  will
call rte_flow_get_restore_info() to get the packet outer header.

Application will insert a new rule in group 3 to match outer and inner
headers:

flow create <port id> ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions  set_ipv4_dst  186.1.1.1 / queue index 3 / end

The result of these rules is that a VXLAN packet with vni=10, outer
IPv4 dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received
decapped on queue 3 with IPv4 dst=186.1.1.1

Note: The packet in group 3 is considered decapped. All actions in
that group will be done on the header that was inner before decap. The
application may specify an outer header to be matched on. It is the
PMD's responsibility to translate these items to outer metadata.

API usage:

/**
 * 1. Initiate RTE flow tunnel object
 */
const struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
}

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable application uses to
 * compile actions array
 */
struct rte_flow_action **pmd_actions;
rte_flow_tunnel_decap_and_set(&tunnel, &pmd_actions,
                              &num_pmd_actions, &error);
/**
 * 3. offload the first  rule
 * matching on VXLAN traffic and jumps to group 3
 * (implicitly decaps packet)
 */
app_actions  =   jump group 3
rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
  * 4. after flow creation application does not need to keep the
  * tunnel action resources.
  */
rte_flow_tunnel_action_release(port_id, pmd_actions,
                               num_pmd_actions);
/**
  * 5. After partially offloaded packet miss because there was no
  * matching rule handle miss on group 3
  */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload NAT rule:
 */
app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
            vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }

rte_flow_tunnel_match(&info.tunnel, &pmd_items,
                      &num_pmd_items,  &error);
rule_items = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id,
                             pmd_items, num_pmd_items);

References
1. https://mails.dpdk.org/archives/dev/2020-June/index.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:19 +02:00
Gregory Etelson
5d1bff8fe2 ethdev: allow negative values in flow rule types
RTE flow items & actions use positive values for the item & action type.
Negative values are reserved for PMD private types. PMD
items & actions are usually not exposed to the application and are not
used to create RTE flows.

The patch gives applications with access to PMD flow
items & actions the ability to integrate RTE and PMD items & actions
and use them to create flow rules.

The RTE flow item or action conversion library accepts only positive
known element types with predefined sizes. Private PMD items and
actions do not fit into this scheme because PMD type values are
negative, each PMD has its own type numeration, and element types and
their sizes are not visible at the RTE level. To resolve these
limitations the patch proposes this solution:
1. A PMD can expose elements of pointer size only. RTE flow
   conversion functions will use the pointer size for each configuration
   object in the private PMD element they process;
2. RTE flow verification will not reject elements with negative type.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:19 +02:00
Dekel Peled
09315fc838 ethdev: add VLAN attributes to ethernet and VLAN items
This patch implements the change proposed in RFC [1], adding dedicated
fields to the ETH and VLAN item structs, to clearly define the required
characteristics of a packet and enable precise match criteria.
Documentation is updated accordingly.

[1] https://mails.dpdk.org/archives/dev/2020-August/177536.html

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-16 19:48:19 +02:00
Bing Zhao
bc6e15de08 ethdev: add hairpin queue operations
Every hairpin queue pair should be configured properly, and the
connection between Tx and Rx queues should be established, before the
hairpin function works. In single port hairpin mode, the queues of
each pair belong to the same device. It is easy to get the hardware
and software information of each queue and configure the hairpin
connection with such information. In two ports hairpin mode, it is
not easy, or is inappropriate, to access one queue's information from
another device.

Since hairpin is configured per queue pair, three new APIs are
introduced, and they are internal, for PMD use only.

The peer update API helps to pass one queue's information to the
peer queue and get the peer's information back for the next step.
The peer bind API configures the current queue with the peer's
information. For each hairpin queue pair, this API may need to be
called twice to configure the Tx, Rx queues separately.
The peer unbind API resets the current queue configuration and state
to disconnect it from the peer queue. Also, it may need to be called
twice to disconnect Tx, Rx queues from each other.

Some parameters of the above APIs might not be mandatory; it
depends on the PMD implementation.

The structure of `rte_hairpin_peer_info` is only a declaration and
the actual members will be defined in each PMD when being used.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-16 19:48:19 +02:00
Bing Zhao
9a9ba10ada ethdev: add function to get hairpin peer ports list
After hairpin queues are configured, in general, the application will
maintain the ports topology and even the queues configuration for
the hairpin. But sometimes it will not.

If there is no hot-plug, it is easy to bind and unbind hairpin among
all the ports. The application can just connect or disconnect the
hairpin egress ports to/from all the probed ingress ports. Then all
the connections could be handled properly.

But with hot-plug / hot-unplug, one port could be probed and removed
dynamically. With two ports hairpin, all the connections from and to
this port should be handled after start(bind) or before stop(unbind).
It is necessary to know the hairpin topology with this port.

This function returns the list of ports, with the actual number of
peer ports, after configuration. Either the peer Rx or Tx ports can
be retrieved with this function call.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-16 19:48:19 +02:00
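
A hedged usage sketch, assuming the prototype
rte_eth_hairpin_get_peer_ports(port_id, peer_ports, len, direction)
returning the number of peers or a negative errno, with direction
selecting the Tx or Rx side:

#include <stdio.h>
#include <rte_ethdev.h>

/* Sketch: list the hairpin peers of one direction of a port, e.g. before
 * unbinding on hot-unplug. Prototype and the meaning of 'direction' are
 * assumptions based on the description above. */
static void
show_hairpin_peers(uint16_t port_id, uint32_t direction)
{
    uint16_t peers[RTE_MAX_ETHPORTS];
    int i, nb;

    nb = rte_eth_hairpin_get_peer_ports(port_id, peers,
                                        RTE_MAX_ETHPORTS, direction);
    if (nb < 0) {
        printf("failed to get peers of port %u: %d\n",
               (unsigned int)port_id, nb);
        return;
    }
    for (i = 0; i < nb; i++)
        printf("port %u hairpins to peer port %u\n",
               (unsigned int)port_id, (unsigned int)peers[i]);
}
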