Commit Graph

6305 Commits

Author SHA1 Message Date
Patrick Fu
47958f7cbf vhost: fix async completion of multi-seg packets
In async enqueue copy, a packet could be split into multiple copy
segments. When polling the copy completion status, the current async data
path assumes the async device callbacks are aware of the packet
boundary and return completed segments only if all segments belonging
to the same packet are done. Such an assumption is not generic to common
async devices and may degrade copy performance if async callbacks
have to implement it in software.

This patch adds tracking of the completed copy segments on the vhost side.
If the async copy device reports partial completion of a packet, only
the vhost internal record is updated and the vring status stays unchanged
until the remaining segments of the packet are also finished. The async
copy device no longer needs to care about the packet boundary.
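
A minimal sketch of the idea, with hypothetical field and helper names
(illustrative only, not the exact patch code):

    #include <stdint.h>

    /* Per-packet bookkeeping of how many copy segments have completed.
     * Only when all segments are done may the vring be updated. */
    struct async_pkt_record {
        uint16_t nr_segs;    /* segments the packet was split into */
        uint16_t done_segs;  /* segments reported done by the device */
    };

    static void
    on_segments_done(struct async_pkt_record *rec, uint16_t done,
                     uint16_t *nr_pkts_done)
    {
        rec->done_segs += done;
        if (rec->done_segs >= rec->nr_segs)
            (*nr_pkts_done)++;  /* packet complete: vring may advance */
        /* otherwise only this internal record changes; the vring
         * status stays untouched */
    }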

Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-07-21 16:54:58 +02:00
Patrick Fu
5c7ddd6b14 vhost: fix missing virtqueue status check in async path
The vring should not be touched if the vq is disabled. This patch adds a
vq status check in async enqueue polling to avoid accessing a disabled
queue.

Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-07-21 16:50:29 +02:00
Patrick Fu
6a82bceb56 vhost: fix missing device pointer validity check
This patch adds a check of the dev pointer in the vhost async enqueue
completion poll. If a NULL dev pointer is detected, the poll function
returns immediately.

Coverity issue: 360839
Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-07-21 16:50:29 +02:00
Andrew Rybchenko
520059a41a net: check fragmented headers in non-debug as well
Pseudo-header checksum calculation requires contiguous headers.
There are no formal requirements on the data location and mbuf
structure that may be used by the application.

Since

commit dfc6b2fd8d ("mbuf: remove Intel offload checks from generic API")

fragmented header checks are done inside
rte_net_intel_cksum_flags_prepare() only in RTE_LIBRTE_ETHDEV_DEBUG
builds, because they were moved from rte_validate_tx_offload(), which is
called under debug only.

Make the corresponding check run in non-debug builds as well to avoid
bad accesses and incorrect checksum calculation, and to return an
appropriate error from Tx prepare.

Make the no-offloads check more precise and perform it in non-debug
builds as well, to avoid the contiguous-headers check and a Tx prepare
failure when they are not actually required.
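
A minimal sketch of the kind of check involved (illustrative only; the
real check lives in rte_net_intel_cksum_flags_prepare()):

    #include <rte_mbuf.h>

    /* The L2/L3/L4 headers used for the pseudo-header checksum must
     * all sit in the first, contiguous mbuf segment; otherwise Tx
     * prepare should fail instead of reading past the segment. */
    static inline int
    tx_headers_are_contiguous(const struct rte_mbuf *m)
    {
        uint32_t hdr_len = m->l2_len + m->l3_len + m->l4_len;

        return rte_pktmbuf_data_len(m) >= hdr_len;
    }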

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-21 13:54:54 +02:00
Ruifeng Wang
4cdd49f9b0 lpm: report error when defer queue overflows
Coverity complains about the unchecked return value of
rte_rcu_qsbr_dq_enqueue(). By default, the defer queue size is big enough
to hold all tbl8 groups. When the enqueue fails, return an error to the
user to indicate a system issue.

Coverity issue: 360832
Fixes: 8a9f8564e9 ("lpm: implement RCU rule reclamation")

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-07-21 20:48:40 +02:00
Phil Yang
db48bae253 mbuf: use C11 atomic builtins for refcnt
Use C11 atomic builtins with explicit ordering instead of rte_atomic
ops which enforce unnecessary barriers on aarch64.
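
A minimal sketch of the change on the refcnt update, assuming a uint16_t
reference counter (illustrative, not the exact patch code):

    #include <stdint.h>

    /* Before: rte_atomic16 ops imply a full barrier. After: a C11
     * builtin with explicit ordering, letting aarch64 use a cheaper
     * atomic add. */
    static inline uint16_t
    refcnt_update(uint16_t *refcnt, int16_t value)
    {
        return __atomic_add_fetch(refcnt, (uint16_t)value,
                                  __ATOMIC_ACQ_REL);
    }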

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Suggested-by: Dodji Seketeli <dodji@redhat.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-21 10:30:35 +02:00
Ciara Power
2a7d0b872f telemetry: add upper limit on connections
This patch limits the number of client connections to the new telemetry
socket. The limit is set to 10.

Signed-off-by: Ciara Power <ciara.power@intel.com>
2020-07-19 15:36:37 +02:00
Phil Yang
672a150563 eal: add wrapper for C11 atomic thread fence
Provide a wrapper for __atomic_thread_fence builtins to support
optimized code for __ATOMIC_SEQ_CST memory order for x86 platforms.
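
A sketch of the wrapper's idea (illustrative; see the x86 rte_atomic
headers for the actual code):

    #include <rte_atomic.h>

    /* On x86, a __ATOMIC_SEQ_CST fence can use the cheaper full
     * barrier already provided by rte_smp_mb(); weaker orders fall
     * through to the compiler builtin. */
    static inline void
    atomic_thread_fence_sketch(int memorder)
    {
        if (memorder == __ATOMIC_SEQ_CST)
            rte_smp_mb();
        else
            __atomic_thread_fence(memorder);
    }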

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-17 16:00:30 +02:00
Ciara Power
9683022930 metrics: fix header installation with meson
If Jansson was found, the headers list was overwritten when including
rte_metrics_telemetry.h, which prevented rte_metrics.h from being
installed. This is now fixed to append to headers, rather than overwrite,
so that both headers are installed when Jansson is present.

Fixes: c5b7197f66 ("telemetry: move some functions to metrics library")
Cc: stable@dpdk.org

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-07-17 16:00:30 +02:00
Honnappa Nagarahalli
8831678b51 eal: change the log level for test asserts
Change the log level for RTE_TEST_ASSERT macro to error to help
log errors while running test cases.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
2020-07-17 10:47:56 +02:00
Ferruh Yigit
353162537f lpm: fix build dependency on RCU library
'librte_rcu' is now a dependency of the 'librte_lpm' library; this
dependency should be reflected in the build system.

Fixes: 8a9f8564e9 ("lpm: implement RCU rule reclamation")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-07-15 13:15:06 +02:00
Bing Zhao
d164c609e7 ethdev: add eCPRI key fields to flow API
Add a new item "rte_flow_item_ecpri" in order to match the eCPRI header.

eCPRI is a packet-based protocol used in the fronthaul interface of
5G networks. The header format definition can be found in the
specification via the link below:
https://www.gigalight.com/downloads/standards/ecpri-specification.pdf

An eCPRI message can be carried over the Ethernet layer (802.1Q is also
supported) or over the UDP layer. The message header format is the same
in both variants.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-13 02:11:30 +02:00
Renata Saiakhova
c9c74288f0 ethdev: add function to release HW rings
Free the previously allocated memzone for HW rings.

Signed-off-by: Renata Saiakhova <renata.saiakhova@ekinops.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-07-11 06:18:54 +02:00
Viacheslav Ovsiienko
9da82e8d8b mbuf: introduce accurate packet Tx scheduling
Some networks require precise traffic timing management. The ability
to send (and, generally speaking, receive) packets at a precisely
specified moment in time makes it possible to support Time Division
Multiplexing connections on a contemporary general-purpose NIC without
involving auxiliary hardware. For example, support for the O-RAN
Fronthaul interface is one of the promising uses of precise time
management for egress packets.

The main objective of this patchset is to specify how applications can
provide the moment at which packet transmission must start, and to give
a preliminary description of the support for this feature on the mlx5
PMD side [1].

A new dynamic timestamp field is proposed. It provides some timing
information; the units and time reference (initial phase) are not
explicitly defined but always stay the same for a given port.
Some devices allow querying the current device timestamp via
rte_eth_read_clock(). The dynamic timestamp flag tells whether the
field contains an actual timestamp value. For packets being sent,
this value can be used by the PMD to schedule packet transmission.

The device clock is an opaque entity; its units and frequency are
vendor specific and might depend on hardware capabilities and
configuration. It might (or might not) be synchronized with real time
via PTP, and might (or might not) be synchronous with the CPU clock
(for example, if the NIC and CPU share the same clock source there
might be no drift between the NIC and CPU clocks), etc.

After the supposed deprecation and obsoleting of the PKT_RX_TIMESTAMP
flag and the fixed timestamp field, this dynamic flag and field might be
used to manage timestamps on the receive datapath as well. Having
dedicated flags for Rx/Tx timestamps allows applications not to perform
an explicit flag reset on forwarding and not to promote received
timestamps to the transmit datapath by default. The static
PKT_RX_TIMESTAMP is considered a candidate to become a dynamic flag, and
this move should be discussed.

When the PMD sees "rte_dynfield_timestamp" set on a packet being sent,
it tries to synchronize the moment the packet appears on the wire with
the specified packet timestamp. If the specified timestamp is in the
past, it should be ignored; if it is in the distant future, it should be
capped at some reasonable value (in the range of seconds). These specific
cases ("too late" and "distant future") can optionally be reported via
device xstats to help applications detect time-related problems.

No packet reordering according to timestamps is expected, neither
within a packet burst nor between packets; it is entirely the
application's responsibility to generate packets and their timestamps
in the desired order. The timestamp can be put only in the first packet
of a burst, providing scheduling for the entire burst.

The PMD reports the ability to synchronize packet sending on a
timestamp with a new offload flag.

This is a palliative and might be replaced with a new eth_dev API for
reporting/managing the supported dynamic flags and their related
features. That API would break ABI compatibility and cannot be
introduced at the moment, so it is postponed to 20.11.

For testing purposes it is proposed to update the testpmd "txonly"
forwarding mode routine. With this update, the testpmd application
generates packets and sets the dynamic timestamps according to the
specified time pattern if it sees that "rte_dynfield_timestamp" is
registered.

A new testpmd command is proposed to configure the sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst,
              specified in the device clock units. The number
              of packets in the burst is defined by the txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result, the bursts of packets will be transmitted with specific
delays between the packets within a burst and a specific delay between
the bursts. rte_eth_read_clock() is supposed to be used to get the
current device clock value and provide the reference for the timestamps.

[1] http://patches.dpdk.org/patch/73714/
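
A minimal sketch of how an application might use the proposed dynamic
field and flag (illustrative only; the dynflag name used below is an
assumption, not taken from this patch):

    #include <rte_mbuf.h>
    #include <rte_mbuf_dyn.h>

    static int ts_off = -1;
    static int ts_flag = -1;

    static int
    tx_timestamp_setup(void)
    {
        static const struct rte_mbuf_dynfield field_desc = {
            .name = "rte_dynfield_timestamp",
            .size = sizeof(uint64_t),
            .align = __alignof__(uint64_t),
        };
        static const struct rte_mbuf_dynflag flag_desc = {
            .name = "example_dynflag_tx_timestamp", /* hypothetical */
        };

        ts_off = rte_mbuf_dynfield_register(&field_desc);
        ts_flag = rte_mbuf_dynflag_register(&flag_desc);
        return (ts_off < 0 || ts_flag < 0) ? -1 : 0;
    }

    /* Store the desired device-clock send time and mark the field as
     * valid so the PMD schedules the packet accordingly. */
    static void
    tx_timestamp_set(struct rte_mbuf *m, uint64_t tx_time)
    {
        *RTE_MBUF_DYNFIELD(m, ts_off, uint64_t *) = tx_time;
        m->ol_flags |= 1ULL << ts_flag;
    }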

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-11 06:18:54 +02:00
Ivan Malov
5cf04fd15a net: use named constants for deprecated QinQ TPIDs
Add named constants for deprecated QinQ TPIDs.
Update drivers which have already been using existing
TPID named constants from librte_net to use the
new named constants rather than magic numbers.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-11 06:18:53 +02:00
Junfeng Guo
d9a8bc6570 ethdev: add RSS types for IPv6 prefix
This patch defines new RSS offload types for IPv6 prefixes of 32, 40,
48, 56, 64 and 96 bits of both the SRC and DST IPv6 addresses.
Ref: https://tools.ietf.org/html/rfc6052.

Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-11 06:18:53 +02:00
Wei Hu (Xavier)
50ce3e7aec ethdev: fix VLAN offloads set if no relative capabilities
Currently, there is a potential problem when calling the API function
rte_eth_dev_set_vlan_offload to enable VLAN hardware offloads that the
driver does not support. If the PMD does not support certain VLAN
hardware offloads and does not check for them, the hardware setting will
not change, but the VLAN offloads in dev->data->dev_conf.rxmode.offloads
will be turned on anyway.

The hardware capabilities are supposed to be checked to decide whether
the relative callback needs to be called, just like the behaviour of the
API function rte_eth_dev_configure. Duplicated checks done in some PMDs
also need to be cleaned up. Note that this is a behaviour change for some
PMDs which used to simply ignore (with an error/warning log message)
unsupported VLAN offloads; now the call will fail.
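
A minimal sketch of the capability check from the application's point of
view (illustrative only):

    #include <errno.h>
    #include <rte_ethdev.h>

    /* Enable VLAN stripping only if the port actually advertises it. */
    static int
    enable_vlan_strip(uint16_t port_id)
    {
        struct rte_eth_dev_info dev_info;
        int ret = rte_eth_dev_info_get(port_id, &dev_info);

        if (ret != 0)
            return ret;
        if (!(dev_info.rx_offload_capa & DEV_RX_OFFLOAD_VLAN_STRIP))
            return -ENOTSUP; /* with this fix, ethdev itself also fails */

        return rte_eth_dev_set_vlan_offload(port_id,
                                            ETH_VLAN_STRIP_OFFLOAD);
    }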

Fixes: a4996bd89c ("ethdev: new Rx/Tx offloads API")
Fixes: 0ebce6129b ("net/dpaa2: support new ethdev offload APIs")
Fixes: f9416bbafd ("net/enic: remove VLAN filter handler")
Fixes: 4f7d9e383e ("fm10k: update vlan offload features")
Fixes: fdba3bf15c ("net/hinic: add VLAN filter and offload")
Fixes: b96fb2f0d2 ("net/i40e: handle QinQ strip")
Fixes: d4a27a3b09 ("nfp: add basic features")
Fixes: 56139e85ab ("net/octeontx: support VLAN filter offload")
Fixes: ba1b3b081e ("net/octeontx2: support VLAN offloads")
Fixes: d87246a437 ("net/qede: enable and disable VLAN filtering")
Cc: stable@dpdk.org

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
Acked-by: Xiaoyun Wang <cloud.wangxiaoyun@huawei.com>
Acked-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-11 06:18:53 +02:00
Wei Hu (Xavier)
36fbaaf30d ethdev: fix data room size verification in Rx queue setup
In the rte_eth_rx_queue_setup API function, the local variable named
mbp_buf_size, which is the data room size of the input parameter mp,
is checked to guarantee that each memory chunk used for the net device
in the mbuf is bigger than min_rx_bufsize. But if mbp_buf_size is
less than RTE_PKTMBUF_HEADROOM, the value of the following statement
will be a large number since mbp_buf_size is an unsigned value:
    mbp_buf_size - RTE_PKTMBUF_HEADROOM
As a result, it will cause a segmentation fault in this situation.

This patch fixes it by modifying the check condition to guarantee that
the local variable mbp_buf_size is bigger than RTE_PKTMBUF_HEADROOM.
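
A sketch of the corrected check (illustrative; names follow the
description above):

    #include <errno.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    /* Compare before subtracting: mbp_buf_size is unsigned and would
     * wrap around if it were smaller than RTE_PKTMBUF_HEADROOM. */
    static int
    check_rx_pool(struct rte_mempool *mp,
                  const struct rte_eth_dev_info *dev_info)
    {
        uint16_t mbp_buf_size = rte_pktmbuf_data_room_size(mp);

        if (mbp_buf_size < dev_info->min_rx_bufsize + RTE_PKTMBUF_HEADROOM)
            return -EINVAL; /* pool buffers too small for this device */
        return 0;
    }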

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-11 06:18:53 +02:00
Ferruh Yigit
cacd2bb786 ethdev: verify reserved HW ring
Function 'rte_eth_dma_zone_reserve()' returns an existing memzone based
on a name match, but the other requested attributes are discarded.
This may cause a driver to use a memzone with the wrong size or alignment.

Verify the size, alignment and socket_id of a matched memzone, and do not
use the memzone if any of these attributes is not satisfied.

It would be possible to free the existing memzone and allocate it again
with the requested attributes, but it is better that the caller does the
explicit free.
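
A sketch of the kind of verification meant here (illustrative only):

    #include <rte_common.h>
    #include <rte_memzone.h>

    /* Reuse a memzone found by name only if it also satisfies the
     * newly requested size, socket and alignment. */
    static const struct rte_memzone *
    reuse_if_compatible(const struct rte_memzone *mz, size_t size,
                        int socket_id, unsigned int align)
    {
        if (mz->len < size)
            return NULL;
        if (socket_id != SOCKET_ID_ANY && mz->socket_id != socket_id)
            return NULL;
        if (!rte_is_aligned(mz->addr, align))
            return NULL;
        return mz;
    }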

Reported-by: Renata Saiakhova <renata.saiakhova@ekinops.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-07-11 06:18:52 +02:00
Adrian Moreno
2025f4fe6c vhost: support virtio status message
This patch adds support for the new Virtio device get status
Vhost-user message.

The driver can send this new message to read the device status.

One of the uses of this message is to ensure the feature negotiation has
succeeded. According to the virtio spec, after completing the feature
negotiation, the driver sets the FEATURES_OK status bit and re-reads it
to ensure the device has accepted the features.

This patch also clears the FEATURES_OK status bit if the feature
negotiation has failed, to let the driver know about this failure.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin
41d201804c vhost: support virtio status
This patch adds support for the new Virtio device status
Vhost-user protocol feature.

Getting such information in the backend helps to know
when the driver is done with the device configuration
and so makes the initialization phase more robust.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin
a15f9dbba0 vhost: check vDPA configuration succeed
This patch checks whether the vDPA device configuration succeeded and
does not set the CONFIGURED flag if it did not.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin
b46a99c600 vhost: make some vDPA callbacks mandatory
Some of the vDPA callbacks have to be implemented
for vDPA to work properly.

This patch marks them as mandatory in the API doc and
simplifies the code calling these ops by removing
unnecessary checks that are now done at registration
time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin
2ab58f20db vhost: refactor virtio ready check
This patch is a small refactoring, as preliminary work
for adding Virtio status support.

No functional change here.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin
1c3df72bda vhost: fix virtio ready flag check
Before checking whether the device is ready, a check is done on whether
the RUNNING flag is set; then the READY flag is set if virtio_is_ready()
returns true.

While this does not seem to cause any issue, it makes more sense to check
whether the READY flag is already set rather than the RUNNING one.

Fixes: c0674b1bc8 ("vhost: move the device ready check at proper place")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
David Marchand
7d1af09e98 eal/linux: truncate thread name
pthread_setname_np refuses names larger than 16 bytes (\0 included).
Rather than return an error, truncate the name to this limit in the
rte_thread_setname helper.

Caught with ixgbe, which creates a control thread named
"ixgbe-link-handler":

Configuring Port 0 (socket 0)
EAL: Cannot set name for ctrl thread
...
EAL: Cannot set name for ctrl thread

Port 0: link state change event
...
EAL: Cannot set name for ctrl thread

Port 0: link state change event

Note: before this change, the thread would keep its original name, which
meant in my test for the ixgbe handler either "dpdk-testpmd" or
"eal-intr-thread".

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-07-11 15:03:47 +02:00
Ruifeng Wang
0f392d91b9 lpm: hide defer queue handle
There is no need to return the defer queue handle in rte_lpm_rcu_qsbr_add,
since enough flexibility has been provided to configure the defer queue.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2020-07-11 14:35:04 +02:00
Anatoly Burakov
20ab67608a power: add environment capability probing
Currently, there is no way to know if the power management env is
supported without trying to initialize it. The init API also does
not distinguish between failure due to some error and failure due to
power management not being available on the platform in the first
place.

Thus, add an API that makes it possible to probe support for a
specific power management environment.

Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-07-11 13:31:16 +02:00
Thomas Monjalon
9d2b245937 pci: keep API compatibility with mmap values
The function pci_map_resource() returns MAP_FAILED in case of error.
When replacing the call to mmap() by rte_mem_map(),
the error code became NULL, breaking the API.
This function is probably not used outside of DPDK,
but it is still a problem for two reasons:
	- the deprecation process was not followed
	- the Linux function pci_vfio_mmap_bar() is broken for i40e

The error code is reverted to the Unix value MAP_FAILED.
Windows needs to define this special value (-1 as in Unix).
After proper deprecation process, the API could be changed again
if really needed.

Because of the switch from mmap() to rte_mem_map(),
another part of the API was changed: "int additional_flags"
is defined as "additional flags for the mapping range"
without mentioning that it was directly used in mmap().
Currently it is directly used in rte_mem_map(),
which is why the rte_map_flags values must be mapped (sic) onto the
mmap ones on Unix OSes.

These are side effects of a badly defined API using Unix values.

Bugzilla ID: 503
Fixes: 2fd3567e54 ("pci: use OS generic memory mapping functions")

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Lihong Ma <lihongx.ma@intel.com>
2020-07-11 11:48:13 +02:00
Harman Kalra
3596a037ab eal: fix parentheses in alignment macros
Found an issue while using RTE_ALIGN_MUL_NEAR with an
expression, as passed in estimate_tsc_freq().
RTE_ALIGN_MUL_FLOOR resulted in an unexpected value because
parentheses are required to evaluate an expression correctly.
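
A sketch of the class of bug (illustrative macros, not the exact DPDK
ones):

    /* Without parentheses, passing an expression breaks operator
     * precedence: ALIGN_MUL_FLOOR_BAD(a + b, m) expands to
     * (a + b / m * m), which is not what the caller intended. */
    #define ALIGN_MUL_FLOOR_BAD(v, mul) (v / mul * mul)
    #define ALIGN_MUL_FLOOR_OK(v, mul)  (((v) / (mul)) * (mul))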

Fixes: 5120203d75 ("eal: add macros to align value to multiple")
Cc: stable@dpdk.org

Signed-off-by: Harman Kalra <hkalra@marvell.com>
2020-07-11 11:41:33 +02:00
Dmitry Kozlyuk
7daf5bdb0f eal/windows: detect insufficient privileges for hugepages
AdjustTokenPrivileges() succeeds even if no requested privileges have
been granted; this behavior is documented. Check the last error code in
addition to the return value to detect this case.

Make error messages more specific and add troubleshooting hint.
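
A minimal sketch of the detection (Win32 API; illustrative only):

    #include <windows.h>

    /* AdjustTokenPrivileges() can return nonzero even when nothing was
     * granted; ERROR_NOT_ALL_ASSIGNED is the documented way to tell. */
    static int
    enable_privilege(HANDLE token, TOKEN_PRIVILEGES *tp)
    {
        if (!AdjustTokenPrivileges(token, FALSE, tp, 0, NULL, NULL))
            return -1; /* hard failure */
        if (GetLastError() == ERROR_NOT_ALL_ASSIGNED)
            return -1; /* privilege not held by this user */
        return 0;
    }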

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
2020-07-11 00:45:20 +02:00
Hongzhi Guo
982bb68cab net: fix checksum on big endian CPUs
With current code, the checksum of odd-length buffers is wrong on
big endian CPUs: the last byte is not properly summed to the
accumulator.

Fix this by left-shifting the remaining byte by 8. For instance,
if the last byte is 0x42, we should add 0x4200 to the accumulator
on big endian CPUs.

This change is similar to what is suggested in Errata 3133 of
RFC 1071.
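
A sketch consistent with the description above (illustrative only, not
the exact patch code):

    #include <stdint.h>
    #include <rte_byteorder.h>

    /* The trailing odd byte must contribute 0xNN00 to the accumulator
     * on big endian CPUs, and 0x00NN on little endian ones. */
    static uint32_t
    sum_last_odd_byte(uint32_t sum, const uint8_t *last_byte)
    {
    #if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
        sum += (uint32_t)*last_byte << 8;
    #else
        sum += *last_byte;
    #endif
        return sum;
    }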

Fixes: 6006818cfb26 ("net: new checksum functions")
Cc: stable@dpdk.org

Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-11 00:45:20 +02:00
Hongzhi Guo
d5df2ae042 net: fix unneeded replacement of TCP checksum 0
Per RFC768:
If the computed checksum is zero, it is transmitted as all ones.
An all zero transmitted checksum value means that the transmitter
generated no checksum.

RFC793 for TCP has no such special treatment for the checksum of zero.

Fixes: 6006818cfb ("net: new checksum functions")
Cc: stable@dpdk.org

Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
2020-07-11 00:45:20 +02:00
Joyce Kong
58902736a4 vhost: restrict pointer aliasing for packed ring
Restrict pointer aliasing to allow the compiler to vectorize loops
more aggressively.

With this patch, a 9.6% improvement is observed in throughput for
the packed virtio-net PVP case, and a 2.8% improvement in throughput
for the packed virtio-user PVP case. All performance data are measured
on ThunderX-2 platform under 0.001% acceptable packet loss with 1 core
on both vhost and virtio side.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-07-10 15:43:41 +02:00
Joyce Kong
428e684795 introduce restricted pointer aliasing marker
The 'restrict' keyword is recognized in C99, while the type qualifier
'__restrict' compiles fine in C at all language levels. This patch
replaces the existing 'restrict' occurrences with '__rte_restrict', a
common wrapper supported by all compilers.
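
A sketch of such a wrapper and its use (illustrative; see rte_common.h
for the actual definition):

    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative: map the marker to a spelling accepted at all C
     * language levels by the supported compilers. */
    #define __rte_restrict __restrict

    /* Telling the compiler the buffers never alias lets it vectorize
     * the copy loop more aggressively. */
    static void
    copy_u64(uint64_t *__rte_restrict dst,
             const uint64_t *__rte_restrict src, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }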

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-07-10 15:35:32 +02:00
Ruifeng Wang
8a9f8564e9 lpm: implement RCU rule reclamation
Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

The RCU QSBR process is integrated for safe tbl8 group reclamation.
Refer to RCU documentation to understand various aspects of
integrating RCU library into other libraries.

To avoid ABI breakage, a struct __rte_lpm is created for lpm library
internal use. This struct wraps the already exposed rte_lpm and also
includes members that do not need to be exposed, such as the RCU-related
configuration.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2020-07-10 13:41:29 +02:00
Phil Yang
e0a439466b eal/linux: use C11 atomics for interrupt status
The event status is defined as a volatile variable and shared between
threads. Use C11 atomic built-ins with explicit ordering instead of
rte_atomic ops which enforce unnecessary barriers on aarch64.

The event status has been cleaned up by the compare-and-swap operation
when we free the event data, so there is no need to set it to invalid
after that.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Harman Kalra <hkalra@marvell.com>
2020-07-09 18:53:40 +02:00
Feifei Wang
2d6ed071a8 ring: use custom element for fixed size API
Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
This reduces code duplication and improves code maintenance.

Tests done on Arm, x86 [1] and PPC [2] do not indicate performance
degradation.
[1] https://mails.dpdk.org/archives/dev/2020-July/173780.html
[2] https://mails.dpdk.org/archives/dev/2020-July/173863.html

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-09 17:22:36 +02:00
Feifei Wang
019bffab51 ring: remove experimental flag from custom element API
Remove the experimental tag for rte_ring_xxx_elem APIs that have been
around for 2 releases.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-09 17:22:29 +02:00
Feifei Wang
2f86fb666e ring: remove experimental flag from reset API
Remove the experimental tag for the rte_ring_reset API that has been
around for 4 releases.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-07-09 17:21:54 +02:00
Levend Sayar
604d426de3 service: fix C++ linkage
"extern C" define is added to rte_service_component.h file
to be able to use in C++ context
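
The usual guard, for reference (a generic sketch, not copied from the
patch):

    #ifdef __cplusplus
    extern "C" {
    #endif

    /* ... declarations keep C linkage when included from C++ ... */

    #ifdef __cplusplus
    }
    #endif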

Fixes: 21698354c8 ("service: introduce service cores concept")
Cc: stable@dpdk.org

Signed-off-by: Levend Sayar <levendsayar@gmail.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2020-07-09 16:23:57 +02:00
Ferruh Yigit
f6ac2b06ec ethdev: fix log type for some error messages
Some log macros were using the 'EAL' logtype; convert them to 'ethdev'.
Also add a missing EOL and fix the syntax of some logs.

Fixes: 214ed1acd1 ("ethdev: add iterator to match devargs input")
Fixes: e489007a41 ("ethdev: add generic create/destroy ethdev APIs")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-07-07 23:38:28 +02:00
Patrick Fu
cd6760da10 vhost: introduce async enqueue for split ring
This patch implements the async enqueue data path for the split ring.
Two new async data path APIs are defined, through which applications can
submit packets to and poll completed packets from async engines. The
async engine is either a physical DMA device or a software-emulated
backend. The async enqueue data path leverages callback functions
registered by the application to work with the async engine.
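
A minimal usage sketch (illustrative; assuming the submit/poll pair
introduced by this series is rte_vhost_submit_enqueue_burst() and
rte_vhost_poll_enqueue_completed(), declared in rte_vhost_async.h):

    #include <rte_common.h>
    #include <rte_mbuf.h>
    #include <rte_vhost_async.h>

    /* Submit a burst to the async engine, then poll for packets whose
     * copies have fully completed and free them. */
    static void
    async_enqueue(int vid, uint16_t queue_id,
                  struct rte_mbuf **pkts, uint16_t nb_pkts)
    {
        struct rte_mbuf *done[64];
        uint16_t n_done;

        rte_vhost_submit_enqueue_burst(vid, queue_id, pkts, nb_pkts);
        /* ... the async engine performs the copies in the background ... */
        n_done = rte_vhost_poll_enqueue_completed(vid, queue_id, done,
                                                  RTE_DIM(done));
        rte_pktmbuf_free_bulk(done, n_done);
    }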

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-07 23:38:28 +02:00
Patrick Fu
78639d5456 vhost: introduce async enqueue registration API
Performing large memory copies usually takes up a major part of CPU
cycles and becomes the hot spot in the vhost-user enqueue operation. To
offload the large copies from the CPU to DMA devices, asynchronous
APIs are introduced, with which the CPU just submits copy jobs to
the DMA engine without waiting for copy completion. Thus, there is
no CPU intervention during data transfer, which saves precious CPU
cycles and improves the overall throughput of vhost-user based
applications. This patch introduces registration/un-registration
APIs for the vhost async data enqueue operation. Together with the
registration API implementations, the data structures and the
prototypes of the async callback functions required for the async
enqueue data path are also defined.

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-07 23:38:28 +02:00
Simei Su
9a859b8c4a ethdev: add PPPoE RSS offload types
This patch defines new RSS offload types for PPPoE. Typically,
session id would be the RSS input set for a PPPoE packet, but
as a hint, each driver may have different default behaviors.

Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-07 23:38:27 +02:00
Lukasz Wojciechowski
048db4b6dc service: fix core mapping reset
The rte_service_lcore_reset_all function stops execution of services
on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
However, the thread loop for slave lcores (eal_thread_loop) distinguishes
these roles when setting the lcore state after processing the delegated
function: it sets the WAIT state for ROLE_SERVICE, but FINISHED for
ROLE_RTE. So changing the role to RTE before stopping work in slave
lcores causes the lcores to end up in the FINISHED state. That is why
rte_eal_wait_lcore must be run after rte_service_lcore_reset_all to bring
the lcores back to the launchable (WAIT) state.
This has been fixed in the test app and clarified in the API documentation.

Setting the state to WAIT in rte_service_runner_func is premature,
as the rte_service_runner_func function is still part of the lcore
function delegated to the slave lcore. The state is overwritten anyway in
the slave lcore thread loop. This premature setting of the state to WAIT
might, however, cause rte_eal_wait_lcore, when called by the application,
to return before the slave lcore thread has set the FINISHED state. That
is why it is removed from the librte_eal rte_service_runner_func function.

Bugzilla ID: 464
Fixes: 21698354c8 ("service: introduce service cores concept")
Fixes: f038a81e1c ("service: add unit tests")
Cc: stable@dpdk.org

Reported-by: Sarosh Arif <sarosh.arif@emumba.com>
Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2020-07-08 18:52:49 +02:00
Phil Yang
030c216411 eventdev: relax SMP barriers with C11 atomics
The impl_opaque field is shared between the timer arm and cancel
operations. Meanwhile, the state flag acts as a guard variable to
make sure the update of impl_opaque is synchronized. The original
code uses rte_smp barriers to achieve that. This patch uses C11
atomics with an explicit one-way memory barrier instead of full
barriers rte_smp_w/rmb() to avoid the unnecessary barrier on aarch64.

Since compilers generate the same instructions for volatile and
non-volatile variables with the C11 __atomic built-ins, the volatile
keyword is kept in front of the state enum to avoid an ABI break.

Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
2020-07-08 18:16:41 +02:00
Phil Yang
e84d9c62c6 eventdev: remove redundant reset on timer cancel
No thread will access the impl_opaque data after the timer is canceled.
When a new timer is armed, the data gets refilled. So the cleanup
process is unnecessary.

Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
2020-07-08 18:16:41 +02:00
Phil Yang
1028d63eb2 eventdev: use C11 atomics for lcore timer armed flag
The in_use flag is a per-core variable which is not shared between
lcores in the normal case, and accesses to this variable are ordered on
the same core. However, if a non-EAL thread picks the highest lcore to
insert timers into, there is a possibility of conflicts on this flag
between threads, so an atomic compare-and-swap operation is needed.

Use the C11 atomics instead of the generic rte_atomic operations to
avoid the unnecessary barrier on aarch64.

Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
2020-07-08 18:16:41 +02:00
Phil Yang
aceb737d6f eventdev: fix race condition on timer list counter
The n_poll_lcores counter and the poll_lcore array are shared between
lcores, and the updates of these variables are outside the protection of
the spinlock on each lcore timer list. The read-modify-write operations
on the counter are not atomic, so there is a potential race condition
between lcores.

Use C11 atomics with RELAXED ordering to prevent the race.

Fixes: cc7b73ea9e ("eventdev: add new software timer adapter")
Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
2020-07-08 18:16:41 +02:00