Add a new API function to KNI, rte_kni_update_link(), to allow DPDK
applications to update the link status of KNI network interfaces in
the Linux kernel.
Signed-off-by: Dan Gora <dg@adax.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
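A minimal usage sketch, assuming 'port_id' and 'kni' are already set
up by the application:

    /* Propagate the PMD's link state to the kernel interface. */
    struct rte_eth_link link;

    rte_eth_link_get_nowait(port_id, &link);
    rte_kni_update_link(kni, link.link_status);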
Sync the values by adding C11 atomic memory barriers to make sure
the values are synced before updating fifo_write and fifo_read.
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add a memory barrier to make sure the values are synced
before updating fifo_write in kni_fifo_put and fifo_read in
kni_fifo_get.
Fixes: 3fc5ca2f63 ("kni: initial import")
Cc: stable@dpdk.org
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
With the existing code in kni_fifo_put, rx_q values are not guaranteed
to be updated before fifo_write is updated, which causes a
synchronization issue on the other core when rx_q is read in
kni_net_rx_normal. The same situation happens in kni_fifo_get as well.
So sync the values by adding memory barriers to make sure the values
are synced before fifo_write and fifo_read are updated.
Fixes: 3fc5ca2f63 ("kni: initial import")
Cc: stable@dpdk.org
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
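A hedged sketch of the userspace put side, assuming the KNI fifo
layout (write/read indexes over a power-of-two pointer ring):

    static inline unsigned
    kni_fifo_put(struct rte_kni_fifo *fifo, void **data, unsigned num)
    {
        unsigned i, new_write;
        unsigned fifo_write = fifo->write;
        /* Acquire: observe the reader's latest consume position. */
        unsigned fifo_read = __atomic_load_n(&fifo->read,
                __ATOMIC_ACQUIRE);

        for (i = 0; i < num; i++) {
            new_write = (fifo_write + 1) & (fifo->len - 1);
            if (new_write == fifo_read)
                break; /* fifo is full */
            fifo->buffer[fifo_write] = data[i];
            fifo_write = new_write;
        }
        /* Release: publish the buffer writes before the new index. */
        __atomic_store_n(&fifo->write, fifo_write, __ATOMIC_RELEASE);
        return i;
    }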
Keep only a single config option, RTE_USE_C11_MEM_MODEL, for the C11
memory model, so all modules can leverage the C11 atomic extension by
enabling this option.
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Add support for rte_pause() implementation for ppc64.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Add lock-free read-write concurrency. This is achieved by the
following changes.
1) Add memory ordering to avoid race conditions. The only race
condition that can occur is using the key store element
before the key write is completed. Hence, while inserting the element,
the release memory order is used. Any other race condition is caught
by the key comparison. Memory orderings are added only where needed.
For example, reads in the writer's context do not need memory ordering
as there is a single writer.
key_idx in the bucket entry and pdata in the key store element are
used for synchronization. key_idx is used to release an inserted
entry in the bucket to the reader. Use of pdata for synchronization
is required for the update of an existing entry, wherein only
the pdata is updated without updating key_idx.
2) The reader-writer concurrency issue, caused by moving the keys
to their alternative locations during key insert, is solved
by introducing a global counter (tbl_chng_cnt) indicating a
change in the table.
3) Add a flag to enable reader-writer concurrency at
run time.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
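A hedged sketch of the reader-side pattern described in (1) and (2);
variable names (h, bkt, i) are illustrative, after rte_cuckoo_hash:

    uint32_t cnt_a, cnt_b, key_idx;

    do {
        /* Snapshot the table-change counter before searching. */
        cnt_b = __atomic_load_n(&h->tbl_chng_cnt, __ATOMIC_ACQUIRE);
        /* Acquire pairs with the writer's release of key_idx. */
        key_idx = __atomic_load_n(&bkt->key_idx[i], __ATOMIC_ACQUIRE);
        /* ... compare the key, read pdata ... */
        /* Re-read the counter: if it moved, a key may have been
         * displaced to its alternative bucket during the search. */
        cnt_a = __atomic_load_n(&h->tbl_chng_cnt, __ATOMIC_ACQUIRE);
    } while (cnt_a != cnt_b);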
Fix the key store array element alignment such that every array
element is aligned on KEY_ALIGNMENT boundary. This is required to
make 'pdata' in 'struct rte_hash_key' align on its natural boundary
for atomic load/store.
Fixes: 473d1bebce ("hash: allow to store data in hash table")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
rte_hash_lookup_xxx APIs return the index of the slot in
the key store. The application (reader) can use that index to reference
other data structures in its scope. Because of this, the
index should not be freed until the application has completed
using it.
RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL is introduced to support this.
When this flag is enabled, the rte_hash_del_xxx APIs do not free the
key-store index/internal memory associated with the deleted
entry. The new API rte_hash_free_key_with_position should be called
to free the key-store index/internal memory after calling the
rte_hash_del_xxx APIs.
Suggested-by: Yipeng Wang <yipeng1.wang@intel.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
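A hedged usage sketch of the new flag and the free API:

    struct rte_hash_parameters params = {
        .name = "flows",
        .entries = 1024,
        .key_len = sizeof(uint32_t),
        .socket_id = rte_socket_id(),
        .extra_flag = RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL,
    };
    struct rte_hash *h = rte_hash_create(&params);

    int32_t pos = rte_hash_del_key(h, &key); /* index kept alive */
    /* ... once no reader references 'pos' anymore ... */
    if (pos >= 0)
        rte_hash_free_key_with_position(h, pos);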
RW concurrency is required for the single writer and multiple readers
use case as well. Hence, multi-writer should not be enabled by default
when RW concurrency is enabled.
Fixes: f2e3001b53 ("hash: support read/write concurrency")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This patch does a couple of things:
* Adds a new message type for removing policies (PKT_POLICY_REMOVE),
used when we want to remove a previously created policy.
* Adds a core_type bool to the channel packet struct to specify whether
the type of core we want to control is virtual or physical.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
1. Abstract
For packet processing workloads such as DPDK, polling is continuous.
This means CPU cores always show 100% busy, independent of how much work
those cores are doing. Accurately determining how busy a core really is
matters hugely for the following reasons:
* There is no indication of overload conditions.
* The user does not know how much real load is on the system, resulting
in wasted energy as no power management is utilized.
Compared to the original l3fwd-power design, instead of going to sleep
after detecting an empty poll, the new mechanism just lowers the core
frequency. As a result, the application does not stop polling the device,
which leads to improved handling of bursts of traffic.
When the system becomes busy, the empty poll mechanism can also increase
the core frequency (including turbo) to make a best effort for intensive
traffic. This gives us more flexible and balanced traffic awareness than
the standard l3fwd-power application.
2. Proposed solution
The proposed solution focuses on how many empty polls are executed.
A low number of empty polls means the current core is busy with
processing workload, and therefore a higher frequency is needed. A high
empty poll number indicates the current core is not doing any real work,
and therefore we can lower the frequency to save power.
In the current implementation, each core has one empty-poll counter,
which assumes one core is dedicated to one queue. This will need to be
expanded in the future to support multiple queues per core.
2.1 Power state definition:
LOW: Not currently used, reserved for future use.
MED: the frequency is used to process modest traffic workload.
HIGH: the frequency is used to process busy traffic workload.
2.2 There are two phases to establish the power management system:
a. Initialization/Training phase. The training phase is necessary
in order to figure out the system polling baseline numbers from
idle to busy. The highest poll count will be during idle, where
all polls are empty. These poll counts will differ between
systems due to the many possible processor micro-architecture, cache
and device configurations, hence the training phase.
In the training phase, traffic is blocked so the training
algorithm can average the empty-poll numbers for the LOW, MED and
HIGH power states in order to create a baseline.
The core's counters are collected every 10ms, and the training
phase takes 2 seconds.
Training is disabled in the default configuration, where the default
parameters are applied; the sample app can still trigger training
if needed. Once the training phase has been executed once on
a system, the application can then be started with the relevant
thresholds provided on the command line, allowing the application
to start passing traffic immediately.
b. Normal phase. Traffic starts immediately based on the default
thresholds, or based on the user-supplied thresholds via the
command line parameters. The run-time poll counts are compared with
the baseline and the decision is taken to move to the MED power
state or the HIGH power state. The counters are calculated every 10ms.
3. Proposed API
1. rte_power_empty_poll_stat_init(struct ep_params **eptr,
uint8_t *freq_tlb, struct ep_policy *policy);
which is used to initialize the power management system.
2. rte_power_empty_poll_stat_free(void);
which is used to free the resources held by the power management system.
3. rte_power_empty_poll_stat_update(unsigned int lcore_id);
which is used to update the empty poll counter of a specific core; not
thread safe.
4. rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
which is used to update the valid poll counter of a specific core; not
thread safe.
5. rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
which is used to get the empty poll counter of a specific core.
6. rte_power_poll_stat_fetch(unsigned int lcore_id);
which is used to get the valid poll counter of a specific core.
7. rte_empty_poll_detection(struct rte_timer *tim, void *arg);
which is used to detect empty poll state changes and then take action.
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
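A hedged sketch of how a polling loop might feed these counters
(initialization via rte_power_empty_poll_stat_init() omitted):

    while (!quit) {
        uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id,
                pkts, BURST_SIZE);

        if (nb_rx == 0)
            rte_power_empty_poll_stat_update(lcore_id);
        else
            rte_power_poll_stat_update(lcore_id, nb_rx);
        /* ... process pkts ... */
    }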
This commit changes the hashing mechanism to "partial-key
hashing" to calculate the bucket index and signature of the key.
This was proposed in Bin Fan et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". Basically the idea is to use "xor" to
derive the alternative bucket from the current bucket index and
signature.
With "partial-key hashing", the bucket memory requirement is reduced
from two cache lines to one cache line, which improves the memory
efficiency and thus the lookup speed.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
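The xor derivation is an involution: applying it to the alternative
bucket index returns the primary one, so either bucket can locate its
partner from the index and short signature alone. A sketch:

    static inline uint32_t
    get_alt_bucket_index(uint32_t cur_bkt_idx, uint16_t sig,
            uint32_t bucket_bitmask)
    {
        return (cur_bkt_idx ^ sig) & bucket_bitmask;
    }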
In use cases where hash table capacity needs to be guaranteed,
the extendable bucket feature can be used to contain extra
keys in linked lists when a conflict happens. This is a similar
concept to the extendable bucket hash table in the packet
framework.
This commit adds the extendable bucket feature. The user can turn
it on or off through the extra flag field at table
creation time.
The extendable bucket table is composed of buckets that can be
linked as a list onto the current main table. When extendable
buckets are enabled, the hash table load can always achieve 100%.
In other words, the table can always accommodate the same
number of keys as the specified table size. This provides a
100% table capacity guarantee.
Although keys ending up in the ext buckets may have a longer
lookup time, they should be rare thanks to the cuckoo
algorithm.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
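A hedged sketch of enabling the feature at creation time:

    struct rte_hash_parameters params = {
        .name = "guaranteed_cap",
        .entries = 1 << 16,
        .key_len = sizeof(uint32_t),
        .socket_id = rte_socket_id(),
        .extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
    };
    struct rte_hash *h = rte_hash_create(&params);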
In rte_hash_iterate, the reader lock did not protect the
while loop which checks for an empty entry. This created a race
condition where the entry may become empty by the time the lock
is taken, so a wrong key data value would be read out.
This commit reads out the position in the while condition,
which makes sure that the position will not be changed
to empty before entering the lock.
Fixes: f2e3001b53 ("hash: support read/write concurrency")
Cc: stable@dpdk.org
Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Since the depth-first search of the cuckoo path is removed, we no
longer need the macro which specifies the depth of the cuckoo
search.
Fixes: f2e3001b53 ("hash: support read/write concurrency")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The latency calculation logic is not correct for the case where
packets get dropped before TX: for the dropped packets,
the timestamp is not cleared, and such packets still get
counted for latency calculation in subsequent runs, which results
in inaccurate latency measurement.
Fix this issue as follows:
Before setting the timestamp in an mbuf, check that the mbuf does
not have any prior valid timestamp flag set, and after marking
the timestamp, set the mbuf flags to indicate that the timestamp
is valid.
Before calculating latency, check that the mbuf flags are set to
indicate the timestamp is valid.
With the above logic it is guaranteed that correct timestamps
are used.
Fixes: 5cd3cac9ed ("latency: added new library for latency stats")
Cc: stable@dpdk.org
Reported-by: Bao-Long Tran <longtb5@viettel.com.vn>
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Tested-by: Bao-Long Tran <longtb5@viettel.com.vn>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
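A hedged sketch of the fixed flow, using the PKT_RX_TIMESTAMP flag of
this era's mbuf API as the validity marker (flag choice illustrative):

    /* RX path: stamp only mbufs without a prior valid timestamp. */
    if ((m->ol_flags & PKT_RX_TIMESTAMP) == 0) {
        m->timestamp = rte_rdtsc();
        m->ol_flags |= PKT_RX_TIMESTAMP;
    }

    /* TX path: only use timestamps that were marked valid. */
    if (m->ol_flags & PKT_RX_TIMESTAMP) {
        latency = rte_rdtsc() - m->timestamp;
        m->ol_flags &= ~PKT_RX_TIMESTAMP; /* consume the stamp */
    }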
This patch fixes an issue caught with ASAN where a vdev_scan()
to a secondary bus was failing to free some memory.
The doxygen comment in EAL is fixed at the same time.
Fixes: cdb068f031 ("bus/vdev: scan by multi-process channel")
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Paul Luse <paul.e.luse@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
rte_devargs_parsef will leak memory each time it is called.
The device string must be freed.
Fixes: a23bc2c4e0 ("devargs: add non-variadic parsing function")
Cc: stable@dpdk.org
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
eal: add shorthand __rte_weak macro
qat: update code to use __rte_weak macro
avf: update code to use __rte_weak macro
fm10k: update code to use __rte_weak macro
i40e: update code to use __rte_weak macro
ixgbe: update code to use __rte_weak macro
mlx5: update code to use __rte_weak macro
virtio: update code to use __rte_weak macro
acl: update code to use __rte_weak macro
bpf: update code to use __rte_weak macro
Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
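__rte_weak is a shorthand for the GCC weak attribute; a hedged sketch
of the definition and a typical use (function name hypothetical):

    #define __rte_weak __attribute__((__weak__))

    /* Generic fallback that an arch-optimized object file may
     * override at link time. */
    __rte_weak int
    example_vector_fn(void)
    {
        return 0;
    }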
The cast of hpet_msb_inc is causing a warning in some compilations.
Yet the cast is unnecessary: the function is used in only one place,
so just use the correct signature.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rte_init_alert already adds a newline, don't do it twice.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Packet Data Convergence Protocol (PDCP) is added in rte_security
for 3GPP TS 36.323 for LTE.
The patchset provides the structure definitions for configuring the
PDCP sessions, and relevant documentation is added.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
In no-shconf mode the rte_mp_request_sync() wasn't initializing
the `reply` parameter, which contains e.g. the number of sent
requests. Callers of rte_mp_request_sync() might check that
param afterwards and might read potentially uninitialized memory.
The no-shconf check that makes us return early (with rc = 0) was
placed before the `reply` initialization. Fix this by making the
`reply` initialization occur first.
Fixes: 5848e3d281 ("ipc: support --no-shconf mode")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
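A hedged sketch of the reordering:

    int
    rte_mp_request_sync(struct rte_mp_msg *req,
            struct rte_mp_reply *reply, const struct timespec *ts)
    {
        /* Initialize the reply before any early return, so
         * callers never read uninitialized memory. */
        memset(reply, 0, sizeof(*reply));

        /* no-shconf mode: no peers, nothing to send. */
        if (internal_config.no_shconf)
            return 0;

        /* ... send the request and gather the replies ... */
    }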
In the doxygen description of rte_kvargs_process(), it is said:
If *kvlist* is NULL function does nothing.
It has been added by mistake here instead of rte_kvargs_free().
Anyway, a null list should be correctly handled in both functions.
Comments are fixed in both functions and NULL handling is added
to rte_kvargs_process().
Fixes: c34af7424e ("kvargs: fix freeing behaviour for null")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
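A hedged sketch of the added NULL handling:

    int
    rte_kvargs_process(const struct rte_kvargs *kvlist,
            const char *key_match, arg_handler_t handler,
            void *opaque_arg)
    {
        const struct rte_kvargs_pair *pair;
        unsigned int i;

        if (kvlist == NULL) /* the added NULL handling */
            return 0;

        for (i = 0; i < kvlist->count; i++) {
            pair = &kvlist->pairs[i];
            if (key_match == NULL ||
                    strcmp(pair->key, key_match) == 0)
                if ((*handler)(pair->key, pair->value,
                        opaque_arg) < 0)
                    return -1;
        }
        return 0;
    }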
Segment preallocation code allocates an array of structures on the
heap but does not free the memory afterwards. Fix it by freeing it
at the end of the function, and changing control flow to always go
through that code path.
Coverity issue: 323524
Fixes: 1dd342d0fd ("mem: improve segment list preallocation")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
A crash may appear when removing some PCI devices because
dev->devargs is not always initialized. So use dev->bus instead of
dev->devargs->bus when building devargs string to remove a device.
Fixes: 244d513071 ("eal: enable hotplug on multi-process")
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Current code to preallocate segment lists is trying to do
everything in one go, and thus ends up being convoluted,
hard to understand, and, most importantly, does not scale beyond
initial assumptions about number of NUMA nodes and number of
page sizes, and therefore has issues on some configurations.
Instead of fixing these issues in the existing code, simply
rewrite it to be slightly less clever but much more logical, and
provide ample comments to explain exactly what is going on.
We cannot use the same approach for 32-bit code because the
limitations of the target dictate current socket-centric
approach rather than type-centric approach we use on 64-bit
target, so 32-bit code is left unmodified. FreeBSD doesn't
support NUMA so there's no complexity involved there, and thus
its code is much more readable and not worth changing.
Fixes: 1d406458db ("mem: make segment preallocation OS-specific")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Musl complains about pthread id being of wrong size, because on
musl, pthread_t is a struct pointer, not an unsigned int. Fix the
printing code by casting pthread id to unsigned pointer type and
adjusting the format specifier to be of appropriate size.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
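A hedged sketch of the portable print:

    /* pthread_t is opaque: a struct pointer on musl, an unsigned
     * long on glibc. Go through uintptr_t to cover both. */
    pthread_t tid = pthread_self();

    printf("thread id: %p\n", (void *)(uintptr_t)tid);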
Musl wraps various string functions such as strlcpy in order to
harden them. However, the fortify wrappers are included without
including the actual string functions being wrapped, which
throws missing definition compile errors. Fix by including
string.h in string functions header.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When built against musl, fcntl.h doesn't silently get included.
Fix by including it explicitly.
Bugzilla ID: 31
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When built against musl, fcntl.h doesn't silently get included.
Fix by including it explicitly.
Bugzilla ID: 33
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When built against musl, fcntl.h doesn't silently get included.
Fix by including it explicitly.
Bugzilla ID: 34
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
We use _GNU_SOURCE all over the place, but we often miss
defining it, resulting in broken builds on musl. Rather than
fixing every library's and driver's and application's makefile,
fix it by simply defining _GNU_SOURCE by default for all
builds.
Remove all usages of _GNU_SOURCE in source files and makefiles,
and also fixup a couple of instances of using __USE_GNU instead
of _GNU_SOURCE.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
After calling the unplug function of a bus, the device is expected
to be freed, so it is too late to get its devargs for removal.
Anyway, the buses which implement unplug already free
the devargs, except the PCI bus.
So the call to rte_devargs_remove() is removed from EAL and
added in PCI.
Fixes: 2effa126fb ("devargs: simplify parameters of removal function")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The postcopy live-migration feature requires the application to
not populate the guest memory. As the vhost library cannot
prevent the application from doing that (e.g. it cannot prevent
the application from calling mlockall()), the feature is disabled
by default.
The application should only enable the feature if it does not
force the guest memory to be populated.
In case the user passes the RTE_VHOST_USER_POSTCOPY_SUPPORT
flag at registration but the feature was not compiled in,
registration fails.
For the same reason, the postcopy and dequeue zero-copy features
are not compatible, so don't advertise postcopy support if
dequeue zero copy is requested.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The master sends this message before it stops handling
userfaults, so that the backend closes the userfaultfd.
The master waits for the slave to acknowledge the request
with an empty 64-bit payload for synchronization purposes.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The VHOST_USER_SET_MEM_TABLE payload is copied when handled,
whereas it could directly be referenced.
This is not very important, but next, we'll need to update the
payload and send it back to Qemu.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
This patch opens a userfaultfd and sends it back to Qemu's
VHOST_USER_POSTCOPY_ADVISE request.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The postcopy live-migration feature relies on userfaultfd,
which was only introduced in kernel v4.3.
This patch introduces a new define to allow building vhost
library on kernels not supporting userfaultfd.
With legacy build system, user has to explicitly set
CONFIG_RTE_LIBRTE_VHOST_POSTCOPY to 'y'.
With Meson build system, RTE_LIBRTE_VHOST_POSTCOPY gets
automatically defined if userfaultfd kernel header is
present.
Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Passing userfault fds to Qemu will be required for postcopy
live-migration feature.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This is not used for now, but will be needed for the
special handling of the VHOST_USER_SET_MEM_TABLE message
once postcopy is supported.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When some ancillary data (fds) is received, it is copied
without checking its length.
This patch adds the number of fds received to the message,
which is set in read_vhost_message().
This is preliminary work to support sending fds to Qemu.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
When the memory table gets updated, the rings addresses need
to be translated again. If it fails, we need to exit cleanly
by unmapping memory regions.
Fixes: d5022533c2 ("vhost: retranslate vring addr when memory table changes")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
QEMU doesn't expect any payload for the reply of
VHOST_USER_SET_LOG_BASE request, so don't send any.
Note that the Vhost-user specification isn't clear about
it and would need to be fixed.
Fixes: 54f9e32305 ("vhost: handle dirty pages logging request")
Cc: stable@dpdk.org
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
For messages that require a reply, a second ack should not be
sent when reply-ack protocol feature is negotiated, even if
the corresponding flag is set in the message.
The code is compliant with the spec but it isn't clear it is,
so this patch adds a comment to make it explicit.
Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
VHOST_USER_GET_PROTOCOL_FEATURES, VHOST_USER_GET_VRING_BASE
and VHOST_USER_SET_LOG_BASE require replies, so their handlers
should return VH_RESULT_REPLY, not VH_RESULT_OK.
Fixes: 0bff510b5e ("vhost: unify message handling function signature")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
The return value of message handling has now changed to an enum
that can take a non-negative, non-zero value in case a reply is
needed. But the code checking the variable afterwards had not
been updated, leading to successful message handling being
treated as an error.
The return type of the external pre and post callbacks also needs
to be changed to the new enum, so that its handling is consistent.
This is done in this patch alongside the conversion of
its only user, the vhost-crypto backend.
Fixes: 0bff510b5e ("vhost: unify message handling function signature")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
As APIs in rte_vdpa.h are public, we need to add doxygen comments
to all APIs and structures.
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The notification can't be disabled in the packed ring when the
application tries to disable notifications, because the
device event flags field is overwritten by an unexpected
value. This patch fixes this issue.
Fixes: b1cce26af1 ("vhost: add notification for packed ring")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
rte_flow actions:
- RTE_FLOW_ACTION_TYPE_SET_MAC_SRC
- RTE_FLOW_ACTION_TYPE_SET_MAC_DST
are added in order to offload MAC address rewrites to the NIC.
The rte_flow_item_eth must be present in the rte_flow pattern.
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Rewrite the TTL by decrementing it or just setting it directly;
it is not necessary to check whether the final result
is zero or not.
This is slightly different from the behavior defined
by OpenFlow, and more generic.
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Primary and secondary processes share per-device private data. With
the current design it is not possible to have per-device per-process
data. This is required for properly handling the CPP interface inside
the NFP PMD with multi-process support.
There is also at least one other PMD, tap, with similar
requirements for per-process device data.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch fixes the cryptodev library version number that was
not updated in DPDK 18.08.
Fixes: a4493be5bd ("cryptodev: replace bus specific struct with generic dev")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The documentation of rte_crypto_op_pool_create indicates that
specifying RTE_CRYPTO_OP_TYPE_UNDEFINED would create a pool that
supports all operation types. This change makes the code
consistent with documentation.
Fixes: c0f87eb525 ("cryptodev: change burst API to be crypto op oriented")
Cc: stable@dpdk.org
Signed-off-by: Junxiao Shi <git@mail1.yoursunny.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
In the devargs syntax for device representors, it is possible to add
several devices at once: -w dbdf,representor=[0-3]
It will become a more frequent case when introducing wildcards
and ranges in the new devargs syntax.
If a devargs string is provided for probing, and updated with a bigger
range for a new probing, then we do not want it to fail because
part of this range was already probed previously.
There can be new ports to create from an existing rte_device.
That's why the check for an already probed device
is moved to be a bus responsibility.
In the case of vdev, a global check is kept in insert_vdev(),
assuming that a vdev will always have only one port.
In the case of ifpga and vmbus, already probed devices are checked.
In the case of NXP buses, the probing is done only once (no hotplug),
though a check is added at bus level for consistency.
In the case of PCI, a driver flag is added to allow PMD probing again.
Only the PMD knows the ports attached to one rte_device.
As another consequence of being able to probe in several steps,
the field rte_device.devargs must not be considered as a full
representation of the rte_device, but only the latest probing args.
Anyway, the field rte_device.devargs is used only for probing.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
The function rte_dev_is_probed() is added in order to improve
semantics and enforce a proper check of the probing status of a device.
It will answer this rte_device query:
Is it already successfully probed or not?
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
The PCI mapping requires to know the PCI driver to use,
even before the probing is done. That's why the PCI driver is
referenced early inside the PCI device structure. See
commit 1d20a073fa ("bus/pci: reference driver structure before mapping")
However the rte_driver does not need to be referenced in rte_device
before the device probing is done.
By moving back this assignment at the end of the device probing,
it becomes possible to make clear the status of a rte_device.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
The logs printed by COMPRESSDEV_LOG were prefixed with the driver name.
In order to avoid assigning the driver before the end of the probing,
the driver name is removed from the compressdev library logs.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The logs printed by CDEV_LOG_* were prefixed with the driver name.
In order to avoid assigning the driver before the end of the probing,
the driver name is removed from the cryptodev library logs.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The helper rte_eth_dma_zone_reserve() is called by PMDs
when probing a new port.
It creates a new memzone with a unique name.
The name of this memzone was using the name of the driver
doing the probe.
In order to avoid assigning the driver before the end of the probing,
the driver name is removed from these memzone names.
The ethdev name (data->name) is not used because it may be too long
and may not be set at this stage of probing.
Syntax of old name: <driver>_<ring>_<port>_<queue>
Syntax of new name: eth_p<port>_q<queue>_<ring>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
This patch covers the multi-process hotplug case when a device
attach/detach request is issued from a secondary process.
device attach on secondary:
a) secondary sends a sync request to the primary.
b) primary receives the request and attaches the new device; if
that fails, go to i).
c) primary forwards the attach sync request to all secondaries.
d) secondary receives the request, attaches the device and sends a reply.
e) primary checks the replies; if all succeed, go to j).
f) primary sends an attach rollback sync request to all secondaries.
g) secondary receives the request, detaches the device and sends a reply.
h) primary receives the replies and detaches the device as a rollback
action.
i) primary sends attach fail to the secondary as a reply to step a);
go to k).
j) primary sends attach success to the secondary as a reply to step a).
k) secondary receives the reply and returns.
device detach on secondary:
a) secondary sends a sync request to the primary.
b) primary sends a detach sync request to all secondaries.
c) secondary detaches the device and sends a reply.
d) primary checks the replies; if all succeed, go to g).
e) primary sends a detach rollback sync request to all secondaries.
f) secondary receives the request and attaches the device back; go to h).
g) primary detaches the device; if successful, go to i), else go to e).
h) primary sends detach fail to the secondary as a reply to step a);
go to j).
i) primary sends detach success to the secondary as a reply to step a).
j) secondary receives the reply and returns.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
We are going to introduce the solution to handle hotplug in
multi-process; it includes the scenarios below:
1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary
In the primary-secondary process model, we assume devices are shared
by default. That means attaching or detaching a device on any process
will be broadcast to all other processes through the mp channel, and
device information will then be synchronized on all processes.
Any failure during the attaching/detaching process will cause
inconsistent status between processes, so a proper rollback action
should be considered.
This patch covers the implementation of cases 1 and 2.
Cases 3 and 4 will be implemented in a separate patch.
IPC scenario for cases 1 and 2:
attach a device
a) primary attaches the new device; if that fails, go to h).
b) primary sends an attach sync request to all secondaries.
c) secondary receives the request, attaches the device and sends a reply.
d) primary checks the replies; if all succeed, go to i).
e) primary sends an attach rollback sync request to all secondaries.
f) secondary receives the request, detaches the device and sends a reply.
g) primary receives the replies and detaches the device as a rollback
action.
h) attach fail
i) attach success
detach a device
a) primary sends a detach sync request to all secondaries
b) secondary detaches the device and sends a reply
c) primary checks the replies; if all succeed, go to f).
d) primary sends a detach rollback sync request to all secondaries.
e) secondary receives the request and attaches the device back; go to g)
f) primary detaches the device; if successful, go to h), else go to d)
g) detach fail.
h) detach success.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add the driver API rte_eth_release_port_secondary to support the
case when an ethdev needs to be detached on a secondary process.
The local state is set to unused and the shared data is not reset,
so the primary process can still use the device.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
The following change set introduced HAVE_VFIO_DEV_REQ_INTERFACE
and used it in the files below:
drivers/bus/pci/linux/pci_vfio.c
drivers/bus/pci/pci_common.c
lib/librte_eal/linuxapp/eal/eal_interrupts.c
However, except for the first file, the change missed including
<rte_vfio.h>, where HAVE_VFIO_DEV_REQ_INTERFACE is defined.
This creates the following runtime error with vfio-pci mode and
kernel >= 4.0.0:
EAL: [rte_intr_enable] Unknown handle type of fd 95
EAL: [pci_vfio_enable_notifier]Fail to enable req notifier.
EAL: Fail to unregister req notifier handler.
EAL: Error setting up notifier!
EAL: Requested device 0000:07:00.1 cannot be used
Fixes: cda9441996 ("vfio: fix build with Linux < 4.0")
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
When compiling on FreeBSD, a warning/error is thrown for an
unused parameter. This patch aims to fix the issue by deleting
the unused function definition.
Fixes: 89ecd11052 ("eal: modify device event process function")
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Older kernel versions do not implement the device request
interface for VFIO, so when building on a kernel < v4.0.0 (the
version which began to add the device request interface), an
error is thrown that "VFIO_PCI_REQ_IRQ_INDEX" is undeclared.
This patch aims to fix this compile issue by adding the macro
"HAVE_VFIO_DEV_REQ_INTERFACE" after checking the kernel version.
Fixes: 0eb8a1c4c7 ("vfio: add request notifier interrupt")
Fixes: c115fd000c ("vfio: handle hotplug request notifier")
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
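A hedged sketch of the version guard:

    #include <linux/version.h>

    #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 0, 0)
    #define HAVE_VFIO_DEV_REQ_INTERFACE
    #endif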
This patch renames the device event callback process function to be
more explicit and changes the variable to be const. Moreover, because
not only the EAL device helper but also the VFIO bus will use the
callback to handle hot-unplug, the API is exposed out of private EAL.
The bus drivers and the EAL device code can directly use this API to
process device event callbacks.
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add a new req notifier to the EAL interrupt handling to enable VFIO
hotplug.
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The mechanism initially registers the sigbus handler after the device
event monitor is enabled. When a sigbus event is captured, it checks
the failure address and accordingly handles the memory failure of the
corresponding device by invoking the hot-unplug handler. This prevents
the application from crashing when a device is hot-unplugged.
With this patch, users can call the new APIs below to enable/disable
the device hotplug handling mechanism. Note that only the
hot-unplug handler is implemented in these functions; other hotplug
handlers, such as a handler for hotplug binding, could be added in
the future if needed:
- rte_dev_hotplug_handle_enable
- rte_dev_hotplug_handle_disable
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This patch aims to add a helper to iterate over all buses to find the
relevant bus to handle the sigbus error.
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
When a device is hot-unplugged, a sigbus error will occur if the
datapath still reads from or writes to the device. A handler is
required here to capture the sigbus signal and handle it appropriately.
This patch introduces a bus ops to handle sigbus errors. Each bus can
implement its own case-dependent logic to handle the sigbus errors.
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
A hot-unplug failure and app crash can occur when a device is
hot-unplugged but the application still tries to access the device
by reading or writing the BARs, which are already invalid but
have not yet been unmapped or released.
This patch introduces bus ops to handle hot-unplug failures. Each
bus can implement its own case-dependent logic to handle the failures.
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This patch introduces a new table action for packet decapsulation
which removes n bytes from the start of the input packet. The value
of n is read from the current table entry. The following mbuf fields
are updated by the action: data_off, data_len, pkt_len.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
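A hedged sketch of the mbuf bookkeeping performed by the action:

    /* Strip n bytes from the packet front; 'n' comes from the
     * table entry as described above. */
    mbuf->data_off += n;
    mbuf->data_len -= n;
    mbuf->pkt_len -= n;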
This patch introduces the packet tag table action which attaches
a 32-bit value (the tag) to the current input packet. The tag is
read from the current table entry. The tag is written into the
mbuf->hash.fdir.hi and the flags PKT_RX_FDIR and PKT_RX_FDIR_ID
are set into mbuf->ol_flags.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
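A hedged sketch of the tag action's effect on the mbuf:

    /* Attach the 32-bit tag and mark it as valid. */
    mbuf->hash.fdir.hi = tag;
    mbuf->ol_flags |= PKT_RX_FDIR | PKT_RX_FDIR_ID;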
This patch adds the symmetric crypto action support to pipeline
library. The symmetric crypto action works as the shim layer
between pipeline and DPDK cryptodev and is able to interact with
cryptodev with the control path requests such as session
creation/deletion and data path work to assemble the crypto
operations for received packets.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
This patch adds the symmetric crypto support to port library.
The crypto port acts as a shim layer to DPDK cryptodev library and
supports in-place crypto workload processing.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
This commit adds rte_table_hash_func.h and rte_table_hash_func_arm64.h to
librte_table. This reduces code duplication by removing duplicate header
files within two folders and consolidating them into a single one. This
also adds a scalar implementation of the x86_64 intrinsic for crc32 as a
generic fallback.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
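A hedged sketch of such a scalar CRC32-C fallback for the 64-bit
x86_64 crc32 intrinsic (bit-at-a-time, reflected polynomial):

    static inline uint64_t
    rte_crc32_u64_generic(uint64_t crc, uint64_t value)
    {
        int i;

        crc = (crc & 0xFFFFFFFFLLU) ^ value;
        for (i = 63; i >= 0; i--) {
            uint64_t mask = -(crc & 1LLU);

            crc = (crc >> 1LLU) ^ (0x82F63B78LLU & mask);
        }
        return crc;
    }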
Replace rte_zmalloc() with rte_zmalloc_socket() to allocate
memory on the socket id provided by the application.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
It is very common to have more than 4G of DDR memory in a NIC for
HQoS, so the queue threshold size of RED needs to be expanded to
uint64_t. This patch fixes it.
Signed-off-by: Rosen Xu <rosen.xu@intel.com>
Currently, slab operations use the unsigned long data type for 64-bit
slab related operations. On target 'i686-native-linuxapp-gcc', unsigned
long is 32-bit and thus slab operations break on this target. Change
slab operations to use unsigned long long for correct functioning on
all targets.
Fixes: de3cfa2c98 ("sched: initial import")
Fixes: 693f715da4 ("remove extra parentheses in return statement")
Cc: stable@dpdk.org
Signed-off-by: Vivek Sharma <vivek.sharma@caviumnetworks.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Introduced DEV_TX_OFFLOAD_OUTER_UDP_CKSUM offload flags and
PKT_TX_OUTER_UDP_CKSUM mbuf ol_flags to enable Tx outer UDP
checksum offload.
To use hardware Tx outer UDP checksum offload, the user needs to:
- enable the following in the mbuf:
a) fill outer_l2_len and outer_l3_len in mbuf
b) set the PKT_TX_OUTER_UDP_CKSUM flag
c) set the flag PKT_TX_OUTER_IPV4 or PKT_TX_OUTER_IPV6
- configure DEV_TX_OFFLOAD_OUTER_UDP_CKSUM offload flags in slow path
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
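A hedged usage sketch for a VXLAN-style UDP tunnel over IPv4, using
the header structs of this release:

    /* Slow path: enable the offload on the port. */
    port_conf.txmode.offloads |= DEV_TX_OFFLOAD_OUTER_UDP_CKSUM;

    /* Fast path: per-mbuf setup. */
    m->outer_l2_len = sizeof(struct ether_hdr);
    m->outer_l3_len = sizeof(struct ipv4_hdr);
    m->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_UDP_CKSUM;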
Introduced DEV_RX_OFFLOAD_OUTER_UDP_CKSUM Rx offload flag and
PKT_RX_OUTER_L4_CKSUM_* mbuf ol_flags to detect outer UDP checksum
status.
- To use hardware Rx outer UDP checksum offload, the user needs to
configure the DEV_RX_OFFLOAD_OUTER_UDP_CKSUM offload flag in the
slow path.
- The driver updates the checksum status in the mbuf ol_flags with
the PKT_RX_OUTER_L4_CKSUM_* flags.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
This action is useful for offloading loopback mode, where the hardware
will swap source and destination MAC addresses in the outermost Ethernet
header before looping back the packet. This action can be used in
conjunction with other rewrite actions to achieve MAC layer transparent
NAT where the MAC addresses are swapped before either the source or
destination MAC address is rewritten and NAT is performed.
Must be used with a valid RTE_FLOW_ITEM_TYPE_ETH flow pattern item.
Otherwise, RTE_FLOW_ERROR_TYPE_ACTION error should be returned by the
PMDs.
Original work by Shagun Agrawal
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Add actions:
- SET_TP_SRC - set a new TCP/UDP source port number.
- SET_TP_DST - set a new TCP/UDP destination port number.
Original work by Shagun Agrawal
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Add actions:
- SET_IPV4_SRC - set a new IPv4 source address.
- SET_IPV4_DST - set a new IPv4 destination address.
- SET_IPV6_SRC - set a new IPv6 source address.
- SET_IPV6_DST - set a new IPv6 destination address.
Original work by Shagun Agrawal
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
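A hedged sketch of one of these actions in an action list (the
corresponding IPv4 pattern item is assumed to be present):

    struct rte_flow_action_set_ipv4 set_src = {
        .ipv4_addr = rte_cpu_to_be_32(0x0a000001), /* 10.0.0.1 */
    };
    const struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC,
          .conf = &set_src },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };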
Update the implementation so that when the PKT_RX_QINQ_STRIPPED mbuf
ol_flag is set by a PMD, PKT_RX_QINQ, PKT_RX_VLAN_STRIPPED &
PKT_RX_VLAN are also set.
Clarify in the mbuf documentation that when PKT_RX_QINQ is set,
PKT_RX_VLAN should also be set, so that an application can rely on
the PKT_RX_QINQ flag to access both mbuf.vlan_tci &
mbuf.vlan_tci_outer.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Fix the missing PKT_TX_UDP_SEG, PKT_TX_OUTER_IPV6, PKT_TX_OUTER_IPV4,
PKT_TX_IPV6 and PKT_TX_IPV4 values in PKT_TX_OFFLOAD_MASK.
Also sort them in bitwise order to make missing items easier to spot
later.
Fixes: 6d18505efa ("vhost: support UDP Fragmentation Offload")
Fixes: 1c3b7c33e9 ("mbuf: add Tx offloading flags for tunnels")
Fixes: 711ba9e23e ("mbuf: remove aliasing of Tx offloading flags with Rx ones")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Jiayu Hu <jiayu.hu@intel.com>
Some users want to use their own epoll instances to control both
DPDK rxq interrupt fds and their own other fds. So add a function
to get the rxq interrupt fd based on port id and queue id.
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
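A hedged usage sketch, registering the fd in an application-owned
epoll instance:

    int fd = rte_eth_dev_rx_intr_ctl_q_get_fd(port_id, queue_id);

    if (fd >= 0) {
        struct epoll_event ev = {
            .events = EPOLLIN,
            .data.fd = fd,
        };
        epoll_ctl(app_epfd, EPOLL_CTL_ADD, fd, &ev);
    }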
No users left for this function, time to deprecate it.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Several pattern items and actions were never handled by rte_flow_copy()
because their descriptions were missing. rte_flow_conv() inherited this
deficiency.
This patch adds them and reorders others to match rte_flow.h. It doesn't
pose as a fix because so far no one has complained about it and
rte_flow_conv() would have to be backported as well: this function is
the only sane approach to handle VXLAN and NVGRE encap definitions.
As a matter of fact, it's the last missing piece to finally allow
testpmd users to request the creation of VXLAN/NVGRE encap/decap flow
rules without getting rejected outright.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This provides a means for applications to retrieve the name of flow
pattern items and actions.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
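A hedged sketch using rte_flow_conv() (introduced in the following
commit) to fetch a static name pointer for a pattern item type:

    const char *name = NULL;

    if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
            &name, sizeof(name),
            (void *)(uintptr_t)RTE_FLOW_ITEM_TYPE_VXLAN, NULL) > 0)
        printf("pattern item: %s\n", name);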
rte_flow_copy() is bound to duplicate flow rule descriptions
(attributes, pattern and list of actions, all at once), however
applications sometimes need more flexibility, for instance the ability
to duplicate only one of the underlying objects (a single pattern item
or action) or retrieve other properties such as their names.
Instead of adding dedicated functions to handle each possible use case,
this patch introduces rte_flow_conv(), which supports any number of
object conversion operations in an extensible manner.
This patch re-implements rte_flow_copy() as a wrapper to
rte_flow_conv().
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
It's used to get the number of available registered vDPA devices.
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
All information about a device to probe can be grouped
in a common string, which is what we usually call devargs.
An application should not have to parse this string before
calling the EAL probe function.
And the syntax could evolve to be more complex and support
matching multiple devices in one string.
That's why the bus name and device name should be removed from
rte_eal_hotplug_add().
Instead of changing this function, a simpler one is added
and used in the old one, which may be deprecated later.
When removing a device, we already know its rte_device handle
which can be directly passed as parameter of rte_eal_hotplug_remove().
If the rte_device is not known, it can be retrieved with the devargs,
by iterating in the device list (future RTE_DEV_FOREACH()).
Similarly to the probing case, a new function is added
and used in the old one, which may be deprecated later.
The new function is used in failsafe, because the replacement is easy.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
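A hedged sketch of the simplified pair (devargs string in, rte_device
handle for removal; 'dev' assumed to be found via device iteration):

    /* Probe a device from a single devargs string. */
    if (rte_dev_probe("0000:07:00.1,representor=[0-3]") != 0)
        return -1;

    /* Remove it later through its rte_device handle. */
    rte_dev_remove(dev);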
These functions are quite old and are the only available replacement
for the deprecated attach/detach functions.
Note: some new functions may (again) replace these hotplug functions,
in future, with better parameters.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
When a device is added with a devargs (hotplug or whitelist),
the bus pointer can be retrieved via its devargs.
But there is no such devargs.bus in case of standard scan.
A pointer to the rte_bus handle is added to rte_device.
When a device is allocated (during a scan),
the pointer to its bus is assigned.
It will make it possible to remove an rte_device,
using the function pointer from its bus.
The function rte_bus_find_by_device() becomes useless,
and may be removed later.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
The function rte_devargs_remove(), which is intended to be internal,
can take a devargs structure as argument.
The matching is still using string comparison of bus name and
device name.
It is simpler and may allow a different devargs matching in future.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
rte_eal_parse_devargs_str() does not support parsing the bus name
at the start of devargs. So it was renamed and deprecated.
rte_eal_devargs_add(), rte_eal_devargs_type_count() and
rte_eal_devargs_dump() were declared deprecated and had their
implementation body renamed.
All these functions were deprecated in release 18.05.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
The enum names are *_params (plural form).
And the items are also using the plural form: *_PARAMS_*.
It looks more natural to use the singular form *_PARAM_* for items.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
The final_va field is set during remap_segment() but this information
is not propagated to the temporary copy of the huge page memory
configuration, so the unlink_hugepage_files() function wrongly assumes
that there is nothing to unlink. Fix this issue by checking orig_va
instead of final_va.
Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
When adding or removing external memory from the memory map, there
may be actions that need to be taken on account of this memory (e.g.
DMA mapping). Add support for triggering callbacks when adding,
removing, attaching or detaching external memory.
Some memory event callback handlers will need additional logic to
handle external memory regions. For example, virtio callback has to
completely ignore externally allocated memory, because there is no
way to find file descriptors backing the memory address in a
generic fashion. All other callbacks have also been adjusted to
handle RTE_BAD_IOVA as IOVA address, as this is one of the expected
use cases for external memory support.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
In order to use external memory in multiple processes, we need to
attach to primary process's memseg lists, so add a new API to do
that. It is the responsibility of the user to ensure that memory
is accessible and that it has been previously added to the malloc
heap by another process.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add an API to remove memory from specified heaps. This will first
check if all elements within the region are free, and that the
region is the original region that was added to the heap (by
comparing its length to length of memory addressed by the
underlying memseg list).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add an API to add externally allocated memory to malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add API to allow creating new malloc heaps. They will be created
with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.
This breaks the ABI, so document the change.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
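A hedged end-to-end sketch combining the new heap APIs from this
group of commits ('addr', 'len' and 'page_sz' describe memory the
application allocated itself; the IOVA table is omitted, so the
addresses are filled in with RTE_BAD_IOVA):

    if (rte_malloc_heap_create("app_heap") == 0) {
        rte_malloc_heap_memory_add("app_heap", addr, len,
                NULL, len / page_sz, page_sz);

        int sock = rte_malloc_heap_get_socket("app_heap");
        void *obj = rte_malloc_socket(NULL, 4096, 0, sock);

        /* ... use obj ... */
        rte_free(obj);
        rte_malloc_heap_memory_remove("app_heap", addr, len);
        rte_malloc_heap_destroy("app_heap");
    }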
An API is needed to check whether a particular socket ID belongs
to an internal or external heap. Prime user of this would be
mempool allocator, because normal assumptions of IOVA
contiguousness in IOVA as VA mode do not hold in case of
externally allocated memory.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When we will be creating external heaps, they will have their own
"fake" socket ID, so add a function that will map the heap name
to its socket ID.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
We will need to refer to external heaps in some way. While we use
heap ID's internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.
This breaks the ABI, so document the change.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
We will be assigning "invalid" socket ID's to external heaps, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
We will be assigning "invalid" socket ID's to external heaps, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.
This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
node's index within the detected NUMA node list. Heap ID for
external heaps will follow the order of their creation.
This breaks the ABI, so document the changes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.
This breaks ABI, so bump the EAL ABI version and document the
change. Also, while we're breaking ABI, pack the members a little
better.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
build error:
.../lib/librte_eventdev/rte_event_eth_tx_adapter.c:
In function ‘txa_service_queue_del’:
.../lib/librte_eventdev/rte_event_eth_tx_adapter.c:800:7:
error: ‘ret’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
compilation terminated due to -Wfatal-errors.
https://mails.dpdk.org/archives/test-report/2018-October/065919.html
'ret' may be used uninitialized when 'dev->data->nb_tx_queues' is 0;
although this is not a practical value, initialize 'ret' to cover this
case.
Fixes: a3bbf2e097 ("eventdev: add eth Tx adapter implementation")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Currently, DPDK will skip mapping some areas (or even an entire BAR)
if the MSI-X table happens to be in them but is smaller than the page
size.
Kernels 4.16+ will allow mapping MSI-X BARs [1], and will report this
as a capability flag. Capability flags themselves are also only
supported since kernel 4.6 [2].
This commit will introduce support for checking VFIO capabilities,
and will use it to check if we are allowed to map BARs with MSI-X
tables in them, along with backwards compatibility for older
kernels, including a workaround for a variable rename in VFIO
region info structure [3].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=c84982adb23bcf3b99b79ca33527cd2625fbe279
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=ff63eb638d63b95e489f976428f1df01391e15e4
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
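
For reference, a sketch of such a capability check; it assumes kernel
headers new enough to provide cap_offset in struct vfio_region_info
and the VFIO_REGION_INFO_CAP_MSIX_MAPPABLE capability ID, with `info`
obtained via VFIO_DEVICE_GET_REGION_INFO using an argsz large enough
to hold the capability chain:

    #include <stdbool.h>
    #include <stdint.h>
    #include <linux/vfio.h>

    static bool
    region_msix_mappable(const struct vfio_region_info *info)
    {
        const struct vfio_info_cap_header *hdr;
        uint32_t off;

        if ((info->flags & VFIO_REGION_INFO_FLAG_CAPS) == 0)
            return false; /* kernel does not report capabilities */
        for (off = info->cap_offset; off != 0; off = hdr->next) {
            hdr = (const void *)((const uint8_t *)info + off);
            if (hdr->id == VFIO_REGION_INFO_CAP_MSIX_MAPPABLE)
                return true;
        }
        return false;
    }
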
When NUMA-aware hugepages config option is set, we rely on
libnuma to tell the kernel to allocate hugepages on a specific
NUMA node. However, we allocate the node mask before we check if
NUMA is available in the first place, which, according to
the manpage [1], causes undefined behaviour.
Fix by only using nodemask when we have NUMA available.
[1] https://linux.die.net/man/3/numa_alloc_onnode
Bugzilla ID: 20
Fixes: 1b72605d24 ("mem: balanced allocation of hugepages")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
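
The shape of the fix, as a minimal sketch (variable names
illustrative):

    #include <stdbool.h>
    #include <numa.h>

    struct bitmask *nodemask = NULL;
    bool have_numa = numa_available() != -1;

    if (have_numa)
        nodemask = numa_allocate_nodemask();
    /* ... pass nodemask to the allocation path only if have_numa ... */
    if (nodemask != NULL)
        numa_bitmask_free(nodemask);
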
Currently, command-line switches for legacy mem mode or single-file
segments mode are only stored in internal config. This leads to a
situation where these flags have to always match between primary
and secondary, which is bad for usability.
Fix this by storing these flags in the shared config as well, so
that secondary process can know if the primary was launched in
single-file segments or legacy mem mode.
This bumps the EAL ABI, however there's an EAL deprecation notice
already in place[1] for a different feature, so that's OK.
[1] http://patches.dpdk.org/patch/43502/
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Implement the operators of an rte_class for the
ethdev abstraction layer.
Register the layer as such.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
This iterator can be customized with a comparison function that will
trigger a stopping condition.
It can be leveraged to write several different iterators that have
similar but non-identical purposes.
It is private to librte_ethdev.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
A long time ago, preallocation of memory for KNI was introduced in
commit 0c6bc8e. It was done because of the lack of ability to free
previously allocated memzones, which led to memzone exhaustion.
Currently memzones can be freed, and this patch uses this ability for
dynamic KNI memory allocation.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Make the ethernet port id passed into
rte_event_eth_rx_adapter_caps_get() 16 bit.
Also, update the event rx adapter test to use 16 bit
ethernet port ids.
Fixes: c2189c907d ("eventdev: make ethdev port identifiers 16-bit")
Cc: stable@dpdk.org
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This patch implements the Tx adapter APIs by invoking the
corresponding eventdev PMD callbacks and also provides
the common rte_service function based implementation when
the eventdev PMD support is absent.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
The caps API allows the application to query if the transmit
stage is implemented in the eventdev PMD or uses the common
rte_service function. The PMD callbacks support the
eventdev PMD implementation of the adapter.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The ethernet Tx adapter abstracts the transmit stage of an
event driven packet processing application. The transmit
stage may be implemented with eventdev PMD support or use a
rte_service function implemented in the adapter. These APIs
provide a common configuration and control interface and
a transmit API for the eventdev PMD implementation.
The transmit port is specified using mbuf::port. The transmit
queue is specified using the rte_event_eth_tx_adapter_txq_set()
function.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
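
A hedged usage sketch of the API described above: `m` is an mbuf ready
for transmission, and dev_id/port_id identify the eventdev and the
adapter's event port, assumed to have been set up elsewhere via
rte_event_eth_tx_adapter_create() and friends:

    #include <rte_event_eth_tx_adapter.h>

    /* select the transmit port and queue, then hand off to the adapter;
     * event metadata (op, sched type) is omitted for brevity */
    m->port = eth_port_id;                  /* mbuf::port picks the NIC */
    rte_event_eth_tx_adapter_txq_set(m, 0); /* Tx queue 0 */

    struct rte_event ev = { .mbuf = m };
    rte_event_eth_tx_adapter_enqueue(dev_id, port_id, &ev, 1);
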
This commit introduces a new function in the eventdev API,
which allows applications to read the number of unlink requests
in progress on a particular port of an eventdev instance.
This information allows applications to verify when no more packets
from a particular queue (or any queue) will arrive at a port.
The application could decide to stop polling, or put the core into
a sleep state if it wishes, as it is ensured that no new packets
will arrive at a particular port anymore if all queues are unlinked.
Suggested-by: Matias Elo <matias.elo@nokia.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
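
A sketch of the intended usage pattern (dev_id and port_id identify
the eventdev port being drained):

    #include <rte_eventdev.h>
    #include <rte_pause.h>

    /* request unlink of all queues, then wait until it takes effect */
    rte_event_port_unlink(dev_id, port_id, NULL, 0);
    while (rte_event_port_unlinks_in_progress(dev_id, port_id) > 0)
        rte_pause();
    /* no new events will arrive at this port; polling may stop */
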
Use RTE_MAX_ETHPORTS instead of rte_eth_dev_count_total()
when allocating eth Rx adapter's per-eth device data structure
to account for hotplugged devices.
Fixes: 9c38b704d2 ("eventdev: add eth Rx adapter implementation")
Cc: stable@dpdk.org
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
When performing enqueue operations on the split and packed rings,
if the reserved buffer length from the descriptor table exceeds
65535, the length returned by fill_vec_buf_split/_packed()
overflows. This patch avoids this corner case.
Fixes: f689586bc0 ("vhost: shadow used ring update")
Fixes: fd68b4739d ("vhost: use buffer vectors in dequeue path")
Fixes: 2f3225a7d6 ("vhost: add vector filling support for packed ring")
Fixes: 37f5e79a27 ("vhost: add shadow used ring support for packed rings")
Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Introduce vhost_message_handlers, which maps the message request
type to the message handler. Then replace the switch construct
with a map and call.
Failing vhost_user_set_features is fatal and all processing should
stop immediately and propagate the error to the upper layers. Change
the code accordingly to reflect that.
Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Each vhost-user message handling function will return an int result
which is described in the new enum vh_result: error, OK and reply.
All functions will now have two arguments, virtio_net double pointer
and VhostUserMsg pointer.
Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
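
Taken together, the two commits above give the dispatch the following
general shape; this is a simplified sketch, the exact type, field and
array names in the vhost code may differ, and the error/OK/reply
outcomes are mapped to illustrative values:

    enum vh_result { VH_RESULT_ERR = -1, VH_RESULT_OK = 0, VH_RESULT_REPLY = 1 };

    /* each handler takes the device and the message and returns
     * one of the vh_result outcomes */
    typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
                                           struct VhostUserMsg *msg);

    static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX];

    /* the switch construct becomes a map and call */
    vhost_message_handler_t handler = vhost_message_handlers[request];
    int ret = (handler != NULL) ? handler(&dev, &msg) : VH_RESULT_ERR;
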
As VhostUserMsg structure is reused to generate the reply, move the
relevant fields update into the respective message handling functions.
Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Do not use the typedef version of struct VhostUserMsg. Also unify the
related parameter name.
Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The doxygen comment describing the rte_eth_dev_info structure
was separated from the structure itself so move the comment
back to be with the structure.
Fixes: 7238e63bce ("ethdev: add support for device offload capabilities")
Cc: stable@dpdk.org
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This patch fixes how function exit is handled when errors occur
inside rte_eth_dev_create.
Fixes: e489007a41 ("ethdev: add generic create/destroy ethdev APIs")
Cc: stable@dpdk.org
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
The current Intel Tx prepare function does not properly handle the
case where only IP checksum is requested, without requesting
any L4 checksum or TSO: IP checksum is not properly reset to 0
and output packet may contain invalid IP checksum.
Fixes: 4fb7e803eb ("ethdev: add Tx preparation")
Cc: stable@dpdk.org
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
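
What the fix amounts to, sketched for the IPv4 checksum-only case
(`m` is the mbuf being prepared; struct and flag names as of this
release):

    #include <rte_ip.h>
    #include <rte_mbuf.h>

    /* even without an L4 checksum or TSO request, a requested IP
     * checksum offload needs the checksum field reset to 0 first */
    if (m->ol_flags & PKT_TX_IP_CKSUM) {
        struct ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m,
                struct ipv4_hdr *, m->l2_len);
        ip->hdr_checksum = 0;
    }
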
When compiling on FreeBSD, lots of warnings/errors are thrown for
unused parameters. Fix these by marking the parameters as unused
in the code.
Fixes: 1009ba1704 ("mem: add internal API to get and set segment fd")
Fixes: 3a44687139 ("mem: allow querying offset into segment fd")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
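
The fix pattern, sketched with a hypothetical FreeBSD stub:

    #include <rte_common.h>

    int
    get_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused)
    {
        return -1; /* segment fd's are not supported on FreeBSD */
    }
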
A fragmented packet is supposed to live no longer than max_cycles,
but the lib deletes an expired packet only occasionally when it scans
a bucket to find an empty slot while adding a new packet.
Therefore a fragment might sit in the table forever.
Signed-off-by: Alex Kiselev <alex@therouter.net>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Rework the delete function and add additional
internal data structures to support incremental
LPM tree update rather than full tree rebuild.
Signed-off-by: Alex Kiselev <alex@therouter.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Rework the lpm6 rule subsystem and replace the current O(n)
rules algorithm with hash tables, which allow dealing with
large (50k) rule sets.
Signed-off-by: Alex Kiselev <alex@therouter.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Enable using memfd-created segments if supported by the system.
This will allow having real fd's for pages but without hugetlbfs
mounts, which will enable in-memory mode to be used with virtio.
The implementation is mostly piggy-backing on existing real-fd
code, except that we no longer need to unlink any files or track
per-page locks in single-file segments mode, because in-memory
mode does not support secondary processes anyway.
We move some checks from EAL command-line parsing code to memalloc
because it is now possible to use single-file segments mode with
in-memory mode, but only if memfd is supported.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
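
At its core the mechanism relies on memfd with hugetlb support; a
sketch, assuming a kernel with MFD_HUGETLB (4.14+), a glibc exposing
memfd_create(), and a 2 MB default hugepage size:

    #define _GNU_SOURCE
    #include <sys/mman.h>   /* memfd_create(), MFD_* */
    #include <unistd.h>

    /* an fd-backed hugepage segment without any hugetlbfs mount */
    int fd = memfd_create("rte_seg", MFD_CLOEXEC | MFD_HUGETLB);
    if (fd >= 0 && ftruncate(fd, 2 * 1024 * 1024) == 0) {
        void *va = mmap(NULL, 2 * 1024 * 1024, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_POPULATE, fd, 0);
        /* va behaves like a regular hugepage mapping, with a real fd */
    }
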
In a few cases, user may need to query offset into fd for a
particular memory segment (for example, to selectively map
pages). This commit adds a new API to do that.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Now that we can retrieve page fd's internally, we can expose it
as an external API. This will add two flavors of API - thread-safe
and non-thread-safe. Fix up internal API's to return values we need
without modifying rte_errno internally if called from within EAL.
We do not want calling code to accidentally close an internal fd, so
we make a duplicate of it before we return it to the user. Caller is
therefore responsible for closing this fd.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
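
A usage sketch combining this with the offset API from the previous
commit; per the text above, the returned fd is a duplicate, so the
caller closes it (`ms` is a previously looked-up memseg):

    #include <unistd.h>
    #include <rte_memory.h>

    size_t offset;
    int fd = rte_memseg_get_fd(ms);

    if (fd >= 0 && rte_memseg_get_fd_offset(ms, &offset) == 0) {
        /* e.g. selectively map just this page somewhere else */
        close(fd); /* caller owns the duplicated fd */
    }
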
Enable setting and retrieving segment fd's internally.
For now, retrieving fd's will not be used anywhere until we
get an external API, but it will be useful for things like
virtio, where we wish to share segment fd's.
Setting segment fd's will not be available as a public API
at this time, but internally it is needed for legacy mode,
because we're not allocating our hugepages in memalloc in the
legacy mode case, and we still need to store the fd.
Another user of the get segment fd API is the memseg info dump, to
show which pages use which fd's.
Not supported on FreeBSD.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Previously, we were only tracking lock file fd's in single-file
segments mode, but did not track fd's in non-single file mode
because we didn't need to (mmap() call still kept the lock). Now
that we are going to expose these fd's to the world, we need to
have access to them, so track them even in non-single file
segments mode.
We don't need to close fd's after mmap() because we're still
tracking them in an fd list. Also, for anonymous hugepages mode,
fd will always be -1 so exit early on error.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Previously, we were only using lock lists to store per-page lock fd's
because we cannot use modern fcntl() file description locks to lock
parts of the page in single file segments mode.
Now, we will be using this list to store either lock fd's (along with
memseg list fd) in single file segments mode, or per-page fd's (and set
memseg list fd to -1), so rename the list accordingly.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Previously, when we allocated hugepages, we closed the fd's corresponding
to them after we've done our mappings. Since we did mmap(), we didn't
actually lose the reference, but file descriptors used for mmap() do not
count against the fd limit. Since we are going to store all of our fd's,
we will hit the fd limit much more often when using smaller page sizes.
Fix this to raise the fd limit to maximum unconditionally.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
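
The fix boils down to the classic getrlimit/setrlimit sequence:

    #include <sys/resource.h>

    /* raise the soft fd limit to the hard maximum */
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) == 0) {
        lim.rlim_cur = lim.rlim_max;
        setrlimit(RLIMIT_NOFILE, &lim);
    }
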
In-memory mode was never meant to support legacy mode, because we
cannot sort anonymous pages anyway.
Fixes: 72b49ff623 ("mem: support --in-memory mode")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In noshconf mode, no shared files are created, but we're still trying
to unlink them, resulting in detach/destroy failure even though it
should have succeeded. Fix it by exiting early in noshconf mode.
Fixes: 3ee2cde248 ("fbarray: support --no-shconf mode")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The strncpy function has long been deemed unsafe to use, with
strlcpy and snprintf preferred instead.
While snprintf is standard and strlcpy is still largely available,
they both have issues regarding error checking and performance.
Both will force reading the source buffer past the requested size
if the input is not a proper c-string, and will return the expected
number of bytes copied, meaning that error checking needs to verify
that the number of bytes copied does not exceed the destination
size.
This contributes to awkward code flow, unclear error checking and
potential issues with malformed input.
The function strscpy has been discussed for some time already and
has been made available in the linux kernel[1].
Propose this new function as a safe alternative.
[1]: http://git.kernel.org/linus/30c44659f4a3
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Juhamatti Kuusisaari <juhamatti.kuusisaari@coriant.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
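
Error checking with the new function becomes a single sign test; a
small sketch (buffer size illustrative):

    #include <rte_string_fns.h>

    char buf[32];
    ssize_t n = rte_strscpy(buf, src, sizeof(buf));

    if (n < 0) {
        /* -E2BIG: src did not fit; buf is still NUL-terminated */
    } else {
        /* n bytes were copied, not counting the terminating NUL */
    }
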
__rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
a long time now (early 17.05), are not part of the abi and are easily
replaced with existing api.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
The pdump library now uses the generic multi-process channel
and no longer depends on pthreads, so remove
the dependency from the Makefile.
Fixes: 660098d61f ("pdump: use generic multi-process channel")
Cc: stable@dpdk.org
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
After running meson to configure a DPDK build, it can be useful to know
what was automatically enabled or disabled. Therefore, print out by way of
summary a categorised list of libraries and drivers to be built.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
EAL is a standard dependency of all libraries, except for those built
before it. We can therefore simplify the logic by just checking if EAL
has been processed, and make it a standard dependency if so.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
This has been only build-tested for now, on a native ppc64el POWER8E
machine running Debian sid.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
They are built by the legacy makefiles but not by Meson.
Fixes: 8f40ee0734 ("eal/x86: get hypervisor name")
Cc: stable@dpdk.org
Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Removed DEV_RX_OFFLOAD_CRC_STRIP offload flag.
Without any specific Rx offload flag, the default behavior of PMDs is
to strip the CRC.
PMDs that support keeping the CRC should advertise the
DEV_RX_OFFLOAD_KEEP_CRC Rx offload capability.
Applications that require keeping the CRC should check the PMD
capability first and, if it is supported, enable this feature by
setting DEV_RX_OFFLOAD_KEEP_CRC in the Rx offload flags in
rte_eth_dev_configure().
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Tomasz Duszynski <tdu@semihalf.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Jan Remes <remes@netcope.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
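
The application-side check described above, as a sketch:

    #include <rte_ethdev.h>

    struct rte_eth_dev_info dev_info;
    struct rte_eth_conf port_conf = { 0 };

    rte_eth_dev_info_get(port_id, &dev_info);
    /* keeping the CRC is opt-in and capability-gated */
    if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_KEEP_CRC)
        port_conf.rxmode.offloads |= DEV_RX_OFFLOAD_KEEP_CRC;
    rte_eth_dev_configure(port_id, 1, 1, &port_conf);
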
Patch 5355f443 added two definitions of DEV_TX_OFFLOAD_xxx.
If new Tx offload capabilities are defined, they must also be mentioned
in rte_tx_offload_names in the rte_ethdev.c file.
This patch adds the required lines to the rte_tx_offload_names array.
Fixes: 5355f4439e ("ethdev: introduce generic IP/UDP tunnel checksum and TSO")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
There are a lot of cases where vhost-user message handling
could fail and end up in a fully unrecoverable state. For
example, allocation failures of the shadow used ring and batched
copy array are not recoverable and lead to segmentation
faults like this on the receiving/transmission path:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f913fecf0 (LWP 43625)]
in copy_desc_to_mbuf () at /lib/librte_vhost/virtio_net.c:760
760 batch_copy[vq->batch_copy_nb_elems].dst =
This could be easily reproduced in case of low memory or big
number of vhost-user ports.
Fix that by propagating the error to the upper layer, which will
end up disconnecting in case we cannot report to
the message sender when the error happens.
Fixes: f689586bc0 ("vhost: shadow used ring update")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When VIRTIO_RING_F_EVENT_IDX is negotiated, we need to
update the avail event to enable the notification.
Fixes: 3f8ff12821 ("vhost: support interrupt mode")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
'numa_realloc()' allocates 'zmbufs' even if zero copy mode
is not configured. This leads to a memory leak, because the array
is freed only in the zero copy case.
Fixes: 2651726def ("vhost: do deep copy while reallocating queue")
CC: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
The rte_eth_dev_owner_unset function always generates a log
message because the unset value for owner id is 0.
Also, when rte_eth_dev_owner_delete is called with a valid
owner id, the log message should be at NOTICE not ERROR
severity.
Fixes: 5b7ba31148 ("ethdev: add port ownership")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Matan Azrad <matan@mellanox.com>
Current code assumes a MAC change can occur when the port has been
started. In fact, some NICs require this port state
for the change to be successful, but other NICs do not always support
a MAC change in that case.
This patch supports a new device flag for a device advertising this
limitation, and if the flag is set, the MAC is changed before the
port starts.
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
It's a common case that 'get_mempolicy' fails on systems
without NUMA support. No need to flag an error in the log for
this situation.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The patch changes the rx_burst profiling approach:
1. VTune's instrumentation is removed
2. empty hook callback for profiling is added
This way all VTune-specific logic moves to the VTune side.
Hook is enabled only when CONFIG_RTE_ETHDEV_PROFILE_WITH_VTUNE option
is turned on. VTune uses this hook to attach to the polling cycle. It
is not possible to attach to the rx_burst directly, as it is inline.
Signed-off-by: Ilia Kurakin <ilia.kurakin@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
If the user specifies priority=0 for some ACL rules,
that can cause rte_acl_classify to return wrong results.
The reason is that priority zero is used internally for no-match nodes.
See more details at: https://bugs.dpdk.org/show_bug.cgi?id=79.
The simplest way to overcome the issue is just not allow zero
to be a valid priority for the rule.
Fixes: dc276b5780 ("acl: new library")
Cc: stable@dpdk.org
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
We need to do the NULL pointer check right after malloc().
Fixes: 07dcbfe010 ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Start version numbering for a new release cycle,
and introduce a template file for release notes.
The release notes comments have a new block to suggest
the order of items, inspired by Ferruh's proposal.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: John McNamara <john.mcnamara@intel.com>
Describe the thread-safety support more accurately in the
API documentation.
Fixes: f2e3001b53 ("hash: support read/write concurrency")
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This patch fixes a doxygen comment of the rte_eth_dev_allocate()
method. There is no parameter named "type" for this
method, so this patch removes the doxygen comment about it.
Fixes: 6751f6deb7 ("ethdev: get rid of device type")
Cc: stable@dpdk.org
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Fix a segmentation fault which occurs when the kni_autotest is run
in the 'test' application.
This segmentation fault occurs when rte_kni_get() is called with a
NULL value for 'name'.
Fixes: 0c6bc8ef70 ("kni: memzone pool for alloc and release")
Cc: stable@dpdk.org
Signed-off-by: Dan Gora <dg@adax.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
rte_hash_lookup_data() and rte_hash_lookup_with_hash_data()
functions return the index of the table where the key is stored
when it is found, and not 0 as the Doxygen currently states.
Also, these functions, and rte_hash_get_key_with_position()
return negative values when keys are not found (-EINVAL and -ENOENT),
where the minus sign was missing.
Bugzilla ID: 78
Fixes: 473d1bebce ("hash: allow to store data in hash table")
Fixes: 6dc34e0afe ("hash: retrieve a key given its position")
Cc: stable@dpdk.org
Reported-by: Petr Houska <t-pehous@microsoft.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The old offload API is removed in 18.08,
so the library version must be increased,
in order to show the incompatibility with the 18.05 one.
Fixes: ab3ce1e0c1 ("ethdev: remove old offload API")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>