numam-dpdk

Author	SHA1	Message	Date
Anatoly Burakov	1a7dc2252f	mem: revert to using flock and add per-segment lockfiles The original implementation used flock() locks, but was later switched to using fcntl() locks for page locking, because fcntl() locks allow locking parts of a file, which is useful for single-file segments mode, where locking the entire file isn't as useful because we still need to grow and shrink it. However, according to fcntl()'s Ubuntu manpage [1], semantics of fcntl() locks have a giant oversight: This interface follows the completely stupid semantics of System V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all locks associated with a file for a given process are removed when any file descriptor for that file is closed by that process. This semantic means that applications must be aware of any files that a subroutine library may access. Basically, closing any fd with an fcntl() lock (which we do because we don't want to leak fd's) will drop the lock completely. So, in this commit, we will be reverting back to using flock() locks everywhere. However, that still leaves the problem of locking parts of a memseg list file in single file segments mode, and we will be solving it with creating separate lock files per each page, and tracking those with flock(). We will also be removing all of this tailq business and replacing it with a simple array - saving a few bytes is not worth the extra hassle of dealing with pointers and potential memory allocation failures. Also, remove the tailq lock since it is not needed - these fd lists are per-process, and within a given process, it is always only one thread handling access to hugetlbfs. So, first one to allocate a segment will create a lockfile, and put a shared lock on it. When we're shrinking the page file, we will be trying to take out a write lock on that lockfile, which would fail if any other process is holding onto the lockfile as well. This way, we can know if we can shrink the segment file. Also, if no other locks are found in the lock list for a given memseg list, the memseg list fd is automatically closed. One other thing to note is, according to flock() Ubuntu manpage [2], upgrading the lock from shared to exclusive is implemented by dropping and reacquiring the lock, which is not atomic and thus would have created race conditions. So, on attempting to perform operations in hugetlbfs, we will take out a writelock on hugetlbfs directory, so that only one process could perform hugetlbfs operations concurrently. [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html Fixes: `66cc45e293` ("mem: replace memseg with memseg lists") Fixes: `582bed1e1d` ("mem: support mapping hugepages at runtime") Fixes: `a5ff05d60f` ("mem: support unmapping pages at runtime") Fixes: `2a04139f66` ("eal: add single file segments option") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-27 23:52:51 +02:00
Anatoly Burakov	046aa5c447	mem: add memalloc init stage Currently, memseg lists for secondary process are allocated on sync (triggered by init), when they are accessed for the first time. Move this initialization to a separate init stage for memalloc. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-27 23:52:51 +02:00
Anatoly Burakov	1be7644986	mem: improve autodetection of hugepage counts on 32-bit For non-legacy mode, we are preallocating space for hugepages, so we know in advance which pages we will be able to allocate, and which we won't. However, the init procedure was using hugepage counts gathered from sysfs and paid no attention to hugepage sizes that were actually available for reservation, and failed on attempts to reserve unavailable pages. Fix this by limiting total page counts by number of pages actually preallocated. Also, VA preallocate procedure only looks at mountpoints that are available, and expects pages to exist if a mountpoint exists. That might not necessarily be the case, so also check if there are hugepages available for a particular page size on a particular NUMA node. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>	2018-04-27 23:52:51 +02:00
Anatoly Burakov	e82ca1a75e	mem: improve preallocation on 32-bit Previously, if we couldn't preallocate VA space on 32-bit for one page size, we simply bailed out, even though we could've tried allocating VA space with other page sizes. For example, if user had both 1G and 2M pages enabled, and has asked DPDK to allocate memory on both sockets, DPDK would've tried to allocate VA space for 1x1G page on both sockets, failed and never tried again, even though it could've allocated the same 1G of VA space for 512x2M pages. Fix this by retrying with different page sizes if VA space reservation failed. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>	2018-04-27 23:52:51 +02:00
Anatoly Burakov	a99e8df63f	mem: fix 32-bit memory upper limit for non-legacy mode 32-bit mode has an upper limit on amount of VA space it can preallocate, but the original implementation used the wrong constant, resulting in failure to initialize due to integer overflow. Fix it by using the correct constant. Fixes: `66cc45e293` ("mem: replace memseg with memseg lists") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>	2018-04-27 23:52:51 +02:00
Anatoly Burakov	64b6fcb161	malloc: check for heap corruption Previous code checked for both first/last elements being NULL, but if they weren't, the expectation was that they're both non-NULL, which will be the case under normal conditions, but may not be the case due to heap structure corruption. Coverity issue: 272566 Fixes: `bb372060da` ("malloc: make heap a doubly-linked list") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-04-27 23:52:51 +02:00
Anatoly Burakov	0af8db3172	malloc: fix out-of-bounds segment array access Technically, while the pointer would've been invalid if msl_idx were invalid, we wouldn't have actually attempted to access the pointer until verifying the index. Fix it by moving array access to after we've verified validity of the index. Coverity issue: 272574 Fixes: `66cc45e293` ("mem: replace memseg with memseg lists") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-04-27 23:52:51 +02:00
Anatoly Burakov	627e80e4f6	malloc: replace snprintf with strlcpy Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-04-27 23:52:51 +02:00
Anatoly Burakov	8f91f368a1	mem: log page address before unmapping If user has specified a flag to unmap the area right after mapping it, we were passing an already-unmapped pointer to RTE_LOG. This is not an issue since RTE_LOG doesn't actually dereference the pointer, but fix it anyway by moving call to RTE_LOG to before unmap. Coverity issue: 272584 Fixes: `b7cc54187e` ("mem: move virtual area function in common directory") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-27 23:42:40 +02:00
Anatoly Burakov	0f1631be24	mem: fix page fault trigger Coverity reports these lines as having no effect. Technically, we do want for those lines to have no effect, however they would've likely been optimized out. Add volatile qualifiers to ensure the code has effects. Coverity issue: 272608 Fixes: `582bed1e1d` ("mem: support mapping hugepages at runtime") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-27 23:42:40 +02:00
Anatoly Burakov	e27ffec169	mem: fix potential bad unmap on map failure Previously, if mmap failed to map page address at requested address, we were attempting to unmap the wrong address. Fix it by unmapping our actual mapped address, and jump further to avoid unmapping memory that is not allocated. Coverity issue: 272602 Fixes: `582bed1e1d` ("mem: support mapping hugepages at runtime") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-27 23:42:40 +02:00
Anatoly Burakov	8ee25c7e81	mem: fix comparison of old policy Previous code had an old rebase leftover from the time when oldpolicy was an actual int, instead of a pointer. Fix it to do comparison with dereferencing the pointer. Coverity issue: 272589 Fixes: `582bed1e1d` ("mem: support mapping hugepages at runtime") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-27 23:42:40 +02:00
Anatoly Burakov	8dfb09dee4	mem: fix potential resource leak on alloc Normally, tailq entry should have a valid fd by the time we attempt to map the segment. However, in case it doesn't, we're leaking fd, so fix it. Coverity issue: 272570 Fixes: `2a04139f66` ("eal: add single file segments option") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-27 23:42:40 +02:00
Anatoly Burakov	b48f859a03	mem: fix potential resource leak on freeing We close fd if we managed to find it in the list of allocated segment lists (which should always be the case under normal conditions), but if we didn't, the fd was leaking. Close it if we couldn't find it in the segment list. This is not an issue as if the segment is zero length, we're getting rid of it anyway, so there's no harm in not storing the fd anywhere. Coverity issue: 272568 Fixes: `2a04139f66` ("eal: add single file segments option") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-27 23:42:40 +02:00
Anatoly Burakov	6f0fa9f238	mem: fix potential double close on map failure We were closing descriptor before checking if mapping has failed, but if it did, we did a second close afterwards. Fix it by moving closing descriptor to after we've done all error checks. Coverity issue: 272560 Fixes: `2a04139f66` ("eal: add single file segments option") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-27 23:42:40 +02:00
Anatoly Burakov	5441fcfd87	mem: fix resource leak on map failure Coverity issue: 272601 Fixes: `66cc45e293` ("mem: replace memseg with memseg lists") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-27 23:42:40 +02:00
Anatoly Burakov	42c2a6a819	mem: use strlcpy instead of snprintf Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-27 23:42:40 +02:00
Jianfeng Tan	5e6df556f6	mem: fix resize return handling for --single-file-segments resize_hugefile() returns either 0 (which indicates success) or -1 (which indicates failure). We failed to check the success as we use --single-file-segments option. Fixes: `2a04139f66` ("eal: add single file segments option") Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-27 23:42:40 +02:00
Jianfeng Tan	3d09a6e26d	eal: fix threads block on barrier Below commit introduced pthread barrier for synchronization. But two IPC threads block on the barrier, and never wake up. (gdb) bt #0 futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4) at ../sysdeps/unix/sysv/linux/futex-internal.h:61 #1 futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4) at ../sysdeps/nptl/futex-internal.h:135 #2 __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184 #3 rte_thread_init (arg=0x7fffffffcfe0) at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160 #4 start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333 #5 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Through analysis, we find the barrier defined on the stack could be the root cause. This patch will change to use heap memory as the barrier. Fixes: `d651ee4919` ("eal: set affinity for control threads") Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>	2018-04-27 21:47:43 +02:00
Yongseok Koh	a53aa2b9f3	mbuf: support attaching external buffer This patch introduces a new way of attaching an external buffer to a mbuf. Attaching an external buffer is quite similar to mbuf indirection in replacing buffer addresses and length of a mbuf, but a few differences: - When an indirect mbuf is attached, refcnt of the direct mbuf would be 2 as long as the direct mbuf itself isn't freed after the attachment. In such cases, the buffer area of a direct mbuf must be read-only. But external buffer has its own refcnt and it starts from 1. Unless multiple mbufs are attached to a mbuf having an external buffer, the external buffer is writable. - There's no need to allocate buffer from a mempool. Any buffer can be attached with appropriate free callback. - Smaller metadata is required to maintain shared data such as refcnt. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-04-27 20:11:25 +02:00
Fan Zhang	613e827fb2	vhost/crypto: fix checks while moving descriptors This patch fix final condition check while moving virtqueue descriptors. Fixes: `3bb595ecd6` ("vhost/crypto: add request handler") Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-04-27 19:49:20 +02:00
Fan Zhang	d4cc4c65df	vhost/crypto: fix missing head correction This patch fixes the missing head descriptor correction for indirect descriptors. Fixes: `0aee242841` ("vhost/crypto: move to safe GPA translation API") Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-04-27 19:49:07 +02:00
Xiao Wang	dfdf4b84b8	vhost: fix vDPA set features We should call set_features callback after setting features in virtio_net structure, otherwise vDPA driver cannot get the right features. Fixes: `07718b4f87` ("vhost: adapt library for selective datapath") Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Acked-by: Zhihong Wang <zhihong.wang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-04-27 18:01:00 +01:00
Maxime Coquelin	bb77d555d4	vhost: revert avoid concurrency when logging dirty pages This reverts commit `394313fff3`. While the patch did solve concurrency issue, it induces more pages copies as some clean pages are marked as dirty for performance reasons. Moreover, as there is no more contention doing the logging, the rate of packets than can be processed is higher, leading to even more pages to be dirtied. It has been reported that with more than one queue pair, and with a relatively low packet rate (1Mpps), the live migration never converges until the flow is stopped. While a better solution is found, it is better to reset to the old behaviour, i.e. using atomic operation for dirty pages logging. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-04-27 18:01:00 +01:00
Ferruh Yigit	01eb53eefe	ethdev: rename folder to library name Library folder name and output library name are same except a few flaws including librte_ether. This library is network device abstraction layer, the name "ethdev" fits better than "ether", and library & header files already named as ethdev. Also there is a rte_ether.h in the net library which can cause confusion. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>	2018-04-27 18:01:00 +01:00
Declan Doherty	fb8fd96d42	ethdev: add shared counter to flow API Add rte_flow_action_count action data structure to enable shared counters across multiple flows on a single port or across multiple flows on multiple ports within the same switch domain. Also this enables multiple count actions to be specified in a single flow action. This patch also modifies the existing rte_flow_query API to take the rte_flow_action structure as an input parameter instead of the rte_flow_action_type enumeration to allow querying a specific action from a flow rule when multiple actions of the same type are specified. This patch also contains updates for the bonding, failsafe and mlx5 PMDs and testpmd application which are affected by this API change. Signed-off-by: Declan Doherty <declan.doherty@intel.com>	2018-04-27 18:00:57 +01:00
Declan Doherty	e05419b3f0	ethdev: add mark flow item Introduces a new action type RTE_FLOW_ITEM_TYPE_MARK which enables flow patterns to specify arbitrary integer values to match aginst set by the RTE_FLOW_ACTION_TYPE_MARK action in previously matched flows. Add support for specification of new MARK flow item in testpmd's cli. Update testpmd documentation to describe new MARK flow item support. Signed-off-by: Declan Doherty <declan.doherty@intel.com>	2018-04-27 18:00:57 +01:00
Declan Doherty	2f82d143fb	ethdev: add group jump action Add jump action type which defines an action which allows a matched flow to be redirect to the specified group. This allows physical and logical flow table/group hierarchies to be defined through rte_flow. This breaks ABI compatibility for the following public functions (as it modifes the ordering of the rte_flow_action_type enumeration): - rte_flow_copy() - rte_flow_create() - rte_flow_query() - rte_flow_validate() Add support for specification of new JUMP action to testpmd's flow cli, and update the testpmd documentation to describe this new action. Signed-off-by: Declan Doherty <declan.doherty@intel.com>	2018-04-27 18:00:57 +01:00
Declan Doherty	3850cf0c8c	ethdev: add tunnel encap/decap actions Add new flow action types and associated action data structures to support the encapsulation and decapsulation of VXLAN and NVGRE tunnel endpoints. The RTE_FLOW_ACTION_TYPE_[VXLAN/NVGRE]_ENCAP action will cause the matching flow to be encapsulated in the tunnel endpoint overlay defined in the [vxlan/nvgre]_encap action data. The RTE_FLOW_ACTION_TYPE_[VXLAN/NVGRE]_DECAP action will cause all headers associated with the outer most tunnel endpoint of the specified type for the matching flows. Signed-off-by: Declan Doherty <declan.doherty@intel.com>	2018-04-27 18:00:57 +01:00
Declan Doherty	ce92504063	ethdev: add switch domain allocator Add switch domain allocate and free API to enable NET devices to synchronise switch domain allocation. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-04-27 18:00:56 +01:00
Remy Horton	33af337773	ethdev: add common devargs parser Introduces a new structure, rte_eth_devargs, to support generic ethdev arguments common across NET PMDs, with a new API rte_eth_devargs_parse API to support PMD parsing these arguments. The patch add support for a representor argument passed with passed with the EAL -w option. The representor parameter allows the user to specify which representor ports to initialise on a device. The argument supports passing a single representor port, a list of port values or a range of port values. -w BDF,representor=1 # create representor port 1 on pci device BDF -w BDF,representor=[1,2,5,6,10] # create representor ports in list -w BDF,representor=[0-31] # create representor ports in range Signed-off-by: Remy Horton <remy.horton@intel.com> Signed-off-by: Declan Doherty <declan.doherty@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-04-27 18:00:56 +01:00
Declan Doherty	736b30ebf2	ethdev: add port representor device flag Add new device flag to specify that an ethdev port is a port representor. Extend rte_eth_dev_info structure to expose device flags to the user which enables applications to discover if a port is a representor port. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-04-27 18:00:56 +01:00
Declan Doherty	e489007a41	ethdev: add generic create/destroy ethdev APIs Add new bus generic ethdev create/destroy APIs which are bus independent and provide hooks for bus specific initialisation. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-04-27 18:00:56 +01:00
Declan Doherty	0804dfc209	ethdev: add switch identifier parameter to port Introduces a new port attribute to ethdev port's which denotes the switch domain a port belongs to. By default all port's switch identifiers are set to RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID. Ports which supported the concept of switch domains can be configured with the same switch domain id. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2018-04-27 18:00:56 +01:00
Qi Zhang	7e3389b172	ethdev: add VLAN and MPLS actions to flow API Add support for the following OpenFlow-defined actions: - RTE_FLOW_ACTION_OF_POP_VLAN: pop the outer VLAN tag. - RTE_FLOW_ACTION_OF_PUSH_VLAN: push a new VLAN tag. - RTE_FLOW_ACTION_OF_SET_VLAN_VID: set the 802.1q VLAN id. - RTE_FLOW_ACTION_OF_SET_VLAN_PCP: set the 802.1q priority. - RTE_FLOW_ACTION_OF_POP_MPLS: pop the outer MPLS tag. - RTE_FLOW_ACTION_OF_PUSH_MPLS: push a new MPLS tag. Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2018-04-27 18:00:55 +01:00
Qi Zhang	1c54c93809	ethdev: add TTL change actions to flow API Add support for the following OpenFlow-defined actions: - RTE_FLOW_ACTION_OF_SET_MPLS_TTL: MPLS TTL. - RTE_FLOW_ACTION_OF_DEC_MPLS_TTL: decrement MPLS TTL. - RTE_FLOW_ACTION_OF_SET_NW_TTL: IP TTL. - RTE_FLOW_ACTION_OF_DEC_NW_TTL: decrement IP TTL. - RTE_FLOW_ACTION_OF_COPY_TTL_OUT: copy TTL "outwards". - RTE_FLOW_ACTION_OF_COPY_TTL_IN: copy TTL "inwards". Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2018-04-27 18:00:55 +01:00
Qi Zhang	a903c049be	ethdev: add neighbor discovery to flow API - RTE_FLOW_ITEM_TYPE_ARP_ETH_IPV4: matches an ARP header for Ethernet/IPv4. - RTE_FLOW_ITEM_TYPE_IPV6_EXT: matches the presence of any IPv6 extension header. - RTE_FLOW_ITEM_TYPE_ICMP6: matches any ICMPv6 header. - RTE_FLOW_ITEM_TYPE_ICMP6_ND_NS: matches an ICMPv6 neighbor discovery solicitation. - RTE_FLOW_ITEM_TYPE_ICMP6_ND_NA: matches an ICMPv6 neighbor discovery advertisement. - RTE_FLOW_ITEM_TYPE_ICMP6_ND_OPT: matches the presence of any ICMPv6 neighbor discovery option. - RTE_FLOW_ITEM_TYPE_ICMP6_ND_OPT_ETH_SLA: matches an ICMPv6 neighbor discovery source Ethernet link-layer address option. - RTE_FLOW_ITEM_TYPE_ICMP6_ND_OPT_ETH_TLA: matches an ICMPv6 neighbor discovery target Ethernet link-layer address option. Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2018-04-27 18:00:55 +01:00
Xueming Li	063f39f26d	ethdev: introduce tunnel type MPLS-in-GRE and MPLS-in-UDP This patch adds new tunnel type for MPLS-in-GRE and MPLS-in-UDP. MPLS-in-GRE protocol link: https://tools.ietf.org/html/rfc4023 MPLS-in-UDP protocol link: https://tools.ietf.org/html/rfc7510 Signed-off-by: Xueming Li <xuemingl@mellanox.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2018-04-27 18:00:55 +01:00
Xueming Li	6f99e5b54e	ethdev: introduce new tunnel VXLAN-GPE VXLAN-GPE enables VXLAN for all protocols. Protocol link: https://www.ietf.org/id/draft-ietf-nvo3-vxlan-gpe-05.txt Signed-off-by: Xueming Li <xuemingl@mellanox.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2018-04-27 18:00:55 +01:00
Adrien Mazarguil	fc6bbb3f28	ethdev: add port ID item and action to flow API RTE_FLOW_ACTION_TYPE_PORT_ID brings the ability to inject matching traffic into a different device, as identified by its DPDK port ID. This is normally only supported when the target port ID has some kind of relationship with the port ID the flow rule is created against, such as being exposed by a common physical device (e.g. a different port of an Ethernet switch). The converse pattern item, RTE_FLOW_ITEM_TYPE_PORT_ID, makes the resulting flow rule match traffic whose origin is the specified port ID. Note that specifying a port ID that differs from the one the flow rule is created against is normally meaningless (if even accepted), but can make sense if combined with the transfer attribute. These must not be confused with their PHY_PORT counterparts, which refer to physical ports using device-specific indices, but unlike PORT_ID are not necessarily tied to DPDK port IDs. This breaks ABI compatibility for the following public functions: - rte_flow_copy() - rte_flow_create() - rte_flow_query() - rte_flow_validate() Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-04-27 18:00:54 +01:00
Adrien Mazarguil	e7b657058f	ethdev: add physical port action to flow API This patch adds the missing action counterpart to the PHY_PORT pattern item, that is, the ability to directly inject matching traffic into a physical port of the underlying device. It breaks ABI compatibility for the following public functions: - rte_flow_copy() - rte_flow_create() - rte_flow_query() - rte_flow_validate() Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>	2018-04-27 18:00:54 +01:00
Adrien Mazarguil	fee1fa0285	ethdev: rename physical port item in flow API While RTE_FLOW_ITEM_TYPE_PORT refers to physical ports of the underlying device using specific identifiers, these are often confused with DPDK port IDs exposed to applications in the global name space. Since this pattern item is seldom used, rename it RTE_FLOW_ITEM_PHY_PORT for better clarity. No ABI impact. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-04-27 18:00:54 +01:00
Adrien Mazarguil	39b8dda700	ethdev: fix behavior of VF/PF in flow API Contrary to all other pattern items, these are inconsistently documented as affecting traffic instead of simply matching its origin, without provision for the latter. This commit clarifies documentation and updates PMDs since the original behavior now has to be explicitly requested using the new transfer attribute. It breaks ABI compatibility for the following public functions: - rte_flow_create() - rte_flow_validate() Impacted PMDs are bnxt and i40e, for which the VF pattern item is now only supported when a transfer attribute is also present. Fixes: `b1a4b4cbc0` ("ethdev: introduce generic flow API") Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2018-04-27 18:00:54 +01:00
Adrien Mazarguil	76e9a55b5b	ethdev: add transfer attribute to flow API This new attribute enables applications to create flow rules that do not simply match traffic whose origin is specified in the pattern (e.g. some non-default physical port or VF), but actively affect it by applying the flow rule at the lowest possible level in the underlying device. It breaks ABI compatibility for the following public functions: - rte_flow_copy() - rte_flow_create() - rte_flow_validate() Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2018-04-27 18:00:54 +01:00
Adrien Mazarguil	0730ab674c	ethdev: fix default VLAN TCI mask in flow API VLAN TCI is a 16-bit field broken down as PCP (3b), DEI (1b) and VID (12b). The default mask used by PMDs for the VLAN pattern when one isn't provided by the application comprises the entire TCI, which is problematic because most devices only support VID matching. This forces applications to always provide a mask limited to the VID part in order to successfully apply a flow rule with a VLAN pattern item. Moreover, applications rarely want to match PCP and DEI intentionally. Given the above and since VID is what is commonly referred to when talking about VLAN, this commit excludes PCP and DEI from the default mask. Fixes: `6de5c0f130` ("ethdev: define default item masks in flow API") Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-04-27 18:00:54 +01:00
Adrien Mazarguil	e58638c324	ethdev: fix TPID handling in flow API TPID handling in rte_flow VLAN and E_TAG pattern item definitions is not consistent with the normal stacking order of pattern items, which is confusing to applications. Problem is that when followed by one of these layers, the EtherType field of the preceding layer keeps its "inner" definition, and the "outer" TPID is provided by the subsequent layer, the reverse of how a packet looks like on the wire: Wire: [ ETH TPID = A \| VLAN EtherType = B \| B DATA ] rte_flow: [ ETH EtherType = B \| VLAN TPID = A \| B DATA ] Worse, when QinQ is involved, the stacking order of VLAN layers is unspecified. It is unclear whether it should be reversed (innermost to outermost) as well given TPID applies to the previous layer: Wire: [ ETH TPID = A \| VLAN TPID = B \| VLAN EtherType = C \| C DATA ] rte_flow 1: [ ETH EtherType = C \| VLAN TPID = B \| VLAN TPID = A \| C DATA ] rte_flow 2: [ ETH EtherType = C \| VLAN TPID = A \| VLAN TPID = B \| C DATA ] While specifying EtherType/TPID is hopefully rarely necessary, the stacking order in case of QinQ and the lack of documentation remain an issue. This patch replaces TPID in the VLAN pattern item with an inner EtherType/TPID as is usually done everywhere else (e.g. struct vlan_hdr), clarifies documentation and updates all relevant code. It breaks ABI compatibility for the following public functions: - rte_flow_copy() - rte_flow_create() - rte_flow_query() - rte_flow_validate() Summary of changes for PMDs that implement ETH, VLAN or E_TAG pattern items: - bnxt: EtherType matching is supported with and without VLAN, but TPID matching is not and triggers an error. - e1000: EtherType matching is only supported with the ETHERTYPE filter, which does not support VLAN matching, therefore no impact. - enic: same as bnxt. - i40e: same as bnxt with existing FDIR limitations on allowed EtherType values. The remaining filter types (VXLAN, NVGRE, QINQ) do not support EtherType matching. - ixgbe: same as e1000, with additional minor change to rely on the new E-Tag macro definition. - mlx4: EtherType/TPID matching is not supported, no impact. - mlx5: same as bnxt. - mvpp2: same as bnxt. - sfc: same as bnxt. - tap: same as bnxt. Fixes: `b1a4b4cbc0` ("ethdev: introduce generic flow API") Fixes: `99e7003831` ("net/ixgbe: parse L2 tunnel filter") Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-04-27 18:00:54 +01:00
Adrien Mazarguil	18aee2861a	ethdev: add encap level to RSS flow API action RSS hash types (ETH_RSS_* macros defined in rte_ethdev.h) describe the protocol header fields of a packet that must be taken into account while computing RSS. When facing encapsulated (e.g. tunneled) packets, there is an ambiguity as to whether these should apply to inner or outer packets. Applications need the ability to tell exactly "where" RSS must be performed. This is addressed by adding encapsulation level information to the RSS flow action. Its default value is 0 and stands for the usual unspecified behavior. Other values provide a specific encapsulation level. Contrary to the change announced by commit `676b605182` ("doc: announce ethdev API change for RSS configuration"), this patch does not affect struct rte_eth_rss_conf but struct rte_flow_action_rss as the former is not used anymore by the RSS flow action. ABI impact is therefore limited to rte_flow. This breaks ABI compatibility for the following public functions: - rte_flow_copy() - rte_flow_create() - rte_flow_query() - rte_flow_validate() Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-04-27 18:00:54 +01:00
Adrien Mazarguil	929e331934	ethdev: add hash function to RSS flow API action By definition, RSS involves some kind of hash algorithm, usually Toeplitz. Until now it could not be modified on a flow rule basis and PMDs had to always assume RTE_ETH_HASH_FUNCTION_DEFAULT, which remains the default behavior when unspecified (0). This breaks ABI compatibility for the following public functions: - rte_flow_copy() - rte_flow_create() - rte_flow_query() - rte_flow_validate() Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-04-27 18:00:54 +01:00
Adrien Mazarguil	ac8d22de23	ethdev: flatten RSS configuration in flow API Since its inception, the rte_flow RSS action has been relying in part on external struct rte_eth_rss_conf for compatibility with the legacy RSS API. This structure lacks parameters such as the hash algorithm to use, and more recently, a method to tell which layer RSS should be performed on [1]. Given struct rte_eth_rss_conf will never be flexible enough to represent a complete RSS configuration (e.g. RETA table), this patch supersedes it by extending the rte_flow RSS action directly. A subsequent patch will add a field to use a non-default RSS hash algorithm. To that end, a field named "types" replaces the field formerly known as "rss_hf" and standing for "RSS hash functions" as it was confusing. Actual RSS hash function types are defined by enum rte_eth_hash_function. This patch updates all PMDs and example applications accordingly. It breaks ABI compatibility for the following public functions: - rte_flow_copy() - rte_flow_create() - rte_flow_query() - rte_flow_validate() [1] commit `676b605182` ("doc: announce ethdev API change for RSS configuration") Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-04-27 18:00:53 +01:00
Adrien Mazarguil	19b3bc47c6	ethdev: fix C99 flexible arrays from flow API This patch replaces C99-style flexible arrays in struct rte_flow_action_rss and struct rte_flow_item_raw with standard pointers to the same data. They proved difficult to use in the field (e.g. no possibility of static initialization) and unsuitable for C++ applications. Affected PMDs and examples are updated accordingly. This breaks ABI compatibility for the following public functions: - rte_flow_copy() - rte_flow_create() - rte_flow_query() - rte_flow_validate() Fixes: `b1a4b4cbc0` ("ethdev: introduce generic flow API") Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>	2018-04-27 18:00:53 +01:00

1 2 3 4 5 ...

4283 Commits