From time to time, someone sends patches about unlinking existing
sockets when registering a vhost user in server mode.
A recent example:
http://dpdk.org/ml/archives/dev/2018-February/090025.html
This problem has been discussed many times, and it was made clear that
the library should not unlink files given by the application in order
to avoid possible security problems, such as removing random files
used by other programs.
One of the first discussions:
http://dpdk.org/ml/archives/dev/2015-December/030326.html
To avoid such patches in the future, it was decided to add a comment
that explains what is happening and tries to describe the reasoning.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Enable rx queue interrupts if the app requests them, and vNIC has
enough interrupt resources. Use interrupt vector 0 for link status and
errors. Use vector 1 for rx queue 0, vector 2 for rx queue 1, and so
on. So, with n rx queues, vNIC needs to have at n + 1 interrupts.
For VIC, enabling and disabling rx queue interrupts are simply
mask/unmask operations. VIC's credit based interrupt moderation is not
used, as the app wants to explicitly control when to enable/disable
interrupts.
This version requires MSI-X (vfio-pci). Sharing one interrupt for link
status and rx queues is possible, but is rather complex and has no
user demands.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
The driver provides a DMA buffer to the firmware when it requests port
stats. The NIC then fills that buffer with latest stats. Currently,
the driver allocates the DMA buffer the first time it requests stats
and saves it for later use. This can lead to crashes when
primary/secondary processes are involved. For example, the following
sequence crashes the secondary process.
1. Start a primary app that does not call rte_eth_stats_get()
2. dpdk-procinfo -- --stats
dpdk-procinfo crashes while trying to allocate the stats DMA buffer
because the alloc function pointer (vdev.alloc_consistent) is valid
only in the primary process, not in the secondary process.
Overwriting the alloc function pointer in the secondary process is not
an option, as it will simply make the pointer invalid in the primary
process. Instead, allocate the DMA buffer during probe so that only
the primary process does both allocate and free. This allows the
secondary process to dump stats as well.
Fixes: 9913fbb91d ("enic/base: common code")
Cc: stable@dpdk.org
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
VIC does not support VLAN filtering at the moment. The firmware does
accept the filter add/del commands and returns success. But, they are
no-ops. To avoid confusion, remove the filter set handler so the app
sees an error instead of silent failure.
Also during the device configure time, enicpmd_vlan_offload_set would
not print a warning message about unsupported VLAN filtering, because
the caller specifies only ETH_VLAN_STRIP_MASK. This is wrong, as we
should attempt to apply all requested offloads at the configure
time. So, pass all VLAN offload masks, which triggers a warning
message about VLAN filtering, if requested.
Finally, enicpmd_vlan_offload_set should check both mask and
rxmode.offloads, not just mask.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Currently, enic completely ignores the requested max Rx packet size
(rxmode.max_rx_pkt_len). The desired behavior is that the NIC hardware
drops packets larger than the requested size, even though they are
still smaller than MTU.
Cisco VIC does not have such a feature. But, we can accomplish a
similar (not same) effect by reducing the size of posted receive
buffers. Packets larger than the posted size get truncated, and the
receive handler drops them. This is also how the kernel enic driver
enforces the Rx side MTU.
This workaround works only when scatter mode is *not* used. When
scatter is used, there is currently no way to support
rxmode.max_rx_pkt_len, as the NIC always receives packets up to MTU.
For posterity, add a copious amount of comments regarding the
hardware's drop/receive behavior with respect to max/current MTU.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Currently, when more than 1 receive queues are configured, the driver
always enables RSS with the driver's own default hash type, key, and
RETA. The user is unable to change any of the RSS settings. Address
this by implementing the ethdev RSS API as follows.
Correctly report the RETA size, key size, and supported hash types
through rte_eth_dev_info.
During dev_configure(), initialize RSS according to the device's
mq_mode and rss_conf. Start with the default RETA, and use the default
key unless a custom key is provided.
Add the RETA and rss_conf query/set handlers to let the user change
RSS settings after the initial configuration. The hardware is able to
change hash type, key, and RETA individually. So, the handlers change
only the affected settings.
Refactor/rename several functions in order to make their intentions
clear. For example, remove all traces of RSS from
enicpmd_vlan_offload_set() as it is confusing.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Add support for filters which drop packets when forming MCDI request
for a filter.
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Despite being versatile, the hardware support for filtering has a number
of special properties which must be taken into account. Namely, there is
a known set of valid filters which don't take any effect despite being
accepted by the hardware.
The combinations of match flags and field values which can describe the
exceptional filters are as follows:
- ETHER_TYPE or ETHER_TYPE | LOC_MAC with IPv4 or IPv6 EtherType
- ETHER_TYPE | IP_PROTO or ETHER_TYPE | IP_PROTO | LOC_MAC with UDP or
TCP IP protocol value
- The same combinations with OUTER_VID and/or INNER_VID
These exceptional filters can be expressed in terms of RTE flow rules.
If the user creates such a flow rule, no traffic will hit the underlying
filter, and no errors will be reported.
This patch adds a means to prevent such ineffective flow rules from
being created.
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
To filter all traffic, need to create two hardware filter specifications
with both unknown unicast and unknown multicast destination MAC address
match flags.
In terms of RTE flow API, this would require adding multiple flow rules
with corresponding ETH items. In order to avoid such a complication, the
patch implements a mechanism to auto-complete an underlying filter
representation of a flow rule in order to create additional filter
specifications featuring the missing match flags.
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Knowledge of a network identifier is not sufficient to construct a
workable hardware filter for encapsulated traffic. It's obligatory to
specify one of the match flags associated with inner frame destination
MAC. If the address is unknown, then one needs to specify either unknown
unicast or unknown multicast destination match flag.
In terms of RTE flow API, this would require adding multiple flow rules
with corresponding ETH items besides the tunnel item. In order to avoid
such a complication, the patch implements a mechanism to auto-complete
an underlying filter representation of a flow rule in order to create
additional filter specifications featuring the missing match flags.
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Hardware filter specification for encapsulated traffic must contain
EtherType. In terms of RTE flow API, this would require L3 item to be
used in the flow rule. In the simplest case, if the user needs to filter
encapsulated traffic without knowledge of exact EtherType, they will
have to create multiple variants of the flow rule featuring all possible
L3 items (IPv4, IPv6), respectively. In order to hide the gory details
and avoid such a complication, this patch implements a mechanism to
auto-complete the filter specifications if need be.
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Not all flow rules can be expressed in one hardware filter, so some flow
rules have to be expressed in terms of multiple hardware filters. This
patch provides a means to produce a filter spec template from the flow
rule which then can be used to produce a set of fully elaborated specs
to be inserted.
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Support destination MAC address match in inner frames.
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Exact match of virtual network identifier is supported by parser.
IP protocol match are enforced to UDP.
Only Ethernet protocol type is supported.
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Exact match of virtual subnet ID is supported by parser.
IP protocol match are enforced to GRE.
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Exact match of VXLAN network identifier is supported by parser.
IP protocol match are enforced to UDP.
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Add filter match flag to distinguish filters applied only to
encapsulated packets.
Match flags set should allow to determine whether a filter
is supported or not. The problem is that if specification
has supported set outer match flags and specified
encapsulation without any inner flags, check says that it
is supported, and filter insertion is performed. However,
there is no filtering of the encapsulated traffic. A new
flag is added to solve this problem and separate the
filters for the encapsulated packets.
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Reviewed-by: Mark Spender <mspender@solarflare.com>
This supports VNI/VSID and inner frame local MAC fields to
match in VXLAN, GENEVE, or NVGRE packets.
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>
This adds filters for encapsulated packets to the list
returned by ef10_filter_supported_filters().
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>
The new code uses the new 32-bit Port Capabilities exclusively and
only translates to/from the old 16-bit Port Capabilities at the last
point possible when talking to older Firmware.
For the old versus new Firmware issue, we use the new FW_PARAMS_CMD[PFVF,
CAPS32] command to tell the Firmware that we want Asynchronous Port Status
updates to use the new 32-bit version of the Port Information message. If
we get an error, we know we're dealing with older Firmware, and if not,
we'll start getting th new 32-bit Port Capability message formats.
Also, refactor t4_handle_fw_rpl() to handle new 32-bit Port Capability
replies from firmware in t4_handle_get_port_info().
Original work by Surendra Mobiya <surendra@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Update link configuration API to prepare for 32-bit port capability
support. Continue using 16-bit port capability for older firmware.
Original work by Surendra Mobiya <surendra@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Normally, firmware reads various Forward Error Correction parameters
from a Transceiver Module i2c EPROM and uses a couple of IEEE Standards
(802.3bj for 100Gb/s and 802.3by for 25Gb/s) to interpret those
parameters and come up with supported and default FEC settings.
Firmware then sends these FEC parameters to the Host Driver which gives
the Host Administrator an opportunity to change them if necessary in
order to establish a Link with a Switch which may have made a
non-standard FEC decision.
This commit recognizes "auto" as a discrete FEC mode which can be
used to explicitly select the IEEE 802.3 standard based FEC selection.
Original work by Surendra Mobiya <surendra@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Original work by Surendra Mobiya <surendra@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Add firmware API for updating RSS hash configuration and key. Move
RSS hash configuration from cxgb4_write_rss() to a separate function
cxgbe_write_rss_conf().
Also, rename cxgb4_write_rss() to cxgbe_write_rss() for consistency.
Original work by Surendra Mobiya <surendra@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Attach to rte_eth_dev devices allocated by Primary process for
Ports other than Port-0 in the secondary process.
Save the Primary rte_eth_dev device eth_dev_data as part of txq
structure needed for tx path.
Fixes: 8318984927 ("cxgbe: add pmd skeleton")
Cc: stable@dpdk.org
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Rework rte_eth_dev allocation for other ports under same PF.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
NetVSC netdevices which are already routed should not be probed because
they are used for management purposes by the HyperV.
The corrupted code got the routed devices from the system file
/proc/net/route and wrongly parsed only the odd lines, so devices which
their routes were in even lines, were considered as unrouted devices
and were probed.
Use linux netlink lib to detect the routed NetVSC devices instead of
file parsing.
Fixes: 31182fadfb ("net/vdev_netvsc: skip routed netvsc probing")
Cc: stable@dpdk.org
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Matan Azrad <matan@mellanox.com>
In 18.02 release the ABI of ethdev component was changed.
To keep compatibility with previous versions of the library
the versioning of rte_eth_dev_filter_ctrl function was implemented.
As soon as deprecation note was issued in 18.02 release, there is
no need to keep compatibility with previous versions.
Remove the versioning of rte_eth_dev_filter_ctrl function.
Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The current code compares two strings upto the length of 1st string
(searched name). If the 1st string is prefix of 2nd string (existing name),
the string comparison returns the port_id of earliest prefix matches.
This patch fixes the bug by using strcmp instead of strncmp.
Fixes: 9c5b8d8b9f ("ethdev: clean port id retrieval when attaching")
Cc: stable@dpdk.org
Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
In case osal_dma_alloc_coherent() or osal_dma_alloc_coherent_aligned() are
called from a management thread, core_id turn out to be LCORE_ID_ANY, and
the resulting socket for alloc will be socket 0.
This is not desirable when using a NIC from socket 1 which might very
likely be configured to use memory from that socket only.
In that case, allocation will fail.
To address this, use master lcore instead when called from mgmt thread.
The associated socket should have memory available.
Fixes: ec94dbc573 ("qede: add base driver")
Cc: stable@dpdk.org
Signed-off-by: Pascal Mazon <pascal.mazon@6wind.com>
Acked-by: Harish Patil <harish.patil@cavium.com>
Acked-by: Harish Patil <harish.patil@cavium.com>
My english is far worse than those from the marketing team.
Fixes: 80bc1752f1 ("nfp: add guide")
Fixes: d625beafc8 ("doc: update NFP with PF support information")
Fixes: 80987c40fd ("config: enable nfp driver on Linux")
Cc: stable@dpdk.org
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
Mixing numeric macros with bit shifts macros is not a good idea.
Fixes: 011411586e ("net/nfp: extend speed capabilities advertised")
Cc: stable@dpdk.org
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
The barrier needs to be after reading the DD bit. It has not been
a problem because the potential reads which can not happen before
reading the DD bit seem to be far enough, so the compiler is not
rescheduling them. However, a refactoring could make this problem
to arise.
Fixes: b812daadad ("nfp: add Rx and Tx")
Cc: stable@dpdk.org
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Although this can be done by the app, because other PMDs are doing it,
apps expect this behaviour from the PMD.
Fixes: b812daadad ("nfp: add Rx and Tx")
Cc: stable@dpdk.org
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
primary_slave_port_id is uint16_t which needs to be correctly stored
with the same data type of input parameter in bond_ethdev_configure.
In powerpc, creating bond pmd results in below error due to wrong
cast on input param. This is reproducible, only when using shared
libraries.
sudo -E LD_LIBRARY_PATH=$PWD/$RTE_TARGET/lib $RTE_TARGET/app/testpmd \
-l 0,8 --socket-mem=1024,1024 \
--vdev 'net_tap0,iface=dpdktap0' --vdev 'net_tap1,iface=dpdktap1' \
--vdev 'net_bonding0,mode=1,slave=0,slave=1,primary=0,socket_id=1' \
-d $RTE_TARGET/lib/librte_pmd_tap.so \
-d $RTE_TARGET/lib/librte_mempool_ring.so -- --forward-mode=rxonly
Configuring Port 0 (socket 0)
PMD: net_tap0: 0x70a854070280: TX configured queues number: 1
PMD: net_tap0: 0x70a854070280: RX configured queues number: 1
Port 0: 86:EA:6D:52:3E:DB
Configuring Port 1 (socket 0)
PMD: net_tap1: 0x70a854074300: TX configured queues number: 1
PMD: net_tap1: 0x70a854074300: RX configured queues number: 1
Port 1: 42:9A:B8:49:B6:00
Configuring Port 2 (socket 1)
EAL: Failed to set primary slave port 7424 on bonded device net_bonding0
Fail to configure port 2
EAL: Error - exiting with code: 1
Cause: Start ports failed
Fixes: f8244c6399 ("ethdev: increase port id range")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>