Exception handling is executed in the normal path and it will cause
vhost-user init failure.
Fixes: d6983a70e259 ("vhost: check return of pthread calls")
Reported-by: Lei Yao <lei.a.yao@intel.com>
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Revert "devargs: make device types generic"
This commit broke the rte_devargs API by changing the meaning of
the rte_devtype enum.
Restore the previous API, unit tests and function calls.
Introduce parallel enum that acts as translation between previous API
and current structures.
Restoring the previous API means that -w and -b are not usable anymore
with any bus having implemented the "parse" operation. Only PCI devices
can be used with -w and -b, virtual devices are declared using vdev.
This (partially) reverts commit bd279a79366f50a4893fb84db91bbf64b56f9fb1.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The prior scan should link the relevant rte_devargs to the newly
allocated rte_device. As such, it is useless to pass device arguments to
the plug callback. Those arguments are available within the devargs
field of the rte_device structure.
Fixes: 7c8810f43f6e ("bus: introduce device plug/unplug")
Fixes: 00e62aae69c0 ("bus/pci: implement plug/unplug operations")
Fixes: a3ee360f4440 ("eal: add hotplug add/remove device")
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The device handle is already known and does not have to be infered from
the PCI address. The relevant helpers are already available within the
PCI bus to avoid searching for a handle already known.
Additionally, rte_memcpy.h was erroneously included.
Fixes: 00e62aae69c0 ("bus/pci: implement plug/unplug operations")
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The field is set but never resetted on error.
This marks the device as being attached while it is not, and forbid
further attempts to hotplug it.
Fixes: 7917d5f5ea46 ("pci: initialize generic driver pointer")
Cc: stable@dpdk.org
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
When an application requests the use of a PCI device, it can currently
interchangeably use either the longform DomBDF format (0000:00:00.0) or
the shorter BDF format (00:00.0).
When a device is inserted via the hotplug API, it must first be scanned
and then will be identified by its name using `find_device`. The name of
the device must match the name given by the user to be found and then
probed.
A new function sets the expected name for a scanned PCI device. It was
previously generated from parsing the PCI address. This canonical name
is superseded when an rte_devargs exists describing the device. In such
case, the device takes the given name found within the rte_devargs.
As the rte_devargs is linked to the rte_pci_device during scanning, it
can be avoided during the probe. Additionally, this fixes the issue of
the rte_devargs lookup not being done within rte_pci_probe_one.
Fixes: beec692c5157 ("eal: add name field to generic device")
Cc: stable@dpdk.org
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The hotplug API requires a few properties that were not previously
explicitly enforced:
- Idempotency, two consecutive scans should result in the same state.
- Upon returning, internal devices are now allocated and available
through the new `find_device` operator, meaning that they should be
identifiable.
The current rte_eal_hotplug_add implementation identifies devices by
their names, as it is readily available and easy to define.
The device name must be passed to the internal rte_device handle in
order to be available during scan, when it is then assigned to the
device. The current way of passing down this information from the device
declaration is through the global rte_devargs list.
Furthermore, the rte_device cannot take a bus-specific generated name,
as it is then not identifiable by the `find_device` operator. The device
must take the user-defined name. Ideally, an rte_device name should not
change during its existence.
This commit generates a new rte_devargs associated with the plugged
device and inserts it in the global rte_devargs list. It consequently
releases it upon device removal.
Fixes: a3ee360f4440 ("eal: add hotplug add/remove device")
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Some buses will operate either in whitelist or blacklist mode.
This mode is currently passed down by the rte_eal_devargs_add function
with the devtype argument.
When inserting devices using the hotplug API, the implicit assumption is
that this device is being whitelisted, meaning that it is explicitly
requested by the application to be used. This can conflict with the
initial bus configuration.
While the rte_eal_devargs_add API is being deprecated soon, it cannot
be modified at the moment to accommodate this situation.
As such, this new experimental API offers a bare interface for inserting
rte_devargs without directly manipulating the global rte_devargs list.
This new function expects a fully-formed rte_devargs, previously parsed
and allocated.
It does not check whether the new rte_devargs is compatible with current
bus configuration, but will replace any eventual existing one for the same
device, allowing the hotplug operation to proceed. i.e. a previously
blacklisted device can be redefined as being whitelisted.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Hotplug support introduces the possibility of removing devices from the
system. Allocated resources must be freed.
Extend the rte_devargs API to allow freeing allocated resources.
This API is experimental and bound to change. It is currently designed
as a symetrical to rte_eal_devargs_add(), but the latter will evolve
shortly anyway.
Its DEVTYPE parameter is currently only used to specify scan policies,
and those will evolve in the next release. This evolution should
rationalize the rte_devargs API.
As such, the proposed API here is not the most convenient, but is
taylored to follow the current design and integrate easily with its main
use within rte_eal_hotplug_* functions.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
This method must be implemented to allow using a unified, generic API to
hotplug devices, including virtual ones.
VDEV devices actually exist unattached after performing a scan on the
rte_devargs list. As such it makes sense to be able to perform a device
hotplug afterward.
Finally, missing this generic interface forces the EAL to be dependent
on vdev-specific API, which hinders the plan of moving the vdev bus to
drivers/bus.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The public API (struct rte_metric_name) includes the NULL terminator
byte in RTE_METRICS_MAX_NAME_LENGTH but the library itself internally
excludes it. This makes it possible for an application to receive an
unterminated name string. Fix be enforcing the NULL termination of all
name strings to the length that the public API expects.
Fixes: 349950ddb9c5 ("metrics: add information metrics library")
Cc: stable@dpdk.org
Signed-off-by: Remy Horton <remy.horton@intel.com>
IANA assigns a destination port of 4789 for the VXLAN in the Service
Name and Transport Protocol Port Number Registry. This is mentioned in
RFC 7348.
Fixes: f295a00a2b44 ("mbuf: add definitions of unified packet types")
Cc: stable@dpdk.org
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This commit allows the -S (captial 's') to be used to indicate
a corelist for Services. This is a "nice to have" patch, and does
not modify any of the service core functionality.
Suggested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Introducing the DEV_TX_OFFLOAD_MT_LOCKFREE TX capability flag.
if a PMD advertises DEV_TX_OFFLOAD_MT_LOCKFREE capable, multiple threads
can invoke rte_eth_tx_burst() concurrently on the same tx queue without
SW lock. This PMD feature will be useful in the following use cases and
found in the OCTEON family of NPUs.
1) Remove explicit spinlock in some applications where lcores
to TX queues are not mapped 1:1.
example: OVS has such instance
https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c#L299https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c#L1859
See the the usage of tx_lock spinlock.
2) In the eventdev use case, avoid dedicating a separate TX core for
transmitting and thus enables more scaling as all workers can
send the packets.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Add logic for parsing a coremask from EAL, which allows
the application to be unaware of the cores being taken from
its coremask.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This commit shows the changes required in rte_eal_init()
to transparently launch the service threads. The threads
are launched into the service worker functions here because
after rte_eal_init() the application is not gauranteed to
call any other DPDK API.
As the registration of services happens at initialization
time, the services that require CPU time are already available
when we reach the end of rte_eal_init().
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Add header files, update .map files with new service
functions, and add the service header to the doxygen
for building.
This service header API allows DPDK to use services as
a concept of something that requires CPU cycles. An example
is a PMD that runs in software to schedule events, where a
hardware version exists that does not require a CPU.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Seen on Ubuntu 16.04 with GCC 5.4.0:
lib/librte_ether/rte_ethdev.c: In function 'get_mac_addr_index':
lib/librte_ether/rte_ethdev.c:2369:26: error:
'dev_info.max_mac_addrs' may be used uninitialized in this function
Indeed, rte_eth_dev_info_get() do not write into dev_info
if the port_id is not valid.
So we need to check the port_id and return in case of error.
This extra check should not be needed because the port_id is always
checked before calling get_mac_addr_index().
However it does not hurt.
Reported-by: Matan Azrad <matan@mellanox.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Matan Azrad <matan@mellanox.com>
The rte_flow feature breaks the monolithic approach for ethdev by
introducing the new rte_flow API to ethdev using a plugin-like approach.
Basically, the rte_flow API is still logically part of ethdev:
- It extends the ethdev functionality: rte_flow is a new feature/
capability of ethdev;
- all its functions work on an Ethernet device: the first parameter of the
rte_flow functions is Ethernet device port ID.
Also, the rte_flow API is a sort of capability plugin for ethdev:
- the rte_flow API functions have their own name space: they are called
rte_flow_operationXYZ() as opposed to rte_eth_dev_flow_operationXYZ());
- the rte_flow API functions are placed in separate files in the same
librte_ether folder as opposed to rte_ethdev.[hc].
The way it works is by using the existing ethdev API function
rte_eth_dev_filter_ctrl() to query the current Ethernet device port ID for
the support of the rte_flow capability and return the pointer to the
rte_flow operations when supported and NULL otherwise:
struct rte_flow_ops *eth_flow_ops;
int rte = rte_eth_dev_filter_ctrl(eth_port_id,
RTE_ETH_FILTER_GENERIC, RTE_ETH_FILTER_GET, ð_flow_ops);
This patch reuses the same approach for ethdev Traffic Management API.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Removed unnecessary macro RTE_STD_C11, which is used
for unnamed structs.
Since there is no longer an unnamed structure in
rte_cryptodev_sym_session, this is not needed and
it is actually breaking compilation on icc:
lib/librte_cryptodev/rte_cryptodev.h(887): error: expected a declaration
__extension__ void *sess_private_data[0];
^
Fixes: 7c110ce7aa4e ("cryptodev: remove mempool from session")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
It isn't necessary to use rte_bus_find_by_name() to find a reference to
our own bus.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Instead of getting the name from the devargs lets take it from the
rte_device.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
NXP Copyright has been wrongly worded with '(c)' at various places.
This patch removes these extra characters. It also removes
"All rights reserved".
Only NXP copyright syntax is changed. Freescale copyright is not
modified.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
vaddvq_u16() is not available for armv7.
Emulate the vaddvq_u16() using armv7 NEON intrinsics.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
From documentation it is very unclear how VMDq configuration can be
tweaked, and online search offer very poor results.
This patch will ultimately spawn an online documentation page
for the rte_eth_vmdq_rx_conf struct which will eventually add a bit of
documentation about the rx_mode tag and how to allow e.g. VMDq pools
to receive packets without VLAN tags.
Signed-off-by: Tom Barbette <tom.barbette@ulg.ac.be>
Acked-by: John McNamara <john.mcnamara@intel.com>
In order to be able to replicate a configuration onto a second port,
device configuration should be fully described and available.
Other configuration items (i.e. MAC addresses) are stored within
rte_eth_dev_data, but not this one.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
This trivial patch removes wrong comments about
the return value of the rte_bus_dump(), as
this method does not return any value
(it's return type is void)
Fixes: a97725791eec ("bus: introduce bus abstraction")
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Virtual device/driver probing done via name.
A new alternative method introduced to probe the device with providing
driver name in devargs as "driver=<driver_name>".
This patch removes alternative method and fixes virtual device usages
with proper device names.
Fixes: 87c3bf29c642 ("test: do not short-circuit null device creation")
Fixes: d39670086a63 ("eal: parse driver argument before probing drivers")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
In this patch, we introduce five APIs to support TCP/IPv4 GRO.
- gro_tcp4_reassemble: reassemble an inputted TCP/IPv4 packet.
- gro_tcp4_tbl_create: create a TCP/IPv4 reassembly table, which is used
to merge packets.
- gro_tcp4_tbl_destroy: free memory space of a TCP/IPv4 reassembly table.
- gro_tcp4_tbl_pkt_count: return the number of packets in a TCP/IPv4
reassembly table.
- gro_tcp4_tbl_timeout_flush: flush timeout packets from a TCP/IPv4
reassembly table.
TCP/IPv4 GRO API assumes all inputted packets are with correct IPv4
and TCP checksums. And TCP/IPv4 GRO API doesn't update IPv4 and TCP
checksums for merged packets. If inputted packets are IP fragmented,
TCP/IPv4 GRO API assumes they are complete packets (i.e. with L4
headers).
In TCP/IPv4 GRO, we use a table structure, called TCP/IPv4 reassembly
table, to reassemble packets. A TCP/IPv4 reassembly table includes a key
array and a item array, where the key array keeps the criteria to merge
packets and the item array keeps packet information.
One key in the key array points to an item group, which consists of
packets which have the same criteria value. If two packets are able to
merge, they must be in the same item group. Each key in the key array
includes two parts:
- criteria: the criteria of merging packets. If two packets can be
merged, they must have the same criteria value.
- start_index: the index of the first incoming packet of the item group.
Each element in the item array keeps the information of one packet. It
mainly includes three parts:
- firstseg: the address of the first segment of the packet
- lastseg: the address of the last segment of the packet
- next_pkt_index: the index of the next packet in the same item group.
All packets in the same item group are chained by next_pkt_index.
With next_pkt_index, we can locate all packets in the same item
group one by one.
To process an incoming packet needs three steps:
a. check if the packet should be processed. Packets with one of the
following properties won't be processed:
- FIN, SYN, RST, URG, PSH, ECE or CWR bit is set;
- packet payload length is 0.
b. traverse the key array to find a key which has the same criteria
value with the incoming packet. If find, goto step c. Otherwise,
insert a new key and insert the packet into the item array.
c. locate the first packet in the item group via the start_index in the
key. Then traverse all packets in the item group via next_pkt_index.
If find one packet which can merge with the incoming one, merge them
together. If can't find, insert the packet into this item group.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains
performance by reassembling small packets into large ones. This
patchset is to support GRO in DPDK. To support GRO, this patch
implements a GRO API framework.
To enable more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes.
One is called lightweight mode, the other is called heavyweight mode.
If applications want to merge packets in a simple way and the number
of packets is relatively small, they can use the lightweight mode.
If applications need more fine-grained controls, they can choose the
heavyweight mode.
rte_gro_reassemble_burst is the main reassembly API which is used in
lightweight mode and processes N packets at a time. For applications,
performing GRO in lightweight mode is simple. They just need to invoke
rte_gro_reassemble_burst. Applications can get GROed packets as soon as
rte_gro_reassemble_burst returns.
rte_gro_reassemble is the main reassembly API which is used in
heavyweight mode and tries to merge N inputted packets with the packets
in GRO reassembly tables. For applications, performing GRO in heavyweight
mode is relatively complicated. Before performing GRO, applications need
to create a GRO context object, which keeps reassembly tables of
desired GRO types, by rte_gro_ctx_create. Then applications can use
rte_gro_reassemble to merge packets. The GROed packets are in the
reassembly tables of the GRO context object. If applications want to get
them, applications need to manually flush them by flush API.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Introduce a more versatile helper to parse device strings. This
helper expects a generic rte_devargs structure as storage in order not
to require API changes in the future, should this structure be
updated.
The old equivalent function is thus being deprecated, as its API does
not allow to accompany rte_devargs evolutions.
A deprecation notice is issued.
This new helper will parse bus information as well as device name and
device parameters. It does not allocate an rte_devargs structure and
expects one to be given as input.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
rte_devargs now represents any device from any bus.
The related devtypes do not identify a bus anymore, only which scan
policy the device subscribes to.
The bus itself is identified by a bus handle previously introduced.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Remove the dependency of this subsystem upon bus specific device
representation.
Devargs only validates that a device declaration is correct and handled
by a bus. The device interpretation is done afterward within the bus.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Scan policies describe the way a bus should scan the system to search
for possible devices.
Three flags are introduced:
RTE_BUS_SCAN_UNDEFINED: Configuration is irrelevant for this bus
RTE_BUS_SCAN_WHITELIST: Scanning should be limited to declared devices
RTE_BUS_SCAN_BLACKLIST: Scanning should exclude only declared devices
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Device kernel module is a device attribute.
It is used in generic device structures and must not be tied to a bus.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Find which bus should be able to parse this device name into an internal
device representation.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This operation can be used either to validate that a device
representation can be understood by a bus, as well as store the resulting
specialized device representation in any format determined by the bus.
Implementing this function allows EAL initialization routines to infer
which bus should handle a device. This is used as a way to respect
backward compatibility.
This API will disappear once this compatibility is not enforced anymore.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
The bus name was stored with embedded double quotes.
Indeed the bus name is given with a string in a macro,
which is not used elsewhere.
These macros are useless because the buses are drivers,
so they must not have any API for the application writer.
The registration can be done with a hardcoded value without quotes.
There is another (small) benefit of not using macros for driver names:
it is to have a meaningful constructor function name.
For instance, it was businitfn_PCI_BUS_NAME instead of businitfn_pci.
The bus registration macro is also changed to use
the new RTE_INIT_PRIO macro, similar to RTE_INIT used for other drivers.
The priority is the highest (101) in order to be sure that the bus driver
is registered before its device drivers.
Fixes: 0fd1a0eaae19 ("pci: add bus driver")
Fixes: fea892e35f21 ("bus/vdev: use standard bus registration")
Fixes: 7e7df6d0a41d ("bus/fslmc: introduce fsl-mc bus driver")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
A separate boolean variable is not necessary when searching for
starting point in find_device. Just use the passed argument
as its own flag value.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When adding items to a hash table with multiple threads,
there is an spinlock used to prevent data corruption
(unless Transactional Memory is supported).
If there is a failure, the spinlock should be released,
but there were cases where that was not happening.
Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel TSX")
Cc: stable@dpdk.org
Signed-off-by: Mike Stolarchuk <mike.stolarchuk@bigswitch.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>