When binding with vfio-pci, secondary process cannot be started with
an error message:
cannot find TAILQ entry for PCI device.
It's due to: struct rte_pci_addr is padded with 1 byte for alignment
by compiler. Then below comparison in commit 2f4adfad0a69
("vfio: add multiprocess support") will fail if the last byte is not
initialized.
memcmp(&vfio_res->pci_addr, &dev->addr, sizeof(dev->addr)
And commit cdc242f260e7 ("eal/linux: support running as unprivileged user")
just triggers this bug by using a stack un-initialized variable.
The fix is to use rte_eal_compare_pci_addr() for pci addr comparison.
Fixes: 2f4adfad0a69 ("vfio: add multiprocess support")
Fixes: cdc242f260e7 ("eal/linux: support running as unprivileged user")
Cc: stable@dpdk.org
Reported-by: Pawel Rutkowski <pawelx.rutkowski@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Some compilers require definition of vfio_iommu_spapr_tce_ddw_info
before its use in vfio_iommu_spapr_tce_info, so move tce_info
definition below tce_ddw_info.
Fixes: 468f42cc2645 ("vfio: fix build on old kernel")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Current device hotplug is just supported by UIO managed devices.
This patch adds same functionality with VFIO.
It has been validated through tests using IOMMU and also with
VFIO and no-iommu mode.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The flags member of irq_set should be ORed with VFIO_IRQ_SET_ACTION_MASK
and not VFIO_IRQ_SET_ACTION_UNMASK. The bug was found by code inspection.
Fixes: 5c782b3928b8 ("vfio: interrupts")
Cc: stable@dpdk.org
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Before this patch, the management of dependencies between directories
had several issues:
- the generation of .depdirs, done at configuration is slow: it can take
more than one minute on some slow targets (usually ~10s on a standard
PC without -j).
- for instance, it is possible to express a dependency like:
- app/foo depends on lib/librte_foo
- and lib/librte_foo depends on app/bar
But this won't work because the directories are traversed with a
depth-first algorithm, so we have to choose between doing 'app' before
or after 'lib'.
- the script depdirs-rule.sh is too complex.
- we cannot use "make -d" for debug, because the output of make is used for
the generation of .depdirs.
This patch moves the DEPDIRS-* variables in the upper Makefile, making
the dependencies much easier to calculate. A DEPDIRS variable is still
used to process library dependencies in LDLIBS.
After this commit, "make config" is almost immediate.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Robin Jarry <robin.jarry@6wind.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
For now, exit the init. It's likely that even aborting the initialization
is premature in this case, as it may be possible to proceed even if one
bus or another is not available.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Even if one vdev should fail, there's no need to prevent further
processing. Log the error, and reflect it to the higher levels to
decide.
Seems like it's possible to continue. At least, the error is reflected
properly in the logs. A user could then go and correct or investigate
the situation.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Some devices may be inaccessible for a variety of reasons, or the
PCI-bus may be unavailable causing the whole thing to fail. Still,
better to continue attempts at probes.
Since PCI isn't neccessarily required, it may be possible to simply log
the error and continue on letting the user check the logs and restart
the application when things have failed.
This will usually be an issue because of permissions. However, it could
also be caused by OOM. In either case, errno will contain the
underlying cause.
For linux, it is safe to re-init the system here, so allow the
application to take corrective action and reinit.
For BSD, this is not the case, for other reasons, including hugepage
allocation has already happened, and needs to be properly uninitialized.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Plugins are useful and important. However, it seems crazy to abort
everything just because they don't initialize properly.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
There could be some confusion as to why the call failed - this change
will always reflect the value of the error in rte_error.
When initializing the interrupt thread, there are a number of possible
reasons for failure - some of which are correctable by the application.
Do not panic() needlessly, and give the application a change to reflect
this information to the user.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
After code inspection, there is no way for eal_timer_init() to fail. It
simply returns 0 in all cases. As such, this test could either go-away
or stay here as 'future-proofing'.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When log initialization fails, it's generally because the fopencookie
failed. While this is rare in practice, it could happen, and it is
likely because of memory pressure. So, flag the error, and allow the
user to retry.
Memory init can only fail when access to hugepages (either as primary or
secondary process) fails (and that is usually permissions). Since the
manner of failure is not reversible, we cannot allow retry.
There are some theoretical racy conditions in the system that _could_
cause early tailq init to fail; however, no need to panic the
application. While it can't continue using DPDK, it could make better
alerts to the user.
rte_eal_alarm_init() call uses the linux timerfd framework to create a
poll()-able timer using standard posix file operations. This could fail
for a few reasons given in the man-pages, but many could be
corrected by the user application. No need to panic.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When memzone initialization fails, report the error to the calling
application rather than panic(). Without a good way of detaching /
releasing hugepages, at this point the application will have to restart.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
It's possible that the application could take a corrective action here,
and either prompt the user for different arguments, or at least perform
a better logging. Exiting this early prevents any useful information
gathering from the application layer.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When attempting to scan hugepages, signal to the eal that an error has
occurred, rather than performing a panic.
If we fail to acquire hugepage information, simply signal an error to
the application. This clears the run_once counter, allowing the user or
application to take a corrective action and retry.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This adds a new API to check for the eal cpu versions.
It's now possible to gracefully exit the application, or for
applications which support non-dpdk datapaths working in concert with
DPDK datapaths, there no longer is the possibility of exiting for
unsupported CPUs.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
There may be no way to gracefully recover, but the application
should be notified that a failure happened, rather than completely
aborting. This allows the user to proceed with a "slow-path" type
solution.
After this change, the EAL CPU NUMA node resolution step can no longer
emit an rte_panic. This aligns with the code in rte_eal_init, which
expects failures to return an error code.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
For Linux kernel 4.0 and newer, the ability to obtain
physical page frame numbers for unprivileged users from
/proc/self/pagemap was removed. Instead, when an IOMMU
is present, simply choose our own DMA addresses instead.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
When removing log history functions, the map has not been updated.
Fixes: d7e61ad3ae36 ("log: remove deprecated history dump")
Reported-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
The max number of interrupt request is possible
be changed after rte_intr_callback_register, so
in get_max_intr, we need to check if necessary to
update the max_intr.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
The "dev->intr_handle.fd" is possibly a negative value while it is
passed as an argument to function "close". Fix the check to the fd.
Fixes: 5a60a7ffc801 ("pci: introduce functions to alloc and free uio resource")
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
When a secondary process wants access to the VFIO container file
descriptor, the primary process calls vfio_get_container_fd() which
always opens an entirely new file descriptor on /dev/vfio/vfio.
However, once the file descriptor has been passed to the subprocess, it
is effectively duplicated, meaning that the copy of the file descriptor
in the primary process is no longer needed. However, the primary
process does not close the duplicate fd, which results in a resource
leak.
This can be reproduced by starting a primary process with a small
RLIMIT_NOFILE limit configured to use VFIO for at least one device, and
repeatedly launching secondary processes until the file descriptor limit
is exceeded.
Fix the resource leak by closing the local vfio container file
descriptor after passing it to the secondary process.
Fixes: 2f4adfad0a69 ("vfio: add multiprocess support")
Cc: stable@dpdk.org
Signed-off-by: Patrick MacArthur <patrick@patrickmacarthur.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Bus implementations can implement a probe handler to match the devices
scanned against the drivers registered.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Scan for bus discovers the devices available on the bus and adds them
to a bus specific device list. Each bus mandatorily implements this
method.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch introduces the rte_bus abstraction for EAL.
The model is:
- One or more devices are connected to a Bus
- Drivers are running instances which manage one or more devices
- Bus is responsible for identifying devices (and interrupt propogation)
- Driver is responsible for initializing the device
This patch adds a 'rte_bus' base class which would be extended for
specific implementations. It also introduces Bus registration and
deregistration functions.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Instead of passing domain, bus, devid, func, just pass
an rte_pci_addr.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Both register/unregister and enable/disable don't necessarily require the
rte_intr_handle to be modifiable. Therefore lets constify it.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
No device driver sets the unbind flag in current public code base.
Therefore it is good time to remove the unused dead code.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Some platform like octeontx may use pci and
vdev based combined device to represent a logical
dpdk functional device.In such case, postponing the
vdev initialization after pci device
initialization will provide the better view of
the pci device resources in the system in
vdev's probe function, and it allows better
functional subsystem registration in vdev probe
function.
As a bonus, This patch fixes a bond device
initialization use case.
example command to reproduce the issue:
./testpmd -c 0x2 --vdev 'eth_bond0,mode=0,
slave=0000:02:00.0,slave=0000:03:00.0' --
--port-topology=chained
root cause:
In existing case(vdev initialization and then pci
initialization), creates three Ethernet ports with
following port ids
0 - Bond device
1 - PCI device 0
2 - PCI devive 1
Since testpmd, calls the configure/start on all the ports on
start up,it will translate to following illegal setup sequence
1)bond device configure/start
1.1) pci device0 stop/configure/start
1.2) pci device1 stop/configure/start
2)pci device 0 configure(illegal setup case,
as device in start state)
The fix changes the initialization sequence and
allow initialization in following valid setup order
1) pcie device 0 configure/start
2) pcie device 1 configure/start
3) bond device 2 configure/start
3.1) pcie device 0/stop/configure/start
3.2) pcie device 1/stop/configure/start
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
In function pci_mknod_uio_dev() in lib/librte_eal/eal/eal_pci_uio.c,
The return value of mknod() is ret, not f got by fopen().
So the value of ret should be checked for mknod().
Fixes: f7f97c16048e ("pci: add option --create-uio-dev to run without hotplug")
Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Before this patch, application-specific loggers could not be
installed before rte_eal_init completed (the initialization process
called rte_openlog_stream, overwriting any previously installed
logger). This made it impossible for an application to capture the
initial log messages generated during rte_eal_init. This patch changes
initialization so that information from a previous call to
rte_openlog_stream is not lost. Specifically:
* The default log stream is now maintained separately from an
application-specific log stream installed with rte_openlog_stream.
* rte_eal_common_log_init has been renamed to eal_log_set_default,
since this is all it does. It no longer invokes rte_openlog_stream; it
just updates the default stream. Also, this method now returns void,
rather than int, since there are no errors.
This patch also removes the "early log" mechanism and cleans up the
log initialization mechanism:
* The default log stream defaults to stderr on all platforms if
eal_log_set_default hasn't been invoked (Linux used to use stdout
during the first part of initialization).
* Removed rte_eal_log_early_init; all of the desired functionality can
be achieved by calling eal_log_set_default.
* Removed lib/librte_eal/bsdapp/eal/eal_log.c: it contained only one
function, rte_eal_log_init, which is not needed or invoked for BSD.
* Removed declaration for eal_default_log_stream in rte_log.h (it's now
private to eal_common_log.c).
* Moved call to rte_eal_log_init earlier in rte_eal_init for Linux, so
that it starts using the preferrred log ASAP.
Signed-off-by: John Ousterhout <ouster@cs.stanford.edu>
When compiled with EXTRA_CFLAGS="-O1", the compiler is not
able to detect that size is always initialized when used, and
issues a wrong warning:
eal_memory.c: In function ‘rte_eal_hugepage_attach’:
eal_memory.c:1684:3: error: ‘size’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
munmap(hp, size);
^
Workaround this issue by initializing size to 0.
Seen on gcc (Debian 5.4.1-1) 5.4.1 20160803.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Running secondary is tricky due to the need to map the memory region
at the right place in VM, which is whatever primary has chosen. If the
base address for primary happens to by already mapped in the
secondary, we will hit precisely these error messages (depending if we
fail on the config region or the hugepages). This is why there is
already a comment about ASLR.
The issue is that in most cases, remapping does not happen and "errno"
is not changed and therefore stale. In our case, we got a "permission
denied", which sent us down the wrong track. It's such a common error
for secondary that I feel this error message should be unambiguous and
helpful.
The call to close was also moved because close() may override errno.
Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com>
Now that rte_device is available, drivers can start using its members
(numa, name) as well as link themselves into another rte_device list.
As of now no one is using this list, but can be used for moving over all
devices (pdev/vdev/Xdev) and perform bulk actions (like cleanup).
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
[Shreyansh: Reword commit log for extra rte_device list]
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Move all PMD_VDEV-specific code into a separate module and header
file to not polute the generic code anymore. There is now a list
of virtual devices available.
The rte_vdev_driver integrates the original rte_driver inside
(C inheritance). The rte_driver will be however change in the
future to serve as a common base for all other types of drivers.
The existing PMDs (PMD_VDEV) are to be modified later (there is
no change for them at the moment).
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Hotplug invocations, which deals with devices, should come from the layer
that already handles them, i.e. EAL.
For both attach and detach operations, 'name' is used to select the bus
that will handle the request.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
- Move rte_eth_dev_create_unique_device_name() from ether/rte_ethdev.c to
common/include/rte_pci.h as rte_eal_pci_device_name(). Being a common
method, can be used across crypto/net PCI PMDs.
- Remove crypto specific routine and fallback to common name function.
- Introduce a eal private Update function for PCI device naming.
Signed-off-by: David Marchand <david.marchand@6wind.com>
[Shreyansh: Merge crypto/pci helper patches]
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
These lists can be initialized once and for all at build time.
With this, those lists are only manipulated in a common place
(and we could even make them private).
A nice side effect is that pci drivers can now register in constructors.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
rte_eal_dev_init is declared in both eal_private.h and rte_dev.h since its
introduction.
This function has been exported in ABI, so remove it from eal_private.h
Fixes: e57f20e05177 ("eal: make vdev init path generic for both virtual and pci devices")
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>
An application might be linked to DPDK but not really use it,
so move the cpu flag check to the EAL initialization instead.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Aaron Conole <aconole@redhat.com>
In ASLR-enabled system, it is possible that selected
virtual space is occupied by program segments. Therefore,
error path should not blindly unmap all memmory segments
but only those already mapped.
Steps that lead to crash:
1. memeseg 0 in secondary process overlaps with libc.so
2. mmap of /dev/zero fails for virtual space of memseg 0
3. munmap of memseg 0 leads to unmapping libc.so itself
4. app gets SIGSEGV after returning from syscall to libc
Fixes: ea329d7f8e34 ("mem: fix leak after mapping failure")
Signed-off-by: Maciej Czekaj <maciej.czekaj@caviumnetworks.com>
RTE_EAL_SINGLE_FILE_SEGMENTS was introduced with ivshmem integration.
Now that ivshmem was removed (commit c711ccb30987)
and a simple git grep shows no one else references it;
I think we can now remove it.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
When running single-core, some drivers tend to call rte_delay_us for a
long time, and that is causing packet drops.
To avoid this, rte_delay_us can be replaced with user-defined delay
function with:
void rte_delay_us_callback_register(void(*userfunc)(unsigned));
When userfunc==rte_delay_us_block build-in blocking delay function is
restored.
Signed-off-by: Jozef Martiniak <jozmarti@cisco.com>
Use mempool buf_addr and buf_physaddr fields for address translation.
Since each mbuf address calculated separately, the restriction of all
mbufs should come from a continuous memory restriction is no more valid.
mbuf related FIFO's content changed, rx_q and alloc_q now carries
physical address of mbufs. tx_q and free_q content not changed, they
still carries virtual address of mbufs.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
There is no need for the page files to be readable (and executable) by
other users. This can be exploited by non-privileged users to access the
working memory of a DPDK app.
Open the files with 0600.
Signed-off-by: Robin Jarry <robin.jarry@6wind.com>
Exported header files used by applications should allow the strictest
compiler flags. Language extensions used in many places must be explicitly
marked to avoid warnings and compilation failures.
Unnamed structs/unions are allowed since C11, however many compiler
versions do not use this mode by default.
This commit prevents the following errors:
error: ISO C99 doesn't support unnamed structs/unions
error: struct has no named members
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>