Commit Graph

66 Commits

Author SHA1 Message Date
Qi Zhang
55e411b301 bus/pci: fix resource mapping override
When scanning an already plugged device, the virtual address
of mapped PCI resource in rte_pci_device will be overridden
with 0, that may cause driver does not work correctly.
The fix is not to update any rte_pci_device's field if the being
scanned device's driver is already probed.

Bugzilla ID: 85
Fixes: c752998b5e ("pci: introduce library and driver")
Cc: stable@dpdk.org

Reported-by: Geoffrey Lv <geoffrey.lv@gmail.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2018-10-31 19:43:34 +01:00
Darek Stojaczyk
3f2ef27972 bus/pci: propagate probing error codes
In a couple of places we check its error code against -EEXIST,
but this function returned either -1, 0, or 1.

This gets critical when hotplugging a device in secondary
process, while the same device is already plugged in the
primary. Failing to "hotplug" it in the primary will cause
the secondary to fail as well.

Fixes: e9d159c3d5 ("eal: allow probing a device again")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-10-29 01:59:48 +01:00
Darek Stojaczyk
d59ba0296e vfio: fix interrupt unregister for hotplug notifier
This function is documented to return the number of unregistered
callbacks or negative numbers on error, but pci_vfio checks for
ret != 0 to detect failures. Not anymore.

Fixes: c115fd000c ("vfio: handle hotplug request notifier")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-29 01:59:48 +01:00
Alejandro Lucero
630deed612 bus/pci: compare kernel driver instead of interrupt handler
Invoking the right pci read/write functions is based on interrupt
handler type. However, this is not configured for secondary processes
precluding to use those functions.

This patch fixes the issue using the driver name the device is bound
to instead.

Fixes: 632b2d1dee ("eal: provide functions to access PCI config")
Cc: stable@dpdk.org

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-29 01:02:32 +01:00
Luca Boccassi
e8d435f1f3 bus/pci: harmonize return value of config read
On Linux, rte_pci_read_config on success returns the number of read
bytes, but on BSD it returns 0.
Document the return values, and have BSD behave as Linux does.

At least one case (bnx2x PMD) treats 0 as an error, so the change
makes sense also for that.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-29 00:32:14 +01:00
Alejandro Lucero
fe822eb8c5 bus/pci: use IOVA DMA mask check when setting IOVA mode
Currently the code precludes IOVA mode if IOMMU hardware reports
less addressing bits than necessary for full virtual memory range.

Although VT-d emulation currently only supports 39 bits, it could
be iovas for allocated memlory being within that supported range.
This patch allows IOVA mode in such a case adding a call to
rte_eal_check_dma_mask using the reported addressing bits by the
IOMMU hardware.

Indeed, memory initialization code has been modified for using lower
virtual addresses than those used by the kernel for 64 bits processes
by default, and therefore memsegs iovas can use 39 bits or less for
most systems. And this is likely 100% true for VMs.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-28 22:06:33 +01:00
Alejandro Lucero
f74d50a7df bus/pci: check IOMMU addressing limitation just once
Current code checks if IOMMU hardware reports enough addressing
bits for using IOVA mode but it repeats the same check for any
PCI device present. This is not necessary because the IOMMU hardware
is the same for all of them.

This patch only checks the IOMMU using first PCI device found.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-28 22:06:15 +01:00
Anatoly Burakov
edd035d241 vfio: improve musl compatibility
Musl already has PAGE_SIZE defined, and our define clashed with it.
Rename our define to SYS_PAGE_SIZE.

Bugzilla ID: 36

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-22 11:28:35 +02:00
Anatoly Burakov
5d7b673d5f mk: build with _GNU_SOURCE defined by default
We use _GNU_SOURCE all over the place, but often times we miss
defining it, resulting in broken builds on musl. Rather than
fixing every library's and driver's and application's makefile,
fix it by simply defining _GNU_SOURCE by default for all
builds.

Remove all usages of _GNU_SOURCE in source files and makefiles,
and also fixup a couple of instances of using __USE_GNU instead
of _GNU_SOURCE.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-22 11:28:27 +02:00
Thomas Monjalon
739e13bcc9 devargs: fix freeing during device removal
After calling unplug function of a bus, the device is expected
to be freed. It is too late for getting devargs to remove.
Anyway, the buses which implement unplug are already freeing
the devargs, except the PCI bus.
So the call to rte_devargs_remove() is removed from EAL and
added in PCI.

Fixes: 2effa126fb ("devargs: simplify parameters of removal function")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-10-19 22:37:10 +02:00
Thomas Monjalon
e9d159c3d5 eal: allow probing a device again
In the devargs syntax for device representors, it is possible to add
several devices at once: -w dbdf,representor=[0-3]
It will become a more frequent case when introducing wildcards
and ranges in the new devargs syntax.

If a devargs string is provided for probing, and updated with a bigger
range for a new probing, then we do not want it to fail because
part of this range was already probed previously.
There can be new ports to create from an existing rte_device.

That's why the check for an already probed device
is moved as bus responsibility.
In the case of vdev, a global check is kept in insert_vdev(),
assuming that a vdev will always have only one port.
In the case of ifpga and vmbus, already probed devices are checked.
In the case of NXP buses, the probing is done only once (no hotplug),
though a check is added at bus level for consistency.
In the case of PCI, a driver flag is added to allow PMD probing again.
Only the PMD knows the ports attached to one rte_device.

As another consequence of being able to probe in several steps,
the field rte_device.devargs must not be considered as a full
representation of the rte_device, but only the latest probing args.
Anyway, the field rte_device.devargs is used only for probing.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-10-18 01:49:52 +02:00
Thomas Monjalon
52897e7e70 eal: add function to query device status
The function rte_dev_is_probed() is added in order to improve semantic
and enforce proper check of the probing status of a device.

It will answer this rte_device query:
Is it already successfully probed or not?

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-18 01:49:28 +02:00
Thomas Monjalon
391797f042 drivers/bus: move driver assignment to end of probing
The PCI mapping requires to know the PCI driver to use,
even before the probing is done. That's why the PCI driver is
referenced early inside the PCI device structure. See
commit 1d20a073fa ("bus/pci: reference driver structure before mapping")

However the rte_driver does not need to be referenced in rte_device
before the device probing is done.
By moving back this assignment at the end of the device probing,
it becomes possible to make clear the status of a rte_device.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
2018-10-17 10:26:59 +02:00
Jerin Jacob
94d7265976 vfio: fix missing header inclusion
The following change set introduces HAVE_VFIO_DEV_REQ_INTERFACE
and used in the below files.

drivers/bus/pci/linux/pci_vfio.c
drivers/bus/pci/pci_common.c
lib/librte_eal/linuxapp/eal/eal_interrupts.c

However, Except the first file, the change missed to include
<rte_vfio.h> where HAVE_VFIO_DEV_REQ_INTERFACE defined.
This creates runtime following error on vfio-pci mode and
kernel >= 4.0.0 combination.

EAL: [rte_intr_enable] Unknown handle type of fd 95
EAL: [pci_vfio_enable_notifier]Fail to enable req notifier.
EAL: Fail to unregister req notifier handler.
EAL: Error setting up notifier!
EAL: Requested device 0000:07:00.1 cannot be used

Fixes: cda9441996 ("vfio: fix build with Linux < 4.0")

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-10-17 10:16:18 +02:00
Jeff Guo
cda9441996 vfio: fix build with Linux < 4.0
Since the older kernel version do not implement the device request
interface for vfio, so when build on the kernel < v4.0.0, which is
the version begin to add the device request interface, it will
throw the error to show “VFIO_PCI_REQ_IRQ_INDEX” is undeclared.
This patch aim to fix this compile issue by add the macro
“HAVE_VFIO_DEV_REQ_INTERFACE” after checking the kernel version.

Fixes: 0eb8a1c4c7 ("vfio: add request notifier interrupt")
Fixes: c115fd000c ("vfio: handle hotplug request notifier")

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-16 14:54:25 +02:00
Jeff Guo
c115fd000c vfio: handle hotplug request notifier
When device is be hot-unplugged, the vfio kernel module will sent req
notifier to request user space to release the allocated resources at
first. After that, vfio kernel module will detect the device disappear,
and then delete the device in kernel.

This patch aim to add req notifier processing to enable hotplug for vfio.
By enable the req notifier monitoring and register the notifier callback,
when device be hot-unplugged, the hot-unplug handler will be called to
process hotplug for vfio.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-15 23:42:15 +02:00
Jeff Guo
44c976236e bus/pci: add VFIO request interrupt handle to device
There are some extended interrupt types in vfio pci device except from the
existing interrupts, such as err and req notifier, they could be useful for
device error monitoring. And these corresponding interrupt handler is
different from the other interrupt handler that register in PMDs, so a new
interrupt handler should be added. This patch will add specific req handler
in generic pci device.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-15 23:13:31 +02:00
Jeff Guo
5c96a29934 bus/pci: support sigbus handler
This patch implements the ops for the PCI bus sigbus handler. It finds the
PCI device that is being hot-unplugged and calls the relevant ops of the
hot-unplug handler to handle the hot-unplug failure of the device.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-10-15 22:17:26 +02:00
Jeff Guo
b01dc3da88 bus/pci: support hot-unplug handler
This patch implements the ops to handle hot-unplug on the PCI bus.
For UIO PCI, it could avoids BARs read/write errors by creating a
new dummy memory to remap the memory where the failure is. For VFIO
or other kernel driver, it could specific implement function to handle
hot-unplug case by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-10-15 22:16:48 +02:00
Thomas Monjalon
6844d146ff eal: add bus pointer in device structure
When a device is added with a devargs (hotplug or whitelist),
the bus pointer can be retrieved via its devargs.
But there is no such devargs.bus in case of standard scan.

A pointer to the rte_bus handle is added to rte_device.
When a device is allocated (during a scan),
the pointer to its bus is assigned.

It will make possible to remove a rte_device,
using the function pointer from its bus.

The function rte_bus_find_by_device() becomes useless,
and may be removed later.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-11 14:09:24 +02:00
Thomas Monjalon
3f7a40c670 devargs: rename enum items with singular form
The enum names are *_params (plural form).
And the items are also using the plural form: *_PARAMS_*.
It looks more natural to use the singular form *_PARAM_* for items.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-11 13:57:29 +02:00
Thomas Monjalon
4e7a69c9d9 bus/pci: rename devargs parameter id to addr
We could match devices by their PCI id (vendor id, device id, etc).
But for now, only matching by PCI address is implemented.
The devargs parameter "id" is renamed "addr" to reflect its real meaning.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-11 13:57:29 +02:00
Anatoly Burakov
4104b2a485 mem: add length to memseg list
Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.

This breaks ABI, so bump the EAL ABI version and document the
change. Also, while we're breaking ABI, pack the members a little
better.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-10-11 10:24:16 +02:00
Anatoly Burakov
03ba15ca65 vfio: allow mapping MSI-X BARs if kernel allows it
Currently, DPDK will skip mapping some areas (or even an entire BAR)
if MSI-X table happens to be in them but is smaller than page size.

Kernels 4.16+ will allow mapping MSI-X BARs [1], and will report this
as a capability flag. Capability flags themselves are also only
supported since kernel 4.6 [2].

This commit will introduce support for checking VFIO capabilities,
and will use it to check if we are allowed to map BARs with MSI-X
tables in them, along with backwards compatibility for older
kernels, including a workaround for a variable rename in VFIO
region info structure [3].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6

[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=c84982adb23bcf3b99b79ca33527cd2625fbe279

[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=ff63eb638d63b95e489f976428f1df01391e15e4

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-04 00:45:50 +02:00
Gaetan Rivet
4410c1b0c0 bus/pci: add iteration filter on address
The PCI bus can now parse a matching field "id" as follows:

   "bus=pci,id=0000:00:00.0"

           or

   "bus=pci,id=00:00.0"

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-10-03 14:20:07 +02:00
Gaetan Rivet
46521ca27b bus/pci: implement device iteration
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-10-03 14:19:58 +02:00
Rami Rosen
95b01d8cdc bus/pci: remove unneeded EAL private include
This trivial patch removes an uneeded include
from drivers/bus/pci/linux/pci.c.

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
2018-08-09 17:47:53 +02:00
Rami Rosen
40ee5c60f0 bus/pci: remove useless forward declaration
This patch removes the forward declaration of rte_pci_remove_device()
method. In the past, this forward decalaration was needed for
rte_pci_detach(), which is now removed from pci_common.c.

Fixes: e690338a7b ("bus/pci: remove unused function to detach by address")

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
2018-08-09 17:43:53 +02:00
Anatoly Burakov
45fdc3ed87 vfio: revert retry logic for MSI-X BAR mapping
This reverts commit d4774a568b.

The patch is incomplete because kernel 4.16+, while being capable
of mapping MSI-X BARs, will also report if such a capability is
available. Without checking this capability, gratuitous errors
are displayed on kernels <4.16 while VFIO is attempting to mmap
MSI-X BAR and fails, which can be confusing to the user.

Fixes: d4774a568b ("vfio: fix workaround of BAR mapping")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-08-01 17:51:04 +02:00
Rami Rosen
73d1562103 bus/pci: fix a typo
This patch fixes a trivial typo in pci_common.c.

Fixes: 23eaa9059e ("bus/pci: use given name as generic name")
Cc: stable@dpdk.org

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2018-07-26 21:23:16 +02:00
Takeshi Yoshimura
d4774a568b vfio: fix workaround of BAR mapping
Currently, VFIO will try to map around MSI-X table in the BARs. When
MSI-X table (page-aligned) size is equal to (page-aligned) size of BAR,
VFIO will just skip the BAR.

Recent kernel versions will allow VFIO to map the entire BAR containing
MSI-X tables (*), so instead of trying to map around the MSI-X vector
or skipping the BAR entirely if it's not possible, we can now try
mapping the entire BAR first. If mapping the entire BAR doesn't
succeed, fall back to the old behavior of mapping around MSI-X table or
skipping the BAR.

(*): "vfio-pci: Allow mapping MSIX BAR",
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6

Fixes: 90a1633b23 ("eal/linux: allow to map BARs with MSI-X tables")

Signed-off-by: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-26 11:34:08 +02:00
Qi Zhang
ab53203e19 vfio: enable unmapping resource for secondary
Subroutine to unmap VFIO resource is shared by secondary and
primary, and it does not work on the secondary process. Since
for secondary process, it is not necessary to close interrupt
handler, set pci bus mastering and remove vfio_res from
vfio_res_list. So, the patch adds a dedicate function to handle
the situation when a device is unmapped on a secondary process.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-20 14:26:16 +02:00
Qi Zhang
2a3de3710f vfio: fix PCI address comparison
When use memcmp to compare two PCI address, sizeof(struct rte_pci_addr)
is 4 bytes aligned, and it is 8. While only 7 byte of struct rte_pci_addr
is valid. So compare the 8th byte will cause the unexpected result, which
happens when repeatedly attach/detach a device.

Fixes: 94c0776b1b ("vfio: support hotplug")
Cc: stable@dpdk.org

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-07-20 14:26:16 +02:00
Thomas Monjalon
f8e9989606 remove useless constructor headers
A constructor is usually declared with RTE_INIT* macros.
As it is a static function, no need to declare before its definition.
The macro is used directly in the function definition.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-12 00:00:35 +02:00
Rafal Kozik
4a928ef9f6 bus/pci: enable write combining during mapping
Write combining (WC) increases NIC performance by making better
utilization of PCI bus, but cannot be used by all PMDs.

It will be enabled only if RTE_PCI_DRV_WC_ACTIVATE will be set in
drivers flags. For proper work also igb_uio driver must be loaded with
wc_activate set to 1.

When mapping PCI resources, firstly check if it support WC
and then try to use it.
In case of failure, it will fallback to normal mode.

Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-06-30 00:12:58 +02:00
Rafal Kozik
1d20a073fa bus/pci: reference driver structure before mapping
Add pointer to driver structure before calling rte_pci_map_device.
It allows to use driver flags for adjusting configuration.

Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-06-30 00:12:58 +02:00
Rami Rosen
e690338a7b bus/pci: remove unused function to detach by address
The function rte_pci_detach() is private to PCI and is
not used anywhere in current code base. Remove dead code.

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
2018-06-27 22:54:08 +02:00
Stephen Hemminger
f3bac43b60 bus/pci: remove unused function to probe by address
The function rte_pci_probe_one is private to PCI and is
not used anywhere in current code base. Remove dead code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-27 22:53:55 +02:00
Stephen Hemminger
5417dfc984 bus/pci: remove unused unbind function prototype
Prototype for pci_unbind_kernel_driver exists but no code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-06-27 22:53:38 +02:00
Stephen Hemminger
607514e729 bus/pci: make remove function static
Only used in one file, and therefore can be made static.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-06-27 22:53:34 +02:00
Olivier Matz
f83a3d3fa8 use SPDX tag for 6WIND copyrighted files
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
2018-05-25 10:47:06 +02:00
Andy Green
52f711f7b8 bus/pci: fix size of driver name buffer
Variable dri_name is a pointer and it is incorrect to use its
size as the buffer size. Caller knows the buffer size and
it is safer to pass it explicitly.

Fixes: fe5f777b53 ("bus/pci: replace strncpy by strlcpy")
Cc: stable@dpdk.org

Signed-off-by: Andy Green <andy@warmcat.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-05-15 15:19:13 +02:00
Andy Green
fe5f777b53 bus/pci: replace strncpy by strlcpy
In function ‘pci_get_kernel_driver_by_path’,
    inlined from ‘pci_scan_one.isra.1’ at
	drivers/bus/pci/linux/pci.c:317:8:
drivers/bus/pci/linux/pci.c:57:3: error:
‘strncpy’ specified bound depends on the length of the source argument
[-Werror=stringop-overflow=]
   strncpy(dri_name, name + 1, strlen(name + 1) + 1);

Fixes: d9a8cd9595 ("pci: add kernel driver type")
Cc: stable@dpdk.org

Signed-off-by: Andy Green <andy@warmcat.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-05-14 23:32:23 +02:00
Gaetan Rivet
64de7e4069 bus/pci: fix find device implementation
If start is set, and a device before it matches the data
passed for comparison, then this first device is returned.

This induces potentially infinite loops.

Fixes: c7fe1eea8a ("bus: simplify finding starting point")
Cc: stable@dpdk.org

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-04-27 16:31:44 +02:00
Gaetan Rivet
7765f0f408 bus/pci: do not reference devargs list
This list should not be used by drivers.
Use the public API instead.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-25 03:58:10 +02:00
Anatoly Burakov
66cc45e293 mem: replace memseg with memseg lists
Before, we were aggregating multiple pages into one memseg, so the
number of memsegs was small. Now, each page gets its own memseg,
so the list of memsegs is huge. To accommodate the new memseg list
size and to keep the under-the-hood workings sane, the memseg list
is now not just a single list, but multiple lists. To be precise,
each hugepage size available on the system gets one or more memseg
lists, per socket.

In order to support dynamic memory allocation, we reserve all
memory in advance (unless we're in 32-bit legacy mode, in which
case we do not preallocate memory). As in, we do an anonymous
mmap() of the entire maximum size of memory per hugepage size, per
socket (which is limited to either RTE_MAX_MEMSEG_PER_TYPE pages or
RTE_MAX_MEM_MB_PER_TYPE megabytes worth of memory, whichever is the
smaller one), split over multiple lists (which are limited to
either RTE_MAX_MEMSEG_PER_LIST memsegs or RTE_MAX_MEM_MB_PER_LIST
megabytes per list, whichever is the smaller one). There is also
a global limit of CONFIG_RTE_MAX_MEM_MB megabytes, which is mainly
used for 32-bit targets to limit amounts of preallocated memory,
but can be used to place an upper limit on total amount of VA
memory that can be allocated by DPDK application.

So, for each hugepage size, we get (by default) up to 128G worth
of memory, per socket, split into chunks of up to 32G in size.
The address space is claimed at the start, in eal_common_memory.c.
The actual page allocation code is in eal_memalloc.c (Linux-only),
and largely consists of copied EAL memory init code.

Pages in the list are also indexed by address. That is, in order
to figure out where the page belongs, one can simply look at base
address for a memseg list. Similarly, figuring out IOVA address
of a memzone is a matter of finding the right memseg list, getting
offset and dividing by page size to get the appropriate memseg.

This commit also removes rte_eal_dump_physmem_layout() call,
according to deprecation notice [1], and removes that deprecation
notice as well.

On 32-bit targets due to limited VA space, DPDK will no longer
spread memory to different sockets like before. Instead, it will
(by default) allocate all of the memory on socket where master
lcore is. To override this behavior, --socket-mem must be used.

The rest of the changes are really ripple effects from the memseg
change - heap changes, compile fixes, and rewrites to support
fbarray-backed memseg lists. Due to earlier switch to _walk()
functions, most of the changes are simple fixes, however some
of the _walk() calls were switched to memseg list walk, where
it made sense to do so.

Additionally, we are also switching locks from flock() to fcntl().
Down the line, we will be introducing single-file segments option,
and we cannot use flock() locks to lock parts of the file. Therefore,
we will use fcntl() locks for legacy mem as well, in case someone is
unfortunate enough to accidentally start legacy mem primary process
alongside an already working non-legacy mem-based primary process.

[1] http://dpdk.org/dev/patchwork/patch/34002/

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:39 +02:00
Anatoly Burakov
7411d03249 bus/pci: use memseg walk instead of iteration
Reduce dependency on internal details of EAL memory subsystem, and
simplify code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:48:10 +02:00
Olivier Matz
fd4ab1fe9c bus/pci: use SPDX tags in 6WIND copyrighted files
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-01 02:32:52 +01:00
Bruce Richardson
6c9457c279 build: replace license text with SPDX tag
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Luca Boccassi <bluca@debian.org>
2018-01-30 21:58:59 +01:00
Bruce Richardson
04c5af4272 bus/pci: build with meson
Many drivers across the various device types rely on PCI infrastructure,
so the bus drivers should be the first driver class built.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-01-30 17:49:16 +01:00