14335 Commits

Author SHA1 Message Date
Nelio Laranjeiro
cd24d52639 net/mlx5: add mark/flag flow action
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:03 +02:00
Nelio Laranjeiro
89464c8e89 net/mlx5: add flow TCP item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:03 +02:00
Nelio Laranjeiro
535f686e54 net/mlx5: add flow UDP item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:03 +02:00
Nelio Laranjeiro
62b2c4d925 net/mlx5: add flow IPv6 item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:02 +02:00
Nelio Laranjeiro
4899185ff9 net/mlx5: add flow IPv4 item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:02 +02:00
Nelio Laranjeiro
109723ed9b net/mlx5: add flow VLAN item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:02 +02:00
Nelio Laranjeiro
5747b170b3 net/mlx5: add flow stop/start
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:02 +02:00
Nelio Laranjeiro
9944ab8a13 net/mlx5: add flow queue action
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:02 +02:00
Nelio Laranjeiro
af689f1f04 net/mlx5: support flow Ethernet item along with drop action
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00
Nelio Laranjeiro
2815702bae net/mlx5: replace verbs priorities by flow
Previous work introduce verbs priorities, whereas the PMD is making
translation between Flow priority into Verbs.  Rename this to make more
sense on what the PMD has to translate.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00
Nelio Laranjeiro
78be885295 net/mlx5: handle drop queues as regular queues
Drop queues are essentially used in flows due to Verbs API, the
information if the fate of the flow is a drop or not is already present
in the flow.  Due to this, drop queues can be fully mapped on regular
queues.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00
Nelio Laranjeiro
b42c000e37 net/mlx5: remove flow support
This start a series to re-work the flow engine in mlx5 to easily support
flow conversion to Verbs or TC.  This is necessary to handle both regular
flows and representors flows.

As the full file needs to be clean-up to re-write all items/actions
processing, this patch starts to disable the regular code and only let the
PMD to start in isolated mode.

After this patch flow API will not be usable.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00
Adrien Mazarguil
6de569f5ec net/mlx5: add parameter for port representors
Prior to this patch, all port representors detected on a given device were
probed and Ethernet devices instantiated for each of them.

This patch adds support for the standard "representor" parameter, which
implies that port representors are not probed by default anymore, except
for the list provided through device arguments.

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:29 +02:00
Adrien Mazarguil
116f90ad7e net/mlx5: probe port representors in natural order
Port representors are probed in whatever unspecified order
ibv_get_device_list() returns them.

This is counterintuitive to users since DPDK port IDs assignment almost
never follows the same sequence as representor IDs. Additionally, the
master device does not necessarily inherit the lowest DPDK port ID.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-11 15:37:26 +02:00
Adrien Mazarguil
2b73026388 net/mlx5: probe all port representors
Probe existing port representors in addition to their master device and
associate them automatically.

To avoid collision between Ethernet devices, they are named as follows:

- "{DBDF}" for master/switch devices.
- "{DBDF}_representor_{rep}" with "rep" starting from 0 for port
  representors.

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:19 +02:00
Adrien Mazarguil
26c08b979d net/mlx5: add port representor awareness
The current PCI probing method is not aware of Verbs port representors,
which appear as standard Verbs devices bound to the same PCI address and
cannot be distinguished.

Problem is that more often than not, the wrong Verbs device is used,
resulting in unexpected traffic.

This patch makes the driver discard representors to only use the master
device. If unable to identify it (e.g. kernel drivers not recent enough),
either:

- There is only one matching device which isn't identified as a
  representor, in that case use it.
- Otherwise log an error and do not probe the device.

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:14 +02:00
Adrien Mazarguil
681289345e net/mlx5: re-indent generic probing function
Since commit "net/mlx5: drop useless support for several Verbs ports"
removed an inner loop, mlx5_dev_spawn() is left with an unnecessary indent
level.

This patch eliminates a block, moves its local variables to function scope,
and re-indents its contents (diff best viewed with --ignore-all-space).

No functional impact.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:10 +02:00
Adrien Mazarguil
f38c54571d net/mlx5: split PCI from generic probing
All the generic probing code needs is an IB device. While this device is
currently supplied by a PCI lookup, other methods will be added soon.

This patch divides the original function, which has become huge over time,
as follows:

1. PCI-specific (mlx5_pci_probe()).
2. Verbs device (mlx5_dev_spawn()).

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:03 +02:00
Adrien Mazarguil
9083982ce7 net/mlx5: drop useless support for several Verbs ports
Unlike mlx4 from which this capability was inherited, mlx5 devices expose
exactly one Verbs port per PCI bus address. Each physical port gets
assigned its own bus address with a single Verbs port.

While harmless, this code requires an extra loop that would get in the way
of subsequent refactoring.

No functional impact.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-11 15:36:55 +02:00
Adrien Mazarguil
3ff4b0866f net/mlx5: remove redundant objects in probe function
This patch gets rid of redundant calls to open the device and query its
attributes in order to simplify the code.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:36:52 +02:00
Adrien Mazarguil
6057a10b3b net/mlx5: rename confusing object in probe function
There are several attribute objects in this function:

- IB device attributes (struct ibv_device_attr_ex device_attr).
- Direct Verbs attributes (struct mlx5dv_context attrs_out).
- Port attributes (struct ibv_port_attr).
- IB device attributes again (struct ibv_device_attr_ex device_attr_ex).

"attrs_out" is both odd and initialized using a nonstandard syntax. Rename
it "dv_attr" for consistency.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:36:46 +02:00
Moti Haimovsky
ba576975a8 net/mlx4: support hardware TSO
Implement support for hardware TSO.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2018-07-10 14:02:57 +02:00
Pablo de Lara
07581a7e4d test/power: fix 32-bit build
Compilation issue:

test/test/test_power_acpi_cpufreq.c:556:31:
error: format ‘%lx’ expects argument of type ‘long unsigned int’,
but argument 2 has type ‘uint64_t {aka long long unsigned int}’

  printf("ACPI: Capabilities %lx\n", caps.capabilities);
                             ~~^     ~~~~~~~~~~~~~~~~~
                             %llx

Fixes: 39e38d583075 ("test/power: add unit test for get capabilities API")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
2018-07-13 18:24:52 +02:00
Nelio Laranjeiro
449660994e ethdev: fix missing function in map file
Add rte_flow_expand_rss in map file and tag it as experimental.

Fixes: 4ed05fcd441b ("ethdev: add flow API to expand RSS flows")

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-13 15:53:29 +02:00
Thomas Monjalon
ab8ed0e3d9 doc: fix lists in release notes
Some blank lines and hyphens are missing, so lists were badly
interpreted and rendered.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-13 15:36:51 +02:00
Anatoly Burakov
72b49ff623 mem: support --in-memory mode
Implement the final piece of the in-memory mode puzzle - enable running
DPDK entirely in memory, without creating any files.

To do it, use mmap with MAP_HUGETLB and size flags to enable DPDK to work
without hugetlbfs mountpoints. In order to enable this, a few things needed
to be changed.

First of all, we need to allow empty hugetlbfs mountpoints in
hugepage_info, and handle them correctly (by not trying to create any
files and lock any directories).

Next, we need to reorder the mapping sequence, because the page is not
really allocated until the page fault, and we cannot get its IOVA
address before we trigger the page fault.

Finally, decide at compile time whether we are going to be supporting
anonymous hugepages or not, because we cannot check for it at runtime.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 15:35:43 +02:00
Anatoly Burakov
14de8734c4 eal: add --in-memory option
This command-line option will cause DPDK to operate entirely in
memory and not create any shared files at runtime, including any
shared configuration or hugetlbfs files. This is useful for debug
purposes, as well as for certain use cases like containers or
automatic memory cleanup.

Currently, this option acts as a strict superset of --no-shconf and
--huge-unlink commands.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 15:35:26 +02:00
Anatoly Burakov
d435aad37d mem: support --huge-unlink mode
Unlink hugepages after creating them, to honor the hugepage-unlink mode.
We cannot resize non-existing files, so make single file segments
explicitly unsupported.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 15:34:17 +02:00
Anatoly Burakov
5cb42707bc eal: do not create runtime dir in --no-shconf mode
Now that the rest of the EAL is adjusted to not create any shared
files, prevent runtime directory from ever being created.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 15:33:51 +02:00
Anatoly Burakov
cb14962a00 eal: support --no-shconf in hugepage data file
Do not create a shared hugepage data file if we were asked to
not create any shared files.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 15:33:27 +02:00
Anatoly Burakov
7296447acb eal: support --no-shconf for hugepage info
Do not create any shared hugepage size info files if we were
asked to not create any shared files.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 15:33:07 +02:00
Anatoly Burakov
5848e3d281 ipc: support --no-shconf mode
IPC is an inter-process communication mechanism. Since no secondaries
can ever be expected to run in no-shconf mode, IPC will be useless, so
do not enable it in the first place. In the interests of API usage
convenience, we will still allow registering callbacks, but obviously
they won't ever be triggered.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 15:32:43 +02:00
Anatoly Burakov
3ee2cde248 fbarray: support --no-shconf mode
When using --no-shconf option, the expectation is that no multiprocess
will be supported as no shared files are created. However, fbarray still
creates some shared files that prevent multiple processes with the same
prefix from starting.

Fix this by avoiding creating shared files whenever noshconf option is
specified. Since virtual areas we get from eal_get_virtual_area() are
read-only, remap them as writable.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 15:32:05 +02:00
Anatoly Burakov
adf1d86736 eal: move runtime config file to new location
As per deprecation notice [1], move DPDK runtime config to default
DPDK runtime data location. Also, remove the deprecation notice and
update release notes to indicate the changes.

[1] http://dpdk.org/patch/40418

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 13:29:01 +02:00
Anatoly Burakov
3456b5ea1a doc: add IPC callback limitations
For asynchronous requests, user callback may be triggered either from
IPC thread or from interrupt thread. Because of this, delivery of
other interrupt-based events such as alarms may not be possible inside
the asynchronous IPC request callback handler. Document this
limitation.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 12:42:05 +02:00
Anatoly Burakov
daf9bfca71 ipc: remove thread for async requests
Previously, we were using two IPC threads - one to handle messages
and synchronous requests, and another to handle asynchronous requests.
To handle replies for an async request, rte_mp_handle woke up the
rte_mp_handle_async thread to process through pthread_cond variable.

Change it to handle asynchronous messages within the main IPC thread.
To handle timeout events, for each async request which is sent,
we set an alarm for it. If its reply is received before timeout,
we will cancel the alarm when we handle the reply; otherwise,
alarm will invoke the async_reply_handle() as the alarm callback.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-13 12:41:34 +02:00
Jianfeng Tan
d74b7748d6 eal: bring forward init of interrupt handling
Next commit will make asynchronous IPC requests rely on alarm API,
which in turn relies on interrupts to work. Therefore, move the EAL
interrupt initialization before IPC initialization to avoid breaking
IPC in the next commit.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 12:41:15 +02:00
Anatoly Burakov
26021a7150 eal/bsd: support alarm API
Implement EAL alarm API support for FreeBSD. The implementation
is largely identical to that of Linux version, with one key
difference.

The alarm API is a little Linux-centric in that it is expecting
the alarm API to manage alarm timeouts without involvement of the
interrupt thread. This works on Linux because in Linux, there's
timerfd API which allows waiting for timer events on an fd.

On FreeBSD, however, there are no timerfd's, and timer events are
set up directly in kevent. There is no way to pass information from
the alarm API to the interrupt thread, so we also add a little
back-channel magic to get soonest alarm timeout from the alarm API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 12:40:45 +02:00
Anatoly Burakov
23150bd8d8 eal/bsd: add interrupt thread
Add interrupt thread to FreeBSD. It is largely a copy-paste from
Linuxapp interrupt thread, except for a few key differences:

* Use kevent instead of epoll
* Do not recreate the event queue on adding/removing interrupt
  sources, add/remove them to/from the queue on the fly instead
* No support for UIO/VFIO handles

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 12:40:36 +02:00
Jianfeng Tan
4bb69970af eal/linux: use libc malloc in interrupt handling
IPC uses interrupts API internally, and memory subsystem uses IPC.
Therefore, IPC should not use rte_malloc to avoid circular dependency.
Switch to using regular glibc malloc in interrupts API.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 12:40:25 +02:00
Jianfeng Tan
204df26c1b eal/linux: use libc malloc in alarm
Alarm API is going to be used by IPC internally. However, because
memory subsystem depends on IPC, alarm API cannot use rte_malloc as
it creates a circular dependency.

To avoid such chicken and egg problem, we change to use glibc malloc
in the alarm API.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 12:39:51 +02:00
Anatoly Burakov
c63a42535a vfio: fix uninitialized variable
Some static analyzers complain about it, even though
value is never used if not initialized. To avoid additional
false positives about a potential null-pointer dereferences,
also add a null-check.

Bugzilla ID: 58
Fixes: ea2dc1066870 ("vfio: add multi container support")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:44:56 +02:00
Anatoly Burakov
96712b33af eal/linux: fix uninitialized value
The value is not used, but some static analyzers may give out a
warning. Fix it by assigning default value of zero.

Bugzilla ID: 58
Fixes: cdc242f260e7 ("eal/linux: support running as unprivileged user")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:44:43 +02:00
Anatoly Burakov
462dd3722e eal/linux: fix invalid syntax in interrupts
Parentheses were missing. It worked because macro is enclosed in
parentheses, so syntax was valid after macro expansion.

Bugzilla ID: 58
Fixes: 0a45657a6794 ("pci: rework interrupt handling")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:44:17 +02:00
Anatoly Burakov
e4348122a4 eal: add option to limit memory allocation on sockets
Previously, it was possible to limit maximum amount of memory
allowed for allocation by creating validator callbacks. Although a
powerful tool, it's a bit of a hassle and requires modifying the
application for it to work with DPDK example applications.

Fix this by adding a new parameter "--socket-limit", with syntax
similar to "--socket-mem", which would set per-socket memory
allocation limits, and set up a default validator callback to deny
all allocations above the limit.

This option is incompatible with legacy mode, as validator callbacks
are not supported there.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:44:15 +02:00
Anatoly Burakov
0b82bd7b24 memzone: improve zero-length reserve
Currently, reserving zero-length memzones is done by looking at
malloc statistics, and reserving biggest sized element found in those
statistics. This has two issues.

First, there is a race condition. The heap is unlocked between the
time we check stats, and the time we reserve malloc element for memzone.
This may lead to inability to reserve the memzone we wanted to reserve,
because another allocation might have taken place and biggest sized
element may no longer be available.

Second, the size returned by malloc statistics does not include any
alignment information, which is worked around by being conservative and
subtracting alignment length from the final result. This leads to
fragmentation and reserving memzones that could have been bigger but
aren't.

Fix all of this by using earlier-introduced operation to reserve
biggest possible malloc element. This, however, comes with a trade-off,
because we can only lock one heap at a time. So, if we check the first
available heap and find *any* element at all, that element will be
considered "the biggest", even though other heaps might have bigger
elements. We cannot know what other heaps have before we try and
allocate it, and it is not a good idea to lock all of the heaps at
the same time, so, we will just document this limitation and
encourage users to reserve memzones with socket id properly set.

Also, fixup unit tests to account for the new behavior.

Fixes: fafcc11985a2 ("mem: rework memzone to be allocated by malloc")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:27:30 +02:00
Anatoly Burakov
68b6092bd3 malloc: allow reserving biggest element
Add an internal-only function to allocate biggest element from
the heap. Nominally, it supports SOCKET_ID_ANY as its socket
argument, but it's essentially useless because other sockets
will only be allocated from if the entire heap on current or
specified socket is busy.

Still, asking to reserve a biggest element will allow fixing
race condition in memzone reserve that has been there for a
long time.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
2018-07-13 11:27:27 +02:00
Anatoly Burakov
9fe6bceafd malloc: add finding biggest free IOVA-contiguous element
Adding internal-only function to find biggest free IOVA-contiguous
malloc element. This is not exposed to external API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-07-13 11:23:07 +02:00
Anatoly Burakov
e43a9f52b7 malloc: fix pad erasing
Previously, when joining adjacent free elements, we were erasing
trailer and header, but did not erase the padding. Fix this by
accounting for padding on erase, and do not erase padding twice
by adjusting data pointer and data len to not include padding.

Fixes: bb372060dad4 ("malloc: make heap a doubly-linked list")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:21:30 +02:00
Anatoly Burakov
e26415428f mem: provide thread-unsafe memseg list walk variant
Sometimes, user code needs to walk memseg list while being inside
a memory-related callback. Rather than making everyone copy around
the same iteration code and depending on DPDK internals, provide an
official way to do memseg_list_walk() inside callbacks.

Also, remove existing reimplementation from memalloc code and use
the new API instead.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:21:25 +02:00