Commit Graph

5001 Commits

Author SHA1 Message Date
Dan Gora
c6fd54f28c kni: add function to set link state on kernel interface
Add a new API function to KNI, rte_kni_update_link() to allow DPDK
applications to update the link status for KNI network interfaces in
the linux kernel.

Signed-off-by: Dan Gora <dg@adax.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-26 19:46:15 +02:00
Phil Yang
fd5f33323e kni: introduce C11 atomic into FIFO synchronization
Syncing the values by adding c11 atomic memory barriers to make sure
the values being synced before updating fifo_write and fifo_read.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-26 18:10:14 +02:00
Phil Yang
711859cd0d kni: fix kernel FIFO synchronization
Adding memory barrier to make sure the values being synced
before updating fifo_write in kni_fifo_put and fifo_read in
kni_fifo_get.

Fixes: 3fc5ca2f63 ("kni: initial import")
Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-26 18:10:14 +02:00
Phil Yang
0b05abe7bf kni: fix FIFO synchronization
With existing code in kni_fifo_put, rx_q values are not being updated
before updating fifo_write. While reading rx_q in kni_net_rx_normal,
This is causing the sync issue on other core. The same situation happens
in kni_fifo_get as well.

So syncing the values by adding memory barriers to make sure the values
being synced before updating fifo_write and fifo_read.

Fixes: 3fc5ca2f63 ("kni: initial import")
Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-26 18:10:14 +02:00
Phil Yang
ede56cc18d config: rename option for C11 memory model
Keep only single config option RTE_USE_C11_MEM_MODEL for C11 memory
model, so all modules can leverage C11 atomic extension by enable this
option.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-10-26 18:09:22 +02:00
David Hunt
31259a3376 power: fix traffic aware build
1. %ld to PRId64 for 32-bit builds
2. Fix dependency on librte_timer

Fixes: 450f079131 ("power: add traffic pattern aware power control")

Signed-off-by: David Hunt <david.hunt@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-26 14:51:36 +02:00
Jerin Jacob
1f8494f002 eal/ppc: support pause API
Add support for rte_pause() implementation for ppc64.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
2018-10-26 14:37:56 +02:00
Honnappa Nagarahalli
e605a1d36c hash: add lock-free r/w concurrency
Add lock-free read-write concurrency. This is achieved by the
following changes.

1) Add memory ordering to avoid race conditions. The only race
condition that can occur is -  using the key store element
before the key write is completed. Hence, while inserting the element
the release memory order is used. Any other race condition is caught
by the key comparison. Memory orderings are added only where needed.
For ex: reads in the writer's context do not need memory ordering
as there is a single writer.

key_idx in the bucket entry and pdata in the key store element are
used for synchronisation. key_idx is used to release an inserted
entry in the bucket to the reader. Use of pdata for synchronisation
is required due to updation of an existing entry where-in only
the pdata is updated without updating key_idx.

2) Reader-writer concurrency issue, caused by moving the keys
to their alternative locations during key insert, is solved
by introducing a global counter(tbl_chng_cnt) indicating a
change in table.

3) Add the flag to enable reader-writer concurrency during
run time.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 12:50:43 +02:00
Honnappa Nagarahalli
dbdbc4a2e9 hash: fix key store element alignment
Fix the key store array element alignment such that every array
element is aligned on KEY_ALIGNMENT boundary. This is required to
make 'pdata' in 'struct rte_hash_key' align on its natural boundary
for atomic load/store.

Fixes: 473d1bebce ("hash: allow to store data in hash table")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 12:45:40 +02:00
Honnappa Nagarahalli
9d033dac7d hash: support no free on delete
rte_hash_lookup_xxx APIs return the index of slot in
the key store. Application(reader) can use that index to reference
other data structures in its scope. Because of this, the
index should not be freed till the application completes
using the index.
RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL is introduced to support this.
When this flag is enabled rte_hash_del_xxx APIs do not free the
key-store index/internal memory associated with the deleted
entry. The new API rte_hash_free_key_with_position should be called
to free the key-store index/internal memory after calling
rte_hash_del_xxx APIs.

Suggested-by: Yipeng Wang <yipeng1.wang@intel.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 12:44:52 +02:00
Honnappa Nagarahalli
40f8e9c28c hash: separate multi-writer from r/w concurrency
RW concurrency is required with single writer and multiple reader
usecase as well. Hence, multi-writer should not be enabled by default when
RW concurrency is enabled.

Fixes: f2e3001b53 ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 12:43:52 +02:00
David Hunt
757bf2e7cf lib/power: add changes for host commands/policies
This patch does a couple of things:
  * Adds a new message type for removing policies (PKT_POLICY_REMOVE)
    Used when we want to remove a previously created policy.
  * Adds a core_type bool to the channel packet struct to specify whether
    the type of core we want to control is virtual or physical.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-26 10:48:15 +02:00
Liang Ma
450f079131 power: add traffic pattern aware power control
1. Abstract

For packet processing workloads such as DPDK polling is continuous.
This means CPU cores always show 100% busy independent of how much work
those cores are doing. It is critical to accurately determine how busy
a core is hugely important for the following reasons:

   * No indication of overload conditions.

   * User does not know how much real load is on a system, resulting
     in wasted energy as no power management is utilized.

Compared to the original l3fwd-power design, instead of going to sleep
after detecting an empty poll, the new mechanism just lowers the core
frequency. As a result, the application does not stop polling the device,
which leads to improved handling of bursts of traffic.

When the system become busy, the empty poll mechanism can also increase the
core frequency (including turbo) to do best effort for intensive traffic.
This gives us more flexible and balanced traffic awareness over the
standard l3fwd-power application.

2. Proposed solution

The proposed solution focuses on how many times empty polls are executed.
The less the number of empty polls, means current core is busy with
processing workload, therefore, the higher frequency is needed. The high
empty poll number indicates the current core not doing any real work
therefore, we can lower the frequency to safe power.

In the current implementation, each core has 1 empty-poll counter which
assume 1 core is dedicated to 1 queue. This will need to be expanded in the
future to support multiple queues per core.

2.1 Power state definition:

	LOW:  Not currently used, reserved for future use.

	MED:  the frequency is used to process modest traffic workload.

	HIGH: the frequency is used to process busy traffic workload.

2.2 There are two phases to establish the power management system:

	a.Initialization/Training phase. The training phase is necessary
	  in order to figure out the system polling baseline numbers from
	  idle to busy. The highest poll count will be during idle, where
	  all polls are empty. These poll counts will be different between
	  systems due to the many possible processor micro-arch, cache
	  and device configurations, hence the training phase.
	  In the training phase, traffic is blocked so the training
	  algorithm can average the empty-poll numbers for the LOW, MED and
	  HIGH  power states in order to create a baseline.
	  The core's counter are collected every 10ms, and the Training
	  phase will take 2 seconds.
	  Training is disabled as default configuration. The default
	  parameter is applied. Sample App still can trigger training
	  if that's needed. Once the training phase has been executed once on
	  a system, the application can then be started with the relevant
	  thresholds provided on the command line, allowing the application
	  to start passing start traffic immediately

	b.Normal phase. Traffic starts immediately based on the default
	  thresholds, or based on the user supplied thresholds via the
	  command line parameters. The run-time poll counts are compared with
	  the baseline and the decision will be taken to move to MED power
	  state or HIGH power state. The counters are calculated every 10ms.

3. Proposed  API

1.  rte_power_empty_poll_stat_init(struct ep_params **eptr,
		uint8_t *freq_tlb, struct ep_policy *policy);
which is used to initialize the power management system.
 
2.  rte_power_empty_poll_stat_free(void);
which is used to free the resource hold by power management system.
 
3.  rte_power_empty_poll_stat_update(unsigned int lcore_id);
which is used to update specific core empty poll counter, not thread safe
 
4.  rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
which is used to update specific core valid poll counter, not thread safe
 
5.  rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
which is used to get specific core empty poll counter.
 
6.  rte_power_poll_stat_fetch(unsigned int lcore_id);
which is used to get specific core valid poll counter.

7.  rte_empty_poll_detection(struct rte_timer *tim, void *arg);
which is used to detect empty poll state changes then take action.

Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2018-10-26 01:55:07 +02:00
Yipeng Wang
c7d93df552 hash: use partial-key hashing
This commit changes the hashing mechanism to "partial-key
hashing" to calculate bucket index and signature of key.

This is  proposed in Bin Fan, et al's paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". Basically the idea is to use "xor" to
derive alternative bucket from current bucket index and
signature.

With "partial-key hashing", it reduces the bucket memory
requirement from two cache lines to one cache line, which
improves the memory efficiency and thus the lookup speed.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 01:04:33 +02:00
Yipeng Wang
75706568a7 hash: add extendable bucket feature
In use cases that hash table capacity needs to be guaranteed,
the extendable bucket feature can be used to contain extra
keys in linked lists when conflict happens. This is similar
concept to the extendable bucket hash table in packet
framework.

This commit adds the extendable bucket feature. User can turn
it on or off through the extra flag field during table
creation time.

Extendable bucket table composes of buckets that can be
linked list to current main table. When extendable bucket
is enabled, the hash table load can always achieve 100%.
In other words, the table can always accommodate the same
number of keys as the specified table size. This provides
100% table capacity guarantee.

Although keys ending up in the ext buckets may have longer
look up time, they should be rare due to the cuckoo
algorithm.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 01:04:33 +02:00
Yipeng Wang
9904094344 hash: fix race condition in iterate
In rte_hash_iterate, the reader lock did not protect the
while loop which checks empty entry. This created a race
condition that the entry may become empty when enters
the lock, then a wrong key data value would be read out.

This commit reads out the position in the while condition,
which makes sure that the position will not be changed
to empty before entering the lock.

Fixes: f2e3001b53 ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 00:33:51 +02:00
Yipeng Wang
86c1ef2090 hash: remove unused constant
Since the depth-first search of cuckoo path is removed, we do not
need the macro anymore which specifies the depth of the cuckoo
search.

Fixes: f2e3001b53 ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 00:00:16 +02:00
Jerin Jacob
a5cba5a2c4 mbuf: add IGMP packet type
Add support for IGMP packet type.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-10-25 15:51:16 +02:00
Jerin Jacob
8e255bdb1b mbuf: add MPLS packet type
Add support of MPLS packet type.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-10-25 15:48:30 +02:00
Jerin Jacob
07e70104e0 mbuf: add FCoE packet type
Add support of FCoE packet type.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-10-25 15:48:24 +02:00
Ferruh Yigit
73aa5c1332 ring: add library version to meson build
Fixes: a3d6026711 ("ring: relax alignment constraint on ring structure")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2018-10-25 14:30:05 +02:00
Ferruh Yigit
63f54e650a mbuf: fix library version on meson build
Fixes: d27a626187 ("mbuf: remove control mbuf")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2018-10-25 14:29:22 +02:00
Rami Rosen
2b4aa851ec bpf: fix a typo
This trivial patch fixes a typo in rte_bpf_ethdev.h,

Fixes: a93ff62a89 ("bpf: introduce basic Rx/Tx filters")
Cc: stable@dpdk.org

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-10-25 11:27:49 +02:00
Reshma Pattan
77b7485af7 latency: fix timestamp marking and latency calculation
Latency calculation logic is not correct for the case where
packets gets dropped before TX. As for the dropped packets,
the timestamp is not cleared, and such packets still gets
counted for latency calculation in next runs, that will result
in inaccurate latency measurement.

So fix this issue as below,

Before setting timestamp in mbuf, check mbuf don't have
any prior valid time stamp flag set and after marking
the timestamp, set mbuf flags to indicate timestamp is
valid.

Before calculating timestamp check mbuf flags are set to
indicate timestamp is valid.

With the above logic it is guaranteed that correct timestamps
have been used.

Fixes: 5cd3cac9ed ("latency: added new library for latency stats")
Cc: stable@dpdk.org

Reported-by: Bao-Long Tran <longtb5@viettel.com.vn>
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Tested-by: Bao-Long Tran <longtb5@viettel.com.vn>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-10-25 10:30:13 +02:00
Paul Luse
66fd3a3b0f bus/vdev: fix multi-process IPC buffer leak on scan
This patch fixes an issue caught with ASAN where a vdev_scan()
to a secondary bus was failing to free some memory.

The doxygen comment in EAL is fixed at the same time.

Fixes: cdb068f031 ("bus/vdev: scan by multi-process channel")
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Paul Luse <paul.e.luse@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-25 10:28:13 +02:00
Gaetan Rivet
97e476ad7c devargs: fix variadic parsing memory leak
rte_devargs_parsef will leak memory each time it is called.
The device string must be freed.

Fixes: a23bc2c4e0 ("devargs: add non-variadic parsing function")
Cc: stable@dpdk.org

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-10-25 08:54:25 +02:00
Keith Wiles
81bede55e3 eal: add macro for attribute weak
eal: add shorthand __rte_weak macro
qat: update code to use __rte_weak macro
avf: update code to use __rte_weak macro
fm10k: update code to use __rte_weak macro
i40e: update code to use __rte_weak macro
ixgbe: update code to use __rte_weak macro
mlx5: update code to use __rte_weak macro
virtio: update code to use __rte_weak macro
acl: update code to use __rte_weak macro
bpf: update code to use __rte_weak macro

Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-25 02:11:23 +02:00
Stephen Hemminger
dea302eb4e eal/arm: remove profanity in comment
Update comment to describe the problem better without
risk of being offensive.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-25 02:11:22 +02:00
Stephen Hemminger
4f06c51c7e eal/linux: eliminate cast of HPET thread signature
The cast of hpet_msb_inc is causing a warning in some compilations.
Yet the cast is unnecessary, the function is used only one place
just use the correct signature.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-25 02:11:22 +02:00
Stephen Hemminger
08e348daab eal: remove double space in init alert messages
rte_init_alert already adds a newline, don't do it twice.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-10-25 02:11:22 +02:00
Akhil Goyal
8b593b8cbf security: support PDCP
Packet Data Convergence Protocol (PDCP) is added in rte_security
for 3GPP TS 36.323 for LTE.

The patchset provide the structure definitions for configuring the
PDCP sessions and relevant documentation is added.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
2018-10-24 15:12:33 +02:00
Dariusz Stojaczyk
b4f62e5862 ipc: fix undefined behavior in no-shconf mode
In no-shconf mode the rte_mp_request_sync() wasn't initializing
the `reply` parameter, which contained e.g. a number of sent
requests. Callers of rte_mp_request_sync() might check that
param afterwards and might read potentially unitialized memory.

The no-shconf check that makes us return early (with rc = 0) was
placed before the `reply` initialization. Fix this by making the
`reply` initialization occur first.

Fixes: 5848e3d281 ("ipc: support --no-shconf mode")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-24 21:49:57 +02:00
Thomas Monjalon
25495407cb kvargs: fix processing a null list
In the doxygen description of rte_kvargs_process(), it is said:
	If *kvlist* is NULL function does nothing.
It has been added by mistake here instead of rte_kvargs_free().
Anyway, null list should be correctly handled in both functions.

Comments are fixed in both functions and NULL handling is added
to rte_kvargs_process().

Fixes: c34af7424e ("kvargs: fix freeing behaviour for null")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-10-24 15:06:46 +02:00
Anatoly Burakov
198b66b946 mem: fix resource leak
Segment preallocation code allocates an array of structures on the
heap but does not free the memory afterwards. Fix it by freeing it
at the end of the function, and changing control flow to always go
through that code path.

Coverity issue: 323524
Fixes: 1dd342d0fd ("mem: improve segment list preallocation")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-23 11:36:48 +02:00
Qi Zhang
4dc3db031d eal: fix bus name read for removal in multi-process
A crash may appear when removing some PCI devices because
dev->devargs is not always initialized. So use dev->bus instead of
dev->devargs->bus when building devargs string to remove a device.

Fixes: 244d513071 ("eal: enable hotplug on multi-process")

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-10-22 12:41:28 +02:00
Anatoly Burakov
1dd342d0fd mem: improve segment list preallocation
Current code to preallocate segment lists is trying to do
everything in one go, and thus ends up being convoluted,
hard to understand, and, most importantly, does not scale beyond
initial assumptions about number of NUMA nodes and number of
page sizes, and therefore has issues on some configurations.

Instead of fixing these issues in the existing code, simply
rewrite it to be slightly less clever but much more logical, and
provide ample comments to explain exactly what is going on.

We cannot use the same approach for 32-bit code because the
limitations of the target dictate current socket-centric
approach rather than type-centric approach we use on 64-bit
target, so 32-bit code is left unmodified. FreeBSD doesn't
support NUMA so there's no complexity involved there, and thus
its code is much more readable and not worth changing.

Fixes: 1d406458db ("mem: make segment preallocation OS-specific")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-22 12:40:14 +02:00
Anatoly Burakov
0042eb5646 eal: improve musl compatibility of thread log
Musl complains about pthread id being of wrong size, because on
musl, pthread_t is a struct pointer, not an unsigned int. Fix the
printing code by casting pthread id to unsigned pointer type and
adjusting the format specifier to be of appropriate size.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-22 12:40:14 +02:00
Anatoly Burakov
0601defe2e eal: improve musl compatibility of string functions
Musl wraps various string functions such as strlcpy in order to
harden them. However, the fortify wrappers are included without
including the actual string functions being wrapped, which
throws missing definition compile errors. Fix by including
string.h in string functions header.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-22 12:40:14 +02:00
Anatoly Burakov
3717943819 mem: improve musl compatibility
When built against musl, fcntl.h doesn't silently get included.
Fix by including it explicitly.

Bugzilla ID: 31

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-22 11:29:37 +02:00
Anatoly Burakov
1c7fd81054 eal/linux: improve musl compatibility
When built against musl, fcntl.h doesn't silently get included.
Fix by including it explicitly.

Bugzilla ID: 33

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-22 11:28:52 +02:00
Anatoly Burakov
997b0ef8f8 fbarray: improve musl compatibility
When built against musl, fcntl.h doesn't silently get included.
Fix by including it explicitly.

Bugzilla ID: 34

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-22 11:28:46 +02:00
Anatoly Burakov
5d7b673d5f mk: build with _GNU_SOURCE defined by default
We use _GNU_SOURCE all over the place, but often times we miss
defining it, resulting in broken builds on musl. Rather than
fixing every library's and driver's and application's makefile,
fix it by simply defining _GNU_SOURCE by default for all
builds.

Remove all usages of _GNU_SOURCE in source files and makefiles,
and also fixup a couple of instances of using __USE_GNU instead
of _GNU_SOURCE.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-22 11:28:27 +02:00
Thomas Monjalon
739e13bcc9 devargs: fix freeing during device removal
After calling unplug function of a bus, the device is expected
to be freed. It is too late for getting devargs to remove.
Anyway, the buses which implement unplug are already freeing
the devargs, except the PCI bus.
So the call to rte_devargs_remove() is removed from EAL and
added in PCI.

Fixes: 2effa126fb ("devargs: simplify parameters of removal function")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-10-19 22:37:10 +02:00
Nithin Dabilpuram
52d8bd535f mbuf: fix missing Tx outer UDP checksum flag name
Fix missing Tx outer udp checksum flag name

Fixes: df694a05bf ("ethdev: add Tx offload outer UDP checksum definition")

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-10-18 10:24:39 +02:00
Jerin Jacob
c23e465944 mbuf: fix offload flag name and list
Fix missing PKT_TX* & PKT_RX* ol_flag name and fix ol_flag list.

Fixes: 6d18505efa ("vhost: support UDP Fragmentation Offload")
Fixes: 829a1c2c41 ("mbuf: extend flow director field")
Fixes: 63c0d74daa ("mbuf: add Tx side tunneling type")
Cc: stable@dpdk.org

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
aa18d98bca vhost: enable postcopy protocol feature
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
cd85039e7e vhost: restrict postcopy live-migration enablement
Postcopy live-migration feature requires the application to
not populate the guest memory. As the vhost library cannot
prevent the application to that (e.g. preventing the
application to call mlockall()), the feature is disabled by
default.

The application should only enable the feature if it does not
force the guest memory to be populated.

In case the user passes the RTE_VHOST_USER_POSTCOPY_SUPPORT
flag at registration but the feature was not compiled,
registration fails.

For the same reason, postcopy and dequeue zero copy features
are not compatible, so don't advertize postcopy support if
dequeue zero copy is requested.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
9a0a3a25fa vhost: support postcopy end request
The master sends this message before stopping handling
userfaults, so that the backend closes the userfaultfd.

The master waits for the slave to acknowledge the request
with an empty 64bits payload for synchronization purpose.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
5c2ee9662d vhost: send userfault range addresses back to Qemu
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
f40d8ea63a vhost: avoid useless VhostUserMemory copy
The VHOST_USER_SET_MEM_TABLE payload is copied when handled,
whereas it could directly be referenced.

This is not very important, but next, we'll need to update the
payload and send it back to Qemu.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
9b62e2da18 vhost: register new regions with userfaultfd
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
25807d6893 vhost: support postcopy listen message
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
9eefef3b59 vhost: introduce postcopy advise message
This patch opens a userfaultfd and sends it back to Qemu's
VHOST_USER_POSTCOPY_ADVISE request.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
06e787bd94 vhost: add config flag for postcopy
Postcopy live-migration features relies on userfaultfd,
which was only introduced in kernel v4.3.

This patch introduces a new define to allow building vhost
library on kernels not supporting userfaultfd.

With legacy build system, user has to explicitly set
CONFIG_RTE_LIBRTE_VHOST_POSTCOPY to 'y'.

With Meson build system, RTE_LIBRTE_VHOST_POSTCOPY gets
automatically defined if userfaultfd kernel header is
present.

Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
9bd3979306 vhost: enable fds passing in vhost-user messages
Passing userfault fds to Qemu will be required for postcopy
live-migration feature.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
02ef07efc0 vhost: pass socket fd to message handling callbacks
This is not used for now, but will be needed for the
special handling of VHOST_USER_SET_MEM_TABLE message
once postcopy will be supported.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
c00bb88d35 vhost: add number of fds to vhost-user messages
As soon as some ancillary data (fds) are received, it is copied
without checking its length.

This patch adds the number of fds received to the message,
which is set in read_vhost_message().

This is preliminary work to support sending fds to Qemu.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
7f20d9a965 vhost: define postcopy protocol flag
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
74ee315e4f vhost: fix error handling when mem table gets updated
When the memory table gets updated, the rings addresses need
to be translated again. If it fails, we need to exit cleanly
by unmapping memory regions.

Fixes: d5022533c2 ("vhost: retranslate vring addr when memory table changes")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
57b4d90b58 vhost: fix payload size of reply
QEMU doesn't expect any payload for the reply of
VHOST_USER_SET_LOG_BASE request, so don't send any.
Note that the Vhost-user specification isn't clear about
it and would need to be fixed.

Fixes: 54f9e32305 ("vhost: handle dirty pages logging request")
Cc: stable@dpdk.org

Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
7987eb1bc7 vhost: clarify reply-ack in case a reply was already sent
For messages that require a reply, a second ack should not be
sent when reply-ack protocol feature is negotiated, even if
the corresponding flag is set in the message.

The code is compliant with the spec but it isn't clear it is,
so this patch adds a comment to make it explicit.

Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
ef6fb7d3fd vhost: fix return code of messages requiring replies
VHOST_USER_GET_PROTOCOL_FEATURES, VHOST_USER_GET_VRING_BASE
and VHOST_USER_SET_LOG_BASE require replies, so their handlers
should return VH_RESULT_REPLY, not VH_RESULT_OK.

Fixes: 0bff510b5e ("vhost: unify message handling function signature")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
a52ac8ec27 vhost: fix messages results handling
Return of message handling has now changed to an enum that can
take non-negative value that is not zero in case a reply is
needed. But the code checking the variable afterwards has not
been updated, leading to success messages handling being
treated as errors.

External post and pre callbacks return type needs also to be
changed to the new enum, so that its handling is consistent.
This is done in this patch alongside with the convertion of
its only user, vhost-crypto backend.

Fixes: 0bff510b5e ("vhost: unify message handling function signature")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Xiaolong Ye
d0d4887d62 vhost: add doxygen comment to vDPA header
As APIs in rte_vdpa.h are public, we need to add doxygen comments
to all APIs and structures.

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Tiwei Bie
cd9012c3f8 vhost: fix notification for packed ring
The notification can't be disabled in packed ring when
application tries to disable notification, because the
device event flags field is overwritten by an unexpected
value. This patch fixes this issue.

Fixes: b1cce26af1 ("vhost: add notification for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2018-10-18 10:24:39 +02:00
Xiaoyu Min
15dbcdaada ethdev: add generic MAC address rewrite actions
rte_flow actions:
- RTE_FLOW_ACTION_TYPE_SET_MAC_SRC
- RTE_FLOW_ACTION_TYPE_SET_MAC_DST
added in order to offload to NIC

The rte_flow_itme_eth must be present in rte_flow pattern

Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-18 10:24:39 +02:00
Xiaoyu Min
6f1c2168bc ethdev: add generic TTL rewrite actions
rewrite TTL by decrease or just set it directly
it's not necessary to check if the final result
is zero or not

This is slightly different from the one defined
by openflow and more generic

Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-18 10:24:39 +02:00
Alejandro Lucero
deb373fb07 ethdev: add field for device data per process
Primary and secondary processes share a per-device private data. With
current design it is not possible to have data per-device per-process.
This is required for handling properly the CPP interface inside the NFP
PMD with multiprocess support.

There is also at least another PMD driver, tap, with similar
requirements for per-process device data.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-18 10:24:39 +02:00
Fan Zhang
3f1d382ca3 cryptodev: fix library version
This patch fixes the cryptodev library version number that was
missed updating in DPDK 18.08.

Fixes: a4493be5bd ("cryptodev: replace bus specific struct with generic dev")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-17 12:23:40 +02:00
Junxiao Shi
f49d7e9a99 cryptodev: fix pool element size for undefined operation
The documentation of rte_crypto_op_pool_create indicates that
specifying RTE_CRYPTO_OP_TYPE_UNDEFINED would create a pool that
supports all operation types. This change makes the code
consistent with documentation.

Fixes: c0f87eb525 ("cryptodev: change burst API to be crypto op oriented")
Cc: stable@dpdk.org

Signed-off-by: Junxiao Shi <git@mail1.yoursunny.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2018-10-17 12:23:40 +02:00
Tomasz Jozwiak
f1597231ac compressdev: fix compression API description
This patch fixes following API descriptions
  -  rte_comp_op_raw_bulk_alloc
  -  rte_comp_op_bulk_alloc

Fixes: 96086db5a3 ("compressdev: add operation management")
Cc: stable@dpdk.org

Signed-off-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
2018-10-17 12:16:54 +02:00
Thomas Monjalon
e9d159c3d5 eal: allow probing a device again
In the devargs syntax for device representors, it is possible to add
several devices at once: -w dbdf,representor=[0-3]
It will become a more frequent case when introducing wildcards
and ranges in the new devargs syntax.

If a devargs string is provided for probing, and updated with a bigger
range for a new probing, then we do not want it to fail because
part of this range was already probed previously.
There can be new ports to create from an existing rte_device.

That's why the check for an already probed device
is moved as bus responsibility.
In the case of vdev, a global check is kept in insert_vdev(),
assuming that a vdev will always have only one port.
In the case of ifpga and vmbus, already probed devices are checked.
In the case of NXP buses, the probing is done only once (no hotplug),
though a check is added at bus level for consistency.
In the case of PCI, a driver flag is added to allow PMD probing again.
Only the PMD knows the ports attached to one rte_device.

As another consequence of being able to probe in several steps,
the field rte_device.devargs must not be considered as a full
representation of the rte_device, but only the latest probing args.
Anyway, the field rte_device.devargs is used only for probing.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-10-18 01:49:52 +02:00
Thomas Monjalon
52897e7e70 eal: add function to query device status
The function rte_dev_is_probed() is added in order to improve semantic
and enforce proper check of the probing status of a device.

It will answer this rte_device query:
Is it already successfully probed or not?

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-18 01:49:28 +02:00
Thomas Monjalon
391797f042 drivers/bus: move driver assignment to end of probing
The PCI mapping requires to know the PCI driver to use,
even before the probing is done. That's why the PCI driver is
referenced early inside the PCI device structure. See
commit 1d20a073fa ("bus/pci: reference driver structure before mapping")

However the rte_driver does not need to be referenced in rte_device
before the device probing is done.
By moving back this assignment at the end of the device probing,
it becomes possible to make clear the status of a rte_device.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
2018-10-17 10:26:59 +02:00
Thomas Monjalon
9fca820459 compressdev: remove driver name from logs
The logs printed by COMPRESSDEV_LOG were prefixed with the driver name.

In order to avoid assigning the driver before the end of the probing,
the driver name is removed from the compressdev library logs.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-10-17 10:26:59 +02:00
Thomas Monjalon
0a8d2cb1fa cryptodev: remove driver name from logs
The logs printed by CDEV_LOG_* were prefixed with the driver name.

In order to avoid assigning the driver before the end of the probing,
the driver name is removed from the cryptodev library logs.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-10-17 10:26:59 +02:00
Thomas Monjalon
5e046832f1 ethdev: rename memzones allocated for DMA
The helper rte_eth_dma_zone_reserve() is called by PMDs
when probing a new port.
It creates a new memzone with an unique name.
The name of this memzone was using the name of the driver
doing the probe.

In order to avoid assigning the driver before the end of the probing,
the driver name is removed from these memzone names.
The ethdev name (data->name) is not used because it may be too long
and may be not set at this stage of probing.

Syntax of old name: <driver>_<ring>_<port>_<queue>
Syntax of new name: eth_p<port>_q<queue>_<ring>

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-17 10:26:59 +02:00
Qi Zhang
ac9e4a1737 eal: support attach/detach shared device from secondary
This patch cover the multi-process hotplug case when a device
attach/detach request be issued from a secondary process

device attach on secondary:
a) secondary send sync request to the primary.
b) primary receive the request and attach the new device if
   failed goto i).
c) primary forward attach sync request to all secondary.
d) secondary receive the request and attach the device and send a reply.
e) primary check the reply if all success goes to j).
f) primary send attach rollback sync request to all secondary.
g) secondary receive the request and detach the device and send a reply.
h) primary receive the reply and detach device as rollback action.
i) send attach fail to secondary as a reply of step a), goto k).
j) send attach success to secondary as a reply of step a).
k) secondary receive reply and return.

device detach on secondary:
a) secondary send sync request to the primary.
b) primary send detach sync request to all secondary.
c) secondary detach the device and send a reply.
d) primary check the reply if all success goes to g).
e) primary send detach rollback sync request to all secondary.
f) secondary receive the request and attach back device. goto h).
g) primary detach the device if success goto i), else goto e).
h) primary send detach fail to secondary as a reply of step a), goto j).
i) primary send detach success to secondary as a reply of step a).
j) secondary receive reply and return.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-17 10:16:18 +02:00
Qi Zhang
244d513071 eal: enable hotplug on multi-process
We are going to introduce the solution to handle hotplug in
multi-process, it includes the below scenario:

1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary

In the primary-secondary process model, we assume devices are shared
by default. that means attaches or detaches a device on any process
will broadcast to all other processes through mp channel then device
information will be synchronized on all processes.

Any failure during attaching/detaching process will cause inconsistent
status between processes, so proper rollback action should be considered.

This patch covers the implementation of case 1,2.
Case 3,4 will be implemented on a separate patch.

IPC scenario for Case 1, 2:

attach a device
a) primary attach the new device if failed goto h).
b) primary send attach sync request to all secondary.
c) secondary receive request and attach the device and send a reply.
d) primary check the reply if all success goes to i).
e) primary send attach rollback sync request to all secondary.
f) secondary receive the request and detach the device and send a reply.
g) primary receive the reply and detach device as rollback action.
h) attach fail
i) attach success

detach a device
a) primary send detach sync request to all secondary
b) secondary detach the device and send reply
c) primary check the reply if all success goes to f).
d) primary send detach rollback sync request to all secondary.
e) secondary receive the request and attach back device. goto g)
f) primary detach the device if success goto g), else goto d)
g) detach fail.
h) detach success.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-17 10:16:18 +02:00
Qi Zhang
95f3e8846e ethdev: add function to release port in secondary process
Add driver API rte_eth_release_port_secondary to support the
case when an ethdev need to be detached on a secondary process.
Local state is set to unused and shared data will not be reset
so the primary process can still use it.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2018-10-17 10:16:18 +02:00
Jerin Jacob
94d7265976 vfio: fix missing header inclusion
The following change set introduces HAVE_VFIO_DEV_REQ_INTERFACE
and used in the below files.

drivers/bus/pci/linux/pci_vfio.c
drivers/bus/pci/pci_common.c
lib/librte_eal/linuxapp/eal/eal_interrupts.c

However, Except the first file, the change missed to include
<rte_vfio.h> where HAVE_VFIO_DEV_REQ_INTERFACE defined.
This creates runtime following error on vfio-pci mode and
kernel >= 4.0.0 combination.

EAL: [rte_intr_enable] Unknown handle type of fd 95
EAL: [pci_vfio_enable_notifier]Fail to enable req notifier.
EAL: Fail to unregister req notifier handler.
EAL: Error setting up notifier!
EAL: Requested device 0000:07:00.1 cannot be used

Fixes: cda9441996 ("vfio: fix build with Linux < 4.0")

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-10-17 10:16:18 +02:00
Jeff Guo
c89fdd8da2 eal/bsd: fix build
When compiling on FreeBSD, a warning/error is thrown for
unused parameter. This patch aim to fix the issue by delete
the useless func definition.

Fixes: 89ecd11052 ("eal: modify device event process function")

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-16 14:54:25 +02:00
Jeff Guo
cda9441996 vfio: fix build with Linux < 4.0
Since the older kernel version do not implement the device request
interface for vfio, so when build on the kernel < v4.0.0, which is
the version begin to add the device request interface, it will
throw the error to show “VFIO_PCI_REQ_IRQ_INDEX” is undeclared.
This patch aim to fix this compile issue by add the macro
“HAVE_VFIO_DEV_REQ_INTERFACE” after checking the kernel version.

Fixes: 0eb8a1c4c7 ("vfio: add request notifier interrupt")
Fixes: c115fd000c ("vfio: handle hotplug request notifier")

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-16 14:54:25 +02:00
Jeff Guo
89ecd11052 eal: modify device event process function
This patch modify the device event callback process function name to be
more explicit, change the variable to be const. And more, because not only
eal device helper will use the callback, but also vfio bus will use the
callback to handle hot-unplug, so exposure the API out from private eal.
The bus drivers and eal device would directly use this API to process
device event callback.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-15 22:55:55 +02:00
Jeff Guo
0eb8a1c4c7 vfio: add request notifier interrupt
Add a new req notifier in eal interrupt for enable vfio hotplug.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-15 22:29:35 +02:00
Jeff Guo
0fc54536b1 eal: add failure handling for hot-unplug
The mechanism can initially register the sigbus handler after the device
event monitor is enabled. When a sigbus event is captured, it will check
the failure address and accordingly handle the memory failure of the
corresponding device by invoke the hot-unplug handler. It could prevent
the application from crashing when a device is hot-unplugged.

By this patch, users could call below new added APIs to enable/disable
the device hotplug handle mechanism. Note that it just implement the
hot-unplug handler in these functions, the other handler of hotplug, such
as handler for hotplug binding, could be add in the future if need:
  - rte_dev_hotplug_handle_enable
  - rte_dev_hotplug_handle_disable

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-10-15 22:17:49 +02:00
Jeff Guo
62e63653ba bus: add helper to handle sigbus
This patch aims to add a helper to iterate over all buses to find the
relevant bus to handle the sigbus error.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-10-15 22:17:35 +02:00
Jeff Guo
538d974bcd bus: add sigbus handler
When a device is hot-unplugged, a sigbus error will occur of the datapath
can still read/write to the device. A handler is required here to capture
the sigbus signal and handle it appropriately.

This patch introduces a bus ops to handle sigbus errors. Each bus can
implement its own case-dependent logic to handle the sigbus errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-10-15 22:17:01 +02:00
Jeff Guo
a8a279da63 bus: add hot-unplug handler
A hot-unplug failure and app crash can be caused, when a device is
hot-unplugged but the application still try to access the device
by reading or writing from the BARs, which is already invalid but
still not timely be unmap or released.

This patch introduces bus ops to handle hot-unplug failures. Each
bus can implement its own case-dependent logic to handle the failures.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-10-15 22:16:47 +02:00
Cristian Dumitrescu
196eae6148 pipeline: add table action for packet decap
This patch introduces a new table action for packet decapsulation
which removes n bytes from the start of the input packet. The n
is read from the current table entry. The following mbuf fields
are updated by the action: data_off, data_len, pkt_len.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-10-12 19:33:34 +02:00
Cristian Dumitrescu
a63e06d706 pipeline: add table action for packet tag
This patch introduces the packet tag table action which attaches
a 32-bit value (the tag) to the current input packet. The tag is
read from the current table entry. The tag is written into the
mbuf->hash.fdir.hi and the flags PKT_RX_FDIR and PKT_RX_FDIR_ID
are set into mbuf->ol_flags.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-10-12 19:33:26 +02:00
Fan Zhang
96303217a6 pipeline: add symmetric crypto table action
This patch adds the symmetric crypto action support to pipeline
library. The symmetric crypto action works as the shim layer
between pipeline and DPDK cryptodev and is able to interact with
cryptodev with the control path requests such as session
creation/deletion and data path work to assemble the crypto
operations for received packets.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-10-12 19:33:07 +02:00
Fan Zhang
cc85c0781f port: add symmetric crypto
This patch adds the symmetric crypto support to port library.
The crypto port acts as a shim layer to DPDK cryptodev library and
supports in-place crypto workload processing.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-10-12 19:33:02 +02:00
Kevin Laatz
ea7be0a038 lib/librte_table: add hash function headers
This commit adds rte_table_hash_func.h and rte_table_hash_func_arm64.h to
librte_table. This reduces code duplication by removing duplicate header
files within two folders and consolidating them into a single one. This
also adds a scalar implementation of the x86_64 intrinsic for crc32 as a
generic fallback.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-10-12 17:58:53 +02:00
Cristian Dumitrescu
b594cf43c3 pipeline: add VXLAN encap table action
Add support for VXLAN as part of the encap table action.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-10-12 17:57:44 +02:00
Jasvinder Singh
923010592a sched: allocate memory on the given socket id
Replace rte_zmalloc() with rte_zmalloc_socket() to allocate
memory on the socket id provided by the application.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-10-08 17:52:29 +02:00
Rosen Xu
9378d24bef ethdev: expand queue threshold size of RED parameters
There's very commonly that more than 4G DDR memory in NIC for HQoS,
so right now the queue threshold size of RED needs to expand to
uint64_t. This patch fixes it.

Signed-off-by: Rosen Xu <rosen.xu@intel.com>
2018-10-08 17:51:54 +02:00
Vivek Sharma
bed70e5deb eal: use correct data type for bitmap slab operations
Currently, slab operations use unsigned long data type for 64-bit slab
related operations. On target 'i686-native-linuxapp-gcc', unsigned long
is 32-bit and thus, slab operations breaks on this target. Changing slab
operations to use unsigned long long for correct functioning on
all targets.

Fixes: de3cfa2c98 ("sched: initial import")
Fixes: 693f715da4 ("remove extra parentheses in return statement")
Cc: stable@dpdk.org

Signed-off-by: Vivek Sharma <vivek.sharma@caviumnetworks.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-10-08 17:51:24 +02:00
Jerin Jacob
df694a05bf ethdev: add Tx offload outer UDP checksum definition
Introduced DEV_TX_OFFLOAD_OUTER_UDP_CKSUM offload flags and
PKT_TX_OUTER_UDP_CKSUM mbuf ol_flags to enable Tx outer UDP
checksum offload.

To use hardware Tx outer UDP checksum offload, the user needs to,

- enable following in mbuf:
a) fill outer_l2_len and outer_l3_len in mbuf
b) set the PKT_TX_OUTER_UDP_CKSUM flag
c) set the flag PKT_TX_OUTER_IPV4 or PKT_TX_OUTER_IPV6

- configure DEV_TX_OFFLOAD_OUTER_UDP_CKSUM offload flags in slow path

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-11 18:53:49 +02:00
Jerin Jacob
ec7f71577f ethdev: add Rx offload outer UDP checksum definition
Introduced DEV_RX_OFFLOAD_OUTER_UDP_CKSUM Rx offload flag and
PKT_RX_OUTER_L4_CKSUM_* mbuf ol_flags to detect outer UDP checksum
status.

- To use hardware Rx outer UDP checksum offload, the user needs to
configure DEV_RX_OFFLOAD_OUTER_UDP_CKSUM offload flags in slowpath.

- Driver updates checksum status in mbuf ol_flag as
PKT_RX_OUTER_L4_CKSUM_* flags.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-11 18:53:49 +02:00
Rahul Lakkireddy
8287597059 ethdev: add flow action to swap MAC addresses
This action is useful for offloading loopback mode, where the hardware
will swap source and destination MAC addresses in the outermost Ethernet
header before looping back the packet. This action can be used in
conjunction with other rewrite actions to achieve MAC layer transparent
NAT where the MAC addresses are swapped before either the source or
destination MAC address is rewritten and NAT is performed.

Must be used with a valid RTE_FLOW_ITEM_TYPE_ETH flow pattern item.
Otherwise, RTE_FLOW_ERROR_TYPE_ACTION error should be returned by the
PMDs.

Original work by Shagun Agrawal

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-11 18:53:49 +02:00
Rahul Lakkireddy
9ccc949195 ethdev: add flow API actions to modify TCP/UDP port numbers
Add actions:
- SET_TP_SRC - set a new TCP/UDP source port number.
- SET_TP_DST - set a new TCP/UDP destination port number.

Original work by Shagun Agrawal

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-11 18:53:49 +02:00
Rahul Lakkireddy
0517eea761 ethdev: add flow API actions to modify IP addresses
Add actions:
- SET_IPV4_SRC - set a new IPv4 source address.
- SET_IPV4_DST - set a new IPv4 destination address.
- SET_IPV6_SRC - set a new IPv6 source address.
- SET_IPV6_DST - set a new IPv6 destination address.

Original work by Shagun Agrawal

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-11 18:53:49 +02:00
Ferruh Yigit
b2fd027389 mbuf: clarify QinQ flag usage
Update implementation that when PKT_RX_QINQ_STRIPPED mbuf ol_flags
set by PMD, PKT_RX_QINQ, PKT_RX_VLAN_STRIPPED & PKT_RX_VLAN
should be also set.

Clarify mbuf documentations that when PKT_RX_QINQ set PKT_RX_VLAN also
should be set.

So that appllication can rely on PKT_RX_QINQ flag to access both
mbuf.vlan_tci & mbuf.vlan_tci_outer

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-10-11 18:53:49 +02:00
Jerin Jacob
1037ed842c mbuf: fix Tx offload mask
Fixes missing PKT_TX_UDP_SEG, PKT_TX_OUTER_IPV6,PKT_TX_OUTER_IPV4,
PKT_TX_IPV6 and  PKT_TX_IPV4 values in PKT_TX_OFFLOAD_MASK.

Also sort them in bit wise order to recognize missing items later.

Fixes: 6d18505efa ("vhost: support UDP Fragmentation Offload")
Fixes: 1c3b7c33e9 ("mbuf: add Tx offloading flags for tunnels")
Fixes: 711ba9e23e ("mbuf: remove aliasing of Tx offloading flags with Rx ones")
Cc: stable@dpdk.org

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Jiayu Hu <jiayu.hu@intel.com>
2018-10-11 18:53:49 +02:00
Jerin Jacob
28f6a3b88d ethdev: support SCTP Rx checksum offload
Added SCTP Rx checksum offload support

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-11 18:53:49 +02:00
Xiaoyun Li
4539803652 ethdev: get Rx queue interrupt fd
Some users want to use their own epoll instances to control both
DPDK rxq interrupt fds and their own other fds. So added a function
to get rxq interrupt fd based on port id and queue id.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-11 18:53:49 +02:00
Adrien Mazarguil
5ca8203907 ethdev: deprecate flow object copy function
No users left for this function, time to deprecate it.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-10-11 18:53:49 +02:00
Adrien Mazarguil
c2beb1d469 ethdev: add missing items/actions to flow object converter
Several pattern items and actions were never handled by rte_flow_copy()
because their descriptions were missing. rte_flow_conv() inherited this
deficiency.

This patch adds them and reorders others to match rte_flow.h. It doesn't
pose as a fix because so far no one has complained about it and
rte_flow_conv() would have to be backported as well: this function is
the only sane approach to handle VXLAN and NVGRE encap definitions.

As a matter of fact, it's the last missing piece to finally allow
testpmd users to request the creation of VXLAN/NVGRE encap/decap flow
rules without getting rejected outright.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-10-11 18:53:49 +02:00
Adrien Mazarguil
239dfc8d66 ethdev: add flow API item/action name conversion
This provides a means for applications to retrieve the name of flow
pattern items and actions.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-10-11 18:53:49 +02:00
Adrien Mazarguil
063911ee1d ethdev: add flow API object converter
rte_flow_copy() is bound to duplicate flow rule descriptions
(attributes, pattern and list of actions, all at once), however
applications sometimes need more flexibility, for instance the ability
to duplicate only one of the underlying objects (a single pattern item
or action) or retrieve other properties such as their names.

Instead of adding dedicated functions to handle each possible use case,
this patch introduces rte_flow_conv(), which supports any number of
object conversion operations in an extensible manner.

This patch re-implements rte_flow_copy() as a wrapper to
rte_flow_conv().

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-10-11 18:53:49 +02:00
Xiaolong Ye
0e0a7d3801 vhost: introduce API to get vDPA device number
It's used to get number of available registered vDPA devices.

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-11 18:53:49 +02:00
Thomas Monjalon
911462eb4a eal: simplify parameters of hotplug functions
All information about a device to probe can be grouped
in a common string, which is what we usually call devargs.
An application should not have to parse this string before
calling the EAL probe function.
And the syntax could evolve to be more complex and support
matching multiple devices in one string.
That's why the bus name and device name should be removed from
rte_eal_hotplug_add().
Instead of changing this function, a simpler one is added
and used in the old one, which may be deprecated later.

When removing a device, we already know its rte_device handle
which can be directly passed as parameter of rte_eal_hotplug_remove().
If the rte_device is not known, it can be retrieved with the devargs,
by iterating in the device list (future RTE_DEV_FOREACH()).
Similarly to the probing case, a new function is added
and used in the old one, which may be deprecated later.
The new function is used in failsafe, because the replacement is easy.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-11 14:09:24 +02:00
Thomas Monjalon
6878cd397d eal: remove experimental flag of hotplug functions
These functions are quite old and are the only available replacement
for the deprecated attach/detach functions.

Note: some new functions may (again) replace these hotplug functions,
in future, with better parameters.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-11 14:09:24 +02:00
Thomas Monjalon
6844d146ff eal: add bus pointer in device structure
When a device is added with a devargs (hotplug or whitelist),
the bus pointer can be retrieved via its devargs.
But there is no such devargs.bus in case of standard scan.

A pointer to the rte_bus handle is added to rte_device.
When a device is allocated (during a scan),
the pointer to its bus is assigned.

It will make possible to remove a rte_device,
using the function pointer from its bus.

The function rte_bus_find_by_device() becomes useless,
and may be removed later.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-11 14:09:24 +02:00
Thomas Monjalon
2effa126fb devargs: simplify parameters of removal function
The function rte_devargs_remove(), which is intended to be internal,
can take a devargs structure as argument.
The matching is still using string comparison of bus name and
device name.
It is simpler and may allow a different devargs matching in future.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-11 14:09:24 +02:00
Thomas Monjalon
e7ec4d2fc8 devargs: remove deprecated functions
rte_eal_parse_devargs_str() does not support parsing the bus name
at the start of devargs. So it was renamed and deprecated.

rte_eal_devargs_add(), rte_eal_devargs_type_count() and
rte_eal_devargs_dump() were declared deprecated and had their
implementation body renamed.

All these functions were deprecated in release 18.05.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-11 14:09:18 +02:00
Thomas Monjalon
3f7a40c670 devargs: rename enum items with singular form
The enum names are *_params (plural form).
And the items are also using the plural form: *_PARAMS_*.
It looks more natural to use the singular form *_PARAM_* for items.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-11 13:57:29 +02:00
Pawel Wodkowski
8e1fdcaa3d mem: fix --huge-unlink option
The final_va field is set during remap_segment() but this information is
not propagated to temporal copy of huge page memory configuration so the
unlink_hugepage_files() function wrongly assume that there is nothing to
unlink. Fix this issue by checking orig_va instead of final_va.

Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 12:19:58 +02:00
Anatoly Burakov
f32c7c9de9 malloc: enable event callbacks for external memory
When adding or removing external memory from the memory map, there
may be actions that need to be taken on account of this memory (e.g.
DMA mapping). Add support for triggering callbacks when adding,
removing, attaching or detaching external memory.

Some memory event callback handlers will need additional logic to
handle external memory regions. For example, virtio callback has to
completely ignore externally allocated memory, because there is no
way to find file descriptors backing the memory address in a
generic fashion. All other callbacks have also been adjusted to
handle RTE_BAD_IOVA as IOVA address, as this is one of the expected
use cases for external memory support.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
c842d1c3b0 malloc: allow detaching from external memory
Add API to detach from existing chunk of external memory in a
process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
ff3619d624 malloc: allow attaching to external memory chunks
In order to use external memory in multiple processes, we need to
attach to primary process's memseg lists, so add a new API to do
that. It is the responsibility of the user to ensure that memory
is accessible and that it has been previously added to the malloc
heap by another process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
75185aa5fe malloc: allow removing memory from named heaps
Add an API to remove memory from specified heaps. This will first
check if all elements within the region are free, and that the
region is the original region that was added to the heap (by
comparing its length to length of memory addressed by the
underlying memseg list).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
7d75c31014 malloc: allow adding memory to named heaps
Add an API to add externally allocated memory to malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
15d6dd023c malloc: allow destroying heaps
Add an API to destroy specified heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
02e323a8a8 malloc: allow creating malloc heaps
Add API to allow creating new malloc heaps. They will be created
with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.

This breaks the ABI, so document the change.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:51 +02:00
Anatoly Burakov
65ff37b105 malloc: add function to check if socket is external
An API is needed to check whether a particular socket ID belongs
to an internal or external heap. Prime user of this would be
mempool allocator, because normal assumptions of IOVA
contiguousness in IOVA as VA mode do not hold in case of
externally allocated memory.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:11:25 +02:00
Anatoly Burakov
e1fe3c2fab malloc: add function to query socket ID of named heap
When we will be creating external heaps, they will have their own
"fake" socket ID, so add a function that will map the heap name
to its socket ID.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:11:25 +02:00
Anatoly Burakov
d14c148e79 malloc: add name to malloc heaps
We will need to refer to external heaps in some way. While we use
heap ID's internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.

This breaks the ABI, so document the change.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:11:23 +02:00
Anatoly Burakov
f50c6c4bd1 sched: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-10-11 10:37:45 +02:00
Anatoly Burakov
5675c2ea15 pipeline: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-10-11 10:37:45 +02:00
Anatoly Burakov
21bd1106ea flow_classify: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
2018-10-11 10:37:45 +02:00
Anatoly Burakov
f473b6d191 mem: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 10:37:45 +02:00
Anatoly Burakov
72cf92b318 malloc: index heaps using heap ID rather than NUMA node
Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
node's index within the detected NUMA node list. Heap ID for
external heaps will be order of their creation.

This breaks the ABI, so document the changes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 10:37:39 +02:00
Anatoly Burakov
5282bb1c36 mem: allow memseg lists to be marked as external
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.

Mempools is a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 10:24:29 +02:00
Anatoly Burakov
4104b2a485 mem: add length to memseg list
Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.

This breaks ABI, so bump the EAL ABI version and document the
change. Also, while we're breaking ABI, pack the members a little
better.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-10-11 10:24:16 +02:00
Ferruh Yigit
850716bc57 eventdev: fix build
build error:
.../lib/librte_eventdev/rte_event_eth_tx_adapter.c:
  In function ‘txa_service_queue_del’:
.../lib/librte_eventdev/rte_event_eth_tx_adapter.c:800:7:
  error: ‘ret’ may be used uninitialized in this function
  [-Werror=maybe-uninitialized]
compilation terminated due to -Wfatal-errors.

https://mails.dpdk.org/archives/test-report/2018-October/065919.html

'ret' may be used uninitialized when 'dev->data->nb_tx_queues' is 0,
although this is not a practical value, initialize 'ret' to cover this
case.

Fixes: a3bbf2e097 ("eventdev: add eth Tx adapter implementation")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-10 21:40:40 +02:00
Anatoly Burakov
03ba15ca65 vfio: allow mapping MSI-X BARs if kernel allows it
Currently, DPDK will skip mapping some areas (or even an entire BAR)
if MSI-X table happens to be in them but is smaller than page size.

Kernels 4.16+ will allow mapping MSI-X BARs [1], and will report this
as a capability flag. Capability flags themselves are also only
supported since kernel 4.6 [2].

This commit will introduce support for checking VFIO capabilities,
and will use it to check if we are allowed to map BARs with MSI-X
tables in them, along with backwards compatibility for older
kernels, including a workaround for a variable rename in VFIO
region info structure [3].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6

[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=c84982adb23bcf3b99b79ca33527cd2625fbe279

[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=ff63eb638d63b95e489f976428f1df01391e15e4

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-04 00:45:50 +02:00
Anatoly Burakov
b1621823ea mem: fix undefined behavior in NUMA-aware mapping
When NUMA-aware hugepages config option is set, we rely on
libnuma to tell the kernel to allocate hugepages on a specific
NUMA node. However, we allocate node mask before we check if
NUMA is available in the first place, which, according to
the manpage [1], causes undefined behaviour.

Fix by only using nodemask when we have NUMA available.

[1] https://linux.die.net/man/3/numa_alloc_onnode

Bugzilla ID: 20
Fixes: 1b72605d24 ("mem: balanced allocation of hugepages")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2018-10-04 00:33:58 +02:00
Anatoly Burakov
64cdfc35aa mem: store memory mode flags in shared config
Currently, command-line switches for legacy mem mode or single-file
segments mode are only stored in internal config. This leads to a
situation where these flags have to always match between primary
and secondary, which is bad for usability.

Fix this by storing these flags in the shared config as well, so
that secondary process can know if the primary was launched in
single-file segments or legacy mem mode.

This bumps the EAL ABI, however there's an EAL deprecation notice
already in place[1] for a different feature, so that's OK.

[1] http://patches.dpdk.org/patch/43502/

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-04 00:09:47 +02:00
Gaetan Rivet
ca372b3f50 devargs: remove comment regarding logs
rte_log() is available in the context of this compilation unit,
do not deter from using it.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-10-03 14:36:18 +02:00
Gaetan Rivet
e815a7f693 ethdev: register as a class
Implement the operators of an rte_class for the
ethdev abstraction layer.

Register the layer as such.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-03 14:23:02 +02:00
Gaetan Rivet
600ce80536 ethdev: add private generic device iterator
This iterator can be customized with a comparison function that will
trigger a stopping condition.

It can be leveraged to write several different iterators that have
similar but non-identical purposes.

It is private to librte_ethdev.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-10-03 14:22:41 +02:00
Igor Ryzhov
edd2fafbc0 kni: allocate memory dynamically for each device
Long time ago preallocation of memory for KNI was introduced in commit
0c6bc8e. It was done because of lack of ability to free previously
allocated memzones, which led to memzone exhaustion. Currently memzones
can be freed and this patch uses this ability for dynamic KNI memory
allocation.

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-02 17:57:00 +02:00
Nikhil Rao
475425186f eventdev: fix port id argument in Rx adapter caps
Make the ethernet port id passed into
rte_event_eth_rx_adapter_caps_get() 16 bit.

Also, update the event rx adapter test to use 16 bit
ethernet port ids.

Fixes: c2189c907d ("eventdev: make ethdev port identifiers 16-bit")
Cc: stable@dpdk.org

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-10-01 16:53:13 +02:00
Nikhil Rao
a3bbf2e097 eventdev: add eth Tx adapter implementation
This patch implements the Tx adapter APIs by invoking the
corresponding eventdev PMD callbacks and also provides
the common rte_service function based implementation when
the eventdev PMD support is absent.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
2018-10-01 16:51:13 +02:00
Nikhil Rao
c662a950f4 eventdev: add caps API and PMD callbacks for eth Tx adapter
The caps API allows the application to query if the transmit
stage is implemented in the eventdev PMD or uses the common
rte_service function. The PMD callbacks support the
eventdev PMD implementation of the adapter.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-10-01 16:50:54 +02:00
Nikhil Rao
c9bf83947e eventdev: add eth Tx adapter APIs
The ethernet Tx adapter abstracts the transmit stage of an
event driven packet processing application. The transmit
stage may be implemented with eventdev PMD support or use a
rte_service function implemented in the adapter. These APIs
provide a common configuration and control interface and
an transmit API for the eventdev PMD implementation.

The transmit port is specified using mbuf::port. The transmit
queue is specified using the rte_event_eth_tx_adapter_txq_set()
function.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-10-01 16:49:41 +02:00
Harry van Haaren
e279bbe4b2 event: add function for reading unlink in progress
This commit introduces a new function in the eventdev API,
which allows applications to read the number of unlink requests
in progress on a particular port of an eventdev instance.

This information allows applications to verify when no more packets
from a particular queue (or any queue) will arrive at a port.
The application could decide to stop polling, or put the core into
a sleep state if it wishes, as it is ensured that no new packets
will arrive at a particular port anymore if all queues are unlinked.

Suggested-by: Matias Elo <matias.elo@nokia.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-10-01 16:48:38 +02:00
Nikhil Rao
d7b5f102c4 eventdev: fix eth Rx adapter hotplug incompatibility
Use RTE_MAX_ETHPORTS instead of rte_eth_dev_count_total()
when allocating eth Rx adapter's per-eth device data structure
to account for hotplugged devices.

Fixes: 9c38b704d2 ("eventdev: add eth Rx adapter implementation")
Cc: stable@dpdk.org

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-10-01 16:47:56 +02:00
Jiayu Hu
729199397f vhost: fix corner case for enqueue operation
When performing enqueue operations on the split and packed rings,
if the reserved buffer length from the descriptor table exceeds
65535, the returned length by fill_vec_buf_split/_packed()
overflows. This patch is to avoid this corner case.

Fixes: f689586bc0 ("vhost: shadow used ring update")
Fixes: fd68b4739d ("vhost: use buffer vectors in dequeue path")
Fixes: 2f3225a7d6 ("vhost: add vector filling support for packed ring")
Fixes: 37f5e79a27 ("vhost: add shadow used ring support for packed rings")
Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
2f270595c0 vhost: rework message handling as a callback array
Introduce vhost_message_handlers, which maps the message request
type to the message handler. Then replace the switch construct
with a map and call.

Failing vhost_user_set_features is fatal and all processing should
stop immediately and propagate the error to the upper layers. Change
the code accordingly to reflect that.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
0bff510b5e vhost: unify message handling function signature
Each vhost-user message handling function will return an int result
which is described in the new enum vh_result: error, OK and reply.
All functions will now have two arguments, virtio_net double pointer
and VhostUserMsg pointer.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
fd29c33b65 vhost: handle unsupported message types in functions
Add new functions to handle the unsupported vhost message types:
 - vhost_user_set_vring_err
 - vhost_user_set_log_fd

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
e951355ffc vhost: make message handling functions prepare the reply
As VhostUserMsg structure is reused to generate the reply, move the
relevant fields update into the respective message handling functions.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
44eb792f9f vhost: unify struct VhostUserMsg usage
Do not use the typedef version of struct VhostUserMsg. Also unify the
related parameter name.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Paul M Stillwell Jr
cb0ad8fa26 ethdev: fix doxygen comment to be with structure
The doxygen comment describing the rte_eth_dev_info structure
was separated from the structure itself so move the comment
back to be with the structure.

Fixes: 7238e63bce ("ethdev: add support for device offload capabilities")
Cc: stable@dpdk.org

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-09-28 01:41:03 +02:00
Alejandro Lucero
aa3c4fb6a4 ethdev: fix error handling in create function
This patch fixes how function exit is handled when errors inside
rte_eth_dev_create.

Fixes: e489007a41 ("ethdev: add generic create/destroy ethdev APIs")
Cc: stable@dpdk.org

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-09-28 01:41:02 +02:00
Didier Pallard
ae0207d4b5 net: fix Intel prepare function for IP checksum offload
Current Intel tx prepare function does not properly handle the
case where only IP checksum is requested, without requesting
any L4 checksum or TSO: IP checksum is not properly reset to 0
and output packet may contain invalid IP checksum.

Fixes: 4fb7e803eb ("ethdev: add Tx preparation")
Cc: stable@dpdk.org

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-09-28 01:41:02 +02:00
Anatoly Burakov
55d6bb67c9 eal/bsd: fix build
When compiling on FreeBSD, lots of warnings/errors are thrown for
unused parameter. Fix these by marking the parameters as unused
in the code.

Fixes: 1009ba1704 ("mem: add internal API to get and set segment fd")
Fixes: 3a44687139 ("mem: allow querying offset into segment fd")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-09-20 14:51:52 +02:00
Alex Kiselev
d5946eef6a ip_frag: add function to delete expired entries
A fragmented packets is supposed to live no longer than max_cycles,
but the lib deletes an expired packet only occasionally when it scans
a bucket to find an empty slot while adding a new packet.
Therefore a fragment might sit in the table forever.

Signed-off-by: Alex Kiselev <alex@therouter.net>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-09-19 19:45:38 +02:00
Alex Kiselev
e480688dce lpm6: add incremental update on delete
Rework the delete function and add additional
internal data structures to support incremental
LPM tree update rather than full tree rebuild.

Signed-off-by: Alex Kiselev <alex@therouter.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-09-19 17:11:37 +02:00
Alex Kiselev
86b3b21952 lpm6: store rules in hash table
Rework the lpm6 rule subsystem and replace
current rules algorithm complexity O(n)
with hashtables which allow dealing with
large (50k) rule sets.

Signed-off-by: Alex Kiselev <alex@therouter.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-09-19 17:11:17 +02:00
Anatoly Burakov
c127be93f6 mem: support using memfd segments for in-memory mode
Enable using memfd-created segments if supported by the system.

This will allow having real fd's for pages but without hugetlbfs
mounts, which will enable in-memory mode to be used with virtio.

The implementation is mostly piggy-backing on existing real-fd
code, except that we no longer need to unlink any files or track
per-page locks in single-file segments mode, because in-memory
mode does not support secondary processes anyway.

We move some checks from EAL command-line parsing code to memalloc
because it is now possible to use single-file segments mode with
in-memory mode, but only if memfd is supported.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 15:02:19 +02:00
Anatoly Burakov
3a44687139 mem: allow querying offset into segment fd
In a few cases, user may need to query offset into fd for a
particular memory segment (for example, to selectively map
pages). This commit adds a new API to do that.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 15:01:58 +02:00
Anatoly Burakov
41dbdb6872 mem: add external API to retrieve page fd
Now that we can retrieve page fd's internally, we can expose it
as an external API. This will add two flavors of API - thread-safe
and non-thread-safe. Fix up internal API's to return values we need
without modifying rte_errno internally if called from within EAL.

We do not want calling code to accidentally close an internal fd, so
we make a duplicate of it before we return it to the user. Caller is
therefore responsible for closing this fd.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 14:48:04 +02:00
Anatoly Burakov
1009ba1704 mem: add internal API to get and set segment fd
Enable setting and retrieving segment fd's internally.

For now, retrieving fd's will not be used anywhere until we
get an external API, but it will be useful for things like
virtio, where we wish to share segment fd's.

Setting segment fd's will not be available as a public API
at this time, but internally it is needed for legacy mode,
because we're not allocating our hugepages in memalloc in
legacy mode case, and we still need to store the fd.

Another user of get segment fd API is memseg info dump, to
show which pages use which fd's.

Not supported on FreeBSD.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 14:46:34 +02:00
Anatoly Burakov
16cab6e5c8 mem: track page fd in non-single file mode
Previously, we were only tracking lock file fd's in single-file
segments mode, but did not track fd's in non-single file mode
because we didn't need to (mmap() call still kept the lock). Now
that we are going to expose these fd's to the world, we need to
have access to them, so track them even in non-single file
segments mode.

We don't need to close fd's after mmap() because we're still
tracking them in an fd list. Also, for anonymous hugepages mode,
fd will always be -1 so exit early on error.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 14:44:11 +02:00
Anatoly Burakov
a033a4158b mem: rename lock list to fd list
Previously, we were only using lock lists to store per-page lock fd's
because we cannot use modern fcntl() file description locks to lock
parts of the page in single file segments mode.

Now, we will be using this list to store either lock fd's (along with
memseg list fd) in single file segments mode, or per-page fd's (and set
memseg list fd to -1), so rename the list accordingly.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 14:43:14 +02:00
Anatoly Burakov
18329a4366 mem: raise maximum fd limit unconditionally
Previously, when we allocated hugepages, we closed the fd's corresponding
to them after we've done our mappings. Since we did mmap(), we didn't
actually lose the reference, but file descriptors used for mmap() do not
count against the fd limit. Since we are going to store all of our fd's,
we will hit the fd limit much more often when using smaller page sizes.

Fix this to raise the fd limit to maximum unconditionally.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 14:41:38 +02:00
Anatoly Burakov
d4ce95d6b4 eal: do not allow legacy mode with --in-memory mode
In-memory mode was never meant to support legacy mode, because we
cannot sort anonymous pages anyway.

Fixes: 72b49ff623 ("mem: support --in-memory mode")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 14:40:56 +02:00
Anatoly Burakov
310aa7c041 fbarray: fix detach in --no-shconf mode
In noshconf mode, no shared files are created, but we're still trying
to unlink them, resulting in detach/destroy failure even though it
should have succeeded. Fix it by exiting early in noshconf mode.

Fixes: 3ee2cde248 ("fbarray: support --no-shconf mode")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 14:40:18 +02:00
Gaetan Rivet
b0236c7cf7 eal: add strscpy function
The strncpy function has long been deemed unsafe for use,
in favor of strlcpy or snprintf.

While snprintf is standard and strlcpy is still largely available,
they both have issues regarding error checking and performance.

Both will force reading the source buffer past the requested size
if the input is not a proper c-string, and will return the expected
number of bytes copied, meaning that error checking needs to verify
that the number of bytes copied is not superior to the destination
size.

This contributes to awkward code flow, unclear error checking and
potential issues with malformed input.

The function strscpy has been discussed for some time already and
has been made available in the linux kernel[1].

Propose this new function as a safe alternative.

[1]: http://git.kernel.org/linus/30c44659f4a3

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Juhamatti Kuusisaari <juhamatti.kuusisaari@coriant.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-09-19 11:38:19 +02:00
David Marchand
c4833b8ed3 mbuf: remove deprecated segment free functions
__rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
a long time now (early 17.05), are not part of the abi and are easily
replaced with existing api.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-09-19 10:35:01 +02:00
Reshma Pattan
2c6c3e0dc8 pdump: remove dependency on libpthread
pdump library now uses generic multi process channel
and it is no more dependent on the pthreads, so remove
the dependency from the Makefile.

Fixes: 660098d61f ("pdump: use generic multi-process channel")
Cc: stable@dpdk.org

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
2018-09-18 11:48:32 +02:00
Bruce Richardson
806c45dd48 build: add configuration summary at end of config
After running meson to configure a DPDK build, it can be useful to know
what was automatically enabled or disabled. Therefore, print out by way of
summary a categorised list of libraries and drivers to be built.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-09-17 13:58:40 +02:00
Bruce Richardson
34b3d7a4a4 build: simplify logic for default library dependencies
EAL is a standard dependency of all libraries, except for those built
before it. We can therefore simplify the logic by just checking if EAL
has been processed, and make it a standard dependency if so.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-09-17 13:51:25 +02:00
Luca Boccassi
54d609a138 build: add ppc64 meson build
This has been only build-tested for now, on a native ppc64el POWER8E
machine running Debian sid.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-09-17 12:21:17 +02:00
Luca Boccassi
888904417d eal: include missing hypervisor files in meson
They are built by the legacy makefiles but not by Meson.

Fixes: 8f40ee0734 ("eal/x86: get hypervisor name")
Cc: stable@dpdk.org

Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-09-17 12:17:02 +02:00
Ferruh Yigit
323e7b667f ethdev: make default behavior CRC strip on Rx
Removed DEV_RX_OFFLOAD_CRC_STRIP offload flag.
Without any specific Rx offload flag, default behavior by PMDs is to
strip CRC.

PMDs that support keeping CRC should advertise DEV_RX_OFFLOAD_KEEP_CRC
Rx offload capability.

Applications that require keeping CRC should check PMD capability first
and if it is supported can enable this feature by setting
DEV_RX_OFFLOAD_KEEP_CRC in Rx offload flag in rte_eth_dev_configure()

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Tomasz Duszynski <tdu@semihalf.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Jan Remes <remes@netcope.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
2018-09-14 20:08:41 +02:00
Dekel Peled
d9552ee24a ethdev: fix missing names in Tx offload name array
Patch 5355f443 added two definitions of DEV_TX_OFFLOAD_xxx.
If new Tx offload capabilities are defined, they also must be mentioned
in rte_tx_offload_names in rte_ethdev.c file.

This patch adds the required lines in array rte_tx_offload_names.

Fixes: 5355f4439e ("ethdev: introduce generic IP/UDP tunnel checksum and TSO")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-09-14 20:08:41 +02:00
Tiwei Bie
58e90a9113 vhost: fix return value on enqueue path
Fixes: 62250c1d09 ("vhost: extract split ring handling from Rx and Tx functions")
Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-14 20:08:41 +02:00
Ilya Maximets
0d7853a4da vhost-user: drop connection on message handling failures
There are a lot of cases where vhost-user massage handling
could fail and end up in a fully not recoverable state. For
example, allocation failures of shadow used ring and batched
copy array are not recoverable and leads to the segmentation
faults like this on the receiving/transmission path:

  Program received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0x7f913fecf0 (LWP 43625)]
  in copy_desc_to_mbuf () at /lib/librte_vhost/virtio_net.c:760
  760       batch_copy[vq->batch_copy_nb_elems].dst =

This could be easily reproduced in case of low memory or big
number of vhost-user ports.

Fix that by propagating error to the upper layer which will
end up with disconnection in case we can not report to
the message sender when the error happens.

Fixes: f689586bc0 ("vhost: shadow used ring update")
Cc: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-14 20:08:41 +02:00
Tiwei Bie
77de7c781c vhost: fix vhost interrupt support
When VIRTIO_RING_F_EVENT_IDX is negotiated, we need to
update the avail event to enable the notification.

Fixes: 3f8ff12821 ("vhost: support interrupt mode")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-14 20:08:41 +02:00
Ilya Maximets
28925156d9 vhost: fix zmbufs array leak after NUMA realloc
'numa_realloc()' allocates 'zmbufs' even if zero copy mode
is not configured. This leads to memory leak, because array
is freed only for zero copy case.

Fixes: 2651726def ("vhost: do deep copy while reallocating queue")
CC: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-09-12 19:10:09 +02:00
Stephen Hemminger
40d96ffbd1 ethdev: fix port ownership logs
The rte_eth_dev_owner_unset function always generates a log
message because the unset value for owner id is 0.

Also, when rte_eth_dev_owner_delete is called with a valid
owner id, the log message should be at NOTICE not ERROR
severity.

Fixes: 5b7ba31148 ("ethdev: add port ownership")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Matan Azrad <matan@mellanox.com>
2018-08-28 15:27:39 +02:00
Alejandro Lucero
1e5e3d2e72 ethdev: fix MAC changes when live change not supported
Current code assumes a MAC change can occur when the port has been
started. In fact, there are some NICs which require this port state
for being successful, but other NICs not always support MAC change
in that case.

This patch supports a new device flag for a device advertising this
limitation, and if the flag is set, the MAC is changed before the
port starts.

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-08-28 15:27:39 +02:00
Ilya Maximets
d02f2092a3 vhost: suppress error if NUMA is not available
It's a common case that 'get_mempolicy' fails on systems
without NUMA support. No need to flag an error in log for
this situation.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-08-28 15:27:39 +02:00
Ilia Kurakin
2c1bbab7f0 ethdev: change vtune profiling approach
The patch changes rx_burst profiling approach:
	1. VTune's instrumentation is removed
	2. empty hook callback for profiling is added
This way all VTune-specific logic moves to the VTune side.
Hook is enabled only when CONFIG_RTE_ETHDEV_PROFILE_WITH_VTUNE option
is turned on. VTune uses this hook to attach to the polling cycle. It
is not possible to attach to the rx_burst directly, as it is inline.

Signed-off-by: Ilia Kurakin <ilia.kurakin@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-08-28 15:27:39 +02:00
Konstantin Ananyev
5394547798 acl: forbid rule with priority zero
If user specifies priority=0 for some of ACL rules
that can cause rte_acl_classify to return wrong results.
The reason is that priority zero is used internally for no-match nodes.
See more details at: https://bugs.dpdk.org/show_bug.cgi?id=79.
The simplest way to overcome the issue is just not allow zero
to be a valid priority for the rule.

Fixes: dc276b5780 ("acl: new library")
Cc: stable@dpdk.org

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-09-16 11:53:25 +02:00
Tiwei Bie
dde37a8fb8 malloc: fix potential null pointer dereference
We need to do the NULL pointer check first after malloc().

Fixes: 07dcbfe010 ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-09-16 11:23:12 +02:00
Thomas Monjalon
76b9d9de5c version: 18.11-rc0
Start version numbering for a new release cycle,
and introduce a template file for release notes.

The release notes comments have a new block to suggest
the order of items, inspired by Ferruh's proposal.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: John McNamara <john.mcnamara@intel.com>
2018-08-13 12:42:46 +02:00
Thomas Monjalon
11a1f847d2 version: 18.08.0
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-08-09 23:11:26 +02:00
Yipeng Wang
8308d2ff5d hash: add more accurate thread-safety comments
Describing the thread-safety support more accurately for
API documentation.

Fixes: f2e3001b53 ("hash: support read/write concurrency")

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-08-09 21:56:29 +02:00
Rami Rosen
2da7f0146e ethdev: fix a doxygen comment for port allocation
This patch fixes a doxygen comment of the rte_eth_dev_allocate()
method. There is no parameter named "type" for this
method; so this patch removes the doxygen comment about it.

Fixes: 6751f6deb7 ("ethdev: get rid of device type")
Cc: stable@dpdk.org

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-08-09 17:53:34 +02:00
Dan Gora
e716b63985 kni: fix crash with null name
Fix a segmentation fault which occurs when the kni_autotest is run
in the 'test' application.

This segmenation fault occurs when rte_kni_get() is called with a
NULL value for 'name'.

Fixes: 0c6bc8ef70 ("kni: memzone pool for alloc and release")
Cc: stable@dpdk.org

Signed-off-by: Dan Gora <dg@adax.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-08-09 11:50:10 +02:00
Olivier Matz
6b867cc113 eal: remove experimental tag for user mbuf pool ops
Remove experimental tag from rte_eal_mbuf_user_pool_ops().

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-08-09 01:03:14 +02:00
Olivier Matz
83a8a143bb eal: remove deprecated function for mbuf pool ops
rte_eal_mbuf_default_mempool_ops() is replaced by
rte_mbuf_best_mempool_ops().

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-08-09 01:03:14 +02:00
Pablo de Lara
f9975d333a hash: fix doxygen of return values
rte_hash_lookup_data() and rte_hash_lookup_with_hash_data()
functions return the index of the table where the key is stored
when this is found, and not 0 as the Doxygen currently states.

Also, these functions, and rte_hash_get_key_with_position()
return negative values when keys are not found (-EINVAL and -ENOENT),
where the minus sign was missing.

Bugzilla ID: 78
Fixes: 473d1bebce ("hash: allow to store data in hash table")
Fixes: 6dc34e0afe ("hash: retrieve a key given its position")
Cc: stable@dpdk.org

Reported-by: Petr Houska <t-pehous@microsoft.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-08-07 15:40:04 +02:00
Thomas Monjalon
db803c3e4c ethdev: bump library version
The old offload API is removed in 18.08,
so the library version must be increased,
in order to show the incompatibility with 18.05 one.

Fixes: ab3ce1e0c1 ("ethdev: remove old offload API")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-08-07 13:20:39 +02:00