This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
changes in rte_eth_dev and rte_eth_desc_lim structures.
As discussed in that thread:
http://dpdk.org/ml/archives/dev/2015-September/023603.html
Different NIC models depending on HW offload requested might impose
different requirements on packets to be TX-ed in terms of:
- Max number of fragments per packet allowed
- Max number of fragments per TSO segments
- The way pseudo-header checksum should be pre-calculated
- L3/L4 header fields filling
- etc.
MOTIVATION:
-----------
1) Some work cannot (and didn't should) be done in rte_eth_tx_burst.
However, this work is sometimes required, and now, it's an
application issue.
2) Different hardware may have different requirements for TX offloads,
other subset can be supported and so on.
3) Some parameters (eg. number of segments in ixgbe driver) may hung
device. These parameters may be vary for different devices.
For example i40e HW allows 8 fragments per packet, but that is after
TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.
4) Fields in packet may require different initialization (like eg. will
require pseudo-header checksum precalculation, sometimes in a
different way depending on packet type, and so on). Now application
needs to care about it.
5) Using additional API (rte_eth_tx_prep) before rte_eth_tx_burst let to
prepare packet burst in acceptable form for specific device.
6) Some additional checks may be done in debug mode keeping tx_burst
implementation clean.
PROPOSAL:
---------
To help user to deal with all these varieties we propose to:
1. Introduce rte_eth_tx_prep() function to do necessary preparations of
packet burst to be safely transmitted on device for desired HW
offloads (set/reset checksum field according to the hardware
requirements) and check HW constraints (number of segments per
packet, etc).
While the limitations and requirements may differ for devices, it
requires to extend rte_eth_dev structure with new function pointer
"tx_pkt_prep" which can be implemented in the driver to prepare and
verify packets, in devices specific way, before burst, what should to
prevent application to send malformed packets.
2. Also new fields will be introduced in rte_eth_desc_lim:
nb_seg_max and nb_mtu_seg_max, providing an information about max
segments in TSO and non-TSO packets acceptable by device.
This information is useful for application to not create/limit
malicious packet.
APPLICATION (CASE OF USE):
--------------------------
1) Application should to initialize burst of packets to send, set
required tx offload flags and required fields, like l2_len, l3_len,
l4_len, and tso_segsz
2) Application passes burst to the rte_eth_tx_prep to check conditions
required to send packets through the NIC.
3) The result of rte_eth_tx_prep can be used to send valid packets
and/or restore invalid if function fails.
eg.
for (i = 0; i < nb_pkts; i++) {
/* initialize or process packet */
bufs[i]->tso_segsz = 800;
bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
| PKT_TX_IP_CKSUM;
bufs[i]->l2_len = sizeof(struct ether_hdr);
bufs[i]->l3_len = sizeof(struct ipv4_hdr);
bufs[i]->l4_len = sizeof(struct tcp_hdr);
}
/* Prepare burst of TX packets */
nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);
if (nb_prep < nb_pkts) {
printf("tx_prep failed\n");
/* drop or restore invalid packets */
}
/* Send burst of TX packets */
nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);
/* Free any unsent packets. */
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The right name of ethdev should be dpdk_netdev. However:
1/ We are using rte_ prefix in the code and library names.
2/ The API uses rte_ethdev
That's why 16.11 will just have the rte_ prefix prepended to
the library filename as every other libraries.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Driver names for all the supported devices in DPDK do not have
a naming convention. Some are using a prefix, some are not
and some have long names. Driver names are used when creating
virtual devices, so it is useful to have consistency in the names.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Remove deprecation notice pertaining to introduction of new flow
types in favor of a more generic filtering infrastructure proposal.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Add new section on tested platforms and nics and OSes to the release notes.
Signed-off-by: Yulong Pei <yulong.pei@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Improve the wording of some text in the "new features" section of
the release notes.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: John McNamara <john.mcnamara@intel.com>
When use i40e linux kernel driver as host driver and DPDK handler the i40e
VF, the promiscuous mode doesn't work in i40e VF. It is not supported by
DPDK i40e VF driver right now.
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
The driver is incorrectly setting the RSS field in the last mbuf in
the packet chain instead of the first. Moreover, the last mbuf might
have already been freed if it only contained the Ethernet CRC.
Also, fix the call to i40e_rxd_build_fdir to store the fdir flags in
the first mbuf of the chain instead of the last.
Fixes: 4861cde461 ("i40e: new poll mode driver")
Fixes: 5a21d9715f ("i40e: report flow director matching")
Signed-off-by: Dumitru Ceara <dumitru.ceara@gmail.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This configuration is example configuration for flow classification.
This fix changes the offset and mask value to compute the hash correctly.
This fix does not involve code change and do not impact compilation,
build and performance.
Fixes: 93771a569d ("examples/ip_pipeline: rework flow classification CLI")
Signed-off-by: Sankar Chokkalingam <sankarx.chokkalingam@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
the tail blank after a group of lcore or cpu set
will make check of its end character fail.
for example: --lcores '(0-3)@(0-3) ,(4-5)@(4-5)',
the next character after cpu set (0-3) is not ','
or '\0', which fail the check in eal_parse_lcores( ).
Fixes: 53e54bf817 ("eal: new option --lcores for cpu assignment")
Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
The '-' in lcore set overrides cpu set of following
lcore set in the argument of EAL option --lcores.
for example --locres '0-2,(3-5)@(3,4),6@(5,6),7@(5-7)',
0-2 make lflags=1 which indeed suppress following
cpu set (3,4), (5,6) and (5-7) after @ .
Fixes: 53e54bf817 ("eal: new option --lcores for cpu assignment")
Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
local variable i is not referred by other codes in
the function eal_parse_lcores( ), so it can be removed.
Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
If a mempool is removed from the list by a callback function
during rte_mempool_walk(), the TAILQ_FOREACH loop will fail unexpectedly.
It is fixed by using the safe version of the loop macro.
Reported-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This git tree will be used to backport some fixes from the
master branch to maintain some "stable releases".
The minor version number z will be incremented for these releases:
YY.MM.z
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
VSI structure needs to be removed from TAILQ list when releasing.
But for the child VSI it will be removed again after the structure
is freed. It will cause core dump when the DPDK i40e using as PF
host driver.
This patch fixes it to only remove child VSI from TAILQ before
send adminq command to remove it from hardware.
Fixes: 4861cde461 ("i40e: new poll mode driver")
Fixes: 440499cf53 ("net/i40e: support floating VEB")
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
nr_desc is not an index but the number of descriptors,
so can be equal to the virtqueue size.
Fixes: a436f53ebf ("vhost: avoid dead loop chain")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When timer_cb resets another running timer on the same lcore,
the list of expired timers is chained to the pending-list.
This commit prevents a running timer from being reset
by not its own timer_cb.
Fixes: a4b7a5a45c ("timer: fix race condition")
Signed-off-by: Hiroyuki Mikita <h.mikita89@gmail.com>
Acked-by: Robert Sanford <rsanford@akamai.com>
When timer_set_running_state() fails in rte_timer_manage(),
the failed timer is put back on pending-list.
In this case, another core tries to reset or stop the timer.
It does not need to be on pending-list.
Fixes: a4b7a5a45c ("timer: fix race condition")
Signed-off-by: Hiroyuki Mikita <h.mikita89@gmail.com>
Acked-by: Robert Sanford <rsanford@akamai.com>
This commit fixes incorrect pending-list manipulation
when getting list of expired timers in rte_timer_manage().
When timer_get_prev_entries() sets pending_head on prev,
the pending-list is broken.
The next of pending_head always becomes NULL.
In this depth level, it is not need to manipulate the list.
Fixes: 9b15ba895b ("timer: use a skip list")
Signed-off-by: Hiroyuki Mikita <h.mikita89@gmail.com>
Acked-by: Robert Sanford <rsanford@akamai.com>
Use of rte_smb_wmb() instead of rte_smb_rmb() in sc dequeue function
creates the additional overhead of waiting for all the STOREs
to be completed to local buffer from ring buffer memory.
The sc dequeue function demands only LOAD-STORE barrier where LOADs
from ring buffer memory needs to be completed before tail pointer update.
Changing to rte_smb_rmb() to enable the required LOAD-STORE barrier.
Fixes: ecc7d10e44 ("ring: guarantee dequeue ordering before tail update")
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
There is a dependency on librt with old glibc.
The -lrt option was added everywhere it is needed but was also
added in some applications makefiles as the first link option.
The problem is this option is really useful only if added after
the objects or libraries using it (except if using --whole-archive).
And the -lrt options put after were removed to avoid duplicates.
It was resulting in errors linking test application:
eal_timer.c:(.text+0x128): undefined reference to `clock_gettime'
eal_timer.c:(.text+0x166): undefined reference to `clock_gettime'
eal_alarm.c:(.text+0xda): undefined reference to `clock_gettime'
eal_alarm.c:(.text+0x211): undefined reference to `clock_gettime'
It is fixed by removing superfluous -lrt in app makefiles.
Fixes: 281948b475 ("mk: fix missing librt dependencies")
Fixes: 2f6414f4ba ("mk: fix static link with glibc < 2.17")
Reported-by: Piotr Azarewicz <piotrx.t.azarewicz@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
-dumpversion is for gcc compatibility and doesn't return actual clang
version. -dumpversion only returns 4.2.1 for a long time.
Fixes: 2ef6eea891 ("mk: add clang toolchain")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
Add a git tree line for the virtio/vhost section, to make an explicit
statement that the developers are suggested to make patches based on
that tree.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
There are now 2 different sections for drivers/net/ and drivers/crypto/.
It makes possible to declare some dedicated git trees.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The following tools may be installed system-wide.
It may be cleaner and more convenient to find them with the same
dpdk- prefix (especially for autocompletion).
Moreover, the script dpdk_nic_bind.py deserves a new name because it is
not restricted to NICs and can be used for e.g. crypto.
These files are renamed:
pmdinfogen -> dpdk-pmdinfogen
pmdinfo.py -> dpdk-pmdinfo.py
dpdk_pdump -> dpdk-pdump
dpdk_proc_info -> dpdk-procinfo
dpdk_nic_bind.py -> dpdk-devbind.py
setup.sh -> dpdk-setup.sh
The tools pmdinfogen, pmdinfo.py and dpdk_pdump are new in 16.07.
The scripts dpdk_nic_bind.py and setup.sh may have been used with
previous releases by end users. That's why a symbolic link still
provide the old name in the installed tools directory.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Update the Sphinx installation instructions in the documentation
contributors guide to reflect the fact that in the 1.4+ versions
of Sphinx the ReadTheDocs theme must also be installed. Previously,
in version 1.3.x, it was installed by default.
Also change 'yum' to 'dnf' for package installations.
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Fix warnings raised by Python Sphinx 1.4.5:
guides/sample_app_ug/ip_pipeline.rst:334:
WARNING: Could not lex literal_block as "ini". Highlighting skipped.
guides/sample_app_ug/l2_forward_real_virtual.rst:467:
WARNING: Could not lex literal_block as "c". Highlighting skipped.
guides/sample_app_ug/l3_forward.rst:293:
WARNING: Could not lex literal_block as "c". Highlighting skipped.
guides/sample_app_ug/vm_power_management.rst:162:
WARNING: Could not lex literal_block as "xml". Highlighting skipped.
These warnings arise from invalid syntax in code-block directives.
Fixes: f1e779ec5b ("doc: update ip pipeline app guide")
Fixes: d0dff9ba44 ("doc: sample application user guide")
Fixes: c75f4e6a7a ("doc: add vm power mgmt app")
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Flow Bifurcation is a mechanism which uses features of advanced
Ethernet devices to split traffic between queues. It provides
the capability to let the kernel driver and DPDK driver co-exist
and take advantage of both.
It is achieved by using SR-IOV and the NIC's advanced filtering. This
patch describes Flow Bifurcation and adds the user guide for ixgbe
and i40e NICs.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
This patch adds an image of the Live Migration of a VM using vhost_user
on the host, test configuration.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
This patch describes the procedure to be be followed to perform
Live Migration of a VM with Virtio PMD running on a host which
is running the vhost_user sample application (vhost-switch).
It includes sample host and VM scripts used in the procedure.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
This patch adds an image of the Live Migration for
virtio and sriov test configuration.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
This patch describes the procedure to be be followed
to perform Live Migration of a VM with Virtio and VF PMD's
using the bonding PMD.
It includes sample host and VM scripts used in the procedure,
and a sample switch configuration.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
- Fix vhost setup flags
- Add minor edits to improve readability and consistency
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
The vhost feature negotiation only happens at virtio reset stage, say
when a virtio-net device is firstly initiated, or when DPDK virtio PMD
initiates. That means, if vhost APP restarts after the negotiation and
reconnects, the feature negotiation process will not be triggered again,
meaning the info is lost. To make reconnect work, QEMU simply saves
the negotiated features before the restart and restores it afterwards.
Therefore, the vhost supported features must be exactly the same before
and after the restart. For example, if TSO is disabled and then enabled,
nothing will work and undefined issues might happen.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
icc version 16.0.2, compile error:
examples/l2fwd-ivshmem/host/host.c(157):
error #3656: variable "total_vm_packets_dropped"
may be used before its value is set
total_vm_packets_dropped += ctrl->vm_ports[portid].stats.dropped;
^
Fixes: 6aa4972491 ("examples/l2fwd-ivshmem: import sample application")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Function create_mp_ring_vdev() for failure cases exits without
freeing the created rte rings, because of this, pdump tool cannot be
rerun successfully. Added rte ring cleanup logic upon failures.
Fixes: caa7028276 ("app/pdump: add tool for packet capturing")
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
i40e driver was removing elements when iterating tailq lists
with TAILQ_FOREACH macro, which is not safe.
It is especially visible since the memory is zeroed on free
(commit ea0bddbd14).
Instead, TAILQ_FOREACH_SAFE macro is used when removing/freeing
these elements.
Fixes: 4861cde461 ("i40e: new poll mode driver")
Fixes: 440499cf53 ("net/i40e: support floating VEB")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Removing/freeing elements elements within a TAILQ_FOREACH loop is not safe.
FreeBSD defines TAILQ_FOREACH_SAFE macro, which permits
these operations safely.
This patch defines this macro for Linux systems, where it is not defined.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
In rte_mem_virt2phy: Value returned from a function and indicating the
number of bytes was ignored. This could cause a wrong pfn (page frame
number) mask read from pagemap file.
When read returns less than the number of sizeof(uint64_t) bytes,
function rte_mem_virt2phy returns error.
Coverity issue: 13212
Fixes: 40b966a211 ("ivshmem: library changes for mmaping using ivshmem")
Signed-off-by: Michal Jastrzebski <michalx.k.jastrzebski@intel.com>