Do not set FDIRCTRL.DROP_NO_MATCH in ixgbe_init_fdir_perfect_82599(),
this bit is already set in ixgbe_set_fdir_drop_queue_82599() which
makes more sense for drivers that call that function.
This resolves an issue where packets were being dropped when switching
to perfect filters mode.
Setting this bit makes no sense in perfect filters mode for the
driver as we do not want to route all packets that don't match an FDIR
rule to a single queue and instead fall back to RSS.
Drivers that need this bit set can call ixgbe_set_fdir_drop_queue_82599()
and the ones that don't, can preserve the old behavior.
Fixes: 2241ce281646 ("ixgbe/base: add flow director drop queue")
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
The X550EM_a device provides the MAC_SGMII_BUSY register to
indicate when slow SGMII register writes complete. Add
definitions for the register. No definitions are provided for
the individual bits under the theory that it is better to wait
for everything to complete when needed rather than try to map
out which reads need to wait for which writes. So we should wait
when anything is marked as "busy".
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Instead of not defining the callback for set_phy_power when
manageability is enabled, put the check in the set_phy_power
function so that only turning the power off is conditional on
management, but not turning the PHY on.
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
This patch resolves an issue where VF mac address is zeroed out
in cases where the VF driver is loaded while the PF interface
is down.
The solution is to only set it when we get an ACK from the PF.
Fixes: 6202266e5680 ("ixgbe/base: vf changes")
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Only x550em_x V1 was supported before. Now V2 is supported.
A mask for V1 and V2 is defined and used to support both.
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Add new X550EM_a devices and their mac types, X550EM_a
and X550EM_a_vf.
Update the code to use the new devices and mac types.
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Currently, ixgbe vf and pf will disable interrupt twice in
stop stage and uninit stage. It will cause an error:
testpmd> quit
Shutting down port 0...
Stopping ports...
Done
Closing ports...
EAL: Error disabling MSI-X interrupts for fd 26
Done
because the interrupt has already been disabled in stop stage.
Since it is enabled in init stage, better remove from
stop stage.
Fixes: 0eb609239efd ("ixgbe: enable Rx queue interrupts for PF and VF")
Signed-off-by: Michael Qiu <michael.qiu@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
The freeing of mbuf's in ixgbe is one of the observable hot spots
under load. Optimize it by doing bulk free of mbufs using code similar
to i40e and fm10k.
Drop the no longer needed micro-optimization for the no refcount flag.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This brings the DPDK igb driver inline with the behavior used by
the current Linux driver. The IGB hardware has several different
MAC types and the threshold values that work vary based on the hardware.
Since DPDK 1.8 it has been up to devices to provide the correct default
configuration parameter. But the igb driver gives values that are broken
on some devices, and always causes a warning message at startup.
Please test this on real hardware, I don't have the luxury of a
hardware lab full of variations of this chip.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Allow reprogramming of the RAR with a zero mac address,
to ensure that the VF traffic goes to the PF after
stop, close and detach of the VF.
Fixes: be2d648a2dd3 ("igb: add PF support")
Fixes: d82170d27918 ("igb: add VF support")
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Enable promiscuous and allmulticast mode control from the VF using
rte_eth_promiscuous_enable()/rte_eth_promiscuous_disable() and
rte_eth_allmulticast_enable()/rte_eth_allmulticast_disable().
For promiscuous mode host/PF igb driver should be built with
IGB_ENABLE_VF_PROMISC.
For allmulticast mode "allmulti" flag should be set for appropriate PF
ifconfig eth0 allmulti
Signed-off-by: Yury Kylulin <yury.kylulin@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Modified driver and eal code to support I217 and I218 Intel NICs.
Compiled and tested (via testpmd) on Ubuntu 14.04 for target
x86_64-native-linuxapp-gcc
Compiled for target x86_64-native-linuxapp-clang
Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
The last packet of the tx burst function array was not being
emitted until the subsequent call. The nic descriptor index
was being set to the current tx descriptor instead of one past
the descriptor as required by the nic.
Fixes: d739ba4c6abf ("enic: improve Tx packet rate")
Signed-off-by: John Daley <johndale@cisco.com>
This is a wholesale replacement of the Enic PMD receive path in order
to improve performance and code clarity. The changes are:
- Simplify and reduce code path length of receive function.
- Put most of the fast-path receive functions in one file.
- Reduce the number of posted_index updates (pay attention to
rx_free_thresh)
- Remove the unneeded container structure around the RQ mbuf ring
- Prefetch next Mbuf and descriptors while processing the current one
- Use a lookup table for converting CQ flags to mbuf flags.
Signed-off-by: John Daley <johndale@cisco.com>
The enic PMD driver send function uses a constant offset instead
of relying on the data_off in the mbuf to find the start of the packet.
Fixes: fefed3d1e62c ("enic: new driver")
Signed-off-by: Yoann Desmouceaux <ydesmouc@cisco.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Chelsio NIC ports share a single PF. Move rte_eth_copy_pci_info()
to copy the pci device information to the remaining ports as well.
Also update license year to 2016.
Fixes: eeefe73f0af1 ("drivers: copy PCI device info to ethdev data")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
max_rx_pkt_len already includes ETHER_HDR_LEN and ETHER_CRC_LEN for the
mtu. But, the firmware also adds ETHER_HDR_LEN and ETHER_CRC_LEN to the
mtu specified. Fix by subtracting these values from the mtu before
passing it to firmware.
Fixes: 4b2eff452d2e ("cxgbe: enable jumbo frames")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
The size of each entry in the port's rss table is actually 2 bytes
and not 1 byte. A segfault occurs when accessing part of port 0's rss
table because it gets overwritten by subsequent port 1's part of the
rss table. Fix by setting the size of each entry appropriately.
Fixes: 92c8a63223e5 ("cxgbe: add device configuration and Rx support")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
The VF needs to determine the queues sizes before .dev_infos_get
so that it can hint to the upper layer the proper sizes. Move
bnx2x_vf_get_resources() to .eth_dev_init and probe with the guesses
from bnx2x_init_rte().
Signed-off-by: Chas Williams <3chas3@gmail.com>
Acked-by: Rasesh Mody <rasesh.mody@qlogic.com>
bnx2x_loop_obtain_resources() returns a struct containing the status and
the error message. If bnx2x_do_req4pf() fails, it shouldn't return both
of these fields set to 0 indicating failure and no error.
Further, bnx2x_do_req4pf() needs to be able fail and return NO_RESOURCES
so that bnx2x_loop_obtain_resources() can negotiate reduced resource
requirments. This requires additional checking around bnx2x_do_req4pf().
Fixes: 540a211084a7 ("bnx2x: driver core")
Signed-off-by: Chas Williams <3chas3@gmail.com>
Acked-by: Rasesh Mody <rasesh.mody@qlogic.com>
The mbuf_alloc_size is leftover from BSD or some other code base.
It is set but never used in DPDK driver. After that the related defines
can also be eliminated.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Harish Patil <harish.patil@qlogic.com>
Fixes: the hung/crash issue when quitting testpmd under high
traffic rate. The following issue were found and fixed.
1. edesc->size is not initialized properly in mpipe_do_xmit() and could
cause buffer leak or corruption when HW buffer return is used.
2. Check the 'idesc.be' error bit in mpipe_recv_flush() to make sure
buffer is valid before releasing it. This is to avoid issues when
running out of buffers.
3. priv->rx_buffers counter is not accurate when HW buffer return is
used. Remove this counter to simplify the code.
Signed-off-by: Liming Sun <lsun@ezchip.com>
Acked-by: Zhigang Lu <zlu@ezchip.com>
Mpipe link structure is initialized in function mpipe_link_init().
Currently it's only called from the eth_dev_ops.dev_start, which
caused crashes when link mgmt APIs (like promiscuous_enable)
was called before eth_dev_ops.dev_start(). This submit fixed it
by calling mpipe_link_init() in rte_pmd_mpipe_devinit().
Fixes: a8dd50513dea ("mpipe: add TILE-Gx mPIPE poll mode driver")
Signed-off-by: Liming Sun <lsun@ezchip.com>
Acked-by: Zhigang Lu <zlu@ezchip.com>
This submit has changes to optimize the mpipe buffer return. When
a packet is received, instead of allocating and refilling the
buffer stack right away, it tracks the number of pending buffers,
and use HW buffer return as an optimization when the pending
number is below certain threshold, thus save two MMIO writes and
improves performance especially for bidirectional traffic case.
Signed-off-by: Liming Sun <lsun@ezchip.com>
Acked-by: Zhigang Lu <zlu@ezchip.com>
The CROSS variable has empty default value (for native) and
must be set when using a cross-toolchain.
Signed-off-by: Liming Sun <lsun@ezchip.com>
Acked-by: Zhigang Lu <zlu@ezchip.com>
Currently, default values of kickfd and callfd are -1.
If the values are -1, current code guesses kickfd and callfd haven't
been initialized yet. Then vhost library will guess the virtqueue isn't
ready for processing.
But callfd and kickfd will be set as -1 when "--enable-kvm"
isn't specified in QEMU command line. It means we cannot treat -1 as
uninitialized state.
The patch defines -1 and -2 as VIRTIO_INVALID_EVENTFD and
VIRTIO_UNINITIALIZED_EVENTFD, and uses VIRTIO_UNINITIALIZED_EVENTFD for
the default values of kickfd and callfd.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
If a malicious guest forges a dead loop chain, it could lead to a dead
loop of copying the desc buf to mbuf, which results to all mbuf being
exhausted.
Add a var nr_desc to avoid such case.
Suggested-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
A malicious guest may easily forge some illegal vring desc buf.
To make our vhost robust, we need make sure desc->next will not
go beyond the vq->desc[] array.
Suggested-by: Rich Lane <rich.lane@bigswitch.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We need make sure that desc->len is bigger than the size of virtio net
header, otherwise, unexpected behaviour might happen due to "desc_avail"
would become a huge number with for following code:
desc_avail = desc->len - vq->vhost_hlen;
For dequeue code path, it will try to allocate enough mbuf to hold such
size of desc buf, which ends up with consuming all mbufs, leading to no
free mbuf is available. Therefore, you might see an error message:
Failed to allocate memory for mbuf.
Also, for both dequeue/enqueue code path, while it copies data from/to
desc buf, the big "desc_avail" would result to access memory not belong
the desc buf, which could lead to some potential memory access errors.
A malicious guest could easily forge such malformed vring desc buf. Every
time we restart an interrupted DPDK application inside guest would also
trigger this issue, as all huge pages are reset to 0 during DPDK re-init,
leading to desc->len being 0.
Therefore, this patch does a sanity check for desc->len, to make vhost
robust.
Reported-by: Rich Lane <rich.lane@bigswitch.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
VIRTIO_NET_F_MRG_RXBUF is a default feature supported by vhost.
Adding unlikely for VIRTIO_NET_F_MRG_RXBUF detection doesn't
make sense to me at all.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
First of all, rte_memcpy() is mostly useful for copying big packets
by leveraging hardware advanced instructions like AVX. But for virtio
net hdr, which is 12 bytes at most, invoking rte_memcpy() will not
introduce any performance boost.
And, to my suprise, rte_memcpy() is VERY huge. Since rte_memcpy()
is inlined, it increases the binary code size linearly every time
we call it at a different place. Replacing the two rte_memcpy()
with directly copy saves nearly 12K bytes of code size!
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Current virtio_dev_merge_rx() implementation just looks like the
old rte_vhost_dequeue_burst(), full of twisted logic, that you
can see same code block in quite many different places.
However, the logic of virtio_dev_merge_rx() is quite similar to
virtio_dev_rx(). The big difference is that the mergeable one
could allocate more than one available entries to hold the data.
Fetching all available entries to vec_buf at once makes the
difference a bit bigger then.
The refactored code looks like below:
while (mbuf_has_not_drained_totally || mbuf_has_next) {
if (this_desc_has_no_room) {
this_desc = fetch_next_from_vec_buf();
if (it is the last of a desc chain)
update_used_ring();
}
if (this_mbuf_has_drained_totally)
mbuf = fetch_next_mbuf();
COPY(this_desc, this_mbuf);
}
This patch reduces quite many lines of code, therefore, make it much
more readable.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This is a simple refactor, as there isn't any twisted logic in old
code. Here I just broke the code and introduced two helper functions,
reserve_avail_buf() and copy_mbuf_to_desc() to make the code more
readable.
Also, it saves nearly 1K bytes of binary code size.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The current rte_vhost_dequeue_burst() implementation is a bit messy
and logic twisted. And you could see repeat code here and there.
However, rte_vhost_dequeue_burst() acutally does a simple job: copy
the packet data from vring desc to mbuf. What's tricky here is:
- desc buff could be chained (by desc->next field), so that you need
fetch next one if current is wholly drained.
- One mbuf could not be big enough to hold all desc buff, hence you
need to chain the mbuf as well, by the mbuf->next field.
The simplified code looks like following:
while (this_desc_is_not_drained_totally || has_next_desc) {
if (this_desc_has_drained_totally) {
this_desc = next_desc();
}
if (mbuf_has_no_room) {
mbuf = allocate_a_new_mbuf();
}
COPY(mbuf, desc);
}
Note that the old patch does a special handling for skipping virtio
header. However, that could be simply done by adjusting desc_avail
and desc_offset var:
desc_avail = desc->len - vq->vhost_hlen;
desc_offset = vq->vhost_hlen;
This refactor makes the code much more readable (IMO), yet it reduces
binary code size.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Declare dst as type uint32_t instead of uint64_t, otherwise, we will get
a random upper 32 bit feature bits, as the following io port read reads
lower 32 bit only. It could lead a feature bits that include VIRTIO_F_VERSION_1
(the 32th bit) for legacy virtio, which is obviously wrong.
Fixes: b8f04520ad71 ("virtio: use PCI ioport API")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: David Marchand <david.marchand@6wind.com>
The old code was doing a floating point divide for each rte_dequeue()
which is very expensive. Change to using fixed point scaled inverse
multiply. To maintain equivalent precision, scaled math is used.
The application ABI is the same.
This improved performance from 5Gbit/sec to 10 Gbit/sec when configured
for 10 Gbit/sec rate.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This adds (with permission of the original author)
reciprocal divide based on algorithm in Linux.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Add DT_NEEDED entries for librte_eal external dependencies.
Details between the platforms differ somewhat, and for static
builds they need to be handled from mk/exec-env still.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Add DT_NEEDED entries for external library dependencies which
are the most critical ones for sane operation.
Clean up vhost_cuse CFLAGS/LDFLAGS confusion while at it.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
There are two places that need -lm (test app and librte_sched) and
exactly one that needs -lrt (librte_sched). Add the relevant
DT_NEEDED entries to both, and eliminate the bogus discrepancy
between Linux and BSD EXECENV_LDLIBS wrt these libs.
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Added testpmd support to validate zero nb_rxq/nb_txq
changes of ethdev (d505ba8).
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Tell the compiler to use unsigned constants for left shift ops,
otherwise building with gcc >= 6.0 fails due to multiple warnings like:
warning: left shift of negative value [-Wshift-negative-value]
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
The pass-through pipeline implementation is extended with load balancing
function. This function allows uniform distribution of the packets among
its output ports. For packets distribution, any application level logic
can be applied. For instance, in this implementation, hash value
computed over specific header fields of the incoming packets has been
used to spread traffic uniformly among the output ports.
The following pass-through configuration can be used for implementing
load balancing function over ipv4 traffic;
[PIPELINE0]
type = PASS-THROUGH
core = 0
pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
pktq_out = TXQ0.0 TXQ1.0 TXQ2.0 TXQ3.0
dma_src_offset = 278; mbuf (128) + headroom (128) + 1st ethertype offset (14) + ttl offset within ip header = 278 (ipv4)
dma_dst_offset = 128; mbuf (128)
dma_size = 16
dma_src_mask = 00FF0000FFFFFFFFFFFFFFFFFFFFFFFF
dma_hash_offset = 144; (dma_dst_offset+dma_size)
lb = hash
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
This patch add packet dumping feature to ip_pipeline. Output port type
SINK now supports dumping packets to PCAP file before releasing mbuf back
to mempool. This feature can be applied by specifying parameters in
configuration file as shown below:
[PIPELINE1]
type = PASS-THROUGH
core = 1
pktq_in = SOURCE0 SOURCE1
pktq_out = SINK0 SINK1
pcap_file_wr = /path/to/eth1.pcap /path/to/eth2.pcap
pcap_n_pkt_wr = 80 0
The configuration section "pcap_file_wr" contains full path and name of
the PCAP file which the packets will be dumped to. If multiple SINKs
exists, each shall have its own PCAP file path listed in this section,
separated by spaces. Multiple SINK ports shall NOT share same PCAP file to
be dumped.
The configuration section "pcap_n_pkt_wr" contains integer value(s)
and indicates the maximum number of packets to be dumped to the PCAP file.
If this value is "0", the "infinite" dumping mode will be used. If this
value is N (N > 0), the dumping will be finished when the number of
packets dumped to the file reaches N.
To enable PCAP dumping support to IP pipeline, the compiler option
CONFIG_RTE_PORT_PCAP must be set to 'y'. It is possible to disable this
feature by removing "pcap_file_wr" and "pcap_n_pkt_wr" lines from the
configuration file.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>