The page size is often retrieved from the macro PAGE_SIZE.
If PAGE_SIZE is not defined, it is either using hard coded default,
or getting the system value from the UNIX-only function sysconf().
Such definitions are replaced with the generic function
rte_mem_page_size() defined for each supported OS.
Removing PAGE_SIZE definitions will fix dlb drivers for musl libc,
because #ifdef checks were missing, causing redefinition errors.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Boyer <aboyer@pensando.io>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
When a VF device is present, netvsc can send or receive packets over the
VF device. The VF device driver communicates directly with the PCI device
via the PF from the host hypervisor. This is faster than exchanging data
with netvsp via vmbus, i.e. syntheic path.
In Azure and Hyper-v environments, VF device can be hot added or hot
removed at anytime while guest VM is running. This patch improves netvsc
to support VF device hot add/remove.
1. netvsc monitors all system hot add activities over the PCI bus. When it
detects a VF device is added to the system and is managed under this
netvsc device, it asks EAL to probe and start this VF device, then it
attaches and switches data path to the VF device.
2. After a VF device is attached to netvsc, netvsc monitors this device on
hot remove. When this VF device is hot removed, netvsc switches data path
to synthetic, stops this VF device and removes it from EAL.
3. If any failure happens during a VF device hot remove or add, the netvsc
falls back to synthetic path for all data traffic.
Signed-off-by: Long Li <longli@microsoft.com>
When receiving packets, netvsp puts data in a buffer mapped through UIO.
Depending on packet size, netvsc may attach the buffer as an external
mbuf. This is not a problem if this mbuf is consumed in the application,
and the application can correctly read data out of an external mbuf.
However, there are two problems with data in an external mbuf.
1. Due to the limitation of the kernel UIO implementation, physical
address of this external buffer is not exposed to the user-mode. If
this mbuf is passed to another driver, the other driver is unable to
map this buffer to iova.
2. Some DPDK applications are not aware of external mbuf, and may bug
when they receive an mbuf with external buffer attached.
Introduce a driver parameter "rx_extmbuf_enable" to control if netvsc
should use external mbuf for receiving packets. The default value is 0.
(netvsc doesn't use external mbuf, it always allocates mbuf and copy
data to mbuf) A non-zero value tells netvsc to attach external buffers
to mbuf on receiving packets, thus avoid copying memory.
Signed-off-by: Long Li <longli@microsoft.com>
The values for Rx and Tx copy break should be tunable rather
than hard coded constants.
The rx_copybreak sets the threshold where the driver uses an
external mbuf to avoid having to copy data. Setting 0 for copybreak
will cause driver to always create an external mbuf. Setting
a value greater than the MTU would prevent it from ever making
an external mbuf and always copy. The default value is 256 (bytes).
Likewise the tx_copybreak sets the threshold where the driver
aggregates multiple small packets into one request. If tx_copybreak
is 0 then each packet goes as a VMBus request (no copying).
If tx_copybreak is set larger than the MTU, then all packets smaller
than the chunk size of the VMBus send buffer will be copied; larger
packets always have to go as a single direct request. The default
value is 512 (bytes).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Long Li <longli@microsoft.com>
When sending data, netvsc assumes the tx_rndis buffer is contiguous and
calculates physical addresses based on this assumption.
Use memzone to allocate tx_rndis so it's guaranteed that this buffer is
physically contiguous.
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
Change eth_dev_stop_t return value from void to int.
Make eth_dev_stop_t implementations across all drivers to return
negative errno values if case of error conditions.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The API function rte_eth_dev_close() was returning void.
The return type is changed to int for notifying of errors.
If an error happens during a close operation,
the status of the port is undefined,
a maximum of resources having been freed.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Liron Himi <lironh@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
netvsc uses rxbuf_info buffer to track received packets attached via
rte_pktmbuf_attach_extbuf() and ack the host based on usage count. It
uses the transaction_id in the VMBus packet to locate where to use
memory in the rxbuf_info.
This is not correct in multiple channel setup, as different channels may
return identical transaction_ids at a time, and may corrupt the
rxbuf_info buffer.
Fix this by defining rxbuf_info for each queue.
Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
When rte_pktmbuf_attach_extbuf() is used, the driver should not decrease
the reference count in its callback function hn_rx_buf_free_cb, because
the reference count is already decreased by rte_pktmbuf. Doing it twice
may result in underflow and driver may never send an ack packet over
vmbus to host.
Also declares rxbuf_outstanding as atomic, because this value is shared
among all receive queues.
Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
When the primary device link state is queried, there is no
need to query the VF state as well. The application only sees
the state of the synthetic device.
Fixes: dc7680e859 ("net/netvsc: support integrated VF")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The PMD_TX_LOG and PMD_RX_LOG can hide errors since this
debug log is typically disabled. Change the code to use
PMD_DRV_LOG for errors.
Under load, the ring buffer to the host can fill.
Add some statistics to estimate the impact and see other errors.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
These functions are useful for applications and debugging.
The netvsc PMD also transparently handles the rx/tx descriptor
functions for underlying VF device.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
There is not a lot of info here from this driver.
But worth supporting these additional info queries.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
With multiple channels, the primary channel may receive notification
that VF has been added or removed while secondary channel is in
process of doing receive or transmit. Resolve this race by converting
existing vf_lock to a reader/writer lock.
Users of lock (tx/rx/stats) acquire for read, and actions like
add/remove acquire it for write.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The netvsc PMD was putting the mac address in private data but the
core rte_ethdev doesn't allow that it. It has to be in rte_malloc'd
memory or a message will be printed on shutdown/close.
EAL: Invalid memory
Fixes: f8279f47dd ("net/netvsc: fix crash in secondary process")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The VMBus has reserved transmit area (per device) and transmit
descriptors (per queue). The previous code was always having a 1:1
mapping between send buffers and descriptors.
This can lead to one queue starving another and also buffer bloat.
Change to working more like FreeBSD where there is a pool of transmit
descriptors per queue. If send buffer is not available then no
aggregation happens but the queue can still drain.
Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Enabling/disabling of allmulticast mode is not always successful and
it should be taken into account to be able to handle it properly.
When correct return status is unclear from driver code, -EAGAIN is used.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Change return value of the callbacks from void to int. Make
implementations across all drivers return negative errno
values in case of error conditions.
Both callbacks are updated together because a large number of
drivers assign the same function to both callbacks.
Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Enabling/disabling of promiscuous mode is not always successful and
it should be taken into account to be able to handle it properly.
When correct return status is unclear from driver code, -EAGAIN is used.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
rte_eth_dev_info_get() return value was changed from void to int,
so this patch modify rte_eth_dev_info_get() usage across
net/netvsc according to its new return type.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The id values for VF stats were not being offset correctly.
And getting xstats for VF device only worked if VF device supported
it; it did not support the generic stats.
Fixes: dc7680e859 ("net/netvsc: support integrated VF")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add 'rte_' prefix to structures:
- rename struct ether_addr as struct rte_ether_addr.
- rename struct ether_hdr as struct rte_ether_hdr.
- rename struct vlan_hdr as struct rte_vlan_hdr.
- rename struct vxlan_hdr as struct rte_vxlan_hdr.
- rename struct vxlan_gpe_hdr as struct rte_vxlan_gpe_hdr.
Do not update the command line library to avoid adding a dependency to
librte_net.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
When dev_close is called, the netvsc driver will clean up all
queues including the primary ring buffer.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
The VF device management in netvsc was using a pointer to the
rte_eth_devices. But the actual rte_eth_devices array is likely to
be place in the secondary process; which causes a crash.
The solution is to record the port of the VF (instead of a pointer)
and find the device in the per process array as needed.
Fixes: dc7680e859 ("net/netvsc: support integrated VF")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
On device close or startup errors, the transmit descriptor pool
was being left behind.
Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Provide API's to enable allmulticast and promiscuous in Netvsc PMD
with VF. This keeps the VF and PV path in sync.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Integrate accelerated networking support into netvsc PMD.
This allows netvsc to manage VF without using failsafe or vdev_netvsc.
For the exception vswitch path some tests like transmit
get a 22% increase in packets/sec.
For the VF path, the code is slightly shorter but has no
real change in performance.
Pro:
* using netvsc is more like other DPDK NIC's
* the exception packet uses less CPU
* much smaller code size
* no locking required on VF transmit/receive path
* no legacy Linux network device to get mangled by userspace
* much simpler (1K vs 9K) LOC
* unified extended statistics
Con:
* using netvsc has more complex startup model
* no bifurcated driver support
* no flow support (since host does not have flow API).
* no tunnel offload support
* no receive interrupt support
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Implement callback functionality on link state changes.
This is not really driven off of interrupt file descriptor like most other
PMD's. Instead, it happens when a link state change message arrives
in the common ring buffer.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
If application sends faster than vswitch can keep up, then the
transmit descriptor pool will be exhausted. This is not a failure
so change the name statistic and don't include it in oerrors.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
The event buffer was changed to be a fixed size value,
had a couple of issues. The big one is that rte_free was still
being called for a pointer that was not setup with rte_malloc().
The event buffer was also too small to handle heavy receive
traffic; and running the event buffer out would crash
the application.
Fix by going back to a dynamically resized event buffer.
And grow it by 25% to avoid lots of realloc's.
Fixes: 530af95a78 ("bus/vmbus: avoid signalling host on read")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Add tx_done_cleanup ethdev hook to allow application to
control if/when it wants completions to be handled.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Tune the vmbus connection so the host scans faster. This improves
transmit performance. The host default value is 100us but setting
to 50us reduces packet loss significantly.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Don't signal host that receive ring has been read until all events
have been processed. This reduces the number of guest exits and
therefore improves performance.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
The driver supports Hyper-V networking directly like
virtio for KVM or vmxnet3 for VMware.
This code is based off of the FreeBSD driver. The file and variable
names are kept the same to help with understanding (with most of the
BSD style warts removed).
This version supports the latest NetVSP 6.1 version and
older versions.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>