This file defines the operations to be implemented by
any Packet Framework table.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
Source port is a packet generator, similar to /dev/zero Linux device.
Sink port is a packet terminator (drops all input packets), similar
to /dev/null Linux device.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
The QoS hierarchical scheduler presented as Packet Framework port.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
The IPv4 reassembly operation is presented as a Packet Framework port.
The code duplication with examples/ip_reassembly sample application
to be addressed soon by linking the relevant library once upstreamed.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
[Thomas: update to new ip_frag library]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This port presents the IPv4 fragmentation operation as a Packet Framework port.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
[Thomas: update to new ip_frag library]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
ring_reader input port (on top of single consumer rte_ring)
ring writer output port (on top of single producer rte_ring)
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
The input port ethdev_reader implements the Packet Framework port API
on top of the Intel DPDK poll mode driver for a NIC RX queue.
The output port ethdev_writer implements the Packet Framework port API
on top of the Intel DPDK poll mode driver for a NIC TX queue.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
This file defines the port operations that have to be implemented
by Packet Framework ports.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
Added API function for LPM IPv4 and IPv6 to query for the existence
of a rule/route and return the next hop ID associated with the route
if route is present.
This is used by the Packet Framework LPM table for implementing a
routing table.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
Added zero-size field (offset in data structure) to specify the beginning
of packet meta-data in the packet buffer just after the mbuf.
The size of the packet meta-data is application specific and the packet
meta-data is managed by the application.
The packet meta-data should always be accessed through the provided macros.
This is used by the Packet Framework libraries (port, table, pipeline).
There is absolutely no performance impact due to this mbuf field, as it
does not take any space in the mbuf structure (zero-size field).
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
This patch adds following ixgbe NIC filters implement:
syn filter, ethertype filter, 5tuple filter for intel NIC 82599
Signed-off-by: jingjing.wu <jingjing.wu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Reviewed-by: Vladimir Medvedkin <medvedkinv@gmail.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch adds following igb NIC filters implement:
syn filter, ethertype filter, 2tuple filter, flex filter for intel NIC 82580 and i350
syn filter, ethertype filter, 5tuple filter for intel NIC 82576
Signed-off-by: jingjing.wu <jingjing.wu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Reviewed-by: Vladimir Medvedkin <medvedkinv@gmail.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch adds APIs for NIC filters list below:
ethertype filter, syn filter, 2tuple filter, flex filter, 5tuple filter
Signed-off-by: jingjing.wu <jingjing.wu@intel.com>
Reviewed-by: Vladimir Medvedkin <medvedkinv@gmail.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Mostly a copy-paste of IPv4, with a few caveats.
Only supported packets are those in which fragment extension header is
just after the IPv6 header.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Technically, fragmentation table can work for both IPv4 and IPv6
packets, so we're renaming everything to be generic enough to make sense
in IPv6 context.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Moved out debug log macros into common, as reassembly code will later
need them as well.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Issues were reported by checkpatch.pl.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Removing PCI ID list to make igb_uio more similar to a generic driver
like vfio-pci or pci_uio_generic. This is done to make it easier for
the binding script to support multiple drivers.
Note that since igb_uio no longer has a PCI ID list, it can now be
bound to any device, not just those explicitly supported by DPDK. In
other words, it now behaves similar to PCI stub, VFIO and other generic
PCI drivers.
Therefore to bind a new device to igb_uio, the user will now have to
first write its PCI ID to "new_id" file inside the igb_uio driver
directory, and only then write the PCI ID to "bind". This is reflected
in changes to PCI binding script as well.
There's a weird behaviour of sysfs when a new device ID is added to
new_id. Subsequent writing to "bind" will result in IOError on
closing the file. This error is harmless but it triggers the
exception anyway, so in order to work around that, we check if the
device was actually bound to the driver before raising an error.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Unlike igb_uio, VFIO interrupt type is not set by kernel module
parameters but is set up via ioctl() calls at runtime. This warrants
a new EAL command-line parameter. It will have no effect if VFIO is
not compiled, but will set VFIO interrupt type to either "legacy", "msi"
or "msix" if VFIO support is compiled. Note that VFIO initialization
will fail if the interrupt type selected is not supported by the system.
If the interrupt type parameter wasn't specified, VFIO will try all
interrupt types (starting with MSI-X).
In unit tests, we don't know if VFIO is compiled (eal_vfio.h header is
internal to Linuxapp EAL), so we check this flag regardless.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Add support for binding VFIO devices if RTE_PCI_DRV_NEED_MAPPING is set
for this driver. Try VFIO first, if not mapped then try IGB_UIO too.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Since VFIO cannot be used to map the same device twice, secondary
processes receive the device/group fd's by means of communicating over a
local socket. Only group and container fd's should be sent, as device
fd's can be obtained via ioctl() calls' on the group fd.
For multiprocess, VFIO distinguishes between existing but unused groups
(e.g. grups that aren't bound to VFIO driver) and non-existing groups in
order to know if the secondary process requests a valid group, or if
secondary process requests something that doesn't exist.
VFIO multiprocess sync communicates over a simple protocol. It defines
two requests - request for group fd, and request for container fd.
Possible replies are: SOCKET_OK (an OK signal), SOCKET_ERR (error
signal) and SOCKET_NO_FD (a signal that indicates that the requested
VFIO group is valid, but no fd is present for that group - indicating
that the respective group is simply not bound to VFIO driver).
Here is the logic in a nutshell:
1. secondary process sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
1a. in case of SOCKET_REQ_GROUP, client also then sends group number
2. primary process receives message
2a. in case of invalid group, SOCKET_ERR is sent back to secondary
2b. in case of unbound group, SOCKET_NO_FD is sent back to secondary
2c. in case of valid group, SOCKET_OK is sent and followed by fd
3. socket is closed
in case of any error, socket is closed and SOCKET_ERR is sent.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Adding code to support VFIO mapping (primary processes only). Most of
the things are done via ioctl() calls on either /dev/vfio/vfio (the
container) or a /dev/vfio/$GROUP_NR (IOMMU group).
In a nutshell, the code does the following:
1. creates a VFIO container (an entity that allows sharing IOMMU DMA
mappings between devices)
2. checks if a given PCI device is a member of an IOMMU group (if it's
not, this indicates that the device isn't bound to VFIO)
3. calls open() the group file to obtain a group fd
4. checks if the group is viable (that is, if all the devices in the
same IOMMU group are either bound to VFIO or not bound to anything)
5. adds the group to a container
6. sets up DMA mappings (only done once, mapping whole DPDK hugepage
memory for DMA, with a 1:1 correspondence of IOVA to PA)
7. gets the actual PCI device fd from the group fd (can fail, which
simply means that this particular device is not bound to VFIO)
8. maps BARs (MSI-X BAR cannot be mmaped, so skipping it)
9. sets up interrupt structures (but not enables them!)
10. enables PCI bus mastering
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Creating code to handle VFIO interrupts in EAL interrupts (supports all
types of interrupts).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Add VFIO compilation option to linuxapp config.
Adding a header that will determine if VFIO support should be compiled
in. If VFIO is enabled in config (and it's enabled by default), then the
header will also check for kernel version. If VFIO is enabled in config
and if the kernel version is 3.6+, then VFIO_PRESENT will be defined.
This is the macro that should be used to determine if VFIO support is
being compiled in.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Moving interrupt type enum out of igb_uio and renaming it to be more
generic. Such a strange header naming and separation is done mostly to
make coming virtio patches easier to port to dpdk.org tree.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Currently, igb_uio is always compiled. Some Linux distributions may not
want to include igb_uio with DPDK, so we need to make sure that igb_uio
compilation for Linuxapp targets can be optional.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Rename the RTE_PCI_DRV_NEED_IGB_UIO to be more generic.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Currently, EAL does not distinguish between actual failures and expected
initialization errors. E.g. sometimes the driver fails to initialize
because it was not supposed to be initialized in the first place, such
as device not being managed by said driver.
This patch makes EAL fail on actual initialization errors while still
skipping over expected initialization errors.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Separating mapping code and calls to open. This is a preparatory work
for VFIO patch since it'll need to map BARs too but it doesn't use path
in mapped_pci_resource. Also, renaming structs to be more generic.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This makes it possible to run DPDK without hugepage memory when VFIO
is used, as VFIO uses virtual addresses to set up DMA mappings.
Technically, malloc is just fine, but we want to guarantee that
memory will be page-aligned, so using mmap to be safe.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
eal_hpet.c was renamed to eal_timer.c and, thanks to code changes, does
not need the -Wno-return-type any more.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
New file containing optimized receive and transmit functions which
use 128bit vector instructions to improve performance. When conditions
permit, these functions will be enabled at runtime by the device
initialization routines already in the PMD.
The compilation of the vectorized RX and TX code paths is controlled by
a new setting in the build time configuration for the IXGBE driver. Also
added is a setting which allows an optional further performance increase
by disabling the use of the olflags field on packet RX.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: XiaonanX Zhang <xiaonanx.zhang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
[Thomas: code-style adjustments]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The ACL library is used to perform an N-tuple search over a set of rules with
multiple categories and find the best match for each category.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
[Thomas: some code-style changes]
This fixes style problems reported by checkpatch including:
* extra whitespace
* spaces before tabs
* strings broken across lines
* excessively long lines
* missing spaces after keywords
* unnecessary paren's in return statements
Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
This adds the code for a new Intel DPDK library for packet distribution.
The distributor is a component which is designed to pass packets
one-at-a-time to workers, with dynamic load balancing. Using the RSS
field in the mbuf as a tag, the distributor tracks what packet tag is
being processed by what worker and then ensures that no two packets with
the same tag are in-flight simultaneously. Once a tag is not in-flight,
then the next packet with that tag will be sent to the next available
core.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
[Thomas: add doxygen @file comment]
Allows to lookup four IP addresses in an LPM table.
Uses SSE instrincts.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Add API to support setting TX rate for a queue and a VF.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
It is implemented by enabling or disabling TX laser.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
This patch adds API to support the functionality of setting link up and down.
It can be used to repeatedly stop and restart RX/TX of a port without
re-allocating resources for the port and re-configuring the port.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
icc 12.1 complains about RTE_LOG() format:
"argument is incompatible with corresponding format string conversion"
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
If igb_alloc_rx_queue_mbufs() would fail to allocate an mbuf for RX queue,
it calls igb_rx_queue_release(rxq).
That causes rxq to be silently freed, without updating
dev->data->rx_queues[].
So any further reference to it will trigger the SIGSEGV.
Same thing in em PMD too.
To fix: igb_alloc_rx_queue_mbufs() should just return an error to the
caller and let upper layer to deal with the probem.
That's what ixgbe PMD is doing right now.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This commit removes trailing whitespace from lines in files. Almost all
files are affected, as the BSD license copyright header had trailing
whitespace on 4 lines in it [hence the number of files reporting 8 lines
changed in the diffstat].
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
[Thomas: remove spaces before tabs in libs]
[Thomas: remove more trailing spaces in non-C files]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Add __rte_unused to
pci_unbind_kernel_driver(struct rte_pci_device *dev)
Signed-off-by: Alan Carew <alan.carew@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Recent change to rte_dump_tailq (commit 591a9d7985),
which now uses a FILE parameter causes compilation to fail under FreeBSD
and sourced to a missing include of stdio.h.
Errors:
rte_tailq.h: unknown type name 'FILE' void rte_dump_tailq(FILE *f);
rte_memory.h: unknown type name 'FILE' void rte_dump_physmem_layout(FILE *f);
Signed-off-by: Alan Carew <alan.carew@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
If pcap_sendpacket() fails, then eth_pcap_tx shouldn't silently free that
mbuf and continue.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
The unit of allocated_size is MB, so the change below is made.
Otherwise, it will fail to free memory when available memory is not enough.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Since Linux kernel version 3.13.0,
the xen_create/destroy_contiguous_region() API has been changed,
and the first parameter is physical address in the API.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
The patch changes the way of reserving memory in Dom0 driver.
It will reserve memory at installing rte_dom0_mm.ko kernel module
instead of requesting memory dynamically during DPDK application startup.
Meanwhile, now driver requests memory size of 4M once first,
if it failed, and request memory size of 2M once.
The main reasons for these changes are as follows:
First, to reduce the impact of increasing in memory fragment
after system run a long time.
Second, to reduce number of memory segment.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch supports multiple queues feature in DPDK based virtio-net frontend.
It firstly gets max queue number of virtio-net from virtio PCI configuration and
then send command to negotiate the queue number with backend; When receiving and
transmitting packets, it negotiates multiple virtio-net queues which serve RX/TX;
To utilize this feature, the backend also need support multiple queues feature
and enable it.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch cleanups some coding style issue, and fixes some errors and warnings
reported by checkpatch.pl.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch implements queue start and stop functionality in IXGBE PMD;
it also enable hardware loopback for VMDQ mode in IXGBE PMD.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch adds API to support queue start and stop functionality for RX/TX.
It allows RX and TX queue is started or stopped one by one, instead of starting
and stopping all of them at the same time.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
1) Add a new function "rss_hash_conf_get" in the PMD API to retrieve the
current configuration of the RSS functions and/or of the RSS key used
by a NIC to compute the RSS hash of input packets.
The new function uses the existing data structure "rte_eth_rss_conf" for
returning the RSS hash configuration.
2) Add the ixgbe-specific function "ixgbe_dev_rss_hash_conf_get" and the
igb-specific function "eth_igb_rss_hash_conf_get" to retrieve the RSS
hash configuration of ixgbe and igb controllers respectively.
3) Add the command "show port X rss-hash [key]" in the testpmd application
to display the RSS hash configuration of port X.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
1) Add a new function "rss_hash_update" in the PMD API to dynamically
update the RSS flags and/or the RSS key used by a NIC to compute the RSS
hash of input packets.
The new function uses the existing data structure "rte_eth_rss_conf" for
the argument that contains the new hash flags and/or the new hash key to
use.
2) Add the ixgbe-specific function "ixgbe_dev_rss_hash_update" and the
igb-specific function "eth_igb_rss_hash_update" to update the RSS
hash configuration of ixgbe and igb controllers respectively.
Before changing anything, these 2 functions check that the update RSS
operation does not attempt to disable RSS, if RSS was enabled at port
initialization time, or does not attempt to enable RSS, if RSS was
disabled at port initialization time.
Note:
Configuring the RSS hash flags and the RSS key used by a NIC consists in
updating appropriate PCI registers of the NIC.
These operations have been manually tested with the interactive commands
"write reg" and "write regbit" of the testpmd application.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Each entry of the RSS redirection table (RETA) of igb and ixgbe ports
contains a 4-bit RX queue index, thus imposing RSS RX queue indices to
be strictly lower than 16.
In addition, if a RETA entry is configured with a RX queue index that is
strictly lower than 16, but is greater or equal to the number of RX queues
of the port, then all input packets whose RSS hash value indexes that RETA
entry are silently dropped by the NIC.
Make the function rte_eth_dev_rss_reta_update() check that RX queue indices
that are supplied in the reta_conf argument are strictly lower than
ETH_RSS_RETA_MAX_QUEUE (16) and are strictly lower than the number of
RX queues of the port.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
e1000_vfadapt type corresponds to 82576 VF devices,
check e1000_set_mac_type() for more details.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
When initializing a VF with no initial MAC address assigned by
the underlying Host PF driver, assign a default MAC address.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The VF_RESET message of the 82599 PF/VF communication protocol issued by a
a Guest VF driver may include an optional permanent MAC address assigned to
the VF by the Guest OS, in order to make it recorded into the 82599 RAR
registers by the Host PF driver.
To indicate the absence of this optional MAC address, the VF_RESET command
assumes that a NULL MAC address is sent, instead of using a dedicated bit
for this purpose. However, when sending a VF_RESET command with no permanent
MAC address, the function ixgbe_reset_hw_vf() of the 82599 VF driver
directly invokes the function ixgbe_write_mbx_vf() with a message that does
not include a NULL MAC address, wrongly assuming that this function fills in
with zero all unused mailbox data registers.
More globally, it is safer to explicitely reset to zero all remaining mailbox
data registers that are not used to store the content of a message, in order
to reset the data sent in a previous VF/PF exchange (in either side),
including the last exchange performed by another Guest OS to which that VF
was previously assigned.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
On a 82599 VF, the deletion of a dynamically added MAC address consists in
first flushing all added MAC addresses, then in adding again all remaining MAC
addresses.
For this purpose, the function ixgbevf_remove_mac_addr() parses the pool
of MAC addresses associated with a VF, and must skip the VF permanent MAC
address that is stored into it, as well as all NULL MAC addresses.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
During the initialization of a VF device, the rte_eth_dev_start() function
indirectly invokes the PMD "mac_addr_add" function with the permanent MAC
address assigned to the device.
In the case of 82599 VFs, this operation leads to exhausting the very
limited set of PF resources used to store VF MAC addresses.
To address this issue, do nothing in the function ixgbevf_add_mac_addr()
if the added MAC address is equal to the permanent MAC address of the VF.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Add missing PMD functions in the ixgbevf driver to add (respectively remove)
a MAC address to/from a 82599 VF.
For this purpose, these 2 functions use the VF/PF mailbox-based protocol.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
When latest Linux ixgbe PF is used, and DPDK VF is used in DPDK application,
jumbo frames are not received.
Also - if Linux ixgbe PF has MTU set to 1500 (default),
then normal sized packets can be received by DPDK VF.
However, if Linux PF has MTU > 1500, then DPDK VF receives no packets
(normal or jumbo).
With ixgbe_mbox_api_10 ixgbe simply didn't allow set VF MTU > 1514 for 82599.
With ixgbe_mbox_ajpi_11 it does, though now, if PF uses jumbo frames,
it simply disables RX for all VFs.
So to work with PF ithat using jumbo frames, at startup each VF has to:
1. negotiate with PF mbox_api_11.
2. Send to PF SET_LPE message with desired MTU.
Note, that if PF already uses MTU bigger then asked by the VF,
then PF wouldn't take any action.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
Bug: When a timer is running
- if rte_timer_stop is called, the pending decrement is
skipped (decremented only if the timer is pending) and due
to the update flag the future processing is skipped so the
timer is counted as pending while it is stopped. - the same
applies when rte_timer_reset is called but then the pending
statistics is additionally incremented so the timer is
counted pending twice.
Solution: decrement the pending
statistics after returning from the callback. If
rte_timer_stop was called, it skipped decrementing the
pending statistics. If rte_time_reset was called, the
pending statistics was incremented. If neither was called
and the timer is periodic, the pending statistics is
incremented when it is reloaded
Signed-off-by: Vadim Suraev <vadim.suraev@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Bug: when a periodic timer's callback is running, if another
timer is manipulated, the periodic timer is not reloaded.
Solution: set the update flag only if the modified timer is
in RUNNING state
Signed-off-by: Vadim Suraev <vadim.suraev@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Stop on EOF when reading commands from a file or a pipe.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Error of "implicit-function-declaration" can be seen when building KNI
kernel module on Linux kernel 3.6.10 platform, as follows.
lib/librte_eal/linuxapp/kni/igb_ethtool.c:
In function igb_get_eee:
lib/librte_eal/linuxapp/kni/igb_ethtool.c:
2441:4: error: implicit declaration of function
mmd_eee_adv_to_ethtool_adv_t
lib/librte_eal/linuxapp/kni/igb_ethtool.c:
In function igb_set_eee:
lib/librte_eal/linuxapp/kni/igb_ethtool.c:
2551:2: error: implicit declaration of function
ethtool_adv_to_mmd_eee_adv_t
The root cause is as follows.
On Fedora 18 with kernel 3.6.10, ETHTOOL_GEEE is defined in Linux
header file of "linux/ethtool.h", while is not defined in most of other
linux kernel versions.
mmd_eee_cap_to_ethtool_sup_t(), mmd_eee_adv_to_ethtool_adv_t() and
ethtool_adv_to_mmd_eee_adv_t() in kcompat.h are disabled by "#if
!defined(ETHTOOL_GEEE) || (RHEL_RELEASE_CODE && RHEL_RELEASE_CODE <=
RHEL_RELEASE_VERSION(6,4))", while are called in igb_get_eee() in
igb_ethtool.c which is enabled by "#ifdef ETHTOOL_GEEE".
Reported-by: Prashant Upadhyaya <prashant.upadhyaya@aricent.com>
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The rte_scheduler will get stuck and not deliver any more packets
if there are two active subports and then one of them stops enqueing
more packets. This is because of a bug in how the grinder state machines
are managed.
If a non-zero grinder is assigned (but not yet active), then the dequeue
would miss it and always return zero packets. The cure is to always
do a first pass over all grinders.
Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Fix build error if RTE_SCHED_DEBUG is enabled.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The existing rte scheduler can only be safely configured once per port
because a memory zone has a fixed size once it is created and can never
be freed or change in size.
This patch changes the scheduler to use rte_malloc instead. This allows
for a port to be reconfigured by doing rte_sched_port_free followed
rte_sched_port_config.
The patch also removes the now unused name parameter from the
port parameters structure.
Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
There was an indentation error in commit d93c252e88
"convert to use of PMD_REGISTER_DRIVER and fix linking"
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
There was an indentation error in commit e57f20e051
"make vdev init path generic for both virtual and pci devices"
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Now that we've converted all the pmds in dpdk to use the driver registration
macro, rte_pmd_init_all has become empty. As theres no reason to keep it around
anymore, just remove it and fix up all the eample callers.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Convert the vmxnet3 pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the vmxnet3 library
when building DSO's and must specify its use on the command line with the -d
option. Static linking will still initalize the driver automatically.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Convert the virtio pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the virtio library
when building DSO's and must specify its use on the command line with the -d
option. Static linking will still initalize the driver automatically.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Convert the ixgbevf pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the ixgbevf library
when building DSO's and must specify its use on the command line with the -d
option. Static linking will still initalize the driver automatically.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Convert the ixgbe pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the ixgbe library
when building DSO's and must specify its use on the command line with the -d
option. Static linking will still initalize the driver automatically.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Convert the igbvf pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the igbvf library
when building DSO's and must specify its use on the command line with the -d
option. Static linking will still initalize the driver automatically.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Convert the igb pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the igb library
when building DSO's and must specify its use on the command line with the -d
option. Static linking will still initalize the driver automatically.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Convert the e1000 pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the e1000 library
when building DSO's and must specify its use on the command line with the -d
option. Static linking will still initalize the driver automatically.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Currently, physical device pmds use a separate initalization path
(rte_pmd_init_all) while virtual devices use a constructor registration and
rte_eal_dev_init. Theres no reason to have them be separate. This patch
removes the vdev specific nomenclature from the vdev init path and makes it more
generic for use with all pmds. This is the first step in converting the
physical device pmds to using the same constructor based registration path that
the virtual devices use.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Convert the xenvirt driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the xenvirt library
when building DSO's and must specify its use on the command line with the -d
option. Static linking will still initalize the driver automatically.
A few notes:
xenvirt was unbuildable as of commit 4c39baf297d10c217e7d3e7370f26a1fede58308..
That commit neglected to include the rte_vdev.h header, so several structs were
left undefined. This patch includes a fix for that as well.
Also, The linkage for xenvirt is broken in much the same way pmd_ring was, in
that the xenvirt pmd has a function that is called directly from applications
(the example being the testpmd application). The function is
rte_mempool_gntalloc_create, and should clearly be moved into the rte_mempool
library, with the supporting code in the function implementation moved to a new
xenvirt library separate from the pmd. This is a large undertaking that
detracts from the purpose of this series however, and so for now, I'm leaving
the linkage to the application in place, and will address this issue in a later
series
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Convert the ring driver to use the PMD_REGISTER_DRIVER macro and fix up the
Makefile so that its linkage is only done if we are building static libraries.
This means that the test applications now have no reference to the ring library
when building DSO's and must specify its use on the command line with the -d
option. Static linking will still initalize the driver automatically.
Note that the ring driver was also written in such a way that it violated some
general layering principles, several functions were contained in the pmd which
were being called by example from the test application in the app/test
directory. Specifically it was calling eth_ring_pair_attach,
eth_ring_pair_create and rte_eth_ring_devinit, which should only be called
internally to the dpdk core library. To correct this I've removed those
functions, and instead allowed them to be called indirectly at initalization
time using the vdev command line argument key nodeaction=<name>:<node>:<action>
where action is one of ATTACH or CREATE. I've tested out the functionality of
the command line with the testpmd utility, with success, and have removed the
called functions from the test utility. This will affect how the test utility
is invoked (the -d and --vdev option will need to be specified on the command
line now), but honestly, given the way it was coded, I think the testing of the
ring pmd was not the best example of how to code with dpdk to begin with. I
have also left the two layer violating functions in place, so as not to break
existing applications, but added deprecation warnings to them so that apps can
migrate off them.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Convert the pcap driver to use the PMD_REGISTER_DRIVER macro and fix up the
Makefile so that its linkage is only done if we are building static libraries.
This means that the test applications now have no reference to the pcap library
when building DSO's and must specify its use on the command line with the -d
option. Static linking will still initalize the driver automatically.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Rather than have each driver have to remember to add a constructor to it to make
sure its gets registered properly, wrap that process up in a macro to make
registration a one line affair. This also sets the stage for us to make
registration of vdev pmds and physical pmds a uniform process
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Fix compilation error introduced by:
e5ac7c2ff3 eal: don't inline string functions
The stdio.h include is missing due to its removing from rte_string_fns.h.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Add function to iterate over mempool.
Useful for diagnostic code that wants to look at mempool usage patterns.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
When doing diagnostic function, it is useful to have a ability
to iterate over all memzones.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
It makes no sense to inline string functions, in fact snprintf
can't be inlined because the function supports variable number of
arguments.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
[Thomas: update includes]
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The DPDK dump functions are useful for remote debugging of an
applications. But when application runs as a daemon, stdout
is typically routed to /dev/null.
Instead change all these functions to take a stdio FILE * handle
instead. An application can then use open_memstream() to capture
the output.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
[Thomas: fix quota_watermark example]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Add a new specific packet processing engine in the "testpmd" application that
only replies to ARP requests and to ICMP echo requests.
For this purpose, a new "icmpecho" forwarding mode is provided that can be
dynamically selected with the following testpmd command:
set fwd icmpecho
before starting the receipt of packets on the selected ports.
Then, the "icmpecho" engine performs the following actions on all received
packets:
- replies to a received ARP request by sending back on the RX port a ARP
reply with a "sender hardware address" field containing the MAC address
of the RX port,
- replies to a ICMP echo request by sending back on the RX port a ICMP echo
reply, swapping the IP source and the IP destination address in the IP
header,
- otherwise, simply drops the received packet.
When replying to a received packet that was encapsulated into a VLAN tunnel,
the reply is sent back with the same VLAN identifier.
By default, the testpmd configures VLAN header stripping RX option on each
port.
This option is not managed by the icmpecho engine which won't detect
packets that were encapsulated into a VLAN.
To address this issue, the VLAN header stripping option must be previously
switched off with the following testpmd command:
vlan set strip off
When the "verbose" mode has been set with the testpmd command
"set verbose 1", the "icmpecho" engine displays informations about each
received packet.
The "icmpecho" forwarding engine can also be used to simply check port
connectivity at the hardware level (check that cables are well-plugged)
and at the software level (receipt of VLAN packets, for instance).
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The second condition of this logical OR:
(get_gcd(new_obj_size, nrank * nchan) != 1 ||
get_gcd(nchan, new_obj_size) != 1)
is redundant with the first condition.
We can show that the first condition is equivalent to its disjunction
with the second condition using these two results:
- R1: For all conditions A and B, if B implies A, then (A || B) is
equivalent to A.
- R2: (get_gcd(nchan, new_obj_size) != 1) implies
(get_gcd(new_obj_size, nrank * nchan) != 1)
We can show R1 with the following truth table (0 is false, 1 is true):
+-----+-----++----------+-----+-------------+
| A | B || (A || B) | A | B implies A |
+-----+-----++----------+-----+-------------+
| 0 | 0 || 0 | 0 | 1 |
| 0 | 1 || 1 | 0 | 0 |
| 1 | 0 || 1 | 1 | 1 |
| 1 | 1 || 1 | 1 | 1 |
+-----+-----++----------+-----+-------------+
Truth table of (A || B) and A
We can show R2 by looking at the code of optimize_object_size and
get_gcd.
We see that:
- S1: (nchan >= 1) and (nrank >= 1).
- S2: get_gcd returns 0 only when both arguments are 0.
Let:
- X be get_gcd(new_obj_size, nrank * nchan).
- Y be get_gcd(nchan, new_obj_size).
Suppose:
- H1: get_gcd returns the greatest common divisor of its arguments.
- H2: (nrank * nchan) does not exceed UINT_MAX.
We prove (Y != 1) implies (X != 1) with the following steps:
- Suppose L0: (Y != 1). We have to show (X != 1).
- By H1, Y is the greatest common divisor of nchan and new_obj_size.
In particular, we have L1: Y divides nchan and new_obj_size.
- By H2, we have L2: nchan divides (nrank * nchan)
- By L1 and L2, we have L3: Y divides (nrank * nchan) and
new_obj_size.
- By H1 and L3, we have L4: (Y <= X).
- By S1 and S2, we have L5: (Y != 0).
- By L0 and L5, we have L6: (Y > 1).
- By L4 and L6, we have (X > 1) and thus (X != 1), which concludes.
R2 was also tested for all values of new_obj_size, nrank, and nchan
between 0 and 2000.
This redundant condition was found using TrustInSoft Analyzer.
Signed-off-by: Julien Cretin <julien.cretin@trust-in-soft.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Currently, if there is more memory in hugepages than the amount
requested by dpdk application, the memory is allocated by taking as much
memory as possible from each socket, starting from first one.
For example if a system is configured with 8 GB in 2 sockets (4 GB per
socket), and dpdk is requesting only 4GB of memory, all memory will be
taken in socket 0 (that have exactly 4GB of free hugepages) even if some
cores are configured on socket 1, and there are free hugepages on socket
1...
Change this behaviour to allocate memory on all sockets where some cores
are configured, spreading the memory amongst sockets using following
ratio per socket:
N° of cores configured on the socket / Total number of configured cores
* requested memory
If this new algorithm fails, it defaults to previous behaviour.
This algorithm is used when memory amount is specified globally using
-m option. Per socket memory allocation can always be done using
--socket-mem option.
It is implemented only for Linux as BSD part looks not to be ready for NUMA.
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Venky Venkatesan <venky.venkatesan@intel.com>
Allow to initialize a ring in an already allocated memory. The rte_ring_create()
function that allocates a ring in a rte_memzone is still available and now uses
the new rte_ring_init() function in order to factorize the code.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add a function that returns the amount of memory occupied by a rte_ring
structure and its object table. This commit prepares the next one that
will allow to allocate a ring dynamically.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
RTE_EAL_UNBIND_PORTS was deprecated in DPDK 1.4.0 and removed in 1.6.0, but the
code was not removed.
The bind/unbind operations should not be handled by the eal.
These operations should be either done outside of dpdk or inside the PMDs
themselves as these are their problems.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Move RTE_PCI_DRV_FORCE_UNBIND flag handling out of RTE_EAL_UNBIND_PORTS section.
This had nothing to do with RTE_EAL_UNBIND_PORTS anyway.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
The pci_switch_module() function should only do what its name tells: unbind pci
devices and rebind them on the specified kernel driver.
Hence, it can not call pci_uio_map_resource().
Call to pci_uio_map_resource() should be moved to rte_eal_pci_probe_one_driver()
so that we can factorize code.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
A fd leak happens in pci_map_resource when multiple bars are mapped.
Fix this by closing fd unconditionnally in this function and open the
intr_handle fd in pci_uio_map_resource instead.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
virtio-uio does not need eal to map bars from uio device, so remove flag
RTE_PCI_DRV_NEED_IGB_UIO.
Then, move virtio-uio workaround out of generic eal_pci.c for linux
implementation.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
bsd implementation lacks check on driver flags, fix this.
Besides, check on BAR0 is not needed and could cause trouble for devices that
have no BAR0.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Looking at bsd implementation, we can see that there are some potential mem
leaks in linux implementation. Fix them.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Some applications reserve hugepages for later use,
but DPDK doesn't take reserved pages into account
when calculating number of available number of hugepages.
This patch adds reading from "resv_hugepages" file
in addition to "free_hugepages".
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Start development cycle for version 1.7.0.
This new development workflow introduces a new versioning scheme.
Instead of having releases r0, r1, r2, etc, there will be release
candidates. Last number has special meanings:
< 16 numbers are reserved for release candidates (RTE_VER_SUFFIX is -rc)
16 is reserved for the release (RTE_VER_SUFFIX must be unset)
> 16 numbers can be used locally (RTE_VER_SUFFIX must be set)
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Print the maximum lcore(s) as configured, and the number of lcore(s) detected
on eal cpu init as debug info besides the not separate detected/not-detected
lcore info.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
[Thomas: add BSD part]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Increasing maximum number of lcores gives a huge place to undetected
lcores in output traces. Moreover, this output does not give any
interesting information, since list of undetected lcores can be deduced
from list of detected ones.
So remove output related to undetected cores.
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
There is no need for a 'magic' field in struct rte_config, as this part of the
structure is local to each process. All threads of a process are synchronised
because of the run_once atomic.
So remove this field, as it is only adding confusion when reading code that
references 'magic' field from struct rte_mem_config.
Besides, there is no reference about the 'version' field, so remove it as well.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
A line was forgotten when removing blacklist option in commit
"use devargs for vdev and PCI lists with bsd" (cd25fb0863).
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
vdev ethdev can not be allocated on a numa socket that is not socket 0.
The reason comes from rte_eth_dev_allocate() which uses rte_socket_id() to
identify the socket on which vdev driver data should be allocated.
However, at this initialization step, rte_socket_id() always returns 0.
Looking at rte_socket_id(), it needs rte_lcore_id() which uses the per-core
global _lcore_id variable. This variable is initialised by
eal_thread_init_master.
So eal_thread_init_master should be called before rte_eal_vdev_init().
Signed-off-by: Maxime Leroy <maxime.leroy@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
There should be no real need for this initialised field as the whole structure
is set to 0 in rte_config_init() by primary process, and secondary processes
wait for this to happen before anything else (looking at mem_config magic).
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
We don't really need this field as it is only used when creating the memzone
object associated to this heap.
Removing numa_socket field makes things simpler and remove race condition.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Following debian kernel headers upgrade to 3.2.57, pci capability accessors
have been backported (upstream commit 8c0d3a02c1309eb6112d2e7c8172e8ceb26ecfca,
("PCI: Add accessors for PCI Express Capability", v3.7-rc1)).
It results in the same compilation error as redhat 6.x.
However, there is no clear way to determine we are building on a debian kernel.
So, rather than determine if we are building on a distribution kernel, look at
PCI_EXP_LNKSTA2 that appeared in this upstream commit.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Compiling the DPDK under FreeBSD gives the following error due to a
missing include <sys/rwlock.h>.
In file included from nic_uio.c:52:
@/vm/vm_pager.h:126:2: error: implicit declaration of function 'rw_assert' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
VM_OBJECT_ASSERT_WLOCKED(object);
^
@/vm/vm_object.h:226:2: note: expanded from macro 'VM_OBJECT_ASSERT_WLOCKED'
rw_assert(&(object)->lock, RA_WLOCKED)
^
In file included from nic_uio.c:52:
@/vm/vm_pager.h:126:2: error: use of undeclared identifier 'RA_WLOCKED'
@/vm/vm_object.h:226:29: note: expanded from macro 'VM_OBJECT_ASSERT_WLOCKED'
rw_assert(&(object)->lock, RA_WLOCKED)
^
In file included from nic_uio.c:52:
@/vm/vm_pager.h:143:2: error: use of undeclared identifier 'RA_WLOCKED'
VM_OBJECT_ASSERT_WLOCKED(object);
^
@/vm/vm_object.h:226:29: note: expanded from macro 'VM_OBJECT_ASSERT_WLOCKED'
rw_assert(&(object)->lock, RA_WLOCKED)
^
In file included from nic_uio.c:52:
@/vm/vm_pager.h:167:2: error: use of undeclared identifier 'RA_WLOCKED'
VM_OBJECT_ASSERT_WLOCKED(object);
^
@/vm/vm_object.h:226:29: note: expanded from macro 'VM_OBJECT_ASSERT_WLOCKED'
rw_assert(&(object)->lock, RA_WLOCKED)
^
In file included from nic_uio.c:52:
@/vm/vm_pager.h:190:2: error: use of undeclared identifier 'RA_WLOCKED'
VM_OBJECT_ASSERT_WLOCKED(m->object);
^
@/vm/vm_object.h:226:29: note: expanded from macro 'VM_OBJECT_ASSERT_WLOCKED'
rw_assert(&(object)->lock, RA_WLOCKED)
^
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
The bsdapp part was missing in commit 8e245de6ca.
Add the ability to pass some specific initialization arguments to PCI
devices at start-up.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
The bsdapp part was missing in commit cac6d08c8b.
This commit splits the "--use-device" option in two new options:
- "--pci-whitelist or -w": add a PCI device in the white list
- "--vdev": instanciate a new virtual device
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
The bsdapp part was missing in commit a8b97e3a1d.
This commit changes the API of --use-device command line argument.
It changes the separators from ';' to ','.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
The bsdapp part was missing in commit 1220458951.
This patch removes old whitelist code and use the newly introduced
rte_devargs to get the PCI white list, the PCI black list and the list
of virtual devices.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
The bsd part was missing in commit bf6dea0e04.
This commit introduces a new API for storing device arguments given by
the user. It only adds the framework and the test.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
The bsdapp part was missing in commit 5b1f4a67dd.
To avoid confusion with virtual devices, rename device_list as
pci_device_list and driver_list as pci_driver_list.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
The bsdapp part was missing in commit 57c24af85d.
This commit adds a dummy rte_mem_virt2phy() to fix the compilation of
DPDK under BSD. This function is only used when the debug option
"--no-huge" is given, to get the physical address of mempools in memory.
As a result, it seems acceptable for now to implement a dummy function
to fix the compilation as the usual case (using contigmem module) works
properly.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
The bsdapp part was missing in c5e9eeca5a.
This commit allows external libraries and applications to know if
hugepages are enabled.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
When loading a library "libfoo.so" (depending on "libbar.so", located in an
entirely different folder), with a LD_LIBRARY_PATH=/path/to/libfoo.so", it
returns an error:
EAL: ./libfoo.so: cannot open shared object file: No such file or directory
If the first dlopen() fails (here, because it can't find all dependencies),
the code requires for a second dlopen() that looks for "./libfoo.so". It
turns on pathname matching, which does not use LD_LIBRARY_PATH. As a result,
it fails because it cannot find "./libfoo.so".
The error message matches the error of the second dlopen(), not the first's.
Do not try to look for a different library ("./"-prefixed) than the one
provided in argument. Let the dynamic library management handle it, just
provide an appropriate LD_LIBRARY_PATH.
Signed-off-by: Pascal Mazon <pascal.mazon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
lcores that are set in coremask should be checked against lcores detected on
system. This way, we won't need to check them later.
Besides, if specifying an unavailable lcore, we currently panic in
eal_thread_loop() because pthread_setaffinity_np fails.
So this check will return an error with a more explicit message in
eal_parse_coremask().
"EAL: pthread_setaffinity_np failed
PANIC in eal_thread_loop():
cannot set affinity"
becomes :
"EAL: lcore 4 unavailable
EAL: invalid coremask"
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Only the last feature was checked since commit 99f2cdf9ca
(eal: fix %rbx corruption and simplify the code)
The return code for rte_cpu_get_flag_enabled is only checked on the termination
of the for loop that it is called inside, but should be checked for every
iteration it makes through the for loop. This is caused by some silly missing
brackets. Simply add them in
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Pablo De Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
For RH 6.5:
- always include mdio.h to get the definitions of MDIO_EEE, ETHTOOL_GEEE
- is_link_local_ether_addr(), pcie_capability_clear_and_set_word(), and
ether_addr_equal() have been backported
For RH 6.4:
- same issue with ether_addr_equal()
- here ETH_GEE is defined without having the functions.
igb_ethtool.c:2441: error: implicit declaration of function ‘mmd_eee_adv_to_ethtool_adv_t’
Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
On RH 6.5:
igb_main.c:2298: error: unknown field ‘ndo_fdb_add’ specified in
initializer
FDB ops are present in RH 6.5 via the extension of netdev, so add the
ifdef inside the netdev ops definition of igb.
However, FDB functions are not set for RHEL 6.5: the implementation
relies on dev_mc_add_excl API which has not been backported.
Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
rxhash has been renamed to hash. In 3.14 and newer, we can use
skb_set_hash().
Signed-off-by: Aaro Koskinen <aaro.koskinen@nsn.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Need to pass mode argument to open with O_CREAT.
Must check return value from ftruncate().
Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The registration of an external vdev driver (a .so library) is done in a
function that has the ((constructor)) attribute. This function is called
when dlopen(driver.so) is invoked.
As a result, we need to do the dlopen() before calling
rte_eal_vdev_init() that calls the initialization functions of all
registered drivers.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Instead of having a list of virtual device drivers in EAL code, add an
API to register drivers. Thanks to this new registration method, we can
remove the references to pmd_ring, pmd_pcap and pmd_xenvirt in EAL code.
This also enables the ability to register a virtual device driver as
a shared library.
The registration is done in an init function flaged with
__attribute__((constructor)). The new convention is to name this
function rte_pmd_xyz_init(). The per-device init function is renamed
rte_pmd_xyz_devinit().
By the way the internal PMDs are now also .so/standalone ready. Let's do
it later on. It will be required to ease maintenance.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The name "nonpci_devs" for virtual devices is ambiguous as a physical
device can also be non-PCI (ex: usb, sata, ...). A better name for this
file is "vdev" as it only deals with virtual devices.
This patch doesn't introduce any change except renaming.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Some PCI drivers may require some specific initialization arguments at
start-up.
Even if unused today, adding this feature seems coherent with virtual
devices in order to provide a full-featured rte_devargs framework. In
the future, it could be added in pmd_ixgbe or pmd_igb for instance to
enable debug of drivers or setting a specific operating mode at
start-up.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This commit splits the "--use-device" option in two new options:
- "--pci-whitelist or -w": add a PCI device in the white list
- "--vdev": instanciate a new virtual device
Before the patch, the same option "--use-device" was used for these 2
use-cases.
By the way, we also add "--pci-blacklist" in addition to the existing
"-b" for coherency with the whitelist parameter.
Test result:
echo 100 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo 100 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
./app/test -c 0x15 -n 3 -m 64
RTE>>eal_flags_autotest
[...]
Test OK
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This commit changes the API of --use-device command line argument.
It changes the separators from ';' to ','. Indeed, ';' is not the best
choice as this character is also used to separate shell commands,
forcing the user to surround arguments with quotes.
This commit impacts both devargs and kvargs as each of them define
a separator in --use-device argument:
- devargs defines the separator between the device name or pci_id and
its arguments
- kvargs defines the separator between each key/value pairs in
arguments for drivers using the kvargs API to parse their arguments
The modification of devargs and kvargs is done in one commit to keep
the coherency of --use-device.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Remove old whitelist code:
- remove references to rte_pmd_ring, rte_pmd_pcap and pmd_xenvirt in
is_valid_wl_entry() as we want to be able to register external virtual
drivers as a shared library. Moreover this code was duplicated with
dev_types[] from eal_common_pci.c
- eal_common_whitelist.c was badly named: it was able to process PCI
devices white list and the registration of virtual devices
- the parsing code was complex: all arguments were prepended in
one string dev_list_str[4096], then split again
Use the newly introduced rte_devargs to get:
- the PCI white list
- the PCI black list
- the list of virtual devices
Rework the tests:
- a part of the whitelist test can be removed as it is now tested
in app/test/test_devargs.c
- the other parts are just reworked to adapt them to the new API
This commit induce a small API modification: it is not possible to specify
several devices per "--use-device" option. This notation was anyway a bit
cryptic. Ex:
--use-device="eth_ring0,eth_pcap0;iface=ixgbe0"
now becomes:
--use-device="eth_ring0" --use-device="eth_pcap0;iface=ixgbe0"
On the other hand, it is now possible to work in PCI blacklist mode and
instanciate virtual drivers, which was not possible before this patch.
Test result:
./app/test -c 0x15 -n 3 -m 64
RTE>>devargs_autotest
EAL: invalid PCI identifier <08:1>
EAL: invalid PCI identifier <00.1>
EAL: invalid PCI identifier <foo>
EAL: invalid PCI identifier <>
EAL: invalid PCI identifier <000f:0:0>
Test OK
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>