Commit Graph

616 Commits

Author SHA1 Message Date
Cristian Dumitrescu
ca71bbfa04 table: new packet framework API
This file defines the operations to be implemented by
any Packet Framework table.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
2014-06-17 03:34:10 +02:00
Cristian Dumitrescu
ef3403fb6f port: source and sink
Source port is a packet generator, similar to /dev/zero Linux device.

Sink port is a packet terminator (drops all input packets), similar
to /dev/null Linux device.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
2014-06-17 03:34:10 +02:00
Cristian Dumitrescu
8dceb6aa6e port: hierarchical scheduler
The QoS hierarchical scheduler presented as Packet Framework port.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
2014-06-17 03:34:10 +02:00
Cristian Dumitrescu
31987388ce port: IPv4 reassembly
The IPv4 reassembly operation is presented as a Packet Framework port.

The code duplication with examples/ip_reassembly sample application
to be addressed soon by linking the relevant library once upstreamed.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
[Thomas: update to new ip_frag library]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-17 03:34:10 +02:00
Cristian Dumitrescu
9ec4f0900b port: IPv4 fragmentation
This port presents the IPv4 fragmentation operation as a Packet Framework port.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
[Thomas: update to new ip_frag library]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-17 03:34:09 +02:00
Cristian Dumitrescu
bf6931b242 port: ring
ring_reader input port (on top of single consumer rte_ring)
ring writer output port (on top of single producer rte_ring)

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
2014-06-17 02:37:29 +02:00
Cristian Dumitrescu
4d97e8b565 port: ethdev
The input port ethdev_reader implements the Packet Framework port API
on top of the Intel DPDK poll mode driver for a NIC RX queue.

The output port ethdev_writer implements the Packet Framework port API
on top of the Intel DPDK poll mode driver for a NIC TX queue.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
2014-06-17 02:37:29 +02:00
Cristian Dumitrescu
eb77db3ed9 port: new packet framework API
This file defines the port operations that have to be implemented
by Packet Framework ports.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
2014-06-17 02:37:28 +02:00
Cristian Dumitrescu
212841e67c lpm: check rule existence
Added API function for LPM IPv4 and IPv6 to query for the existence
of a rule/route and return the next hop ID associated with the route
if route is present.
This is used by the Packet Framework LPM table for implementing a
routing table.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
2014-06-17 02:37:28 +02:00
Cristian Dumitrescu
3179b9ee4f mbuf: meta-data offset
Added zero-size field (offset in data structure) to specify the beginning
of packet meta-data in the packet buffer just after the mbuf.

The size of the packet meta-data is application specific and the packet
meta-data is managed by the application.

The packet meta-data should always be accessed through the provided macros.

This is used by the Packet Framework libraries (port, table, pipeline).

There is absolutely no performance impact due to this mbuf field, as it
does not take any space in the mbuf structure (zero-size field).

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
2014-06-17 02:37:28 +02:00
Thomas Monjalon
ad942d78c1 ip_frag: clean includes
Add required rte_byteorder in rte_ip_frag.h.
Remove useless includes in *.c files.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-17 02:37:28 +02:00
Jingjing Wu
76db0ac639 ixgbe: add filters
This patch adds following ixgbe NIC filters implement:
  syn filter, ethertype filter, 5tuple filter for intel NIC 82599

Signed-off-by: jingjing.wu <jingjing.wu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Reviewed-by: Vladimir Medvedkin <medvedkinv@gmail.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 19:55:14 +02:00
Jingjing Wu
8fe9df2295 igb: add filters
This patch adds following igb NIC filters implement:
  syn filter, ethertype filter, 2tuple filter, flex filter for intel NIC 82580 and i350
  syn filter, ethertype filter, 5tuple filter for intel NIC 82576

Signed-off-by: jingjing.wu <jingjing.wu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Reviewed-by: Vladimir Medvedkin <medvedkinv@gmail.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 19:55:04 +02:00
Jingjing Wu
b5323dee19 ethdev: add filters
This patch adds APIs for NIC filters list below:
ethertype filter, syn filter, 2tuple filter, flex filter, 5tuple filter

Signed-off-by: jingjing.wu <jingjing.wu@intel.com>
Reviewed-by: Vladimir Medvedkin <medvedkinv@gmail.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 19:42:40 +02:00
Anatoly Burakov
4f1a8f6338 ip_frag: add IPv6 reassembly
Mostly a copy-paste of IPv4, with a few caveats.

Only supported packets are those in which fragment extension header is
just after the IPv6 header.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 18:55:05 +02:00
Anatoly Burakov
0aa31d7a59 ip_frag: add IPv6 fragmentation support
Mostly a copy-paste of IPv4.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 18:55:05 +02:00
Anatoly Burakov
5ab22ca3ba ip_frag: rename ipv4_fragmentation function
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 18:55:05 +02:00
Anatoly Burakov
416707812c ip_frag: refactor reassembly code into a proper library
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 18:55:05 +02:00
Anatoly Burakov
63ec0b5851 ip_frag: rename structures in fragmentation table
Technically, fragmentation table can work for both IPv4 and IPv6
packets, so we're renaming everything to be generic enough to make sense
in IPv6 context.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 18:55:05 +02:00
Anatoly Burakov
5cb60f3eb3 ip_frag: remove unneeded check and macro
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 18:55:04 +02:00
Anatoly Burakov
e87f24024d ip_frag: new internal common header
Moved out debug log macros into common, as reassembly code will later
need them as well.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 18:55:04 +02:00
Anatoly Burakov
83e25bb4d7 ip_frag: fix code style
Issues were reported by checkpatch.pl.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 18:55:04 +02:00
Anatoly Burakov
4c38e5532a ip_frag: refactor IPv4 fragmentation into a proper library
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
[Thomas: add in doxygen]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 18:55:04 +02:00
Anatoly Burakov
601e279df0 ip_frag: move fragmentation/reassembly headers into a library
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 18:55:04 +02:00
Anatoly Burakov
629395b063 igb_uio: remove PCI id table
Removing PCI ID list to make igb_uio more similar to a generic driver
like vfio-pci or pci_uio_generic. This is done to make it easier for
the binding script to support multiple drivers.

Note that since igb_uio no longer has a PCI ID list, it can now be
bound to any device, not just those explicitly supported by DPDK. In
other words, it now behaves similar to PCI stub, VFIO and other generic
PCI drivers.

Therefore to bind a new device to igb_uio, the user will now have to
first write its PCI ID to "new_id" file inside the igb_uio driver
directory, and only then write the PCI ID to "bind". This is reflected
in changes to PCI binding script as well.

There's a weird behaviour of sysfs when a new device ID is added to
new_id. Subsequent writing to "bind" will result in IOError on
closing the file. This error is harmless but it triggers the
exception anyway, so in order to work around that, we check if the
device was actually bound to the driver before raising an error.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:11 +02:00
Anatoly Burakov
317fe51f6e eal: add command line option to select vfio interrupt type
Unlike igb_uio, VFIO interrupt type is not set by kernel module
parameters but is set up via ioctl() calls at runtime. This warrants
a new EAL command-line parameter. It will have no effect if VFIO is
not compiled, but will set VFIO interrupt type to either "legacy", "msi"
or "msix" if VFIO support is compiled. Note that VFIO initialization
will fail if the interrupt type selected is not supported by the system.

If the interrupt type parameter wasn't specified, VFIO will try all
interrupt types (starting with MSI-X).

In unit tests, we don't know if VFIO is compiled (eal_vfio.h header is
internal to Linuxapp EAL), so we check this flag regardless.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
5da473e965 pci: enable vfio device binding
Add support for binding VFIO devices if RTE_PCI_DRV_NEED_MAPPING is set
for this driver. Try VFIO first, if not mapped then try IGB_UIO too.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
2f4adfad0a vfio: add multiprocess support
Since VFIO cannot be used to map the same device twice, secondary
processes receive the device/group fd's by means of communicating over a
local socket. Only group and container fd's should be sent, as device
fd's can be obtained via ioctl() calls' on the group fd.

For multiprocess, VFIO distinguishes between existing but unused groups
(e.g. grups that aren't bound to VFIO driver) and non-existing groups in
order to know if the secondary process requests a valid group, or if
secondary process requests something that doesn't exist.

VFIO multiprocess sync communicates over a simple protocol. It defines
two requests - request for group fd, and request for container fd.
Possible replies are: SOCKET_OK (an OK signal), SOCKET_ERR (error
signal) and SOCKET_NO_FD (a signal that indicates that the requested
VFIO group is valid, but no fd is present for that group - indicating
that the respective group is simply not bound to VFIO driver).

Here is the logic in a nutshell:

1. secondary process sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
1a. in case of SOCKET_REQ_GROUP, client also then sends group number
2. primary process receives message
2a. in case of invalid group, SOCKET_ERR is sent back to secondary
2b. in case of unbound group, SOCKET_NO_FD is sent back to secondary
2c. in case of valid group, SOCKET_OK is sent and followed by fd
3. socket is closed

in case of any error, socket is closed and SOCKET_ERR is sent.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
ff0b67d1c8 vfio: DMA mapping
Adding code to support VFIO mapping (primary processes only). Most of
the things are done via ioctl() calls on either /dev/vfio/vfio (the
container) or a /dev/vfio/$GROUP_NR (IOMMU group).

In a nutshell, the code does the following:
1. creates a VFIO container (an entity that allows sharing IOMMU DMA
   mappings between devices)
2. checks if a given PCI device is a member of an IOMMU group (if it's
   not, this indicates that the device isn't bound to VFIO)
3. calls open() the group file to obtain a group fd
4. checks if the group is viable (that is, if all the devices in the
   same IOMMU group are either bound to VFIO or not bound to anything)
5. adds the group to a container
6. sets up DMA mappings (only done once, mapping whole DPDK hugepage
   memory for DMA, with a 1:1 correspondence of IOVA to PA)
7. gets the actual PCI device fd from the group fd (can fail, which
   simply means that this particular device is not bound to VFIO)
8. maps BARs (MSI-X BAR cannot be mmaped, so skipping it)
9. sets up interrupt structures (but not enables them!)
10. enables PCI bus mastering

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
5c782b3928 vfio: interrupts
Creating code to handle VFIO interrupts in EAL interrupts (supports all
types of interrupts).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
157bf937f5 vfio: header for build support
Add VFIO compilation option to linuxapp config.

Adding a header that will determine if VFIO support should be compiled
in. If VFIO is enabled in config (and it's enabled by default), then the
header will also check for kernel version. If VFIO is enabled in config
and if the kernel version is 3.6+, then VFIO_PRESENT will be defined.
This is the macro that should be used to determine if VFIO support is
being compiled in.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
88701645c9 eal: move interrupt type out of igb_uio
Moving interrupt type enum out of igb_uio and renaming it to be more
generic. Such a strange header naming and separation is done mostly to
make coming virtio patches easier to port to dpdk.org tree.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
f058c9ba63 igb_uio: make compilation optional
Currently, igb_uio is always compiled. Some Linux distributions may not
want to include igb_uio with DPDK, so we need to make sure that igb_uio
compilation for Linuxapp targets can be optional.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
71d74422e2 pci: rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
Rename the RTE_PCI_DRV_NEED_IGB_UIO to be more generic.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
6bf883260d pci: distinguish between legitimate failures and non-fatal errors
Currently, EAL does not distinguish between actual failures and expected
initialization errors. E.g. sometimes the driver fails to initialize
because it was not supposed to be initialized in the first place, such
as device not being managed by said driver.

This patch makes EAL fail on actual initialization errors while still
skipping over expected initialization errors.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
2b730d4f0a pci: fix code style
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
67c536bdad pci: move uio mapping in a dedicated file
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
46a6fa8793 pci: rework uio mapping to prepare for vfio
Separating mapping code and calls to open. This is a preparatory work
for VFIO patch since it'll need to map BARs too but it doesn't use path
in mapped_pci_resource. Also, renaming structs to be more generic.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
f15addb79e mem: make --no-huge use mmap instead of malloc
This makes it possible to run DPDK without hugepage memory when VFIO
is used, as VFIO uses virtual addresses to set up DMA mappings.

Technically, malloc is just fine, but we want to guarantee that
memory will be page-aligned, so using mmap to be safe.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:10 +02:00
Anatoly Burakov
a05a193109 eal: remove useless compilation flag
eal_hpet.c was renamed to eal_timer.c and, thanks to code changes, does
not need the -Wno-return-type any more.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 15:02:09 +02:00
Bruce Richardson
c95584dc2b ixgbe: new vectorized functions for Rx/Tx
New file containing optimized receive and transmit functions which
use 128bit vector instructions to improve performance. When conditions
permit, these functions will be enabled at runtime by the device
initialization routines already in the PMD.

The compilation of the vectorized RX and TX code paths is controlled by
a new setting in the build time configuration for the IXGBE driver. Also
added is a setting which allows an optional further performance increase
by disabling the use of the olflags field on packet RX.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: XiaonanX Zhang <xiaonanx.zhang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
[Thomas: code-style adjustments]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-16 09:01:23 +02:00
Konstantin Ananyev
dc276b5780 acl: new library
The ACL library is used to perform an N-tuple search over a set of rules with
multiple categories and find the best match for each category.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
[Thomas: some code-style changes]
2014-06-14 01:29:45 +02:00
Stephen Hemminger
36c248ebc6 virtio: fix build with debug enabled
Remove useless message that breaks if VIRTIO_DEBUG_DRIVER is defined.
virtio_ethdev.c:224:2: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]

Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-06-13 11:18:56 +02:00
Stephen Hemminger
14337d0b7a virtio: checkpatch cleanups
This fixes style problems reported by checkpatch including:
  * extra whitespace
  * spaces before tabs
  * strings broken across lines
  * excessively long lines
  * missing spaces after keywords
  * unnecessary paren's in return statements

Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-06-13 11:18:56 +02:00
Bruce Richardson
08ccf3faa6 distributor: new packet distributor library
This adds the code for a new Intel DPDK library for packet distribution.
The distributor is a component which is designed to pass packets
one-at-a-time to workers, with dynamic load balancing. Using the RSS
field in the mbuf as a tag, the distributor tracks what packet tag is
being processed by what worker and then ensures that no two packets with
the same tag are in-flight simultaneously. Once a tag is not in-flight,
then the next packet with that tag will be sent to the next available
core.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
[Thomas: add doxygen @file comment]
2014-06-12 15:47:04 +02:00
Konstantin Ananyev
3440438c5d lpm: introduce rte_lpm_lookupx4
Allows to lookup four IP addresses in an LPM table.
Uses SSE instrincts.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
2014-06-12 12:11:39 +02:00
Pawel Wodkowski
cc333208d5 pci: remove conditions on device definitions
This patch removes obsolete code that prevents defining
NICs 82575EB, I218 and I350.

Signed-off-by: Pawel Wodkowski <pawelx.wdkowski@intel.com>
[Thomas: remove conditions for I218]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-11 18:10:53 +02:00
Ouyang Changchun
1e151eb3bf ixgbe: Tx rate limitation for queue and VF
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2014-06-11 15:56:19 +02:00
Ouyang Changchun
8dbe82b073 ethdev: Tx rate limitation for queue and VF
Add API to support setting TX rate for a queue and a VF.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2014-06-11 15:56:19 +02:00
Ouyang Changchun
c38f4f83ed ixgbe: link up and down
It is implemented by enabling or disabling TX laser.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
2014-06-11 00:29:36 +02:00
Ouyang Changchun
915e678375 ethdev: API for link up and down
This patch adds API to support the functionality of setting link up and down.
It can be used to repeatedly stop and restart RX/TX of a port without
re-allocating resources for the port and re-configuring the port.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked by: Ivan Boule <ivan.boule@6wind.com>
2014-06-11 00:29:36 +02:00
Konstantin Ananyev
1d99384f4d ethdev: fix compiler warning on PMD_DEBUG_TRACE formats
icc 12.1 complains about RTE_LOG() format:
"argument is incompatible with corresponding format string conversion"

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-11 00:29:36 +02:00
Konstantin Ananyev
4a481f1aec ethdev: prevent from starting/stopping already started/stopped device
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-11 00:29:36 +02:00
Konstantin Ananyev
6b6c73feb7 igb/ixgbe: reset queue pointers after releasing
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-11 00:29:36 +02:00
Konstantin Ananyev
c6c79fa425 e1000: do not release queue on alloc error
If igb_alloc_rx_queue_mbufs() would fail to allocate an mbuf for RX queue,
it calls igb_rx_queue_release(rxq).
That causes rxq to be silently freed, without updating
dev->data->rx_queues[].
So any further reference to it will trigger the SIGSEGV.
Same thing in em PMD too.

To fix: igb_alloc_rx_queue_mbufs() should just return an error to the
caller and let upper layer to deal with the probem.
That's what ixgbe PMD is doing right now.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-11 00:29:36 +02:00
Bruce Richardson
3031749c2d remove trailing whitespaces
This commit removes trailing whitespace from lines in files. Almost all
files are affected, as the BSD license copyright header had trailing
whitespace on 4 lines in it [hence the number of files reporting 8 lines
changed in the diffstat].

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
[Thomas: remove spaces before tabs in libs]
[Thomas: remove more trailing spaces in non-C files]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-06-11 00:29:34 +02:00
Alan Carew
d10296d7ea pci: fix build for FreeBSD
Add __rte_unused to
pci_unbind_kernel_driver(struct rte_pci_device *dev)

Signed-off-by: Alan Carew <alan.carew@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-06-11 00:29:34 +02:00
Alan Carew
86d5de5c46 eal: fix build for FreeBSD
Recent change to rte_dump_tailq (commit 591a9d7985),
which now uses a FILE parameter causes compilation to fail under FreeBSD
and sourced to a missing include of stdio.h.

Errors:
rte_tailq.h:  unknown type name 'FILE' void rte_dump_tailq(FILE *f);
rte_memory.h: unknown type name 'FILE' void rte_dump_physmem_layout(FILE *f);

Signed-off-by: Alan Carew <alan.carew@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-06-11 00:29:33 +02:00
Konstantin Ananyev
88523f27e8 pcap: fix Tx mbuf corruption
If pcap_sendpacket() fails, then eth_pcap_tx shouldn't silently free that
mbuf and continue.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
2014-06-10 13:23:35 +02:00
Jijiang Liu
28dbbd485f xen: fix memory size calculation
The unit of allocated_size is MB, so the change below is made.
Otherwise, it will fail to free memory when available memory is not enough.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
2014-06-09 17:50:04 +02:00
Jijiang Liu
6f0ce7b9cd xen: fix for contiguous region API in kernel 3.13
Since Linux kernel version 3.13.0,
the xen_create/destroy_contiguous_region() API has been changed,
and the first parameter is physical address in the API.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
2014-06-09 17:50:04 +02:00
Jijiang Liu
5ebbb17281 xen: reserve memory at installing dom0_mm.ko
The patch changes the way of reserving memory in Dom0 driver.

It will reserve memory at installing rte_dom0_mm.ko kernel module
instead of requesting memory dynamically during DPDK application startup.
Meanwhile, now driver requests memory size of 4M once first,
if it failed, and request memory size of 2M once.

The main reasons for these changes are as follows:
First, to reduce the impact of increasing in memory fragment
after system run a long time.
Second, to reduce number of memory segment.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-29 11:43:11 +02:00
Ouyang Changchun
823ad64795 virtio: support multiple queues
This patch supports multiple queues feature in DPDK based virtio-net frontend.
It firstly gets max queue number of virtio-net from virtio PCI configuration and
then send command to negotiate the queue number with backend; When receiving and
transmitting packets, it negotiates multiple virtio-net queues which serve RX/TX;
To utilize this feature, the backend also need support multiple queues feature
and enable it.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-29 11:11:24 +02:00
Ouyang Changchun
5591a4a913 virtio: code-style cleanup
This patch cleanups some coding style issue, and fixes some errors and warnings
reported by checkpatch.pl.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-29 11:11:24 +02:00
Ouyang Changchun
029fd06d40 ixgbe: queue start and stop
This patch implements queue start and stop functionality in IXGBE PMD;
it also enable hardware loopback for VMDQ mode in IXGBE PMD.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-28 16:00:55 +02:00
Ouyang Changchun
0748be2cf9 ethdev: queue start and stop
This patch adds API to support queue start and stop functionality for RX/TX.
It allows RX and TX queue is started or stopped one by one, instead of starting
and stopping all of them at the same time.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-28 16:00:55 +02:00
Ivan Boule
16321de093 ethdev: allow to get RSS hash functions and key
1) Add a new function "rss_hash_conf_get" in the PMD API to retrieve the
   current configuration of the RSS functions and/or of the RSS key used
   by a NIC to compute the RSS hash of input packets.
   The new function uses the existing data structure "rte_eth_rss_conf" for
   returning the RSS hash configuration.

2) Add the ixgbe-specific function "ixgbe_dev_rss_hash_conf_get" and the
   igb-specific function "eth_igb_rss_hash_conf_get" to retrieve the RSS
   hash configuration of ixgbe and igb controllers respectively.

3) Add the command "show port X rss-hash [key]" in the testpmd application
   to display the RSS hash configuration of port X.

Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-27 18:42:05 +02:00
Ivan Boule
db5b65301d ethdev: allow to set RSS hash computation flags and/or key
1) Add a new function "rss_hash_update" in the PMD API to dynamically
   update the RSS flags and/or the RSS key used by a NIC to compute the RSS
   hash of input packets.
   The new function uses the existing data structure "rte_eth_rss_conf" for
   the argument that contains the new hash flags and/or the new hash key to
   use.

2) Add the ixgbe-specific function "ixgbe_dev_rss_hash_update" and the
   igb-specific function "eth_igb_rss_hash_update" to update the RSS
   hash configuration of ixgbe and igb controllers respectively.
   Before changing anything, these 2 functions check that the update RSS
   operation does not attempt to disable RSS, if RSS was enabled at port
   initialization time, or does not attempt to enable RSS, if RSS was
   disabled at port initialization time.

Note:
   Configuring the RSS hash flags and the RSS key used by a NIC consists in
   updating appropriate PCI registers of the NIC.
   These operations have been manually tested with the interactive commands
   "write reg" and "write regbit" of the testpmd application.

Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-27 18:42:05 +02:00
Ivan Boule
c79875a708 ethdev: check RETA queue indices against number of queues
Each entry of the RSS redirection table (RETA) of igb and ixgbe ports
contains a 4-bit RX queue index, thus imposing RSS RX queue indices to
be strictly lower than 16.
In addition, if a RETA entry is configured with a RX queue index that is
strictly lower than 16, but is greater or equal to the number of RX queues
of the port, then all input packets whose RSS hash value indexes that RETA
entry are silently dropped by the NIC.

Make the function rte_eth_dev_rss_reta_update() check that RX queue indices
that are supplied in the reta_conf argument are strictly lower than
ETH_RSS_RETA_MAX_QUEUE (16) and are strictly lower than the number of
RX queues of the port.

Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-27 18:42:05 +02:00
Konstantin Ananyev
242b69c060 igbvf: fix mac type for 82576
e1000_vfadapt type corresponds to 82576 VF devices,
check e1000_set_mac_type() for more details.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
2014-05-27 17:08:37 +02:00
Ivan Boule
0e57cebf74 ixgbevf: assign a default mac address
When initializing a VF with no initial MAC address assigned by
the underlying Host PF driver, assign a default MAC address.

Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-27 16:50:19 +02:00
Ivan Boule
698266e688 ixgbevf: reset unused mailbox data registers
The VF_RESET message of the 82599 PF/VF communication protocol issued by a
a Guest VF driver may include an optional permanent MAC address assigned to
the VF by the Guest OS, in order to make it recorded into the 82599 RAR
registers by the Host PF driver.

To indicate the absence of this optional MAC address, the VF_RESET command
assumes that a NULL MAC address is sent, instead of using a dedicated bit
for this purpose. However, when sending a VF_RESET command with no permanent
MAC address, the function ixgbe_reset_hw_vf() of the 82599 VF driver
directly invokes the function ixgbe_write_mbx_vf() with a message that does
not include a NULL MAC address, wrongly assuming that this function fills in
with zero all unused mailbox data registers.

More globally, it is safer to explicitely reset to zero all remaining mailbox
data registers that are not used to store the content of a message, in order
to reset the data sent in a previous VF/PF exchange (in either side),
including the last exchange performed by another Guest OS to which that VF
was previously assigned.

Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-27 16:50:19 +02:00
Ivan Boule
b6562244b4 ixgbevf: skip null and permanent mac addresses
On a 82599 VF, the deletion of a dynamically added MAC address consists in
first flushing all added MAC addresses, then in adding again all remaining MAC
addresses.
For this purpose, the function ixgbevf_remove_mac_addr() parses the pool
of MAC addresses associated with a VF, and must skip the VF permanent MAC
address that is stored into it, as well as all NULL MAC addresses.

Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-27 16:50:19 +02:00
Ivan Boule
7bfd68a274 ixgbevf: avoid adding twice the permanent mac address
During the initialization of a VF device, the rte_eth_dev_start() function
indirectly invokes the PMD "mac_addr_add" function with the permanent MAC
address assigned to the device.
In the case of 82599 VFs, this operation leads to exhausting the very
limited set of PF resources used to store VF MAC addresses.

To address this issue, do nothing in the function ixgbevf_add_mac_addr()
if the added MAC address is equal to the permanent MAC address of the VF.

Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-27 16:50:19 +02:00
Ivan Boule
17d32e45d5 ixgbevf: add/remove mac address
Add missing PMD functions in the ixgbevf driver to add (respectively remove)
a MAC address to/from a 82599 VF.
For this purpose, these 2 functions use the VF/PF mailbox-based protocol.

Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-27 16:50:19 +02:00
Konstantin Ananyev
88fccb7a05 ixgbevf: fix jumbo frame
When latest Linux ixgbe PF is used, and DPDK VF is used in DPDK application,
jumbo frames are not received.
Also - if Linux ixgbe PF has MTU set to 1500 (default),
then normal sized packets can be received by DPDK VF.
However, if Linux PF has MTU > 1500, then DPDK VF receives no packets
(normal or jumbo).
With ixgbe_mbox_api_10 ixgbe simply didn't allow set VF MTU > 1514 for 82599.
With ixgbe_mbox_ajpi_11 it does, though now, if PF uses jumbo frames,
it simply disables RX for all VFs.
So to work with PF ithat using jumbo frames, at startup each VF has to:
1. negotiate with PF mbox_api_11.
2. Send to PF SET_LPE message with desired MTU.
Note, that if PF already uses MTU bigger then asked by the VF,
then PF wouldn't take any action.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
2014-05-27 16:50:19 +02:00
Vadim Suraev
57f0ba5f8b timer: fix pending counter
Bug: When a timer is running
 - if rte_timer_stop is called, the pending decrement is
 skipped (decremented only if the timer is pending) and due
 to the update flag the future processing is skipped so the
 timer is counted as pending while it is stopped. - the same
 applies when rte_timer_reset is called but then the pending
 statistics is additionally incremented so the timer is
 counted pending twice.
Solution: decrement the pending
 statistics after returning from the callback. If
 rte_timer_stop was called, it skipped decrementing the
 pending statistics. If rte_time_reset was called, the
 pending statistics was incremented. If neither was called
 and the timer is periodic, the pending statistics is
 incremented when it is reloaded

Signed-off-by: Vadim Suraev <vadim.suraev@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2014-05-26 18:22:04 +02:00
Vadim Suraev
83cb53f3a2 timer: fix reloading after changes
Bug: when a periodic timer's callback is running, if another
 timer is manipulated, the periodic timer is not reloaded.
Solution: set the update flag only if the modified timer is
 in RUNNING state

Signed-off-by: Vadim Suraev <vadim.suraev@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2014-05-26 18:22:03 +02:00
Cristian Dumitrescu
cd64eeac11 cmdline: fix infinite loop after EOF
Stop on EOF when reading commands from a file or a pipe.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2014-05-26 17:16:39 +02:00
Helin Zhang
d719bc02aa kni: fix build on Fedora 18 with kernel 3.6.10
Error of "implicit-function-declaration" can be seen when building KNI
kernel module on Linux kernel 3.6.10 platform, as follows.
  lib/librte_eal/linuxapp/kni/igb_ethtool.c:
    In function igb_get_eee:
  lib/librte_eal/linuxapp/kni/igb_ethtool.c:
    2441:4: error: implicit declaration of function
    mmd_eee_adv_to_ethtool_adv_t
  lib/librte_eal/linuxapp/kni/igb_ethtool.c:
    In function igb_set_eee:
  lib/librte_eal/linuxapp/kni/igb_ethtool.c:
    2551:2: error: implicit declaration of function
    ethtool_adv_to_mmd_eee_adv_t

The root cause is as follows.
On Fedora 18 with kernel 3.6.10, ETHTOOL_GEEE is defined in Linux
header file of "linux/ethtool.h", while is not defined in most of other
linux kernel versions.
mmd_eee_cap_to_ethtool_sup_t(), mmd_eee_adv_to_ethtool_adv_t() and
ethtool_adv_to_mmd_eee_adv_t() in kcompat.h are disabled by "#if
!defined(ETHTOOL_GEEE) || (RHEL_RELEASE_CODE && RHEL_RELEASE_CODE <=
RHEL_RELEASE_VERSION(6,4))", while are called in igb_get_eee() in
igb_ethtool.c which is enabled by "#ifdef ETHTOOL_GEEE".

Reported-by: Prashant Upadhyaya <prashant.upadhyaya@aricent.com>
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-22 16:30:28 +02:00
Stephen Hemminger
1f1f3a0078 sched: fix grinder bug
The rte_scheduler will get stuck and not deliver any more packets
if there are two active subports and then one of them stops enqueing
more packets. This is because of a bug in how the grinder state machines
are managed.

If a non-zero grinder is assigned (but not yet active), then the dequeue
would miss it and always return zero packets. The cure is to always
do a first pass over all grinders.

Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-22 16:14:36 +02:00
Stephen Hemminger
dc05cabd98 sched: fix build if debug enabled
Fix build error if RTE_SCHED_DEBUG is enabled.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-22 16:14:26 +02:00
Stephen Hemminger
3d349a1d35 sched: use malloc instead of memzone for allocation
The existing rte scheduler can only be safely configured once per port
because a memory zone has a fixed size once it is created and can never
be freed or change in size.

This patch changes the scheduler to use rte_malloc instead. This allows
for a port to be reconfigured by doing rte_sched_port_free followed
rte_sched_port_config.

The patch also removes the now unused name parameter from the
port parameters structure.

Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-22 16:13:30 +02:00
Thomas Monjalon
7e14aa0cb4 igb: fix indentation of previous patchset
There was an indentation error in commit d93c252e88
"convert to use of PMD_REGISTER_DRIVER and fix linking"

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-21 13:31:13 +02:00
Thomas Monjalon
683ef79e57 eal: fix indentation of previous patchset
There was an indentation error in commit e57f20e051
"make vdev init path generic for both virtual and pci devices"

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-21 13:31:01 +02:00
Neil Horman
e5ffdd1457 ethdev: remove rte_pmd_init_all function
Now that we've converted all the pmds in dpdk to use the driver registration
macro, rte_pmd_init_all has become empty.  As theres no reason to keep it around
anymore, just remove it and fix up all the eample callers.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-20 14:28:26 +02:00
Neil Horman
2e87457ef9 vmxnet3: convert to use of PMD_REGISTER_DRIVER and fix linking
Convert the vmxnet3 pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the vmxnet3 library
when building DSO's and must specify its use on the command line with the -d
option.  Static linking will still initalize the driver automatically.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-20 14:28:17 +02:00
Neil Horman
42a6654c10 virtio: convert to use of PMD_REGISTER_DRIVER and fix linking
Convert the virtio pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the virtio library
when building DSO's and must specify its use on the command line with the -d
option.  Static linking will still initalize the driver automatically.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-20 14:28:17 +02:00
Neil Horman
aeec17c200 ixgbevf: convert to use of PMD_REGISTER_DRIVER and fix linking
Convert the ixgbevf pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the ixgbevf library
when building DSO's and must specify its use on the command line with the -d
option.  Static linking will still initalize the driver automatically.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-20 14:28:17 +02:00
Neil Horman
7b9d82208b ixgbe: convert to use of PMD_REGISTER_DRIVER and fix linking
Convert the ixgbe pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the ixgbe library
when building DSO's and must specify its use on the command line with the -d
option.  Static linking will still initalize the driver automatically.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-20 14:28:16 +02:00
Neil Horman
4ac16f77f0 igbvf: convert to use of PMD_REGISTER_DRIVER and fix linking
Convert the igbvf pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the igbvf library
when building DSO's and must specify its use on the command line with the -d
option.  Static linking will still initalize the driver automatically.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-20 14:28:16 +02:00
Neil Horman
d93c252e88 igb: convert to use of PMD_REGISTER_DRIVER and fix linking
Convert the igb pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the igb library
when building DSO's and must specify its use on the command line with the -d
option.  Static linking will still initalize the driver automatically.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-20 14:28:16 +02:00
Neil Horman
ca5834b7af e1000: convert to use of PMD_REGISTER_DRIVER and fix linking
Convert the e1000 pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the e1000 library
when building DSO's and must specify its use on the command line with the -d
option.  Static linking will still initalize the driver automatically.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-20 14:28:16 +02:00
Neil Horman
e57f20e051 eal: make vdev init path generic for both virtual and pci devices
Currently, physical device pmds use a separate initalization path
(rte_pmd_init_all) while virtual devices use a constructor registration and
rte_eal_dev_init.  Theres no reason to have them be separate.  This patch
removes the vdev specific nomenclature from the vdev init path and makes it more
generic for use with all pmds.  This is the first step in converting the
physical device pmds to using the same constructor based registration path that
the virtual devices use.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-20 14:28:16 +02:00
Neil Horman
2c62588a53 xenvirt: convert to use of PMD_REGISTER_DRIVER and fix linking
Convert the xenvirt driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the xenvirt library
when building DSO's and must specify its use on the command line with the -d
option.  Static linking will still initalize the driver automatically.

A few notes:

xenvirt was unbuildable as of commit 4c39baf297d10c217e7d3e7370f26a1fede58308..
That commit neglected to include the rte_vdev.h header, so several structs were
left undefined.  This patch includes a fix for that as well.

Also, The linkage for xenvirt is broken in much the same way pmd_ring was, in
that the xenvirt pmd has a function that is called directly from applications
(the example being the testpmd application).  The function is
rte_mempool_gntalloc_create, and should clearly be moved into the rte_mempool
library, with the supporting code in the function implementation moved to a new
xenvirt library separate from the pmd.  This is a large undertaking that
detracts from the purpose of this series however, and so for now, I'm leaving
the linkage to the application in place, and will address this issue in a later
series

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-20 14:28:16 +02:00
Neil Horman
61934c0956 ring: convert to use of PMD_REGISTER_DRIVER and fix linking
Convert the ring driver to use the PMD_REGISTER_DRIVER macro and fix up the
Makefile so that its linkage is only done if we are building static libraries.
This means that the test applications now have no reference to the ring library
when building DSO's and must specify its use on the command line with the -d
option.  Static linking will still initalize the driver automatically.

Note that the ring driver was also written in such a way that it violated some
general layering principles, several functions were contained in the pmd which
were being called by example from the test application in the app/test
directory.  Specifically it was calling eth_ring_pair_attach,
eth_ring_pair_create and rte_eth_ring_devinit, which should only be called
internally to the dpdk core library.  To correct this I've removed those
functions, and instead allowed them to be called indirectly at initalization
time using the vdev command line argument key nodeaction=<name>:<node>:<action>
where action is one of ATTACH or CREATE.  I've tested out the functionality of
the command line with the testpmd utility, with success, and have removed the
called functions from the test utility.  This will affect how the test utility
is invoked (the -d and --vdev option will need to be specified on the command
line now), but honestly, given the way it was coded, I think the testing of the
ring pmd was not the best example of how to code with dpdk to begin with.  I
have also left the two layer violating functions in place, so as not to break
existing applications, but added deprecation warnings to them so that apps can
migrate off them.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-20 14:28:16 +02:00
Neil Horman
3660cdf990 pcap: convert to use of PMD_REGISTER_DRIVER and fix linking
Convert the pcap driver to use the PMD_REGISTER_DRIVER macro and fix up the
Makefile so that its linkage is only done if we are building static libraries.
This means that the test applications now have no reference to the pcap library
when building DSO's and must specify its use on the command line with the -d
option.  Static linking will still initalize the driver automatically.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-20 14:28:16 +02:00
Neil Horman
3ac2bd97d9 eal: add PMD_REGISTER_DRIVER macro
Rather than have each driver have to remember to add a constructor to it to make
sure its gets registered properly, wrap that process up in a macro to make
registration a one line affair.  This also sets the stage for us to make
registration of vdev pmds and physical pmds a uniform process

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-20 14:27:59 +02:00
Olivier Matz
61d931aaee ivshmem: fix build
Fix compilation error introduced by:
	e5ac7c2ff3 eal: don't inline string functions

The stdio.h include is missing due to its removing from rte_string_fns.h.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-19 14:59:56 +02:00
Stephen Hemminger
356cb732d5 mempool: add iterator function
Add function to iterate over mempool.
Useful for diagnostic code that wants to look at mempool usage patterns.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2014-05-16 16:02:55 +02:00
Stephen Hemminger
58f8a1d2e3 memzone: add iterator function
When doing diagnostic function, it is useful to have a ability
to iterate over all memzones.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2014-05-16 16:02:55 +02:00
Stephen Hemminger
e5ac7c2ff3 eal: don't inline string functions
It makes no sense to inline string functions, in fact snprintf
can't be inlined because the function supports variable number of
arguments.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
[Thomas: update includes]
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-05-16 16:02:55 +02:00
Stephen Hemminger
591a9d7985 add FILE argument to debug functions
The DPDK dump functions are useful for remote debugging of an
applications. But when application runs as a daemon, stdout
is typically routed to /dev/null.

Instead change all these functions to take a stdio FILE * handle
instead. An application can then use open_memstream() to capture
the output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
[Thomas: fix quota_watermark example]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-16 16:02:55 +02:00
Stephen Hemminger
c738c6a644 spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-05-16 16:02:55 +02:00
Ivan Boule
168dfa6166 app/testpmd: add engine that replies to ARP and ICMP echo requests
Add a new specific packet processing engine in the "testpmd" application that
only replies to ARP requests and to ICMP echo requests.
For this purpose, a new "icmpecho" forwarding mode is provided that can be
dynamically selected with the following testpmd command:

    set fwd icmpecho

before starting the receipt of packets on the selected ports.

Then, the "icmpecho" engine performs the following actions on all received
packets:

- replies to a received ARP request by sending back on the RX port a ARP
  reply with a "sender hardware address" field containing the MAC address
  of the RX port,

- replies to a ICMP echo request by sending back on the RX port a ICMP echo
  reply, swapping the IP source and the IP destination address in the IP
  header,

- otherwise, simply drops the received packet.

When replying to a received packet that was encapsulated into a VLAN tunnel,
the reply is sent back with the same VLAN identifier.
By default, the testpmd configures VLAN header stripping RX option on each
port.
This option is not managed by the icmpecho engine which won't detect
packets that were encapsulated into a VLAN.
To address this issue, the VLAN header stripping option must be previously
switched off with the following testpmd command:

    vlan set strip off

When the "verbose" mode has been set with the testpmd command
"set verbose 1", the "icmpecho" engine displays informations about each
received packet.

The "icmpecho" forwarding engine can also be used to simply check port
connectivity at the hardware level (check that cables are well-plugged)
and at the software level (receipt of VLAN packets, for instance).

Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-16 13:25:16 +02:00
Julien Cretin
2612a4b935 mem: remove redundant check in optimize_object_size
The second condition of this logical OR:
    (get_gcd(new_obj_size, nrank * nchan) != 1 ||
    get_gcd(nchan, new_obj_size) != 1)
is redundant with the first condition.

We can show that the first condition is equivalent to its disjunction
with the second condition using these two results:

- R1: For all conditions A and B, if B implies A, then (A || B) is
  equivalent to A.

- R2: (get_gcd(nchan, new_obj_size) != 1) implies
  (get_gcd(new_obj_size, nrank * nchan) != 1)

We can show R1 with the following truth table (0 is false, 1 is true):
        +-----+-----++----------+-----+-------------+
        |  A  |  B  || (A || B) |  A  | B implies A |
        +-----+-----++----------+-----+-------------+
        |  0  |  0  ||     0    |  0  |      1      |
        |  0  |  1  ||     1    |  0  |      0      |
        |  1  |  0  ||     1    |  1  |      1      |
        |  1  |  1  ||     1    |  1  |      1      |
        +-----+-----++----------+-----+-------------+
                Truth table of (A || B) and A

We can show R2 by looking at the code of optimize_object_size and
get_gcd.

We see that:
- S1: (nchan >= 1) and (nrank >= 1).
- S2: get_gcd returns 0 only when both arguments are 0.

Let:
- X be get_gcd(new_obj_size, nrank * nchan).
- Y be get_gcd(nchan, new_obj_size).

Suppose:
- H1: get_gcd returns the greatest common divisor of its arguments.
- H2: (nrank * nchan) does not exceed UINT_MAX.

We prove (Y != 1) implies (X != 1) with the following steps:
- Suppose L0: (Y != 1). We have to show (X != 1).
- By H1, Y is the greatest common divisor of nchan and new_obj_size.
  In particular, we have L1: Y divides nchan and new_obj_size.
- By H2, we have L2: nchan divides (nrank * nchan)
- By L1 and L2, we have L3: Y divides (nrank * nchan) and
  new_obj_size.
- By H1 and L3, we have L4: (Y <= X).
- By S1 and S2, we have L5: (Y != 0).
- By L0 and L5, we have L6: (Y > 1).
- By L4 and L6, we have (X > 1) and thus (X != 1), which concludes.

R2 was also tested for all values of new_obj_size, nrank, and nchan
between 0 and 2000.

This redundant condition was found using TrustInSoft Analyzer.

Signed-off-by: Julien Cretin <julien.cretin@trust-in-soft.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-14 11:21:44 +02:00
Didier Pallard
937cca79c9 mem: change default per socket memory allocation
Currently, if there is more memory in hugepages than the amount
requested by dpdk application, the memory is allocated by taking as much
memory as possible from each socket, starting from first one.
For example if a system is configured with 8 GB in 2 sockets (4 GB per
socket), and dpdk is requesting only 4GB of memory, all memory will be
taken in socket 0 (that have exactly 4GB of free hugepages) even if some
cores are configured on socket 1, and there are free hugepages on socket
1...

Change this behaviour to allocate memory on all sockets where some cores
are configured, spreading the memory amongst sockets using following
ratio per socket:
N° of cores configured on the socket / Total number of configured cores
* requested memory
If this new algorithm fails, it defaults to previous behaviour.

This algorithm is used when memory amount is specified globally using
-m option. Per socket memory allocation can always be done using
--socket-mem option.

It is implemented only for Linux as BSD part looks not to be ready for NUMA.

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Venky Venkatesan <venky.venkatesan@intel.com>
2014-05-14 11:06:49 +02:00
Olivier Matz
1d64e46eb8 ring: allow to initialize without memzone
Allow to initialize a ring in an already allocated memory. The rte_ring_create()
function that allocates a ring in a rte_memzone is still available and now uses
the new rte_ring_init() function in order to factorize the code.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-05-13 14:40:09 +02:00
Olivier Matz
a182620042 ring: get size in memory
Add a function that returns the amount of memory occupied by a rte_ring
structure and its object table. This commit prepares the next one that
will allow to allocate a ring dynamically.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-05-13 14:35:06 +02:00
David Marchand
5d8751b83b pci: remove deprecated RTE_EAL_UNBIND_PORTS option
RTE_EAL_UNBIND_PORTS was deprecated in DPDK 1.4.0 and removed in 1.6.0, but the
code was not removed.

The bind/unbind operations should not be handled by the eal.
These operations should be either done outside of dpdk or inside the PMDs
themselves as these are their problems.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-05-13 13:21:48 +02:00
David Marchand
ef6352833a pci: move RTE_PCI_DRV_FORCE_UNBIND handling out of #ifdef
Move RTE_PCI_DRV_FORCE_UNBIND flag handling out of RTE_EAL_UNBIND_PORTS section.
This had nothing to do with RTE_EAL_UNBIND_PORTS anyway.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-05-13 13:21:29 +02:00
David Marchand
d8deab8a88 pci: pci_switch_module cleanup
The pci_switch_module() function should only do what its name tells: unbind pci
devices and rebind them on the specified kernel driver.
Hence, it can not call pci_uio_map_resource().

Call to pci_uio_map_resource() should be moved to rte_eal_pci_probe_one_driver()
so that we can factorize code.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-05-13 13:21:22 +02:00
David Marchand
d3e6faf840 pci: rework interrupt fd init and fix fd leak
A fd leak happens in pci_map_resource when multiple bars are mapped.
Fix this by closing fd unconditionnally in this function and open the
intr_handle fd in pci_uio_map_resource instead.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-05-13 13:20:47 +02:00
David Marchand
99d44c7e26 pci: remove virtio-uio workaround
virtio-uio does not need eal to map bars from uio device, so remove flag
RTE_PCI_DRV_NEED_IGB_UIO.
Then, move virtio-uio workaround out of generic eal_pci.c for linux
implementation.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-05-13 13:20:35 +02:00
David Marchand
8e5f9df258 pci: align bsd implementation on linux
bsd implementation lacks check on driver flags, fix this.
Besides, check on BAR0 is not needed and could cause trouble for devices that
have no BAR0.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-05-13 13:20:29 +02:00
David Marchand
8990aac31d pci: fix potential mem leaks
Looking at bsd implementation, we can see that there are some potential mem
leaks in linux implementation. Fix them.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-05-13 13:18:17 +02:00
Burakov, Anatoly
0c3977dd15 mem: take reserved hugepages into account
Some applications reserve hugepages for later use,
but DPDK doesn't take reserved pages into account
when calculating number of available number of hugepages.

This patch adds reading from "resv_hugepages" file
in addition to "free_hugepages".

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-13 10:11:18 +02:00
Thomas Monjalon
536ba2d8a8 version: 1.7.0-rc0
Start development cycle for version 1.7.0.

This new development workflow introduces a new versioning scheme.
Instead of having releases r0, r1, r2, etc, there will be release
candidates. Last number has special meanings:
< 16 numbers are reserved for release candidates (RTE_VER_SUFFIX is -rc)
16 is reserved for the release (RTE_VER_SUFFIX must be unset)
> 16 numbers can be used locally (RTE_VER_SUFFIX must be set)

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-05-13 10:11:17 +02:00
Wang Sheng-Hui
b20539d687 eal: print maximum and detected lcores
Print the maximum lcore(s) as configured, and the number of lcore(s) detected
on eal cpu init as debug info besides the not separate detected/not-detected
lcore info.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
[Thomas: add BSD part]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-05 18:04:05 +02:00
Didier Pallard
2ef79b212a eal: remove useless output of undetected lcores
Increasing maximum number of lcores gives a huge place to undetected
lcores in output traces. Moreover, this output does not give any
interesting information, since list of undetected lcores can be deduced
from list of detected ones.
So remove output related to undetected cores.

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-05 18:03:09 +02:00
David Marchand
69020660c3 eal: remove unused config fields
There is no need for a 'magic' field in struct rte_config, as this part of the
structure is local to each process. All threads of a process are synchronised
because of the run_once atomic.
So remove this field, as it is only adding confusion when reading code that
references 'magic' field from struct rte_mem_config.

Besides, there is no reference about the 'version' field, so remove it as well.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-05-05 18:02:56 +02:00
Thomas Monjalon
fa97553a69 version: 1.6.0r2
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-05-01 22:58:21 +02:00
Thomas Monjalon
ef4fb79bee eal: fix usage description for bsd
A line was forgotten when removing blacklist option in commit
"use devargs for vdev and PCI lists with bsd" (cd25fb0863).

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-30 23:20:01 +02:00
Maxime Leroy
9ad0e24942 eal: fix vdev allocation on non-0 numa socket
vdev ethdev can not be allocated on a numa socket that is not socket 0.
The reason comes from rte_eth_dev_allocate() which uses rte_socket_id() to
identify the socket on which vdev driver data should be allocated.
However, at this initialization step, rte_socket_id() always returns 0.

Looking at rte_socket_id(), it needs rte_lcore_id() which uses the per-core
global _lcore_id variable. This variable is initialised by
eal_thread_init_master.

So eal_thread_init_master should be called before rte_eal_vdev_init().

Signed-off-by: Maxime Leroy <maxime.leroy@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-30 22:59:17 +02:00
David Marchand
c87215d045 malloc: simplify heap initialisation
There should be no real need for this initialised field as the whole structure
is set to 0 in rte_config_init() by primary process, and secondary processes
wait for this to happen before anything else (looking at mem_config magic).

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-30 11:31:22 +02:00
David Marchand
eeeebe4cd9 malloc: fix race condition on numa_socket field
We don't really need this field as it is only used when creating the memzone
object associated to this heap.
Removing numa_socket field makes things simpler and remove race condition.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-30 11:29:00 +02:00
David Marchand
8c910d01ae kni: fix build with debian kernel 3.2.57-2
Following debian kernel headers upgrade to 3.2.57, pci capability accessors
have been backported (upstream commit 8c0d3a02c1309eb6112d2e7c8172e8ceb26ecfca,
("PCI: Add accessors for PCI Express Capability", v3.7-rc1)).

It results in the same compilation error as redhat 6.x.
However, there is no clear way to determine we are building on a debian kernel.
So, rather than determine if we are building on a distribution kernel, look at
PCI_EXP_LNKSTA2 that appeared in this upstream commit.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-30 01:31:30 +02:00
Olivier Matz
22139b107b nic_uio: fix build with freebsd 10
Compiling the DPDK under FreeBSD gives the following error due to a
missing include <sys/rwlock.h>.

In file included from nic_uio.c:52:
@/vm/vm_pager.h:126:2: error: implicit declaration of function 'rw_assert' is invalid in C99
      [-Werror,-Wimplicit-function-declaration]
        VM_OBJECT_ASSERT_WLOCKED(object);
        ^
@/vm/vm_object.h:226:2: note: expanded from macro 'VM_OBJECT_ASSERT_WLOCKED'
        rw_assert(&(object)->lock, RA_WLOCKED)
        ^
In file included from nic_uio.c:52:
@/vm/vm_pager.h:126:2: error: use of undeclared identifier 'RA_WLOCKED'
@/vm/vm_object.h:226:29: note: expanded from macro 'VM_OBJECT_ASSERT_WLOCKED'
        rw_assert(&(object)->lock, RA_WLOCKED)
                                   ^
In file included from nic_uio.c:52:
@/vm/vm_pager.h:143:2: error: use of undeclared identifier 'RA_WLOCKED'
        VM_OBJECT_ASSERT_WLOCKED(object);
        ^
@/vm/vm_object.h:226:29: note: expanded from macro 'VM_OBJECT_ASSERT_WLOCKED'
        rw_assert(&(object)->lock, RA_WLOCKED)
                                   ^
In file included from nic_uio.c:52:
@/vm/vm_pager.h:167:2: error: use of undeclared identifier 'RA_WLOCKED'
        VM_OBJECT_ASSERT_WLOCKED(object);
        ^
@/vm/vm_object.h:226:29: note: expanded from macro 'VM_OBJECT_ASSERT_WLOCKED'
        rw_assert(&(object)->lock, RA_WLOCKED)
                                   ^
In file included from nic_uio.c:52:
@/vm/vm_pager.h:190:2: error: use of undeclared identifier 'RA_WLOCKED'
        VM_OBJECT_ASSERT_WLOCKED(m->object);
        ^
@/vm/vm_object.h:226:29: note: expanded from macro 'VM_OBJECT_ASSERT_WLOCKED'
        rw_assert(&(object)->lock, RA_WLOCKED)
                                   ^

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-30 01:31:30 +02:00
Olivier Matz
dd9a36772a devargs: allow to provide arguments per pci device for bsd
The bsdapp part was missing in commit 8e245de6ca.

Add the ability to pass some specific initialization arguments to PCI
devices at start-up.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-30 01:31:30 +02:00
Olivier Matz
4bf3fe634a devargs: replace --use-device option by --pci-whitelist and --vdev for bsd
The bsdapp part was missing in commit cac6d08c8b.

This commit splits the "--use-device" option in two new options:

- "--pci-whitelist or -w": add a PCI device in the white list
- "--vdev": instanciate a new virtual device

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-30 01:31:19 +02:00
Olivier Matz
c431df41bc devargs: use a comma to separate key/values for bsd
The bsdapp part was missing in commit a8b97e3a1d.

This commit changes the API of --use-device command line argument.
It changes the separators from ';' to ','.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-30 01:31:08 +02:00
Olivier Matz
cd25fb0863 devargs: use devargs for vdev and PCI lists with bsd
The bsdapp part was missing in commit 1220458951.

This patch removes old whitelist code and use the newly introduced
rte_devargs to get the PCI white list, the PCI black list and the list
of virtual devices.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-30 01:30:54 +02:00
Olivier Matz
3944edad8b devargs: build common functions for bsd
The bsd part was missing in commit bf6dea0e04.

This commit introduces a new API for storing device arguments given by
the user. It only adds the framework and the test.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-30 01:28:19 +02:00
Olivier Matz
f070eb9a21 pci: rename device and driver lists for bsd
The bsdapp part was missing in commit 5b1f4a67dd.

To avoid confusion with virtual devices, rename device_list as
pci_device_list and driver_list as pci_driver_list.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-30 01:28:19 +02:00
Olivier Matz
9a9f7f8d3a mem: get dummy physical address in case of --no-huge with bsd
The bsdapp part was missing in commit 57c24af85d.

This commit adds a dummy rte_mem_virt2phy() to fix the compilation of
DPDK under BSD. This function is only used when the debug option
"--no-huge" is given, to get the physical address of mempools in memory.

As a result, it seems acceptable for now to implement a dummy function
to fix the compilation as the usual case (using contigmem module) works
properly.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-30 01:28:19 +02:00
Olivier Matz
eca9a994bc mem: get hugepages config for bsd
The bsdapp part was missing in c5e9eeca5a.

This commit allows external libraries and applications to know if
hugepages are enabled.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-30 01:28:18 +02:00
Pascal Mazon
894fd42e7f eal: do not try to load library from current directory
When loading a library "libfoo.so" (depending on "libbar.so", located in an
entirely different folder), with a LD_LIBRARY_PATH=/path/to/libfoo.so", it
returns an error:

 EAL: ./libfoo.so: cannot open shared object file: No such file or directory

If the first dlopen() fails (here, because it can't find all dependencies),
the code requires for a second dlopen() that looks for "./libfoo.so". It
turns on pathname matching, which does not use LD_LIBRARY_PATH. As a result,
it fails because it cannot find "./libfoo.so".

The error message matches the error of the second dlopen(), not the first's.

Do not try to look for a different library ("./"-prefixed) than the one
provided in argument. Let the dynamic library management handle it, just
provide an appropriate LD_LIBRARY_PATH.

Signed-off-by: Pascal Mazon <pascal.mazon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-18 00:38:38 +02:00
David Marchand
4f04db8b89 eal: check coremask against detected lcores
lcores that are set in coremask should be checked against lcores detected on
system. This way, we won't need to check them later.

Besides, if specifying an unavailable lcore, we currently panic in
eal_thread_loop() because pthread_setaffinity_np fails.
So this check will return an error with a more explicit message in
eal_parse_coremask().

"EAL: pthread_setaffinity_np failed
 PANIC in eal_thread_loop():
 cannot set affinity"

becomes :

"EAL: lcore 4 unavailable
 EAL: invalid coremask"

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-04-18 00:38:37 +02:00
Neil Horman
5d52944803 eal: fix check of all requested CPU features
Only the last feature was checked since commit 99f2cdf9ca
(eal: fix %rbx corruption and simplify the code)

The return code for rte_cpu_get_flag_enabled is only checked on the termination
of the for loop that it is called inside, but should be checked for every
iteration it makes through the for loop.  This is caused by some silly missing
brackets.  Simply add them in

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Pablo De Lara Guarch  <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-18 00:20:04 +02:00
Jean-Mickael Guerin
5578ace03c kni: more compatibility with RHEL 6.4/6.5
For RH 6.5:
- always include mdio.h to get the definitions of MDIO_EEE, ETHTOOL_GEEE
- is_link_local_ether_addr(), pcie_capability_clear_and_set_word(),  and
  ether_addr_equal() have been backported

For RH 6.4:
- same issue with ether_addr_equal()
- here ETH_GEE is defined without having the functions.

igb_ethtool.c:2441: error: implicit declaration of function ‘mmd_eee_adv_to_ethtool_adv_t’

Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-18 00:20:04 +02:00
Jean-Mickael Guerin
22367416b0 kni: disable FDB operations on RHEL 6.5
On RH 6.5:
igb_main.c:2298: error: unknown field ‘ndo_fdb_add’ specified in
initializer

FDB ops are present in RH 6.5 via the extension of netdev, so add the
ifdef inside the netdev ops definition of igb.

However, FDB functions are not set for RHEL 6.5: the implementation
relies on dev_mc_add_excl API which has not been backported.

Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-18 00:20:04 +02:00
Aaro Koskinen
74d62e73df kni: fix build with kernel 3.15
rxhash has been renamed to hash. In 3.14 and newer, we can use
skb_set_hash().

Signed-off-by: Aaro Koskinen <aaro.koskinen@nsn.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-17 18:00:32 +02:00
Stephen Hemminger
524f073a10 ivshmem: fix errors identified by hardening
Need to pass mode argument to open with O_CREAT.
Must check return value from ftruncate().

Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-17 15:48:44 +02:00
Olivier Matz
396b69e56a vdev: allow external registration of virtual device drivers
The registration of an external vdev driver (a .so library) is done in a
function that has the ((constructor)) attribute. This function is called
when dlopen(driver.so) is invoked.

As a result, we need to do the dlopen() before calling
rte_eal_vdev_init() that calls the initialization functions of all
registered drivers.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-11 16:17:57 +02:00
Olivier Matz
4c39baf297 vdev: new registration API
Instead of having a list of virtual device drivers in EAL code, add an
API to register drivers. Thanks to this new registration method, we can
remove the references to pmd_ring, pmd_pcap and pmd_xenvirt in EAL code.
This also enables the ability to register a virtual device driver as
a shared library.

The registration is done in an init function flaged with
__attribute__((constructor)). The new convention is to name this
function rte_pmd_xyz_init(). The per-device init function is renamed
rte_pmd_xyz_devinit().

By the way the internal PMDs are now also .so/standalone ready. Let's do
it later on. It will be required to ease maintenance.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-11 16:17:42 +02:00
Olivier Matz
9fa5e2b026 vdev: rename nonpci_devs as vdev
The name "nonpci_devs" for virtual devices is ambiguous as a physical
device can also be non-PCI (ex: usb, sata, ...). A better name for this
file is "vdev" as it only deals with virtual devices.

This patch doesn't introduce any change except renaming.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-11 14:05:08 +02:00
Olivier Matz
8e245de6ca devargs: allow to provide arguments per pci device
Some PCI drivers may require some specific initialization arguments at
start-up.

Even if unused today, adding this feature seems coherent with virtual
devices in order to provide a full-featured rte_devargs framework. In
the future, it could be added in pmd_ixgbe or pmd_igb for instance to
enable debug of drivers or setting a specific operating mode at
start-up.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-10 15:50:34 +02:00
Olivier Matz
cac6d08c8b devargs: replace --use-device option by --pci-whitelist and --vdev
This commit splits the "--use-device" option in two new options:

- "--pci-whitelist or -w": add a PCI device in the white list
- "--vdev": instanciate a new virtual device

Before the patch, the same option "--use-device" was used for these 2
use-cases.

By the way, we also add "--pci-blacklist" in addition to the existing
"-b" for coherency with the whitelist parameter.

Test result:

echo 100 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo 100 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
./app/test -c 0x15 -n 3 -m 64
RTE>>eal_flags_autotest
[...]
Test OK

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-10 15:50:34 +02:00
Olivier Matz
a8b97e3a1d devargs: use a comma instead of semicolon to separate key/values
This commit changes the API of --use-device command line argument.
It changes the separators from ';' to ','. Indeed, ';' is not the best
choice as this character is also used to separate shell commands,
forcing the user to surround arguments with quotes.

This commit impacts both devargs and kvargs as each of them define
a separator in --use-device argument:

- devargs defines the separator between the device name or pci_id and
   its arguments
- kvargs defines the separator between each key/value pairs in
   arguments for drivers using the kvargs API to parse their arguments

The modification of devargs and kvargs is done in one commit to keep
the coherency of --use-device.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-10 15:50:11 +02:00
Olivier Matz
1220458951 devargs: use devargs for vdev and PCI whitelist/blacklist
Remove old whitelist code:
- remove references to rte_pmd_ring, rte_pmd_pcap and pmd_xenvirt in
  is_valid_wl_entry() as we want to be able to register external virtual
  drivers as a shared library. Moreover this code was duplicated with
  dev_types[] from eal_common_pci.c
- eal_common_whitelist.c was badly named: it was able to process PCI
  devices white list and the registration of virtual devices
- the parsing code was complex: all arguments were prepended in
  one string dev_list_str[4096], then split again

Use the newly introduced rte_devargs to get:
- the PCI white list
- the PCI black list
- the list of virtual devices

Rework the tests:
- a part of the whitelist test can be removed as it is now tested
  in app/test/test_devargs.c
- the other parts are just reworked to adapt them to the new API

This commit induce a small API modification: it is not possible to specify
several devices per "--use-device" option. This notation was anyway a bit
cryptic. Ex:
  --use-device="eth_ring0,eth_pcap0;iface=ixgbe0"
  now becomes:
  --use-device="eth_ring0" --use-device="eth_pcap0;iface=ixgbe0"

On the other hand, it is now possible to work in PCI blacklist mode and
instanciate virtual drivers, which was not possible before this patch.

Test result:

./app/test -c 0x15 -n 3 -m 64
RTE>>devargs_autotest
EAL: invalid PCI identifier <08:1>
EAL: invalid PCI identifier <00.1>
EAL: invalid PCI identifier <foo>
EAL: invalid PCI identifier <>
EAL: invalid PCI identifier <000f:0:0>
Test OK

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-04-10 14:59:34 +02:00