Commit Graph

1196 Commits

Author SHA1 Message Date
Pawel Wodkowski
46fb436836 bond: add mode 4
This patch set add support for dynamic link aggregation (mode 4) to the
librte_pmd_bond library. This mode provides auto negotiation/configuration
of peers and well as link status changes monitoring using out of band
LACP (link aggregation control protocol) messages. For further details of
LACP specification see the IEEE 802.3ad/802.1AX standards. It is also
described here
https://www.kernel.org/doc/Documentation/networking/bonding.txt.

In this implementation we have an array of mode 4 settings for each slave.
There is also assumption that for every port is one aggregator (it might
be unused if better is found).

Difference in this implementation vs Linux implementation:
- this implementation it is not directly based on state machines but current
  state is calculated from actor and partner states (and other things too).

Some implementation details:
- during rx burst every packet Is checked if this is LACP or marker packet.
  If it is LACP frame it is passed to mode 4 logic using slaves rx ring  and
  removed from rx buffer before it is returned
- in tx burst, packets from mode 4 (if any) are injected into each slave.
- there is a timer running in background to process/produce mode 4
  frames form rx/to tx functions.

Some requirements for this mode:
- for LACP mode to work rx and tx burst functions must be invoked
  at least in 100ms intervals
- provided buffer to rx burst should be at least 2x slave count size. This is
  not needed but might increase performance especially during initial
  handshake.

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
2014-11-27 21:20:58 +01:00
Sujith Sankar
80083092aa enic: fix vfio inclusion
Inclusion of vfio.h was giving compilation errors if kernel version is less
than 3.6.0 and if RTE_EAL_VFIO was in config.

Removed inclusion of vfio.h and replaced RTE_EAL_VFIO with VFIO_PRESENT.

Reported-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-27 21:20:42 +01:00
Thomas Monjalon
f97ae91fcb enic: fix dependencies
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-27 19:12:43 +01:00
Thomas Monjalon
d07180f211 net: fix conflict with libc
It was impossible to include netinet/in.h and rte_ip.h
because the IP protocols were redefined.
It is removed because useless.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
2014-11-27 19:03:27 +01:00
Keith Wiles
1665d6310d mempool: avoid dump crash with null pointer
Check the FILE *f and rte_mempool *mp pointers for NULL.

Signed-off-by: Keith Wiles <keith.wiles@windriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-11-27 17:30:20 +01:00
Sergio Gonzalez Monroy
fdf20fa7be add prefix to cache line macros
CACHE_LINE_SIZE is a macro defined in machine/param.h in FreeBSD and
conflicts with DPDK macro version.
Adding RTE_ prefix to avoid conflicts.
CACHE_LINE_MASK and CACHE_LINE_ROUNDUP are also prefixed.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
[Thomas: updated on HEAD, including PPC]
2014-11-27 16:21:11 +01:00
Sergio Gonzalez Monroy
be04c70727 eal/bsd: remove unused HPET support
The HPET support in the BSD EAL was copied directly from the Linux version,
but did not actually work on FreeBSD. We replace this old code with a simple
compiler message that informs the user that we don't support HPET on BSD if
they enable such support in the build-time configuration file.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
2014-11-27 16:21:11 +01:00
Sergio Gonzalez Monroy
7343f7498f eal/bsd: use sysctl to get TSC frequency
BSD provides the TSC frequency value through sysctl.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
2014-11-27 16:21:01 +01:00
David Marchand
f9462cf0b9 eal: no more bare metal environment
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-11-27 13:09:51 +01:00
Thomas Monjalon
df2e9cec22 mbuf: sort TCP segmentation offload flag
Due to reordering conflicts, the TSO flag was not sorted.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-27 10:39:21 +01:00
David Marchand
63dd47b9a5 eal/linux: fix remaining checks for 64-bit architectures
RTE_ARCH_X86_64 can not be used as a way to determine if we are building for
64bits cpus. Instead, RTE_ARCH_64 should be used.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
2014-11-27 08:46:22 +01:00
jingjing.wu
be646317d7 i40e: add ethertype filter
Handle the RTE_ETH_FILTER_ADD and RTE_ETH_FILTER_DELETE operations
on ethertype filter.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
2014-11-26 23:24:43 +01:00
jingjing.wu
07f02cdb08 ethdev: add ethertype filter
A new structure of ethertype filter is defined in rte_eth_ctrl.h
for filter_ctrl api

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
2014-11-26 23:24:43 +01:00
Sujith Sankar
df2fd00e29 enic: build integration
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
[Thomas: enable for BSD - not tested]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-26 23:07:11 +01:00
Sujith Sankar
fefed3d1e6 enic: new driver
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-11-26 23:07:11 +01:00
Sujith Sankar
9913fbb91d enic/base: common code
VNIC common code is partially shared with ENIC kernel mode driver.

Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-11-26 23:07:11 +01:00
Sujith Sankar
c73ddb41e7 enic: license
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
2014-11-26 23:07:11 +01:00
Chao Zhu
d05e7115f4 mem: support layout of IBM Power
The mmap of hugepage files on IBM Power starts from high address to low
address. This is different from x86. This patch modified the memory
segment detection code to get the correct memory segment layout on Power
architecture. This patch also added a commond ARCH_PPC_64 definition for
64 bit systems.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
2014-11-26 21:50:10 +01:00
Chao Zhu
b77b563972 mem: add huge page sizes for IBM Power
IBM Power architecture has different huge page sizes (16MB, 16GB) than
x86.This patch defines RTE_PGSIZE_16M and RTE_PGSIZE_16G in the
rte_page_sizes enum variable and adds huge page size support of DPDK
for IBM Power architecture.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
2014-11-26 21:50:10 +01:00
Chao Zhu
f0fa947c61 eal/linux: disable iopl operation for IBM Power
iopl() call is mostly for the i386 architecture. In Power and other
architecture, it doesn't exist. This patch modified rte_eal_iopl_init()
and make it return -1 for Power and other architecture. Thus
rte_config.flags will not contain EAL_FLG_HIGH_IOPL flag for other
architecture.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
2014-11-26 21:50:10 +01:00
Chao Zhu
9ae1553856 eal/ppc: cpu flag checks for IBM Power
IBM Power processor doesn't have CPU flag hardware registers. This patch
uses aux vector software register to get CPU flags and add CPU flag
checking support for IBM Power architecture.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
2014-11-26 21:50:10 +01:00
Chao Zhu
7302a1724b eal/ppc: vector memcpy for IBM Power
The SSE based memory copy in DPDK only support x86. This patch adds
altivec based memory copy functions for IBM Power architecture. This
patch includes altivec.h which requires GCC version>= 4.8.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
2014-11-26 21:50:10 +01:00
Chao Zhu
d464c8af66 eal/ppc: spinlock operations for IBM Power
This patch adds spinlock operations for IBM Power architecture.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
2014-11-26 21:50:09 +01:00
Chao Zhu
529c7f5c8c eal/ppc: prefetch operations for IBM Power
This patch add architecture specific prefetch operations for IBM Power
architecture.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
2014-11-26 21:50:09 +01:00
Chao Zhu
85997d60b1 eal/ppc: cpu cycle operations for IBM Power
IBM Power architecture doesn't have TSC register to get CPU cycles. This
patch implements the time base register read instead of TSC register of
x86 on IBM Power architecture.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
2014-11-26 21:50:09 +01:00
Chao Zhu
de6fff135e eal/ppc: byte order operations for IBM Power
This patch adds architecture specific byte order operations for IBM Power
architecture. Power architecture support both big endian and little
endian. This patch also adds a RTE_ARCH_BIG_ENDIAN micro.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
2014-11-26 21:50:09 +01:00
Chao Zhu
05c3fd7110 eal/ppc: atomic operations for IBM Power
This patch adds architecture specific atomic operation file for IBM
Power architecture CPU.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
2014-11-26 21:50:09 +01:00
Olivier Matz
1224decaa4 ixgbe: support TCP segmentation offload
Implement TSO (TCP segmentation offload) in ixgbe driver. The driver is
now able to use PKT_TX_TCP_SEG mbuf flag and mbuf hardware offload infos
(l2_len, l3_len, l4_len, tso_segsz) to configure the hardware support of
TCP segmentation.

In ixgbe, when doing TSO, the IP length must not be included in the TCP
pseudo header checksum. A new function ixgbe_fix_tcp_phdr_cksum() is
used to fix the pseudo header checksum of the packet before giving it to
the hardware.

In the patch, the tx_desc_cksum_flags_to_olinfo() and
tx_desc_ol_flags_to_cmdtype() functions have been reworked to make them
clearer. This should not impact performance as gcc (version 4.8 in my
case) is smart enough to convert the tests into a code that does not
contain any branch instruction.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-26 19:35:56 +01:00
Olivier Matz
4199fdea60 mbuf: generic support for TCP segmentation offload
Some of the NICs supported by DPDK have a possibility to accelerate TCP
traffic by using segmentation offload. The application prepares a packet
with valid TCP header with size up to 64K and deleguates the
segmentation to the NIC.

Implement the generic part of TCP segmentation offload in rte_mbuf. It
introduces 2 new fields in rte_mbuf: l4_len (length of L4 header in bytes)
and tso_segsz (MSS of packets).

To delegate the TCP segmentation to the hardware, the user has to:

- set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
  PKT_TX_TCP_CKSUM)
- set the flag PKT_TX_IPV4 or PKT_TX_IPV6
- set PKT_TX_IP_CKSUM if it's IPv4, and set the IP checksum to 0 in
  the packet
- fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
- calculate the pseudo header checksum without taking ip_len in account,
  and set it in the TCP header, for instance by using
  rte_ipv4_phdr_cksum(ip_hdr, ol_flags)

The API is inspired from ixgbe hardware (the next commit adds the
support for ixgbe), but it seems generic enough to be used for other
hw/drivers in the future.

This commit also reworks the way l2_len and l3_len are used in igb
and ixgbe drivers as the l2_l3_len is not available anymore in mbuf.

Signed-off-by: Mirek Walukiewicz <miroslaw.walukiewicz@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-26 19:35:56 +01:00
Olivier Matz
6006818cfb net: new checksum functions
Introduce new functions to calculate checksums. These new functions
are derivated from the ones provided csumonly.c but slightly reworked.
There is still some room for future optimization of these functions
(maybe SSE/AVX, ...).

This API will be modified in tbe next commits by the introduction of
TSO that requires a different pseudo header checksum to be set in the
packet.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-26 19:35:55 +01:00
Olivier Matz
e4a1c50e69 mbuf: get the name of offload flags
In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
The issue is that the list of flags in the application has to be
synchronized with the flags defined in rte_mbuf.h.

This patch introduces 2 new functions rte_get_rx_ol_flag_name()
and rte_get_tx_ol_flag_name() that returns the name of a flag from
its mask. It also fixes rxonly.c to use this new functions and to
display the proper flags.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
2014-11-26 19:35:55 +01:00
Olivier Matz
b161f72107 mbuf: remove too specific flags mask
This definition is specific to Intel PMD drivers and its definition
"indicate what bits required for building TX context" shows that it
should not be in the generic rte_mbuf.h but in the PMD driver.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-26 19:35:55 +01:00
Olivier Matz
b029fd236d mbuf: add help about Tx checksum flags
Describe how to use hardware checksum API.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-26 19:35:55 +01:00
Olivier Matz
340e52d9bd mbuf: reorder Tx flags
The tx mbuf flags are now ordered from the lowest value to the
the highest. Add comments to explain where to add new flags.

By the way, move the PKT_TX_VXLAN_CKSUM at the right place.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-26 18:54:41 +01:00
Olivier Matz
0e8d6a29b4 ixgbe: fix flags variable size to 64 bits
Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the
packet flags are now 64 bits wide. Some occurences were forgotten in
the ixgbe driver.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-26 18:54:15 +01:00
Olivier Matz
b22d99894a igb/ixgbe: fix IP checksum calculation
According to Intel® 82599 10 GbE Controller Datasheet (Table 7-38), both
L2 and L3 lengths are needed to offload the IP checksum.

Note that the e1000 driver does not need to be patched as it already
contains the fix.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-26 18:49:53 +01:00
Alan Carew
75bf9c8e82 power: integration of vm power management
librte_power now contains both rte_power_acpi_cpufreq and rte_power_kvm_vm
implementations.

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2014-11-26 17:27:04 +01:00
Alan Carew
210c383e24 power: packet format for vm power management
Provides a command packet format for host and guest.

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2014-11-26 17:27:04 +01:00
Alan Carew
445c6528b5 power: common interface for guest and host
Moved the current librte_power implementation to rte_power_acpi_cpufreq, with
renaming of functions only.
Added rte_power_kvm_vm implementation to support Power Management from a VM.

librte_power now hides the implementation based on the environment used.
A new call rte_power_set_env() can explicidly set the environment, if not
called then auto-detection takes place.

rte_power_kvm_vm is subset of the librte_power APIs, the following is supported:
 rte_power_init(unsigned lcore_id)
 rte_power_exit(unsigned lcore_id)
 rte_power_freq_up(unsigned lcore_id)
 rte_power_freq_down(unsigned lcore_id)
 rte_power_freq_min(unsigned lcore_id)
 rte_power_freq_max(unsigned lcore_id)

The other unsupported APIs return -ENOTSUP

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2014-11-26 17:27:04 +01:00
Alan Carew
cd0d5547e8 power: vm communication channels in guest
Allows for the opening of Virtio-Serial devices on a VM, where a DPDK
application can send packets to the host based monitor. The packet formatted is
specified in channel_commands.h
Each device appears as a serial device in path
/dev/virtio-ports/virtio.serial.port.<agent_type>.<lcore_num> where each lcore
in a DPDK application has exclusive to a device/channel.
Each channel is opened in non-blocking mode, after a successful open a test
packet is send to the host to ensure the host side is monitoring.

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2014-11-26 17:27:03 +01:00
Anatoly Burakov
c4f136db8e eal/linux: map pci memory resources after hugepages
Multi-process DPDK application must mmap hugepages and PCI resources
into the same virtual address space. By default the virtual addresses
are chosen by the primary process automatically when calling the mmap.
But sometimes the chosen virtual addresses aren't usable in secondary
process - for example, secondary process is linked with more libraries
than primary process, and the library occupies the same address space
that the primary process has requested for PCI mappings.

This patch makes EAL try and map PCI BARs right after the hugepages
(instead of location chosen by mmap) in virtual memory, so that PCI BARs
have less chance of ending up in random places in virtual memory.

Signed-off-by: Liang Xu <liang.xu@cinfotech.cn>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-25 18:16:41 +01:00
Simon Kuenzer
fcbda6d4b0 eal: add option --master-lcore
Enable users to specify the lcore id that is used as master lcore.

Signed-off-by: Simon Kuenzer <simon.kuenzer@neclab.eu>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-25 14:06:40 +01:00
Patrick Lu
5583037a79 eal: get relative core index
EAL -c option allows the user to enable any lcore in the system.
Often times, the user app wants to know 1st enabled core, 2nd
enabled core, etc, rather than phyical core ID (rte_lcore_id().)

The new API rte_lcore_index() will return an index from enabled lcores
starting from zero.

Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-25 13:33:35 +01:00
Didier Pallard
d888cb8b96 eal: add core list input format
In current version, used cores can only be specified using a bitmask.
It will now be possible to specify cores in 2 different ways:
- Using a bitmask (-c [0x]nnn): bitmask must be in hex format
- Using a list in following format: -l <c1>[-c2][,c3[-c4],...]

The letter -l can stand for lcore or list.

-l 0-7,16-23,31 being equivalent to -c 0x80FF00FF

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-25 13:33:35 +01:00
Thomas Monjalon
8552f950ee eal: factorize configuration adjustment
Some adjustments are done after options parsing and are common
to Linux and BSD.

Remove process_type adjustment in rte_config_init() because
it is already done in eal_parse_args().
eal_proc_type_detect() is kept duplicated because it open a
file descriptor which is used later in each eal.c.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-25 13:33:35 +01:00
Thomas Monjalon
f91fc65d12 eal: factorize options sanity check
No need to have duplicated check for common options.

Some flags are set for options -c and -m in order to simplify the
checks.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-25 13:33:35 +01:00
Thomas Monjalon
341befa2a2 eal: factorize internal config reset
Now that internal config structure is common to Linux and BSD,
we can have a common function to initialize it.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-25 13:33:31 +01:00
Thomas Monjalon
33e25b3394 eal: fix header guards
Some guards are missing or have a wrong name.
Others have LINUXAPP in their name but are now common.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-25 13:30:23 +01:00
Thomas Monjalon
8828a3210c eal: factorize common headers
No need to have different headers for Linux and BSD.
These files are identicals with exception of internal config which has
uio and vfio fields only useful for Linux.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-25 13:16:24 +01:00
Thomas Monjalon
4b5a6f916a eal: move internal headers in source directory
The directory include/ should be reserved to public headers.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-25 13:16:16 +01:00
Thomas Monjalon
8bb25d58b6 ethdev: fix doxygen comments about RSS
The parameters port_id didn't match with comments about port.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-25 12:14:57 +01:00
Thomas Monjalon
4722d7b446 bond: fix doxygen
There is no parameter delay_ms in *_delay_get functions.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-25 12:14:48 +01:00
Ouyang Changchun
4743e40ce0 pci: new ixgbe devices
EAL misses 4 device ID but base codes support them, so add them into EAL.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
2014-11-25 10:30:15 +01:00
Jingjing Wu
d8b90c4eab i40e: take flow director flexible payload configuration
configure flexible payload and flex mask in i40e driver
It includes arguments verification and HW setting.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:06:39 +01:00
Jingjing Wu
4af0a28dc5 ethdev: add flow director flexible payload setting in port config
add flexible payload setting in eth_conf

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:06:04 +01:00
Jingjing Wu
8d45466d58 i40e: get flow director statistics
implement operation to get flow director statistics in i40e pmd driver

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:06:04 +01:00
Jingjing Wu
abcc810513 ethdev: get flow director statistics
define structures for getting flow director statistics

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:06:04 +01:00
Jingjing Wu
bf78123f30 i40e: get flow director information
implement operation to get flow director information in i40e pmd driver, includes
 - mode
 - supported flow types
 - table space
 - flexible payload size and granularity
 - configured flexible payload and mask information

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:06:04 +01:00
Jingjing Wu
bdfb5d4d35 ethdev: get flow director information
define structures for getting flow director information includes:
 - mode
 - supported flow types
 - table space
 - flexible payload size and granularity
 - configured flexible payload and mask information

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:06:04 +01:00
Jingjing Wu
8ae001ba17 i40e: flush flow director table
implement operation to flush flow director table

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:06:04 +01:00
Jingjing Wu
5a21d9715f i40e: report flow director matching
setting the FDIR flag and report FD_ID plus flex bytes in mbuf if match

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:06:03 +01:00
Jingjing Wu
829a1c2c41 mbuf: extend flow director field
fdir field in rte_mbuf is extended to support flex bytes reported when fdir match.
8 flex bytes can be reported in maximum.
The reported flex bytes are part of flexible payload.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:06:03 +01:00
Jingjing Wu
74da0911f0 i40e: flow director matching counter
support to get the fdir_match counter

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:06:03 +01:00
Jingjing Wu
05999aab4c i40e: add or delete flow director
deal with two operations for flow director
 - RTE_ETH_FILTER_ADD
 - RTE_ETH_FILTER_DELETE
encode the flow inputs to programming packet
sent the packet to filter programming queue and check status on the status report queue

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:06:03 +01:00
Jingjing Wu
a72c1d58de i40e: transition between flow type and pctype
- macros to validate flow_type and pctype
- functions for transition between flow_type and pctype:
  - i40e_flowtype_to_pctype
  - i40e_pctype_to_flowtype

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:06:03 +01:00
Jingjing Wu
d69be32d4d ethdev: structures to add or delete flow director
define structures to add or delete flow director filter
  - struct rte_eth_fdir_filter

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:06:03 +01:00
Jingjing Wu
f05ec7d77e i40e: initialize flow director flexible payload setting
set flexible payload related registers to default value at initialization time.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-25 00:05:54 +01:00
Jingjing Wu
71d35259ff i40e: tear down flow director
release fortville resources on flow director, includes
 - queue 0 pair release
 - release vsi

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-24 23:47:05 +01:00
Jingjing Wu
a778a1fa2e i40e: set up and initialize flow director
set up fortville resources to support flow director, includes
 - queue 0 pair allocated and set up for flow director
 - create vsi
 - reserve memzone for flow director programming packet

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-24 23:46:00 +01:00
Helin Zhang
2abd11d2e9 i40evf: support querying and updating redirection table
Support of updating/querying redirection table has been added for VF.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-24 23:15:51 +01:00
Helin Zhang
66c594904a ethdev: support multiple sizes of redirection table
As 40G NIC supports different sizes (128/512/64 entries) of
redirection table from that (128 entries) of 1G and 10G NICs,
support of multiple sizes of redirection table is needed.
It includes,
* Redefine 'struct rte_eth_rss_reta' in ethdev.
  - To 'struct rte_eth_rss_reta_entry64' which contains 64
    entries and 64 bits mask.
  - Array of above new structure can be used for any number of
    redirection table entries, as long as the number is multiple
    of 64. This is quite flexible for the future expanding of
    redirection table.
* Redefinition of relevant interfaces in ethdev.
  - Interface of reta update has been redefined with new parameters.
  - Interface of reta query has been redefined with new parameters.
* Rework of 1G PMD in igb.
  - reta update has been reworked.
  - reta query has been reworked.
* Rework of 10G PMD in ixgbe.
  - reta update has been reworked.
  - reta query has been reworked.
* Rework of 40G PMD (PF only) in i40e.
  - reta update has been reworked.
  - reta query has been reworked.
* Implement relevant commands in testpmd.

Test report: http://dpdk.org/ml/archives/dev/2014-November/008362.html

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Erlu Chen <erlu.chen@intel.com>
2014-11-24 22:59:15 +01:00
Helin Zhang
a887690986 i40e: add redirection table size in device info
Returning redirection table size has been supported in ops of
'dev_infos_get' for both PF and VF. Default RX/TX configurations
of VF can be returned in ops of 'dev_infos_get', while it was
missed before.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-24 22:59:15 +01:00
Helin Zhang
2144f6630f ixgbe: add redirection table size in device info
As more and more information are different between PF and VF, ops
of 'dev_infos_get' has been implemented respectively. In addition,
returning redirection table size has been supported in it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-24 22:59:15 +01:00
Helin Zhang
1f35bdcbfd igb: add redirection table size in device info
As more and more information are different between PF and VF,
ops of 'dev_infos_get' has been implemented respectively. In
addition, new field of 'reta_size' has been added in
'struct rte_eth_dev_info' for returning redirection table size.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-24 22:59:15 +01:00
Helin Zhang
03fce3d063 i40e: support setting hash lookup table size
Add support of setting hash lookup table size according
to the hardawre capability.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-24 22:59:14 +01:00
Helin Zhang
9097952eae i40evf: fix code style
Fix of several code style issues.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-24 22:59:14 +01:00
Declan Doherty
a45b288ef2 bond: support link status polling
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Tested-by: SunX Jiajia <sunx.jiajia@intel.com>
2014-11-24 21:44:02 +01:00
Declan Doherty
620f98d66f bond: free mbufs on Tx burst failure
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Tested-by: SunX Jiajia <sunx.jiajia@intel.com>
2014-11-24 21:43:50 +01:00
Declan Doherty
2a61ae793a bond: fix naming inconsistency
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2014-11-24 21:43:42 +01:00
Declan Doherty
2493e691c7 bond: remove switch statement from Rx burst
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2014-11-24 21:43:12 +01:00
Declan Doherty
76d29903f5 bond: support link status interrupt
Adding support for lsc interrupt from bonded device to link
bonding library with supporting unit tests in the test application.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Tested-by: SunX Jiajia <sunx.jiajia@intel.com>
2014-11-24 21:40:29 +01:00
John W. Linville
364e08f2bb af_packet: add PMD for AF_PACKET-based virtual devices
This is a Linux-specific virtual PMD driver backed by an AF_PACKET
socket.  This implementation uses mmap'ed ring buffers to limit copying
and user/kernel transitions.  The PACKET_FANOUT_HASH behavior of
AF_PACKET is used for frame reception.  In the current implementation,
Tx and Rx queues are always paired, and therefore are always equal
in number -- changing this would be a Simple Matter Of Programming.

Interfaces of this type are created with a command line option like
"--vdev=eth_af_packet0,iface=...".  There are a number of options available
as arguments:

 - Interface is chosen by "iface" (required)
 - Number of queue pairs set by "qpairs" (optional, default: 1)
 - AF_PACKET MMAP block size set by "blocksz" (optional, default: 4096)
 - AF_PACKET MMAP frame size set by "framesz" (optional, default: 2048)
 - AF_PACKET MMAP frame count set by "framecnt" (optional, default: 512)

Signed-off-by: John W. Linville <linville@tuxdriver.com>
[Thomas: disable because of incompatibility with some kernels]
2014-11-24 16:39:49 +01:00
Sergio Gonzalez Monroy
9c2c27f8c4 cmdline: fix for bsd
Some features of the cmdline were broken in FreeBSD as a result of
termios not being compiled.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-24 13:17:49 +01:00
Sergio Gonzalez Monroy
89a233209e xenvirt: fix reference to old mbuf field
Since commit 08b563ffb1 ("mbuf: replace data pointer by an offset"),
data is not an mbuf field anymore.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-24 13:17:49 +01:00
Pawel Wodkowski
a1a57f3e11 alarm: make cancellation thread-safe
It eliminates a race between threads using rte_alarm_cancel() and
rte_alarm_set().

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-11-24 13:17:49 +01:00
Balazs Nemeth
5bc3c265cb table: fix pointer calculations at initialization
During initialization of rte_table_hash_ext and rte_table_hash_lru, a
contiguous region of memory is allocated to store meta data, buckets,
extended buckets, keys, stack of keys, stack of extended buckets and
data entries. The size of each region depends on the hash table
configuration.

The address of each region is calculated using offsets relative to the
beginning of the memory region. Without this patch, the offsets
contain the size of the table meta data (sizeof(struct
rte_table_hash)). These addresses are stored in pointers which are
used when entries are added or deleted and lookups are performed.

Instead of adding these offsets to the address of the beginning of the
memory region, they are added to the address of the end of the meta
data (= address of the beginning of the memory region + sizeof(struct
rte_table_hash)). The resulting addresses are off by sizeof(struct
rte_table_hash) bytes. As a consequence, memory past the allocated
region can be accessed by the add, delete and lookup operations.

This patch corrects the address calculation by not including the size
of the meta data in the offsets.

Signed-off-by: Balazs Nemeth <balazs.nemeth@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2014-11-24 13:17:49 +01:00
Balazs Nemeth
8595428e50 table: fix incorrect initialization
During initialization of rte_hash_table_ext and rte_hash_table_lru,
t->data_size_shl is calculated.  This member contains the number of
bits to shift left during calculation of the location of entries in
the hash table.  To determine the number of bits to shift left, the
size of the entry (as provided to the rte_table_hash_ext_create and
rte_table_hash_lru_create) has to be used instead of the size of the
key.

Signed-off-by: Balazs Nemeth <balazs.nemeth@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2014-11-24 13:17:49 +01:00
Balazs Nemeth
14f2544cda table: fix checking extended buckets in unoptimized case
If a key is not found in a bucket and the bucket has been extended,
the extended buckets also have to checked for potentially matching
keys. The extended buckets are checked at the end of the lookup. In
most cases, this logic is skipped as it is uncommon to have buckets in
an extended state.

In case the lookup is performed with less than 5 packets, an
unoptimized version is run instead (the optimized version requires at
least 5 packets). The extended buckets should also be checked in this
case instead of simply ignoring the extended buckets.

Signed-off-by: Balazs Nemeth <balazs.nemeth@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2014-11-24 13:17:49 +01:00
Balazs Nemeth
fb20a4bd0f table: fix empty bucket removal during entry deletion
When an entry is deleted from an extensible rte_table_hash, the bucket
that stored the entry can become empty. If this is the case, the
bucket needs to be removed from the chain of buckets.

During removal of the bucket, the chain should be updated first. If
the bucket that will be removed is cleared first, the chain is broken
and the information to update the chain is lost.

Signed-off-by: Balazs Nemeth <balazs.nemeth@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-11-24 13:17:49 +01:00
Yong Wang
2e84937377 vmxnet3: leverage data ring on Tx path
Data_ring is a pre-mapped guest ring buffer that vmxnet3
backend has access to directly without a need for buffer
address mapping and unmapping during packet transmission.
It is useful in reducing device emulation cost on the tx
path.  There are some additional cost though on the guest
driver for packet copy and overall it's a win.

This patch leverages the data_ring for packets with a
length less than or equal to the data_ring entry size
(128B).  For larger packet, we won't use the data_ring
as that requires one extra tx descriptor and it's not
clear if doing this will be beneficial.

Performance results show that this patch significantly
boosts vmxnet3 64B tx performance (pkt rate) for l2fwd
application on a Ivy Bridge server by >20% at which
point we start to hit some bottleneck on the rx side.

Signed-off-by: Yong Wang <yongwang@vmware.com>
2014-11-14 17:32:27 +01:00
Yong Wang
14680e3747 vmxnet3: improve Rx performance
This patch includes two small performance optimizations
on the rx path:

(1) It adds unlikely hints on various infrequent error
paths to the compiler to make branch prediction more
efficient.

(2) It also moves a constant assignment out of the pkt
polling loop.  This saves one branching per packet.

Performance evaluation configs:
- On the DPDK-side, it's running some l3 forwarding app
inside a VM on ESXi with one core assigned for polling.
- On the client side, pktgen/dpdk is used to generate
64B tcp packets at line rate (14.8M PPS).

Performance results on a Nehalem box (4cores@2.8GHzx2)
shown below.  CPU usage is collected factoring out the
idle loop cost.
- Before the patch, ~900K PPS with 65% CPU of a core
used for DPDK.
- After the patch, only 45% of a core used, while
maintaining the same packet rate.

Signed-off-by: Yong Wang <yongwang@vmware.com>
2014-11-14 17:32:01 +01:00
Yong Wang
d768f6273c vmxnet3: add Rx check offloads
Only supports IPv4 so far.

Signed-off-by: Yong Wang <yongwang@vmware.com>
2014-11-14 17:31:43 +01:00
Yong Wang
5aecdc17a9 vmxnet3: fix stop/restart
This change makes vmxnet3 consistent with other pmds in
terms of dev_stop behavior: rather than releasing tx/rx
rings, it only resets the ring structure and release the
pending mbufs.

Verified with various tests (test-pmd and pktgen) over
vmxnet3 that dev stop/restart works fine.

Signed-off-by: Yong Wang <yongwang@vmware.com>
2014-11-14 17:31:18 +01:00
Yong Wang
3604496377 vmxnet3: add vlan Tx offload
Signed-off-by: Yong Wang <yongwang@vmware.com>
2014-11-14 17:31:06 +01:00
Yong Wang
b3e03223f1 vmxnet3: fix vlan Rx stripping
Shouldn't reset vlan_tci to 0 if a valid VLAN tag is stripped.

Signed-off-by: Yong Wang <yongwang@vmware.com>
2014-11-14 17:30:51 +01:00
Thomas Monjalon
4b9bb6b71a acl: fix code typos
Replace indicies by indices.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-14 17:23:50 +01:00
Thomas Monjalon
7eef9194ab acl: fix comments typos
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-14 17:23:50 +01:00
Qinglai Xiao
ecb6c4559e distributor: enhance and fix tag matching
With introduction of in_flight_bitmask, the whole 32 bits of tag can be
used. Further more, this patch fixed the integer overflow when finding
the matched tags.
The maximum number workers is now defined as 64, which is length of
double-word. The link between number of workers and RTE_MAX_LCORE is
now removed. Compile time check is added to ensure the
RTE_DISTRIB_MAX_WORKERS is less than or equal to size of double-word.

Signed-off-by: Qinglai Xiao <jigsaw@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-13 12:26:10 +01:00
Qinglai Xiao
9f2e99d171 mbuf: add usr alias for hash
This field is added for librte_distributor. User of librte_distributor
is advocated to set value of mbuf->hash.usr before calling
rte_distributor_process. The value of usr is the tag which stands as
identifier of flow.

Signed-off-by: Qinglai Xiao <jigsaw@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-13 12:26:10 +01:00
Helin Zhang
8dae34c15c eal: update i40e supported devices
According to the changes of the i40e base driver, two device
IDs (0x1573, 0x1582) are not supported anymore, and one new
device ID (0x1586) is supported. The list of i40e device IDs
DPDK supported should be modified accordingly.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-13 10:26:00 +01:00
Cunming Liang
5974ee01f4 ixgbe: fix reconfiguration of Rx method
The scattered_rx configuration is updated in dev_start().
For the execution sequence "stop, re-configure and then re-start",
it expects using the new configuration.
But during re-configure, the stored data may still be the old one.
The patch clean the configuration anyway in dev_stop().
So that make sure always get the best Rx routine.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
2014-11-13 00:48:16 +01:00
Cunming Liang
5e8ae7fc91 ethdev: fix Rx/Tx return in debug mode
Per definition, rte_eth_rx_burst/rte_eth_tx_burst/rte_eth_rx_queue_count
returns the packet number.
When RTE_LIBRTE_ETHDEV_DEBUG turns on, retval of FUNC_PTR_OR_ERR_RTE was
set to -ENOTSUP. It makes confusing.
The patch always return 0 no matter no packet or there's error.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-13 00:48:16 +01:00
Cunming Liang
ec3d82db2d ether: new function to format mac address
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
2014-11-13 00:48:16 +01:00
Ouyang Changchun
90924caf08 vhost: enable promiscuous and multicast
This is to enable user space vhost receiving and forwarding broadcast
and multicast packets:
Use new option in command line to enable promisc mode;
Enable 2 bits in VMDQ RX mode: ETH_VMDQ_ACCEPT_BROADCAST and ETH_VMDQ_ACCEPT_MULTICAST.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-11-12 00:10:23 +01:00
Ouyang Changchun
cd91b7348d virtio: support promiscuous and allmulticast
Add codes for supporting promiscuous and allmulticast enable and disable.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-11-12 00:10:23 +01:00
Ouyang Changchun
38da13a9c3 ixgbe: VMDQ Rx mode
Config PFVML2FLT register in ixgbe PMD to enable it receive broadcast and multicast packets;
also factorize the common logic with ixgbe_set_pool_rx_mode.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-12 00:10:23 +01:00
Ouyang Changchun
8d74cfc4d2 igb: VMDQ Rx mode
Config VM offload register in igb PMD to enable it receive broadcast and multicast packets.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-12 00:10:17 +01:00
Ouyang Changchun
7e1fceb51d ethdev: VMDQ Rx mode
Add vmdq rx mode field into rx config struct, it is flag from ETH_VMDQ_ACCEPT_*.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-12 00:10:12 +01:00
Jia Yu
5bad0b917e kni: add build-time checks for mbuf mapping
Adding this check is to avoid breakage from future data structure changes.

Signed-off-by: Jia Yu <jyu@vmware.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-10 10:28:46 +01:00
Thomas Monjalon
4ffab9b998 kni: fix build
Since commit 08b563ffb1 ("mbuf: replace data pointer by an offset"),
KNI vhost compilation (CONFIG_RTE_KNI_VHOST=y) was broken.

rte_pktmbuf_mtod() is not used in the kernel context but is replaced
by a simple addition of the base address and the offset.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2014-11-10 10:28:46 +01:00
Bruce Richardson
98d5a1318a distributor: add comments
Add in some additional comments around more complex areas of the code
so as to make the code easier to read and understand.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-07 15:04:59 +01:00
David Marchand
1c1dc182da eal: fix C++ compilation after headers rework
Following the big headers rework, all C++ stuff has moved to arch-specific
headers. The generic headers should not contain this so that this is done only
once.
There was a remaining #ifdef __cplusplus in "eal: split CPU cycle operation to
architecture specific" (fa4001c30e).

Reported-by: Keunhong Lee <dlrmsghd@gmail.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-07 11:57:16 +01:00
Thomas Monjalon
ee63ac39f8 i40e: fix build with icc
Since commit d798a94 ("mac vlan filter"),
ICC reports this error:
	lib/librte_pmd_i40e/i40e_ethdev.c(1763): error #188:
	enumerated type mixed with another type

Indeed, RTE_ETH_FILTER_NONE comes from enum rte_filter_type but
enum rte_filter_op is expected.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-07 00:08:27 +01:00
Helin Zhang
d6b1972909 i40evf: support configurable crc stripping
Configurable CRC stripping needs to be supported in VF,
and the configuration should be finally set in relevant
RX queue context with PF host support.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-06 23:50:14 +01:00
Helin Zhang
9c7aeb45f4 i40e: support configurable crc stripping
Support of configurable crc stripping in context of
VF RX queues.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-06 23:50:14 +01:00
Helin Zhang
e1879011c8 i40e: fix code style
Rename some local variables to express more accurately
and briefly. Fix several code style issues reported by
checkpatch.pl. Line warpping for some source lines which
has more than 80 characters, and merge lines together for
those source lines which does not need any line wrapping
actually. Add macros for numeric or calculating memory
sizes.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-06 23:50:14 +01:00
Helin Zhang
5e904375a4 i40evf: rework mailbox version check
API version number is straightfoward enough for checking
the PF host, and no need to use 'host_is_dpdk'.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-06 23:50:14 +01:00
Ouyang Changchun
4e3eff86cf vhost: fix mem path check
Commit aec8283d47 fixes the compilation issue, but it leads to
one runtime issue: early exit wrongly. In some case, 'path' is NULL, but
'resolved_path' has effective path, it should continue going ahead rather
than exit.
This is due to that qemu unlink the file after it maps the huge page file.
In this special case, it is ok to check the resolved path
when path is NULL if errno indicates "No such file or directory".

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2014-11-06 23:12:02 +01:00
Huawei Xie
af4f2c5feb vhost: fix code style
Fix alignment issues, lengthy lines, misordered type and other coding style issues.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-11-06 23:12:02 +01:00
David Marchand
a0d395597d eal: factorize x86 headers
No need to keep the same code duplicated for 32 and 64bits x86.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Chao Zhu <bjzhuc@cn.ibm.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-05 22:20:24 +01:00
David Marchand
4573013513 eal: install all arch headers
Architecture can have their own specific headers, just install all headers from
arch directory.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Chao Zhu <bjzhuc@cn.ibm.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-05 22:20:17 +01:00
Chao Zhu
d900193518 eal: split CPU flags operations to architecture specific
This patch splits CPU flags related operations from DPDK and push them
to architecture specific arch directories, so that other processor
architecture can implement its own CPU flag functions to support DPDK.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-05 22:20:12 +01:00
Chao Zhu
8468f49071 eal: split memcpy operation to architecture specific
This patch splits the SSE based memory copy function from DPDK and push
them to architecture specific arch directories. Other processor
architecture can implement its own vector based memory copy functions.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-05 22:20:05 +01:00
Chao Zhu
8a65109c01 eal: split spinlock operations to architecture specific
This patch splits the spinlock operations from DPDK and push them to
architecture specific arch directories, so that other processor
architecture to support DPDK can be easily adopted.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-05 22:19:23 +01:00
Chao Zhu
2eefe6f11b eal: split prefetch operations to architecture specific
This patch splits the prefetch operations from DPDK and push them to
architecture specific arch directories, so that other processor
architecture to support DPDK can implement their own functions.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-05 22:19:16 +01:00
Chao Zhu
fa4001c30e eal: split CPU cycle operation to architecture specific
This patch splits the CPU TSC read operations from DPDK and push them to
architecture specific arch directories, so that other processors that
don't have tsc register can implement its own functions.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-05 22:19:10 +01:00
Chao Zhu
b56f46c327 eal: split byte order operations to architecture specific
This patch splits the byte order operations from DPDK and push them to
architecture specific arch directories, so that other processor
architecture to support DPDK can be easily adopted.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-05 22:19:04 +01:00
David Marchand
4b2fd987d8 eal: split atomic operations to architecture specific
This patch first adds architecture specific directories to eal.
Then split the atomic operations to architecture specific and generic files.
Architecture specific files are put into the corresponding architecture
directory and common header are put into generic directory.

Update documentation generation with new generic/ directory.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-05 22:18:31 +01:00
Matthew Hall
a4ff75496a eal/linux: fix inaccurate numa node error comment
Signed-off-by: Matthew Hall <mhall@mhcomputing.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-11-05 22:18:31 +01:00
Pablo de Lara
699d55e3ff eal/bsd: fix pci mapping in secondary process
On FreeBSD, when initializing a secondary process,
EAL was complaining if there were ports not bound
to nic_uio module, exiting the application, which
should not happen, as this is expected behaviour,
and not an error

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-05 22:18:31 +01:00
Bruce Richardson
0a88485379 ixgbe: fix build with icc
When using Intel C++ compiler(icc) 14.0.1.106 or the older icc 13.x
version, the mbuf initializer variable was not getting configured
correctly, as the mb_def variable was not set correctly. This is due
to an issue with icc (DPD200249565 which already been fixed in
icc 14.0.2 and newer compiler release) where it incorrectly calculates
the field offsets with initializers when zero-sized fields
are used in a structure.
To work around this, the code in ixgbe_rxq_vec_setup does not setup the
fields using an initializer, but instead assigns the values individually
in code.
NOTE: There is no performance impact to this change as the queue
setup functions are not data-plane APIs, but are only used at app
initialization.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-11-05 22:18:31 +01:00
Choonho Son
a87db84bd4 i40e: fix build with debug enabled
The commit 15dbb63ef9 ("VXLAN packet identification") didn't compile,
if CONFIG_RTE_LIBRTE_I40E_DEBUG_DRIVER is enabled.

Signed-off-by: Choonho Son <choonho.son@gmail.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2014-11-05 22:18:31 +01:00
Chen Jing D(Mark)
b6583ee402 i40e: full VMDQ pools support
1. Function i40e_vsi_* name change to i40e_dev_* since PF can contains
   more than 1 VSI after VMDQ enabled.
2. i40e_dev_rx/tx_queue_setup change to have capability of setup
   queues that belongs to VMDQ pools.
3. Add queue mapping. This will do a convertion between queue index
   that application used and real NIC queue index.
3. i40e_dev_start/stop change to have capability switching VMDQ queues.
4. i40e_pf_config_rss change to calculate actual main VSI queue numbers
   after VMDQ pools introduced.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Tested-by: Min Cao <min.cao@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-05 00:21:34 +01:00
Chen Jing D(Mark)
4805ed59e9 i40e: enhance mac address operations
Change i40e_macaddr_add and i40e_macaddr_remove functions to support
multiple macaddr add/delete. In the meanwhile, support macaddr ops
on different pools.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Tested-by: Min Cao <min.cao@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-05 00:21:34 +01:00
Chen Jing D(Mark)
c9eb97fb92 i40e: add VMDQ support
The change includes several parts:
1. Get maximum number of VMDQ pools supported in dev_init.
2. Fill VMDQ info in i40e_dev_info_get.
3. Setup VMDQ pools in i40e_dev_configure.
4. i40e_vsi_setup change to support creation of VMDQ VSI.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Tested-by: Min Cao <min.cao@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-05 00:21:34 +01:00
Chen Jing D(Mark)
40513777fd ixgbe: new VMDQ argument
Assign new VMDQ arguments with correct values.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-05 00:21:34 +01:00
Chen Jing D(Mark)
467f83c6ed igb: new VMDQ argument
Assign new VMDQ arguments with correct values.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-05 00:21:34 +01:00
Chen Jing D(Mark)
4bdefaade6 ethdev: VMDQ enhancements
The change includes several parts:
1. Clear pool bitmap when trying to remove specific MAC.
2. Define RSS, DCB and VMDQ flags to combine rx_mq_mode.
3. Use 'struct' to replace 'union', which to expand the rx_adv_conf
   arguments to better support RSS, DCB and VMDQ.
4. Fix bug in rte_eth_dev_config_restore function, which will restore
   all MAC address to default pool.
5. Define additional 3 arguments for better VMDQ support.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-11-05 00:21:25 +01:00
Helin Zhang
03e801bc79 i40e: fix PF interrupt handler
'PFINT_ICR0_ENA' shouldn't be cleared in user space ISR,
otherwise adminq interrupts might be missed during
co-working with VF initialization.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>
2014-11-04 11:20:11 +01:00
Helin Zhang
2e38e98bbf i40e: handle link status change interrupt
As driver flag of 'RTE_PCI_DRV_INTR_LSC' was introduced
recently, it must be added in i40e. One second waiting
is needed for link up event, to wait the hardware into a
stable state, so the interrupt could be processed in two
parts. In addition, it should finally call the callback
function registered by application.

Test: http://dpdk.org/ml/archives/dev/2014-September/006091.html

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Reviewed-by: Jing Chen <jing.d.chen@intel.com>
Reviewed-by: Jijiang Liu <jijiang.liu@intel.com>
Tested-by: Min Cao <min.cao@intel.com>
2014-11-03 18:02:41 +01:00
Helin Zhang
ad4631322d i40e: rework PF interrupt cause processing
To get the code cleaner and more straightforward, a macro
is defined for all interrupt cause enable flags. Two more
causes are enabled, and all the interrupt causes for
reporting any errors are compiled conditionally, as they
are for debug only.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Reviewed-by: Jing Chen <jing.d.chen@intel.com>
Reviewed-by: Jijiang Liu <jijiang.liu@intel.com>
2014-11-03 17:59:18 +01:00
Helin Zhang
3fd0200a35 i40e: rename variables in interrupt handler
To be more straightforward, two local variables in interrupt
handler are renamed.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Reviewed-by: Jing Chen <jing.d.chen@intel.com>
Reviewed-by: Jijiang Liu <jijiang.liu@intel.com>
2014-11-03 17:54:14 +01:00
Jijiang Liu
d9129af2cb i40e: fix mac vlan filter
This patch fixes two issues:
- one is to fix the log issues
- the other is to set filter type when updating the default MAC filter.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
2014-10-31 10:18:40 +01:00
Jijiang Liu
3b496a2759 ethdev: fix mac vlan filter type
Use the enum type for filter type.
It fixes the compilation error under ICC compiler.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-31 10:18:40 +01:00
Jijiang Liu
d798a94a3d i40evf: mac vlan filter
It mainly add i40e_vf_mac_filter_set() function to support perfect match
and hash match of MAC address and VLAN ID for VF.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Tested-by: Yong Liu <yong.liu@intel.com>
2014-10-30 23:33:46 +01:00
Jijiang Liu
912b595146 i40e: mac vlan filter
This patch mainly optimizes the i40e_add_macvlan_filters() and
the i40e_remove_macvlan_filters() functions in order that
we are able to provide filter type configuration.
And another relevant MAC filter codes are changed based on new data structures.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Tested-by: Yong Liu <yong.liu@intel.com>
2014-10-30 23:33:40 +01:00
Jijiang Liu
de8f4d5704 ethdev: mac vlan filter
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-30 23:33:33 +01:00
Alexander Guy
e20d254882 kni: fix build on Ubuntu-hybrids
In the case where a userspace reports itself as Ubuntu, but the
kernel isn't providing the expected version signature interface,
turn off Ubuntu specializations.

This situation happens often enough in development environments,
and with multi-distribution build servers (e.g. chroot, containers).

Signed-off-by: Alexander Guy <alexander@andern.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-30 00:55:23 +01:00
Pablo de Lara
c0cddf03f1 ip_frag: disable ip fragmentation if mbuf refcnt is disabled
rte_ipv4_fragment_packet() and rte_ipv6_fragment packet()
call rte_pktmbuf_attach() to attach the segment of the original
packet to the segment of the new fragmented one. Such function
is not declared if RTE_MBUF_REFCNT is disabled, as it needs to
call rte_mbuf_refcnt_update, not declared either.

Therefore, the ipv4/v6 fragmentation libraries are disabled
in that situation.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
2014-10-30 00:51:10 +01:00
Bruce Richardson
33e79bed3e ixgbe: always perform vector Rx setup if vpmd enabled
If the vector pmd option is turned on in the compile time config file,
then always call the vector rxq setup, since we can now use the vector
PMD for receiving jumbo frames that need chained mbufs, a.k.a scattered
packets. Up till now, this function was not being called when receiving
scattered packets, potentially leading to problems with mbufs not being
properly initialized.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Changchun Ouyang <Changchun.ouyang@intel.com>
2014-10-30 00:33:33 +01:00
Bruce Richardson
ed49550a71 ixgbe: remove static qualifier for thread safety
Remove the "static" prefix to the template mbuf variable in
ixgbe_rxq_vec_setup function. This will then allow different
threads to initialize different RX queues at the same time,
without one overwriting the other's data.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Changchun Ouyang <Changchun.ouyang@intel.com>
2014-10-30 00:33:33 +01:00
Ouyang Changchun
aec8283d47 vhost: fix build without unused result
It fixes this compilation complain: "error: ignoring return value of 'realpath',
declared with attribute warn_unused_result [-Werror=unused-result]"

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Tested-by: Jingguo Fu <jingguox.fu@intel.com>
2014-10-30 00:19:41 +01:00
Helin Zhang
fc52326f94 i40e/base: remove early hardware definitions
As i40e_register_x710_int.h is defined for early hardware
only, it should be deleted.
For the register which is still required, just define it in
code directly as workaround.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
2014-10-29 23:52:43 +01:00
Helin Zhang
c977ba006c i40e: fix configuration of Rx interrupt
Inside NIC RX interrupt is needed for single RX descriptor
write back. The fix is to correct the wrong configuration
of register 'I40E_QINT_RQCTL'.
Note that interrupt will be inside NIC only, that means it
will never be reported outside NIC hardware.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>
2014-10-29 23:44:26 +01:00
Helin Zhang
f20d60556c i40e: code style fixes
Add several code style fixes.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>
2014-10-29 23:44:26 +01:00
Thomas Monjalon
7e33267d41 fix VXLAN acronym
Each letter must be uppercased:
Virtual eXtensible Local Area Network (VXLAN)

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-27 16:11:04 +01:00
Jijiang Liu
77b8301733 i40e: VXLAN Tx checksum offload
Support VxLAN Tx checksum offload, which include
  - outer L3(IP) checksum offload
  - inner L3(IP) checksum offload
  - inner L4(UDP, TCP and SCTP) checksum offload

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Yong Liu <yong.liu@intel.com>
2014-10-27 14:37:34 +01:00
Jijiang Liu
9dda959a4b i40e: VXLAN filter
The filter types supported are listed below for VXLAN:
   1. Inner MAC and Inner VLAN ID.
   2. Inner MAC address, inner VLAN ID and tenant ID.
   3. Inner MAC and tenant ID.
   4. Inner MAC address.
   5. Outer MAC address, tenant ID and inner MAC address.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Yong Liu <yong.liu@intel.com>
2014-10-27 14:37:34 +01:00
Jijiang Liu
982cca14ef ethdev: tunnel filter
Add definitions of the data structures of tunneling packet filter.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Yong Liu <yong.liu@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-27 14:37:34 +01:00
Jijiang Liu
15dbb63ef9 i40e: VXLAN packet identification
Implement the configuration API of VXLAN destination UDP port,
and add new Rx offload flags for supporting VXLAN packet offload.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Yong Liu <yong.liu@intel.com>
2014-10-27 14:37:34 +01:00
Jijiang Liu
6f1f04afac ethdev: UDP tunnels
Add two functions to support UDP tunneling port configuration.

There are "some" destination UDP port numbers that have unique meaning.
In terms of VxLAN, "IANA has assigned the value 4789 for the VXLAN UDP port,
and this value SHOULD be used by default as the destination UDP port.
Some early implementations of VXLAN have used other values for the destination
port. To enable interoperability with these implementations, the destination
port SHOULD be configurable."

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Yong Liu <yong.liu@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-27 14:37:34 +01:00
Jijiang Liu
20f4f53aea ether: add VXLAN header
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Yong Liu <yong.liu@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-27 14:31:45 +01:00
Jijiang Liu
73b7d59cf4 mbuf: add fields for tunnels
Replace the "reserved2" field with the "packet_type" field
and add the "inner_l2_l3_len" field in the rte_mbuf structure.
The "packet_type" field is used to indicate ordinary packet format and also
tunneling packet format such as IP in IP, IP in GRE, MAC in GRE and MAC in UDP.
The "inner_l2_len" and the "inner_l3_len" fields are added
in the second cache line, they use 2 bytes for TX offloading of tunnels.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Yong Liu <yong.liu@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-27 11:43:51 +01:00
Declan Doherty
fa7f63e7e2 bond: disable broadcast mode if mbuf refcnt is disabled
Link bonding broadcast mode requires refcnt parameter in the mbuf struct to
allow efficient transmission of duplicated mbufs on slave ports.

This patch disables broadcast mode when the complication option RTE_MBUF_REFCNT
is disabled to allow clean building of the bonding library.
A warning message notify user of disabling of broadcast mode.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-22 15:47:18 +02:00
Marc Sune
71d47d3d2e kni: fix build
Fix compilation warning 'missing-field-initializers' for some GCC and clang
versions introduced in commit 0c6bc8e due to the use of C89/C90 initializers.
Using C99-style initializers

Signed-off-by: Marc Sune <marc.sune@bisdn.de>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-22 12:30:32 +02:00
Marc Sune
0c6bc8ef70 kni: memzone pool for alloc and release
The previous implementation of rte_kni_alloc() was allocating memzones with a
name composed of a fixed string and the interface name. When an application was
allocating and deallocating multiple interfaces with different names, memzones
were quickly exhausted, even though memzones from deallocated interfaces were
never used anymore (unless an interface with the same name was re-allocated).
As a result, the application was unable to allocate more KNI interfaces with
different names.

This patch implements the KNI memzone pool in order to prevent memzone
exhaustion when allocating/deallocating KNI interfaces. It adds a new API call,
rte_kni_init(max_kni_ifaces) that shall be called before any call to
rte_kni_alloc() if KNI is used. The memzones are pre-allocated with interface-
independent names so that they can be reused.

Signed-off-by: Marc Sune <marc.sune@bisdn.de>
Acked-by: Helin Zhang <helin.zhang@intel.com>
2014-10-21 17:24:53 +02:00
Ouyang Changchun
3ead3080aa ixgbe: fix build with mbuf refcnt disabled
An error has been introduced by commit 1f22652ca8
("fix perf regression due to moved pool ptr").

Fix the case where RTE_MBUF_REFCNT is disabled.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-21 09:51:10 +02:00
Jingjing Wu
455d09e54b i40e: generic filter control
Only provide empty handler.
It can be completed to support filter features on fortville.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
[Thomas: remove unused empty functions]
2014-10-20 23:51:05 +02:00
Jingjing Wu
fbd0d8e67f ethdev: introduce generic filter control
Define a new API umbrella to configure any kind of Rx filtering.
New functions:
- rte_eth_dev_filter_supported
- rte_eth_dev_filter_ctrl

Filter types, operations, and structures are defined specifically in new
header file lib/librte_eth/rte_dev_ctrl.h.

As to the implementation discussion, please refer to
http://dpdk.org/ml/archives/dev/2014-September/005179.html

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
[Thomas: rename ops and remove unused types]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-20 23:51:05 +02:00
Helin Zhang
cea7a51c17 i40evf: support RSS
i40e hardware supports RSS in VF.
It's now supported in this driver.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
2014-10-20 23:51:05 +02:00
Helin Zhang
08832b2d49 i40e: expose RSS functions and relevant macros
To reuse code, 'i40e_config_hena()' and 'i40e_parse_hena()' and
their relevant macros need to be extern, and then can be used for
both PF and VF parts.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
2014-10-20 23:51:05 +02:00
Helin Zhang
0434614c53 ethdev: better typing of RSS constants
Forced type conversion is not needed to define a macro with
constant. The alternate is to let compiler use the default width,
or specify the width with suffix of 'U', 'UL', 'ULL', etc.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
2014-10-20 23:51:05 +02:00
Alan Carew
f2c4afab58 contigmem: fix buffer overrun on unload
The maximum mount contiguous memory regions for FreeBSD is limited by
RTE_CONTIGMEM_MAX_NUM_BUFS, a pointer to each region is stored in
static void * contigmem_buffers[RTE_CONTIGMEM_MAX_NUM_BUFS]

A user can specify a greater amount via hw.contigmem.num_buffers,
while the allocation logic will prevent this allocation from occuring the logic
in contigmem_unload() will attempt to free hw.contigmem.num_buffers and an
overrun occurs.

This patch limits the freeing to a maximum of RTE_CONTIGMEM_MAX_NUM_BUFS.

Signed-off-by: Alan Carew <alan.carew@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2014-10-20 23:50:35 +02:00
Pablo de Lara
06cada9fc6 ethdev: fix memory corruption with default Rx/Tx configuration
Commit fbde27f1 (get default Rx/Tx configuration from dev info),
introduced a bug, which caused memory corruption in dev_info.
To get RX/TX configuration, both rx/tx queue setup functions were calling
dev_info_get from PMDs, so dev_info structure was not being reseted
before being populated.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-20 23:50:14 +02:00
Ouyang Changchun
d07982f4b1 virtio: increase max Rx packet length
Since commit 13ce5e7eb9 ("virtio: mergeable buffers"),
this driver has the capability of receiving and transmitting jumbo frame.
So update max Rx packet length.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Tested-by: Jingguo Fu <jingguox.fu@intel.com>
2014-10-15 16:16:40 +02:00
Jijiang Liu
6bfe648406 i40e: add Rx error statistics
Add incoming packet error statistics for i40e.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
2014-10-15 14:27:06 +02:00
Helin Zhang
96d5c1656b i40e/base: fix build with gcc < 4.4
It fixes the compile error as below on gcc version 4.3.4.
cc1: error: unrecognized command line option "-Wno-unused-but-set-variable"

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Zhaochen Zhan <zhaochen.zhan@intel.com>
2014-10-15 09:49:00 +02:00
Ouyang Changchun
a779ba05d4 virtio: fix needed vring entry number
Fix one issue in virtio TX: it needs one more vring descriptor to hold the virtio
header when transmitting packets, it is used later to determine whether to free
more entries from used vring.
It fixes failing to transmit any packet with 1 segment in the circumstance of only
1 descriptor in the vring free list.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
2014-10-15 08:25:13 +02:00
Thomas Monjalon
8933dae15c vhost: add in doc
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-13 19:39:38 +02:00
Huawei Xie
7c845c1fcd vhost: add makefile
vhost lib is turned off by default.
vhost lib is based on cuse, which requires fuse development package
to be installed.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: fix build dependencies]
2014-10-13 19:16:54 +02:00
Huawei Xie
22c668d494 vhost: comment identified issues
1) FIXME: concurrent calls to vhost set mem table from different guests
could cause mem_temp to be overrided.
2) TODO: cmpset cost quite some cpu cyles. Allow app to disable this
feature if there is no contention in real workload.
3) FIXME: fix scatter gather mbuf copy to vhost vring chained buffers.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
60ddca7654 vhost: coding style fixes
Fix serious coding style issues reported by checkpatch.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
38726fb1b6 vhost: static variable fixes
Add "static" for some variable definitions.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
da74110053 vhost: clean includes
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
1c01d52392 vhost: add debug print
Define PRINT_PACKET and LOG_DEBUG macros.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
a58f905514 vhost: add private context field
priv field could be used to store application specific context.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
3d671af711 vhost: supported features
VHOST_SUPPORTED_FEATURES is the feature mask that vhost lib supports.
VHOST_FEATURES is the feature mask vhost currently supports after some features are turned on/off.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: split patch]
2014-10-13 19:16:54 +02:00
Huawei Xie
9eed6bfd2e vhost: allow to enable or disable features
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: split patch]
2014-10-13 19:16:54 +02:00
Huawei Xie
7202b0a824 vhost: get available vring entries
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: split patch]
2014-10-13 19:16:54 +02:00
Huawei Xie
28689ff04d vhost: rename ops registering function
Rename init_virtio_net as rte_vhost_callback_register API.
rte_vhost_callback_register register the callbacks called when a
vhost device is created and ready to be added to data processing core
or is de-actived by guest.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
e5c2ded503 vhost: expose register and start functions
Rename register_cuse_device as rte_vhost_driver_register API.
Rename start_session_loop as rte_vhost_driver_session_start API.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
f782576959 vhost: get internal ops when registering
vhost_net_device_ops is internal implementation in vhost lib.
register_cuse_device will be vhost driver register API.
There is no need for it to know the internal vhost ops.
Instead, that ops is retrieved in register_cuse_device
through get_virtio_net_callbacks.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
17b8320a3e vhost: remove index parameter
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
a4fff3bba8 vhost: enqueue/dequeue burst
rte_vhost_enqueue_burst copies host packets to guest.
rte_vhost_enqueue_burst will call virtio_dev_rx and virtio_dev_merge_rx
respectively depending on whether merge-able feature is negotiated or not
in the vhost device.

virtio_dev_merge_tx is renamed to rte_vhost_dequeue_burst.
rte_vhost_dequeue_burst gets to-be-sent packets from guest.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: merged patches]
2014-10-13 19:16:54 +02:00
Huawei Xie
830bea8bc4 vhost: add queue id parameter
queue_id parameter is added to Rx/Tx functions for multiple queue support
in future.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
fa325fa413 vhost: calculate mbuf size
As a lib, we have no idea the app defined mbuf size.
This patch will calculate mbuf size dynamically.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:53 +02:00
Huawei Xie
20f16ce646 vhost: return packets to upper layer
This patch makes virtio_dev_merge_tx return the received packets to app layer.
Previously virtio_tx_route was called to route these packets and then free them.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:53 +02:00
Huawei Xie
7f456f6d61 vhost: move address translation function
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: split from a previous patch]
2014-10-13 19:16:04 +02:00
Huawei Xie
c19dc7db15 vhost: move internal structure
The structure virtio_net_config_ll is moved to virtio_net.c.
It is related to internal virtio device management,
so it should not be exposed to other files.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:13:10 +02:00
Huawei Xie
8a7576c3dc vhost: remove retry logic
It was used to wait some time and retry when there are not enough descriptors.
App could implement this policy easily if it needs.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:13:10 +02:00