Commit Graph

722 Commits

Author SHA1 Message Date
Bruce Richardson
21dc08a991 mbuf: reorder fields by time of use
* Reorder the fields in the mbuf so that fields that are used
together sit side-by-side in the structure. This means that we have a
contiguous block of 8 bytes in the mbuf which are used to reset an mbuf
on descriptor rearm, and a block of 16 bytes of data (excluding flags)
which are set on RX from the received packet descriptor.
* Use dummy fields as appropriate to ensure alignment or to reserve gaps
for later field additions.
* Place most items which are not used by fast-path RX separately at the end
of the structure so they can later be moved to a separate cache line.
[The l2/l3 length fields are not moved at this stage as doing so will
cause overflow to the next cache line].

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2014-09-17 18:53:40 +02:00
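To illustrate the contiguous rearm block described in commit 21dc08a991, here is a minimal sketch. It uses a hypothetical `rearm_mbuf` structure whose field names and sizes are assumptions for the example, not the real `rte_mbuf` layout. Because the fields reset on descriptor rearm sit next to each other, they can be rewritten with one 8-byte copy:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified mbuf: NOT the real rte_mbuf layout. */
struct rearm_mbuf {
    uint64_t buf_addr;   /* buffer address, kept as an integer for portability */
    /* --- 8 contiguous bytes rewritten on every descriptor rearm --- */
    uint16_t buf_len;
    uint16_t data_off;
    uint16_t refcnt;
    uint8_t  nb_segs;
    uint8_t  port;
    /* --- fields filled in from the RX descriptor follow --- */
    uint32_t pkt_len;
    uint16_t data_len;
    uint16_t vlan_tci;
};

#define REARM_OFFSET offsetof(struct rearm_mbuf, buf_len)
#define REARM_SIZE   8

/* The rearm fields must form exactly one 8-byte block with no padding. */
_Static_assert(offsetof(struct rearm_mbuf, pkt_len) - REARM_OFFSET == REARM_SIZE,
               "rearm fields must be one contiguous 8-byte block");

/* Reset all rearm fields of m from a template with a single 8-byte copy. */
static void rearm_mbuf_reset(struct rearm_mbuf *m, const struct rearm_mbuf *tmpl)
{
    memcpy((char *)m + REARM_OFFSET,
           (const char *)tmpl + REARM_OFFSET, REARM_SIZE);
}
```

The static assertion is the useful part of the design: if a later change inserts a field into the block, the build fails instead of the copy silently covering the wrong bytes.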
Olivier Matz
08b563ffb1 mbuf: replace data pointer by an offset
The mbuf structure already contains a pointer to the beginning of the
buffer (m->buf_addr). There is no need to use 8 more bytes to store
another pointer to the beginning of the data.

Using a 16-bit unsigned integer is enough as we know that an mbuf is
never longer than 64KB. We gain 6 bytes in the structure thanks to
this modification.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

* Updated to apply to latest on mainline.
* Disabled vector PMD in config as it relies heavily on the mbuf layout.
  This will be re-enabled in a subsequent commit once vPMD has been
  reworked to take account of mbuf changes.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2014-09-17 18:53:40 +02:00
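The offset scheme from commit 08b563ffb1 can be sketched as follows. The `off_mbuf` structure and the `off_mbuf_data()` helper are hypothetical names used only for this example; only `buf_addr` and the idea of a 16-bit data offset come from the commit message.

```c
#include <stdint.h>

/* Hypothetical, simplified mbuf: only the fields needed for the example. */
struct off_mbuf {
    void    *buf_addr;  /* start of the buffer */
    uint16_t data_off;  /* offset of the packet data from buf_addr (2 bytes) */
    uint16_t buf_len;   /* total buffer length, at most 64KB */
};

/* Recompute the data pointer on demand instead of storing it (8 bytes). */
static inline void *off_mbuf_data(const struct off_mbuf *m)
{
    return (char *)m->buf_addr + m->data_off;
}
```

On a 64-bit target the stored pointer would take 8 bytes while the offset takes 2, which is where the 6-byte saving comes from.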
Bruce Richardson
7869536f3f mbuf: flatten struct vlan_macip
The vlan_macip structure combined a vlan tag id with l2 and l3 headers
lengths for tracking offloads. However, this structure was only used as
a unit by the e1000 and ixgbe drivers, not generally.

This patch removes the structure from the mbuf header and places the
fields into the mbuf structure directly at the required point, without
any net effect on the structure layout. This allows us to treat the vlan
tags and header length fields as separate for future mbuf changes. The
drivers which were written to use the combined structure still do so,
using a driver-local definition of it.

Reduce perf regression caused by splitting vlan_macip field. This is
done by providing a single uint16_t value to allow writing/clearing
the l2 and l3 lengths together. There is still a small perf hit to the
slow path TX due to the reads from vlan_tci and l2/l3 lengths being
separated. (<5% in my tests with testpmd with no extra params).
Unfortunately, this cannot be eliminated, without restoring the vlan
tags and l2/l3 lengths as a combined 32-bit field. This would prevent
us from ever looking to move those fields about and is an artificial tie
that applies only for performance in igb and ixgbe drivers. Therefore,
this patch keeps the vlan_tci field separate from the lengths as the
best solution going forward.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2014-09-17 11:29:17 +02:00
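A minimal sketch of the single `uint16_t` value mentioned in commit 7869536f3f, which lets the l2 and l3 lengths be written or cleared together. The structure name, the 7/9 bit split, and the helper are assumptions for illustration; bit-field placement is implementation-defined, so the sketch only relies on a zero write clearing both fields.

```c
#include <stdint.h>

/* Hypothetical offload fields: NOT the real rte_mbuf definition. */
struct demo_tx_offload {
    uint16_t vlan_tci;           /* VLAN tag, now a separate field */
    union {
        uint16_t l2_l3_len;      /* both lengths viewed as one 16-bit value */
        struct {
            uint16_t l3_len : 9; /* assumed width */
            uint16_t l2_len : 7; /* assumed width */
        };
    };
};

/* Clear both header lengths with a single 16-bit store. */
static inline void demo_clear_lengths(struct demo_tx_offload *o)
{
    o->l2_l3_len = 0;
}
```

The VLAN tag stays outside the union, so it can move independently of the lengths in later layout changes.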
Bruce Richardson
ca04aaea80 mbuf: rename in_port to just port
In some cases we may want to tag a packet for a particular destination
or output port, so rename the "in_port" field in the mbuf to just "port"
so that it can be re-used for this purpose if an application needs it.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2014-09-17 11:27:51 +02:00
Olivier Matz
ea672a8b16 mbuf: remove the rte_pktmbuf structure
The rte_pktmbuf structure was initially included in the rte_mbuf
structure. This was needed when there were 2 types of mbuf (ctrl and
packet). As the control mbuf has been removed, we can merge the
rte_pktmbuf into the rte_mbuf structure.

Advantages of doing this:
  - the access to mbuf fields is easier (ex: m->data instead of m->pkt.data)
  - make the structure more consistent: for instance, there was no reason
    to have the ol_flags field in rte_mbuf
  - it will allow a deeper reorganization of the rte_mbuf structure in the
    next commits, allowing to gain several bytes in it

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
[Bruce: updated for latest code and new example apps]
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2014-09-17 11:27:51 +02:00
Olivier Matz
9aaccf1abd mbuf: remove rte_ctrlmbuf
The initial role of rte_ctrlmbuf is to carry generic messages (data
pointer + data length) but it's not used by the DPDK or its applications.
Keeping it implies:
  - losing 1 byte in the rte_mbuf structure
  - having some dead code in rte_mbuf.[ch]

This patch removes this feature. Thanks to it, it is now possible to
simplify the rte_mbuf structure by merging the rte_pktmbuf structure
in it. This is done in next commit.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

* Updated patch to HEAD.
* Modified patch to retain the old function names for ctrl mbufs as
  macros. This helps with app compatibility, and allows the concept
  of a control mbuf to be reintroduced via a single-bit flag in
  a future change.
* Updated the packet framework ip_pipeline example application to
  work following this change.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2014-09-17 11:27:50 +02:00
Olivier Matz
62814bc2e9 mbuf: rename RTE_MBUF_SCATTER_GATHER into RTE_MBUF_REFCNT
It seems that RTE_MBUF_SCATTER_GATHER is not the proper name for the
feature it provides. "Scatter gather" means that data is stored using
several buffers. RTE_MBUF_REFCNT seems to be a better name for that
feature as it provides a reference counter for mbufs.

The macro RTE_MBUF_SCATTER_GATHER is poisoned to ensure this
modification is seen by drivers or applications using it.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2014-09-17 11:27:50 +02:00
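One way to poison a retired macro name, as described in commit 62814bc2e9, is the `#pragma GCC poison` directive supported by GCC and Clang. Whether the commit uses exactly this directive is an assumption; the value given to `RTE_MBUF_REFCNT` and the helper function are illustrative only.

```c
/* Feature macro under its new name (the value here is illustrative). */
#define RTE_MBUF_REFCNT 1

/*
 * GCC/Clang extension: once an identifier is poisoned, any later
 * appearance of it in the source is reported as a compile error.
 */
#pragma GCC poison RTE_MBUF_SCATTER_GATHER

/* Code written against the new name keeps compiling as before. */
static inline int demo_refcnt_enabled(void)
{
#ifdef RTE_MBUF_REFCNT
    return 1;
#else
    return 0;
#endif
}
```

A driver or application that still mentions the old name after this point fails to build, which makes the rename hard to miss.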
Bruce Richardson
cafa596846 ixgbe: keep only non-zero initializer in mbuf definition
Since all unspecified fields in an initializer are assumed to be zero we
can simplify the empty mbuf definition in the vector driver to only use
the fields that are non-zero, i.e. just nb_segs = 1. This makes things
shorter and means that the structure doesn't need as many updates for
other fields being renamed or moved.

The variable itself is never modified and only used by a single function
so it can be made const and local to the using function.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2014-09-17 11:27:50 +02:00
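The initializer simplification in commit cafa596846 relies on the C rule that members omitted from an initializer are zero-initialized. A minimal sketch, using a hypothetical `init_mbuf` structure and function name (only `nb_segs = 1` comes from the commit message):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mbuf: NOT the real rte_mbuf layout. */
struct init_mbuf {
    void    *buf_addr;
    uint16_t data_off;
    uint16_t refcnt;
    uint8_t  nb_segs;
    uint8_t  port;
    uint32_t pkt_len;
};

static void init_mbuf_reset(struct init_mbuf *m)
{
    /*
     * Const and local to the only function that uses it. Only the
     * non-zero field is spelled out; every other member is zero.
     */
    static const struct init_mbuf mb_def = { .nb_segs = 1 };

    *m = mb_def;
}
```

Because no other field is named, renaming or moving those fields does not require touching this definition.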
Thomas Monjalon
f635747001 version: 1.8.0-rc0
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-09-16 14:15:33 +02:00
Thomas Monjalon
99213f3827 version: 1.7.1
RPM can be built for a default machine now.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-09-03 13:28:26 +02:00
Guillaume Gaudonville
11ba04265c igb_uio: fix build on RHEL 6.3
- pci_num_vf() is already defined in RHEL 6
- pci_intx_mask_supported is already defined in RHEL 6.3
- pci_check_and_mask_intx is already defined in RHEL 6.3

Signed-off-by: Guillaume Gaudonville <guillaume.gaudonville@6wind.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-09-03 13:28:26 +02:00
Thomas Monjalon
e478ee507a version: 1.7.1-rc1
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-09-03 03:59:12 +02:00
Thomas Monjalon
d8ee82745a igb_uio: revert MSI IRQ mode
This reverts commit 399a3f0db8
	"fix IRQ mode handling"
and part of commit 4a5c221f9d
	"fix compability on old kernel"

MSI implementation is using irq_to_desc which is not exported before
kernel 3.4 and commit 3911ff30.
Let's revert it for release 1.7.1, waiting for another solution.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-09-03 03:59:11 +02:00
Konstantin Ananyev
074f54ad03 acl: fix build and runtime for default target
Make the ACL library build and work on the 'default' architecture:
- make rte_acl_classify_scalar really scalar
 (make sure it wouldn't use sse4 intrinsics through resolve_priority()).
- Provide two versions of rte_acl_classify code path:
  rte_acl_classify_sse() - can be built and used only on systems with sse4.2
  and upper, returns -ENOTSUP on lower arch.
  rte_acl_classify_scalar() - a slower version, but can be built and used
  on all systems.
- Addition of a new function rte_acl_classify_alg.  This function lets you
  specify an enum value to override the ACL context's default algorithm when doing
  a classification.  This allows an application to specify a classification
  algorithm without needing to publicize each method. I know there was concern
  over keeping those methods public, but we don't have a static ABI at the moment,
  so this seems to me a reasonable thing to do, as it gives us less of an ABI
  surface to worry about.
- keep common code shared between these two codepaths.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-09-03 03:26:50 +02:00
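The dispatch idea behind rte_acl_classify_alg in commit 074f54ad03 can be sketched as below. All names prefixed `demo_` are hypothetical, the signatures do not match the real ACL API, and the "classification" is a placeholder that copies each key to its result.

```c
#include <errno.h>
#include <stdint.h>

enum demo_classify_alg {
    DEMO_CLASSIFY_DEFAULT = 0,  /* use the algorithm stored in the context */
    DEMO_CLASSIFY_SCALAR,
    DEMO_CLASSIFY_SSE,
};

struct demo_acl_ctx {
    enum demo_classify_alg alg; /* context's default algorithm */
};

/* Placeholder scalar path: works on every architecture. */
static int demo_classify_scalar(const uint8_t *keys, uint32_t *results,
                                uint32_t num)
{
    for (uint32_t i = 0; i < num; i++)
        results[i] = keys[i];
    return 0;
}

/* Vector path: only available when the build targets SSE4.2. */
static int demo_classify_sse(const uint8_t *keys, uint32_t *results,
                             uint32_t num)
{
#if defined(__SSE4_2__)
    /* A real implementation would use SSE4.2 intrinsics here. */
    return demo_classify_scalar(keys, results, num);
#else
    (void)keys; (void)results; (void)num;
    return -ENOTSUP;
#endif
}

/* Single entry point: the enum selects the code path. */
static int demo_classify_alg(const struct demo_acl_ctx *ctx,
                             const uint8_t *keys, uint32_t *results,
                             uint32_t num, enum demo_classify_alg alg)
{
    if (alg == DEMO_CLASSIFY_DEFAULT)
        alg = ctx->alg;

    switch (alg) {
    case DEMO_CLASSIFY_SCALAR:
        return demo_classify_scalar(keys, results, num);
    case DEMO_CLASSIFY_SSE:
        return demo_classify_sse(keys, results, num);
    default:
        return -EINVAL;
    }
}
```

Only the dispatcher needs to be public in such a design; the per-algorithm functions can stay internal.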
Aaro Koskinen
0ab95e82c1 kni: fix build with kernel 3.17
Since Linux commit "set name_assign_type in alloc_netdev" (c835a677331495),
the function alloc_netdev takes a new parameter (name_assign_type)
whose default value is NET_NAME_UNKNOWN.

Signed-off-by: Aaro Koskinen <aaro.koskinen@nsn.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-29 16:18:25 +02:00
Zhangkun
f20a47ff97 eal: fix memory leak in hugepage error cases
The sysfs directory for hugepages parsing was not closed properly in some
error cases.

Signed-off-by: Zhangkun <zhangk.zhangkun@huawei.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-29 12:38:29 +02:00
Stephen Hemminger
844083c46a vmxnet3: fix crash on stop
The cmd_ring_release can be called twice if the queue has already
been released. This causes a crash on shutdown.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-29 12:38:29 +02:00
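A common way to make a release function safe to call twice, as commit 844083c46a requires, is to clear the pointer after freeing it. The sketch below shows that general pattern with a hypothetical `demo_cmd_ring` structure; it is not the vmxnet3 driver code.

```c
#include <stdlib.h>

/* Hypothetical ring: only the field needed for the example. */
struct demo_cmd_ring {
    void *buf_info;   /* per-descriptor bookkeeping, heap allocated */
};

/* Safe to call more than once on the same ring. */
static void demo_cmd_ring_release(struct demo_cmd_ring *ring)
{
    if (ring == NULL || ring->buf_info == NULL)
        return;               /* already released */

    free(ring->buf_info);
    ring->buf_info = NULL;    /* a second call becomes a no-op */
}
```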
David Marchand
413e8baa0d eal: remove unused macros
Clean both linux and bsd implementations from unused macros.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-29 12:38:29 +02:00
David Marchand
add720fce9 fix unix permissions for source files
No need for that 'x bit' on source files.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-08-28 17:04:01 +02:00
Bruce Richardson
f403924724 ixgbe: make vector stores unaligned
When writing to the mbuf array for receiving packets, do not assume
16-byte alignment by using aligned stores. If the pointers are only
8-byte aligned, the program will crash due to incorrect alignment.
Changing "store" to "storeu" fixes this.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-28 16:48:34 +02:00
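The difference between "store" and "storeu" in commit f403924724 can be illustrated with the SSE2 intrinsic `_mm_storeu_si128`, which makes no alignment assumption, whereas `_mm_store_si128` requires a 16-byte-aligned destination. The helper name is hypothetical, and the `memcpy` fallback for targets without SSE2 is an addition for portability of the example only.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Write two 64-bit values (e.g. two mbuf pointers on a 64-bit target)
 * to dst, which is only guaranteed to be 8-byte aligned.
 */
static void demo_store_pair(uint64_t *dst, uint64_t v0, uint64_t v1)
{
#if defined(__SSE2__)
    /* _mm_set_epi64x takes the high lane first, so v0 lands in dst[0]. */
    __m128i v = _mm_set_epi64x((long long)v1, (long long)v0);

    /* Unaligned store: valid for any dst, unlike _mm_store_si128. */
    _mm_storeu_si128((__m128i *)dst, v);
#else
    const uint64_t tmp[2] = { v0, v1 };

    memcpy(dst, tmp, sizeof(tmp));
#endif
}
```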
Ouyang Changchun
13ce5e7eb9 virtio: mergeable buffers
This patch supports mergeable buffer feature in DPDK based virtio PMD,
which can receive jumbo frame with larger size, like 3K, 4K or even 9K.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Jingguo Fu <jingguox.fu@intel.com>
2014-08-25 17:14:22 +02:00
Chen Jing D(Mark)
3f6a696f10 i40evf: queue start and stop
Add per-queue RX/TX start/stop function.
Support fields start_rx_per_q and start_tx_per_q.
In the meanwhile, change dev_start/stop to call per-queue RX/TX functions.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Reviewed-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Changchun Ouyang <changchun.ouyang@intel.com>
Reviewed-by: Huawei Xie <huawei.xie@intel.com>
[Thomas: fix typo]
2014-08-25 16:07:50 +02:00
Chen Jing D(Mark)
e406719449 i40e: queue start and stop
Add functions to start/stop specific RX/TX queue.
Support fields start_rx_per_q and start_tx_per_q.
In the meanwhile, change dev_start/stop functions to call per-queue functions.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Reviewed-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Changchun Ouyang <changchun.ouyang@intel.com>
Reviewed-by: Huawei Xie <huawei.xie@intel.com>
[Thomas: reword comments and merge 2 patches]
2014-08-25 16:07:50 +02:00
Ding Heng
ad6e7857ff i40e: enable multicast for promiscuous mode
IPv6 will run NDP with multicast packets, but multicast packets will be
filtered by i40e driver by default. So we need to enable multicast when
promiscuous mode is on, or IPv6 will fail.

Signed-off-by: Ding Heng <hengx.ding@intel.com>
Reviewed-by: Helin Zhang <helin.zhang@intel.com>
2014-08-25 16:07:50 +02:00
Cunming Liang
6e145fcc75 i40e: support autoneg or force link speed
- i40e force link up/down
- i40e autoneg/force speed

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Tested-by: Xu HuilongX <huilongx.xu@intel.com>
2014-08-25 16:07:50 +02:00
Helin Zhang
4f41de4fd5 i40e: support xen domain0
i40e was failing to run in XEN domain0, as the physical
memory for adminq DMA should be allocated and translated
in a different way for XEN domain0. So
rte_memzone_reserve_bounded() should be used for DMA
memory allocation, and rte_mem_phy2mch() should be used
for DMA memory address translation to support running
i40e PMD in XEN domain0.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Zhaochen Zhan <zhaochen.zhan@intel.com>
Acked-by:  Jijiang Liu <jijiang.liu@intel.com>
2014-08-25 15:44:32 +02:00
Pawel Wodkowski
35170c52d0 kni: fix build on Ubuntu 12.04
On Ubuntu 12.04.4, the file '/proc/version_signature' contains
'Ubuntu 3.11.0-15.25~precise1-generic 3.11.10'. This introduces a compilation
error since '~precise1' is not discarded. This patch discards
everything from the '~' onwards, inclusive.

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-21 23:01:40 +02:00
Neil Horman
8777aabc53 ixgbe: require only sse3 intrinsics
ixgbe was failing to build in the default configuration because it required
sse4.2 intrinsics, and the default config doesn't support more than sse3.
Modify the pmd so that only sse3 intrinsics are pulled in and used.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Konstantin Ananyev <konstantin.ananyev@intel.com>
CC: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2014-08-13 01:47:03 +02:00
Pablo de Lara
7b4482e04b pcap: fix Rx crash
Normally, bufs[i] stores the mbuf pointer and the index into bufs[]
is the loop counter i. But if header.len > buf_size, DPDK frees the
mbuf while the loop counter i still increases, so some of the items
in bufs[] might be NULL pointers, potentially causing a DPDK core
dump. Using num_rx as the index for bufs[] solves the problem.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Tested-by: Jiajia SunX <sunx.jiajia@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-13 01:27:53 +02:00
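The indexing fix from commit 7b4482e04b can be sketched with a simplified receive loop. The `demo_pkt` type and function name are hypothetical, and dropping is modelled by skipping the packet instead of freeing an mbuf. The point is that the output array is indexed by the count of accepted packets, not by the loop counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet descriptor for the example. */
struct demo_pkt {
    uint32_t len;
};

/*
 * Store pointers to the packets that fit into buf_size in bufs[],
 * packed from index 0, and return how many were stored.
 */
static uint16_t demo_rx_burst(struct demo_pkt *in, uint16_t nb_pkts,
                              uint32_t buf_size, struct demo_pkt **bufs)
{
    uint16_t num_rx = 0;

    for (uint16_t i = 0; i < nb_pkts; i++) {
        if (in[i].len > buf_size)
            continue;            /* oversized packet is dropped */

        bufs[num_rx] = &in[i];   /* num_rx, not i: no gaps in bufs[] */
        num_rx++;
    }
    return num_rx;
}
```

With `bufs[i]` instead of `bufs[num_rx]`, a dropped packet would leave a hole at its index while the caller still reads the first `num_rx` entries.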
Julien Cretin
1589b64c35 kni: fix missing backslash in Makefile
With GNU Make 3.81 on Ubuntu 14.04, I get:
lib/librte_eal/linuxapp/kni/Makefile:49: *** unterminated call to function `shell': missing `)'.  Stop.

Signed-off-by: Julien Cretin <julien.cretin@trust-in-soft.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-01 21:25:00 +02:00
Thomas Monjalon
030df0102c ether: fix local address check
cppcheck reports show that is_local_admin_ether_addr() was broken:
	Expression '(X & 0x2) == 0x1' is always false

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-01 17:56:54 +02:00
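The cppcheck finding in commit 030df0102c follows from simple bit arithmetic: `X & 0x2` can only be 0 or 2, so comparing it with 1 is always false. A minimal corrected check, with a hypothetical macro and function name, might look like this:

```c
#include <stdint.h>

/* Locally administered bit in the first octet of a MAC address. */
#define DEMO_ETHER_LOCAL_ADMIN_ADDR 0x02

static inline int demo_is_local_admin_ether_addr(const uint8_t addr[6])
{
    /* Broken form: (addr[0] & 0x02) == 0x01, which can never be true. */
    return (addr[0] & DEMO_ETHER_LOCAL_ADMIN_ADDR) != 0;
}
```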
Declan Doherty
06c1bfd94a bond: fix unit tests
- Fix bonding unit test suite which was failing due to a change
  in pmd configuration behaviour introduced in commit
  a130f53118 (add link state interrupt flag)
- Added fixes to allow the ability to re-run test suite from test
  application without restarting application

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2014-08-01 17:14:06 +02:00
Stephen Hemminger
638e82a139 vmxnet3: initialize receive mode for broadcast
The driver must listen to broadcast packets, like other devices.
Otherwise protocols like ARP won't work!

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-01 16:47:14 +02:00
Ouyang Changchun
ff91664430 virtio: fix build with debug
Fix 2 compilation issues in virtio PMD when dump option is enabled.

These errors were introduced by commits
    f37cdfde46 (remove unused virtqueue name)
and ce65e697c6 (simplify the hardware structure).

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-01 16:30:14 +02:00
Patrice Buriez
a09b359dac kni: fix build on Ubuntu 14.04
Recent Ubuntu kernel 3.13.0-30.54, although based on Linux kernel 3.13.11,
already provides skb_set_hash() inline function, slightly different than
the one provided by lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h

Ubuntu kernel 3.13.0-30.54 provides:
    * i40e/i40evf: i40e implementation for skb_set_hash
    - https://bugs.launchpad.net/bugs/1328037
    - http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_3.13.0-30.54/changelog

As a result, the implementation provided by kcompat.h must be skipped.
It is not appropriate to test whether LINUX_VERSION_CODE >= KERNEL_VERSION(3,13,11)
because previous Ubuntu kernel 3.13.0-29.53, already based on 3.13.11, needs to
get the implementation provided by kcompat.h

So the full Ubuntu kernel version numbering scheme must be tested:
<base kernel version>-<ABI number>.<upload number>-<flavour>
See "What does a specific Ubuntu kernel version number mean?"
and "How can we determine the version of the running kernel?"
at: https://wiki.ubuntu.com/Kernel/FAQ

Unlike RHEL_RELEASE_CODE, there is no such UBUNTU_RELEASE_CODE available out of
the box, so it needs to be crafted from the Makefile
Similarly, UBUNTU_KERNEL_CODE is generated with ABI and upload numbers.

`lsb_release -si` is first used to check whether we are running Ubuntu
`lsb_release -sr` provides release number 14.04, then converted to integer 1404
/proc/version_signature is parsed to get base kernel version, ABI and upload
numbers, and flavour is dropped

UBUNTU_KERNEL_CODE is indirectly defined using the UBUNTU_KERNEL_VERSION macro,
which in turn is defined in kcompat.h
This makes a single place to define the Ubuntu kernel version numbering scheme,
which is slightly different than the usual "shift by 8" scheme: ABI numbers can
be big (see: https://wiki.ubuntu.com/Kernel/Dev/TopicBranches), so 16-bits have
been reserved for them.

Finally, the implementation of skb_set_hash is skipped in kcompat.h if we are
running Ubuntu 14.04 with an Ubuntu kernel >= 3.13.0-30.54

Signed-off-by: Patrice Buriez <patrice.buriez@intel.com>
[Thomas: simpler form, use tr instead of subst]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-01 14:43:18 +02:00
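The numbering scheme from commit a09b359dac, with 16 bits reserved for the ABI number, can be sketched as follows. The macro and function names are hypothetical and the exact shift amounts are an assumption chosen to match the description (8 bits per base version component, 16 bits for ABI, 8 bits for upload); the real kcompat.h definition may differ.

```c
#include <stdint.h>

/*
 * <base a.b.c>-<abi>.<upload>, packed so that codes compare in
 * version order: a, b, c and upload get 8 bits, abi gets 16 bits.
 */
#define DEMO_UBUNTU_KERNEL_VERSION(a, b, c, abi, upload) \
    (((uint64_t)(a) << 40) + ((uint64_t)(b) << 32) +     \
     ((uint64_t)(c) << 24) + ((uint64_t)(abi) << 8) +    \
     (uint64_t)(upload))

/*
 * Nonzero when the kernel already provides skb_set_hash(), i.e. on
 * Ubuntu 14.04 (release code 1404) with a kernel >= 3.13.0-30.54.
 */
static inline int demo_kernel_has_skb_set_hash(uint32_t release_code,
                                               uint64_t kernel_code)
{
    return release_code == 1404 &&
           kernel_code >= DEMO_UBUNTU_KERNEL_VERSION(3, 13, 0, 30, 54);
}
```

Kernel 3.13.0-29.53 therefore still gets the kcompat.h implementation, while 3.13.0-30.54 does not, even though both are based on the same upstream version.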
Stephen Hemminger
4094930e55 igb_uio: handle no IRQ fallback
Fix a couple of issues with my earlier igb_uio stuff:
1. With MSI (like MSI-X) actual IRQ number is not known until
   after the pci_enable_msi() is done.
2. If INTX fails, fall back to running without IRQ.
   This allows usermode PCI to recover and run without IRQ
   for cases where PCI INTX support is broken (aka VMWare).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-01 14:43:18 +02:00
Stephen Hemminger
4a5c221f9d igb_uio: fix compability on old kernel
Add more compatibility wrappers, and split out all the wrapper
code to a separate file. Builds on Debian Squeeze (2.6.32), which
is the oldest kernel version that DPDK currently supports.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-08-01 14:43:18 +02:00
Thomas Monjalon
282e1ec857 igb_uio: fix build with kernel older than 2.6.34
There was a missing brace in commit 819fc2fe2a
(dont wrap pci_num_vf function needlessly).

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-07-23 10:12:59 +02:00
Anatoly Burakov
8d8d88cbd9 acl: make tailq fully local
Since the data structures such as rings are shared in their entirety,
those TAILQ pointers are shared as well. Meaning that, after a
successful rte_ring creation, the tailq_next pointer of the last
ring in the TAILQ will be updated with a pointer to a ring which may
not be present in the address space of another process (i.e. a ring
that may be host-local or guest-local, and not shared over IVSHMEM).
Any successive ring create/lookup on the other side of IVSHMEM will
result in trying to dereference an invalid pointer.

This patchset fixes this problem by creating a default tailq entry
that may be used by any data structure that chooses to use TAILQs.
This default TAILQ entry will consist of tailq_next/tailq_prev
pointers, and an opaque pointer to arbitrary data. All TAILQ
pointers from data structures themselves will be removed and
replaced by those generic TAILQ entries, thus fixing the problem
of potentially exposing local address space to shared structures.

Technically, only the rte_ring structure requires modification, because
IVSHMEM is only using memzones (which aren't in TAILQs) and rings,
but for consistency's sake other TAILQ-based data structures were
adapted as well.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-07-22 19:42:23 +02:00
Anatoly Burakov
899d8bc9b3 lpm: make tailq fully local
Since the data structures such as rings are shared in their entirety,
those TAILQ pointers are shared as well. Meaning that, after a
successful rte_ring creation, the tailq_next pointer of the last
ring in the TAILQ will be updated with a pointer to a ring which may
not be present in the address space of another process (i.e. a ring
that may be host-local or guest-local, and not shared over IVSHMEM).
Any successive ring create/lookup on the other side of IVSHMEM will
result in trying to dereference an invalid pointer.

This patchset fixes this problem by creating a default tailq entry
that may be used by any data structure that chooses to use TAILQs.
This default TAILQ entry will consist of tailq_next/tailq_prev
pointers, and an opaque pointer to arbitrary data. All TAILQ
pointers from data structures themselves will be removed and
replaced by those generic TAILQ entries, thus fixing the problem
of potentially exposing local address space to shared structures.

Technically, only the rte_ring structure requires modification, because
IVSHMEM is only using memzones (which aren't in TAILQs) and rings,
but for consistency's sake other TAILQ-based data structures were
adapted as well.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-07-22 19:42:23 +02:00
Anatoly Burakov
4542f89397 hash: make tailq fully local
Since the data structures such as rings are shared in their entirety,
those TAILQ pointers are shared as well. Meaning that, after a
successful rte_ring creation, the tailq_next pointer of the last
ring in the TAILQ will be updated with a pointer to a ring which may
not be present in the address space of another process (i.e. a ring
that may be host-local or guest-local, and not shared over IVSHMEM).
Any successive ring create/lookup on the other side of IVSHMEM will
result in trying to dereference an invalid pointer.

This patchset fixes this problem by creating a default tailq entry
that may be used by any data structure that chooses to use TAILQs.
This default TAILQ entry will consist of tailq_next/tailq_prev
pointers, and an opaque pointer to arbitrary data. All TAILQ
pointers from data structures themselves will be removed and
replaced by those generic TAILQ entries, thus fixing the problem
of potentially exposing local address space to shared structures.

Technically, only the rte_ring structure requires modification, because
IVSHMEM is only using memzones (which aren't in TAILQs) and rings,
but for consistency's sake other TAILQ-based data structures were
adapted as well.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-07-22 19:42:23 +02:00
Anatoly Burakov
dd0024ccbc mempool: make tailq fully local
Since the data structures such as rings are shared in their entirety,
those TAILQ pointers are shared as well. Meaning that, after a
successful rte_ring creation, the tailq_next pointer of the last
ring in the TAILQ will be updated with a pointer to a ring which may
not be present in the address space of another process (i.e. a ring
that may be host-local or guest-local, and not shared over IVSHMEM).
Any successive ring create/lookup on the other side of IVSHMEM will
result in trying to dereference an invalid pointer.

This patchset fixes this problem by creating a default tailq entry
that may be used by any data structure that chooses to use TAILQs.
This default TAILQ entry will consist of tailq_next/tailq_prev
pointers, and an opaque pointer to arbitrary data. All TAILQ
pointers from data structures themselves will be removed and
replaced by those generic TAILQ entries, thus fixing the problem
of potentially exposing local address space to shared structures.

Technically, only the rte_ring structure requires modification, because
IVSHMEM is only using memzones (which aren't in TAILQs) and rings,
but for consistency's sake other TAILQ-based data structures were
adapted as well.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-07-22 19:42:23 +02:00
Anatoly Burakov
e3f3b68c6e ring: make tailq fully local
Since the data structures such as rings are shared in their entirety,
those TAILQ pointers are shared as well. Meaning that, after a
successful rte_ring creation, the tailq_next pointer of the last
ring in the TAILQ will be updated with a pointer to a ring which may
not be present in the address space of another process (i.e. a ring
that may be host-local or guest-local, and not shared over IVSHMEM).
Any successive ring create/lookup on the other side of IVSHMEM will
result in trying to dereference an invalid pointer.

This patchset fixes this problem by creating a default tailq entry
that may be used by any data structure that chooses to use TAILQs.
This default TAILQ entry will consist of tailq_next/tailq_prev
pointers, and an opaque pointer to arbitrary data. All TAILQ
pointers from data structures themselves will be removed and
replaced by those generic TAILQ entries, thus fixing the problem
of potentially exposing local address space to shared structures.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-07-22 19:42:18 +02:00
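The generic TAILQ entry described in commit e3f3b68c6e (list links plus an opaque data pointer, kept outside the shared object) can be sketched with the `<sys/queue.h>` macros. The `demo_` names and the ring structure are hypothetical and much simpler than the real DPDK types.

```c
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Generic list entry: the links live here, not in the object itself. */
struct demo_tailq_entry {
    TAILQ_ENTRY(demo_tailq_entry) next;  /* tailq_next/tailq_prev links */
    void *data;                          /* opaque pointer to the object */
};

TAILQ_HEAD(demo_tailq_head, demo_tailq_entry);

/* Shared object: contains no TAILQ pointers at all. */
struct demo_ring {
    char name[32];
};

/* Walk the list and return the ring with the given name, or NULL. */
static struct demo_ring *demo_ring_lookup(struct demo_tailq_head *head,
                                          const char *name)
{
    struct demo_tailq_entry *te;

    TAILQ_FOREACH(te, head, next) {
        struct demo_ring *r = te->data;

        if (strcmp(r->name, name) == 0)
            return r;
    }
    return NULL;
}
```

Since the ring no longer embeds list pointers, sharing the ring's memory with another process or guest does not expose addresses that are only valid locally.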
Anatoly Burakov
2db1d35fea tailq: change rte_dummy to rte_tailq_entry
Rename structure and add a data pointer.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-07-22 19:15:01 +02:00
Anatoly Burakov
fa02c869f1 eal: use --base-virtaddr for mapping rte_config as well
Use --base-virtaddr to set the address of rte_config file along with
start address of the hugepages. Since the user would likely expect
the hugepages to be starting at the specified address, the specified
address will likely be rounded to either 2M or 1G. So, in order to
not waste space, we subtract the length of the config (and align it
on page boundary) from the base virtual address and map the config
just before the hugepages.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-07-22 17:48:49 +02:00
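The address arithmetic in commit fa02c869f1 can be sketched as below: subtract the config length from the base virtual address and align the result down to a page boundary, so the config ends up just before the hugepages. The function name is hypothetical, and the page size is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Address at which to map a config of cfg_len bytes so that it ends at
 * or before base_virtaddr and starts on a page boundary.
 * page_size must be a power of two.
 */
static inline uintptr_t demo_cfg_map_addr(uintptr_t base_virtaddr,
                                          size_t cfg_len, size_t page_size)
{
    return (base_virtaddr - cfg_len) & ~((uintptr_t)page_size - 1);
}
```

For example, with a base of 0x40000000, a 5000-byte config and 4KB pages, the config is mapped at 0x3FFFE000 and occupies two pages below the hugepage area.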
Anatoly Burakov
6258f1c942 eal: map shared config into exact same address as primary process
Shared config is shared across primary and secondary processes.
However, when using rte_malloc, the malloc elements keep references to
the heap inside themselves. This heap reference might not be referencing
a local heap because the heap reference points to the heap of whatever
process has allocated that malloc element. Therefore, there can be
situations when malloc elements in a given heap actually reference
different addresses for the same heap - depending on which process has
allocated the element. This can lead to segmentation faults when dealing
with malloc elements allocated on the same heap by different processes.

To fix this problem, heaps will now have the same addresses across
processes. In order to achieve that, a new field in a shared mem_config
(a structure that holds the heaps, and which is shared across processes)
was added to keep the address of where this config is mapped in the
primary process.

Secondary process will now map the config in two stages - first, it'll
map it into an arbitrary address and read the address the primary
process has allocated for the shared config. Then, the config is
unmapped and re-mapped using the address previously read.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-07-22 17:47:51 +02:00
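The two-stage mapping from commit 6258f1c942 can be sketched with plain `mmap()`. Everything named `demo_` is hypothetical, the file path is supplied by the caller, and the address is passed to `mmap()` as a hint without `MAP_FIXED`, so the sketch treats a mapping at any other address as a failure. Whether the hint is honoured depends on the operating system and on the address being free.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical shared config: records where the primary mapped it. */
struct demo_mem_config {
    uint64_t mem_cfg_addr;
    /* heaps and other shared state would follow */
};

/* Primary: create the file, map it anywhere, record the address. */
static struct demo_mem_config *demo_cfg_create(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(struct demo_mem_config)) < 0) {
        close(fd);
        return NULL;
    }
    struct demo_mem_config *cfg = mmap(NULL, sizeof(*cfg),
            PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (cfg == MAP_FAILED)
        return NULL;
    cfg->mem_cfg_addr = (uint64_t)(uintptr_t)cfg;
    return cfg;
}

/* Secondary: map anywhere, read the address, then re-map there. */
static struct demo_mem_config *demo_cfg_attach(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    /* Stage 1: temporary mapping just to read the recorded address. */
    struct demo_mem_config *tmp = mmap(NULL, sizeof(*tmp), PROT_READ,
            MAP_SHARED, fd, 0);
    if (tmp == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    void *want = (void *)(uintptr_t)tmp->mem_cfg_addr;
    munmap(tmp, sizeof(*tmp));

    /* Stage 2: map again, asking for the primary's address. */
    struct demo_mem_config *cfg = mmap(want, sizeof(*cfg),
            PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (cfg == MAP_FAILED)
        return NULL;
    if (cfg != want) {          /* hint not honoured: give up */
        munmap(cfg, sizeof(*cfg));
        return NULL;
    }
    return cfg;
}
```

With both processes seeing the config at the same address, pointers stored inside it, such as heap references, mean the same thing on both sides.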
Pablo de Lara
546afbc682 ring: remove extra devices creation with --vdev option
When passing extra arguments in EAL option --vdev, to create
ring ethdevs, API was creating three ethdevs, even if there
was just one argument, such as CREATE.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-07-22 16:43:04 +02:00
Stephen Hemminger
fb8251bcc5 vmxnet3: remove useless adapter wrapper
The adapter struct is just a wrapper around the vmxnet3_hw
structure. Eliminate the wrapper and get rid of the macro
used to access and needlessly cast the private data.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-07-22 16:40:55 +02:00
Stephen Hemminger
8073d56771 vmxnet3: add per-queue stats
Update per-queue statistics and add missing multicast into statistics.
Also, no need to zero statistics since they are already cleared
in rte_stats_get.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-07-22 16:40:54 +02:00
Stephen Hemminger
afce239484 vmxnet3: fix double spacing of log messages
The debug log macros already include a newline, so there is no need
to double space the output.

Note: other drivers have the same problem

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-07-22 16:40:54 +02:00