4a22e6ee3d
This change adds automatic figure references to the docs. The figure numbers in the generated Html and PDF docs are now automatically numbered based on section. Requires Sphinx >= 1.3.1. The patch makes the following changes. * Changes image:: tag to figure:: and moves image caption to the figure. * Adds captions to figures that didn't previously have any. * Un-templates the |image-name| substitution definitions into explicit figure:: tags. They weren't used more than once anyway and Sphinx doesn't support them for figure. * Adds a target to each image that didn't previously have one so that they can be cross-referenced. * Renamed existing image target to match the image name for consistency. * Replaces the Figures lists with automatic :numref: :ref: entries to generate automatic numbering and captions. * Replaces "Figure" references with automatic :numref: references. Signed-off-by: John McNamara <john.mcnamara@intel.com>
495 lines
17 KiB
ReStructuredText
495 lines
17 KiB
ReStructuredText
.. BSD LICENSE
|
|
Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
|
|
All rights reserved.
|
|
|
|
Redistribution and use in source and binary forms, with or without
|
|
modification, are permitted provided that the following conditions
|
|
are met:
|
|
|
|
* Redistributions of source code must retain the above copyright
|
|
notice, this list of conditions and the following disclaimer.
|
|
* Redistributions in binary form must reproduce the above copyright
|
|
notice, this list of conditions and the following disclaimer in
|
|
the documentation and/or other materials provided with the
|
|
distribution.
|
|
* Neither the name of Intel Corporation nor the names of its
|
|
contributors may be used to endorse or promote products derived
|
|
from this software without specific prior written permission.
|
|
|
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
|
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
|
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
|
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
|
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
|
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
|
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
|
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
|
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
|
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
|
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
|
|
Quota and Watermark Sample Application
|
|
======================================
|
|
|
|
The Quota and Watermark sample application is a simple example of packet processing using Data Plane Development Kit (DPDK) that
|
|
showcases the use of a quota as the maximum number of packets enqueue/dequeue at a time and low and high watermarks
|
|
to signal low and high ring usage respectively.
|
|
|
|
Additionally, it shows how ring watermarks can be used to feedback congestion notifications to data producers by
|
|
temporarily stopping processing overloaded rings and sending Ethernet flow control frames.
|
|
|
|
This sample application is split in two parts:
|
|
|
|
* qw - The core quota and watermark sample application
|
|
|
|
* qwctl - A command line tool to alter quota and watermarks while qw is running
|
|
|
|
Overview
|
|
--------
|
|
|
|
The Quota and Watermark sample application performs forwarding for each packet that is received on a given port.
|
|
The destination port is the adjacent port from the enabled port mask, that is,
|
|
if the first four ports are enabled (port mask 0xf), ports 0 and 1 forward into each other,
|
|
and ports 2 and 3 forward into each other.
|
|
The MAC addresses of the forwarded Ethernet frames are not affected.
|
|
|
|
Internally, packets are pulled from the ports by the master logical core and put on a variable length processing pipeline,
|
|
each stage of which being connected by rings, as shown in :numref:`figure_pipeline_overview`.
|
|
|
|
.. _figure_pipeline_overview:
|
|
|
|
.. figure:: img/pipeline_overview.*
|
|
|
|
Pipeline Overview
|
|
|
|
|
|
An adjustable quota value controls how many packets are being moved through the pipeline per enqueue and dequeue.
|
|
Adjustable watermark values associated with the rings control a back-off mechanism that
|
|
tries to prevent the pipeline from being overloaded by:
|
|
|
|
* Stopping enqueuing on rings for which the usage has crossed the high watermark threshold
|
|
|
|
* Sending Ethernet pause frames
|
|
|
|
* Only resuming enqueuing on a ring once its usage goes below a global low watermark threshold
|
|
|
|
This mechanism allows congestion notifications to go up the ring pipeline and
|
|
eventually lead to an Ethernet flow control frame being send to the source.
|
|
|
|
On top of serving as an example of quota and watermark usage,
|
|
this application can be used to benchmark ring based processing pipelines performance using a traffic- generator,
|
|
as shown in :numref:`figure_ring_pipeline_perf_setup`.
|
|
|
|
.. _figure_ring_pipeline_perf_setup:
|
|
|
|
.. figure:: img/ring_pipeline_perf_setup.*
|
|
|
|
Ring-based Processing Pipeline Performance Setup
|
|
|
|
|
|
Compiling the Application
|
|
-------------------------
|
|
|
|
#. Go to the example directory:
|
|
|
|
.. code-block:: console
|
|
|
|
export RTE_SDK=/path/to/rte_sdk
|
|
cd ${RTE_SDK}/examples/quota_watermark
|
|
|
|
#. Set the target (a default target is used if not specified). For example:
|
|
|
|
.. code-block:: console
|
|
|
|
export RTE_TARGET=x86_64-native-linuxapp-gcc
|
|
|
|
See the *DPDK Getting Started Guide* for possible RTE_TARGET values.
|
|
|
|
#. Build the application:
|
|
|
|
.. code-block:: console
|
|
|
|
make
|
|
|
|
Running the Application
|
|
-----------------------
|
|
|
|
The core application, qw, has to be started first.
|
|
|
|
Once it is up and running, one can alter quota and watermarks while it runs using the control application, qwctl.
|
|
|
|
Running the Core Application
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The application requires a single command line option:
|
|
|
|
.. code-block:: console
|
|
|
|
./qw/build/qw [EAL options] -- -p PORTMASK
|
|
|
|
where,
|
|
|
|
-p PORTMASK: A hexadecimal bitmask of the ports to configure
|
|
|
|
To run the application in a linuxapp environment with four logical cores and ports 0 and 2,
|
|
issue the following command:
|
|
|
|
.. code-block:: console
|
|
|
|
./qw/build/qw -c f -n 4 -- -p 5
|
|
|
|
Refer to the *DPDK Getting Started Guide* for general information on running applications and
|
|
the Environment Abstraction Layer (EAL) options.
|
|
|
|
Running the Control Application
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The control application requires a number of command line options:
|
|
|
|
.. code-block:: console
|
|
|
|
./qwctl/build/qwctl [EAL options] --proc-type=secondary
|
|
|
|
The --proc-type=secondary option is necessary for the EAL to properly initialize the control application to
|
|
use the same huge pages as the core application and thus be able to access its rings.
|
|
|
|
To run the application in a linuxapp environment on logical core 0, issue the following command:
|
|
|
|
.. code-block:: console
|
|
|
|
./qwctl/build/qwctl -c 1 -n 4 --proc-type=secondary
|
|
|
|
Refer to the *DPDK Getting Started* Guide for general information on running applications and
|
|
the Environment Abstraction Layer (EAL) options.
|
|
|
|
qwctl is an interactive command line that let the user change variables in a running instance of qw.
|
|
The help command gives a list of available commands:
|
|
|
|
.. code-block:: console
|
|
|
|
$ qwctl > help
|
|
|
|
Code Overview
|
|
-------------
|
|
|
|
The following sections provide a quick guide to the application's source code.
|
|
|
|
Core Application - qw
|
|
~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
EAL and Drivers Setup
|
|
^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The EAL arguments are parsed at the beginning of the main() function:
|
|
|
|
.. code-block:: c
|
|
|
|
ret = rte_eal_init(argc, argv);
|
|
if (ret < 0)
|
|
rte_exit(EXIT_FAILURE, "Cannot initialize EAL\n");
|
|
|
|
argc -= ret;
|
|
argv += ret;
|
|
|
|
Then, a call to init_dpdk(), defined in init.c, is made to initialize the poll mode drivers:
|
|
|
|
.. code-block:: c
|
|
|
|
void
|
|
init_dpdk(void)
|
|
{
|
|
int ret;
|
|
|
|
/* Bind the drivers to usable devices */
|
|
|
|
ret = rte_eal_pci_probe();
|
|
if (ret < 0)
|
|
rte_exit(EXIT_FAILURE, "rte_eal_pci_probe(): error %d\n", ret);
|
|
|
|
if (rte_eth_dev_count() < 2)
|
|
rte_exit(EXIT_FAILURE, "Not enough Ethernet port available\n");
|
|
}
|
|
|
|
To fully understand this code, it is recommended to study the chapters that relate to the *Poll Mode Driver*
|
|
in the *DPDK Getting Started Guide* and the *DPDK API Reference*.
|
|
|
|
Shared Variables Setup
|
|
^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The quota and low_watermark shared variables are put into an rte_memzone using a call to setup_shared_variables():
|
|
|
|
.. code-block:: c
|
|
|
|
void
|
|
setup_shared_variables(void)
|
|
{
|
|
const struct rte_memzone *qw_memzone;
|
|
|
|
qw_memzone = rte_memzone_reserve(QUOTA_WATERMARK_MEMZONE_NAME, 2 * sizeof(int), rte_socket_id(), RTE_MEMZONE_2MB);
|
|
|
|
if (qw_memzone == NULL)
|
|
rte_exit(EXIT_FAILURE, "%s\n", rte_strerror(rte_errno));
|
|
|
|
quota = qw_memzone->addr;
|
|
low_watermark = (unsigned int *) qw_memzone->addr + sizeof(int);
|
|
}
|
|
|
|
These two variables are initialized to a default value in main() and
|
|
can be changed while qw is running using the qwctl control program.
|
|
|
|
Application Arguments
|
|
^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The qw application only takes one argument: a port mask that specifies which ports should be used by the application.
|
|
At least two ports are needed to run the application and there should be an even number of ports given in the port mask.
|
|
|
|
The port mask parsing is done in parse_qw_args(), defined in args.c.
|
|
|
|
Mbuf Pool Initialization
|
|
^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Once the application's arguments are parsed, an mbuf pool is created.
|
|
It contains a set of mbuf objects that are used by the driver and the application to store network packets:
|
|
|
|
.. code-block:: c
|
|
|
|
/* Create a pool of mbuf to store packets */
|
|
|
|
mbuf_pool = rte_mempool_create("mbuf_pool", MBUF_PER_POOL, MBUF_SIZE, 32, sizeof(struct rte_pktmbuf_pool_private),
|
|
rte_pktmbuf_pool_init, NULL, rte_pktmbuf_init, NULL, rte_socket_id(), 0);
|
|
|
|
if (mbuf_pool == NULL)
|
|
rte_panic("%s\n", rte_strerror(rte_errno));
|
|
|
|
The rte_mempool is a generic structure used to handle pools of objects.
|
|
In this case, it is necessary to create a pool that will be used by the driver,
|
|
which expects to have some reserved space in the mempool structure, sizeof(struct rte_pktmbuf_pool_private) bytes.
|
|
|
|
The number of allocated pkt mbufs is MBUF_PER_POOL, with a size of MBUF_SIZE each.
|
|
A per-lcore cache of 32 mbufs is kept.
|
|
The memory is allocated in on the master lcore's socket, but it is possible to extend this code to allocate one mbuf pool per socket.
|
|
|
|
Two callback pointers are also given to the rte_mempool_create() function:
|
|
|
|
* The first callback pointer is to rte_pktmbuf_pool_init() and is used to initialize the private data of the mempool,
|
|
which is needed by the driver.
|
|
This function is provided by the mbuf API, but can be copied and extended by the developer.
|
|
|
|
* The second callback pointer given to rte_mempool_create() is the mbuf initializer.
|
|
|
|
The default is used, that is, rte_pktmbuf_init(), which is provided in the rte_mbuf library.
|
|
If a more complex application wants to extend the rte_pktmbuf structure for its own needs,
|
|
a new function derived from rte_pktmbuf_init() can be created.
|
|
|
|
Ports Configuration and Pairing
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Each port in the port mask is configured and a corresponding ring is created in the master lcore's array of rings.
|
|
This ring is the first in the pipeline and will hold the packets directly coming from the port.
|
|
|
|
.. code-block:: c
|
|
|
|
for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++)
|
|
if (is_bit_set(port_id, portmask)) {
|
|
configure_eth_port(port_id);
|
|
init_ring(master_lcore_id, port_id);
|
|
}
|
|
|
|
pair_ports();
|
|
|
|
The configure_eth_port() and init_ring() functions are used to configure a port and a ring respectively and are defined in init.c.
|
|
They make use of the DPDK APIs defined in rte_eth.h and rte_ring.h.
|
|
|
|
pair_ports() builds the port_pairs[] array so that its key-value pairs are a mapping between reception and transmission ports.
|
|
It is defined in init.c.
|
|
|
|
Logical Cores Assignment
|
|
^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The application uses the master logical core to poll all the ports for new packets and enqueue them on a ring associated with the port.
|
|
|
|
Each logical core except the last runs pipeline_stage() after a ring for each used port is initialized on that core.
|
|
pipeline_stage() on core X dequeues packets from core X-1's rings and enqueue them on its own rings. See :numref:`figure_threads_pipelines`.
|
|
|
|
.. code-block:: c
|
|
|
|
/* Start pipeline_stage() on all the available slave lcore but the last */
|
|
|
|
for (lcore_id = 0 ; lcore_id < last_lcore_id; lcore_id++) {
|
|
if (rte_lcore_is_enabled(lcore_id) && lcore_id != master_lcore_id) {
|
|
for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++)
|
|
if (is_bit_set(port_id, portmask))
|
|
init_ring(lcore_id, port_id);
|
|
|
|
rte_eal_remote_launch(pipeline_stage, NULL, lcore_id);
|
|
}
|
|
}
|
|
|
|
The last available logical core runs send_stage(),
|
|
which is the last stage of the pipeline dequeuing packets from the last ring in the pipeline and
|
|
sending them out on the destination port setup by pair_ports().
|
|
|
|
.. code-block:: c
|
|
|
|
/* Start send_stage() on the last slave core */
|
|
|
|
rte_eal_remote_launch(send_stage, NULL, last_lcore_id);
|
|
|
|
Receive, Process and Transmit Packets
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
.. _figure_threads_pipelines:
|
|
|
|
.. figure:: img/threads_pipelines.*
|
|
|
|
Threads and Pipelines
|
|
|
|
|
|
In the receive_stage() function running on the master logical core,
|
|
the main task is to read ingress packets from the RX ports and enqueue them
|
|
on the port's corresponding first ring in the pipeline.
|
|
This is done using the following code:
|
|
|
|
.. code-block:: c
|
|
|
|
lcore_id = rte_lcore_id();
|
|
|
|
/* Process each port round robin style */
|
|
|
|
for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
|
|
if (!is_bit_set(port_id, portmask))
|
|
continue;
|
|
|
|
ring = rings[lcore_id][port_id];
|
|
|
|
if (ring_state[port_id] != RING_READY) {
|
|
if (rte_ring_count(ring) > *low_watermark)
|
|
continue;
|
|
else
|
|
ring_state[port_id] = RING_READY;
|
|
}
|
|
|
|
/* Enqueue received packets on the RX ring */
|
|
|
|
nb_rx_pkts = rte_eth_rx_burst(port_id, 0, pkts, *quota);
|
|
|
|
ret = rte_ring_enqueue_bulk(ring, (void *) pkts, nb_rx_pkts);
|
|
if (ret == -EDQUOT) {
|
|
ring_state[port_id] = RING_OVERLOADED;
|
|
send_pause_frame(port_id, 1337);
|
|
}
|
|
}
|
|
|
|
For each port in the port mask, the corresponding ring's pointer is fetched into ring and that ring's state is checked:
|
|
|
|
* If it is in the RING_READY state, \*quota packets are grabbed from the port and put on the ring.
|
|
Should this operation make the ring's usage cross its high watermark,
|
|
the ring is marked as overloaded and an Ethernet flow control frame is sent to the source.
|
|
|
|
* If it is not in the RING_READY state, this port is ignored until the ring's usage crosses the \*low_watermark value.
|
|
|
|
The pipeline_stage() function's task is to process and move packets from the preceding pipeline stage.
|
|
This thread is running on most of the logical cores to create and arbitrarily long pipeline.
|
|
|
|
.. code-block:: c
|
|
|
|
lcore_id = rte_lcore_id();
|
|
|
|
previous_lcore_id = get_previous_lcore_id(lcore_id);
|
|
|
|
for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
|
|
if (!is_bit_set(port_id, portmask))
|
|
continue;
|
|
|
|
tx = rings[lcore_id][port_id];
|
|
rx = rings[previous_lcore_id][port_id];
|
|
if (ring_state[port_id] != RING_READY) {
|
|
if (rte_ring_count(tx) > *low_watermark)
|
|
continue;
|
|
else
|
|
ring_state[port_id] = RING_READY;
|
|
}
|
|
|
|
/* Dequeue up to quota mbuf from rx */
|
|
|
|
nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts, *quota);
|
|
|
|
if (unlikely(nb_dq_pkts < 0))
|
|
continue;
|
|
|
|
/* Enqueue them on tx */
|
|
|
|
ret = rte_ring_enqueue_bulk(tx, pkts, nb_dq_pkts);
|
|
if (ret == -EDQUOT)
|
|
ring_state[port_id] = RING_OVERLOADED;
|
|
}
|
|
|
|
The thread's logic works mostly like receive_stage(),
|
|
except that packets are moved from ring to ring instead of port to ring.
|
|
|
|
In this example, no actual processing is done on the packets,
|
|
but pipeline_stage() is an ideal place to perform any processing required by the application.
|
|
|
|
Finally, the send_stage() function's task is to read packets from the last ring in a pipeline and
|
|
send them on the destination port defined in the port_pairs[] array.
|
|
It is running on the last available logical core only.
|
|
|
|
.. code-block:: c
|
|
|
|
lcore_id = rte_lcore_id();
|
|
|
|
previous_lcore_id = get_previous_lcore_id(lcore_id);
|
|
|
|
for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
|
|
if (!is_bit_set(port_id, portmask)) continue;
|
|
|
|
dest_port_id = port_pairs[port_id];
|
|
tx = rings[previous_lcore_id][port_id];
|
|
|
|
if (rte_ring_empty(tx)) continue;
|
|
|
|
/* Dequeue packets from tx and send them */
|
|
|
|
nb_dq_pkts = rte_ring_dequeue_burst(tx, (void *) tx_pkts, *quota);
|
|
nb_tx_pkts = rte_eth_tx_burst(dest_port_id, 0, tx_pkts, nb_dq_pkts);
|
|
}
|
|
|
|
For each port in the port mask, up to \*quota packets are pulled from the last ring in its pipeline and
|
|
sent on the destination port paired with the current port.
|
|
|
|
Control Application - qwctl
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The qwctl application uses the rte_cmdline library to provide the user with an interactive command line that
|
|
can be used to modify and inspect parameters in a running qw application.
|
|
Those parameters are the global quota and low_watermark value as well as each ring's built-in high watermark.
|
|
|
|
Command Definitions
|
|
^^^^^^^^^^^^^^^^^^^
|
|
|
|
The available commands are defined in commands.c.
|
|
|
|
It is advised to use the cmdline sample application user guide as a reference for everything related to the rte_cmdline library.
|
|
|
|
Accessing Shared Variables
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The setup_shared_variables() function retrieves the shared variables quota and
|
|
low_watermark from the rte_memzone previously created by qw.
|
|
|
|
.. code-block:: c
|
|
|
|
static void
|
|
setup_shared_variables(void)
|
|
{
|
|
const struct rte_memzone *qw_memzone;
|
|
|
|
qw_memzone = rte_memzone_lookup(QUOTA_WATERMARK_MEMZONE_NAME);
|
|
if (qw_memzone == NULL)
|
|
rte_exit(EXIT_FAILURE, "Couldn't find memzone\n");
|
|
|
|
quota = qw_memzone->addr;
|
|
|
|
low_watermark = (unsigned int *) qw_memzone->addr + sizeof(int);
|
|
}
|