Syncing the values by adding c11 atomic memory barriers to make sure
the values being synced before updating fifo_write and fifo_read.
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Adding memory barrier to make sure the values being synced
before updating fifo_write in kni_fifo_put and fifo_read in
kni_fifo_get.
Fixes: 3fc5ca2f63 ("kni: initial import")
Cc: stable@dpdk.org
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
With existing code in kni_fifo_put, rx_q values are not being updated
before updating fifo_write. While reading rx_q in kni_net_rx_normal,
This is causing the sync issue on other core. The same situation happens
in kni_fifo_get as well.
So syncing the values by adding memory barriers to make sure the values
being synced before updating fifo_write and fifo_read.
Fixes: 3fc5ca2f63 ("kni: initial import")
Cc: stable@dpdk.org
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Keep only single config option RTE_USE_C11_MEM_MODEL for C11 memory
model, so all modules can leverage C11 atomic extension by enable this
option.
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
A Kbuild is also included to allow users to use DKMS natively without
additional code.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The build error observed with Linux kernel 4.19 when KNI ethtool
support enabled (CONFIG_RTE_KNI_KMOD_ETHTOOL=y)
.../build/build/kernel/linux/kni/kni_ethtool.c:193:3:
error: ‘struct ethtool_ops’ has no member named ‘get_settings’;
.get_settings = kni_get_settings,
^~~~~~~~~~~~
.../build/build/kernel/linux/kni/kni_ethtool.c:194:3:
error: ‘struct ethtool_ops’ has no member named ‘set_settings’;
.set_settings = kni_set_settings,
^~~~~~~~~~~~
With kernel 4.19 ethtool_ops `get_settings` & `set_settings` are
replaced with `get_link_ksettings` & `set_link_ksettings`
Commit 9b3004953503 ("ethtool: drop get_settings and set_settings callbacks")
This fix practically removes `get_settings` & `set_settings` support
for the kernel versions that have the new ethtool_ops without
implementing the new ones.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch updates the release notes for added feature of crypto
port and symmetric crypto action.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
Add support for rte_pause() implementation for ppc64.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
'OCTEON TX' is the registered name. All other usages need to be fixed.
Signed-off-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Remove unused file from the release notes docs. This file was
used to display a hierarchy in older releases, circa 2015, but
doesn't seem useful in the current structure.
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch updates the CLI parsing of softnic with extra symmetric
cryptodev, port, session, and action support.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
This patch adds symmetric crypto action support to softnic.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
This patch enables the crypt port configuration in softnic.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
This patch adds cryptodev abstraction to softnic. The DPDK
Cryptodevs are abstracted as crypto ports in the softnic.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Update document with flow and qos api support in softnic PMD.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Reviewed-by: Marko Kovacevic <marko.kovacevic@intel.com>
Unit tests to check for hash lookup and bulk-lookup perf
with lock-free enabled and with lock-free disabled.
Unit tests performed with readers running in parallel with writers.
Tests include:
- hash lookup on existing keys with:
- hash add causing NO key-shifts of existing keys in the table
- hash lookup on existing keys likely to be on shift-path with:
- hash add causing key-shifts of existing keys in the table
- hash lookup on existing keys NOT likely to be on shift-path with:
- hash add causing key-shifts of existing keys in the table
- hash lookup on non-existing keys with:
- hash add causing NO key-shifts of existing keys in the table
- hash add causing key-shifts of existing keys in the table
- hash lookup on keys likely to be on shift-path with:
- multiple writers causing key-shifts of existing keys in the table
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Add lock-free read-write concurrency. This is achieved by the
following changes.
1) Add memory ordering to avoid race conditions. The only race
condition that can occur is - using the key store element
before the key write is completed. Hence, while inserting the element
the release memory order is used. Any other race condition is caught
by the key comparison. Memory orderings are added only where needed.
For ex: reads in the writer's context do not need memory ordering
as there is a single writer.
key_idx in the bucket entry and pdata in the key store element are
used for synchronisation. key_idx is used to release an inserted
entry in the bucket to the reader. Use of pdata for synchronisation
is required due to updation of an existing entry where-in only
the pdata is updated without updating key_idx.
2) Reader-writer concurrency issue, caused by moving the keys
to their alternative locations during key insert, is solved
by introducing a global counter(tbl_chng_cnt) indicating a
change in table.
3) Add the flag to enable reader-writer concurrency during
run time.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Fix the key store array element alignment such that every array
element is aligned on KEY_ALIGNMENT boundary. This is required to
make 'pdata' in 'struct rte_hash_key' align on its natural boundary
for atomic load/store.
Fixes: 473d1bebce ("hash: allow to store data in hash table")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
rte_hash_lookup_xxx APIs return the index of slot in
the key store. Application(reader) can use that index to reference
other data structures in its scope. Because of this, the
index should not be freed till the application completes
using the index.
RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL is introduced to support this.
When this flag is enabled rte_hash_del_xxx APIs do not free the
key-store index/internal memory associated with the deleted
entry. The new API rte_hash_free_key_with_position should be called
to free the key-store index/internal memory after calling
rte_hash_del_xxx APIs.
Suggested-by: Yipeng Wang <yipeng1.wang@intel.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
RW concurrency is required with single writer and multiple reader
usecase as well. Hence, multi-writer should not be enabled by default when
RW concurrency is enabled.
Fixes: f2e3001b53 ("hash: support read/write concurrency")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Add meson.build in vm_power_manager and the guest_cli subdirectory.
Building can be achieved by going to the build directory, and using
meson configure -Dexamples=vm_power_manager,vm_power_manager/guest_cli
Then, when ninja is invoked, it will build dpdk-vm_power_manger and
dpdk-guest_cli
Work still needs to be done on the meson build system to handles the case
where the target list of example apps is defined as 'all'. That will come
in a future patch.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Some messages appearing several times a second, removing as they are
unnecessary. Other less severe messages change from INFO to DEBUG
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add JSON string handling to vm_power_manager for JSON strings received
through the fifo. The format of the JSON strings are detailed in the
next patch, the vm_power_manager user guide documentation updates.
This patch introduces a new dependency on Jansson, a C library for
encoding, decoding and manipulating JSON data. To compile the sample app
you now need to have installed libjansson4 and libjansson-dev (these may
be named slightly differently depending on your Operating System)
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Now that we're handling host policies, containers and virtual machines,
we'll rename MAX_VMS to MAX_CLIENTS, and increase from 4 to 64
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch adds a fifo channel to the vm_power_manager app through which
we can send commands and polices. Intended for sending JSON strings.
The fifo is at /tmp/powermonitor/fifo
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The changes here are minimal, as the guest app functionality is not
changing at all, but there is a new element in the channel_packet
struct that needs to have a default set (channel_packet->core_type).
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch does a couple of things:
* Adds a new message type for removing policies (PKT_POLICY_REMOVE)
Used when we want to remove a previously created policy.
* Adds a core_type bool to the channel packet struct to specify whether
the type of core we want to control is virtual or physical.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Previously the vm_power_manager app required to have some vms defined, so
the call to get_all_vm() always set the noVms variable. Now we're accepting
policies from the host OS (without any VMs defined), so it is now valid to
have zero VMs. This patch initialises the relevant variables to zero just
in case the call to get_all_vms() does not find any, so could return with
the variables uninitialised.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Allow vm_power_manager to run without requiring qemu to be present
on the machine. This will be required for instances where the JSON
interface is used for commands and polices, without any VMs present.
A use case for this is a container enviromnent.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add the support for new traffic pattern aware power control
power management API.
Example:
./l3fwd-power -l xxx -n 4 -w 0000:xx:00.0 -w 0000:xx:00.1 -- -p 0x3
-P --config="(0,0,xx),(1,0,xx)" --empty-poll="0,0,0" -l 14 -m 9 -h 1
Please Reference l3fwd-power document for full parameter usage
The option "l", "m", "h" are used to set the power index for
LOW, MED, HIGH power state. Only is useful after enable empty-poll
--empty-poll="training_flag, med_threshold, high_threshold"
The option training_flag is used to enable/disable training mode.
The option med_threshold is used to indicate the empty poll threshold
of modest state which is customized by user.
The option high_threshold is used to indicate the empty poll threshold
of busy state which is customized by user.
Above three option default value is all 0.
Once enable empty-poll. System will apply the default parameter if no
other command line options are provided.
If training mode is enabled, the user should ensure that no traffic
is allowed to pass through the system. When training phase complete,
the application transfer to normal operation
System will start running with the modest power mode.
If the traffic goes above 70%, then system will move to High power state.
If the traffic drops below 30%, the system will fallback to the modest
power state.
Example code use master thread to monitoring worker thread busyness.
The default timer resolution is 10ms.
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
1. Abstract
For packet processing workloads such as DPDK polling is continuous.
This means CPU cores always show 100% busy independent of how much work
those cores are doing. It is critical to accurately determine how busy
a core is hugely important for the following reasons:
* No indication of overload conditions.
* User does not know how much real load is on a system, resulting
in wasted energy as no power management is utilized.
Compared to the original l3fwd-power design, instead of going to sleep
after detecting an empty poll, the new mechanism just lowers the core
frequency. As a result, the application does not stop polling the device,
which leads to improved handling of bursts of traffic.
When the system become busy, the empty poll mechanism can also increase the
core frequency (including turbo) to do best effort for intensive traffic.
This gives us more flexible and balanced traffic awareness over the
standard l3fwd-power application.
2. Proposed solution
The proposed solution focuses on how many times empty polls are executed.
The less the number of empty polls, means current core is busy with
processing workload, therefore, the higher frequency is needed. The high
empty poll number indicates the current core not doing any real work
therefore, we can lower the frequency to safe power.
In the current implementation, each core has 1 empty-poll counter which
assume 1 core is dedicated to 1 queue. This will need to be expanded in the
future to support multiple queues per core.
2.1 Power state definition:
LOW: Not currently used, reserved for future use.
MED: the frequency is used to process modest traffic workload.
HIGH: the frequency is used to process busy traffic workload.
2.2 There are two phases to establish the power management system:
a.Initialization/Training phase. The training phase is necessary
in order to figure out the system polling baseline numbers from
idle to busy. The highest poll count will be during idle, where
all polls are empty. These poll counts will be different between
systems due to the many possible processor micro-arch, cache
and device configurations, hence the training phase.
In the training phase, traffic is blocked so the training
algorithm can average the empty-poll numbers for the LOW, MED and
HIGH power states in order to create a baseline.
The core's counter are collected every 10ms, and the Training
phase will take 2 seconds.
Training is disabled as default configuration. The default
parameter is applied. Sample App still can trigger training
if that's needed. Once the training phase has been executed once on
a system, the application can then be started with the relevant
thresholds provided on the command line, allowing the application
to start passing start traffic immediately
b.Normal phase. Traffic starts immediately based on the default
thresholds, or based on the user supplied thresholds via the
command line parameters. The run-time poll counts are compared with
the baseline and the decision will be taken to move to MED power
state or HIGH power state. The counters are calculated every 10ms.
3. Proposed API
1. rte_power_empty_poll_stat_init(struct ep_params **eptr,
uint8_t *freq_tlb, struct ep_policy *policy);
which is used to initialize the power management system.
2. rte_power_empty_poll_stat_free(void);
which is used to free the resource hold by power management system.
3. rte_power_empty_poll_stat_update(unsigned int lcore_id);
which is used to update specific core empty poll counter, not thread safe
4. rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
which is used to update specific core valid poll counter, not thread safe
5. rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
which is used to get specific core empty poll counter.
6. rte_power_poll_stat_fetch(unsigned int lcore_id);
which is used to get specific core valid poll counter.
7. rte_empty_poll_detection(struct rte_timer *tim, void *arg);
which is used to detect empty poll state changes then take action.
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
This commit changes the hashing mechanism to "partial-key
hashing" to calculate bucket index and signature of key.
This is proposed in Bin Fan, et al's paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". Basically the idea is to use "xor" to
derive alternative bucket from current bucket index and
signature.
With "partial-key hashing", it reduces the bucket memory
requirement from two cache lines to one cache line, which
improves the memory efficiency and thus the lookup speed.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This commit changes the current rte_hash unit test to
test the extendable table feature and performance.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
In use cases that hash table capacity needs to be guaranteed,
the extendable bucket feature can be used to contain extra
keys in linked lists when conflict happens. This is similar
concept to the extendable bucket hash table in packet
framework.
This commit adds the extendable bucket feature. User can turn
it on or off through the extra flag field during table
creation time.
Extendable bucket table composes of buckets that can be
linked list to current main table. When extendable bucket
is enabled, the hash table load can always achieve 100%.
In other words, the table can always accommodate the same
number of keys as the specified table size. This provides
100% table capacity guarantee.
Although keys ending up in the ext buckets may have longer
look up time, they should be rare due to the cuckoo
algorithm.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
In rte_hash_iterate, the reader lock did not protect the
while loop which checks empty entry. This created a race
condition that the entry may become empty when enters
the lock, then a wrong key data value would be read out.
This commit reads out the position in the while condition,
which makes sure that the position will not be changed
to empty before entering the lock.
Fixes: f2e3001b53 ("hash: support read/write concurrency")
Cc: stable@dpdk.org
Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Since the depth-first search of cuckoo path is removed, we do not
need the macro anymore which specifies the depth of the cuckoo
search.
Fixes: f2e3001b53 ("hash: support read/write concurrency")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The test_hash_readwrite.c was not in the meson.build file. This
commit adds the missing test into the file.
Fixes: 0eb3726ebc ("test/hash: add test for read/write concurrency")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
the multi-reader and multi-writer rte_hash unit test does not
work correctly with non-consecutive core ids. This commit
fixes the issue.
Fixes: 0eb3726ebc ("test/hash: add test for read/write concurrency")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Edit the printf information when error happens to be more
accurate and informative.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The bucket size was changed from 4 to 8 but the corresponding
perf test was not changed accordingly.
In the test, the bucket size and number of buckets are used
to map to the underneath rte_hash structure. They are used
to test performance of two conditions: keys in primary
buckets only and keys in both primary and secondary buckets.
Although there is no functional issue with bucket size set
to 4, it mismatches the underneath rte_hash structure,
which may affect code readability and future extension.
Fixes: 58017c98ed ("hash: add vectorized comparison")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
The 'TX' in OCTEON TX would cause a warning.
Adding an exception for that.
OCTEON TX is a registered product under Cavium
Signed-off-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
Really minor issue:
There were extra spaces making the alignment wrong.
Fixes: e95faac151 ("crypto/mrvl: rename PMD to mvsam")
Fixes: 4ccc8d770d ("net/mvneta: add PMD skeleton")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
The option GENERATE_DEPRECATEDLIST will create a page
"Deprecated List" in "Related Pages" menu.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>