Optimized dequeue using x86 vector instructions was added
in 21.05, but due to limited testing the default has been
changed back to the scalar mode implementation. The vector mode
implementation can be enabled via the devargs option
"vector_opts_enabled=<y/Y>".
Fixes: 000a7b8e75 ("event/dlb2: optimize dequeue operation")
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
The HW scheduling type was not being extracted properly
in the vector optimized dequeue path. It was also not
being recorded in the xstats.
Fixes: 000a7b8e75 ("event/dlb2: optimize dequeue operation")
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Deferred scheduling is a DLB v1.0 feature, and is not valid for
DLB v2.0 or v2.5.
Fixes: bc62748bd7 ("event/dlb2: add private data structures and constants")
Fixes: a2e4f1f5e7 ("event/dlb2: add dequeue and its burst variants")
Cc: stable@dpdk.org
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Let's try to enforce the convention where most drivers use a pmd. logtype
with their class reflected in it, and libraries use a lib. logtype.
Introduce two new macros:
- RTE_LOG_REGISTER_DEFAULT can be used when a single logtype is
used in a component. It is associated to the default name provided
by the build system,
- RTE_LOG_REGISTER_SUFFIX can be used when multiple logtypes are used,
and then the passed name is appended to the default name,
RTE_LOG_REGISTER is left untouched for existing external users
and for components that do not comply with the convention.
There is a new Meson variable log_prefix to adapt the default name
for baseband (pmd.bb.), bus (no pmd.) and mempool (no pmd.) classes.
Note: achieved with below commands + reverted change on net/bonding +
edits on crypto/virtio, compress/mlx5, regex/mlx5
$ git grep -l RTE_LOG_REGISTER drivers/ |
while read file; do
pattern=${file##drivers/};
class=${pattern%%/*};
pattern=${pattern#$class/};
drv=${pattern%%/*};
case "$class" in
baseband) pattern=pmd.bb.$drv;;
bus) pattern=bus.$drv;;
mempool) pattern=mempool.$drv;;
*) pattern=pmd.$class.$drv;;
esac
sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern',/RTE_LOG_REGISTER_DEFAULT(\1,/' $file;
sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern'\.\(.*\),/RTE_LOG_REGISTER_SUFFIX(\1, \2,/' $file;
done
$ git grep -l RTE_LOG_REGISTER lib/ |
while read file; do
pattern=${file##lib/};
pattern=lib.${pattern%%/*};
sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern',/RTE_LOG_REGISTER_DEFAULT(\1,/' $file;
sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern'\.\(.*\),/RTE_LOG_REGISTER_SUFFIX(\1, \2,/' $file;
done
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Add devargs to control each event timer adapter i.e. TIM rings internal
parameters uniquely. The following dict format is expected
[ring-chnk_slots-disable_npa-stats_ena]. 0 represents default values.
Example:
--dev "0002:1e:00.0,tim_ring_ctl=[2-1023-1-0]"
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Add event timer adapter statistics get and reset functions.
Stats are disabled by default and can be enabled through devargs.
Example:
--dev "0002:1e:00.0,tim_stats_ena=1"
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Add function to cancel event timer that has been armed.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Add event timer arm timeout burst function.
All the timers requested to be armed have the same timeout.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Add TIM bucket operations used for event timer arm and cancel.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Add devargs to control default chunk size and max numbers of
timer rings to attach to a given RVU PF.
Example:
--dev "0002:1e:00.0,tim_chnk_slots=1024"
--dev "0002:1e:00.0,tim_rings_lmt=4"
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Add internal SSO functions to allow event adapters to resize SSO buffers
that are used to hold in-flight events in DRAM.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
If the chunks are allocated from NPA then TIM can automatically free
them when traversing the list of chunks.
Add devargs to disable NPA and use software mempool to manage chunks.
Example:
--dev "0002:0e:00.0,tim_disable_npa=1"
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
When the application calls timer adapter create the following is used:
- Allocate a TIM LF based on number of LF's provisioned.
- Verify the config parameters supplied.
- Allocate memory required for
* Buckets based on min and max timeout supplied.
* Allocate the chunk pool based on the number of timers.
On Free:
- Free the allocated bucket and chunk memory.
- Free the TIM lf allocated.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Add eventdev start function along with few cleanup API's to maintain
sanity.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Add devargs to configure the platform specific getwork mode.
CN9K getwork mode by default is set to use dual workslot mode.
Add option to force single workslot mode.
Example:
--dev "0002:0e:00.0,single_ws=1"
CN10K supports multiple getwork prefetch modes, by default the
prefetch mode is set to none.
Add option to select getwork prefetch mode
Example:
--dev "0002:1e:00.0,gw_mode=1"
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
SSO HWGRPs i.e. queue uses DRAM & SRAM buffers to hold in-flight
events. By default the buffers are assigned to the SSO HWGRPs to
satisfy minimum HW requirements. SSO is free to assign the remaining
buffers to HWGRPs based on a preconfigured threshold.
We can control the QoS of SSO HWGRP by modifying the above mentioned
thresholds. HWGRPs that have higher importance can be assigned higher
thresholds than the rest.
Example:
--dev "0002:0e:00.0,qos=[1-50-50-50]" // [Qx-XAQ-TAQ-IAQ]
Qx -> Event queue Aka SSO GGRP.
XAQ -> DRAM In-flights.
TAQ & IAQ -> SRAM In-flights.
The values need to be expressed in terms of percentages, 0 represents
default.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
The number of events for a *open system* event device is specified
as -1 as per the eventdev specification.
Since, SSO inflight events are only limited by DRAM size, the
xae_cnt devargs parameter is introduced to provide upper limit for
in-flight events.
Example:
--dev "0002:0e:00.0,xae_cnt=8192"
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Allocate buffers in DRAM that hold inflight events.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Add platform specific event device configuration that attaches the
requested number of SSO HWS(event ports) and HWGRP(event queues) LFs
to the RVU PF/VF.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Add platform specific event device probe and remove, also add
event device info get function.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Add the info_get function to return details on the queues, flow,
prioritization capabilities, etc. which this device has.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Add meson build infra structure along with the event device
SSO initialization and teardown functions.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Convert code to use x86 vector instructions, thereby significantly
improving dequeue performance.
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
The new devarg names and their default values
are listed below. The defaults have not changed, and
none of these parameters are accessed in the fast path.
poll_interval=1000
sw_credit_quantai=32
default_depth_thresh=256
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
All references to the old register map have been removed,
so it is safe to rename the new combined file that supports
both DLB v2.0 and DLB v2.5. Also fixed all places where this
file is included.
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
As support for DLB v2.5 was added, modifications were made to
dlb_hw_types_new.h, but the old file needed to be preserved during
the port in order to meet the requirement that individual patches in
a series each compile successfully. Since the DLB v2.5 support is
completely integrated, it is now safe to remove the old (original)
file, as well as the DLB2_USE_NEW_HEADERS define that was used to
control which version of the file was to be included in certain
source files.
It is now safe to rename the new file, and use it unconditionally
in all DLB source files.
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
The file dlb_resource_new.c now contains all of the low level
functions required to support both DLB v2.0 and DLB v2.5, and
the original file (dlb_resource.c) was removed in the previous
commit, so rename dlb_resource_new.c to dlb_resource.c, and
update the meson build file so that the new file is built.
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
A temporary version of dlb_resource.h (dlb_resource_new.h) was used
by the previous commits in this patch series. Merge the two files
now that DLB v2.5 support has been fully added to dlb_resource.c.
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Update the low level HW functions that perform the sequence number
management functions. These include getting a groups number of
sequence numbers per queue, managing in-use slots, getting the
current occupancy, and setting sequence numbers for a group.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware.
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Update the low level HW functions responsible for
configuring sparse CQ mode, where each cache line
contains just one QE instead of 4.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware.
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Update the low level HW functions responsible for
finishing the queue map/unmap operation, which is an
asynchronous operation.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware.
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Update the low level hardware functions responsible for
getting the queue depth. The command arguments are also
validated.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware.
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
DLB v2.5 uses a different credit scheme than was used in DLB v2.0 .
Specifically, there is a single credit pool for both load balanced
and directed traffic, instead of a separate pool for each as is
found with DLB v2.0.
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>