If the NIC or the FW does not support the dynamic flex parser,
it will return error when trying to create the parser for eCRPI.
Then it is hard to know the detail error reason of the failure.
Before creating the parser node and the following usage of the
parser, the capacity bit saved in the HCA_CAP could be used to
confirm if the dynamic flex parser is supported.
If no, an error will be returned directly with ENOTSUP to prevent
the following steps to be executed.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
eCPRI protocol has unified format layout for the variants, over
ETH layer (including .1Q) and UDP layer.
The common header of the message has 4 bytes fixed length, and the
message payload layers are different based on the type field. Now
only type #0, #2 and #5 will be supported, and 2 bytes are needed.
When creating the flex parser, the header will be extended to 8
bytes and 2 DW samples are needed. The 1st DW starts from offset 0
and will be used for the type field of the common header. The 2nd
DW starts from offset 4 and will be used for the physical channel
ID, real-time control ID or measurement ID fields.
The parser will be created once a flow with eCPRI item is observed
for the first time. After creating, it will remain in the system
and HW until the device is stopped. Right now, there is no need to
destroy the eCPRI flex parser after the last flow with eCPRI item
is destroyed. This is to get rid of the alternate states of creating
and destroying eCPRI flex parser with a single eCPRI flow.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The structures and other definitions will be used for the dynamic
flex parser creation via Devx command interface. These structures
will be used as some some intermediate variables and input
parameters for the parser creation API.
It is better to keep all members consistent with the PRM definition
even though some of them will not be used.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
In the translation stage, the eCPRI item should be translated into
the format that lower layer driver could use. All the fields that
need to match must be in network byte order after translation, as
well as the mask. Since the header in the item belongs to the network
layers stack, and the input parameter of the header is considered to
be in big-endian format already.
Base on the definition in the PRM, the DW samples will be used for
matching in the FTE/STE. Now, the type field and only the PC ID, RTC
ID, and DLY MSR ID of the payload will be supported. The masks should
be 00 ff 00 00 ff ff(00) 00 00 in the network order. Two DWs are
needed to support such matching. The mask fields could be zeros to
support some wildcard rules. But it makes no sense to support the
rule matching only on the payload but without matching type field.
The DW samples should be stored after the flex parser creation for
eCPRI. There is no need to query the sample IDs each time when
creating a flow rule with eCPRI item. It will not introduce
insertion rate degradation significantly.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
When creating a flow with eCPRI header item, the validation of it is
mandatory. The detailed limitations are listed below:
1. Over Ether / VLAN, ethertype must be 0xAEFE.
2. No tunnel support is described in the specification now.
3. L3 layer is only supported when L4 is UDP, see #4.
4. Over TCP is not supported from the specification, and over UDP
is not supported right now.
5. Concatenation indicator matching is not supported now.
6. No need to check the revision.
7. Only type field in the common header is mandatory, and one byte
should be matched integrally.
8. Fields in the message payload header are optional.
9. Only messages with type #0, #2 and #5 are supported now.
Some limitations are only from software right now, because there is
no need to support all the message types and variants of protocol
stack listed in the specification.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The ConnectX-6DX supports the timestamps in various formats,
the new realtime format is introduced - the upper 32-bit word
of timestamp contains the UTC seconds and the lower 32-bit word
contains the nanoseconds. This patch detects what format is
configured in the NIC and performs the conversion accordingly.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The mlx5 PMD exposes the following new introduced
extended statistics counter to report the errors
of packet send scheduling on timestamps:
- txpp_err_miss_int - rearm queue interrupt was not handled
was not handled in time and service routine might miss
the completions
- txpp_err_rearm_queue - reports errors in rearm queue
- txpp_err_clock_queue - reports errors in clock queue
- txpp_err_ts_past - timestamps in the packet being sent
were found in the past, timestamps were ignored
- txpp_err_ts_future - timestamps in the packet being sent
were found in the too distant future (beyond HW/clock queue
capabilities to schedule, typically it is about 16M of
tx_pp devarg periods)
- txpp_jitter - estimated jitter in device clocks between
8K completions of Clock Queue.
- txpp_wander - estimated wander in device clocks between
16M completions of Clock Queue.
- txpp_sync_lost - error flag, the Clock Queue completions
synchronization is lost, accurate packet scheduling can
not be handled, timestamps are being ignored, the restart
of all ports using scheduling must be performed.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
If send schedule feature is engaged there is the Clock Queue
created, that reports reliable the current device clock counter
value. The device clock counter can be read directly from the
Clock Queue CQE.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This patch adds send scheduling on timestamps into tx_burst
routine template. The feature is controlled by static configuration
flag, the actual routines supporting the new feature are generated
over this updated template.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The new static control flag is introduced to control
routine generating from template, enabling the scheduling
on timestamps.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The application provides timestamps in Tx mbuf as clocks,
the hardware performs scheduling on Clock Queue completion index
match. This patch introduces the timestamp-to-completion-index
inline routine.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The fields to support send scheduling on dynamic timestamp
field are introduced and initialized on device start.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Service routine is invoked periodically on Rearm Queue
completion interrupts, typically once per some milliseconds
(1-16) to track clock jitter and wander in robust fashion.
It performs the following:
- fetches the completed CQEs for Rearm Queue
- restarts Rearm Queue on errors
- pushes new requests to Rearm Queue to make it
continuously running and pushing cross-channel requests
to Clock Queue
- reads and caches the Clock Queue CQE to be used in datapath
- gathers statistics to estimate clock jitter and wander
- gathers Clock Queue errors statistics
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This patch allocates the Packet Pacing context from the kernel,
configures one according to requested pace send scheduling
granularity and assigns to Clock Queue.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
To provide the packet send schedule on mbuf timestamp the Tx
queue must be attached to the same UAR as Clock Queue is.
UAR is special hardware related resource mapped to the host
memory and provides doorbell registers, the assigning UAR
to the queue being created is provided via DevX API only.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The dedicated Rearm Queue is needed to fire the work requests to
the Clock Queue in realtime. The Clock Queue should never stop,
otherwise the clock synchronization might be broken and packet
send scheduling would fail. The Rearm Queue uses cross channel
SEND_EN/WAIT operations to provides the requests to the
Clock Queue in robust way.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This patch creates the special completion queue providing
reference completions to schedule packet send from
other transmitting queues.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This is preparation step before moving the Tx queue creation
to the DevX approach. Some features require the shared UAR
for Tx queues and scheduling completion queues, the patch
manages the shared UAR.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The master and representors might be created over the multiport
Infiniband devices and the UAR resource allocated for sibling
ports might belong to the same underlying Infiniband device.
Hardware requires the write access to the UAR must be performed
as atomic 64-bit write, on 32-bit systems this is two sequential
writes, protected by lock. Due to possibility to share the same
UAR between sibling devices the locks must be moved to shared
context.
Fixes: f048f3d479 ("net/mlx5: switch to the shared IB device context")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This patch introduces the new devargs:
tx_pp - enables accurate packet send scheduling on mbuf timestamps
in the PMD. On the device start if "rte_dynflag_timestamp"
dynamic flag is registered and this devarg non-zero value is
specified, the driver initializes all necessary internal
infrastructure to provide packet scheduling. The parameter
value specifies scheduling granularity in nanoseconds.
tx_skew - the parameter adjusts the send packet scheduling on
timestamps and represents the average delay between beginning
of the transmitting descriptor processing by the hardware and
appearance of actual packet data on the wire. The value should
be provided in nanoseconds and is valid only if tx_pp parameter
is specified. The default value is zero.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
As tx mbuf is not set for some advanced descriptors, if there is no
mbuf checking before rte_pktmbuf_free_seg() function be called on
the process of tx done clean up, that will cause a segfault. So add
a NULL pointer check to fix it.
Bugzilla ID: 501
Fixes: 8d907d2b79 ("net/igb: free consumed Tx buffers on demand")
Cc: stable@dpdk.org
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
When the configure pattern involve GTPU inner l3 and l4, even the
configure input set only l3 but not l4, the different l4 protocol
header should also be configured for the different l4 protocol.
Fixes: 215a247b5f ("net/iavf: refactor hash flow")
Fixes: 642f201950 ("net/iavf: support RSS for IPv4 IPv6 mix of GTP")
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
When the configure pattern involve GTPU inner l3 and l4, even the
configure input set only l3 but not l4, the different l4 protocol
header should also be configured for the different l4 protocol.
Fixes: 0b952714e9 ("net/ice: refactor PF hash flow")
Fixes: de32fa2ba2 ("net/ice: support RSS for IPv6 prefix")
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Some protocol don't support symmetric hash, need to handle these cases.
When set an invalid symmetric hash rule, just return failed.
Fixes: 4eafe71ee9 ("net/ice: fix RSS type")
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
The rte_eth_dev_set_vlan_offload function will check vlan rx offload
capability, the i350/i210/i211 nics have vlan extend feature but
DEV_RX_OFFLOAD_VLAN_EXTEND is not set into the capability, that will
cause setting fail. So need to add this capability in
igb_get_rx_port_offloads_capa function.
Fixes: ef990fb56e ("net/e1000: convert to new Rx offloads API")
Cc: stable@dpdk.org
Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
The rte_eth_dev_set_vlan_offload function will check vlan rx offload
capability, the i40e vf has vlan filter feature but
DEV_RX_OFFLOAD_VLAN_FILTER is not set into the capability, that will
cause setting fail. So need to add this capability in
i40e_vf_representor_dev_infos_get function.
Fixes: e0cb96204b ("net/i40e: add support for representor ports")
Cc: stable@dpdk.org
Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
DEV_RX_OFFLOAD_TIMESTAMP is per port, so the internal implementation
shall enable it on per port basis only.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
This experimental API is no longer required as the same
purpose can be solved with standard DEV_RX_OFFLOAD_TIMESTAMP
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
A dpdk bonding 802.3ad network as follows:
+----------+ +-----------+
|dpdk lacp |bond1.1 <------> bond2.1|switch lacp|
| |bond1.2 <------> bond2.2| |
+----------+ +-----------+
If a fiber optic go wrong about single pass during normal running like
this:
bond1.2 -----> bond2.2 ok
bond1.2 <--x-- bond2.2 error: bond1.2 receive no LACPDU Some packets
from switch to dpdk will choose bond2.2
and lost.
DPDK lacp state machine will transits to the expired state if no LACPDU
is received before the current_while_timer expires. But if no LACPDU is
received before the current_while_timer expires again, DPDK lacp state
machine has no change. Bond2.2 can not change to inactive depend on the
received LACPDU.
According to IEEE 802.3ad, if no lacpdu is received before the
current_while_timer expires again, the state machine should transits
from expired to defaulted. Bond2.2 will change to inactive depend on the
LACPDU with defaulted state.
This patch adds a state machine change from expired to defaulted when no
lacpdu is received before the current_while_timer expires again
according to IEEE 802.3ad:
If no LACPDU is received before the current_while timer expires again,
the state machine transits to the DEFAULTED state. The record Default
function overwrites the current operational parameters for the Partner
with administratively configured values. This allows configuration of
aggregations and individual links when no protocol partner is present,
while still permitting an active partner to override default settings.
The update_Default_Selected function sets the Selected variable FALSE
if the Link Aggregation Group has changed. Since all operational
parameters are now set to locally administered values there can be no
disagreement as to the Link Aggregation Group, so the Matched variable
is set TRUE.
The relevant description is in the chapter 43.4.12 of the link below:
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=850426
Signed-off-by: Weifeng Li <liweifeng96@126.com>
Acked-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
The function valid_bonded_port_id() has already contains function
rte_eth_dev_is_valid_port(), so delete redundant check.
Fixes: 588ae95e79 ("net/bonding: fix port ID check")
Cc: stable@dpdk.org
Signed-off-by: Dongyang Pan <197020236@qq.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Added support for exact match templates
Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
The default egress rule should include buffer descriptor action
record only if the VF representor is enabled.
Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Mike Baucom <michael.baucom@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
The protocol header are implicitly matched based on the proto
field data. For instance, if ether type is set as 0x800 in the
ether header then ipv4 protocol header is assumed to be present
for template matching even if ipv4 header is not present in the
given flow pattern.
Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Mike Baucom <michael.baucom@broadcom.com>
OVS-DPDK is accumulating the flow counters that are returned as part of
the flow_query API and it is being issued at least 3 times every second.
So there is no need to accumulate the counts internally in the driver.
Fixes: 306c2d28e2 ("net/bnxt: support count action in flow query")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Removed the check to enable default flows only when VF representor
are enabled. It should be enabled all the time in truflow mode.
Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Mike Baucom <michael.baucom@broadcom.com>
Initialize table scope resource manager parameter.
Clear out rm_is_allocated parms before calling as base_index was added
and used incorrectly in this instance.
Signed-off-by: Farah Smith <farah.smith@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Add support for new resource manager to manage CFA resources.
TCAM is split into high and low regions now and CFA resource types
are being updated accordingly.
Signed-off-by: Peter Spreadborough <peter.spreadborough@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Farah Smith <farah.smith@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>