For each mbuf byte, free_space[i] == 0 means the space is occupied,
free_space[i] != 0 means space is free.
Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The value free_space[i] is used to save the size of biggest aligned
element that can fit in the zone, current implementation has one flaw,
for example, if user registers dynfield1 (size = 4, align = 4, req = 124)
first, the free_space would be as below after registration:
0070: 08 08 08 08 08 08 08 08
0078: 08 08 08 08 00 00 00 00
Then if user continues to register dynfield2 (size = 4, align = 4),
free_space would become:
0070: 00 00 00 00 04 04 04 04
0078: 04 04 04 04 00 00 00 00
Further request dynfield3 (size = 8, align = 8) would fail to register
due to alignment requirement can't be satisfied, though there is enough
space remained in mbuf.
This patch fixes above issue by saving alignment only in aligned zone,
after the fix, above registrations order can be satisfied, free_space
would be like:
After dynfield1 registration:
0070: 08 08 08 08 08 08 08 08
0078: 04 04 04 04 00 00 00 00
After dynfield2 registration:
0070: 08 08 08 08 08 08 08 08
0078: 00 00 00 00 00 00 00 00
After dynfield3 registration:
0070: 00 00 00 00 00 00 00 00
0078: 00 00 00 00 00 00 00 00
This patch also reduces iterations in process_score() by jumping align
steps in each loop.
Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Set rte_errno as ENOMEM when allocation failure.
Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
We should make sure off + size < sizeof(struct rte_mbuf) to avoid
possible out-of-bounds access of free_space array, there is no issue
currently due to the low bits of free_flags (which is adjacent to
free_space) are always set to 0. But we shouldn't rely on it since it's
fragile and layout of struct mbuf_dyn_shm may be changed in the future.
This patch adds boundary check explicitly to avoid potential risk of
out-of-bounds access.
Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
To fix CVE-2020-12888, the linux vfio-pci module will invalidate mmaps
and block MMIO access on disabled memory, it will send a SIGBUS to the
application:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abafbc551fdd
When the application opens the vfio PCI device, the vfio-pci module will
enable the bus memory space through PCI read/write access. According to
the PCIe specification, the 'Memory Space Enable' is always zero for VF:
Table 9-13 Command Register Changes
Bit Location | PF and VF Register Differences | PF | VF
| From Base | Attributes | Attributes
-------------+--------------------------------+------------+-----------
| Memory Space Enable - Does not | |
| apply to VFs. Must be hardwired| Base | 0b
1 | to 0b for VFs. VF Memory Space | |
| is controlled by the VF MSE bit| |
| in the VF Control register. | |
-------------+--------------------------------+------------+-----------
Afterwards the vfio-pci will initialize its own virtual PCI config space
data ('vconfig') by reading the VF's physical PCI config space, then the
'Memory Space Enable' bit in vconfig will always be 0b value. This will
make the vfio-pci treat the BAR memory space as disabled, and the SIGBUS
will be triggered if access these BARs.
By investigation, the VF PCI device *passthrough* into the Guest OS by
QEMU has the 'Memory Space Enable' with 1b value. That's because every
PCI driver will start to enable the memory space, and this action will
be hooked by vfio-pci virtual PCI read/write to set the 'Memory Space
Enable' in vconfig space to 1b. So VF runs in guest OS has 'Mem+', but
VF runs in host OS has 'Mem-'.
Align with PCI working mode in Guest/QEMU/Host, in DPDK, enable the PCI
bus memory space explicitly to avoid access on disabled memory.
Fixes: 33604c3135 ("vfio: refactor PCI BAR mapping")
Cc: stable@dpdk.org
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Harman Kalra <hkalra@marvell.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Tested-by: Thierry Martin <thierry.martin.public@gmail.com>
Caught by code review, bitops test name is incorrect.
Fixes: 7660614c11 ("test/bitops: add bit operations test case")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Phil Yang <phil.yang@arm.com>
The path to the socket when running the script as a regular user needed
to be updated to match the logic in EAL.
Fixes: 6a2967c112 ("usertools: add new telemetry script")
Cc: stable@dpdk.org
Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
vmbus_map_addr is used as the next start virtual address for mapping ring
buffer. However it's updated based on ring_buf, which is a pointer to an
address on the stack. The next ring buffer may be mapped to an unexpected
address.
Fix this by calculating vmbus_map_addr based on returned virtual address.
Fixes: 3f9277031a ("bus/vmbus: fix check for mmap failure")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
64-bit support was missing from the functions pipe_profile_check
and rte_sched_subport_config_pipe_profile_table.
Fixes: 68c1f26d42 ("sched: support 64-bit values")
Cc: stable@dpdk.org
Signed-off-by: Archit Pandey <architpandeynitk@gmail.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
In function rte_sched_subport_free, there is code to free all allocated
stuff related to scheduler subport.
First there are some checks, and in the end, rte_bitmap_free is called.
Now, rte_bitmap_free is a dummy function, and it just checks if
provided pointer to bitmap is valid or not. So, actual memory for
subport is not freed.
This patch fixes this by removing call to rte_bitmap_free, and
instead calling rte_free.
Fixes: d9213b829a ("sched: remove pipe params config from port level")
Cc: stable@dpdk.org
Signed-off-by: Hrvoje Habjanic <hrvoje.habjanic@zg.ht.hr>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
When printf()'s stdout is line-buffered for terminal, it is fully
buffered for pipes. So, stdout listener can only get the output
when it is flushed (on program termination, when buffer is filled or
manual flush).
stdout buffer might fill slowly since every stats report could be small.
Also when it is fully filled it might contain a part of the last stats
report which makes it very inconvenient for any automation which reads
and parses the output.
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Georgiy Levashov <georgiy.levashov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
In order to optimize the PCI management, RTE_KDRV_NONE based
device driver probing removed by not adding them to list in
the scan phase.
The legacy virtio is the only consumer of RTE_KDRV_NONE based device
driver probe scheme. The legacy virtio support will be available
through the existing VFIO/UIO based kernel driver scheme.
This patch also removes the deprecation notice for the same.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Make x86 JIT to generate native code for
(BPF_ABS | <size> | BPF_LD) and (BPF_IND | <size> | BPF_LD)
instructions.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
To fill the gap with linux kernel eBPF implementation,
add support for two non-generic instructions:
(BPF_ABS | <size> | BPF_LD) and (BPF_IND | <size> | BPF_LD)
which are used to access packet data.
These instructions can only be used when BPF context is a pointer
to 'struct rte_mbuf' (i.e: RTE_BPF_ARG_PTR_MBUF type).
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
eval_add()/eval_sub() not always correctly estimate
minimum and maximum possible values of add/sub operations.
Fixes: 8021917293 ("bpf: add extra validation for input BPF program")
Cc: stable@dpdk.org
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Address for few small issues:
- unreachable return statement
- failed test-case can finish with 'success' status
Also use unified cmp_res() function to check return value.
Cc: stable@dpdk.org
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Support the debug functions in eal_common_debug.c for Windows.
Implementation of rte_dump_stack to get a backtrace similarly to Unix
and of rte_eal_cleanup in eal.c.
Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
Tested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Move common functions between Unix and Windows to eal_common_debug.c.
Those functions are rte_exit, __rte_panic and rte_dump_registers
which has the same implementation on Unix and Windows.
Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
An issue has been observed where epoll file descriptor
list rebuilds every time an interrupt/alarm event is
received.
eal_intr_process_interrupts() should notify pipe fd only
if any source is removed from the source list i.e (rv > 0)
Fixes: 0c7ce182a7 ("eal: add pending interrupt callback unregister")
Cc: stable@dpdk.org
Signed-off-by: Harman Kalra <hkalra@marvell.com>
The code didn't compile when using exported meter functions under Windows.
error LNK2001: unresolved external symbol
rte_meter_srtcm_color_aware_check
error LNK2001: unresolved external symbol
rte_meter_srtcm_color_blind_check
error LNK2001: unresolved external symbol
rte_meter_trtcm_color_aware_check
error LNK2001: unresolved external symbol
rte_meter_trtcm_color_blind_check
error LNK2001: unresolved external symbol
rte_meter_trtcm_rfc4115_color_aware_check
error LNK2001: unresolved external symbol
rte_meter_trtcm_rfc4115_color_blind_check
The cause was that there were some inline functions that were included in
the export list.
To solve this the functions were removed from rte_meter_version.map export
list which are implemented in the header and shouldn't be exported.
Fixes: 655796d2b5 ("meter: support RFC4115 trTCM")
Fixes: 9d41beed24 ("lib: provide initial versioning")
Cc: stable@dpdk.org
Signed-off-by: Fady Bader <fady@mellanox.com>
EAL common timer doesn't compile under Windows.
Compilation log:
error LNK2019:
unresolved external symbol nanosleep referenced in function
rte_delay_us_sleep
error LNK2019:
unresolved external symbol get_tsc_freq referenced in function set_tsc_freq
error LNK2019:
unresolved external symbol sleep referenced in function set_tsc_freq
The reason was that some functions called POSIX functions.
The solution was to move POSIX dependent functions from common to Unix.
Signed-off-by: Fady Bader <fady@mellanox.com>
Reviewed-by: Tal Shnaiderman <talshn@mellanox.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Provide a more direct link for installer download and clarify thread
model choice during installation. As pthread is not a requirement,
remove notice about its possible runtime dependency.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
Even if pthread is provided by the toolchain, it is not needed for DPDK
on Windows, because internal shim is used. As a side-effect, this
enables cross-build with MinGW configured with non-POSIX thread library,
e.g. mcfgthread, which is the default on some distributions.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
Disable all unused firmware resources during init time to give
more resources for HASH (exact-match) filter region and always
request firmware to enable HASH filter support when resources
are available.
Signed-off-by: Karra Satwik <kaara.satwik@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Create a set of verbs callbacks in 'struct mlx5_verbs_ops'
and add MR operations to it (file net/mlx5/linux/mlx5_verbs.c).
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Prior to this commit MR operations were verbs based and hard coded under
common/mlx5/linux directory. This commit enables upper layers (e.g.
net/mlx5) to determine which MR operations to use. For example the net
layer could set devx based MR operations in non-Linux environments. The
reg_mr and dereg_mr callbacks are added to the global per-device MR
cache 'struct mlx5_mr_share_cache'.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The glue verbs operations reg_mr and dereg_mr are wrapped and exported
in functions mlx5_common_verbs_reg_mr and mlx5_common_verbs_dereg_mr
respectively. The exported functions are added to a new file
linux/mlx5_common_verbs.c.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Replace 'struct ibv_mr *' (in 'struct mlx5_mr') with a new 'struct
mlx5_pmd_mr'. The new struct contains the required MR field: lkey,
addr, len and is independent of ibv.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
commit 536db938a4 ("net/cxgbe: add devargs to control filtermode and
filtermask") allows configuring hardware to select specific combination
of header fields to match in the incoming packets. However, the default
mask is set for all fields in the requested pattern items, even if the
field is not explicitly set in the combination and results in
validation errors. To prevent this, ignore setting the default masks
for the unrequested fields and the hardware will also ignore them in
validation, accordingly. Also, tweak the filter spec before finalizing
the masks.
Fixes: 536db938a4 ("net/cxgbe: add devargs to control filtermode and filtermask")
Cc: stable@dpdk.org
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Free up Source MAC Table (SMT) entry properly during filter create
failure and filter delete.
Fixes: 993541b2fa ("net/cxgbe: support flow API for source MAC rewrite")
Cc: stable@dpdk.org
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
The Multi Port Switch (MPS) entry is allocated twice when both
flow validate and create are invoked, but only freed once during
flow destroy. Avoid double alloc by moving MPS entry allocation
closer to when the filter create request is sent to hardware and
will be ignored for filter validate request.
Fixes: fefee7a619 ("net/cxgbe: add flow ops to match based on dest MAC")
Cc: stable@dpdk.org
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Free up Compressed Local IP (CLIP) entry properly during filter
creation failure path. Also consolidate all various tables
cleanup to a common function and invoke it from both wild-card
and exact-match filter paths.
Fixes: af44a57798 ("net/cxgbe: support to offload flows to HASH region")
Cc: stable@dpdk.org
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Remove PPPoD's packet type from PPPoE's ptype bitmap.
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
In the flow API, add ability to add IPV4/IPV6 rules that match on
packets with or without inner L4 protocols.
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
There isn't a case for 1G SGMII in ice_get_media_type() so add
the handling for it.
Also handle the special case where some direct attach
cables may report that they support 1G SGMII, but
that is erroneous since SGMII is supposed to be a
backplane media type (between a MAC and a PHY). If
the driver doesn't handle this special case then a
user could see the 'Port' in ethtool change from
'Direct attach Copper' to 'Backplane' when they have
forced the speed to 1G, but the cable hasn't changed.
Lastly, change ice_aq_get_phy_caps() to save the
module_type info if the function was called with
ICE_AQC_REPORT_TOPO_CAP. This call uses the media
information to populate the module_type. If no
media is present then the values in module_type
will be 0.
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
This patch add initialization for prof_res_bm_init flag
to zero in order that the possible resource for field vector
in the package file can be initialized.(in ice_init_prof_result_bm)
Fixes: 453d087cca ("net/ice/base: add common functions")
Cc: stable@dpdk.org
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
This patch add more tunnel type definition ipv4/ipv6 packet,
it enable tcp/udp layer of ipv4/ipv6 as L4 payload but without
L4 dst/src port number as input set for switch filter rule.
For example:
we can download a switch rule to direct ipv4 packet with specific
source and destination ip address to queue index 1.
"eth / ipv4 src is 192.168.0.1 dst is 192.168.0.2 / udp /
end actions queue index 1 / end"
this type of rule will be matched with ipv4/udp file only.
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
The parameter ref_cnt is used for tracking how many
rules are reusing this VSI list, so it can only be
updated when a rule which using this list be deleted.
Fixes: f89aa3affa ("net/ice/base: support removing advanced rule")
Cc: stable@dpdk.org
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
When free vsi list resource after vsi list update to empty, some
useless code in function ice_remove_vsi_list_rule() should be deleted.
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
Add support for LLDP forwarding to SW programming in FW
LLDP Filter Control is 0x0A0A.
Signed-off-by: Sharon Haroni <sharon.haroni@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
Distribute the tx queues evenly across all queue groups. This will
help the queues to get more equal sharing among the queues when all
are in use.
In the previous algorithm, the next queue group node will be picked up
only after the previous one filled with max children.
For example: if VSI is configured with 9 queues, the first 8 queues
will be assigned to queue group 1 and the 9th queue will be assigned to
queue group 2.
The 2 queue groups split the bandwidth between them equally (50:50).
The first queue group node will share the 50% bandwidth with all of
its children (8 queues). And the second queue group node will share
the entire 50% bandwidth with its only children.
Signed-off-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
By default the queues are configured in legacy mode. The default
bandwidth settings for legacy/advanced modes are different. The existing
code was using the advanced mode default value of 1 which was
incorrect. This caused the unbalanced BW sharing among siblings.
The recommended default value is applied.
Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>