For IPV6, cm->cmsg_level is SOL_IPV6 and cm->cmsg_type is
IPV6_RECVERR. However these combination was not included.
To clarify the fix check if positive conditions are satisfied and
then reverse the result.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I675f4337f383d3526fed1b86794697f41113ed4c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10428
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Options modified on sock after connected is also moved to a
function.
Signed-off-by: Liu Qing <winglq@gmail.com>
Change-Id: I4c2a9ae9858c102764959d055bed208b4b0621d9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9516
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This reverts commit 2cd948c4a6.
This commit caused drop in performance tests.
More info in issue https://github.com/spdk/spdk/issues/2158
Signed-off-by: Maciej Wawryk <maciejx.wawryk@intel.com>
Change-Id: Id5d353535323c79e773e33377af388dae47238cb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9510
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
In order to not affect the loopback test.
Also create a sock_common.c file which can be used by posix/uring
implementation. We do not put such code in sock.c. Because sock.c
is the general layer. Other users may include their own user space
sock impelmentations. So put those common code in sock_common.c instead.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I983ec2313119539e6eed2d9f11ba1488c0ed6560
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8769
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This parameter is still part of API spdk_sock_impl_opts
structure but it is not used. Keep it to support ABI
compatibility since it is located in the middle of the
structure and removing it may break socket opts initialization
or parsing.
Change-Id: Ib641ad7d965d68bc9ebb65dba531408d88cf6fa1
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8914
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When entering the if case to order the list, there is bug should be
fixed. The original code does not address this.
The way this happens is when there is a connection left in the socks_with_data list
between polls and there are enough new events detected that it would exceed the
maximal number of events. A connection is left on this list between polls if it isn't
fully drained via reads by the upper layer on each poll loop.
Currently, the maximal socket event num is 32. Then we did not hit this issue
in our normal test cases. But when you use NVMe-oF tcp target to test which is
described in #2105, there are more than 32 active sockets, and it exceeeds
the maximal num of events of polling (32), so we will trigger this issue.
Fixes issue #2015
Change-Id: I9384476fdba8826f5fe55a5d2594e3f4ed3832ba
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8541
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In posix_sock_create(), we loops through all the addresses available.
If something is wrong, we should close(fd) and set fd to -1, and
try the next address. Only, when one fd satisfies all conditions,
we will break the loop with the useful fd.
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Change-Id: Icbfc10246c92b95cacd6eb058e6e46cf8924fc4c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8310
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
After reading the code in detail, I think that we should
not set pipe_has_data= true and socket_has_data at the same time.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I8f9f96b16f4f0e0c585877a0dd687a240252a7cf
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8283
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Move from a single flag indicating that the socket is on the
pending_events list to two flags - pipe_has_data and socket_has_data. If
either flag is true, the socket is on the socks_with_data list.
This is necessary to track enough state to avoid doing extra recv()
system calls.
Change-Id: I65e5701dccb0a5bade19f266f164f26706b110d4
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7595
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
When zcero copy send is enabled and used by initiator,
it could significantly increase latency in some payloads.
To enable more fine graing configuration of zero copy
send feature, add new parameters enable_zerocopy_send_server
and enable_zerocopy_send_client to spdk_sock_impl_opts to
enable/disable zcopy for specific type of sockets.
Exisiting enable_zerocopy_send parameter affects all types
of sockets.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I111c75608f8826980a56e210c076ab8ff16ddbdc
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7457
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Karol Latecki <karol.latecki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
On host side the connections are created and then added to thread's
poll group. Those connections could use different NIC queues underneath.
To route all connections of poll group through single queue a unique
placement id is chosen as group_placement_id and each socket of poll
group is marked with group_placment_id using getsockopt(SO_MARK) option.
The driver could use so_mark value of skb to determine the queue to use.
Change-Id: I06bda777fe07a62133b80b2491fa7772150b3b5d
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6160
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
We do not want to do any further work on adding
the sock to the group if the epoll_ctl (or kevent)
fails.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I44b6dc86ce5676aa1b8d6c50b86f22758e4e37fa
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7594
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
suppression
When all of the following conditions are met:
- non-blocking socket
- zero copy is enabled
- interrupts are suppressed (i.e. busy polling)
- NIC tx queue is full at the time sendmsg() is called
- epoll_wait sees there is already an EPOLLIN event
then we can get into a situation where data we've sent is queued
up in the kernel network stack, but interrupts have been suppressed
because other traffic is flowing. This makes the kernel miss the
signal to flush the software tx queue. If there wasn't also already
a pending EPOLLIN event, then epoll_wait would have been sufficient
to kick the system out of this state. But when all of this aligns,
it hangs.
We deal with this by detecting the scenario and calling poll(), which
will force the kernel to issue the pending transmits.
Change-Id: Ifb247159b7de16c8fc72a90f0333f5b421c8bd07
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6750
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Mellanox Build Bot
Busy pollng using recv() is dependent on kernel socket buffer being
empty. Instead poll() function busy polls hw queues with no such dependency.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Change-Id: I1cb101848d51f7778cdf3d4c015d2d03201bdb37
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7014
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Since the maps are unique to modules, they can store the group_impls
directly.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I7f11db558e38e940267fdf6eaacbe515334391c2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7222
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This allows for different policies per module, as well as overlapped
placement_id values.
Change-Id: I0a9c83e68d22733d81f005eb054a4c5f236f88d9
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7221
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
There was a fix for this that went into the posix layer, but the
underlying problem is the logic in the nvme/tcp transport. Attempt to
fix that instead.
Change-Id: I04dd850bb201641d441c8c1f88c7bb8ba1d09e58
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6751
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This list will hold any socket that has some event pending and needs to
be part of the set returned during polling of the group.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I5acf01677e59c1026f93671c7b7b3dc458075bf7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6748
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Mellanox Build Bot
Instead of iterating the list, we can just manipulate the list
in a single step.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I0172cdbce9af35a62d62dbccfac573e5d723f43a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6747
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Mellanox Build Bot
Instead, move it down to the modules. This allows modules
to potentially change the value, if they are able.
Change-Id: I08f5fbadf5d1e96b489ddaaca72aa051ce2cb85c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7212
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This value is already available in the options structure.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I140dc79da1fa5f155a39f1f9e2f54f46d93b6c1c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7211
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Currently there is possibility of adding a sock to pending_recv
list again if sock->pending_recv is true. Check if flag is false
before adding to the list.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Change-Id: Ie23e1e8dbe1aa5594d9ddea30e7f235e3bf8ddad
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6381
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
This seems to be cleaning up the pending_recv list to account for the
missed cases in the previous patches in this series. Now that we're
correctly cleaning up the list, don't do this.
Note that if an EPOLLIN event is received but the application never does
a read/recv, the socket will remain in the pending recv list. The next
poll will get another EPOLLIN event, but the logic already handles that
case.
Additionally, left a TODO for a performance optimization.
Change-Id: I1cdde500a5c76554401a89de766d35b7a486b207
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6746
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
If there was an EPOLLIN event the socket gets adding to the pending_recv
list. But if the application then does a very large read, it will bypass
the logic that clears the socket from the pending_recv list. Fix this.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Ia0ba86012f7c6dfd14eb43ba6eeed94dbbce90ce
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6744
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
from pending_recv list
If the upper layer performs a read/recv, it should still remove the
socket from the pending_recv list.
Change-Id: I32ca8ecccbfe1e53ecc7d6f57343c2727e84b851
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6743
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Leverage SO_INCOMING_CPU to get the CPU affinity of connections
(sockets). And allocate the connections to specific poll groups,
which aims to utilize cache locality.
From our test:
6 P4600 NVMe on target,target uses 8 cores, NIC irqs are bound to
these 8 cores, and initiator side uses 24 and 32 cores,
we can get 11%~17% randwrite performance boost for posix, and 8%~12%
for uring.
Change-Id: I011e0a21502c85adcccd4a14fbe9838b43f54976
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5748
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The statement causes this issue is:
assert(group_impl->num_removed_socks < MAX_EVENTS_PER_POLL);
The call trace is:
The previous solution is:
commitid with: e71e81b631
But with this solution, it will always add the sock
into the removed_socks list even if it is not under polling
context by sock_group_impl_poll_count. So it will exceed the size of
removed_socks array if sock_group_impl_poll_count function will not be
called. And we should not use a large array, because it is just a workaround,
it just hides the bug.
So our current solution is:
1 Remove the code in sock layer, i.e., rollback the commit
e71e81b631. This patch is
not the right fix. The sock->cb_fn's NULL pointer case is
caused by the cb_fn of write operation (if the
spdk_sock_group_remove_sock is inside the cb_fn). And it is not
caused by the epoll related cache issue described in commit
"e7181.." commit, but caused by the following situation:
(1)The socket's cb_fn is set to NULL which is caused by
spdk_sock_group_remove_sock by the socket itself
inside a call back function from a write operation.
(2) And the socket is already in the pending_recv list. It is
not caused by the epoll event issue, e.g., socket A changes Socket B's
cb_fn. By the way, A socket A should never remove a socket B from a polling group.
If it really does it, it should use spdk_thread_sendmsg to make sure
it happens in the next round.
2 Add the code check in each posix, uring implementation module.
If sock->cb_fn is NULL, we will not return the socket to the active socks list.
And this is enough to address the issue.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I79187f2f1301c819c46a5c3bdd84372f75534f2f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6472
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Default to using epoll unless __FreeBSD__ is defined. Add macros SPDK_KEVENT
and SPDK_EPOLL to indicate whether epoll or kevent is present. The macros
match the naming convention for SPDK_ZEROCOPY which controls zero copy
in a similar way.
Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
Change-Id: I4c46fb94b254cb075427bfe07a8085887254c45a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6466
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
When socket is being created and zcopy is disabled
by the config, we can return from posix_sock_alloc
function before we try to set quick_ack
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I6670b8337e70ec12b18a5e6753674fbef9e95648
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6382
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
A single sock connection can call posix_sock_flush,
and this sock may not belong to a polling group.
So add the check in sock_check_zcopy to avoid such issue.
Fixes#1788
Change-Id: Id0a2f80ad0f3cdb7fc736a3be3211e49513751b1
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6334
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Karol Latecki <karol.latecki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
To allow SO_MINOR updates on LTS for the whole year it is supported,
the major version for all components needs to be increased.
This is to prevent scenario where two versions exists with matching
versions, but conflicting ABI.
Ex. Next SPDK release adds an API call increasing the minor version,
then LTS needs just a subset of those additions.
Increasing major so version after LTS, allows the quarterly releases
to update versions as needed. Yet allowing LTS to increase minor
version separately.
Disabled test for increasing SO version without ABI change, as
that is goal of this patch. This check shall be removed with SPDK 21.04
release.
This patch:
- increases SO_VER by 1 for all components
- resets SO_MINOR to 0 for all components
- removes suppressions for ABI tests
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I44d01154430a074103bd21c7084f44932e81fe72
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6167
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When zero copy is enabled in initiator, there could
be the case that a socket connection does not belong
to a polling group, i.e, the application does not use
socket polling group. Then we should actively call
_sock_check_zcopy in posix_sock_flush function when zero
copy policy is enabled.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Idceaa7557eb265daa878db40c922494c3de35ea8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5423
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: Mellanox Build Bot
In NVMF TCP initiator when zero copy is disabled,
all requests are completed when we receive EPOLLIN event
for socket, add socket to pending_recv list and call socket's
callback which calls qpair_process_completions. As part of
completions processing on NVME level we receive the number
of completions and resubmit the same number of queued requests.
When zero copy is enabled, some transport requests can be
completed when we receive and process EPOLLERR event, it
happens out of qpair_process_completions context. So part
of requests can be completed, transport level contains
free requests but NVME layer don't have info about it
until it calls qpair_process_completions. And there is
a chance that on posix level when we poll sockets we
receive only EPOLLERR flag without EPOLLIN. In this
case we can complete several requests but don't call
qpair_process_completion so we don't resubmit queued
requests. It may lead to a hang in the end of test run
when there are no mo requests to be completed on transport
level (no EPOLLIN event) and we receive EPOLERR only,
so we can't resubmit queured requests.
This patch fixes this problem, it add a socket to
group's pending_recv list if we received EPOLERR
event and completed at least 1 socket request.
So socket's callback can be called even without
EPOLLIN event.
Fixes issue #1685
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I21d5c2fe6eb0787aab9531925a7f0e2fe18bafaa
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6162
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The purpose is to reduce the duplicated functions
in posix and uring implmentation.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ia0568b2490d362e7e78fa59b3ca88a60313ba0bd
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5284
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
By design the opts for each implementation
can not match spdk_sock_impl_opts.
During get_opts for specific implementation
only used fields are filled.
Yet iterating over all spdk_sock_impl_opts fields
would yeild garbage values for unset fields.
This is the case right now when doing save_config RPC
with uring enabled. A garbe value for enable_zerocopy_send
is returned.
sock.c:829:62: runtime error: load of value 165, which is not a valid value for type '_Bool'
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ie0512a7dffc36c8ff89256d08f8a2f4fefcf9e83
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4699
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In NVME TCP initiator zero copy is enabled for IO qpairs
and disabled for admin qpairs
Change-Id: Ibdf521dccde9b95ec5dd15a5eb2baed8fcf8b88e
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4211
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
A preparation step for enabling zero copy in NVMEoF TCP initiator.
This option will be used to disable zero copy
for admin qpair. This is needed since the admin
qpair's socket is not connected to socket poll group
and we can't receive buffer reclaim notification.
Change-Id: Ibfbb8a156aafcd7ba8975a50f790da7fbd37d96f
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4210
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When net.core.optmem_max is not set high enough, a call to sendmsg()
might fail with ENOBUFS. Currently this is treated as an error.
When we have no more buffer space left, we should continue to process
any completions and by doing so, free up the auxiliary buffers we ran out
of.
With this change I was able to run perf against the spdk target with a
purposely set to a low, value of optmem_max, where previously it would
fail.
This fixes github issue #1592
Signed-off-by: Jeffry Molanus <jeffry.molanus@gmail.com>
Change-Id: Ieeeed4fbecd827d0da815456b57fbe81495fe54d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4129
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch is used to enable placement_id getting
in sock layer and also add the rpc support.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I70de57b0ed392a0aefce9d3ff1f61ef924015a87
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4146
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The type of sendmsg_idx is uint32_t, so the maximal
is 2^32 -1, so it could be overflow and get 0, so
we should fix it.
PS: I think that our code may have potential defect.
In my experiment, I try to init sendmsg_idx with 2^32 -1,
so the first req->internal.offset = 2^32 - 1.
But for the ee_info and ee_data in "struct sock_extended_err"
got from _sock_check_zcopy is all 0 in the target side.
So it means that the this req will never be completed.
With the increase of sendmsg_idx (the type is
uint32_t), sendmsg_idx will finally goto 2^32 - 1, so I
think it will still kick the issue I described.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ic9aaf629d73d5b7e2c81800a4f7f92c728adbc34
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3948
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Add a helper function iscsi_parse_redirect_addr() to validate the
passed IP address-port pair.
iSCSI login redirection will support only numeric IP address and
TCP port, and add AI_NUMERICSERV and AI_NUMERICHOST.
This function is almost same as nvme_tcp_parse_addr() and
nvme_rdma_parse_addr().
Besides, update error log in posix_sock_create() to use
gai_strerror(). gai_strerror() will provide more accurate
information as done by nvme_tcp_parse_addr() and nvme_rdma_parse_addr().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I65c6de81a64dcb26551ce796172d0458e1c298a7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3357
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
TCP delayed ACK can be disabled or enabled by enabling or disabling
quick ACK, respectively.
The recently added spdk_sock_impl_opts is helpful for sock library
to control quick ACK.
Hence this patch adds and uses an option enable_quickack. The option
is effective only for the POSIX sock module.
We have spdk_sock_opts now too but spdk_sock_impl_opts will be better
for this case.
This option is not supported on FreeBSD. FreeBSD users can set the
option globally via sysctl if desired.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ic89620267acce5872dc8ecaf7a99bb70ae97e993
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3603
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Zero copy send can cause performance degradation with small
payloads. This patch adds an option to disable it if required. By
default zero copy is enabled.
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: I14f2b21ad375e770cb08f850360898bac675b351
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3344
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Receive pipe reduces number of system calls and gives significant
performance improvement with kernel TCP stack and relatively small IO
sizes. With user space TCP/IP implementations there are no system
calls and double buffering introduced by pipe has negative impact on
performance. Receive pipe remains enabled by default.
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: Ic5ddee42293df2c233ba7ffbe6662de7917ac586
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3343
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>