When zero copy send is enabled and used by the initiator,
it can significantly increase latency for some payloads.
To enable more fine-grained configuration of the zero copy
send feature, add new parameters enable_zerocopy_send_server
and enable_zerocopy_send_client to spdk_sock_impl_opts to
enable/disable zcopy for specific types of sockets.
The existing enable_zerocopy_send parameter affects all types
of sockets.
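A minimal sketch of how an application could use the new fields
(API names from spdk/sock.h; the chosen values are only illustrative):

    #include "spdk/sock.h"

    /* Sketch: keep zero copy for server (target) sockets, disable it for
     * client (initiator) sockets of the posix implementation. */
    static int
    configure_zcopy(void)
    {
        struct spdk_sock_impl_opts opts = {};
        size_t len = sizeof(opts);
        int rc;

        rc = spdk_sock_impl_get_opts("posix", &opts, &len);
        if (rc != 0) {
            return rc;
        }

        opts.enable_zerocopy_send_server = true;
        opts.enable_zerocopy_send_client = false;

        return spdk_sock_impl_set_opts("posix", &opts, len);
    }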
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7457 (master)
(cherry picked from commit 8e85b675fc)
Change-Id: I111c75608f8826980a56e210c076ab8ff16ddbdc
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7638
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
On the host side the connections are created and then added to a thread's
poll group. Those connections could use different NIC queues underneath.
To route all connections of a poll group through a single queue, a unique
placement id is chosen as group_placement_id and each socket of the poll
group is marked with group_placement_id using the setsockopt(SO_MARK) option.
The driver can use the so_mark value of the skb to determine the queue to use.
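Roughly, the marking step looks like the following sketch (SO_MARK requires
CAP_NET_ADMIN; the variable names are illustrative):

    #include <stdint.h>
    #include <sys/socket.h>

    /* Sketch: tag a poll group's socket with the group's placement id so the
     * NIC driver can steer all of the group's traffic to a single queue. */
    static int
    mark_sock(int fd, uint32_t group_placement_id)
    {
        return setsockopt(fd, SOL_SOCKET, SO_MARK,
                          &group_placement_id, sizeof(group_placement_id));
    }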
Change-Id: I06bda777fe07a62133b80b2491fa7772150b3b5d
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6160
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
We do not want to do any further work on adding
the sock to the group if the epoll_ctl (or kevent)
fails.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I44b6dc86ce5676aa1b8d6c50b86f22758e4e37fa
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7594
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
suppression
When all of the following conditions are met:
- non-blocking socket
- zero copy is enabled
- interrupts are suppressed (i.e. busy polling)
- NIC tx queue is full at the time sendmsg() is called
- epoll_wait sees there is already an EPOLLIN event
then we can get into a situation where data we've sent is queued
up in the kernel network stack, but interrupts have been suppressed
because other traffic is flowing. This makes the kernel miss the
signal to flush the software tx queue. If there wasn't also already
a pending EPOLLIN event, then epoll_wait would have been sufficient
to kick the system out of this state. But when all of this aligns,
it hangs.
We deal with this by detecting the scenario and calling poll(), which
will force the kernel to issue the pending transmits.
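The workaround boils down to something like the following sketch (the
detection condition is simplified here; the real heuristic lives in the
posix module):

    #include <poll.h>

    /* Sketch: when zero copy sends are still outstanding but epoll already
     * has an EPOLLIN pending, a zero-timeout poll() nudges the kernel to
     * flush the queued transmits. */
    static void
    kick_pending_tx(int fd)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLERR };

        poll(&pfd, 1, 0); /* timeout 0: do not block, just prod the stack */
    }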
Change-Id: Ifb247159b7de16c8fc72a90f0333f5b421c8bd07
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6750
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Mellanox Build Bot
Busy polling using recv() depends on the kernel socket buffer being
empty. The poll() function instead busy polls hardware queues with no such dependency.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Change-Id: I1cb101848d51f7778cdf3d4c015d2d03201bdb37
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7014
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Since the maps are unique to modules, they can store the group_impls
directly.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I7f11db558e38e940267fdf6eaacbe515334391c2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7222
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This allows for different policies per module, as well as overlapped
placement_id values.
Change-Id: I0a9c83e68d22733d81f005eb054a4c5f236f88d9
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7221
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
There was a fix for this that went into the posix layer, but the
underlying problem is the logic in the nvme/tcp transport. Attempt to
fix that instead.
Change-Id: I04dd850bb201641d441c8c1f88c7bb8ba1d09e58
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6751
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This list will hold any socket that has some event pending and needs to
be part of the set returned during polling of the group.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I5acf01677e59c1026f93671c7b7b3dc458075bf7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6748
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Mellanox Build Bot
Instead of iterating the list, we can just manipulate the list
in a single step.
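With the sys/queue.h macros the single-step move can look like this
(an illustrative sketch; the exact macro used by the patch may differ):

    #include <sys/queue.h>

    struct sock_entry {
        TAILQ_ENTRY(sock_entry) link;
    };
    TAILQ_HEAD(sock_list, sock_entry);

    /* Sketch: splice the whole pending list onto a local list in O(1)
     * instead of iterating and moving entries one by one. */
    static void
    take_pending(struct sock_list *pending, struct sock_list *local)
    {
        TAILQ_INIT(local);
        TAILQ_CONCAT(local, pending, link); /* leaves 'pending' empty */
    }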
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I0172cdbce9af35a62d62dbccfac573e5d723f43a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6747
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Mellanox Build Bot
Instead, move it down to the modules. This allows modules
to potentially change the value, if they are able.
Change-Id: I08f5fbadf5d1e96b489ddaaca72aa051ce2cb85c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7212
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This value is already available in the options structure.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I140dc79da1fa5f155a39f1f9e2f54f46d93b6c1c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7211
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Currently it is possible to add a sock to the pending_recv
list again when sock->pending_recv is already true. Check that the flag
is false before adding the sock to the list.
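The guarded insert amounts to the following sketch (struct and field
names follow the posix module and are assumptions):

    /* Sketch: the pending_recv flag mirrors membership in the group's
     * pending_recv list, so the socket is queued at most once. */
    if (!sock->pending_recv) {
        sock->pending_recv = true;
        TAILQ_INSERT_TAIL(&group->pending_recv, sock, link);
    }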
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Change-Id: Ie23e1e8dbe1aa5594d9ddea30e7f235e3bf8ddad
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6381
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
This seems to be cleaning up the pending_recv list to account for the
missed cases in the previous patches in this series. Now that we're
correctly cleaning up the list, don't do this.
Note that if an EPOLLIN event is received but the application never does
a read/recv, the socket will remain in the pending recv list. The next
poll will get another EPOLLIN event, but the logic already handles that
case.
Additionally, left a TODO for a performance optimization.
Change-Id: I1cdde500a5c76554401a89de766d35b7a486b207
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6746
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
If there was an EPOLLIN event, the socket gets added to the pending_recv
list. But if the application then does a very large read, it will bypass
the logic that clears the socket from the pending_recv list. Fix this.
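A sketch of the cleanup in the read path (the spdk_pipe helper is from
spdk/pipe.h; the list and field names follow the posix module and are
assumptions):

    /* Sketch: once the receive pipe is drained, drop the socket from the
     * group's pending_recv list even if the data went out in one large read. */
    if (sock->pending_recv && spdk_pipe_reader_bytes_available(sock->recv_pipe) == 0) {
        sock->pending_recv = false;
        TAILQ_REMOVE(&group->pending_recv, sock, link);
    }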
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Ia0ba86012f7c6dfd14eb43ba6eeed94dbbce90ce
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6744
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
from pending_recv list
If the upper layer performs a read/recv, it should still remove the
socket from the pending_recv list.
Change-Id: I32ca8ecccbfe1e53ecc7d6f57343c2727e84b851
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6743
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Leverage SO_INCOMING_CPU to get the CPU affinity of connections
(sockets), and allocate the connections to specific poll groups,
aiming to make use of cache locality.
From our test: with 6 P4600 NVMe drives on the target, the target using
8 cores, NIC IRQs bound to those 8 cores, and the initiator side using
24 and 32 cores, we get an 11%~17% randwrite performance boost for posix
and 8%~12% for uring.
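The affinity query itself is a one-liner on Linux (a sketch):

    #include <sys/socket.h>

    /* Sketch: ask the kernel which CPU handles this connection's packets,
     * then place the socket into the poll group running on/near that CPU. */
    static int
    get_incoming_cpu(int fd)
    {
        int cpu = -1;
        socklen_t len = sizeof(cpu);

        if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) != 0) {
            return -1;
        }
        return cpu;
    }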
Change-Id: I011e0a21502c85adcccd4a14fbe9838b43f54976
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5748
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Use the same style as the code in posix_sock_close.
Thus if we cannot close sock->fd, we leak the fd,
but we can still free the memory related to the uring sock.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Id2f0e8a2c7065f100c2b009e76a49b528fd221b6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6539
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The statement that causes this issue is:
assert(group_impl->num_removed_socks < MAX_EVENTS_PER_POLL);
The call trace is:
The previous solution was commit e71e81b631.
But with that solution, the sock is always added to the removed_socks
list even if it is not under the polling context of
sock_group_impl_poll_count. So the removed_socks array can overflow
if the sock_group_impl_poll_count function is never called. And we
should not simply use a larger array, because that is just a
workaround that hides the bug.
So our current solution is:
1. Remove the code in the sock layer, i.e., roll back commit
e71e81b631. That patch was not the right fix. The sock->cb_fn NULL
pointer case is caused by the cb_fn of a write operation (when
spdk_sock_group_remove_sock is called inside that cb_fn). It is not
caused by the epoll related cache issue described in the e71e81b631
commit, but by the following situation:
(1) The socket's cb_fn is set to NULL because the socket removes
itself via spdk_sock_group_remove_sock inside a callback function
from a write operation.
(2) The socket is already in the pending_recv list. This is not
caused by an epoll event issue, e.g., socket A changing socket B's
cb_fn. By the way, a socket A should never remove a socket B from a
polling group. If it really has to, it should use spdk_thread_send_msg
to make sure the removal happens in the next round.
2. Add a check in each implementation module (posix, uring): if
sock->cb_fn is NULL, we do not return the socket to the active socks
list. This is enough to address the issue.
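The added check is essentially the following sketch (names follow the
posix/uring poll loops and are assumptions):

    /* Sketch: skip sockets whose cb_fn was cleared by
     * spdk_sock_group_remove_sock() from inside a write callback. */
    if (sock->cb_fn == NULL) {
        continue; /* do not report this socket as active */
    }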
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I79187f2f1301c819c46a5c3bdd84372f75534f2f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6472
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Default to using epoll unless __FreeBSD__ is defined. Add macros SPDK_KEVENT
and SPDK_EPOLL to indicate whether epoll or kevent is present. The macros
match the naming convention for SPDK_ZEROCOPY which controls zero copy
in a similar way.
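The selection roughly looks like this sketch of the pattern, mirroring
how SPDK_ZEROCOPY is handled:

    /* Sketch: pick the event mechanism per platform. */
    #if defined(__FreeBSD__)
    #define SPDK_KEVENT
    #include <sys/event.h>
    #else
    #define SPDK_EPOLL
    #include <sys/epoll.h>
    #endif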
Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
Change-Id: I4c46fb94b254cb075427bfe07a8085887254c45a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6466
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
When a socket is being created and zcopy is disabled
by the config, we can return from the posix_sock_alloc
function before we try to set quick_ack.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I6670b8337e70ec12b18a5e6753674fbef9e95648
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6382
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
A single sock connection can call posix_sock_flush,
and this sock may not belong to a polling group.
So add a check in sock_check_zcopy to avoid this issue.
Fixes #1788
Change-Id: Id0a2f80ad0f3cdb7fc736a3be3211e49513751b1
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6334
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Karol Latecki <karol.latecki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
To allow SO_MINOR updates on LTS for the whole year it is supported,
the major version for all components needs to be increased.
This is to prevent a scenario where two versions exist with matching
version numbers but conflicting ABI.
Ex. the next SPDK release adds an API call, increasing the minor version,
then LTS needs just a subset of those additions.
Increasing the major SO version after LTS allows the quarterly releases
to update versions as needed, while still allowing LTS to increase the
minor version separately.
Disabled the test for increasing the SO version without an ABI change, as
that is the goal of this patch. This check shall be removed with the
SPDK 21.04 release.
This patch:
- increases SO_VER by 1 for all components
- resets SO_MINOR to 0 for all components
- removes suppressions for ABI tests
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I44d01154430a074103bd21c7084f44932e81fe72
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6167
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When zero copy is enabled in the initiator, there can
be cases where a socket connection does not belong
to a polling group, i.e., the application does not use
a socket polling group. Then we should actively call
_sock_check_zcopy in the posix_sock_flush function when zero
copy is enabled.
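In the flush path this is roughly the following sketch (_sock_flush and
_sock_check_zcopy are internal posix.c helpers; the exact condition and
field names are assumptions):

    /* Sketch: a socket without a poll group has nobody polling EPOLLERR
     * for it, so reap zero copy completions directly from flush. */
    rc = _sock_flush(sock);
    if (rc >= 0 && sock->group_impl == NULL && psock->zcopy) {
        rc = _sock_check_zcopy(sock);
    }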
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Idceaa7557eb265daa878db40c922494c3de35ea8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5423
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: Mellanox Build Bot
In the NVMF TCP initiator, when zero copy is disabled,
all requests are completed when we receive an EPOLLIN event
for the socket, add the socket to the pending_recv list and call the
socket's callback, which calls qpair_process_completions. As part of
completion processing on the NVMe level we receive the number
of completions and resubmit the same number of queued requests.
When zero copy is enabled, some transport requests can be
completed when we receive and process an EPOLLERR event, which
happens outside the qpair_process_completions context. So part
of the requests can be completed and the transport level contains
free requests, but the NVMe layer doesn't know about it
until it calls qpair_process_completions. There is also
a chance that on the posix level, when we poll sockets, we
receive only the EPOLLERR flag without EPOLLIN. In this
case we can complete several requests but never call
qpair_process_completions, so we don't resubmit the queued
requests. That can lead to a hang at the end of a test run
when there are no more requests to be completed on the transport
level (no EPOLLIN event) and we receive EPOLLERR only,
so we can't resubmit the queued requests.
This patch fixes the problem: it adds a socket to the
group's pending_recv list if we received an EPOLLERR
event and completed at least one socket request,
so the socket's callback can be called even without
an EPOLLIN event.
Fixes issue #1685
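In the poll loop the change amounts to the following simplified sketch
(it assumes the zcopy check reports whether any send completed; names
follow the posix module):

    /* Sketch: if EPOLLERR completed at least one zero copy send, queue the
     * socket so its callback runs even though no EPOLLIN was reported. */
    if (events[i].events & EPOLLERR) {
        if (_sock_check_zcopy(sock) > 0 && !sock->pending_recv) {
            sock->pending_recv = true;
            TAILQ_INSERT_TAIL(&group->pending_recv, sock, link);
        }
    }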
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I21d5c2fe6eb0787aab9531925a7f0e2fe18bafaa
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6162
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The purpose is to reduce the duplicated functions
in the posix and uring implementations.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ia0568b2490d362e7e78fa59b3ca88a60313ba0bd
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5284
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Since spdk_sock_group_poll should be level triggered with respect to
the callback function registered in spdk_sock_group_add_sock,
if there is data in the pipe but no event occurs, the poll function
should still reap sockets that have data in the pipe.
Change-Id: If3a983f80fd04708e45ad0398c7d34018ec52bc7
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5072
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
By design the opts for each implementation
do not have to match spdk_sock_impl_opts.
During get_opts for a specific implementation
only the used fields are filled.
Yet iterating over all spdk_sock_impl_opts fields
would yield garbage values for the unset fields.
This is the case right now when doing the save_config RPC
with uring enabled: a garbage value for enable_zerocopy_send
is returned.
sock.c:829:62: runtime error: load of value 165, which is not a valid value for type '_Bool'
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ie0512a7dffc36c8ff89256d08f8a2f4fefcf9e83
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4699
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In the NVMe TCP initiator, zero copy is enabled for I/O qpairs
and disabled for admin qpairs.
Change-Id: Ibdf521dccde9b95ec5dd15a5eb2baed8fcf8b88e
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4211
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
A preparation step for enabling zero copy in the NVMe-oF TCP initiator.
This option will be used to disable zero copy
for the admin qpair. This is needed since the admin
qpair's socket is not connected to a socket poll group,
so we can't receive buffer reclaim notifications.
Change-Id: Ibfbb8a156aafcd7ba8975a50f790da7fbd37d96f
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4210
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When net.core.optmem_max is not set high enough, a call to sendmsg()
might fail with ENOBUFS. Currently this is treated as an error.
When we have no more buffer space left, we should continue to process
any completions and by doing so, free up the auxiliary buffers we ran out
of.
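The send path change is roughly the following sketch (the exact
condition in the posix module may also check that zero copy is in use):

    #include <errno.h>
    #include <sys/socket.h>

    /* Sketch: treat ENOBUFS like EAGAIN; reaping MSG_ERRQUEUE completions
     * later frees the ancillary buffer space that ran out. */
    static ssize_t
    send_once(int fd, struct msghdr *msg, int flags)
    {
        ssize_t rc = sendmsg(fd, msg, flags);

        if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == ENOBUFS)) {
            return 0; /* not fatal: retry after processing completions */
        }
        return rc;
    }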
With this change I was able to run perf against the SPDK target with
optmem_max purposely set to a low value, where previously it would
fail.
This fixes github issue #1592
Signed-off-by: Jeffry Molanus <jeffry.molanus@gmail.com>
Change-Id: Ieeeed4fbecd827d0da815456b57fbe81495fe54d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4129
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch enables getting the placement_id
in the sock layer and also adds the RPC support.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I70de57b0ed392a0aefce9d3ff1f61ef924015a87
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4146
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The type of sendmsg_idx is uint32_t, so its maximal value
is 2^32 - 1; it can overflow and wrap to 0, so
we should fix it.
PS: I think our code may have a potential defect.
In my experiment I initialized sendmsg_idx with 2^32 - 1,
so the first req->internal.offset was 2^32 - 1.
But the ee_info and ee_data in "struct sock_extended_err"
obtained from _sock_check_zcopy were all 0 on the target side,
which means this req will never be completed.
As sendmsg_idx (uint32_t) keeps increasing, it will
eventually reach 2^32 - 1 again, so I think the issue I
described can still be hit.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ic9aaf629d73d5b7e2c81800a4f7f92c728adbc34
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3948
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This patch removes implementation of VPP socket abstraction
along with ways to compile it.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I089f7703cfc4fb517f8f80f4368e544bced549b6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3734
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Add a helper function iscsi_parse_redirect_addr() to validate the
passed IP address-port pair.
iSCSI login redirection will support only a numeric IP address and
TCP port, so add AI_NUMERICSERV and AI_NUMERICHOST.
This function is almost the same as nvme_tcp_parse_addr() and
nvme_rdma_parse_addr().
Besides, update the error log in posix_sock_create() to use
gai_strerror(), which provides more accurate information, as done
by nvme_tcp_parse_addr() and nvme_rdma_parse_addr().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I65c6de81a64dcb26551ce796172d0458e1c298a7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3357
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
TCP delayed ACK can be disabled or enabled by enabling or disabling
quick ACK, respectively.
The recently added spdk_sock_impl_opts is helpful for the sock library
to control quick ACK.
Hence this patch adds and uses an option enable_quickack. The option
is effective only for the POSIX sock module.
We have spdk_sock_opts now too but spdk_sock_impl_opts will be better
for this case.
This option is not supported on FreeBSD. FreeBSD users can set the
option globally via sysctl if desired.
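On Linux the option maps to a plain socket option, roughly as in this
sketch:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <stdbool.h>
    #include <sys/socket.h>

    /* Sketch: when enable_quickack is set, turn on TCP_QUICKACK on the
     * new socket to disable delayed ACK. */
    static int
    set_quickack(int fd, bool enable)
    {
        int flag = enable ? 1 : 0;

        return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &flag, sizeof(flag));
    }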
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ic89620267acce5872dc8ecaf7a99bb70ae97e993
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3603
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The pipe buffer has an obvious performance influence on arm64. The
following are my test results with 1 core; we can enable it
on arm64 now, like the posix socket, and later we can find
the optimal pipe size that won't cause a degradation for large
payloads.
                randwrite   randread
    512 byte       61%         97%
    4096 byte      84%         16%
    16384 byte    -13%        -17%
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Change-Id: Ib4df60751c5e06ef9bd7fc7bb7efafa5ad4de211
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3329
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Zero copy send can cause performance degradation with small
payloads. This patch adds an option to disable it if required. By
default zero copy is enabled.
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: I14f2b21ad375e770cb08f850360898bac675b351
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3344
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Receive pipe reduces number of system calls and gives significant
performance improvement with kernel TCP stack and relatively small IO
sizes. With user space TCP/IP implementations there are no system
calls and double buffering introduced by pipe has negative impact on
performance. Receive pipe remains enabled by default.
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: Ic5ddee42293df2c233ba7ffbe6662de7917ac586
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3343
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Call recv to trigger busy polling even when no socket is active. When
epoll_wait returns zero, the first socket in the poll group is used to
trigger busy polling in the kernel stack and potentially reap incoming data.
Change-Id: I15f04cb4a2c7b382dd07391eda69678fd7919790
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3180
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
For some exceptional cases (e.g.,
https://github.com/spdk/spdk/issues/1486),
we may detect POLLERR or other events. For those events,
we can just ignore them instead of using SPDK_UNREACHABLE.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I073575408783ff75e50b40d45ddf09388a2cab96
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3262
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
A poller should return a status > 0 when it did some work
(CPU was used for some time), marking its call as busy
CPU time.
Active pollers should return the BUSY status only if they
did any meaningful work besides checking some conditions
(e.g. processing requests, doing some complicated operations).
Signed-off-by: Maciej Szwed <maciej.szwed@intel.com>
Change-Id: Id4636a0997489b129cecfe785592cc97b50992ba
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2164
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>