When zcero copy send is enabled and used by initiator,
it could significantly increase latency in some payloads.
To enable more fine graing configuration of zero copy
send feature, add new parameters enable_zerocopy_send_server
and enable_zerocopy_send_client to spdk_sock_impl_opts to
enable/disable zcopy for specific type of sockets.
Exisiting enable_zerocopy_send parameter affects all types
of sockets.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I111c75608f8826980a56e210c076ab8ff16ddbdc
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7457
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Karol Latecki <karol.latecki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
On host side the connections are created and then added to thread's
poll group. Those connections could use different NIC queues underneath.
To route all connections of poll group through single queue a unique
placement id is chosen as group_placement_id and each socket of poll
group is marked with group_placment_id using getsockopt(SO_MARK) option.
The driver could use so_mark value of skb to determine the queue to use.
Change-Id: I06bda777fe07a62133b80b2491fa7772150b3b5d
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6160
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Leverage SO_INCOMING_CPU to get the CPU affinity of connections
(sockets). And allocate the connections to specific poll groups,
which aims to utilize cache locality.
From our test:
6 P4600 NVMe on target,target uses 8 cores, NIC irqs are bound to
these 8 cores, and initiator side uses 24 and 32 cores,
we can get 11%~17% randwrite performance boost for posix, and 8%~12%
for uring.
Change-Id: I011e0a21502c85adcccd4a14fbe9838b43f54976
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5748
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When uring is enabled, uring socket implementation is
used to create sockets. We may want to use posix sockets
for some reasons (e.g. performance tests). This patch adds
a new API function to set the socket implementation which
will be used by default, e.g. when no impl_name is passed
to spdk_sock_connect/spdk_sock_listen functions.
Misc changes: include spdk_internal/log.h to register
SOCK log component. The new include header already
includes spdk/sock.h and spdk/queue.h, sow remove
direct inclusion of these headers.
Change-Id: I4abad0a59cd033b15bd43a00e3dbdf313fa6b06c
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4327
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
A preparation step for enabling zero copy in NVMEoF TCP initiator.
This option will be used to disable zero copy
for admin qpair. This is needed since the admin
qpair's socket is not connected to socket poll group
and we can't receive buffer reclaim notification.
Change-Id: Ibfbb8a156aafcd7ba8975a50f790da7fbd37d96f
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4210
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This patch is used to enable placement_id getting
in sock layer and also add the rpc support.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I70de57b0ed392a0aefce9d3ff1f61ef924015a87
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4146
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This patch removes implementation of VPP socket abstraction
along with ways to compile it.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I089f7703cfc4fb517f8f80f4368e544bced549b6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3734
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
TCP delayed ACK can be disabled or enabled by enabling or disabling
quick ACK, respectively.
The recently added spdk_sock_impl_opts is helpful for sock library
to control quick ACK.
Hence this patch adds and uses an option enable_quickack. The option
is effective only for the POSIX sock module.
We have spdk_sock_opts now too but spdk_sock_impl_opts will be better
for this case.
This option is not supported on FreeBSD. FreeBSD users can set the
option globally via sysctl if desired.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ic89620267acce5872dc8ecaf7a99bb70ae97e993
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3603
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Zero copy send can cause performance degradation with small
payloads. This patch adds an option to disable it if required. By
default zero copy is enabled.
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: I14f2b21ad375e770cb08f850360898bac675b351
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3344
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Receive pipe reduces number of system calls and gives significant
performance improvement with kernel TCP stack and relatively small IO
sizes. With user space TCP/IP implementations there are no system
calls and double buffering introduced by pipe has negative impact on
performance. Receive pipe remains enabled by default.
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: Ic5ddee42293df2c233ba7ffbe6662de7917ac586
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3343
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
spdk_sock_impl_get/set_opts functions allow to set different socket layer
configuration options. Options can be set independently for each
socket layer implementation.
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: I617e58366a153fae2cf0de1b271cc4f4f19ec451
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/607
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Purpose: This is used to make users can specify
some options on the socket, e.g., the different priority for the socket.
While creating sockets, the priority needs to be set before connect()
and listen system calls, so better to add one parameter in spdk_sock_opts
which can contain options (e.g., priority) in spdk_sock_listen_ext and
spdk_sock_connect_ext functions.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Change-Id: I406238e9da7abd69f937b7072535a19124ed0169
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1874
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Since the related feature is already contained in
spdk_sock_listen and spdk_sock_connect functions,
we no longer need this function.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I1eafff0d139fa266a355fbee2bf0fc3947db69fc
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1876
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
If available, automatically use MSG_ZEROCOPY when sending on sockets.
Storage workloads contain sufficient data transfer sizes that this is
always a performance improvement, regardless of workload.
Change-Id: I14429d78c22ad3bc036aec13c9fce6453e899c92
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471752
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Or Gerlitz <gerlitz.or@gmail.com>
Initiator drivers (e.g nvme/tcp) don't use poll groups but rather directly
poll the qpair. In this case we want to allow the polling function (e.g
_qpair_process_completions()) to flush async writes pending on the socket.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Change-Id: Ibd8c73691213d58e287b7110d0f5a381a89a64d0
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/475419
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Purpose: With this patch,
(1)We can support using different sock implementations in
one application together.
(2)For one IP address managed by kernel, we can use different method
to listen/connect, e.g., posix, or uring. With this patch, we can
designate the specified sock implementation if impl_name is not NULL
and valid. Otherwise, spdk_sock_listen/connect will try to use the sock
implementations in the list by order if impl_name is NULL.
Without this patch, the app will always use the same type of sock implementation
if the order is fixed. For example, if we have posix and uring together,
the first one will always be uring.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ic49563f5025085471d356798e522ff7ab748f586
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/478140
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Add spdk_sock_writev_async for performing asynchronous writes to
sockets. The user of this call is responsible for allocating their own
spdk_sock_request structures to pass to this call.
spdk_sock_writev_async will not return EAGAIN and will instead leave the
requests queued until they are fully sent or aborted due to socket
error.
Change-Id: Idf3239e65d26a3024e578122c23e4fb8f95e241b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470523
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This is useful for detecting sockets that have been disconnected
by the other end without reading data.
Change-Id: Ieb6529984d282d48373766d9f5555cf11720f19b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470513
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
spdk_sock_group_poll() and spdk_sock_group_poll_count() had returned
0 on success. The implementation didn't match the specification
described in the header file, and couldn't be used to collect stats
correctly because 0 means idle.
This patch fixes the return value of spdk_sock_group_poll() and
spdk_sock_group_poll_count() to return number of events and
the callers not to overwrite the return value by 0.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I7e2a17187fc74ea44d3acf2f35d63f5e5a254eda
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/463710
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Purpose: This API can be used to set the socket
with different priority.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I9df1122bf6ae640eba731e635a1784f4e9da4104
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/461738
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
And also add spdk_sock_group_get_ctx function
Change-Id: I2a2a58b0588ff7d99d3538ea0a633a3b8c7a234b
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454538
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
Also add the mapping table and the operations between placement_id and
sock_group
Change-Id: I31868e241fdd20252c2d79792ff1239e6d23afb8
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454537
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Add an new API spdk_sock_readv(sock, iov, iovcnt) to the sock
library. This will be used in SPDK iSCSI target first.
Implementation was done based on vcom_socket_readv in VPP.
Change-Id: I88a8f2af4856b1035165b78d76b4a4f4587b265d
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446343
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Purpose: We need to get the port info in other applications
(e.g., NVMe-oF TCP/IP transport)
Change-Id: I3a4636e764e44425436bb064cb0062c6f3e44035
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/428313
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
These were related to recent scsi.h and sock.h
patches.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I17db2eefcbf3ade7a5baad68fb2e12a9474c1ce0
Reviewed-on: https://review.gerrithub.io/416034
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
For now, this provides common abstraction for Linux epoll
and FreeBSD kqueue. It also provides the basis for future
changes where alternate userspace TCP stacks have their own
mechanism for polling a group of descriptors.
While here, remove old epoll/kqueue code in iscsi/conn.c that
was commented out when the iSCSI idle connection code was
recently removed - we now have a real implementation of it
in sock.c so the original code is no longer needed as a
reference.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I664ae32a5ff4d37711b7f534149eb0eb35942335
Reviewed-on: https://review.gerrithub.io/398969
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
This provides an abstraction layer around TCP
sockets. Previously we just used fd integers, but
we don't want to be tied to integers for alternative
userspace TCP stacks.
Future patches will do more work to enable multiple
implementations of this abstraction. For now, just
get the abstraction in place for POSIX sockets and
make all of the iSCSI changes associated with it.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I9a825e9e02eb6927c8702d205665c626f57b3771
Reviewed-on: https://review.gerrithub.io/398861
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <optimistyzy@gmail.com>
The socket-related code was already broken out into
lib/net/sock.c, so break out the header portions
from include/spdk/net.h into its own sock.h.
This prepares for some upcoming changes in how
TCP sockets are abstracted, to enable alternative
userspace TCP stack implementations to be used with
SPDK.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I40b162e72ea80c235b49f10b17c2085fcfb385d4
Reviewed-on: https://review.gerrithub.io/398851
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-by: <shuhei.matsumoto.xt@hitachi.com>