abfdb70d04
The link titles were the names of the pages, but those names were then changed. Change-Id: If24711a941ca42db703e373eea56c6235bed6685 Signed-off-by: Ben Walker <benjamin.walker@intel.com> Reviewed-on: https://review.gerrithub.io/421550 Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
98 lines
5.9 KiB
Markdown
# User Space Drivers {#userspace}

# Controlling Hardware From User Space {#userspace_control}

Much of the documentation for SPDK talks about _user space drivers_, so it's
important to understand what that means at a technical level. First and
foremost, a _driver_ is software that directly controls a particular device
attached to a computer. Second, operating systems segregate the system's
virtual memory into two categories of addresses based on privilege level:
[kernel space and user space](https://en.wikipedia.org/wiki/User_space). CPU
features called
[protection rings](https://en.wikipedia.org/wiki/Protection_ring) enforce this
separation. Typically, drivers run in kernel space (i.e. ring 0 on x86). SPDK
contains drivers that are instead designed to run in user space, but they still
interface directly with the hardware device that they are controlling.

In order for SPDK to take control of a device, it must first instruct the
operating system to relinquish control. This is often referred to as unbinding
the kernel driver from the device and on Linux is done by
[writing to a file in sysfs](https://lwn.net/Articles/143397/).
SPDK then rebinds the device to one of two special device drivers that come
bundled with Linux:
[uio](https://www.kernel.org/doc/html/latest/driver-api/uio-howto.html) or
[vfio](https://www.kernel.org/doc/Documentation/vfio.txt). These two drivers
are "dummy" drivers in the sense that they mostly indicate to the operating
system that the device has a driver bound to it so it won't automatically try
to re-bind the default driver. They don't actually initialize the hardware in
any way, nor do they even understand what type of device it is. The primary
difference between uio and vfio is that vfio is capable of programming the
platform's
[IOMMU](https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit),
which is a critical piece of hardware for ensuring memory safety in user space
drivers. See @ref memory for full details.

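The unbind and rebind steps can be sketched as a short list of sysfs writes.
The following is a minimal illustration, not necessarily how SPDK's own tooling
does it: the function names `rebind_plan` and `apply_plan` and the PCI address
`0000:01:00.0` are hypothetical, and the sketch assumes a Linux kernel that
exposes the `driver_override` and `drivers_probe` files under `/sys/bus/pci`.
The plan is only computed here; applying it requires root and a real device.

```python
from pathlib import PurePosixPath

SYSFS_PCI = PurePosixPath("/sys/bus/pci")


def rebind_plan(bdf: str, new_driver: str = "vfio-pci") -> list[tuple[str, str]]:
    """Return the (sysfs file, value) writes that move a PCI device to new_driver.

    Nothing is written here; the caller decides whether to apply the plan.
    """
    device = SYSFS_PCI / "devices" / bdf
    return [
        # 1. Detach the kernel driver that currently owns the device.
        #    (This file only exists while some driver is bound.)
        (str(device / "driver" / "unbind"), bdf),
        # 2. Tell the PCI core which driver should claim the device next.
        (str(device / "driver_override"), new_driver),
        # 3. Ask the PCI core to probe the device again, which binds new_driver.
        (str(SYSFS_PCI / "drivers_probe"), bdf),
    ]


def apply_plan(plan: list[tuple[str, str]]) -> None:
    """Perform the writes. Requires root privileges and a real device."""
    for path, value in plan:
        with open(path, "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Hypothetical PCI address, printed as the equivalent shell commands.
    for path, value in rebind_plan("0000:01:00.0"):
        print(f"echo {value} > {path}")
```

Passing `new_driver="uio_pci_generic"` would produce the equivalent plan for
uio instead of vfio.
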
Once the device is unbound from the operating system kernel, the operating
system can't use it anymore. For example, if you unbind an NVMe device on Linux,
the devices corresponding to it such as /dev/nvme0n1 will disappear. Any
filesystems mounted on the device become unavailable as well, and kernel
filesystems can no longer interact with the device. In fact, the entire kernel
block storage stack is no longer involved. Instead, SPDK provides re-imagined
implementations of most of the layers in a typical operating system storage
stack, all as C libraries that can be directly embedded into your application.
This includes a [block device abstraction layer](@ref bdev) primarily, but
also [block allocators](@ref blob) and [filesystem-like components](@ref blobfs).

User space drivers utilize features in uio or vfio to map the
[PCI BAR](https://en.wikipedia.org/wiki/PCI_configuration_space) for the device
into the current process, which allows the driver to perform
[MMIO](https://en.wikipedia.org/wiki/Memory-mapped_I/O) directly. The SPDK
@ref nvme, for instance, maps the BAR for the NVMe device and then follows along
with the
[NVMe Specification](http://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf)
to initialize the device, create queue pairs, and ultimately send I/O.

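To make the idea of MMIO through a mapped BAR concrete, here is a small sketch
that treats a memory mapping as a register file. It is an illustration only: a
temporary file stands in for the BAR (a real driver would obtain the mapping
through uio or vfio and needs carefully ordered, fixed-width accesses), and the
helper names `read32`, `decode_version`, and `make_fake_bar` are hypothetical.
The register offsets and the layout of the Version register follow the NVMe
specification's controller register map.

```python
import mmap
import struct
import tempfile

# Offsets from the NVMe specification's controller register map.
NVME_REG_CAP = 0x00  # Controller Capabilities, 64 bits
NVME_REG_VS = 0x08   # Version, 32 bits


def read32(bar: mmap.mmap, offset: int) -> int:
    """Read a little-endian 32-bit register from the mapped region."""
    return struct.unpack_from("<I", bar, offset)[0]


def decode_version(vs: int) -> tuple[int, int, int]:
    """Split the VS register into (major, minor, tertiary)."""
    return (vs >> 16) & 0xFFFF, (vs >> 8) & 0xFF, vs & 0xFF


def make_fake_bar(size: int = 4096) -> mmap.mmap:
    """Map a temp file that stands in for a BAR, pre-loaded with VS = 1.3.0."""
    f = tempfile.TemporaryFile()
    f.truncate(size)
    bar = mmap.mmap(f.fileno(), size)
    struct.pack_into("<I", bar, NVME_REG_VS, 0x00010300)
    return bar


if __name__ == "__main__":
    bar = make_fake_bar()
    major, minor, tertiary = decode_version(read32(bar, NVME_REG_VS))
    print(f"controller reports NVMe {major}.{minor}.{tertiary}")
    bar.close()
```

Once the BAR is mapped, reading a device register is just a load from an
address in the process, with no system call involved.
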
# Interrupts {#userspace_interrupts}

SPDK polls devices for completions instead of waiting for interrupts. There
are two main reasons for doing this: 1) practically speaking, routing an
interrupt to a handler in a user space process just isn't feasible for most
hardware designs, 2) interrupts introduce software jitter and have significant
overhead due to forced context switches. Operations in SPDK are almost
universally asynchronous and allow the user to provide a callback on
completion. The callback is called in response to the user calling a function
to poll for completions. Polling an NVMe device is fast because only host
memory needs to be read (no MMIO) to check a queue pair for a bit flip.
Technologies such as Intel's
[DDIO](https://www.intel.com/content/www/us/en/io/data-direct-i-o-technology.html)
also ensure that the host memory being checked is present in the CPU cache
after an update by the device.

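The "bit flip" check can be illustrated with a toy completion ring. In NVMe,
each completion queue entry carries a phase bit that the device inverts every
time it wraps around the ring, so the host can detect new entries by reading
host memory alone. The sketch below simulates that scheme in plain Python. The
class names `CompletionQueue` and `FakeDevice` are hypothetical, the entry
layout is heavily simplified, and there is no protection against the device
overrunning the host.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Completion:
    command_id: int = 0
    phase: int = 0  # inverted by the device each time it wraps the ring


class CompletionQueue:
    def __init__(self, depth: int):
        self.entries = [Completion() for _ in range(depth)]
        self.head = 0            # next slot the host will check
        self.expected_phase = 1  # phase value that marks a new entry
        self.callbacks: dict[int, Callable[[int], None]] = {}

    def poll(self, max_completions: int = 0) -> int:
        """Run callbacks for finished commands; return how many were found.

        A max_completions of 0 means "no limit".
        """
        done = 0
        while max_completions == 0 or done < max_completions:
            entry = self.entries[self.head]
            if entry.phase != self.expected_phase:
                break  # the device has not written this slot yet
            self.callbacks.pop(entry.command_id)(entry.command_id)
            done += 1
            self.head += 1
            if self.head == len(self.entries):
                self.head = 0
                self.expected_phase ^= 1  # the device flips its phase on wrap too
        return done


class FakeDevice:
    """Stand-in for hardware that posts completions into the host-memory ring."""

    def __init__(self, cq: CompletionQueue):
        self.cq = cq
        self.tail = 0
        self.phase = 1

    def complete(self, command_id: int) -> None:
        self.cq.entries[self.tail] = Completion(command_id, self.phase)
        self.tail += 1
        if self.tail == len(self.cq.entries):
            self.tail = 0
            self.phase ^= 1
```

An application would register a callback per outstanding command and call
`poll()` from its main loop. When nothing has completed, the call costs a
single memory read and a comparison.
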
# Threading {#userspace_threading}

NVMe devices expose multiple queues for submitting requests to the hardware.
Separate queues can be accessed without coordination, so software can send
requests to the device from multiple threads of execution in parallel without
locks. Unfortunately, kernel drivers must be designed to handle I/O coming
from lots of different places either in the operating system or in various
processes on the system, and the thread topology of those processes changes
over time. Most kernel drivers elect to map hardware queues to cores (as close
to 1:1 as possible), and then when a request is submitted they look up the
correct hardware queue for whatever core the current thread happens to be
running on. Often, they'll need to either acquire a lock around the queue or
temporarily disable interrupts to guard against preemption from threads
running on the same core, which can be expensive. This per-core mapping is a
large improvement over older hardware interfaces that only had a single queue
or no queue at all, but it still isn't always optimal.

A user space driver, on the other hand, is embedded into a single application.
This application knows exactly how many threads (or processes) exist
because the application created them. Therefore, the SPDK drivers choose to
expose the hardware queues directly to the application with the requirement
that a hardware queue is only ever accessed from one thread at a time. In
practice, applications assign one hardware queue to each thread (as opposed to
one hardware queue per core in kernel drivers). This guarantees that the thread
can submit requests without having to perform any sort of coordination (i.e.
locking) with the other threads in the system.
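
The one-queue-per-thread rule can be sketched as follows. This is a toy model
rather than SPDK code: `HardwareQueue`, `worker`, and `run` are hypothetical
names, and a `deque` stands in for a real hardware submission queue. The point
is that each thread touches only the queue it owns, so the submission path
takes no lock. The ownership check is there only to catch misuse.

```python
import threading
from collections import deque


class HardwareQueue:
    """Stand-in for one hardware queue; only its owner thread may use it."""

    def __init__(self, qid: int):
        self.qid = qid
        self.owner = None
        self.submitted = deque()

    def submit(self, request: str) -> None:
        me = threading.get_ident()
        if self.owner is None:
            self.owner = me  # the first thread to use the queue becomes its owner
        # Enforce the single-thread rule instead of taking a lock.
        assert self.owner == me, "queue used from a second thread"
        self.submitted.append(request)


def worker(queue: HardwareQueue, count: int) -> None:
    """Submit `count` requests on the queue owned by the calling thread."""
    for i in range(count):
        queue.submit(f"q{queue.qid}-io{i}")


def run(num_threads: int, ios_per_thread: int) -> list[HardwareQueue]:
    # The application creates the threads, so it knows how many queues it needs.
    queues = [HardwareQueue(qid) for qid in range(num_threads)]
    threads = [
        threading.Thread(target=worker, args=(q, ios_per_thread)) for q in queues
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return queues


if __name__ == "__main__":
    for q in run(num_threads=4, ios_per_thread=3):
        print(q.qid, list(q.submitted))
```

Because no queue is shared, adding threads adds queues rather than contention
on a common lock.
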