d0bd125135
MFC after: 1 week
Sponsored by: Mellanox Technologies

Introduction
============

libibverbs is a library that allows programs to use RDMA "verbs" for
direct access to RDMA (currently InfiniBand and iWARP) hardware from
userspace. For more information on RDMA verbs, see the InfiniBand
Architecture Specification vol. 1, especially chapter 11, and the RDMA
Consortium's RDMA Protocol Verbs Specification.

Using libibverbs
================

Device nodes
------------

The verbs library expects special character device files named
/dev/infiniband/uverbsN to be created. When you load the kernel
modules, including both the low-level driver for your IB hardware as
well as the ib_uverbs module, you should see one or more uverbsN
entries in /sys/class/infiniband_verbs in addition to the
/dev/infiniband/uverbsN character device files.

To create the appropriate character device files automatically with
udev, a rule like

    KERNEL="uverbs*", NAME="infiniband/%k"

can be used. This will create device nodes named

    /dev/infiniband/uverbs0

and so on. Since the RDMA userspace verbs should be safe for use by
non-privileged users, you may want to add an appropriate MODE or GROUP
to your udev rule.

Permissions
-----------

To use IB verbs from userspace, a process must be able to access the
appropriate /dev/infiniband/uverbsN special device file. You can
check the permissions on this file with the command

    ls -l /dev/infiniband/uverbs*

Make sure that the permissions on these files are such that the
user/group that your verbs program runs as can access the device file.

To use IB verbs from userspace, a process must also have permission to
tell the kernel to lock sufficient memory for all of your registered
memory regions as well as the memory used internally by IB resources
such as queue pairs (QPs) and completion queues (CQs). To check your
resource limits, use the command

    ulimit -l

(or "limit memorylocked" for csh-like shells). If you see a small
number such as 32 (the units are KB) then you will need to increase
this limit.
The locked-memory limit is usually raised for ordinary users via the
file /etc/security/limits.conf. More configuration may be necessary
if you are logging in via OpenSSH and your sshd is configured to use
privilege separation.

Valgrind support
----------------

When running applications that use libibverbs under the Valgrind
memory-checking debugger, Valgrind will falsely report "read from
uninitialized" for memory that was initialized by the kernel drivers.
Specifically, Valgrind cannot see when kernel drivers write to
userspace memory, so when the process reads from that memory, Valgrind
incorrectly assumes that the memory contents are uninitialized, and
therefore raises a warning.

libibverbs can be built with specific support for the Valgrind
memory-checking debugger by specifying the --with-valgrind command
line argument to configure. This flag enables code in libibverbs to
tell Valgrind "this memory may look uninitialized, but it's really
OK," which therefore suppresses the incorrect "read from
uninitialized" warnings. This code adds trivial overhead to the
critical performance path, so it is disabled by default. The intent
is that production users can use a "normal" build of libibverbs and
developers can use the "valgrind debug" build by simply switching
their LD_LIBRARY_PATH environment variables.

libibverbs needs some header files from Valgrind in order to compile
this support; it is important to use the header files from the same
version of Valgrind that will be used at run time. You may need to
specify the directory where Valgrind's header files are installed as
an argument to --with-valgrind. For example

    ./configure --with-valgrind=/opt/valgrind

will make the libibverbs build look for Valgrind headers in
/opt/valgrind/include

Reporting bugs
==============

Bugs should be reported to the OpenFabrics mailing list
<general@lists.openfabrics.org>.
In your bug report, please include:

 * Information about your system:
   - Linux distribution and version
   - Linux kernel and version
   - InfiniBand/iWARP hardware and firmware version
   - ... any other relevant information

 * How to reproduce the bug. Command line arguments for a libibverbs
   example program or source code that other developers can compile
   and run are most convenient.

 * If the bug is a crash, the exact output printed out when the crash
   occurred, including any kernel messages produced.

 * If a verbs call is mysteriously returning an error or failing, the
   output of "strace -ewrite -ewrite=all <command>".

Submitting patches
==================

Patches should also be submitted to the OpenFabrics mailing list
<general@lists.openfabrics.org>. Please use unified diff form (the -u
option to GNU diff), and include a good description of what your patch
does and why it should be applied. If your patch fixes a bug, please
make sure to describe the bug and how your fix works.

Please include a change to the ChangeLog file (in standard GNU
changelog format) as part of your patch.

Make sure that your contribution can be licensed under the same
license as the original code you are patching, and that you have all
necessary permissions to release your work.

TODO
====

1.1 series
----------

The libibverbs API and ABI are frozen for all releases in the 1.1
series. Methods were added to struct ibv_context to implement the
following features, so it should be possible to add them in a future
release in the 1.1 series:

 * Memory window (MW) support.

 * Implement the reregister memory region (MR) verb. We will add an
   extension to the IB spec to allow the application to indicate that
   the region is only being extended, and that operations in progress
   should _not_ fail (contrary to the IB spec, which states that
   reregister must be implemented so that it behaves equivalently to
   a deregister followed by a register).

Other possibilities
-------------------

There are no plans to implement the following features, which would be
needed for completeness but don't seem particularly useful. However,
if there is demand from application developers or an implementation is
contributed, then the feature may be added.

 * Implement the query address handle (AH) verb.

 * Implement the query memory region (MR) verb.