From 5f03f96fbefbb5c68a5d7d06728ff5b4a05f87b0 Mon Sep 17 00:00:00 2001 From: Mark Johnston Date: Fri, 3 Feb 2023 10:55:30 -0500 Subject: [PATCH] shm: Document shm_create_largepage() While here, move notes about FreeBSD-specific functionality to the COMPATIBILITY section, and document the ECAPMODE error for shm_open(). Reviewed by: pauamma, kib MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38282 --- lib/libc/sys/Makefile.inc | 1 + lib/libc/sys/shm_open.2 | 171 +++++++++++++++++++++++++++++++++++--- 2 files changed, 162 insertions(+), 10 deletions(-) diff --git a/lib/libc/sys/Makefile.inc b/lib/libc/sys/Makefile.inc index 5e2c3da198b0..a86d7d160b6c 100644 --- a/lib/libc/sys/Makefile.inc +++ b/lib/libc/sys/Makefile.inc @@ -482,6 +482,7 @@ MLINKS+=setuid.2 setegid.2 \ setuid.2 setgid.2 MLINKS+=shmat.2 shmdt.2 MLINKS+=shm_open.2 memfd_create.3 \ + shm_open.2 shm_create_largepage.3 \ shm_open.2 shm_unlink.2 \ shm_open.2 shm_rename.2 MLINKS+=sigwaitinfo.2 sigtimedwait.2 diff --git a/lib/libc/sys/shm_open.2 b/lib/libc/sys/shm_open.2 index 4c03288b6bbe..a728bd0d4abf 100644 --- a/lib/libc/sys/shm_open.2 +++ b/lib/libc/sys/shm_open.2 @@ -28,11 +28,11 @@ .\" .\" $FreeBSD$ .\" -.Dd June 25, 2021 +.Dd January 30, 2023 .Dt SHM_OPEN 2 .Os .Sh NAME -.Nm memfd_create , shm_open , shm_rename, shm_unlink +.Nm memfd_create , shm_create_largepage , shm_open , shm_rename, shm_unlink .Nd "shared memory object operations" .Sh LIBRARY .Lb libc @@ -43,6 +43,14 @@ .Ft int .Fn memfd_create "const char *name" "unsigned int flags" .Ft int +.Fo shm_create_largepage +.Fa "const char *path" +.Fa "int flags" +.Fa "int psind" +.Fa "int alloc_policy" +.Fa "mode_t mode" +.Fc +.Ft int .Fn shm_open "const char *path" "int flags" "mode_t mode" .Ft int .Fn shm_rename "const char *path_from" "const char *path_to" "int flags" @@ -51,7 +59,7 @@ .Sh DESCRIPTION The .Fn shm_open -system call opens (or optionally creates) a +function opens (or optionally creates) a POSIX shared memory object named .Fa path . @@ -114,9 +122,7 @@ see and .Xr fcntl 2 . .Pp -As a -.Fx -extension, the constant +The constant .Dv SHM_ANON may be used for the .Fa path @@ -143,6 +149,131 @@ will fail with All other flags are ignored. .Pp The +.Fn shm_create_largepage +function behaves similarly to +.Fn shm_open , +except that the +.Dv O_CREAT +flag is implicitly specified, and the returned +.Dq largepage +object is always backed by aligned, physically contiguous chunks of memory. +This ensures that the object can be mapped using so-called +.Dq superpages , +which can improve application performance in some workloads by reducing the +number of translation lookaside buffer (TLB) entries required to access a +mapping of the object, +and by reducing the number of page faults performed when accessing a mapping. +This happens automatically for all largepage objects. +.Pp +An existing largepage object can be opened using the +.Fn shm_open +function. +Largepage shared memory objects behave slightly differently from non-largepage +objects: +.Bl -bullet -offset indent +.It +Memory for a largepage object is allocated when the object is +extended using the +.Xr ftruncate 2 +system call, whereas memory for regular shared memory objects is allocated +lazily and may be paged out to a swap device when not in use. +.It +The size of a mapping of a largepage object must be a multiple of the +underlying large page size. +Most attributes of such a mapping can only be modified at the granularity +of the large page size. +For example, when using +.Xr munmap 2 +to unmap a portion of a largepage object mapping, or when using +.Xr mprotect 2 +to adjust protections of a mapping of a largepage object, the starting address +must be large page size-aligned, and the length of the operation must be a +multiple of the large page size. +If not, the corresponding system call will fail and set +.Va errno +to +.Er EINVAL . +.El +.Pp +The +.Fa psind +argument to +.Fn shm_create_largepage +specifies the size of large pages used to back the object. +This argument is an index into the page sizes array returned by +.Xr getpagesizes 3 . +In particular, all large pages backing a largepage object must be of the +same size. +For example, on a system with large page sizes of 2MB and 1GB, a 2GB largepage +object will consist of either 1024 2MB pages, or 2 1GB pages, depending on +the value specified for the +.Fa psind +argument. +The +.Fa alloc_policy +parameter specifies what happens when an attempt to use +.Xr ftruncate 2 +to allocate memory for the object fails. +The following values are accepted: +.Bl -tag -offset indent -width SHM_ +.It Dv SHM_LARGEPAGE_ALLOC_DEFAULT +If the (non-blocking) memory allocation fails because there is insufficient free +contiguous memory, the kernel will attempt to defragment physical memory and +try another allocation. +The subsequent allocation may or may not succeed. +If this subsequent allocation also fails, +.Xr ftruncate 2 +will fail and set +.Va errno +to +.Er ENOMEM . +.It Dv SHM_LARGEPAGE_ALLOC_NOWAIT +If the memory allocation fails, +.Xr ftruncate 2 +will fail and set +.Va errno +to +.Er ENOMEM . +.It Dv SHM_LARGEPAGE_ALLOC_HARD +The kernel will attempt defragmentation until the allocation succeeds, +or an unblocked signal is delivered to the thread. +However, it is possible for physical memory to be fragmented such that the +allocation will never succeed. +.El +.Pp +The +.Dv FIOSSHMLPGCNF +and +.Dv FIOGSHMLPGCNF +.Xr ioctl 2 +commands can be used with a largepage shared memory object to get and set +largepage object parameters. +Both commands operate on the following structure: +.Bd -literal +struct shm_largepage_conf { + int psind; + int alloc_policy; +}; + +.Ed +The +.Dv FIOGSHMLPGCNF +command populates this structure with the current values of these parameters, +while the +.Dv FIOSSHMLPGCNF +command modifies the largepage object. +Currently only the +.Va alloc_policy +parameter may be modified. +Internally, +.Fn shm_create_largepage +works by creating a regular shared memory object using +.Fn shm_open , +and then converting it into a largepage object using the +.Dv FIOSSHMLPGCNF +ioctl command. +.Pp +The .Fn shm_rename system call atomically removes a shared memory object named .Fa path_from @@ -162,10 +293,6 @@ Return an error if an shm exists at .Fa path_to , rather than unlinking it. .El -.Fn shm_rename -is also a -.Fx -extension. .Pp The .Fn shm_unlink @@ -235,6 +362,17 @@ All functions return -1 on failure, and set to indicate the error. .Sh COMPATIBILITY The +.Fn shm_create_largepage +and +.Fn shm_rename +functions are +.Fx +extensions, as is support for the +.Dv SHM_ANON +value in +.Fn shm_open . +.Pp +The .Fa path , .Fa path_from , and @@ -377,6 +515,18 @@ and are specified and the named shared memory object does exist. .It Bq Er EACCES The required permissions (for reading or reading and writing) are denied. +.It Bq Er ECAPMODE +The process is running in capability mode (see +.Xr capsicum 4 ) +and attempted to create a named shared memory object. +.El +.Pp +.Fn shm_create_largepage +can fail for the reasons listed above. +It also fails with these error codes for the following conditions: +.Bl -tag -width Er +.It Bq Er ENOTTY +The kernel does not support large pages on the current platform. .El .Pp The following errors are defined for @@ -425,6 +575,7 @@ requires write permission to the shared memory object. .Xr close 2 , .Xr fstat 2 , .Xr ftruncate 2 , +.Xr ioctl 2 , .Xr mmap 2 , .Xr munmap 2 , .Xr sendfile 2