The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and
O_NONBLOCK (on both sides) as part of the function.
If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p).
If the pointer is not valid, behaviour differs: pipe2() writes into the
array from the kernel like socketpair() does, while pipe() writes into the
array from an architecture-specific assembler wrapper.
Reviewed by: kan, kib
The accept4() function, compared to accept(), allows setting the new file
descriptor atomically close-on-exec and explicitly controlling the
non-blocking status on the new socket. (Note that the latter point means
that accept() is not equivalent to any form of accept4().)
The linuxulator's accept4 implementation leaves a race window where the new
file descriptor is not close-on-exec because it calls sys_accept(). This
implementation leaves no such race window (by using falloc() flags). The
linuxulator could be fixed and simplified by using the new code.
Like accept(), accept4() is async-signal-safe, a cancellation point and
permitted in capability mode.
There are no getdtablesize() bounds on the file descriptor to be duplicated;
it only has to be open. If the RLIMIT_NOFILE rlimit was decreased after
opening the file descriptor, it may be greater than or equal to
getdtablesize() but still valid.
MFC after: 1 week
The functions utx_active_add(), utx_active_remove(), utx_lastlogin_add() and
utx_log_add() set errno to 0 if they are successful. This not only violates
POSIX if pututxline() is successful, but may also overwrite a valid error
with 0 if, for example, utx_lastlogin_add() fails while utx_log_add()
succeeds.
Reviewed by: ed
signal.
- Fix the old ksem implementation for POSIX semaphores to not restart
sem_wait() or sem_timedwait() if interrupted by a signal.
MFC after: 1 week
extattr_set_{fd,file,link} is logically a write(2)-like operation and
should return ssize_t, just like extattr_get_*. Also, the user-space
utility was using an int for the return value of extattr_get_* and
extattr_list_*, both of which return an ssize_t.
MFC after: 1 week
Words in shell script are separated by spaces or tabs independent of the
value of IFS. The value of IFS is only relevant for the result of
substitutions. Therefore, there should be a space between 'wordexp' and the
words to be expanded, not an IFS character.
Paranoia might dictate that the shell ignore IFS from the environment (even
though our sh currently uses it), so do not depend on it in the new test
case.
While almost nobody uses O_ASYNC, and rightly so, the inheritance of the
related properties across accept() is a portability issue like the
inheritance of O_NONBLOCK.
on signed integer overflow wrapping. Otherwise mktime(3) and timegm(3)
can hang, in case the timestamp passed in struct tm is not representable
in a time_t. Specifically, any timestamp after 2038-01-19 03:14:07, in
combination with a 32-bit time_t.
Note that it would be better to change the code to not rely on undefined
behaviour, but it is contributed code, and it is not entirely trivial to
fix the issue properly.
MFC after: 3 days
u_long. Before this change it was of type int for syscalls, but prototypes
in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not
for lchflags(2)) stated that it was u_long. Now some related functions
use u_long type for flags (strtofflags(3), fflagstostr(3)).
- Make path argument of type 'const char *' for consistency.
Discussed on: arch
Sponsored by: The FreeBSD Foundation
This change allows creating file descriptors with close-on-exec set in some
situations. SOCK_CLOEXEC and SOCK_NONBLOCK can be OR'ed in socket() and
socketpair()'s type parameter, and MSG_CMSG_CLOEXEC to recvmsg() makes file
descriptors (SCM_RIGHTS) atomically close-on-exec.
The numerical values for SOCK_CLOEXEC and SOCK_NONBLOCK are as in NetBSD.
MSG_CMSG_CLOEXEC is the first free bit for MSG_*.
The SOCK_* flags are not passed to MAC because this may cause incorrect
failures and can be done later via fcntl() anyway. On the other hand, audit
is expected to cope with the new flags.
For MSG_CMSG_CLOEXEC, unp_externalize() is extended to take a flags
argument.
Reviewed by: kib
multibyte support[0] and the new functions strenvisx and strsenvisx.
Add MLINKS for vis(3) functions add by this and the initial import from
NetBSD[1].
PR: bin/166364, bin/175418
Submitted by: "J.R. Oldroyd" <fbsd@opal.com>[0]
stefanf[1]
Obtained from: NetBSD
MFC after: 2 weeks
It is almost always a bug if nscd closes the connection unexpectedly but
programs should not be killed with SIGPIPE for it.
Reviewed by: bushman
Tested by: Jan Beich
MFC after: 1 week
int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);
which allow to bind and connect respectively to a UNIX domain socket with a
path relative to the directory associated with the given file descriptor 'fd'.
- Add manual pages for the new syscalls.
- Make the new syscalls available for processes in capability mode sandbox.
- Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on
the directory descriptor for the syscalls to work.
- Update audit(4) to support those two new syscalls and to handle path
in sockaddr_un structure relative to the given directory descriptor.
- Update procstat(1) to recognize the new capability rights.
- Document the new capability rights in cap_rights_limit(2).
Sponsored by: The FreeBSD Foundation
Discussed with: rwatson, jilles, kib, des
- Capability is no longer separate descriptor type. Now every descriptor
has set of its own capability rights.
- The cap_new(2) system call is left, but it is no longer documented and
should not be used in new code.
- The new syscall cap_rights_limit(2) should be used instead of
cap_new(2), which limits capability rights of the given descriptor
without creating a new one.
- The cap_getrights(2) syscall is renamed to cap_rights_get(2).
- If CAP_IOCTL capability right is present we can further reduce allowed
ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
ioctls can be retrived with cap_ioctls_get(2) syscall.
- If CAP_FCNTL capability right is present we can further reduce fcntls
that can be used with the new cap_fcntls_limit(2) syscall and retrive
them with cap_fcntls_get(2).
- To support ioctl and fcntl white-listing the filedesc structure was
heavly modified.
- The audit subsystem, kdump and procstat tools were updated to
recognize new syscalls.
- Capability rights were revised and eventhough I tried hard to provide
backward API and ABI compatibility there are some incompatible changes
that are described in detail below:
CAP_CREATE old behaviour:
- Allow for openat(2)+O_CREAT.
- Allow for linkat(2).
- Allow for symlinkat(2).
CAP_CREATE new behaviour:
- Allow for openat(2)+O_CREAT.
Added CAP_LINKAT:
- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
- Allow to be target for renameat(2).
Added CAP_SYMLINKAT:
- Allow for symlinkat(2).
Removed CAP_DELETE. Old behaviour:
- Allow for unlinkat(2) when removing non-directory object.
- Allow to be source for renameat(2).
Removed CAP_RMDIR. Old behaviour:
- Allow for unlinkat(2) when removing directory.
Added CAP_RENAMEAT:
- Required for source directory for the renameat(2) syscall.
Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
- Allow for unlinkat(2) on any object.
- Required if target of renameat(2) exists and will be removed by this
call.
Removed CAP_MAPEXEC.
CAP_MMAP old behaviour:
- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
PROT_WRITE.
CAP_MMAP new behaviour:
- Allow for mmap(2)+PROT_NONE.
Added CAP_MMAP_R:
- Allow for mmap(PROT_READ).
Added CAP_MMAP_W:
- Allow for mmap(PROT_WRITE).
Added CAP_MMAP_X:
- Allow for mmap(PROT_EXEC).
Added CAP_MMAP_RW:
- Allow for mmap(PROT_READ | PROT_WRITE).
Added CAP_MMAP_RX:
- Allow for mmap(PROT_READ | PROT_EXEC).
Added CAP_MMAP_WX:
- Allow for mmap(PROT_WRITE | PROT_EXEC).
Added CAP_MMAP_RWX:
- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).
Renamed CAP_MKDIR to CAP_MKDIRAT.
Renamed CAP_MKFIFO to CAP_MKFIFOAT.
Renamed CAP_MKNODE to CAP_MKNODEAT.
CAP_READ old behaviour:
- Allow pread(2).
- Disallow read(2), readv(2) (if there is no CAP_SEEK).
CAP_READ new behaviour:
- Allow read(2), readv(2).
- Disallow pread(2) (CAP_SEEK was also required).
CAP_WRITE old behaviour:
- Allow pwrite(2).
- Disallow write(2), writev(2) (if there is no CAP_SEEK).
CAP_WRITE new behaviour:
- Allow write(2), writev(2).
- Disallow pwrite(2) (CAP_SEEK was also required).
Added convinient defines:
#define CAP_PREAD (CAP_SEEK | CAP_READ)
#define CAP_PWRITE (CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ)
#define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
#define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W)
#define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X)
#define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X)
#define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
#define CAP_RECV CAP_READ
#define CAP_SEND CAP_WRITE
#define CAP_SOCK_CLIENT \
(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
#define CAP_SOCK_SERVER \
(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
CAP_SETSOCKOPT | CAP_SHUTDOWN)
Added defines for backward API compatibility:
#define CAP_MAPEXEC CAP_MMAP_X
#define CAP_DELETE CAP_UNLINKAT
#define CAP_MKDIR CAP_MKDIRAT
#define CAP_RMDIR CAP_UNLINKAT
#define CAP_MKFIFO CAP_MKFIFOAT
#define CAP_MKNOD CAP_MKNODAT
#define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)
Sponsored by: The FreeBSD Foundation
Reviewed by: Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with: rwatson, benl, jonathan
ABI compatibility discussed with: kib
system call, which has a nice property - it never fails, so it is a bit
easier to use. If there is no support for capability mode in the kernel
the function will return false (not in a sandbox). If the kernel is compiled
with the support for capability mode, the function will return true or false
depending if the calling process is in the capability mode sandbox or not
respectively.
Sponsored by: The FreeBSD Foundation
routines provide write-only stdio FILE objects that store their data in a
dynamically allocated buffer. They are a string builder interface somewhat
akin to a completely dynamic sbuf.
Reviewed by: bde, jilles (earlier versions)
MFC after: 1 month
* Reopen the directory using openat(fd, ".", ...) instead of opening the
pathname again. This fixes a race condition where the meaning of the
pathname changes and allows a reopen with fdopendir().
* Always reopen the directory for union stacks, not only when DTF_REWIND
is passed. Applications should be able to fchdir(dirfd(dir)) and
*at(dirfd(dir), ...). DTF_REWIND now does nothing.
example from bsearch(3) too, so that we don't have to duplicate
the example code in both places.
PR: docs/176197
Reviewed by: stefanf
Approved by: remko (mentor), gjb (mentor)
MFC after: 1 week
- Remove unused #include.
- Do not cast away const.
- Use the canonical idiom to compare two numbers.
- Use proper type for sizes, i.e. size_t instead of int.
- Correct indentation.
- Simplify printf("\n") to puts("").
- Use return instead of exit() in main().
Submitted by: Christoph Mallon, christoph.mallon at gmx.de
Approved by: gjb (mentor)
Reviewed by: stefanf
MFC after: 1 week
qsort(3) can work together to sort an array of integers.
PR: docs/176197
Submitted by: Fernando, fapesteguia at opensistemas.com
Approved by: gjb (mentor)
MFC after: 1 week
now disables read-ahead. It used to effectively restore the system default
readahead hueristic if it had been changed; a negative value now restores
the default.
Reviewed by: kib
this is not a problem as they are resolved by libgcc. The exception is for
the __aeabi_mem* functions. These call back into libc to the appropriate
function. This causes issues for static binaries as we only link against
libc once so there is no way for it to call into libgcc and back.
The fix for this is to include these symbols in libc but keep them hidden
so binaries use the libgcc version.
There are uncommon cases where fts_safe_changedir() may be called with a
non-NULL name that is not "..". Do not block or worse if an attacker put (a
(symlink to) a fifo or device where a directory used to be.
MFC after: 1 week
with the user's namespace.
- Correct size and position variables type from long to size_t.
- Do not set errno to ENOMEM on malloc failure, as malloc already does so.
- Implement the concept of "buffer data length", which mandates what SEEK_END
refers to and the allowed extent for a read.
- Use NULL as read-callback if the buffer is opened in write-only mode.
Conversely, use NULL as write-callback when opened in read-only mode.
- Implement the handling of the ``b'' character in the mode argument. A binary
buffer differs from a text buffer (default mode if ``b'' is omitted) in that
NULL bytes are never appended to writes and that the "buffer data length"
equals to the size of the buffer.
- Remove shall from the man page. Use indicative instead. Also, specify that
the ``b'' flag does not conform with POSIX but is supported by glibc.
- Update the regression test so that the ``b'' functionality and the "buffer
data length" concepts are tested.
- Minor style(9) corrections.
Suggested by: jilles
Reviewed by: cognet
Approved by: cognet
but use normal references instead of weak. This makes the statically
linked binaries to use fast gettimeofday(2) by forcing the linker to
resolve references and providing the neccessary functions.
Reported by: bde
Tested by: marius (sparc64)
MFC after: 2 weeks
in r7 and use ip to store the old version of r7 as it is not guaranteed to
be kept when calling a subroutine. The kernel will preserve the register
across system calls.
path longer than this.
- Fix an unreached case of check against sizeof buf, which in turn leads
to an off-by-one nul byte write on the stack. The original condition
can never be satisfied because the passed boundary is the maximum value
that can be returned, so code was harmless.
MFC after: 1 month
NetBSD's. This output size limited versions of vis and unvis functions
as well as a set of vis variants that allow arbitrary characters to be
specified for encoding.
Finally, MIME Quoted-Printable encoding as described in RFC 2045 is
supported.
A fork/exec could happen between open and fcntl, leaking a file descriptor.
Using O_CLOEXEC fixes this and as a side effect simplifies the code.
NetBSD already had this (I checked this after making the change myself).
Reviewed by: gabor
The changes were derived from what has been committed to NetBSD, with
modifications. These are:
1. Preserve the existsing GLOB_LIMIT behaviour by including the number
of matches to the set of parameters to limit.
2. Change some of the limits to avoid impacting normal use cases:
GLOB_LIMIT_STRING - change from 65536 to ARG_MAX so that glob(3)
can still provide a full command line of expanded names.
GLOB_LIMIT_STAT - change from 128 to 1024 for no other reason than
that 128 feels too low (it's not a limit that impacts the
behaviour of the test program listed in CVE-2010-2632).
GLOB_LIMIT_PATH - change from 1024 to 65536 so that glob(3) can
still provide a fill command line of expanded names.
3. Protect against buffer overruns when we hit the GLOB_LIMIT_STAT or
GLOB_LIMIT_READDIR limits. We append SEP and EOS to pathend in
those cases. Return GLOB_ABORTED instead of GLOB_NOSPACE when we
would otherwise overrun the buffer.
This change also modifies the existing behaviour of glob(3) in case
GLOB_LIMIT is specifies by limiting the *new* matches and not all
matches. This is an important distinction when GLOB_APPEND is set or
when the caller uses a non-zero gl_offs. Previously pre-existing
matches or the value of gl_offs would be counted in the number of
matches even though the man page states that glob(3) would return
GLOB_NOSPACE when gl_matchc or more matches were found.
The limits that cannot be circumvented are GLOB_LIMIT_STRING and
GLOB_LIMIT_PATH all others can be crossed by simply calling glob(3)
again and with GLOB_APPEND set.
The entire description above applies only when GLOB_LIMIT has been
specified of course. No limits apply when this flag isn't set!
Obtained from: Juniper Networks, Inc
equivalent to malloc(size). This eliminates the conditional expression
used for calling either realloc() or malloc() when realloc() will do
all the time.
free and clear the gl_pathv pointer in the glob_t structure. Such
breaks the invariant of the glob_t structure, as stated in the comment
right in front of the globextend() function. If gl_pathv was non-NULL,
then gl_pathc was > 0. Making gl_pathv a NULL pointer without also
setting gl_pathc to 0 is wrong.
Since we otherwise don't free the memory associated with a glob_t in
error cases, it's unlikely that this change will cause a memory leak
that wasn't already there to begin with. Callers of glob(3) must
call globfree(3) irrespective of whether glob(3) returned an error
or not.
This commit adds a new mode option 'e' that must follow any 'b', '+' and/or
'x' options. C11 is clear about the 'x' needing to follow 'b' and/or '+' and
that is what we implement; therefore, require a strict position for 'e' as
well.
For freopen() with a non-NULL path argument and fopen(), the close-on-exec
flag is set iff the 'e' mode option is specified. For freopen() with a NULL
path argument and fdopen(), the close-on-exec flag is turned on if the 'e'
mode option is specified and remains unchanged otherwise.
Although the same behaviour for fopen() can be obtained by open(O_CLOEXEC)
and fdopen(), this needlessly complicates the calling code.
Apart from the ordering requirement, the new option matches glibc.
PR: kern/169320
libc.a and libc_p.a. In addition, define isnan in libm.a and libm_p.a,
but not in libm.so.
This makes it possible to statically link executables using both isnan
and isnanf with libc and libm.
Tested by: kargl
MFC after: 1 week
output and replace it with a new visible sysctl kern.ipc.acceptqueue
of the same functionality. It specifies the maximum length of the
accept queue on a listen socket.
The old kern.ipc.somaxconn remains available for reading and writing
for compatibility reasons so that existing programs, scripts and
configurations continue to work. There no plans to ever remove the
orginal and now hidden kern.ipc.somaxconn.
This adds two features:
* uid_from_user() and gid_from_group() as the reverse of user_from_uid()
and groups_from_gid().
* pwcache_userdb() and pwcache_groupdb() which allow alternative lookup
functions to be used. For example lookups from passwd and group
databases in a non-standard location.
srandomdev(). This doesn't actually work
with any modern C compiler:
In particular, both clang and modern gcc
verisons silently elide any xor operation
with 'junk'.
Approved by: secteam
MFC after: 3 days
After further discussion, instead of pretending to use
uid_t and gid_t as upstream Solaris and linux try to, we
are better using u_int, which is in fact what the code
can handle and best approaches the range of values used
by uid and gid.
Discussed with: bde
Reviewed by: bde
The previous change (based on Solaris) doesn't work properly either
as the casting only has the effect of quieting the compiler.
Move back to the previous solution but adjust the sizeof()
type in xdr_array(). This should mostly work (by accident).
Reported by: bde
1) Don't iterate the loop from the environment array beginning each time,
iterate it under the last place we deactivate instead.
2) Call __rebuild_environ() not on each iteration but once, only at the end
of whole loop (of course, only in case if something is changed).
MFC after: 1 week
As part of the previous commit, uses of xdr_int() were replaced
with xdr_u_int(). This has undesired effects as the second
argument doesn't match exactly uid_t or gid_t. It also breaks
assumptions in the size of the provided types.
To work around those issues we revert back to the use of xdr_int()
but provide proper casting so the behaviour doesn't change.
While here fix a style issue in the affected lines.
Reported by: bde
When creating a client with clnt_tli_create, it uses strdup to copy
strings for these fields if nconf is passed in. clnt_dg_destroy frees
these strings already. Make sure clnt_vc_destroy frees them in the same
way.
This change matches the reference (OpenSolaris) implementation.
Tested by: David Wolfskill
Obtained from: Bull GNU/Linux NFSv4 Project (libtirpc)
MFC after: 2 weeks
__rpc_getconfip is supposed to return the first netconf
entry supporting tcp or udp, respectively. The code will
currently return the *last* entry, plus it will leak
memory when there is more than one such entry.
This change matches the reference (OpenSolaris)
implementation.
Tested by: David Wolfskill
Obtained from: Bull GNU/linux NFSv4 Project (libtirpc)
MFC after: 1 week
to craft environment variables with similar names like that:
a=1
a=2
...
unsetenv("a") should remove them all to make later getenv("a") impossible.
Fix it to do so (this is GNU autoconf test #3 failure too).
PR: 172273
MFC after: 1 week
This fixes a race condition where another thread may fork() before CLOEXEC
is set, unintentionally passing the descriptor to the child process.
This commit only adds O_CLOEXEC flags to open() or openat() calls where no
fcntl(fd, F_SETFD, FD_CLOEXEC) follows. The separate fcntl() call still
leaves a race window so it should be fixed later.
Because fts keeps internal file descriptors open across calls, making such
descriptors close-on-exec helps not only multi-threaded applications but
also single-threaded applications.
In particular, this prevents passing a temporary file descriptor for saving
the current directory to processes created via find -exec.
The attempt to merge changes from the linux libtirpc caused
rpc.lockd to exit after startup under unclear conditions.
After many hours of selective experiments and inconsistent results
the conclusion is that it's better to just revert everything and
restart in a future time with a much smaller subset of the
changes.
____
MFC after: 3 days
Reported by: David Wolfskill
Tested by: David Wolfskill
Passing an invalid pointer results in undefined behaviour.
The wrappers in libthr access some of the data pointed to by the arguments
in userland, so that an invalid pointer will cause a signal and not an
[EFAULT] error return.
Furthermore, if the [EFAULT] error occurs when the kernel is writing, it is
not a proper error in the sense that the call still commits (changing the
signal disposition or accepting the signal).
MFC after: 1 week