well-known race condition, which elimination was the reason for the
function appearance in first place. If sigmask supplied as argument to
pselect() enables a signal, the signal might be delivered before thread
called select(2), causing lost wakeup. Reimplement pselect() in kernel,
making change of sigmask and sleep atomic.
Since signal shall be delivered to the usermode, but sigmask restored,
set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK
should be cleared by ast() in case signal was not gelivered during
syscall execution.
Reviewed by: davidxu
Tested by: pho
MFC after: 1 month
* retry various system calls on EINTR
* retry the rest after a short read (common if there is more than about 1K
of output)
* block SIGCHLD like system(3) does (note that this does not and cannot
work fully in threaded programs, they will need to be careful with wait
functions)
PR: 90580
MFC after: 1 month
one path. When the list is empty (contain only a NULL pointer), return
EINVAL instead of pretending to succeed, which will cause a NULL pointer
deference in a later fts_read() call.
Noticed by: Christoph Mallon (via rdivacky@)
MFC after: 2 weeks
the type argument. This is known to fix some pthread_mutexattr_settype()
invocations, especially when it comes to pulseaudio.
Approved by: kib
deischen (threads)
MFC after: 3 days
other than the current system-wide size (32-bits) has been updated so
for now just cautiously turn the check off. While here fix the check
for IDs being too large which doesn't work due to type mis-matches.
Reviewed by: jhb (previous version)
Approved by: re (kib)
MFC after: 1 month (type mis-match fixes only)
- The uid/cuid members of struct ipc_perm are now uid_t instead of unsigned
short.
- The gid/cgid members of struct ipc_perm are now gid_t instead of unsigned
short.
- The mode member of struct ipc_perm is now mode_t instead of unsigned short
(this is merely a style bug).
- The rather dubious padding fields for ABI compat with SV/I386 have been
removed from struct msqid_ds and struct semid_ds.
- The shm_segsz member of struct shmid_ds is now a size_t instead of an
int. This removes the need for the shm_bsegsz member in struct
shmid_kernel and should allow for complete support of SYSV SHM regions
>= 2GB.
- The shm_nattch member of struct shmid_ds is now an int instead of a
short.
- The shm_internal member of struct shmid_ds is now gone. The internal
VM object pointer for SHM regions has been moved into struct
shmid_kernel.
- The existing __semctl(), msgctl(), and shmctl() system call entries are
now marked COMPAT7 and new versions of those system calls which support
the new ABI are now present.
- The new system calls are assigned to the FBSD-1.1 version in libc. The
FBSD-1.0 symbols in libc now refer to the old COMPAT7 system calls.
- A simplistic framework for tagging system calls with compatibility
symbol versions has been added to libc. Version tags are added to
system calls by adding an appropriate __sym_compat() entry to
src/lib/libc/incldue/compat.h. [1]
PR: kern/16195 kern/113218 bin/129855
Reviewed by: arch@, rwatson
Discussed with: kan, kib [1]
system callers of getgroups(), getgrouplist(), and setgroups() to
allocate buffers dynamically. Specifically, allocate a buffer of size
sysconf(_SC_NGROUPS_MAX)+1 (+2 in a few cases to allow for overflow).
This (or similar gymnastics) is required for the code to actually follow
the POSIX.1-2008 specification where {NGROUPS_MAX} may differ at runtime
and where getgroups may return {NGROUPS_MAX}+1 results on systems like
FreeBSD which include the primary group.
In id(1), don't pointlessly add the primary group to the list of all
groups, it is always the first result from getgroups(). In principle
the old code was more portable, but this was only done in one of the two
places where getgroups() was called to the overall effect was pointless.
Document the actual POSIX requirements in the getgroups(2) and
setgroups(2) manpages. We do not yet support a dynamic NGROUPS, but we
may in the future.
MFC after: 2 weeks
Last year I added SLIST_REMOVE_NEXT and STAILQ_REMOVE_NEXT, to remove
entries behind an element in the list, using O(1) time. I recently
discovered NetBSD also has a similar macro, called SLIST_REMOVE_AFTER.
In my opinion this approach is a lot better:
- It doesn't have the unused first argument of the list pointer. I added
this, mainly because OpenBSD also had it.
- The _AFTER suffix makes a lot more sense, because it is related to
SLIST_INSERT_AFTER. _NEXT is only used to iterate through the list.
The reason why I want to rename this now, is to make sure we don't
release a major version with the badly named macros.
the length by evaluating the value from the copy, cbuf instead. This
fixes a crash caused by previous commit (use-after-free)
Submitted by: Dimitry Andric <dimitry andric com>
Pointy hat to: delphij
The entire world seems to use the non-standard TIOCSCTTY ioctl to make a
TTY a controlling terminal of a session. Even though tcsetsid(3) is also
non-standard, I think it's a lot better to use in our own source code,
mainly because it's similar to tcsetpgrp(), tcgetpgrp() and tcgetsid().
I stole the idea from QNX. They do it the other way around; their
TIOCSCTTY is just a wrapper around tcsetsid(). tcsetsid() then calls
into an IPC framework.
dlfunc() called dlsym() to do the work, and dlsym() determines the dso
that originating the call by the return address. Due to this, dlfunc()
operated as if the caller is always the libc.
To fix this, move the dlfunc() to rtld, where it can call the internal
implementation of dlsym, and still correctly fetch return address.
Provide usual weak stub for the symbol from libc for static binaries.
dlfunc is put to FBSD_1.0 symver namespace in the ld.so export to
override dlfunc@FBSD_1.0 weak symbol, exported by libc.
Reported, analyzed and tested by: Tijl Coosemans <tijl ulyssis org>
PR: standards/133339
Reviewed by: kan
these functions were moved into the kernel:
- Move the version entries from gen/ to sys/. Since the ABI of the actual
routines did not change, I'm still exporting them as FBSD 1.0 on purpose.
- Add FBSD-private versions for the _ and __sys_ variants.
When calling setttyent() after calling endttyent(), pts_valid will never
be set to 1, because the readdir()-loop will likely never vind a pts
that has a higher number than before.
Simplify the code by removing pts_valid. We'll just set maxpts to -1
when we don't have a valid count yet.
It seems ttyslot() calls rindex(), to strip the device name to the last
slash, but this is obviously invalid. /dev/pts/0 should be stripped
until pts/0. Because /etc/ttys only supports TTY names in /dev/, just
strip this piece of the pathname.
A more elegant way of obtaining a name of a character device by its file
descriptor on FreeBSD, is to use the FIODGNAME ioctl. Because a valid
file descriptor implies a file descriptor is visible in /dev, it will
always resolve a valid device name.
I'm adding a more friendly wrapper for this ioctl, called fdevname(). It
is a lot easier to use than devname() and also has better error
handling. When a device name cannot be resolved, it will just return
NULL instead of a generated device name that makes no sense.
Discussed with: kib
Threading library calls _pre before the fork, allowing the rtld to
lock itself to ensure that other threads of the process are out of
dynamic linker. _post releases the locks.
This allows the rtld to have consistent state in the child. Although
child may legitimately call only async-safe functions, the call may
need plt relocation resolution, and this requires working rtld.
Reported and debugging help by: rink
Reviewed by: kan, davidxu
MFC after: 1 month (anyway, not before 7.1 is out)
This bring huge amount of changes, I'll enumerate only user-visible changes:
- Delegated Administration
Allows regular users to perform ZFS operations, like file system
creation, snapshot creation, etc.
- L2ARC
Level 2 cache for ZFS - allows to use additional disks for cache.
Huge performance improvements mostly for random read of mostly
static content.
- slog
Allow to use additional disks for ZFS Intent Log to speed up
operations like fsync(2).
- vfs.zfs.super_owner
Allows regular users to perform privileged operations on files stored
on ZFS file systems owned by him. Very careful with this one.
- chflags(2)
Not all the flags are supported. This still needs work.
- ZFSBoot
Support to boot off of ZFS pool. Not finished, AFAIK.
Submitted by: dfr
- Snapshot properties
- New failure modes
Before if write requested failed, system paniced. Now one
can select from one of three failure modes:
- panic - panic on write error
- wait - wait for disk to reappear
- continue - serve read requests if possible, block write requests
- Refquota, refreservation properties
Just quota and reservation properties, but don't count space consumed
by children file systems, clones and snapshots.
- Sparse volumes
ZVOLs that don't reserve space in the pool.
- External attributes
Compatible with extattr(2).
- NFSv4-ACLs
Not sure about the status, might not be complete yet.
Submitted by: trasz
- Creation-time properties
- Regression tests for zpool(8) command.
Obtained from: OpenSolaris
I believe this is not a valid C99 construct. Use directly calculated
offsets into the supplied buffer, using specified members length,
to fill appropriate structure.
Either use sysctl, or copy the value of the UNAME_x environment
variable, instead of unconditionally doing sysctl, and then
overriding a returned value with user-specified one.
Noted and tested by: rdivacky
Update referenced example to include unistd.h per manpage.
Update example to be more style(9)-ish, silence warnings and add
FreeBSD id to the source file.
review by secteam@ for the reasons mentioned below.
1) Rename /dev/urandom to /dev/random since urandom marked as
XXX Deprecated
alias in /sys/dev/random/randomdev.c
(this is our naming convention and no review by secteam@ required)
2) Set rs_stired flag after forced initialization to prevent
double stearing.
(this is already in OpenBSD, i.e. they don't have double stearing.
It means that this change matches their code path and no additional
secteam@ review required)
Submitted by: Thorsten Glaser <tg@mirbsd.de> (2)
to this commit, "env BLOCKSIZE=4X df" prints not only "4X: unknown
blocksize" as expected, but sometimes also "maximum blocksize is 1G"
and "minimum blocksize is 512" depending on what happened to be on
the stack.
Found by: LLVM/Clang Static Checker
assumed to be reviewd by them):
Stir directly from the kernel PRNG, without taking less random pid & time
bytes too (when it is possible).
The difference with OpenBSD code is that they have KERN_ARND sysctl for
that task, while we need to read /dev/random
wide string arguments.
Also simplify the code that handles length modifiers and make it
more conservative. For instance, be explicit about the modifiers
allowed for %d, rather than assuming that anything other than L,
q, t, or z implies an int argument.
I guess the original author of the popen() code didn't want to use our
<sys/queue.h> macro's, because the single linked list macro's didn't
offer O(1) deletion. Because I introduced SLIST_REMOVE_NEXT() some time
ago, we can now use the macro's here.
By converting the code to an SLIST, it is more consistent with other
parts of the C library and the operating system.
Reviewed by: csjp
Approved by: philip (mentor, implicit)
It seems I made a small bug when writing some of the posix_spawn(3)
manpages. Remove the redundant "Ed Schouten", which broke the AUTHORS
section.
Approved by: philip (mentor, implicit)
"If you don't get a review within a day or two, I would firmly recommend
backing out the changes"
back out all my changes, i.e. not comes from merging from OpenBSD as
unreviewed by secteam@ yet.
(OpenBSD changes stays in assumption they are reviewd by OpenBSD)
Yes, it means some old bugs returned, like not setted rs_stired = 1 in
arc4random_stir(3) causing double stirring.
1) Unindent and sort variables.
2) Indent struct members.
3) Remove _packed, use guaranteed >128 bytes size and only first 128
bytes from the structure.
4) Reword comment.
Obtained from: bde
2) Use gettimeofday() and getpid() only if reading from /dev/urandom
fails or impossible.
3) Discard N bytes on very first initialization only (i.e. don't
discard on re-stir).
4) Reduce N from 1024 to 512 as really suggested in the
"(Not So) Random Shuffles of RC4" paper:
http://research.microsoft.com/users/mironov/papers/rc4full.pdf
2) Eliminate "struct arc4_stream *as" arg since only single arg is
possible.
3) Set rs.j = rs.i after arc4random key schedule to be more like arc4
stream cipher.
Obtained from: OpenBSD
Adding exevpe() has caused some ports to break. Even though execvpe() is
a useful routine, it does not conform to any standards.
This patch is a little bit different from the patch sent to the mailing
list. I forgot to remove execvpe from the Symbol.map (which does not
seem to miscompile libc, though).
Reviewed by: davidxu
Approved by: philip
can be used as replacements for exec/fork in a lot of cases. This
change also added execvpe() which allows environment variable
PATH to be used for searching executable file, it is used for
implementing posix_spawnp().
PR: standards/122051
files after a seekdir().
The seekdir shall set the position for the next readdir operation.
When the _readdir_unlocked() encounters deleted entry, dd_loc is
already advanced. Continuing the loop leads to premature read of
the target entry.
Submitted by: Marc Balmer <mbalmer at openbsd org>
Obtained from: OpenBSD
MFC after: 2 weeks
deals with the usual __opendir2() calls, and the rest part with an interface
translator to expose fdopendir(3) functionality. Manual page was obtained from
kib@'s work for *at(2) system calls.
live in libm, while modf() lives in libc due to historical
mistakes. I'm claiming in the manpage that they all live in libm,
since programmers should not rely on the mistake.
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.
* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.
* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.
Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks
of the array length needed to store all the directory entries.
Although BSD has historically guaranteed that st_size is the size
of the directory file, POSIX does not, and more to the point, some
recent filesystems such as ZFS use st_size to mean something else.
The fix is to not stat the directory at all, set the initial
array size to 32 entries, and realloc it in powers of 2 if that
proves insufficient.
PR: 113668
{SHRT_MAX}, so {STREAM_MAX} should be no greater than that. (This
does not exactly meet the letter of POSIX but comes reasonably close
to it in spirit.)
MFC after: 14 days
fields in FTS and FTSENT structs being too narrow. In addition,
the narrow types creep from there into fts.c. As a result, fts(3)
consumers, e.g., find(1) or rm(1), can't handle file trees an ordinary
user can create, which can have security implications.
To fix the historic implementation of fts(3), OpenBSD and NetBSD
have already changed <fts.h> in somewhat incompatible ways, so we
are free to do so, too. This change is a superset of changes from
the other BSDs with a few more improvements. It doesn't touch
fts(3) functionality; it just extends integer types used by it to
match modern reality and the C standard.
Here are its points:
o For C object sizes, use size_t unless it's 100% certain that
the object will be really small. (Note that fts(3) can construct
pathnames _much_ longer than PATH_MAX for its consumers.)
o Avoid the short types because on modern platforms using them
results in larger and slower code. Change shorts to ints as
follows:
- For variables than count simple, limited things like states,
use plain vanilla `int' as it's the type of choice in C.
- For a limited number of bit flags use `unsigned' because signed
bit-wise operations are implementation-defined, i.e., unportable,
in C.
o For things that should be at least 64 bits wide, use long long
and not int64_t, as the latter is an optional type. See
FTSENT.fts_number aka FTS.fts_bignum. Extending fts_number `to
satisfy future needs' is pointless because there is fts_pointer,
which can be used to link to arbitrary data from an FTSENT.
However, there already are fts(3) consumers that require fts_number,
or fts_bignum, have at least 64 bits in it, so we must allow for them.
o For the tree depth, use `long'. This is a trade-off between making
this field too wide and allowing for 64-bit inode numbers and/or
chain-mounted filesystems. On the one hand, `long' is almost
enough for 32-bit filesystems on a 32-bit platform (our ino_t is
uint32_t now). On the other hand, platforms with a 64-bit (or
wider) `long' will be ready for 64-bit inode numbers, as well as
for several 32-bit filesystems mounted one under another. Note
that fts_level has to be signed because -1 is a magic value for it,
FTS_ROOTPARENTLEVEL.
o For the `nlinks' local var in fts_build(), use `long'. The logic
in fts_build() requires that `nlinks' be signed, but our nlink_t
currently is uint16_t. Therefore let's make the signed var wide
enough to be able to represent 2^16-1 in pure C99, and even 2^32-1
on a 64-bit platform. Perhaps the logic should be changed just
to use nlink_t, but it can be done later w/o breaking fts(3) ABI
any more because `nlinks' is just a local var.
This commit also inludes supporting stuff for the fts change:
o Preserve the old versions of fts(3) functions through libc symbol
versioning because the old versions appeared in all our former releases.
o Bump __FreeBSD_version just in case. There is a small chance that
some ill-written 3-rd party apps may fail to build or work correctly
if compiled after this change.
o Update the fts(3) manpage accordingly. In particular, remove
references to fts_bignum, which was a FreeBSD-specific hack to work
around the too narrow types of FTSENT members. Now fts_number is
at least 64 bits wide (long long) and fts_bignum is an undocumented
alias for fts_number kept around for compatibility reasons. According
to Google Code Search, the only big consumers of fts_bignum are in
our own source tree, so they can be fixed easily to use fts_number.
o Mention the change in src/UPDATING.
PR: bin/104458
Approved by: re (quite a while ago)
Discussed with: deischen (the symbol versioning part)
Reviewed by: -arch (mostly silence); das (generally OK, but we didn't
agree on some types used; assuming that no objections on
-arch let me to stick to my opinion)
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
object which provides the backing store. Each descriptor starts off with
a size of zero, but the size can be altered via ftruncate(2). The shared
memory file descriptors also support fstat(2). read(2), write(2),
ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
manage shared memory file descriptors. The virtual namespace that maps
pathnames to shared memory file descriptors is implemented as a hash
table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
path argument to shm_open(2). In this case, an unnamed shared memory
file descriptor will be created similar to the IPC_PRIVATE key for
shmget(2). Note that the shared memory object can still be shared among
processes by sharing the file descriptor via fork(2) or sendmsg(2), but
it is unnamed. This effectively serves to implement the getmemfd() idea
bandied about the lists several times over the years.
- The backing store for shared memory file descriptors are garbage
collected when they are not referenced by any open file descriptors or
the shm_open(2) virtual namespace.
Submitted by: dillon, peter (previous versions)
Submitted by: rwatson (I based this on his version)
Reviewed by: alc (suggested converting getmemfd() to shm_open())
when particular function can't be found in nsswitch-module. For
example, getgrouplist(3) will use module-supplied 'getgroupmembership'
function (which can work in an optimal way for such source as LDAP) and
will fall back to the stanard iterate-through-all-groups implementation
otherwise.
PR: ports/114655
Submitted by: Michael Hanselmann <freebsd AT hansmi DOT ch>
Reviewed by: brooks (mentor)
This commit includes the following core components:
* sample configuration file for sensorsd
* rc(8) script and glue code for sensorsd(8)
* sysctl(3) doc fixes for CTL_HW tree
* sysctl(3) documentation for hardware sensors
* sysctl(8) documentation for hardware sensors
* support for the sensor structure for sysctl(8)
* rc.conf(5) documentation for starting sensorsd(8)
* sensor_attach(9) et al documentation
* /sys/kern/kern_sensors.c
o sensor_attach(9) API for drivers to register ksensors
o sensor_task_register(9) API for the update task
o sysctl(3) glue code
o hw.sensors shadow tree for sysctl(8) internal magic
* <sys/sensors.h>
* HW_SENSORS definition for <sys/sysctl.h>
* sensors display for systat(1), including documentation
* sensorsd(8) and all applicable documentation
The userland part of the framework is entirely source-code
compatible with OpenBSD 4.1, 4.2 and -current as of today.
All sensor readings can be viewed with `sysctl hw.sensors`,
monitored in semi-realtime with `systat -sensors` and also
logged with `sensorsd`.
Submitted by: Constantine A. Murenin <cnst@FreeBSD.org>
Sponsored by: Google Summer of Code 2007 (GSoC2007/cnst-sensors)
Mentored by: syrinx
Tested by: many
OKed by: kensmith
Obtained from: OpenBSD (parts)
call the pad-less versions of the corresponding syscalls if the running
kernel supports it. Check kern.osreldate once per program and cache the
result to select the appropriate syscall. This maintains userland
compatability with kernel.old's from quite a while back.
Approved by: re (kensmith)
net: endhostdnsent is named _endhostdnsent and is
private to netdb family of functions.
posix1e: acl_size.c has been never compiled in,
so there's no "acl_size".
rpc: "getnetid" is a static function.
stdtime: "gtime" is #ifdef'ed out in the source.
some symbols are specific only to some architectures,
e.g., ___tls_get_addr is only defined on i386.
__htonl, __htons, __ntohl and __ntohs are no longer
functions, they are now (internal) defines in
<machine/endian.h>.
Submitted by: ru
on int, but in fact it should operate on long.
- Introduce 'lvalue' variable, which is long.
- Fix _SC_XOPEN_SHM for 64bit archs.
- Fix _SC_PHYS_PAGES for 64bit archs.
Reported by: simokawa
- Use lvalue for pathconf(3), as it returns long.
- Cast value explicitly to long on return.
unmount jail-friendly file systems from within a jail.
Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and
PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user.
It is turned off by default.
A jail-friendly file system is a file system which driver registers
itself with VFCF_JAIL flag via VFS_SET(9) API.
The lsvfs(1) command can be used to see which file systems are
jail-friendly ones.
There currently no jail-friendly file systems, ZFS will be the first one.
In the future we may consider marking file systems like nullfs as
jail-friendly.
Reviewed by: rwatson
The symptom is that syslog() fails to log anything but the "ident"
string if LOG_PERROR is specified to openlog(3) and the extensible
printf is in action.
For unclear, likely quaint historical reasons, syslog uses fwopen()
on a stack buffer, rather than using the more straightforward
and faster snprintf().
Along the way, fflush(3) is called, and since the callback writer
function returns zero instead of the length "written", __SERR
naturally gets set on the filedescriptor.
The extensible printf, in difference from the normal printf refuses
to output anything to an __SERR marked filedescriptor, and thus
the actual syslog message is supressed.
MFC: after 2 weeks
in rev. 1.34. Mainly I missed the fact that the buffer is used for two
purposes:
1) storing a group line from the group file;
2) __gr_parse_entry() parses the buffer and tries to put the group
members to the remaining part of the buffer and can fail if there
is no enough room for them.
Re-arrange the buffer size checks to account the latter case.
Submitted by: Kirk R Webb
MFC after: 2 weeks
If the initial buffer size (1KB) for the given group line is not big
enough, reset the offset. It helps to do not miss this line when
getrg() reallocates the larger buffer and tries to parse the line again.
PR: bin/52433, kern/55031, bin/83696, misc/97640, misc/98111
Submitted by: bsw71@mail.ru, Philip M. Gollucci, Justin Erenkrantz
Glanced at: nectar
MFC after: 1 month