user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.
* Single-threaded kernel RPC server. Adding support for a multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.
* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.
Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks
Solaris and AIX.
fcntl(fd, F_DUP2FD, arg) and dup2(fd, arg) are functionally equivalent.
Document it.
Add some regression tests (identical to the dup2(2) regression tests).
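
A minimal sketch of the equivalence, assuming a platform where F_DUP2FD
may or may not be defined. dup_onto() and dup_onto_demo() are hypothetical
helper names used only for illustration; they are not part of this change.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Duplicate fd onto the descriptor number target.
 * Uses fcntl(F_DUP2FD) where the platform defines it and falls back to
 * dup2() elsewhere. Hypothetical helper, not part of the commit.
 */
int
dup_onto(int fd, int target)
{
	assert(fd >= 0 && target >= 0);
#ifdef F_DUP2FD
	return (fcntl(fd, F_DUP2FD, target));
#else
	return (dup2(fd, target));
#endif
}

/*
 * Check that a byte written through the duplicate arrives on the
 * original pipe. Returns 0 on success, -1 on any failure.
 */
int
dup_onto_demo(void)
{
	int fds[2], target, rv = -1;
	char c = 0;

	if (pipe(fds) == -1)
		return (-1);
	target = fds[1] + 10;	/* assumed to be an unused descriptor number */
	if (dup_onto(fds[1], target) == target &&
	    write(target, "x", 1) == 1 &&
	    read(fds[0], &c, 1) == 1 && c == 'x')
		rv = 0;
	close(target);
	close(fds[0]);
	close(fds[1]);
	return (rv);
}
```

Either branch of dup_onto() should return the target descriptor number on
success, which is what the demo checks.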
PR: 120233
Submitted by: Jukka Ukkonen
Approved by: rwatson (mentor)
MFC after: 1 month
and assignment.
- Add a reference to a struct cpuset in each thread that is inherited from
the thread that created it.
- Release the reference when the thread is destroyed.
- Add prototypes for syscalls and macros for manipulating cpusets in
sys/cpuset.h
- Add syscalls to create, get, and set new numbered cpusets:
cpuset(), cpuset_{get,set}id()
- Add syscalls for getting and setting affinity masks for cpusets or
individual threads: cpuset_{get,set}affinity()
- Add types for the 'level' and 'which' parameters for the cpuset. This
will permit expansion of the api to cover cpu masks for other objects
identifiable with an id_t integer. For example, IRQs and Jails may be
coming soon.
- The root set 0 contains all valid cpus. All threads initially belong to
cpuset 1. This permits migrating all threads off of certain cpus to
reserve them for special applications.
Sponsored by: Nokia
Discussed with: arch, rwatson, brooks, davidxu, deischen
Reviewed by: antoine
referencing the file's VM pages are returned from the network stack,
making changes to the file safe.
This flag does not guarantee that the data has been transmitted to the
other end.
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
object which provides the backing store. Each descriptor starts off with
a size of zero, but the size can be altered via ftruncate(2). The shared
memory file descriptors also support fstat(2). read(2), write(2),
ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
manage shared memory file descriptors. The virtual namespace that maps
pathnames to shared memory file descriptors is implemented as a hash
table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
path argument to shm_open(2). In this case, an unnamed shared memory
file descriptor will be created similar to the IPC_PRIVATE key for
shmget(2). Note that the shared memory object can still be shared among
processes by sharing the file descriptor via fork(2) or sendmsg(2), but
it is unnamed. This effectively serves to implement the getmemfd() idea
bandied about the lists several times over the years.
- The backing store for a shared memory file descriptor is garbage
collected when it is not referenced by any open file descriptors or
the shm_open(2) virtual namespace.
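
For illustration, a sketch of how a pathname could be mapped to a hash
bucket with a 32-bit FNV hash. This uses the published FNV-1a variant and
its standard constants; the exact variant, initial value and table size
used by the kernel are not stated here, so treat them as assumptions.
fnv1a_32() and shm_bucket() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV1A_32_OFFSET	2166136261u	/* standard 32-bit offset basis */
#define FNV1A_32_PRIME	16777619u	/* standard 32-bit FNV prime */

/* 32-bit FNV-1a over a NUL-terminated pathname. */
uint32_t
fnv1a_32(const char *path)
{
	const unsigned char *p;
	uint32_t h = FNV1A_32_OFFSET;

	assert(path != NULL);
	for (p = (const unsigned char *)path; *p != '\0'; p++) {
		h ^= *p;		/* mix in one byte */
		h *= FNV1A_32_PRIME;	/* wraps modulo 2^32 */
	}
	return (h);
}

/*
 * Map a pathname to a bucket index. nbuckets is assumed to be a
 * non-zero power of two so that masking selects the low bits.
 */
size_t
shm_bucket(const char *path, size_t nbuckets)
{
	assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
	return (fnv1a_32(path) & (nbuckets - 1));
}
```

A lookup would then walk the chain at shm_bucket(path, nbuckets) and
compare full pathnames, since different paths can share a bucket.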
Submitted by: dillon, peter (previous versions)
Submitted by: rwatson (I based this on his version)
Reviewed by: alc (suggested converting getmemfd() to shm_open())
is seems to be a problem for SUID applications, which we like to
prevent as much as possible.
PR: docs/39530
Submitted by: Soren Spies <sspies at apple dot com>
MFC After: 3 days
a module was loaded might make the pathname inaccurate.
I wonder if an inode reference should be stored with the pathname
to allow a validity check?
Suggested by: rwatson@
for kldstat(2).
This allows libdtrace to determine the exact file from which
a kernel module was loaded without having to guess.
The kldstat(2) API is versioned with the size of the
kld_file_stat structure, so this change creates version 2.
Add the pathname to the verbose output of kldstat(8) too.
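
A sketch of the size-as-version convention described above, using
hypothetical structures (demo_stat_v1, demo_stat_v2) and a hypothetical
demo_stat() function rather than the real kld_file_stat layout. The
caller stores the size of its own structure in the version field and the
callee copies out only as many bytes as that layout holds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical version 1 layout. */
struct demo_stat_v1 {
	int	version;	/* caller sets this to sizeof(its struct) */
	char	name[32];
};

/* Hypothetical version 2 layout: version 1 plus a trailing field. */
struct demo_stat_v2 {
	int	version;
	char	name[32];
	char	pathname[64];
};

/*
 * Fill in buf according to the layout the caller declared through the
 * version field. Returns 0 on success, -1 for an unknown size.
 */
int
demo_stat(void *buf)
{
	struct demo_stat_v2 full;
	int version;

	assert(buf != NULL);
	memcpy(&version, buf, sizeof(version));
	if (version != (int)sizeof(struct demo_stat_v1) &&
	    version != (int)sizeof(struct demo_stat_v2))
		return (-1);
	memset(&full, 0, sizeof(full));
	full.version = version;
	strcpy(full.name, "demo.ko");			/* made-up data */
	strcpy(full.pathname, "/boot/kernel/demo.ko");	/* made-up data */
	/* Old callers receive only the prefix they know about. */
	memcpy(buf, &full, (size_t)version);
	return (0);
}
```

Because version 1 is a strict prefix of version 2, binaries built against
the old structure keep working without recompilation.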
MFC: 3 days
call the pad-less versions of the corresponding syscalls if the running
kernel supports it. Check kern.osreldate once per program and cache the
result to select the appropriate syscall. This maintains userland
compatibility with kernel.old's from quite a while back.
Approved by: re (kensmith)
syscalls, unless WITHOUT_SYSCALL_COMPAT is defined. The default case
will have the .c wrappers still. If you define WITHOUT_SYSCALL_COMPAT,
the .c wrappers will go away and libc will make direct syscalls.
After 7-stable starts, the direct syscall method will be default.
Approved by: re (kensmith)
Add IMPLEMENTATION NOTES section explaining in detail the effect this
system call has in common use cases involving PF_INET and PF_INET6 sockets.
PR: kern/84761
MFC after: 2 days
effective group ID (and any of our groups) doesn't match the group ID of the
file, we get EPERM. This doesn't conform to POSIX. POSIX requires that we
return 0, but silently clear the set-gid bit.
- The O_NONBLOCK flag has to be set; if it is not set, open(2) will wait for
another process to open the fifo for reading.
- Use O_WRONLY, which implies that the file has to be opened _only_ for writing.
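
A sketch of the case these two conditions appear to describe, assuming it
is the POSIX rule that opening a FIFO with O_WRONLY|O_NONBLOCK fails with
ENXIO while no process has it open for reading. fifo_open_errno() is a
hypothetical helper and the /tmp path is an arbitrary choice.

```c
#define _XOPEN_SOURCE 700	/* for mkdtemp() and mkfifo() declarations */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a FIFO in a fresh temporary directory and open it write-only
 * and non-blocking while no reader exists. Returns the errno from
 * open(2), 0 if the open unexpectedly succeeded, or -1 if the setup
 * itself failed.
 */
int
fifo_open_errno(void)
{
	char dir[] = "/tmp/fifodemo.XXXXXX";
	char path[sizeof(dir) + 8];
	int error, fd, n;

	if (mkdtemp(dir) == NULL)
		return (-1);
	n = snprintf(path, sizeof(path), "%s/fifo", dir);
	assert(n > 0 && (size_t)n < sizeof(path));
	if (mkfifo(path, 0600) == -1) {
		rmdir(dir);
		return (-1);
	}
	/* Without O_NONBLOCK this open would block waiting for a reader. */
	fd = open(path, O_WRONLY | O_NONBLOCK);
	error = (fd == -1) ? errno : 0;
	if (fd != -1)
		close(fd);
	unlink(path);
	rmdir(dir);
	return (error);
}
```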
This is quite a tricky situation, because we allow opening a file with
O_RDONLY|O_TRUNC. O_TRUNC modifies a file, but we actually don't open
it for writing. EISDIR is also returned when we try to open a directory
O_RDONLY|O_TRUNC, which is correct.
POSIX says that "The result of using O_TRUNC with O_RDONLY is undefined."
We choose to accept it (Solaris did the same), which is why "to be modified"
seems more accurate to me.
flag set, rmdir(2) returns EPERM.
- If the parent directory of the directory to be removed has its immutable or
append-only flag set, rmdir(2) returns EPERM.
immutable or append-only flag set, rename(2) returns EPERM.
- If the parent directory of the file pointed at by the to argument has its
immutable flag set, rename(2) returns EPERM.
objects with SF_IMMUTABLE, SF_APPEND, or SF_NOUNLINK.
* Document that non-superusers cannot set or clear any SF_* flag
(setting fails with EPERM, clearing is silently ignored).
* Document that superusers cannot change any flag if one of
SF_IMMUTABLE, SF_APPEND, SF_NOUNLINK is set and securelevel is
greater than 0.
* Document SF_SNAPSHOT and note that it is maintained by the
system and is, for this reason, impossible for any user to set
or clear.
PR: docs/33877
Submitted by: harti
Help by: George Marsellis <gam9478@njit.edu>
MFC after: 1 week
documentation bug. We switched to page indexes some time around
FreeBSD 2.2. The actual 'len' limit is the maximum file size or what
will fit in your address space, whichever comes first. It should be
possible to make 1TB files on 32 bit systems, but of course address space
runs out long before then.
behaviour of returning EINVAL when ".." is passed as either argument
has been restored.
rmdir("..") now returns EINVAL instead of EPERM. Document the
previously undocumented behaviour of rmdir(".") returning EINVAL
as required by POSIX and SUSv3. Bump the man page change date.
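
A small sketch for checking the documented error, assuming only the
portable rmdir(".") case. rmdir_errno() is a hypothetical helper; it must
only be given paths that are safe to remove, since a successful call
really deletes the directory.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Try to remove path and report the result: 0 if the directory was
 * removed, otherwise the errno value set by rmdir(2).
 */
int
rmdir_errno(const char *path)
{
	assert(path != NULL);
	errno = 0;
	if (rmdir(path) == 0)
		return (0);
	return (errno);
}
```

rmdir_errno(".") is expected to report EINVAL. With this change
rmdir_errno("..") should report EINVAL as well; other systems may return
a different error for that case.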
undelete("..") now returns EINVAL instead of EPERM. Bump the man
page change date.
MFC after: 3 days
other systems it prevents a tty from becoming a controlling tty on the
open. O_SYNC is the POSIX name for O_FSYNC.
The Markup Police may need to tweak my references to standards.
than the value of backlog argument.
- Document the fact that subsequent listen(2) calls on the listening
socket change the backlog argument.
- Note that current listen queue lengths can be queried using netstat(1).
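
A minimal sketch of calling listen(2) again on a socket that is already
listening in order to change its backlog. relisten_demo() is a
hypothetical helper, and the socket is bound to an ephemeral port chosen
by the system.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a TCP socket on an ephemeral port and call listen(2) twice with
 * different backlog values. Returns 0 if both calls succeed, -1
 * otherwise.
 */
int
relisten_demo(int first, int second)
{
	struct sockaddr_in sin;
	int rv = -1, s;

	assert(first >= 0 && second >= 0);
	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = 0;		/* let the system pick a port */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == 0 &&
	    listen(s, first) == 0 &&	/* initial backlog */
	    listen(s, second) == 0)	/* second call adjusts the backlog */
		rv = 0;
	close(s);
	return (rv);
}
```

The sketch only shows that the second call is accepted; the resulting
queue limits can be inspected with netstat(1) as noted above.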
Submitted by: Igor Sysoev <is rambler-co.ru>
Wording by: gnn
and writev() except that they take an additional offset argument and do
not change the current file position. In SAT speak:
preadv:readv::pread:read and pwritev:writev::pwrite:write.
- Try to reduce code duplication some by merging most of the old
kern_foov() and dofilefoo() functions into new dofilefoo() functions
that are called by kern_foov() and kern_pfoov(). The non-v functions
now all generate a simple uio on the stack from the passed in arguments
and then call kern_foov(). For example, read() now just builds a uio and
calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().
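
A short userland sketch of the new calls, assuming a temporary file from
tmpfile(3). It writes two buffers at offset 4 with pwritev(), reads them
back with preadv(), and checks that the file position is still 0.
pvec_demo() is a hypothetical name.

```c
#define _DEFAULT_SOURCE		/* exposes preadv()/pwritev() on glibc */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Scatter/gather I/O at an explicit offset. On success the text read
 * back is stored in out (at least 12 bytes) and 0 is returned.
 */
int
pvec_demo(char *out, size_t outlen)
{
	char a[] = "hello ", b[] = "world";
	char r1[6], r2[5];
	struct iovec wv[2] = {
		{ .iov_base = a, .iov_len = 6 },
		{ .iov_base = b, .iov_len = 5 }
	};
	struct iovec rv[2] = {
		{ .iov_base = r1, .iov_len = sizeof(r1) },
		{ .iov_base = r2, .iov_len = sizeof(r2) }
	};
	FILE *fp;
	int fd, ok = -1;

	assert(out != NULL && outlen >= 12);
	if ((fp = tmpfile()) == NULL)
		return (-1);
	fd = fileno(fp);
	if (pwritev(fd, wv, 2, 4) == 11 &&	/* write at offset 4 */
	    preadv(fd, rv, 2, 4) == 11 &&	/* read back from offset 4 */
	    lseek(fd, 0, SEEK_CUR) == 0) {	/* position was not moved */
		memcpy(out, r1, sizeof(r1));
		memcpy(out + sizeof(r1), r2, sizeof(r2));
		out[11] = '\0';
		ok = 0;
	}
	fclose(fp);
	return (ok);
}
```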
PR: kern/80362
Submitted by: Marc Olzheim marcolz at stack dot nl (1)
Approved by: re (scottl)
MFC after: 1 week
reflects the actual behavior of the API
for listing extended attributes.
PR: docs/79261
Submitted by: rodrigc
Reviewed by: rwatson, kan
Approved by: das (mentor)
aio_write(2) completion through kevent(2). This method does not work on
64-bit architectures. It was deprecated in FreeBSD 4.4. See revisions
1.87 and 1.70.2.7.
Change aio_physwakeup() to call psignal(9) directly rather than indirectly
through a timeout(9). Discussed with: bde
Correct a bug introduced in revision 1.65 that could result in premature
delivery of a signal if an lio_listio(2) consisted of a mixture of
direct/raw and queued I/O operations. Observed by: tegge
Eliminate a field from struct kaioinfo that is now unused.
Reviewed by: tegge
negative (in addition to returning EINVAL when called on a descriptor
that is not a socket).
Submitted by: Arne H Juul <arnej@europe.yahoo-inc.com>
PR: docs/80587
a "null pointer".''
Making good use of the excellent explanations sent to me by Ruslan
Ermilov, Garrett Wollman and Bruce Evans, correct the descriptions of
null pointers. They are just "null pointers", not nil, not NULL or
".Dv NULL".
Suggested by: ru, wollman, bde
Reviewed by: ru, wollman
Pointy hat: keramida
documenting the obsoleteness of the msync(2) syscall and its single
remaining purpose.
PR: 70916
Submitted by: Radim Kolar <hsn@netmag.cz>
MFC after: 3 days