freebsd-skq

Author	SHA1	Message	Date
John Baldwin	dccc45e4c0	Move the cleanup of f_cdevpriv when the reference count of a devfs file descriptor drops to zero out of _fdrop() and into devfs_close_f() as it is only relevant for devfs file descriptors. Reviewed by: kib MFC after: 1 week	2011-11-04 03:39:31 +00:00
Robert Watson	b160c14194	Correct a bug in export of capability-related information from the sysctls supporting procstat -f: properly provide capability rights information to userspace. The bug resulted from a merge-o during upstreaming (or rather, a failure to properly merge FreeBSD-side changed downstream). Spotted by: des, kibab MFC after: 3 days	2011-10-12 12:08:03 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Jonathan Anderson	cfb5f76865	Add experimental support for process descriptors A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable. New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination. When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style. This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0. Reviewed by: jhb, kib Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc	2011-08-18 22:51:30 +00:00
Konstantin Belousov	9c00bb9190	Add the fo_chown and fo_chmod methods to struct fileops and use them to implement fchown(2) and fchmod(2) support for several file types that previously lacked it. Add MAC entries for chown/chmod done on posix shared memory and (old) in-kernel posix semaphores. Based on the submission by: glebius Reviewed by: rwatson Approved by: re (bz)	2011-08-16 20:07:47 +00:00
Jonathan Anderson	69d377fe1b	Allow Capsicum capabilities to delegate constrained access to file system subtrees to sandboxed processes. - Use of absolute paths and '..' are limited in capability mode. - Use of absolute paths and '..' are limited when looking up relative to a capability. - When a name lookup is performed, identify what operation is to be performed (such as CAP_MKDIR) as well as check for CAP_LOOKUP. With these constraints, openat() and friends are now safe in capability mode, and can then be used by code such as the capability-mode runtime linker. Approved by: re (bz), mentor (rwatson) Sponsored by: Google Inc	2011-08-13 09:21:16 +00:00
Robert Watson	a9d2f8d84f	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc	2011-08-11 12:30:23 +00:00
Jonathan Anderson	c30b9b5169	Export capability information via sysctls. When reporting on a capability, flag the fact that it is a capability, but also unwrap to report all of the usual information about the underlying file. Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc	2011-07-20 09:53:35 +00:00
Jonathan Anderson	745bae379d	Add implementation for capabilities. Code to actually implement Capsicum capabilities, including fileops and kern_capwrap(), which creates a capability to wrap an existing file descriptor. We also modify kern_close() and closef() to handle capabilities. Finally, remove cap_filelist from struct capability, since we don't actually need it. Approved by: mentor (rwatson), re (Capsicum blanket) Sponsored by: Google Inc	2011-07-15 09:37:14 +00:00
Jonathan Anderson	5604e481b1	Fix the "passability" test in fdcopy(). Rather than checking to see if a descriptor is a kqueue, check to see if its fileops flags include DFLAG_PASSABLE. At the moment, these two tests are equivalent, but this will change with the addition of capabilities that wrap kqueues but are themselves of type DTYPE_CAPABILITY. We already have the DFLAG_PASSABLE abstraction, so let's use it. This change has been tested with [the newly improved] tools/regression/kqueue. Approved by: mentor (rwatson), re (Capsicum blanket) Sponsored by: Google Inc	2011-07-08 12:19:25 +00:00
Edward Tomasz Napierala	afcc55f318	All the racct_*() calls need to happen with the proc locked. Fixing this won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".	2011-07-06 20:06:44 +00:00
Jonathan Anderson	9acdfe6549	Rework _fget to accept capability parameters. This new version of _fget() requires new parameters: - cap_rights_t needrights the rights that we expect the capability's rights mask to include (e.g. CAP_READ if we are going to read from the file) - cap_rights_t haverights used to return the capability's rights mask (ignored if NULL) - u_char maxprotp the maximum mmap() rights (e.g. VM_PROT_READ) that can be permitted (only used if we are going to mmap the file; ignored if NULL) - int fget_flags FGET_GETCAP if we want to return the capability itself, rather than the underlying object which it wraps Approved by: mentor (rwatson), re (Capsicum blanket) Sponsored by: Google Inc	2011-07-05 13:45:10 +00:00
Jonathan Anderson	c0467b5e6e	When Capsicum starts creating capabilities to wrap existing file descriptors, we will want to allocate a new descriptor without installing it in the FD array. Split falloc() into falloc_noinstall() and finstall(), and rewrite falloc() to call them with appropriate atomicity. Approved by: mentor (rwatson), re (bz)	2011-06-30 15:22:49 +00:00
Stanislav Sedov	ff6f41a472	- Do no try to drop a NULL filedesc pointer.	2011-05-12 10:56:33 +00:00
Stanislav Sedov	0daf62d9f5	- Commit work from libprocstat project. These patches add support for runtime file and processes information retrieval from the running kernel via sysctl in the form of new library, libprocstat. The library also supports KVM backend for analyzing memory crash dumps. Both procstat(1) and fstat(1) utilities have been modified to take advantage of the library (as the bonus point the fstat(1) utility no longer need superuser privileges to operate), and the procstat(1) utility is now able to display information from memory dumps as well. The newly introduced fuser(1) utility also uses this library and able to operate via sysctl and kvm backends. The library is by no means complete (e.g. KVM backend is missing vnode name resolution routines, and there're no manpages for the library itself) so I plan to improve it further. I'm commiting it so it will get wider exposure and review. We won't be able to MFC this work as it relies on changes in HEAD, which was introduced some time ago, that break kernel ABI. OTOH we may be able to merge the library with KVM backend if we really need it there. Discussed with: rwatson	2011-05-12 10:11:39 +00:00
Edward Tomasz Napierala	722581d9e6	Add RACCT_NOFILE accounting. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-06 19:13:04 +00:00
Konstantin Belousov	1fe80828e7	After the r219999 is merged to stable/8, rename fallocf(9) to falloc(9) and remove the falloc() version that lacks flag argument. This is done to reduce the KPI bloat. Requested by: jhb X-MFC-note: do not	2011-04-01 13:28:34 +00:00
Konstantin Belousov	246d35ec91	Add O_CLOEXEC flag to open(2) and fhopen(2). The new function fallocf(9), that is renamed falloc(9) with added flag argument, is provided to facilitate the merge to stable branch. Reviewed by: jhb MFC after: 1 week	2011-03-25 14:00:36 +00:00
John Baldwin	8e6fa660f2	Fix some locking nits with the p_state field of struct proc: - Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL in fork to honor the locking requirements. While here, expand the scope of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously the code was locking the new child process (p2) after it had locked the parent process (p1). However, when locking two processes, the safe order is to lock the child first, then the parent. - Fix various places that were checking p_state against PRS_NEW without having the process locked to use PROC_LOCK(). Every place was already locking the process, just after the PRS_NEW check. - Remove or reduce the use of PROC_SLOCK() for places that were checking p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading the current state. - Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once. MFC after: 1 week	2011-03-24 18:40:11 +00:00
Bjoern A. Zeeb	1fb51a12f2	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursion in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks	2011-02-16 21:29:13 +00:00
Jilles Tjoelker	90750179ec	Do not trip a KASSERT if /dev/null cannot be opened for a setuid program. The fdcheckstd() function makes sure fds 0, 1 and 2 are open by opening /dev/null. If this fails (e.g. missing devfs or wrong permissions), fdcheckstd() will return failure and the process will exit as if it received SIGABRT. The KASSERT is only to check that kern_open() returns the expected fd, given that it succeeded. Tripping the KASSERT is most likely if fd 0 is open but fd 1 or 2 are not. MFC after: 2 weeks	2011-01-28 15:29:35 +00:00
Konstantin Belousov	23b70c1ae2	Finish r210923, 210926. Mark some devices as eternal. MFC after: 2 weeks	2011-01-04 10:59:38 +00:00
Bjoern A. Zeeb	2f826fdf53	Remove one zero from the double-0. This code doesn't have a license to kill. MFC after: 3 days	2010-04-23 14:32:58 +00:00
Konstantin Belousov	080136212f	On the return path from F_RDAHEAD and F_READAHEAD fcntls, do not unlock Giant twice. While there, bring conditions in the do/while loops closer to style, that also makes the lines fit into 80 columns. Reported and tested by: dougb	2009-11-20 22:22:53 +00:00
Xin LI	82aebf697c	Add two new fcntls to enable/disable read-ahead: - F_READAHEAD: specify the amount for sequential access. The amount is specified in bytes and is rounded up to nearest block size. - F_RDAHEAD: Darwin compatible version that use 128KB as the sequential access size. A third argument of zero disables the read-ahead behavior. Please note that the read-ahead amount is also constrainted by sysctl variable, vfs.read_max, which may need to be raised in order to better utilize this feature. Thanks Igor Sysoev for proposing the feature and submitting the original version, and kib@ for his valuable comments. Submitted by: Igor Sysoev <is rambler-co ru> Reviewed by: kib@ MFC after: 1 month	2009-09-28 16:59:47 +00:00
Robert Watson	14961ba789	Replace AUDIT_ARG() with variable argument macros with a set more more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week	2009-06-27 13:58:44 +00:00
Ulf Lilleengen	2dd43ecaec	- Similar to the previous commit, but for CURRENT: Fix a bug where a FIFO vnode use count was increased twice, but only decreased once.	2009-06-24 18:44:38 +00:00
Ulf Lilleengen	7083246d7c	- Fix a bug where a FIFO vnode use count was increased twice, but only decreased once. MFC after: 1 week	2009-06-24 18:38:51 +00:00
John Baldwin	c4f16b69e1	Add a new 'void closefrom(int lowfd)' system call. When called, it closes any open file descriptors >= 'lowfd'. It is largely identical to the same function on other operating systems such as Solaris, DFly, NetBSD, and OpenBSD. One difference from other *BSD is that this closefrom() does not fail with any errors. In practice, while the manpages for NetBSD and OpenBSD claim that they return EINTR, they ignore internal errors from close() and never return EINTR. DFly does return EINTR, but for the common use case (closing fd's prior to execve()), the caller really wants all fd's closed and returning EINTR just forces callers to call closefrom() in a loop until it stops failing. Note that this implementation of closefrom(2) does not make any effort to resolve userland races with open(2) in other threads. As such, it is not multithread safe. Submitted by: rwatson (initial version) Reviewed by: rwatson MFC after: 2 weeks	2009-06-15 20:38:55 +00:00
Jeff Roberson	f4471727f3	- Use an acquire barrier to increment f_count in fget_unlocked and remove the volatile cast. Describe the reason in detail in a comment. Discussed with: bde, jhb	2009-06-02 06:55:32 +00:00
Jamie Gritton	0304c73163	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)	2009-05-27 14:11:23 +00:00
John Baldwin	6ca33ea345	Set the umask in a new file descriptor table earlier in fdcopy() to remove two lock operations.	2009-05-20 18:42:04 +00:00
Konstantin Belousov	6b72d8db47	Revert r192094. The revision caused problems for sysctl(3) consumers that expect that oldlen is filled with required buffer length even when supplied buffer is too short and returned error is ENOMEM. Redo the fix for kern.proc.filedesc, by reverting the req->oldidx when remaining buffer space is too short for the current kinfo_file structure. Also, only ignore ENOMEM. We have to convert ENOMEM to no error condition to keep existing interface for the sysctl, though. Reported by: ed, Florian Smeets <flo kasimir com> Tested by: pho	2009-05-15 14:41:44 +00:00
Jeff Roberson	bf422e5f27	- Implement a lockless file descriptor lookup algorithm in fget_unlocked(). - Save old file descriptor tables created on expansion until the entire descriptor table is freed so that pointers may be followed without regard for expanders. - Mark the file zone as NOFREE so we may attempt to reference potentially freed files. - Convert several fget_locked() users to fget_unlocked(). This requires us to manage reference counts explicitly but reduces locking overhead in the common case.	2009-05-14 03:24:22 +00:00
John Baldwin	3f11530b79	Update comment above _fget() for earlier change to FWRITE failures return EBADF rather than EINVAL. Submitted by: Jaakko Heinonen jh saunalahti fi MFC after: 1 month	2009-04-15 19:10:37 +00:00
Joe Marcus Clarke	0618630015	Remove the printf's when the vnode to be exported for procstat is not a VDIR. If the file system backing a process' cwd is removed, and procstat -f PID is called, then these messages would have been printed. The extra verbosity is not required in this situation. Requested by: kib Approved by: kib	2009-02-14 21:55:09 +00:00
Joe Marcus Clarke	03fd9c2092	Change two KASSERTS to printfs and simple returns. Stress testing has revealed that a process' current working directory can be VBAD if the directory is removed. This can trigger a panic when procstat -f PID is run. Tested by: pho Discovered by: phobot Reviewed by: kib Approved by: kib	2009-02-14 21:12:24 +00:00
Robert Watson	54fffe2d67	Modify fdcopy() so that, during fork(2), it won't copy file descriptors from the parent to the child process if they have an operation vector of &badfileops. This narrows a set of races involving system calls that allocate a new file descriptor, potentially block for some extended period, and then return the file descriptor, when invoked by a threaded program that concurrently invokes fork(2). Similar approches are used in both Solaris and Linux, and the wideness of this race was introduced in FreeBSD when we moved to a more optimistic implementation of accept(2) in order to simplify locking. A small race necessarily remains because the fork(2) might occur after the finit() in accept(2) but before the system call has returned, but that appears unavoidable using current APIs. However, this race is vastly narrower. The fix can be validated using the newfileops_on_fork regression test. PR: kern/130348 Reported by: Ivan Shcheklein <shcheklein at gmail dot com> Reviewed by: jhb, kib MFC after: 1 week	2009-02-11 15:22:01 +00:00
Konstantin Belousov	7efa697d80	Clear the pointers to the file in the struct filedesc before file is closed in fdfree. Otherwise, sysctl_kern_proc_filedesc may dereference stale struct file * values. Reported and tested by: pho MFC after: 1 month	2008-12-30 12:51:56 +00:00
Peter Wemm	43151ee6cf	Merge user/peter/kinfo branch as of r185547 into head. This changes struct kinfo_filedesc and kinfo_vmentry such that they are same on both 32 and 64 bit platforms like i386/amd64 and won't require sysctl wrapping. Two new OIDs are assigned. The old ones are available under COMPAT_FREEBSD7 - but it isn't that simple. The superceded interface was never actually released on 7.x. The other main change is to pack the data passed to userland via the sysctl. kf_structsize and kve_structsize are reduced for the copyout. If you have a process with 100,000+ sockets open, the unpacked records require a 132MB+ copyout. With packing, it is "only" ~35MB. (Still seriously unpleasant, but not quite as devastating). A similar problem exists for the vmentry structure - have lots and lots of shared libraries and small mmaps and its copyout gets expensive too. My immediate problem is valgrind. It traditionally achieves this functionality by parsing procfs output, in a packed format. Secondly, when tracing 32 bit binaries on amd64 under valgrind, it uses a cross compiled 32 bit binary which ran directly into the differing data structures in 32 vs 64 bit mode. (valgrind uses this to track file descriptor operations and this therefore affected every single 32 bit binary) I've added two utility functions to libutil to unpack the structures into a fixed record length and to make it a little more convenient to use.	2008-12-02 06:50:26 +00:00
John Baldwin	2ff47c5f18	Remove unnecessary locking around vn_fullpath(). The vnode lock for the vnode in question does not need to be held. All the data structures used during the name lookup are protected by the global name cache lock. Instead, the caller merely needs to ensure a reference is held on the vnode (such as vhold()) to keep it from being freed. In the case of procfs' <pid>/file entry, grab the process lock while we gain a new reference (via vhold()) on p_textvp to fully close races with execve(2). For the kern.proc.vmmap sysctl handler, use a shared vnode lock around the call to VOP_GETATTR() rather than an exclusive lock. MFC after: 1 month	2008-11-04 19:04:01 +00:00
John Baldwin	21fc02d271	Use shared vnode locks instead of exclusive vnode locks for the access(), chdir(), chroot(), eaccess(), fpathconf(), fstat(), fstatfs(), lseek() (when figuring out the current size of the file in the SEEK_END case), pathconf(), readlink(), and statfs() system calls. Submitted by: ups (mostly) Tested by: pho MFC after: 1 month	2008-11-03 20:31:00 +00:00
Dag-Erling Smørgrav	e11e3f187d	Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.	2008-10-23 20:26:15 +00:00
Dag-Erling Smørgrav	1ede983cc9	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
Robert Watson	ac2456bfc3	Downgrade XXX to a Note for fgetsock() and fputsock(). MFC after: 3 days	2008-10-12 20:03:17 +00:00
Ed Schouten	bc093719ca	Integrate the new MPSAFE TTY layer to the FreeBSD operating system. The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following: - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver. - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly. - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters. Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING. Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan	2008-08-20 08:31:58 +00:00
Ed Schouten	79da190c16	Remove unneeded D_NEEDGIANT from /dev/fd/{0,1,2}. There is no reason the fdopen() routine needs Giant. It only sets curthread->td_dupfd, based on the device unit number of the cdev. I guess we won't get massive performance improvements here, but still, I assume we eventually want to get rid of Giant.	2008-08-09 12:42:12 +00:00
John Baldwin	6bc1e9cd84	Rework the lifetime management of the kernel implementation of POSIX semaphores. Specifically, semaphores are now represented as new file descriptor type that is set to close on exec. This removes the need for all of the manual process reference counting (and fork, exec, and exit event handlers) as the normal file descriptor operations handle all of that for us nicely. It is also suggested as one possible implementation in the spec and at least one other OS (OS X) uses this approach. Some bugs that were fixed as a result include: - References to a named semaphore whose name is removed still work after the sem_unlink() operation. Prior to this patch, if a semaphore's name was removed, valid handles from sem_open() would get EINVAL errors from sem_getvalue(), sem_post(), etc. This fixes that. - Unnamed semaphores created with sem_init() were not cleaned up when a process exited or exec'd. They were only cleaned up if the process did an explicit sem_destroy(). This could result in a leak of semaphore objects that could never be cleaned up. - On the other hand, if another process guessed the id (kernel pointer to 'struct ksem' of an unnamed semaphore (created via sem_init)) and had write access to the semaphore based on UID/GID checks, then that other process could manipulate the semaphore via sem_destroy(), sem_post(), sem_wait(), etc. - As part of the permission check (UID/GID), the umask of the proces creating the semaphore was not honored. Thus if your umask denied group read/write access but the explicit mode in the sem_init() call allowed it, the semaphore would be readable/writable by other users in the same group, for example. This includes access via the previous bug. - If the module refused to unload because there were active semaphores, then it might have deregistered one or more of the semaphore system calls before it noticed that there was a problem. I'm not sure if this actually happened as the order that modules are discovered by the kernel linker depends on how the actual .ko file is linked. One can make the order deterministic by using a single module with a mod_event handler that explicitly registers syscalls (and deregisters during unload after any checks). This also fixes a race where even if the sem_module unloaded first it would have destroyed locks that the syscalls might be trying to access if they are still executing when they are unloaded. XXX: By the way, deregistering system calls doesn't do any blocking to drain any threads from the calls. - Some minor fixes to errno values on error. For example, sem_init() isn't documented to return ENFILE or EMFILE if we run out of semaphores the way that sem_open() can. Instead, it should return ENOSPC in that case. Other changes: - Kernel semaphores now use a hash table to manage the namespace of named semaphores nearly in a similar fashion to the POSIX shared memory object file descriptors. Kernel semaphores can now also have names longer than 14 chars (up to MAXPATHLEN) and can include subdirectories in their pathname. - The UID/GID permission checks for access to a named semaphore are now done via vaccess() rather than a home-rolled set of checks. - Now that kernel semaphores have an associated file object, the various MAC checks for POSIX semaphores accept both a file credential and an active credential. There is also a new posixsem_check_stat() since it is possible to fstat() a semaphore file descriptor. - A small set of regression tests (using the ksem API directly) is present in src/tools/regression/posixsem. Reported by: kris (1) Tested by: kris Reviewed by: rwatson (lightly) MFC after: 1 month	2008-06-27 05:39:04 +00:00
Ed Schouten	cc8945d204	Remove redundant checks from fcntl()'s F_DUPFD. Right now we perform some of the checks inside the fcntl()'s F_DUPFD operation twice. We first validate the `fd' argument. When finished, we validate the `arg' argument. These checks are also performed inside do_dup(). The reason we need to do this, is because fcntl() should return different errno's when the `arg' argument is out of bounds (EINVAL instead of EBADF). To prevent the redundant locking of the PROC_LOCK and FILEDESC_SLOCK, patch do_dup() to support the error semantics required by fcntl(). Approved by: philip (mentor)	2008-05-28 20:25:19 +00:00
Attilio Rao	258f4727f1	Replace direct atomic operation for the file refcount witht the refcount interface. It also introduces the correct usage of memory barriers, as sometimes fdrop() and fhold() are used with shared locks, which don't use any release barrier.	2008-05-25 14:57:43 +00:00

1 2 3 4 5 ...

381 Commits