freebsd-dev

Author	SHA1	Message	Date
Robert Watson	71909edec8	Significant refactoring of the accounting code to improve locking and VFS happiness, as well as correct other bugs: - Replace notion of current and saved accounting credential/vnode with a single credential/vnode and an acct_suspended flag. This simplifies the accounting logic substantially. - Replace acct_mtx with acct_sx, a sleepable lock held exclusively during reconfiguration and space polling, but shared during log entry generation. This avoids holding a mutex over sleepable VFS operations. - Hold the sx lock over the duration of the I/O so that the vnode I/O cannot occur after vnode close, which could occur previously if accounting was disabled as a process exited. - Write the accounting log entry with Giant conditionally acquired based on the file system where the log is stored. Previously, the accounting code relied on the caller acquiring Giant. - Acquire Giant conditionally in the accounting callout based on the file system where the accounting log is stored. Run the callout MPSAFE. - Expose acct_suspended via a read-only sysctl so it is possibly to programmatically determine whether accounting is suspended or not without attempting to parse logs. - Check both acct_vp and acct_suspended lock-free before entering the accounting sx lock in acct(). - When accounting is disabled due to a VBAD vnode (i.e., forceable unmount), generate a log message indicating accounting has been disabled. - Correct a long-standing bug in how free space is calculated and compared to the required space: generate and compare signed results, not unsigned results, or negative free space will cause accounting to not be suspended when required, or worse, incorrectly resumed once negative free space is reached. MFC after: 2 weeks	2005-11-12 10:45:13 +00:00
Robert Watson	87328e07e0	Pass 'curthread' into VFS_STATFS() from acctwatch(), rather than passing NULL. The NFS client expects that a thread will always be present for a VOP so that it can check for signal conditions, and will dereference a NULL pointer if one isn't present. MFC after: 3 days	2005-09-21 15:28:07 +00:00
Robert Watson	081322613b	When mac_check_system_acct() fails, make sure to unlock as well as close the vnode. Pointed out by: jeff	2005-03-01 08:56:13 +00:00
Robert Watson	2b05b557ff	In acct_process(), do a lockless read of acctvp to see if it's NULL before deciding to do more expensive locking to account for process exit. This acceptable minor race avoids two mutex operations in that highly common case of accounting not being enabled. MFC after: 2 weeks	2005-01-08 04:45:57 +00:00
John Baldwin	78c85e8dfc	Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month	2004-10-05 18:51:11 +00:00
Poul-Henning Kamp	f3732fd15b	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.	2004-06-17 17:16:53 +00:00
Bruce Evans	01e3f3ae4f	Fixed some style bugs (mainly misplaced comments, and totally disordered declarations in acct_process()).	2004-03-04 09:47:09 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
Poul-Henning Kamp	7c89f162bc	Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.	2003-07-27 17:04:56 +00:00
Bosko Milekic	48719ca7c8	Change the style of the english used to print accounting enabled and disabled. This means no period at the end and changing "Process accounting <foo>" to "Accounting <foo>". Pointed out by: bde	2003-07-16 13:20:10 +00:00
Bosko Milekic	d2dbf5bc0b	Log process accounting activation/deactivation. Useful for some auditing purposes. Submitted by: Christian S.J. Peron <maneo@bsdpro.com> PR: kern/54529	2003-07-16 03:59:50 +00:00
David E. O'Brien	677b542ea2	Use __FBSDID().	2003-06-11 00:56:59 +00:00
Dag-Erling Smørgrav	87ccef7b77	Instead of recording the Unix time in a process when it starts, record the uptime. Where necessary, convert it back to Unix time by adding boottime to it. This fixes a potential problem in the accounting code, which would compute the elapsed time incorrectly if the Unix time was stepped during the lifetime of the process.	2003-05-01 16:59:23 +00:00
John Baldwin	7e653dbd3b	Hold the proc lock across a wider range of fields that it protects.	2003-04-17 22:20:30 +00:00
John Baldwin	2d055ab20f	Trim some trailing whitespace.	2003-03-13 23:07:09 +00:00
Tim J. Robbins	27e39ae4d8	Remove the PL_SHAREMOD flag from struct plimit, which could have been used to share resource limits between rfork threads, but never was. Removing it makes resource limit locking much simpler -- only the current process can change the contents of the structure that p_limit points to.	2003-02-20 04:18:42 +00:00
Alfred Perlstein	f97182acf8	unwrap lines made short enough by SCARGS removal	2002-12-14 08:18:06 +00:00
Alfred Perlstein	b80521fee5	remove syscallarg(). Suggested by: peter	2002-12-14 02:07:32 +00:00
Alfred Perlstein	d1e405c5ce	SCARGS removal take II.	2002-12-14 01:56:26 +00:00
Alfred Perlstein	bc9e75d7ca	Backout removal SCARGS, the code freeze is only "selectively" over.	2002-12-13 22:41:47 +00:00
Alfred Perlstein	0bbe7292e1	Remove SCARGS. Reviewed by: md5	2002-12-13 22:27:25 +00:00
Bill Fenner	8b5f8b061a	Don't hold acct_mtx over limcopy(), since it's unnecessary and limcopy() can sleep. Approved by: re	2002-11-26 18:04:12 +00:00
Giorgos Keramidas	5f9ae8e026	Typo in comment: commmand -> command Reviewed by: jhb	2002-11-05 14:54:07 +00:00
Robert Watson	e5e820fd1f	Permit MAC policies to instrument the access control decisions for system accounting configuration and for nfsd server thread attach. Policies might use this to protect the integrity or confidentiality of accounting data, limit the ability to turn on or off accounting, as well as to prevent inappropriately labeled threads from becoming nfs server threads. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-11-04 15:13:36 +00:00
Robert Watson	b497ca81d6	Make sure that the accounting credential is saved along with the vp when accounting is suspended--otherwise when accounting is restored, we may incorrectly assume the credential is valid. Panics experienced by: juli	2002-10-05 20:05:23 +00:00
Robert Watson	289c6dea76	Don't call VOP_LEASE() while holding the accounting mutex.	2002-09-18 01:56:13 +00:00
Andrew R. Reiter	b4dcc46af5	- Fix two obvious locking bugs; 1) returning with lock held when it needed to be dropped, 2) attempting to lock acct_mtx while already holding it. Sorry to those who experienced pain. - Added two comments referring to two areas in which acct_mtx is held over vnode operations that might sleep. Patch in the works for this.	2002-09-12 05:00:32 +00:00
Andrew R. Reiter	4f39d5d511	- Lock down the accounting code globals with a subsystem mutex. Reviewed by: jhb, mdodd	2002-09-11 04:10:41 +00:00
Robert Watson	9ca435893b	In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to cliarfy which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 20:55:08 +00:00
Robert Watson	2d70161756	Cache the credential provided during accton() for use in later accounting vnode operations. This permits the rights of the user (typically root) used to turn on accounting to be used when writing out accounting entries, rather than the credentials of the process generating the accounting record. This fixes accounting in a number of environments, including file systems that offer revocation support, MAC environments, some securelevel scenarios, and in some NFS environments. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-07 19:30:16 +00:00
Johan Karlsson	5b60674451	Save flags returned by vn_open and use them when calling vn_close. Reviewed by: bde Approved by: sheldonh (mentor)	2002-07-21 15:22:56 +00:00
Johan Karlsson	92da2e7671	Open accounting file for appending, not general writing. This allows accton(1) to be used with an append-only file. PR: 7169 Reported by: Joao Carlos Mendes Luis <jonny@jonny.eng.br> Reviewed by: bde Approved by: sheldonh (mentor) MFC after: 2 weeks	2002-07-10 17:31:58 +00:00
Tom Rhodes	d394511de3	More s/file system/filesystem/g	2002-05-16 21:28:32 +00:00
John Baldwin	16e7bc7b90	- Remove an early KSE diagnostic panic. The thread pointer here is always curthread. - We don't need Giant to do suser() checks now, so don't lock Giant until after the check.	2002-04-09 19:58:38 +00:00
John Baldwin	44731cab3b	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@	2002-04-01 21:31:13 +00:00
Alfred Perlstein	4d77a549fe	Remove __P.	2002-03-19 21:25:46 +00:00
John Baldwin	a854ed9893	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.	2002-02-27 18:32:23 +00:00
Seigo Tanimura	f591779bb5	Lock struct pgrp, session and sigio. New locks are: - pgrpsess_lock which locks the whole pgrps and sessions, - pg_mtx which protects the pgrp members, and - s_mtx which protects the session members. Please refer to sys/proc.h for the coverage of these locks. Changes on the pgrp/session interface: - pgfind() needs the pgrpsess_lock held. - The caller of enterpgrp() is responsible to allocate a new pgrp and session. - Call enterthispgrp() in order to enter an existing pgrp. - pgsignal() requires a pgrp lock held. Reviewed by: jhb, alfred Tested on: cvsup.jp.FreeBSD.org (which is a quad-CPU machine running -current)	2002-02-23 11:12:57 +00:00
Robert Watson	fc5d29ef7d	o Move suser() calls in kern/ to using suser_xxx() with an explicit credential selection, rather than reference via a thread or process pointer. This is part of a gradual migration to suser() accepting a struct ucred instead of a struct proc, simplifying the reference and locking semantics of suser(). Obtained from: TrustedBSD Project	2001-11-01 20:56:57 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Matthew Dillon	116734c4d1	Pushdown Giant for acct(), kqueue(), kevent(), execve(), fork(), vfork(), rfork(), jail().	2001-09-01 03:04:31 +00:00
Robert Watson	3c4543e046	o Reduce gratuitous whitespace difference from Darwin.	2001-08-29 17:18:04 +00:00
Robert Watson	b1fc0ec1a7	o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing original macro that pointed. p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restruction many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparision. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done. o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account. Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit	2001-05-25 16:59:11 +00:00
Mark Murray	fb919e4d5a	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)	2001-05-01 08:13:21 +00:00
Greg Lehey	60fb0ce365	Revert consequences of changes to mount.h, part 2. Requested by: bde	2001-04-29 02:45:39 +00:00
Greg Lehey	d98dc34f52	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
Bosko Milekic	9ed346bab0	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)	2001-02-09 06:11:45 +00:00
John Baldwin	ba88dfc733	Back out proc locking to protect p_ucred for obtaining additional references along with the actual obtaining of additional references.	2001-01-27 00:01:31 +00:00
John Baldwin	762dba203e	- Proc locking. - Protect calcru() with sched_lock.	2001-01-24 00:28:07 +00:00
Jake Burkholder	4f55983606	Use callout_reset instead of timeout(9). Most callouts are statically allocated, 2 have been added to struct proc for setitimer and sleep. Reviewed by: jhb, jlemon	2000-11-27 22:52:31 +00:00

1 2

76 Commits