freebsd-skq

Author	SHA1	Message	Date
John Baldwin	e46502943a	Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method. Submitted by: rwatson	2008-01-07 20:05:19 +00:00
Jeff Roberson	397c19d175	Remove explicit locking of struct file. - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho	2007-12-30 01:42:15 +00:00
Jeff Roberson	ace8398da0	Refactor select to reduce contention and hide internal implementation details from consumers. - Track individual selecters on a per-descriptor basis such that there are no longer collisions and after sleeping for events only those descriptors which triggered events must be rescaned. - Protect the selinfo (per descriptor) structure with a mtx pool mutex. mtx pool mutexes were chosen to preserve api compatibility with existing code which does nothing but bzero() to setup selinfo structures. - Use a per-thread wait channel rather than a global wait channel. - Hide select implementation details in a seltd structure which is opaque to the rest of the kernel. - Provide a 'selsocket' interface for those kernel consumers who wish to select on a socket when they have no fd so they no longer have to be aware of select implementation details. Tested by: kris Reviewed on: arch	2007-12-16 06:21:20 +00:00
Craig Rodrigues	d7f81adbd4	Revert previous commits which I committed by mistake. Approved by: re (implicit) Pointy hat to: me	2007-07-14 21:23:31 +00:00
Craig Rodrigues	d678780e60	The last entry in the ext2_opts array must be NULL, otherwise the kernel with crash in vfs_filteropt() if an invalid mount option is passed to ext2fs. Approved by: re (kensmith)	2007-07-14 21:18:19 +00:00
Robert Watson	dede2ab3b2	In kern_kevent(), unconditionally fdrop() fp once fget() has succeeded, as we never have an opportunity to set it to NULL. Found with: Coverity Prevent(tm) CID: 2161	2007-05-28 17:15:05 +00:00
Robert Watson	87066f04c6	Select a more appealing spelling for the word acquire.	2007-05-27 19:24:00 +00:00
Robert Watson	5e3f7694b1	Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are imported for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff	2007-04-04 09:11:34 +00:00
Robert Watson	0c14ff0eb5	Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.	2007-03-04 22:36:48 +00:00
John Baldwin	6600b45d88	Save exit status of an exiting process in kn_data in the knote. Submitted by: Jared Yanovich ^phirerunner at comcast.net^ MFC after: 2 weeks	2006-11-20 22:17:50 +00:00
John-Mark Gurney	33fabe46da	remove unnecessary NULL check... Coverity ID: 1545	2006-09-25 01:29:48 +00:00
John-Mark Gurney	4db71d27a1	hide kqueue_register from public view, and replace it w/ kqfd_register... this eliminates a possible race in aio registering a kevent..	2006-09-24 04:47:47 +00:00
John-Mark Gurney	9edac6f3f9	add KTRACE hooks into kevent... This will help people debug their kqueue programs to find out exactly which events were registered and which were returned... This should be lower in kern_kevent, but that would require special munging due to locks and the functions used to copyin/copyout kevents... If someone wants to teach ktrace how to output pretty kevents, I have a kevent prety printer that can be used...	2006-09-24 02:23:29 +00:00
John Baldwin	5c69ad8374	Use fget() in kqueue_register() instead of doing all the work by hand.	2006-06-12 21:46:23 +00:00
Pawel Jakub Dawidek	f420242b2b	Don't forget to unlock kq lock in low memory situations. OK'ed by: jmg	2006-06-02 13:23:39 +00:00
Pawel Jakub Dawidek	8ebab14c70	Remove confusing done_noglobal label. The KQ_GLOBAL_UNLOCK() macro know how to handle both situations - when kq_global lock is and is not held. OK'ed by: jmg	2006-06-02 13:21:21 +00:00
Pawel Jakub Dawidek	241321abc0	Use SLIST_FOREACH_SAFE() macro, because knote_drop() can free an element which can be then used to find next element in the list. OK'ed by: jmg	2006-06-02 13:18:59 +00:00
John Baldwin	a29b4f6eec	Drop the kqueue global mutex as soon as we are finished with it rather than keeping it locked until we exit the function to optimize the case where the lock would be dropped and later reacquired. The optimization was broken when kevent's were moved from UFS to VFS and the knote list lock for a vnode kevent became the lockmgr vnode lock. If one tried to use a kqueue that contained events for a kqueue fd followed by a vnode, then the kq global lock would end up being held when the vnode lock was acquired which could result in sleeping with a mutex held (and subsequent panics) if the vnode lock was contested. Reviewed by: jmg Tested by: ps (on 6.x) MFC after: 3 days	2006-04-14 14:27:28 +00:00
John-Mark Gurney	1c4ca5e5fe	spell unlock correctly, this is relatively minor as it's rare someone would provide a lock method, and want the default unlock, but it is a bug... PR: 95356 Submitted by: Stephen Corteselli MFC after: 3 days	2006-04-07 17:21:27 +00:00
John-Mark Gurney	5e6125891f	mask out any action when copying the flags from the event to the knote.. Pointed out by: Václav Haisman Submitted by: Dan Nelson (slightly modifed patch) MFC after: 3 days	2006-04-01 20:15:39 +00:00
John-Mark Gurney	4e095bc045	hold the list lock over the f_event and KNOTE_ACTIVATE calls... This closes a race where data could come in before we clear the INFLUX flag, and get skipped over by knote (and hence never be activated, though it should of been)... Found by: glebius & co. Reviewed by: glebius MFC after: 3 days	2006-03-29 18:15:30 +00:00
Doug Ambrisko	69cd28dacb	Add in kqueue support to LIO event notification and fix how it handled notifications when LIO operations completed. These were the problems with LIO event complete notification: - Move all LIO/AIO event notification into one general function so we don't have bugs in different data paths. This unification got rid of several notification bugs one of which if kqueue was used a SIGILL could get sent to the process. - Change the LIO event accounting to count all AIO request that could have been split across the fast path and daemon mode. The prior accounting only kept track of AIO op's in that mode and not the entire list of operations. This could cause a bogus LIO event complete notification to occur when all of the fast path AIO op's completed and not the AIO op's that ended up queued for the daemon. Suggestions from: alc	2005-10-12 17:51:31 +00:00
Stephan Uphoff	19b2dff7b0	Fix race condition that caused activation of an event to be ignored immediately after it was deactivated. Found by: Yahoo! MFC after: 3 days	2005-09-15 21:10:12 +00:00
Suleiman Souhlal	571dcd15e2	Fix the recent panics/LORs/hangs created by my kqueue commit by: - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone	2005-07-01 16:28:32 +00:00
Paul Saab	efe5becafa	Wrap copyin/copyout for kevent so the 32bit wrapper does not have to malloc nchanges * sizeof(struct kevent) AND/OR nevents * sizeof(struct kevent) on every syscall. Glanced at by: peter, jmg Obtained from: Yahoo! MFC after: 2 weeks	2005-06-03 23:15:01 +00:00
John-Mark Gurney	b633f50dd8	make stat return an zero'd struct, and be a FIFO again... This is only to fix libc_r since it requires stat to close fd's, and so commented in the code... PR: threads/75795 Reviewed by: ps MFC after: 1 week	2005-05-24 23:42:50 +00:00
John-Mark Gurney	c4c44d2935	fix aio+kq... I've been running ambrisko's test program for much longer w/o problems than I was before... This simply brings back the knote_delete as knlist_delete which will also drop the knote's, instead of just clearing the list and seeing _ONESHOT... Fix a race where if a note was _INFLUX and _DETACHED, it could end up being modified... whoopse.. MFC after: 1 week Prodded by: ambrisko and dwhite	2005-03-18 01:11:39 +00:00
Paul Saab	b8a4edc17e	Use kern_kevent instead of the stackgap for 32bit syscall wrapping. Submitted by: jhb Tested on: amd64	2005-03-01 17:45:55 +00:00
Robert Watson	4f7fd28ee1	When invoking callout_init(), spell '1' as "CALLOUT_MPSAFE". MFC after: 3 days	2005-02-22 13:11:33 +00:00
Poul-Henning Kamp	c711aea6ca	Make a bunch of malloc types static. Found by: src/tools/tools/kernxref	2005-02-10 12:02:37 +00:00
Poul-Henning Kamp	1b5cd47aa0	Move a FILEDESC_UNLOCK upwards to silence witness.	2004-11-16 14:41:31 +00:00
Poul-Henning Kamp	124e4c3be8	Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST. Use this in all the places where sleeping with the lock held is not an issue. The distinction will become significant once we finalize the exact lock-type to use for this kind of case.	2004-11-13 11:53:02 +00:00
John-Mark Gurney	583ef6b6d2	/me gets the wrong patch out of the pr :( /me had the write patch w/o comments on his test system. Pointed out by: kuriyama and ache Pointy hat to: jmg	2004-10-14 03:26:50 +00:00
John-Mark Gurney	d46316e8f9	fix a bug where signal events didn't set the flags for attach/detach.. PR: 72234 MFC after: 2 days	2004-10-13 20:55:19 +00:00
John-Mark Gurney	31580e6817	unlock global lock in kqueue_scan before msleep'ing to prevent dead lock.. we didn't unlock global lock earlier to prevent just having to reaquire it again.. Found by: peter Reviewed by: ps MFC after: 3 days	2004-09-14 18:38:16 +00:00
John-Mark Gurney	ca95b2de43	remove giant required from kqueue_close.. Reported by: kuriyama MFC after: 3 days	2004-09-10 03:14:32 +00:00
John-Mark Gurney	9b90387dcf	don't call f_detach if the filter has alread removed the knote.. This happens when a proc exits, but needs to inform the user that this has happened.. This also means we can remove the check for detached from proc and sig f_detach functions as this is doing in kqueue now... MFC after: 5 days	2004-09-06 19:02:42 +00:00
Brian Feldman	1c0f9af5b5	Allocate the marker, when scanning a kqueue, from the "heap" instead of the stack. When swapped out, a process's kernel stack would be unavailable, and we could get a page fault when scanning the same kqueue. PR: kern/61849	2004-08-16 03:08:38 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
John-Mark Gurney	7d5e45a391	looks like rwatson forgot tabs... :)	2004-08-13 07:38:58 +00:00
Robert Watson	44f31f7556	Trim trailing white space.	2004-08-12 18:06:21 +00:00
Robert Watson	a6719c82b1	Push Giant acquisition down into fo_stat() from most callers. Acquire Giant conditional on debug.mpsafenet in the socket soo_stat() routine, unconditionally in vn_statfile() for VFS, and otherwise don't acquire Giant. Accept an unlocked read in kqueue_stat(), and cryptof_stat() is a no-op. Don't acquire Giant in fstat() system call. Note: in fdescfs, fo_stat() is called while holding Giant due to the VFS stack sitting on top, and therefore there will still be Giant recursion in this case.	2004-07-22 20:40:23 +00:00
Robert Watson	1c1ce9253f	Push acquisition of Giant from fdrop_closed() into fo_close() so that individual file object implementations can optionally acquire Giant if they require it: - soo_close(): depends on debug.mpsafenet - pipe_close(): Giant not acquired - kqueue_close(): Giant required - vn_close(): Giant required - cryptof_close(): Giant required (conservative) Notes: Giant is still acquired in close() even when closing MPSAFE objects due to kqueue requiring Giant in the calling closef() code. Microbenchmarks indicate that this removal of Giant cuts 3%-3% off of pipe create/destroy pairs from user space with SMP compiled into the kernel. The cryptodev and opencrypto code appears MPSAFE, but I'm unable to test it extensively and so have left Giant over fo_close(). It can probably be removed given some testing and review.	2004-07-22 18:35:43 +00:00
Alfred Perlstein	a88295bb83	Disable SIGIO for now, leave a comment as to why it's busted and hard to fix.	2004-07-15 03:49:52 +00:00
Alfred Perlstein	67543ab1e3	Make FIOASYNC, FIOSETOWN and FIOGETOWN work on kqueues.	2004-07-14 07:02:03 +00:00
Alfred Perlstein	94ed9c8af5	Introduce a new kevent filter. EVFILT_FS that will be used to signal generic filesystem events to userspace. Currently only mount and unmount of filesystems are signalled. Soon to be added, up/down status of NFS. Introduce a sysctl node used to route requests to/from filesystems based on filesystem ids. Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/ entrypoint by the sysctl code to change individual filesystems.	2004-07-04 10:52:54 +00:00
Robert Watson	948a4734ed	Add GIANT_REQUIRED to kqueue_close(), since kqueue currently requires Giant.	2004-06-01 18:05:41 +00:00
Colin Percival	ec513ff759	Fix filt_timer* races: Finish initializing a knote before we pass it to a callout, and use the new callout_drain API to make sure that a callout has finished before we deallocate memory it is using. PR: kern/64121 Discussed with: gallatin	2004-04-07 05:59:57 +00:00
Brian Feldman	fe5f3a72ac	Make sure to wake up any select waiters when closing a kqueue (also, not crash). I am fairly sure that only people with SMP and multi-threaded apps using kqueue will be affected by this, so I have a stress-testing program on my web site: <URL:http://green.homeunix.org/~green/getaddrinfo-pthreads-stresstest.c>	2004-02-20 04:00:48 +00:00
David Malone	1c58509c25	Don't TAILQ_INIT kq_head twice, once is enough.	2003-12-25 23:42:36 +00:00
Olivier Houchard	1a29c80648	Better fix than my previous commit: in exit1(), make sure the p_klist is empty after sending NOTE_EXIT. The process won't report fork() or execve() and won't be able to handle NOTE_SIGNAL knotes anyway. This fixes some race conditions with do_tdsignal() calling knote() while the process is exiting. Reported by: Stefan Farfeleder <stefan@fafoe.narf.at> MFC after: 1 week	2003-11-14 18:49:01 +00:00
Seigo Tanimura	512824f8f7	- Implement selwakeuppri() which allows raising the priority of a thread being waken up. The thread waken up can run at a priority as high as after tsleep(). - Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities. - Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs. Not objected in: -arch, -current	2003-11-09 09:17:26 +00:00
Olivier Houchard	7922cdc855	I believe kbyanc@ really meant this in rev 1.58. Use zpfind() to see if the process became a zombie if pfind() doesn't find it and if the caller wants to know about process death, so that the caller knows the process died even if it happened before the kevent was actually registered. MFC after: 1 week	2003-11-04 01:41:47 +00:00
Olivier Houchard	f44004690c	Do not attempt to report proc event if NOTE_EXIT has already been received. This fixes a race condition (specifically with signal events) that could lead to the kn being re-inserted into the list after it has been destroyed, which is not something we want to happen. PR: kern/58258	2003-11-04 01:14:58 +00:00
David Malone	e1419c08e2	falloc allocates a file structure and adds it to the file descriptor table, acquiring the necessary locks as it works. It usually returns two references to the new descriptor: one in the descriptor table and one via a pointer argument. As falloc releases the FILEDESC lock before returning, there is a potential for a process to close the reference in the file descriptor table before falloc's caller gets to use the file. I don't think this can happen in practice at the moment, because Giant indirectly protects closes. To stop the file being completly closed in this situation, this change makes falloc set the refcount to two when both references are returned. This makes life easier for several of falloc's callers, because the first thing they previously did was grab an extra reference on the file. Reviewed by: iedowse Idea run past: jhb	2003-10-19 20:41:07 +00:00
Poul-Henning Kamp	7c2d2efd58	Initialize struct fileops with C99 sparse initialization.	2003-06-18 18:16:40 +00:00
David E. O'Brien	677b542ea2	Use __FBSDID().	2003-06-11 00:56:59 +00:00
Kelly Yancey	f563420e8d	Fix race between a process registering a NOTE_EXIT EVFILT_PROC event and the target process exiting which causes attempts to register the kevent to randomly fail depending on whether the target runs to completion before the parent can call kevent(2). The bug actually effects EVFILT_PROC events on any zombie process, but the most common manifestation is with parents trying to monitor child processes. MFC after: 2 weeks Sponsored by: NTT Multimedia Communications Labs	2003-04-12 01:57:04 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Alfred Perlstein	e7d6662f1b	Do not allow kqueues to be passed via unix domain sockets.	2003-02-15 06:04:55 +00:00
Alfred Perlstein	edf6699ae6	Fix LOR with PROC/filedesc. Introduce fdesc_mtx that will be used as a barrier between free'ing filedesc structures. Basically if you want to access another process's filedesc, you want to hold this mutex over the entire operation.	2003-02-15 05:52:56 +00:00
Alfred Perlstein	44956c9863	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
Jeffrey Hsu	34c54d9f74	Rewrite the SMP filedesc locking in knote_attach() in order to 1. eliminate unnecessary loop which frees and re-allocates the just allocated array 2. eliminate the newsize recomputation 3. eliminate unnecessary unlock and relock around free 4. correctly match the free with the malloc into M_KQUEUE instead of M_TEMP 5. eliminate conditional assignment of oldlist, which is equivalent to a simple assignment 6. eliminate the oldlist temporary variable completely Reviewed by: jhb	2003-01-21 04:05:49 +00:00
Matthew Dillon	48e3128b34	Bow to the whining masses and change a union back into void *. Retain removal of unnecessary casts and throw in some minor cleanups to see if anyone complains, just for the hell of it.	2003-01-13 00:33:17 +00:00
Matthew Dillon	cd72f2180b	Change struct file f_data to un_data, a union of the correct struct pointer types, and remove a huge number of casts from code using it. Change struct xfile xf_data to xun_data (ABI is still compatible). If we need to add a #define for f_data and xf_data we can, but I don't think it will be necessary. There are no operational changes in this commit.	2003-01-12 01:37:13 +00:00
Alfred Perlstein	13438f6823	When compiling the kernel do not implicitly include filedesc.h from proc.h, this was causing filedesc work to be very painful. In order to make this work split out sigio definitions to thier own header (sigio.h) which is included from proc.h for the time being.	2003-01-01 01:56:19 +00:00
Poul-Henning Kamp	a7010ee2f4	White-space changes.	2002-12-24 09:44:51 +00:00
Poul-Henning Kamp	f3a682116c	Detediousficate declaration of fileops array members by introducing typedefs for them.	2002-12-23 21:53:20 +00:00
Robert Watson	9a1b076af2	Minor comment typo fix. Submitted by: Wayne Morrison <tewok@tislabs.com>	2002-10-29 20:51:44 +00:00
Don Lewis	cb81d3ca4d	hashinit() calls MALLOC(), so release the filedesc lock in knote_attach() before calling hashinit() and relock afterwards, taking care to see that we don't lose a race.	2002-10-03 06:03:26 +00:00
Robert Watson	d49fa1ca6e	In continuation of early fileop credential changes, modify fo_ioctl() to accept an 'active_cred' argument reflecting the credential of the thread initiating the ioctl operation. - Change fo_ioctl() to accept active_cred; change consumers of the fo_ioctl() interface to generally pass active_cred from td->td_ucred. - In fifofs, initialize filetmp.f_cred to ap->a_cred so that the invocations of soo_ioctl() are provided access to the calling f_cred. Pass ap->a_td->td_ucred as the active_cred, but note that this is required because we don't yet distinguish file_cred and active_cred in invoking VOP's. - Update kqueue_ioctl() for its new argument. - Update pipe_ioctl() for its new argument, pass active_cred rather than td_ucred to MAC for authorization. - Update soo_ioctl() for its new argument. - Update vn_ioctl() for its new argument, use active_cred rather than td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR(). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-17 02:36:16 +00:00
Robert Watson	49cde51dfd	Correct white space nits that crept in during my recent merges of trustedbsd_mac material.	2002-08-16 14:12:40 +00:00
Robert Watson	ea6027a8e1	Make similar changes to fo_stat() and fo_poll() as made earlier to fo_read() and fo_write(): explicitly use the cred argument to fo_poll() as "active_cred" using the passed file descriptor's f_cred reference to provide access to the file credential. Add an active_cred argument to fo_stat() so that implementers have access to the active credential as well as the file credential. Generally modify callers of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which was redundantly provided via the fp argument. This set of modifications also permits threads to perform these operations on behalf of another thread without modifying their credential. Trickle this change down into fo_stat/poll() implementations: - badfo_poll(), badfo_stat(): modify/add arguments. - kqueue_poll(), kqueue_stat(): modify arguments. - pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to MAC checks rather than td->td_ucred. - soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather than cred to pru_sopoll() to maintain current semantics. - sopoll(): moidfy arguments. - vn_poll(), vn_statfile(): modify/add arguments, pass new arguments to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL() to maintian current semantics. - vn_close(): rename cred to file_cred to reflect reality while I'm here. - vn_stat(): Add active_cred and file_cred arguments to vn_stat() and consumers so that this distinction is maintained at the VFS as well as 'struct file' layer. Pass active_cred instead of td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics. - fifofs: modify the creation of a "filetemp" so that the file credential is properly initialized and can be used in the socket code if desired. Pass ap->a_td->td_ucred as the active credential to soo_poll(). If we teach the vnop interface about the distinction between file and active credentials, we would use the active credential here. Note that current inconsistent passing of active_cred vs. file_cred to VOP's is maintained. It's not clear why GETATTR would be authorized using active_cred while POLL would be authorized using file_cred at the file system level. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-16 12:52:03 +00:00
Robert Watson	9ca435893b	In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to cliarfy which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 20:55:08 +00:00
Alfred Perlstein	7f05b0353a	More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t.	2002-06-29 01:50:25 +00:00
Alfred Perlstein	802082390b	More caddr_t removal. Change struct knote's kn_hook from caddr_t to void *.	2002-06-29 00:29:12 +00:00
John Baldwin	f44d9e24fb	Change p_can{debug,see,sched,signal}()'s first argument to be a thread pointer instead of a proc pointer and require the process pointed to by the second argument to be locked. We now use the thread ucred reference for the credential checks in p_can*() as a result. p_canfoo() should now no longer need Giant.	2002-05-19 00:14:50 +00:00
Jeff Roberson	c897b81311	Remove references to vm_zone.h and switch over to the new uma API. Also, remove maxsockets. If you look carefully you'll notice that the old zone allocator never honored this anyway.	2002-03-20 04:09:59 +00:00
Jonathan Lemon	cd75bfa75f	Add entry for EVFILT_NETDEV, which was inadverdently omitted back in Sept.	2002-01-24 17:20:55 +00:00
Alfred Perlstein	a4db49537b	Replace ffind_* with fget calls. Make fget MPsafe. Make fgetvp and fgetsock use the fget subsystem to reduce code bloat. Push giant down in fpathconf().	2002-01-14 00:13:45 +00:00
Alfred Perlstein	426da3bcfb	SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file * fhold(struct file fp); / increments reference count on a file / struct file fhold_locked(struct file fp); / like fhold but expects file to locked / struct file ffind_hold(struct thread , int fd); / finds the struct file in thread, adds one reference and returns it unlocked / struct file ffind_lock(struct thread , int fd); / ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.	2002-01-13 11:58:06 +00:00
Alfred Perlstein	21d56e9c33	Make AIO a loadable module. Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO will use at_exit(9). Add functions at_exec(9), rm_at_exec(9) which function nearly the same as at_exec(9) and rm_at_exec(9), these functions are called on behalf of modules at the time of execve(2) after the image activator has run. Use a modified version of tegge's suggestion via at_exec(9) to close an exploitable race in AIO. Fix SYSCALL_MODULE_HELPER such that it's archetecuterally neutral, the problem was that one had to pass it a paramater indicating the number of arguments which were actually the number of "int". Fix it by using an inline version of the AS macro against the syscall arguments. (AS should be available globally but we'll get to that later.) Add a primative system for dynamically adding kqueue ops, it's really not as sophisticated as it should be, but I'll discuss with jlemon when he's around.	2001-12-29 07:13:47 +00:00
Matthew Dillon	b064d43d8f	remove holdfp() Replace uses of holdfp() with fget() or fgetvp() calls as appropriate introduce fget(), fget_read(), fget_write() - these functions will take a thread and file descriptor and return a file pointer with its ref count bumped. introduce fgetvp(), fgetvp_read(), fgetvp_write() - these functions will take a thread and file descriptor and return a vref()'d vnode. _read() requires that the file pointer be FREAD, _write that it be FWRITE. This continues the cleanup of struct filedesc and struct file access routines which, when are all through with it, will allow us to then make the API calls MP safe and be able to move Giant down into the fo_* functions.	2001-11-14 06:30:36 +00:00
Jonathan Lemon	0217f5c71e	Have EVFILT_TIMERS allocate their callouts via malloc() instead of using the static callout list allocated by the system. Change malloc type from M_TEMP to M_KQUEUE to better track memory. Add a kern.kq_calloutmax to globally limit the amount of kernel memory that can be allocated by callouts. Submitted by: iedowse (items 1, 2)	2001-09-29 17:48:39 +00:00
John Baldwin	ed01445d8f	Use the passed in thread to selrecord() instead of curthread.	2001-09-21 22:46:54 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Matthew Dillon	116734c4d1	Pushdown Giant for acct(), kqueue(), kevent(), execve(), fork(), vfork(), rfork(), jail().	2001-09-01 03:04:31 +00:00
Jonathan Lemon	5f5c2e958f	Introduce EVFILT_TIMER, which allows a process to establish an arbitrary number of timers, both oneshot and periodic. Repeatedly reminded to commit by: jayanth Reviewed by: peter (a while back)	2001-07-19 18:34:40 +00:00
Robert Watson	a0f75161f9	o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx(). The p_can(...) construct was a premature (and, it turns out, awkward) abstraction. The individual calls to p_canxxx() better reflect differences between the inter-process authorization checks, such as differing checks based on the type of signal. This has a side effect of improving code readability. o Replace direct credential authorization checks in ktrace() with invocation of p_candebug(), while maintaining the special case check of KTR_ROOT. This allows ktrace() to "play more nicely" with new mandatory access control schemes, as well as making its authorization checks consistent with other "debugging class" checks. o Eliminate "privused" construct for p_can*() calls which allowed the caller to determine if privilege was required for successful evaluation of the access control check. This primitive is currently unused, and as such, serves only to complicate the API. Approved by: ({procfs,linprocfs} changes) des Obtained from: TrustedBSD Project	2001-07-05 17:10:46 +00:00
Mark Murray	fb919e4d5a	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)	2001-05-01 08:13:21 +00:00
John Baldwin	33a9ed9d0e	Change the pfind() and zpfind() functions to lock the process that they find before releasing the allproc lock and returning. Reviewed by: -smp, dfr, jake	2001-04-24 00:51:53 +00:00
Robert Watson	e386f9bda3	o Make kqueue's filt_procattach() function use the error value returned by p_can(...P_CAN_SEE), rather than returning EACCES directly. This brings the error code used here into line with similar arrangements elsewhere, and prevents the leakage of pid usage information. Reviewed by: jlemon Obtained from: TrustedBSD Project	2001-04-12 21:32:02 +00:00
Jonathan Lemon	24607d88ed	Add an EV_SET() convenience macro for initializing struct kevent prior to the call to kevent(). Update the copyright notices as well.	2001-02-24 01:44:03 +00:00
Jonathan Lemon	89bbe051bb	Fix typo in comment (knode -> knote).	2001-02-23 20:32:42 +00:00
Jonathan Lemon	608a3ce62a	Extend kqueue down to the device layer. Backwards compatible approach suggested by: peter	2001-02-15 16:34:11 +00:00
John Baldwin	e5690aadaa	Proc locking.	2001-01-24 00:35:12 +00:00
Garrett Wollman	0a2c3d48c6	select() DKI is now in <sys/selinfo.h>.	2001-01-09 04:33:49 +00:00
David Malone	7cc0979fd6	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>	2000-12-08 21:51:06 +00:00
Matthew Dillon	279d722604	This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc... PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>	2000-11-18 21:01:04 +00:00
Robert Watson	387d2c036b	o Centralize inter-process access control, introducing: int p_can(p1, p2, operation, privused) which allows specification of subject process, object process, inter-process operation, and an optional call-by-reference privused flag, allowing the caller to determine if privilege was required for the call to succeed. This allows jail, kern.ps_showallprocs and regular credential-based interaction checks to occur in one block of code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL, and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a series of static function checks in kern_prot, which should not be invoked directly. o Commented out capabilities entries are included for some checks. o Update most inter-process authorization to make use of p_can() instead of manual checks, PRISON_CHECK(), P_TRESPASS(), and kern.ps_showallprocs. o Modify suser{,_xxx} to use const arguments, as it no longer modifies process flags due to the disabling of ASU. o Modify some checks/errors in procfs so that ENOENT is returned instead of ESRCH, further improving concealment of processes that should not be visible to other processes. Also introduce new access checks to improve hiding of processes for procfs_lookup(), procfs_getattr(), procfs_readdir(). Correct a bug reported by bp concerning not handling the CREATE case in procfs_lookup(). Remove volatile flag in procfs that caused apparently spurious qualifier warnigns (approved by bde). o Add comment noting that ktrace() has not been updated, as its access control checks are different from ptrace(), whereas they should probably be the same. Further discussion should happen on this topic. Reviewed by: bde, green, phk, freebsd-security, others Approved by: bde Obtained from: TrustedBSD Project	2000-08-30 04:49:09 +00:00

1 2 3 4

165 Commits