freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	9731998997	Move the convert_sigevent32() utility function into freebsd32_misc.c for consumption outside the vfs_aio.c. For SIGEV_THREAD_ID and SIGEV_SIGNAL notification delivery methods, also copy in the sigev_value, since librt event pumping loop compares note generation number with the value passed through sigev_value. Tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:33:48 +00:00
Gleb Smirnoff	6160e12c10	Add new system call - aio_mlock(). The name speaks for itself. It allows to perform the mlock(2) operation, which can consume a lot of time, under control of aio(4). Reviewed by: kib, jilles Sponsored by: Nginx, Inc.	2013-06-08 13:27:57 +00:00
Gleb Smirnoff	f95c13db04	Separate LIO_SYNC processing into a separate function aio_process_sync(), and rename aio_process() into aio_process_rw(). Reviewed by: kib Sponsored by: Nginx, Inc.	2013-06-08 13:02:43 +00:00
Konstantin Belousov	f3215a60fd	Fix a race with the vnode reclamation in the aio_qphysio(). Obtain the thread reference on the vp->v_rdev and use the returned struct cdev dev instead of using vp->v_rdev. Call dev_strategy_csw() instead of dev_strategy(), since we now own the reference. Since the csw was already calculated, test d_flags to avoid mapping the buffer if the driver supports unmapped requests []. Suggested by: kan [*] Reviewed by: kan (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-03-27 11:47:52 +00:00
Konstantin Belousov	e81ff91e62	Do not remap usermode pages into KVA for physio. Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-19 14:43:57 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Pawel Jakub Dawidek	2609222ab4	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
David Xu	d56e058a79	Add 32-bit compat code for AIO kevent flags introduced in revision 230857.	2012-02-05 04:49:31 +00:00
David Xu	fde809356a	If multiple threads call kevent() to get AIO events on same kqueue fd, it is possible that a single AIO event will be reported to multiple threads, it is not threading friendly, and the existing API can not control this behavior. Allocate a kevent flags field sigev_notify_kevent_flags for AIO event notification in sigevent, and allow user to pass EV_CLEAR, EV_DISPATCH or EV_ONESHOT to AIO kernel code, user can control whether the event should be cleared once it is retrieved by a thread. This change should be comptaible with existing application, because the field should have already been zero-filled, and no additional action will be taken by kernel. PR: kern/156567	2012-02-01 02:53:06 +00:00
Doug Ambrisko	8e9fc27818	When detaching an AIO or LIO requests grab the lock and tell knlist_remove that we have the lock now. This cleans up a locking panic ASSERT when knlist_empty is called without a lock when INVARIANTS etc. are turned. Reviewed by: kib jhb MFC after: 1 week	2012-01-30 19:19:22 +00:00
Gleb Smirnoff	94fce84763	Fix size check, that prevents getting negative after casting to a signed type Reviewed by: bde	2012-01-27 08:58:58 +00:00
Gleb Smirnoff	434ea137cc	Although aio_nbytes is size_t, later is is signed to casted types: to ssize_t in filesystem code and to int in buf code, thus supplying a negative argument leads to kernel panic later. To fix that check user supplied argument in the beginning of syscall. Submitted by: Maxim Dounin <mdounin mdounin.ru>, maxim@	2012-01-26 11:59:48 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Robert Watson	a9d2f8d84f	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc	2011-08-11 12:30:23 +00:00
David Xu	cf7d9a8ca8	Create a global thread hash table to speed up thread lookup, use rwlock to protect the table. In old code, thread lookup is done with process lock held, to find a thread, kernel has to iterate through process and thread list, this is quite inefficient. With this change, test shows in extreme case performance is dramatically improved. Earlier patch was reviewed by: jhb, julian	2010-10-09 02:50:23 +00:00
Konstantin Belousov	723d37c0ac	Convert aio syscall registration to SYSCALL_INIT_HELPER. Reviewed by: jhb MFC after: 2 weeks	2010-03-19 11:11:34 +00:00
Nathan Whitehorn	841c0c7ec7	Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb	2010-03-11 14:49:06 +00:00
Robert Watson	e76d823b81	Use C99 initialization for struct filterops. Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks	2009-09-12 20:03:45 +00:00
Konstantin Belousov	d8b0556c6d	Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use vnode interlock to protect the knote fields [1]. The locking assumes that shared vnode lock is held, thus we get exclusive access to knote either by exclusive vnode lock protection, or by shared vnode lock + vnode interlock. Do not use kl_locked() method to assert either lock ownership or the fact that curthread does not own the lock. For shared locks, ownership is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared lock not owned by curthread, causing false positives in kqueue subsystem assertions about knlist lock. Remove kl_locked method from knlist lock vector, and add two separate assertion methods kl_assert_locked and kl_assert_unlocked, that are supposed to use proper asserts. Change knlist_init accordingly. Add convenience function knlist_init_mtx to reduce number of arguments for typical knlist initialization. Submitted by: jhb [1] Noted by: jhb [2] Reviewed by: jhb Tested by: rnoland	2009-06-10 20:59:32 +00:00
John Baldwin	74fb0ba732	Rework socket upcalls to close some races with setup/teardown of upcalls. - Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held; however, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then the soisconnected() will be invoked on the socket after the socket buffer lock is dropped. - A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear(). - To simplify locking, each socket buffer now has a separate upcall. - When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that a SO_SND upcall should never return SU_ISCONNECTED. - All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket. - The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowlege of socket upcall internals aside from their return value. - The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked. - The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake() which immediately re-acquired the socket buffer lock for the duration of the function call. Discussed with: rwatson, rmacklem	2009-06-01 21:17:03 +00:00
John Baldwin	e588eeb1dc	Use the correct type for the timeout parameter to the 32-bit compat version aio_waitcomplete(). Reminded by: bz Submitted by: jamie MFC after: 3 days	2009-01-23 13:23:17 +00:00
John Baldwin	3858a1f4f5	- Add 32-bit compat system calls for VFS_AIO. The system calls live in the aio code and are registered via the recently added SYSCALL32_*() helpers. - Since the aio code likes to invoke fuword and suword a lot down in the "bowels" of system calls, add a structure holding a set of operations for things like storing errors, copying in the aiocb structure, storing status, etc. The 32-bit system calls use a separate operations vector to handle fuword32 vs fuword, etc. Also, the oldsigevent handling is now done by having seperate operation vectors with different aiocb copyin routines. - Split out kern_foo() functions for the various AIO system calls so the 32-bit front ends can manage things like copying in and converting timespec structures, etc. - For both the native and 32-bit aio_suspend() and lio_listio() calls, just use copyin() to read the array of aiocb pointers instead of using a for loop that iterated over fuword/fuword32. The error handling in the old case was incomplete (lio_listio() just ignored any aiocb's that it got an EFAULT trying to read rather than reporting an error), and possibly slower. MFC after: 1 month	2008-12-10 20:56:19 +00:00
Oleksandr Tymoshenko	22035f4727	Use minimum of max_aio_procs and target_aio_procs when spawning new aiod since there should be no more then max_aio_procs processes.	2008-06-21 11:34:34 +00:00
Robert Watson	e603be7ada	Use FEATURE() macro to advertise aio availability.	2008-02-01 11:59:14 +00:00
Jean-Sébastien Pédron	a8afa221cc	When asked to use kqueue, AIO stores its internal state in the `kn_sdata' member of the newly registered knote. The problem is that this member is overwritten by a call to kevent(2) with the EV_ADD flag, targetted at the same kevent/knote. For instance, a userland application may set the pointer to NULL, leading to a panic. A testcase was provided by the submitter. PR: kern/118911 Submitted by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp> MFC after: 1 day	2008-01-24 17:10:19 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
Julian Elischer	3745c395ec	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. Thos makes way for us to add REAL kthread_create() and friends that actually make theads. it turns out that most of these calls actually end up being moved back to the thread version when it's added. but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
Konstantin Belousov	5114048b63	Destroy the kaio_mtx on the freeing the struct kaioinfo in the aio_proc_rundown. Do not allow for zero-length read to be passed to the fo_read file method by aio. Reported and tested by: Peter Holm Approved by: re (kensmith)	2007-08-20 11:53:26 +00:00
Matt Jacob	a659386c7e	Remove unused variable.	2007-06-10 01:50:05 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Robert Watson	873fbcd776	Further system call comment cleanup: - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.	2007-03-05 13:10:58 +00:00
Tom Rhodes	6aeb05d7be	Merge posix4/* into normal kernel hierarchy. Reviewed by: glanced at by jhb Approved by: silence on -arch@ and -standards@	2006-11-11 16:26:58 +00:00
Alexander Leidinger	6a1162d4cd	MFP4 (with some minor changes): Implement the linux_io_* syscalls (AIO). They are only enabled if the native AIO code is available (either compiled in to the kernel or as a module) at the time the functions are used. If the AIO stuff is not available there will be a ENOSYS. From the submitter: ---snip--- DESIGN NOTES: 1. Linux permits a process to own multiple AIO queues (distinguished by "context"), but FreeBSD creates only one single AIO queue per process. My code maintains a request queue (STAILQ of queue(3)) per "context", and throws all AIO requests of all contexts owned by a process into the single FreeBSD per-process AIO queue. When the process calls io_destroy(2), io_getevents(2), io_submit(2) and io_cancel(2), my code can pick out requests owned by the specified context from the single FreeBSD per-process AIO queue according to the per-context request queues maintained by my code. 2. The request queue maintained by my code stores contrast information between Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks (struct aiocb). FreeBSD IO control block actually exists in userland memory space, required by FreeBSD native aio_XXXXXX(2). 3. It is quite troubling that the function io_getevents() of libaio-0.3.105 needs to use Linux-specific "struct aio_ring", which is a partial mirror of context in user space. I would rather take the address of context in kernel as the context ID, but the io_getevents() of libaio forces me to take the address of the "ring" in user space as the context ID. To my surprise, one comment line in the file "io_getevents.c" of libaio-0.3.105 reads: Ben will hate me for this REFERENCE: 1. Linux kernel source code: http://www.kernel.org/pub/linux/kernel/v2.6/ (include/linux/aio_abi.h, fs/aio.c) 2. Linux manual pages: http://www.kernel.org/pub/linux/docs/manpages/ (io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2)) 3. Linux Scalability Effort: http://lse.sourceforge.net/io/aio.html The design notes: http://lse.sourceforge.net/io/aionotes.txt 4. The package libaio, both source and binary: http://rpmfind.net/linux/rpm2html/search.php?query=libaio Simple transparent interface to Linux AIO system calls. 5. Libaio-oracle: http://oss.oracle.com/projects/libaio-oracle/ POSIX AIO implementation based on Linux AIO system calls (depending on libaio). ---snip--- Submitted by: Li, Xiao <intron@intron.ac>	2006-10-15 14:22:14 +00:00
John-Mark Gurney	4db71d27a1	hide kqueue_register from public view, and replace it w/ kqfd_register... this eliminates a possible race in aio registering a kevent..	2006-09-24 04:47:47 +00:00
Mark Peek	f6d004d510	Remove call to fdfree() for the AIO daemons to prevent kernel panics with linprocfs. This call is not needed since file descriptor sharing was removed in v1.125. Reviewed by: alc, davidxu, ambrisko MFC after: 3 days	2006-09-06 15:11:20 +00:00
Alexander Leidinger	993182e57c	- Change process_exec function handlers prototype to include struct image_params arg. - Change struct image_params to include struct sysentvec pointer and initialize it. - Change all consumers of process_exit/process_exec eventhandlers to new prototypes (includes splitting up into distinct exec/exit functions). - Add eventhandler to userret. Sponsored by: Google SoC 2006 Submitted by: rdivacky Parts suggested by: jhb (on hackers@)	2006-08-15 12:10:57 +00:00
Doug Ambrisko	51e37c7f37	Make lio ident more consistant with aio ident.	2006-06-02 17:45:48 +00:00
David Xu	759ccccadb	Use a dedicated mutex to protect aio queues, the movation is to reduce lock contention with other parts.	2006-05-09 00:10:11 +00:00
David Xu	dbbccfe923	1. Move code for scanning pending I/O from aio_fsync to aio_aqueue, it has less overhead. 2. Avoid scheduling task if maximum number of I/O threads is reached.	2006-03-24 00:50:06 +00:00
David Xu	99eee864ad	Implement aio_fsync() syscall.	2006-03-23 08:46:42 +00:00
David Xu	27b8220d12	1. Remove aio entry from lists earlier in aio_free_entry, so other threads can not see it if we unlock the proc lock (this can happen in knlist_delete). Don't do wakeup, it is not necessary. 2. Decrease kaio_buffer_count in biohelper rather than doing it in aio_bio_done_notify. 3. In aio_bio_done_notify, don't send notification if KAIO_RUNDOWN was set, because the process is already in single thread mode. 4. Use assignment to initialize aiothreadflags. 5. AIOCBLIST_RUNDOWN is not useful, axe the code using it. 6. use LIO_NOP instead of zero.	2006-02-26 12:56:23 +00:00
David Xu	ad8de0f243	If block size is zero, use normal file operations to do I/O, this eliminates a divided-by-zero fault. Recommended by: phk	2006-02-22 00:05:12 +00:00
David Xu	6d53aa6297	Just like dofilewrite(), call bwillwrite before fo_write.	2006-01-27 08:02:25 +00:00
David Xu	03d66b36c7	return final error code in aio_return rather than a hardcoded 0.	2006-01-27 04:14:16 +00:00
David Xu	55a122bf28	in aio_aqueue, store same return code into job->_aiocb_private.error. in aio_return, unlock proc lock before suword.	2006-01-26 08:37:02 +00:00
David Xu	1aa4c324ee	Add locking annotation and comments about socket, pipe, fifo problem. Temporarily fix a locking problem for socket I/O.	2006-01-24 07:24:24 +00:00
David Xu	e6bdc05ff7	Er, rescure a deleted comment line.	2006-01-24 02:50:42 +00:00
David Xu	bd793be3c6	More cleanup for aio code: 1) unregsiter kqueue filter for EVFILT_LIO. 2) free uma_zones. 3) call setsid directly to enter another session rather than implementing by itself. Submitted by: jhb	2006-01-24 02:46:15 +00:00
David Xu	7f34b521c7	Add bracket.	2006-01-23 23:46:30 +00:00
David Xu	68d7111884	Verify all supported notification types.	2006-01-23 10:27:15 +00:00
David Xu	a9bf5e37ae	1) Merge _aio_aqueue and aio_aqueue, check quota in aio_aqueue, so that lio_listio won't exceed the quota. 2) Remove lio_ref_count, it is no longer used.	2006-01-23 02:49:34 +00:00
David Xu	8c0d9af5bf	Fix a bogus panic.	2006-01-22 09:39:59 +00:00
David Xu	9b84335c84	Decrease kaio_active_count first, because user process may go away after we notified it.	2006-01-22 09:25:52 +00:00
David Xu	1ce9182407	Make aio code MP safe.	2006-01-22 05:59:27 +00:00
Christian S.J. Peron	8213baf002	Initialize ki to p->p_aioinfo after we know it's going to be referencing a valid kaioinfo structure. This avoids a potential NULL pointer dereference. Found with: Coverity Prevent(tm) MFC after: 2 weeks	2006-01-15 01:55:45 +00:00
John Baldwin	af56abaab5	Return error from fget_write() rather than hardcoding EBADF now that fget_write() DTRT. Requested by: bde	2006-01-06 16:34:22 +00:00
David Xu	323fe56580	In aio_waitcomplete, do not return EAGAIN if no other threads have started aio, instead, initialize aio management structure if it hasn't been done, the reason to adjust this behavior is to make it a bit friendly for threaded program, consider two threads, one submits aio_write, and another just calls aio_waitcomplete to wait any I/O to be completed and recycle the aio requests, before submitter doing any I/O, the recycler wants to wait in kernel. This also fixes inconsistency with other aio syscalls.	2005-11-08 23:48:32 +00:00
John Baldwin	2a522eb9d3	Various and sundry cleanups: - Use curthread for calls to knlist_delete() and add a big comment explaining why as well as appropriate assertions. - Use TAILQ_FOREACH and TAILQ_FOREACH_SAFE instead of handrolling them. - Use fget() family of functions to lookup file objects instead of grovelling around in file descriptor tables. - Destroy the aio_freeproc mutex if we are unloaded. Tested on: i386	2005-11-08 17:43:05 +00:00
David Xu	8f0371f19d	Fix name compatible problem with POSIX standard. the sigval_ptr and sigval_int really should be sival_ptr and sival_int. Also sigev_notify_function accepts a union sigval value but not a pointer.	2005-11-04 09:41:00 +00:00
David Xu	4c0fb2cfff	Support sending realtime signal information via signal queue, realtime signal memory is pre-allocated, so kernel can always notify user code.	2005-11-03 05:25:26 +00:00
John Baldwin	68a17869c1	Push down Giant into fdfree() and remove it from two of the callers. Other callers such as some rfork() cases weren't locking Giant anyway. Reviewed by: csjp MFC after: 1 week	2005-11-01 17:13:05 +00:00
David Xu	0972628aff	Fix sigevent's POSIX incompatible problem by adding member fields sigev_notify_function and sigev_notify_attributes. AIO syscalls use sigevent, so they have to be adjusted. Reviewed by: alc	2005-10-30 02:12:49 +00:00
Doug Ambrisko	db43cd0417	Fix tinderbox box by removing incomplete/bad spl usage. Proper giant free locking is required in for aio. Pointed out by: imp	2005-10-12 22:33:22 +00:00
Doug Ambrisko	69cd28dacb	Add in kqueue support to LIO event notification and fix how it handled notifications when LIO operations completed. These were the problems with LIO event complete notification: - Move all LIO/AIO event notification into one general function so we don't have bugs in different data paths. This unification got rid of several notification bugs one of which if kqueue was used a SIGILL could get sent to the process. - Change the LIO event accounting to count all AIO request that could have been split across the fast path and daemon mode. The prior accounting only kept track of AIO op's in that mode and not the entire list of operations. This could cause a bogus LIO event complete notification to occur when all of the fast path AIO op's completed and not the AIO op's that ended up queued for the daemon. Suggestions from: alc	2005-10-12 17:51:31 +00:00
Alan Cox	ec9c9e7363	Eliminate inconsistency in the setting of the B_DONE flag. Specifically, make the b_iodone callback responsible for setting it if it is needed. Previously, it was set unconditionally by bufdone() without holding whichever lock is shared by the b_iodone callback and the corresponding top-half function. Consequently, in a race, the top-half function could conclude that operation was done before the b_iodone callback finished. See, for example, aio_physwakeup() and aio_fphysio(). Note: I don't believe that the other, more widely-used b_iodone callbacks are affected. Discussed with: jeff Reviewed by: phk MFC after: 2 weeks	2005-07-20 19:06:06 +00:00
Suleiman Souhlal	571dcd15e2	Fix the recent panics/LORs/hangs created by my kqueue commit by: - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone	2005-07-01 16:28:32 +00:00
Alan Cox	b490cc72b2	In lio_listio(2) change jobref from an int to a long so that lio_listio(LIO_WAIT, ...) works correctly on 64-bit architectures. Reviewed by: tegge	2005-06-07 05:28:21 +00:00
Alan Cox	67b95a95eb	Eliminate an unused field from struct aio_liojob.	2005-06-05 05:41:48 +00:00
Alan Cox	bbe7bbdfee	Eliminate the original method of requesting notification of aio_read(2) and aio_write(2) completion through kevent(2). This method does not work on 64-bit architectures. It was deprecated in FreeBSD 4.4. See revisions 1.87 and 1.70.2.7. Change aio_physwakeup() to call psignal(9) directly rather than indirectly through a timeout(9). Discussed with: bde Correct a bug introduced in revision 1.65 that could result in premature delivery of a signal if an lio_listio(2) consisted of a mixture of direct/raw and queued I/O operations. Observed by: tegge Eliminate a field from struct kaioinfo that is now unused. Reviewed by: tegge	2005-06-04 19:16:33 +00:00
Alan Cox	3769f562e2	Synchronize access to the per process aiocb lists in many of the functions.	2005-06-03 05:27:20 +00:00
Alan Cox	e293dc860c	In aio_waitcomplete() correct two cases of using an aiocb after freeing it.	2005-06-02 23:14:38 +00:00
Alan Cox	3148c2c96a	Synchronize access to aio_freeproc with a mutex. Eliminate related spl calls. Reduce the scope of Giant in aio_daemon().	2005-05-30 22:26:34 +00:00
Alan Cox	3999ebe3b6	Use the proc mtx to prevent simultaneous changes to p_aioinfo.	2005-05-30 19:33:33 +00:00
Alan Cox	8285135020	Eliminate unnecessary calls to wakeup(); no one sleeps on &aio_freeproc. Eliminate an unused flag, AIOP_SCHED; it's cleared but never set.	2005-05-30 18:02:00 +00:00
Alan Cox	95eca142ec	Eliminate aio_activeproc; it's unused.	2005-05-30 05:25:10 +00:00
Alan Cox	8484b5e66c	Eliminate aio_bufjobs; it's unused.	2005-05-29 21:29:15 +00:00
Jeff Roberson	a230c79b9e	- Acquire Giant in AIO's iodone routine. VFS will no longer do it for us soon. Sponsored by: Isilon Systems, Inc.	2005-04-30 11:27:31 +00:00
John-Mark Gurney	c4c44d2935	fix aio+kq... I've been running ambrisko's test program for much longer w/o problems than I was before... This simply brings back the knote_delete as knlist_delete which will also drop the knote's, instead of just clearing the list and seeing _ONESHOT... Fix a race where if a note was _INFLUX and _DETACHED, it could end up being modified... whoopse.. MFC after: 1 week Prodded by: ambrisko and dwhite	2005-03-18 01:11:39 +00:00
Poul-Henning Kamp	5ece08f57a	Make a SYSCTL_NODE static	2005-02-10 12:23:29 +00:00
Warner Losh	9454b2d864	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
Poul-Henning Kamp	c569065139	Remove buf->b_dev field.	2004-11-04 07:59:57 +00:00
Poul-Henning Kamp	6afb3b1c37	Give dev_strategy() an explict cdev argument in preparation for removing buf->b-dev. Put a bio between the buf passed to dev_strategy() and the device driver strategy routine in order to not clobber fields in the buf. Assert copyright on vfs_bio.c and update copyright message to canonical text. There is no legal difference between John Dysons two-clause abbreviated BSD license and the canonical text.	2004-10-29 07:16:37 +00:00
Poul-Henning Kamp	5d9d81e7ea	Put the I/O block size in bufobj->bo_bsize. We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth when filesystems sit on GEOM, so don't bother for now.	2004-10-26 07:39:12 +00:00
Alfred Perlstein	576c004fb9	cover soreadable and sowriteable with the corresponding socketbuffer locks.	2004-10-01 05:54:06 +00:00
Poul-Henning Kamp	1a52a73d68	Eliminate DEV_STRATEGY() macro: call dev_strategy() directly. Make dev_strategy() handle errors and departing devices properly.	2004-09-23 14:45:04 +00:00
Robert Watson	b6ac582880	Tag AIO as requiring Giant over the network stack using NET_NEEDS_GIANT(). RELENG_5 candidate.	2004-09-03 03:19:14 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
John-Mark Gurney	ac77164d64	clean up whitespace...	2004-08-13 17:43:53 +00:00
Alan Cox	1a276a3f91	- Use atomic ops for updating the vmspace's refcnt and exitingcnt. - Push down Giant into shmexit(). (Giant is acquired only if the vmspace contains shm segments.) - Eliminate the acquisition of Giant from proc_rwmem(). - Reduce the scope of Giant in exit1(), uncovering the destruction of the address space.	2004-07-27 03:53:41 +00:00
Robert Watson	9535efc00d	Merge additional socket buffer locking from rwatson_netperf: - Lock down low hanging fruit use of sb_flags with socket buffer lock. - Lock down low hanging fruit use of so_state with socket lock. - Lock down low hanging fruit use of so_options. - Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with socket buffer lock. - Annotate situations in which we unlock the socket lock and then grab the receive socket buffer lock, which are currently actually the same lock. Depending on how we want to play our cards, we may want to coallesce these lock uses to reduce overhead. - Convert a if()->panic() into a KASSERT relating to so_state in soaccept(). - Remove a number of splnet()/splx() references. More complex merging of socket and socket buffer locking to follow.	2004-06-17 22:48:11 +00:00
Poul-Henning Kamp	77409fe148	Add missing #include <sys/module.h>	2004-05-30 20:34:58 +00:00
Peter Wemm	a5bdcb2a2f	Make the process_exit eventhandler run without Giant. Add Giant hooks in the two consumers that need it.. processes using AIO and netncp. Update docs. Say that process_exec is called with Giant, but not to depend on it. All our consumers can handle it without Giant.	2004-03-14 02:06:28 +00:00
Poul-Henning Kamp	00cbe31bd8	Send B_PHYS out to pasture, it no longer serves any function.	2003-11-15 09:28:09 +00:00
John-Mark Gurney	0eb3b7bb7f	don't allow reading from files that haven't been open'd for reading.	2003-10-24 21:07:53 +00:00
Hidetoshi Shimokawa	a44ca4f05f	We need to initialize bp->b_offset and bp->b_iooffset becuase bp->b_blkno is ignored now.	2003-10-21 13:18:19 +00:00
Hidetoshi Shimokawa	8edbaf859d	Fix asynchronous physio breakage introduced in rev 1.163. We cannnot use bp->b_caller2 because DEV_STRATEGY will overwrite it.	2003-09-10 15:48:51 +00:00
Poul-Henning Kamp	3b6d965263	Add a f_vnode field to struct file. Several of the subtypes have an associated vnode which is used for stuff like the f*() functions. By giving the vnode a speparate field, a number of checks for the specific subtype can be replaced simply with a check for f_vnode != NULL, and we can later free f_data up to subtype specific use. At this point in time, f_data still points to the vnode, so any code I might have overlooked will still work.	2003-06-22 08:41:43 +00:00
Poul-Henning Kamp	e725c18c3a	Get rid of the b_spc specialty field in struct buf by using an already available caller private field.	2003-06-16 07:18:39 +00:00

1 2 3 4 5 ...

311 Commits