freebsd-skq

Author	SHA1	Message	Date
John Baldwin	31bd8c7dcf	Comment nits.	2009-05-20 18:36:17 +00:00
John Baldwin	63ba66086a	Put the vnode returned from namei() immediately after namei() returns in svr4_sys_resolvepath().	2009-05-20 18:25:16 +00:00
Ed Schouten	e5a34e5582	Regenerate system call tables to use SVN ids.	2009-05-08 20:16:04 +00:00
Ed Schouten	51e6ba0f00	Burn TTY ioctl bridges in compat layers. I really don't want any pieces of code to include ioctl_compat.h, so let the ibcs2 and svr4 compat leave sgtty alone. If they want to support sgtty, they should emulate it on top of termios, not sgtty. The code has been marked with BURN_BRIDGES for a long time. ibcs2 and svr4 are not really popular pieces of code anyway.	2009-05-08 20:06:37 +00:00
Marko Zec	29b02909eb	Introduce a new virtualization container, provisionally named vprocg, to hold virtualized instances of hostname and domainname, as well as a new top-level virtualization struct vimage, which holds pointers to struct vnet and struct vprocg. Struct vprocg is likely to become replaced in the near future with a new jail management API import. As a consequence of this change, change struct ucred to point to a struct vimage, instead of directly pointing to a vnet. Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage branch. Permit kldload / kldunload operations to be executed only from the default vimage context. This change should have no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz Approved by: julian (mentor)	2009-05-08 14:11:06 +00:00
Dmitry Chagin	32c01de21c	Implement new way of branding ELF binaries by looking to a ".note.ABI-tag" section. The search order of a brand is changed, now first of all the ".note.ABI-tag" is looked through. Move code which fetch osreldate for ELF binary to check_note() handler. PR: 118473 Approved by: kib (mentor)	2009-03-13 16:40:51 +00:00
Bjoern A. Zeeb	33553d6e99	For all files including net/vnet.h directly include opt_route.h and net/route.h. Remove the hidden include of opt_route.h and net/route.h from net/vnet.h. We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong. This does not change the list of files which depend on opt_route.h but we can identify them now more easily.	2009-02-27 14:12:05 +00:00
John Baldwin	ea77ff0a15	Use shared vnode locks when invoking VOP_READDIR(). MFC after: 1 month	2009-02-13 18:18:14 +00:00
Ed Schouten	a4611ab612	Last step of splitting up minor and unit numbers: remove minor(). Inside the kernel, the minor() function was responsible for obtaining the device minor number of a character device. Because we made device numbers dynamically allocated and independent of the unit number passed to make_dev() a long time ago, it was actually a misnomer. If you really want to obtain the device number, you should use dev2udev(). We already converted all the drivers to use dev2unit() to obtain the device unit number, which is still used by a lot of drivers. I've noticed not a single driver passes NULL to dev2unit(). Even if they would, its behaviour would make little sense. This is why I've removed the NULL check. Ths commit removes minor(), minor2unit() and unit2minor() from the kernel. Because there was a naming collision with uminor(), we can rename umajor() and uminor() back to major() and minor(). This means that the makedev(3) manual page also applies to kernel space code now. I suspect umajor() and uminor() isn't used that often in external code, but to make it easier for other parties to port their code, I've increased __FreeBSD_version to 800062.	2009-01-28 17:57:16 +00:00
Konstantin Belousov	6f3475454e	Remove two remnant uses of AT_DEBUG.	2008-12-17 13:13:35 +00:00
Bjoern A. Zeeb	4b79449e2f	Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation	2008-12-02 21:37:28 +00:00
Konstantin Belousov	b4cf0e62f4	Add sv_flags field to struct sysentvec with intention to provide description of the ABI of the currently executing image. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures to determine ABI features. Discussed with: dchagin, imp, jhb, peter	2008-11-22 12:36:15 +00:00
Marko Zec	8b615593fc	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(). () netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-10-02 15:37:58 +00:00
Konstantin Belousov	a8d403e102	Change the static struct sysentvec and struct Elf_Brandinfo initializers to the C99 style. At least, it is easier to read sysent definitions that way, and search for the actual instances of sigcode etc. Explicitely initialize sysentvec.sv_maxssiz that was missed in most sysvecs. No objection from: jhb MFC after: 1 month	2008-09-24 10:14:37 +00:00
Ed Schouten	7969b32c44	Allow COMPAT_SVR4 to be built without COMPAT_43. It seems we only depend on COMPAT_43 to implement the send() and recv() routines. We can easily implement them using sendto() and recvfrom(), just like we do inside our very own C library. I wasn't able to really test it, apart from simple compilation testing. I've heard rumours that COMPAT_SVR4 is broken inside execve() anyway. It's still worth to fix this, because I suspect we'll get rid of COMPAT_43 somewhere in the future... Reviewed by: rdivacky Discussed with: jhb	2008-09-15 15:09:35 +00:00
Attilio Rao	0359a12ead	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
Julian Elischer	576c43c844	We left out V_static_len from ip_fw2.c (also a whitespace diff that i'd rahter fix her ethan break in the vimage branch.)	2008-08-25 05:38:18 +00:00
Bjoern A. Zeeb	603724d3ab	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch	2008-08-17 23:27:27 +00:00
Robert Watson	4f7d1876d5	Introduce a new lock, hostname_mtx, and use it to synchronize access to global hostname and domainname variables. Where necessary, copy to or from a stack-local buffer before performing copyin() or copyout(). A few uses, such as in cd9660 and daemon_saver, remain under-synchronized and will require further updates. Correct a bug in which a failed copyin() of domainname would leave domainname potentially corrupted. MFC after: 3 weeks	2008-07-05 13:10:10 +00:00
Roman Divacky	cabce2bf19	The vmspace->vm_daddr is constant until freed, there is no need to hold lock while accessing it. Approved by: kib (mentor)	2008-04-21 21:24:08 +00:00
Konstantin Belousov	48b05c3f82	Implement the linux syscalls openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat, renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat. Submitted by: rdivacky Sponsored by: Google Summer of Code 2007 Tested by: pho	2008-04-08 09:45:49 +00:00
Konstantin Belousov	57b4252e45	Add the support for the AT_FDCWD and fd-relative name lookups to the namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:01:21 +00:00
Doug Rabson	dfdcada31e	Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf. Highlights include: * Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts. * Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation. * Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux. * Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket. * Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock. * Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers. Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks	2008-03-26 15:23:12 +00:00
Antoine Brodin	afe5acff1b	Simplify fcntl(SVR4_F_DUP2FD) code now that FreeBSD has F_DUP2FD. Approved by: rwatson (mentor)	2008-03-17 18:27:28 +00:00
Jeff Roberson	66257bc8d9	- The P_SA flag has been removed. Don't reference it in a KASSERT.	2008-03-12 22:17:06 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
Jeff Roberson	397c19d175	Remove explicit locking of struct file. - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho	2007-12-30 01:42:15 +00:00
Konstantin Belousov	89b57fcf01	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb	2007-11-05 11:36:16 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Robert Watson	32f9753cfb	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project	2007-06-12 00:12:01 +00:00
Attilio Rao	a1fe14bc33	rufetch and calcru sometimes should be called atomically together. This patch fixes places where they should be called atomically changing their locking requirements (both assume per-proc spinlock held) and introducing rufetchcalc which wrappers both calls to be performed in atomic way. Reviewed by: jeff Approved by: jeff (mentor)	2007-06-09 21:48:44 +00:00
Attilio Rao	a140976eb4	The current rusage code show peculiar problems: - Unsafeness on ruadd() in thread_exit() - Unatomicity of thread_exiit() in the exit1() operations This patch addresses these problems allocating p_fd as part of the process and modifying the way it is accessed. A small chunk of this patch, resolves a race about p_state in kern_wait(), since we have to be sure about the zombif-ing process. Submitted by: jeff Approved by: jeff (mentor)	2007-06-09 18:56:11 +00:00
Jeff Roberson	982d11f836	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
Robert Watson	5e3f7694b1	Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are imported for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff	2007-04-04 09:11:34 +00:00
Ruslan Ermilov	9f70620442	Regen. Forgotten by: trhodes	2006-11-11 21:49:08 +00:00
Robert Watson	acd3428b7d	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Robert Watson	0ee913128d	Remove two hypothetical calls to suser() in ifdef'd (and uncompilable) svr4 code: this code would call centralized sysctl code that does these checks also. MFC after: 1 week Obtained from: TrustedBSD Project Sponsored by: nCircle Network Security, Inc.	2006-09-02 08:18:22 +00:00
John Baldwin	f8f1f7fb85	Regen to propogate <prefix>_AUE_<mumble> changes as well as the earlier systrace changes.	2006-08-15 17:37:01 +00:00
John Baldwin	df78f6d313	- Remove unused sysvec variables from various syscalls.conf. - Send the systrace_args files for all the compat ABIs to /dev/null for now. Right now makesyscalls.sh generates a file with a hardcoded function name, so it wouldn't work for any of the ABIs anyway. Probably the function name should be configurable via a 'systracename' variable and the functions should be stored in a function pointer in the sysvec structure.	2006-08-15 17:25:55 +00:00
Robert Watson	3d0685834f	With socket code no longer in svr4_stream.c, MAC includes are no longer required, so GC.	2006-08-05 22:04:21 +00:00
Brooks Davis	012759b743	Use TAILQ_EMPTY instead of checking if TAILQ_FIRST is NULL.	2006-08-04 21:15:09 +00:00
John Baldwin	91ce2694d1	Regen for MPSAFE flag removal.	2006-07-28 19:08:37 +00:00
John Baldwin	af5bf12239	Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to mark system calls as being MPSAFE: - Stop conditionally acquiring Giant around system call invocations. - Remove all of the 'M' prefixes from the master system call files. - Remove support for the 'M' prefix from the script that generates the syscall-related files from the master system call files. - Don't explicitly set SYF_MPSAFE when registering nfssvc.	2006-07-28 19:05:28 +00:00
John Baldwin	78371ec202	Regen.	2006-07-28 16:56:44 +00:00
John Baldwin	95e7d19dfa	- Explicitly lock Giant to protect the fields in the svr4_strm structure except for s_family (which is read-only once after it is set when the structure is created). - Mark svr4_sys_ioctl(), svr4_sys_getmsg(), and svr4_sys_putmsg() MPSAFE.	2006-07-28 16:56:17 +00:00
John Baldwin	f30e89ced3	Fix a file descriptor race I reintroduced when I split accept1() up into kern_accept() and accept1(). If another thread closed the new file descriptor and the first thread later got an error trying to copyout the socket address, then it would attempt to close the wrong file object. To fix, add a struct file ** argument to kern_accept(). If it is non-NULL, then on success kern_accept() will store a pointer to the new file object there and not release any of the references. It is up to the calling code to drop the references appropriately (including a call to fdclose() in case of error to safely handle the aforementioned race). While I'm at it, go ahead and fix the svr4 streams code to not leak the accept fd if it gets an error trying to copyout the streams structures.	2006-07-27 19:54:41 +00:00
John Baldwin	3ce72960e8	Regen.	2006-07-21 20:41:33 +00:00
John Baldwin	e0569c0798	Clean up the svr4 socket cache and streams code some to make it more easily locked. - Move all the svr4 socket cache code into svr4_socket.c, specifically move svr4_delete_socket() over from streams.c. Make the socket cache entry structure and svr4_head private to svr4_socket.c as a result. - Add a mutex to protect the svr4 socket cache. - Change svr4_find_socket() to copy the sockaddr_un struct into a caller-supplied sockaddr_un rather than giving the caller a pointer to our internal one. This removes the one case where code outside of svr4_socket.c could access data in the cache. - Add an eventhandler for process_exit and process_exec to purge the cache of any entries for the exiting or execing process. - Add methods to init and destroy the socket cache and call them from the svr4 ABI module's event handler. - Conditionally grab Giant around socreate() in streamsopen(). - Use fdclose() instead of inlining it in streamsopen() when handling socreate() failure. - Only allocate a stream structure and attach it to a socket in streamsopen(). Previously, if a svr4 program performed a stream operation on an arbitrary socket not opened via the streams device, we would attach streams state data to it and change f_ops of the associated struct file while it was in use. The latter was especially not safe, and if a program wants a stream object it should open it via the streams device anyway. - Don't bother locking so_emuldata in the streams code now that we only touch it right after creating a socket (in streamsopen()) or when tearing it down when the file is closed. - Remove D_NEEDGIANT from the streams device as it is no longer needed.	2006-07-21 20:40:13 +00:00
John Baldwin	52d639a953	Add conditional VFS Giant locking to svr4_sys_fchroot() and mark it MPSAFE. Also, call change_dir() instead of doing part of it inline (this now adds a mac_check_vnode_chdir() call) to match fchdir() and call mac_check_vnode_chroot() to match chroot(). Also, use the change_root() function to do the actual change root to match chroot(). Reviewed by: rwatson	2006-07-21 20:28:56 +00:00
John Baldwin	7cf6a457ea	Regen.	2006-07-19 19:03:21 +00:00
John Baldwin	a1ca3e0ba7	Add conditional VFS Giant locking to svr4_sys_resolvepath() and mark it MPSAFE.	2006-07-19 19:03:03 +00:00
John Baldwin	a3616b117c	Make svr4_sys_waitsys() a lot less ugly and mark it MPSAFE. - If the WNOWAIT flag isn't specified and either of WEXITED or WTRAPPED is set, then just call kern_wait() and let it do all the work. This means that this function no longer has to duplicate the work to teardown zombies that is done in kern_wait(). Instead, if the above conditions aren't true, then it uses a simpler loop to implement WNOWAIT and/or tracing for only stopped or continued processes. This function still has to duplicate code from kern_wait() for the latter two cases, but those are much simpler. - Sync the code to handle the WCONTINUED and WSTOPPED cases with the equivalent code in kern_wait(). - Fix several places that would return with the proctree lock still held. - Lock the current process to prevent lost wakeup races when blocking.	2006-07-19 19:01:10 +00:00
John Baldwin	a02f5c6204	Initialize svr4_head during MOD_LOAD rather than on demand.	2006-07-19 18:26:09 +00:00
John Baldwin	90aff9de2d	Regen.	2006-07-11 20:55:23 +00:00
John Baldwin	be5747d5b5	- Add conditional VFS Giant locking to getdents_common() (linux ABIs), ibcs2_getdents(), ibcs2_read(), ogetdirentries(), svr4_sys_getdents(), and svr4_sys_getdents64() similar to that in getdirentries(). - Mark ibcs2_getdents(), ibcs2_read(), linux_getdents(), linux_getdents64(), linux_readdir(), ogetdirentries(), svr4_sys_getdents(), and svr4_sys_getdents64() MPSAFE.	2006-07-11 20:52:08 +00:00
John Baldwin	c870740e09	- Split out kern_accept(), kern_getpeername(), and kern_getsockname() for use by ABI emulators. - Alter the interface of kern_recvit() somewhat. Specifically, go ahead and hard code UIO_USERSPACE in the uio as that's what all the callers specify. In place, add a new uioseg to indicate what type of pointer is in mp->msg_name. Previously it was always a userland address, but ABI emulators may pass in kernel-side sockaddrs. Also, remove the namelenp field and instead require the two places that used it to explicitly copy mp->msg_namelen out to userland. - Use the patched kern_recvit() to replace svr4_recvit() and the stock kern_sendit() to replace svr4_sendit(). - Use kern_bind() instead of stackgap use in ti_bind(). - Use kern_getpeername() and kern_getsockname() instead of stackgap in svr4_stream_ti_ioctl(). - Use kern_connect() instead of stackgap in svr4_do_putmsg(). - Use kern_getpeername() and kern_accept() instead of stackgap in svr4_do_getmsg(). - Retire the stackgap from SVR4 compat as it is no longer used.	2006-07-10 21:38:17 +00:00
John Baldwin	b1ee5b654d	Rework kern_semctl a bit to always assume the UIO_SYSSPACE case. This mostly consists of pushing a few copyin's and copyout's up into __semctl() as all the other callers were already doing the UIO_SYSSPACE case. This also changes kern_semctl() to set the return value in a passed in pointer to a register_t rather than td->td_retval[0] directly so that callers can only set td->td_retval[0] if all the various copyout's succeed. As a result of these changes, kern_semctl() no longer does copyin/copyout (except for GETALL/SETALL) so simplify the locking to acquire the semakptr mutex before the MAC check and hold it all the way until the end of the big switch statement. The GETALL/SETALL cases have to temporarily drop it while they do copyin/malloc and copyout. Also, simplify the SETALL case to remove handling for a non-existent race condition.	2006-07-08 19:51:38 +00:00
John Baldwin	d699b1ce00	Don't try to copyin extra data for IPC_RMID requests to msgctl() or shmctl(). None of the other ABI's do this (including the native FreeBSD ABI), and uselessly trying to do a copyin() can actually result in a bogus EFAULT if the a process specifies NULL for the optional argument (which is what they should do in this case).	2006-07-06 21:38:24 +00:00
Mark Murray	93c005929f	Housekeeping. Update for maintainers who have handed in their commit bits or (in my case) no longer feel that oversight is necessary.	2006-07-01 10:51:55 +00:00
John Baldwin	cec34dbf79	Regen.	2006-06-27 18:32:16 +00:00
John Baldwin	bb639715d4	Use kern_shmctl() in svr4_sys_shmctl() and drop use of the stackgap. Mark svr4_sys_shmctl() MPSAFE.	2006-06-27 18:31:36 +00:00
John Baldwin	49d409a108	- Add a kern_semctl() helper function for __semctl(). It accepts a pointer to a copied-in copy of the 'union semun' and a uioseg to indicate which memory space the 'buf' pointer of the union points to. This is then used in linux_semctl() and svr4_sys_semctl() to eliminate use of the stackgap. - Mark linux_ipc() and svr4_sys_semsys() MPSAFE.	2006-06-27 18:28:50 +00:00
John Baldwin	b820787fb3	Regen.	2006-06-26 18:37:36 +00:00
John Baldwin	b0f6106af9	Change svr4_sys_break() to just call obreak() and mark it MPSAFE. Not objected to by: alc	2006-06-26 18:36:57 +00:00
Robert Watson	f7f45ac8e2	Annotate uses of fgetsock() with indications that they should rely on their existing file descriptor references to sockets, rather than use fgetsock() to retrieve a direct socket reference. MFC after: 3 months	2006-04-01 15:25:01 +00:00
John Baldwin	8917b8d28c	- Always call exec_free_args() in kern_execve() instead of doing it in all the callers if the exec either succeeds or fails early. - Move the code to call exit1() if the exec fails after the vmspace is gone to the bottom of kern_execve() to cut down on some code duplication.	2006-02-06 22:06:54 +00:00
David Xu	3e1c732ffa	Fix compiling problem by adding prefix name svr4 to si_xxx macro, the si_xxx macro should not be used in compat headers, as these are standard member names or only can be used in our native header file signal.h.	2005-10-19 09:33:15 +00:00
David Xu	9104847f21	1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most changes in MD code are trivial, before this change, trapsignal and sendsig use discrete parameters, now they uses member fields of ksiginfo_t structure. For sendsig, this change allows us to pass POSIX realtime signal value to user code. 2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread. 3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc. 4. Add td_sigqueue to thread structure to hold all signals delivered to thread. 5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed. 6. In this sigqueue implementation, pending signal set is kept as before, an extra siginfo list holds additional siginfo_t data for signals. kernel code uses psignal() still behavior as before, it won't be failed even under memory pressure, only exception is when deleting a signal, we should call sigqueue_delete to remove signal from sigqueue but not SIGDELSET. Current there is no kernel code will deliver a signal with additional data, so kernel should be as stable as before, a ksiginfo can carry more information, for example, allow signal to be delivered but throw away siginfo data if memory is not enough. SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to target process, if resource is unavailable, EAGAIN will be returned as specification said. Just before thread exits, signal queue memory will be freed by sigqueue_flush. Current, all signals are allowed to be queued, not only realtime signals. Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64	2005-10-14 12:43:47 +00:00
Robert Watson	5f419982c2	Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57, osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60, svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81, svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55, svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10, ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58, unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133: Now that Giant is acquired in uprintf() and tprintf(), the caller no longer leads to acquire Giant unless it also holds another mutex that would generate a lock order reversal when calling into these functions. Specifically not backed out is the acquisition of Giant in nfs_socket.c and rpcclnt.c, where local mutexes are held and would otherwise violate the lock order with Giant. This aligns this code more with the eventual locking of ttys. Suggested by: bde	2005-09-28 07:03:03 +00:00
Robert Watson	84d2b7df26	Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(), as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having non-MPSAFE tty code. MFC after: 1 week	2005-09-19 16:51:43 +00:00
Robert Watson	13f4c340ae	Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to ifnet.if_drv_flags. Device drivers are now responsible for synchronizing access to these flags, as they are in if_drv_flags. This helps prevent races between the network stack and device driver in maintaining the interface flags field. Many __FreeBSD__ and __FreeBSD_version checks maintained and continued; some less so. Reviewed by: pjd, bz MFC after: 7 days	2005-08-09 10:20:02 +00:00
John Baldwin	ec1f24a934	Add missing dependencies on the SYSVIPC modules.	2005-07-29 19:41:04 +00:00
John Baldwin	ac5ee935dd	Regen.	2005-07-13 20:35:09 +00:00
John Baldwin	8683e7fdc1	Make a pass through all the compat ABIs sychronizing the MP safe flags with the master syscall table as well as marking several ABI wrapper functions safe. MFC after: 1 week	2005-07-13 20:32:42 +00:00
John Baldwin	6e9b02cf80	Regen.	2005-07-13 15:14:54 +00:00
John Baldwin	2773347338	- Stop hardcoding #define's for options and use the appropriate opt_foo.h headers instead. - Hook up the IPC SVR4 syscalls. MFC after: 3 days	2005-07-13 15:14:33 +00:00
John Baldwin	55522478e6	Lock Giant in svr4_add_socket() so that the various svr4_*stat() calls can be marked MP safe as this is the only part of them that is not already MP safe. Approved by: re (scottl)	2005-07-07 19:27:29 +00:00
John Baldwin	03badf38ab	Remove an unused syscallarg() macro leftover from this code's origins in NetBSD. Approved by: re (scottl)	2005-07-07 19:26:43 +00:00
John Baldwin	07fac65b15	Rototill this file so that it actually compiles. It doesn't do anything in the build still due to some #undef's in svr4.h, but if you hack around that and add some missing entries to syscalls.master, then this file will now compile. The changes involved proc -> thread, using FreeBSD syscall names instead of NetBSD, and axeing syscallarg() and retval arguments. Approved by: re (scottl)	2005-07-07 19:25:47 +00:00
Robert Watson	3984b2328c	Rebuild generated system call definition files following the addition of the audit event field to the syscalls.master file format. Submitted by: wsalamon Obtained from: TrustedBSD Project	2005-05-30 15:20:21 +00:00
Robert Watson	f3596e3370	Introduce a new field in the syscalls.master file format to hold the audit event identifier associated with each system call, which will be stored by makesyscalls.sh in the sy_auevent field of struct sysent. For now, default the audit identifier on all system calls to AUE_NULL, but in the near future, other BSM event identifiers will be used. The mapping of system calls to event identifiers is many:one due to multiple system calls that map to the same end functionality across compatibility wrappers, ABI wrappers, etc. Submitted by: wsalamon Obtained from: TrustedBSD Project	2005-05-30 15:09:18 +00:00
John Baldwin	98df9218da	- Change the vm_mmap() function to accept an objtype_t parameter specifying the type of object represented by the handle argument. - Allow vm_mmap() to map device memory via cdev objects in addition to vnodes and anonymous memory. Note that mmaping a cdev directly does not currently perform any MAC checks like mapping a vnode does. - Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the cdev the ioctl is acting on rather than trying to find a suitable vnode to map from. Reviewed by: alc, arch@	2005-04-01 20:00:11 +00:00
David Schultz	a3e1ec194d	Bounds check the user-supplied length used in a copyout() in svr4_do_getmsg(). In principle this bug could disclose data from kernel memory, but in practice, the SVR4 emulation layer is probably not functional enough to cause the relevant code path to be executed. In any case, the emulator has been disconnected from the build since 5.0-RELEASE. Found by: Coverity Prevent analysis tool	2005-03-23 08:28:06 +00:00
Sam Leffler	960f641e6d	fixup signal mapping: o change the mapping arrays to have a zero offset rather than base 1; this eliminates lots of signo adjustments and brings the code back inline with the original netbsd code o purge use of SVR4_SIGTBLZ; SVR4_NSIG is the only definition for how big a mapping array is o change the mapping loops to explicitly ignore signal 0 o purge some bogus code from bsd_to_svr4_sigset o adjust svr4_sysentvec to deal with the mapping table change Enticed into fixing by: Coverity Prevent analysis tool Glanced at by: marcel, jhb	2005-02-25 19:34:10 +00:00
Maxim Sobolev	1a88a252fd	Backout previous change (disabling of security checks for signals delivered in emulation layers), since it appears to be too broad. Requested by: rwatson	2005-02-13 17:37:20 +00:00
Maxim Sobolev	d8ff44b79f	Split out kill(2) syscall service routine into user-level and kernel part, the former is callable from user space and the latter from the kernel one. Make kernel version take additional argument which tells if the respective call should check for additional restrictions for sending signals to suid/sugid applications or not. Make all emulation layers using non-checked version, since signal numbers in emulation layers can have different meaning that in native mode and such protection can cause misbehaviour. As a result remove LIBTHR from the signals allowed to be delivered to a suid/sugid application. Requested (sorta) by: rwatson MFC after: 2 weeks	2005-02-13 16:42:08 +00:00
John Baldwin	c87b5f76aa	- Implement svr4_emul_find() using kern_alternate_path(). This changes the semantics in that the returned filename to use is now a kernel pointer rather than a user space pointer. This required changing the arguments to the CHECKALT*() macros some and changing the various system calls that used pathnames to use the kern_foo() functions that can accept kernel space filename pointers instead of calling the system call directly. - Use kern_open(), kern_access(), kern_msgctl(), kern_execve(), kern_mkfifo(), kern_mknod(), kern_statfs(), kern_fstatfs(), kern_setitimer(), kern_stat(), kern_lstat(), kern_fstat(), kern_utimes(), kern_pathconf(), and kern_unlink().	2005-02-07 21:53:42 +00:00
Maxim Sobolev	610ecfe035	o Split out kernel part of execve(2) syscall into two parts: one that copies arguments into the kernel space and one that operates completely in the kernel space; o use kernel-only version of execve(2) to kill another stackgap in linuxlator/i386. Obtained from: DragonFlyBSD (partially) MFC after: 2 weeks	2005-01-29 23:12:00 +00:00
Warner Losh	898b0535b7	Start each of the license/copyright comments with /*-	2005-01-05 22:34:37 +00:00
John Baldwin	c88379381b	- Move the function prototypes for kern_setrlimit() and kern_wait() to sys/syscallsubr.h where all the other kern_foo() prototypes live. - Resort kern_execve() while I'm there.	2005-01-05 22:19:44 +00:00
David Schultz	6004362e66	Don't include sys/user.h merely for its side-effect of recursively including other headers.	2004-11-27 06:51:39 +00:00
Poul-Henning Kamp	124e4c3be8	Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST. Use this in all the places where sleeping with the lock held is not an issue. The distinction will become significant once we finalize the exact lock-type to use for this kind of case.	2004-11-13 11:53:02 +00:00
Poul-Henning Kamp	0ac3a7f694	More sensible FILEDESC_ locking.	2004-11-07 15:59:27 +00:00
John Baldwin	78c85e8dfc	Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month	2004-10-05 18:51:11 +00:00
John Baldwin	4afec35169	Add a proc *p pointer for td->td_proc to make this code easier to read.	2004-09-24 20:26:15 +00:00
John Baldwin	7eaec467d8	Various small style fixes.	2004-09-22 15:24:33 +00:00

1 2 3 4 5 ...

346 Commits