freebsd-skq

Author	SHA1	Message	Date
schweikh	e566b473a5	s/Muliple/Multiple Removed whitespace at EOL and EOF.	2004-01-10 18:34:01 +00:00
des	1d50ccc7f9	More unparenthesized return values.	2004-01-10 17:14:53 +00:00
des	65c30ff68d	Style: parenthesize return values.	2004-01-10 13:03:43 +00:00
truckman	417693c813	Add a somewhat redundant check on the len arguement to getsockaddr() to avoid relying on the minimum memory allocation size to avoid problems. The check is somewhat redundant because the consumers of the returned structure will check that sa_len is a protocol-specific larger size. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: nectar MFC after: 30 days	2004-01-10 08:28:54 +00:00
cognet	ced43cb5e7	Prevent a race condition between fork1() and whatever changes the pgrp by setting the new process' p_pgrp again before inserting it in the p_pglist. Without it we can get the new process to be inserted in a different p_pglist than the one p2->p_pgrp points to, and this is not something we want to happen. This is not a fix, merely a bandaid, but it will work until someone finds a better way to do it. Discussed with: jhb (a long time ago)	2004-01-09 23:42:36 +00:00
rwatson	e73ac85912	Improve the expressiveness of ttyinfo (^T) when dealing with threads in slightly less usual states: If the thread is on a run queue, display "running" if the thread is actually running, otherwise, "runnable". If the thread is sleeping, and it's on a sleep queue, display the name of the queue, otherwise "unknown" -- previously, in this situation we would display "iowait". If the thread is waiting on a lock, display *lockname. If the thread is suspended, display "suspended" -- previously, in this situation we would display "iowait". If the thread is waiting for an interrupt, display "intrwait" -- previously, in this situation we would display "iowait". If the thread is in a state not handled by the above, display "unknown" -- previously, we would print "iowait". Among other things, this avoids displaying "iowait" when the foreground process turns out to be suspended waiting for a debugger to properly attach.	2004-01-08 22:49:23 +00:00
rwatson	befa7a41a2	Drop the sigacts mutex around calls to stopevent() to avoid sleeping holding the mutex. Because the sigacts pointer can't change while the process is "live" (proc locking (x)), we know our pointer is still valid. In communication with: truckman Reviewed by: jhb	2004-01-08 22:44:54 +00:00
kan	6ecda71663	Add pid to the info printed in lockmgr_printinfo. This makes VFS diagnostic messages slightly more useful.	2004-01-06 04:34:13 +00:00
kan	443a35f2bf	More style fixes. Obtained from: bde	2004-01-05 23:40:46 +00:00
jhb	d619fa8207	- Allow mtx_trylock() to recurse on a recursive mutex. Attempts to recurse on a non-recursive mutex will fail but will not trigger any assertions. - Add an assertion to mtx_lock() that one never recurses on a non-recursive mutex. This is mostly useful for the non-WITNESS case. Requested by: deischen, julian, others (1)	2004-01-05 23:09:51 +00:00
kan	2ffbc2464e	style(9): Add empty line before first code line in functions with no local variables. Properly terminate comment sentences. Indent lines which are longer that 80 characters. Move v_addpollinfo closer to the rest of poll-related functions. Move DEBUG_VFS_LOCKS ifdefed block to the end of file. Obtained from: bde (partly)	2004-01-05 19:04:29 +00:00
kan	2872921967	Cosmetics: strip '\n' from a string passed to Debugger().	2004-01-04 03:42:20 +00:00
davidxu	f39653dda8	Make sigaltstack as per-threaded, because per-process sigaltstack state is useless for threaded programs, multiple threads can not share same stack. The alternative signal stack is private for thread, no lock is needed, the orignal P_ALTSTACK is now moved into td_pflags and renamed to TDP_ALTSTACK. For single thread or Linux clone() based threaded program, there is no semantic changed, because those programs only have one kernel thread in every process. Reviewed by: deischen, dfr	2004-01-03 02:02:26 +00:00
njl	8fe7e8ee2d	Move the kernel power change printf under bootverbose since the power_profile script now duplicates the message via syslog.	2004-01-02 18:24:13 +00:00
sam	ca1ce6605d	m_tag fixups in preparation for heavier use: o promote several m_tag_* routines to inline o add an m_tag_setup inline to set the fixed fields in a packet tag o add an m_tag_free method pointer to each mtag to support, for example, allocating tags from zones o have m_tag_find check if the tag list is not empty before calling m_tag_locate to search Reviewed by: brooks, silence from others	2004-01-02 17:27:39 +00:00
dwmalone	1e69eeaeee	Plug a leak of open files that happens when you exec a suid program with one of std{in,out,err} open. This helps with the file descriptor leaks reported on -current. This should probably be merged into 5.2. Reviewed by: ru Tested by: Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net>	2003-12-28 19:27:14 +00:00
bde	7d91626477	v_vxproc was a bogus name for a thread (pointer).	2003-12-28 09:12:56 +00:00
silby	a7d8091ae5	Track three new sendfile-related statistics: - The number of times sendfile had to do disk I/O - The number of times sfbuf allocation failed - The number of times sfbuf allocation had to wait	2003-12-28 08:57:09 +00:00
bde	ca9bf1d586	Fixed some style bugs (mainly, try to always use explicit comparisons with NULL when checking for null pointers).	2003-12-28 04:37:59 +00:00
bde	9ff26a0443	Fixed some disordering in revs.1.194 and 1,196. Moved the exceve() syscall function back to near the beginning of the file. Rev.1.194 moved it into the middle of auxiliary functions following kern_execve(). Moved the __mac_execve() syscall function up together with execve(). It was new in rev1.1.196 and perfectly misplaced after execve().	2003-12-28 04:18:13 +00:00
silby	f71dce4e63	Fix the maxpipekva warning message so that it points to the correct sysctl, and shorten the message. Noticed by: bde	2003-12-28 01:19:58 +00:00
alc	7f5f3bd3db	Remove GIANT_REQUIRED from exec_unmap_first_page().	2003-12-27 19:40:03 +00:00
silby	5c5418dd6e	Track current and peak sfbuf usage, export the values via sysctl.	2003-12-27 07:52:47 +00:00
jhb	17274fb2ed	Create a separate kthread that executes sched_cpu() once a second. Because sched_cpu() locks an sx lock (allproc_lock) which can sleep if it fails to acquire the lock, it is not safe to execute this in a callout handler from softclock().	2003-12-26 17:07:29 +00:00
alfred	984d4cf4c5	Put restrict back in, the compilation failure was my fault when I did a bad merge from the PR. Thanks to Bruce Evans for explaining.	2003-12-26 05:58:16 +00:00
alfred	278c6c3367	Add __restrict qualifiers to copyinfrom, copyinstrfrom, copystr, copyinstr, copyin and copyout.	2003-12-26 05:54:35 +00:00
dwmalone	92fec55360	In socket(2) we only need Giant around the call to socreate, so just grab it there.	2003-12-25 23:44:38 +00:00
dwmalone	ebdd5b9754	Don't TAILQ_INIT kq_head twice, once is enough.	2003-12-25 23:42:36 +00:00
silby	0884486ba2	Fix another 0 / NULL mixup.	2003-12-25 01:17:27 +00:00
alfred	b44ae63caf	We're not ready for restrict qualifiers here.	2003-12-24 19:09:45 +00:00
alfred	c4b798a73d	Add restrict qualifiers. PR: 44394 Submitted by: Craig Rodrigues <rodrige@attbi.com>	2003-12-24 18:47:43 +00:00
rwatson	fc37f21a15	Document that when we are addressing an open()/close() race, the reason we call vn_close() manually rather than letting fdrop() take care of it is that we haven't yet hooked up the various 'struct file' fields.	2003-12-24 17:13:01 +00:00
alfred	a121c1dfef	Introduce mp_maxcpus which can be used by libkvm utils to find out how many CPUs the system was compiled for. Export the variable via a sysctl node 'kern.smp.maxcpus' as well.	2003-12-23 13:54:16 +00:00
peter	b58a2a1deb	Regen - this should be essentially a NOP, except for rcsid changes.	2003-12-23 03:52:14 +00:00
peter	06d2b26b72	Remove namespc column and attempt to un-fold some of the longer lines that now fit.	2003-12-23 03:51:36 +00:00
peter	e3a23c9582	Remove the namespace column from the syscalls tables. We don't actually use it, if we ever did. They have been been VERY poorly maintained for some time, possibly because they were a NOP. FWIW, This brings our table formats back closer to the other *BSD's.	2003-12-23 03:50:43 +00:00
peter	998b79089f	Add an additional field to the elf brandinfo structure to support quicker exec-time replacement of the elf interpreter on an emulation environment where an entire /compat/* tree isn't really warranted.	2003-12-23 02:42:39 +00:00
peter	7cc77e03ae	Catch a few places where NULL (pointer) was used where 0 (integer) was expected.	2003-12-23 02:36:43 +00:00
peter	0d11c5c3c4	Don't use NULL (pointer) when we mean 0 (integer) for the number of ticks in msleep.	2003-12-23 02:28:42 +00:00
jeff	eac1e55acc	- Make our transfer decisions based on load and not transferable load. A cpu could have been bogged down with non-transferable load and still not migrated a new thread to an idle cpu. This required some benchmarking and tuning to get right as the comment above it suggests.	2003-12-20 22:35:20 +00:00
jeff	d4f3760df1	- Enable ithread migration on x86. This is done to work around a bug in the IO APIC on Xeons that prevents round-robin interrupt assignment from working.	2003-12-20 20:36:19 +00:00
alc	a7fef684f6	Remove a variable that has been initialized but otherwise unused since revision 1.315.	2003-12-20 19:46:21 +00:00
jeff	40c79491e2	- In kseq_transfer() return if smp has not been started. - In sched_add(), do the idle check prior to the transfer check so that we don't try to transfer load from an idle cpu. This fixes panics caused by IPIs on UP machines running SMP kernels. Reported/Debugged by: seanc	2003-12-20 14:03:14 +00:00
jeff	04d161f363	- Running interactive tasks with the minimum time-slice is fine for vi and sh, but not so great for mozilla, X, etc. Add a fixed define for the slice size granted to interactive KSEs.	2003-12-20 12:54:35 +00:00
tjr	50207b49b7	Reduce the overhead of semop() by using the kernel stack instead of malloc'd memory to store the operations array if it is small enough to fit.	2003-12-19 13:07:17 +00:00
jhb	ecc9efa2f1	Various style fixes. Submitted by: bde (mostly, if not all)	2003-12-17 21:13:04 +00:00
jeff	5443bd4c65	- In vget() if LK_NOWAIT is specified we should return EBUSY and not ENOENT. Submitted by: Stephan Uphoff <ups@stups.com>	2003-12-16 17:08:27 +00:00
jeff	aa712bc6e4	- When doing a forced unmount, VFS attempts to keep VCHR vnodes valid by reassigning their v_ops field to specfs, detaching from the mountpoint, etc. However, this is not sufficient. If we vclean() the vnode the pages owned by the vnode are lost, potentially while buffers reference them. Implement parts of vclean() seperately in vgonechrl() so that the pages and bufs associated with a device vnode are not destroyed while in use.	2003-12-16 17:05:05 +00:00
bms	1b8b89ab32	style(9) pass and type fixups. Submitted by: bde	2003-12-16 14:13:47 +00:00
bms	3eb53d90ef	Push m_apply() and m_getptr() up into the colleciton of standard mbuf routines, and purge them from opencrypto. Reviewed by: sam Obtained from: NetBSD Sponsored by: spc.org	2003-12-15 21:49:41 +00:00
jeff	fe983d4260	- Assign the ke_cpu field in kseq_notify() so that all of our callers do not have to do it. - Set the ke_runq to NULL in sched_add() before calling kseq_notify(). Otherwise we may panic in sched_add() if INVARIANTS is on.	2003-12-14 02:06:29 +00:00
rwatson	012b8f6c02	Although sometimes to the uninitiated, it may seem like goup, KSEGOUP is actually spelt KSEGROUP. Go figure. Reported by: samy@kerneled.com	2003-12-12 21:25:56 +00:00
jeff	80e1439e63	- Now that we have kseq groups, balance them seperately. - The new sched_balance_groups() function does intra-group balancing while sched_balance() balances the available groups. - Pick a random time between 0 ticks and hz * 2 ticks to restart each balancing process. Each balancer has its own timeout. - Pick a random place in the list of groups to start the search for lowest and highest group loads. This prevents us from prefering a group based on numeric position. - Use a nasty hack to stop us from preferring cpu 0. The problem is that softclock always runs on cpu 0, so it always has a little extra load. We ignore this load in the balancer for now. In the future softclock should run on a random cpu and these hacks can go away.	2003-12-12 07:33:51 +00:00
jeff	6edc4a1eb1	- Don't let the pctcpu rate limiter throttle us if we have recorded over SCHED_CPU_TICKS ticks. This was allowing processes to display (1/SCHED_CPU_TIME * 100) % more cpu than they had used.	2003-12-11 04:23:39 +00:00
jeff	da98a74234	- In sched_switch(), if a thread has been assigned, don't touch the runqueues or load. These things have already been taken care of in sched_bind() which should be the only place that we're switching in an assigned thread.	2003-12-11 04:00:49 +00:00
jeff	7c857e9275	- Add support for CPU groups to ule. All SMT cores on the same physical cpu are added to a group. - Don't place a cpu into the kseq_idle bitmask until all cpus in that group have idled. - Prefer idle groups over idle group members in the new kseq_transfer() function. In this way we will prefer to balance load across full cores rather than add further load a partial core. - Before a cpu goes idle, check the other group members for threads. Since SMT cpus may freely share threads, this is cheap. - SMT cores may be individually pinned and bound to now. This contrasts the old mechanism where binding or pinning would have allowed a thread to run on any available cpu. - Remove some unnecessary logic from sched_switch(). Priority propagation should be properly taken care of in sched_prio() now.	2003-12-11 03:57:10 +00:00
peter	e24b9cafc1	Regen	2003-12-10 22:18:54 +00:00
peter	4c2b7999cf	Update file locations for syscall tables to copy to.	2003-12-10 22:08:37 +00:00
marcel	b6631c500b	Write the thread pointer (val) in the kse mailbox (loc) before we set the new context in kse_switchin(2). This allows us to return an error to the calling context when the suword() fails.	2003-12-10 01:59:23 +00:00
jhb	d8b6cc614a	Adjust an assertion for the TDF_TSNOBLOCK race handling in turnstile_unpend(). A racing thread that does not have TDI_LOCK set may either be running on another CPU or it may be sitting on a run queue if it was preempted during the very small window in turnstile_wait() between unlocking the turnstile chain lock and locking sched_lock.	2003-12-09 21:14:31 +00:00
jhb	f110a9ab64	Assert that the we never give a thread a NULL turnstile when waking it up.	2003-12-09 21:09:54 +00:00
jhb	66cc89fadf	Revert the previous race fix and replace it with a more general fix. The case of a turnstile having no threads is just one instance of the more general case where the thread we are examining has been partially awakened already in that it has been removed from the turnstile's blocked list but still has TDI_LOCK set. We detect that case by checking to see if the thread has already had a turnstile reassigned to it.	2003-12-09 21:09:04 +00:00
davidxu	69ce33ca6e	Lock and unlock sched_lock when walking through thread list, current we insert kse upcall thread into thread list at mi_switch time, process lock is not enough.	2003-12-07 23:47:15 +00:00
truckman	e9a439edcd	Pass MTX_DEF as the last argument to mtx_init() instead of 0. This is not a functional change. The code happened to work properly only because MTX_DEF is defined as 0.	2003-12-07 21:53:41 +00:00
phk	239117c33f	Make the DIAGNOSTIC code which complains about long {call\|time}out(9) functions less noisy: We printf if a new function took longer than the previous record holder, or of the previous record holder took more than twice as long as the current record.	2003-12-07 20:03:28 +00:00
marcel	f3326a4c71	Regen due to kse_switchin(2).	2003-12-07 19:36:16 +00:00
marcel	2ba380839b	Add kse_switchin(2). This syscall can be used by KSE implementations to have the kernel switch to a new thread, instead of doing it in userland. It is in fact needed on ia64 where syscall restarts do not return to userland first. It's completely handled inside the kernel. As such, any context created by the kernel as part of an upcall and caused by some syscall needs to be restored by the kernel.	2003-12-07 19:34:29 +00:00
peter	e28e232dff	rqb_bits[] may be an int64_t (eg: on alpha, and recently on amd64). Be sure to shift (long)1 << 33 and higher, not (int)1. Otherwise bad things happen(TM). This is why beast.freebsd.org paniced with ULE. Reviewed by: jeff	2003-12-07 09:57:51 +00:00
scottl	2b68e67c6d	Re-arrange and consolidate some random debugging stuff	2003-12-07 05:04:49 +00:00
alc	4408614be4	- Giant is no longer required by vm_thread_new().	2003-12-07 04:16:49 +00:00
rwatson	08335c63bf	Rename mac_create_cred() MAC Framework entry point to mac_copy_cred(), and the mpo_create_cred() MAC policy entry point to mpo_copy_cred_label(). This is more consistent with similar entry points for creation and label copying, as mac_create_cred() was called from crdup() as opposed to during process creation. For a number of policies, this removes the requirement for special handling when copying credential labels, and improves consistency. Approved by: re (scottl) Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-12-06 21:48:03 +00:00
jhb	4b61439e79	Fix all users of mp_maxid to use the same semantics, namely: 1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1. 2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid. Approved by: re (scottl) Tested on: i386, amd64, alpha	2003-12-03 14:57:26 +00:00
jhb	907202ec1f	Export a few SMP related symbols in UP kernels as well. This is needed to aid other kernel code, especially code which can be in a module such as the acpi_cpu(4) driver, to work properly with both SMP and UP kernels. The exported symbols include mp_ncpus, all_cpus, mp_maxid, smp_started, and the smp_rendezvous() function. This also means that CPU_ABSENT() is now always implemented the same on all kernels. Approved by: re (scottl)	2003-12-03 14:55:31 +00:00
dg	da88330aaa	Fixed a bug in sendfile(2) where the sent data would be corrupted due to sendfile(2) being erroneously automatically restarted after a signal is delivered. Fixed by converting ERESTART to EINTR prior to exiting. Updated manual page to indicate the potential EINTR error, its cause and consequences. Approved by: re@freebsd.org	2003-12-01 22:12:50 +00:00
iedowse	bc2791c3fa	In dounmount(), only call checkdirs() prior to VFS_UNMOUNT() in the forced unmount case. Otherwise, a file system that is referenced only by process fd_cdir/fd_rdir references to the file system root vnode will be successfully unmounted without the MNT_FORCE flag. The previous behaviour was not compatible with the unmount semantics required by amd(8), so file systems could be unexpectedly unmounted while there were still references to the file system root directory. Reported by: Erez Zadok <ezk@cs.sunysb.edu> Approved by: re (scottl)	2003-11-30 23:30:09 +00:00
jeff	e35dcab926	- Don't forget to unlock the vnode interlock in the LK_NOWAIT case. Submitted by: Stephan Uphoff <ups@stups.com> Approved by: re (rwatson)	2003-11-30 22:09:58 +00:00
kan	c31eef63dc	Do not attempt to destroy NULL vfs options list. Approved by: re (scottl) Reported by: Christian Laursen <xi atborderworlds dot dk>	2003-11-23 17:13:48 +00:00
jhb	bbe7d290ea	- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid. cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is actually present and sets mp_ncpus and all_cpus. Splitting these up allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the CPU probing code to live in a module, for example, since modules sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is needed to re-enable the ACPI module on i386. - For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating its contents in a few places. Also, add a smp_cpu_enabled() function to avoid duplicating some code. There is room for further code reduction later since much of this code is also present in cpu_mp_start(). - All archs besides i386 still set mp_maxid to the same values they set it to before this change. i386 now sets mp_maxid to MAXCPU. Tested on: alpha, amd64, i386, ia64, sparc64 Approved by: re (scottl)	2003-11-21 22:23:26 +00:00
markm	6a2f4748c4	Fix a major faux pas of mine. I was causing 2 very bad things to happen in interrupt context; 1) sleep locks, and 2) malloc/free calls. 1) is fixed by using spin locks instead. 2) is fixed by preallocating a FIFO (implemented with a STAILQ) and using elements from this FIFO instead. This turns out to be rather fast. OK'ed by: re (scottl) Thanks to: peter, jhb, rwatson, jake Apologies to: *	2003-11-20 15:35:48 +00:00
markm	832b97971f	Hackfix to patch around a kernel panic I introduced. Real fix to follow. In the meanwhile, we are not harvesting interrupt entropy. Approved by: re (jhb)	2003-11-18 14:35:43 +00:00
rwatson	9c969b771a	Introduce a MAC label reference in 'struct inpcb', which caches the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer. This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check. For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update. Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy. Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-18 00:39:07 +00:00
rwatson	cc012e0835	Add a sysctl, security.bsd.see_other_gids, similar in semantics to see_other_uids but with the logical conversion. This is based on (but not identical to) the patch submitted by Samy Al Bahra. Submitted by: Samy Al Bahra <samy@kerneled.com>	2003-11-17 20:20:53 +00:00
peter	9dedda25aa	Initial landing of SMP support for FreeBSD/amd64. - This is heavily derived from John Baldwin's apic/pci cleanup on i386. - I have completely rewritten or drastically cleaned up some other parts. (in particular, bootstrap) - This is still a WIP. It seems that there are some highly bogus bioses on nVidia nForce3-150 boards. I can't stress how broken these boards are. I have a workaround in mind, but right now the Asus SK8N is broken. The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed. - Most of my testing has been with SCHED_ULE. SCHED_4BSD works. - the apic and acpi components are 'standard'. - If you have an nVidia nForce3-150 board, you are stuck with 'device atpic' in addition, because they somehow managed to forget to connect the 8254 timer to the apic, even though its in the same silicon! ARGH! This directly violates the ACPI spec.	2003-11-17 08:58:16 +00:00
jeff	71a2f6d146	- Mark ksq_assigned as volatile so that when this code is used without sched_lock we can be sure that we'll pick up the new value.	2003-11-17 08:27:11 +00:00
jeff	a6911261a3	- Remove long dead code. rslices hasn't been used in some time and neither has sched_pickcpu().	2003-11-17 08:24:14 +00:00
peter	38ebd79a92	Expand the argument to the ithread enable/disable helper hooks from an int to something big enough to hold a pointer. amd64 needs this.	2003-11-17 06:08:10 +00:00
rwatson	7aa5c2497a	Implement sockets support for __mac_get_fd() and __mac_set_fd() system calls, and prefer these calls over getsockopt()/setsockopt() for ABI reasons. When addressing UNIX domain sockets, these calls retrieve and modify the socket label, not the label of the rendezvous vnode. - Create mac_copy_socket_label() entry point based on mac_copy_pipe_label() entry point, intended to copy the socket label into temporary storage that doesn't require a socket lock to be held (currently Giant). - Implement mac_copy_socket_label() for various policies. - Expose socket label allocation, free, internalize, externalize entry points as non-static from mac_net.c. - Use mac_socket_label_set() in __mac_set_fd(). MAC-aware applications may now use mac_get_fd(), mac_set_fd(), and mac_get_peer() to retrieve and set various socket labels without directly invoking the getsockopt() interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-16 23:31:45 +00:00
rwatson	f9ad21ec5d	Reduce gratuitous redundancy and length in function names: mac_setsockopt_label_set() -> mac_setsockopt_label() mac_getsockopt_label_get() -> mac_getsockopt_label() mac_getsockopt_peerlabel_get() -> mac_getsockopt_peerlabel() Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-16 18:25:20 +00:00
alc	74614e7f63	- Modify alpha's sf_buf implementation to use the direct virtual-to- physical mapping. - Move the sf_buf API to its own header file; make struct sf_buf's definition machine dependent. In this commit, we remove an unnecessary field from struct sf_buf on the alpha, amd64, and ia64. Ultimately, we may eliminate struct sf_buf on those architecures except as an opaque pointer that references a vm page.	2003-11-16 06:11:26 +00:00
rwatson	0dca1bb7dd	When implementing getsockopt() for SO_LABEL and SO_PEERLABEL, make sure to sooptcopyin() the (struct mac) so that the MAC Framework knows which label types are being requested. This fixes process queries of socket labels. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-16 03:53:36 +00:00
bde	60cfaec287	Localized the cy driver's locking.	2003-11-16 00:55:54 +00:00
phk	1cf4c96919	Rename the debugging mutex "callout_no_sleep" to "dont_sleep_in_callout".	2003-11-15 18:33:54 +00:00
tjr	9e7ac78554	Initialize sequence numbers to 0 in seminit() instead of using whatever garbage happens to be in memory. This did not seem to cause any problems except making semaphore ID's unpredictable (and ugly in ipcs(1) output).	2003-11-15 11:56:53 +00:00
phk	d04b779c2d	Send B_PHYS out to pasture, it no longer serves any function.	2003-11-15 09:28:09 +00:00
alc	0e7d8d9c51	- Remove the remaining now unnecessary checks for the buf's b_object being NULL. See revision 1.421 for more detail. - Remove GIANT_REQUIRED from vfs_unbusy_pages(). Discussed with: jeff	2003-11-15 08:45:36 +00:00
jeff	be190686fa	- Introduce kseq_runq_{add,rem}() which are used to insert and remove kses from the run queues. Also, on SMP, we track the transferable count here. Threads are transferable only as long as they are on the run queue. - Previously, we adjusted our load balancing based on the transferable count minus the number of actual cpus. This was done to account for the threads which were likely to be running. All of this logic is simpler now that transferable accounts for only those threads which can actually be taken. Updated various places in sched_add() and kseq_balance() to account for this. - Rename kseq_{add,rem} to kseq_load_{add,rem} to reflect what they're really doing. The load is accounted for seperately from the runq because the load is accounted for even as the thread is running. - Fix a bug in sched_class() where we weren't properly using the PRI_BASE() version of the kg_pri_class. - Add a large comment that describes the impact of a seemingly simple conditional in sched_add(). - Also in sched_add() check the transferable count and KSE_CAN_MIGRATE() prior to checking kseq_idle. This reduces the frequency of access for kseq_idle which is a shared resource.	2003-11-15 07:32:07 +00:00
cognet	690de3f7ac	Better fix than my previous commit: in exit1(), make sure the p_klist is empty after sending NOTE_EXIT. The process won't report fork() or execve() and won't be able to handle NOTE_SIGNAL knotes anyway. This fixes some race conditions with do_tdsignal() calling knote() while the process is exiting. Reported by: Stefan Farfeleder <stefan@fafoe.narf.at> MFC after: 1 week	2003-11-14 18:49:01 +00:00
kan	1246b503c6	Fix a number of style(9) bugs introduced in r1.113 by me. Suggested by: bde	2003-11-14 05:27:41 +00:00
jeff	76902f9650	- regen.	2003-11-14 03:49:41 +00:00
jeff	7491142e9a	- Revision 1.156 marked ptrace() SMP safe. Unfortunately, alpha implements parts of ptrace using proc_rwmem(). proc_rwmem() requires giant, and giant must be acquired prior to the proc lock, so ptrace must require giant still.	2003-11-14 03:48:37 +00:00
phk	c92feb226d	Various minor details: Give the HZ/overflow check a 10% margin. Eliminate bogus newline. If timecounters have equal quality, prefer higher frequency. Some inspiration from: bde	2003-11-13 10:03:58 +00:00
jhb	989e0408dd	- Close a race where a thread on another CPU could release a contested lock and empty its turnstile while the blocking threads still pointed to the turnstile. If the thread on the first CPU blocked on a lock owned by one of the threads blocked on the turnstile just woken up, then the first CPU could try to manipulate a bogus thread queue in the turnstile during priority propagation. - Update locking notes for ts_owner and always clear ts_owner, not just under INVARIANTS. Tested by: sam (1)	2003-11-12 23:48:42 +00:00
mckusick	b2ca74654c	At the request of several developers, restore the DIAGNOSIC code deleted in 1.81. Increase the initial timeout limit to 2ms to eliminate spurious messages of excessive timeouts in the NFS client code. Requested by: Poul-Henning Kamp <phk@phk.freebsd.dk> Requested by: Mike Silbersack <silby@silby.com> Requested by: Sam Leffler <sam@errno.com>	2003-11-12 22:28:27 +00:00
rwatson	3f5efde5af	Mark __mac_get_pid() as MPSAFE in the comment, as it runs without Giant and is also MPSAFE. Push Giant further down into __mac_get_fd() and __mac_set_fd(), grabbing it only for constrained regions dealing with VFS, and dropping it entirely for operations related to labeling of pipes. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-12 22:19:15 +00:00
peter	e97c7f33cb	MNAMELEN is back to an int again after Kirk's statfs commit kern/vfs_mount.c:1305: warning: signed size_t format, different type arg (arg 4) *** Error code 1	2003-11-12 17:09:12 +00:00
jhb	b996af9fb8	Fix a typo in a comment. Submitted by: das	2003-11-12 14:55:45 +00:00
phk	70388f65e9	Replace B_PHYS conditional assignment to bio_offset with KASSERT check to see that the originating code already did it right.	2003-11-12 10:27:06 +00:00
mckusick	a7fc450278	Update the five files derived from /sys/kern/syscalls.master after the additions made for the new statfs structure (version 1.157). These must be updated in a separate checkin after syscalls.master has been checked in so that they reflect its new CVS identity. As these are purely derived files, it is not clear to me why they are under CVS at all. I presume that it has something to do with having `make world' operate properly.	2003-11-12 08:09:19 +00:00
mckusick	6a4c30bccd	Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hoards of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.	2003-11-12 08:01:40 +00:00
rwatson	77ed6e2d1c	Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) do to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-12 03:14:31 +00:00
kan	9352a05d40	1. Consolidate mount struct allocation/destruction into a common code in vfs_mount_alloc/vfs_mount_destroy functions and take care to completely destroy the mount point along with its locks. Mount struct has grown in coplexity recently and depending on each failure path to destroy it completely isn't working anymore. 2. Eliminate largely identical vfs_mount and vfs_unmount question by moving the code to handle both cases into a newly introduced vfs_domount function. 3. Simplify nfs_mount_diskless to always expect an allocated mount struct and never attempt an allocation/destruction itself. The vfs_allocroot allocation was there to support 'magic' swap space configuration for diskless clients that was already removed by PHK some time ago. 4. Include a vfs_buildopts cleanups by Peter Edwards to validate the sanity of nmount parameters passed from userland. Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com> Reviewed by: rwatson	2003-11-12 02:54:47 +00:00
jhb	6cc1f7e330	Add an implementation of turnstiles and change the sleep mutex code to use turnstiles to implement blocking isntead of implementing a thread queue directly. These turnstiles are somewhat similar to those used in Solaris 7 as described in Solaris Internals but are also different. Turnstiles do not come out of a fixed-sized pool. Rather, each thread is assigned a turnstile when it is created that it frees when it is destroyed. When a thread blocks on a lock, it donates its turnstile to that lock to serve as queue of blocked threads. The queue associated with a given lock is found by a lookup in a simple hash table. The turnstile itself is protected by a lock associated with its entry in the hash table. This means that sched_lock is no longer needed to contest on a mutex. Instead, sched_lock is only used when manipulating run queues or thread priorities. Turnstiles also implement priority propagation inherently. Currently turnstiles only support mutexes. Eventually, however, turnstiles may grow two queue's to support a non-sleepable reader/writer lock implementation. For more details, see the comments in sys/turnstile.h and kern/subr_turnstile.c. The two primary advantages from the turnstile code include: 1) the size of struct mutex shrinks by four pointers as it no longer stores the thread queue linkages directly, and 2) less contention on sched_lock in SMP systems including the ability for multiple CPUs to contend on different locks simultaneously (not that this last detail is necessarily that much of a big win). Note that 1) means that this commit is a kernel ABI breaker, so don't mix old modules with a new kernel and vice versa. Tested on: i386 SMP, sparc64 SMP, alpha SMP	2003-11-11 22:07:29 +00:00
jkoshy	ac83b0ec2b	Bound the number of iterations a thread can perform inside ktr_resize_pool(); this eliminates a potential livelock. Return ENOSPC only if we encountered an out-of-memory condition when trying to increase the pool size. Reviewed by: jhb, bde (style)	2003-11-11 09:09:26 +00:00
jkoshy	edc6e45a50	Have utrace(2) return ENOMEM if malloc() fails. Document this error return in its manual page. Reviewed by: jhb	2003-11-11 04:54:11 +00:00
alc	148f9321bd	- Revision 1.469 of vfs_subr.c resulted in the buf's b_object field being consistency initialized. Consequently, a number of conditionals that checked the validity of b_object before passing it to VM_OBJECT_LOCK() and VM_OBJECT_UNLOCK() are no longer needed.	2003-11-11 04:45:37 +00:00
rwatson	ce4ce483f9	Whitespace sync to MAC branch, expand comment at the head of the file.	2003-11-11 03:40:04 +00:00
alfred	8837bee815	Fix a bug where the taskqueue kproc was being parented by init because RFNOWAIT was being passed to kproc_create. The result was that shutdown took quite a bit longer because this errant "child" would not respond to termination signals from init at system shutdown. RFNOWAIT dissassociates itself from the caller by attaching to init as a parent proc. We could have had the taskqueue proc listen for SIGKILL, but being able to SIGKILL a potentially critical system process doesn't seem like a good idea.	2003-11-10 20:39:44 +00:00
tjr	3fb83070b6	When there are no free sem_undo structs available in semu_alloc(), only free one sem_undo with un_cnt == 0 instead of all of them. This is a temporary workaround until the SLIST_FOREACH_PREVPTR loop gets fixed so that it doesn't cause cycles in semu_list when removing multiple adjacent items. It might be easier to just use (doubly-linked) LISTs here instead of complicated SLIST code to achieve O(1) removals. This bug manifested itself as a complete lockup under heavy semaphore use by multiple processes with the SEM_UNDO flag set. PR: 58984	2003-11-10 07:22:41 +00:00
marcel	21340f30b3	Change the clear_ret argument of get_mcontext() to be a flags argument. Since all callers either passed 0 or 1 for clear_ret, define bit 0 in the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI code for possible (but unlikely) future use. The remaining bits are for use by MD code. This change is triggered by a need on ia64 to have another knob for get_mcontext().	2003-11-09 20:31:04 +00:00
bde	67e14c35e9	Quick fix for scaling of statclock ticks in the SMP case. As explained in the log message for kern_sched.c 1.83 (which should have been repo-copied to preserve history for this file), the (4BSD) scheduler algorithm only works right if stathz is nearly 128 Hz. The old commit lock said 64 Hz; the scheduler actually wants nearly 16 Hz but there was a scale factor of 4 to give the requirement of 64 Hz, and rev.1.83 changed the scale factor so that the requirement became 128 Hz. The change of the scale factor was incomplete in the SMP case. Then scheduling ticks are provided by smp_ncpu CPUs, and the scheduler cannot tell the difference between this and 1 CPU providing scheduling ticks smp_ncpu times faster, so we need another scale factor of smp_ncp or an algorithm change. This quick fix uses the scale factor without even trying to optimize the runtime divisions required for this as is done for the other scale factor. The main algorithmic problem is the clamp on the scheduling tick counts. This was 295; it is now approximately 295 * smp_ncpu. When the limit is reached, threads get free timeslices and scheduling becomes very unfair to the threads that don't hit the limit. The limit can be reached and maintained in the worst case if the load average is larger than (limit / effective_stathz - 1) / 2 = 0.65 now (was just 0.08 with 2 CPUs before this change), so there are algorithmic problems even for a load average of 1. Fortunately, the worst case isn't common enough for the problem to be very noticeable (it is mainly for niced CPU hogs competing with less nice CPU hogs).	2003-11-09 13:45:54 +00:00
tanimura	7eade05dfa	- Implement selwakeuppri() which allows raising the priority of a thread being waken up. The thread waken up can run at a priority as high as after tsleep(). - Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities. - Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs. Not objected in: -arch, -current	2003-11-09 09:17:26 +00:00
sam	7f3b205cb8	o add a flags parameter to netisr_register that is used to specify whether or not the isr needs to hold Giant when running; Giant-less operation is also controlled by the setting of debug_mpsafenet o mark all netisr's except NETISR_IP as needing Giant o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant o pickup Giant (when debug_mpsafenet is 1) inside ip_input before calling up with a packet o change netisr handling so swi_net runs w/o Giant; instead we grab Giant before invoking handlers based on whether the handler needs Giant o change netisr handling so that netisr's that are marked MPSAFE may have multiple instances active at a time o add netisr statistics for packets dropped because the isr is inactive Supported by: FreeBSD Foundation	2003-11-08 22:28:40 +00:00
davidxu	42bda3c853	Return a reasonable number for top or ps to display for M:N thread, since there is no direct association between M:N thread and kse, sometimes, a thread does not have a kse, in that case, return a pctcpu from its last kse, it is not perfect, but gives a good number to be displayed.	2003-11-08 03:03:17 +00:00
jhb	b6b663a274	Regen.	2003-11-07 20:30:30 +00:00
jhb	0ec5596a30	Mark ptrace(), ktrace(), utrace(), sysarch(), and issetugid() as MP safe. The parts of these calls that are not yet MP safe acquire Giant explicitly.	2003-11-07 20:23:23 +00:00
rwatson	f74e2ef8ef	Slight whitespace consistency improvement: Trim trailing whitespace. Remove unmatched " " before ")".	2003-11-07 04:47:14 +00:00
jeff	6fc9686d6a	- Somehow I botched my last commit. Add an extra ( to fix things up. I'm still not sure how this happened. Reported by: ps	2003-11-06 07:56:01 +00:00
alc	adf47c347a	- Delay the allocation of memory for the pipe mutex until we need it. This avoids the need to free said memory in various error cases along the way.	2003-11-06 05:58:26 +00:00
alc	ddec4754a2	- Simplify pipespace() by eliminating the explicit creation of vm objects. Instead, let the vm objects be lazily instantiated at fault time. This results in the allocation of fewer vm objects and vm map entries due to aggregation in the vm system.	2003-11-06 05:08:12 +00:00
rwatson	c7fff281b1	Remove the flags argument from mac_externalize_*_label(), as it's not passed into policies or used internally to the MAC Framework. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-06 03:42:43 +00:00
jeff	e150e86d4b	- Remove the local definition of sched_pin and unpin. They are provided in sched.h now. - Respect the td pin count.	2003-11-06 03:09:51 +00:00
sam	0927f68a45	o make debug_mpsafenet globally visible o move it from subr_bus.c to netisr.c where it more properly belongs o add NET_PICKUP_GIANT and NET_DROP_GIANT macros that will be used to grab Giant as needed when MPSAFE operation is enabled Supported by: FreeBSD Foundation	2003-11-05 23:42:51 +00:00
imp	535923b1ec	Minor style(9) nit	2003-11-05 06:14:48 +00:00
jeff	9f266b0941	- It's ok if sched_runnable() has races in it, we don't need the sched_lock here unless we have something on the assigned queue.	2003-11-05 05:30:12 +00:00
kan	36d60f3bb7	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff	2003-11-05 04:30:08 +00:00
fjoe	160f13825c	Back out the following revisions: 1.36 +73 -60 src/sys/compat/linux/linux_ipc.c 1.83 +102 -48 src/sys/kern/sysv_shm.c 1.8 +4 -0 src/sys/sys/syscallsubr.h That change was intended to support vmware3, but wantrem parameter is useless because vmware3 uses SYSV shared memory to talk with X server and X server is native application. The patch worked because check for wantrem was not valid (wantrem and SHMSEG_REMOVED was never checked for SHMSEG_ALLOCATED segments). Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which when set to 1 allows to return removed segments in shm_find_segment_by_shmid() and shm_find_segment_by_shmidx(). MFC after: 1 week	2003-11-05 01:53:10 +00:00
mckusick	609a7ca3c1	Get rid of DIAGNOSTIC that gives false positives on slow CPUs.	2003-11-04 08:03:11 +00:00
jeff	23d5b83434	- Add initial support for pinning and binding.	2003-11-04 07:45:41 +00:00
mckusick	993e4dcc97	Allow the bufdaemon and update daemon processes to skip the waitrunningbufspace() calls so that they are always able to proceed and clean up buffer space. Submitted by: Brian Fundakowski Feldman <green@freebsd.org>	2003-11-04 06:30:00 +00:00
sam	0faeaa0329	disable MPSAFE network drivers; we aren't ready yet`	2003-11-04 02:01:42 +00:00
cognet	8a3fa16a0a	I believe kbyanc@ really meant this in rev 1.58. Use zpfind() to see if the process became a zombie if pfind() doesn't find it and if the caller wants to know about process death, so that the caller knows the process died even if it happened before the kevent was actually registered. MFC after: 1 week	2003-11-04 01:41:47 +00:00
cognet	a3a383926f	Do not attempt to report proc event if NOTE_EXIT has already been received. This fixes a race condition (specifically with signal events) that could lead to the kn being re-inserted into the list after it has been destroyed, which is not something we want to happen. PR: kern/58258	2003-11-04 01:14:58 +00:00
jhb	da7c66355b	Don't require INTR_FAST handlers to be exclusive in the MI layer. Instead, let the MD code choose whether or not to implement such a policy. The new i386 interrupt code allows multiple FAST handlers for a given source for example. However, the code does not allow FAST and non-FAST handlers to be mixed.	2003-11-03 22:42:58 +00:00
jhb	1d8b4454d7	Update spin lock order list for new i386 interrupt and SMP code.	2003-11-03 22:38:30 +00:00
rwatson	95e56a059a	Unlock pipe mutex when failing MAC pipe ioctl access control check. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-03 17:58:23 +00:00
jeff	19f727b426	- Remove kseq_find(), we no longer scan other cpu's run queues when we go idle. They figure out that we're idle fast enough that the cache pollution introduces by scanning their run queue is more expensive than waiting a little longer. - Add kseq_setidle() to mark us as being idle. Use this in place of kseq_find(). - Remove kseq_load_highest(), kseq_find() was the only consumer of this interface. kseq_balance() has it's own customized version that finds the lowest and highest loads simultaneously. Continuously told that this would be faster by: terry	2003-11-03 03:27:22 +00:00
jeff	40aa6873ef	- Remove the ksq_loads[] array. We are only interested in three counts, the total load, the timeshare load, and the number of threads that can be migrated to another cpu. Account for these seperately. - Introduce a KSE_CAN_MIGRATE() macro which determines whether or not a KSE can be migrated to another CPU. Currently, this only checks to see if we're an interrupt handler. Eventually this will also be used to support CPU binding.	2003-11-02 10:56:48 +00:00
kan	618baf4714	Take care not to call vput if thread used in corresponding vget wasn't curthread, i.e. when we receive a thread pointer to use as a function argument. Use VOP_UNLOCK/vrele in these cases. The only case there td != curthread known at the moment is boot() calling sync with thread0 pointer. This fixes the panic on shutdown people have reported.	2003-11-02 04:52:53 +00:00
jeff	e8941020fa	- In sched_prio() only force us onto the current queue if our priority is being elevated (numerically smaller).	2003-11-02 04:25:59 +00:00
jeff	5c293e491d	- Rename SCHED_PRI_NTHRESH to SCHED_SLICE_NTHRESH since it is only used in slice assignment. Add a comment describing what it does. - Remove a stale XXX comment, the nice should not impact the interactivity, nice adjustments only effect non-interactive tasks in ULE. - Don't allow nice -20 tasks to totally starve nice 0 tasks. Give them at least SCHED_SLICE_MIN ticks. We still allow nice 0 tasks to starve nice +20 tasks as intended.	2003-11-02 04:10:15 +00:00

1 2 3 4 5 ...

6990 Commits