freebsd-nq

Author	SHA1	Message	Date
Robert Watson	abdeb3b01f	Canonicalize copyrights in some files I hold copyrights on: - Sort by date in license blocks, oldest copyright first. - All rights reserved after all copyrights, not just the first. - Use (c) to be consistent with other entries. MFC after: 3 days	2007-01-08 17:49:59 +00:00
Jeff Roberson	eddb4efacd	- Don't let SCHED_TICK_TOTAL() return less than hz. This can cause integer divide faults in roundup() later if it is able to return 0. For some reason this bug only shows up on my laptop and not my testboxes.	2007-01-06 12:33:43 +00:00
Jeff Roberson	1e516cf534	- Fix the sched_priority() invalid priority bugs. Use roundup() instead of max() when computing the divisor in SCHED_TICK_PRI(). This prevents cases where rounding down would allow the quotient to exceed SCHED_PRI_RANGE. - Garbage collect some unused flags and fields. - Replace TDF_HOLD with sched_pin_td()/sched_unpin_td() since it simply duplicated this functionality. - Re-enable the rebalancer by default and fix the sysctl so it can be modified.	2007-01-06 08:44:13 +00:00
Jeff Roberson	9330bbbb61	- Don't IPI unless we're going to interrupt something exiting in the kernel. otherwise we can afford the latency. This makes a significant performance improvement.	2007-01-06 02:34:23 +00:00
Jeff Roberson	155b6ca12b	- Fix a comparison in sched_choose() that caused cpus to be constantly marked idle, thus breaking cpu load balancing. - Change sched_interact_update() to fix cases where the stored history has expanded significantly rather than handling them in the callers. This fixes a case where sched_priority() could compute a bad value. - Add a sysctl to disable the global load balancer for experimentation.	2007-01-05 23:45:38 +00:00
John Baldwin	9ae328fc8f	- Close a race between enumerating UNIX domain socket pcb structures via sysctl and socket teardown by adding a reference count to the UNIX domain pcb object and fixing the sysctl that enumerates unpcbs to grab a reference on each unpcb while it builds the list to copy out to userland. - Close a race between UNIX domain pcb garbage collection (unp_gc()) and file descriptor teardown (fdrop()) by adding a new garbage collection flag FWAIT. unp_gc() sets FWAIT while it walks the message buffers in a UNIX domain socket looking for nested file descriptor references and clears the flag when it is finished. fdrop() checks to see if the flag is set on a file descriptor whose refcount just dropped to 0 and waits for unp_gc() to clear the flag before completely destroying the file descriptor. MFC after: 1 week Reviewed by: rwatson Submitted by: ups Hopefully makes the panics go away: mx1	2007-01-05 19:59:46 +00:00
Jeff Roberson	8ab80cf009	- ftick was initialized to -1 for init and any of it's children. Fix this by setting ftick = ltick = ticks in schedinit(). - Update the priority when we are pulled off of the run queue and when we are inserted onto the run queue so that it more accurately reflects our present status. This is important for efficient priority propagation functioning. - Move the frequency test into sched_pctcpu_update() so we don't repeat it each time we'd like to call it. - Put some temporary work-around code in sched_priority() in case the tick mechanism produces a bad priority. Eventually this should revert to an assert again.	2007-01-05 08:50:38 +00:00
Jeff Roberson	3f872f85d2	- Only allow the tdq_idx to increase by one each tick rather than up to the most recently chosen index. This significantly improves nice behavior. This allows a lower priority thread to run some multiple of times before the higher priority thread makes it to the front of the queue. A nice +20 cpu hog now only gets ~5% of the cpu when running with a nice 0 cpu hog and about 1.5% with a nice -20 hog. A nice difference of 1 makes a 4% difference in cpu usage between two hogs. - Track a seperate insert and removal index. When the removal index is empty it is updated to point at the current insert index. - Don't remove and re-add a thread to the runq when it is being adjusted down in priority. - Pull some conditional code out of sched_tick(). It's looking a bit large now.	2007-01-04 12:16:19 +00:00
Jeff Roberson	cd49bb7047	- Don't pass a pointer into runq_choose_from(). The caller can adjust the index if it chooses to.	2007-01-04 12:10:58 +00:00
Jeff Roberson	e7d50326de	ULE 2.0: - Remove the double queue mechanism for timeshare threads. It was slow due to excess cache lines in play, caused suboptimal scheduling behavior with niced and other non-interactive processes, complicated priority lending, etc. - Use a circular queue with a floating starting index for timeshare threads. Enforces fairness by moving the insertion point closer to threads with worse priorities over time. - Give interactive timeshare threads real-time user-space priorities and place them on the realtime/ithd queue. - Select non-interactive timeshare thread priorities based on their cpu utilization over the last 10 seconds combined with the nice value. This gives us more sane priorities and behavior in a loaded system as compared to the old method of using the interactivity score. The interactive score quickly hit a ceiling if threads were non-interactive and penalized new hog threads. - Use one slice size for all threads. The slice is not currently dynamically set to adjust scheduling behavior of different threads. - Add some new sysctls for scheduling parameters. Bug fixes/Clean up: - Fix zeroing of td_sched after initialization in sched_fork_thread() caused by recent ksegrp removal. - Fix KSE interactivity issues related to frequent forking and exiting of kse threads. We simply disable the penalty for thread creation and exit for kse threads. - Cleanup the cpu estimator by using tickincr here as well. Keep ticks and ltick/ftick in the same frequency. Previously ticks were stathz and others were hz. - Lots of new and updated comments. - Many many others. Tested on: up x86/amd64, 8way amd64.	2007-01-04 08:56:25 +00:00
Jeff Roberson	3fed7d239a	- Add three new functions to support circular run queues. - runq_add_pri allows the caller to position the thread at any rqindex regardless of priority. - runq_choose_from() chooses the lowest priority thread starting from a given index. The index is updated with the rqindex of the chosen thread. This routine is used to pick the lowest priority relative to a given index. - runq_remove_idx() updates the index if the run queue that held the removed thread is now empty.	2007-01-04 08:39:58 +00:00
Jeff Roberson	b1c00b13d6	- Fix schedgraph output with KSE threads. Call thread_switchout() after calling CTR() so we don't confuse a new kse thread with a real preemption.	2007-01-03 02:38:41 +00:00
David Xu	fe1a9506fa	Fix compiling.	2007-01-02 04:14:01 +00:00
Robert Watson	2da78e3862	Prefer a more traditional spelling of inhibited in comments and panic messages.	2006-12-31 15:56:04 +00:00
Jeff Roberson	c02bbb43a0	- More search and replace prettying.	2006-12-29 12:55:32 +00:00
Jeff Roberson	d2ad694caa	- Clean up a bit after the most recent KSE restructuring.	2006-12-29 10:37:07 +00:00
Robert Watson	224a974b9b	Break contents of kern_mac.c out into two files following a repo-copy: mac_framework.c Contains basic MAC Framework functions, policy registration, sysinits, etc. mac_syscalls.c Contains implementations of various MAC system calls, including ENOSYS stubs when compiling without options MAC. Obtained from: TrustedBSD Project	2006-12-28 20:52:02 +00:00
Robert Watson	471e5756ad	Update MAC Framework general comments, referencing various interfaces it consumes and implements, as well as the location of the framework and policy modules. Refactor MAC Framework versioning a bit so that the current ABI version can be exported via a read-only sysctl. Further update comments relating to locking/synchronization. Update copyright to take into account these and other recent changes. Obtained from: TrustedBSD Project	2006-12-28 17:25:57 +00:00
David Xu	016fa30228	break loop early if we know that there are at least two signals.	2006-12-25 03:00:15 +00:00
David Xu	34e1241b9d	Fix typo, p_slptime should be td_slptime.	2006-12-24 01:52:27 +00:00
Bruce M Simpson	a86ec33820	Drop all received data mbufs from a socket's queue if the MT_SONAME mbuf is dropped, to preserve the invariant in the PR_ADDR case. Add a regression test to detect this condition, but do not hook it up to the build for now. PR: kern/38495 Submitted by: James Juran Reviewed by: sam, rwatson Obtained from: NetBSD MFC after: 2 weeks	2006-12-23 21:07:07 +00:00
Robert Watson	afae638809	Update comments to reflect changes in the extattrctl() code. Clean up comment formatting. Obtained from: TrustedBSD Project	2006-12-23 00:30:03 +00:00
Robert Watson	168d0553a3	Following a repo-copy of vfs_syscalls.c to vfs_extattr.c, remove non-extattr functions from vfs_extattr.c, and extattr functions from vfs_syscalls.c. Change copyright/license on vfs_extattr.c to my copyright/license on the extended attribute implementation (from extattr.h). Clean up includes a bit. Obtained from: TrustedBSD Project	2006-12-23 00:10:36 +00:00
Robert Watson	0efd6615cd	Move src/sys/sys/mac_policy.h, the kernel interface between the MAC Framework and security modules, to src/sys/security/mac/mac_policy.h, completing the removal of kernel-only MAC Framework include files from src/sys/sys. Update the MAC Framework and MAC policy modules. Delete the old mac_policy.h. Third party policy modules will need similar updating. Obtained from: TrustedBSD Project	2006-12-22 23:34:47 +00:00
Randall Stewart	5288989fac	The prepend function did not handle non-pkthdr's correctly. It always called MH_ALIGN for small lengths being prepended (less than MHLEN). This meant that if you did a prepend on a non M_PKTHDR the system would panic with the KASSERT in MH_ALIGN. Instead we are not aware of this and do a MH_ALIGN or M_ALIGN as appropriate. Reviewed by: andre Approved by: gnn	2006-12-21 19:58:04 +00:00
Robert Watson	e66fe0e1db	Remove mac_enforce_subsystem debugging sysctls. Enforcement on subsystems will be a property of policy modules, which may require access control check entry points to be invoked even when not actively enforcing (i.e., to track information flow without providing protection). Obtained from: TrustedBSD Project Suggested by: Christopher dot Vance at sparta dot com	2006-12-21 09:51:34 +00:00
Robert Watson	17041e6708	Expand commenting on label slots, justification for the MAC Framework locking model, interactions between locking and policy init/destroy methods. Rewrap some comments to 77 character line wrap. Obtained from: TrustedBSD Project	2006-12-20 20:38:44 +00:00
Jung-uk Kim	4e4de5e43c	MFP4: (part of) 110058 copyin()/copyout() for message type is separated from msgsnd()/msgrcv() and it is done from its wrapper functions to support 32-bit emulations. After I implemented this, I have briefly referenced NetBSD and Darwin. NetBSD passes copyin()/copyout() function pointers from wrappers. Darwin passes size of message type as an argument, which is actually similar to my first implementation (P4 109706). We may revisit these implementations later.	2006-12-20 19:26:30 +00:00
Konstantin Belousov	cc570216bb	In rev. 1.514, iodone on async buffer may happen before code checks the vnode v_flag. For cluster buffers this would result in dereferencing NULL b_vp. To prevent the panic, cache relevant vnode flag before calling bstrategy. Reported by: Peter Holm, kris Tested by: Peter Holm Reviewed by: tegge Pointy hat to: kib	2006-12-20 09:22:31 +00:00
David Xu	4e32b7b3cc	Add a lwpid field into per-cpu structure, the lwpid represents current running thread's id on each cpu. This allow us to add in-kernel adaptive spin for user level mutex. While spinning in user space is possible, without correct thread running state exported from kernel, it hardly can be implemented efficiently without wasting cpu cycles, however exporting thread running state unlikely will be implemented soon as it has to design and stablize interfaces. This implementation is transparent to user space, it can be disabled dynamically. With this change, mutex ping-pong program's performance is improved massively on SMP machine. performance of mysql super-smack select benchmark is increased about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems which have bunch of cpus and system-call overhead is low (athlon64, opteron, and core-2 are known to be fast), the adaptive spin does help performance. Added sysctls: kern.threads.umtx_dflt_spins if the sysctl value is non-zero, a zero umutex.m_spincount will cause the sysctl value to be used a spin cycle count. kern.threads.umtx_max_spins the sysctl sets upper limit of spin cycle count. Tested on: Athlon64 X2 3800+, Dual Xeon 5130	2006-12-20 04:40:39 +00:00
Martin Blapp	cd1b20d58a	Back out rev. 1.266. The real cause for the recent panics has been fixed in rev. 1.267 and there is no need to keep this test.	2006-12-20 02:49:59 +00:00
Martin Blapp	b472f371b2	Giant might have been temporarily dropped while waiting for proctree_lock, allowing for an intervening tty_close() that cleared tp->t_session. Submitted by: tegge MFC: 1 day	2006-12-19 22:34:32 +00:00
Martin Blapp	e0b43fcf44	Add the tp->t_refcnt validity check back. There are still some race conditions where tp->t_refcnt can go to zero.	2006-12-19 16:46:13 +00:00
David Xu	d733ccfbc2	Remove unused sysctls.	2006-12-19 13:06:01 +00:00
Pawel Jakub Dawidek	ce0d4ed4c2	Use pipe_direct_write() optimization only if the data is in process' memory. This fixes sending data through pipe from the kernel. Fix suggested by: rwatson	2006-12-19 12:52:22 +00:00
Kip Macy	a12f193c7c	ktrace_cv is no longer used - remove Submitted by: Attilio Rao	2006-12-17 00:16:09 +00:00
Kip Macy	10ebecb796	Cleaner fix for handling declaration of loop variable under INVARIANTS - in trying to avoid nested brackets and #ifdef INVARIANTS around i at the top, I broke booting for INVARIANTS all together :-( - the cleanest fix is to simply assign to sq twice if INVARIANTS is enabled - tested both with and without INVARIANTS :-/	2006-12-17 00:14:20 +00:00
Andrey A. Chernov	6d87718991	Don't intermix assignments and variable declarations in prev. commit	2006-12-16 21:17:27 +00:00
Andrey A. Chernov	fc6d254f9e	Fix NULL pointer reference for INVARIANTS case Submitted by: Yuriy Tsibizov <Yuriy.Tsibizov@gfk.ru>	2006-12-16 20:33:26 +00:00
Craig Rodrigues	03eff5830a	In vfs_export(), if we specify MNT_DELEXPORT in the struct export_args, after we perform the operations to delete the export, call vfs_deleteopt() to delete the "export" mount option from the linked list of mount options associated with that mount point. This fixes one scenario: - put a filesystem in /etc/exports to export it - remove the filesystem from /etc/exports to delete the export and restart mountd - try to do a "mount -u -o ro" or "mount -u -o rw" on that filesystem now that it is no longer exported.	2006-12-16 15:50:36 +00:00
Craig Rodrigues	2892f3bbfa	Add a function vfs_deleteopt() which searches through the vfsoptlist linked list of mount options by name, and deletes the option if it finds it.	2006-12-16 15:44:03 +00:00
Craig Rodrigues	2830e09d3f	Convert to ANSI-style function prototypes.	2006-12-16 12:06:59 +00:00
Robert Watson	9ac5741040	For now, back out sysv_ipc.c:1.30, which caused shmget() with odd mode arguments to fail. The mode field for shmget() appears to have undefined meaning in the context of an already-present IPC object, but applications appear to assume any arbitrary passed value will be ignored. I had hoped to revisit this more quickly, but am removing the change for now to prevent toe-stubbing. Reported by: JAroslav Suchanek <jarda at grisoft dot cz> PR: kern/106078	2006-12-16 11:30:54 +00:00
Kip Macy	bd9275b4c4	correct name of number of sleep queues	2006-12-16 07:50:39 +00:00
Kip Macy	6cbb70e2cc	Add second sleep queue so that sx and lockmgr can have separate sleep queues for shared and exclusive acquisitions Submitted by: Attilio Rao Approved by: jhb	2006-12-16 06:54:09 +00:00
Kip Macy	1364a812e7	- Fix some gcc warnings in lock_profile.h - add cnt_hold cnt_lock support for spin mutexes - make sure contested is initialized to zero to only bump contested when appropriate - move initialization function to kern_mutex.c to avoid cyclic dependency between mutex.h and lock_profile.h	2006-12-16 02:37:58 +00:00
Nick Hibma	9079fff550	Align the interfaces for the various watchdogs and make the interface behave as expected. Also: - Return an error if WD_PASSIVE is passed in to the ioctl as only WD_ACTIVE is implemented at the moment. See sys/watchdog.h for an explanation of the difference between WD_ACTIVE and WD_PASSIVE. - Remove the I_HAVE_TOTALLY_LOST_MY_SENSE_OF_HUMOR define. If you've lost your sense of humor, than don't add a define. Specific changes: i80321_wdog.c Don't roll your own passive watchdog tickle as this would defeat the purpose of an active (userland) watchdog tickle. ichwd.c / ipmi.c: WD_ACTIVE means active patting of the watchdog by a userland process, not whether the watchdog is active. See sys/watchdog.h. kern_clock.c: (software watchdog) Remove a check for WD_ACTIVE as this does not make sense here. This reverts r1.181.	2006-12-15 21:44:49 +00:00
Konstantin Belousov	3b7b5496a7	Resolve two deadlocks that could be caused by busy md device backed by vnode. Allow for md thread and the thread that owns lock on vnode backing the md device to do the write even when runningbufspace is exhausted. Tested by: Peter Holm Reviewed by: tegge MFC after: 2 weeks	2006-12-14 11:34:07 +00:00
John Baldwin	c304531851	Add a function to return the MD interrupt source cookie associated with an interrupt event. Use this in the x86 code to fixup the intrcnt names when an interrupt handler is removed.	2006-12-12 19:20:19 +00:00
John Baldwin	bc17acb2ad	Add a comment and fix a whitespace nit.	2006-12-12 19:19:22 +00:00
Julian Elischer	0c17ece676	Fix a potential point of confusion. Art Ironport we've seen this end up with an infinite loop in and out of the kernel during process shutdown.	2006-12-12 08:01:55 +00:00
Craig Rodrigues	3a13c9cc28	Use vfs_mount_error() to log mount errors in a few places with human readable strings which can be retrieved if an "errmsg" parameter is passed into nmount().	2006-12-07 02:57:00 +00:00
Julian Elischer	fc6c30f6c6	Changes to try fix sched_ule.c courtesy of David Xu.	2006-12-06 06:55:59 +00:00
Julian Elischer	ad1e7d285a	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
Kip Macy	aa077979f6	Bug fix for obscenely large wait times on uncontested locks if waittime was zero (the lock was uncontested) l->lpo_waittime in the hash table would not get initialized. Inspection prompted by questions from: Attilio Rao	2006-12-04 22:15:50 +00:00
John Baldwin	5505470e4a	Fix an edge case in rman_manage_region() where it didn't handle a resource ending at ULONG_MAX properly. While here, use TAILQ_FOREACH_SAFE(). Tested by: "Stephane E. Potvin" <sepotvin at videotron-ca> MFC after: 1 week	2006-12-04 16:45:23 +00:00
David Xu	745fbd3a72	if a thread blocked on userland condition variable is pthread_cancel()ed, it is expected that the thread will not consume a pthread_cond_signal(), therefor, we use thr_wake() to mark a flag, the flag tells a thread calling do_cv_wait() in umtx code to not block on a condition variable. Thread library is expected that once a thread detected itself is in pthread_cond_wait, it will call the thr_wake() for itself in its SIGCANCEL handler.	2006-12-04 14:15:12 +00:00
David Xu	a6abdf322d	Introduce userspace condition variable, since we have already POSIX priority mutex implemented, it is the time to introduce this stuff, now we can use umutex and ucond together to implement pthread's condition wait/signal.	2006-12-03 01:49:22 +00:00
Konstantin Belousov	7226306ed5	Linker set support depends on the magic __start_<section> and __stop_<section> symbols generated by the static linker for elf sections. This is done only for the final link, and not for ld -r. Augment elf_obj in-kernel linker by recognizing such special symbols, and resolving them to the start and end of the section automatically. As result, linker sets on amd64 could be used in the same way as on other architectures, without explicit calls to linker_file_lookup_set(). Requested by: rdivacky No objections from: peter, jhb	2006-11-30 10:50:29 +00:00
Poul-Henning Kamp	a4dcb4f627	Only grab the sched_lock if we actually need to modify the thread priority. During a buildworld only 2/3 of the calls to msleep actually changed the priority.	2006-11-30 08:27:38 +00:00
John Birrell	d4fbc81d99	Flushing the buffer is conditional on actually using the buffer. Oops.	2006-11-30 07:25:52 +00:00
John Birrell	e0b651251d	Turn console printf buffering into a kernel option and only on by default for sun4v where it is absolutely required. This change moves the buffer from struct pcpu to the stack to avoid using the critical section which created a LOR in a couple of cases due to interaction with the tty code and kqueue. The LOR can't be fixed with the critical section and the pcpu buffer can't be used without the critical section. Putting the buffer on the stack was my initial solution, but it was pointed out that the stress on the stack might cause problems depending on the call path. We don't have a way of creating tests for those possible cases, so it's best to leave this as an option for the time being. In time we may get enough data to enable this option more generally.	2006-11-30 04:17:05 +00:00
David Xu	843b99c6f7	- Remove third parameter of itimer_find, the parameter is always zero. - Call callout_drain on deleting POSIX timer. - Use kern_timer_delete in exiting hook.	2006-11-28 03:24:34 +00:00
Mohan Srinivasan	84eab9ad73	Fix a race in soclose() where connections could be queued to the listening socket after the pass that cleans those queues. This results in these connections being orphaned (and leaked). The fix is to clean up the so queues after detaching the socket from the protocol. Thanks to ups and jhb for discussions and a thorough code review.	2006-11-22 23:54:29 +00:00
John Baldwin	6600b45d88	Save exit status of an exiting process in kn_data in the knote. Submitted by: Jared Yanovich ^phirerunner at comcast.net^ MFC after: 2 weeks	2006-11-20 22:17:50 +00:00
Julian Elischer	de38cd9d8b	whitespace fix only	2006-11-20 16:13:02 +00:00
David Xu	fa0d3a327a	Use scheduler API sched_user_prio() to adjust thread's userland priority, use td_base_user_prio to get real userland priority since POSIX priority mutex may adjust td_user_pri which is an effective priority.	2006-11-20 05:50:59 +00:00
Alan Cox	976a87a284	Add vm map and object locking to each_writable_segment(). Noticed by: jhb@ MFC after: 3 weeks	2006-11-19 23:38:59 +00:00
Jung-uk Kim	e22291430e	Fix msgsnd(3)/msgrcv(3) deadlock under heavy resource pressure by timing out msgsnd and rechecking resources. This problem was found while I was running Linux Test Project test suite (test cases: msgctl08, msgctl09). Change `msgwait' to `msgsnd' and `msgrcv' to distinguish its sleeping conditions. Few cosmetic changes to debugging messages.	2006-11-17 20:43:01 +00:00
Pawel Jakub Dawidek	7ee07175af	Change sleepq_add(9) argument from 'struct mtx *' to 'struct lock_object *', which allows to use it with different kinds of locks. For example it allows to implement Solaris conditions variables which will be used in ZFS port on top of sx(9) locks. Reviewed by: jhb	2006-11-16 01:02:00 +00:00
John Baldwin	7eefbf10c8	Adjust assertions to allow for magical properties of the 'lbolt' wait channel for tsleep(): - Allow tsleep() on &lbolt without Giant with a timeout 0 since &lbolt has an implied timeout. - If &lbolt is used with msleep() pass NULL to sleepq_add() for the lock object. Unlike other sleepq channels, &lbolt doesn't have an associated owning lock.	2006-11-15 20:44:07 +00:00
David Xu	653385756c	Fix a copy-paste bug in NON-KSE case.	2006-11-14 05:48:27 +00:00
Kip Macy	2f6a774be4	change vop_lock handling to allowing tracking of callers' file and line for acquisition of lockmgr locks Approved by: scottl (standing in for mentor rwatson)	2006-11-13 05:51:22 +00:00
Kip Macy	61bd5e21b3	track lock class name in a way that doesn't break WITNESS	2006-11-13 05:41:46 +00:00
Kip Macy	44a96b46bd	Unbreak witness	2006-11-12 23:23:38 +00:00
Andre Oppermann	3e932ca715	In kern_sendfile() fix the calculation of sbytes (the total number of bytes written to the socket). The rewrite in revision 1.240 got confused by the FreeBSD 4.x bug compatibility code. For some reason lighttpd, that was used for testing the new sendfile code, was not affected by the problem but apache and others using headers/trailers in the sendfile call received incorrect sbytes values after return from non- blocking sockets. This then lead to restarts with wrong offsets and thus mixed up file contents when the socket was writeable again. All programs not using headers/trailers, like ftpd, were not affected by the bug. Reported by: Pawel Worach <pawel.worach-at-gmail.com> Tested by: Pawel Worach <pawel.worach-at-gmail.com>	2006-11-12 20:57:00 +00:00
David Xu	60d4823594	Copy base user priority in NO_KSE case.	2006-11-12 11:48:37 +00:00
Tom Rhodes	bedc1c9c96	Fix mispatch of includes list; allows my kernel to build successfully.	2006-11-12 03:34:03 +00:00
Kip Macy	54e57f7613	show lock class in profiling output for default case where type is not specified when initializing the lock Approved by: scottl (standing in for mentor rwatson)	2006-11-12 03:30:01 +00:00
David Xu	812fb4a89f	Use mi_switch, this should fix loadavg calculation problem in NO_KSE case.	2006-11-12 03:18:22 +00:00
Tom Rhodes	c4f7f0fd4a	Update includes for sys/posix4 move. Approved by: silence on -arch and -standards	2006-11-11 16:46:31 +00:00
Tom Rhodes	6aeb05d7be	Merge posix4/* into normal kernel hierarchy. Reviewed by: glanced at by jhb Approved by: silence on -arch@ and -standards@	2006-11-11 16:26:58 +00:00
Tom Rhodes	bdd04ab184	Update #includes list.	2006-11-11 16:19:12 +00:00
David Xu	5a21514727	Unbreak userland priority inheriting in NO_KSE case.	2006-11-11 13:11:29 +00:00
Kip Macy	ed6a7c42f6	tinderbox fix	2006-11-11 07:38:48 +00:00
Kip Macy	cf2c39e7a2	remove lingering call to rd(tick)	2006-11-11 07:28:45 +00:00
Kip Macy	83b72e3e25	missed nits replacing mutex with lock	2006-11-11 06:28:47 +00:00
Kip Macy	7c0435b933	MUTEX_PROFILING has been generalized to LOCK_PROFILING. We now profile wait (time waited to acquire) and hold times for all kernel locks. If the architecture has a system synchronized TSC, the profiling code will use that - thereby minimizing profiling overhead. Large chunks of profiling code have been moved out of line, the overhead measured on the T1 for when it is compiled in but not enabled is < 1%. Approved by: scottl (standing in for mentor rwatson) Reviewed by: des and jhb	2006-11-11 03:18:07 +00:00
Maxim Konovalov	f645b5da88	o Fix a couple of obvious typos.	2006-11-08 09:09:07 +00:00
Andre Oppermann	62b36a7fc2	Style cleanups to the sctp_* syscall functions.	2006-11-07 21:28:12 +00:00
John Baldwin	6b8de13ab4	Simplify operations with sync_mtx in sched_sync(): - Don't drop the lock just to reacquire it again to check rushjob, this only wastes time. - Use msleep() to drop the mutex while sleeping instead of explicitly unlocking around tsleep. Reviewed by: pjd	2006-11-07 19:45:05 +00:00
John Baldwin	8064e5d71f	Fix comment typo and function declaration.	2006-11-07 19:07:33 +00:00
Tor Egge	40dee3da29	Don't drop reference to tty in tty_close() if TS_ISOPEN is already cleared. Reviewed by: bde	2006-11-06 22:12:43 +00:00
Andre Oppermann	bda8b1f3b8	Handle early errors in kern_sendfile() by introducing a new goto 'out' label after the sbunlock() part. This correctly handles calls to sendfile(2) without valid parameters that was broken in rev. 1.240. Coverity error: 272162	2006-11-06 21:53:19 +00:00
Robert Watson	acd3428b7d	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
Robert Watson	800c940832	Add a new priv(9) kernel interface for checking the availability of privilege for threads and credentials. Unlike the existing suser(9) interface, priv(9) exposes a named privilege identifier to the privilege checking code, allowing more complex policies regarding the granting of privilege to be expressed. Two interfaces are provided, replacing the existing suser(9) interface: suser(td) -> priv_check(td, priv) suser_cred(cred, flags) -> priv_check_cred(cred, priv, flags) A comprehensive list of currently available kernel privileges may be found in priv.h. New privileges are easily added as required, but the comments on adding privileges found in priv.h and priv(9) should be read before doing so. The new privilege interface exposed sufficient information to the privilege checking routine that it will now be possible for jail to determine whether a particular privilege is granted in the check routine, rather than relying on hints from the calling context via the SUSER_ALLOWJAIL flag. For now, the flag is maintained, but a new jail check function, prison_priv_check(), is exposed from kern_jail.c and used by the privilege check routine to determine if the privilege is permitted in jail. As a result, a centralized list of privileges permitted in jail is now present in kern_jail.c. The MAC Framework is now also able to instrument privilege checks, both to deny privileges otherwise granted (mac_priv_check()), and to grant privileges otherwise denied (mac_priv_grant()), permitting MAC Policy modules to implement privilege models, as well as control a much broader range of system behavior in order to constrain processes running with root privilege. The suser() and suser_cred() functions remain implemented, now in terms of priv_check() and the PRIV_ROOT privilege, for use during the transition and possibly continuing use by third party kernel modules that have not been updated. The PRIV_DRIVER privilege exists to allow device drivers to check privilege without adopting a more specific privilege identifier. This change does not modify the actual security policy, rather, it modifies the interface for privilege checks so changes to the security policy become more feasible. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:37:19 +00:00
Pawel Jakub Dawidek	a2ca03b3ad	Typo, 'from' vnode is locked here, not 'to' vnode.	2006-11-04 23:57:02 +00:00
Randall Stewart	af99851047	This commits the remake in kern/ make sysent to get the correct syscalls.master's $FreeBSD$ tag record and a make sysent in sys/compat/freebsd32. Thanks Ruslan for pointing out the steps I missed :-0 Approved by: gnn	2006-11-03 18:57:49 +00:00
Randall Stewart	f8829a4a40	Ok, here it is, we finally add SCTP to current. Note that this work is not just mine, but it is also the works of Peter Lei and Michael Tuexen. They both are my two key other developers working on the project.. and they need ata-boy's too: ** peterlei@cisco.com tuexen@fh-muenster.de ** I did do a make sysent which updated the syscall's and sysproto.. I hope that is correct... without it you don't build since we have new syscalls for SCTP :-0 So go out and look at the NOTES, add option SCTP (make sure inet and inet6 are present too) and play with SCTP. I will see about committing some test tools I have after I figure out where I should place them. I also have a lib (libsctp.a) that adds some of the missing socketapi functions that I need to put into lib's.. I will talk to George about this :-) There may still be some 64 bit issues in here, none of us have a 64 bit processor to test with yet.. Michael may have a MAC but thats another beast too.. If you have a mac and want to use SCTP contact Michael he maintains a web site with a loadable module with this code :-) Reviewed by: gnn Approved by: gnn	2006-11-03 15:23:16 +00:00
John Birrell	35b927a8c4	Always init the console before trying to cnadd it to avoid the case where the console name isn't set and cnadd wants to use printf to complain about it.	2006-11-03 06:23:53 +00:00
Andre Oppermann	1ae4d97d51	Use the improved m_uiotombuf() function instead of home grown sosend_copyin() to do the userland to kernel copying in sosend_generic() and sosend_dgram(). sosend_copyin() is retained for ZERO_COPY_SOCKETS which are not yet supported by m_uiotombuf(). Benchmarking shows significant improvements (95% confidence): 66% less cpu (or 2.9 times better) with new sosend vs. old sosend (non-TSO) 65% less cpu (or 2.8 times better) with new sosend vs. old sosend (TSO) (Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back at 1000Base-TX full duplex.) Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 month	2006-11-02 17:45:28 +00:00
Andre Oppermann	5e20f43d31	Rename m_getm() to m_getm2() and rewrite it to allocate up to page sized mbuf clusters. Add a flags parameter to accept M_PKTHDR and M_EOR mbuf chain flags. Provide compatibility macro for m_getm() calling m_getm2() with M_PKTHDR set. Rewrite m_uiotombuf() to use m_getm2() for mbuf allocation and do the uiomove() in a tight loop over the mbuf chain. Add a flags parameter to accept mbuf flags to be passed to m_getm2(). Adjust all callers for the extra parameter. Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 month	2006-11-02 17:37:22 +00:00
Andre Oppermann	d99b0dd2c5	Rewrite kern_sendfile() to work in two loops, the inner which turns as many VM pages into mbufs as it can -- up to the free send socket buffer space. The outer loop then drops the whole mbuf chain into the send socket buffer, calls tcp_output() on it and then waits until 50% of the socket buffer are free again to repeat the cycle. This way tcp_output() gets the full amount of data to work with and can issue up to 64K sends for TSO to chop up in the network adapter without using any CPU cycles. Thus it gets very efficient especially with the readahead the VM and I/O system do. The previous sendfile(2) code simply looped over the file, turned each 4K page into an mbuf and sent it off. This had the effect that TSO could only generate 2 packets per send instead of up to 44 at its maximum of 64K. Add experimental SF_MNOWAIT flag to sendfile(2) to return ENOMEM instead of sleeping on mbuf allocation failures. Benchmarking shows significant improvements (95% confidence): 45% less cpu (or 1.81 times better) with new sendfile vs. old sendfile (non-TSO) 83% less cpu (or 5.7 times better) with new sendfile vs. old sendfile (TSO) (Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back at 1000Base-TX full duplex.) Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 month	2006-11-02 16:53:26 +00:00
John Baldwin	1ac27db5b7	Increment nb_allocated while holding the pt_mtx lock to avoid races.	2006-11-01 16:50:13 +00:00
John Baldwin	9045eda252	Comment and style tweak.	2006-11-01 16:48:33 +00:00
John Birrell	3d068827c2	Add a cnputs() function to write a string to the console with a lock to prevent interspersed strings written from different CPUs at the same time. To avoid putting a buffer on the stack or having to malloc one, space is incorporated in the per-cpu structure. The buffer size is 128 bytes; chosen because it's the next power of 2 size up from 80 characters. String writes to the console are buffered up to the end of the line or until the buffer fills. Then the buffer is flushed to all console devices. Existing low level console output via cnputc() is unaffected by this change. ithread calls to log() are also unaffected to avoid blocking those threads. A minor change to the behaviour in a panic situation is that console output will still be buffered, but won't be written to a tty as before. This should prevent interspersed panic output as a number of CPUs panic before we end up single threaded running ddb. Reviewed by: scottl, jhb MFC after: 2 weeks	2006-11-01 04:54:51 +00:00
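A minimal sketch of the line-buffering policy described in the cnputs() commit above, written as portable C so it can be exercised outside the kernel. Everything here is hypothetical: struct cnbuf, struct console_sink and the cnbuf_* functions are illustrative names, the sink stands in for "all console devices", and the lock that serializes flushes across CPUs is deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CNBUF_SIZE 128  /* buffer size given in the commit message */

/* Hypothetical stand-in for "all console devices": collects flushed text. */
struct console_sink {
    char text[1024];
    size_t len;
    unsigned writes;    /* number of flushes that reached the devices */
};

/* Hypothetical per-CPU line buffer (no locking in this sketch). */
struct cnbuf {
    char buf[CNBUF_SIZE];
    size_t len;
    struct console_sink *sink;
};

/* Write the buffered bytes out in one piece and empty the buffer. */
static void cnbuf_flush(struct cnbuf *cb)
{
    struct console_sink *s = cb->sink;
    size_t n = cb->len;

    if (n == 0)
        return;
    if (n > sizeof(s->text) - s->len)
        n = sizeof(s->text) - s->len;   /* sketch only: truncate instead of overflowing */
    memcpy(s->text + s->len, cb->buf, n);
    s->len += n;
    s->writes++;
    cb->len = 0;
}

/* Buffer a string, flushing at each newline or whenever the buffer is full. */
static void cnbuf_puts(struct cnbuf *cb, const char *str)
{
    for (; *str != '\0'; str++) {
        cb->buf[cb->len++] = *str;
        if (*str == '\n' || cb->len == CNBUF_SIZE)
            cnbuf_flush(cb);
    }
}
```

In this sketch a trailing partial line stays buffered until the next newline or until the buffer fills; how the real code handles that case is not stated in the log.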
Pawel Jakub Dawidek	1a60c7fc8e	Add gjournal specific code to the UFS file system: - Add FS_GJOURNAL flag which enables gjournal support on a file system. - Add cg_unrefs field to the cylinder group structure which holds number of unreferenced (orphaned) inodes in the given cylinder group. - Add fs_unrefs field to the super block structure which holds total number of unreferenced (orphaned) inodes. - When file or a directory is orphaned (last reference is removed, but object is still open), increase fs_unrefs and cg_unrefs fields, which is a hint for fsck in which cylinder groups looks for such (orphaned) objects. - When file is last closed, decrease {fs,cg}_unrefs fields. - Add VV_DELETED vnode flag which points at orphaned objects. Sponsored by: home.pl	2006-10-31 21:48:54 +00:00
Pawel Jakub Dawidek	c3618c657a	Add a new I/O request - BIO_FLUSH, which basically tells providers below to flush their caches. For now will mostly be used by disks to flush their write cache. Sponsored by: home.pl	2006-10-31 21:11:21 +00:00
Alan Cox	0c2b04b419	Refactor vfs_setdirty(), creating vfs_setdirty_locked_object(). Call vfs_setdirty_locked_object() from vfs_busy_pages() instead of vfs_setdirty(), thereby eliminating a second acquisition and release of the same vm object lock.	2006-10-29 00:04:39 +00:00
Alan Cox	20ed1b5b1b	In bufdone_finish() restrict the acquisition and release of the page queues lock to BIO_READ operations. Recent changes to the implementation of the per-page flags have eliminated the need for the page queues lock in the other cases.	2006-10-28 19:16:57 +00:00
David Xu	d21ac9b686	Remove member p_procscopegrp which is no longer used by libthr.	2006-10-27 05:45:44 +00:00
John Birrell	8460a577a4	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
Konstantin Belousov	9a969e626c	The attempt to rename "." with MAC framework compiled in would cause attempt to twice unlock the vnode. Check that ni_vp and ni_dvp are different before doing second unlock. Reviewed by: rwatson Approved by: pjd (mentor) MFC after: 1 week	2006-10-26 13:20:28 +00:00
Robert Watson	24076d138e	Increase usefulness of "show malloc" by moving from displaying the basic counters of allocs/frees/use for each malloc type to calculating InUse, MemUse, and Requests as displayed by the userspace vmstat -m. This is more useful when debugging malloc(9)-related memory leaks, where the count of allocs/frees may not usefully reflect that current memory allocation (i.e., when highly variable size allocations occur with the same malloc type, such as with contigmalloc). MFC after: 3 days Limitations observed by: scottl	2006-10-26 10:17:13 +00:00
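The commit above derives vmstat -m style columns (InUse, MemUse, Requests) from raw allocation counters. A hedged sketch of that arithmetic, assuming per-CPU counters of bytes allocated/freed and calls allocated/freed; struct mt_stats, struct mt_summary and mt_summarize() are hypothetical names, not the kernel's actual statistics layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU counters for one malloc type. */
struct mt_stats {
    uint64_t memalloced;   /* bytes allocated */
    uint64_t memfreed;     /* bytes freed */
    uint64_t numallocs;    /* allocation calls */
    uint64_t numfrees;     /* free calls */
};

/* The three columns shown by "show malloc" after the change. */
struct mt_summary {
    uint64_t inuse;        /* allocations still outstanding */
    uint64_t memuse;       /* bytes still outstanding */
    uint64_t requests;     /* total allocation requests */
};

/* Sum the per-CPU counters, then derive the vmstat -m style columns. */
static struct mt_summary mt_summarize(const struct mt_stats *per_cpu, size_t ncpu)
{
    uint64_t alloced = 0, freed = 0, nallocs = 0, nfrees = 0;

    for (size_t i = 0; i < ncpu; i++) {
        alloced += per_cpu[i].memalloced;
        freed += per_cpu[i].memfreed;
        nallocs += per_cpu[i].numallocs;
        nfrees += per_cpu[i].numfrees;
    }
    /* Subtract only after summing: one CPU may free memory allocated on another. */
    struct mt_summary s = { nallocs - nfrees, alloced - freed, nallocs };
    return s;
}
```

MemUse tracks bytes rather than call counts, which is why it stays meaningful when allocation sizes vary widely within one malloc type.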
David Xu	4c9b02c253	Optimize umtx_lock_pi() a bit by moving some heavy code out of the loop, make a fast path when a umtx_pi can be allocated without being blocked.	2006-10-26 09:33:34 +00:00
David Xu	7c24ae418a	In order to eliminate a branch, convert opcode to unsigned integer.	2006-10-25 06:38:46 +00:00
David Xu	91d0b4d615	Eliminate an unnecessary `if' statement.	2006-10-25 06:28:23 +00:00
David Xu	ff7668079f	Move sigqueue_take() call into proc_reparent(), this fixed bugs where proc_reparent() is called but sigqueue_take() is forgotten.	2006-10-25 06:18:04 +00:00
David Xu	e94cc4ac30	Protect sigqueue_take() call by child process's lock, it fixed a potential race with ptrace 'attach' which changes parent of the child process.	2006-10-24 12:04:21 +00:00
Poul-Henning Kamp	7ea93e912b	Better naming of fattime conversion functions, they do convert to timespec after all. Add 'utc' argument to control if fattimestamps are on UTC or local timezone calendar.	2006-10-24 10:27:23 +00:00
Alan Cox	2a53696fb8	The page queues lock is no longer required by vm_page_busy() or vm_page_wakeup(). Reduce or eliminate its use accordingly.	2006-10-22 21:18:48 +00:00
Poul-Henning Kamp	b39be1b35c	Add two new functions to convert FAT filesystem format timestamps to and from struct timespec, to replace the crummy conversion function which have been copy&pasted into three different filesystems already. Apart from general crummyness as indicated by code like: for (year = 1970;; year++) { inc = year & 0x03 ? 365 : 366; if (days < inc) break; days -= inc; } They also contain specialized crummyness which tries to compensate for the general crummyness by caching recent conversion results, with no regard for locking or consistency. These replacement functions are smaller, O(1) and handle the Y2.1K leap-year correctly. Ideally, these functions should live in a module of their own, which the three offending filesystems would depend on, but the size is 877 bytes of code (on i386), so that would be false economy.	2006-10-22 18:19:08 +00:00
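The commit above replaces a year-by-year loop with an O(1) conversion. The sketch below is not the FreeBSD code: it pairs the widely published days_from_civil construction (a swapped-in technique, chosen because it is O(1) and applies the 100/400-year leap rules) with the standard FAT date/time bit layout. fat_to_unix() is a hypothetical name, and it always interprets the stamp as UTC; the local-timezone variant from the follow-up commit would additionally apply an offset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Days since 1970-01-01 for a proleptic Gregorian date, in O(1).
 * Handles the 100/400-year rules, so 2100 is a common year.
 */
static int64_t days_from_civil(int64_t y, unsigned m, unsigned d)
{
    y -= m <= 2;                      /* count Jan/Feb as months of the previous year */
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = (unsigned)(y - era * 400);                       /* [0, 399] */
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; /* [0, 365] */
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           /* [0, 146096] */
    return era * 146097 + (int64_t)doe - 719468;
}

/*
 * FAT (DOS) date/time to seconds since the Unix epoch, as UTC.
 *   date: bits 15-9 year-1980, bits 8-5 month, bits 4-0 day
 *   time: bits 15-11 hour, bits 10-5 minute, bits 4-0 seconds/2
 */
static int64_t fat_to_unix(uint16_t fdate, uint16_t ftime)
{
    int64_t year = ((fdate >> 9) & 0x7f) + 1980;
    unsigned month = (fdate >> 5) & 0x0f;
    unsigned day = fdate & 0x1f;
    int64_t hour = ftime >> 11;
    int64_t min = (ftime >> 5) & 0x3f;
    int64_t sec = (ftime & 0x1f) * 2;

    return days_from_civil(year, month, day) * 86400 + hour * 3600 + min * 60 + sec;
}
```

For example, the FAT epoch (date 0x0021, time 0) maps to 315532800, and 2100-02-28 and 2100-03-01 come out exactly one day apart.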
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Alan Cox	9af80719db	Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.	2006-10-22 04:28:14 +00:00
David Xu	5c28a8d474	Use macro TAILQ_FOREACH_SAFE instead of expanding it.	2006-10-22 00:09:41 +00:00
David Xu	f71e748d89	Since revision 1.333 of kern_sig.c no longer uses P_WEXIT, the change opened a race window which can cause memory leak in signal queue. Here we free memory for signal queue when process state is set to PRS_ZOMBIE.	2006-10-21 23:59:15 +00:00
John Baldwin	0fc32899f1	Remove the check that prevented signals from being delivered to exiting processes. It was originally added back when support for Linux threads (and thus shared sigacts objects) was added, but no one knows why. My guess is that at some point during the Linux threads patches, the sigacts object was torn down during exit1(), so this check was added to prevent a panic for that race. However, the stuff that was actually committed to the tree doesn't teardown sigacts until wait() making the above race moot. Re-allowing signals here lets one interrupt a NFS request during process teardown (such as closing descriptors) on an interruptible mount. Requested by: kib (long time ago) MFC after: 1 week	2006-10-20 16:19:21 +00:00
Konstantin Belousov	1663075c64	Fix the race between devfs_fp_check and devfs_reclaim. Dereference the vnode's v_rdev and increment the dev threadcount, as well as clear it (in devfs_reclaim) under the dev_lock(). Reviewed by: tegge Approved by: pjd (mentor)	2006-10-20 07:59:50 +00:00
Bruce Evans	1ca2c0183f	kern_intr.c: - Count (scheduling of) software interrupts (SWIs) as SWIs, not as hardware interrupts. - Don't count (scheduling of) delayed SWIs as interrupts at all, since in the delayed case it is expected that there are many more scheduling calls than handling calls. Perhaps all interrupts should be counted only when they are handled, but it is only counts of delayed SWIs that should never be combined with the other counts. subr_trap.c: - Count (handling of) Asynchronous System Traps (ASTs) as traps, not as software interrupts. Before these changes, the counter for SWIs only counted ASTs, and SWIs weren't counted separately, but a subcounter for ASTs alone is less needed than for most other exception sources. 4.4BSD-Lite uses the counters for similar things (actually matching their names) on its main arches (hp300, ..., !i386) where more of the exceptions are in hardware.	2006-10-18 04:48:09 +00:00
David Xu	034b26fc65	Regenerate.	2006-10-17 02:28:58 +00:00
David Xu	5f641fc0fb	o Add keyword volatile for user mutex owner field. o Fix type consistent problem by using type long for old umtx and wait channel. o Rename casuptr to casuword.	2006-10-17 02:24:47 +00:00
Alexander Leidinger	6a1162d4cd	MFP4 (with some minor changes): Implement the linux_io_* syscalls (AIO). They are only enabled if the native AIO code is available (either compiled in to the kernel or as a module) at the time the functions are used. If the AIO stuff is not available there will be a ENOSYS. From the submitter: ---snip--- DESIGN NOTES: 1. Linux permits a process to own multiple AIO queues (distinguished by "context"), but FreeBSD creates only one single AIO queue per process. My code maintains a request queue (STAILQ of queue(3)) per "context", and throws all AIO requests of all contexts owned by a process into the single FreeBSD per-process AIO queue. When the process calls io_destroy(2), io_getevents(2), io_submit(2) and io_cancel(2), my code can pick out requests owned by the specified context from the single FreeBSD per-process AIO queue according to the per-context request queues maintained by my code. 2. The request queue maintained by my code stores contrast information between Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks (struct aiocb). FreeBSD IO control block actually exists in userland memory space, required by FreeBSD native aio_XXXXXX(2). 3. It is quite troubling that the function io_getevents() of libaio-0.3.105 needs to use Linux-specific "struct aio_ring", which is a partial mirror of context in user space. I would rather take the address of context in kernel as the context ID, but the io_getevents() of libaio forces me to take the address of the "ring" in user space as the context ID. To my surprise, one comment line in the file "io_getevents.c" of libaio-0.3.105 reads: Ben will hate me for this REFERENCE: 1. Linux kernel source code: http://www.kernel.org/pub/linux/kernel/v2.6/ (include/linux/aio_abi.h, fs/aio.c) 2. Linux manual pages: http://www.kernel.org/pub/linux/docs/manpages/ (io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2)) 3. Linux Scalability Effort: http://lse.sourceforge.net/io/aio.html The design notes: http://lse.sourceforge.net/io/aionotes.txt 4. The package libaio, both source and binary: http://rpmfind.net/linux/rpm2html/search.php?query=libaio Simple transparent interface to Linux AIO system calls. 5. Libaio-oracle: http://oss.oracle.com/projects/libaio-oracle/ POSIX AIO implementation based on Linux AIO system calls (depending on libaio). ---snip--- Submitted by: Li, Xiao <intron@intron.ac>	2006-10-15 14:22:14 +00:00
Ruslan Ermilov	a1b0a18096	Prevent IOC_IN with zero size argument (this is only supported if backward compatibility options are present) from attempting to free memory that wasn't allocated. This is an old bug, and previously it would attempt to free a null pointer. I noticed this bug when working on the previous revision, but forgot to fix it. Security: local DoS Reported by: Peter Holm MFC after: 3 days	2006-10-14 19:01:55 +00:00
Tom Rhodes	f51bf07af8	Close a race condition where num can be larger than tmp, giving the user too large of a boundary. Reported by: Ilja Van Sprundel	2006-10-14 10:30:14 +00:00
Tor Egge	e0c33ad529	Wait for thread count to reach zero in destroy_devl() even when no purge method is defined, to avoid memory being modified after free. Temporarily increase refcount in destroy_devl() to avoid a double free if dev_rel() is called while waiting for thread count to reach zero.	2006-10-13 20:49:24 +00:00
Gleb Smirnoff	68a57ebfad	Improve ktr(4) logging for callout(9) subsystem. Log all inserts and removals, including failures, into the callwheel. XXX: Most of the CTR() macros are called with callout_lock spin mutex held, thus won't be logged into file, if KTR_ALQ is used. Moving the CTR() macros out from the spinlocked code would require copying of all arguments. I'm too lazy to do this.	2006-10-11 14:57:03 +00:00
David Xu	ae7d8a6766	Implement 32bit umtx_lock and umtx_unlock system calls, these two system calls are not used by libthr in RELENG_6 and HEAD, it is only used by the libthr in RELENG-5, the _umtx_op system call can do more incremental dirty works than these two system calls without having to introduce new system calls or throw away old system calls when things are going on.	2006-10-06 08:22:08 +00:00
David Xu	c6511aea86	Move some declaration of 32-bit signal structures into file freebsd32-signal.h, implement sigtimedwait and sigwaitinfo system calls.	2006-10-05 01:56:11 +00:00
Martin Blapp	89ff1e4cb8	Back out part of rev. 1.149. While adding a workaround in ptcopen() to avoid leaked ptys works fine, this opens a possible security hole. Submitted by: bde MFC after: 3 days	2006-10-04 05:43:39 +00:00
Robert Watson	531147aa3e	Regenerate.	2006-10-03 20:48:11 +00:00
Robert Watson	888db9e177	Audit creat() system call (compat code), and change type for getpagesize(), which isn't actually being audited anyway. MFC after: 3 days Obtained from: TrustedBSD Project	2006-10-03 20:46:52 +00:00
Konstantin Belousov	30af71199e	Fix the remaining race in the revs. 1.232, 1.233 that could occur during unmount when mp structure is reused while waiting for coveredvp lock. Introduce struct mount generation count, increment it on each reuse and compare the generations before and after obtaining the coveredvp lock. Reviewed by: tegge, pjd Approved by: pjd (mentor) MFC after: 2 weeks	2006-10-03 10:47:04 +00:00
Poul-Henning Kamp	e5037a18a9	Use utc_offset() where applicable, and hide the internals of it as static variables.	2006-10-02 18:23:37 +00:00
Poul-Henning Kamp	f97c1c4bf7	Introduce utc_offset() to capture a calculation currently done all over the place.	2006-10-02 16:17:23 +00:00
Poul-Henning Kamp	94d67e0fb8	Move tz_minuteswest and tz_dsttime to subr_clock.c	2006-10-02 16:06:26 +00:00
Poul-Henning Kamp	b69f71eb29	Second part of a little cleanup in the calendar/timezone/RTC handling. Split subr_clock.c in two parts (by repo-copy): subr_clock.c contains generic RTC and calendaric stuff. etc. subr_rtc.c contains the newbus'ified RTC interface. Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock} sysctls and associated variables into subr_clock.c. They are not machine dependent and we have generic code that relies on being present so they are not even optional.	2006-10-02 15:42:02 +00:00
Poul-Henning Kamp	f645b0b51c	First part of a little cleanup in the calendar/timezone/RTC handling. Move relevant variables to <sys/clock.h> and fix #includes as necessary. Use libkern's much more time- & space-efficient BCD routines.	2006-10-02 12:59:59 +00:00
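RTC hardware commonly reports date and time fields as packed binary-coded decimal (BCD), which is what the routines mentioned above convert. For reference, a minimal arithmetic sketch of the two conversions; bcd_to_bin() and bin_to_bcd() are hypothetical names, and this is not libkern's implementation, whose internals the log does not show.

```c
#include <assert.h>

/* Packed BCD byte such as 0x59 to its binary value (59). */
static unsigned bcd_to_bin(unsigned char bcd)
{
    return (bcd >> 4) * 10u + (bcd & 0x0fu);
}

/* Binary value in 0..99 to packed BCD (59 -> 0x59). */
static unsigned char bin_to_bcd(unsigned bin)
{
    assert(bin < 100);
    return (unsigned char)(((bin / 10) << 4) | (bin % 10));
}
```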
Konstantin Belousov	45ea8737bf	Correct the comment: numvnodes is decreased on vdestroying the vnode. OKed by: tegge Approved by: pjd (mentor) MFC after: 1 week	2006-10-02 07:25:58 +00:00
Tor Egge	04aa807cb6	If the buffer lock has waiters after the buffer has changed identity then getnewbuf() needs to drop the buffer in order to wake waiters that might sleep on the buffer in the context of the old identity.	2006-10-02 02:06:27 +00:00
Martin Blapp	570d6457d1	Readd rev. 1.145 because of vfs bugs and races near revoke(). Until they are fixed we can't free any slaves. Add a workaround to not to leak ptys by number.	2006-09-30 22:51:05 +00:00
Pawel Jakub Dawidek	2342d5216e	Remove duplicated $FreeBSD$.	2006-09-30 16:33:29 +00:00
Martin Blapp	35dcc318f4	Any call of tty_close() with a tty refcount of <= 1 is wrong and we will free the tty in this case. This is a workaround until the underlying devfs/tty problems are fixed. MFC after: 1 day	2006-09-30 08:11:51 +00:00
Martin Blapp	9b206de5a0	Free tty struct after last close. This should fix the pty-leak by numbers. Remove workarounds for tty_refcount being 0, this will be fixed differently later.	2006-09-29 09:53:19 +00:00
Martin Blapp	e4936f3763	Free tty struct after last close. This should fix the pty-leak by numbers. Remove workarounds for tty_refcount being 0, this will be fixed differently later. Back out rev 1.145 since we initialize the tty struct from scratch and bad things can't happen anymore.	2006-09-29 09:52:57 +00:00
Ruslan Ermilov	9fddcc6661	Fix our ioctl(2) implementation when the argument is "int". New ioctls passing integer arguments should use the _IOWINT() macro. This fixes a lot of ioctl's not working on sparc64, most notable being keyboard/syscons ioctls. Full ABI compatibility is provided, with the bonus of fixing the handling of old ioctls on sparc64. Reviewed by: bde (with contributions) Tested by: emax, marius MFC after: 1 week	2006-09-27 19:57:02 +00:00
Martin Blapp	8be563721a	Move Giant up even further since P_CONTROLT isn't really fully locked yet (p_flag is, but P_CONTROLT isn't really). Submitted by: jhb	2006-09-27 16:42:10 +00:00
Martin Blapp	1bf5e4b866	Use ctty instead of just returning. ctty just has a simple open that returns ENXIO. Submitted by: jhb	2006-09-27 16:41:15 +00:00
Tor Egge	e60c361218	Reduce fluctuations of mnt_flag to allow unlocked readers to get a slightly more consistent view.	2006-09-26 04:20:09 +00:00
Tor Egge	fba924ce9b	Don't restore MNT_QUOTA bit in mnt_flag after a failed mount with MNT_UPDATE flag, closing a race between nmount() and quotactl().	2006-09-26 04:18:36 +00:00
Tor Egge	a1e363f256	Add mnt_noasync counter to better handle interleaved calls to nmount(), sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.	2006-09-26 04:15:59 +00:00
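A hedged sketch of the derived-flag scheme in the commit above: the user's MNT_ASYNC choice is never modified, and the effective flag is recomputed from it and a nesting counter. The SK_* constants, struct mount_sketch and the helper functions are hypothetical stand-ins; the real flags live in <sys/mount.h>, and the real code holds the mount interlock, which is only noted in a comment here.

```c
#include <assert.h>

/* Hypothetical flag values; the real constants are defined in <sys/mount.h>. */
#define SK_MNT_ASYNC  0x1u  /* mnt_flag: async I/O requested by the user */
#define SK_MNTK_ASYNC 0x1u  /* mnt_kern_flag: async I/O currently allowed */

struct mount_sketch {
    unsigned mnt_flag;       /* user-visible mount options */
    unsigned mnt_kern_flag;  /* kernel-internal state */
    int mnt_noasync;         /* callers currently suppressing async I/O */
};

/* Recompute the effective flag (the real code would hold the mount interlock). */
static void mount_update_async(struct mount_sketch *mp)
{
    if ((mp->mnt_flag & SK_MNT_ASYNC) != 0 && mp->mnt_noasync == 0)
        mp->mnt_kern_flag |= SK_MNTK_ASYNC;
    else
        mp->mnt_kern_flag &= ~SK_MNTK_ASYNC;
}

/* Temporarily suppress async I/O (e.g. around a sync) without touching mnt_flag. */
static void mount_noasync_enter(struct mount_sketch *mp)
{
    mp->mnt_noasync++;
    mount_update_async(mp);
}

static void mount_noasync_exit(struct mount_sketch *mp)
{
    assert(mp->mnt_noasync > 0);
    mp->mnt_noasync--;
    mount_update_async(mp);
}
```

Because the user's bit is never cleared and restored, interleaved suppressors can finish in any order without losing MNT_ASYNC, which is the property the commit message describes.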
Tor Egge	cea9d840d8	Don't restore mnt_kern_flag on failed MNT_UPDATE mount, it can race with dounmount(), causing loss of MNTK_UNMOUNT flag.	2006-09-26 04:15:04 +00:00
Tor Egge	5da56ddb21	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().	2006-09-26 04:12:49 +00:00
Robert Watson	88b85279a9	SI_ORDER_THIRD + 2, not SI_ORDER_FOURTH + 2. MFC after: 3 days Submitted by: mlaier	2006-09-26 00:15:56 +00:00
Robert Watson	5add74b4a7	Add "FreeBSD" trademark statement to copyright section of boot messages. MFC after: 3 days Approved by: core, board at FreeBSDFoundation dot org	2006-09-25 23:19:01 +00:00
John-Mark Gurney	33fabe46da	remove unnecessary NULL check... Coverity ID: 1545	2006-09-25 01:29:48 +00:00
John-Mark Gurney	4db71d27a1	hide kqueue_register from public view, and replace it w/ kqfd_register... this eliminates a possible race in aio registering a kevent..	2006-09-24 04:47:47 +00:00
John-Mark Gurney	aeab19b21f	return EBADF instead of successfully attaching (and then panicking) when an fd is dying.. Convinced by: jhb PR: 103127	2006-09-24 02:29:53 +00:00
John-Mark Gurney	9edac6f3f9	add KTRACE hooks into kevent... This will help people debug their kqueue programs to find out exactly which events were registered and which were returned... This should be lower in kern_kevent, but that would require special munging due to locks and the functions used to copyin/copyout kevents... If someone wants to teach ktrace how to output pretty kevents, I have a kevent pretty printer that can be used...	2006-09-24 02:23:29 +00:00
Martin Blapp	45e6819160	Protect enterpgrp() against another tty/proc race case until the tty locking work has been fixed. MFC after: 1 week	2006-09-23 17:35:24 +00:00
Martin Blapp	7c56049e6d	Check for tp->t_refcnt == 0 before doing anything in tty_open(). PR: 103520 MFC after: 1 week	2006-09-23 14:52:46 +00:00
Martin Blapp	153c21c8c1	If /dev/tty gets opened after your controlling terminal has been revoked you can't call tty_clone afterwards. OpenBSD and NetBSD both fail the open call in that case, so we should do so as well. This can be done in ctty_clone by returning with *dev==NULL. Admittedly this causes open to return ENOENT, instead of ENXIO as on the other BSDs, but this way requires the least touching of code. Submitted by: Nate Eldredge <nge@cs.hmc.edu> PR: 83375 MFC: 1 week	2006-09-23 14:44:14 +00:00
Bruce M Simpson	4a75dc2585	Fix a case where socket I/O atomicity is violated due to not dropping the entire record when a non-data mbuf is removed in the soreceive() path. This only triggers a panic directly when compiled with INVARIANTS. PR: 38495 Submitted by: James Juran MFC after: 1 week	2006-09-22 15:34:16 +00:00
David Xu	cda9a0d1c2	Add compatible code to let 32bit libthr work on 64bit kernel.	2006-09-22 15:04:28 +00:00
David Xu	e58b17ea53	Fix umtx command order error for freebsd 32bit.	2006-09-22 14:59:10 +00:00
David Xu	1eec02f538	Add umtx support for 32bit process on AMD64 machine.	2006-09-22 00:52:54 +00:00
Martin Blapp	1c1d411bee	Back out rev. 1.258. The real race cause has been fixed in rev. 1.241 of kern_proc.c. Requested by: jhb	2006-09-21 14:09:26 +00:00
Randall Stewart	adf5d1c6d0	atomic_fetchadd_int is used by mb_free_ext(), but it returns the previous value that the "add" affected (In this case we are adding -1), after which we compare it to '0'... to see if we free the mbuf... we should be comparing it to '1'... Note that this only matters when there is contention since there is a first part to the comparison that checks to see if it's '1'. So this bug would only crop up if two CPUs are trying to free the same mbuf refcount at the same time. This will happen in SCTP but I doubt it can happen in TCP or UDP. PR: N/A Submitted by: rrs Reviewed by: gnn,sam Approved by: gnn,sam	2006-09-21 09:55:43 +00:00
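The off-by-one in the commit above comes from fetch-and-add returning the value from before the update. A minimal sketch of the correct comparison using C11 <stdatomic.h>; atomic_fetch_sub() plays the role of FreeBSD's atomic_fetchadd_int(p, -1), and struct ext_ref and ext_ref_release() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an mbuf external-storage reference count. */
struct ext_ref {
    atomic_uint refcnt;
};

/*
 * Drop one reference and report whether it was the last one.
 * atomic_fetch_sub() returns the value *before* the subtraction,
 * so the caller that releases the last reference observes 1, not 0.
 */
static bool ext_ref_release(struct ext_ref *r)
{
    unsigned int prev = atomic_fetch_sub(&r->refcnt, 1u);

    assert(prev > 0);   /* releasing an unreferenced object is a bug */
    return prev == 1;
}
```

Exactly one caller can observe the value 1, so two CPUs releasing concurrently can never both decide to free the storage.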
David Xu	cca0a557dd	Regenerate.	2006-09-21 04:19:48 +00:00
David Xu	73fa3e5b88	Replace system call thr_getscheduler, thr_setscheduler, thr_setschedparam with rtprio_thread, while rtprio system call is for process only, the new system call rtprio_thread is responsible for LWP.	2006-09-21 04:18:46 +00:00
Robert Watson	f50c4fd817	Remove MAC_DEBUG + MPRINTF debugging from System V IPC. This no longer appears to be serving a useful purpose, as it was used during initial development of MAC support for System V IPC. MFC after: 1 month Obtained from: TrustedBSD Project Suggested by: Christopher dot Vance at SPARTA dot com	2006-09-20 13:40:00 +00:00
Robert Watson	738f14d4b1	Remove MAC_DEBUG label counters, which were used to debug leaks and other problems while labels were first being added to various kernel objects. They have outlived their usefulness. MFC after: 1 month Suggested by: Christopher dot Vance at SPARTA dot com Obtained from: TrustedBSD Project	2006-09-20 13:33:41 +00:00
Pawel Jakub Dawidek	783deec19e	There is no need to set 'sp' to NULL anymore.	2006-09-20 07:27:05 +00:00
Tor Egge	4e59868e08	Copy stat information from mount structure before it can change identity.	2006-09-20 00:32:07 +00:00
Tor Egge	60b0b1aa18	Don't try to obtain a reference to a nonexisting (NULL) mount structure in default VOP_GETWRITEMOUNT().	2006-09-20 00:27:02 +00:00
Martin Blapp	d7b167b57b	Fix races between tty.c and sessrele() / doenterpgrp() / leavepgrp(). The tty code is still under giant lock, but the session/pgrp release code just used proctree_locks. This explains why moving the proctree_lock in sys/kern/tty.c rev. 1.258 did fix the panics in our SMP systems. This should also fix some race panics with revoked ttys. Reviewed by: jhb MFC after: 1 week	2006-09-19 19:25:11 +00:00
Konstantin Belousov	f37e633887	Fix the bug in rev. 1.232. If vfs_suser returned false, coveredvp shall be unlocked only if it really exists. Found with: Coverity Prevent(tm) CID: 1535 Approved by: pjd (mentor)	2006-09-19 14:04:12 +00:00
Konstantin Belousov	4dec8579bd	Fix the race while waiting for coveredvp lock during unmount. The vnode may be recycled during the sleep, wrap the vn_lock with vhold/vdrop. Check that coveredvp still points to the same mp after sleep (needed because sleep dropped Giant). Move check for user rights for unmount after coveredvp lock is obtained. Tested by: Peter Holm Reviewed by: tegge Approved by: kan (mentor) MFC after: 2 weeks	2006-09-18 15:35:22 +00:00
Robert Watson	5702e0965e	Declare security and security.bsd sysctl hierarchies in sysctl.h along with other commonly used sysctl name spaces, rather than declaring them all over the place. MFC after: 1 month Sponsored by: nCircle Network Security, Inc.	2006-09-17 20:00:36 +00:00
Andre Oppermann	a855e2b4c0	Remove VLAN mtag UMA zones and initialize ether_vtag and tso_segsz packet header fields to zero on mbuf allocation. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-17 13:44:32 +00:00
Robert Watson	da7cbdc2b3	Regenerate.	2006-09-17 13:29:36 +00:00
Robert Watson	6c2d307a0e	AUE_SIGALTSTACK instead of AUE_SIGPENDING for sigaltstack(). Obtained from: TrustedBSD Project MFC after: 3 days	2006-09-17 13:28:11 +00:00
Robert Watson	101581b082	Expose kern.acct_configured, a sysctl that reflects the configured/unconfigured state of the kernel accounting system. This is used by the accounting privilege regression test to determine whether accounting is in use and will be disrupted by the regression test. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project MFC after: 1 month	2006-09-17 11:00:36 +00:00
Mohan Srinivasan	3c5b80d6c2	Fix for a potential bug caught by Coverity. Pointed out to me by Kris Kennaway.	2006-09-14 17:57:02 +00:00
Mohan Srinivasan	7d7d9e2242	Fixes up the handling of shared vnode lock lookups in the NFS client, adds a FS type specific flag indicating that the FS supports shared vnode lock lookups, adds some logic in vfs_lookup.c to test this flag and set lock flags appropriately. - amd on 6.x is a non-starter (without this change). Using amd under heavy load results in a deadlock (with cascading vnode locks all the way to the root) very quickly. - This change should also fix the more general problem of cascading vnode deadlocks when an NFS server goes down. Ideally, we wouldn't need these changes, as enabling shared vnode lock lookups globally would work. Unfortunately, UFS, for example isn't ready for shared vnode lock lookups, crashing pretty quickly. This change is the result of discussions with Stephan Uphoff (ups@). Reviewed by: ups@	2006-09-13 18:39:09 +00:00
Scott Long	988129b824	Introduce a spinlock for synchronizing access to the video output hardware in syscons. This replaces a simple access semaphore that was assumed to be protected by Giant but often was not. If two threads that were otherwise SMP-safe called printf at the same time, there was a high likelihood that the semaphore would get corrupted and result in a permanently frozen video console. This is similar to what is already done in the serial console drivers.	2006-09-13 15:48:15 +00:00
Christian S.J. Peron	7ca6b7823d	Back out one of the Giant removals from revision 1.272. Giant was not here to protect the vnode, it was present to synchronize access to TTY session information between exit(2) and the TTY code. While we are here, note that Giant is required for TTY protection. Clue from: bde Discussed with: jhb MFC after: 1 week	2006-09-13 15:47:53 +00:00
Pawel Jakub Dawidek	689f94bfe6	Fix a lock leak in an error case. Reported by: netchild Reviewed by: rwatson	2006-09-13 06:58:40 +00:00
John Baldwin	3bb00f61a2	- Revert making bus_generic_add_child() the default for BUS_ADD_CHILD(). Instead, we want busses to explicitly specify an add_child routine if they want to support identify routines, but by default disallow having outside drivers add devices. - Give smbus(4) an explicit bus_add_child() method. Requested by: imp	2006-09-11 22:20:37 +00:00
John Baldwin	4288462f38	Add a default method for BUS_ADD_CHILD() that just calls device_add_child_ordered(). Previously, a device driver that wanted to add a new child device in its identify routine had to know if the parent driver had a custom bus_add_child method and use BUS_ADD_CHILD() in that case, otherwise use device_add_child(). Getting it wrong in either direction would result in panics or failure to add the child device. Now, BUS_ADD_CHILD() always works, isolating child drivers from having to know intimate details about the parent driver. Discussed with: imp MFC after: 1 week	2006-09-11 19:41:31 +00:00
John Baldwin	9914a8cc7d	- Fix rman_manage_region() to be a lot more intelligent. It now checks for overlaps, but more importantly, it collapses adjacent free regions. This is needed to cope with BIOSen that split up ports for system devices (like IPMI controllers) across multiple system resource entries. - Now that rman_manage_region() is not so dumb, remove extra logic in the x86 nexus drivers to populate the IRQ rman that manually coalesced the regions. MFC after: 1 week	2006-09-11 19:31:52 +00:00

9851 Commits