freebsd-skq

Author	SHA1	Message	Date
jhb	25a000d49f	Simplify operations with sync_mtx in sched_sync(): - Don't drop the lock just to reacquire it again to check rushjob, this only wastes time. - Use msleep() to drop the mutex while sleeping instead of explicitly unlocking around tsleep. Reviewed by: pjd	2006-11-07 19:45:05 +00:00
jhb	f4782279b7	Fix comment typo and function declaration.	2006-11-07 19:07:33 +00:00
tegge	4fdb31ed24	Don't drop reference to tty in tty_close() if TS_ISOPEN is already cleared. Reviewed by: bde	2006-11-06 22:12:43 +00:00
andre	98af8dbef9	Handle early errors in kern_sendfile() by introducing a new goto 'out' label after the sbunlock() part. This correctly handles calls to sendfile(2) without valid parameters that was broken in rev. 1.240. Coverity error: 272162	2006-11-06 21:53:19 +00:00
rwatson	10d0d9cf47	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
rwatson	7288104e20	Add a new priv(9) kernel interface for checking the availability of privilege for threads and credentials. Unlike the existing suser(9) interface, priv(9) exposes a named privilege identifier to the privilege checking code, allowing more complex policies regarding the granting of privilege to be expressed. Two interfaces are provided, replacing the existing suser(9) interface: suser(td) -> priv_check(td, priv) suser_cred(cred, flags) -> priv_check_cred(cred, priv, flags) A comprehensive list of currently available kernel privileges may be found in priv.h. New privileges are easily added as required, but the comments on adding privileges found in priv.h and priv(9) should be read before doing so. The new privilege interface exposed sufficient information to the privilege checking routine that it will now be possible for jail to determine whether a particular privilege is granted in the check routine, rather than relying on hints from the calling context via the SUSER_ALLOWJAIL flag. For now, the flag is maintained, but a new jail check function, prison_priv_check(), is exposed from kern_jail.c and used by the privilege check routine to determine if the privilege is permitted in jail. As a result, a centralized list of privileges permitted in jail is now present in kern_jail.c. The MAC Framework is now also able to instrument privilege checks, both to deny privileges otherwise granted (mac_priv_check()), and to grant privileges otherwise denied (mac_priv_grant()), permitting MAC Policy modules to implement privilege models, as well as control a much broader range of system behavior in order to constrain processes running with root privilege. The suser() and suser_cred() functions remain implemented, now in terms of priv_check() and the PRIV_ROOT privilege, for use during the transition and possibly continuing use by third party kernel modules that have not been updated. The PRIV_DRIVER privilege exists to allow device drivers to check privilege without adopting a more specific privilege identifier. This change does not modify the actual security policy, rather, it modifies the interface for privilege checks so changes to the security policy become more feasible. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:37:19 +00:00
pjd	c524521d2f	Typo, 'from' vnode is locked here, not 'to' vnode.	2006-11-04 23:57:02 +00:00
rrs	f19063382b	This commits the remake in kern/ make sysent to get the correct syscalls.master's $FreeBSD$ tag record and a make sysent in sys/compat/freebsd32. Thanks Ruslan for pointing out the steps I missed :-0 Approved by: gnn	2006-11-03 18:57:49 +00:00
rrs	3d3e3f2242	Ok, here it is, we finally add SCTP to current. Note that this work is not just mine, but it is also the works of Peter Lei and Michael Tuexen. They both are my two key other developers working on the project.. and they need ata-boy's too: ** peterlei@cisco.com tuexen@fh-muenster.de ** I did do a make sysent which updated the syscall's and sysproto.. I hope that is correct... without it you don't build since we have new syscalls for SCTP :-0 So go out and look at the NOTES, add option SCTP (make sure inet and inet6 are present too) and play with SCTP. I will see about comitting some test tools I have after I figure out where I should place them. I also have a lib (libsctp.a) that adds some of the missing socketapi functions that I need to put into lib's.. I will talk to George about this :-) There may still be some 64 bit issues in here, none of us have a 64 bit processor to test with yet.. Michael may have a MAC but thats another beast too.. If you have a mac and want to use SCTP contact Michael he maintains a web site with a loadable module with this code :-) Reviewed by: gnn Approved by: gnn	2006-11-03 15:23:16 +00:00
jb	9658161dc0	Always init the console before trying to cnadd it to avoid the case where the console name isn't set and cnadd wants to use printf to complain about it.	2006-11-03 06:23:53 +00:00
andre	429c8e9145	Use the improved m_uiotombuf() function instead of home grown sosend_copyin() to do the userland to kernel copying in sosend_generic() and sosend_dgram(). sosend_copyin() is retained for ZERO_COPY_SOCKETS which are not yet supported by m_uiotombuf(). Benchmaring shows significant improvements (95% confidence): 66% less cpu (or 2.9 times better) with new sosend vs. old sosend (non-TSO) 65% less cpu (or 2.8 times better) with new sosend vs. old sosend (TSO) (Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back at 1000Base-TX full duplex.) Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 month	2006-11-02 17:45:28 +00:00
andre	d1cc5b22d7	Rename m_getm() to m_getm2() and rewrite it to allocate up to page sized mbuf clusters. Add a flags parameter to accept M_PKTHDR and M_EOR mbuf chain flags. Provide compatibility macro for m_getm() calling m_getm2() with M_PKTHDR set. Rewrite m_uiotombuf() to use m_getm2() for mbuf allocation and do the uiomove() in a tight loop over the mbuf chain. Add a flags parameter to accept mbuf flags to be passed to m_getm2(). Adjust all callers for the extra parameter. Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 month	2006-11-02 17:37:22 +00:00
andre	42a69dcade	Rewrite kern_sendfile() to work in two loops, the inner which turns as many VM pages into mbufs as it can -- up to the free send socket buffer space. The outer loop then drops the whole mbuf chain into the send socket buffer, calls tcp_output() on it and then waits until 50% of the socket buffer are free again to repeat the cycle. This way tcp_output() gets the full amount of data to work with and can issue up to 64K sends for TSO to chop up in the network adapter without using any CPU cycles. Thus it gets very efficient especially with the readahead the VM and I/O system do. The previous sendfile(2) code simply looped over the file, turned each 4K page into an mbuf and sent it off. This had the effect that TSO could only generate 2 packets per send instead of up to 44 at its maximum of 64K. Add experimental SF_MNOWAIT flag to sendfile(2) to return ENOMEM instead of sleeping on mbuf allocation failures. Benchmarking shows significant improvements (95% confidence): 45% less cpu (or 1.81 times better) with new sendfile vs. old sendfile (non-TSO) 83% less cpu (or 5.7 times better) with new sendfile vs. old sendfile (TSO) (Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back at 1000Base-TX full duplex.) Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 month	2006-11-02 16:53:26 +00:00
jhb	2a13022ca0	Increment nb_allocated while holding the pt_mtx lock to avoid races.	2006-11-01 16:50:13 +00:00
jhb	49d3deb803	Comment and style tweak.	2006-11-01 16:48:33 +00:00
jb	d2bd807356	Add a cnputs() function to write a string to the console with a lock to prevent interspersed strings written from different CPUs at the same time. To avoid putting a buffer on the stack or having to malloc one, space is incorporated in the per-cpu structure. The buffer size if 128 bytes; chosen because it's the next power of 2 size up from 80 characters. String writes to the console are buffered up the end of the line or until the buffer fills. Then the buffer is flushed to all console devices. Existing low level console output via cnputc() is unaffected by this change. ithread calls to log() are also unaffected to avoid blocking those threads. A minor change to the behaviour in a panic situation is that console output will still be buffered, but won't be written to a tty as before. This should prevent interspersed panic output as a number of CPUs panic before we end up single threaded running ddb. Reviewed by: scottl, jhb MFC after: 2 weeks	2006-11-01 04:54:51 +00:00
pjd	036e929548	Add gjournal specific code to the UFS file system: - Add FS_GJOURNAL flag which enables gjournal support on a file system. - Add cg_unrefs field to the cylinder group structure which holds number of unreferenced (orphaned) inodes in the given cylinder group. - Add fs_unrefs field to the super block structure which holds total number of unreferenced (orphaned) inodes. - When file or a directory is orphaned (last reference is removed, but object is still open), increase fs_unrefs and cg_unrefs fields, which is a hint for fsck in which cylinder groups looks for such (orphaned) objects. - When file is last closed, decrease {fs,cg}_unrefs fields. - Add VV_DELETED vnode flag which points at orphaned objects. Sponsored by: home.pl	2006-10-31 21:48:54 +00:00
pjd	d5cc909451	Add a new I/O request - BIO_FLUSH, which basically tells providers below to flush their caches. For now will mostly be used by disks to flush their write cache. Sponsored by: home.pl	2006-10-31 21:11:21 +00:00
alc	ce403a9391	Refactor vfs_setdirty(), creating vfs_setdirty_locked_object(). Call vfs_setdirty_locked_object() from vfs_busy_pages() instead of vfs_setdirty(), thereby eliminating a second acquisition and release of the same vm object lock.	2006-10-29 00:04:39 +00:00
alc	b944aa0079	In bufdone_finish() restrict the acquisition and release of the page queues lock to BIO_READ operations. Recent changes to the implementation of the per-page flags have eliminated the need for the page queues lock in the other cases.	2006-10-28 19:16:57 +00:00
davidxu	ce90069697	Remove member p_procscopegrp which is no longer used by libthr.	2006-10-27 05:45:44 +00:00
jb	f82c799735	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
kib	33638419ea	The attempt to rename "." with MAC framework compiled in would cause attempt to twice unlock the vnode. Check that ni_vp and ni_dvp are different before doing second unlock. Reviewed by: rwatson Approved by: pjd (mentor) MFC after: 1 week	2006-10-26 13:20:28 +00:00
rwatson	55067e6cf7	Increase usefulness of "show malloc" by moving from displaying the basic counters of allocs/frees/use for each malloc type to calculating InUse, MemUse, and Requests as displayed by the userspace vmstat -m. This is more useful when debugging malloc(9)-related memory leaks, where the count of allocs/frees may not usefully reflect that current memory allocation (i.e., when highly variable size allocations occur with the same malloc type, such as with contigmalloc). MFC after: 3 days Limitations observed by: scottl	2006-10-26 10:17:13 +00:00
davidxu	329f9b6294	Optimize umtx_lock_pi() a bit by moving some heavy code out of the loop, make a fast path when a umtx_pi can be allocated without being blocked.	2006-10-26 09:33:34 +00:00
davidxu	c7a2917f38	In order to eliminate a branch, convert opcode to unsigned integer.	2006-10-25 06:38:46 +00:00
davidxu	6f6aaf471b	Eliminate an unnecessary `if' statement.	2006-10-25 06:28:23 +00:00
davidxu	4889cc1ac4	Move sigqueue_take() call into proc_reparent(), this fixed bugs where proc_reparent() is called but sigqueue_take() is forgotten.	2006-10-25 06:18:04 +00:00
davidxu	c117e9447c	Protect sigqueue_take() call by child process's lock, it fixed a potential race with ptrace 'attach' which changes parent of the child process.	2006-10-24 12:04:21 +00:00
phk	e74f0ab140	Better naming of fattime conversion functions, they do convert to timespec after all. Add 'utc' argument to control if fattimestamps are on UTC or local timezone calendar.	2006-10-24 10:27:23 +00:00
alc	5d9c66a3f8	The page queues lock is no longer required by vm_page_busy() or vm_page_wakeup(). Reduce or eliminate its use accordingly.	2006-10-22 21:18:48 +00:00
phk	cc74fcabb4	Add two new functions to convert FAT filesystem format timestamps to and from struct timespec, to replace the crummy conversion function which have been copy&pasted into three different filesystems already. Apart from general crummyness as indicated by code like: for (year = 1970;; year++) { inc = year & 0x03 ? 365 : 366; if (days < inc) break; days -= inc; } They also contain specialized crummyness which tries to compensate for the general crummyness by caching recent conversion results, with no regard for locking or consistency. These replacement functions are smaller, O(1) and handle the Y2.1K leap-year correctly. Ideally, these functions should live in a module of their own, which the three offending filesystems would depend on, but the size is 877 bytes of code (on i386), so that would be false economy.	2006-10-22 18:19:08 +00:00
rwatson	7beaaf5cd2	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
alc	cbcb760109	Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.	2006-10-22 04:28:14 +00:00
davidxu	5bbfc34004	Use macro TAILQ_FOREACH_SAFE instead of expanding it.	2006-10-22 00:09:41 +00:00
davidxu	df9c81e665	Since revision 1.333 of kern_sig.c no longer uses P_WEXIT, the change opened a race window which can cause memory leak in signal queue. Here we free memory for signal queue when process state is set to PRS_ZOMBIE.	2006-10-21 23:59:15 +00:00
jhb	4cc0b2f46e	Remove the check that prevented signals from being delivered to exiting processes. It was originally added back when support for Linux threads (and thus shared sigacts objects) was added, but no one knows why. My guess is that at some point during the Linux threads patches, the sigacts object was torn down during exit1(), so this check was added to prevent a panic for that race. However, the stuff that was actually committed to the tree doesn't teardown sigacts until wait() making the above race moot. Re-allowing signals here lets one interrupt a NFS request during process teardown (such as closing descriptors) on an interruptible mount. Requested by: kib (long time ago) MFC after: 1 week	2006-10-20 16:19:21 +00:00
kib	5f5bf9dadc	Fix the race between devfs_fp_check and devfs_reclaim. Derefence the vnode' v_rdev and increment the dev threadcount , as well as clear it (in devfs_reclaim) under the dev_lock(). Reviewed by: tegge Approved by: pjd (mentor)	2006-10-20 07:59:50 +00:00
bde	57b90fd011	kern_intr.c: - Count (scheduling of) software interrupts (SWIs) as SWIs, not as hardware interrupts. - Don't count (scheduling of) delayed SWIs as interrupts at all, since in the delayed case it is expected that there are many more scheduling calls than handling calls. Perhaps all interrupts should be counted only when they are handled, but it is only counts of delayed SWIs that shouldn never be combined with the other counts. subr_trap.c: - Count (handling of) Asynchronous System Traps (ASTs) as traps, not as software interrupts. Before these changes, the counter for SWIs only counted ASTs, and SWIs weren't counted separately, but a subcounter for ASTs alone is less needed than for most other exception sources. 4.4BSD-Lite uses the counters for similar things (actually matching their names) on its main arches (hp300, ..., !i386) where more of the exceptions are in hardware.	2006-10-18 04:48:09 +00:00
davidxu	9e7c324e6e	Regenerate.	2006-10-17 02:28:58 +00:00
davidxu	bb5a3880aa	o Add keyword volatile for user mutex owner field. o Fix type consistent problem by using type long for old umtx and wait channel. o Rename casuptr to casuword.	2006-10-17 02:24:47 +00:00
netchild	183bd5a34b	MFP4 (with some minor changes): Implement the linux_io_* syscalls (AIO). They are only enabled if the native AIO code is available (either compiled in to the kernel or as a module) at the time the functions are used. If the AIO stuff is not available there will be a ENOSYS. From the submitter: ---snip--- DESIGN NOTES: 1. Linux permits a process to own multiple AIO queues (distinguished by "context"), but FreeBSD creates only one single AIO queue per process. My code maintains a request queue (STAILQ of queue(3)) per "context", and throws all AIO requests of all contexts owned by a process into the single FreeBSD per-process AIO queue. When the process calls io_destroy(2), io_getevents(2), io_submit(2) and io_cancel(2), my code can pick out requests owned by the specified context from the single FreeBSD per-process AIO queue according to the per-context request queues maintained by my code. 2. The request queue maintained by my code stores contrast information between Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks (struct aiocb). FreeBSD IO control block actually exists in userland memory space, required by FreeBSD native aio_XXXXXX(2). 3. It is quite troubling that the function io_getevents() of libaio-0.3.105 needs to use Linux-specific "struct aio_ring", which is a partial mirror of context in user space. I would rather take the address of context in kernel as the context ID, but the io_getevents() of libaio forces me to take the address of the "ring" in user space as the context ID. To my surprise, one comment line in the file "io_getevents.c" of libaio-0.3.105 reads: Ben will hate me for this REFERENCE: 1. Linux kernel source code: http://www.kernel.org/pub/linux/kernel/v2.6/ (include/linux/aio_abi.h, fs/aio.c) 2. Linux manual pages: http://www.kernel.org/pub/linux/docs/manpages/ (io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2)) 3. Linux Scalability Effort: http://lse.sourceforge.net/io/aio.html The design notes: http://lse.sourceforge.net/io/aionotes.txt 4. The package libaio, both source and binary: http://rpmfind.net/linux/rpm2html/search.php?query=libaio Simple transparent interface to Linux AIO system calls. 5. Libaio-oracle: http://oss.oracle.com/projects/libaio-oracle/ POSIX AIO implementation based on Linux AIO system calls (depending on libaio). ---snip--- Submitted by: Li, Xiao <intron@intron.ac>	2006-10-15 14:22:14 +00:00
ru	1bc0f5fc76	Prevent IOC_IN with zero size argument (this is only supported if backward copatibility options are present) from attempting to free memory that wasn't allocated. This is an old bug, and previously it would attempt to free a null pointer. I noticed this bug when working on the previous revision, but forgot to fix it. Security: local DoS Reported by: Peter Holm MFC after: 3 days	2006-10-14 19:01:55 +00:00
trhodes	e7d9ab0897	Close a race condition where num can be larger than tmp, giving the user too large of a boundary. Reported by: Ilja Van Sprundel	2006-10-14 10:30:14 +00:00
tegge	2d3b850cd6	Wait for thread count to reach zero in destroy_devl() even when no purge method is defined, to avoid memory being modified after free. Temporarily increase refcount in destroy_devl() to avoid a double free if dev_rel() is called while waiting for thread count to reach zero.	2006-10-13 20:49:24 +00:00
glebius	7fc0346961	Improve ktr(4) logging for callout(9) subsystem. Log all inserts and removals, including failures, into the callwheel. XXX: Most of the CTR() macros are called with callout_lock spin mutex held, thus won't be logged into file, if KTR_ALQ is used. Moving the CTR() macros out from the spinlocked code would require copying of all arguments. I'm too lazy to do this.	2006-10-11 14:57:03 +00:00
davidxu	c5bda619e9	Implement 32bit umtx_lock and umtx_unlock system calls, these two system calls are not used by libthr in RELENG_6 and HEAD, it is only used by the libthr in RELENG-5, the _umtx_op system call can do more incremental dirty works than these two system calls without having to introduce new system calls or throw away old system calls when things are going on.	2006-10-06 08:22:08 +00:00
davidxu	0fa66af83d	Move some declaration of 32-bit signal structures into file freebsd32-signal.h, implement sigtimedwait and sigwaitinfo system calls.	2006-10-05 01:56:11 +00:00
mbr	7d21b9894a	Back out part of rev. 1.149. While adding a workaround in ptcopen() to avoid leaked ptys works fine, this opens a possible security hole. Submitted by: bde MFC after: 3 days	2006-10-04 05:43:39 +00:00
rwatson	dafc0585f0	Regenerate.	2006-10-03 20:48:11 +00:00
rwatson	b54cd1fd3c	Audit creat() system call (compat code), and change type for getpagesize(), which isn't actually being audited anyway. MFC after: 3 days Obtained from: TrustedBSD Project	2006-10-03 20:46:52 +00:00
kib	38b2c3b26f	Fix the remaining race in the revs. 1.232, 1,233 that could occur during unmount when mp structure is reused while waiting for coveredvp lock. Introduce struct mount generation count, increment it on each reuse and compare the generations before and after obtaining the coveredvp lock. Reviewed by: tegge, pjd Approved by: pjd (mentor) MFC after: 2 weeks	2006-10-03 10:47:04 +00:00
phk	638e020bc6	Use utc_offset() where applicable, and hide the internals of it as static variables.	2006-10-02 18:23:37 +00:00
phk	52fd275504	Introduce utc_offset() to capture a calculation currently done all over the place.	2006-10-02 16:17:23 +00:00
phk	7a874f961d	Move tz_minuteswest and tz_dsttime to subr_clock.c	2006-10-02 16:06:26 +00:00
phk	84cc0f277f	Second part of a little cleanup in the calendar/timezone/RTC handling. Split subr_clock.c in two parts (by repo-copy): subr_clock.c contains generic RTC and calendaric stuff. etc. subr_rtc.c contains the newbus'ified RTC interface. Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock} sysctls and associated variables into subr_clock.c. They are not machine dependent and we have generic code that relies on being present so they are not even optional.	2006-10-02 15:42:02 +00:00
phk	50c81b8a9a	First part of a little cleanup in the calendar/timezone/RTC handling. Move relevant variables to <sys/clock.h> and fix #includes as necessary. Use libkern's much more time- & spamce-efficient BCD routines.	2006-10-02 12:59:59 +00:00
kib	aa82c72808	Correct the comment: numvnodes is decreased on vdestroying the vnode. OKed by: tegge Approved by: pjd (mentor) MFC after: 1 week	2006-10-02 07:25:58 +00:00
tegge	4d26d3d4d7	If the buffer lock has waiters after the buffer has changed identity then getnewbuf() needs to drop the buffer in order to wake waiters that might sleep on the buffer in the context of the old identity.	2006-10-02 02:06:27 +00:00
mbr	7e18a19928	Readd rev. 1.145 because of vfs bugs and races near revoke(). Until they are fixed we can't free any slaves. Add a workaround to not to leak ptys by number.	2006-09-30 22:51:05 +00:00
pjd	1358d3f1ae	Remove duplicated $FreeBSD$.	2006-09-30 16:33:29 +00:00
mbr	b63df50b58	Any call of tty_close() with a tty refcount of <= 1 is wrong and we will free the tty in this case. This is a workaround until the underlaying devfs/tty problems are fixed. MFC after: 1 day	2006-09-30 08:11:51 +00:00
mbr	06d2753424	Free tty struct after last close. This should fix the pty-leak by numbers. Remove workarounds for tty_refcount beeing 0, this will be fixed differently later.	2006-09-29 09:53:19 +00:00
mbr	ed48249ff4	Free tty struct after last close. This should fix the pty-leak by numbers. Remove workarounds for tty_refcount beeing 0, this will be fixed differently later. Back out rev 1.145 since we initialize the tty struct from scratch and bad things can't happen anymore.	2006-09-29 09:52:57 +00:00
ru	4ef62e4ca5	Fix our ioctl(2) implementation when the argument is "int". New ioctls passing integer arguments should use the _IOWINT() macro. This fixes a lot of ioctl's not working on sparc64, most notable being keyboard/syscons ioctls. Full ABI compatibility is provided, with the bonus of fixing the handling of old ioctls on sparc64. Reviewed by: bde (with contributions) Tested by: emax, marius MFC after: 1 week	2006-09-27 19:57:02 +00:00
mbr	24e5e6df3a	Move Giant up even further since P_CONTROLT isn't really fully locked yet (p_flag is, but P_CONTROLT isn't really). Submitted by: jhb	2006-09-27 16:42:10 +00:00
mbr	b07486e6c4	Use ctty instead of just returning. ctty just has a simple open that returns ENXIO. Submitted by: jhb	2006-09-27 16:41:15 +00:00
tegge	89ea8a9b1b	Reduce fluctuations of mnt_flag to allow unlocked readers to get a slightly more consistent view.	2006-09-26 04:20:09 +00:00
tegge	a802c35eb4	Don't restore MNT_QUOTA bit in mnt_flag after a failed mount with MNT_UPDATE flag, closing a race between nmount() and quotactl().	2006-09-26 04:18:36 +00:00
tegge	f42473d76b	Add mnt_noasync counter to better handle interleaved calls to nmount(), sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.	2006-09-26 04:15:59 +00:00
tegge	b500b0ae92	Don't restore mnt_kern_flag on failed MNT_UPDATE mount, it can race with dounmount(), causing loss of MNTK_UNMOUNT flag.	2006-09-26 04:15:04 +00:00
tegge	83154f853d	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().	2006-09-26 04:12:49 +00:00
rwatson	aa39d5a21d	SI_ORDER_THIRD + 2, not SI_ORDER_FOURTH + 2. MFC after: 3 days Submitted by: mlaier	2006-09-26 00:15:56 +00:00
rwatson	ffe75b8d42	Add "FreeBSD" trademark statement to copyright section of boot messages. MFC after: 3 days Approved by: core, board at FreeBSDFoundation dot org	2006-09-25 23:19:01 +00:00
jmg	2b64b4e601	remove unnecessary NULL check... Coverity ID: 1545	2006-09-25 01:29:48 +00:00
jmg	28f00353ba	hide kqueue_register from public view, and replace it w/ kqfd_register... this eliminates a possible race in aio registering a kevent..	2006-09-24 04:47:47 +00:00
jmg	08d150cea6	return EBADF instead of successfully attaching (and then panicing) when an fd is dieing.. Convinced by: jhb PR: 103127	2006-09-24 02:29:53 +00:00
jmg	689b605f8b	add KTRACE hooks into kevent... This will help people debug their kqueue programs to find out exactly which events were registered and which were returned... This should be lower in kern_kevent, but that would require special munging due to locks and the functions used to copyin/copyout kevents... If someone wants to teach ktrace how to output pretty kevents, I have a kevent prety printer that can be used...	2006-09-24 02:23:29 +00:00
mbr	f58415b4f5	Protect enterpgrp() against another tty/proc race case until the tty locking work has been fixed. MFC after: 1 week	2006-09-23 17:35:24 +00:00
mbr	9af53fcaab	Check for tp->t_refcnt == 0 before doing anything in tty_open(). PR: 103520 MFC after: 1 week	2006-09-23 14:52:46 +00:00
mbr	35537661a5	If /dev/tty gets opened after your controlling terminal has been revoked you can't call tty_clone afterwords. OpenBSD and NetBSD both fail the open call in that case, so we should do so as well. This can be done in ctty_clone by returning with *dev==NULL. Admittedly this causes open to return ENOENT, instead of ENXIO as on the other BSDs, but this way requires the least touching of code. Submitted by: Nate Eldredge <nge@cs.hmc.edu> PR: 83375 MFC: 1 week	2006-09-23 14:44:14 +00:00
bms	fdb36f8d07	Fix a case where socket I/O atomicity is violated due to not dropping the entire record when a non-data mbuf is removed in the soreceive() path. This only triggers a panic directly when compiled with INVARIANTS. PR: 38495 Submitted by: James Juran MFC after: 1 week	2006-09-22 15:34:16 +00:00
davidxu	a2dd6344c0	Add compatible code to let 32bit libthr work on 64bit kernel.	2006-09-22 15:04:28 +00:00
davidxu	04676f904a	Fix umtx command order error for freebsd 32bit.	2006-09-22 14:59:10 +00:00
davidxu	44261d5f28	Add umtx support for 32bit process on AMD64 machine.	2006-09-22 00:52:54 +00:00
mbr	cdbb778538	Back out rev. 1.258. The real race cause has been fixed in rev. 1.241 of kern_proc.c. Requested by: jhb	2006-09-21 14:09:26 +00:00
rrs	9221681b1a	atomic_fetchadd_int is used by mb_free_ext(), but it returns the previous value that the "add" effected (In this case we are adding -1), afterwhich we compare it to '0'... to see if we free the mbuf... we should be comparing it to '1'... Note that this only effects when there is contention since there is a first part to the comparison that checks to see if its '1'. So this bug would only crop up if two CPU's are trying to free the same mbuf refcount at the same time. This will happen in SCTP but I doubt can happen in TCP or UDP. PR: N/A Submitted by: rrs Reviewed by: gnn,sam Approved by: gnn,sam	2006-09-21 09:55:43 +00:00
davidxu	92bd1e76b1	Regenerate.	2006-09-21 04:19:48 +00:00
davidxu	bac7c2b79d	Replace system call thr_getscheduler, thr_setscheduler, thr_setschedparam with rtprio_thread, while rtprio system call is for process only, the new system call rtprio_thread is responsible for LWP.	2006-09-21 04:18:46 +00:00
rwatson	41fdf2d481	Remove MAC_DEBUG + MPRINTF debugging from System V IPC. This no longer appears to be serving a useful purpose, as it was used during initial development of MAC support for System V IPC. MFC after: 1 month Obtained from: TrustedBSD Project Suggested by: Christopher dot Vance at SPARTA dot com	2006-09-20 13:40:00 +00:00
rwatson	5cab05d889	Remove MAC_DEBUG label counters, which were used to debug leaks and other problems while labels were first being added to various kernel objects. They have outlived their usefulness. MFC after: 1 month Suggested by: Christopher dot Vance at SPARTA dot com Obtained from: TrustedBSD Project	2006-09-20 13:33:41 +00:00
pjd	32e23e90df	There is no need to set 'sp' to NULL anymore.	2006-09-20 07:27:05 +00:00
tegge	f5b67318ae	Copy stat information from mount structure before it can change identity.	2006-09-20 00:32:07 +00:00
tegge	61b02921e7	Don't try to obtain a reference to a nonexisting (NULL) mount structure in default VOP_GETWRITEMOUNT().	2006-09-20 00:27:02 +00:00
mbr	fc7fc7fa1c	Fix races between tty.c and sessrele() / doenterpgrp() / leavepgrp(). The tty code is still under giant lock, but the session/pgrp release code just used proctree_locks. This explains why moving the proctree_lock in sys/kern/tty.c rev. 1.258 did fix the panics in our SMP systems. This should also fix some race panics with revoked ttys. Reviewed by: jhb MFC after: 1 week	2006-09-19 19:25:11 +00:00
kib	cf9722d790	Fix the bug in rev. 1.232. If vfs_suser returned false, coveredvp shall be unlocked only if it really exists. Found with: Coverity Prevent(tm) CID: 1535 Approved by: pjd (mentor)	2006-09-19 14:04:12 +00:00
kib	e98552d0d9	Fix the race while waiting for coveredvp lock during unmount. The vnode may be recycled during the sleep, wrap the vn_lock with vhold/vdrop. Check that coveredvp still points to the same mp after sleep (needed because sleep dropped Giant). Move check for user rights for unmount after coveredvp lock is obtained. Tested by: Peter Holm Reviewed by: tegge Approved by: kan (mentor) MFC after: 2 weeks	2006-09-18 15:35:22 +00:00
rwatson	8b3f7ca1ce	Declare security and security.bsd sysctl hierarchies in sysctl.h along with other commonly used sysctl name spaces, rather than declaring them all over the place. MFC after: 1 month Sponsored by: nCircle Network Security, Inc.	2006-09-17 20:00:36 +00:00
andre	710a642f7a	Remove VLAN mtag UMA zones and initialize ether_vtag and tso_segsz packet header fields to zero on mbuf allocation. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-17 13:44:32 +00:00
rwatson	9f40438221	Regenerate.	2006-09-17 13:29:36 +00:00
rwatson	f50a5f19fb	AUE_SIGALTSTACK instead of AUE_SIGPENDING for sigaltstack(). Obtained from: TrustedBSD Project MFC after: 3 days	2006-09-17 13:28:11 +00:00
rwatson	cc2c7c1920	Expore kern.acct_configured, a sysctl that reflects the configured/ unconfigured state of the kernel accounting system. This is used by the accounting privilege regression test to determine whether accounting is in use and will be disrupted by the regression test. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project MFC after: 1 month	2006-09-17 11:00:36 +00:00
mohans	a569229408	Fix for a potential bug caught by Coverity. Pointed out to me by Kris Kennaway.	2006-09-14 17:57:02 +00:00
mohans	21daa650a9	Fixes up the handling of shared vnode lock lookups in the NFS client, adds a FS type specific flag indicating that the FS supports shared vnode lock lookups, adds some logic in vfs_lookup.c to test this flag and set lock flags appropriately. - amd on 6.x is a non-starter (without this change). Using amd under heavy load results in a deadlock (with cascading vnode locks all the way to the root) very quickly. - This change should also fix the more general problem of cascading vnode deadlocks when an NFS server goes down. Ideally, we wouldn't need these changes, as enabling shared vnode lock lookups globally would work. Unfortunately, UFS, for example isn't ready for shared vnode lock lookups, crashing pretty quickly. This change is the result of discussions with Stephan Uphoff (ups@). Reviewed by: ups@	2006-09-13 18:39:09 +00:00
scottl	ae1ca6fd73	Introduce a spinlock for synchronizing access to the video output hardware in syscons. This replaces a simple access semaphore that was assumed to be protected by Giant but often was not. If two threads that were otherwise SMP-safe called printf at the same time, there was a high likelyhood that the semaphore would get corrupted and result in a permanently frozen video console. This is similar to what is already done in the serial console drivers.	2006-09-13 15:48:15 +00:00
csjp	3927aa4474	Back out one of the Giant removals from revision 1.272. Giant was not here to protect the vnode, it was present to synchronize access to TTY session information between exit(2) and the TTY code. While we are here, note that Giant is required for TTY protection. Clue from: bde Discussed with: jhb MFC after: 1 week	2006-09-13 15:47:53 +00:00
pjd	166e57c29a	Fix a lock leak in an error case. Reported by: netchild Reviewed by: rwatson	2006-09-13 06:58:40 +00:00
jhb	c74e70f7a8	- Revert making bus_generic_add_child() the default for BUS_ADD_CHILD(). Instead, we want busses to explicitly specify an add_child routine if they want to support identify routines, but by default disallow having outside drivers add devices. - Give smbus(4) an explicit bus_add_child() method. Requested by: imp	2006-09-11 22:20:37 +00:00
jhb	7be67f8a93	Add a default method for BUS_ADD_CHILD() that just calls device_add_child_ordered(). Previously, a device driver that wanted to add a new child device in its identify routine had to know if the parent driver had a custom bus_add_child method and use BUS_ADD_CHILD() in that case, otherwise use device_add_child(). Getting it wrong in either direction would result in panics or failure to add the child device. Now, BUS_ADD_CHILD() always works isolating child drivers from having to know intimate details about the parent driver. Discussed with: imp MFC after: 1 week	2006-09-11 19:41:31 +00:00
jhb	896ceebabd	- Fix rman_manage_region() to be a lot more intelligent. It now checks for overlaps, but more importantly, it collapses adjacent free regions. This is needed to cope with BIOSen that split up ports for system devices (like IPMI controllers) across multiple system resource entries. - Now that rman_manage_region() is not so dumb, remove extra logic in the x86 nexus drivers to populate the IRQ rman that manually coalesced the regions. MFC after: 1 week	2006-09-11 19:31:52 +00:00
andre	7478fd14ea	New sockets created by incoming connections into listen sockets should inherit all settings and options except listen specific options. Add the missing send/receive timeouts and low watermarks. Remove inheritance of the field so_timeo which is unused. Noticed by: phk Reviewed by: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-09-10 17:08:06 +00:00
mbr	eecf512f8f	Fix locking race in ttymodem(). The locking of the proctree happens too late and opens a small race window before tp->t_session->s_leader is accessed. In case tp->t_session has just been set to NULL elsewhere, we get a panic(). This fix is a bandaid until someone else fixes the whole locking in the tty subsystem. Definitly more work needs to be done. MFC after: 1 week Reviewed by: mlaier PR: kern/103101	2006-09-10 16:51:56 +00:00
rwatson	5eee50ca36	Remove slightly oddly placed suser() call from the KTR/ALQ setup sysctl: it was present only in the enable path, not the disable path, which one presumes would be equally of interest. Either way, it was not needed, as the sysctl framework already calls suser() if the operation is a write operation, which configuration requests are. Sponsored by: nCircle Network Security, Inc.	2006-09-09 16:09:01 +00:00
jhb	27f742341b	Use sysctl_handle_long() instead of duplicating it's logic for kern.ipc.maxsockbuf so that this sysctl works for 32-bit binaries running on amd64 via compat/freebsd32. MFC after: 3 days	2006-09-06 21:59:36 +00:00
mp	f2b0030374	Remove call to fdfree() for the AIO daemons to prevent kernel panics with linprocfs. This call is not needed since file descriptor sharing was removed in v1.125. Reviewed by: alc, davidxu, ambrisko MFC after: 3 days	2006-09-06 15:11:20 +00:00
davidxu	3001f9c21f	Merge all code of do_lock_normal, do_lock_pi and do_lock_pp into function do_lock_umutex.	2006-09-05 12:01:09 +00:00
pjd	12baf6e1ec	Add 'show vnode <addr>' DDB command.	2006-09-04 22:15:44 +00:00
rwatson	58bc728eee	Regenerate for updated audit event identifiers.	2006-09-03 15:11:13 +00:00
rwatson	211375c235	Assign proper audit event identifiers to a number of system calls not covered in previous passes: - sysarch, rtprio - clock_settime - preadv/pwritev - __getcwd - kqueue - fhstatfs - kldunloadf Obtained from: TrustedBSD Project	2006-09-03 15:10:40 +00:00
rwatson	c5e1a3ed4e	Regenerate.	2006-09-03 13:48:48 +00:00
rwatson	3387bd0cd3	Use AUE_NTP_ADJTIME for ntp_adjtime() instead of AUE_ADJTIME. Obtained from: TrustedBSD Project	2006-09-03 13:44:21 +00:00
jmg	c25fb06d92	add a newbus method for obtaining the bus's bus_dma_tag_t... This is required by arches like sparc64 (not yet implemented) and sun4v where there are seperate IOMMU's for each PCI bus... For all other arches, it will end up returning NULL, which makes it a no-op... Convert a few drivers (the ones we've been working w/ on sun4v) to the new convection... Eventually all drivers will need to replace the parent tag of NULL, w/ bus_get_dma_tag(dev), though dev is usually different for each driver, and will require hand inspection... Reviewed by: scottl (earlier version)	2006-09-03 00:27:42 +00:00
davidxu	b73a4c5234	Check if it is root user in do_unlock_pp.	2006-09-03 00:07:37 +00:00
davidxu	f41d0bcd97	Make sure we get new m_owner value if we can not unlock it in uncontested case. Reorder statements in do_unlock_umutex.	2006-09-02 02:41:33 +00:00
wsalamon	c62317c442	Audit the argv and env vectors passed in on exec: Add the argument auditing functions for argv and env. Add kernel-specific versions of the tokenizer functions for the arg and env represented as a char array. Implement the AUDIT_ARGV and AUDIT_ARGE audit policy commands to enable/disable argv/env auditing. Call the argument auditing from the exec system calls. Obtained from: TrustedBSD Project Approved by: rwatson (mentor)	2006-09-01 11:45:40 +00:00
davidxu	14a4be51a4	Reorder some statments. Fix typo and remove stale comments.	2006-08-30 23:59:45 +00:00
davidxu	c4a566770e	Update comments about interrupted mutex locking.	2006-08-28 07:09:27 +00:00
davidxu	3cac7b5ddb	Regenerate.	2006-08-28 04:28:25 +00:00
davidxu	5a12667fcf	This is initial version of POSIX priority mutex support, a new userland mutex structure is added as following: struct umutex { __lwpid_t m_owner; uint32_t m_flags; uint32_t m_ceilings[2]; uint32_t m_spare[4]; }; The m_owner represents owner thread, it is a thread id, in non-contested case, userland can simply use atomic_cmpset_int to lock the mutex, if the mutex is contested, high order bit will be set, and userland should do locking and unlocking via kernel syscall. Flag UMUTEX_PRIO_INHERIT represents pthread's PTHREAD_PRIO_INHERIT mutex, which when contention happens, kernel should do priority propagating. Flag UMUTEX_PRIO_PROTECT indicates it is pthread's PTHREAD_PRIO_PROTECT mutex, userland should initialize m_owner to contested state UMUTEX_CONTESTED, then atomic_cmpset_int will be failure and kernel syscall should be invoked to do locking, this becauses for such a mutex, kernel should always boost the thread's priority before it can lock the mutex, m_ceilings is used by PTHREAD_PRIO_PROTECT mutex, the first element is used to boost thread's priority when it locked the mutex, second element is used when the mutex is unlocked, the PTHREAD_PRIO_PROTECT mutex's link list is kept in userland, the m_ceiling[1] is managed by thread library so kernel needn't allocate memory to keep the link list, when such a mutex is unlocked, kernel reset m_owner to UMUTEX_CONTESTED. Flag USYNC_PROCESS_SHARED indicate if the synchronization object is process shared, if the flag is not set, it saves a vm_map_lookup() call. The umtx chain is still used as a sleep queue, when a thread is blocked on PTHREAD_PRIO_INHERIT mutex, a umtx_pi is allocated to support priority propagating, it is dynamically allocated and reference count is used, it is not optimized but works well in my tests, while the umtx chain has its own locking protocol, the priority propagating protocol are all protected by sched_lock because priority propagating function is called with sched_lock held from scheduler. No visible performance degradation is found which these changes. Some parameter names in _umtx_op syscall are renamed.	2006-08-28 04:24:51 +00:00
marius	e78225039a	Fix another bug introduced with rev. 1.204; in vfs_donmount() if the 'vfs_getopt(optlist, "errmsg", (void **)&errmsg, &errmsg_len)' call fails, 'errmsg' is left uninitialized, making the later tests against NULL meaningless, and the uses bogus. Thus initialize 'errmsg' to NULL beforehand. [1] While at it, remove the superfluous assignment of 0 to 'errmsg_len' if the above mentioned call fails as it's already initialized to 0. Submitted by: Michael Plass [1]	2006-08-26 16:28:19 +00:00
ssouhlal	4fdc09f2f9	The "taskqueue_fast" spinlocks were renamed to "fast_taskqueue" in subr_taskqueue.c:r1.32 Reported by: rdivacky	2006-08-26 11:21:25 +00:00
pjd	a2a865527b	Fix comment.	2006-08-25 15:13:49 +00:00
davidxu	fa0e6a0558	Same as previous change, the user provided priority should be reversed too.	2006-08-25 10:05:30 +00:00
davidxu	e2f7e9cc95	Initialize kg_base_user_pri.	2006-08-25 06:29:16 +00:00
davidxu	8e3f035b3f	Add user priority loaning code to support priority propagation for 1:1 threading's POSIX priority mutexes, the code is no-op unless priority-aware umtx code is committed.	2006-08-25 06:12:53 +00:00
marius	604b84193c	Fix a bug introduced with rev. 1.204; in vfs_donmount() use copyout(9) instead of copystr(9) for copying the errmsg from kernel- to user-space. This fixes a panic on sparc64 when using the nmount(2)-converted mountd(8). While at it, use bcopy(3) instead of strncpy(3) in the kernel- to kernel-space case for consistency with vfs_buildopts() and between kernel- to user-space and kernel- to kernel-space case.	2006-08-24 18:52:28 +00:00
davidxu	97c67e6558	POSIX requires that higher numerical values for the priority represent higher priorities, so we should reverse the passed value here.	2006-08-23 07:22:25 +00:00
cperciva	4ec3e871a6	Fix a signedness bug. MFC after: 3 days Security: Local DoS	2006-08-20 10:29:08 +00:00
gnn	b22d0cad1e	Fix a kernel panic based on receiving an ICMPv6 Packet too Big message. PR: 99779 Submitted by: Jinmei Tatuya Reviewed by: clement, rwatson MFC after: 1 week	2006-08-18 14:05:13 +00:00
peter	b34fae62b6	Grab two syscall numbers. One is used to emulate functionality that linux has in its procfs (do a readlink of /proc/self/fd/<nn> to find the pathname that corresponds to a given file descriptor). Valgrind-3.x needs this functionality. This is a placeholder only at this time.	2006-08-16 22:32:50 +00:00
cperciva	9525a1c43e	Swap the names "sem_exithook" and "sem_exechook" in the previous commit to match up with reality and the prototype definitions. Register the sem_exechook as the "process_exec" event handler, not sem_exithook. Submitted by: rdivacky Sponsored by: SoC 2006	2006-08-16 08:25:40 +00:00
jhb	4e96206d8a	Add a new 'show sleepchain' ddb command similar to 'show lockchain' except that it operates on lockmgr and sx locks. This can be useful for tracking down vnode deadlocks in VFS for example. Note that this command is a bit more fragile than 'show lockchain' as we have to poke around at the wait channel of a thread to see if it points to either a struct lock or a condition variable inside of a struct sx. If td_wchan points to something unmapped, then this command will terminate early due to a fault, but no harm will be done.	2006-08-15 18:29:01 +00:00
jhb	e092be0121	- When spinning on a spin lock, if the debugger is active or we are in a panic, go ahead and do the longer DELAY(1) spin wait. - If we panic due to spinning too long, print out a few more details including the pointer to the mutex in question and the tid of the owning thread.	2006-08-15 18:26:12 +00:00
jhb	d900df3c77	Regen to propogate <prefix>_AUE_<mumble> changes as well as the earlier systrace changes.	2006-08-15 17:37:01 +00:00
jhb	3cbb1cf256	Add a new set of macros <prefix>_AUE_<syscallname> to sysproto.h that map to the audit event associated with a specific system call. For example, SYS_AUE___semctl would be set to AUE_SEMCTL in sys/sysproto.h.	2006-08-15 17:09:32 +00:00
jhb	0d834c1eff	- Use NOSTD rather than NOIMPL for nfssvc() to match other syscalls provided via klds. - Correct audit identifier for nfssvc().	2006-08-15 16:45:41 +00:00
jhb	c8c91ce0a9	Rename 'show lockchain' to 'show locktree' and 'show threadchain' to 'show lockchain'. The churn is because I'm about to add a new 'show sleepchain' similar to 'show lockchain' for sleep locks (lockmgr and sx) and 'show threadchain' was a bit ambiguous as both commands show a chain of thread dependencies, 'lockchain' is for non-sleepable locks (mtx and rw) and 'sleepchain' is for sleepable locks.	2006-08-15 16:44:18 +00:00
jhb	9810a59f6d	Add a 'show lockmgr' command that dumps the relevant details of a lockmgr lock.	2006-08-15 16:42:16 +00:00
netchild	b2a39f267a	- Change process_exec function handlers prototype to include struct image_params arg. - Change struct image_params to include struct sysentvec pointer and initialize it. - Change all consumers of process_exit/process_exec eventhandlers to new prototypes (includes splitting up into distinct exec/exit functions). - Add eventhandler to userret. Sponsored by: Google SoC 2006 Submitted by: rdivacky Parts suggested by: jhb (on hackers@)	2006-08-15 12:10:57 +00:00
rwatson	ff74569fb4	Minor white space tweaks.	2006-08-13 23:16:59 +00:00

1 2 3 4 5 ...

9714 Commits