freebsd-dev

Author	SHA1	Message	Date
Alexander Leidinger	6a1162d4cd	MFP4 (with some minor changes): Implement the linux_io_* syscalls (AIO). They are only enabled if the native AIO code is available (either compiled into the kernel or loaded as a module) at the time the functions are used. If the AIO code is not available, the calls fail with ENOSYS. From the submitter: ---snip--- DESIGN NOTES: 1. Linux permits a process to own multiple AIO queues (distinguished by "context"), but FreeBSD creates only a single AIO queue per process. My code maintains a request queue (STAILQ of queue(3)) per "context", and throws all AIO requests of all contexts owned by a process into the single FreeBSD per-process AIO queue. When the process calls io_destroy(2), io_getevents(2), io_submit(2) and io_cancel(2), my code can pick out requests owned by the specified context from the single FreeBSD per-process AIO queue according to the per-context request queues maintained by my code. 2. The request queue maintained by my code stores the mapping between Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks (struct aiocb). The FreeBSD IO control block actually exists in userland memory space, as required by the FreeBSD native aio_XXXXXX(2). 3. It is quite troubling that the function io_getevents() of libaio-0.3.105 needs to use the Linux-specific "struct aio_ring", which is a partial mirror of the context in user space. I would rather take the address of the context in the kernel as the context ID, but the io_getevents() of libaio forces me to take the address of the "ring" in user space as the context ID. To my surprise, one comment line in the file "io_getevents.c" of libaio-0.3.105 reads: "Ben will hate me for this". REFERENCE: 1. Linux kernel source code: http://www.kernel.org/pub/linux/kernel/v2.6/ (include/linux/aio_abi.h, fs/aio.c) 2. Linux manual pages: http://www.kernel.org/pub/linux/docs/manpages/ (io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2)) 3. Linux Scalability Effort: http://lse.sourceforge.net/io/aio.html The design notes: http://lse.sourceforge.net/io/aionotes.txt 4. The package libaio, both source and binary: http://rpmfind.net/linux/rpm2html/search.php?query=libaio Simple transparent interface to Linux AIO system calls. 5. Libaio-oracle: http://oss.oracle.com/projects/libaio-oracle/ POSIX AIO implementation based on Linux AIO system calls (depending on libaio). ---snip--- Submitted by: Li, Xiao <intron@intron.ac>	2006-10-15 14:22:14 +00:00
Ruslan Ermilov	a1b0a18096	Prevent IOC_IN with a zero-size argument (this is only supported if backward compatibility options are present) from attempting to free memory that wasn't allocated. This is an old bug; previously it would attempt to free a null pointer. I noticed this bug when working on the previous revision, but forgot to fix it. Security: local DoS Reported by: Peter Holm MFC after: 3 days	2006-10-14 19:01:55 +00:00
Tom Rhodes	f51bf07af8	Close a race condition where num can be larger than tmp, giving the user too large a boundary. Reported by: Ilja Van Sprundel	2006-10-14 10:30:14 +00:00
Tor Egge	e0c33ad529	Wait for thread count to reach zero in destroy_devl() even when no purge method is defined, to avoid memory being modified after free. Temporarily increase refcount in destroy_devl() to avoid a double free if dev_rel() is called while waiting for thread count to reach zero.	2006-10-13 20:49:24 +00:00
Gleb Smirnoff	68a57ebfad	Improve ktr(4) logging for callout(9) subsystem. Log all inserts and removals, including failures, into the callwheel. XXX: Most of the CTR() macros are called with the callout_lock spin mutex held, and thus won't be logged to a file if KTR_ALQ is used. Moving the CTR() macros out of the spinlocked code would require copying all arguments. I'm too lazy to do this.	2006-10-11 14:57:03 +00:00
David Xu	ae7d8a6766	Implement the 32-bit umtx_lock and umtx_unlock system calls. These two system calls are not used by libthr in RELENG_6 and HEAD; they are used only by libthr in RELENG_5. The _umtx_op system call can take on more incremental work than these two system calls without having to introduce new system calls or throw away old ones as things evolve.	2006-10-06 08:22:08 +00:00
David Xu	c6511aea86	Move some declarations of 32-bit signal structures into the file freebsd32-signal.h, and implement the sigtimedwait and sigwaitinfo system calls.	2006-10-05 01:56:11 +00:00
Martin Blapp	89ff1e4cb8	Back out part of rev. 1.149. While the workaround added to ptcopen() to avoid leaking ptys works fine, it opens a possible security hole. Submitted by: bde MFC after: 3 days	2006-10-04 05:43:39 +00:00
Robert Watson	531147aa3e	Regenerate.	2006-10-03 20:48:11 +00:00
Robert Watson	888db9e177	Audit creat() system call (compat code), and change type for getpagesize(), which isn't actually being audited anyway. MFC after: 3 days Obtained from: TrustedBSD Project	2006-10-03 20:46:52 +00:00
Konstantin Belousov	30af71199e	Fix the remaining race in revs. 1.232 and 1.233 that could occur during unmount when the mp structure is reused while waiting for the coveredvp lock. Introduce a struct mount generation count, increment it on each reuse, and compare the generations before and after obtaining the coveredvp lock. Reviewed by: tegge, pjd Approved by: pjd (mentor) MFC after: 2 weeks	2006-10-03 10:47:04 +00:00
Poul-Henning Kamp	e5037a18a9	Use utc_offset() where applicable, and hide the internals of it as static variables.	2006-10-02 18:23:37 +00:00
Poul-Henning Kamp	f97c1c4bf7	Introduce utc_offset() to capture a calculation currently done all over the place.	2006-10-02 16:17:23 +00:00
Poul-Henning Kamp	94d67e0fb8	Move tz_minuteswest and tz_dsttime to subr_clock.c	2006-10-02 16:06:26 +00:00
Poul-Henning Kamp	b69f71eb29	Second part of a little cleanup in the calendar/timezone/RTC handling. Split subr_clock.c in two parts (by repo-copy): subr_clock.c contains generic RTC and calendrical code, etc.; subr_rtc.c contains the newbus'ified RTC interface. Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock} sysctls and associated variables into subr_clock.c. They are not machine dependent, and we have generic code that relies on them being present, so they are not even optional.	2006-10-02 15:42:02 +00:00
Poul-Henning Kamp	f645b0b51c	First part of a little cleanup in the calendar/timezone/RTC handling. Move relevant variables to <sys/clock.h> and fix #includes as necessary. Use libkern's much more time- & space-efficient BCD routines.	2006-10-02 12:59:59 +00:00
Konstantin Belousov	45ea8737bf	Correct the comment: numvnodes is decreased on vdestroying the vnode. OKed by: tegge Approved by: pjd (mentor) MFC after: 1 week	2006-10-02 07:25:58 +00:00
Tor Egge	04aa807cb6	If the buffer lock has waiters after the buffer has changed identity then getnewbuf() needs to drop the buffer in order to wake waiters that might sleep on the buffer in the context of the old identity.	2006-10-02 02:06:27 +00:00
Martin Blapp	570d6457d1	Re-add rev. 1.145 because of vfs bugs and races near revoke(). Until they are fixed we can't free any slaves. Add a workaround to avoid leaking ptys by number.	2006-09-30 22:51:05 +00:00
Pawel Jakub Dawidek	2342d5216e	Remove duplicated $FreeBSD$.	2006-09-30 16:33:29 +00:00
Martin Blapp	35dcc318f4	Any call of tty_close() with a tty refcount of <= 1 is wrong, and we will free the tty in this case. This is a workaround until the underlying devfs/tty problems are fixed. MFC after: 1 day	2006-09-30 08:11:51 +00:00
Martin Blapp	9b206de5a0	Free tty struct after last close. This should fix the pty-leak by numbers. Remove workarounds for tty_refcount being 0; this will be fixed differently later.	2006-09-29 09:53:19 +00:00
Martin Blapp	e4936f3763	Free tty struct after last close. This should fix the pty-leak by numbers. Remove workarounds for tty_refcount being 0; this will be fixed differently later. Back out rev 1.145 since we initialize the tty struct from scratch and bad things can't happen anymore.	2006-09-29 09:52:57 +00:00
Ruslan Ermilov	9fddcc6661	Fix our ioctl(2) implementation when the argument is "int". New ioctls passing integer arguments should use the _IOWINT() macro. This fixes a lot of ioctls not working on sparc64, most notably keyboard/syscons ioctls. Full ABI compatibility is provided, with the bonus of fixing the handling of old ioctls on sparc64. Reviewed by: bde (with contributions) Tested by: emax, marius MFC after: 1 week	2006-09-27 19:57:02 +00:00
Martin Blapp	8be563721a	Move Giant up even further since P_CONTROLT isn't really fully locked yet (p_flag is, but P_CONTROLT isn't really). Submitted by: jhb	2006-09-27 16:42:10 +00:00
Martin Blapp	1bf5e4b866	Use ctty instead of just returning. ctty just has a simple open that returns ENXIO. Submitted by: jhb	2006-09-27 16:41:15 +00:00
Tor Egge	e60c361218	Reduce fluctuations of mnt_flag to allow unlocked readers to get a slightly more consistent view.	2006-09-26 04:20:09 +00:00
Tor Egge	fba924ce9b	Don't restore MNT_QUOTA bit in mnt_flag after a failed mount with MNT_UPDATE flag, closing a race between nmount() and quotactl().	2006-09-26 04:18:36 +00:00
Tor Egge	a1e363f256	Add mnt_noasync counter to better handle interleaved calls to nmount(), sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.	2006-09-26 04:15:59 +00:00
Tor Egge	cea9d840d8	Don't restore mnt_kern_flag on failed MNT_UPDATE mount, it can race with dounmount(), causing loss of MNTK_UNMOUNT flag.	2006-09-26 04:15:04 +00:00
Tor Egge	5da56ddb21	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().	2006-09-26 04:12:49 +00:00
Robert Watson	88b85279a9	SI_ORDER_THIRD + 2, not SI_ORDER_FOURTH + 2. MFC after: 3 days Submitted by: mlaier	2006-09-26 00:15:56 +00:00
Robert Watson	5add74b4a7	Add "FreeBSD" trademark statement to copyright section of boot messages. MFC after: 3 days Approved by: core, board at FreeBSDFoundation dot org	2006-09-25 23:19:01 +00:00
John-Mark Gurney	33fabe46da	remove unnecessary NULL check... Coverity ID: 1545	2006-09-25 01:29:48 +00:00
John-Mark Gurney	4db71d27a1	hide kqueue_register from public view, and replace it w/ kqfd_register... this eliminates a possible race in aio registering a kevent..	2006-09-24 04:47:47 +00:00
John-Mark Gurney	aeab19b21f	return EBADF instead of successfully attaching (and then panicking) when an fd is dying.. Convinced by: jhb PR: 103127	2006-09-24 02:29:53 +00:00
John-Mark Gurney	9edac6f3f9	add KTRACE hooks into kevent... This will help people debug their kqueue programs to find out exactly which events were registered and which were returned... This should be lower in kern_kevent, but that would require special munging due to locks and the functions used to copyin/copyout kevents... If someone wants to teach ktrace how to output pretty kevents, I have a kevent pretty printer that can be used...	2006-09-24 02:23:29 +00:00
Martin Blapp	45e6819160	Protect enterpgrp() against another tty/proc race case until the tty locking work has been fixed. MFC after: 1 week	2006-09-23 17:35:24 +00:00
Martin Blapp	7c56049e6d	Check for tp->t_refcnt == 0 before doing anything in tty_open(). PR: 103520 MFC after: 1 week	2006-09-23 14:52:46 +00:00
Martin Blapp	153c21c8c1	If /dev/tty gets opened after your controlling terminal has been revoked, you can't call tty_clone afterwards. OpenBSD and NetBSD both fail the open call in that case, so we should do so as well. This can be done in ctty_clone by returning with *dev==NULL. Admittedly this causes open to return ENOENT, instead of ENXIO as on the other BSDs, but this way requires the least touching of code. Submitted by: Nate Eldredge <nge@cs.hmc.edu> PR: 83375 MFC: 1 week	2006-09-23 14:44:14 +00:00
Bruce M Simpson	4a75dc2585	Fix a case where socket I/O atomicity is violated due to not dropping the entire record when a non-data mbuf is removed in the soreceive() path. This only triggers a panic directly when compiled with INVARIANTS. PR: 38495 Submitted by: James Juran MFC after: 1 week	2006-09-22 15:34:16 +00:00
David Xu	cda9a0d1c2	Add compatibility code to let 32-bit libthr work on a 64-bit kernel.	2006-09-22 15:04:28 +00:00
David Xu	e58b17ea53	Fix umtx command order error for freebsd 32bit.	2006-09-22 14:59:10 +00:00
David Xu	1eec02f538	Add umtx support for 32bit process on AMD64 machine.	2006-09-22 00:52:54 +00:00
Martin Blapp	1c1d411bee	Back out rev. 1.258. The real race cause has been fixed in rev. 1.241 of kern_proc.c. Requested by: jhb	2006-09-21 14:09:26 +00:00
Randall Stewart	adf5d1c6d0	atomic_fetchadd_int is used by mb_free_ext(), but it returns the value from before the "add" (in this case we are adding -1), after which we compare it to '0'... to see if we free the mbuf... we should be comparing it to '1'... Note that this only affects the contended case, since there is a first part to the comparison that checks to see if it's '1'. So this bug would only crop up if two CPUs are trying to free the same mbuf refcount at the same time. This will happen in SCTP but I doubt it can happen in TCP or UDP. PR: N/A Submitted by: rrs Reviewed by: gnn,sam Approved by: gnn,sam	2006-09-21 09:55:43 +00:00
David Xu	cca0a557dd	Regenerate.	2006-09-21 04:19:48 +00:00
David Xu	73fa3e5b88	Replace the system calls thr_getscheduler, thr_setscheduler and thr_setschedparam with rtprio_thread. While the rtprio system call is for processes only, the new rtprio_thread system call is responsible for LWPs.	2006-09-21 04:18:46 +00:00
Robert Watson	f50c4fd817	Remove MAC_DEBUG + MPRINTF debugging from System V IPC. This no longer appears to be serving a useful purpose, as it was used during initial development of MAC support for System V IPC. MFC after: 1 month Obtained from: TrustedBSD Project Suggested by: Christopher dot Vance at SPARTA dot com	2006-09-20 13:40:00 +00:00
Robert Watson	738f14d4b1	Remove MAC_DEBUG label counters, which were used to debug leaks and other problems while labels were first being added to various kernel objects. They have outlived their usefulness. MFC after: 1 month Suggested by: Christopher dot Vance at SPARTA dot com Obtained from: TrustedBSD Project	2006-09-20 13:33:41 +00:00
Pawel Jakub Dawidek	783deec19e	There is no need to set 'sp' to NULL anymore.	2006-09-20 07:27:05 +00:00
Tor Egge	4e59868e08	Copy stat information from mount structure before it can change identity.	2006-09-20 00:32:07 +00:00
Tor Egge	60b0b1aa18	Don't try to obtain a reference to a nonexisting (NULL) mount structure in default VOP_GETWRITEMOUNT().	2006-09-20 00:27:02 +00:00
Martin Blapp	d7b167b57b	Fix races between tty.c and sessrele() / doenterpgrp() / leavepgrp(). The tty code is still under giant lock, but the session/pgrp release code just used proctree_locks. This explains why moving the proctree_lock in sys/kern/tty.c rev. 1.258 did fix the panics in our SMP systems. This should also fix some race panics with revoked ttys. Reviewed by: jhb MFC after: 1 week	2006-09-19 19:25:11 +00:00
Konstantin Belousov	f37e633887	Fix the bug in rev. 1.232. If vfs_suser returned false, coveredvp shall be unlocked only if it really exists. Found with: Coverity Prevent(tm) CID: 1535 Approved by: pjd (mentor)	2006-09-19 14:04:12 +00:00
Konstantin Belousov	4dec8579bd	Fix the race while waiting for coveredvp lock during unmount. The vnode may be recycled during the sleep, wrap the vn_lock with vhold/vdrop. Check that coveredvp still points to the same mp after sleep (needed because sleep dropped Giant). Move check for user rights for unmount after coveredvp lock is obtained. Tested by: Peter Holm Reviewed by: tegge Approved by: kan (mentor) MFC after: 2 weeks	2006-09-18 15:35:22 +00:00
Robert Watson	5702e0965e	Declare security and security.bsd sysctl hierarchies in sysctl.h along with other commonly used sysctl name spaces, rather than declaring them all over the place. MFC after: 1 month Sponsored by: nCircle Network Security, Inc.	2006-09-17 20:00:36 +00:00
Andre Oppermann	a855e2b4c0	Remove VLAN mtag UMA zones and initialize ether_vtag and tso_segsz packet header fields to zero on mbuf allocation. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-17 13:44:32 +00:00
Robert Watson	da7cbdc2b3	Regenerate.	2006-09-17 13:29:36 +00:00
Robert Watson	6c2d307a0e	AUE_SIGALTSTACK instead of AUE_SIGPENDING for sigaltstack(). Obtained from: TrustedBSD Project MFC after: 3 days	2006-09-17 13:28:11 +00:00
Robert Watson	101581b082	Expose kern.acct_configured, a sysctl that reflects the configured/unconfigured state of the kernel accounting system. This is used by the accounting privilege regression test to determine whether accounting is in use and will be disrupted by the regression test. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project MFC after: 1 month	2006-09-17 11:00:36 +00:00
Mohan Srinivasan	3c5b80d6c2	Fix for a potential bug caught by Coverity. Pointed out to me by Kris Kennaway.	2006-09-14 17:57:02 +00:00
Mohan Srinivasan	7d7d9e2242	Fixes up the handling of shared vnode lock lookups in the NFS client, adds a FS type specific flag indicating that the FS supports shared vnode lock lookups, adds some logic in vfs_lookup.c to test this flag and set lock flags appropriately. - amd on 6.x is a non-starter (without this change). Using amd under heavy load results in a deadlock (with cascading vnode locks all the way to the root) very quickly. - This change should also fix the more general problem of cascading vnode deadlocks when an NFS server goes down. Ideally, we wouldn't need these changes, as enabling shared vnode lock lookups globally would work. Unfortunately, UFS, for example, isn't ready for shared vnode lock lookups, crashing pretty quickly. This change is the result of discussions with Stephan Uphoff (ups@). Reviewed by: ups@	2006-09-13 18:39:09 +00:00
Scott Long	988129b824	Introduce a spinlock for synchronizing access to the video output hardware in syscons. This replaces a simple access semaphore that was assumed to be protected by Giant but often was not. If two threads that were otherwise SMP-safe called printf at the same time, there was a high likelihood that the semaphore would get corrupted and result in a permanently frozen video console. This is similar to what is already done in the serial console drivers.	2006-09-13 15:48:15 +00:00
Christian S.J. Peron	7ca6b7823d	Back out one of the Giant removals from revision 1.272. Giant was not here to protect the vnode, it was present to synchronize access to TTY session information between exit(2) and the TTY code. While we are here, note that Giant is required for TTY protection. Clue from: bde Discussed with: jhb MFC after: 1 week	2006-09-13 15:47:53 +00:00
Pawel Jakub Dawidek	689f94bfe6	Fix a lock leak in an error case. Reported by: netchild Reviewed by: rwatson	2006-09-13 06:58:40 +00:00
John Baldwin	3bb00f61a2	- Revert making bus_generic_add_child() the default for BUS_ADD_CHILD(). Instead, we want busses to explicitly specify an add_child routine if they want to support identify routines, but by default disallow having outside drivers add devices. - Give smbus(4) an explicit bus_add_child() method. Requested by: imp	2006-09-11 22:20:37 +00:00
John Baldwin	4288462f38	Add a default method for BUS_ADD_CHILD() that just calls device_add_child_ordered(). Previously, a device driver that wanted to add a new child device in its identify routine had to know if the parent driver had a custom bus_add_child method and use BUS_ADD_CHILD() in that case, otherwise use device_add_child(). Getting it wrong in either direction would result in panics or failure to add the child device. Now, BUS_ADD_CHILD() always works, isolating child drivers from having to know intimate details about the parent driver. Discussed with: imp MFC after: 1 week	2006-09-11 19:41:31 +00:00
John Baldwin	9914a8cc7d	- Fix rman_manage_region() to be a lot more intelligent. It now checks for overlaps, but more importantly, it collapses adjacent free regions. This is needed to cope with BIOSen that split up ports for system devices (like IPMI controllers) across multiple system resource entries. - Now that rman_manage_region() is not so dumb, remove extra logic in the x86 nexus drivers to populate the IRQ rman that manually coalesced the regions. MFC after: 1 week	2006-09-11 19:31:52 +00:00
Andre Oppermann	805def2e04	New sockets created by incoming connections into listen sockets should inherit all settings and options except listen specific options. Add the missing send/receive timeouts and low watermarks. Remove inheritance of the field so_timeo which is unused. Noticed by: phk Reviewed by: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-09-10 17:08:06 +00:00
Martin Blapp	f976eefa00	Fix locking race in ttymodem(). The locking of the proctree happens too late and opens a small race window before tp->t_session->s_leader is accessed. In case tp->t_session has just been set to NULL elsewhere, we get a panic(). This fix is a bandaid until someone else fixes the whole locking in the tty subsystem. Definitely more work needs to be done. MFC after: 1 week Reviewed by: mlaier PR: kern/103101	2006-09-10 16:51:56 +00:00
Robert Watson	484cc85edb	Remove slightly oddly placed suser() call from the KTR/ALQ setup sysctl: it was present only in the enable path, not the disable path, which one presumes would be equally of interest. Either way, it was not needed, as the sysctl framework already calls suser() if the operation is a write operation, which configuration requests are. Sponsored by: nCircle Network Security, Inc.	2006-09-09 16:09:01 +00:00
John Baldwin	86a93d51e3	Use sysctl_handle_long() instead of duplicating its logic for kern.ipc.maxsockbuf so that this sysctl works for 32-bit binaries running on amd64 via compat/freebsd32. MFC after: 3 days	2006-09-06 21:59:36 +00:00
Mark Peek	f6d004d510	Remove call to fdfree() for the AIO daemons to prevent kernel panics with linprocfs. This call is not needed since file descriptor sharing was removed in v1.125. Reviewed by: alc, davidxu, ambrisko MFC after: 3 days	2006-09-06 15:11:20 +00:00
David Xu	654d6b2e0b	Merge all code of do_lock_normal, do_lock_pi and do_lock_pp into function do_lock_umutex.	2006-09-05 12:01:09 +00:00
Pawel Jakub Dawidek	c37789fe7e	Add 'show vnode <addr>' DDB command.	2006-09-04 22:15:44 +00:00
Robert Watson	89ede214c7	Regenerate for updated audit event identifiers.	2006-09-03 15:11:13 +00:00
Robert Watson	7f26ddda62	Assign proper audit event identifiers to a number of system calls not covered in previous passes: - sysarch, rtprio - clock_settime - preadv/pwritev - __getcwd - kqueue - fhstatfs - kldunloadf Obtained from: TrustedBSD Project	2006-09-03 15:10:40 +00:00
Robert Watson	863ccba5d5	Regenerate.	2006-09-03 13:48:48 +00:00
Robert Watson	d1967c5d2c	Use AUE_NTP_ADJTIME for ntp_adjtime() instead of AUE_ADJTIME. Obtained from: TrustedBSD Project	2006-09-03 13:44:21 +00:00
John-Mark Gurney	378f231e7d	add a newbus method for obtaining the bus's bus_dma_tag_t... This is required by arches like sparc64 (not yet implemented) and sun4v where there are separate IOMMUs for each PCI bus... For all other arches, it will end up returning NULL, which makes it a no-op... Convert a few drivers (the ones we've been working w/ on sun4v) to the new convention... Eventually all drivers will need to replace the parent tag of NULL w/ bus_get_dma_tag(dev), though dev is usually different for each driver, and will require hand inspection... Reviewed by: scottl (earlier version)	2006-09-03 00:27:42 +00:00
David Xu	295ce693b9	Check if it is root user in do_unlock_pp.	2006-09-03 00:07:37 +00:00
David Xu	81273e0632	Make sure we get new m_owner value if we can not unlock it in uncontested case. Reorder statements in do_unlock_umutex.	2006-09-02 02:41:33 +00:00
Wayne Salamon	ae1078d657	Audit the argv and env vectors passed in on exec: Add the argument auditing functions for argv and env. Add kernel-specific versions of the tokenizer functions for the arg and env represented as a char array. Implement the AUDIT_ARGV and AUDIT_ARGE audit policy commands to enable/disable argv/env auditing. Call the argument auditing from the exec system calls. Obtained from: TrustedBSD Project Approved by: rwatson (mentor)	2006-09-01 11:45:40 +00:00
David Xu	8a156460bf	Reorder some statements. Fix typo and remove stale comments.	2006-08-30 23:59:45 +00:00
David Xu	a324b5ecd3	Update comments about interrupted mutex locking.	2006-08-28 07:09:27 +00:00
David Xu	cd42ca3c27	Regenerate.	2006-08-28 04:28:25 +00:00
David Xu	d10183d94d	This is the initial version of POSIX priority mutex support. A new userland mutex structure is added as follows: struct umutex { __lwpid_t m_owner; uint32_t m_flags; uint32_t m_ceilings[2]; uint32_t m_spare[4]; }; m_owner identifies the owner thread by its thread id. In the uncontested case, userland can simply use atomic_cmpset_int to lock the mutex; if the mutex is contested, the high-order bit is set, and userland should lock and unlock via a kernel syscall. Flag UMUTEX_PRIO_INHERIT represents pthread's PTHREAD_PRIO_INHERIT mutex: when contention happens, the kernel should propagate priority. Flag UMUTEX_PRIO_PROTECT indicates pthread's PTHREAD_PRIO_PROTECT mutex: userland should initialize m_owner to the contested state UMUTEX_CONTESTED, so that atomic_cmpset_int fails and the kernel syscall is invoked to do the locking, because for such a mutex the kernel must always boost the thread's priority before it can lock the mutex. m_ceilings is used by PTHREAD_PRIO_PROTECT mutexes: the first element is used to boost the thread's priority when it locks the mutex, and the second element is used when the mutex is unlocked. The PTHREAD_PRIO_PROTECT mutex's linked list is kept in userland and m_ceilings[1] is managed by the thread library, so the kernel needn't allocate memory to keep the list; when such a mutex is unlocked, the kernel resets m_owner to UMUTEX_CONTESTED. Flag USYNC_PROCESS_SHARED indicates whether the synchronization object is process-shared; if the flag is not set, a vm_map_lookup() call is saved. The umtx chain is still used as a sleep queue. When a thread is blocked on a PTHREAD_PRIO_INHERIT mutex, a umtx_pi is allocated to support priority propagation; it is dynamically allocated and reference counted. It is not optimized but works well in my tests. While the umtx chain has its own locking protocol, the priority propagation protocol is entirely protected by sched_lock, because the priority propagation function is called with sched_lock held from the scheduler. No visible performance degradation was found with these changes. Some parameter names in the _umtx_op syscall are renamed.	2006-08-28 04:24:51 +00:00
Marius Strobl	aed760ef8a	Fix another bug introduced with rev. 1.204; in vfs_donmount() if the 'vfs_getopt(optlist, "errmsg", (void **)&errmsg, &errmsg_len)' call fails, 'errmsg' is left uninitialized, making the later tests against NULL meaningless, and the uses bogus. Thus initialize 'errmsg' to NULL beforehand. [1] While at it, remove the superfluous assignment of 0 to 'errmsg_len' if the above mentioned call fails as it's already initialized to 0. Submitted by: Michael Plass [1]	2006-08-26 16:28:19 +00:00
Suleiman Souhlal	bec31a8fee	The "taskqueue_fast" spinlocks were renamed to "fast_taskqueue" in subr_taskqueue.c:r1.32 Reported by: rdivacky	2006-08-26 11:21:25 +00:00
Pawel Jakub Dawidek	bebabf24bb	Fix comment.	2006-08-25 15:13:49 +00:00
David Xu	fd4a6d10a4	Same as previous change, the user provided priority should be reversed too.	2006-08-25 10:05:30 +00:00
David Xu	4386313871	Initialize kg_base_user_pri.	2006-08-25 06:29:16 +00:00
David Xu	3db720fdce	Add user priority loaning code to support priority propagation for 1:1 threading's POSIX priority mutexes, the code is no-op unless priority-aware umtx code is committed.	2006-08-25 06:12:53 +00:00
Marius Strobl	3a30d178fe	Fix a bug introduced with rev. 1.204; in vfs_donmount() use copyout(9) instead of copystr(9) for copying the errmsg from kernel- to user-space. This fixes a panic on sparc64 when using the nmount(2)-converted mountd(8). While at it, use bcopy(3) instead of strncpy(3) in the kernel- to kernel-space case for consistency with vfs_buildopts() and between kernel- to user-space and kernel- to kernel-space case.	2006-08-24 18:52:28 +00:00
David Xu	de08f4ee5c	POSIX requires that higher numerical values for the priority represent higher priorities, so we should reverse the passed value here.	2006-08-23 07:22:25 +00:00
Colin Percival	23a28f3a0d	Fix a signedness bug. MFC after: 3 days Security: Local DoS	2006-08-20 10:29:08 +00:00
George V. Neville-Neil	daa5817e92	Fix a kernel panic based on receiving an ICMPv6 Packet too Big message. PR: 99779 Submitted by: Jinmei Tatuya Reviewed by: clement, rwatson MFC after: 1 week	2006-08-18 14:05:13 +00:00
Peter Wemm	bad9a7a5f9	Grab two syscall numbers. One is used to emulate functionality that linux has in its procfs (do a readlink of /proc/self/fd/<nn> to find the pathname that corresponds to a given file descriptor). Valgrind-3.x needs this functionality. This is a placeholder only at this time.	2006-08-16 22:32:50 +00:00
Colin Percival	e2d70dbae1	Swap the names "sem_exithook" and "sem_exechook" in the previous commit to match up with reality and the prototype definitions. Register the sem_exechook as the "process_exec" event handler, not sem_exithook. Submitted by: rdivacky Sponsored by: SoC 2006	2006-08-16 08:25:40 +00:00
John Baldwin	462a7add8e	Add a new 'show sleepchain' ddb command similar to 'show lockchain' except that it operates on lockmgr and sx locks. This can be useful for tracking down vnode deadlocks in VFS for example. Note that this command is a bit more fragile than 'show lockchain' as we have to poke around at the wait channel of a thread to see if it points to either a struct lock or a condition variable inside of a struct sx. If td_wchan points to something unmapped, then this command will terminate early due to a fault, but no harm will be done.	2006-08-15 18:29:01 +00:00
John Baldwin	0fa2168b19	- When spinning on a spin lock, if the debugger is active or we are in a panic, go ahead and do the longer DELAY(1) spin wait. - If we panic due to spinning too long, print out a few more details including the pointer to the mutex in question and the tid of the owning thread.	2006-08-15 18:26:12 +00:00
John Baldwin	f8f1f7fb85	Regen to propagate <prefix>_AUE_<mumble> changes as well as the earlier systrace changes.	2006-08-15 17:37:01 +00:00
John Baldwin	52a79796c4	Add a new set of macros <prefix>_AUE_<syscallname> to sysproto.h that map to the audit event associated with a specific system call. For example, SYS_AUE___semctl would be set to AUE_SEMCTL in sys/sysproto.h.	2006-08-15 17:09:32 +00:00
John Baldwin	589201fd4e	- Use NOSTD rather than NOIMPL for nfssvc() to match other syscalls provided via klds. - Correct audit identifier for nfssvc().	2006-08-15 16:45:41 +00:00
John Baldwin	77e662683b	Rename 'show lockchain' to 'show locktree' and 'show threadchain' to 'show lockchain'. The churn is because I'm about to add a new 'show sleepchain' similar to 'show lockchain' for sleep locks (lockmgr and sx) and 'show threadchain' was a bit ambiguous as both commands show a chain of thread dependencies, 'lockchain' is for non-sleepable locks (mtx and rw) and 'sleepchain' is for sleepable locks.	2006-08-15 16:44:18 +00:00
John Baldwin	be6847d729	Add a 'show lockmgr' command that dumps the relevant details of a lockmgr lock.	2006-08-15 16:42:16 +00:00
Alexander Leidinger	993182e57c	- Change process_exec function handlers prototype to include struct image_params arg. - Change struct image_params to include struct sysentvec pointer and initialize it. - Change all consumers of process_exit/process_exec eventhandlers to new prototypes (includes splitting up into distinct exec/exit functions). - Add eventhandler to userret. Sponsored by: Google SoC 2006 Submitted by: rdivacky Parts suggested by: jhb (on hackers@)	2006-08-15 12:10:57 +00:00
Robert Watson	b7e2f3ec76	Minor white space tweaks.	2006-08-13 23:16:59 +00:00
Alan Cox	5d1445cdf2	Reduce the scope of the page queues lock in vm_pgmoveco() now that vm_page_sleep_if_busy() no longer requires the page queue lock to be held. Correctly spell "TRUE".	2006-08-12 19:47:49 +00:00
Robert Watson	79ad81c06d	Before performing a sodealloc() when pru_attach() fails, assert that the socket refcount remains 1, and then drop to 0 before freeing the socket. PR: 101763 Reported by: Gleb Kozyrev <gkozyrev at ukr dot net>	2006-08-11 23:03:10 +00:00
Pawel Jakub Dawidek	04d9e255df	getnewvnode() can be called with NULL mp. Found by: Coverity Prevent (tm) Coverity ID: 1521 Confirmed by: phk	2006-08-10 08:56:03 +00:00
Alan Cox	5786be7cc7	Introduce a field to struct vm_page for storing flags that are synchronized by the lock on the object containing the page. Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively. Eliminate the assertion that the page queues lock is held in vm_page_io_finish(). Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().	2006-08-09 17:43:27 +00:00
Pawel Jakub Dawidek	13c85d339d	Add a bandaid to avoid a deadlock in the situation where we are trying to suspend a file system but need to obtain a vnode. We may not be able to do so, because all vnodes could already be in use and other processes cannot release them, because they are waiting in the "suspfs" state. In such a situation, we allow a vnode to be allocated anyway. This is a temporary fix - there is no backpressure to free vnodes allocated in those circumstances. MFC after: 1 week Reviewed by: tegge	2006-08-09 12:47:30 +00:00
Alan Cox	ab83ac429d	Reduce the scope of the page queues lock in vfs_busy_pages() now that vm_page_sleep_if_busy() no longer requires the caller to hold the page queues lock.	2006-08-08 06:00:49 +00:00
Robert Watson	e4445a031f	Move definition of UNIX domain socket protosw and domain entries from uipc_proto.c to uipc_usrreq.c, making localdomain static. Remove uipc_proto.c as it's no longer used. With this change, UNIX domain sockets are entirely encapsulated in uipc_usrreq.c.	2006-08-07 12:02:43 +00:00
Robert Watson	ccdebe46bd	Improve commenting of vaccess(), making sure to be clear that the ifdef capabilities code is there for reference and never actually used. Slight style tweak.	2006-08-06 10:43:35 +00:00
Robert Watson	52b384621e	Don't set pru_sosend, pru_soreceive, pru_sopoll to default values, as they are already set to default values.	2006-08-06 10:39:21 +00:00
Alan Cox	7c4b7ecc4c	Reduce the scope of the page queues lock in kern_sendfile() now that vm_page_sleep_if_busy() no longer requires the caller to hold the page queues lock.	2006-08-06 01:00:09 +00:00
Robert Watson	5111b5e180	Remove register, use ANSI function headers.	2006-08-05 21:40:59 +00:00
Robert Watson	12de451046	We now spell "inode" as "vnode" in the VFS layer, so update comment for new world order. MFC after: 3 days Pointed out by: mckusick	2006-08-05 21:08:47 +00:00
John Birrell	a4bc5ae534	Add support for the generated file systrace_args.c.	2006-08-05 19:25:14 +00:00
Yaroslav Tykhiy	776fc0e90e	Commit the results of the typo hunt by Darren Pilgrim. This change affects documentation and comments only, no real code involved. PR: misc/101245 Submitted by: Darren Pilgrim <darren pilgrim bitfreak org> Tested by: md5(1) MFC after: 1 week	2006-08-04 07:56:35 +00:00
Alan Cox	10c09f3f61	The page queues lock is no longer required by vm_page_io_start(). Reduce the scope of the page queues lock in kern_sendfile() accordingly.	2006-08-04 05:53:20 +00:00
John Birrell	2826f17433	Report the correct function name in a DPRINTF.	2006-08-03 21:19:13 +00:00
John Birrell	b9279e66e4	Regen. Note the addition of the extra file now generated.	2006-08-03 05:32:43 +00:00
John Birrell	1533c33fd4	Generate another file called systrace_args.c. This will be compiled into systrace and is used to map the syscall arguments into the 64-bit parameter array.	2006-08-03 05:29:09 +00:00
Robert Watson	9126410f4b	Move destroying kqueue state from above pru_detach to below it in sofree(), as a number of protocols expect to be able to call soisdisconnected() during detach. That may not be a good assumption, but until I'm sure if it's a good assumption or not, allow it.	2006-08-02 18:37:44 +00:00
Robert Watson	92716fe04e	Change two XXX's to two notes: the fact that SOCK_LOCK(so) == SOCKBUF_LOCK(&so->so_rcv) is encoded, which is worth noting, but not a bug.	2006-08-02 16:23:52 +00:00
John Baldwin	9802d04ce0	Fix some bugs in the previous revision (1.419). Don't perform extra vfs_rel() on the mountpoint if the MAC checks fail in kern_statfs() and kern_fstatfs(). Similarly, don't perform an extra vfs_rel() if we get a doomed vnode in kern_fstatfs(), and handle the case of mp being NULL (for some doomed vnodes) by conditionalizing the vfs_rel() in kern_fstatfs() on mp != NULL. CID: 1517 Found by: Coverity Prevent (tm) (kern_fstatfs()) Pointy hat to: jhb	2006-08-02 15:27:48 +00:00
Robert Watson	f8b20fb6d6	Remove now unneeded ENOTCONN clause from SOCK_DGRAM side of uipc_send(): we have to check it regardless of the target address, so don't check it twice.	2006-08-02 14:30:58 +00:00
Robert Watson	050ac26521	Remove 'register'. Use ANSI C prototypes/function headers. More deterministically line wrap comments.	2006-08-02 13:01:58 +00:00
David Xu	64511d2abc	Don't include sys/thr.h and umtx.h in sys/sysproto.h, it is unnecessary.	2006-08-02 08:09:24 +00:00
David Xu	aff5bcb1b2	INT_MAX is defined in file sys/limits.h, include the file now.	2006-08-02 07:34:51 +00:00
Robert Watson	c0e1415d51	Move update of 'numopensockets' from bottom of sodealloc() to the top, eliminating a second set of identical mutex operations at the bottom. This allows brief exceeding of the max sockets limit, but only by sockets in the last stages of being torn down.	2006-08-02 00:45:27 +00:00
John Baldwin	03e161fdb1	Make system call modules a bit more robust: - If we fail to register the system call during MOD_LOAD, then note that so that we don't try to deregister it or invoke the chained event handler during the subsequent MOD_UNLOAD event. Doing the deregister when the register failed could result in trashing system call entries. - Add a SI_SUB_SYSCALLS just before starting up init and use that to register syscall modules instead of SI_SUB_DRIVERS. Registering system calls as late as possible increases the chances that any other module event handlers or SYSINITs in a module are executed to initialize the data in a kld before a syscall dependent on that data is able to be invoked. MFC after: 3 days	2006-08-01 16:32:20 +00:00
John Baldwin	38affe135a	Don't lock each of the processes while looking for a pid. The allproc and proctree locks that we already hold provide sufficient protection.	2006-08-01 15:30:56 +00:00
Robert Watson	eaa6dfbcc2	Reimplement socket buffer tear-down in sofree(): as the socket is no longer referenced by other threads (hence our freeing it), we don't need to set the can't send and can't receive flags, wake up the consumers, perform two levels of locking, etc. Implement a fast-path teardown, sbdestroy(), which flushes and releases each socket buffer. A manual dom_dispose of the receive buffer is still required explicitly to GC any in-flight file descriptors, etc, before flushing the buffer. This results in a 9% UP performance improvement and 16% SMP performance improvement on a tight loop of socket();close(); in micro-benchmarking, but will likely also affect CPU-bound macro-benchmark performance.	2006-08-01 10:30:26 +00:00
Robert Watson	b5ff091431	Close a race that occurs when using sendto() to connect and send on a UNIX domain socket at the same time as the remote host is closing the new connections as quickly as they open. Since the connect() and send() paths are non-atomic with respect to one another, it is possible for the second thread's close() call to disconnect the two sockets as connect() returns, leaving the consumer (which plans to send()) with a NULL kernel pointer to its proposed peer. As a result, after acquiring the UNIX domain socket subsystem lock, we need to revalidate the connection pointers even though connect() has technically succeeded, and return an error to say that there's no connection on which to perform the send. We might want to rethink the specific errno number; perhaps ECONNRESET would be better. PR: 100940 Reported by: Young Hyun <youngh at caida dot org> MFC after: 2 weeks MFC note: Some adaptation will be required	2006-07-31 23:00:05 +00:00
John Baldwin	53c9158f24	Trim an obsolete comment. ktrgenio() stopped doing crazy gymnastics when ktrace was redone to be mostly synchronous again.	2006-07-31 15:31:43 +00:00
John Baldwin	91ce2694d1	Regen for MPSAFE flag removal.	2006-07-28 19:08:37 +00:00
John Baldwin	af5bf12239	Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to mark system calls as being MPSAFE: - Stop conditionally acquiring Giant around system call invocations. - Remove all of the 'M' prefixes from the master system call files. - Remove support for the 'M' prefix from the script that generates the syscall-related files from the master system call files. - Don't explicitly set SYF_MPSAFE when registering nfssvc.	2006-07-28 19:05:28 +00:00
John Baldwin	e0b4add8d8	Various fixes to comments in the syscall master files including removing cruft from the audit import and adding mention of COMPAT4 to freebsd32.	2006-07-28 18:55:18 +00:00
John Baldwin	764e4d54e9	Adjust td_locks for non-spin mutexes, rwlocks, and sx locks so that it is a count of all non-spin locks, not just lockmgr locks. This can give us a much cheaper way to see if we have any locks held (such as when returning to userland via userret()) without requiring WITNESS. MFC after: 1 week	2006-07-27 21:45:55 +00:00
John Baldwin	ea175645b4	Hold the reference on the mountpoint slightly longer in kern_statfs() and kern_fstatfs() so that it is still held when prison_enforce_statfs() is called (since that function likes to poke and prod at the mountpoint structure). MFC after: 3 days	2006-07-27 20:00:27 +00:00
John Baldwin	186abbd727	Write a magic value into mtx_lock when destroying a mutex that will force all other mtx_lock() operations to block. Previously, when the mutex was destroyed, it would still have a valid value in mtx_lock(): either the unowned cookie, which would allow a subsequent mtx_lock() to succeed, or a pointer to the thread who destroyed the mutex if the mutex was locked when it was destroyed. MFC after: 3 days	2006-07-27 19:58:18 +00:00
John Baldwin	f30e89ced3	Fix a file descriptor race I reintroduced when I split accept1() up into kern_accept() and accept1(). If another thread closed the new file descriptor and the first thread later got an error trying to copyout the socket address, then it would attempt to close the wrong file object. To fix, add a struct file ** argument to kern_accept(). If it is non-NULL, then on success kern_accept() will store a pointer to the new file object there and not release any of the references. It is up to the calling code to drop the references appropriately (including a call to fdclose() in case of error to safely handle the aforementioned race). While I'm at it, go ahead and fix the svr4 streams code to not leak the accept fd if it gets an error trying to copyout the streams structures.	2006-07-27 19:54:41 +00:00
Robert Watson	0075d85869	Remove call to soisdisconnected() in uipc_detach(), since it will already have been invoked by uipc_close() or uipc_abort(), and the socket is in a state of being torn down by the time we get to this point, so kqueue state frobbed by soisdisconnected() is not available, so frobbing it will result in a panic. Reported by: Munehiro Matsuda <haro at h4 dot dion dot ne dot jp>	2006-07-26 19:16:34 +00:00
Robert Watson	f14cce87dc	Remove non-socket buffer routines from uipc_sockbuf.c, and socket buffer specific routines from uipc_socket2.c following repo-copy. We might rethink the location of one or two at some point, but the division was relatively clean. uipc_sockbuf.c is now the home of routines that manipulate socket buffers.	2006-07-24 16:21:31 +00:00
Robert Watson	b0668f7151	soreceive_generic(), and sopoll_generic(). Add new functions sosend(), soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used universally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman	2006-07-24 15:20:08 +00:00


9670 Commits