freebsd-dev

Author	SHA1	Message	Date
Robert Watson	223aaaecb0	Remove mac_create_root_mount() and mpo_create_root_mount(), which provided access to the root file system before the start of the init process. This was used briefly by SEBSD before it knew about preloading data in the loader, and using that method to gain access to data earlier results in fewer inconsistencies in the approach. Policy modules still have access to the root file system creation event through the mac_create_mount() entry point. Removed now, and will be removed from RELENG_6, in order to avoid gaining third party policy dependencies on the entry point for the lifetime of the 6.x branch. MFC after: 3 days Submitted by: Chris Vance <Christopher dot Vance at SPARTA dot com> Sponsored by: SPARTA	2005-09-19 13:59:57 +00:00
Marcel Moolenaar	73130b2224	Move the UUID generator into its own function, called kern_uuidgen(), so that UUIDs can be generated from within the kernel. The uuidgen(2) syscall now allocates kernel memory, calls the generator, and does a copyout() for the whole UUID store. This change is in support of GPT.	2005-09-18 21:40:15 +00:00
Robert Watson	8434c29b28	Add three new read-only socket options, which allow regression tests and other applications to query the state of the stack regarding the accept queue on a listen socket: SO_LISTENQLIMIT Return the value of so_qlimit (socket backlog) SO_LISTENQLEN Return the value of so_qlen (complete sockets) SO_LISTENINCQLEN Return the value of so_incqlen (incomplete sockets) Minor white space tweaks to existing socket options to make them consistent. Discussed with: andre MFC after: 1 week	2005-09-18 21:08:03 +00:00
Robert Watson	bc6b8b5d64	Fix spelling in a comment. MFC after: 3 days	2005-09-18 10:46:34 +00:00
Robert Watson	7da7362b95	Re-comment sbcompress() to explain what it does; it took me quite a bit of reading to figure it out, and I want to avoid figuring it out again. Convert an if (foo) else printf("this is almost a panic") into a KASSERT. MFC after: 3 days	2005-09-18 10:30:10 +00:00
Warner Losh	fe0519b171	MFp4: Expose device_probe_child()	2005-09-18 01:32:09 +00:00
Christian S.J. Peron	42e7197fba	Implement new world order in VFS locking for ACLs. This will remove the unconditional acquisition of Giant for ACL related operations. If the file system is set as being MP safe and debug.mpsafevfs is 1, do not pick up Giant. For any operations which require namei(9) lookups: __acl_get_file __acl_get_link __acl_set_file __acl_set_link __acl_delete_file __acl_delete_link __acl_aclcheck_file __acl_aclcheck_link -Set the MPSAFE flag in NDINIT -Initialize vfslocked variable using the NDHASGIANT macro For functions which operate on fds, make sure the operations are locked: __acl_get_fd __acl_set_fd __acl_delete_fd __acl_aclcheck_fd -Initialize vfslocked using VFS_LOCK_GIANT before we manipulate the vnode Discussed with: jeff	2005-09-17 22:01:14 +00:00
Tor Egge	61ac14dab6	Break out of loop if next buffer pointer has become invalid while flushing current buffer. Reviewed by: kan	2005-09-16 18:28:12 +00:00
Stephan Uphoff	19b2dff7b0	Fix race condition that caused activation of an event to be ignored immediately after it was deactivated. Found by: Yahoo! MFC after: 3 days	2005-09-15 21:10:12 +00:00
John Baldwin	21f9e816cd	Oops, missed adding the required include. Pointy hat to: jhb	2005-09-15 20:20:36 +00:00
John Baldwin	53c0e1ff7d	Replace the dont_sleep_in_callout mutex hack (similar to g_x{up,down}) with the disallow sleeping facility.	2005-09-15 20:09:08 +00:00
John Baldwin	10f508d9a3	Don't disallow sleeping for handlers on swi's since some swi handlers (like CAM) do sleep in their handlers. Requested by: scottl	2005-09-15 20:08:21 +00:00
John Baldwin	b27dbfbf4a	- Enforce an implicit lock order that Giant cannot be locked while holding any other non-sleepable lock. In plain English: Giant comes before all other mutexes. - Add some extra description to the lock order reversal printf's to indicate when a reversal is triggered by a hard-coded implicit rule. Requested by: truckman (2) MFC after: 1 week	2005-09-15 19:07:14 +00:00
John Baldwin	51460da87f	- Add a new simple facility for marking the current thread as being in a state where sleeping on a sleep queue is not allowed. The facility doesn't support recursion but uses a simple private per-thread flag (TDP_NOSLEEPING). The sleepq_add() function will panic if the flag is set and INVARIANTS is enabled. - Use this new facility to replace the g_xup and g_xdown mutexes that were (ab)used to achieve similar behavior. - Disallow sleeping in interrupt threads when invoking interrupt handlers. MFC after: 1 week Reviewed by: phk	2005-09-15 19:05:37 +00:00
Christian S.J. Peron	68ff2a4397	Improve the MP safeness associated with the creation of symbolic links and the execution of ELF binaries. Two problems were found: 1) The link path wasn't tagged as being MP safe and thus was not properly protected. 2) The ELF interpreter vnode wasn't being locked in namei(9) and thus was insufficiently protected. This commit makes the following changes: -Sets the MPSAFE flag in NDINIT for symbolic link paths -Sets the MPSAFE flag in NDINIT and introduces a vfslocked variable which will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been picked up. -Drops an assertion into vfs_lookup which ensures that if the MPSAFE flag is NOT set, we have picked up Giant. If not, panic (if WITNESS is compiled into the kernel). This should help us find conditions where vnode operations are insufficiently protected. This is a RELENG_6 candidate. Discussed with: jeff MFC after: 4 days	2005-09-15 15:03:48 +00:00
Maxim Konovalov	aada5cccd8	Back out rev. 1.246; it breaks code that uses shutdown(2) on non-connected sockets. Pointed out by: rwatson	2005-09-15 13:18:05 +00:00
Ralf S. Engelschall	724447ac41	Fix system shutdown timeout handling by again supporting longer running shutdown procedures (which have a duration of more than 120 seconds). We have two user-space affecting shutdown timeouts: a "soft" one in /etc/rc.shutdown and a "hard" one in init(8). The first one can be configured via /etc/rc.conf variable "rcshutdown_timeout" and defaults to 30 seconds. The second one was originally (in 1998) intended to be configured via sysctl(8) variable "kern.shutdown_timeout" and defaults to 120 seconds. Unfortunately, the "kern.shutdown_timeout" was declared "unused" in 1999 (as it obviously is actually not used within the kernel itself) and hence was intentionally but misleadingly removed in revision 1.107 from init_main.c. Kernel sysctl(8) variables are certainly a wrong way to control user-space processes in general, but in this particular case the sysctl(8) variable should have remained as it supports init(8), which isn't passed command line flags (which in turn could have been set via /etc/rc.conf), etc. As there is already a similar "kern.init_path" sysctl(8) variable which directly affects init(8), resurrect the init(8) shutdown timeout under sysctl(8) variable "kern.init_shutdown_timeout". But this time document it as being intentionally unused within the kernel and used by init(8). Also document it in the manpages init(8) and rc.conf(5). Reviewed by: phk MFC after: 2 weeks	2005-09-15 13:16:07 +00:00
Maxim Konovalov	c5cff17017	o Return ENOTCONN when shutdown(2) is called on a non-connected socket. PR: kern/84761 Submitted by: James Juran R-test: tools/regression/sockets/shutdown MFC after: 1 month	2005-09-15 11:45:36 +00:00
Poul-Henning Kamp	74f46f19aa	Retire unused dev_named() function.	2005-09-15 08:01:57 +00:00
Robert Watson	fd1a469ba5	In vfs_kqfilter(), return EINVAL instead of 1 (EPERM) when an unsupported kqueue filter type is requested on a vnode. MFC after: 3 days	2005-09-12 19:22:37 +00:00
Jung-uk Kim	9ed448b20c	Use monotonic `time_uptime' instead of `time_second' Approved by: anholt (mentor) Discussed on: arch	2005-09-12 15:31:28 +00:00
Poul-Henning Kamp	2883ba6668	Introduce vfs_read_dirent() which can help VOP_READDIR() implementations by handling all the cookie stuff.	2005-09-12 08:46:07 +00:00
Tor Egge	6ff5e2db45	Don't retry when vget() returns ENOENT in the nonblocking case due to the vnode being doomed. It causes a livelock.	2005-09-12 01:48:57 +00:00
Don Lewis	908b3deb2b	Relocate witness_levelall(), witness_leveldescendents(), and witness_displaydescendants() so that they are protected by "#ifdef DDB/#endif" to unbreak kernels not using "option DDB". MFC after: 3 weeks	2005-09-11 07:57:06 +00:00
Gleb Smirnoff	d04304d155	Make callout_reset() return a non-zero value if a pending callout was rescheduled. If there was no pending callout, then return 0. Reviewed by: iedowse, cperciva	2005-09-08 14:20:39 +00:00
Don Lewis	d07f87a218	Add a new struct buf flag bit, B_PERSISTENT, and use it to tag struct bufs that are persistently held by ext2fs. Ignore any buffers with this flag in the code in boot() that counts "busy" and dirty buffers and attempts to sync the dirty buffers, which is done before attempting to unmount all the file systems during shutdown. This fixes the problem caused by any ext2fs file systems that are mounted at system shutdown time, which caused boot() to give up on a non-zero number of buffers and skip the call to vfs_unmountall(). This left all the mounted file systems in a dirty state and caused them to all require cleanup by fsck on reboot. Move the two separate copies of the "busy" buffer test in boot() to a separate function. Nuke the useless spl() stuff in the ext2fs ULCK_BUF() macro. Bring the PRINT_BUF_FLAGS definition in sys/buf.h up to date with this and previous flag changes. PR: kern/56675, kern/85163 Tested by: "Matthias Andree" matthias.andree at gmx.de Reviewed by: bde MFC after: 3 days	2005-09-08 06:30:05 +00:00
David E. O'Brien	5b1c0294e4	Forward declaring static variables as extern is invalid ISO-C. Now that GCC can properly handle forward static declarations, do this properly.	2005-09-07 10:06:14 +00:00
Gleb Smirnoff	016e62123a	In soreceive(), when a first mbuf is removed from socket buffer use sockbuf_pushsync(). Previous manipulation could lead to an inconsistent mbuf. Reviewed by: rwatson	2005-09-06 17:05:11 +00:00
Gleb Smirnoff	f46ab10c02	Document flags of a pollrec.	2005-09-06 11:09:18 +00:00
Christian S.J. Peron	d1dfd92177	Convert the primary ACL allocator from malloc(9) to using a UMA zone instead. Also introduce an aclinit function which will be used to create the UMA zone for use by file systems at system start up. MFC after: 1 month Discussed with: rwatson	2005-09-06 00:06:30 +00:00
Gleb Smirnoff	16901c0186	Remove Giant mutex from polling(4) and use a separate poll_mtx(4) instead. Detailed changelist: o Add flags field to struct pollrec, to indicate that a particular entry is being worked on. o Define a macro PR_VALID() to check that a pollrec is valid and pollable. o Mark ISRs as mpsafe. o ether_poll() - Acquire poll_mtx while traversing pollrec array. - Skip pollrecs that are being worked on. - Conditionally acquire Giant when entering handler. o netisr_pollmore() - Conditionally assert Giant. - Acquire poll_mtx while working with statistics. o netisr_poll() - Conditionally assert Giant. - Acquire poll_mtx while working with statistics and traversing pollrec array. o ether_poll_register(), ether_poll_deregister() - Conditionally assert Giant. - Acquire poll_mtx while working with pollrec array. o poll_idle() - Remove all strange manipulations with Giant. In collaboration with: ru, pjd In collaboration with: Oleg Bulyzhin <oleg rinet.ru> In collaboration with: dima <_pppp mail.ru>	2005-09-05 16:02:11 +00:00
Xin LI	5248ef8a3c	When padding with zero, do pad after prefixes rather than padding before prefixes. Use cases: printf("%05d", -42); --> "00-42" (should be "-0042") printf("%#05x", 12); --> "000xc" (should be "0x00c") Submitted by: Oliver Fromme PR: kern/85520 MFC After: 1 week	2005-09-04 18:03:45 +00:00
Poul-Henning Kamp	1e7d2c4763	If we ignore an unknown % sequence, we must stop interpreting the remaining % arguments because the varargs are now out of sync and there is a risk that we might for instance dereference an integer in a %s argument. Sponsored by: Napatech.com	2005-09-03 10:28:08 +00:00
John Baldwin	acc0265cc2	- Add some comments to some of the static lock orders. Don't explicitly link proctree and allproc to Giant since that order is already implicitly enforced. - Use a goto to handle the case where we want to enforce a reversal before calling isitmydescendant() in witness_checkorder() so that the logic is easier to follow and so that it is easier to add more forced-reversal cases in the future. MFC after: 3 days	2005-09-02 20:23:49 +00:00
John Baldwin	83cece6fa1	- Add an assertion to panic if one tries to call mtx_trylock() on a spin mutex. - Don't panic if a spin lock is held too long inside _mtx_lock_spin() if panicstr is set (meaning that we are already in a panic). Just keep spinning forever instead.	2005-09-02 20:21:49 +00:00
John Baldwin	83de502d59	Add witness warnings to panic if a thread tries to exit while holding any locks. Requested by: jeff MFC after: 3 days	2005-09-02 20:20:01 +00:00
Nate Lawson	9000b91eb9	Break out the checks for duplicates and absolute settings being too high instead of trying to do them all at once. This should fix the level sorting problems from the previous revision. Testing help: ume	2005-09-02 16:32:43 +00:00
Suleiman Souhlal	1f71de49e1	Print out a warning and a backtrace if we try to unlock a lockmgr that we do not hold. Glanced at by: phk MFC after: 3 days	2005-09-02 15:56:01 +00:00
Suleiman Souhlal	2611e5a6a9	Don't unbusy the devfs mount in vfs_mountroot_try() as it gets accessed and unbusied in devfs_fixup(), which assumes that the devfs mount is still locked. Glanced at by: phk MFC after: 3 days	2005-09-02 13:37:54 +00:00
Pawel Jakub Dawidek	d8b464e51e	In case of mac_check_vnode_rename_from() or vn_start_write() failure, vn_finished_write() should not be called. Reviewed by: ssouhlal MFC after: 3 days	2005-09-01 21:46:33 +00:00
Andre Oppermann	fdcc028d11	Changes and cleanups to m_sanity(): o for() instead of while() looping over mbuf chain o paren's around all flag checks o more verbose function and purpose description o some more style changes Based on feedback from: sam	2005-08-30 21:31:42 +00:00
Andre Oppermann	e0068c3a69	Unbreak m_demote() and put back the 'all' flag. Without it we cannot correctly test for m_nextpkt in an mbuf chain.	2005-08-30 21:14:30 +00:00
Andre Oppermann	fbe816384a	o Remove the 'all' flag from m_demote(). Users can simply call it with m_demote(m->m_next) if they wish to start at the second mbuf in chain. o Test m_type with == instead of &. o Check m_nextpkt against NULL instead of implicit 0. Based on feedback from: sam	2005-08-30 20:07:49 +00:00
Nate Lawson	5308b2a64e	Eliminate cpufreq levels for two cases that are less than optimal: 1. Walk the absolute list in reverse to prefer duplicated levels that have a lower absolute setting, i.e. 800 MHz/50% is better than 1600 MHz/25% even though both have the same actual frequency. This also removes the need to check for already-modified levels since by definition, those will be added later in the sorted list. 2. Compare the absolute settings for derived levels and don't use the new level if it's higher. For example, a level of 800 MHz/75% is preferable to 1600 MHz/25% even though the latter has a lower total frequency. This work is based on a patch from the submitter but reworked by myself. Submitted by: Tijl Coosemans (tijl/ulyssis.org)	2005-08-30 04:45:32 +00:00
Andre Oppermann	4da8443133	Add m_copymdata(struct mbuf *m, struct mbuf *n, int off, int len, int prep, int how). Copies the data portion of mbuf (chain) n starting from offset off for length len to mbuf (chain) m. Depending on prep the copied data will be appended or prepended. The function ensures that the mbuf (chain) m will be fully writeable by making real (not refcnt) copies of mbuf clusters. For the prepending the function returns a pointer to the new start of mbuf chain m and leaves as much leading space as possible in the new first mbuf. Reviewed by: glebius	2005-08-29 20:15:33 +00:00
Andre Oppermann	a048affba5	Add m_sanity(struct mbuf *m, int sanitize) to do some heavy sanity checking on mbuf's and mbuf chains. Set sanitize to 1 to garble illegal things and have them blow up later when used/accessed. m_sanity()'s main purpose is for KASSERT()'s and debugging of non-kosher mbuf manipulation (of which we have a number of). Reviewed by: glebius	2005-08-29 19:58:56 +00:00
Andre Oppermann	ed111688e9	Add m_demote(struct mbuf *m, int all) to clean up mbuf (chain) from any tags and packet headers. If "all" is set then the first mbuf in the chain will be cleaned too. This function is used before an mbuf, that arrived as packet with m->flags & M_PKTHDR, is appended to an mbuf chain using m->m_next (not m->m_nextpkt). Reviewed by: glebius	2005-08-29 19:45:39 +00:00
Pawel Jakub Dawidek	e37a499443	Add 'depth' argument to CTRSTACK() macro, which allows reducing the number of ktr slots used. If 'depth' is equal to 0, the whole stack will be logged, just like before.	2005-08-29 11:34:08 +00:00
Suleiman Souhlal	a6c109d658	Fix a typo in vop_rename_pre() where we ended up using vholdl() instead of vhold(), even though the vnode interlock is unlocked. MFC after: 3 days	2005-08-28 23:00:11 +00:00
Alan Cox	7f1ef325d7	Handle vm_map_wire()'s failure.	2005-08-28 05:38:40 +00:00
Alan Cox	5d3043ce9a	Correctly handle vm_map_wire()'s failure. (See also revisions 1.81 and 1.82.) Reviewed by: tegge	2005-08-28 04:50:11 +00:00
Alan Cox	45e31b6034	Eliminate an unneeded reference on a vm object. If, in fact, the nearby vm_map_find() fails, then the excess reference causes the vm object to be leaked. Reviewed by: tegge	2005-08-28 00:24:58 +00:00
Alan Cox	4167396552	Revert the previous change for two reasons: (1) If vm_map_find() succeeds but vm_map_wire() fails, then a vm object, vm map entries, and kernel_map free space is leaked and (2) unwiring is handled automatically by vm_map_remove(). Suggested by: tegge	2005-08-28 00:19:54 +00:00
Dag-Erling Smørgrav	d09dfa2bfd	Two minor optimizations of fdalloc(): - if minfd < fd_freefile (as is most often the case, since minfd is usually 0), set it to fd_freefile. - remove a call to fd_first_free() which duplicates work already done by fdused(). This change results in a small but measurable speedup for processes with large numbers (several thousands) of open files. PR: kern/85176 Submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz> MFC after: 3 weeks	2005-08-26 11:16:39 +00:00
Don Lewis	4053cae340	Track all lock relationships instead of pruning direct relationships if an indirect relationship exists (keep both A->B->C and A->C). This allows witness_checkorder() to use isitmychild() instead of the much more expensive isitmydescendant() to check for valid lock ordering. Don't do an expensive tree walk to update the w_level values when the tree is updated. Only update the w_level values when using the debugger to display the tree. Nuke the experimental "witness_watch > 1" mode that only compared w_level for the two locks. This information is no longer maintained at run time, and the use of isitmychild() in witness_checkorder should bring performance close enough to the acceptable level that this hack is not needed. Report witness data structure allocation statistics under the debug.witness sysctl. Reviewed by: jhb MFC after: 30 days	2005-08-25 03:47:37 +00:00
Don Lewis	ad9f180121	Back out the removal of LK_NOWAIT from the VOP_LOCK() call in vlrureclaim() in vfs_subr.c 1.636 because waiting for the vnode lock aggravates an existing race condition. It is also undesirable according to the commit log for 1.631. Fix the tiny race condition that remains by rechecking the vnode state after grabbing the vnode lock and grabbing the vnode interlock. Fix the problem of other threads being starved (which 1.636 attempted to fix by removing LK_NOWAIT) by calling uio_yield() periodically in vlrureclaim(). This should be more deterministic than hoping that VOP_LOCK() without LK_NOWAIT will block, which may not happen in this loop. Reviewed by: kan MFC after: 5 days	2005-08-23 03:44:06 +00:00
Pawel Jakub Dawidek	4e4aa37e75	mp_ncpus is always (properly) initialized, even on UP kernels, so just use it.	2005-08-21 18:03:31 +00:00
Robert Watson	6cd8dee3c5	Silence "busy" warnings when unmounting devfs at system shutdown. This is a workaround for non-symmetric teardown of the file systems at shutdown with respect to the mount order at boot. The proper long term fix is to properly detach devfs from the root mount before unmounting each, and should be implemented, but since the problem is non-harmful, this temporary band-aid will prevent false positive bug reports and unnecessary error output for 6.0-RELEASE. MFC after: 3 days Tested by: pav, pjd	2005-08-20 17:12:47 +00:00
Poul-Henning Kamp	1d45c50ec3	Properly un-giant-trick the cdevsw in fini_cdevsw() Tripped over by: Huang wen hui <huang@gddsn.org.cn>	2005-08-20 12:13:51 +00:00
David Xu	86ef8e2671	Add missing brackets. Noticed by: stefanf@	2005-08-19 22:30:13 +00:00
David Xu	8c6d7a8db8	Fix a LOR between sched_lock and sleep queue lock.	2005-08-19 13:35:34 +00:00
David Xu	f8ec133ed0	Move up code for testing KEF_HOLD to avoid ke_cpu being changed unexpectedly for PRI_ITHD and PRI_REALTIME threads.	2005-08-19 11:51:41 +00:00
Hajimu UMEMOTO	1fea6ce7dd	- don't forget to save frequency when priority is raised. - nuke redundant variable initialization.	2005-08-18 16:41:25 +00:00
Hajimu UMEMOTO	5f36393468	don't forget to update curr_priority. even when frequency is not changed, priority may be changed.	2005-08-18 16:08:56 +00:00
Poul-Henning Kamp	516ad423b1	Handle device drivers with D_NEEDGIANT in a way which does not penalize the 'good' drivers: Allocate a shadow cdevsw and populate it with wrapper functions which grab Giant	2005-08-17 08:19:52 +00:00
Poul-Henning Kamp	a07b0febaa	In vop_stdpathconf(ap) also default for _PC_NAME_MAX and _PC_PATH_MAX.	2005-08-17 06:59:23 +00:00
Hajimu UMEMOTO	961f7f911f	Save cpu level only when priority is greater than PRIO_USER to make CPUFREQ_SET(NULL, prio) work. TODO: implement saved_level as stack. Reviewed by: njl	2005-08-16 20:03:08 +00:00
Poul-Henning Kamp	b3740d656f	Remove stale comment.	2005-08-16 19:47:42 +00:00
Poul-Henning Kamp	31cc57cdbd	Collect the devfs related sysctls in one place	2005-08-16 19:25:02 +00:00
Poul-Henning Kamp	9c0af1310c	Create a new internal .h file to communicate very private stuff from kern_conf.c to devfs. For now just two prototypes, more to come.	2005-08-16 19:08:01 +00:00
Alexander Kabaev	0c207975f2	Do not keep parent directory locked while calling VFS_ROOT to traverse mount points in lookup(). The lock can be dropped safely around VFS_ROOT because LOCKPARENT semantics with child and parent vnodes coming from different FSes does not really have any meaningful use. On the other hand, this prevents an easily triggered deadlock on systems using the automounter daemon.	2005-08-14 18:10:04 +00:00
Alexander Kabaev	857b66d505	Do not use vm_pager_init() to initialize vnode_pbuf_freecnt variable. vm_pager_init() is run before required nswbuf variable has been set to correct value. This caused system to run with single pbuf available for vnode_pager. Handle both cluster_pbuf_freecnt and vnode_pbuf_freecnt variable in the same way. Reported by: ade Obtained from: alc MFC after: 2 days	2005-08-13 20:21:33 +00:00
Marcel Moolenaar	fd65baf8e2	Make mpsafe_vfs=1 the default on ia64.	2005-08-13 20:07:50 +00:00
Nate Lawson	da8a77c1f1	The "lowest" sysctl setting makes more sense as the lowest one to use, so discard all levels less than this setting, not less than/equal to. MFC after: 1 day	2005-08-11 18:40:58 +00:00
Alexander Kabaev	45a0d1ed7a	Do not drop the vnode interlock if vdropl is called on already doomed vnode. vdropl callers expect it to return with interlock still being held. MFC after: 2 days	2005-08-10 11:46:03 +00:00
Robert Watson	ae018704a1	Add an order between UDP inpcb locks and the IPv4 multicast address list lock, as there has been a report that an alternative lock order is getting introduced. This should help ferret it out. Reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>	2005-08-09 13:27:50 +00:00
Robert Watson	13f4c340ae	Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to ifnet.if_drv_flags. Device drivers are now responsible for synchronizing access to these flags, as they are in if_drv_flags. This helps prevent races between the network stack and device driver in maintaining the interface flags field. Many __FreeBSD__ and __FreeBSD_version checks maintained and continued; some less so. Reviewed by: pjd, bz MFC after: 7 days	2005-08-09 10:20:02 +00:00
Christian S.J. Peron	d8339a2616	Drop in a WITNESS_WARN into SYSCTL_IN to make sure that we are not holding any non-sleepable locks when copyin is called. This gets executed unconditionally since we have no function to wire the buffer in this direction. Pointed out by: truckman MFC after: 1 week	2005-08-08 21:06:42 +00:00
Robert Watson	6a113b3de7	Merge the dev_clone and dev_clone_cred event handlers into a single event handler, dev_clone, which accepts a credential argument. Implementors of the event can ignore it if they're not interested, and most do. This avoids having multiple event handler types and fall-back/precedence logic in devfs. This changes the kernel API for /dev cloning, and may affect third party packages containing cloning kernel modules. Requested by: phk MFC after: 3 days	2005-08-08 19:55:32 +00:00
Christian S.J. Peron	417ab24f78	Check to see if we wired the user-supplied buffers in SYSCTL_OUT; if the buffer has not been wired and we are holding any non-sleepable locks, drop a witness warning. If the buffer has not been wired, it is possible that the writing of the data can sleep, especially if the page is not in memory. This can result in a number of different locking issues, including deadlocks. MFC after: 1 week Discussed with: rwatson Reviewed by: jhb	2005-08-08 18:54:35 +00:00
David Xu	1278181c6c	Try to keep a preempted thread at the front of the run queue. This seems to improve performance a bit for some workloads, but interactive lag will still be seen until the cpu idling race is fixed.	2005-08-08 14:20:10 +00:00
Peter Grehan	e000e00118	Export a routine, kobj_machdep_init(), that allows platforms to use the kobj subsystem as soon as mutex_init() has been called instead of having to wait for the SI_SUB_LOCK sysinit. Reviewed by: dfr	2005-08-07 02:20:35 +00:00
Christian S.J. Peron	9baea4b4b4	Change the data type of the upper shared memory limits from a signed integer to an unsigned long. This raises limits such as the maximum number of pages available for shared memory from 2^31 to 2^32 on 32 bit architectures, and from 2^31 to 2^64 on 64 bit architectures. It should be noted that this change breaks the ABI on 64 bit architectures because the size of the shmmax, shmmin, shmmni, shmseg and shmall members of the shminfo structure has changed. Silence on: current@	2005-08-06 07:20:18 +00:00
Suleiman Souhlal	34cc826ae8	Holding a vnode doesn't prevent v_mount from disappearing (when the vnode is inactivated), possibly leading to a NULL dereference when checking if the mount wants knotes to be activated in the VOP hooks. So, we add a new vnode flag VV_NOKNOTE that is only set in getnewvnode(), if necessary, and check it when activating knotes. Since the flags are not erased when a vnode is being held, we can safely read them. Reviewed by: kris@ MFC after: 3 days	2005-08-06 01:42:04 +00:00
Robert Watson	dd5a318ba3	Introduce in_multi_mtx, which will protect IPv4-layer multicast address lists, as well as accessor macros. For now, this is a recursive mutex due to code sequences where IPv4 multicast calls into IGMP calls into ip_output(), which then tests for a multicast forwarding case. For support macros in in_var.h to check multicast address lists, assert that in_multi_mtx is held. Acquire in_multi_mtx around iteration over the IPv4 multicast address lists, such as in ip_input() and ip_output(). Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses, as well as over the manipulation of ifnet multicast address lists in order to keep the two layers in sync. Lock down accesses to IPv4 multicast addresses in IGMP, or assert the lock when performing IGMP join/leave events. Eliminate spl's associated with IPv4 multicast addresses, portions of IGMP that weren't previously expunged by IGMP locking. Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded lock order in WITNESS, in that order. Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca> MFC after: 10 days	2005-08-03 19:29:47 +00:00
Jeff Roberson	40a495853a	- Unlock before we call mac_destroy_vnode to prevent a lock order reversal. Found by: trhodes	2005-08-03 05:36:50 +00:00
Jeff Roberson	9e2aaec1e3	- Use lockmgr_printinfo rather than rolling our own. This introduces a slight problem by using printf instead of db_printf however 'show lockedvnods' does the same so I believe it is ok for now.	2005-08-03 05:02:08 +00:00
Jeff Roberson	7499fd8de9	- Fix a problem that slipped through review; the stack member of the lockmgr structure should have the lk_ prefix. - Add stack_print(lkp->lk_stack) to the information printed with lockmgr_printinfo().	2005-08-03 04:59:07 +00:00
Jeff Roberson	e8ddb61d38	- Replace the series of DEBUG_LOCKS hacks which tried to save the vn_lock caller by saving the stack of the last locker/unlocker in lockmgr. We also put the stack in KTR at the moment. Contributed by: Antoine Brodin <antoine.brodin@laposte.net>	2005-08-03 04:48:22 +00:00
Jeff Roberson	8d511e2a05	- Add support for saving stack traces and displaying them via printf(9) and KTR. Contributed by: Antoine Brodin <antoine.brodin@laposte.net> Concept code from: Neal Fachan <neal@isilon.com>	2005-08-03 04:27:40 +00:00
David Xu	3c424d1447	In adjustrunqueue(), add code to handle thread migrating case for ULE scheduler. In original code, local run queue of threaded ksegrp is corrupted if adjustrunqueue() is called while thread is migrating.	2005-08-03 01:23:45 +00:00
Ruslan Ermilov	2319835713	Long overdue, keep up with mbuf.h,v 1.148.	2005-08-02 20:03:23 +00:00
Kelly Yancey	dcb5fef5db	Make getsockopt(..., SOL_SOCKET, SO_ACCEPTCONN, ...) work per IEEE Std 1003.1 (POSIX).	2005-08-01 21:15:09 +00:00
David Xu	3d16f519b6	If a thread was removed from system run queue, kse_assign shouldn't add it again.	2005-07-31 15:11:21 +00:00
Alexander Leidinger	32069af652	The resource_xxx routines in subr_hints.c are called before and after the kenv environment in kern_environment.c switches to dynamic kenv. The prior call sets the static variable hintp to the static hints in subr_hints.c (hintmode==0). However, changes to the environment are not detected by the resource_xxx lookups after the change to dynamic kernel environment, so the lookup routines only report the old stuff of hintmode==0, even after the change to the dynamic kenv. This causes kenv users to see a different environment than the kernel routines. This is a problem in the mixer.c code that looks up initial mixer volume settings from the hints: If the hints are dynamic and not from the device.hints file, mixer.c doesn't see them, but kenv does. The patch from the PR (modified to comply with the style of the function) solves this. PR: 83686 Submitted by: Harry Coin <harrycoin@qconline.com>	2005-07-31 10:46:55 +00:00
Alexander Leidinger	3904769ba8	Add bounds checking to the setenv part of the kernel environment. This has no security implications since only root is allowed to use kenv(1) (and corrupt the kernel memory after adding too many variables prior to this commit). This is based upon the PR [1] mentioned below, but extended to check both bounds (in case of an overflow of the counting variable) and to comply with the style of the function. An overflow of the counting variable shouldn't happen after adding the check for the upper bound, but better safe than sorry (in case some other function in the kernel overwrites random memory). An interested soul may want to add a printf to notify root in case the bounds are hit. Also allocate KENV_SIZE+1 entries (the array is NULL-terminated), since the comment for KENV_SIZE says it's the maximum number of environment strings. [2] PR: 83687 [1] Submitted by: Harry Coin <harrycoin@qconline.com> [1] Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my> [2]	2005-07-31 10:28:35 +00:00
Joseph Koshy	fadcc6e201	Fail the module loading process if the currently executing kernel was not compiled with 'options HWPMC_HOOKS' or if the compiled-in version numbers of the kernel and module are out of sync. Reported by: cracauer MFC after: 3 days	2005-07-30 09:02:42 +00:00
Paul Saab	1126349ae7	Ignore mutex asserts when we're dumping as well. This allows me to panic a system from DDB when INVARIANTS is compiled into the kernel on a scsi system.	2005-07-30 05:54:30 +00:00
Sam Leffler	ab8ab90c5b	add m_align, a function to align any type of mbuf (i.e. it is a superset of M_ALIGN and MH_ALIGN) Reviewed by: several	2005-07-30 01:32:16 +00:00
R. Imura	080e3a63b3	Change API of mb_copy_t in libmchain so that netsmb can handle multibyte character share name correctly. Reviewed by: bp	2005-07-29 13:22:37 +00:00
George V. Neville-Neil	0d52d7b01a	Fix for PR 83885. Make sure that there actually is a next packet before setting nextrecord to that field. PR: 83885 Submitted by: hirose@comm.yamaha.co.jp Obtained from: Patch suggested in the PR MFC after: 1 week	2005-07-28 10:10:01 +00:00
Pawel Jakub Dawidek	73864adbd4	Fix how the "InUse" column in 'vmstat -m' output works: - increase the allocation count only on successful malloc(9), so it doesn't confuse people; - because we need to check if 'size > 0', hide 'mtsp->mts_memalloced += size;' under the check as well, as for size=0 it is of course a no-op; - avoid critical_enter()/critical_exit() in case of failure in malloc_type_allocated() as there will be nothing to do. OK'ed by: rwatson MFC after: 2 days	2005-07-27 23:17:31 +00:00
Xin LI	05a6b7ad62	Cast to uintptr_t when the compiler complains. This unbreaks ULE scheduler breakage accompanied by the recent atomic_ptr() change.	2005-07-25 10:21:49 +00:00
Alan Cox	ec9c9e7363	Eliminate inconsistency in the setting of the B_DONE flag. Specifically, make the b_iodone callback responsible for setting it if it is needed. Previously, it was set unconditionally by bufdone() without holding whichever lock is shared by the b_iodone callback and the corresponding top-half function. Consequently, in a race, the top-half function could conclude that operation was done before the b_iodone callback finished. See, for example, aio_physwakeup() and aio_fphysio(). Note: I don't believe that the other, more widely-used b_iodone callbacks are affected. Discussed with: jeff Reviewed by: phk MFC after: 2 weeks	2005-07-20 19:06:06 +00:00
Jeff Roberson	39b2406838	- Allow vnlru to drop Giant if the filesystem does not require it. The vnlru proc is extremely inefficient, potentially iterating over tens of thousands of vnodes without blocking. Dropping Giant allows other threads to preempt us, although we should revisit the algorithm to fix the runtime problems, especially since this may hold up all vnode allocations. - Remove the LK_NOWAIT from the VOP_LOCK in vlrureclaim. This provides a natural blocking point to help alleviate the situation described above, although it may not technically be desirable. - yield after we make a pass on all mount points to prevent us from blocking other threads which require Giant. MFC after: 2 weeks	2005-07-20 01:43:27 +00:00
John Baldwin	ddf9c4f771	- Slightly reorder the events around the setting of PRS_ZOMBIE to be less hokie and much more readable and expand the comment to explain why it is the way that it is. - Close a race where one CPU could free the process belonging to a thread on another CPU that hasn't quite finished exiting yet but is beyond the point of setting the process state as PRS_ZOMBIE. Reported and tested by: ps (2) MFC after: 3 days	2005-07-18 20:08:14 +00:00
Robert Watson	68352adfe7	Define four constants, MBUF_{,MEM,CLUSTER,PACKET,TAG}_MEM_NAME, which are string names for their respective UMA zones and malloc types, and are passed into uma_zcreate() and MALLOC_DEFINE(). Export them outside of _KERNEL in mbuf.h so that netstat can reference them. Change the names to improve consistency, with each zone/type associated with the mbuf allocator being prefixed mbuf_. MFC after: 1 week	2005-07-17 14:04:03 +00:00
John Baldwin	122eceef61	Convert the atomic_ptr() operations over to operating on uintptr_t variables rather than void * variables. This makes it easier and simpler to get asm constraints and volatile keywords correct. MFC after: 3 days Tested on: i386, alpha, sparc64 Compiled on: ia64, powerpc, amd64 Kernel toolchain busted on: arm	2005-07-15 18:17:59 +00:00
Robert Watson	4f8721d2a9	Correct build on 64-bit: cast u_int64_t to (unsigned long long) before printfing as (unsigned long long). 32-bit build on i386 didn't notice this. Whoops. Reported by: arved Tested by: sledge	2005-07-14 15:21:18 +00:00
Robert Watson	cd814b2692	Introduce a new sysctl, kern.malloc_stats, which exports kernel malloc statistics via a binary structure stream: - Add structure 'malloc_type_stream_header', which defines a stream version, definition of MAXCPUS used in the stream, and a number of malloc_type records in the stream. - Add structure 'malloc_type_header', which defines the name of the malloc type being reported on. - When the sysctl is queried, return a stream header, followed by a series of type descriptions, each consisting of a type header followed by a series of MAXCPUS malloc_type_stats structures holding per-CPU allocation information. Typical values of MAXCPUS will be 1 (UP compiled kernel) and 16 (SMP compiled kernel). This query mechanism allows user space monitoring tools to extract memory allocation statistics in a machine-readable form, and to do so at a per-CPU granularity, allowing monitoring of allocation patterns across CPUs in order to better understand the distribution of work and memory flow over multiple CPUs. While here: - Bump statistics width to uint64_t, and hard code using fixed-width type in order to be more sure about structure layout in the stream. We allocate and free a lot of memory. - Add kmemcount, a counter of the number of registered malloc types, in order to avoid excessive manual counting of types. Export via a new sysctl to allow user-space code to better size buffers. - De-XXX comment on no longer maintaining the high watermark in old sysctl monitoring code. A follow-up commit of libmemstat(3), a library to monitor kernel memory allocation, will occur in the next few days. Likewise, similar changes to UMA.	2005-07-14 11:52:06 +00:00
Robert Watson	49bb6870cc	Bump the module versions of the MAC Framework and MAC policy modules from 2 (6.x) to 3 (7.x) to allow for future changes in the MAC policy module ABI in 7.x. Obtained from: TrustedBSD Project	2005-07-14 10:46:03 +00:00
Robert Watson	d26dd2d99e	When devfs cloning takes place, provide access to the credential of the process that caused the clone event to take place for the device driver creating the device. This allows cloned device drivers to adapt the device node based on security aspects of the process, such as the uid, gid, and MAC label. - Add a cred reference to struct cdev, so that when a device node is instantiated as a vnode, the cloning credential can be exposed to MAC. - Add make_dev_cred(), a version of make_dev() that additionally accepts the credential to stick in the struct cdev. Implement it and make_dev() in terms of a back-end make_dev_credv(). - Add a new event handler, dev_clone_cred, which can be registered to receive the credential instead of dev_clone, if desired. - Modify the MAC entry point mac_create_devfs_device() to accept an optional credential pointer (may be NULL), so that MAC policies can inspect and act on the label or other elements of the credential when initializing the skeleton device protections. - Modify tty_pty.c to register clone_dev_cred and invoke make_dev_cred(), so that the pty clone credential is exposed to the MAC Framework. While currently primarily focussed on MAC policies, this change is also a prerequisite for changes to allow ptys to be instantiated with the UID of the process looking up the pty. This requires further changes to the pty driver -- in particular, to immediately recycle pty nodes on last close so that the credential-related state can be recreated on next lookup. Submitted by: Andrew Reisse <andrew.reisse@sparta.com> Obtained from: TrustedBSD Project Sponsored by: SPAWAR, SPARTA MFC after: 1 week MFC note: Merge to 6.x, but not 5.x for ABI reasons	2005-07-14 10:22:09 +00:00
John Baldwin	2c65cb82ad	Add a 'sysent' target that depends on the various files built from syscalls.master for the master list and the Alpha/OSF1 compat ABI to be consistent with all the other compat ABIs where 'make sysent' already works. MFC after: 3 days	2005-07-13 20:50:17 +00:00
David Xu	740fd64d65	Validate that the value written into {FS,GS}.base is a canonical address; writing a non-canonical address can cause a kernel panic. Restrict base values to 0..VM_MAXUSER_ADDRESS, ensuring only canonical values get written to the registers. Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com > Approved by: re (scottl)	2005-07-10 23:31:11 +00:00
John Baldwin	522ccb2381	Regen. Approved by: re (scottl)	2005-07-08 15:06:58 +00:00
John Baldwin	4acd2e73e5	Mark second instance of lchown() MP safe just like the first. Approved by: re (scottl)	2005-07-08 15:01:13 +00:00
John Baldwin	9f3157a254	Regenerate. Approved by: re (scottl)	2005-07-07 18:20:38 +00:00
John Baldwin	bcd9e0dd20	- Add two new system calls: preadv() and pwritev() which are like readv() and writev() except that they take an additional offset argument and do not change the current file position. In SAT speak: preadv:readv::pread:read and pwritev:writev::pwrite:write. - Try to reduce code duplication some by merging most of the old kern_foov() and dofilefoo() functions into new dofilefoo() functions that are called by kern_foov() and kern_pfoov(). The non-v functions now all generate a simple uio on the stack from the passed in arguments and then call kern_foov(). For example, read() now just builds a uio and calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev(). PR: kern/80362 Submitted by: Marc Olzheim marcolz at stack dot nl (1) Approved by: re (scottl) MFC after: 1 week	2005-07-07 18:17:55 +00:00
Robert Watson	6758f88ea4	Add MAC Framework and MAC policy entry point mac_check_socket_create(), which is invoked from socket() and socketpair(), permitting MAC policy modules to control the creation of sockets by domain, type, and protocol. Obtained from: TrustedBSD Project Sponsored by: SPARTA, SPAWAR Approved by: re (scottl) Requested by: SCC	2005-07-05 22:49:10 +00:00
Pawel Jakub Dawidek	c23c87bd93	Fix one "wrong b_bufobj" panic in reassignbuf() by moving VI_UNLOCK(vp) below KASSERT()s, which means there was no real problem here, we just needed better locking for assertions. OK'ed by: jeff Approved by: re (scottl)	2005-07-05 15:57:55 +00:00
Suleiman Souhlal	571dcd15e2	Fix the recent panics/LORs/hangs created by my kqueue commit by: - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone	2005-07-01 16:28:32 +00:00
Joseph Koshy	151392465f	MFP4: - pmcstat(8) gprof output mode fixes: lib/libpmc/pmclog.{c,h}, sys/sys/pmclog.h: + Add a 'is_usermode' field to the PMCLOG_PCSAMPLE event + Add an 'entryaddr' field to the PMCLOG_PROCEXEC event, so that pmcstat(8) can determine where the runtime loader /libexec/ld-elf.so.1 is getting loaded. sys/kern/kern_exec.c: + Use a local struct to group the entry address of the image being exec()'ed and the process credential changed flag to the exec handling hook inside hwpmc(4). usr.sbin/pmcstat/*: + Support "-k kernelpath", "-D sampledir". + Implement the ELF bits of 'gmon.out' profile generation in a new file "pmcstat_log.c". Move all log related functions to this file. + Move local definitions and prototypes to "pmcstat.h" - Other bug fixes: + lib/libpmc/pmclog.c: correctly handle EOF in pmclog_read(). + sys/dev/hwpmc_mod.c: unconditionally log a PROCEXIT event to all attached PMCs when a process exits. + sys/sys/pmc.h: correct a function prototype. + Improve usage checks in pmcstat(8). Approved by: re (blanket hwpmc)	2005-06-30 19:01:26 +00:00
Paul Saab	cff2e749e2	Use SCTL_MASK32 to determine that the sysctl call is from a 32bit binary for kern.cp_time. Approved by: re	2005-06-30 17:17:29 +00:00
Peter Wemm	62919d788b	Jumbo-commit to enhance 32 bit application support on 64 bit kernels. This is good enough to be able to run a RELENG_4 gdb binary against a RELENG_4 application, along with various other tools (eg: 4.x gcore). We use this at work. ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace, procfs and core dumps. procfs_regs.c: vary the format of proc/XXX/regs depending on the client and target application. procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their sscanf fails. They expect an unsigned long. imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps. sys_process.c: handle 32 bit consumers debugging 32 bit targets. Note that 64 bit consumers can still debug 32 bit targets. IA64 has got stubs for ia32_reg.c. Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't implemented in the 32/64 wrapper yet. We also make a tiny patch to gdb to pacify it over conflicting formats of ld-elf.so.1. Approved by: re	2005-06-30 07:49:22 +00:00
Peter Wemm	48033188a6	Second part of commit for moving KDB_STOP_NMI from opt_global.h to opt_kdb.h. Found by: kris Approved by: re	2005-06-30 03:38:10 +00:00
Peter Wemm	2de92a386e	Conditionally weaken sys_generic.c rev 1.136 to allow certain dubious ioctl numbers in backwards compatibility mode. eg: an IOC_IN ioctl with a size of zero. Traditionally this was what you did before IOC_VOID existed, and we had some established users of this in the tree, namely procfs. Certain 3rd party drivers with binary userland components also have this. This is necessary to have 4.x and 5.x binaries use these ioctl's. We found this at work when trying to run 4.x binaries. Approved by: re	2005-06-30 00:19:08 +00:00
Peter Wemm	f0c6706de9	Move the KDB_STOP_NMI option from opt_global.h to opt_kdb.h Approved by: re	2005-06-29 23:23:16 +00:00
Mike Silbersack	a7b844d2be	Fix the false memory modified after free messages some users have been reporting - in my previous change, I missed the case where a mbuf from the packet zone was freed back to the mbuf/packet keg, where it was subsequently put into the mbuf zone and found not to contain the expected trash. This change adds the necessary trash_dtor call inside mb_fini_pack so that everything is correct. Thanks to Bosko for finding the bug and showing me how secondary zones work. Approved by: re (dwhite)	2005-06-29 08:18:26 +00:00
Dima Dorfman	1ee6b74603	Fix fdcheckstd to pass the file descriptor along through vn_open. When opening a device, devfs_open needs the file descriptor to install its own fileops. Failing to pass the file descriptor causes the vnode to be returned with the regular vnops, which will cause a panic on the first read or write because devfs_specops is not meant to support those operations. This bug caused a panic after exec'ing any set[ug]id program with fds 0..2 closed (i.e., if any action had to be taken by fdcheckstd, we would panic if the exec'd program ever tried to use any of those descriptors). Reviewed by: phk Approved by: re (scottl)	2005-06-25 03:34:49 +00:00
Pawel Jakub Dawidek	400a74bff8	Close another information leak in ktrace(2): one was able to find active process groups outside a jail, etc. by using ktrace(2). OK'ed by: rwatson Approved by: re (scottl) MFC after: 1 week	2005-06-24 12:05:24 +00:00
Peter Wemm	4da0d332f4	Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit being in opt_global.h and forcing a global recompile when only a few files reference it. Approved by: re	2005-06-24 00:16:57 +00:00
Pawel Jakub Dawidek	06a137780b	Actually only protect mount-point if security.jail.enforce_statfs is set to 2. If we don't return statistics about requested file systems, system tools may not work correctly or at all. Approved by: re (scottl)	2005-06-23 22:13:29 +00:00
John Baldwin	57dbcb11db	Fix a typo in a comment. Approved by: re (scottl)	2005-06-23 21:55:43 +00:00
Mike Silbersack	121f050976	Change the mbuf, mbuf cluster, and mbuf packet allocation routines so that the UMA "trash" allocator is used - this ensures that any writes to a freed mbuf should provoke a panic. Only enabled under INVARIANTS, of course. Approved by: re (scottl)	2005-06-23 04:33:39 +00:00
Pawel Jakub Dawidek	b0d9aedd28	Add missing unlock. Pointy hat to: pjd Approved by: re (dwhite)	2005-06-21 21:17:02 +00:00
John Baldwin	943928c905	Simplify the storming logic and remove a variable as a result. Approved by: re (dwhite)	2005-06-20 19:32:23 +00:00
Garance A Drosehn	bd3aace7e4	Fix a panic which could occur parsing #!-lines in a shell-script. If the #!-line had multiple whitespace characters after the interpreter name, and it did not have any options, then the code would do nasty things trying to process a (non-existent) option-string which "ended before it began"... Submitted by: Morten Johansen Approved by: re (dwhite)	2005-06-19 02:21:03 +00:00
Jeff Roberson	b770ff6eb2	- Try to catch the wrong bufobj panics a little earlier. I believe they are actually caused by a buf with both VNCLEAN and VNDIRTY set. In the traces it is clear that the buf is removed from the dirty queue while it is actually on the clean queue which leaves the tail pointer set. Assert that both flags are not set in buf_vlist_add and buf_vlist_remove. Sponsored by: Isilon Systems, Inc. Approved by: re (blanket vfs)	2005-06-18 18:17:03 +00:00
Jeff Roberson	32b6dcd8a4	- Fix a leaked reference to a vnode via v_dd. We rely on cache_purge() and cache_zap() to clear the v_dd pointers when a directory vnode is forcibly discarded. For this to work, all vnodes with v_dd pointers to a directory must also have name cache entries linked via v_cache_dst to that dvp otherwise we could not find them at cache_purge() time. The following code snippet could break this guarantee by unlinking a directory before fetching its dotdot. The dotdot lookup would initialize the v_dd field of the unlinked directory which could never be cleared. To fix this we don't initialize v_dd for orphaned vnodes. printf("rmdir: %d\n", rmdir("../foo")); /* foo is cwd */ printf("chdir: %d\n", chdir("..")); printf("%s\n", getwd(NULL)); Sponsored by: Isilon Systems, Inc. Discovered by: kkenn Approved by: re (blanket vfs)	2005-06-17 01:05:13 +00:00
Ken Smith	c0cac8dc20	Remove a variable that became unused as a result of changes made in v1.139. This was only exposed if MALLOC_PROFILE was defined. Submitted by: Gary Jennejohn Pointy hat: rwatson Approved by: re (scottl)	2005-06-16 16:01:46 +00:00
Jeff Roberson	114a1006a8	- Change holdcnt use around vnode recycling. We now always keep a holdcnt ref while we're calling vgone(). This prevents transient refs from re-adding us to the free list. Previously, a vfree() triggered via vinvalbuf() getting rid of all of a vnode's pages could place a partially destructed vnode on the free list where vtryrecycle() could find it. The first call to vtryrecycle would hang up on the vnode lock, but when it failed it would place a now dead vnode onto the free list, and another call to vtryrecycle() would free an already free vnode. There were many complications of having a zero ref count while freeing which can now go away. - Change vdropl() to release the interlock before returning. All callers now respect this, so vdropl() directly frees VI_DOOMED vnodes once the last ref is dropped. This means that we'll never have VI_DOOMED vnodes on the free list. - Separate v_incr_usecount() into v_incr_usecount(), v_decr_usecount() and v_decr_useonly(). The incr/decr split is so that incr usecount can return with the interlock still held while decr drops the interlock so it can call vdropl() which will potentially free the vnode. The calling function can't drop the lock of an already free'd node. v_decr_useonly() drops a usecount without dropping the hold count. This is done so the usecount reaches zero in vput() before we recycle, however the holdcount is still 1 which prevents any new references from placing the vnode back on the free list. - Fix vnlrureclaim() to vhold the vnode since it doesn't do a vget(). We wouldn't want vnlrureclaim() to bump the usecount since this has different semantics. Also change vnlrureclaim() to do a NOWAIT on the vn_lock. When this function runs we're usually in a desperate situation and we wouldn't want to wait for any specific vnode to be released. - Fix a bunch of misc comments to reflect the new behavior. - Add vhold() and vdrop() to vflush() for the same reasons that we do in vlrureclaim(). Previously we held no reference and a vnode could have been freed while we were waiting on the lock. - Get rid of vlruvp() and vfreehead(). Neither is used. vlruvp() should really be rethought before it's reintroduced. - vgonel() always returns with the vnode locked now and never puts the vnode back on a free list. The vnode will be freed as soon as the last reference is released. Sponsored by: Isilon Systems, Inc. Debugging help from: Kris Kennaway, Peter Holm Approved by: re (blanket vfs)	2005-06-16 04:41:42 +00:00
Jeff Roberson	bdcd9f26b0	- Fix insertions of bios which represent data earlier than anything else in the queue. The insertion sort assumed this had already been taken care of. Spotted by: Antoine Brodin Approved by: re (scottl)	2005-06-15 23:32:07 +00:00
Jeff Roberson	7a06fe49dc	- Add and enhance asserts related to the wrong bufobj panic. Sponsored by: Isilon Systems, Inc. Approved by: re (blanket vfs)	2005-06-14 20:32:27 +00:00
Jeff Roberson	12c2dcde40	- In reassignbuf() add many asserts to validate the head and tail pointers of the clean and dirty lists. This is in an attempt to catch the wrong bufobj problem sooner. - In vgonel() don't acquire an extra reference in the active case, the vnode lock and VI_DOOMED protect us from recursively cleaning. - Also in vgonel() clean up some stale comments. Sponsored by: Isilon Systems, Inc. Approved by: re (blanket vfs)	2005-06-14 20:31:53 +00:00
Jeff Roberson	dbb3ec5ce3	- Remove vnode lock asserts at the end of vfs syscalls. These asserts were used to ensure that we weren't exiting the syscall with a lock still held. This wasn't safe, however, because we'd already executed a vput() and on a loaded system the vnode may have been free'd by the time we assert. This functionality is also handled by the td_locks assert in userret, which doesn't tell you what the syscall was, but will at least panic before you deadlock. Sponsored by: Isilon Systems, Inc. Discovered by: Peter Holm Approved by: re (blanket vfs)	2005-06-14 01:14:40 +00:00
Jeff Roberson	b930d85380	- Don't make vgonel() globally visible, we want to change its prototype anyway and it's not used outside of vfs_subr.c. - Change vgonel() to accept a parameter which determines whether or not we'll put the vnode on the free list when we're done. - Use the new vgonel() parameter rather than VI_DOOMED to signal our intentions in vtryrecycle(). - In vgonel() return if VI_DOOMED is already set, this vnode has already been reclaimed. Sponsored by: Isilon Systems, Inc.	2005-06-13 06:26:55 +00:00
Jeff Roberson	6bd8103d33	- Clear v_dd in cache_zap() instead of cache_purge() as cache_purge() may not be called in all cases where we free the cnp. Sponsored by: Isilon Systems, Inc.	2005-06-13 05:59:59 +00:00
Jeff Roberson	d598b04d44	- It has long been my suspicion that we don't actually need a loop in vn_lock(). Add an assert that will help me gain more confidence that this is correct. Sponsored by: Isilon Systems, Inc.	2005-06-13 00:47:29 +00:00
Jeff Roberson	d2ad9baac0	- Add KTR_VFS events to vdestroy, vtruncbuf, vinvalbuf, vfreehead. Sponsored by: Isilon Systems, Inc.	2005-06-13 00:46:37 +00:00
Jeff Roberson	eff2d12635	- Add KTR_VFS messages for various name cache related events. Sponsored by: Isilon Systems, Inc.	2005-06-13 00:46:03 +00:00
Jeff Roberson	748c92fbad	- Split one KASSERT in bremfree() into two to aid in debugging. Sponsored by: Isilon Systems, Inc.	2005-06-13 00:45:05 +00:00
Jeff Roberson	f19f6869cf	- Dramatically simplify bioqdisksort(). We no longer do ordered bios so most of the code to deal with them has been dead for some time. Simplify the code by doing an insert sort hinted by the current head position. Met with apathy by: arch@	2005-06-12 22:32:29 +00:00
Pawel Jakub Dawidek	65ac438c8f	Do not allocate memory while holding a mutex. I introduce a very small race here (some file system can be mounted or unmounted between 'count' calculation and file systems list creation), but it is harmless. Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/ Reported by: Peter Holm <peter@holm.cc>	2005-06-12 07:03:23 +00:00
Pawel Jakub Dawidek	3a996d6e91	Do not allocate memory based on not-checked argument from userland. It can be used to panic the kernel by giving too big value. Fix it by moving allocation and size verification into kern_getfsstat(). This even simplifies kern_getfsstat() consumers, but destroys symmetry - memory is allocated inside kern_getfsstat(), but has to be freed by the caller. Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/ Reported by: Peter Holm <peter@holm.cc>	2005-06-11 14:58:20 +00:00
Maxim Konovalov	922a5d9c2b	o setsockopt(2) cannot remove accept filter. [1] o getsockopt(SO_ACCEPTFILTER) always returns success on listen socket even if we didn't install accept filter on the socket. o Fix these bugs and add regression tests for them. Submitted by: Igor Sysoev [1] Reviewed by: alfred MFC after: 2 weeks	2005-06-11 11:59:48 +00:00
Jeff Roberson	d6dbf760a6	- Assert that we're not in the name cache anymore in vdestroy(). Sponsored by: Isilon Systems, Inc.	2005-06-11 08:48:09 +00:00
Jeff Roberson	1b2da2d0fa	- Assert that we're not adding a doomed vnode to the name cache. Sponsored by: Isilon Systems, Inc.	2005-06-11 08:47:30 +00:00
Jeff Roberson	9aa0eba464	- Add KTR_VFS tracing to track the life of vnodes. Eventually KTR_VFS events could be added to cover other interesting details. - Add some VNASSERTs to discover places where we access vnodes after they have been uma_zfree'd before we try to free them again. - Add a few more VNASSERTs to vdestroy() to be certain that the vnode is really unused. Sponsored by: Isilon Systems, Inc.	2005-06-11 01:16:46 +00:00
Brian Feldman	cc3149b1ea	Fix a serious deadlock with the NFS client. Given a large enough atomic write request, it can fill the buffer cache with the entirety of that write in order to handle retries. However, it never drops the vnode lock, or else it wouldn't be atomic, so it ends up waiting indefinitely for more buf memory that cannot be gotten as it has it all, and it waits in an uncancellable state. To fix this, hibufspace is exported and scaled to a reasonable fraction. This is used as the limit of how much of an atomic write request by the NFS client will be handled asynchronously. If the request is larger than this, it will be turned into a synchronous request which won't deadlock the system. It's possible this value is far off from what is required by some, so it shall be tunable as soon as mount_nfs(8) learns of the new field. The slowdown between an asynchronous and a synchronous write on NFS appears to be on the order of 2x-4x. General nod by: gad MFC after: 2 weeks More testing: wes PR: kern/79208	2005-06-10 23:50:41 +00:00
Jeff Roberson	37ee2d8dd4	- Add curthread to the state that ktr is saving. The extra information is well worth the bloat. - Change the formatting of 'show ktr' slightly to accommodate the additional field. Remove a tab from the verbose output and place the actual trace data after a : so it is easier to understand which part is the event and which is part of the record.	2005-06-10 23:21:29 +00:00
Joseph Koshy	8c61b21927	Fix typo. Reviewed by: rwatson, sam	2005-06-10 18:06:59 +00:00
Brooks Davis	fc74a9f93a	Stop embedding struct ifnet at the top of driver softcs. Instead the struct ifnet or the layer 2 common structure it was embedded in have been replaced with a struct ifnet pointer to be filled by a call to the new function, if_alloc(). The layer 2 common structure is also allocated via if_alloc() based on the interface type. It is hung off the new struct ifnet member, if_l2com. This change removes the size of these structures from the kernel ABI and will allow us to better manage them as interfaces come and go. Other changes of note: - Struct arpcom is no longer referenced in normal interface code. Instead the Ethernet address is accessed via the IFP2ENADDR() macro. To enforce this ac_enaddr has been renamed to _ac_enaddr. - The second argument to ether_ifattach is now always the mac address from driver private storage rather than sometimes being ac_enaddr. Reviewed by: sobomax, sam	2005-06-10 16:49:24 +00:00
Stephan Uphoff	3ea6bbc59a	Restore preemption of idle threads. Submitted by: jhb	2005-06-10 03:00:29 +00:00
Suleiman Souhlal	679985d03a	Allow EVFILT_VNODE events to work on every filesystem type, not just UFS by: - Making the pre and post hooks for the VOP functions work even when DEBUG_VFS_LOCKS is not defined. - Moving the KNOTE activations into the corresponding VOP hooks. - Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct mount that permits filesystems to disable the new behavior. - Creating a default VOP_KQFILTER function: vfs_kqfilter() My benchmarks have not revealed any performance degradation. Reviewed by: jeff, bde Approved by: rwatson, jmg (kqueue changes), grehan (mentor)	2005-06-09 20:20:31 +00:00
Scott Long	8bde93598a	Drat! Committed from the wrong branch. Restore HEAD to its previous goodness.	2005-06-09 19:59:09 +00:00
Scott Long	76b472dbda	Back out 1.68.2.26. It was a mis-guided change that was already backed out of HEAD and should not have been MFC'd. This will restore UDP socket functionality, which will correct the recent NFS problems. Submitted by: rwatson	2005-06-09 19:56:38 +00:00
Joseph Koshy	f263522a45	MFP4: - Implement sampling modes and logging support in hwpmc(4). - Separate MI and MD parts of hwpmc(4) and allow sharing of PMC implementations across different architectures. Add support for P4 (EMT64) style PMCs to the amd64 code. - New pmcstat(8) options: -E (exit time counts) -W (counts every context switch), -R (print log file). - pmc(3) API changes, improve our ability to keep ABI compatibility in the future. Add more 'alias' names for commonly used events. - bug fixes & documentation.	2005-06-09 19:45:09 +00:00
Stephan Uphoff	a3f2d84279	Lots of whitespace cleanup. Fix for broken if condition. Submitted by: nate@	2005-06-09 19:43:08 +00:00
Pawel Jakub Dawidek	820a0de9a9	Rename sysctl security.jail.getfsstatroot_only to security.jail.enforce_statfs and extend its functionality. Value and policy: 0 - show all mount-points without any restrictions; 1 - show only mount-points below jail's chroot and show only part of the mount-point's path (if jail's chroot directory is /jails/foo and mount-point is /jails/foo/usr/home only /usr/home will be shown); 2 - show only mount-point where jail's chroot directory is placed. Default value is 2. Discussed with: rwatson	2005-06-09 18:49:19 +00:00
Pawel Jakub Dawidek	4eb7c9f6c9	Remove process information leak from inside a jail, when security.bsd.see_other_uids is set to 0, etc. One can check if invisible process is active, by doing: # ktrace -p <pid> If ktrace returns 'Operation not permitted' the process is alive and if returns 'No such process' there is no such process. MFC after: 1 week	2005-06-09 18:33:21 +00:00
Stephan Uphoff	f3a0f87396	Fix some race conditions for pinned threads that may cause them to run on the wrong CPU. Add IPI support for preempting a thread on another CPU. MFC after: 3 weeks	2005-06-09 18:26:31 +00:00
Pawel Jakub Dawidek	13a82b9623	Avoid code duplication in several places by introducing universal kern_getfsstat() function. Obtained from: jhb	2005-06-09 17:44:46 +00:00
Warner Losh	139f16505d	Simplify the code a bit after the bzero().	2005-06-09 05:50:01 +00:00
Jeff Roberson	a3d239bc29	- My sub-par public school education has been exposed. s/sentinal/sentinel/ Noticed by: Emil Mikulic	2005-06-09 04:40:20 +00:00
Garance A Drosehn	386ea9321d	Remove the previous parsing-logic for arguments on the '#!'-line of shell scripts. As far as I know, no one has needed the '#!#<' kludge to get at the behavior implemented by the historical parsing.	2005-06-09 00:27:02 +00:00
Jeff Roberson	9e879a5ee0	- Under heavy IO load the buf daemon can run for many hundreds of milliseconds due to what is essentially n^2 algorithmic complexity. This change makes the algorithm N*2 instead. This heavy processing manifested itself as skipping in audio and video playback due to the long scheduling latencies and contention on giant by pcm. - flushbufqueues() is now responsible for flushing multiple buffers rather than one at a time. This allows us to save our progress in the list by using a sentinel. We must do the numdirtywakeup() and waitrunningbufspace() here now rather than in buf_daemon(). - Also add a uio_yield() after we have processed the list once for bufs without deps and again for bufs with deps. This is to release Giant and allow any other giant locked code to proceed. Tested by: Many users on current@ Revealed by: schedgraph traces sent by Emil Mikulic & Anthony Ginepro	2005-06-08 20:26:05 +00:00
Craig Rodrigues	1209e08faf	Initialize uio_iovcnt to 1 in extattr_list_vp() and extattr_get_vp() PR: kern/79357 Approved by: rwatson	2005-06-08 13:22:10 +00:00
Robert Watson	e2f7a83d6b	In sem_forkhook(), don't attempt to generate a copy of the process semaphore list on fork() if the process doesn't actually have references to any semaphores. This avoids extra work, as well as potentially asking to allocate storage for 0 references. Found by: avatar MFC after: 1 week	2005-06-08 07:29:22 +00:00
Jeff Roberson	fae89dce3e	- Clear OWEINACT prior to calling VOP_INACTIVE to remove the possibility of a vget causing another call to INACTIVE before we're finished.	2005-06-07 22:05:32 +00:00
Alan Cox	b490cc72b2	In lio_listio(2) change jobref from an int to a long so that lio_listio(LIO_WAIT, ...) works correctly on 64-bit architectures. Reviewed by: tegge	2005-06-07 05:28:21 +00:00
Robert Watson	3831e7d7f5	Gratuitous renaming of four System V Semaphore MAC Framework entry points to convert _sema() to _sem() for consistency purposes with respect to the other semaphore-related entry points: mac_init_sysv_sema() -> mac_init_sysv_sem() mac_destroy_sysv_sema() -> mac_destroy_sysv_sem() mac_create_sysv_sema() -> mac_create_sysv_sem() mac_cleanup_sysv_sema() -> mac_cleanup_sysv_sem() Congruent changes are made to the policy interface to support this. Obtained from: TrustedBSD Project Sponsored by: SPAWAR, SPARTA	2005-06-07 05:03:28 +00:00
Jeff Roberson	6680bbd529	- Fix the case where we're not preempting but there is already a newtd as this happens via thread_switchout(). I don't particularly like the structure of the code here. We twice call out to thread code when a thread is voluntarily switching. Once to thread_switchout() and once to slot_fill(), while sched_4BSD does even more work which is redundant to select another thread to use our remaining slice. This should be simplified in the future, but for now I'm only going to fix the bug not the bad design.	2005-06-07 02:59:16 +00:00
Doug White	4a30c508d1	Make "show msgbuf" use the pager instead of blasting the whole thing out. MFC after: 3 days	2005-06-06 22:18:32 +00:00
David Xu	ec8297bda1	Fix a bug relevant to debugging: a masked signal unexpectedly interrupts a sleeping thread when the process is being debugged. PR: GNU/77818 Tested by: Sean C. Farley <sean-freebsd at farley org>	2005-06-06 05:13:10 +00:00
Andrew Gallatin	92dd256bd4	Allow sends sent from non page-aligned userspace addresses to be considered for zero-copy sends. Reviewed by: alc Submitted by: Romer Gil at Rice University	2005-06-05 17:13:23 +00:00
Alan Cox	67b95a95eb	Eliminate an unused field from struct aio_liojob.	2005-06-05 05:41:48 +00:00
Marius Strobl	fce21e7e25	After some input from bde@ and rereading the datasheet use a MTX_SPIN mutex instead of a MTX_DEF one in order to defer preemption while reading the date and time registers. If we don't manage to read them within the time slot where we are guaranteed that no updates occur we might actually read them during an update in which case the output is undefined.	2005-06-04 23:24:50 +00:00
Alan Cox	bbe7bbdfee	Eliminate the original method of requesting notification of aio_read(2) and aio_write(2) completion through kevent(2). This method does not work on 64-bit architectures. It was deprecated in FreeBSD 4.4. See revisions 1.87 and 1.70.2.7. Change aio_physwakeup() to call psignal(9) directly rather than indirectly through a timeout(9). Discussed with: bde Correct a bug introduced in revision 1.65 that could result in premature delivery of a signal if an lio_listio(2) consisted of a mixture of direct/raw and queued I/O operations. Observed by: tegge Eliminate a field from struct kaioinfo that is now unused. Reviewed by: tegge	2005-06-04 19:16:33 +00:00
Jeff Roberson	9fe02f7e16	- It's 2005 already, I've been working on this for three years.	2005-06-04 09:24:15 +00:00
Jeff Roberson	21381d1b9e	- Don't SLOT_USE() in the preempt case, sched_add() has already taken the slot for us. Previously, we would take two slots on every preempt, and setrunqueue() would fix it up for us in the non threaded case. The threaded case was simply broken. - Clean up flags, prototypes, comments.	2005-06-04 09:23:28 +00:00
Paul Saab	efe5becafa	Wrap copyin/copyout for kevent so the 32bit wrapper does not have to malloc nchanges * sizeof(struct kevent) AND/OR nevents * sizeof(struct kevent) on every syscall. Glanced at by: peter, jmg Obtained from: Yahoo! MFC after: 2 weeks	2005-06-03 23:15:01 +00:00
Alan Cox	3769f562e2	Synchronize access to the per process aiocb lists in many of the functions.	2005-06-03 05:27:20 +00:00
Alan Cox	e293dc860c	In aio_waitcomplete() correct two cases of using an aiocb after freeing it.	2005-06-02 23:14:38 +00:00
Alan Cox	f0e5132053	Giant is no longer required in kern_setrlimit(); remove its acquisition and release. Reviewed by: jhb	2005-06-01 17:52:51 +00:00
Ken Smith	6341095e0d	This patch addresses a standards violation issue. The standards say a file's access time should be updated when it gets executed. A while ago the mechanism used to exec was changed to use a more mmap based mechanism and this behavior was broken as a side-effect of that. A new vnode flag is added that gets set when the file gets executed, and the VOP_SETATTR() vnode operation gets called. The underlying filesystem is expected to handle it based on its own semantics, some filesystems don't support access time at all. Those that do should handle it in a way that does not block, does not generate I/O if possible, etc. In particular vn_start_write() has not been called. The UFS code handles it the same way as it would normally handle the access time if a file was read - the IN_ACCESS flag gets set in the inode but no other action happens at this point. The actual time update will happen later during a sync (which handles all the necessary locking). Got me into this: cperciva Discussed with: a lot with bde, a little with kan Showed patches to: phk, jeffr, standards@, arch@ Minor discussion on: arch@	2005-05-31 19:39:52 +00:00
Alan Cox	3148c2c96a	Synchronize access to aio_freeproc with a mutex. Eliminate related spl calls. Reduce the scope of Giant in aio_daemon().	2005-05-30 22:26:34 +00:00
Alan Cox	3999ebe3b6	Use the proc mtx to prevent simultaneous changes to p_aioinfo.	2005-05-30 19:33:33 +00:00
Alan Cox	8285135020	Eliminate unnecessary calls to wakeup(); no one sleeps on &aio_freeproc. Eliminate an unused flag, AIOP_SCHED; it's cleared but never set.	2005-05-30 18:02:00 +00:00
Robert Watson	3984b2328c	Rebuild generated system call definition files following the addition of the audit event field to the syscalls.master file format. Submitted by: wsalamon Obtained from: TrustedBSD Project	2005-05-30 15:20:21 +00:00
Robert Watson	f3596e3370	Introduce a new field in the syscalls.master file format to hold the audit event identifier associated with each system call, which will be stored by makesyscalls.sh in the sy_auevent field of struct sysent. For now, default the audit identifier on all system calls to AUE_NULL, but in the near future, other BSM event identifiers will be used. The mapping of system calls to event identifiers is many:one due to multiple system calls that map to the same end functionality across compatibility wrappers, ABI wrappers, etc. Submitted by: wsalamon Obtained from: TrustedBSD Project	2005-05-30 15:09:18 +00:00

8853 Commits