freebsd-dev

Author	SHA1	Message	Date
David Xu	823acd70b6	Regen for sigqueue syscall.	2005-10-14 12:56:28 +00:00
David Xu	9104847f21	1. Change the prototypes of trapsignal and sendsig to take a ksiginfo_t *. Most changes in MD code are trivial: before this change, trapsignal and sendsig took discrete parameters; now they use member fields of the ksiginfo_t structure. For sendsig, this change allows us to pass the POSIX realtime signal value to user code. 2. Remove cpu_thread_siginfo; it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread. 3. Add p_sigqueue to the proc structure to hold shared signals that were blocked by all threads in the proc. 4. Add td_sigqueue to the thread structure to hold all signals delivered to the thread. 5. i386 and amd64 now return POSIX-standard si_code values; other arches will be fixed. 6. In this sigqueue implementation, the pending signal set is kept as before, and an extra siginfo list holds additional siginfo_t data for signals. Kernel code calling psignal() behaves as before and will not fail even under memory pressure. The only exception is deleting a signal: call sigqueue_delete to remove the signal from the sigqueue rather than using SIGDELSET. Currently no kernel code delivers a signal with additional data, so the kernel should be as stable as before. A ksiginfo can carry more information, for example, permission for the signal to be delivered with its siginfo data thrown away if there is not enough memory. SIGKILL and SIGSTOP have a fast path in sigqueue_add because they cannot be caught or masked. The sigqueue() syscall allows user code to queue a signal to a target process; if resources are unavailable, EAGAIN is returned, as the specification requires. Just before a thread exits, its signal queue memory is freed by sigqueue_flush. Currently, all signals are allowed to be queued, not only realtime signals. Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64	2005-10-14 12:43:47 +00:00
Doug Ambrisko	db43cd0417	Fix the tinderbox build by removing incomplete/bad spl usage. Proper Giant-free locking is required for aio. Pointed out by: imp	2005-10-12 22:33:22 +00:00
Doug Ambrisko	69cd28dacb	Add kqueue support to LIO event notification and fix how it handled notifications when LIO operations completed. These were the problems with LIO event complete notification: - Move all LIO/AIO event notification into one general function so we don't have bugs in different data paths. This unification got rid of several notification bugs, one of which could send a SIGILL to the process if kqueue was used. - Change the LIO event accounting to count all AIO requests that could have been split across the fast path and daemon mode. The prior accounting only kept track of AIO ops in that mode and not the entire list of operations. This could cause a bogus LIO event complete notification to occur when all of the fast path AIO ops had completed but the AIO ops that ended up queued for the daemon had not. Suggestions from: alc	2005-10-12 17:51:31 +00:00
Diomidis Spinellis	9f5c1d1955	Move execve's access time update functionality into a new vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap(). Reviewed by: bde MFC after: 2 weeks	2005-10-12 06:56:00 +00:00
Tor Egge	8272da3106	Also release a clean buffer with the wrong size and no dependencies in the non-VMIO case.	2005-10-09 22:41:25 +00:00
Marcel Moolenaar	125fbd3cdc	Add parse_uuid(), which creates a binary representation of a UUID from a string representation.	2005-10-07 13:37:10 +00:00
Poul-Henning Kamp	0694506637	Eliminate __RMAN_RESOURCE_VISIBLE hack entirely by moving the struct resource_ to subr_rman.c where it belongs.	2005-10-06 21:49:31 +00:00
Gleb Smirnoff	f0796cd26c	- Don't pollute opt_global.h with DEVICE_POLLING; introduce opt_device_polling.h instead. - Include opt_device_polling.h in the appropriate files. - In files that can be compiled as loadable modules, wrap the include in HAVE_KERNEL_OPTION_HEADERS. Reviewed by: bde	2005-10-05 10:09:17 +00:00
Warner Losh	2f624c21c5	When data passed into devctl_notify is NULL, don't print (null). Instead don't print anything at all. # this fixes a problem that I noticed with devd.pipe not terminating lines # with \n correctly sometimes.	2005-10-04 22:25:14 +00:00
Robert Watson	7723d5ed12	Re-order MAC and DAC checks in shmget() in order to give precedence to the MAC result, as well as avoid losing the DAC check result when MAC is enabled. MFC after: 3 days Reported by: Patrick LeBlanc <Patrick dot LeBlanc at sparta dot com>	2005-10-04 16:40:20 +00:00
Roman Kurakin	826cf005ed	Use FILEDESC_UNLOCK(fdp) after FILE_UNLOCK(p), not before to avoid LOR. Slightly discussed on current@. LOR #055 MFC after: 14 days	2005-10-04 16:27:54 +00:00
Christian S.J. Peron	9eea3d85cc	Standard Giant push-down operations for the Mandatory Access Control (MAC) framework. This makes Giant protection around MAC operations which interact with VFS conditional, based on the MPSAFE status of the file system. This affects the following syscalls: o __mac_get_fd o __mac_get_file o __mac_get_link o __mac_set_fd o __mac_set_file o __mac_set_link -Drop Giant altogether in __mac_set_proc because the mac_cred_mmapped_drop_perms_recurse routine no longer requires it. -Move conditional Giant acquisitions to after label allocation routines. -Move the conditional release of Giant to before label de-allocation routines. Discussed with: rwatson	2005-10-04 14:32:58 +00:00
Don Lewis	34ea500bea	Add missing word to comment.	2005-10-04 04:02:33 +00:00
Gleb Smirnoff	e113edf30a	o Move a lot of parameter checking from netisr_poll() to dedicated sysctl handlers. Protect manipulations with poll_mtx. The affected sysctls are: - kern.polling.burst_max - kern.polling.each_burst - kern.polling.user_frac - kern.polling.reg_frac o Use CTLFLAG_RD on MIBs that are supposed to be read-only. o u_int32t -> uint32_t o Remove unneeded locking from poll_switch().	2005-10-03 14:15:26 +00:00
Colin Percival	33812c066d	If sufficiently bad things happen during a call to kern_execve(), it is possible for do_execve() to call exit1() rather than returning. As a result, the sequence "allocate memory; call kern_execve; free memory" can end up leaking memory. This commit documents this astonishing behaviour and adds a call to exec_free_args() before the exit1() call in do_execve(). Since all the users of kern_execve() in the tree use exec_free_args() to free the command-line arguments after kern_execve() returns, this should be safe, and it fixes the memory leak which can otherwise occur. Submitted by: Peter Holm MFC after: 3 days Security: Local denial of service	2005-10-03 12:49:54 +00:00
Hajimu UMEMOTO	56e5a87a55	make saved cpu level stackable.	2005-10-03 06:57:29 +00:00
Don Lewis	5032ff8197	Always wire the sysctl output buffer in sysctl_kern_proc() before calling sysctl_out_proc(). -- fix from jhb Move the code in fill_kinfo_thread() that gathers data from struct proc into the new function fill_kinfo_proc_only(). Change all callers of fill_kinfo_thread() to call both fill_kinfo_proc_only() and fill_kinfo_thread(). When gathering data from a multi-threaded process, fill_kinfo_proc_only() only needs to be called once. Grab sched_lock before accessing the process thread list or calling fill_kinfo_thread(). PR: kern/84684 MFC after: 3 days	2005-10-02 23:27:56 +00:00
Robert Watson	c30bf5c317	Include kdb.h so that kdb_active is declared regardless of KDB being included in the kernel. MFC after: 0 days	2005-10-02 10:03:51 +00:00
Poul-Henning Kamp	7bbb3a2690	Make sure the clone lists are sorted in the right order. Explosion triggered by: pjd MFC: 3 days	2005-10-01 19:21:03 +00:00
Gleb Smirnoff	4092996774	Big polling(4) cleanup. o Axe poll in trap. o Axe the IFF_POLLING flag from if_flags. o Rework revision 1.21 (Giant removal) in such a way that poll_mtx is not dropped during the call to the polling handler. This fixes a problem with idle polling. o Make registration and deregistration from polling take effect immediately, instead of on the next tick/interrupt. o Obsolete kern.polling.enable. Polling is turned on/off with ifconfig. Detailed kern_poll.c changes: - Remove polling handler flags, introduced in 1.21. They are not needed now. - Forget and do not check if_flags, if_capenable and if_drv_flags. - Call all registered polling handlers unconditionally. - Do not drop poll_mtx when entering polling handlers. - In ether_poll(), NET_LOCK_GIANT prior to locking poll_mtx. - In netisr_poll(), axe the block where polling code asks drivers to unregister. - In netisr_poll() and ether_poll(), always do polling if any handlers are present. - In ether_poll_[de]register(), remove a lot of error-hiding code. Assert that arguments are correct instead. - In ether_poll_[de]register(), use standard return values in case of error or success. - Introduce poll_switch(), a sysctl handler for kern.polling.enable. poll_switch() goes through the interface list and enables/disables polling. A message that kern.polling.enable is deprecated is printed. Detailed driver changes: - On attach, the driver announces IFCAP_POLLING in if_capabilities, but not in if_capenable. - On detach, the driver calls ether_poll_deregister() if polling is enabled. - In the polling handler, the driver obtains its lock and checks the IFF_DRV_RUNNING flag. If it is not set, the driver unlocks and returns. - In the ioctl handler, the driver checks whether the IFCAP_POLLING flag is requested to be set or cleared. The driver first calls ether_poll_[de]register(), then obtains the driver lock and [dis/en]ables interrupts. - In the interrupt handler, the driver checks for the IFCAP_POLLING flag in if_capenable. If present, it returns. This is important to protect from spurious interrupts. Reviewed by: ru, sam, jhb	2005-10-01 18:56:19 +00:00
Don Lewis	5997cae9a4	Copy new process argument list in do_execve() before grabbing PROC_LOCK to avoid touching pageable memory while holding a mutex. Simplify argument list replacement logic. PR: kern/84935 Submitted by: "Antoine Pelisse" apelisse AT gmail.com (in a different form) MFC after: 3 days	2005-10-01 08:33:56 +00:00
Don Lewis	bd3c2d867d	Un-staticize waitrunningbufspace() and call it before returning from ffs_copyonwrite() if any async writes were launched. Restore the thread's previous TDP_NORUNNINGBUF state before returning from ffs_copyonwrite().	2005-09-30 18:07:41 +00:00
David Xu	763a429571	Fix a LOR of sleep and sched_lock by using a timeout wait when a process reaches the maximum number of threads. MFC after: 3 days	2005-09-30 06:09:41 +00:00
Don Lewis	6c8b634f1d	Un-staticize runningbufwakeup() and staticize updateproc. Add a new private thread flag to indicate that the thread should not sleep if runningbufspace is too large. Set this flag on the bufdaemon and syncer threads so that they skip the waitrunningbufspace() call in bufwrite() rather than checking the proc pointer vs. the known proc pointers for these two threads. A way of preventing these threads from being starved for I/O but still placing limits on their outstanding I/O would be desirable. Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from blocking on the runningbufspace check while holding snaplk. This prevents snaplk from being held for an arbitrarily long period of time if runningbufspace is high and greatly reduces the contention for snaplk. The disadvantage is that ffs_copyonwrite() can start a large amount of I/O if there are a large number of snapshots, which could cause a deadlock in other parts of the code. Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace before attempting to grab snaplk so that I/O requests waiting on snaplk are not counted in runningbufspace as being in-progress. Increment runningbufspace again before actually launching the original I/O request. Prior to the above two changes, the system could deadlock if enough I/O requests were blocked by snaplk to prevent runningbufspace from falling below lorunningspace and one of the bawrite() calls in ffs_copyonwrite() blocked in waitrunningbufspace() while holding snaplk. See <http://www.holm.cc/stress/log/cons143.html>	2005-09-30 01:30:01 +00:00
John Baldwin	b65089ccb5	Trim a couple of unneeded includes.	2005-09-29 19:13:52 +00:00
Peter Edwards	d41c4674c2	Close a race in biodone(), whereby the bio_done field of the passed bio may have been freed and reassigned by the wakeup before being tested after releasing the bdonelock. There's a non-zero chance this is the cause of a few of the crashes knocking around with biodone() sitting in the stack backtrace. Reviewed By: phk@	2005-09-29 10:37:20 +00:00
Poul-Henning Kamp	64fd97df54	puc(4) does strange things to resources in order to fool the subdrivers to hook up. It should probably be rewritten to implement a simple bus to which the sub drivers attach using some kind of hint. Until then, provide a couple of crutch functions with big warning signs so it can survive the recent changes to struct resource.	2005-09-28 18:06:25 +00:00
Robert Watson	5f419982c2	Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57, osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60, svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81, svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55, svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10, ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58, unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133: Now that Giant is acquired in uprintf() and tprintf(), the caller no longer needs to acquire Giant unless it also holds another mutex that would generate a lock order reversal when calling into these functions. Specifically not backed out is the acquisition of Giant in nfs_socket.c and rpcclnt.c, where local mutexes are held and would otherwise violate the lock order with Giant. This aligns this code more with the eventual locking of ttys. Suggested by: bde	2005-09-28 07:03:03 +00:00
Christian S.J. Peron	453f7d5369	Push Giant down in jails. Pass the MPSAFE flag to NDINIT, and keep track of whether or not Giant was picked up by the filesystem. Add VFS_LOCK_GIANT macros around vrele, as it's possible that this can call into the VOP_INACTIVE filesystem-specific code. Also, while we are here, remove the Giant assertion from the sysctl handler; we do not actually require Giant here, so we shouldn't assert it. Doing so will just complicate things when Giant is removed from the sysctl framework.	2005-09-28 00:30:56 +00:00
Robert Watson	667285c4e3	If KDB_STOP_NMI is compiled into the kernel, default debug.kdb.stop_cpus_with_nmi to 1 rather than 0. MFC after: 3 days	2005-09-27 21:12:05 +00:00
Robert Watson	2b59d50cfb	In lockstatus(), don't lock and unlock the interlock when testing the sleep lock status while kdb_active, or we risk contending with the mutex on another CPU, resulting in a panic when using "show lockedvnods" while in DDB. MFC after: 3 days Reviewed by: jhb Reported by: kris	2005-09-27 21:02:59 +00:00
Robert Watson	32a6bd9510	No longer maintain mbstat statistics for the mbuf allocator; UMA statistics and libmemstat(3) are now used to track mbuf statistics. MFC after: 1 month	2005-09-27 20:28:43 +00:00
John Baldwin	7e9e371f2d	Use the refcount API to manage the reference count for user credentials rather than using pool mutexes. Tested on: i386, alpha, sparc64	2005-09-27 18:09:42 +00:00
John Baldwin	b2149bde1f	Use the reference count API to manage the reference counts for process limit structures rather than using pool mutexes to protect the reference counts. Tested on: i386, alpha, sparc64	2005-09-27 18:07:05 +00:00
John Baldwin	55b4a5ae0d	Use the refcount API to implement reference counts on process argument structures rather than using a global mutex to protect the reference counts. Tested on: i386, alpha, sparc64	2005-09-27 18:03:15 +00:00
Christian S.J. Peron	6acd4b6189	Update the "created from" section to reflect the most recent version of syscalls.master Requested by: jhb	2005-09-27 14:36:59 +00:00
Christian S.J. Peron	7f300b47dd	Mark the extended attribute syscalls as being MP safe. Requested by: jhb	2005-09-27 14:32:04 +00:00
John Baldwin	d27acf445e	Add the spin lock used by the binary nvidia driver to the static lock order list so that WITNESS and the driver play together nicely. Tested by: Harald Schmalzbauer MFC after: 3 days	2005-09-26 18:30:12 +00:00
Robert Watson	9b7915859d	Add "show allpcpu" to DDB, which prints the current CPU id followed by the per-cpu data for all CPUs. This is easier to ask users to do than "figure out how many CPUs you have, now run show pcpu, then run it once for each CPU you have". MFC after: 3 days	2005-09-26 16:55:11 +00:00
David Xu	2b7182c6b7	Reorder statements to avoid accessing unknown memory. In theory, invoking kenv with a very long string can panic the kernel.	2005-09-26 14:14:55 +00:00
Robert Watson	329c75a730	Acquire Giant in uprintf() and tprintf() rather than asserting it. In the vast majority of cases, these functions are called without mutexes held, meaning that in all but two cases, there will be no ordering issues with doing this, and it will eliminate the need for changes in the caller. In two cases, mutexes are held, so Giant must be acquired before those mutexes such that uprintf() and tprintf() recurse Giant rather than generating a lock order reversal. Suggested by: bde	2005-09-26 08:02:24 +00:00
Poul-Henning Kamp	2b35175c8a	Add rman_is_region_manager() for the benefit of an alpha hack.	2005-09-25 20:10:10 +00:00
Christian S.J. Peron	c47a4d1c9f	Implement new world order in VFS locking for extended attributes. This will remove the unconditional acquisition of Giant for extended attribute related operations. If the file system is set as being MP safe and debug.mpsafevfs is 1, do not pick up Giant. Mark the following system calls as being MP safe so we no longer pick up Giant in the system call handler: o extattrctl o extattr_set_file o extattr_get_file o extattr_delete_file o extattr_set_fd o extattr_get_fd o extattr_delete_fd o extattr_set_link o extattr_get_link o extattr_delete_link o extattr_list_file o extattr_list_link o extattr_list_fd -Pass MPSAFE flags to namei(9) lookup and introduce a vfslocked variable which will keep track of any Giant acquisitions. -Wrap any fd operations which manipulate vnodes in VFS_{UN}LOCK_GIANT -Drop VFS_ASSERT_GIANT into functions which operate on vnodes to ensure that we are sufficiently protected. I've tested these changes with various TrustedBSD MAC policies which use extended attributes heavily on SMP and UP systems (thanks to Scott Long for making some SMP hardware available to me for testing). Discussed with: jeff Requested by: jhb, rwatson	2005-09-24 23:47:04 +00:00
Poul-Henning Kamp	ae7ff71f63	Split struct resource in an external and internal part. The external part is still called 'struct resource' but the contents is now visible to drivers etc. This makes it part of the device driver ABI, so it must not be changed lightly. A comment to this effect is in place. The internal part is called 'struct resource_i' and contains its external counterpart as one field. Move the bus_space tag+handle into the external struct resource, this removes the need for device drivers to even know about these fields in order to use bus_space to access hardware. (More in following commit).	2005-09-24 20:07:03 +00:00
Poul-Henning Kamp	a778923149	Add two convenience functions for device drivers: bus_alloc_resources() and bus_free_resources(). These functions take a list of resources and handle them all in one go. A flag makes it possible to mark a resource as optional. A typical device driver can save 10-30 lines of code by using these. Usage examples will follow RSN. MFC: A good idea, eventually.	2005-09-24 19:31:10 +00:00
Robert Watson	e1ac28e239	Canonicalize the UNIX domain socket copyright layout: original holders before more recent holders. MFC after: 3 days	2005-09-23 12:41:06 +00:00
Stephan Uphoff	3fafa27b27	Don't pretend to be thread0 when calling sync(). It confuses the lock manager since in some places thread0 is then used for vnode locking while curthread is used for vnode unlocking. Found by: Yahoo! Reviewed by: ps@,jhb@ MFC after: 3 days	2005-09-22 15:34:15 +00:00
David Xu	a861574011	Temporarily disable the nice threshold detection code, as it can starve a thread holding a critical resource, e.g. a mutex or other implicit synchronization flags. Give a thread which exceeds the nice threshold a minimum time slice. PR: kern/86087	2005-09-22 01:19:37 +00:00
John Baldwin	e12560dd4b	Use correct VFS locking rather than unconditionally grabbing Giant around namei() calls in kern_alternate_path(). Reviewed by: csjp MFC after: 1 week	2005-09-21 19:49:42 +00:00
Robert Watson	87328e07e0	Pass 'curthread' into VFS_STATFS() from acctwatch(), rather than passing NULL. The NFS client expects that a thread will always be present for a VOP so that it can check for signal conditions, and will dereference a NULL pointer if one isn't present. MFC after: 3 days	2005-09-21 15:28:07 +00:00
Robert Watson	5580b0b157	Correct an incorrect comment from the dawn of time: neither tprintf() nor uprintf() is believed to perform tsleep() or msleep() as written, as ttycheckoutq() is called with '0' as its sleep argument. Remove recently added WITNESS warnings for sleep as the comment was incorrect. This should silence a warning from the nfs_timer() code. Discussed with: bde	2005-09-20 09:55:36 +00:00
Andre Oppermann	e452573df7	Start time_uptime with 1 instead of 0. Discussed with: phk	2005-09-19 22:16:31 +00:00
Poul-Henning Kamp	e606a3c63e	Revamp DEVFS internals pretty severely [1]. Give DEVFS a proper inode called struct cdev_priv. It is important to keep in mind that this "inode" is shared between all DEVFS mountpoints, therefore it is protected by the global device mutex. Link the cdev_privs into a list, protected by the global device mutex. Keep track of each cdev_priv's state with a flag bit and of references from mountpoints with a dedicated usecount. Reap the benefits of the much improved kernel memory allocator and the generally better defined device driver APIs to get rid of the tables of pointers + serial numbers, their overflow tables, the atomics to muck about in them and all the trouble that resulted in. This makes RAM the only limit on how many devices we can have. The cdev_priv is actually a super struct containing the normal cdev as the "public" part, and therefore allocation and freeing have moved to devfs_devs.c from kern_conf.c. The overall responsibility is (to be) split such that kern/kern_conf.c is the stuff that deals with drivers and struct cdev, and fs/devfs handles filesystems and struct cdev_priv and their private liaison exposed only in devfs_int.h. Move the inode number from cdev to cdev_priv and allocate inode numbers properly with unr. Local dirents in the mountpoints (directories, symlinks) allocate inodes from the same pool to guarantee against overlaps. Various other fields are going to migrate from cdev to cdev_priv in the future in order to hide them. A few fields may migrate from devfs_dirent to cdev_priv as well. Protect the DEVFS mountpoint with an sx lock instead of lockmgr; this lock also protects the directory tree of the mountpoint. Give each mountpoint a unique integer index, allocated with unr. Use it as an index into an array of devfs_dirent pointers in each cdev_priv. Initially the array points to a single element also inside cdev_priv, but as more devfs instances are mounted, the array is extended with malloc(9) as necessary when the filesystem populates its directory tree. Retire the cdev alias lists; the cdev_priv now knows about all the relevant devfs_dirents (and their vnodes) and devfs_revoke() will pick them up from there. We still spelunk into other mountpoints and fondle their data without 100% good locking. It may make better sense to vector the revoke event into the tty code and there do a destroy_dev/make_dev on the tty's devices, but that's for further study. Lots of shuffling of stuff and churn of bits for no good reason[2]. XXX: There is still nothing preventing the dev_clone EVENTHANDLER from being invoked at the same time in two devfs mountpoints. It is not obvious what the best course of action is here. XXX: Comment out an if statement that lost its body, until I can find out what should go there, so it doesn't do damage in the meantime. XXX: Leave in a few extra malloc types and KASSERTS to help track down any remaining issues. Much testing provided by: Kris Much confusion caused by (races in): md(4) [1] You are not supposed to understand anything past this point. [2] This line should simplify life for the peanut gallery.	2005-09-19 19:56:48 +00:00
Robert Watson	84d2b7df26	Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(), as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad idea, as is having non-MPSAFE tty code. MFC after: 1 week	2005-09-19 16:51:43 +00:00
Robert Watson	223aaaecb0	Remove mac_create_root_mount() and mpo_create_root_mount(), which provided access to the root file system before the start of the init process. This was used briefly by SEBSD before it knew about preloading data in the loader, and using that method to gain access to data earlier results in fewer inconsistencies in the approach. Policy modules still have access to the root file system creation event through the mac_create_mount() entry point. Removed now, and will be removed from RELENG_6, in order to avoid third-party policies gaining dependencies on the entry point for the lifetime of the 6.x branch. MFC after: 3 days Submitted by: Chris Vance <Christopher dot Vance at SPARTA dot com> Sponsored by: SPARTA	2005-09-19 13:59:57 +00:00
Marcel Moolenaar	73130b2224	Move the UUID generator into its own function, called kern_uuidgen(), so that UUIDs can be generated from within the kernel. The uuidgen(2) syscall now allocates kernel memory, calls the generator, and does a copyout() for the whole UUID store. This change is in support of GPT.	2005-09-18 21:40:15 +00:00
Robert Watson	8434c29b28	Add three new read-only socket options, which allow regression tests and other applications to query the state of the stack regarding the accept queue on a listen socket: SO_LISTENQLIMIT Return the value of so_qlimit (socket backlog) SO_LISTENQLEN Return the value of so_qlen (complete sockets) SO_LISTENINCQLEN Return the value of so_incqlen (incomplete sockets) Minor white space tweaks to existing socket options to make them consistent. Discussed with: andre MFC after: 1 week	2005-09-18 21:08:03 +00:00
Robert Watson	bc6b8b5d64	Fix spelling in a comment. MFC after: 3 days	2005-09-18 10:46:34 +00:00
Robert Watson	7da7362b95	Re-comment sbcompress() to explain what it is it does; it took me quite a bit of reading to figure it out, and I want to avoid figuring it out again. Convert an if (foo) else printf("this is almost a panic") into a KASSERT. MFC after: 3 days	2005-09-18 10:30:10 +00:00
Warner Losh	fe0519b171	MFp4: Expose device_probe_child()	2005-09-18 01:32:09 +00:00
Christian S.J. Peron	42e7197fba	Implement new world order in VFS locking for ACLs. This will remove the unconditional acquisition of Giant for ACL related operations. If the file system is set as being MP safe and debug.mpsafevfs is 1, do not pick up Giant. For any operations which require namei(9) lookups: __acl_get_file __acl_get_link __acl_set_file __acl_set_link __acl_delete_file __acl_delete_link __acl_aclcheck_file __acl_aclcheck_link -Set the MPSAFE flag in NDINIT -Initialize vfslocked variable using the NDHASGIANT macro For functions which operate on fds, make sure the operations are locked: __acl_get_fd __acl_set_fd __acl_delete_fd __acl_aclcheck_fd -Initialize vfslocked using VFS_LOCK_GIANT before we manipulate the vnode Discussed with: jeff	2005-09-17 22:01:14 +00:00
Tor Egge	61ac14dab6	Break out of loop if next buffer pointer has become invalid while flushing current buffer. Reviewed by: kan	2005-09-16 18:28:12 +00:00
Stephan Uphoff	19b2dff7b0	Fix race condition that caused activation of an event to be ignored immediately after it was deactivated. Found by: Yahoo! MFC after: 3 days	2005-09-15 21:10:12 +00:00
John Baldwin	21f9e816cd	Oops, missed adding the required include. Pointy hat to: jhb	2005-09-15 20:20:36 +00:00
John Baldwin	53c0e1ff7d	Replace the dont_sleep_in_callout mutex hack (similar to g_x{up,down}) with the disallow sleeping facility.	2005-09-15 20:09:08 +00:00
John Baldwin	10f508d9a3	Don't disallow sleeping for handlers on swi's since some swi handlers (like CAM) do sleep in their handlers. Requested by: scottl	2005-09-15 20:08:21 +00:00
John Baldwin	b27dbfbf4a	- Enforce an implicit lock order that Giant cannot be locked while holding any other non-sleepable lock. In plain English: Giant comes before all other mutexes. - Add some extra description to the lock order reversal printf's to indicate when a reversal is triggered by a hard-coded implicit rule. Requested by: truckman (2) MFC after: 1 week	2005-09-15 19:07:14 +00:00
John Baldwin	51460da87f	- Add a new simple facility for marking the current thread as being in a state where sleeping on a sleep queue is not allowed. The facility doesn't support recursion but uses a simple private per-thread flag (TDP_NOSLEEPING). The sleepq_add() function will panic if the flag is set and INVARIANTS is enabled. - Use this new facility to replace the g_xup and g_xdown mutexes that were (ab)used to achieve similar behavior. - Disallow sleeping in interrupt threads when invoking interrupt handlers. MFC after: 1 week Reviewed by: phk	2005-09-15 19:05:37 +00:00
Christian S.J. Peron	68ff2a4397	Improve the MP safeness associated with the creation of symbolic links and the execution of ELF binaries. Two problems were found: 1) The link path wasn't tagged as being MP safe and thus was not properly protected. 2) The ELF interpreter vnode wasn't being locked in namei(9) and thus was insufficiently protected. This commit makes the following changes: -Sets the MPSAFE flag in NDINIT for symbolic link paths -Sets the MPSAFE flag in NDINIT and introduces a vfslocked variable which will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been picked up. -Drops an assertion into vfs_lookup which ensures that if the MPSAFE flag is NOT set, we have picked up Giant. If not, panic (if WITNESS is compiled into the kernel). This should help us find conditions where vnode operations are insufficiently protected. This is a RELENG_6 candidate. Discussed with: jeff MFC after: 4 days	2005-09-15 15:03:48 +00:00
Maxim Konovalov	aada5cccd8	Back out rev. 1.246; it breaks code that uses shutdown(2) on non-connected sockets. Pointed out by: rwatson	2005-09-15 13:18:05 +00:00
Ralf S. Engelschall	724447ac41	Fix system shutdown timeout handling by again supporting longer running shutdown procedures (which have a duration of more than 120 seconds). We have two user-space affecting shutdown timeouts: a "soft" one in /etc/rc.shutdown and a "hard" one in init(8). The first one can be configured via /etc/rc.conf variable "rcshutdown_timeout" and defaults to 30 seconds. The second one was originally (in 1998) intended to be configured via sysctl(8) variable "kern.shutdown_timeout" and defaults to 120 seconds. Unfortunately, the "kern.shutdown_timeout" was declared "unused" in 1999 (as it obviously is actually not used within the kernel itself) and hence was intentionally but misleadingly removed in revision 1.107 from init_main.c. Kernel sysctl(8) variables are certainly a wrong way to control user-space processes in general, but in this particular case the sysctl(8) variable should have remained as it supports init(8), which isn't passed command line flags (which in turn could have been set via /etc/rc.conf), etc. As there is already a similar "kern.init_path" sysctl(8) variable which directly affects init(8), resurrect the init(8) shutdown timeout under sysctl(8) variable "kern.init_shutdown_timeout". But this time document it as being intentionally unused within the kernel and used by init(8). Also document it in the manpages init(8) and rc.conf(5). Reviewed by: phk MFC after: 2 weeks	2005-09-15 13:16:07 +00:00
Maxim Konovalov	c5cff17017	o Return ENOTCONN when shutdown(2) is called on a non-connected socket. PR: kern/84761 Submitted by: James Juran R-test: tools/regression/sockets/shutdown MFC after: 1 month	2005-09-15 11:45:36 +00:00
Poul-Henning Kamp	74f46f19aa	Retire unused dev_named() function.	2005-09-15 08:01:57 +00:00
Robert Watson	fd1a469ba5	In vfs_kqfilter(), return EINVAL instead of 1 (EPERM) when an unsupported kqueue filter type is requested on a vnode. MFC after: 3 days	2005-09-12 19:22:37 +00:00
Jung-uk Kim	9ed448b20c	Use monotonic `time_uptime' instead of `time_second'. Approved by: anholt (mentor) Discussed on: arch	2005-09-12 15:31:28 +00:00
Poul-Henning Kamp	2883ba6668	Introduce vfs_read_dirent() which can help VOP_READDIR() implementations by handling all the cookie stuff.	2005-09-12 08:46:07 +00:00
Tor Egge	6ff5e2db45	Don't retry when vget() returns ENOENT in the nonblocking case due to the vnode being doomed. It causes a livelock.	2005-09-12 01:48:57 +00:00
Don Lewis	908b3deb2b	Relocate witness_levelall(), witness_leveldescendents(), and witness_displaydescendants() so that they are protected by "#ifdef DDB/#endif" to unbreak kernels not using "option DDB". MFC after: 3 weeks	2005-09-11 07:57:06 +00:00
Gleb Smirnoff	d04304d155	Make callout_reset() return a non-zero value if a pending callout was rescheduled. If there was no pending callout, then return 0. Reviewed by: iedowse, cperciva	2005-09-08 14:20:39 +00:00
Don Lewis	d07f87a218	Add a new struct buf flag bit, B_PERSISTENT, and use it to tag struct bufs that are persistently held by ext2fs. Ignore any buffers with this flag in the code in boot() that counts "busy" and dirty buffers and attempts to sync the dirty buffers, which is done before attempting to unmount all the file systems during shutdown. This fixes the problem caused by any ext2fs file systems that are mounted at system shutdown time, which caused boot() to give up on a non-zero number of buffers and skip the call to vfs_unmountall(). This left all the mounted file systems in a dirty state and caused them to all require cleanup by fsck on reboot. Move the two separate copies of the "busy" buffer test in boot() to a separate function. Nuke the useless spl() stuff in the ext2fs ULCK_BUF() macro. Bring the PRINT_BUF_FLAGS definition in sys/buf.h up to date with this and previous flag changes. PR: kern/56675, kern/85163 Tested by: "Matthias Andree" matthias.andree at gmx.de Reviewed by: bde MFC after: 3 days	2005-09-08 06:30:05 +00:00
David E. O'Brien	5b1c0294e4	Forward declaring static variables as extern is invalid ISO-C. Now that GCC can properly handle forward static declarations, do this properly.	2005-09-07 10:06:14 +00:00
Gleb Smirnoff	016e62123a	In soreceive(), when the first mbuf is removed from the socket buffer, use sockbuf_pushsync(). The previous manipulation could lead to an inconsistent mbuf. Reviewed by: rwatson	2005-09-06 17:05:11 +00:00
Gleb Smirnoff	f46ab10c02	Document flags of a pollrec.	2005-09-06 11:09:18 +00:00
Christian S.J. Peron	d1dfd92177	Convert the primary ACL allocator from malloc(9) to using a UMA zone instead. Also introduce an aclinit function which will be used to create the UMA zone for use by file systems at system start up. MFC after: 1 month Discussed with: rwatson	2005-09-06 00:06:30 +00:00
Gleb Smirnoff	16901c0186	Remove Giant mutex from polling(4) and use a separate poll_mtx(4) instead. Detailed changelist: o Add flags field to struct pollrec, to indicate that a particular entry is being worked on. o Define a macro PR_VALID() to check that a pollrec is valid and pollable. o Mark ISRs as mpsafe. o ether_poll() - Acquire poll_mtx while traversing pollrec array. - Skip pollrecs that are being worked on. - Conditionally acquire Giant when entering handler. o netisr_pollmore() - Conditionally assert Giant. - Acquire poll_mtx while working with statistics. o netisr_poll() - Conditionally assert Giant. - Acquire poll_mtx while working with statistics and traversing pollrec array. o ether_poll_register(), ether_poll_deregister() - Conditionally assert Giant. - Acquire poll_mtx while working with pollrec array. o poll_idle() - Remove all strange manipulations with Giant. In collaboration with: ru, pjd In collaboration with: Oleg Bulyzhin <oleg rinet.ru> In collaboration with: dima <_pppp mail.ru>	2005-09-05 16:02:11 +00:00
Xin LI	5248ef8a3c	When padding with zero, do pad after prefixes rather than padding before prefixes. Use cases: printf("%05d", -42); --> "00-42" (should be "-0042") printf("%#05x", 12); --> "000xc" (should be "0x00c") Submitted by: Oliver Fromme PR: kern/85520 MFC After: 1 week	2005-09-04 18:03:45 +00:00
Poul-Henning Kamp	1e7d2c4763	If we ignore an unknown % sequence, we must stop interpreting the remaining % arguments because the varargs are now out of sync and there is a risk that we might for instance dereference an integer in a %s argument. Sponsored by: Napatech.com	2005-09-03 10:28:08 +00:00
John Baldwin	acc0265cc2	- Add some comments to some of the static lock orders. Don't explicitly link proctree and allproc to Giant since that order is already implicitly enforced. - Use a goto to handle the case where we want to enforce a reversal before calling isitmydescendant() in witness_checkorder() so that the logic is easier to follow and so that it is easier to add more forced-reversal cases in the future. MFC after: 3 days	2005-09-02 20:23:49 +00:00
John Baldwin	83cece6fa1	- Add an assertion to panic if one tries to call mtx_trylock() on a spin mutex. - Don't panic if a spin lock is held too long inside _mtx_lock_spin() if panicstr is set (meaning that we are already in a panic). Just keep spinning forever instead.	2005-09-02 20:21:49 +00:00
John Baldwin	83de502d59	Add witness warnings to panic if a thread tries to exit while holding any locks. Requested by: jeff MFC after: 3 days	2005-09-02 20:20:01 +00:00
Nate Lawson	9000b91eb9	Break out the checks for duplicates and absolute settings being too high instead of trying to do them all at once. This should fix the level sorting problems from the previous revision. Testing help: ume	2005-09-02 16:32:43 +00:00
Suleiman Souhlal	1f71de49e1	Print out a warning and a backtrace if we try to unlock a lockmgr that we do not hold. Glanced at by: phk MFC after: 3 days	2005-09-02 15:56:01 +00:00
Suleiman Souhlal	2611e5a6a9	Don't unbusy the devfs mount in vfs_mountroot_try() as it gets accessed and unbusied in devfs_fixup(), which assumes that the devfs mount is still locked. Glanced at by: phk MFC after: 3 days	2005-09-02 13:37:54 +00:00
Pawel Jakub Dawidek	d8b464e51e	In case of mac_check_vnode_rename_from() or vn_start_write() failure, vn_finished_write() should not be called. Reviewed by: ssouhlal MFC after: 3 days	2005-09-01 21:46:33 +00:00
Andre Oppermann	fdcc028d11	Changes and cleanups to m_sanity(): o for() instead of while() looping over mbuf chain o parens around all flag checks o more verbose function and purpose description o some more style changes Based on feedback from: sam	2005-08-30 21:31:42 +00:00
Andre Oppermann	e0068c3a69	Unbreak m_demote() and put back the 'all' flag. Without it we cannot correctly test for m_nextpkt in an mbuf chain.	2005-08-30 21:14:30 +00:00
Andre Oppermann	fbe816384a	o Remove the 'all' flag from m_demote(). Users can simply call it with m_demote(m->m_next) if they wish to start at the second mbuf in chain. o Test m_type with == instead of &. o Check m_nextpkt against NULL instead of implicit 0. Based on feedback from: sam	2005-08-30 20:07:49 +00:00
Nate Lawson	5308b2a64e	Eliminate cpufreq levels for two cases that are less than optimal: 1. Walk the absolute list in reverse to prefer duplicated levels that have a lower absolute setting, e.g. 800 MHz/50% is better than 1600 MHz/25% even though both have the same actual frequency. This also removes the need to check for already-modified levels since by definition, those will be added later in the sorted list. 2. Compare the absolute settings for derived levels and don't use the new level if it's higher. For example, a level of 800 MHz/75% is preferable to 1600 MHz/25% even though the latter has a lower total frequency. This work is based on a patch from the submitter but reworked by myself. Submitted by: Tijl Coosemans (tijl/ulyssis.org)	2005-08-30 04:45:32 +00:00
Andre Oppermann	4da8443133	Add m_copymdata(struct mbuf *m, struct mbuf *n, int off, int len, int prep, int how). Copies the data portion of mbuf (chain) n starting from offset off for length len to mbuf (chain) m. Depending on prep, the copied data will be appended or prepended. The function ensures that the mbuf (chain) m will be fully writeable by making real (not refcnt) copies of mbuf clusters. When prepending, the function returns a pointer to the new start of mbuf chain m and leaves as much leading space as possible in the new first mbuf. Reviewed by: glebius	2005-08-29 20:15:33 +00:00
Andre Oppermann	a048affba5	Add m_sanity(struct mbuf *m, int sanitize) to do some heavy sanity checking on mbufs and mbuf chains. Set sanitize to 1 to garble illegal things and have them blow up later when used/accessed. m_sanity()'s main purpose is for KASSERT()s and debugging of non-kosher mbuf manipulation (of which we have a number). Reviewed by: glebius	2005-08-29 19:58:56 +00:00
Andre Oppermann	ed111688e9	Add m_demote(struct mbuf *m, int all) to clean up mbuf (chain) from any tags and packet headers. If "all" is set then the first mbuf in the chain will be cleaned too. This function is used before an mbuf, that arrived as packet with m->flags & M_PKTHDR, is appended to an mbuf chain using m->m_next (not m->m_nextpkt). Reviewed by: glebius	2005-08-29 19:45:39 +00:00
Pawel Jakub Dawidek	e37a499443	Add 'depth' argument to CTRSTACK() macro, which allows reducing the number of ktr slots used. If 'depth' is equal to 0, the whole stack will be logged, just like before.	2005-08-29 11:34:08 +00:00
Suleiman Souhlal	a6c109d658	Fix a typo in vop_rename_pre() where we ended up using vholdl() instead of vhold(), even though the vnode interlock is unlocked. MFC after: 3 days	2005-08-28 23:00:11 +00:00
Alan Cox	7f1ef325d7	Handle vm_map_wire()'s failure.	2005-08-28 05:38:40 +00:00
Alan Cox	5d3043ce9a	Correctly handle vm_map_wire()'s failure. (See also revisions 1.81 and 1.82.) Reviewed by: tegge	2005-08-28 04:50:11 +00:00
Alan Cox	45e31b6034	Eliminate an unneeded reference on a vm object. If, in fact, the nearby vm_map_find() fails, then the excess reference causes the vm object to be leaked. Reviewed by: tegge	2005-08-28 00:24:58 +00:00
Alan Cox	4167396552	Revert the previous change for two reasons: (1) If vm_map_find() succeeds but vm_map_wire() fails, then a vm object, vm map entries, and kernel_map free space is leaked and (2) unwiring is handled automatically by vm_map_remove(). Suggested by: tegge	2005-08-28 00:19:54 +00:00
Dag-Erling Smørgrav	d09dfa2bfd	Two minor optimizations of fdalloc(): - if minfd < fd_freefile (as is most often the case, since minfd is usually 0), set it to fd_freefile. - remove a call to fd_first_free() which duplicates work already done by fdused(). This change results in a small but measurable speedup for processes with large numbers (several thousands) of open files. PR: kern/85176 Submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz> MFC after: 3 weeks	2005-08-26 11:16:39 +00:00
Don Lewis	4053cae340	Track all lock relationships instead of pruning direct relationships if an indirect relationship exists (keep both A->B->C and A->C). This allows witness_checkorder() to use isitmychild() instead of the much more expensive isitmydescendant() to check for valid lock ordering. Don't do an expensive tree walk to update the w_level values when the tree is updated. Only update the w_level values when using the debugger to display the tree. Nuke the experimental "witness_watch > 1" mode that only compared w_level for the two locks. This information is no longer maintained at run time, and the use of isitmychild() in witness_checkorder should bring performance close enough to the acceptable level that this hack is not needed. Report witness data structure allocation statistics under the debug.witness sysctl. Reviewed by: jhb MFC after: 30 days	2005-08-25 03:47:37 +00:00
Don Lewis	ad9f180121	Back out the removal of LK_NOWAIT from the VOP_LOCK() call in vlrureclaim() in vfs_subr.c 1.636 because waiting for the vnode lock aggravates an existing race condition. It is also undesirable according to the commit log for 1.631. Fix the tiny race condition that remains by rechecking the vnode state after grabbing the vnode lock and grabbing the vnode interlock. Fix the problem of other threads being starved (which 1.636 attempted to fix by removing LK_NOWAIT) by calling uio_yield() periodically in vlrureclaim(). This should be more deterministic than hoping that VOP_LOCK() without LK_NOWAIT will block, which may not happen in this loop. Reviewed by: kan MFC after: 5 days	2005-08-23 03:44:06 +00:00
Pawel Jakub Dawidek	4e4aa37e75	mp_ncpus is always (properly) initialized, even on UP kernels, so just use it.	2005-08-21 18:03:31 +00:00
Robert Watson	6cd8dee3c5	Silence "busy" warnings when unmounting devfs at system shutdown. This is a workaround for non-symmetric teardown of the file systems at shutdown with respect to the mount order at boot. The proper long term fix is to properly detach devfs from the root mount before unmounting each, and should be implemented, but since the problem is non-harmful, this temporary band-aid will prevent false positive bug reports and unnecessary error output for 6.0-RELEASE. MFC after: 3 days Tested by: pav, pjd	2005-08-20 17:12:47 +00:00
Poul-Henning Kamp	1d45c50ec3	Properly un-giant-trick the cdevsw in fini_cdevsw() Tripped over by: Huang wen hui <huang@gddsn.org.cn>	2005-08-20 12:13:51 +00:00
David Xu	86ef8e2671	Add missing brackets. Noticed by: stefanf@	2005-08-19 22:30:13 +00:00
David Xu	8c6d7a8db8	Fix a LOR between sched_lock and sleep queue lock.	2005-08-19 13:35:34 +00:00
David Xu	f8ec133ed0	Move up code for testing KEF_HOLD to avoid ke_cpu being changed unexpectedly for PRI_ITHD and PRI_REALTIME threads.	2005-08-19 11:51:41 +00:00
Hajimu UMEMOTO	1fea6ce7dd	- don't forget to save frequency when priority is raised. - nuke redundant variable initialization.	2005-08-18 16:41:25 +00:00
Hajimu UMEMOTO	5f36393468	don't forget to update curr_priority. Even when frequency is not changed, priority may be changed.	2005-08-18 16:08:56 +00:00
Poul-Henning Kamp	516ad423b1	Handle device drivers with D_NEEDGIANT in a way which does not penalize the 'good' drivers: Allocate a shadow cdevsw and populate it with wrapper functions which grab Giant	2005-08-17 08:19:52 +00:00
Poul-Henning Kamp	a07b0febaa	In vop_stdpathconf(ap) also default for _PC_NAME_MAX and _PC_PATH_MAX.	2005-08-17 06:59:23 +00:00
Hajimu UMEMOTO	961f7f911f	Save cpu level only when priority is greater than PRIO_USER to make CPUFREQ_SET(NULL, prio) work. TODO: implement saved_level as stack. Reviewed by: njl	2005-08-16 20:03:08 +00:00
Poul-Henning Kamp	b3740d656f	Remove stale comment.	2005-08-16 19:47:42 +00:00
Poul-Henning Kamp	31cc57cdbd	Collect the devfs related sysctls in one place	2005-08-16 19:25:02 +00:00
Poul-Henning Kamp	9c0af1310c	Create a new internal .h file to communicate very private stuff from kern_conf.c to devfs. For now just two prototypes, more to come.	2005-08-16 19:08:01 +00:00
Alexander Kabaev	0c207975f2	Do not keep parent directory locked while calling VFS_ROOT to traverse mount points in lookup(). The lock can be dropped safely around VFS_ROOT because LOCKPARENT semantics with child and parent vnodes coming from different FSes does not really have any meaningful use. On the other hand, this prevents an easily triggered deadlock on systems using the automounter daemon.	2005-08-14 18:10:04 +00:00
Alexander Kabaev	857b66d505	Do not use vm_pager_init() to initialize vnode_pbuf_freecnt variable. vm_pager_init() is run before required nswbuf variable has been set to correct value. This caused system to run with single pbuf available for vnode_pager. Handle both cluster_pbuf_freecnt and vnode_pbuf_freecnt variable in the same way. Reported by: ade Obtained from: alc MFC after: 2 days	2005-08-13 20:21:33 +00:00
Marcel Moolenaar	fd65baf8e2	Make mpsafe_vfs=1 the default on ia64.	2005-08-13 20:07:50 +00:00
Nate Lawson	da8a77c1f1	The "lowest" sysctl setting makes more sense as the lowest one to use, so discard all levels less than this setting, not less than/equal to. MFC after: 1 day	2005-08-11 18:40:58 +00:00
Alexander Kabaev	45a0d1ed7a	Do not drop the vnode interlock if vdropl is called on already doomed vnode. vdropl callers expect it to return with interlock still being held. MFC after: 2 days	2005-08-10 11:46:03 +00:00
Robert Watson	ae018704a1	Add an order between UDP inpcb locks and the IPv4 multicast address list lock, as there has been a report that an alternative lock order is getting introduced. This should help ferret it out. Reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>	2005-08-09 13:27:50 +00:00
Robert Watson	13f4c340ae	Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to ifnet.if_drv_flags. Device drivers are now responsible for synchronizing access to these flags, as they are in if_drv_flags. This helps prevent races between the network stack and device driver in maintaining the interface flags field. Many __FreeBSD__ and __FreeBSD_version checks maintained and continued; some less so. Reviewed by: pjd, bz MFC after: 7 days	2005-08-09 10:20:02 +00:00
Christian S.J. Peron	d8339a2616	Drop in a WITNESS_WARN into SYSCTL_IN to make sure that we are not holding any non-sleepable locks when copyin is called. This gets executed unconditionally since we have no function to wire the buffer in this direction. Pointed out by: truckman MFC after: 1 week	2005-08-08 21:06:42 +00:00
Robert Watson	6a113b3de7	Merge the dev_clone and dev_clone_cred event handlers into a single event handler, dev_clone, which accepts a credential argument. Implementors of the event can ignore it if they're not interested, and most do. This avoids having multiple event handler types and fall-back/precedence logic in devfs. This changes the kernel API for /dev cloning, and may affect third party packages containing cloning kernel modules. Requested by: phk MFC after: 3 days	2005-08-08 19:55:32 +00:00
Christian S.J. Peron	417ab24f78	Check to see if we wired the user-supplied buffers in SYSCTL_OUT. If the buffer has not been wired and we are holding any non-sleepable locks, drop a witness warning. If the buffer has not been wired, it is possible that the writing of the data can sleep, especially if the page is not in memory. This can result in a number of different locking issues, including deadlocks. MFC after: 1 week Discussed with: rwatson Reviewed by: jhb	2005-08-08 18:54:35 +00:00
David Xu	1278181c6c	Try best to keep a preempted thread at the front of the run queue. This seems to improve performance a bit for some workloads, but interactive lag is still seen unless the cpu idling race is fixed.	2005-08-08 14:20:10 +00:00
Peter Grehan	e000e00118	Export a routine, kobj_machdep_init(), that allows platforms to use the kobj subsystem as soon as mutex_init() has been called instead of having to wait for the SI_SUB_LOCK sysinit. Reviewed by: dfr	2005-08-07 02:20:35 +00:00
Christian S.J. Peron	9baea4b4b4	Change the data type of the upper shared memory limits from a signed integer to an unsigned long. This lifts variables like the maximum number of pages available for shared memory from 2^31 to 2^32 on 32 bit architectures, and from 2^31 to 2^64 on 64 bit architectures. It should be noted that this change breaks ABI on 64 bit architectures because the size of the shmmax, shmmin, shmmni, shmseg and shmall members of the shminfo structure has changed. Silence on: current@	2005-08-06 07:20:18 +00:00
Suleiman Souhlal	34cc826ae8	Holding a vnode doesn't prevent v_mount from disappearing (when the vnode is inactivated), possibly leading to a NULL dereference when checking if the mount wants knotes to be activated in the VOP hooks. So, we add a new vnode flag VV_NOKNOTE that is only set in getnewvnode(), if necessary, and check it when activating knotes. Since the flags are not erased when a vnode is being held, we can safely read them. Reviewed by: kris@ MFC after: 3 days	2005-08-06 01:42:04 +00:00
Robert Watson	dd5a318ba3	Introduce in_multi_mtx, which will protect IPv4-layer multicast address lists, as well as accessor macros. For now, this is a recursive mutex due to code sequences where IPv4 multicast calls into IGMP, which calls into ip_output(), which then tests for a multicast forwarding case. For support macros in in_var.h to check multicast address lists, assert that in_multi_mtx is held. Acquire in_multi_mtx around iteration over the IPv4 multicast address lists, such as in ip_input() and ip_output(). Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses, as well as over the manipulation of ifnet multicast address lists in order to keep the two layers in sync. Lock down accesses to IPv4 multicast addresses in IGMP, or assert the lock when performing IGMP join/leave events. Eliminate spl's associated with IPv4 multicast addresses, portions of IGMP that weren't previously expunged by IGMP locking. Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded lock order in WITNESS, in that order. Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca> MFC after: 10 days	2005-08-03 19:29:47 +00:00
Jeff Roberson	40a495853a	- Unlock before we call mac_destroy_vnode to prevent a lock order reversal. Found by: trhodes	2005-08-03 05:36:50 +00:00
Jeff Roberson	9e2aaec1e3	- Use lockmgr_printinfo rather than rolling our own. This introduces a slight problem by using printf instead of db_printf however 'show lockedvnods' does the same so I believe it is ok for now.	2005-08-03 05:02:08 +00:00
Jeff Roberson	7499fd8de9	- Fix a problem that slipped through review; the stack member of the lockmgr structure should have the lk_ prefix. - Add stack_print(lkp->lk_stack) to the information printed with lockmgr_printinfo().	2005-08-03 04:59:07 +00:00
Jeff Roberson	e8ddb61d38	- Replace the series of DEBUG_LOCKS hacks which tried to save the vn_lock caller by saving the stack of the last locker/unlocker in lockmgr. We also put the stack in KTR at the moment. Contributed by: Antoine Brodin <antoine.brodin@laposte.net>	2005-08-03 04:48:22 +00:00
Jeff Roberson	8d511e2a05	- Add support for saving stack traces and displaying them via printf(9) and KTR. Contributed by: Antoine Brodin <antoine.brodin@laposte.net> Concept code from: Neal Fachan <neal@isilon.com>	2005-08-03 04:27:40 +00:00
David Xu	3c424d1447	In adjustrunqueue(), add code to handle thread migrating case for ULE scheduler. In original code, local run queue of threaded ksegrp is corrupted if adjustrunqueue() is called while thread is migrating.	2005-08-03 01:23:45 +00:00
Ruslan Ermilov	2319835713	Long overdue, keep up with mbuf.h,v 1.148.	2005-08-02 20:03:23 +00:00
Kelly Yancey	dcb5fef5db	Make getsockopt(..., SOL_SOCKET, SO_ACCEPTCONN, ...) work per IEEE Std 1003.1 (POSIX).	2005-08-01 21:15:09 +00:00
David Xu	3d16f519b6	If a thread was removed from system run queue, kse_assign shouldn't add it again.	2005-07-31 15:11:21 +00:00
Alexander Leidinger	32069af652	The resource_xxx routines in subr_hints.c are called before and after the kenv environment in kern_environment.c switches to dynamic kenv. The prior call sets the static variable hintp to the static hints in subr_hints.c (hintmode==0). However, changes to the environment are not detected by the resource_xxx lookups after the change to dynamic kernel environment, so the lookup routines only report the old values from hintmode==0, even after the change to the dynamic kenv. This causes kenv users to see a different environment than the kernel routines. This is a problem in the mixer.c code that looks up initial mixer volume settings from the hints: If the hints are dynamic and not from the device.hints file, mixer.c doesn't see them, but kenv does. The patch from the PR (modified to comply with the style of the function) solves this. PR: 83686 Submitted by: Harry Coin <harrycoin@qconline.com>	2005-07-31 10:46:55 +00:00
Alexander Leidinger	3904769ba8	Add bounds checking to the setenv part of the kernel environment. This has no security implications since only root is allowed to use kenv(1) (and corrupt the kernel memory after adding too many variables prior to this commit). This is based upon the PR [1] mentioned below, but extended to check both bounds (in case of an overflow of the counting variable) and to comply with the style of the function. An overflow of the counting variable shouldn't happen after adding the check for the upper bound, but better safe than sorry (in case some other function in the kernel overwrites random memory). An interested soul may want to add a printf to notify root in case the bounds are hit. Also allocate KENV_SIZE+1 entries (the array is NULL-terminated), since the comment for KENV_SIZE says it's the maximum number of environment strings. [2] PR: 83687 [1] Submitted by: Harry Coin <harrycoin@qconline.com> [1] Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my> [2]	2005-07-31 10:28:35 +00:00
Joseph Koshy	fadcc6e201	Fail the module loading process if the currently executing kernel was not compiled with 'options HWPMC_HOOKS' or if the compiled-in version numbers of the kernel and module are out of sync. Reported by: cracauer MFC after: 3 days	2005-07-30 09:02:42 +00:00
Paul Saab	1126349ae7	Ignore mutex asserts when we're dumping as well. This allows me to panic a system from DDB when INVARIANTS is compiled into the kernel on a scsi system.	2005-07-30 05:54:30 +00:00
Sam Leffler	ab8ab90c5b	add m_align, a function to align any type of mbuf (i.e. it is a superset of M_ALIGN and MH_ALIGN) Reviewed by: several	2005-07-30 01:32:16 +00:00
R. Imura	080e3a63b3	Change API of mb_copy_t in libmchain so that netsmb can handle multibyte character share name correctly. Reviewed by: bp	2005-07-29 13:22:37 +00:00
George V. Neville-Neil	0d52d7b01a	Fix for PR 83885. Make sure that there actually is a next packet before setting nextrecord to that field. PR: 83885 Submitted by: hirose@comm.yamaha.co.jp Obtained from: Patch suggested in the PR MFC after: 1 week	2005-07-28 10:10:01 +00:00
Pawel Jakub Dawidek	73864adbd4	Fix the way the "InUse" column in 'vmstat -m' output works: - increase the allocation count only on successful malloc(9), so it doesn't confuse people; - because we need to check if 'size > 0', hide 'mtsp->mts_memalloced += size;' under the check as well, as for size=0 it is of course a no-op; - avoid critical_enter()/critical_exit() in case of failure in malloc_type_allocated() as there will be nothing to do. OK'ed by: rwatson MFC after: 2 days	2005-07-27 23:17:31 +00:00
Xin LI	05a6b7ad62	Cast to uintptr_t when the compiler complains. This unbreaks ULE scheduler breakage accompanied by the recent atomic_ptr() change.	2005-07-25 10:21:49 +00:00
Alan Cox	ec9c9e7363	Eliminate inconsistency in the setting of the B_DONE flag. Specifically, make the b_iodone callback responsible for setting it if it is needed. Previously, it was set unconditionally by bufdone() without holding whichever lock is shared by the b_iodone callback and the corresponding top-half function. Consequently, in a race, the top-half function could conclude that operation was done before the b_iodone callback finished. See, for example, aio_physwakeup() and aio_fphysio(). Note: I don't believe that the other, more widely-used b_iodone callbacks are affected. Discussed with: jeff Reviewed by: phk MFC after: 2 weeks	2005-07-20 19:06:06 +00:00
Jeff Roberson	39b2406838	- Allow vnlru to drop giant if the filesystem does not require it. The vnlru proc is extremely inefficient, potentially iterating over tens of thousands of vnodes without blocking. Dropping Giant allows other threads to preempt us, although we should revisit the algorithm to fix the runtime problems, especially since this may hold up all vnode allocations. - Remove the LK_NOWAIT from the VOP_LOCK in vlrureclaim. This provides a natural blocking point to help alleviate the situation described above although it may not technically be desirable. - yield after we make a pass on all mount points to prevent us from blocking other threads which require Giant. MFC after: 2 weeks	2005-07-20 01:43:27 +00:00
John Baldwin	ddf9c4f771	- Slightly reorder the events around the setting of PRS_ZOMBIE to be less hokey and much more readable and expand the comment to explain why it is the way that it is. - Close a race where one CPU could free the process belonging to a thread on another CPU that hasn't quite finished exiting yet but is beyond the point of setting the process state as PRS_ZOMBIE. Reported and tested by: ps (2) MFC after: 3 days	2005-07-18 20:08:14 +00:00
Robert Watson	68352adfe7	Define four constants, MBUF_{,MEM,CLUSTER,PACKET,TAG}_MEM_NAME, which are string names for their respective UMA zones and malloc types, and are passed into uma_zcreate() and MALLOC_DEFINE(). Export them outside of _KERNEL in mbuf.h so that netstat can reference them. Change the names to improve consistency, with each zone/type associated with the mbuf allocator being prefixed mbuf_. MFC after: 1 week	2005-07-17 14:04:03 +00:00
John Baldwin	122eceef61	Convert the atomic_ptr() operations over to operating on uintptr_t variables rather than void * variables. This makes it easier and simpler to get asm constraints and volatile keywords correct. MFC after: 3 days Tested on: i386, alpha, sparc64 Compiled on: ia64, powerpc, amd64 Kernel toolchain busted on: arm	2005-07-15 18:17:59 +00:00
Robert Watson	4f8721d2a9	Correct build on 64-bit: cast u_int64_t to (unsigned long long) before printfing as (unsigned long long). 32-bit build on i386 didn't notice this. Whoops. Reported by: arved Tested by: sledge	2005-07-14 15:21:18 +00:00
Robert Watson	cd814b2692	Introduce a new sysctl, kern.malloc_stats, which exports kernel malloc statistics via a binary structure stream: - Add structure 'malloc_type_stream_header', which defines a stream version, definition of MAXCPUS used in the stream, and a number of malloc_type records in the stream. - Add structure 'malloc_type_header', which defines the name of the malloc type being reported on. - When the sysctl is queried, return a stream header, followed by a series of type descriptions, each consisting of a type header followed by a series of MAXCPUS malloc_type_stats structures holding per-CPU allocation information. Typical values of MAXCPUS will be 1 (UP compiled kernel) and 16 (SMP compiled kernel). This query mechanism allows user space monitoring tools to extract memory allocation statistics in a machine-readable form, and to do so at a per-CPU granularity, allowing monitoring of allocation patterns across CPUs in order to better understand the distribution of work and memory flow over multiple CPUs. While here: - Bump statistics width to uint64_t, and hard code using fixed-width type in order to be more sure about structure layout in the stream. We allocate and free a lot of memory. - Add kmemcount, a counter of the number of registered malloc types, in order to avoid excessive manual counting of types. Export via a new sysctl to allow user-space code to better size buffers. - De-XXX comment on no longer maintaining the high watermark in old sysctl monitoring code. A follow-up commit of libmemstat(3), a library to monitor kernel memory allocation, will occur in the next few days. Likewise, similar changes to UMA.	2005-07-14 11:52:06 +00:00
Robert Watson	49bb6870cc	Bump the module versions of the MAC Framework and MAC policy modules from 2 (6.x) to 3 (7.x) to allow for future changes in the MAC policy module ABI in 7.x. Obtained from: TrustedBSD Project	2005-07-14 10:46:03 +00:00
Robert Watson	d26dd2d99e	When devfs cloning takes place, provide access to the credential of the process that caused the clone event to take place for the device driver creating the device. This allows cloned device drivers to adapt the device node based on security aspects of the process, such as the uid, gid, and MAC label. - Add a cred reference to struct cdev, so that when a device node is instantiated as a vnode, the cloning credential can be exposed to MAC. - Add make_dev_cred(), a version of make_dev() that additionally accepts the credential to stick in the struct cdev. Implement it and make_dev() in terms of a back-end make_dev_credv(). - Add a new event handler, dev_clone_cred, which can be registered to receive the credential instead of dev_clone, if desired. - Modify the MAC entry point mac_create_devfs_device() to accept an optional credential pointer (may be NULL), so that MAC policies can inspect and act on the label or other elements of the credential when initializing the skeleton device protections. - Modify tty_pty.c to register clone_dev_cred and invoke make_dev_cred(), so that the pty clone credential is exposed to the MAC Framework. While currently primarily focussed on MAC policies, this change is also a prerequisite for changes to allow ptys to be instantiated with the UID of the process looking up the pty. This requires further changes to the pty driver -- in particular, to immediately recycle pty nodes on last close so that the credential-related state can be recreated on next lookup. Submitted by: Andrew Reisse <andrew.reisse@sparta.com> Obtained from: TrustedBSD Project Sponsored by: SPAWAR, SPARTA MFC after: 1 week MFC note: Merge to 6.x, but not 5.x for ABI reasons	2005-07-14 10:22:09 +00:00
John Baldwin	2c65cb82ad	Add a 'sysent' target that depends on the various files built from syscalls.master for the master list and the Alpha/OSF1 compat ABI to be consistent with all the other compat ABIs where 'make sysent' already works. MFC after: 3 days	2005-07-13 20:50:17 +00:00
David Xu	740fd64d65	Validate that the value written into {FS,GS}.base is a canonical address; writing a non-canonical address can cause a kernel panic. Restrict base values to 0..VM_MAXUSER_ADDRESS, ensuring only canonical values get written to the registers. Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com > Approved by: re (scottl)	2005-07-10 23:31:11 +00:00
John Baldwin	522ccb2381	Regen. Approved by: re (scottl)	2005-07-08 15:06:58 +00:00
John Baldwin	4acd2e73e5	Mark second instance of lchown() MP safe just like the first. Approved by: re (scottl)	2005-07-08 15:01:13 +00:00
John Baldwin	9f3157a254	Regenerate. Approved by: re (scottl)	2005-07-07 18:20:38 +00:00
John Baldwin	bcd9e0dd20	- Add two new system calls: preadv() and pwritev() which are like readv() and writev() except that they take an additional offset argument and do not change the current file position. In SAT speak: preadv:readv::pread:read and pwritev:writev::pwrite:write. - Try to reduce code duplication some by merging most of the old kern_foov() and dofilefoo() functions into new dofilefoo() functions that are called by kern_foov() and kern_pfoov(). The non-v functions now all generate a simple uio on the stack from the passed in arguments and then call kern_foov(). For example, read() now just builds a uio and calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev(). PR: kern/80362 Submitted by: Marc Olzheim marcolz at stack dot nl (1) Approved by: re (scottl) MFC after: 1 week	2005-07-07 18:17:55 +00:00
Robert Watson	6758f88ea4	Add MAC Framework and MAC policy entry point mac_check_socket_create(), which is invoked from socket() and socketpair(), permitting MAC policy modules to control the creation of sockets by domain, type, and protocol. Obtained from: TrustedBSD Project Sponsored by: SPARTA, SPAWAR Approved by: re (scottl) Requested by: SCC	2005-07-05 22:49:10 +00:00
Pawel Jakub Dawidek	c23c87bd93	Fix one "wrong b_bufobj" panic in reassignbuf() by moving VI_UNLOCK(vp) below KASSERT()s, which means there was no real problem here, we just needed better locking for assertions. OK'ed by: jeff Approved by: re (scottl)	2005-07-05 15:57:55 +00:00
Suleiman Souhlal	571dcd15e2	Fix the recent panics/LORs/hangs created by my kqueue commit by: - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone	2005-07-01 16:28:32 +00:00
Joseph Koshy	151392465f	MFP4: - pmcstat(8) gprof output mode fixes: lib/libpmc/pmclog.{c,h}, sys/sys/pmclog.h: + Add an 'is_usermode' field to the PMCLOG_PCSAMPLE event + Add an 'entryaddr' field to the PMCLOG_PROCEXEC event, so that pmcstat(8) can determine where the runtime loader /libexec/ld-elf.so.1 is getting loaded. sys/kern/kern_exec.c: + Use a local struct to group the entry address of the image being exec()'ed and the process credential changed flag to the exec handling hook inside hwpmc(4). usr.sbin/pmcstat/*: + Support "-k kernelpath", "-D sampledir". + Implement the ELF bits of 'gmon.out' profile generation in a new file "pmcstat_log.c". Move all log related functions to this file. + Move local definitions and prototypes to "pmcstat.h" - Other bug fixes: + lib/libpmc/pmclog.c: correctly handle EOF in pmclog_read(). + sys/dev/hwpmc_mod.c: unconditionally log a PROCEXIT event to all attached PMCs when a process exits. + sys/sys/pmc.h: correct a function prototype. + Improve usage checks in pmcstat(8). Approved by: re (blanket hwpmc)	2005-06-30 19:01:26 +00:00
Paul Saab	cff2e749e2	Use SCTL_MASK32 to determine that the sysctl call is from a 32bit binary for kern.cp_time. Approved by: re	2005-06-30 17:17:29 +00:00
Peter Wemm	62919d788b	Jumbo-commit to enhance 32 bit application support on 64 bit kernels. This is good enough to be able to run a RELENG_4 gdb binary against a RELENG_4 application, along with various other tools (eg: 4.x gcore). We use this at work. ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace, procfs and core dumps. procfs_regs.c: vary the format of proc/XXX/regs depending on the client and target application. procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their sscanf fails. They expect an unsigned long. imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps. sys_process.c: handle 32 bit consumers debugging 32 bit targets. Note that 64 bit consumers can still debug 32 bit targets. IA64 has got stubs for ia32_reg.c. Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't implemented in the 32/64 wrapper yet. We also make a tiny patch to gdb to pacify it over conflicting formats of ld-elf.so.1. Approved by: re	2005-06-30 07:49:22 +00:00
Peter Wemm	48033188a6	Second part of commit for moving KDB_STOP_NMI from opt_global.h to opt_kdb.h. Found by: kris Approved by: re	2005-06-30 03:38:10 +00:00
Peter Wemm	2de92a386e	Conditionally weaken sys_generic.c rev 1.136 to allow certain dubious ioctl numbers in backwards compatibility mode. eg: an IOC_IN ioctl with a size of zero. Traditionally this was what you did before IOC_VOID existed, and we had some established users of this in the tree, namely procfs. Certain 3rd party drivers with binary userland components also have this too. This is necessary to have 4.x and 5.x binaries use these ioctl's. We found this at work when trying to run 4.x binaries. Approved by: re	2005-06-30 00:19:08 +00:00
Peter Wemm	f0c6706de9	Move the KDB_STOP_NMI option from opt_global.h to opt_kdb.h Approved by: re	2005-06-29 23:23:16 +00:00
Mike Silbersack	a7b844d2be	Fix the false memory modified after free messages some users have been reporting - in my previous change, I missed the case where a mbuf from the packet zone was freed back to the mbuf/packet keg, where it was subsequently put into the mbuf zone and found not to contain the expected trash. This change adds the necessary trash_dtor call inside mb_fini_pack so that everything is correct. Thanks to Bosko for finding the bug and showing me how secondary zones work. Approved by: re (dwhite)	2005-06-29 08:18:26 +00:00
Dima Dorfman	1ee6b74603	Fix fdcheckstd to pass the file descriptor along through vn_open. When opening a device, devfs_open needs the file descriptor to install its own fileops. Failing to pass the file descriptor causes the vnode to be returned with the regular vnops, which will cause a panic on the first read or write because devfs_specops is not meant to support those operations. This bug caused a panic after exec'ing any set[ug]id program with fds 0..2 closed (i.e., if any action had to be taken by fdcheckstd, we would panic if the exec'd program ever tried to use any of those descriptors). Reviewed by: phk Approved by: re (scottl)	2005-06-25 03:34:49 +00:00
Pawel Jakub Dawidek	400a74bff8	Close another information leak in ktrace(2): one was able to find active process groups outside a jail, etc. by using ktrace(2). OK'ed by: rwatson Approved by: re (scottl) MFC after: 1 week	2005-06-24 12:05:24 +00:00
Peter Wemm	4da0d332f4	Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit being in opt_global.h and forcing a global recompile when only a few files reference it. Approved by: re	2005-06-24 00:16:57 +00:00
Pawel Jakub Dawidek	06a137780b	Actually only protect mount-point if security.jail.enforce_statfs is set to 2. If we don't return statistics about requested file systems, system tools may not work correctly or at all. Approved by: re (scottl)	2005-06-23 22:13:29 +00:00
John Baldwin	57dbcb11db	Fix a typo in a comment. Approved by: re (scottl)	2005-06-23 21:55:43 +00:00
Mike Silbersack	121f050976	Change the mbuf, mbuf cluster, and mbuf packet allocation routines so that the UMA "trash" allocator is used - this ensures that any writes to a freed mbuf should provoke a panic. Only enabled under INVARIANTS, of course. Approved by: re (scottl)	2005-06-23 04:33:39 +00:00
Pawel Jakub Dawidek	b0d9aedd28	Add missing unlock. Pointy hat to: pjd Approved by: re (dwhite)	2005-06-21 21:17:02 +00:00
John Baldwin	943928c905	Simplify the storming logic and remove a variable as a result. Approved by: re (dwhite)	2005-06-20 19:32:23 +00:00
Garance A Drosehn	bd3aace7e4	Fix a panic which could occur parsing #!-lines in a shell-script. If the #!-line had multiple whitespace characters after the interpreter name, and it did not have any options, then the code would do nasty things trying to process a (non-existent) option-string which "ended before it began"... Submitted by: Morten Johansen Approved by: re (dwhite)	2005-06-19 02:21:03 +00:00
Jeff Roberson	b770ff6eb2	- Try to catch the wrong bufobj panics a little earlier. I believe they are actually caused by a buf with both VNCLEAN and VNDIRTY set. In the traces it is clear that the buf is removed from the dirty queue while it is actually on the clean queue which leaves the tail pointer set. Assert that both flags are not set in buf_vlist_add and buf_vlist_remove. Sponsored by: Isilon Systems, Inc. Approved by: re (blanket vfs)	2005-06-18 18:17:03 +00:00
Jeff Roberson	32b6dcd8a4	- Fix a leaked reference to a vnode via v_dd. We rely on cache_purge() and cache_zap() to clear the v_dd pointers when a directory vnode is forcibly discarded. For this to work, all vnodes with v_dd pointers to a directory must also have name cache entries linked via v_cache_dst to that dvp otherwise we could not find them at cache_purge() time. The following code snippet could break this guarantee by unlinking a directory before fetching its dotdot. The dotdot lookup would initialize the v_dd field of the unlinked directory which could never be cleared. To fix this we don't initialize v_dd for orphaned vnodes. printf("rmdir: %d\n", rmdir("../foo")); /* foo is cwd */ printf("chdir: %d\n", chdir("..")); printf("%s\n", getwd(NULL)); Sponsored by: Isilon Systems, Inc. Discovered by: kkenn Approved by: re (blanket vfs)	2005-06-17 01:05:13 +00:00
Ken Smith	c0cac8dc20	Remove a variable that became unused as a result of changes made in v1.139. This was only exposed if MALLOC_PROFILE was defined. Submitted by: Gary Jennejohn Pointy hat: rwatson Approved by: re (scottl)	2005-06-16 16:01:46 +00:00
Jeff Roberson	114a1006a8	- Change holdcnt use around vnode recycling. We now always keep a holdcnt ref while we're calling vgone(). This prevents transient refs from re-adding us to the free list. Previously, a vfree() triggered via vinvalbuf() getting rid of all of a vnode's pages could place a partially destructed vnode on the free list where vtryrecycle() could find it. The first call to vtryrecycle would hang up on the vnode lock, but when it failed it would place a now dead vnode onto the free list, and another call to vtryrecycle() would free an already free vnode. There were many complications of having a zero ref count while freeing which can now go away. - Change vdropl() to release the interlock before returning. All callers now respect this, so vdropl() directly frees VI_DOOMED vnodes once the last ref is dropped. This means that we'll never have VI_DOOMED vnodes on the free list. - Separate v_incr_usecount() into v_incr_usecount(), v_decr_usecount() and v_decr_useonly(). The incr/decr split is so that incr usecount can return with the interlock still held while decr drops the interlock so it can call vdropl() which will potentially free the vnode. The calling function can't drop the lock of an already freed node. v_decr_useonly() drops a usecount without dropping the hold count. This is done so the usecount reaches zero in vput() before we recycle, however the holdcount is still 1 which prevents any new references from placing the vnode back on the free list. - Fix vnlrureclaim() to vhold the vnode since it doesn't do a vget(). We wouldn't want vnlrureclaim() to bump the usecount since this has different semantics. Also change vnlrureclaim() to do a NOWAIT on the vn_lock. When this function runs we're usually in a desperate situation and we wouldn't want to wait for any specific vnode to be released. - Fix a bunch of misc comments to reflect the new behavior. - Add vhold() and vdrop() to vflush() for the same reasons that we do in vlrureclaim(). Previously we held no reference and a vnode could have been freed while we were waiting on the lock. - Get rid of vlruvp() and vfreehead(). Neither are used. vlruvp() should really be rethought before it's reintroduced. - vgonel() always returns with the vnode locked now and never puts the vnode back on a free list. The vnode will be freed as soon as the last reference is released. Sponsored by: Isilon Systems, Inc. Debugging help from: Kris Kennaway, Peter Holm Approved by: re (blanket vfs)	2005-06-16 04:41:42 +00:00
Jeff Roberson	bdcd9f26b0	- Fix insertions of bios which represent data earlier than anything else in the queue. The insertion sort assumed this had already been taken care of. Spotted by: Antoine Brodin Approved by: re (scottl)	2005-06-15 23:32:07 +00:00
Jeff Roberson	7a06fe49dc	- Add and enhance asserts related to the wrong bufobj panic. Sponsored by: Isilon Systems, Inc. Approved by: re (blanket vfs)	2005-06-14 20:32:27 +00:00
Jeff Roberson	12c2dcde40	- In reassignbuf() add many asserts to validate the head and tail pointers of the clean and dirty lists. This is in an attempt to catch the wrong bufobj problem sooner. - In vgonel() don't acquire an extra reference in the active case, the vnode lock and VI_DOOMED protect us from recursively cleaning. - Also in vgonel() clean up some stale comments. Sponsored by: Isilon Systems, Inc. Approved by: re (blanket vfs)	2005-06-14 20:31:53 +00:00
Jeff Roberson	dbb3ec5ce3	- Remove vnode lock asserts at the end of vfs syscalls. These asserts were used to ensure that we weren't exiting the syscall with a lock still held. This wasn't safe, however, because we'd already executed a vput() and on a loaded system the vnode may have been freed by the time we assert. This functionality is also handled by the td_locks assert in userret, which doesn't tell you what the syscall was, but will at least panic before you deadlock. Sponsored by: Isilon Systems, Inc. Discovered by: Peter Holm Approved by: re (blanket vfs)	2005-06-14 01:14:40 +00:00

8908 Commits
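Commit bcd9e0dd20 (John Baldwin, 2005-07-07) summarizes the new system calls as "preadv:readv::pread:read": scatter/gather I/O at an explicit offset that does not change the current file position. The following is a minimal user-space sketch of that relationship, not the kernel implementation. It assumes a system whose <sys/uio.h> declares preadv() and pwritev() (recent FreeBSD releases, or Linux with glibc), and the names run_demo, preadv_demo and struct demo_result are hypothetical, chosen for illustration only.

```c
#define _DEFAULT_SOURCE /* expose preadv()/pwritev() on glibc; harmless on FreeBSD */
#include <stdlib.h>     /* mkstemp */
#include <string.h>     /* memcpy */
#include <sys/types.h>  /* off_t, ssize_t */
#include <sys/uio.h>    /* struct iovec, preadv, pwritev */
#include <unistd.h>     /* write, pread, lseek, unlink, close */

/* Hypothetical result holder, for illustration only. */
struct demo_result {
    char  gathered[8];  /* the 5 bytes preadv() scattered into two buffers, rejoined */
    off_t pos_after;    /* file offset after the positional calls */
    char  contents[16]; /* whole file after pwritev() */
};

/* Performs the I/O on an already open descriptor; returns 0 or -1. */
static int run_demo(int fd, struct demo_result *r)
{
    const char text[] = "hello, world";     /* 12 bytes; the NUL is not written */
    if (write(fd, text, 12) != 12)
        return -1;                          /* the file offset is now 12 */

    /* preadv(): scatter 5 bytes starting at offset 7 into two buffers. */
    char a[3], b[2];
    struct iovec rv[2] = {
        { .iov_base = a, .iov_len = sizeof a },
        { .iov_base = b, .iov_len = sizeof b },
    };
    if (preadv(fd, rv, 2, 7) != 5)
        return -1;
    memcpy(r->gathered, a, sizeof a);
    memcpy(r->gathered + sizeof a, b, sizeof b);
    r->gathered[5] = '\0';                  /* "world" */

    /* pwritev(): gather "HE" + "LLO" and write them at offset 0. */
    char h[] = "HE", l[] = "LLO";
    struct iovec wv[2] = {
        { .iov_base = h, .iov_len = 2 },
        { .iov_base = l, .iov_len = 3 },
    };
    if (pwritev(fd, wv, 2, 0) != 5)
        return -1;

    /* Neither positional call should have moved the file offset. */
    r->pos_after = lseek(fd, 0, SEEK_CUR);

    ssize_t n = pread(fd, r->contents, sizeof r->contents - 1, 0);
    if (n < 0)
        return -1;
    r->contents[n] = '\0';                  /* "HELLO, world" */
    return 0;
}

/* Hypothetical driver: creates an anonymous temporary file and runs the demo. */
int preadv_demo(struct demo_result *r)
{
    char path[] = "/tmp/preadv_demo.XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                           /* storage is released on close() */
    int rc = run_demo(fd, r);
    close(fd);
    return rc;
}
```

Calling preadv_demo() should leave gathered as "world", contents as "HELLO, world", and pos_after at 12, because only the initial write() advances the file offset; the preadv() and pwritev() calls work purely at the offsets they are given.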