freebsd-skq

Author	SHA1	Message	Date
rwatson	8448d393fa	Correct an incorrect comment from the dawn of time: neither tprintf() nor uprintf() is believed to perform tsleep() or msleep() as written, as ttycheckoutq() is called with '0' as its sleep argument. Remove recently added WITNESS warnings for sleep as the comment was incorrect. This should silence a warning from the nfs_timer() code. Discussed with: bde	2005-09-20 09:55:36 +00:00
andre	9a84c48b48	Start time_uptime with 1 instead of 0. Discussed with: phk	2005-09-19 22:16:31 +00:00
phk	6a408cbd71	Rewamp DEVFS internals pretty severely [1]. Give DEVFS a proper inode called struct cdev_priv. It is important to keep in mind that this "inode" is shared between all DEVFS mountpoints, therefore it is protected by the global device mutex. Link the cdev_priv's into a list, protected by the global device mutex. Keep track of each cdev_priv's state with a flag bit and of references from mountpoints with a dedicated usecount. Reap the benefits of much improved kernel memory allocator and the generally better defined device driver APIs to get rid of the tables of pointers + serial numbers, their overflow tables, the atomics to muck about in them and all the trouble that resulted in. This makes RAM the only limit on how many devices we can have. The cdev_priv is actually a super struct containing the normal cdev as the "public" part, and therefore allocation and freeing has moved to devfs_devs.c from kern_conf.c. The overall responsibility is (to be) split such that kern/kern_conf.c is the stuff that deals with drivers and struct cdev and fs/devfs handles filesystems and struct cdev_priv and their private liason exposed only in devfs_int.h. Move the inode number from cdev to cdev_priv and allocate inode numbers properly with unr. Local dirents in the mountpoints (directories, symlinks) allocate inodes from the same pool to guarantee against overlaps. Various other fields are going to migrate from cdev to cdev_priv in the future in order to hide them. A few fields may migrate from devfs_dirent to cdev_priv as well. Protect the DEVFS mountpoint with an sx lock instead of lockmgr, this lock also protects the directory tree of the mountpoint. Give each mountpoint a unique integer index, allocated with unr. Use it into an array of devfs_dirent pointers in each cdev_priv. Initially the array points to a single element also inside cdev_priv, but as more devfs instances are mounted, the array is extended with malloc(9) as necessary when the filesystem populates its directory tree. Retire the cdev alias lists, the cdev_priv now know about all the relevant devfs_dirents (and their vnodes) and devfs_revoke() will pick them up from there. We still spelunk into other mountpoints and fondle their data without 100% good locking. It may make better sense to vector the revoke event into the tty code and there do a destroy_dev/make_dev on the tty's devices, but that's for further study. Lots of shuffling of stuff and churn of bits for no good reason[2]. XXX: There is still nothing preventing the dev_clone EVENTHANDLER from being invoked at the same time in two devfs mountpoints. It is not obvious what the best course of action is here. XXX: comment out an if statement that lost its body, until I can find out what should go there so it doesn't do damage in the meantime. XXX: Leave in a few extra malloc types and KASSERTS to help track down any remaining issues. Much testing provided by: Kris Much confusion caused by (races in): md(4) [1] You are not supposed to understand anything past this point. [2] This line should simplify life for the peanut gallery.	2005-09-19 19:56:48 +00:00
rwatson	c479a90eb8	Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(), as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having non-MPSAFE tty code. MFC after: 1 week	2005-09-19 16:51:43 +00:00
rwatson	583b25a64f	Remove mac_create_root_mount() and mpo_create_root_mount(), which provided access to the root file system before the start of the init process. This was used briefly by SEBSD before it knew about preloading data in the loader, and using that method to gain access to data earlier results in fewer inconsistencies in the approach. Policy modules still have access to the root file system creation event through the mac_create_mount() entry point. Removed now, and will be removed from RELENG_6, in order to gain third party policy dependencies on the entry point for the lifetime of the 6.x branch. MFC after: 3 days Submitted by: Chris Vance <Christopher dot Vance at SPARTA dot com> Sponsored by: SPARTA	2005-09-19 13:59:57 +00:00
marcel	b00bab4473	Move the UUID generator into its own function, called kern_uuidgen(), so that UUIDs can be generated from within the kernel. The uuidgen(2) syscall now allocates kernel memory, calls the generator, and does a copyout() for the whole UUID store. This change is in support of GPT.	2005-09-18 21:40:15 +00:00
rwatson	bfb05b4b93	Add three new read-only socket options, which allow regression tests and other applications to query the state of the stack regarding the accept queue on a listen socket: SO_LISTENQLIMIT Return the value of so_qlimit (socket backlog) SO_LISTENQLEN Return the value of so_qlen (complete sockets) SO_LISTENINCQLEN Return the value of so_incqlen (incomplete sockets) Minor white space tweaks to existing socket options to make them consistent. Discussed with: andre MFC after: 1 week	2005-09-18 21:08:03 +00:00
rwatson	504eccc1e5	Fix spelling in a comment. MFC after: 3 days	2005-09-18 10:46:34 +00:00
rwatson	1f48076149	Re-comment sbcompress() to explain what it is it does; it took me quite a bit of reading to figure it out, and I want to avoid figuring it out again. Convert an if (foo) else printf("this is almost a panic") into a KASSERT. MFC after: 3 days	2005-09-18 10:30:10 +00:00
imp	55babf09d2	MFp4: Expose device_probe_child()	2005-09-18 01:32:09 +00:00
csjp	7d71792fdf	Implement new world order in VFS locking for ACLs. This will remove the unconditional acquisition of Giant for ACL related operations. If the file system is set as being MP safe and debug.mpsafevfs is 1, do not pickup giant. For any operations which require namei(9) lookups: __acl_get_file __acl_get_link __acl_set_file __acl_set_link __acl_delete_file __acl_delete_link __acl_aclcheck_file __acl_aclcheck_link -Set the MPSAFE flag in NDINIT -Initialize vfslocked variable using the NDHASGIANT macro For functions which operate on fds, make sure the operations are locked: __acl_get_fd __acl_set_fd __acl_delete_fd __acl_aclcheck_fd -Initialize vfslocked using VFS_LOCK_GIANT before we manipulate the vnode Discussed with: jeff	2005-09-17 22:01:14 +00:00
tegge	63fab0fe2d	Break out of loop if next buffer pointer has become invalid while flushing current buffer. Reviewed by: kan	2005-09-16 18:28:12 +00:00
ups	6e520a23cd	Fix race condition that caused activation of an event to be ignored immediately after it was deactivated. Found by: Yahoo! MFC after: 3 days	2005-09-15 21:10:12 +00:00
jhb	b16fb05c6c	Oops, missed adding the required include. Pointy hat to: jhb	2005-09-15 20:20:36 +00:00
jhb	95df5b283f	Replace the dont_sleep_in_callout mutex hack (similar to g_x{up,down}) with the disallow sleeping facility.	2005-09-15 20:09:08 +00:00
jhb	feeb07e6ae	Don't disallow sleeping for handlers on swi's since some swi handlers (like CAM) do sleep in their handlers. Requested by: scottl	2005-09-15 20:08:21 +00:00
jhb	00aec40493	- Enforce an implicit lock order that Giant cannot be locked while holding any other non-sleepable lock. In plain English: Giant comes before all other mutexes. - Add some extra description to the lock order reversal printf's to indicate when a reversal is triggered by a hard-coded implicit rule. Requested by: truckman (2) MFC after: 1 week	2005-09-15 19:07:14 +00:00
jhb	e535e11c9f	- Add a new simple facility for marking the current thread as being in a state where sleeping on a sleep queue is not allowed. The facility doesn't support recursion but uses a simple private per-thread flag (TDP_NOSLEEPING). The sleepq_add() function will panic if the flag is set and INVARIANTS is enabled. - Use this new facility to replace the g_xup and g_xdown mutexes that were (ab)used to achieve similar behavior. - Disallow sleeping in interrupt threads when invoking interrupt handlers. MFC after: 1 week Reviewed by: phk	2005-09-15 19:05:37 +00:00
csjp	f7f404fd08	Improve the MP safeness associated with the creation of symbolic links and the execution of ELF binaries. Two problems were found: 1) The link path wasn't tagged as being MP safe and thus was not properly protected. 2) The ELF interpreter vnode wasnt being locked in namei(9) and thus was insufficiently protected. This commit makes the following changes: -Sets the MPSAFE flag in NDINIT for symbolic link paths -Sets the MPSAFE flag in NDINIT and introduce a vfslocked variable which will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been picked up. -Drop in an assertion into vfs_lookup which ensures that if the MPSAFE flag is NOT set, that we have picked up giant. If not panic (if WITNESS compiled into the kernel). This should help us find conditions where vnode operations are in-sufficiently protected. This is a RELENG_6 candidate. Discussed with: jeff MFC after: 4 days	2005-09-15 15:03:48 +00:00
maxim	99d897fe3c	Backout rev. 1.246, it breaks code uses shutdown(2) on non-connected sockets. Pointed out by: rwatson	2005-09-15 13:18:05 +00:00
rse	56379f0e5b	Fix system shutdown timeout handling by again supporting longer running shutdown procedures (which have a duration of more than 120 seconds). We have two user-space affecting shutdown timeouts: a "soft" one in /etc/rc.shutdown and a "hard" one in init(8). The first one can be configured via /etc/rc.conf variable "rcshutdown_timeout" and defaults to 30 seconds. The second one was originally (in 1998) intended to be configured via sysctl(8) variable "kern.shutdown_timeout" and defaults to 120 seconds. Unfortunately, the "kern.shutdown_timeout" was declared "unused" in 1999 (as it obviously is actually not used within the kernel itself) and hence was intentionally but misleadingly removed in revision 1.107 from init_main.c. Kernel sysctl(8) variables are certainly a wrong way to control user-space processes in general, but in this particular case the sysctl(8) variable should have remained as it supports init(8), which isn't passed command line flags (which in turn could have been set via /etc/rc.conf), etc. As there is already a similar "kern.init_path" sysctl(8) variable which directly affects init(8), resurrect the init(8) shutdown timeout under sysctl(8) variable "kern.init_shutdown_timeout". But this time document it as being intentionally unused within the kernel and used by init(8). Also document it in the manpages init(8) and rc.conf(5). Reviewed by: phk MFC after: 2 weeks	2005-09-15 13:16:07 +00:00
maxim	1666b7e18b	o Return ENOTCONN when shutdown(2) on non-connected socket. PR: kern/84761 Submitted by: James Juran R-test: tools/regression/sockets/shutdown MFC after: 1 month	2005-09-15 11:45:36 +00:00
phk	4583eb7610	Retire unused dev_named() function.	2005-09-15 08:01:57 +00:00
rwatson	f2fa5d310d	In vfs_kqfilter(), return EINVAL instead of 1 (EPERM) when an unsupported kqueue filter type is requested on a vnode. MFC after: 3 days	2005-09-12 19:22:37 +00:00
jkim	57e4878685	use monotonic `time_uptime' instead of `time_second' Approved by: anholt (mentor) Discussed on: arch	2005-09-12 15:31:28 +00:00
phk	4e50b9ebd8	Introduce vfs_read_dirent() which can help VOP_READDIR() implementations by handling all the cookie stuff.	2005-09-12 08:46:07 +00:00
tegge	0075736656	Don't retry when vget() returns ENOENT in the nonblocking case due to the vnode being doomed. It causes a livelock.	2005-09-12 01:48:57 +00:00
truckman	cc90fab85a	Relocate witness_levelall(), witness_leveldescendents(), and witness_displaydescendants() so that they are protected by "#ifdef DDB/#endif" to unbreak kernels not using "option DDB". MFC after: 3 weeks	2005-09-11 07:57:06 +00:00
glebius	8f3eb2a425	Make callout_reset() return a non-zero value if a pending callout was rescheduled. If there was no pending callout, then return 0. Reviewed by: iedowse, cperciva	2005-09-08 14:20:39 +00:00
truckman	d2af5326c9	Add a new struct buf flag bit, B_PERSISTENT, and use it to tag struct bufs that are persistently held by ext2fs. Ignore any buffers with this flag in the code in boot() that counts "busy" and dirty buffers and attempts to sync the dirty buffers, which is done before attempting to unmount all the file systems during shutdown. This fixes the problem caused by any ext2fs file systems that are mounted at system shutdown time, which caused boot() to give up on a non-zero number of buffers and skip the call to vfs_unmountall(). This left all the mounted file systems in a dirty state and caused them to all require cleanup by fsck on reboot. Move the two separate copies of the "busy" buffer test in boot() to a separate function. Nuke the useless spl() stuff in the ext2fs ULCK_BUF() macro. Bring the PRINT_BUF_FLAGS definition in sys/buf.h up to date with this and previous flag changes. PR: kern/56675, kern/85163 Tested by: "Matthias Andree" matthias.andree at gmx.de Reviewed by: bde MFC after: 3 days	2005-09-08 06:30:05 +00:00
obrien	b888392910	Forward declaring static variables as extern is invalid ISO-C. Now that GCC can properly handle forward static declarations, do this properly.	2005-09-07 10:06:14 +00:00
glebius	a166323868	In soreceive(), when a first mbuf is removed from socket buffer use sockbuf_pushsync(). Previous manipulation could lead to an inconsistent mbuf. Reviewed by: rwatson	2005-09-06 17:05:11 +00:00
glebius	832f7b2c0d	Document flags of a pollrec.	2005-09-06 11:09:18 +00:00
csjp	33e564c762	Convert the primary ACL allocator from malloc(9) to using a UMA zone instead. Also introduce an aclinit function which will be used to create the UMA zone for use by file systems at system start up. MFC after: 1 month Discussed with: rwatson	2005-09-06 00:06:30 +00:00
glebius	a43b8a5135	Remove Giant mutex from polling(4) and use a separate poll_mtx(4) instead. Detailed changelist: o Add flags field to struct pollrec, to indicate that are particular entry is being worked on. o Define a macro PR_VALID() to check that a pollrec is valid and pollable. o Mark ISRs as mpsafe. o ether_poll() - Acquire poll_mtx while traversing pollrec array. - Skip pollrecs, that are being worked on. - Conditionally acquire Giant when entering handler. o netisr_pollmore() - Conditionally assert Giant. - Acquire poll_mtx while working with statistics. o netisr_poll() - Conditionally assert Giant. - Acquire poll_mtx while working with statistics and traversing pollrec array. o ether_poll_register(), ether_poll_deregister() - Conditionally assert Giant. - Acquire poll_mtx while working with pollrec array. o poll_idle() - Remove all strange manipulations with Giant. In collaboration with: ru, pjd In collaboration with: Oleg Bulyzhin <oleg rinet.ru> In collaboration with: dima <_pppp mail.ru>	2005-09-05 16:02:11 +00:00
delphij	5f683ee68d	When padding with zero, do pad after prefixes rather than padding before prefixes. Use cases: printf("%05d", -42); --> "00-42" (should be "-0042") printf("%#05x", 12); --> "000xc" (should be "0x00c") Submitted by: Oliver Fromme PR: kern/85520 MFC After: 1 week	2005-09-04 18:03:45 +00:00
phk	40bead9126	If we ignore an unknown % sequence, we must stop interpreting the remaining % arguments because the varargs are now out of sync and there is a risk that we might for instance dereference an integer in a %s argument. Sponsored by: Napatech.com	2005-09-03 10:28:08 +00:00
jhb	107f288b6f	- Add some comments to some of the static lock orders. Don't explicitly link proctree and allproc to Giant since that order is already implicitly enforced. - Use a goto to handle the case where we want to enforce a reversal before calling isitmydescendant() in witness_checkorder() so that the logic is easier to follow and so that it is easier to add more forced-reversal cases in the future. MFC after: 3 days	2005-09-02 20:23:49 +00:00
jhb	033d70b179	- Add an assertion to panic if one tries to call mtx_trylock() on a spin mutex. - Don't panic if a spin lock is held too long inside _mtx_lock_spin() if panicstr is set (meaning that we are already in a panic). Just keep spinning forever instead.	2005-09-02 20:21:49 +00:00
jhb	736b03e795	Add witness warnings to panic if a thread tries to exit while holding any locks. Requested by: jeff MFC after: 3 days	2005-09-02 20:20:01 +00:00
njl	b0c4bd3081	Break out the checks for duplicates and absolute settings being too high instead of trying to do them all at once. This should fix the level sorting problems from the previous revision. Testing help: ume	2005-09-02 16:32:43 +00:00
ssouhlal	6dfa271942	Print out a warning and a backtrace if we try to unlock a lockmgr that we do not hold. Glanced at by: phk MFC after: 3 days	2005-09-02 15:56:01 +00:00
ssouhlal	dd616181ff	Don't unbusy the devfs mount in vfs_mountroot_try() as it gets accessed and unbusied in devfs_fixup(), which assumes that the devfs mount is still locked. Granced at by: phk MFC after: 3 days	2005-09-02 13:37:54 +00:00
pjd	226a90d03e	In case of mac_check_vnode_rename_from() or vn_start_write() failure, vn_finished_write() should not be called. Reviewed by: ssouhlal MFC after: 3 days	2005-09-01 21:46:33 +00:00
andre	056dde3309	Changes and cleanups to m_sanity(): o for() instead of while() looping over mbuf chain o paren's around all flag checks o more verbose function and purpose description o some more style changes Based on feedback from: sam	2005-08-30 21:31:42 +00:00
andre	6418b7b141	Unbreak m_demote() and put back the 'all' flag. Without it we cannot correctly test for m_nextpkt in an mbuf chain.	2005-08-30 21:14:30 +00:00
andre	51994fffc4	o Remove the 'all' flag from m_demote(). Users can simply call it with m_demote(m->m_next) if they wish to start at the second mbuf in chain. o Test m_type with == instead of &. o Check m_nextpkt against NULL instead of implicit 0. Based on feedback from: sam	2005-08-30 20:07:49 +00:00
njl	63a52296e3	Eliminate cpufreq levels for two cases that are less than optimal: 1. Walk the absolute list in reverse to prefer duplicated levels that have a lower absolute setting, i.e. 800 Mhz/50% is better than 1600 Mhz/25% even though both have the same actual frequency. This also removes the need to check for already-modified levels since by definition, those will be added later in the sorted list. 2. Compare the absolute settings for derived levels and don't use the new level if it's higher. For example, a level of 800 Mhz/75% is preferable to 1600 Mhz/25% even though the latter has a lower total frequency. This work is based on a patch from the submitter but reworked by myself. Submitted by: Tijl Coosemans (tijl/ulyssis.org)	2005-08-30 04:45:32 +00:00
andre	41519e2afc	Add m_copymdata(struct mbuf m, struct mbuf n, int off, int len, int prep, int how). Copies the data portion of mbuf (chain) n starting from offset off for length len to mbuf (chain) m. Depending on prep the copied data will be appended or prepended. The function ensures that the mbuf (chain) m will be fully writeable by making real (not refcnt) copies of mbuf clusters. For the prepending the function returns a pointer to the new start of mbuf chain m and leaves as much leading space as possible in the new first mbuf. Reviewed by: glebius	2005-08-29 20:15:33 +00:00
andre	139f31aa37	Add m_sanity(struct mbuf *m, int sanitize) to do some heavy sanity checking on mbuf's and mbuf chains. Set sanitize to 1 to garble illegal things and have them blow up later when used/accessed. m_sanity()'s main purpose is for KASSERT()'s and debugging of non- kosher mbuf manipulation (of which we have a number of). Reviewed by: glebius	2005-08-29 19:58:56 +00:00


8710 Commits