freebsd-dev

Author	SHA1	Message	Date
Jeff Roberson	5a2b158d8d	- Correct function names listed in KASSERTs. These were copied from other code and it was sloppy of me not to adjust these sooner.	2004-01-25 08:21:46 +00:00
Jeff Roberson	e17c57b14b	- Implement cpu pinning and binding. This is accomplished by keeping a per-cpu run queue that is only used for pinned or bound threads. Submitted by: Chris Bradfield <chrisb@ation.org>	2004-01-25 08:00:04 +00:00
Jeff Roberson	d1605f0ac9	- Use a unique string for the sched_setup SYSINIT and rename sched_setup to synch_setup. The schedulers use the sched_setup function name.	2004-01-25 07:49:45 +00:00
Jeff Roberson	29bcc4514f	- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and properly adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate parameter and remove direct references to the process statistics.	2004-01-25 03:54:52 +00:00
Robert Watson	8dc10be885	Add some basic support for measuring sleep mutex contention to the mutex profiling code. As with existing mutex profiling, measurement is done with respect to mtx_lock() instances in the code, as opposed to specific mutexes. In particular, measure two things: (1) Lock contention. How often did this mtx_lock() call get made and have to sleep (or almost sleep) waiting for the lock. This helps identify the "victims" of contention. (2) Hold contention. How often, while the lock was held by a thread as a result of this mtx_lock(), did another thread try to acquire the same mutex. This helps identify the causes of contention. I'm currently exploring adding measurement of "time waited for the lock", but the current implementation has proven useful to me so far so I figured I'd commit it so others could try it out. Note that this increases the size of mutexes when MUTEX_PROFILING is enabled, so you might find you need to further bump UMA_BOOT_PAGES. Fixes welcome. The once over: des, others	2004-01-25 01:59:27 +00:00
Poul-Henning Kamp	551260fc36	Deal with MOD_FREQUENCY before MOD_OFFSET because the latter is the one which runs the actual update. This fixes a bug where there was a delay in applying the frequency adjustment. In extreme cases this could result in marginal stability of the kernel-pll.	2004-01-24 21:48:43 +00:00
Jeff Roberson	b9509b56fa	- Move smp_topology to subr_smp.c so that it is defined on all architectures.	2004-01-24 19:52:48 +00:00
Robert Watson	646e29ccac	Don't grab Giant in crfree(), since prison_free() no longer requires it. The uidinfo code appears to be MPSAFE, and is referenced without Giant elsewhere. While this grab of Giant was only made in fairly rare circumstances (actually GC'ing on refcount==0), grabbing Giant here potentially introduces lock order issues with any locks held by the caller. So this probably won't help performance much unless you change credentials a lot in an application, and leave a lot of file descriptors and cached credentials around. However, it simplifies locking down consumers of the credential interfaces. Bumped into by: sam Appeased: tjr	2004-01-23 21:07:52 +00:00
Robert Watson	b3059e09f6	Defer the vrele() on a jail's root vnode reference from prison_free() to a new prison_complete() task run by a task queue. This removes a requirement for grabbing Giant in crfree(). Embed the 'struct task' in 'struct prison' so that we don't have to allocate memory from prison_free() (which means we also defer the FREE()). With this change, I believe grabbing Giant from crfree() can now be removed, but need to check the uidinfo code paths. To avoid header pollution, move the definition of 'struct task' to _task.h, and recursively include from taskqueue.h and jail.h; much preferable to all files including jail.h picking up a requirement to include taskqueue.h. Bumped into by: sam Reviewed by: bde, tjr	2004-01-23 20:44:26 +00:00
Poul-Henning Kamp	ee57aeea65	Write 100 times for tomorrow: "Always print time_t as %jd, you never know what width it has"	2004-01-22 19:50:06 +00:00
Ralf S. Engelschall	446655ac4f	Fix generation of random multicast MAC address. In case no real/physical IEEE 802 address is available, both the expired "draft-leach-uuids-guids-01" (section "4. Node IDs when no IEEE 802 network card is available") and RFC 2518 (section "6.4.1 Node Field Generation Without the IEEE 802 Address") recommend (quoted from RFC 2518): "The ideal solution is to obtain a 47 bit cryptographic quality random number, and use it as the low 47 bits of the node ID, with the _most_ significant bit of the first octet of the node ID set to 1. This bit is the unicast/multicast bit, which will never be set in IEEE 802 addresses obtained from network cards; hence, there can never be a conflict between UUIDs generated by machines with and without network cards." Unfortunately, this incorrectly explains how to implement this and the FreeBSD UUID generator code inherited this generation bug from the broken reference code in the standards draft. They should instead specify the "_least_ significant bit of the first octet of the node ID" as the multicast bit in a memory and hexadecimal string representation of a 48-bit IEEE 802 MAC address. This standards bug arose from a false interpretation, as the multicast bit is actually the _most_ significant bit in IEEE 802.3 (Ethernet) _transmission order_ of an IEEE 802 MAC address. The standards authors forgot that the bitwise order of an _octet_ from a MAC address _memory_ and hexadecimal string representation is still always from left (MSB, bit 7) to right (LSB, bit 0). Fortunately, this UUID generation bug could have occurred on systems without any Ethernet NICs only.	2004-01-22 13:34:11 +00:00
Poul-Henning Kamp	4e74721cac	Add a sysctl (default: off) which enables a log(LOG_INFO...) warning if the clock is stepped.	2004-01-21 21:05:40 +00:00
Robert Watson	679365e7b9	Reduce gratuitous includes: don't include jail.h if it's not needed. Presumably, at some point, you had to include jail.h if you included proc.h, but that is no longer required. Result of: self injury involving adding something to struct prison	2004-01-21 17:10:47 +00:00
Andrey A. Chernov	9bbee25931	pread/pwrite: follow lseek spirit - return EINVAL on negative offset for non-VCHR	2004-01-20 01:27:42 +00:00
Poul-Henning Kamp	50d23be140	Add linenumber and source filename to panic(9) output. Ideally a traceback should be printed too, any takers ?	2004-01-19 21:27:11 +00:00
Alexander Kabaev	54556cc7b8	One more instance of magic number used in place of IO_SEQSHIFT. Submitted by: alc	2004-01-19 20:45:43 +00:00
Ruslan Ermilov	0541040c46	Since "m" is not part of the "mp" chain, need to free() it. Reported by: Stanford Metacompilation research group	2004-01-18 14:02:53 +00:00
Andrew Gallatin	1c318b9665	Handle sf_buf_alloc() returning null. This can happen if the process takes a signal while waiting for an sf_buf to become available. Reviewed by: alc	2004-01-17 21:16:51 +00:00
Dag-Erling Smørgrav	a6d4491c71	Restore correct semantics for F_DUPFD fcntl. This should fix the errors people have been getting with configure scripts.	2004-01-17 00:59:04 +00:00
Dag-Erling Smørgrav	56a9fc0e93	WITNESS won't let us hold two filedesc locks at the same time, so juggle fdp and newfdp around a bit.	2004-01-16 21:54:56 +00:00
Robert Watson	bafc8f255a	KASSERT() that initproc->p_pid is 1. Very bad things happen if init's pid isn't 1, and it can actually occur if kthread_create() is called before SUB_SI_CREATE_INIT without RFHIGHPID. Discussed with: jhb	2004-01-16 20:29:23 +00:00
Dag-Erling Smørgrav	ddce426f69	Remove two KASSERTs which were overly paranoid.	2004-01-16 08:45:56 +00:00
Dag-Erling Smørgrav	12d568c2b1	Take care to drop locks when calling malloc()	2004-01-15 18:50:11 +00:00
Dag-Erling Smørgrav	a2fe44e8cf	New file descriptor allocation code, derived from similar code introduced in OpenBSD by Niels Provos. The patch introduces a bitmap of allocated file descriptors which is used to locate available descriptors when a new one is needed. It also moves the task of growing the file descriptor table out of fdalloc(), reducing complexity in both fdalloc() and do_dup(). Debts of gratitude are owed to tjr@ (who provided the original patch on which this work is based), grog@ (for the gdb(4) man page) and rwatson@ (for assistance with pxeboot(8)).	2004-01-15 10:15:04 +00:00
Don Lewis	288e351b55	If a device attach routine fails during boot and calls bus_teardown_intr(), ithread_remove_handler() may fail to remove the interrupt handler if it decides to let the ithread do the removal. The problem is that during boot "cold" is set, which causes msleep() to return immediately. This will cause ithread_remove_handler() to fail to wait for the ithread to do the removal from the handler TAILQ before freeing the handler back to the heap. Bad things will happen when some other user of the TAILQ, such as ithread_add_handler() or the actual ithread attempts to use the freed handler. Fix the problem by forcing ithread_remove_handler() to do the actual removal itself if the "cold" flag is set. Reviewed by: jhb	2004-01-13 22:55:46 +00:00
Dag-Erling Smørgrav	ac34dc4e79	Back out 1.160, which was committed by mistake.	2004-01-11 20:08:57 +00:00
Dag-Erling Smørgrav	d7a1c7e34b	Back out 1.166, which was committed by mistake.	2004-01-11 20:07:15 +00:00
Dag-Erling Smørgrav	f1ea6d813d	Mechanical whitespace cleanup + other minor style nits.	2004-01-11 19:56:42 +00:00
Dag-Erling Smørgrav	0e5dfade00	Mechanical whitespace cleanup.	2004-01-11 19:54:45 +00:00
Dag-Erling Smørgrav	05c3c5c8b6	Mechanical whitespace cleanup; parenthesize return values; other minor style nits. The #ifdefs in this file give me a headache...	2004-01-11 19:52:10 +00:00
Dag-Erling Smørgrav	e5aeaa0c67	Mechanical whitespace cleanup; parenthesize return values; other minor style nits.	2004-01-11 19:48:19 +00:00
Dag-Erling Smørgrav	012b5531f4	Mechanical whitespace cleanup + minor style nits.	2004-01-11 19:43:14 +00:00
Dag-Erling Smørgrav	c9de31f55f	Mechanical whitespace cleanup.	2004-01-11 19:39:14 +00:00
Alan Cox	0e88a71798	Remove long dead code, specifically, code related to munmapfd(). (See also vm/vm_mmap.c revision 1.173.)	2004-01-11 06:59:21 +00:00
Robert Watson	def055686c	When not creating a core dump due to resource limits specifying a maximum dump size of 0, return a size-related error, rather than returning success. Otherwise, waitpid() will incorrectly return a status indicating that a core dump was created. Note that the specific error doesn't actually matter, since it's lost. MFC after: 2 weeks PR: 60367 Submitted by: Valentin Nechayev <netch@netch.kiev.ua>	2004-01-11 02:28:06 +00:00
Jens Schweikhardt	85495c72ff	s/Muliple/Multiple Removed whitespace at EOL and EOF.	2004-01-10 18:34:01 +00:00
Dag-Erling Smørgrav	d41457da80	More unparenthesized return values.	2004-01-10 17:14:53 +00:00
Dag-Erling Smørgrav	b91a599717	Style: parenthesize return values.	2004-01-10 13:03:43 +00:00
Don Lewis	2b77864f1e	Add a somewhat redundant check on the len argument to getsockaddr() to avoid relying on the minimum memory allocation size to avoid problems. The check is somewhat redundant because the consumers of the returned structure will check that sa_len is a protocol-specific larger size. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: nectar MFC after: 30 days	2004-01-10 08:28:54 +00:00
Olivier Houchard	5cded90454	Prevent a race condition between fork1() and whatever changes the pgrp by setting the new process' p_pgrp again before inserting it in the p_pglist. Without it we can get the new process to be inserted in a different p_pglist than the one p2->p_pgrp points to, and this is not something we want to happen. This is not a fix, merely a bandaid, but it will work until someone finds a better way to do it. Discussed with: jhb (a long time ago)	2004-01-09 23:42:36 +00:00
Robert Watson	07eacae0d2	Improve the expressiveness of ttyinfo (^T) when dealing with threads in slightly less usual states: If the thread is on a run queue, display "running" if the thread is actually running, otherwise, "runnable". If the thread is sleeping, and it's on a sleep queue, display the name of the queue, otherwise "unknown" -- previously, in this situation we would display "iowait". If the thread is waiting on a lock, display *lockname. If the thread is suspended, display "suspended" -- previously, in this situation we would display "iowait". If the thread is waiting for an interrupt, display "intrwait" -- previously, in this situation we would display "iowait". If the thread is in a state not handled by the above, display "unknown" -- previously, we would print "iowait". Among other things, this avoids displaying "iowait" when the foreground process turns out to be suspended waiting for a debugger to properly attach.	2004-01-08 22:49:23 +00:00
Robert Watson	047aa39b25	Drop the sigacts mutex around calls to stopevent() to avoid sleeping holding the mutex. Because the sigacts pointer can't change while the process is "live" (proc locking (x)), we know our pointer is still valid. In communication with: truckman Reviewed by: jhb	2004-01-08 22:44:54 +00:00
Alexander Kabaev	c969c60c60	Add pid to the info printed in lockmgr_printinfo. This makes VFS diagnostic messages slightly more useful.	2004-01-06 04:34:13 +00:00
Alexander Kabaev	580ddfa64b	More style fixes. Obtained from: bde	2004-01-05 23:40:46 +00:00
John Baldwin	eac097962f	- Allow mtx_trylock() to recurse on a recursive mutex. Attempts to recurse on a non-recursive mutex will fail but will not trigger any assertions. - Add an assertion to mtx_lock() that one never recurses on a non-recursive mutex. This is mostly useful for the non-WITNESS case. Requested by: deischen, julian, others (1)	2004-01-05 23:09:51 +00:00
Alexander Kabaev	b0fdf71656	style(9): Add empty line before first code line in functions with no local variables. Properly terminate comment sentences. Indent lines which are longer than 80 characters. Move v_addpollinfo closer to the rest of poll-related functions. Move DEBUG_VFS_LOCKS ifdefed block to the end of file. Obtained from: bde (partly)	2004-01-05 19:04:29 +00:00
Alexander Kabaev	3ff1b7c23f	Cosmetics: strip '\n' from a string passed to Debugger().	2004-01-04 03:42:20 +00:00
David Xu	a30ec4b99c	Make sigaltstack per-thread, because per-process sigaltstack state is useless for threaded programs: multiple threads cannot share the same stack. The alternate signal stack is private to the thread, so no lock is needed; the original P_ALTSTACK is now moved into td_pflags and renamed to TDP_ALTSTACK. For single-threaded or Linux clone()-based threaded programs, there is no semantic change, because those programs only have one kernel thread in every process. Reviewed by: deischen, dfr	2004-01-03 02:02:26 +00:00
Nate Lawson	44bb5f52d3	Move the kernel power change printf under bootverbose since the power_profile script now duplicates the message via syslog.	2004-01-02 18:24:13 +00:00
Sam Leffler	4f9f9cf3a4	m_tag fixups in preparation for heavier use: o promote several m_tag_* routines to inline o add an m_tag_setup inline to set the fixed fields in a packet tag o add an m_tag_free method pointer to each mtag to support, for example, allocating tags from zones o have m_tag_find check if the tag list is not empty before calling m_tag_locate to search Reviewed by: brooks, silence from others	2004-01-02 17:27:39 +00:00
David Malone	70ad6c2190	Plug a leak of open files that happens when you exec a suid program with one of std{in,out,err} open. This helps with the file descriptor leaks reported on -current. This should probably be merged into 5.2. Reviewed by: ru Tested by: Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net>	2003-12-28 19:27:14 +00:00
Bruce Evans	9efe7d9d83	v_vxproc was a bogus name for a thread (pointer).	2003-12-28 09:12:56 +00:00
Mike Silbersack	ddeb5b242e	Track three new sendfile-related statistics: - The number of times sendfile had to do disk I/O - The number of times sfbuf allocation failed - The number of times sfbuf allocation had to wait	2003-12-28 08:57:09 +00:00
Bruce Evans	d6c847f378	Fixed some style bugs (mainly, try to always use explicit comparisons with NULL when checking for null pointers).	2003-12-28 04:37:59 +00:00
Bruce Evans	ca46e90ef4	Fixed some disordering in revs. 1.194 and 1.196. Moved the execve() syscall function back to near the beginning of the file. Rev. 1.194 moved it into the middle of auxiliary functions following kern_execve(). Moved the __mac_execve() syscall function up together with execve(). It was new in rev. 1.196 and perfectly misplaced after execve().	2003-12-28 04:18:13 +00:00
Mike Silbersack	69fba1650a	Fix the maxpipekva warning message so that it points to the correct sysctl, and shorten the message. Noticed by: bde	2003-12-28 01:19:58 +00:00
Alan Cox	34d2675761	Remove GIANT_REQUIRED from exec_unmap_first_page().	2003-12-27 19:40:03 +00:00
Mike Silbersack	5eda9873e9	Track current and peak sfbuf usage, export the values via sysctl.	2003-12-27 07:52:47 +00:00
John Baldwin	c55bbb6cb7	Create a separate kthread that executes sched_cpu() once a second. Because sched_cpu() locks an sx lock (allproc_lock) which can sleep if it fails to acquire the lock, it is not safe to execute this in a callout handler from softclock().	2003-12-26 17:07:29 +00:00
Alfred Perlstein	866e3b7e73	Put restrict back in, the compilation failure was my fault when I did a bad merge from the PR. Thanks to Bruce Evans for explaining.	2003-12-26 05:58:16 +00:00
Alfred Perlstein	4abb4ff34d	Add __restrict qualifiers to copyinfrom, copyinstrfrom, copystr, copyinstr, copyin and copyout.	2003-12-26 05:54:35 +00:00
David Malone	9322078275	In socket(2) we only need Giant around the call to socreate, so just grab it there.	2003-12-25 23:44:38 +00:00
David Malone	1c58509c25	Don't TAILQ_INIT kq_head twice, once is enough.	2003-12-25 23:42:36 +00:00
Mike Silbersack	8dee2f6746	Fix another 0 / NULL mixup.	2003-12-25 01:17:27 +00:00
Alfred Perlstein	6502da1307	We're not ready for restrict qualifiers here.	2003-12-24 19:09:45 +00:00
Alfred Perlstein	9f144cff85	Add restrict qualifiers. PR: 44394 Submitted by: Craig Rodrigues <rodrige@attbi.com>	2003-12-24 18:47:43 +00:00
Robert Watson	69546b2fbb	Document that when we are addressing an open()/close() race, the reason we call vn_close() manually rather than letting fdrop() take care of it is that we haven't yet hooked up the various 'struct file' fields.	2003-12-24 17:13:01 +00:00
Alfred Perlstein	1805ed0772	Introduce mp_maxcpus which can be used by libkvm utils to find out how many CPUs the system was compiled for. Export the variable via a sysctl node 'kern.smp.maxcpus' as well.	2003-12-23 13:54:16 +00:00
Peter Wemm	2c74309622	Regen - this should be essentially a NOP, except for rcsid changes.	2003-12-23 03:52:14 +00:00
Peter Wemm	eec525a435	Remove namespc column and attempt to un-fold some of the longer lines that now fit.	2003-12-23 03:51:36 +00:00
Peter Wemm	1a58b07149	Remove the namespace column from the syscalls tables. We don't actually use it, if we ever did. They have been VERY poorly maintained for some time, possibly because they were a NOP. FWIW, This brings our table formats back closer to the other *BSD's.	2003-12-23 03:50:43 +00:00
Peter Wemm	9b68618df0	Add an additional field to the elf brandinfo structure to support quicker exec-time replacement of the elf interpreter on an emulation environment where an entire /compat/* tree isn't really warranted.	2003-12-23 02:42:39 +00:00
Peter Wemm	a89ec05e3e	Catch a few places where NULL (pointer) was used where 0 (integer) was expected.	2003-12-23 02:36:43 +00:00
Peter Wemm	55cdddc0d8	Don't use NULL (pointer) when we mean 0 (integer) for the number of ticks in msleep.	2003-12-23 02:28:42 +00:00
Jeff Roberson	249e0bea8f	- Make our transfer decisions based on load and not transferable load. A cpu could have been bogged down with non-transferable load and still not migrated a new thread to an idle cpu. This required some benchmarking and tuning to get right as the comment above it suggests.	2003-12-20 22:35:20 +00:00
Jeff Roberson	e7a976f415	- Enable ithread migration on x86. This is done to work around a bug in the IO APIC on Xeons that prevents round-robin interrupt assignment from working.	2003-12-20 20:36:19 +00:00
Alan Cox	96a7b42213	Remove a variable that has been initialized but otherwise unused since revision 1.315.	2003-12-20 19:46:21 +00:00
Jeff Roberson	670c524f08	- In kseq_transfer() return if smp has not been started. - In sched_add(), do the idle check prior to the transfer check so that we don't try to transfer load from an idle cpu. This fixes panics caused by IPIs on UP machines running SMP kernels. Reported/Debugged by: seanc	2003-12-20 14:03:14 +00:00
Jeff Roberson	9b5f6f623d	- Running interactive tasks with the minimum time-slice is fine for vi and sh, but not so great for mozilla, X, etc. Add a fixed define for the slice size granted to interactive KSEs.	2003-12-20 12:54:35 +00:00
Tim J. Robbins	f5925b7436	Reduce the overhead of semop() by using the kernel stack instead of malloc'd memory to store the operations array if it is small enough to fit.	2003-12-19 13:07:17 +00:00
John Baldwin	eb5b0e0565	Various style fixes. Submitted by: bde (mostly, if not all)	2003-12-17 21:13:04 +00:00
Jeff Roberson	958557e9c7	- In vget() if LK_NOWAIT is specified we should return EBUSY and not ENOENT. Submitted by: Stephan Uphoff <ups@stups.com>	2003-12-16 17:08:27 +00:00
Jeff Roberson	d85213669b	- When doing a forced unmount, VFS attempts to keep VCHR vnodes valid by reassigning their v_ops field to specfs, detaching from the mountpoint, etc. However, this is not sufficient. If we vclean() the vnode the pages owned by the vnode are lost, potentially while buffers reference them. Implement parts of vclean() separately in vgonechrl() so that the pages and bufs associated with a device vnode are not destroyed while in use.	2003-12-16 17:05:05 +00:00
Bruce M Simpson	5406529771	style(9) pass and type fixups. Submitted by: bde	2003-12-16 14:13:47 +00:00
Bruce M Simpson	37621fd5d9	Push m_apply() and m_getptr() up into the collection of standard mbuf routines, and purge them from opencrypto. Reviewed by: sam Obtained from: NetBSD Sponsored by: spc.org	2003-12-15 21:49:41 +00:00
Jeff Roberson	86e1c22aa4	- Assign the ke_cpu field in kseq_notify() so that all of our callers do not have to do it. - Set the ke_runq to NULL in sched_add() before calling kseq_notify(). Otherwise we may panic in sched_add() if INVARIANTS is on.	2003-12-14 02:06:29 +00:00
Robert Watson	09a4a69c1d	Although sometimes to the uninitiated, it may seem like goup, KSEGOUP is actually spelt KSEGROUP. Go figure. Reported by: samy@kerneled.com	2003-12-12 21:25:56 +00:00
Jeff Roberson	cac77d0422	- Now that we have kseq groups, balance them separately. - The new sched_balance_groups() function does intra-group balancing while sched_balance() balances the available groups. - Pick a random time between 0 ticks and hz * 2 ticks to restart each balancing process. Each balancer has its own timeout. - Pick a random place in the list of groups to start the search for lowest and highest group loads. This prevents us from preferring a group based on numeric position. - Use a nasty hack to stop us from preferring cpu 0. The problem is that softclock always runs on cpu 0, so it always has a little extra load. We ignore this load in the balancer for now. In the future softclock should run on a random cpu and these hacks can go away.	2003-12-12 07:33:51 +00:00
Jeff Roberson	2e227f0406	- Don't let the pctcpu rate limiter throttle us if we have recorded over SCHED_CPU_TICKS ticks. This was allowing processes to display (1/SCHED_CPU_TIME * 100) % more cpu than they had used.	2003-12-11 04:23:39 +00:00
Jeff Roberson	b11fdad0fc	- In sched_switch(), if a thread has been assigned, don't touch the runqueues or load. These things have already been taken care of in sched_bind() which should be the only place that we're switching in an assigned thread.	2003-12-11 04:00:49 +00:00
Jeff Roberson	80f86c9f88	- Add support for CPU groups to ule. All SMT cores on the same physical cpu are added to a group. - Don't place a cpu into the kseq_idle bitmask until all cpus in that group have idled. - Prefer idle groups over idle group members in the new kseq_transfer() function. In this way we will prefer to balance load across full cores rather than add further load to a partial core. - Before a cpu goes idle, check the other group members for threads. Since SMT cpus may freely share threads, this is cheap. - SMT cores may be individually pinned and bound to now. This contrasts with the old mechanism where binding or pinning would have allowed a thread to run on any available cpu. - Remove some unnecessary logic from sched_switch(). Priority propagation should be properly taken care of in sched_prio() now.	2003-12-11 03:57:10 +00:00
Peter Wemm	5be4b10c89	Regen	2003-12-10 22:18:54 +00:00
Peter Wemm	5352eb6bb1	Update file locations for syscall tables to copy to.	2003-12-10 22:08:37 +00:00
Marcel Moolenaar	ccb46feb8e	Write the thread pointer (val) in the kse mailbox (loc) before we set the new context in kse_switchin(2). This allows us to return an error to the calling context when the suword() fails.	2003-12-10 01:59:23 +00:00
John Baldwin	67ba867827	Adjust an assertion for the TDF_TSNOBLOCK race handling in turnstile_unpend(). A racing thread that does not have TDI_LOCK set may either be running on another CPU or it may be sitting on a run queue if it was preempted during the very small window in turnstile_wait() between unlocking the turnstile chain lock and locking sched_lock.	2003-12-09 21:14:31 +00:00
John Baldwin	da1d503b22	Assert that we never give a thread a NULL turnstile when waking it up.	2003-12-09 21:09:54 +00:00
John Baldwin	6b6bd95ee5	Revert the previous race fix and replace it with a more general fix. The case of a turnstile having no threads is just one instance of the more general case where the thread we are examining has been partially awakened already in that it has been removed from the turnstile's blocked list but still has TDI_LOCK set. We detect that case by checking to see if the thread has already had a turnstile reassigned to it.	2003-12-09 21:09:04 +00:00
David Xu	a9a48d6862	Lock and unlock sched_lock when walking through the thread list. Currently we insert the kse upcall thread into the thread list at mi_switch time, so the process lock is not enough.	2003-12-07 23:47:15 +00:00
Don Lewis	50105bcf1a	Pass MTX_DEF as the last argument to mtx_init() instead of 0. This is not a functional change. The code happened to work properly only because MTX_DEF is defined as 0.	2003-12-07 21:53:41 +00:00
Poul-Henning Kamp	377e7be416	Make the DIAGNOSTIC code which complains about long {call|time}out(9) functions less noisy: We printf if a new function took longer than the previous record holder, or if the previous record holder took more than twice as long as the current record.	2003-12-07 20:03:28 +00:00
Marcel Moolenaar	cfa4b1e7b1	Regen due to kse_switchin(2).	2003-12-07 19:36:16 +00:00
Marcel Moolenaar	702b2a179c	Add kse_switchin(2). This syscall can be used by KSE implementations to have the kernel switch to a new thread, instead of doing it in userland. It is in fact needed on ia64 where syscall restarts do not return to userland first. It's completely handled inside the kernel. As such, any context created by the kernel as part of an upcall and caused by some syscall needs to be restored by the kernel.	2003-12-07 19:34:29 +00:00
Peter Wemm	a2640c9ba9	rqb_bits[] may be an int64_t (eg: on alpha, and recently on amd64). Be sure to shift (long)1 << 33 and higher, not (int)1. Otherwise bad things happen(TM). This is why beast.freebsd.org panicked with ULE. Reviewed by: jeff	2003-12-07 09:57:51 +00:00
Scott Long	774114995e	Re-arrange and consolidate some random debugging stuff	2003-12-07 05:04:49 +00:00
Alan Cox	bca62663ab	- Giant is no longer required by vm_thread_new().	2003-12-07 04:16:49 +00:00
Robert Watson	56d9e93207	Rename mac_create_cred() MAC Framework entry point to mac_copy_cred(), and the mpo_create_cred() MAC policy entry point to mpo_copy_cred_label(). This is more consistent with similar entry points for creation and label copying, as mac_create_cred() was called from crdup() as opposed to during process creation. For a number of policies, this removes the requirement for special handling when copying credential labels, and improves consistency. Approved by: re (scottl) Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-12-06 21:48:03 +00:00
John Baldwin	b6c71225a9	Fix all users of mp_maxid to use the same semantics, namely: 1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1. 2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid. Approved by: re (scottl) Tested on: i386, amd64, alpha	2003-12-03 14:57:26 +00:00
John Baldwin	45c1c90f6a	Export a few SMP related symbols in UP kernels as well. This is needed to aid other kernel code, especially code which can be in a module such as the acpi_cpu(4) driver, to work properly with both SMP and UP kernels. The exported symbols include mp_ncpus, all_cpus, mp_maxid, smp_started, and the smp_rendezvous() function. This also means that CPU_ABSENT() is now always implemented the same on all kernels. Approved by: re (scottl)	2003-12-03 14:55:31 +00:00
David Greenman	186e347f2c	Fixed a bug in sendfile(2) where the sent data would be corrupted due to sendfile(2) being erroneously automatically restarted after a signal is delivered. Fixed by converting ERESTART to EINTR prior to exiting. Updated manual page to indicate the potential EINTR error, its cause and consequences. Approved by: re@freebsd.org	2003-12-01 22:12:50 +00:00
Ian Dowse	25cb5d7a6b	In dounmount(), only call checkdirs() prior to VFS_UNMOUNT() in the forced unmount case. Otherwise, a file system that is referenced only by process fd_cdir/fd_rdir references to the file system root vnode will be successfully unmounted without the MNT_FORCE flag. The previous behaviour was not compatible with the unmount semantics required by amd(8), so file systems could be unexpectedly unmounted while there were still references to the file system root directory. Reported by: Erez Zadok <ezk@cs.sunysb.edu> Approved by: re (scottl)	2003-11-30 23:30:09 +00:00
Jeff Roberson	a6c6a93c89	- Don't forget to unlock the vnode interlock in the LK_NOWAIT case. Submitted by: Stephan Uphoff <ups@stups.com> Approved by: re (rwatson)	2003-11-30 22:09:58 +00:00
Alexander Kabaev	97c43a540a	Do not attempt to destroy NULL vfs options list. Approved by: re (scottl) Reported by: Christian Laursen <xi atborderworlds dot dk>	2003-11-23 17:13:48 +00:00
John Baldwin	798a45964d	- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid. cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is actually present and sets mp_ncpus and all_cpus. Splitting these up allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the CPU probing code to live in a module, for example, since sysinits in modules cannot be invoked prior to SI_SUB_KLD. This is needed to re-enable the ACPI module on i386. - For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating its contents in a few places. Also, add a smp_cpu_enabled() function to avoid duplicating some code. There is room for further code reduction later since much of this code is also present in cpu_mp_start(). - All archs besides i386 still set mp_maxid to the same values they set it to before this change. i386 now sets mp_maxid to MAXCPU. Tested on: alpha, amd64, i386, ia64, sparc64 Approved by: re (scottl)	2003-11-21 22:23:26 +00:00
Mark Murray	4e3a7a14d9	Fix a major faux pas of mine. I was causing 2 very bad things to happen in interrupt context; 1) sleep locks, and 2) malloc/free calls. 1) is fixed by using spin locks instead. 2) is fixed by preallocating a FIFO (implemented with a STAILQ) and using elements from this FIFO instead. This turns out to be rather fast. OK'ed by: re (scottl) Thanks to: peter, jhb, rwatson, jake Apologies to: *	2003-11-20 15:35:48 +00:00
Mark Murray	3fed54aaaa	Hackfix to patch around a kernel panic I introduced. Real fix to follow. In the meanwhile, we are not harvesting interrupt entropy. Approved by: re (jhb)	2003-11-18 14:35:43 +00:00
Robert Watson	a557af222b	Introduce a MAC label reference in 'struct inpcb', which caches the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer. This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check. For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update. Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy. Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-18 00:39:07 +00:00
Robert Watson	64d19c2ea7	Add a sysctl, security.bsd.see_other_gids, similar in semantics to see_other_uids but with the logical conversion. This is based on (but not identical to) the patch submitted by Samy Al Bahra. Submitted by: Samy Al Bahra <samy@kerneled.com>	2003-11-17 20:20:53 +00:00
Peter Wemm	0d2a298904	Initial landing of SMP support for FreeBSD/amd64. - This is heavily derived from John Baldwin's apic/pci cleanup on i386. - I have completely rewritten or drastically cleaned up some other parts. (in particular, bootstrap) - This is still a WIP. It seems that there are some highly bogus bioses on nVidia nForce3-150 boards. I can't stress how broken these boards are. I have a workaround in mind, but right now the Asus SK8N is broken. The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed. - Most of my testing has been with SCHED_ULE. SCHED_4BSD works. - the apic and acpi components are 'standard'. - If you have an nVidia nForce3-150 board, you are stuck with 'device atpic' in addition, because they somehow managed to forget to connect the 8254 timer to the apic, even though its in the same silicon! ARGH! This directly violates the ACPI spec.	2003-11-17 08:58:16 +00:00
Jeff Roberson	fa9c971710	- Mark ksq_assigned as volatile so that when this code is used without sched_lock we can be sure that we'll pick up the new value.	2003-11-17 08:27:11 +00:00
Jeff Roberson	093c05e39d	- Remove long dead code. rslices hasn't been used in some time and neither has sched_pickcpu().	2003-11-17 08:24:14 +00:00
Peter Wemm	90e3387e54	Expand the argument to the ithread enable/disable helper hooks from an int to something big enough to hold a pointer. amd64 needs this.	2003-11-17 06:08:10 +00:00
Robert Watson	b0323ea3aa	Implement sockets support for __mac_get_fd() and __mac_set_fd() system calls, and prefer these calls over getsockopt()/setsockopt() for ABI reasons. When addressing UNIX domain sockets, these calls retrieve and modify the socket label, not the label of the rendezvous vnode. - Create mac_copy_socket_label() entry point based on mac_copy_pipe_label() entry point, intended to copy the socket label into temporary storage that doesn't require a socket lock to be held (currently Giant). - Implement mac_copy_socket_label() for various policies. - Expose socket label allocation, free, internalize, externalize entry points as non-static from mac_net.c. - Use mac_socket_label_set() in __mac_set_fd(). MAC-aware applications may now use mac_get_fd(), mac_set_fd(), and mac_get_peer() to retrieve and set various socket labels without directly invoking the getsockopt() interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-16 23:31:45 +00:00
Robert Watson	9e71dd0feb	Reduce gratuitous redundancy and length in function names: mac_setsockopt_label_set() -> mac_setsockopt_label() mac_getsockopt_label_get() -> mac_getsockopt_label() mac_getsockopt_peerlabel_get() -> mac_getsockopt_peerlabel() Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-16 18:25:20 +00:00
Alan Cox	e45db9b837	- Modify alpha's sf_buf implementation to use the direct virtual-to- physical mapping. - Move the sf_buf API to its own header file; make struct sf_buf's definition machine dependent. In this commit, we remove an unnecessary field from struct sf_buf on the alpha, amd64, and ia64. Ultimately, we may eliminate struct sf_buf on those architectures except as an opaque pointer that references a vm page.	2003-11-16 06:11:26 +00:00
Robert Watson	12cbb9dc56	When implementing getsockopt() for SO_LABEL and SO_PEERLABEL, make sure to sooptcopyin() the (struct mac) so that the MAC Framework knows which label types are being requested. This fixes process queries of socket labels. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-16 03:53:36 +00:00
Bruce Evans	416ab90e6b	Localized the cy driver's locking.	2003-11-16 00:55:54 +00:00
Poul-Henning Kamp	d87526cf43	Rename the debugging mutex "callout_no_sleep" to "dont_sleep_in_callout".	2003-11-15 18:33:54 +00:00
Tim J. Robbins	4d93f53e74	Initialize sequence numbers to 0 in seminit() instead of using whatever garbage happens to be in memory. This did not seem to cause any problems except making semaphore ID's unpredictable (and ugly in ipcs(1) output).	2003-11-15 11:56:53 +00:00
Poul-Henning Kamp	00cbe31bd8	Send B_PHYS out to pasture, it no longer serves any function.	2003-11-15 09:28:09 +00:00
Alan Cox	28c9416429	- Remove the remaining now unnecessary checks for the buf's b_object being NULL. See revision 1.421 for more detail. - Remove GIANT_REQUIRED from vfs_unbusy_pages(). Discussed with: jeff	2003-11-15 08:45:36 +00:00
Jeff Roberson	155b9987a3	- Introduce kseq_runq_{add,rem}() which are used to insert and remove kses from the run queues. Also, on SMP, we track the transferable count here. Threads are transferable only as long as they are on the run queue. - Previously, we adjusted our load balancing based on the transferable count minus the number of actual cpus. This was done to account for the threads which were likely to be running. All of this logic is simpler now that transferable accounts for only those threads which can actually be taken. Updated various places in sched_add() and kseq_balance() to account for this. - Rename kseq_{add,rem} to kseq_load_{add,rem} to reflect what they're really doing. The load is accounted for separately from the runq because the load is accounted for even as the thread is running. - Fix a bug in sched_class() where we weren't properly using the PRI_BASE() version of the kg_pri_class. - Add a large comment that describes the impact of a seemingly simple conditional in sched_add(). - Also in sched_add() check the transferable count and KSE_CAN_MIGRATE() prior to checking kseq_idle. This reduces the frequency of access for kseq_idle which is a shared resource.	2003-11-15 07:32:07 +00:00
Olivier Houchard	1a29c80648	Better fix than my previous commit: in exit1(), make sure the p_klist is empty after sending NOTE_EXIT. The process won't report fork() or execve() and won't be able to handle NOTE_SIGNAL knotes anyway. This fixes some race conditions with do_tdsignal() calling knote() while the process is exiting. Reported by: Stefan Farfeleder <stefan@fafoe.narf.at> MFC after: 1 week	2003-11-14 18:49:01 +00:00
Alexander Kabaev	3b39740df8	Fix a number of style(9) bugs introduced in r1.113 by me. Suggested by: bde	2003-11-14 05:27:41 +00:00
Jeff Roberson	808674fd0e	- regen.	2003-11-14 03:49:41 +00:00
Jeff Roberson	5c49a0566a	- Revision 1.156 marked ptrace() SMP safe. Unfortunately, alpha implements parts of ptrace using proc_rwmem(). proc_rwmem() requires giant, and giant must be acquired prior to the proc lock, so ptrace must require giant still.	2003-11-14 03:48:37 +00:00
Poul-Henning Kamp	555a5de270	Various minor details: Give the HZ/overflow check a 10% margin. Eliminate bogus newline. If timecounters have equal quality, prefer higher frequency. Some inspiration from: bde	2003-11-13 10:03:58 +00:00
John Baldwin	79a13d0182	- Close a race where a thread on another CPU could release a contested lock and empty its turnstile while the blocking threads still pointed to the turnstile. If the thread on the first CPU blocked on a lock owned by one of the threads blocked on the turnstile just woken up, then the first CPU could try to manipulate a bogus thread queue in the turnstile during priority propagation. - Update locking notes for ts_owner and always clear ts_owner, not just under INVARIANTS. Tested by: sam (1)	2003-11-12 23:48:42 +00:00
Kirk McKusick	48b0f4b67d	At the request of several developers, restore the DIAGNOSTIC code deleted in 1.81. Increase the initial timeout limit to 2ms to eliminate spurious messages of excessive timeouts in the NFS client code. Requested by: Poul-Henning Kamp <phk@phk.freebsd.dk> Requested by: Mike Silbersack <silby@silby.com> Requested by: Sam Leffler <sam@errno.com>	2003-11-12 22:28:27 +00:00
Robert Watson	f0ab044241	Mark __mac_get_pid() as MPSAFE in the comment, as it runs without Giant and is also MPSAFE. Push Giant further down into __mac_get_fd() and __mac_set_fd(), grabbing it only for constrained regions dealing with VFS, and dropping it entirely for operations related to labeling of pipes. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-12 22:19:15 +00:00
Peter Wemm	cde6302bf0	MNAMELEN is back to an int again after Kirk's statfs commit kern/vfs_mount.c:1305: warning: signed size_t format, different type arg (arg 4) *** Error code 1	2003-11-12 17:09:12 +00:00
John Baldwin	861a7db56f	Fix a typo in a comment. Submitted by: das	2003-11-12 14:55:45 +00:00
Poul-Henning Kamp	1415a09d42	Replace B_PHYS conditional assignment to bio_offset with KASSERT check to see that the originating code already did it right.	2003-11-12 10:27:06 +00:00
Kirk McKusick	1977597b34	Update the five files derived from /sys/kern/syscalls.master after the additions made for the new statfs structure (version 1.157). These must be updated in a separate checkin after syscalls.master has been checked in so that they reflect its new CVS identity. As these are purely derived files, it is not clear to me why they are under CVS at all. I presume that it has something to do with having `make world' operate properly.	2003-11-12 08:09:19 +00:00
Kirk McKusick	fde81c7d8e	Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hoards of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.	2003-11-12 08:01:40 +00:00
Robert Watson	eca8a663d4	Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) due to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-12 03:14:31 +00:00
Alexander Kabaev	5c957adbf1	1. Consolidate mount struct allocation/destruction into a common code in vfs_mount_alloc/vfs_mount_destroy functions and take care to completely destroy the mount point along with its locks. Mount struct has grown in complexity recently and depending on each failure path to destroy it completely isn't working anymore. 2. Eliminate largely identical vfs_mount and vfs_unmount question by moving the code to handle both cases into a newly introduced vfs_domount function. 3. Simplify nfs_mount_diskless to always expect an allocated mount struct and never attempt an allocation/destruction itself. The vfs_allocroot allocation was there to support 'magic' swap space configuration for diskless clients that was already removed by PHK some time ago. 4. Include vfs_buildopts cleanups by Peter Edwards to validate the sanity of nmount parameters passed from userland. Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com> Reviewed by: rwatson	2003-11-12 02:54:47 +00:00
John Baldwin	961a7b244d	Add an implementation of turnstiles and change the sleep mutex code to use turnstiles to implement blocking instead of implementing a thread queue directly. These turnstiles are somewhat similar to those used in Solaris 7 as described in Solaris Internals but are also different. Turnstiles do not come out of a fixed-sized pool. Rather, each thread is assigned a turnstile when it is created that it frees when it is destroyed. When a thread blocks on a lock, it donates its turnstile to that lock to serve as queue of blocked threads. The queue associated with a given lock is found by a lookup in a simple hash table. The turnstile itself is protected by a lock associated with its entry in the hash table. This means that sched_lock is no longer needed to contest on a mutex. Instead, sched_lock is only used when manipulating run queues or thread priorities. Turnstiles also implement priority propagation inherently. Currently turnstiles only support mutexes. Eventually, however, turnstiles may grow two queues to support a non-sleepable reader/writer lock implementation. For more details, see the comments in sys/turnstile.h and kern/subr_turnstile.c. The two primary advantages from the turnstile code include: 1) the size of struct mutex shrinks by four pointers as it no longer stores the thread queue linkages directly, and 2) less contention on sched_lock in SMP systems including the ability for multiple CPUs to contend on different locks simultaneously (not that this last detail is necessarily that much of a big win). Note that 1) means that this commit is a kernel ABI breaker, so don't mix old modules with a new kernel and vice versa. Tested on: i386 SMP, sparc64 SMP, alpha SMP	2003-11-11 22:07:29 +00:00
Joseph Koshy	a5896914f0	Bound the number of iterations a thread can perform inside ktr_resize_pool(); this eliminates a potential livelock. Return ENOSPC only if we encountered an out-of-memory condition when trying to increase the pool size. Reviewed by: jhb, bde (style)	2003-11-11 09:09:26 +00:00
Joseph Koshy	b10221ffd9	Have utrace(2) return ENOMEM if malloc() fails. Document this error return in its manual page. Reviewed by: jhb	2003-11-11 04:54:11 +00:00
Alan Cox	e35e0182c3	- Revision 1.469 of vfs_subr.c resulted in the buf's b_object field being consistently initialized. Consequently, a number of conditionals that checked the validity of b_object before passing it to VM_OBJECT_LOCK() and VM_OBJECT_UNLOCK() are no longer needed.	2003-11-11 04:45:37 +00:00

7022 Commits