freebsd-skq

Author	SHA1	Message	Date
jhb	2660b97507	Count the context switch when blocking on a mutex as a voluntary context switch. Count the context switch when preempting the current thread to let a higher priority thread blocked on a mutex we just released run as an involuntary context switch. Reported by: bde	2001-06-25 18:29:32 +00:00
jhb	fdfd5d01a7	Count the switch when an ithread goes idle as a voluntary context switch. Submitted by: bde	2001-06-25 18:27:33 +00:00
dwmalone	79a843a087	Don't dereference a NULL pointer if we fail to get a sendfilebuf.	2001-06-24 12:27:30 +00:00
dillon	f8016646a9	After exhaustive discussions and some meandering and confusion, enough people are on track with the cause and effect of this, and although fixing this severely degenerate case appears to violate the letter of POSIX.1-200x, Bruce and I (and enough others) agree that it should be committed. So, this patch generates an ENOENT error for any attempt to do a path lookup through an empty symlink (e.g. open(), stat()). Submitted by: "Andrey A. Chernov" <ache@nagual.pp.ru> Reviewed by: bde Discussed exhaustively on: freebsd-current Previously committed to: NetBSD 4 years ago	2001-06-24 05:24:41 +00:00
jhb	dfa68807f4	- Lock CURSIG() with the proc lock to close the signal race with psignal. - Grab Giant around ktrace points. - Clean up KTR_PROC tracepoints to not display the value of sched_lock.mtx_lock as it isn't really needed anymore and just obfuscates the messages. - Add a few if conditions to replace gotos. - Ensure that every msleep KTR event ends up with a matching msleep resume KTR event (this was broken when we didn't do a mi_switch()). - Only note via ktrace that we resumed from a switch once rather than twice in several places in msleep(). - Remove spl's from asleep and await as the proc lock and sched_lock provide all the needed locking. - In mawait() add in a needed ktrace point for noting that we are about to switch out.	2001-06-22 23:11:26 +00:00
jhb	ae99243f0b	- Lock CURSIG with the proc lock and don't release the proc lock until after grabbing the sched lock to close a race. - Lock ktrace points with Giant.	2001-06-22 23:06:38 +00:00
jhb	e5e16e09ad	- Grab the proc lock around CURSIG and postsig(). Don't release the proc lock until after grabbing the sched_lock to avoid CURSIG racing with psignal. - Don't grab Giant for addupc_task() as it isn't needed. Reported by: tegge (signal race), bde (addupc_task a while back)	2001-06-22 23:05:11 +00:00
jhb	8210b8d106	- Change CURSIG() and postsig() to require that the proc lock is held rather than grabbing it and releasing it themselves. This allows callers of these functions to get the lock to close race conditions. - Grab Giant around ktrace in postsig. - Count the switches performed on SIGSTOP's as involuntary context switches in the resource usage stats. Reported by: tegge (signal race), bde (missing csw stats)	2001-06-22 23:02:37 +00:00
mjacob	95a162e88f	int -> size_t fix	2001-06-22 19:54:38 +00:00
mjacob	4127a25756	Temporary fix at least: define NCPU_PRESENT, which will be mp_ncpus for SMP kernels, one (1) for non-SMP.	2001-06-22 16:03:23 +00:00
pirzyk	773adf0e44	changed hostid from long to unsigned long to be able to store values > 2GB on i386 platforms. Also changed SYSCTL type from INT to ULONG and removed comment about it. PR: kern/21132 MFC after: 1 month	2001-06-22 16:03:14 +00:00
bmilekic	5d710b296b	Introduce numerous SMP friendly changes to the mbuf allocator. Namely, introduce a modified allocation mechanism for mbufs and mbuf clusters; one which can scale under SMP and which offers the possibility of resource reclamation to be implemented in the future. Notable advantages: o Reduce contention for SMP by offering per-CPU pools and locks. o Better use of data cache due to per-CPU pools. o Much less code cache pollution due to excessively large allocation macros. o Framework for `grouping' objects from same page together so as to be able to possibly free wired-down pages back to the system if they are no longer needed by the network stacks. Additional things changed with this addition: - Moved some mbuf specific declarations and initializations from sys/conf/param.c into mbuf-specific code where they belong. - m_getclr() has been renamed to m_get_clrd() because the old name is really confusing. m_getclr() HAS been preserved though and is defined to the new name. No tree sweep has been done "to change the interface," as the old name will continue to be supported and is not deprecated. The change was merely done because m_getclr() sounds too much like "m_get a cluster." - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and systat(1) (see TODO below). - Fixed systat(1) to display number of "free mbufs" based on new per-CPU stat structures. - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported per-CPU stat structures. All infos are fetched via sysctl. TODO (in order of priority): - Re-enable mbtypes statistics in both netstat(1) and systat(1) after introducing an SMP friendly way to collect the mbtypes stats under the already introduced per-CPU locks (i.e. hopefully don't use atomic() - it seems too costly for a mere stat update, especially when other locks are already present). - Optionally have systat(1) display not only "total free mbufs" but also "total free mbufs per CPU pool." - Fix minor length-fetching issues in netstat(1) related to recently re-enabled option to read mbuf stats from a core file. - Move reference counters at least for mbuf clusters into an unused portion of the cluster itself, to save space and need to allocate a counter. - Look into introducing resource freeing possibly from a kproc. Reviewed by (in parts): jlemon, jake, silby, terry Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha) Preliminary performance measurements: jlemon (and me, obviously) URL: http://people.freebsd.org/~bmilekic/mb_alloc/	2001-06-22 06:35:32 +00:00
jhb	092b28a542	Fix some lock order reversals where we called free() while holding a proc lock. We now use temporary variables to save the process argument pointer and just update the pointer while holding the lock. We then perform the free on the cached pointer after releasing the lock.	2001-06-20 23:10:06 +00:00
bmilekic	70d52016a3	Change m_devget()'s outdated and unused `offset' argument to actually mean something: offset into the first mbuf of the target chain before copying the source data over. Make drivers using m_devget() with a first argument "data - ETHER_ALIGN" to use the offset argument to pass ETHER_ALIGN in. The way it was previously done is potentially dangerous if the source data was at the top of a page and the offset caused the previous page to be copied (if the previous page has not yet been appropriately mapped). The old `offset' argument in m_devget() is not used anywhere (it's always 0) and dates back to ~1995 (and earlier?) when support for ethernet trailers existed. With that support gone, it was merely collecting dust. Tested on alpha by: jlemon Partially submitted by: jlemon Reviewed by: jlemon MFC after: 3 weeks	2001-06-20 19:48:35 +00:00
jhb	f466ba0f8e	Preemption by an interrupt thread is an involuntary switch, not a voluntary one. Pointy-hat to: me	2001-06-20 18:26:41 +00:00
des	8a223ad3ce	Constify (silence warnings introduced by last commit to sys/module.h)	2001-06-20 16:08:45 +00:00
wollman	204b9a8a22	After one too many PRs on the subject, bite the bullet and define IOV_MAX and its associated constants. Implement _SC_IOV_MAX in the usual way. Be a bit sloppy about the namespace question; this should get cleared up in time for 5.0. MFC after: 1 month	2001-06-18 20:24:54 +00:00
jhb	1852c17085	Lock Giant in postsig() for the KTRACE case as ktrpsig() needs Giant when it writes out to the trace file. Reported by: peter, gallatin, and others	2001-06-18 19:23:43 +00:00
brian	a08eb62a0d	Add linker_reference_module(). This function loads a module if required, otherwise bumps the reference count -- the opposite of linker_file_unload().	2001-06-18 15:09:33 +00:00
brian	59c2ccba3b	Don't remove the SI_CHEAPCLONE for unsupported minors	2001-06-18 09:22:30 +00:00
peter	e05ff7e2d6	Move setugid() a little sooner to before we release tracing in case crdup() or change_e*id() block on malloc() or mutex.	2001-06-16 23:34:23 +00:00
peter	38ecd59e07	Add INTR_TYPE_AV so that we can get to the PI_AV priority in the ithread handlers. This is beneficial since it means that pcm's MPSAFE handler can get run before things that will block on Giant in the shared irq case.	2001-06-16 22:42:19 +00:00
jlemon	d115ce425b	Fix warnings: 112: warning: cast to pointer from integer of different size 125: warning: cast to pointer from integer of different size	2001-06-16 07:02:47 +00:00
jlemon	0dbb10c226	Correctly hook up the write kqfilter to pipes. Submitted by: Niels Provos <provos@citi.umich.edu>	2001-06-15 20:45:01 +00:00
peter	51d35ea75c	Fix some warnings in kern_environment.c. Make the getenv*() family take a const 'name', since they don't modify anything. 159: warning: passing arg 1 of `getenv_int' discards qualifiers... 167: warning: passing arg 1 of `getenv' discards qualifiers from pointer..	2001-06-15 07:29:17 +00:00
peter	17e3eb1d7f	As per comments in sys/linker_set.h: BANG! BANG! BANG! BANG! BANG! BANG! CLICK! CLICK! CLICK! CLICK! CLICK! <reload> BANG! BANG! BANG! BANG! BANG! BANG! CLICK! CLICK! CLICK! CLICK! CLICK!	2001-06-14 01:28:56 +00:00
peter	f10fa038c1	With this commit, I hereby pronounce gensetdefs past its use-by date. Replace the a.out emulation of 'struct linker_set' with something a little more flexible. <sys/linker_set.h> now provides macros for accessing elements and completely hides the implementation. The linker_set.h macros have been on the back burner in various forms since 1998 and have ideas and code from Mike Smith (SET_FOREACH()), John Polstra (ELF clue) and myself (cleaned up API and the conversion of the rest of the kernel to use it). The macros declare a strongly typed set. They return elements with the type that you declare the set with, rather than a generic void *. For ELF, we use the magic ld symbols (__start_<setname> and __stop_<setname>). Thanks to Richard Henderson <rth@redhat.com> for the trick about how to force ld to provide them for kld's. For a.out, we use the old linker_set struct. NOTE: the item lists are no longer null terminated. This is why the code impact is high in certain areas. The runtime linker has a new method to find the linker set boundaries depending on which backend format is in use. linker sets are still module/kld unfriendly and should never be used for anything that may be modular one day. Reviewed by: eivind	2001-06-13 10:58:39 +00:00
peter	a97b956712	Patch up a blunder I made a few days ago. nmbcnt was being initialized too late. Noted by: bmilekic Pointy-hat to: peter	2001-06-13 00:36:41 +00:00
peter	bbbe8875f0	Hints overhaul: - Replace some very poorly thought out API hacks that should have been fixed a long while ago. - Provide some much more flexible search functions (resource_find_*()) - Use strings for storage instead of an outgrowth of the rather inconvenient temporary ioconf table from config(). We already had a fallback to using strings before malloc/vm was running anyway.	2001-06-12 09:40:04 +00:00
des	3463e6d056	Rename nextpid to lastpid and externalize it.	2001-06-11 21:54:19 +00:00
des	7da6da146f	Blah, I cut out a tad too much in the previous commit. (thanks again, Jake!)	2001-06-11 18:43:32 +00:00
des	b21baf0f69	copyin(9) doesn't return ENAMETOOLONG. (thanks, Jake!)	2001-06-11 18:36:18 +00:00
des	86b7e548ab	Add sbuf_copyin(). Also add 'b' variants of sbuf_{cat,copyin,cpy}() which ignore NUL bytes in the source string.	2001-06-11 17:05:52 +00:00
ume	832f8d2249	Sync with recent KAME. This work was based on kame-20010528-freebsd43-snap.tgz, and some critical problems found after the snap was out were fixed. There are many many changes since last KAME merge. TODO: - The definitions of SADB_* in sys/net/pfkeyv2.h are still different from RFC2407/IANA assignment because of binary compatibility issue. It should be fixed under 5-CURRENT. - ip6po_m member of struct ip6_pktopts is no longer used. But, it is still there because of binary compatibility issue. It should be removed under 5-CURRENT. Reviewed by: itojun Obtained from: KAME MFC after: 3 weeks	2001-06-11 12:39:29 +00:00
dwmalone	46ac202c04	Try to make the setting of the SIGCHLD handler the same as setting of the NOCLDWAIT flag. SUSv2 seems to require this. Submitted by: Cejka Rudolf <cejkar@dcse.fee.vutbr.cz> Reviewed by: dillon	2001-06-11 09:15:41 +00:00
des	23c38e4e7c	sbuf_new(9) now returns a struct sbuf * instead of an int. If the caller does not provide a struct sbuf, sbuf_new(9) will allocate one and return a pointer to it.	2001-06-10 15:48:04 +00:00
peter	4b91e2ecf0	"Fix" the previous initial attempt at fixing TUNABLE_INT(). This time around, use a common function for looking up and extracting the tunables from the kernel environment. This saves duplicating the same function over and over again. This way typically has an overhead of 8 bytes + the path string, versus about 26 bytes + the path string.	2001-06-08 05:24:21 +00:00
peter	c1df44ae51	Back out part of my previous commit. This was a last minute change and I botched testing. This is a perfect example of how NOT to do this sort of thing. :-(	2001-06-07 03:17:26 +00:00
tmm	0e4c21f7c1	Fix an instance of NDINIT in the extattrctl syscall: LOCKLEAF was or'ed to the operation parameter, not to the flags as it should be. Reviewed by: rwatson	2001-06-06 23:34:38 +00:00
peter	0732738ec4	Make the TUNABLE_() macros look and behave more consistently like the SYSCTL_() macros. TUNABLE_INT_DECL() was an odd name because it didn't actually declare the int, which is what the name suggests it would do.	2001-06-06 22:17:08 +00:00
jhb	df7d2486bc	We don't need to hold a lock just to test a flag.	2001-06-06 22:05:48 +00:00
ru	d698c5d44e	Unbreak setregid(2). Spotted by: Alexander Leidinger <Alexander@Leidinger.net>	2001-06-06 13:58:03 +00:00
jhb	7a4f835060	Don't hold sched_lock across addupc_task(). Reported by: David Taylor <davidt@yadt.co.uk> Submitted by: bde	2001-06-06 00:57:24 +00:00
dd	1c7d10ac21	Add a line discipline close routine which restores some functionality I accidentally nuked in rev. 1.54. Also rework the error handling in snplwrite a little.	2001-06-05 05:07:53 +00:00
dd	c35e39a5cb	Style and cosmetic cleanups. This driver is now reasonably style(9) compliant. All the variable definitions and function names are reasonably consistent, and the functions which should be static (i.e., all of them) are. Other assorted fixes were made. The majority of the delta is indentation fixes. Partially reviewed by: bde	2001-06-05 05:00:17 +00:00
dd	b646de742c	Use the l_nullioctl exported from tty_conf.c rather than rolling our own.	2001-06-04 23:31:21 +00:00
dd	c6d2a1e6f9	Unstaticize l_nullioctl; it is needed elsewhere (like in tty_snoop.c). Suggested by: bde	2001-06-04 23:30:47 +00:00
dillon	7f9e532290	The pipe_write() code was locking the pipe without busying it first in certain cases, and a close() by another process could potentially rip the pipe out from under the (blocked) locking operation. Reported-by: Alexander Viro <viro@math.psu.edu>	2001-06-04 04:04:45 +00:00
dd	eaa7d6fe18	Remove unused includes, use *min() inline functions rather than a home-grown macro, rewrite a confusing conditional in snpdevtotty(), and change ibuf to 512 bytes instead of 1024 bytes in dsnwrite(). Reviewed by: bde	2001-06-03 05:17:39 +00:00
dd	e9e92e57f1	When trying to find out if this is a request for a write in kernel_sysctl and userland_sysctl, check for whether new is NULL, not whether newlen is 0. This allows one to set a string sysctl to "".	2001-06-03 04:58:51 +00:00
dd	824074e63c	Include sys/mutex.h to silence a warning.	2001-06-03 02:19:07 +00:00
jesper	42275c0786	Revert the last bits of my bogus move of NMBCLUSTERS to <sys/param.h>	2001-06-01 21:47:34 +00:00
tmm	9ce8a62347	Clean up the code exporting interrupt statistics via sysctl a bit: - move the sysctl code to kern_intr.c - do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine the length of the intrcnt array - move the declarations of intrnames, eintrnames, intrcnt and eintrcnt from machine-dependent include files to sys/interrupt.h - remove the hw.nintr sysctl, it is not needed. - fix various style bugs Requested by: bde Reviewed by: bde (some time ago)	2001-06-01 13:23:28 +00:00
ru	e7a85be33f	Remove vestiges of MFS.	2001-06-01 10:07:28 +00:00
obrien	538a64fd6b	Back out jesper's 2001/05/31 14:58:11 PDT commit. It does not compile.	2001-06-01 09:51:14 +00:00
jesper	51b1367e42	Move the definition of NMBCLUSTERS from src/sys/kern/uipc_mbuf.c to <sys/param.h>, so it's available to src/sys/netinet/ip_input.c, and remove the now unneeded includes of "opt_param.h". MFC after: 1 week	2001-05-31 21:56:44 +00:00
dd	32b1a4110c	Export via sysctl: * all members of msginfo from sysv_msg.c; * msqids from sysv_msg.c; * sema from sysv_sem.c; and * shmsegs from sysv_shm.c; These will be used by ipcs(1) in non-kvm mode. Reviewed by: tmm	2001-05-30 03:28:59 +00:00
phk	8e6e314c51	Remove the hack-around for the slice/label code, it didn't cover the hole.	2001-05-29 18:19:57 +00:00
iedowse	fdb42dd4bf	Since the netexport struct was centralised to 'struct mount', attempting to remove nonexistent exports with MNT_DELEXPORT returns an error; before this change it always succeeded. This caused mountd(8) to log "can't delete exports for /whatever" warnings. Change the error code from EINVAL to a more specific ENOENT, and make mountd ignore this error when deleting the export list. I could have just restored the previous behaviour of returning success, but I think an error return is a useful diagnostic. Reviewed by: phk	2001-05-29 17:46:52 +00:00
phk	3c8f4ed442	Remove a comment which was past its shelf life. PR: 18750 Submitted by: Tony Finch <dot@dotat.at>	2001-05-29 09:22:22 +00:00
phk	aaaac2aa6c	With the new kernel dev_t conversions done at release 4.X, it becomes possible to trap in ptsstop() in kern/tty_pty.c if the slave side has never been opened during the life of a kernel. What happens is that calls to ttyflush() done from ptyioctl() for the controlling side end up calling ptsstop() [via (tp->t_stop)(tp, <X>)] which evaluates the following: struct pt_ioctl pti = tp->t_dev->si_drv1; In order for tp->t_dev to be set, the slave device must first be opened in ttyopen() [kern/tty.c]. It appears that the only problem is calls to (*tp->t_stop)(tp, <n>), so this could also happen with other ioctls initiated by the controlling side before the slave has been opened. PR: 27698 Submitted by: David Bein bein@netapp.com MFC after: 6 days	2001-05-28 20:22:12 +00:00
phk	8a71a60369	The disklabel/slice code is more twisted than I thought. Revert to calling the cdevsw_add() unconditionally.	2001-05-28 16:12:55 +00:00
brian	723d7c5d15	Handle NULL struct device *s	2001-05-28 01:00:03 +00:00
rwatson	8c5428e595	o uifree() the cr_ruidinfo in crfree() as well as cr_uidinfo now that the real uid info is in the credential also. Submitted by: egge	2001-05-27 21:43:46 +00:00
rwatson	6937691789	o pcred-removal changes included modifications to optimize the setting of the saved uid and gid during execve(). Unfortunately, the optimizations were incorrect in the case where the credential was updated, skipping the setting of the saved uid and gid when new credentials were generated. This change corrects that problem by handling the newcred!=NULL case correctly. Reported/tested by: David Malone <dwmalone@maths.tcd.ie> Obtained from: TrustedBSD Project	2001-05-26 19:59:44 +00:00
phk	2072a71f0e	Create a general facility for making dev_t's depend on another dev_t. The dev_depends(dev_t, dev_t) function is for tying them to each other. When destroy_dev() is called on a dev_t, all dev_t's depending on it will also be destroyed (depth first order). Rewrite the make_dev_alias() to use this dependency facility. kern/subr_disk.c: Make the disk mini-layer use dependencies to make sure all relevant dev_t's are removed when the disk disappears. Make the disk mini-layer precreate some magic sub devices which the disk/slice/label code expects to be there. kern/subr_disklabel.c: Remove some now unneeded variables. kern/subr_diskmbr.c: Remove some ancient, commented out code. kern/subr_diskslice.c: Minor cleanup. Use name from dev_t instead of dsname()	2001-05-26 08:27:58 +00:00
jhb	e736b41c69	Add vm locking to sendfile(2) and sf_buf_free(). Reported by: Tamiji Homma <thomma@BayNetworks.com> Tested by: Tamiji Homma <thomma@BayNetworks.com>	2001-05-25 19:23:04 +00:00
rwatson	f504530d9f	o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing the original macro that pointed p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restructure many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparison. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done. o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account. Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit	2001-05-25 16:59:11 +00:00
phk	71a2c9473e	Make the PTY drivers cloning algorithm create "CHEAPCLONE" dev_t, so that some twit cannot allocate all 256 PTY's with "ls -l".	2001-05-25 13:23:42 +00:00
phk	977405e25b	Use the name given to the dev_t, rather than creating our own. This makes it possible to give sensible information for /dev/fd.720 and similar "special" devices.	2001-05-25 09:06:52 +00:00
ru	8094d979ca	- sys/msdosfs moved to sys/fs/msdosfs - msdos.ko renamed to msdosfs.ko - /usr/include/msdosfs moved to /usr/include/fs/msdosfs	2001-05-25 08:14:14 +00:00
phk	991876e15b	Don't rely on cdevsw_add() when we hack about with dev_t's.	2001-05-24 20:28:06 +00:00
phk	170ee567dd	Don't take the detour around devsw() to find out if the proto-cdevsw is already initialized.	2001-05-24 20:27:16 +00:00
alfred	b5d4bfc0e3	whitespace/style	2001-05-24 18:06:22 +00:00
dillon	a179ee09ab	This patch implements O_DIRECT about 80% of the way. It takes a patchset Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece. Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency. I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet. Submitted by: tegge, dillon	2001-05-24 07:22:27 +00:00
dd	439c2f6dae	Correct style bugs with regards to long lines and comments. Reviewed by: bde	2001-05-23 23:38:05 +00:00
jhb	8d7fd621d7	Don't acquire Giant just to call trap_fatal(), we are about to panic anyway so we'd rather see the printf's than block if the system is hosed.	2001-05-23 22:58:09 +00:00
jhb	2441ff245e	Don't release Giant around vm_object_page_clean() in fsync() as the pager putpages called will need Giant.	2001-05-23 22:55:13 +00:00
jhb	7de84bf1d3	- Always call bfreekva() w/o vm_mtx held. - Always call vfs_setdirty() with vm_mtx held. - Fix an old comment: vm_hold_unload_pages is called vm_hold_free_pages() nowadays. - Always call vm_hold_free_pages() w/o vm_mtx held.	2001-05-23 22:24:49 +00:00
jhb	87198ba253	- Lock the VM when initializing the vmspace for proc0. - Don't bother releasing Giant while doing a lookup on the vm_map of initproc while starting up init. We have to grab it again right after the lookup anyways.	2001-05-23 22:06:47 +00:00
jhb	f8b92da193	Lock the VM while twiddling the vmspace.	2001-05-23 22:05:08 +00:00
bmilekic	e328ed5df3	Increment mbstat.m_mpfail, not mbstat.m_mcfail, when m_pullup() fails. This slipped in accidentally a few commits back.	2001-05-23 20:44:54 +00:00
jhb	719e9bc0bf	Don't release the vm lock just to turn around and grab it again.	2001-05-23 19:51:12 +00:00
jhb	3f4e4d353c	Add in assertions to ensure that we always call msleep or mawait with either a timeout or a held mutex to detect unprotected infinite sleeps that can easily lead to deadlock. Submitted by: alfred	2001-05-23 19:38:26 +00:00
phk	b39746576c	syslogd gets kernel log messages only once every 30 seconds or at the top of the minute, whichever comes first. It seems logtimeout() is only called once after the kernel log is opened and then never again after that. So I guess syslogd only gets kernel log messages by virtue of syncer(4)'s flushes ...? PR: 27361 Submitted by: pkern@utcc.utoronto.ca MFC after: 1 week	2001-05-23 19:02:50 +00:00
alfred	ba66967415	acquire vm_mutex a little bit earlier to protect a pmap call.	2001-05-23 10:26:36 +00:00
ru	35437d86aa	- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file systems were repo-copied from sys/miscfs to sys/fs. - Renamed the following file systems and their modules: fdesc -> fdescfs, portal -> portalfs, union -> unionfs. - Renamed corresponding kernel options: FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS. - Install header files for the above file systems. - Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland Makefiles.	2001-05-23 09:42:29 +00:00
dd	adab6d99db	Unifdef DEV_SNP; snp(4) no longer requires these ugly hacks. Silence by: -hackers, -audit	2001-05-22 22:16:18 +00:00
dd	3ad1483a6b	Convert this driver to (ab?)use line disciplines to get the input it needs instead of relying on idiosyncratic hacks in the tty subsystem. Also add module code since this can now be compiled as a module. Silence by: -hackers, -audit	2001-05-22 22:13:14 +00:00
bde	5fd5877aef	Convert npx interrupts into traps instead of vice versa. This is much simpler for npx exceptions that start as traps (no assembly required...) and works better for npx exceptions that start as interrupts (there is no longer a problem for nested interrupts). Submitted by: original (pre-SMPng) version by luoqi	2001-05-22 21:20:49 +00:00
dd	0ec004ce3f	Correct the vm_mtx handling; specifically, don't acquire it in shm_deallocate_segment because shmexit_myhook calls it, and the latter should always be called with it already held. Submitted by: dwmalone, dd Approved by: alfred	2001-05-22 03:56:26 +00:00
alfred	81a30a0ee0	Remove KASSERT test for sleeping on vm_mtx, instead let WITNESS catch it. Requested by: jhb	2001-05-22 00:58:20 +00:00
jhb	d47e07ca44	Sort includes.	2001-05-21 18:52:02 +00:00
jhb	0df006928d	- Assert that the vm mutex is held in pipe_free_kmem(). - Don't release the vm mutex early in pipespace() but instead hold it across vm_object_deallocate() if vm_map_find() returns an error and across pipe_free_kmem() if vm_map_find() succeeds. - Add a XXX above a zfree() since zalloc already has its own locking, one would hope that zfree() wouldn't need the vm lock.	2001-05-21 18:47:17 +00:00
jhb	50d57b68fb	Axe unneeded spl()'s.	2001-05-21 18:30:50 +00:00
alfred	cdb5d97b47	Acquire vm mutex when releasing sysv shm segments. Obtained from: Dima Dorfman <dima@unixfreak.org>	2001-05-20 20:37:47 +00:00
jlemon	73537a93c4	Add convenience function kernel_sysctlbyname() for kernel consumers, so they don't have to roll their own sysctlbyname function.	2001-05-19 05:45:55 +00:00
alfred	8aa6c173b0	remove my private assertions from tsleep. add one assertion to ensure we don't sleep while holding vm.	2001-05-19 01:40:48 +00:00
alfred	3285b062fc	Regen syscalls that were made mpsafe via vm_mtx obreak, getpagesize, sbrk, sstk, mmap, ovadvise, munmap, mprotect, madvise, mincore, mmap, mlock, munlock, minherit, msync, mlockall, munlockall	2001-05-19 01:37:12 +00:00
alfred	a3f0842419	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb	2001-05-19 01:28:09 +00:00
jhb	028d04532e	- Don't panic on a try lock operation for a sleep lock if we hold a spin lock. Since we won't actually block on a try lock operation, it's not a problem. Add a comment explaining why it is safe to skip lock order checking with try locks. - Remove the ithread list lock spin lock from the order list.	2001-05-17 22:44:56 +00:00
jhb	7d56154098	- Remove the global ithread_list_lock spin lock in favor of per-ithread sleep locks. - Delay returning from ithread_remove_handler() until we are certain that the interrupt handler being removed has in fact been removed from the ithread. - XXX: There is still a problem in that nothing protects the kernel from adding a new handler while the ithread is running, though with our current architectures this is not a problem. Requested by: gibbs (2)	2001-05-17 22:43:26 +00:00
jhb	59ffccfbd6	- Move the setting of bootverbose to a MI SI_SUB_TUNABLES SYSINIT. - Attach a writable sysctl to bootverbose (debug.bootverbose) so it can be toggled after boot. - Move the printf of the version string to a SI_SUB_COPYRIGHT SYSINIT just after the display of the copyright message instead of doing it by hand in three MD places.	2001-05-17 22:28:46 +00:00
rwatson	51497c8b5f	o Modify access control checks in p_candebug() such that the policy is as follows: the effective uid of p1 (subject) must equal the real, saved, and effective uids of p2 (object), p2 must not have undergone a credential downgrade. A subject with appropriate privilege may override these protections. In the future, we will extend these checks to require that p1 effective group membership must be a superset of p2 effective group membership. Obtained from: TrustedBSD Project	2001-05-17 21:48:44 +00:00
alfred	91f8c63e8d	Cleanup Remove comment about setting error for reads on EOF, read returns 0 on EOF so the code should be ok. Remove non-effective priority boost, PRIO+1 doesn't do anything (according to McKusick), if a real priority boost is needed it should have been +4. Style fixes: .) return foo -> return (foo) .) FLAG1|FlAG2 -> FLAG1 | FlAG2 .) wrap long lines .) unwrap short lines .) for(i=0;i=foo;i++) -> for (i = 0; i=foo; i++) .) remove braces for some conditionals with a single statement .) fix continuation lines. md5 couldn't verify the binary because some code had to be shuffled around to address the style issues.	2001-05-17 19:47:09 +00:00
alfred	2f1ebc4a57	initialize pipe pointers	2001-05-17 18:22:58 +00:00
alfred	37f8fb3daa	pipe_create has to zero out the select record earlier to avoid returning a half-initialized pipe and causing pipeclose() to follow a junk pointer. Discovered by: "Nick S" <snicko@noid.org>	2001-05-17 17:59:28 +00:00
iedowse	dafd513732	Change the second argument of vflush() to an integer that specifies the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode. All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can. This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount. Reviewed by: phk, bp	2001-05-16 18:04:37 +00:00
alfred	f678704709	remove include of ipl.h because it no longer exists	2001-05-16 02:52:06 +00:00
jhb	3fbeaa9056	Remove unneeded includes of sys/ipl.h and machine/ipl.h.	2001-05-15 23:22:29 +00:00
jhb	1ee607ffea	- Remove unneeded include of sys/ipl.h. - Lock the process before calling killproc() to kill it for exceeding the maximum CPU limit.	2001-05-15 23:15:06 +00:00
jhb	36dcd7aa2f	- Remove unneeded include of sys/ipl.h. - Require the proc lock be held for killproc() to allow for the vmdaemon to kill a process when memory is exhausted while holding the lock of the process to kill.	2001-05-15 23:13:58 +00:00
brian	2a1cde5ff9	Support /dev/ctty again Submitted by: peter	2001-05-15 18:12:38 +00:00
tanimura	ac163f8025	Back out scanning file descriptors with holding a process lock. selrecord() requires allproc sx in pfind(), resulting in lock order reversal between allproc and a process lock.	2001-05-15 10:19:57 +00:00
jlemon	0d0226733a	When calling poll() on a fd associated with a filesystem, let POLLIN/POLLOUT behave identically to POLLRDNORM/POLLWRNORM. Submitted by: bde PR: 27287 merge after: 1 week	2001-05-14 14:37:25 +00:00
phk	112e6fb138	Use the new ability to avoid practically all the gunk in this file. When people access /dev/tty, locate their controlling tty and return the dev_t of it to them. This basically makes /dev/tty act like a variant symlink sort of thing which is much simpler than all the mucking about with vnodes.	2001-05-14 08:22:56 +00:00
tanimura	90ac553bec	- Convert msleep(9) in select(2) and poll(2) to cv_wait(9). - Since polling should not involve sleeping, keep holding a process lock upon scanning file descriptors. - Hold a reference to every file descriptor prior to entering polling loop in order to avoid lock order reversal between lockmgr and p_mtx upon calling fdrop() in fo_poll(). (NOTE: this work has not been done for netncp and netsmb yet because a socket itself has no reference counts.) Reviewed by: jhb	2001-05-14 05:26:48 +00:00
jhb	c20ad9aee2	Simplify the vm fault trap handling code a bit by using if-else instead of duplicating code in the then case and then using a goto to jump around the else case.	2001-05-11 23:50:08 +00:00
iedowse	33f5635b77	In vrele() and vput(), avoid triggering the confusing "missed vn_close" KASSERT when vp->v_usecount is zero or negative. In this case, the "v*: negative ref cnt" panic that follows is much more appropriate. Reviewed by: mckusick	2001-05-11 20:42:41 +00:00
jhb	2290c1ae6c	Check witness_dead in more functions to avoid panic'ing when assertions fail due to witness exhausting its internal resources and shutting down. Reported by: Szilveszter Adam <sziszi@petra.hos.u-szeged.hu> Tested by: David Wolfskill <david@catwhisker.org>	2001-05-11 20:25:29 +00:00
tegge	ee2cea66da	Regenerate.	2001-05-11 17:05:47 +00:00
tegge	b33cad4eaf	gettimeofday() is MP safe on both -current and -stable.	2001-05-11 17:05:12 +00:00
jhb	41fc4419f3	- Split out the support for per-CPU data from the SMP code. UP kernels have per-CPU data and gdb on the i386 at least needs access to it. - Clean up includes in kern_idle.c and subr_smp.c. Reviewed by: jake	2001-05-10 17:45:49 +00:00
alfred	4285179bde	Remove an 'optimization' I hope to never see again. The pipe code could not handle running out of kva, it would panic if that happened. Instead return ENFILE to the application which is an acceptable error return from pipe(2). There were some slightly tricky things that needed to be worked on, namely that the pipe code can 'realloc' the size of the buffer if it detects that the pipe could use a bit more room. However if it failed the reallocation it could not cope and would panic. Fix this by attempting to grow the pipe while holding onto our old resources. If all goes well free the old resources and use the new ones, otherwise continue to use the smaller buffer already allocated. While I'm here add a few blank lines for style(9) and remove 'register'.	2001-05-08 09:09:18 +00:00
phk	b424c063ea	Always initialize bio_resid from bio_bcount in the disk mini-layer so that the drivers don't have to do it umpteen times.	2001-05-08 08:24:54 +00:00
knu	fa8314227c	Properly copy the P_ALTSTACK flag in struct proc::p_flag to the child process on fork(2). It is the supposed behavior stated in the manpage of sigaction(2), and Solaris, NetBSD and FreeBSD 3-STABLE correctly do so. The previous fix against libc_r/uthread/uthread_fork.c fixed the problem only for the programs linked with libc_r, so back it out and fix fork(2) itself to help those not linked with libc_r as well. PR: kern/26705 Submitted by: KUROSAWA Takahiro <fwkg7679@mb.infoweb.ne.jp> Tested by: knu, GOTOU Yuuzou <gotoyuzo@notwork.org>, and some other people Not objected by: hackers MFC in: 3 days	2001-05-07 18:07:29 +00:00
phk	d95099399d	Make the disk mini-layer check for and handle zero-length transfers instead of the underlying drivers.	2001-05-06 21:55:22 +00:00
phk	16caeec9b0	Actually biofinish(struct bio *, struct devstat *, int error) is more general than the bioerror(). Most of this patch is generated by scripts.	2001-05-06 20:00:03 +00:00
phk	293bc407b0	Fix return type of vop_stdputpages() Noticed by: rwatson	2001-05-06 17:40:22 +00:00
rwatson	305a69de66	o First step in cleaning up authorization code for the posix4 implementation. Move from direct uid 0 comparison to using suser_xxx() call with the same semantics. Simplify CAN_AFFECT() macro as passed pcred was redundant. The checks here still aren't "right", but they are probably "better". Obtained from: TrustedBSD Project	2001-05-06 16:15:42 +00:00
dillon	389d1c1b27	Raise the SysV shared memory defaults to more reasonable values. Mainly increases the shared memory limit from 4M to 32M (approx). Many more programs these days use SysV shared memory, especially X-related programs.	2001-05-04 18:43:19 +00:00
jhb	79ebab510f	Fix a bug in the pfind() changes due to confusing the process returned by pfind() ('pp') with the process being detached from ptrace. Reported by: bde	2001-05-04 18:13:11 +00:00
jhb	21bc7f9fa7	- Move state about lock objects out of struct lock_object and into a new struct lock_instance that is stored in the per-process and per-CPU lock lists. Previously, the lock lists just kept a pointer to each lock held. That pointer is now replaced by a lock instance which contains a pointer to the lock object, the file and line of the last acquisition of a lock, and various flags about a lock including its recursion count. - If we sleep while holding a sleepable lock, then mark that lock instance as having slept and ignore any lock order violations that occur while acquiring Giant when we wake up with slept locks. This is ok because of Giant's special nature. - Allow witness to differentiate between shared and exclusive locks and unlocks of a lock. Witness will now detect the case when a lock is acquired first in one mode and then in another. Mutexes are always locked and unlocked exclusively. Witness will also now detect the case where a process attempts to unlock a shared lock while holding an exclusive lock and vice versa. - Fix a bug in the lock list implementation where we used the wrong constant to detect the case where a lock list entry was full.	2001-05-04 17:15:16 +00:00
jhb	d803e7dbf8	Don't hold the process mutex across calls to FREE() since the vm system uses lockmgr locks and this leads to a lock order reversal. At this point in wait1() the process is not on any process lists or in the process tree, so no other process should be able to find it or have a reference to it anyways, so the locking is not needed.	2001-05-04 16:13:28 +00:00
phk	5948c9ed5b	Implement vop_std{get|put}pages() and add them to the default vop[]. Un-copy&paste all the VOP_{GET|PUT}PAGES() functions which do nothing but the default.	2001-05-01 08:34:45 +00:00
markm	bcca5847d5	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h from kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)	2001-05-01 08:13:21 +00:00
alfred	29aa62f877	When panic()'ing because of recursion on a non-recursive mutex, print out the location it was initially locked. Ok'd by: jake	2001-04-30 01:01:52 +00:00
jake	452dc293f2	Make rtprio work again. - add a missing break which caused RTP_SET to always return EINVAL - break instead of returning if p_can fails so proc_lock is always dropped correctly - only copyin data that is actually needed - use break instead of goto - make rtp_to_pri return EINVAL instead of -1 if the values are out of range so we don't have to translate	2001-04-29 22:09:26 +00:00
rwatson	616044a97d	o As part of the move to not maintaining copies of the vnode owning uid and gid in the ACL, vaccess_acl_posix1e() was changed to accept explicit file_uid and file_gid as arguments. However, in making the change, I explicitly checked file_gid against cr->cr_groups[0], rather than using groupmember, resulting in ACL_GROUP_OBJ entries being compared to the caller's effective gid only, not the remainder of its groups. This was recently corrected for the version of the group call without privilege, but the second test (when privilege is added) was missed. This change replaces an additional cr->cr_groups[0] check with groupmember(). Pointed out by: jedgar Reviewed by: jedgar Obtained from: TrustedBSD Project	2001-04-29 19:53:50 +00:00
phk	8e3fa89968	VOP_BALLOC was never really a VOP in the first place, so convert it to UFS_BALLOC like the other "between UFS and FFS function interfaces".	2001-04-29 12:36:52 +00:00
phk	608c1caf3b	Add a vop_stdbmap(), and make it part of the default vop vector. Make 7 filesystems which don't really know about VOP_BMAP rely on the default vector, rather than more or less complete local vop_nopbmap() implementations.	2001-04-29 11:48:41 +00:00
grog	4b9d9cbaac	Revert consequences of changes to mount.h, part 2. Requested by: bde	2001-04-29 02:45:39 +00:00
alfred	b5d32120a1	Instead of asserting that a mutex is not still locked after unlocking it, assert that the mutex is owned and not recursed prior to unlocking it. This should give a clearer diagnostic when a programming error is caught.	2001-04-28 12:11:01 +00:00
jhb	8bfdafc934	Overhaul of the SMP code. Several portions of the SMP kernel support have been made machine independent and various other adjustments have been made to support Alpha SMP. - It splits the per-process portions of hardclock() and statclock() off into hardclock_process() and statclock_process() respectively. hardclock() and statclock() call the _process() functions for the current process so that UP systems will run as before. For SMP systems, it is simply necessary to ensure that all other processors execute the _process() functions when the main clock functions are triggered on one CPU by an interrupt. For the alpha 4100, clock interrupts are delivered in a staggered broadcast fashion, so we simply call hardclock/statclock on the boot CPU and call the _process() functions on the secondaries. For x86, we call statclock and hardclock as usual and then call forward_hardclock/statclock in the MD code to send an IPI to cause the AP's to execute forwarded_hardclock/statclock which then call the _process() functions. - forward_signal() and forward_roundrobin() have been reworked to be MI and to involve less hackery. Now the cpu doing the forward sets any flags, etc. and sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically return so that they can execute ast() and don't bother with setting the astpending or needresched flags themselves. This also removes the loop in forward_signal() as sched_lock closes the race condition that the loop worked around. - need_resched(), resched_wanted() and clear_resched() have been changed to take a process to act on rather than assuming curproc so that they can be used to implement forward_roundrobin() as described above. - Various other SMP variables have been moved to a MI subr_smp.c and a new header sys/smp.h declares MI SMP variables and API's. The IPI API's from machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h. - The globaldata_register() and globaldata_find() functions as well as the SLIST of globaldata structures has become MI and moved into subr_smp.c. Also, the globaldata list is only available if SMP support is compiled in. Reviewed by: jake, peter Looked over by: eivind	2001-04-27 19:28:25 +00:00
alfred	c6739267c5	Actually show the values that tripped the assertion "receive 1"	2001-04-27 13:42:50 +00:00
rwatson	f786f0e0e3	o Remove the disabled p_cansched() test cases that permitted users to modify the scheduling properties of processes with a different real uid but the same effective uid (i.e., daemons, et al). (note: these cases were previously commented out, so this does not change the compiled code at all) Obtained from: TrustedBSD Project	2001-04-27 01:56:32 +00:00
phk	161a28e738	vfs_subr.c is getting rather fat. The underlying repocopy and this commit moves the filesystem export handling code to vfs_export.c	2001-04-26 20:47:14 +00:00
alfred	9b012f16c7	Sendfile is documented to return 0 on success; however, when a sf_hdtr is used to provide writev(2) style headers/trailers on the sent data, the return value is actually the result of writev(2) from the trailers, or from the headers if no trailers are specified. Fix sendfile to comply with the documentation, by returning 0 on success. Ok'd by: dg	2001-04-26 00:14:14 +00:00
tanimura	ed98caf17b	Do not leave a process with no credential in zombproc. Reviewed by: jhb	2001-04-25 10:22:35 +00:00
mckusick	f863141979	When closing the last reference to an unlinked file, it is freed by the inactive routine. Because the freeing causes the filesystem to be modified, the close must be held up during periods when the filesystem is suspended. For snapshots to be consistent across crashes, they must write blocks that they copy and claim those written blocks in their on-disk block pointers before the old blocks that they referenced can be allowed to be written. Close a loophole that allowed unwritten blocks to be skipped when doing ffs_sync with a request to wait for all I/O activity to be completed.	2001-04-25 08:11:18 +00:00
phk	cdc83afc7f	Move the netexport structure from the fs-specific mount structure to struct mount. This makes the "struct netexport *" parameter to the vfs_export and vfs_checkexport interface unneeded. Consequently, all non-stacking filesystems can use vfs_stdcheckexp(). At the same time, make it a pointer to a struct netexport in struct mount, so that we can remove the bogus AF_MAX and #include <net/radix.h> from <sys/mount.h>	2001-04-25 07:07:52 +00:00
tmm	731887b731	Change uipc_sockaddr so that a sockaddr_un without a path is returned in nam for an unbound socket instead of leaving nam untouched in that case. This way, the getsockname() output can be used to determine the address family of such sockets (AF_LOCAL). Reviewed by: iedowse Approved by: rwatson	2001-04-24 19:09:23 +00:00
jhb	9c03a8ae91	Change the pfind() and zpfind() functions to lock the process that they find before releasing the allproc lock and returning. Reviewed by: -smp, dfr, jake	2001-04-24 00:51:53 +00:00
tmm	901e595f36	Fix a bug introduced in the last commit: vaccess_acl_posix1 only checked the file gid against the egid of the accessing process for the ACL_GROUP_OBJ case, and ignored supplementary groups. Approved by: rwatson	2001-04-23 22:52:26 +00:00
grog	1f5de30718	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
rwatson	f2143d67aa	o Remove comment indicating policy permits loop-back debugging, but semantics don't: in practice, both policy and semantics permit loop-back debugging operations, only it's just a subset of debugging operations (i.e., a proc can open its own /dev/mem), and that's at a higher layer.	2001-04-21 22:41:45 +00:00
jhb	25f5a9093e	Spelling nit: acquring -> acquiring. Reported by: T. William Wells <bill@twwells.com>	2001-04-21 01:50:32 +00:00
alfred	9ae313b4ad	Assert that when using an interlock mutex it is not recursed when lockmgr() is called. Ok'd by: jhb	2001-04-20 22:38:40 +00:00
jhb	81a2b0cc18	Make the ap_boot_mtx mutex static.	2001-04-20 01:09:05 +00:00
jhb	793d318d75	- Whoops, forgot to enable the clock lock in the spin order list on the alpha. - Change the Debugger() functions to pass in the real function name.	2001-04-19 15:49:54 +00:00
bmilekic	b857e0ac23	Fix inconsistency in setup of kernel_map: we need to make sure that we also reserve _adequate_ space for the mb_map submap; i.e. we need space for nmbclusters, nmbufs, _and_ nmbcnt. Furthermore, we need to rounddown, and not roundup, so that we are consistent. Pointed out by: bde	2001-04-18 23:54:13 +00:00
alfred	3405c2ccfa	Check validity of signal callback requested via aio routines. Also move the insertion of the request to after the request is validated. It still looks like there may be some problems if an invalid address is passed to the aio routines; basically a possible leak or having a not completely initialized structure on the queue may still be possible. A new sig macro was made _SIG_VALID to check the validity of a signal, it would be advisable to use it from now on (in kern/kern_sig.c) rather than rolling your own. PR: kern/17152	2001-04-18 22:18:39 +00:00
tanimura	546a3cb874	Reclaim directory vnodes held in namecache if few free vnodes are available. Only directory vnodes holding no child directory vnodes held in v_cache_src are recycled, so that directory vnodes near the root of the filesystem hierarchy remain in namecache and directory vnodes are not reclaimed in cascade. The period of vnode reclaiming attempt and the number of vnodes attempted to reclaim can be tuned via sysctl(2). Suggested by: tegge Approved by: phk	2001-04-18 11:19:50 +00:00
phk	11bb4116b3	bread() is a special case of breadn(), so don't replicate code.	2001-04-18 07:16:07 +00:00
dd	c50aedd1ac	Make this driver play ball with devfs(5). Reviewed by: brian	2001-04-17 20:53:11 +00:00
alfred	2c4a656351	Add a sanity check on ucred refcount. Submitted by: Terry Lambert <terry@lambert.org>	2001-04-17 20:50:43 +00:00
alfred	f0669d6c9e	Implement client side NFS locks. Obtained from: BSD/os Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org	2001-04-17 20:45:23 +00:00
phk	676302e684	Write a switch statement as less obscure if statements.	2001-04-17 20:22:07 +00:00
jhb	f2fbc423a1	Fix an old bug related to BETTER_CLOCK. Call forward_clock if SMP and __i386__ are defined rather than if SMP and BETTER_CLOCK are defined. The removal of BETTER_CLOCK would have broken this except that kern_clock.c doesn't include <machine/smptests.h>, so it doesn't see the definition of BETTER_CLOCK, and forward_clock isn't called, even on 4.x. This seems to fix the problem where an n-way SMP system would see 100 * n clk interrupts and 128 * n rtc interrupts.	2001-04-17 17:53:36 +00:00
phk	378e561228	This patch removes the VOP_BWRITE() vector. VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing. This patch takes a more general approach and adds a bp->b_op vector where more methods can be added. The success of this patch depends on bp->b_op being initialized in all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.	2001-04-17 08:56:39 +00:00
mckusick	ba66879022	Add debugging option to always read/write cylinder groups as full sized blocks. To enable this option, use: `sysctl -w debug.bigcgs=1'. Add debugging option to disable background writes of cylinder groups. To enable this option, use: `sysctl -w debug.dobkgrdwrite=0'. These debugging options should be tried on systems that are panicking with corrupted cylinder group maps to see if it makes the problem go away. The set of panics in question are: ffs_clusteralloc: map mismatch ffs_nodealloccg: map corrupted ffs_nodealloccg: block not in map ffs_alloccg: map corrupted ffs_alloccg: block not in map ffs_alloccgblk: cyl groups corrupted ffs_alloccgblk: can't find blk in cyl ffs_checkblk: partially free fragment The following panics are less likely to be related to this problem, but might be helped by these debugging options: ffs_valloc: dup alloc ffs_blkfree: freeing free block ffs_blkfree: freeing free frag ffs_vfree: freeing free inode If you try these options, please report whether they helped reduce your bitmap corruption panics to Kirk McKusick at <mckusick@mckusick.com> and to Matt Dillon <dillon@earth.backplane.com>.	2001-04-17 05:37:51 +00:00
rwatson	678b28a532	In my first reading of POSIX.1e, I misinterpreted handling of the ACL_USER_OBJ and ACL_GROUP_OBJ fields, believing that modification of the access ACL could be used by privileged processes to change file/directory ownership. In fact, this is incorrect; ACL_*_OBJ (+ ACL_MASK and ACL_OTHER) should have undefined ae_id fields; this commit attempts to correct that misunderstanding. o Modify arguments to vaccess_acl_posix1e() to accept the uid and gid associated with the vnode, as those can no longer be extracted from the ACL passed as an argument. Perform all comparisons against the passed arguments. This actually has the effect of simplifying a number of components of this call, as well as reducing the indent level, but now separates handling of ACL_GROUP_OBJ from ACL_GROUP. o Modify acl_posix1e_check() to return EINVAL if the ae_id field of any of the ACL_{USER_OBJ,GROUP_OBJ,MASK,OTHER} entries is a value other than ACL_UNDEFINED_ID. As a temporary work-around to allow clean upgrades, set the ae_id field to ACL_UNDEFINED_ID before each check so that this cannot cause a failure in the short term (this work-around will be removed when the userland libraries and utilities are updated to take this change into account). o Modify ufs_sync_acl_from_inode() so that it forces ACL_{USER_OBJ,GROUP_OBJ,MASK,OTHER} ae_id fields to ACL_UNDEFINED_ID when synchronizing the ACL from the inode. o Modify ufs_sync_inode_from_acl to not propagate uid and gid information to the inode from the ACL during ACL update. Also modify the masking of permission bits that may be set from ALLPERMS to (S_IRWXU|S_IRWXG|S_IRWXO), as ACLs currently do not carry non-ACCESSPERMS (S_ISUID, S_ISGID, S_ISTXT). o Modify ufs_getacl() so that when it emulates an access ACL from the inode, it initializes the ae_id fields to ACL_UNDEFINED_ID. o Clean up ufs_setacl() substantially since it is no longer possible to perform chown/chgrp operations using vop_setacl(), so all the access control for that can be eliminated. o Modify ufs_access() so that it passes owner uid and gid information into vaccess_acl_posix1e(). Pointed out by: jedger Obtained from: TrustedBSD Project	2001-04-17 04:33:34 +00:00
jhb	82848b046f	Blow away the panic mutex in favor of using a single atomic_cmpset() on a panic_cpu shared variable. I used a simple atomic operation here instead of a spin lock as it seemed to be excessive overhead. Also, this can avoid recursive panics if, for example, witness is broken.	2001-04-17 04:18:08 +00:00
jhb	d16229755c	Check to see if enroll() returns NULL in the witness initialization. This can happen if witness runs out of resources during initialization or if witness_skipspin is enabled. Sleuthing by: Peter Jeremy <peter.jeremy@alcatel.com.au>	2001-04-17 03:35:38 +00:00
jhb	4448046853	Exit and re-enter the critical section while spinning for a spinlock so that interrupts can come in while we are waiting for a lock.	2001-04-17 03:34:52 +00:00
jhay	0bd9e525aa	Update to the 2001-04-02 version of the nanokernel code from Dave Mills.	2001-04-16 13:05:05 +00:00
brian	16b3606e12	Call strlen() once instead of twice.	2001-04-14 21:33:58 +00:00
rwatson	2603acd499	o Since uid checks in p_cansignal() are now identical between P_SUGID and non-P_SUGID cases, simplify p_cansignal() logic so that the P_SUGID masking of possible signals is independent from uid checks, removing redundant code and generally improving readability. Reviewed by: tmm Obtained from: TrustedBSD Project	2001-04-13 14:33:45 +00:00
alfred	f5211e7a6c	convert if/panic -> KASSERT, explain what triggered the assertion	2001-04-13 10:15:53 +00:00
murray	b31a55145f	Generate useful error messages.	2001-04-13 09:37:25 +00:00
markm	0efbb4e263	Handle a rare but fatal race invoked sometimes when SIGSTOP is invoked.	2001-04-13 09:29:34 +00:00
jhb	c987a9115b	- Add a comment at the start of the spin locks list. - The alpha SMP code uses an "ap boot" spinlock as well.	2001-04-13 08:31:38 +00:00
rwatson	c11aa73a4b	o Disallow two "allow this" exceptions in p_cansignal() restricting the ability of unprivileged processes to deliver arbitrary signals to daemons temporarily taking on unprivileged effective credentials when P_SUGID is not set on the target process: Removed: (p1->p_cred->cr_ruid != ps->p_cred->cr_uid) (p1->p_ucred->cr_uid != ps->p_cred->cr_uid) o Replace two "allow this" exceptions in p_cansignal() restricting the ability of unprivileged processes to deliver arbitrary signals to daemons temporarily taking on unprivileged effective credentials when P_SUGID is set on the target process: Replaced: (p1->p_cred->p_ruid != p2->p_ucred->cr_uid) (p1->p_cred->cr_uid != p2->p_ucred->cr_uid) With: (p1->p_cred->p_ruid != p2->p_ucred->p_svuid) (p1->p_ucred->cr_uid != p2->p_ucred->p_svuid) o These changes have the effect of making the uid-based handling of both P_SUGID and non-P_SUGID signal delivery consistent, following these four general cases: p1's ruid equals p2's ruid p1's euid equals p2's ruid p1's ruid equals p2's svuid p1's euid equals p2's svuid The P_SUGID and non-P_SUGID cases can now be largely collapsed, and I'll commit this in a few days if no immediate problems are encountered with this set of changes. o These changes remove a number of warning cases identified by the proc_to_proc inter-process authorization regression test. o As these are new restrictions, we'll have to watch out carefully for possible side effects on running code: they seem reasonable to me, but it's possible this change might have to be backed out if problems are experienced. Submitted by: src/tools/regression/security/proc_to_proc/testuid Reviewed by: tmm Obtained from: TrustedBSD Project	2001-04-13 03:06:22 +00:00
rwatson	e767472b72	o Disable two "allow this" exceptions in p_cansched(), restricting the ability of unprivileged processes to modify the scheduling properties of daemons temporarily taking on unprivileged effective credentials. These cases are (p1->p_cred->p_ruid == p2->p_ucred->cr_uid) and (p1->p_ucred->cr_uid == p2->p_ucred->cr_uid), respectively permitting a subject process to influence the scheduling of a daemon if the subject process has the same real uid or effective uid as the daemon's effective uid. This removes a number of the warning cases identified by the proc_to_proc inter-process authorization regression test. o As these are new restrictions, we'll have to watch out carefully for possible side effects on running code: they seem reasonable to me, but it's possible this change might have to be backed out if problems are experienced. Reported by: src/tools/regression/security/proc_to_proc/testuid Obtained from: TrustedBSD Project	2001-04-12 22:46:07 +00:00
rwatson	6a5eb15d6e	o Make kqueue's filt_procattach() function use the error value returned by p_can(...P_CAN_SEE), rather than returning EACCES directly. This brings the error code used here into line with similar arrangements elsewhere, and prevents the leakage of pid usage information. Reviewed by: jlemon Obtained from: TrustedBSD Project	2001-04-12 21:32:02 +00:00
rwatson	9ba6e18ce6	o Limit process information leakage by introducing a p_can(...P_CAN_SEE...) in rtprio()'s RTP_LOOKUP implementation. Obtained from: TrustedBSD Project	2001-04-12 20:46:26 +00:00
rwatson	6099fe8265	o Reduce information leakage into jails by adding invocations of p_can(...P_CAN_SEE...) to getpgid(), getsid(), and setpgid(), blocking these operations on processes that should not be visible by the requesting process. Required to reduce information leakage in MAC environments. Obtained from: TrustedBSD Project	2001-04-12 19:39:00 +00:00
rwatson	366237b31f	o Replace p_cankill() with p_cansignal(), remove wrappage of p_can() from signal authorization checking. o p_cansignal() takes three arguments: subject process, object process, and signal number, unlike p_cankill(), which only took into account the processes and not the signal number, improving the abstraction such that CANSIGNAL() from kern_sig.c can now also be eliminated; previously CANSIGNAL() special-cased the handling of SIGCONT based on process session. privused is now deprecated. o The new p_cansignal() further limits the set of signals that may be delivered to processes with P_SUGID set, and restructures the access control check to allow it to be extended more easily. o These changes take into account work done by the OpenBSD Project, as well as by Robert Watson and Thomas Moestl on the TrustedBSD Project. Obtained from: TrustedBSD Project	2001-04-12 02:38:08 +00:00
rwatson	ab04223ac6	o Regenerated following introduction of __setugid() system call for "options REGRESSION". Obtained from: TrustedBSD Project	2001-04-11 20:21:37 +00:00
rwatson	af3eb0f5a2	o Introduce a new system call, __setugid(), which allows a process to toggle the P_SUGID bit explicitly, rather than relying on it being set implicitly by other protection and credential logic. This feature is introduced to support inter-process authorization regression testing by simplifying userland credential management allowing the easy isolation and reproduction of authorization events with specific security contexts. This feature is enabled only by "options REGRESSION" and is not intended to be used by applications. While the feature is not known to introduce security vulnerabilities, it does allow processes to enter previously inaccessible parts of the credential state machine, and is therefore disabled by default. It may not constitute a risk, and therefore in the future pending further analysis (and appropriate need) may become a published interface. Obtained from: TrustedBSD Project	2001-04-11 20:20:40 +00:00
jhb	3588cc574a	Stick proc0 in the PID hash table.	2001-04-11 18:50:50 +00:00
jhb	4dd39ab878	Rename the IPI API from smp_ipi_* to ipi_* since the smp_ prefix is just "redundant noise" and to match the IPI constant namespace (IPI_*). Requested by: bde	2001-04-11 17:06:02 +00:00
jedgar	512fd8bc5f	Correct the following defines to match the POSIX.1e spec: ACL_PERM_EXEC -> ACL_EXECUTE ACL_PERM_READ -> ACL_READ ACL_PERM_WRITE -> ACL_WRITE Obtained from: TrustedBSD	2001-04-11 02:19:01 +00:00
peter	8b9d89e1e4	Create debug.hashstat.[raw]nchash and debug.hashstat.[raw]nfsnode to enable easy access to the hash chain stats. The raw prefixed versions dump an integer array to userland with the chain lengths. This cheats and calls it an array of 'struct int' rather than 'int' or sysctl -a faithfully dumps out the 128K array on an average machine. The non-raw versions return 4 integers: count, number of chains used, maximum chain length, and percentage utilization (fixed point, multiplied by 100). The raw forms are more useful for analyzing the hash distribution, while the other form can be read easily by humans and stats loggers.	2001-04-11 00:39:20 +00:00
jhb	ee034b0be2	Remove the BETTER_CLOCK #ifdef's. The code is on by default and is here to stay for the foreseeable future. OK'd by: peter (the idea)	2001-04-10 21:34:13 +00:00
jhb	7df3e25496	Add an MI API for sending IPI's. I used the same API present on the alpha because: - it used a better namespace (smp_ipi_* rather than _ipi), - it used better constant names for the IPI's (IPI_ rather than X*_OFFSET), and - this API also somewhat exists for both alpha and ia64 already.	2001-04-10 21:04:32 +00:00
bp	a414f03f5d	Import kernel part of SMB/CIFS requester. Add smbfs(CIFS) filesystem. Userland part will be in the ports tree for a while. Obtained from: smbfs-1.3.7-dev package.	2001-04-10 07:59:06 +00:00
bp	c7aea79d8d	Avoid endless recursion on panic. Reviewed by: jhb	2001-04-10 00:56:19 +00:00
jhb	242556cfec	Maintain a reference count on the witness struct. When the reference count drops to 0 in witness_destroy, set the w_name and w_file pointers to point to the string "(dead)" and the w_line field to 0. This way, if a mutex of a given name is used only in a module, then as long as all mutexes in the module are destroyed when the module is unloaded, witness will not maintain stale references to the mutex's name in the module's data section causing a panic later on when the w_name or w_file fields are examined.	2001-04-09 22:34:05 +00:00
n_hibma	4e9569652d	Remove a stale file.	2001-04-09 10:28:33 +00:00

4036 Commits