Remove extraneous uses of vop_null, instead deferring to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.
Initialize struct cdevsw using C99 sparse initialization and remove
all initializations to default values.
This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.
Approved by: re(scottl)
calculations. Keep these changes local to the function so the tick count
is in its natural form otherwise. Previously 1000 was added each time
a tick fired and we divided by 1000 when it was reported. This is done
to reduce rounding errors.
To do this, initialize the d_maj member of the cdevsw to MAJOR_AUTO.
When the cdevsw is first passed to make_dev() a free major number
will be assigned.
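For example (sketch; hypothetical "foo" driver, only non-default fields
listed):

    static struct cdevsw foo_cdevsw = {
        .d_open  = foo_open,
        .d_close = foo_close,
        .d_read  = foo_read,
        .d_name  = "foo",
        .d_maj   = MAJOR_AUTO,  /* a free major is assigned at make_dev() */
    };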
Until we have a bit more experience with this a printf will announce
this fact.
Major numbers are not reclaimed, so loading/unloading the same
device driver which uses MAJOR_AUTO will eventually deplete the
pool of free major numbers and the system will panic when it can
not allocate one. Still undecided whom to inconvenience with the
solution to this.
td_wmesg field in the thread structure points to the description string of
the condition variable or mutex. If the condvar or the mutex had been
initialized from a loadable module that was unloaded in the meantime,
td_wmesg may now point to invalid memory. Retrieving the process table now
may panic the kernel (or access junk). Setting the td_wmesg field to NULL
after unblocking on the condvar/mutex prevents this panic.
PR: kern/47408
Approved by: jake (mentor)
turn runs its tasks free of Giant too. It is intended that as drivers
become locked down, they will move out of the old, Giant-bound taskqueue
and into this new one. The old taskqueue has been renamed to
taskqueue_swi_giant, and the new one keeps the name taskqueue_swi.
delta 1.371) we must ensure that we do not get ourselves into a
recursive trap endlessly trying to clean up after ourselves.
Reported by: Attila Nagy <bra@fsn.hu>
Sponsored by: DARPA & NAI Labs.
o Always check for null when dereferencing the filename component.
o Implement a try-and-backoff method for allocating memory to
dump stats to avoid a spin-lock -> sleep-lock mutex lock order
panic with WITNESS.
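The pattern is roughly (sketch; the lock name is hypothetical):

    p = malloc(len, M_TEMP, M_NOWAIT);      /* try: may not sleep here */
    if (p == NULL) {
        mtx_unlock(&stats_mtx);             /* back off... */
        p = malloc(len, M_TEMP, M_WAITOK);  /* ...sleep safely... */
        mtx_lock(&stats_mtx);               /* ...and revalidate state */
    }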
Approved by: des, markm (mentor)
Not objected to by: jhb
track of the number of dirty buffers held by a vnode. When a
bdwrite is done on a buffer, check the existing number of dirty
buffers associated with its vnode. If the number rises above
vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one
of the other (hopefully older) dirty buffers associated with
the vnode is written (using bawrite). In the event that this
approach fails to curb the growth in the vnode's number of
dirty buffers (due to soft updates rollback dependencies),
the more drastic approach of doing a VOP_FSYNC on the vnode
is used. This code primarily affects very large and actively
written files such as snapshots. This change should eliminate
hanging when taking snapshots or doing background fsck on
very large filesystems.
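In outline, bdwrite() now does roughly this (sketch; the counter and
helper names are illustrative):

    if (vp->v_dirtybufcnt > dirtybufthresh) {
        nbp = oldest_unlocked_dirty_buf(vp);    /* hypothetical helper */
        if (nbp != NULL)
            bawrite(nbp);                   /* async write of older buf */
        else
            (void)VOP_FSYNC(vp, cred, MNT_NOWAIT, td); /* drastic fallback */
    }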
Hopefully, one day it will be possible to cache filesystem
metadata in the VM cache as is done with file data. As it
stands, only the buffer cache can be used which limits total
metadata storage to about 20Mb no matter how much memory is
available on the system. This rather small memory gets badly
thrashed causing a lot of extra I/O. For example, taking a
snapshot of a 1Tb filesystem minimally requires about 35,000
write operations, but because of the cache thrashing (we only
have about 350 buffers at our disposal) ends up doing about
237,540 I/O's thus taking twenty-five minutes instead of four
if it could run entirely in the cache.
Reported by: Attila Nagy <bra@fsn.hu>
Sponsored by: DARPA & NAI Labs.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
these fields instead.
- Hold the vnode interlock while locking bufs on the clean/dirty queues.
This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
BUF_LOCK with a LK_TIMEFAIL to a single lock.
Reviewed by: arch, mckusick
- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled by the mmap handler when the mapping was successful. The
mmap handler must now return 0 when successful, any other value
is considered as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some devices
which were bogusly returning errno constants which would have been
considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.
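Under the new API a handler looks roughly like this (sketch;
hypothetical driver and constants):

    static int
    foo_mmap(dev_t dev, vm_offset_t offset, vm_offset_t *paddr, int nprot)
    {

        if (offset >= FOO_MEM_SIZE)
            return (EINVAL);            /* an errno, not -1 */
        *paddr = FOO_MEM_BASE + offset; /* physical address, no atop() */
        return (0);                     /* 0 means success */
    }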
I'm still not sure whether we need a __FreeBSD_version bump for this,
since we didn't guarantee API/ABI stability until 5.1-RELEASE.
Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386
in massive locking issues on diskless systems.
It is also not clear that this sysctl is non-dangerous in its
requirements for locked down memory on large RAM systems.
Retire the "d_dump_t" and use the "dumper_t" type instead.
Dumper_t takes a void * as first arg which is more general than the
dev_t taken by d_dump_t. (Remember: we could have net-dumpers if
somebody wrote us one!)
Define the convention for GEOM controlled disk devices to be that the
first argument to the dumper function is the struct disk pointer.
Change device drivers accordingly.
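A GEOM disk dumper therefore looks roughly like this (sketch;
hypothetical driver):

    static int
    foodisk_dump(void *priv, void *virtual, vm_offset_t physical,
        off_t offset, size_t length)
    {
        struct disk *dp = priv;     /* first arg is the struct disk */

        /* write 'length' bytes from 'virtual' at media offset 'offset' */
        return (foodisk_write(dp, virtual, offset, length));
    }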
dev_t to the method functions.
The dev_t can still be found at struct consdev *->cn_dev.
Add a void *cn_arg element to struct consdev which the drivers can use
for retrieving their softc.
In devsw() return dead_cdevsw instead of NULL in case the dev_t does not
have a si_devsw.
This may improve our survival chances with devices which go away unexpectedly.
compile-time constants). That is, a "bucket" now is not necessarily
a page's worth of mbufs or clusters; it is MBUF_BUCK_SZ or CLUS_BUCK_SZ
worth of mbufs or clusters, respectively.
o Rename {mbuf,clust}_limit to {mbuf,clust}_hiwm and introduce
{mbuf,clust}_lowm, which currently has no effect but will be used
to set the low watermarks.
o Fix netstat so that it can deal with the differently-sized buckets
and teach it about the low watermarks too.
o Make sure the per-cpu stats for an absent CPU have mb_active set to 0,
explicitly.
o Get rid of the allocate-refcounts-from-mbuf-map mess. Instead,
just malloc() the refcounts in one shot from mbuf_init().
o Clean up / update comments in subr_mbuf.c
used to share resource limits between rfork threads, but never was.
Removing it makes resource limit locking much simpler -- only the current
process can change the contents of the structure that p_limit points to.
reference counter array for mbuf clusters. I don't know
how this got past early testing nor how it survived so long
without getting caught. If anyone was seeing really really
bizarre memory corruption in a few mbufs this would be why.
#if'ed out for a while. Complete the deed and tidy up some other bits.
We need to be able to call this stuff from outer edges of interrupt
handlers for devices that have the ISR bits in pci config space. Making
the bios code mpsafe was just too hairy. We had also stubbed it out some
time ago due to there simply being too much brokenness in too many systems.
This adds a leaf lock so that it is safe to use pci_read_config() and
pci_write_config() from interrupt handlers. We still will use pcibios
to do interrupt routing if there is no acpi.. [yes, I tested this]
Briefly glanced at by: imp
sched_lock around accesses to p_stats->p_timer[] to avoid a potential
race with hardclock. getitimer(), setitimer() and the realitexpire()
callout are now Giant-free.
add a signal to a mailbox's pending set.
- Add a new function, thread_signal_upcall(), this causes the current thread
to upcall so that we can deliver pending signals.
Reviewed by: mini
I was in two minds as to where to put them in the first case..
I should have listened to the other mind.
Submitted by: parts by davidxu@
Reviewed by: jeff@ mini@
queue lock already held.
- In getblk() and flushbufqueues() use bremfreel() while we still have the
buf queue lock held to keep the lists consistent.
- Add LK_NOWAIT to two cases where we're essentially asserting that the bufs
are not locked while acquiring the locks. This will make sure that we get
the appropriate panic() and not another one for sleeping with a lock held.
- Mark the process leader as having an advisory lock
- Check if process leader is marked as having advisory lock when
closing file
- Check that file is still open after lock has been obtained
- Don't allow file descriptor table sharing between processes
with different leaders
PR: 10265
Reviewed by: alfred
freebsd4_sigaction() and osigaction() instead of around the whole
body of those functions. They now no longer hold Giant around calls
to copyin() and copyout(), and it is slightly more obvious what
Giant is protecting.
barrier between free'ing filedesc structures. Basically if you want to
access another process's filedesc, you want to hold this mutex over the
entire operation.
opposed to returning the top of the old chain when there was one and
the top of the newly allocated chain if there was no old chain.
Actually, it should be noted that prior to this fix, although the
comment above m_getm() advertised that m_getm() would return the
top of the old chain (if an old chain was being passed in) it
actually [wrongly] was returning the tail mbuf in the old chain
instead. This is a bug but since the one use of m_getm() in
the tree luckily did not depend on the behavior, it happened
to work out without notice.
Harti Brandt pointed out that the advertised behavior was actually
not the real behavior and so this change makes m_getm() ALWAYS
return the newly allocated chain (and fixes the comment). This
is less confusing and is the best course of action as then the
caller is always able to have both a reference to the top of
the original chain (because it's passing it in in the call) and
a reference to the newly attached chain. Although the API is
slightly modified, I don't think that any third-party code uses
m_getm() and if it does, it surely can't be working properly
because the old behavior was bogus.
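With the fix a caller can keep references to both chains (sketch):

    /* 'top' may be NULL or an existing chain to append to */
    new = m_getm(top, len, M_TRYWAIT, MT_DATA);
    if (new == NULL)
        return (ENOBUFS);       /* allocation failed */
    /* 'top' still heads the old chain; 'new' heads the new one */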
API bug pointed out by: Harti Brandt <brandt@fokus.fraunhofer.de>
To fix scsi, don't wait for ithreads if we're dumping, it makes the
debugger sad.
To fix ata, use what appears to be a polling method if we're dumping,
I stole this from tmm but added code to ensure that this change is
only in effect while dumping.
Tested by: des
The locking here needs to be revisited, but this ought to get rid of the
LOR messages that people are complaining about for now. I imagine either
I or someone else interested with smp will eventually clear this up.
- Use the ratio of kg_runtime / kg_slptime to determine our dynamic priority.
- Scale kg_runtime and kg_slptime back when the sum of the two exceeds
SCHED_SLP_RUN_MAX. This allows us to slowly forget old behavior.
- Scale back the runtime and slptime in fork so that the new process has the
same ratio but much less accumulated time. This causes new behavior to be
noticed more quickly.
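Roughly (sketch; constants and scaling simplified):

    sum = kg->kg_runtime + kg->kg_slptime;
    if (sum != 0)       /* priority rises with the fraction of time run */
        pri = SCHED_PRI_MIN + SCHED_PRI_RANGE * kg->kg_runtime / sum;
    if (sum > SCHED_SLP_RUN_MAX) {
        kg->kg_runtime /= 2;    /* decay both so old behavior is */
        kg->kg_slptime /= 2;    /* slowly forgotten */
    }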
that is protected by the vnode lock.
- Move B_SCANNED into b_vflags and call it BV_SCANNED.
- Create a vop_stdfsync() modeled after spec's sync.
- Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some
fs specific processing. This gives all of these filesystems proper
behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag.
- Annotate the locking in buf.h
buf lists, synchronization variables, and atomic ops for the counters.
This change does not remove giant from any code although some pushdown
may be possible.
- In vfs_bio_awrite() don't access buf fields without the buf lock.
Change the si_name of dev_t's to be a char * and put a private buffer for
holding the name at the end of the struct.
Initialize si_name to point to the private buffer.
Put a KASSERT in geom_disk to prevent overrun on the fake dev_t we still
have to generate for the disk_drivers.
prevent the compiler from optimizing assignments into byte-copy
operations which might make access to the individual fields non-atomic.
Use the individual fields throughout, and don't bother locking them with
Giant: it is no longer needed.
Inspired by: tjr
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway. This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.
Reviewed by: jhb, tmm
Tested on: i386, sparc64
have some negative effect on interactivity but it yields great perf. gains.
This also brings the conditions under which ULE context switches inline
with SCHED_4BSD.
- Define some new kseq_* functions for manipulating the run queue.
- Add a new kseq member ksq_rslices and ksq_bload. rslices is the sum of
the slices of runnable kses. This will be used for push load balance
decisions. bload is the number of threads blocked waiting on IO.
I'm not convinced there is anything major wrong with the patch but
them's the rules..
I am using my "David's mentor" hat to revert this as he's
offline for a while.
than having change_dir() release the vnode lock on success, hold the
lock so that we can use it later when invoking MAC checks and
VOP_ACCESS() in the chroot() code. Update the comment to reflect
this calling convention. Update callers to unlock the vnode
lock. Correct a typo regarding vnode naming in the MAC case that
crept in via the previous patch applied.
cases: we might multiply vrele() a vnode when certain classes of
failures occur. This appears to stem from earlier Giant/file
descriptor lock pushdown and restructuring.
Submitted by: maxim
This implicitly removes the need for major numbers, but a number of
drivers still know things they shouldn't need to, and we need to
consider if there are applications which cache major(+minor) gleaned
from stat(2) and rely on it being constant over reboots before we
start assigning random majors.
sched_runnable() et al.
- Remove some dead code in sched_clock().
- Define two macros KSEQ_SELF() and KSEQ_CPU() for getting the kseq of the
current cpu or some alternate cpu.
- Start introducing kseq_() functions, such as kseq_choose() and kseq_setup().
run queue for each cpu.
- Introduce kse stealing into the sched_choose() code. This helps balance
cpus better in cases where process turnover is high. This implementation
is fairly trivial and will likely be only a temporary measure until
something more sophisticated has been written.
data structure called kse_upcall to manage UPCALLs. All KSE binding
and loaning code is gone.
A thread that owns an upcall can collect all completed syscall contexts
in its ksegrp, turn itself into UPCALL mode, and take those contexts
back to userland. Any thread without an upcall structure has to export
its context and exit at the user boundary.
Any thread running in user mode owns an upcall structure. When it
enters the kernel, if the kse mailbox's current thread pointer is not
NULL, then when the thread blocks in the kernel a new UPCALL thread is
created and the upcall structure is transferred to the new UPCALL
thread. If the kse mailbox's current thread pointer is NULL, then no
UPCALL thread is created when a thread blocks in the kernel.
Each upcall always has an owner thread. Userland can remove an upcall
by calling kse_exit; when all upcalls in a ksegrp are removed, the
group is automatically shut down. An upcall owner thread also exits
when the process is in the exiting state. When an owner thread exits,
the upcall it owns is also removed.
KSE is a pure scheduler entity. It represents a virtual CPU. When a
thread is running, it always has a KSE associated with it. The
scheduler is free to assign a KSE to a thread according to thread
priority; if a thread's priority is changed, its KSE can be moved from
one thread to another.
When a ksegrp is created, N KSEs are always created in the group,
where N is the number of physical CPUs in the current system. This
makes it possible that even if a userland UTS is only single-CPU safe,
threads in the kernel can still execute on different CPUs in parallel.
Userland calls kse_create to add more upcall structures to a ksegrp to
increase its own concurrency; the kernel is not restricted by the
number of upcalls userland provides.
The code hasn't been tested under SMP by the author due to lack of
hardware.
Reviewed by: julian
potential discontinuities in our UTC timescale.
Applications can monitor this variable if they want to be informed
about steps in the timescale. Slews (ntp and adjtime(2)) and
frequency adjustments (ntp) will not increment this counter, only
operations which set the clock. No attempt is made to classify
size or direction of the step.
correctly against PF_LOCAL. It seems that the test always fails when
sockaddr was not filled, so I added an else clause as a workaround.
I doubt it is the right fix, but it is better than nothing. I
found that NetBSD has the same potential problem; fortunately,
NetBSD has an equivalent else clause.
MFC after: 1 week
1. eliminate unnecessary loop which frees and re-allocates
the just allocated array
2. eliminate the newsize recomputation
3. eliminate unnecessary unlock and relock around free
4. correctly match the free with the malloc into M_KQUEUE instead of M_TEMP
5. eliminate conditional assignment of oldlist, which is equivalent to a
simple assignment
6. eliminate the oldlist temporary variable completely
Reviewed by: jhb
metadata. This fixes module dependency resolution by the kernel linker on
sparc64, where the relocations for the metadata are different than on other
architectures; the relative offset is in the addend of an Elf_Rela record
instead of the original value of the location being patched.
Also fix printf formats in debug code.
Submitted by: Hartmut Brandt <brandt@fokus.gmd.de>
PR: 46732
Tested on: alpha (obrien), i386, sparc64
was used to control code which was conditional on DEVFS' presence
since this avoided the need for large-scale source pollution with
#include "opt_geom.h"
Now that we approach making DEVFS standard, replace these tests
with an #ifdef to facilitate mechanical removal once DEVFS becomes
non-optional.
No functional change by this commit.
more than return ENXIO from its open routine, so most of this file
is unneeded.
A straight #ifdef'ing would look quite messy, and make the file
quite unreadable, so instead I have simply added the DEVFS version
of the file at the top, protected by #ifndef NODEVFS.
Once we have removed the NODEVFS option, we can retain the 86 lines at
the top and drop the other 287 lines.
SS_ISCONNECTING state, returning EINVAL (which is what POSIX mandates
in this case).
listen() on connected or connecting sockets would cause them to enter
a bad state; in the TCP case, this could cause sockets to go
catatonic or panics, depending on how the socket was connected.
Reviewed by: -net
MFC after: 2 weeks
vm_pageout_deficit:
1. Update vm_pageout_deficit before VM_WAIT. There is no sense in
delaying the update; the sooner the pageout daemon receives this
information the better. Reviewed by: tegge
2. Update vm_pageout_deficit according to the number of pages still
needed to complete the allocation, not the original size of the
allocation. Submitted by: tegge
(These errors have existed since the introduction of vm_pageout_deficit
in revision 1.144.)
portable copy. Note that pmap_extract() must be used instead of
pmap_kextract().
This is precursor work to a reorganization of vmapbuf() to close remaining
user/kernel races (which can lead to a panic).
in addition to secure level 1. The mask supports up to a secure level of 8
but only defines through CTLFLAG_SECURE3 are added for now.
As per the missive in the log entry for 1.11 of ip_fw2.c which added the
secure flag to the IPFW sysctl's in the first place, change the secure
level requirement from 1 to 3 now that we have support for it.
Reviewed by: imp
With Design Suggestions by: imp
dereferenced when a process exits due to the vmspace ref-count being
bumped. Change shmexit() and shmexit_myhook() to take a vmspace instead
of a process and call it in vmspace_dofree(). This way if it is missed
in exit1()'s early-resource-free it will still be caught when the zombie is
reaped.
Also fix a potential race in shmexit_myhook() by NULLing out
vmspace->vm_shm prior to calling shm_delete_mapping() and free().
MFC after: 7 days
to access the pctcpu. This will have to be sorted out more later as the
new scheduler requires a procedural interface for this data. A more
complete solution will follow.
pointer types, and remove a huge number of casts from code using it.
Change struct xfile xf_data to xun_data (ABI is still compatible).
If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary. There are no operational changes in this
commit.
that crept in recently. GCC will optimize the divides and multiplies for us.
Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after: 1 day
so that entities that want to use the post_sync hook to write stuff
to devices and other tidy-up can do so before the device tree is
shot down. eg: da doing a SYNC_CACHE etc. This should get crashdumps
working on mpt devices again, and stops the ia64 boxes locking up
on regular shutdown when da tries to issue the scsi commands to mpt.
Obtained from: njl, gibbs
which expects it to be NULL unless the return value was 0 will work.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
so its value is not sign extended when assigned to the uintmax_t variable
used internally by printf. For example, if bit 31 is set in the cpuid
feature word, then %b would print out the initial value as a 16 character
hexadecimal value. Now it only prints out an 8 character value.
Reviewed by: bde
The leak in lseek was introduced in vfs_syscalls.c revision 1.218.
The leak in do_dup was introduced in kern_descrip.c revision 1.158.
Submitted by: iedowse
Add support for GCC's --test-coverage --profile-arcs options.
Add code to call the functions listed in the .ctors section, these are
used to string the per .o file counter blocks into a linked list.
Add empty __bb_fork_func() to cope with GCC's magic handling of exec*()
named functions.
Adding support for other platforms should be trivial, but involves
determining the exact data types GCC uses on that platform.
called. Otherwise (depending on a non-deterministic sort), the timecounter
code can be initialized before the clock rate has been set (on ia64) and it
assumes hz = 100, rather than the real value of 1024. I'm not sure how much
gets upset by this.
Glanced at by: phk
then call do_setopt_accept_filter(so, NULL) which will free the filter
instead of duplicating the code in do_setopt_accept_filter().
Pointed out by: Hiten Pandya <hiten@angelica.unixdaemons.com>
to sort out disk-io from file-io in the vm/buffer/filesystem space.
The intent is to sort VOP_STRATEGY calls into those which operate
on "real" vnodes and those which operate on VCHR vnodes. For
the latter kind, the call will be changed to VOP_SPECSTRATEGY,
possibly conditionally for those places where dual-use happens.
Add a default VOP_SPECSTRATEGY method which will call the normal
VOP_STRATEGY. First time it is called it will print debugging
information. This will only happen if a normal vnode is passed
to VOP_SPECSTRATEGY by mistake.
Add a real VOP_SPECSTRATEGY in specfs, which does what VOP_STRATEGY
does on a VCHR vnode today.
Add a new VOP_STRATEGY method in specfs to catch instances where
the conversion to VOP_SPECSTRATEGY has not yet happened. Handle
the request just like we always did, but first time called print
debugging information.
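In outline, the default method is (sketch; names approximate):

    static int
    vop_stdspecstrategy(struct vop_specstrategy_args *ap)
    {
        static int warned;

        if (!warned++)
            printf("VOP_SPECSTRATEGY on non-VCHR vnode\n");
        return (VOP_STRATEGY(ap->a_vp, ap->a_bp));
    }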
Apart from up to two instances of console messages per boot, this amounts
to a glorified no-op commit.
If you get any of the messages on your console I would very much
like a copy of them mailed to phk@freebsd.org
included in the kernel. Include imgact_elf.c in conf/files, instead of
both imgact_elf32.c and imgact_elf64.c, which will use the default word
size for an architecture as defined in machine/elf.h. Architectures that
wish to build an additional image activator for an alternate word size can
include either imgact_elf32.c or imgact_elf64.c in files.${ARCH}, which
allows it to be dependent on MD options instead of solely on architecture.
Glanced at by: peter
void backtrace(void);
function which will print a backtrace if DDB is in the kernel and an
explanation if not.
This is useful for recording backtraces in non-fatal circumstances and
does not require pollution with DDB #includes in the files where it
is used.
It would of course be nice to have a non-DDB dependent version too,
but since the meat of a backtrace is MD it is probably not worth it.
On architectures with a non-executable stack, eg sparc64, this is used by
libgcc to determine at runtime if it's necessary to enable execute permissions
on a region of the stack which will be used to execute code, allowing the
call to mprotect to be avoided if the kernel is configured to map the stack
executable.
o Allow callers of m_extadd() to allocate their own reference
m_ext.ref_cnt pointer, rather than having the mbuf system allocate it
with a malloc() in the critical path. This speeds m_extadd() up, and
also simplifies locking (malloc() may need Giant).
A driver or subsystem wishing to use its own ref counter must
initialize m_ext.ref_cnt to point to its ref counter prior to
calling m_extadd(), and it must use EXT_EXTREF as its external type.
Eg:
    m->m_ext.ref_cnt = my_ref_cnt_ptr;
    m_extadd(....., EXT_EXTREF);
Reviewed by: bosko
this was causing filedesc work to be very painful.
In order to make this work split out sigio definitions to their own header
(sigio.h) which is included from proc.h for the time being.
take pointers to filedesc structures instead of threads. This makes
it more clear that they do not do any voodoo with the thread/proc
or anything other than the filedesc passed in or returned.
Remove some XXX KSE's as this resolves the issue.
calling getmicrouptime (but maintain the struct timeval-based calling
convention for compatibility)
o eliminate the use of timersub in ratecheck
Note that flood ping tests indicate ppsratecheck is inaccurate (but on the
conservative side) with this revised implementation. If more accuracy is
needed we'll have to introduce an alternate interface or increase the
overhead.
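Typical use is unchanged (sketch; the action taken is hypothetical):

    static struct timeval lasttv;
    static int curpps;

    if (ppsratecheck(&lasttv, &curpps, 100))    /* <= 100 events/sec */
        send_response();
    else
        dropped++;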
Reviewed by: silby, dillon, bde
were sometimes propagated using M_COPY_PKTHDR which actually did
something between a "move" and a "copy" operation. This is replaced
by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it
from the source mbuf) and m_dup_pkthdr which copies the packet
header contents including any m_tag chain. This corrects numerous
problems whereby mbuf tags could be lost during packet manipulations.
These changes also introduce arguments to m_tag_copy and m_tag_copy_chain
to specify if the tag copy work should potentially block. This
introduces an incompatibility with openbsd which we may want to revisit.
Note that move/dup of packet headers does not handle target mbufs
that have a cluster bound to them. We may want to support this;
for now we watch for it with an assert.
Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG|M_LASTFRAG.
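In a nutshell (sketch):

    /* move: 'm' relinquishes its pkthdr (and m_tag chain) to 'n' */
    M_MOVE_PKTHDR(n, m);

    /* copy: 'n' gets a duplicate pkthdr; the tag copy may allocate,
       hence the wait flag */
    if (m_dup_pkthdr(n, m, M_DONTWAIT) == 0)
        goto nospace;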
Supported by: Vernier Networks
Reviewed by: Robert Watson <rwatson@FreeBSD.org>
__acl_get_link(), __acl_set_link(), __acl_delete_link(), and
__acl_aclcheck_link(), with almost identical implementations to
the existing __acl_*_file() variants on these calls. Update
copyright.
Obtained from: TrustedBSD Project
__acl_get_link() Retrieve an ACL by name without following
symbolic links.
__acl_set_link() Set an ACL by name without following
symbolic links.
__acl_delete_link() Delete an ACL by name without following
symbolic links.
__acl_aclcheck_link() Check an ACL against a file by name without
following symbolic links.
These calls are similar in spirit to lstat(), lchown(), lchmod(), etc,
and will be used under similar circumstances.
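For example, to fetch the ACL of a symlink itself rather than of its
target (sketch):

    struct acl acl;

    if (__acl_get_link("/path/to/link", ACL_TYPE_ACCESS, &acl) == -1)
        err(1, "__acl_get_link");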
Obtained from: TrustedBSD Project
call is in progress on the vnode. When vput() or vrele() sees a
1->0 reference count transition, it now return without any further
action if this flag is set. This flag is necessary to avoid recursion
into VOP_INACTIVE if the filesystem inactive routine causes the
reference count to increase and then drop back to zero. It is also
used to guarantee that an unlocked vnode will not be recycled while
blocked in VOP_INACTIVE().
There are at least two cases where the recursion can occur: one is
that the softupdates code called by ufs_inactive() via ffs_truncate()
can call vput() on the vnode. This has been reported by many people
as "lockmgr: draining against myself" panics. The other case is
that nfs_inactive() can call vget() and then vrele() on the vnode
to clean up a sillyrename file.
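In outline, with the new flag (called VI_DOINGINACT here; sketch):

    if ((vp->v_iflag & VI_DOINGINACT) == 0) {
        vp->v_iflag |= VI_DOINGINACT;
        VOP_INACTIVE(vp, td);
        vp->v_iflag &= ~VI_DOINGINACT;
    }
    /* else: inactive processing already under way; just return */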
Reviewed by: mckusick (an older version of the patch)
to treat desiredvnodes much more like a limit than as a vague concept.
On a 2GB RAM machine where desired vnodes is 130k, we run out of
kmem_map space when we hit about 190k vnodes.
If we wake up the vnode washer in getnewvnode(), sleep until it is done,
so that it has a chance to offer us a washed vnode. If we don't sleep
here we'll just race ahead and allocate yet another vnode which will never
get freed.
In the vnodewasher, instead of doing 10 vnodes per mountpoint per
rotation, do 10% of the vnodes distributed evenly across the
mountpoints.
(show thread {address})
Remove the IDLE kse state and replace it with a change in
the way threads share KSEs. Every KSE now has a thread, which is
considered its "owner"; however, a KSE may also be lent to other
threads in the same group to allow completion of in-kernel work.
In this case the owner remains the same and the KSE will revert to the
owner when the other work has been completed.
All creation of upcalls etc. is now done from
kse_reassign() which in turn is called from mi_switch() or
thread_exit(). This means that special code can be removed from
msleep() and cv_wait().
kse_release() does not leave a KSE with no thread any more, but
converts the existing thread into the KSE's owner and sets it up
for doing an upcall. It is just inhibited from being scheduled until
there is some reason to do an upcall.
Remove all trace of the kse_idle queue since it is no longer needed.
"Idle" KSEs are now on the loanable queue.
The duplication is caused by the fact that imgact_elf.c is included
by both imgact_elf32.c and imgact_elf64.c and both are compiled by
default on ia64. Consequently, we have two separate copies of the
elf_legacy_coredump variable due to them being declared static, and
two entries for the same sysctl in the linker set, both referencing
the unique copy of the elf_legacy_coredump variable. Since the second
sysctl cannot be registered, one of the elf_legacy_coredump variables
can not be tuned (if ordering still holds, it's the ELF64 related one).
The only solution is to create two different sysctl variables, just
like the elf<32|64>_trace sysctl variables. This unfortunately is a
(user) interface change, but unavoidable. Thus, on ELF32 platforms
the sysctl variable is called elf32_legacy_coredump and on ELF64
platforms it is called elf64_legacy_coredump. Platforms that have
both ELF formats have both sysctl variables.
These variables should probably be retired sooner rather than later.
skipping read-only pages, which can result in valuable non-text-related
data not getting dumped, the ELF loader and the dynamic loader now mark
read-only text pages NOCORE and the coredump code only checks (primarily) for
complete inaccessibility of the page or NOCORE being set.
Certain applications which map large amounts of read-only data will
produce much larger cores. A new sysctl has been added,
debug.elf_legacy_coredump, which will revert to the old behavior.
This commit represents collaborative work by all parties involved.
The PR contains a program demonstrating the problem.
PR: kern/45994
Submitted by: "Peter Edwards" <pmedwards@eircom.net>, Archie Cobbs <archie@dellroad.org>
Reviewed by: jdp, dillon
MFC after: 7 days
_KERNEL scope from "src/sys/sys/mchain.h".
Replace each occurrence of the above in _KERNEL scope with the
appropriate macro from the set of hto(be|le)(16|32|64) and
(be|le)toh(16|32|64) from "src/sys/sys/endian.h".
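For example (sketch):

    len = htoles(len);      /* before: mchain.h */
    len = htole16(len);     /* after: endian.h */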
Tested by: tjr
Requested by: comment marked with XXX
resource starvation we clean-up as much of the vmspace structure as we
can when the last process using it exits. The rest of the structure
is cleaned up when it is reaped. But since exit1() decrements the ref
count it is possible for a double-free to occur if someone else, such as
the process swapout code, references and then dereferences the structure.
Additionally, the final cleanup of the structure should not occur until
the last process referencing it is reaped.
This commit solves the problem by introducing a secondary reference count,
called 'vm_exitingcnt'. The normal reference count is decremented on exit
and vm_exitingcnt is incremented. vm_exitingcnt is decremented when the
process is reaped. When both vm_exitingcnt and vm_refcnt are 0, the
structure is freed for real.
MFC after: 3 weeks
they may be the only viable ones to flush. Thus it will now wait for
an inode lock if the other alternatives will result in rollbacks (and
immediate redirtying of the buffer). If only buffers with rollbacks
are available, one will be flushed, but then the buffer daemon will
wait briefly before proceeding. Failing to wait briefly effectively
deadlocks a uniprocessor since every other process writing to that
filesystem will wait for the buffer daemon to clean up which takes
close enough to forever to feel like a deadlock.
Reported by: Archie Cobbs <archie@dellroad.org>
Sponsored by: DARPA & NAI Labs.
Approved by: re
These call uma_large_malloc() and uma_large_free() which require Giant.
Fixes panic when descriptor table is larger than KMEM_ZMAX bytes
noticed by kkenn.
Reviewed by: jhb
unused. Replace it with a dm_mount back-pointer to the struct mount
that the devfs_mount is associated with. Export that pointer to MAC
Framework entry points, where all current policies don't use the
pointer. This permits the SEBSD port of SELinux's FLASK/TE to compile
out-of-the-box on 5.0-CURRENT with full file system labeling support.
Approved by: re (murray)
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
converting from individual vnode locks to the snapshot
lock, be sure to pass any waiting processes along to the
new lock as well. This transfer is done by a new function
in the lock manager, transferlockers(from_lock, to_lock);
Thanks to Lamont Granquist <lamont@scriptkiddie.org> for
his help in pounding on snapshots beyond all reason and
finding this deadlock.
Sponsored by: DARPA & NAI Labs.
1) Record all device events when devctl is enabled, rather than just when
devd has devctl open. This is necessary to prevent races between when
a device arrives, and when devd starts.
2) Add hw.bus.devctl_disable to disable devctl, this can also be set as a
tunable.
3) Fix async support. Reset nonblocking and async_td in open. Remove
async flags.
4) Free all memory when devctl is disabled.
Approved by: re (blanket)
on this.
o Update the `cur' pointer in the cluster loop in m_getm() to avoid
incorrect truncation and leaked mbufs.
Reviewed by: bmilekic
Approved by: re
create an ABI that encodes offsets and sizes of structures into client
drivers. The functions isolate the ABI from changes to the resource
structure. Since these are used very rarely (once at startup), the
speed penalty will be down in the noise.
Also, add r_rid to the structure so that clients can save the 'rid' of
the resource in the struct resource, plus accessor functions. Future
additions to newbus will make use of this to present a simplified
interface for resource specification.
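Client drivers then use the accessors instead of dereferencing struct
resource (sketch):

    start = rman_get_start(res);
    size  = rman_get_size(res);
    rid   = rman_get_rid(res);      /* new accessor for the saved rid */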
Approved by: re (jhb)
Reviewed by: jhb, jake
problem was a locked directory vnode), do not give the process a chance
to sleep in state "stopevent" (depends on the S_EXEC bit being set in
p_stops) until most resources have been released again.
Approved by: re
instead of panicking. Also, perform some of the simpler sanity checks on
the fds before acquiring the filedesc lock.
Approved by: re
Reported by: Dan Nelson <dan@emsphone.com> and others
by policy modules making use of downgrades in the MAC AST event. This
is required by the mac_lomac port of LOMAC to the MAC Framework.
Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
i386 cpu_thread_exit(). This resulted in a panic with WITNESS
since we need to hold Giant to call kmem_free(), and we weren't
holding it anymore in cpu_thread_exit(). We now do this from a
new MD function, cpu_thread_dtor(), called by thread_dtor().
Approved by: re@
Suggested by: jhb
- Provide a routine in sched_4bsd to add this functionality.
- Use sched_pctcpu() in kern_proc, which is the one place outside of
sched_4bsd where the old pctcpu value was accessed directly.
Approved by: re
data in the scheduler independant structures (proc, ksegrp, kse, thread).
- Implement unused stubs for this mechanism in sched_4bsd.
Approved by: re
Reviewed by: luigi, trb
Tested on: x86, alpha
in struct proc. While the process label is actually stored in the
struct ucred pointed to by p_ucred, there is a need for transient
storage that may be used when asynchronous (deferred) updates need to
be performed on the "real" label for locking reasons. Unlike other
label storage, this label has no locking semantics, relying on policies
to provide their own protection for the label contents, meaning that
a policy leaf mutex may be used, avoiding lock order issues. This
permits policies that act based on historical process behavior (such
as audit policies, the MAC Framework port of LOMAC, etc) to update
process properties even when many existing locks are held without
violating the lock order. No currently committed policies implement use
of this label storage.
Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
checks permit policy modules to augment the system policy for permitting
kld operations. This permits policies to limit access to kld operations
based on credential (and other) properties, as well as to perform checks
on the kld being loaded (integrity, etc).
Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
leader wasn't exiting during a fork; instead, do remember to release
the lock, avoiding lock order reversals and a recursion panic.
Reported by: "Joel M. Baldwin" <qumqats@outel.org>
also add rusage time in the thread mailbox.
2. Minor change for the thread limit code in thread_user_enter();
fix a typo in kse_release() from my last commit.
Reviewed by: deischen, mini
kern.threads.max_threads_per_proc
kern.threads.max_groups_per_proc
2. Temporarily disable a borrower thread stashing itself as the
owner thread's spare thread in thread_exit(). There
is a race between the owner thread and the borrower thread:
an owner thread may allocate a spare thread like this:
    if (td->td_standin == NULL)
        td->td_standin = thread_alloc();
but thread_alloc() can block the thread, and then a borrower
thread could possibly stash itself as the owner's spare
thread in thread_exit(). After the owner is resumed, the result
is a thread leak in the kernel. A double check in the owner could
avoid the race, but it would be ugly and is not worth doing.
sysconf.c:
Use 'break' rather than 'goto yesno' in sysconf.c so that we report a '0'
return value from the kernel sysctl.
vfs_aio.c:
Make aio reset its configuration parameters to -1 after unloading
instead of 0.
posix4_mib.c:
Initialize the aio configuration parameters to -1
to indicate that it is not loaded.
Add a facility (p31b_iscfg()) to determine if a posix4 facility has been
initialized to avoid having to re-order the SYSINITs.
Use p31b_iscfg() to determine if aio has had a chance to run yet which
is likely if it is compiled into the kernel and avoid spamming its
values.
Introduce a macro P31B_VALID() instead of doing the same comparison over
and over.
posix4.h:
Prototype p31b_iscfg().
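P31B_VALID() is just the range check that used to be open-coded,
roughly:

    #define P31B_VALID(num) ((num) >= 1 && (num) < CTL_P1003_1B_MAXID)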
Previously these were libc functions but were requested to
be made into system calls for atomicity and to coalesce what
might be two entrances into the kernel (signal mask setting
and floating point trap) into one.
A few style nits and comments from bde are also included.
Tested on alpha by: gallatin
signed, since they describe a ring buffer and signed arithmetic is
performed on them. This avoids some evilish casts.
Since this changes all but two members of this structure, style(9)
those remaining ones, too.
Requested by: bde
Reviewed by: bde (earlier version)
the MAC policy list is busy during a load or unload attempt.
We assert no locks held during the cv wait, meaning we should
be fairly deadlock-safe. Because of the cv model and busy
count, it's possible for a cv waiter waiting for exclusive
access to the policy list to be starved by active and
long-lived access control/labeling events. For now, we
accept that as a necessary tradeoff.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
we brought in the new cache and locking model for vnode labels. We
now rely on mac_associate_devfs_vnode().
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
earlier acquired lock with the same witness as the lock currently being
acquired. If we had released several earlier acquired locks after
acquiring enough locks to require another lock_list_entry bucket in the
lock list, then subsequent lock_list_entry buckets could contain only one
lock instance in which case i would be zero.
Reported by: Joel M. Baldwin <qumqats@outel.org>
dynamic mapping of an operation vector into an operation structure;
rather, we rely on C99 sparse structure initialization.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
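That is, a policy module now declares something like (sketch;
hypothetical policy):

    static struct mac_policy_ops foo_policy_ops = {
        .mpo_init =        foo_policy_init,
        .mpo_create_cred = foo_create_cred,
        /* entry points left unset remain NULL */
    };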
indirectly through vm_page_protect(). The one remaining page flag that
is updated by vm_page_protect() is already being updated by our various
pmap implementations.
Note: A later commit will similarly change the VM_PROT_READ case and
eliminate vm_page_protect().
in the ELF code. Missed in earlier merge from the MAC tree.
Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
mac_thread_userret() only if PS_MACPEND is set in the process AST mask.
This avoids the cost of the entry point in the common case, but
requires policies interested in the userret event to set the flag
(protected by the scheduler lock) if they do want the event. Since
all the policies that we're working with which use mac_thread_userret()
use the entry point only selectively to perform operations deferred
for locking reasons, this maintains the desired semantics.
Approved by: re
Requested by: bde
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
points, rather than relying on policies to grub around in the
image activator instance structure.
Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
sysctls to MI code; this reduces code duplication and makes all of them
available on sparc64, and the latter two on powerpc.
The semantics of the i386 and pc98 hw.availpages are slightly changed:
previously, holes between ranges of available pages would be included,
while they are excluded now. The new behaviour should be more correct
and brings i386 in line with the other architectures.
Move physmem to vm/vm_init.c, where this variable is used in MI code.
- Remove the comments which were justifying this by the fact
that we don't have %q in the kernel; this was probably right
back in the day, but we now have %q, and we even have something
better to print those types (%j).
of the original AIO request: save and restore the active thread credential
as well as using the file credential, since MAC (and some other bits of
the system) rely on the thread credential instead of/as well as the
file credential. In brief: cache td->td_ucred when the AIO operation
is queued, temporarily set and restore the kernel thread credential,
and release the credential when done. Similar to ktrace credential
management.
Reviewed by: alc
Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
manipulated directly (rather than using sballoc()/sbfree()); update them
to tweak the new sb_ctl field too.
Sponsored by: NTT Multimedia Communications Labs
(1) Permit userland applications to request a change of label atomic
with an execve() via mac_execve(). This is required for the
SEBSD port of SELinux/FLASK. Attempts to invoke this without
MAC compiled in result in ENOSYS, as with all other MAC system
calls. Complexity, if desired, is present in policy modules,
rather than the framework.
(2) Permit policies to have access to both the label of the vnode
being executed as well as the interpreter if it's a shell
script or related UNIX nonsense. Because we can't hold both
vnode locks at the same time, cache the interpreter label.
SEBSD relies on this because it supports secure transitioning
via shell script executables. Other policies might want to
take both labels into account during an integrity or
confidentiality decision at execve()-time.
Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Allow transitioning to be twiddled off using the process and fs enforcement
flags, although at some point this should probably be its own flag.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
entrypoints, #ifdef MAC. The supporting logic already existed in
kern_mac.c, so no change there. This permits MAC policies to cause
a process label change as the result of executing a binary --
typically, as a result of executing a specially labeled binary.
For example, the SEBSD port of SELinux/FLASK uses this functionality
to implement TE type transitions on processes using transitioning
binaries, in a manner similar to setuid. Policies not implementing
a notion of transition (all the ones in the tree right now) require
no changes, since the old label data is copied to the new label
via mac_create_cred() even if a transition does occur.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
describes an image activation instance. Instead, make use of the
existing fname structure entry, and introduce two new entries,
userspace_argv, and userspace_envv. With the addition of
mac_execve(), this divorces the image structure from the specifics
of the execve() system call, removes a redundant pointer, etc.
No semantic change from current behavior, but it means that the
structure doesn't depend on syscalls.master-generated includes.
There seems to be some redundant initialization of imgact entries,
which I have maintained, but which could probably use some cleaning
up at some point.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
system accounting configuration and for nfsd server thread attach.
Policies might use this to protect the integrity or confidentiality
of accounting data, limit the ability to turn on or off accounting,
as well as to prevent inappropriately labeled threads from becoming nfs
server threads.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
the data value returned by kevent()'s EVFILT_READ filter on non-TCP
sockets accurately reflects the amount of data that can be read from the
sockets by applications.
PR: 30634
Reviewed by: -net, -arch
Sponsored by: NTT Multimedia Communications Labs
MFC after: 2 weeks
permitting MAC policies to limit access to the kernel environment.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
malloc(9) failed last time. This is intended to help code adjust
memory usage to the current circumstances.
A typical use could be:
    if (malloc_last_fail() < 60)
        reduce_cache_by_one();