freebsd-dev

Author	SHA1	Message	Date
John Baldwin	fdcc1cc09f	Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic and isn't strictly required. However, it lowers the number of false positives found when grep'ing the kernel sources for p_ucred to ensure proper locking.	2002-02-27 19:18:10 +00:00
John Baldwin	a854ed9893	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.	2002-02-27 18:32:23 +00:00
Mike Silbersack	7f3a40933b	Fix a horribly suboptimal algorithm in the vm_daemon. In order to determine what to page out, the vm_daemon checks reference bits on all pages belonging to all processes. Unfortunately, the algorithm used reacted badly with shared pages; each shared page would be checked once per process sharing it; this caused an O(N^2) growth of tlb invalidations. The algorithm has been changed so that each page will be checked only 16 times. Prior to this change, a fork/sleepbomb of 1300 processes could cause the vm_daemon to take over 60 seconds to complete, effectively freezing the system for that time period. With this change in place, the vm_daemon completes in less than a second. Any system with hundreds of processes sharing pages should benefit from this change. Note that the vm_daemon is only run when the system is under extreme memory pressure. It is likely that many people with loaded systems saw no symptoms of this problem until they reached the point where swapping began. Special thanks go to dillon, peter, and Chuck Cranor, who helped me get up to speed with vm internals. PR: 33542, 20393 Reviewed by: dillon MFC after: 1 week	2002-02-27 18:03:02 +00:00
Peter Wemm	d1693e1701	Back out all the pmap related stuff I've touched over the last few days. There is some unresolved badness that has been eluding me, particularly affecting uniprocessor kernels. Turning off PG_G helped (which is a bad sign) but didn't solve it entirely. Userland programs still crashed.	2002-02-27 09:51:33 +00:00
Peter Wemm	bd1e3a0f89	Jake further reduced IPI shootdowns on sparc64 in loops by using ranged shootdowns in a couple of key places. Do the same for i386. This also hides some physical addresses from higher levels and has it use the generic vm_page_t's instead. This will help for PAE down the road. Obtained from: jake (MI code, suggestions for MD part)	2002-02-27 02:14:58 +00:00
Peter Wemm	dd50331c0e	Remove unused variable (td)	2002-02-26 01:01:37 +00:00
Poul-Henning Kamp	57c10583aa	GC: BIO_ORDERED, various infrastructure dealing with BIO_ORDERED.	2002-02-22 09:26:35 +00:00
Tor Egge	d2760948fe	Add a page queue, PQ_HOLD, that temporarily owns pages with nonzero hold count that would otherwise be on one of the free queues. This eliminates a panic when broken programs unmap memory that still has pending IO from raw devices. Reviewed by: dillon, alc	2002-02-19 23:19:30 +00:00
Mike Silbersack	0c9e47230a	Add one more comment to the OOM changes so that future readers of the code may better understand the code. Suggested by: dillon MFC after: 1 week	2002-02-19 18:50:49 +00:00
Mike Silbersack	ef6020d187	Changes to make the OOM killer much more effective: - Allow the OOM killer to target processes currently locked in memory. These very often are the ones doing the memory hogging. - Drop the wakeup priority of processes currently sleeping while waiting for their page fault to complete. In order for the OOM killer to work well, the killed process and other system processes waiting on memory must be allowed to wakeup first. Reviewed by: dillon MFC after: 1 week	2002-02-19 18:34:02 +00:00
Bruce Evans	1e92845e1b	Garbage-collect options ACPI_NO_ENABLE_ON_BOOT, AML_DEBUG, BLEED, DEVICE_SYSCTLS, KEY, LOUTB, NFS_MUIDHASHSIZ, NFS_UIDHASHSIZ, PCI_QUIET and SIMPLELOCK_DEBUG.	2002-02-15 13:16:11 +00:00
Julian Elischer	2c1007663f	In a threaded world, differnt priorirites become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)	2002-02-11 20:37:54 +00:00
Julian Elischer	079b7badea	Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,	2002-02-07 20:58:47 +00:00
Alfred Perlstein	582ec34cd8	Fix a race with free'ing vmspaces at process exit when vmspaces are shared. Also introduce vm_endcopy instead of using pointer tricks when initializing new vmspaces. The race occured because of how the reference was utilized: test vmspace reference, possibly block, decrement reference When sharing a vmspace between multiple processes it was possible for two processes exiting at the same time to test the reference count, possibly block and neither one free because they wouldn't see the other's update. Submitted by: green	2002-02-05 21:23:05 +00:00
Matthew Dillon	027df6bdd7	GC P_BUFEXHAUST leftovers, we've had a new mechanism to avoid buffer cache lockups for over a year now. MFC after: 0 days	2002-01-31 18:39:44 +00:00
David Malone	d2979f90e7	Remove a parameter name from a prototype.	2002-01-25 21:33:10 +00:00
Bruce Evans	e50f5c2e8d	Don't declare vm_swapout() in the NO_SWAPPING case when it is not defined. Fixed some style bugs.	2002-01-17 16:46:26 +00:00
Alfred Perlstein	a4db49537b	Replace ffind_* with fget calls. Make fget MPsafe. Make fgetvp and fgetsock use the fget subsystem to reduce code bloat. Push giant down in fpathconf().	2002-01-14 00:13:45 +00:00
Alfred Perlstein	426da3bcfb	SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file * fhold(struct file fp); / increments reference count on a file / struct file fhold_locked(struct file fp); / like fhold but expects file to locked / struct file ffind_hold(struct thread , int fd); / finds the struct file in thread, adds one reference and returns it unlocked / struct file ffind_lock(struct thread , int fd); / ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.	2002-01-13 11:58:06 +00:00
John Baldwin	c86b6ff551	Change the preemption code for software interrupt thread schedules and mutex releases to not require flags for the cases when preemption is not allowed: The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent switching to a higher priority thread on mutex releease and swi schedule, respectively when that switch is not safe. Now that the critical section API maintains a per-thread nesting count, the kernel can easily check whether or not it should switch without relying on flags from the programmer. This fixes a few bugs in that all current callers of swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from fast interrupt handlers and the swi_sched of softclock needed this flag. Note that to ensure that swi_sched()'s in clock and fast interrupt handlers do not switch, these handlers have to be explicitly wrapped in critical_enter/exit pairs. Presently, just wrapping the handlers is sufficient, but in the future with the fully preemptive kernel, the interrupt must be EOI'd before critical_exit() is called. (critical_exit() can switch due to a deferred preemption in a fully preemptive kernel.) I've tested the changes to the interrupt code on i386 and alpha. I have not tested ia64, but the interrupt code is almost identical to the alpha code, so I expect it will work fine. PowerPC and ARM do not yet have interrupt code in the tree so they shouldn't be broken. Sparc64 is broken, but that's been ok'd by jake and tmm who will be fixing the interrupt code for sparc64 shortly. Reviewed by: peter Tested on: i386, alpha	2002-01-05 08:47:13 +00:00
Matthew Dillon	23b590188f	Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout. Hopefully MFC: before the 4.5 release	2001-12-20 22:42:27 +00:00
Matthew Dillon	3ebeaf5984	This fixes a large number of bugs in our NFS client side code. A recent commit by Kirk also fixed a softupdates bug that could easily be triggered by server side NFS. * An edge case with shared R+W mmap()'s and truncate whereby the system would inappropriately clear the dirty bits on still-dirty data. (applicable to all filesystems) THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING. see vm/vm_page.c line 1641 * The straddle case for VM pages and buffer cache buffers when truncating. (applicable to NFS client side) * Possible SMP database corruption due to vm_pager_unmap_page() not clearing the TLB for the other cpu's. (applicable to NFS client side but could effect all filesystems). Note: not considered serious since the corruption occurs beyond the file EOF. * When flusing a dirty buffer due to B_CACHE getting cleared, we were accidently setting B_CACHE again (that is, bwrite() sets B_CACHE), when we really want it to stay clear after the write is complete. This resulted in a corrupt buffer. (applicable to all filesystems but probably only triggered by NFS) * We have to call vtruncbuf() when ftruncate()ing to remove any buffer cache buffers. This is still tentitive, I may be able to remove it due to the second bug fix. (applicable to NFS client side) * vnode_pager_setsize() race against nfs_vinvalbuf()... we have to set n_size before calling nfs_vinvalbuf or the NFS code may recursively vnode_pager_setsize() to the original value before the truncate. This is what was causing the user mmap bus faults in the nfs tester program. (applicable to NFS client side) * Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made by Kirk). Testing program written by: Avadis Tevanian, Jr. Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step') MFC after: 1 week	2001-12-14 01:16:57 +00:00
Luigi Rizzo	60363fb9f7	vm/vm_kern.c: rate limit (to once per second) diagnostic printf when you run out of mbuf address space. kern/subr_mbuf.c: print a warning message when mb_alloc fails, again rate-limited to at most once per second. This covers other cases of mbuf allocation failures. Probably it also overlaps the one handled in vm/vm_kern.c, so maybe the latter should go away. This warning will let us gradually remove the printf that are scattered across most network drivers to report mbuf allocation failures. Those are potentially dangerous, in that they are not rate-limited and can easily cause systems to panic. Unless there is disagreement (which does not seem to be the case judging from the discussion on -net so far), and because this is sort of a safety bugfix, I plan to commit a similar change to STABLE during the weekend (it affects kern/uipc_mbuf.c there). Discussed-with: jlemon, silby and -net	2001-12-01 00:21:30 +00:00
Jonathan Lemon	4584bbf555	When laying out objects in a ZONE_INTERRUPT zone, allow them to cross a page boundary, since we've already allocated all our contiguous kva space up front. This eliminates some memory wastage, and allows us to actually reach the # of objects were specified in the zinit() call. Reviewed by: peter, dillon	2001-11-17 00:40:48 +00:00
Matthew Dillon	fe8e0238cc	Fix deadlock introduced in 1.73 (Jan 1998). The paging-in-progress count on a vnode-backed object must be incremented after obtaining the vnode lock. If it is bumped before obtaining the vnode lock we can deadlock against vtruncbuf(). Submitted by: peter, ps MFC after: 3 days	2001-11-09 21:34:45 +00:00
Matthew Dillon	33c6774151	Adjust vnode_pager_input_smlfs() to not attempt to BMAP blocks beyond the file EOF. This works around a bug in the ISOFS (CDRom) BMAP code which returns bogus values for requests beyond the file EOF rather then returning an error, resulting in either corrupt data being mmap()'d beyond the file EOF or resulting in a seg-fault on the last page of a mmap()'d file (mmap()s of CDRom files). Reported by: peter / Yahoo MFC after: 3 days	2001-11-05 18:58:47 +00:00
Matthew Dillon	e302698320	Don't let pmap_object_init_pt() exhaust all available free pages (allocating pv entries w/ zalloci) when called in a loop due to an madvise(). It is possible to completely exhaust the free page list and cause a system panic when an expected allocation fails.	2001-10-31 03:06:33 +00:00
Matthew Dillon	7a5a635273	Move recently added procedure which was incorrectly placed within an #ifdef DDB block.	2001-10-26 16:27:54 +00:00
Matthew Dillon	245df27cee	Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a real effect. Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%. Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are ok, in regards to vnode stability when the mount list mutex is held. Improves looping case by 500%. (more optimization work is needed on top of these fixes) MFC after: 1 week	2001-10-26 00:08:05 +00:00
Matthew Dillon	57601bcb5d	Syntax cleanup and documentation, no operational changes. MFC after: 1 day	2001-10-21 06:12:06 +00:00
Ian Dowse	0eb6ce3169	Move the code that computes the system load average from vm_meter.c to kern_synch.c in preparation for adding some jitter to the inter-sample time. Note that the "vm.loadavg" sysctl still lives in vm_meter.c which isn't the right place, but it is appropriate for the current (bad) name of that sysctl. Suggested by: jhb (some time ago) Reviewed by: bde	2001-10-20 13:10:43 +00:00
Matthew Dillon	b386828956	contigmalloc1() could cause the vm_page_zero_count to become incorrect. Properly track the count. Submitted by: mark tinguely <tinguely@web.cs.ndsu.nodak.edu>	2001-10-17 17:34:34 +00:00
Tor Egge	d6844b6bf6	Don't use an uninitialized field reserved for callers in the bio structure passed to swap_pager_strategy(). Instead, use a field reserved for drivers and initialize it before usage. Reviewed by: dillon	2001-10-15 23:02:54 +00:00
Tor Egge	30105b9ec4	Don't remove all mappings of a swapped out process if the vm map contained wired entries. vm_fault_unwire() depends on the mapping being intact. Reviewed by: dillon	2001-10-14 20:51:14 +00:00
Tor Egge	e7673b8424	Fix locking violations during page wiring: - vm map entries are not valid after the map has been unlocked. - An exclusive lock on the map is needed before calling vm_map_simplify_entry(). Fix cleanup after page wiring failure to unwire all pages that had been successfully wired before the failure was detected. Reviewed by: dillon	2001-10-14 20:47:08 +00:00
Matthew Dillon	33bd457d91	Makes contigalloc[1]() create the vm_map / underlying wired pages in the kernel map and object in a manner that contigfree() is actually able to free. Previously contigfree() freed up the KVA space but could not unwire & free the underlying VM pages due to mismatched pageability between the map entry and the VM pages. Submitted by: Thomas Moestl <tmoestl@gmx.net> Testing by: mark tinguely <tinguely@web.cs.ndsu.nodak.edu> MFC after: 3 days	2001-10-13 04:23:37 +00:00
Matthew Dillon	00a6f47f13	Finally fix the VM bug where a file whos EOF occurs in the middle of a page would sometimes prevent a dirty page from being cleaned, even when synced, resulting in the dirty page being re-flushed to disk every 30-60 seconds or so, forever. The problem is that when the filesystem flushes a page to its backing file it typically does not clear dirty bits representing areas of the page that are beyond the file EOF. If the file is also mmap()'d and a fault is taken, vm_fault (properly, is required to) set the vm_page_t->dirty bits to VM_PAGE_BITS_ALL. This combination could leave us with an uncleanable, unfreeable page. The solution is to have the vnode_pager detect the edge case and manually clear the dirty bits representing areas beyond the file EOF. The filesystem does the rest and the page comes up clean after the write completes. MFC after: 3 days	2001-10-12 18:17:34 +00:00
John Baldwin	bd78cece5d	Change the kernel's ucred API as follows: - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.	2001-10-11 23:38:17 +00:00
John Baldwin	61d80e90a9	Add missing includes of sys/ktr.h.	2001-10-11 17:53:43 +00:00
Paul Saab	cbc89bfbfe	Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader tunable. Reviewed by: peter MFC after: 2 weeks	2001-10-10 23:06:54 +00:00
Ian Dowse	564bfabecb	Remove the SSLEEP case from the load average computation. This has been a no-op for as long as our CVS history goes back. Processes in state SSLEEP could only be counted if p_slptime == 0, but immediately before loadav() is called, schedcpu() has just incremented p_slptime on all SSLEEP processes.	2001-10-04 22:33:31 +00:00
Robert Watson	8c5d4fe829	o Modify access control checks in mmap() to use securelevel_gt() instead of direct variable access. Obtained from: TrustedBSD Project	2001-09-26 20:29:39 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Peter Wemm	eb30c1c0b9	Rip some well duplicated code out of cpu_wait() and cpu_exit() and move it to the MI area. KSE touched cpu_wait() which had the same change replicated five ways for each platform. Now it can just do it once. The only MD parts seemed to be dealing with fpu state cleanup and things like vm86 cleanup on x86. The rest was identical. XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional stub in place. Reviewed by: jake, tmm, dillon	2001-09-10 04:28:58 +00:00
John Baldwin	29fdb744d1	Process priority is locked by the sched_lock, not the proc lock.	2001-09-01 20:16:30 +00:00
Matthew Dillon	7feaf028be	make swapon() MPSAFE (will adjust syscalls.master later)	2001-08-31 22:15:37 +00:00
Matthew Dillon	6a33d53c48	mark obreak() and ovadvise() as being MPSAFE	2001-08-31 22:10:03 +00:00
Matthew Dillon	d2c60af81a	Cleanup	2001-08-31 01:26:30 +00:00
Peter Wemm	3516c025ff	Implement idle zeroing of pages. I've been tinkering with this on and off since John Dyson left his work-in-progress. It is off by default for now. sysctl vm.zeroidle_enable=1 to turn it on. There are some hacks here to deal with the present lack of preemption - we yield after doing a small number of pages since we wont preempt otherwise. This is basically Matt's algorithm [with hysteresis] with an idle process to call it in a similar way it used to be called from the idle loop. I cleaned up the includes a fair bit here too.	2001-08-25 05:00:44 +00:00
Matthew Dillon	676274db9b	Remove support for the badly broken MAP_INHERIT (from -current only).	2001-08-24 19:29:56 +00:00

1 2 3 4 5 ...

1094 Commits