freebsd-skq

Author	SHA1	Message	Date
alc	39788de49e	Pass a value of type vm_prot_t to pmap_enter_quick() so that it can determine whether the mapping should permit execute access.	2005-09-03 18:20:20 +00:00
kan	51355225d4	Do not use vm_pager_init() to initialize vnode_pbuf_freecnt variable. vm_pager_init() is run before required nswbuf variable has been set to correct value. This caused the system to run with a single pbuf available for vnode_pager. Handle both cluster_pbuf_freecnt and vnode_pbuf_freecnt variable in the same way. Reported by: ade Obtained from: alc MFC after: 2 days	2005-08-13 20:21:33 +00:00
tegge	f13480db93	Check for marker pages when scanning active and inactive page queues. Reviewed by: alc	2005-08-12 18:17:40 +00:00
des	1722544f93	Introduce the vm.boot_pages tunable and sysctl, which controls the number of pages reserved to bootstrap the kernel memory allocator. MFC after: 2 weeks	2005-08-12 12:24:19 +00:00
tegge	a73af1b81d	Don't allow pagedaemon to skip pages while scanning PQ_ACTIVE or PQ_INACTIVE due to the vm object being locked. When a process writes large amounts of data to a file, the vm object associated with that file can contain most of the physical pages on the machine. If the process is preempted while holding the lock on the vm object, pagedaemon would be able to move very few pages from PQ_INACTIVE to PQ_CACHE or from PQ_ACTIVE to PQ_INACTIVE, resulting in unlimited cleaning of dirty pages belonging to other vm objects. Temporarily unlock the page queues lock while locking vm objects to avoid lock order violation. Detect and handle relevant page queue changes. This change depends on both the lock portion of struct vm_object and normal struct vm_page being type stable. Reviewed by: alc	2005-08-10 00:17:36 +00:00
ssouhlal	871cf7b33c	Use atomic operations on runningbufspace. PR: kern/84318 Submitted by: ade MFC after: 3 days	2005-08-08 22:44:10 +00:00
rwatson	149e4a2dca	Don't perform a nested include of opt_vmpage.h if LIBMEMSTAT is defined, as opt_vmpage.h will not be available to user space library builds. A similar existing check is present for KLD_MODULE for similar reasons. MFC after: 3 days	2005-08-04 10:05:11 +00:00
rwatson	65053d65ad	Wrap inlines in uma_int.h in #ifdef _KERNEL so that uma_int.h can be used from memstat_uma.c for the purposes of kvm access without lots of additional unsafe includes. MFC after: 3 days	2005-08-04 10:03:53 +00:00
rwatson	1d80a8864a	Rename UMA_MAX_NAME to UTH_MAX_NAME, since it's a maximum in the monitoring API, which might or might not be the same as the internal maximum (currently none). Export flag information on UMA zones -- in particular, whether or not this is a secondary zone, and so the keg free count should be considered in that light. MFC after: 1 day	2005-07-25 00:47:32 +00:00
alc	38bf328ab8	Eliminate inconsistency in the setting of the B_DONE flag. Specifically, make the b_iodone callback responsible for setting it if it is needed. Previously, it was set unconditionally by bufdone() without holding whichever lock is shared by the b_iodone callback and the corresponding top-half function. Consequently, in a race, the top-half function could conclude that operation was done before the b_iodone callback finished. See, for example, aio_physwakeup() and aio_fphysio(). Note: I don't believe that the other, more widely-used b_iodone callbacks are affected. Discussed with: jeff Reviewed by: phk MFC after: 2 weeks	2005-07-20 19:06:06 +00:00
rwatson	6fa05635cf	Further UMA statistics related changes: - Add a new uma_zfree_internal() flag, ZFREE_STATFREE, which causes it to update the zone's uz_frees statistic. Previously, the statistic was updated unconditionally. - Use the flag in situations where a "real" free occurs: i.e., one where the caller is freeing an allocated item, to be differentiated from situations where uma_zfree_internal() is used to tear down the item during slab teardown in order to invoke its fini() method. Also use the flag when UMA is freeing its internal objects. - When exchanging a bucket with the zone from the per-CPU cache when freeing an item, flush cache statistics back to the zone (since the zone lock and critical section are both held) to match the allocation case. MFC after: 3 days	2005-07-20 18:47:42 +00:00
alc	bef24273ae	Eliminate an incorrect (and unnecessary) cast.	2005-07-20 18:41:08 +00:00
rwatson	37cd630e38	Use mp_maxid in preference to MAXCPU when creating exports of UMA per-CPU cache statistics. UMA sizes the cache array based on the number of CPUs at boot (mp_maxid + 1), and iterating based on MAXCPU could read off the end of the array (into the next zone). Reported by: yongari MFC after: 1 week	2005-07-16 11:03:06 +00:00
rwatson	4ead3022dd	Improve canonicalization of copyrights. Order copyrights by order of assertion (jeff, bmilekic, rwatson). Suggested ages ago by: bde MFC after: 1 week	2005-07-16 09:51:52 +00:00
rwatson	18ba9c6401	Move the unlocking of the zone mutex in sysctl_vm_zone_stats() so that it covers the following of the uc_alloc/freebucket cache pointers. Originally, I felt that the race wasn't helped by holding the mutex, hence a comment in the code and not holding it across the cache access. However, it does improve consistency, as while it doesn't prevent bucket exchange, it does prevent bucket pointer invalidation. So a race in gathering cache free space statistics still can occur, but not one that follows an invalid bucket pointer, if the mutex is held. Submitted by: yongari MFC after: 1 week	2005-07-16 09:40:34 +00:00
silby	24f7a1c3d6	Increase the flags field for kegs from a 16 to a 32 bit value; we have exhausted all 16 flags.	2005-07-16 02:23:41 +00:00
rwatson	87bbce9e08	Track UMA(9) allocation failures by zone, and export via sysctl. Requested by: victor cruceru <victor dot cruceru at gmail dot com> MFC after: 1 week	2005-07-15 23:34:39 +00:00
jhb	841c5ac424	Convert a remaining !fs.map->system_map to fs.first_object->flags & OBJ_NEEDGIANT test that was missed in an earlier revision. This fixes mutex assertion failures in the debug.mpsafevm=0 case. Reported by: ps MFC after: 3 days	2005-07-14 21:18:07 +00:00
rwatson	83343a94ec	Introduce a new sysctl, vm.zone_stats, which exports UMA(9) allocator statistics via a binary structure stream: - Add structure 'uma_stream_header', which defines a stream version, definition of MAXCPUs used in the stream, and the number of zone records in the stream. - Add structure 'uma_type_header', which defines the name, alignment, size, resource allocation limits, current pages allocated, preferred bucket size, and central zone + keg statistics. - Add structure 'uma_percpu_stat', which, for each per-CPU cache, includes the number of allocations and frees, as well as the number of free items in the cache. - When the sysctl is queried, return a stream header, followed by a series of type descriptions, each consisting of a type header followed by a series of MAXCPUs uma_percpu_stat structures holding per-CPU allocation information. Typical values of MAXCPU will be 1 (UP compiled kernel) and 16 (SMP compiled kernel). This query mechanism allows user space monitoring tools to extract memory allocation statistics in a machine-readable form, and to do so at a per-CPU granularity, allowing monitoring of allocation patterns across CPUs in order to better understand the distribution of work and memory flow over multiple CPUs. While here, also export the number of UMA zones as a sysctl vm.uma_count, in order to assist in sizing user space buffers to receive the stream. A follow-up commit of libmemstat(3), a library to monitor kernel memory allocation, will occur in the next few days. This change directly supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats rather than separately maintained mbuf allocator statistics. MFC after: 1 week	2005-07-14 16:35:13 +00:00
rwatson	c24543fa50	In addition to tracking allocs in the zone, also track frees. Add a zone free counter, as well as a cache free counter. MFC after: 1 week	2005-07-14 16:17:21 +00:00
rwatson	3f3682a4b8	In an earlier world order, UMA would flush per-CPU statistics to the zone whenever it was moving buckets between the zone and the cache, or when coalescing statistics across the CPU. Remove flushing of statistics to the zone when coalescing statistics as part of sysctl, as we won't be running on the right CPU to write to the cache statistics. Add a missed gathering of statistics: when uma_zalloc_internal() does a special case allocation of a single item, make sure to update the zone statistics to represent this. Previously this case wasn't accounted for in user-visible statistics. MFC after: 1 week	2005-07-14 16:13:46 +00:00
silby	64582f3995	Change the panic in trash_ctor into just a printf for now. Once the reports of panics in trash_ctor relating to mbufs have been examined and a fix found, this will be turned back into a panic. Approved by: re (rwatson)	2005-06-26 23:44:07 +00:00
alc	67602b23a9	Increase UMA_BOOT_PAGES to prevent a crash during initialization. See http://docs.FreeBSD.org/cgi/mid.cgi?42AD8270.8060906 for a detailed description of the crash. Reported by: Eric Anderson Approved by: re (scottl) MFC after: 3 days	2005-06-16 17:06:34 +00:00
green	3bb055500e	The new contigmalloc(9) has a bad degenerate case where there were many regions checked again and again despite knowing the pages contained were not usable and only satisfied the alignment constraints. This case was compounded, especially for large allocations, by the practice of looping from the top of memory so as to keep out of the important low-memory regions. While the old contigmalloc(9) has the same problem, it is not as noticeable due to looping from the low memory to high. This degenerate case is fixed, as well as reversing the sense of the rest of the loops within it, to provide a tremendous speed increase. This makes the best case O(n * VM overhead) much more likely than the worst case O(4 * VM overhead). For comparison, the worst case for old contigmalloc would be O(5 * VM overhead) in addition to its strategy of turning used memory into free being highly pessimal. Also, fix a bug that in practice most likely couldn't have been triggered, in the new contigmalloc(9): it walked backwards from the end of memory without accounting for how many pages it needed. Potentially, nonexistent pages could have been mapped. This hasn't occurred because the kernel generally requests as its first contigmalloc(9) a single page. Reported by: Nicolas Dehaine <nicko@stbernard.com>, wes MFC After: 1 month More testing by: Nicolas Dehaine <nicko@stbernard.com>, wes	2005-06-11 00:05:16 +00:00
alc	53e95f1eb2	Add a comment to the effect that fictitious pages do not require the initialization of their machine-dependent fields.	2005-06-10 17:27:54 +00:00
alc	2d109601cb	Introduce a procedure, pmap_page_init(), that initializes the vm_page's machine-dependent fields. Use this function in vm_pageq_add_new_page() so that the vm_page's machine-dependent and machine-independent fields are initialized at the same time. Remove code from pmap_init() for initializing the vm_page's machine-dependent fields. Remove stale comments from pmap_init(). Eliminate the Boolean variable pmap_initialized from the alpha, amd64, i386, and ia64 pmap implementations. Its use is no longer required because of the above changes and earlier changes that result in physical memory that is being mapped at initialization time being mapped without pv entries. Tested by: cognet, kensmith, marcel	2005-06-10 03:33:36 +00:00
alc	6224234587	Update some comments to reflect the change from spl-based to lock-based synchronization.	2005-05-28 17:56:18 +00:00
ups	acfce18a2a	Use low level constructs borrowed from interrupt threads to wait for work in proc0. Remove the TDP_WAKEPROC0 workaround.	2005-05-23 23:01:53 +00:00
alc	9c80b49669	Swap in can occur safely without Giant. Release Giant on entry to scheduler().	2005-05-22 21:06:07 +00:00
alc	3b49968234	Remove GIANT_REQUIRED from swapout_procs().	2005-05-22 00:30:50 +00:00
alc	b351552abd	Reduce the number of times that we acquire and release locks in swap_pager_getpages(). MFC after: 1 week	2005-05-20 21:26:05 +00:00
alc	45eab788de	Remove calls to spl*().	2005-05-19 06:11:13 +00:00
alc	eee15b6b76	Remove a stale comment concerning spl* usage.	2005-05-19 03:53:07 +00:00
alc	c7cb7f7317	Update some comments to reflect the change from spl-based to lock-based synchronization.	2005-05-18 22:08:52 +00:00
alc	1b7ce7e514	Remove calls to spl*().	2005-05-18 20:45:33 +00:00
alc	0e9f42833e	Revert revision 1.270: swp_pager_async_iodone() need not perform VM_LOCK_GIANT(). Discussed with: jeff	2005-05-18 17:48:04 +00:00
bz	b543d49d86	Correct 32 vs 64 bit signedness issues. Approved by: pjd (mentor) MFC after: 2 weeks	2005-05-18 08:57:31 +00:00
grehan	a442ec4d3f	The final test in unlock_and_deallocate() to determine if GIANT needs to be unlocked wasn't updated to check for OBJ_NEEDGIANT. This caused a WITNESS panic when debug_mpsafevm was set to 0. Approved by: jeffr	2005-05-12 04:09:41 +00:00
marcel	9be5f7d46a	Enable debug_mpsafevm on ia64 due to the severe functional regression caused by recent locking changes when it's off. Revert the logic to trim down the conditional. Clued-in by: alc@	2005-05-08 23:56:16 +00:00
jeff	95489bf6b4	- We need to inherit the OBJ_NEEDGIANT flag from the original object in vm_object_split(). Spotted by: alc	2005-05-04 20:54:16 +00:00
jeff	d62d255d2e	- Add a new object flag "OBJ_NEEDSGIANT". We set this flag if the underlying vnode requires Giant. - In vm_fault only acquire Giant if the underlying object has NEEDSGIANT set. - In vm_object_shadow inherit the NEEDSGIANT flag from the backing object.	2005-05-03 11:11:26 +00:00
alc	f1dec39efb	Remove GIANT_REQUIRED from vmspace_exec(). Prodded by: jeff	2005-05-02 07:05:20 +00:00
jeff	5adae6c622	- VM_LOCK_GIANT in the swap pager's iodone routine as VFS will soon call it without Giant. Sponsored by: Isilon Systems, Inc.	2005-04-30 11:25:49 +00:00
rwatson	bb1e0b257a	Modify UMA to use critical sections to protect per-CPU caches, rather than mutexes, which offers lower overhead on both UP and SMP. When allocating from or freeing to the per-cpu cache, without INVARIANTS enabled, we now no longer perform any mutex operations, which offers a 1%-3% performance improvement in a variety of micro-benchmarks. We rely on critical sections to prevent (a) preemption resulting in reentrant access to UMA on a single CPU, and (b) migration of the thread during access. In the event we need to go back to the zone for a new bucket, we release the critical section to acquire the global zone mutex, and must re-acquire the critical section and re-evaluate which cache we are accessing in case migration has occurred, or circumstances have changed in the current cache. Per-CPU cache statistics are now gathered lock-free by the sysctl, which can result in small races in statistics reporting for caches. Reviewed by: bmilekic, jeff (somewhat) Tested by: rwatson, kris, gnn, scottl, mike at sentex dot net, others	2005-04-29 18:56:36 +00:00
jeff	f869be5c72	- Pass the ISOPEN flag to namei so filesystems will know we're about to open them or otherwise access the data.	2005-04-27 09:05:19 +00:00
kris	69fffa3c93	Add the vm.exec_map_entries tunable and read-only sysctl, which controls the number of entries in exec_map (maximum number of simultaneous execs that can be handled by the kernel). The default value of 16 is insufficient on heavily loaded machines (particularly SMP machines), and if it is exceeded then executing further processes will generate a SIGABRT. This is a workaround until a better solution can be implemented. Reviewed by: alc MFC after: 3 days	2005-04-25 19:22:05 +00:00
des	9900fad82d	Unbreak the build on 64-bit architectures.	2005-04-16 12:37:16 +00:00
jhb	de9a1fa207	Add a vm.blacklist tunable which can hold a space or comma separated list of physical addresses. The pages containing these physical addresses will not be added to the free list and thus will effectively be ignored by the VM system. This is mostly useful for the case when one knows of specific physical addresses that have bit errors (such as from a memtest run) so that one can blacklist the bad pages while waiting for the new sticks of RAM to arrive. The physical addresses of any ignored pages are listed in the message buffer as well.	2005-04-15 21:45:02 +00:00
csjp	e89e83d7fe	Move MAC check_vnode_mmap entry point out from being exclusive to MAP_SHARED so that the entry point gets executed unconditionally. This may be useful for security policies which want to perform access control checks around run-time linking. -add the mmap(2) flags argument to the check_vnode_mmap entry point so that we can make access control decisions based on the type of mapped object. -update any dependent API around this parameter addition such as function prototype modifications, entry point parameter additions and the inclusion of sys/mman.h header file. -Change the MLS, BIBA and LOMAC security policies so that subject domination routines are not executed unless the type of mapping is shared. This is done to maintain compatibility between the old vm_mmap_vnode(9) and these policies. Reviewed by: rwatson MFC after: 1 month	2005-04-14 16:03:30 +00:00
jhb	e62b86cdc5	Tidy vcnt() by moving a duplicated line above #ifdef and removing a useless variable.	2005-04-12 23:15:28 +00:00
jhb	13114caff8	Flip the switch and turn mpsafevm on by default for sparc64. Approved by: alc	2005-04-04 20:59:02 +00:00
jeff	0eef91eae9	- Don't NULL the vnode's v_object pointer until after the object is torn down. If we have dirty pages, the putpages routine will need to know what the vnode's object is so that it may write out dirty pages. Pointy hat: phk Found by: obrien	2005-04-03 22:56:58 +00:00
jhb	a3c6b782c3	- Change the vm_mmap() function to accept an objtype_t parameter specifying the type of object represented by the handle argument. - Allow vm_mmap() to map device memory via cdev objects in addition to vnodes and anonymous memory. Note that mmaping a cdev directly does not currently perform any MAC checks like mapping a vnode does. - Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the cdev the ioctl is acting on rather than trying to find a suitable vnode to map from. Reviewed by: alc, arch@	2005-04-01 20:00:11 +00:00
jeff	97c40ebd49	- LK_NOPAUSE is a nop now. Sponsored by: Isilon Systems, Inc.	2005-03-31 04:37:09 +00:00
alc	75d6caaaf0	Eliminate (now) unnecessary acquisition and release of the global page queues lock in vm_object_backing_scan(). Updates to the page's PG_BUSY flag and busy field are synchronized by the containing object's lock. Testing the page's hold_count and wire_count in vm_object_backing_scan()'s OBSC_COLLAPSE_NOWAIT case is unnecessary. There is no reason why the held or wired pages cannot be migrated to the shadow object. Reviewed by: tegge	2005-03-30 05:40:02 +00:00
das	fb7ab34022	Move the swap_zone == NULL check earlier (i.e. before we dereference the pointer.) Found by: Coverity Prevent analysis tool	2005-03-18 21:22:48 +00:00
jeff	92d24b8044	- Don't lock the vnode interlock in vm_object_set_writeable_dirty() if we've already set the object flags. Reviewed by: alc	2005-03-17 12:03:42 +00:00
jeff	d1b34cc38c	- In vm_page_insert() hold the backing vnode when the first page is inserted. - In vm_page_remove() drop the backing vnode when the last page is removed. - Don't check the vnode to see if it must be reclaimed on every call to vm_page_free_toq() as we only check it now when it is actually required. This saves us two lock operations per call. Sponsored by: Isilon Systems, Inc.	2005-03-15 14:14:09 +00:00
jeff	41fb0028e9	- Don't directly adjust v_usecount, use vref() instead. Sponsored by: Isilon Systems, Inc.	2005-03-14 09:03:19 +00:00
jeff	eaf9deaeb7	- Retire OLOCK and OWANT. All callers hold the vnode lock when creating a vnode object. There has been an assert to prove this for some time. Sponsored by: Isilon Systems, Inc.	2005-03-14 07:29:40 +00:00
jeff	0c26372161	- Don't acquire the vnode lock in destroy_vobject, assert that it has already been acquired by the caller. Sponsored by: Isilon Systems, Inc.	2005-03-13 12:05:05 +00:00
alc	2b424cf256	Revert the first part of revision 1.114 and modify the second part. On architectures implementing uma_small_alloc() pages do not necessarily belong to the kmem object.	2005-02-24 06:13:01 +00:00
phk	66dfd63961	Try to unbreak the vnode locking around vop_reclaim() (based mostly on patch from kan@). Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on close. This is not yet a generally safe function, but for this very specific use it is safe. This solves the problem with buffers not being flushed by unmount or after failed mount attempts.	2005-02-19 11:44:57 +00:00
bmilekic	f9dded75d0	Well, it seems that I pre-maturely removed the "All rights reserved" statement from some files, so re-add it for the moment, until the related legalese is sorted out. This change affects: sys/kern/kern_mbuf.c sys/vm/memguard.c sys/vm/memguard.h sys/vm/uma.h sys/vm/uma_core.c sys/vm/uma_dbg.c sys/vm/uma_dbg.h sys/vm/uma_int.h	2005-02-16 21:45:59 +00:00
bmilekic	8fa4f6f9a4	Make UMA set the overloaded page->object back to kmem_object for UMA_ZONE_REFCNT and UMA_ZONE_MALLOC zones, as the page(s) undoubtedly came from kmem_map for those two. Previously it would set it back to NULL for UMA_ZONE_REFCNT zones and although this was probably not fatal, it added MORE code for no reason.	2005-02-16 20:06:11 +00:00
bmilekic	04e8cef9b4	Rather than overloading the page->object field like UMA does, use instead an unused pageq queue reference in the page structure to stash a pointer to the MemGuard FIFO. Using the page->object field caused problems because when vm_map_protect() was called the second time to set VM_PROT_DEFAULT back onto a set of pages in memguard_map, the protection in the VM would be changed but the PMAP code would lazily not restore the PG_RW bit on the underlying pages right away (see pmap_protect()). So when a page fault finally occurred and the VM noticed the faulting address corresponds to a page that _does_ have write access now, it would then call into PMAP to set back PG_RW (i386 case being discussed here). However, before it got to do that, an assertion on the object lock not being owned would get triggered, as the object of the faulting page would need to be locked but was overloaded by MemGuard. This is precisely why MemGuard cannot overload page->object. Submitted by: Alan Cox (alc@)	2005-02-15 22:17:07 +00:00
phk	ccef1b4a6d	sysctl node vm.stats can not be static (for ia64 reasons).	2005-02-11 16:34:14 +00:00
bmilekic	c4f26f55ee	Implement support for buffers larger than PAGE_SIZE in MemGuard. Adds a little bit of complexity but, given the lack of performance requirements (this is a debugging allocator after all), it's really not too bad (still only 317 lines). Also add an additional check to help catch really weird 3-threads-involved races: make memguard_free() write to the first page handed back, always, before it does anything else. Note that there is still a problem in VM+PMAP (specifically with vm_map_protect) w.r.t. how MemGuard uses it, but this will be fixed shortly and this change stands on its own.	2005-02-10 22:36:05 +00:00
phk	0e0e2e5d1c	Make three SYSCTL_NODEs static	2005-02-10 12:18:36 +00:00
phk	a8ab852940	Make npages static and const.	2005-02-10 12:18:17 +00:00
ssouhlal	972ed7b626	Set the scheduling class of the zeroidle thread to PRI_IDLE. Reviewed by: jhb Approved by: grehan (mentor) MFC after: 1 week	2005-02-04 06:18:31 +00:00
alc	19b479b41f	Update the text of an assertion to reflect changes made in revision 1.148. Submitted by: tegge Eliminate an unnecessary, temporary increment of the backing object's reference count in vm_object_qcollapse(). Reviewed by: tegge	2005-01-30 21:29:47 +00:00
phk	9b1a8ec7bf	Move the contents of vop_stddestroyvobject() to the new vnode_pager function vnode_destroy_vobject(). Make the new function zero the vp->v_object pointer so we can tell if a call is missing.	2005-01-28 08:56:48 +00:00
phk	796d435574	Don't use VOP_GETVOBJECT, use vp->v_object directly.	2005-01-25 00:40:01 +00:00
phk	ba85bee696	Move the body of vop_stdcreatevobject() over to the vnode_pager under the name Sande^H^H^H^H^Hvnode_create_vobject(). Make the new function take a size argument which removes the need for a VOP_STAT() or a very pessimistic guess for disks. Call that new function from vop_stdcreatevobject(). Make vnode_pager_alloc() private now that its only user came home.	2005-01-24 21:21:59 +00:00
phk	d5c135375c	Kill the VV_OBJBUF and test the v_object for NULL instead.	2005-01-24 13:13:57 +00:00
jeff	1dd5432139	- Remove GIANT_REQUIRED where giant is no longer required. - Use VFS_LOCK_GIANT() rather than directly acquiring giant in places where giant is only held because vfs requires it. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:48:29 +00:00
alc	abb2aba431	Guard against address wrap in kernacc(). Otherwise, a program accessing a bad address range through /dev/kmem can panic the machine. Submitted by: Mark W. Krentel Reported by: Kris Kennaway MFC after: 1 week	2005-01-22 19:21:29 +00:00
bmilekic	802a5a53d2	s/round_page/trunc_page/g I meant trunc_page. It's only a coincidence this hasn't caused problems yet. Pointed out by: Antoine Brodin <antoine.brodin@laposte.net>	2005-01-22 00:09:34 +00:00
bmilekic	da7116f3ac	Bring in MemGuard, a very simple and small replacement allocator designed to help detect tamper-after-free scenarios, a problem more and more common and likely with multithreaded kernels where race conditions are more prevalent. Currently MemGuard can only take over malloc()/realloc()/free() for particular (a) malloc type(s) and the code brought in with this change manually instruments it to take over M_SUBPROC allocations as an example. If you are planning to use it, for now you must: 1) Put "options DEBUG_MEMGUARD" in your kernel config. 2) Edit src/sys/kern/kern_malloc.c manually, look for "XXX CHANGEME" and replace the M_SUBPROC comparison with the appropriate malloc type (this might require additional but small/simple code modification if, say, the malloc type is declared out of scope). 3) Build and install your kernel. Tune vm.memguard_divisor boot-time tunable which is used to scale how much of kmem_map you want to allot for MemGuard's use. The default is 10, so kmem_size/10. ToDo: 1) Bring in a memguard(9) man page. 2) Better instrumentation (e.g., boot-time) of MemGuard taking over malloc types. 3) Teach UMA about MemGuard to allow MemGuard to override zone allocations too. 4) Improve MemGuard if necessary. This work is partly based on some old patches from Ian Dowse.	2005-01-21 18:09:17 +00:00
alc	6d14143c58	Add checks to vm_map_findspace() to test for address wrap. The conditions where this could occur are very rare, but possible. Submitted by: Mark W. Krentel MFC after: 2 weeks	2005-01-18 19:50:09 +00:00
alc	3ffc6c3bf0	Consider three objects, O, BO, and BBO, where BO is O's backing object and BBO is BO's backing object. Now, suppose that O and BO are being collapsed. Furthermore, suppose that BO has been marked dead (OBJ_DEAD) by vm_object_backing_scan() and that either vm_object_backing_scan() has been forced to sleep due to encountering a busy page or vm_object_collapse() has been forced to sleep due to memory allocation in the swap pager. If vm_object_deallocate() is then called on BBO and BO is BBO's only shadow object, vm_object_deallocate() will collapse BO and BBO. In doing so, it adds a necessary temporary reference to BO. If this collapse also sleeps and the prior collapse resumes first, the temporary reference will cause vm_object_collapse to panic with the message "backing_object %p was somehow re-referenced during collapse!" Resolve this race by changing vm_object_deallocate() such that it doesn't collapse BO and BBO if BO is marked dead. Once O and BO are collapsed, vm_object_collapse() will attempt to collapse O and BBO. So, vm_object_deallocate() on BBO need do nothing. Reported by: Peter Holm on 20050107 URL: http://www.holm.cc/stress/log/cons102.html In collaboration with: tegge@ Candidate for RELENG_4 and RELENG_5 MFC after: 2 weeks	2005-01-15 21:12:47 +00:00
phk	cc0cbc6b34	Eliminate unused and unnecessary "cred" argument from vinvalbuf()	2005-01-14 07:33:51 +00:00
phk	da2718f1af	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to use the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson	2005-01-11 07:36:22 +00:00
bmilekic	bc2ae8f1d2	While we want the recursion protection for the bucket zones so that recursion from the VM is handled (and the calling code that allocates buckets knows how to deal with it), we do not want to prevent allocation from the slab header zones (slabzone and slabrefzone) if uk_recurse is not zero for them. The reason is that it could lead to NULL being returned for the slab header allocations even in the M_WAITOK case, and the caller can't handle that (this is also explained in a comment with this commit). The problem analysis is documented in our mailing lists: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=153445+0+archive/2004/freebsd-current/20041231.freebsd-current (see entire thread for proper context). Crash dump data provided by: Peter Holm <peter@holm.cc>	2005-01-11 03:33:09 +00:00
stefanf	bc3ec4dbb0	ISO C requires at least one element in an initialiser list.	2005-01-10 20:30:04 +00:00
alc	8c07cfa5a0	Move the acquisition and release of the page queues lock outside of a loop in vm_object_split() to avoid repeated acquisition and release.	2005-01-08 23:41:11 +00:00
alc	403229a01e	Transfer responsibility for freeing the page taken from the cache queue and (possibly) unlocking the containing object from vm_page_alloc() to vm_page_select_cache(). Recent optimizations to vm_map_pmap_enter() (see vm_map.c revisions 1.362 and 1.363) and pmap_enter_quick() have resulted in panic()s because vm_page_alloc() mistakenly unlocked objects that had not been locked by vm_page_select_cache(). Reported by: Peter Holm and Kris Kennaway	2005-01-07 05:02:19 +00:00
imp	f0bf889d0d	/* -> /*- for license, minor formatting changes	2005-01-07 02:29:27 +00:00
alc	437d364a03	Revise the part of vm_pageout_scan() that moves pages from the cache queue to the free queue. With this change, if a page from the cache queue belongs to a locked object, it is simply skipped over rather than moved to the inactive queue.	2005-01-06 20:22:36 +00:00
phk	3542218858	When allocating bio's in the swap_pager use M_WAITOK since the alternative is much worse.	2005-01-03 13:28:56 +00:00
alc	db7aa8b882	Assert that page allocations during an interrupt specify VM_ALLOC_INTERRUPT. Assert that pages removed from the cache queue are not busy.	2004-12-31 19:50:45 +00:00
alc	ec1a7d072a	Access to the page's busy field is (now) synchronized by the containing object's lock. Therefore, the assertion that the page queues lock is held can be removed from vm_page_io_start().	2004-12-29 04:18:22 +00:00
alc	1e5940b06a	Note that access to the page's busy count is synchronized by the containing object's lock.	2004-12-27 05:27:59 +00:00
alc	5c1258faf6	Assert that the vm object is locked on entry to vm_page_sleep_if_busy(); remove some unneeded code.	2004-12-26 21:46:44 +00:00
bmilekic	764e80eed7	Add my copyright and update Jeff's copyright on UMA source files, as per his request. Discussed with: Jeffrey Roberson	2004-12-26 00:35:12 +00:00
phk	c2be0afd64	fix comment	2004-12-25 21:30:41 +00:00
alc	f16b9f1b30	Continue the transition from synchronizing access to the page's PG_BUSY flag and busy field with the global page queues lock to synchronizing their access with the containing object's lock. Specifically, acquire the containing object's lock before reading the page's PG_BUSY flag and busy field in vm_fault(). Reviewed by: tegge@	2004-12-24 19:31:54 +00:00
alc	a618275b13	Modify pmap_enter_quick() so that it expects the page queues to be locked on entry and it assumes the responsibility for releasing the page queues lock if it must sleep. Remove a bogus comment from pmap_enter_quick(). Using the first change, modify vm_map_pmap_enter() so that the page queues lock is acquired and released once, rather than each time that a page is mapped.	2004-12-23 20:16:11 +00:00
alc	c4f7988cf9	Eliminate another unnecessary call to vm_page_busy(). (See revision 1.333 for a detailed explanation.)	2004-12-17 18:54:51 +00:00
alc	aafcafb659	Enable debug.mpsafevm by default on alpha.	2004-12-17 17:17:36 +00:00
alc	ede2fb9751	In the common case, pmap_enter_quick() completes without sleeping. In such cases, the busying of the page and the unlocking of the containing object by vm_map_pmap_enter() and vm_fault_prefault() is unnecessary overhead. To eliminate this overhead, this change modifies pmap_enter_quick() so that it expects the object to be locked on entry and it assumes the responsibility for busying the page and unlocking the object if it must sleep. Note: alpha, amd64, i386 and ia64 are the only implementations optimized by this change; arm, powerpc, and sparc64 still conservatively busy the page and unlock the object within every pmap_enter_quick() call. Additionally, this change is the first case where we synchronize access to the page's PG_BUSY flag and busy field using the containing object's lock rather than the global page queues lock. (Modifications to the page's PG_BUSY flag and busy field have asserted both locks for several weeks, enabling an incremental transition.)	2004-12-15 19:55:05 +00:00
alc	40ad9ef99b	With the removal of kern/uipc_jumbo.c and sys/jumbo.h, vm_object_allocate_wait() is not used. Remove it.	2004-12-08 05:01:47 +00:00
alc	b014c2904e	Almost nine years ago, when support for 1TB files was introduced in revision 1.55, the address parameter to vnode_pager_addr() was changed from an unsigned 32-bit quantity to a signed 64-bit quantity. However, an out-of-range check on the address was not updated. Consequently, memory-mapped I/O on files greater than 2GB could cause a kernel panic. Since the address is now a signed 64-bit quantity, the problem resolution is simply to remove a cast. Reviewed by: bde@ and tegge@ PR: 73010 MFC after: 1 week	2004-12-07 22:05:38 +00:00
alc	fcf141e6aa	Correct a sanity check in vnode_pager_generic_putpages(). The cast used to implement the sanity check should have been changed when we converted the implementation of vm_pindex_t from 32 to 64 bits. (Thus, RELENG_4 is not affected.) The consequence of this error would be a legitimate write to an extremely large file being treated as an errant attempt to write metadata. Discussed with: tegge@	2004-12-05 21:48:11 +00:00
das	130bed6547	Don't include sys/user.h merely for its side-effect of recursively including other headers.	2004-11-27 06:51:39 +00:00
cognet	f788045cc2	Remove useless casts.	2004-11-26 15:04:26 +00:00
delphij	2841d31dff	Try to close a potential but serious race in our VM subsystem. Historically, our contigmalloc1() and contigmalloc2() assume that a page in PQ_CACHE can be unconditionally reused by busying and freeing it. Unfortunately, when the page's object is not NULL, the code sets m->object to NULL and disregards the fact that the page is actually in the VM page bucket, resulting in page bucket hash table corruption and, finally, filesystem corruption or a 'page not in hash' panic. This commit borrows the idea from DragonFlyBSD's VM fix by Matthew Dillon[1]. This version of the patch does the following checks: - When scanning pages in PQ_CACHE, check hold_count and skip over pages that are held temporarily. - For pages in PQ_CACHE that are selected as candidates for being freed, check whether they are busy at that time. Note: It seems that this might be unrelated to kern/72539. Obtained from: DragonFlyBSD, sys/vm/vm_contig.c,v 1.11 and 1.12 [1] Reminded by: Matt Dillon Reworked by: alc MFC After: 1 week	2004-11-24 18:56:13 +00:00
das	af608beb40	Disable U area swapping and remove the routines that create, destroy, copy, and swap U areas. Reviewed by: arch@	2004-11-20 02:29:00 +00:00
phk	d8b3df3cb9	Make VOP_BMAP return a struct bufobj for the underlying storage device instead of a vnode for it. The vnode_pager does not and should not have any interest in what the filesystem uses for backend. (vfs_cluster doesn't use the backing store argument.)	2004-11-15 09:18:27 +00:00
phk	6809658d1c	Add pbgetbo()/pbrelbo() lighter weight versions of pbgetvp()/pbrelvp().	2004-11-15 08:47:18 +00:00
phk	f71f0d1e60	More kasserts.	2004-11-15 08:33:09 +00:00
phk	042171d217	style polishing.	2004-11-15 08:22:38 +00:00
phk	ca008fe171	Move pbgetvp() and pbrelvp() to vm_pager.c with the rest of the pbuf stuff.	2004-11-15 08:12:50 +00:00
phk	4d081241bd	expect the caller to have called pbrelvp() if necessary.	2004-11-15 08:07:26 +00:00
phk	f57555b632	Explicitly call pbrelvp()	2004-11-15 08:06:05 +00:00
phk	d87333225e	Improve readability with a bunch of typedefs for the pager ops. These can also be used for prototypes in the pagers.	2004-11-09 13:43:20 +00:00
des	e836fd23ea	#include <vm/vm_param.h> instead of <machine/vmparam.h> (the former includes the latter, but also declares variables which are defined in kern/subr_param.c). Change some VM parameters from quad_t to unsigned long. They refer to quantities (size limits for text, heap and stack segments) which must necessarily be smaller than the size of the address space, so long is adequate on all platforms. MFC after: 1 week	2004-11-08 18:20:02 +00:00
alc	6314cca720	Eliminate an unnecessary atomic operation. Articulate the rationale in a comment.	2004-11-06 21:48:45 +00:00
rwatson	2b775a8633	Abstract the logic to look up the uma_bucket_zone given a desired number of entries into bucket_zone_lookup(), which helps make more clear the logic of consumers of bucket zones. Annotate the behavior of bucket_init() with a comment indicating how the various data structures, including the bucket lookup tables, are initialized.	2004-11-06 11:43:30 +00:00
phk	8b96099f88	Remove dangling variable	2004-11-06 11:33:11 +00:00
rwatson	69064711c1	Annotate what bucket_size[] array does; staticize since it's used only in uma_core.c.	2004-11-06 11:24:40 +00:00
das	9d935df169	Fix the last known race in swapoff(), which could lead to a spurious panic: swapoff: failed to locate %d swap blocks The race occurred because putpages() can block between the time it allocates swap space and the time it updates the swap metadata to associate that space with a vm_object, so swapoff() would complain about the temporary inconsistency. I hoped to fix this by making swp_pager_getswapspace() and swp_pager_meta_build() a single atomic operation, but that proved to be inconvenient. With this change, swapoff() simply doesn't attempt to be so clever about detecting when all the pageout activity to the target device should have drained.	2004-11-06 07:17:50 +00:00
alc	4030274372	Move a call to wakeup() from vm_object_terminate() to vnode_pager_dealloc() because this call is only needed to wake threads that slept when they discovered a dead object connected to a vnode. To eliminate unnecessary calls to wakeup() by vnode_pager_dealloc(), introduce a new flag, OBJ_DISCONNECTWNT. Reviewed by: tegge@	2004-11-06 05:33:02 +00:00
jhb	88680af3b7	- Set the priority of the page zeroing thread using sched_prio() when the thread is created rather than adjusting the priority in the main function. (kthread_create() should probably take the initial priority as an argument.) - Only yield the CPU in the !PREEMPTION case if there are any other runnable threads. Yielding when there isn't anything else better to do just wastes time in pointless context switches (albeit while the system is idle.)	2004-11-05 19:14:02 +00:00
alc	cc2178b9c8	During traversal of the inactive queue, try locking the page's containing object before accessing the page's flags or the object's reference count.	2004-11-05 06:24:05 +00:00
alc	96eb8f832a	Eliminate another unnecessary call to vm_page_busy() that immediately precedes a call to vm_page_rename(). (See the previous revision for a detailed explanation.)	2004-11-05 05:40:45 +00:00
das	2beb616ced	Close a race in swapoff(). Here are the gory details: In order to avoid livelock, swapoff() skips over objects with a nonzero pip count and makes another pass if necessary. Since it is impossible to know which objects we care about, it would choose an arbitrary object with a nonzero pip count and wait for it before making another pass, the theory being that this object would finish paging about as quickly as the ones we care about. Unfortunately, we may have slept since we acquired a reference to this object. Hack around this problem by tsleep()ing on the pointer anyway, but timeout after a fixed interval. More elegant solutions are possible, but the ones I considered unnecessarily complicate this rare case. Also, kill some nits that seem to have crept into the swapoff() code in the last 75 revisions or so: - Don't pass both sp and sp->sw_used to swap_pager_swapoff(), since the latter can be derived from the former. - Replace swp_pager_find_dev() with something simpler. There's no need to iterate over the entire list of swap devices just to determine if a given block is assigned to the one we're interested in. - Expand the scope of the swhash_mtx in a couple of places so that it isn't released and reacquired once for every hash bucket. - Don't drop the swhash_mtx while holding a reference to an object. We need to lock the object first. Unfortunately, doing so would violate the established lock order, so use VM_OBJECT_TRYLOCK() and try again on a subsequent pass if the object is already locked. - Refactor swp_pager_force_pagein() and swap_pager_swapoff() a bit.	2004-11-05 05:36:56 +00:00
phk	e5715b2cc1	Retire b_magic now, we have the bufobj containing the same hint.	2004-11-04 09:48:18 +00:00
phk	50168ede53	De-couple our I/O bio request from the embedded bio in buf by explicitly copying the fields.	2004-11-04 08:38:07 +00:00
phk	1e4caea88c	Remove buf->b_dev field.	2004-11-04 07:59:57 +00:00
alc	25b80a64b9	The synchronization provided by vm object locking has eliminated the need for most calls to vm_page_busy(). Specifically, most calls to vm_page_busy() occur immediately prior to a call to vm_page_remove(). In such cases, the containing vm object is locked across both calls. Consequently, the setting of the vm page's PG_BUSY flag is not even visible to other threads that are following the synchronization protocol. This change (1) eliminates the calls to vm_page_busy() that immediately precede a call to vm_page_remove() or functions, such as vm_page_free() and vm_page_rename(), that call it and (2) relaxes the requirement in vm_page_remove() that the vm page's PG_BUSY flag is set. Now, the vm page's PG_BUSY flag is set only when the vm object lock is released while the vm page is still in transition. Typically, this is when it is undergoing I/O.	2004-11-03 20:17:31 +00:00
alc	d9ced80d66	Introduce a Boolean variable wakeup_needed to avoid repeated, unnecessary calls to wakeup() by vm_page_zero_idle_wakeup().	2004-10-31 19:32:57 +00:00
alc	17432a99e5	During traversal of the active queue by vm_pageout_page_stats(), try locking the page's containing object before accessing the page's flags.	2004-10-30 23:30:53 +00:00
alc	ee68591e10	Eliminate an unused but initialized variable.	2004-10-30 20:11:23 +00:00
alc	c823d3b356	Add an assignment statement that I omitted from the previous revision.	2004-10-30 07:09:46 +00:00
alc	f73575dddd	Assert that the containing vm object is locked in vm_page_cache() and vm_page_try_to_cache().	2004-10-28 05:26:21 +00:00
bmilekic	13ebdd218a	Fix an INVARIANTS-only bug introduced in revision 1.104. If INVARIANTS is defined and, in the rare case that we have allocated some objects from the slab, at least one initializer on at least one of those objects failed, we need to fail the allocation and push the uninitialized items back into the slab caches. In that scenario, we would fail to [re]set the bucket cache's ub_bucket item references to NULL, which would eventually trigger a KASSERT.	2004-10-27 21:19:35 +00:00
alc	ce02afb500	During traversal of the active queue, try locking the page's containing object before accessing the page's flags or the object's reference count. If the trylock fails, handle the page as though it is busy.	2004-10-27 18:29:17 +00:00
phk	1b27d1d3b9	Also check that the sectormask is bigger than zero. Wrap this overly long KASSERT and remove newline.	2004-10-26 19:51:57 +00:00
phk	c66aa10c8e	Put the I/O block size in bufobj->bo_bsize. We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably moot when filesystems sit on GEOM, so don't bother for now.	2004-10-26 07:39:12 +00:00
phk	76b805d6f4	Don't clear flags we just checked were not set.	2004-10-26 05:57:29 +00:00
alc	50d63268a9	Assert that the containing vm object is locked in vm_page_flash().	2004-10-25 19:52:44 +00:00
alc	774f792bae	Assert that the containing vm object is locked in vm_page_busy() and vm_page_wakeup().	2004-10-24 23:53:47 +00:00
phk	1b25a59886	Move the buffer method vector (buf->b_op) to the bufobj. Extend it with a strategy method. Add bufstrategy(), which does the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places and add bufstrategy to them. Add inlines for bwrite() and bstrategy(), which call through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().	2004-10-24 20:03:41 +00:00
alc	0041f5bef4	Acquire the vm object lock before rather than after calling vm_page_sleep_if_busy(). (The motivation being to transition synchronization of the vm_page's PG_BUSY flag from the global page queues lock to the per-object lock.)	2004-10-24 19:32:19 +00:00
alc	17eb61eeb6	Use VM_ALLOC_NOBUSY instead of calling vm_page_wakeup().	2004-10-24 18:46:32 +00:00
alc	faeb949021	Introduce VM_ALLOC_NOBUSY, an option to vm_page_alloc() and vm_page_grab() that indicates that the caller does not want a page with its busy flag set. In many places, the global page queues lock is acquired and released just to clear the busy flag on a just allocated page. Both the allocation of the page and the clearing of the busy flag occur while the containing vm object is locked. So, the busy flag might as well never be set.	2004-10-24 06:15:36 +00:00
phk	52a089c526	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c. Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj), and VI_MTX() to BO_MTX(). Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.	2004-10-22 08:47:20 +00:00
phk	3833976d12	Move the VI_BWAIT flag into the bo_flag element of bufobj and call it BO_WWAIT. Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions in all relevant places except in ffs_softdep.c, where the use of interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.	2004-10-21 15:53:54 +00:00


2264 Commits