freebsd-skq

Author	SHA1	Message	Date
Alan Cox	25524d3eba	o Lock accesses to the free queue(s) in vm_page_zero_idle().	2002-07-07 19:27:57 +00:00
Alan Cox	c7118ed61b	o Traverse the object's memq rather than repeatedly calling vm_page_lookup() in vm_object_split().	2002-07-07 06:01:25 +00:00
Jeff Roberson	f6b5b182e8	- Hold a lock on the vnode acquired from the file table across the call to vm_mmap() as well as the GETATTR etc. - If the handle is a vnode in vm_mmap() assert that it is locked. - Wiggle Giant around a little to account for the extra vnode operation.	2002-07-06 22:14:38 +00:00
Andrew Gallatin	f784043a9f	Remove bogus vm_page_wakeup() in vm_page_cowfault() that will cause panics in the zero-copy send path if a process attempts to write to a page which is still in flight. reviewed by: ken	2002-07-05 23:33:27 +00:00
Jeff Roberson	17b9cc4941	Fix a lock order reversal in uma_zdestroy. The uma_mtx needs to be held across calls to zone_drain(). Noticed by: scottl	2002-07-05 21:39:52 +00:00
Alan Cox	21f1b5331f	o Lock accesses to the free page queues in contigmalloc1().	2002-07-05 06:43:32 +00:00
Jeff Roberson	f5118d6aaf	Remove unnecessary includes.	2002-07-05 05:16:19 +00:00
Alan Cox	70c1763634	o Resurrect vm_page_lock_queues(), vm_page_unlock_queues(), and the free queue lock (revision 1.33 of vm/vm_page.c removed them). o Make the free queue lock a spin lock because it's sometimes acquired inside of a critical section.	2002-07-04 22:07:37 +00:00
Julian Elischer	8108a14544	A small cleanup.	2002-07-04 12:37:13 +00:00
Julian Elischer	a30ec8f8b8	Don't call the thread setup routines from here; they are already called when uma calls thread_init().	2002-07-04 12:31:54 +00:00
Alan Cox	22a97b04de	o Make the reservation of KVA space for kernel map entries a function of the KVA space's size in addition to the amount of physical memory and reduce it by a factor of two. Under the old formula, our reservation amounted to one kernel map entry per virtual page in the KVA space on a 4GB i386.	2002-07-03 19:16:37 +00:00
Jeff Roberson	e221e841b0	Actually use the fini callback. Pointy hat to: me :-( Noticed By: Julian	2002-07-03 00:30:51 +00:00
Robert Drehmel	47e151dd7a	- Use (OFF_TO_IDX(off) - pi) instead of (OFF_TO_IDX(off - IDX_TO_OFF(pi))). - Reformat a comment.	2002-07-01 14:14:07 +00:00
Alan Cox	c2eda4b565	o Remove some long dead code: from revision 1.41 of vm/vm_pager.c 3+ years ago. o Remove some unused prototypes.	2002-07-01 02:38:05 +00:00
Ian Dowse	300b96aca2	Change the type of `tscan' in vm_object_page_clean() to vm_pindex_t, as it stores an absolute page index that may not fit in a vm_offset_t.	2002-06-29 20:04:38 +00:00
Julian Elischer	e602ba25fd	Part 1 of KSE-III: the ability to schedule multiple threads per process (on one CPU) by making ALL system calls optionally asynchronous. To come: ia64 and power-pc patches, patches for gdb, test program (in tools). Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands). NOTE: this is still beta code and contains lots of debugging stuff. Expect slight instability in signals.	2002-06-29 17:26:22 +00:00
Ian Dowse	23f09d50bb	Avoid using the 64-bit vm_pindex_t in a few places where 64-bit types are not required, as the overhead is unnecessary: o In the i386 pmap_protect(), `sindex' and `eindex' represent page indices within the 32-bit virtual address space. o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary variable to store the low few bits of a vm_pindex_t that gets used as an array index. o vm_uiomove() uses `osize' and `idx' for page offsets within a map entry. o In vm_object_split(), `idx' is a page offset within a map entry.	2002-06-26 20:32:51 +00:00
Ian Dowse	5125fe4f45	Use an explicit cast to avoid relying on sign extension to do the right thing in code such as `vm_pindex_t x = ~SWAP_META_MASK'. Reviewed by: dillon	2002-06-26 19:18:14 +00:00
Kenneth D. Merry	98cb733c67	At long last, commit the zero copy sockets code. MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes. ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls. man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links. jumbo.9: New man page describing the jumbo buffer allocator interface and operation. zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality. NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT. conf/files: Add uipc_jumbo.c and uipc_cow.c. conf/options: Add the 5 options mentioned above. kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1. uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack. uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive. uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions. uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. 
(uipc_cow.c) if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)). ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives. if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off. if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h. ti_fw.h: 12.4.11 firmware. ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.) sys/jumbo.h: Jumbo buffer allocator interface. sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process. socketvar.h: Add prototype for socow_setup. 
tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions. uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable. ufs_readwrite.c: Update for new prototype of uiomoveco(). vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault. vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object_allocate() does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structure. This allows vm objects to be allocated while holding a mutex (without generating WITNESS warnings). vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK. vm_object.h: Add prototype for vm_object_allocate_wait(). vm_page.c: Add page-based copy on write setup, clear and fault routines. vm_page.h: Add page based COW function prototypes and variable in the vm_page structure. Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.	2002-06-26 03:37:47 +00:00
Matthew Dillon	a69ac1740f	Enforce RLIMIT_VMEM on growable mappings (aka the primary stack or any MAP_STACK mapping). Suggested by: alc	2002-06-26 03:13:46 +00:00
Matthew Dillon	070f64fe6f	Part I of RLIMIT_VMEM implementation. Implement core functionality for a new resource limit that covers a process's entire VM space, including mmap()'d space. (Part II will be additional code to check RLIMIT_VMEM during exec() but it needs more fleshing out). PR: kern/18209 Submitted by: Andrey Alekseyev <uitm@zenon.net>, Dmitry Kim <jason@nichego.net> MFC after: 7 days	2002-06-26 00:29:28 +00:00
Ian Dowse	6395da5437	Complete the initial set of VM changes required to support full 64-bit file sizes. This step simply addresses the remaining overflows, and does not attempt to optimise performance. The details are: o Use a 64-bit type for the vm_object `size' and the size argument to vm_object_allocate(). o Use the correct type for index variables in dev_pager_getpages(), vm_object_page_clean() and vm_object_page_remove(). o Avoid an overflow in the i386 pmap_object_init_pt().	2002-06-25 22:14:06 +00:00
Jeff Roberson	e78f35b33f	Turn VM_ALLOC_ZERO into a flag. Submitted by: tegge Reviewed by: dillon	2002-06-25 22:01:12 +00:00
Jeff Roberson	5c0e403ba2	Reduce the amount of code that runs with the zone lock held in slab_zalloc(). This allows us to run the zone initialization functions without any locks held.	2002-06-25 21:04:50 +00:00
Alan Cox	366838ddfe	o Eliminate vmspace::vm_minsaddr. It's initialized but never used. o Replace stale comments in vmspace by "const until freed" annotations on some fields.	2002-06-25 18:14:38 +00:00
Alan Cox	848d14193d	o Remove GIANT_REQUIRED from kmem_alloc_pageable(), kmem_alloc_nofault(), and kmem_free(). (Annotate as MPSAFE.) o Remove incorrect casts from kmem_alloc_pageable() and kmem_alloc_nofault().	2002-06-23 18:07:40 +00:00
Alan Cox	2cd301d1e1	o Remove the unnecessary acquisition and release of Giant around fdrop() in mmap(2).	2002-06-23 01:48:22 +00:00
Alan Cox	c04c996b25	o Reduce the scope of Giant in vm_mmap() to just the code that manipulates a vnode. (Thus, MAP_ANON and MAP_STACK never acquire Giant.)	2002-06-22 19:13:56 +00:00
Alan Cox	c8664f82a5	o Replace mtx_assert(&Giant, MA_OWNED) in dev_pager_alloc() with the acquisition and release of Giant. (Annotate as MPSAFE.) o Reorder the sanity checks in dev_pager_alloc() to reduce the time that Giant is held.	2002-06-22 18:36:51 +00:00
Alan Cox	409748276e	o In vm_map_insert(), replace GIANT_REQUIRED by the acquisition and release of Giant around the direct manipulation of the vm_object and the optional call to pmap_object_init_pt(). o In vm_map_findspace(), remove GIANT_REQUIRED. Instead, acquire and release Giant around the occasional call to pmap_growkernel(). o In vm_map_find(), remove GIANT_REQUIRED.	2002-06-22 17:47:12 +00:00
Alan Cox	24c46d036d	o Replace GIANT_REQUIRED in swap_pager_alloc() by the acquisition and release of Giant. (Annotate as MPSAFE.)	2002-06-22 08:03:21 +00:00
Alan Cox	2a1618cd59	o Remove GIANT_REQUIRED from phys_pager_alloc(). If handle isn't NULL, acquire and release Giant. If handle is NULL, Giant isn't needed. o Annotate phys_pager_alloc() and phys_pager_dealloc() as MPSAFE.	2002-06-22 07:54:42 +00:00
Alan Cox	990ab7add4	o Replace GIANT_REQUIRED in vnode_pager_alloc() by the acquisition and release of Giant. (Annotate as MPSAFE.) o Also, in vnode_pager_alloc(), remove an unnecessary re-initialization of struct vm_object::flags and move a statement that is duplicated in both branches of an if-else.	2002-06-22 07:28:06 +00:00
Alan Cox	43a90f3a1b	o Remove GIANT_REQUIRED from vslock(). o Annotate kernacc(), useracc(), and vslock() as MPSAFE. Motivated by: alfred	2002-06-22 01:26:02 +00:00
Alan Cox	27168693db	o Remove GIANT_REQUIRED from vm_map_stack().	2002-06-21 06:03:47 +00:00
Alan Cox	7942194583	o Remove GIANT_REQUIRED from vm_pager_allocate() and vm_pager_deallocate().	2002-06-21 05:04:56 +00:00
Alan Cox	3d66f1384e	o Remove an incorrect cast from obreak(). This cast would, for example, break an sbrk(>=4GB) on 64-bit architectures even if the resource limit allowed it. o Correct an off-by-one error. o Correct a spelling error in a comment. o Reorder an && expression so that the commonly FALSE expression comes first. Submitted by: bde (bullets 1 and 2)	2002-06-20 18:38:28 +00:00
Alan Cox	5375be1861	o Acquire and release the vm_map lock instead of Giant in obreak(). Consequently, use vm_map_insert() and vm_map_delete(), which expect the vm_map to be locked, instead of vm_map_find() and vm_map_remove(), which do not.	2002-06-20 02:04:55 +00:00
Jeff Roberson	1e081f889b	- Move the computation of pflags out of the page allocation loop in kmem_malloc() - zero fill pages if PG_ZERO bit is not set after allocation in kmem_malloc() Suggested by: alc, jake	2002-06-19 23:49:57 +00:00
Jeff Roberson	3370c5bfd7	- Remove bogus use of kmem_alloc that was inherited from the old zone allocator. - Properly set M_ZERO when talking to the back end page allocators for non malloc zones. This forces us to zero fill pages when they are first brought into a cache. - Properly handle M_ZERO in uma_zalloc_internal. This fixes a problem where per cpu buckets weren't always getting zeroed.	2002-06-19 20:49:44 +00:00
Jeff Roberson	95f24639b7	Teach kmem_malloc about M_ZERO.	2002-06-19 20:47:18 +00:00
Alan Cox	00e1854a1f	o Replace GIANT_REQUIRED in vm_object_coalesce() by the acquisition and release of Giant. o Reduce the scope of GIANT_REQUIRED in vm_map_insert(). These changes will enable us to remove the acquisition and release of Giant from obreak().	2002-06-19 06:02:03 +00:00
Alan Cox	515630b12f	o Remove LK_CANRECURSE from the vm_map lock.	2002-06-18 18:31:35 +00:00
Jeff Roberson	4741dcbff5	Honor the BUCKETCACHE flag on free as well.	2002-06-17 23:53:58 +00:00
Jeff Roberson	18aa2de5a7	- Introduce the new M_NOVM option which tells uma to only check the currently allocated slabs and bucket caches for free items. It will not go ask the vm for pages. This differs from M_NOWAIT in that it not only doesn't block, it doesn't even ask. - Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag. This tells uma that it should only allocate buckets out of the bucket cache, and not from the VM. It does this by using the M_NOVM option to zalloc when getting a new bucket. This is so that the VM doesn't recursively enter itself while trying to allocate buckets for vm_map_entry zones. If there are already allocated buckets when we get here we'll still use them but otherwise we'll skip it. - Use the ZONE_VM flag on vm map entries and pv entries on x86.	2002-06-17 22:02:41 +00:00
Alan Cox	b49ecb86d0	o Acquire and release Giant in vm_map_wakeup() to prevent a lost wakeup(). Reviewed by: tegge	2002-06-17 13:27:40 +00:00
Alan Cox	042bb29940	o Remove GIANT_REQUIRED from vm_fault_user_wire(). o Move pmap_pageable() outside of Giant in vm_fault_unwire(). (pmap_pageable() is a no-op on all supported architectures.) o Remove the acquisition and release of Giant from mlock().	2002-06-16 20:42:29 +00:00
Alan Cox	319490fb7b	o Remove GIANT_REQUIRED from useracc() and vsunlock(). Neither vm_map_check_protection() nor vm_map_unwire() expect Giant to be held.	2002-06-15 19:10:19 +00:00
Alan Cox	e30616dbfe	o Remove the acquisition and release of Giant from munlock(). Reviewed by: tegge	2002-06-15 05:05:04 +00:00
Alan Cox	1d7cf06c8c	o Use vm_map_wire() and vm_map_unwire() in place of vm_map_pageable() and vm_map_user_pageable(). o Remove vm_map_pageable() and vm_map_user_pageable(). o Remove vm_map_clear_recursive() and vm_map_set_recursive(). (They were only used by vm_map_pageable() and vm_map_user_pageable().) Reviewed by: tegge	2002-06-14 18:21:01 +00:00
Alan Cox	d46e7d6bee	o Acquire and release Giant in vm_map_unlock_and_wait(). Submitted by: tegge	2002-06-12 08:15:52 +00:00
Alan Cox	28c58286ef	o Properly handle a failure by vm_fault_wire() or vm_fault_user_wire() in vm_map_wire(). o Make two white-space changes in vm_map_wire(). Reviewed by: tegge	2002-06-11 19:13:59 +00:00
Alan Cox	73b2bace26	o Teach vm_map_delete() to respect the "in-transition" flag on a vm_map_entry by sleeping until the flag is cleared. Submitted by: tegge	2002-06-11 05:24:22 +00:00
Alan Cox	2b4a2c272d	o In vm_map_entry_create(), call uma_zalloc() with M_NOWAIT on system maps. Submitted by: tegge o Eliminate the "!mapentzone" check from vm_map_entry_create() and vm_map_entry_dispose(). Reviewed by: tegge o Fix white-space usage in vm_map_entry_create().	2002-06-10 06:11:45 +00:00
Ian Dowse	f97d6ce396	Correct the logic for determining whether the per-CPU locks need to be destroyed. This fixes a problem where destroying a UMA zone would fail to destroy all zone mutexes. Reviewed by: jeff	2002-06-10 03:25:23 +00:00
Alan Cox	12d7cc840f	o Add vm_map_wire() for wiring contiguous regions of either kernel or user vm_maps. This implementation has two key benefits when compared to vm_map_{user_,}pageable(): (1) it avoids a race condition through the use of "in-transition" vm_map entries and (2) it eliminates lock recursion on the vm_map. Note: there is still an error case that requires clean up. Reviewed by: tegge	2002-06-09 20:25:18 +00:00
Alan Cox	b2f3846aef	o Simplify vm_map_unwire() by merging the second and third passes over the caller-specified region.	2002-06-08 19:00:40 +00:00
Alan Cox	e27e17b711	o Remove an unnecessary call to vm_map_wakeup() from vm_map_unwire(). o Add a stub for vm_map_wire(). Note: the description of the previous commit had an error. The in- transition flag actually blocks the deallocation of a vm_map_entry by vm_map_delete() and vm_map_simplify_entry().	2002-06-08 07:32:38 +00:00
Alan Cox	acd9a301ec	o Add vm_map_unwire() for unwiring contiguous regions of either kernel or user vm_maps. In accordance with the standards for munlock(2), and in contrast to vm_map_user_pageable(), this implementation does not allow holes in the specified region. This implementation uses the "in transition" flag described below. o Introduce a new flag, "in transition," to the vm_map_entry. Eventually, vm_map_delete() and vm_map_simplify_entry() will respect this flag by deallocating in-transition vm_map_entrys, allowing the vm_map lock to be safely released in vm_map_unwire() and (the forthcoming) vm_map_wire(). o Modify vm_map_simplify_entry() to respect the in-transition flag. In collaboration with: tegge	2002-06-07 18:34:23 +00:00
Alfred Perlstein	fa7212543f	fix typo in _SYS_SYSPROTO_H_ case: s/mlockall_args/munlockall_args Submitted by: Mark Santcroos <marks@ripe.net>	2002-06-06 18:51:14 +00:00
Jeff Roberson	494273bead	Add a comment describing a resource leak that occurs during a failure case in obj_alloc.	2002-06-03 22:59:19 +00:00
Alan Cox	c5aaa06ded	o Migrate vm_map_split() from vm_map.c to vm_object.c, renaming it to vm_object_split(). Its interface should still be changed to resemble vm_object_shadow().	2002-06-02 23:54:09 +00:00
Alan Cox	0d78c0dce2	o Style fixes to vm_map_split(), including the elimination of one variable declaration that shadows another. Note: This function should really be vm_object_split(), not vm_map_split(). Reviewed by: md5	2002-06-02 19:32:05 +00:00
Alan Cox	72353893d4	o Condition vm_object_pmap_copy_1()'s compilation on the kernel option ENABLE_VFS_IOOPT. Unless this option is in effect, vm_object_pmap_copy_1() is not used.	2002-06-02 06:31:41 +00:00
Alan Cox	61c075b67f	o Remove GIANT_REQUIRED from vm_map_zfini(), vm_map_zinit(), vm_map_create(), and vm_map_submap(). o Make further use of a local variable in vm_map_entry_splay() that caches a reference to one of a vm_map_entry's children. (This reduces code size somewhat.) o Revert a part of revision 1.66, deinlining vmspace_pmap(). (This function is MPSAFE.)	2002-06-01 22:41:43 +00:00
Alan Cox	794316a866	o Revert a part of revision 1.66, contrary to what that commit message says, deinlining vm_map_entry_behavior() and vm_map_entry_set_behavior() actually increases the kernel's size. o Make vm_map_entry_set_behavior() static and add a comment describing its purpose. o Remove an unnecessary initialization statement from vm_map_entry_splay().	2002-06-01 16:59:30 +00:00
Dag-Erling Smørgrav	8dcfdf3f80	Export nswapdev through sysctl(8). Sponsored by: DARPA, NAI Labs	2002-05-31 08:17:58 +00:00
Alan Cox	9917e01041	Further work on pushing Giant out of the vm_map layer and down into the vm_object layer: o Acquire and release Giant in vm_object_shadow() and vm_object_page_remove(). o Remove the GIANT_REQUIRED assertion preceding vm_map_delete()'s call to vm_object_page_remove(). o Remove the acquisition and release of Giant around vm_map_lookup()'s call to vm_object_shadow().	2002-05-31 03:48:55 +00:00
Alfred Perlstein	99b9331a4f	Check for defined(__i386__) instead of just defined(i386) since the compiler will be updated to only define(__i386__) for ANSI cleanliness.	2002-05-30 07:32:58 +00:00
Peter Wemm	7550be9c57	The kernel printf does not have %i	2002-05-29 08:25:13 +00:00
Alan Cox	8f2ba19c90	o Remove unused #defines.	2002-05-27 22:10:28 +00:00
Alan Cox	4b9fdc2bce	o Acquire and release Giant around pmap operations in vm_fault_unwire() and vm_map_delete(). Assert GIANT_REQUIRED in vm_map_delete() only if operating on the kernel_object or the kmem_object. o Remove GIANT_REQUIRED from vm_map_remove(). o Remove the acquisition and release of Giant from munmap().	2002-05-26 04:54:56 +00:00
Alan Cox	4e94f40222	o Replace the vm_map's hint by the root of a splay tree. By design, the last accessed datum is moved to the root of the splay tree. Therefore, on lookups in which the hint resulted in O(1) access, the splay tree still achieves O(1) access. In contrast, on lookups in which the hint failed miserably, the splay tree achieves amortized logarithmic complexity, resulting in dramatic improvements on vm_maps with a large number of entries. For example, the execution time for replaying an access log from www.cs.rice.edu against the thttpd web server was reduced by 23.5% due to the large number of files simultaneously mmap()ed by this server. (The machine in question has enough memory to cache most of this workload.) Nothing comes for free: At present, I see a 0.2% slowdown on "buildworld" due to the overhead of maintaining the splay tree. I believe that some or all of this can be eliminated through optimizations to the code. Developed in collaboration with: Juan E Navarro <jnavarro@cs.rice.edu> Reviewed by: jeff	2002-05-24 01:33:24 +00:00
Alan Cox	03adb816d7	o Make contigmalloc1() static.	2002-05-22 01:01:37 +00:00
John Baldwin	4c1cc01cd8	In uma_zalloc_arg(), if we are performing a M_WAITOK allocation, ensure that td_intr_nesting_level is 0 (like malloc() does). Since malloc() calls uma we can probably remove the check in malloc() for this now. Also, perform an extra witness check in that case to make sure we don't hold any locks when performing a M_WAITOK allocation.	2002-05-20 17:54:48 +00:00
Alan Cox	e0be79afbf	o Eliminate the acquisition and release of Giant from minherit(2). (vm_map_inherit() no longer requires Giant to be held.)	2002-05-18 18:59:00 +00:00
Alan Cox	094f6d2694	o Remove GIANT_REQUIRED from vm_map_madvise(). Instead, acquire and release Giant around vm_map_madvise()'s call to pmap_object_init_pt(). o Replace GIANT_REQUIRED in vm_object_madvise() with the acquisition and release of Giant. o Remove the acquisition and release of Giant from madvise().	2002-05-18 07:48:06 +00:00
Alan Cox	4328504956	o Remove the acquisition and release of Giant from mprotect().	2002-05-18 03:58:16 +00:00
Tom Rhodes	d394511de3	More s/file system/filesystem/g	2002-05-16 21:28:32 +00:00
Poul-Henning Kamp	98b0c78978	Make daddr_t and u_daddr_t 64bits wide. Retire daddr64_t and use daddr_t instead. Sponsored by: DARPA & NAI Labs.	2002-05-14 11:09:43 +00:00
Jeff Roberson	713deb3677	Don't call the uz free function while the zone lock is held. This can lead to lock order reversals. uma_reclaim now builds a list of freeable slabs and then unlocks the zones to do all of the frees.	2002-05-13 05:08:18 +00:00
Jeff Roberson	0aef6126a1	Remove the hash_free() lock order reversal. This could have happened for several reasons before. Fixing it involved restructuring the generic hash code to require calling code to handle locking, unlocking, and freeing hashes on error conditions.	2002-05-13 04:39:28 +00:00
Alan Cox	a47335fdb4	o Remove GIANT_REQUIRED and an excessive number of blank lines from vm_map_inherit(). (minherit() need not acquire Giant anymore.)	2002-05-12 18:42:05 +00:00
Alan Cox	47c3ccc467	o Acquire and release Giant in vm_object_reference() and vm_object_deallocate(), replacing the assertion GIANT_REQUIRED. o Remove GIANT_REQUIRED from vm_map_protect() and vm_map_simplify_entry(). o Acquire and release Giant around vm_map_protect()'s call to pmap_protect(). Altogether, these changes eliminate the need for mprotect() to acquire and release Giant.	2002-05-12 05:22:56 +00:00
Alan Cox	b3a882e936	o Header files shouldn't depend on options: Provide prototypes for uiomoveco(), uioread(), and vm_uiomove() regardless of whether ENABLE_VFS_IOOPT is defined or not. Submitted by: bde	2002-05-06 06:20:04 +00:00
Alan Cox	c0b6bbb80b	o Condition the compilation and use of vm_freeze_copyopts() on ENABLE_VFS_IOOPT.	2002-05-06 05:45:57 +00:00
Alan Cox	dcc5840ed5	o Some improvements to the page coloring of vm objects, particularly, for shadow objects. Submitted by: bde	2002-05-06 03:34:17 +00:00
Alan Cox	e86256c1f4	o Move vm_freeze_copyopts() from vm_map.{c,h} to vm_object.{c,h}. It's plainly an operation on a vm_object and belongs in the latter place.	2002-05-06 00:12:47 +00:00
Alan Cox	c50fe92b8d	o Condition the compilation of uiomoveco() and vm_uiomove() on ENABLE_VFS_IOOPT. o Add a comment to the effect that this code is experimental support for zero-copy I/O.	2002-05-05 22:42:40 +00:00
Poul-Henning Kamp	81e017430a	Expand the one-line function pbreassignbuf() in the only place it is or could be used.	2002-05-05 20:37:08 +00:00
Alan Cox	15fdd586e3	o Remove GIANT_REQUIRED from vm_map_lookup() and vm_map_lookup_done(). o Acquire and release Giant around vm_map_lookup()'s call to vm_object_shadow().	2002-05-05 05:36:28 +00:00
Jeff Roberson	c7173f58fa	Use pages instead of uz_maxpages, which has not been initialized yet, when creating the vm_object. This was broken after the code was rearranged to grab giant itself. Spotted by: alc	2002-05-04 21:49:29 +00:00
Alan Cox	79660d837c	o Make _vm_object_allocate() and vm_object_allocate() callable without holding Giant. o Begin documenting the trivial cases of the locking protocol on vm_object.	2002-05-04 20:23:48 +00:00
Alan Cox	8c5c5d049f	o Remove GIANT_REQUIRED from vm_map_lookup_entry() and vm_map_check_protection(). o Call vm_map_check_protection() without Giant held in munmap().	2002-05-04 02:07:36 +00:00
Alan Cox	bc91c5107a	o Change the implementation of vm_map locking to use exclusive locks exclusively. The interface still, however, distinguishes between a shared lock and an exclusive lock.	2002-05-02 17:32:27 +00:00
Jeff Roberson	8f70816cf2	Hide a pointer to the malloc_type bucket at the end of the freed memory. If this memory is modified after it has been freed we can now report its previous owner.	2002-05-02 09:07:04 +00:00
Jeff Roberson	b9ba893179	Move around the dbg code a bit so it's always under a lock. This stops a weird potential race if we were preempted right as we were doing the dbg checks.	2002-05-02 09:05:36 +00:00
Andrew R. Reiter	c3bdc05fb9	- Changed the size element of uma_zctor_args to be size_t instead of int. - Changed uma_zcreate to accept the size argument as a size_t instead of int. Approved by: jeff	2002-05-02 07:36:30 +00:00
Jeff Roberson	5a34a9f089	malloc/free(9) no longer require Giant. Use the malloc_mtx to protect the mallochash. Mallochash is going to go away as soon as I introduce the kfree/kmalloc api and partially overhaul the malloc wrapper. This can't happen until all users of the malloc api that expect memory to be aligned on the size of the allocation are fixed.	2002-05-02 07:22:19 +00:00
Alan Cox	569687d02f	o Remove dead and lockmgr()-specific debugging code.	2002-05-02 02:32:09 +00:00
Jeff Roberson	639c9550fb	Remove the temporary alignment check in free(). Implement the following checks on freed memory in the bucket path: - Slab membership - Alignment - Duplicate free This previously was only done if we skipped the buckets. This code will slow down INVARIANTS a bit, but it is smp safe. The checks were moved out of the normal path and into hooks supplied in uma_dbg.	2002-05-02 02:08:48 +00:00
Alan Cox	ea0f50bcf0	o Convert the vm_page buckets mutex to a spin lock. (This resolves an issue on the Alpha platform found by jeff@.) o Simplify vm_page_lookup(). Reviewed by: jhb	2002-04-30 21:24:47 +00:00
Jeff Roberson	8efc4eff00	Add a new UMA debugging facility. This will overwrite freed memory with 0xdeadc0de and then check for it just before memory is handed off as part of a new request. This will catch any post free/pre alloc modification of memory, as well as introduce errors for anything that tries to dereference it as a pointer. This code takes the form of special init, fini, ctor and dtor routines that are specifically used by malloc. It is in a separate file because additional debugging aids will want to live here as well.	2002-04-30 07:54:25 +00:00
Jeff Roberson	2cc35ff9c6	Move the implementation of M_ZERO into UMA so that it can be passed to uma_zalloc and friends. Remove this functionality from the malloc wrapper. Document this change in uma.h and adjust variable names in uma_core.	2002-04-30 04:26:34 +00:00
Alan Cox	7788e21963	o Revert vm_fault1() to its original name vm_fault(), eliminating the wrapper that took its place for the purposes of acquiring and releasing Giant.	2002-04-30 03:44:34 +00:00
Jeff Roberson	28bc44195c	Add a new zone flag UMA_ZONE_MTXCLASS. This puts the zone in its own mutex class. Currently this is only used for kmapentzone because kmapents are potentially allocated when freeing memory. This is not dangerous though because no other allocations will be done while holding the kmapentzone lock.	2002-04-29 23:45:41 +00:00
Peter Wemm	db17c6fc07	Tidy up some loose ends. i386/ia64/alpha - catch up to sparc64/ppc: - replace pmap_kernel() with refs to kernel_pmap - change kernel_pmap pointer to (&kernel_pmap_store) (this is a speedup since ld can set these at compile/link time) all platforms (as suggested by jake): - gc unused pmap_reference - gc unused pmap_destroy - gc unused struct pmap.pm_count (we never used pm_count - we track address space sharing at the vmspace)	2002-04-29 07:43:16 +00:00
Alan Cox	532eadef77	Document three synchronization issues in vm_fault().	2002-04-29 05:23:01 +00:00
Alan Cox	780b1c0997	Pass the caller's file name and line number to the vm_map locking functions.	2002-04-28 23:12:52 +00:00
Alan Cox	d974f03c69	o Introduce and use vm_map_trylock() to replace several direct uses of lockmgr(). o Add missing synchronization to vmspace_swap_count(): Obtain a read lock on the vm_map before traversing it.	2002-04-28 06:07:54 +00:00
Peter Wemm	44e74ba6c3	We do not necessarily need to map/unmap pages to zero parts of them. On systems where physical memory is also direct mapped (alpha, sparc, ia64 etc) this is slightly harmful.	2002-04-28 00:15:48 +00:00
Alan Cox	089b073345	o Begin documenting the (existing) locking protocol on the vm_map in the same style as sys/proc.h. o Undo the de-inlining of several trivial, MPSAFE methods on the vm_map. (Contrary to the commit message for vm_map.h revision 1.66 and vm_map.c revision 1.206, de-inlining these methods increased the kernel's size.)	2002-04-27 22:01:37 +00:00
Alan Cox	cbd53e95fe	o Control access to the vm_page_buckets with a mutex. o Fix some style(9) bugs.	2002-04-26 22:44:15 +00:00
Andrew R. Reiter	d4d6aee5a0	- Fix a round down bogon in uma_zone_set_max(). Submitted by: jeff@	2002-04-25 06:24:40 +00:00
Alan Cox	a569838764	Reintroduce locking on accesses to vm_object_list.	2002-04-20 07:23:22 +00:00
Alan Cox	92de35b0ce	o Move the acquisition of Giant from vm_fault() to the point after initialization in vm_fault1(). o Fix some style problems in vm_fault1().	2002-04-19 04:20:31 +00:00
Alan Cox	ff8f4ebe22	Add a comment documenting a race condition in vm_fault(): Specifically, a modification is made to the vm_map while only a read lock is held.	2002-04-18 03:55:50 +00:00
Alan Cox	6139043b1f	o Call vm_map_growstack() from vm_fault() if vm_map_lookup() has failed due to conditions that suggest the possible need for stack growth. This has two beneficial effects: (1) we can now remove calls to vm_map_growstack() from the MD trap handlers and (2) simple page faults are faster because we no longer unnecessarily perform vm_map_growstack() on every page fault. o Remove vm_map_growstack() from the i386's trap_pfault(). o Remove the acquisition and release of Giant from i386's trap_pfault(). (vm_fault() still acquires it.)	2002-04-18 03:28:27 +00:00
Peter Wemm	334f706177	Do not free the vmspace until p->p_vmspace is set to null. Otherwise statclock can access it in the tail end of statclock_process() at an unfortunate time. This bit me several times on an SMP alpha (UP2000) and the problem went away with this change. I'm not sure why it doesn't break x86 as well. Maybe it's because the clocks are much faster on alpha (HZ=1024 by default).	2002-04-17 05:26:42 +00:00
Alan Cox	b208d0633f	Remove an unused option, VM_FAULT_HOLD, to vm_fault().	2002-04-17 02:23:57 +00:00
Peter Wemm	1a87a0da66	Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]() and pmap_copy_page(). This gets rid of a couple more physical addresses in upper layers, with the eventual aim of supporting PAE and dealing with the physical addressing mostly within pmap. (We will need either 64 bit physical addresses or page indexes, possibly both depending on the circumstances. Leaving this to pmap itself gives more flexibility.) Reviewed by: jake Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)	2002-04-15 16:00:03 +00:00
Jeff Roberson	5300d9dda2	Fix a witness warning when expanding a hash table. We were allocating the new hash while holding the lock on a zone. Fix this by doing the allocation separately from the actual hash expansion. The lock is dropped before the allocation and reacquired before the expansion. The expansion code checks to see if we lost the race and frees the new hash if we do. We really never will lose this race because the hash expansion is single threaded via the timeout mechanism.	2002-04-14 13:47:10 +00:00
Jeff Roberson	0da47b2fc6	Protect the initial list traversal in sysctl_vm_zone() with the uma_mtx.	2002-04-14 12:39:38 +00:00
Jeff Roberson	af7f9b97b6	Fix the calculation that determines uz_maxpages. It was off for large zones. Fortunately we have no large zones with maximums specified yet, so it wasn't breaking anything. Implement blocking when a zone exceeds the maximum and M_WAITOK is specified. Previously this just failed like the old zone allocator did. The old zone allocator didn't support WAITOK/NOWAIT though so we should do what we advertise. While I was in there I cleaned up some more zalloc logic to further simplify that code path and reduce redundant code. This was needed to make the blocking work properly anyway.	2002-04-14 01:56:25 +00:00
Jeff Roberson	bce9779110	Remember to unlock the zone if the fill count is too high. Pointed out by: pete, jake, jhb	2002-04-10 01:52:50 +00:00
Jeff Roberson	1d4cb54ba8	Quiet witness warnings about acquiring several zone locks. In the case that this happens it is OK.	2002-04-08 21:08:17 +00:00
Jeff Roberson	86bbae32f4	Add a mechanism to disable buckets when the v_free_count drops below v_free_min. This should help performance in memory starved situations.	2002-04-08 06:20:34 +00:00
Jeff Roberson	605cbd6a08	Don't release the zone lock until after the dtor has been called. As far as I can tell this could not have caused any problems yet because UMA is still called with giant. Pointy hat to: jeff Noticed by: jake	2002-04-08 05:13:48 +00:00
Jeff Roberson	9c2cd7e5a9	Implement uma_zdestroy(). Its prototype changed slightly. I decided that I didn't like the wait argument and that if you were removing a zone it had better be empty. Also, I broke out part of hash_expand and made a separate hash_free() for use in uma_zdestroy.	2002-04-08 04:48:58 +00:00
Jeff Roberson	a553d4b8eb	Rework most of the bucket allocation and free code so that per cpu locks are never held across blocking operations. Also, fix two other lock order reversals that were exposed by jhb's witness change. The free path previously had a bug that would cause it to skip the free bucket list in some cases and go straight to allocating a new bucket. This has been fixed as well. These changes made the bucket handling code much cleaner and removed quite a few lock operations. This should be marginally faster now. It is now possible to call malloc w/o Giant and avoid any witness warnings. This still isn't entirely safe though because malloc_type statistics are not protected by any lock.	2002-04-08 02:42:55 +00:00
Jeff Roberson	c235bfa551	Spelling correction; s/seperate/separate/g Submitted by: eric	2002-04-07 22:56:48 +00:00
Jeff Roberson	fedfeee018	There should be no remaining references to these two files in the tree. If there are, it is an error. vm_zone has been superseded by uma.	2002-04-07 22:51:18 +00:00
Jeff Roberson	d0b06acbe1	This fixes a bug where isitem never got set to 1 if a certain chain of events relating to extreme low memory situations occurred. This was only ever seen on the port build cluster, so many thanks to kris for helping me debug this. Tested by: kris	2002-04-07 22:47:36 +00:00
Alan Cox	aa4d062142	o Eliminate the use of grow_stack() and useracc() from sendsig(), osendsig(), and osf1_sendsig(). o Eliminate the prototype for the MD grow_stack() now that it has been removed from all platforms.	2002-04-05 00:52:15 +00:00
Matthew Dillon	80f5c8bf42	Embed a struct vmmeter in the per-cpu structure and add a macro, PCPU_LAZY_INC() which increments elements in it for cases where we can afford the occasional inaccuracy. Use of per-cpu stats counters avoids significant cache stalls in various critical paths that would otherwise severely limit our cpu scalability. Adjust all sysctl's accessing cnt.* elements to now use a procedure which aggregates the requested field for all cpus and for the global vmmeter. The global vmmeter is retained, since some stats counters, like v_free_min, cannot be made per-cpu. Also, this allows us to convert counters from the global vmmeter to the per-cpu vmmeter in a piecemeal fashion, so have at it!	2002-04-04 21:38:47 +00:00
John Baldwin	6008862bc2	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64	2002-04-04 21:03:38 +00:00
Jake Burkholder	48f9a59443	Fix a long standing 32bit-ism. Don't assume that the size of a chunk of memory in phys_avail will fit in 'int', use vm_size_t. This fixes booting on sparc64 machines with more than 2 gigs of ram. Thanks to Jan Chrillesen for providing me with access to a 4 gig machine.	2002-04-03 06:57:52 +00:00
Alfred Perlstein	157d7b3538	fix comment typo, s/neccisary/necessary/g	2002-04-02 21:25:12 +00:00
John Baldwin	44731cab3b	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@	2002-04-01 21:31:13 +00:00
Jeff Roberson	f22a4b62f5	Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks with this flag. Remove the dup_list and dup_ok code from subr_witness. Now we just check for the flag instead of doing string compares. Also, switch the process lock, process group lock, and uma per cpu locks over to this interface. The original mechanism did not work well for uma because per cpu lock names are unique to each zone. Approved by: jhb	2002-03-27 09:23:41 +00:00
Alan Cox	433b72aa12	Remove an unused prototype.	2002-03-26 05:30:59 +00:00
Jeff Roberson	f4af24d55d	Reset the cachefree statistics after draining the cache. This fixes a bug where a sysctl within 20 seconds of a cache_drain could yield negative "USED" counts. Also, grab the uma_mtx while in the sysctl handler. This hadn't caused problems yet because Giant is held all the time. Reported by: kkenn	2002-03-24 10:56:11 +00:00
Jeff Roberson	736ee5907f	Add uma_zone_set_max() to add enforced limits to non vm obj backed zones.	2002-03-20 05:28:34 +00:00
Jeff Roberson	670d17b5c0	Remove references to vm_zone.h and switch over to the new uma API.	2002-03-20 04:02:59 +00:00
Alfred Perlstein	11caded34f	Remove __P.	2002-03-19 22:20:14 +00:00
Jeff Roberson	9eb6e51923	Quiet a warning introduced by UMA. This only occurs on machines where vm_size_t != unsigned long. Reviewed by: phk	2002-03-19 11:49:10 +00:00
Peter Wemm	30171114b3	Fix a gcc-3.1+ warning. warning: deprecated use of label at end of compound statement ie: you cannot do this anymore: switch(foo) { .... default: }	2002-03-19 11:02:06 +00:00
Jeff Roberson	8355f576a9	This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator. Reviewed by: arch@	2002-03-19 09:11:49 +00:00
Brian Feldman	25adb370be	Back out the modification of vm_map locks from lockmgr to sx locks. The best path forward now is likely to change the lockmgr locks to simple sleep mutexes, then see if any extra contention it generates is greater than removed overhead of managing local locking state information, cost of extra calls into lockmgr, etc. Additionally, making the vm_map lock a mutex and respecting it properly will put us much closer to not needing Giant magic in vm.	2002-03-18 15:08:09 +00:00
Alan Cox	9f0567f557	Remove vm_object_count: It's unused, incorrectly maintained and duplicates information maintained by the zone allocator.	2002-03-17 18:37:37 +00:00
Alan Cox	5ee9fe6ba1	Undo part of revision 1.57: Now that (o)sendsig() doesn't call useracc(), the motivation for saving and restoring the map->hint in useracc() is gone. (The same tests that motivated this change in revision 1.57 now show that there is no performance loss from removing it.) This was really a hack and some day we would have had to add new synchronization here on map->hint to maintain it.	2002-03-17 07:01:42 +00:00
Alan Cox	2f6c16e1e8	Acquire a read lock on the map inside of vm_map_check_protection() rather than expecting the caller to do so. This (1) eliminates duplicated code in kernacc() and useracc() and (2) fixes missing synchronization in munmap().	2002-03-17 03:19:31 +00:00
Jake Burkholder	ac59490b5e	Convert all pmap_kenter/pmap_kremove pairs in MI code to use pmap_qenter/ pmap_qremove. pmap_kenter is not safe to use in MI code because it is not guaranteed to flush the mapping from the tlb on all cpus. If the process in question is preempted and migrates cpus between the call to pmap_kenter and pmap_kremove, the original cpu will be left with stale mappings in its tlb. This is currently not a problem for i386 because we do not use PG_G on SMP, and thus all mappings are flushed from the tlb on context switches, not just user mappings. This is not the case on all architectures, and if PG_G is to be used with SMP on i386 it will be a problem. This was committed by peter earlier as part of his fine grained tlb shootdown work for i386, which was backed out for other reasons. Reviewed by: peter	2002-03-17 00:56:41 +00:00
Kirk McKusick	0d2af52141	Introduce the new 64-bit size disk block, daddr64_t. Change the bio and buffer structures to have daddr64_t bio_pblkno, b_blkno, and b_lblkno fields which allows access to disks larger than a Terabyte in size. This change also requires that the VOP_BMAP vnode operation accept and return daddr64_t blocks. This delta should not affect system operation in any way. It merely sets up the necessary interfaces to allow the development of disk drivers that work with these larger disk block addresses. It also allows for the development of UFS2 which will use 64-bit block addresses.	2002-03-15 18:49:47 +00:00
Brian Feldman	9cb574590e	Document faultstate.lookup_still_valid more than none. Requested by: alfred	2002-03-14 02:10:14 +00:00
Brian Feldman	0e0af8ecda	Rename SI_SUB_MUTEX to SI_SUB_MTX_POOL to make the name at all accurate. While doing this, move it earlier in the sysinit boot process so that the VM system can use it. After that, the system is now able to use sx locks instead of lockmgr locks in the VM system. To accomplish this, some of the more questionable uses of the locks (such as testing whether they are owned or not, as well as allowing shared+exclusive recursion) are removed, and simpler logic throughout is used so locks should also be easier to understand. This has been tested on my laptop for months, and has not shown any problems on SMP systems, either, so appears quite safe. One more user of lockmgr down, many more to go :)	2002-03-13 23:48:08 +00:00
Eivind Eklund	a128794977	- Remove a number of extra newlines that do not belong here according to style(9) - Minor space adjustment in cases where we have "( ", " )", if(), return(), while(), for(), etc. - Add /* SYMBOL */ after a few #endifs. Reviewed by: alc	2002-03-10 21:52:48 +00:00
Tor Egge	ff91d7800f	Revert change in revision 1.53 and add a small comment to protect the revived code. vm pages newly allocated are marked busy (PG_BUSY), thus calling vm_page_delete before the pages have been freed or unbusied will cause a deadlock since vm_page_object_page_remove will wait for the busy flag to be cleared. This can be triggered by calling malloc with size > PAGE_SIZE and the M_NOWAIT flag on systems low on physical free memory. A kernel module that reproduces the problem, written by Logan Gabriel <logan@mail.2cactus.com>, can be found in the freebsd-hackers mail archive (12 Apr 2001). The problem was recently noticed again by Archie Cobbs <archie@dellroad.org>. Reviewed by: dillon	2002-03-09 16:24:27 +00:00
Matthew Dillon	8c5dffe8ca	Fix a bug in the vm_map_clean() procedure. msync()ing an area of memory that has just been mapped MAP_ANON|MAP_NOSYNC and has not yet been accessed will panic the machine. MFC after: 1 day	2002-03-07 03:54:56 +00:00
Matthew Dillon	b9b7a4be90	Add a sequential iteration optimization to vm_object_page_clean(). This moderately improves msync's and VM object flushing for objects containing randomly dirtied pages (fsync(), msync(), filesystem update daemon), and improves cpu use for small-ranged sequential msync()s in the face of very large mmap()ings from O(N) to O(1) as might be performed by a database. A sysctl, vm.msync_flush_flag, has been added and defaults to 3 (the two committed optimizations are turned on by default). 0 will turn off both optimizations. This code has already been tested under stable and is one in a series of memq / vp->v_dirtyblkhd / fsync optimizations to remove O(N^2) restart conditions that will be coming down the pipe. MFC after: 3 days	2002-03-06 02:42:56 +00:00
Eivind Eklund	f52bd684f3	* Move bswlist declaration and initialization from kern/vfs_bio.c to vm/vm_pager.c, which is the only place it is used. * Make the QUEUE_* definitions and bufqueues local to vfs_bio.c. * constify buf_wmesg.	2002-03-05 18:20:58 +00:00
Alan Cox	2be21c5e68	o Create vm_pageq_enqueue() to encapsulate code that is duplicated time and again in vm_page.c and vm_pageq.c. o Delete unused prototypes. (Mainly a result of the earlier renaming of various functions from vm_page_() to vm_pageq_().)	2002-03-04 18:55:26 +00:00
Alan Cox	64190c7a2f	Call vm_pageq_remove_nowakeup() rather than duplicating it.	2002-03-03 22:36:14 +00:00
Alan Cox	5714577006	Remove some long dead code.	2002-03-02 22:21:42 +00:00
John Baldwin	fdcc1cc09f	Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic and isn't strictly required. However, it lowers the number of false positives found when grep'ing the kernel sources for p_ucred to ensure proper locking.	2002-02-27 19:18:10 +00:00
John Baldwin	a854ed9893	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.	2002-02-27 18:32:23 +00:00
Mike Silbersack	7f3a40933b	Fix a horribly suboptimal algorithm in the vm_daemon. In order to determine what to page out, the vm_daemon checks reference bits on all pages belonging to all processes. Unfortunately, the algorithm used reacted badly with shared pages; each shared page would be checked once per process sharing it; this caused an O(N^2) growth of tlb invalidations. The algorithm has been changed so that each page will be checked only 16 times. Prior to this change, a fork/sleepbomb of 1300 processes could cause the vm_daemon to take over 60 seconds to complete, effectively freezing the system for that time period. With this change in place, the vm_daemon completes in less than a second. Any system with hundreds of processes sharing pages should benefit from this change. Note that the vm_daemon is only run when the system is under extreme memory pressure. It is likely that many people with loaded systems saw no symptoms of this problem until they reached the point where swapping began. Special thanks go to dillon, peter, and Chuck Cranor, who helped me get up to speed with vm internals. PR: 33542, 20393 Reviewed by: dillon MFC after: 1 week	2002-02-27 18:03:02 +00:00
Peter Wemm	d1693e1701	Back out all the pmap related stuff I've touched over the last few days. There is some unresolved badness that has been eluding me, particularly affecting uniprocessor kernels. Turning off PG_G helped (which is a bad sign) but didn't solve it entirely. Userland programs still crashed.	2002-02-27 09:51:33 +00:00
Peter Wemm	bd1e3a0f89	Jake further reduced IPI shootdowns on sparc64 in loops by using ranged shootdowns in a couple of key places. Do the same for i386. This also hides some physical addresses from higher levels and has it use the generic vm_page_t's instead. This will help for PAE down the road. Obtained from: jake (MI code, suggestions for MD part)	2002-02-27 02:14:58 +00:00
Peter Wemm	dd50331c0e	Remove unused variable (td)	2002-02-26 01:01:37 +00:00
Poul-Henning Kamp	57c10583aa	GC: BIO_ORDERED, various infrastructure dealing with BIO_ORDERED.	2002-02-22 09:26:35 +00:00
Tor Egge	d2760948fe	Add a page queue, PQ_HOLD, that temporarily owns pages with nonzero hold count that would otherwise be on one of the free queues. This eliminates a panic when broken programs unmap memory that still has pending IO from raw devices. Reviewed by: dillon, alc	2002-02-19 23:19:30 +00:00
Mike Silbersack	0c9e47230a	Add one more comment to the OOM changes so that future readers of the code may better understand the code. Suggested by: dillon MFC after: 1 week	2002-02-19 18:50:49 +00:00
Mike Silbersack	ef6020d187	Changes to make the OOM killer much more effective: - Allow the OOM killer to target processes currently locked in memory. These very often are the ones doing the memory hogging. - Drop the wakeup priority of processes currently sleeping while waiting for their page fault to complete. In order for the OOM killer to work well, the killed process and other system processes waiting on memory must be allowed to wakeup first. Reviewed by: dillon MFC after: 1 week	2002-02-19 18:34:02 +00:00
Bruce Evans	1e92845e1b	Garbage-collect options ACPI_NO_ENABLE_ON_BOOT, AML_DEBUG, BLEED, DEVICE_SYSCTLS, KEY, LOUTB, NFS_MUIDHASHSIZ, NFS_UIDHASHSIZ, PCI_QUIET and SIMPLELOCK_DEBUG.	2002-02-15 13:16:11 +00:00
Julian Elischer	2c1007663f	In a threaded world, different priorities become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)	2002-02-11 20:37:54 +00:00
Julian Elischer	079b7badea	Pre-KSE/M3 commit. This is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embedded there but remove all the code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,	2002-02-07 20:58:47 +00:00
Alfred Perlstein	582ec34cd8	Fix a race with free'ing vmspaces at process exit when vmspaces are shared. Also introduce vm_endcopy instead of using pointer tricks when initializing new vmspaces. The race occurred because of how the reference was utilized: test vmspace reference, possibly block, decrement reference When sharing a vmspace between multiple processes it was possible for two processes exiting at the same time to test the reference count, possibly block and neither one free because they wouldn't see the other's update. Submitted by: green	2002-02-05 21:23:05 +00:00
Matthew Dillon	027df6bdd7	GC P_BUFEXHAUST leftovers, we've had a new mechanism to avoid buffer cache lockups for over a year now. MFC after: 0 days	2002-01-31 18:39:44 +00:00
David Malone	d2979f90e7	Remove a parameter name from a prototype.	2002-01-25 21:33:10 +00:00
Bruce Evans	e50f5c2e8d	Don't declare vm_swapout() in the NO_SWAPPING case when it is not defined. Fixed some style bugs.	2002-01-17 16:46:26 +00:00
Alfred Perlstein	a4db49537b	Replace ffind_* with fget calls. Make fget MPsafe. Make fgetvp and fgetsock use the fget subsystem to reduce code bloat. Push giant down in fpathconf().	2002-01-14 00:13:45 +00:00
Alfred Perlstein	426da3bcfb	SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file *fhold(struct file *fp); /* increments reference count on a file */ struct file *fhold_locked(struct file *fp); /* like fhold but expects file to be locked */ struct file *ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */ struct file *ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.	2002-01-13 11:58:06 +00:00
John Baldwin	c86b6ff551	Change the preemption code for software interrupt thread schedules and mutex releases to not require flags for the cases when preemption is not allowed: The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent switching to a higher priority thread on mutex release and swi schedule, respectively when that switch is not safe. Now that the critical section API maintains a per-thread nesting count, the kernel can easily check whether or not it should switch without relying on flags from the programmer. This fixes a few bugs in that all current callers of swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from fast interrupt handlers and the swi_sched of softclock needed this flag. Note that to ensure that swi_sched()'s in clock and fast interrupt handlers do not switch, these handlers have to be explicitly wrapped in critical_enter/exit pairs. Presently, just wrapping the handlers is sufficient, but in the future with the fully preemptive kernel, the interrupt must be EOI'd before critical_exit() is called. (critical_exit() can switch due to a deferred preemption in a fully preemptive kernel.) I've tested the changes to the interrupt code on i386 and alpha. I have not tested ia64, but the interrupt code is almost identical to the alpha code, so I expect it will work fine. PowerPC and ARM do not yet have interrupt code in the tree so they shouldn't be broken. Sparc64 is broken, but that's been ok'd by jake and tmm who will be fixing the interrupt code for sparc64 shortly. Reviewed by: peter Tested on: i386, alpha	2002-01-05 08:47:13 +00:00
Matthew Dillon	23b590188f	Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout. Hopefully MFC: before the 4.5 release	2001-12-20 22:42:27 +00:00
Matthew Dillon	3ebeaf5984	This fixes a large number of bugs in our NFS client side code. A recent commit by Kirk also fixed a softupdates bug that could easily be triggered by server side NFS. * An edge case with shared R+W mmap()'s and truncate whereby the system would inappropriately clear the dirty bits on still-dirty data. (applicable to all filesystems) THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING. see vm/vm_page.c line 1641 * The straddle case for VM pages and buffer cache buffers when truncating. (applicable to NFS client side) * Possible SMP database corruption due to vm_pager_unmap_page() not clearing the TLB for the other cpu's. (applicable to NFS client side but could affect all filesystems). Note: not considered serious since the corruption occurs beyond the file EOF. * When flushing a dirty buffer due to B_CACHE getting cleared, we were accidentally setting B_CACHE again (that is, bwrite() sets B_CACHE), when we really want it to stay clear after the write is complete. This resulted in a corrupt buffer. (applicable to all filesystems but probably only triggered by NFS) * We have to call vtruncbuf() when ftruncate()ing to remove any buffer cache buffers. This is still tentative, I may be able to remove it due to the second bug fix. (applicable to NFS client side) * vnode_pager_setsize() race against nfs_vinvalbuf()... we have to set n_size before calling nfs_vinvalbuf or the NFS code may recursively vnode_pager_setsize() to the original value before the truncate. This is what was causing the user mmap bus faults in the nfs tester program. (applicable to NFS client side) * Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made by Kirk). Testing program written by: Avadis Tevanian, Jr. Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step') MFC after: 1 week	2001-12-14 01:16:57 +00:00
Luigi Rizzo	60363fb9f7	vm/vm_kern.c: rate limit (to once per second) diagnostic printf when you run out of mbuf address space. kern/subr_mbuf.c: print a warning message when mb_alloc fails, again rate-limited to at most once per second. This covers other cases of mbuf allocation failures. Probably it also overlaps the one handled in vm/vm_kern.c, so maybe the latter should go away. This warning will let us gradually remove the printf that are scattered across most network drivers to report mbuf allocation failures. Those are potentially dangerous, in that they are not rate-limited and can easily cause systems to panic. Unless there is disagreement (which does not seem to be the case judging from the discussion on -net so far), and because this is sort of a safety bugfix, I plan to commit a similar change to STABLE during the weekend (it affects kern/uipc_mbuf.c there). Discussed-with: jlemon, silby and -net	2001-12-01 00:21:30 +00:00
Jonathan Lemon	4584bbf555	When laying out objects in a ZONE_INTERRUPT zone, allow them to cross a page boundary, since we've already allocated all our contiguous kva space up front. This eliminates some memory wastage, and allows us to actually reach the # of objects that were specified in the zinit() call. Reviewed by: peter, dillon	2001-11-17 00:40:48 +00:00
Matthew Dillon	fe8e0238cc	Fix deadlock introduced in 1.73 (Jan 1998). The paging-in-progress count on a vnode-backed object must be incremented after obtaining the vnode lock. If it is bumped before obtaining the vnode lock we can deadlock against vtruncbuf(). Submitted by: peter, ps MFC after: 3 days	2001-11-09 21:34:45 +00:00
Matthew Dillon	33c6774151	Adjust vnode_pager_input_smlfs() to not attempt to BMAP blocks beyond the file EOF. This works around a bug in the ISOFS (CDRom) BMAP code which returns bogus values for requests beyond the file EOF rather than returning an error, resulting in either corrupt data being mmap()'d beyond the file EOF or resulting in a seg-fault on the last page of a mmap()'d file (mmap()s of CDRom files). Reported by: peter / Yahoo MFC after: 3 days	2001-11-05 18:58:47 +00:00
Matthew Dillon	e302698320	Don't let pmap_object_init_pt() exhaust all available free pages (allocating pv entries w/ zalloci) when called in a loop due to an madvise(). It is possible to completely exhaust the free page list and cause a system panic when an expected allocation fails.	2001-10-31 03:06:33 +00:00
Matthew Dillon	7a5a635273	Move recently added procedure which was incorrectly placed within an #ifdef DDB block.	2001-10-26 16:27:54 +00:00
Matthew Dillon	245df27cee	Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a real effect. Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%. Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are ok, in regards to vnode stability when the mount list mutex is held. Improves looping case by 500%. (more optimization work is needed on top of these fixes) MFC after: 1 week	2001-10-26 00:08:05 +00:00
Matthew Dillon	57601bcb5d	Syntax cleanup and documentation, no operational changes. MFC after: 1 day	2001-10-21 06:12:06 +00:00
Ian Dowse	0eb6ce3169	Move the code that computes the system load average from vm_meter.c to kern_synch.c in preparation for adding some jitter to the inter-sample time. Note that the "vm.loadavg" sysctl still lives in vm_meter.c which isn't the right place, but it is appropriate for the current (bad) name of that sysctl. Suggested by: jhb (some time ago) Reviewed by: bde	2001-10-20 13:10:43 +00:00
Matthew Dillon	b386828956	contigmalloc1() could cause the vm_page_zero_count to become incorrect. Properly track the count. Submitted by: mark tinguely <tinguely@web.cs.ndsu.nodak.edu>	2001-10-17 17:34:34 +00:00
Tor Egge	d6844b6bf6	Don't use an uninitialized field reserved for callers in the bio structure passed to swap_pager_strategy(). Instead, use a field reserved for drivers and initialize it before usage. Reviewed by: dillon	2001-10-15 23:02:54 +00:00
Tor Egge	30105b9ec4	Don't remove all mappings of a swapped out process if the vm map contained wired entries. vm_fault_unwire() depends on the mapping being intact. Reviewed by: dillon	2001-10-14 20:51:14 +00:00
Tor Egge	e7673b8424	Fix locking violations during page wiring: - vm map entries are not valid after the map has been unlocked. - An exclusive lock on the map is needed before calling vm_map_simplify_entry(). Fix cleanup after page wiring failure to unwire all pages that had been successfully wired before the failure was detected. Reviewed by: dillon	2001-10-14 20:47:08 +00:00
Matthew Dillon	33bd457d91	Makes contigalloc[1]() create the vm_map / underlying wired pages in the kernel map and object in a manner that contigfree() is actually able to free. Previously contigfree() freed up the KVA space but could not unwire & free the underlying VM pages due to mismatched pageability between the map entry and the VM pages. Submitted by: Thomas Moestl <tmoestl@gmx.net> Testing by: mark tinguely <tinguely@web.cs.ndsu.nodak.edu> MFC after: 3 days	2001-10-13 04:23:37 +00:00
Matthew Dillon	00a6f47f13	Finally fix the VM bug where a file whose EOF occurs in the middle of a page would sometimes prevent a dirty page from being cleaned, even when synced, resulting in the dirty page being re-flushed to disk every 30-60 seconds or so, forever. The problem is that when the filesystem flushes a page to its backing file it typically does not clear dirty bits representing areas of the page that are beyond the file EOF. If the file is also mmap()'d and a fault is taken, vm_fault (properly, is required to) set the vm_page_t->dirty bits to VM_PAGE_BITS_ALL. This combination could leave us with an uncleanable, unfreeable page. The solution is to have the vnode_pager detect the edge case and manually clear the dirty bits representing areas beyond the file EOF. The filesystem does the rest and the page comes up clean after the write completes. MFC after: 3 days	2001-10-12 18:17:34 +00:00
John Baldwin	bd78cece5d	Change the kernel's ucred API as follows: - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.	2001-10-11 23:38:17 +00:00
John Baldwin	61d80e90a9	Add missing includes of sys/ktr.h.	2001-10-11 17:53:43 +00:00
Paul Saab	cbc89bfbfe	Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader tunable. Reviewed by: peter MFC after: 2 weeks	2001-10-10 23:06:54 +00:00
Ian Dowse	564bfabecb	Remove the SSLEEP case from the load average computation. This has been a no-op for as long as our CVS history goes back. Processes in state SSLEEP could only be counted if p_slptime == 0, but immediately before loadav() is called, schedcpu() has just incremented p_slptime on all SSLEEP processes.	2001-10-04 22:33:31 +00:00
Robert Watson	8c5d4fe829	o Modify access control checks in mmap() to use securelevel_gt() instead of direct variable access. Obtained from: TrustedBSD Project	2001-09-26 20:29:39 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Peter Wemm	eb30c1c0b9	Rip some well duplicated code out of cpu_wait() and cpu_exit() and move it to the MI area. KSE touched cpu_wait() which had the same change replicated five ways for each platform. Now it can just do it once. The only MD parts seemed to be dealing with fpu state cleanup and things like vm86 cleanup on x86. The rest was identical. XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional stub in place. Reviewed by: jake, tmm, dillon	2001-09-10 04:28:58 +00:00
John Baldwin	29fdb744d1	Process priority is locked by the sched_lock, not the proc lock.	2001-09-01 20:16:30 +00:00
Matthew Dillon	7feaf028be	make swapon() MPSAFE (will adjust syscalls.master later)	2001-08-31 22:15:37 +00:00
Matthew Dillon	6a33d53c48	mark obreak() and ovadvise() as being MPSAFE	2001-08-31 22:10:03 +00:00
Matthew Dillon	d2c60af81a	Cleanup	2001-08-31 01:26:30 +00:00
Peter Wemm	3516c025ff	Implement idle zeroing of pages. I've been tinkering with this on and off since John Dyson left his work-in-progress. It is off by default for now. sysctl vm.zeroidle_enable=1 to turn it on. There are some hacks here to deal with the present lack of preemption - we yield after doing a small number of pages since we wont preempt otherwise. This is basically Matt's algorithm [with hysteresis] with an idle process to call it in a similar way it used to be called from the idle loop. I cleaned up the includes a fair bit here too.	2001-08-25 05:00:44 +00:00
Matthew Dillon	676274db9b	Remove support for the badly broken MAP_INHERIT (from -current only).	2001-08-24 19:29:56 +00:00
Matthew Dillon	219d632c15	Move most of the kernel submap initialization code, including the timeout callwheel and buffer cache, out of the platform specific areas and into the machine independent area. i386 and alpha adjusted here. Other cpus can be fixed piecemeal. Reviewed by: freebsd-smp, jake	2001-08-22 04:07:27 +00:00
Matthew Dillon	0b76df7146	KASSERT if vm_page_t->wire_count overflows.	2001-08-22 04:01:56 +00:00
Matthew Dillon	2f9e4e8025	Limit the amount of KVM reserved for the buffer cache and for swap-meta information. The default limits only affect machines with > 1GB of ram and can be overridden with two new kernel conf variables VM_SWZONE_SIZE_MAX and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and kern.maxbcache. This has the effect of leaving more KVM available for sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad adds memory to a machine and then sees the kernel panic on boot due to running out of KVM. Also change the default swap-meta auto-sizing calculation to allocate half of what it was previously allocating. The prior defaults were way too high. Note that we cannot afford to run out of swap-meta structures so we still stay somewhat conservative here.	2001-08-20 00:41:12 +00:00
John Baldwin	02cd7c3cf2	- Remove asleep(), await(), and M_ASLEEP. - Callers of asleep() and await() have been converted to calling tsleep(). The only caller outside of M_ASLEEP was the ata driver, which called both asleep() and await() with spl-raised, so there was no need for the asleep() and await() pair. M_ASLEEP was unused. Reviewed by: jasone, peter	2001-08-10 06:56:12 +00:00
John Baldwin	8ec48c6dbf	- Remove asleep(), await(), and M_ASLEEP. - Callers of asleep() and await() have been converted to calling tsleep(). The only caller outside of M_ASLEEP was the ata driver, which called both asleep() and await() with spl-raised, so there was no need for the asleep() and await() pair. M_ASLEEP was unused. Reviewed by: jasone, peter	2001-08-10 06:37:05 +00:00
Thomas Moestl	59fa485c3e	Add a missing semicolon to unbreak the kernel build with INVARIANTS (which was unfortunately turned off in the configuration I used for the last test build). Spotted by: jake Pointy hat to: tmm	2001-08-05 03:55:02 +00:00
John Baldwin	bd8e0d5871	Whitespace fixes.	2001-08-04 20:49:29 +00:00
Thomas Moestl	b4c53a8111	Add a zdestroy() function to the zone allocator. This is needed for the unload case of modules that use their own zones. It has been tested with the nfs module.	2001-08-04 20:17:05 +00:00
Alfred Perlstein	61ce6eeee3	Fixups for the initial allocation by dillon: 1) allocate fewer buckets 2) when failing to allocate swap zone, keep reducing the zone by a third rather than a half in order to reduce the chance of allocating way too little. I also moved around some code for readability. Suggested by: dillon Reviewed by: dillon	2001-08-02 07:54:58 +00:00
Jake Burkholder	3a9b5daf48	Oops. Last commit to vm_object.c should have got these files too. Remove the use of atomic ops to manipulate vm_object and vm_page flags. Giant is required here, so they are superfluous. Discussed with: dillon	2001-07-31 04:09:52 +00:00
Jake Burkholder	b06805ad34	Remove the use of atomic ops to manipulate vm_object and vm_page flags. Giant is required here, so they are superfluous. Discussed with: dillon	2001-07-31 04:03:53 +00:00
Ian Dowse	a4821e444e	Permit direct swapping to NFS regular files using swapon(2). We already allow this for NFS swap configured via BOOTP, so it is known to work fine. For many diskless configurations it is more flexible to have the client set up swapping itself; it can recreate a sparse swap file to save on server space for example, and it works with a non-NFS root filesystem such as an in-kernel filesystem image.	2001-07-28 20:18:38 +00:00
Assar Westerlund	d3e5863fa9	make vm_page_select_cache static Requested by: bde	2001-07-23 12:34:31 +00:00
Assar Westerlund	0379d76358	(vm_page_select_cache): add prototype	2001-07-21 17:08:15 +00:00
Benno Rice	1f246456a5	The i386-specific includes in this file were "fixed" by bracketing them with #ifndef __alpha__. Fix this for the rest of the world by turning it into #ifdef __i386__. Reviewed by: obrien	2001-07-15 04:11:51 +00:00
Dag-Erling Smørgrav	bf3009895e	Fix missing newline and terminator at the end of the vm.zone sysctl.	2001-07-09 03:37:33 +00:00
Matt Jacob	f343cf2135	Apply field bandages to the includes so compiles happen on alpha.	2001-07-05 06:13:44 +00:00
Matthew Dillon	7197571105	Move vm_page_zero_idle() from machine-dependent sections to a machine-independent source file, vm/vm_zeroidle.c. It was exactly the same for all platforms and updating them all was getting annoying.	2001-07-05 01:32:42 +00:00
Matthew Dillon	6d03d577a5	Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc). Also removed some spl's and added some VM mutexes, but they are not actually used yet, so this commit does not really make any operational changes to the system. vm_page.c relates to vm_page_t manipulation, including high level deactivation, activation, etc... vm_pageq.c relates to finding free pages and acquiring exclusive access to a page queue (exclusivity part not yet implemented). And the world still builds... :-)	2001-07-04 23:27:09 +00:00
Matthew Dillon	1b40f8c036	Change inlines back into mainline code in preparation for mutexing. Also, most of these inlines had been bloated in -current far beyond their original intent. Normalize prototypes and function declarations to be ANSI only (half already were). And do some general cleanup. (kernel size also reduced by 50-100K, but that isn't the prime intent)	2001-07-04 20:15:18 +00:00
Matthew Dillon	54d9214595	whitespace / register cleanup	2001-07-04 19:00:13 +00:00
Matthew Dillon	0cddd8f023	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.	2001-07-04 16:20:28 +00:00
John Baldwin	b62b9b648b	Fix a XXX comment by moving the initialization of the number of pbuf's for the vnode pager to a new vnode pager init method instead of making it a hack in getpages().	2001-07-03 07:35:56 +00:00
John Baldwin	6d541bf1ae	- Protect all accesses to nsw_[rw]count{,_{,a}sync} with the pbuf mutex. - Don't drop the vm mutex while grabbing the pbuf mutex to manipulate said variables.	2001-06-22 21:12:19 +00:00
Bosko Milekic	08442f8a82	Introduce numerous SMP friendly changes to the mbuf allocator. Namely, introduce a modified allocation mechanism for mbufs and mbuf clusters; one which can scale under SMP and which offers the possibility of resource reclamation to be implemented in the future. Notable advantages: o Reduce contention for SMP by offering per-CPU pools and locks. o Better use of data cache due to per-CPU pools. o Much less code cache pollution due to excessively large allocation macros. o Framework for `grouping' objects from same page together so as to be able to possibly free wired-down pages back to the system if they are no longer needed by the network stacks. Additional things changed with this addition: - Moved some mbuf specific declarations and initializations from sys/conf/param.c into mbuf-specific code where they belong. - m_getclr() has been renamed to m_get_clrd() because the old name is really confusing. m_getclr() HAS been preserved though and is defined to the new name. No tree sweep has been done "to change the interface," as the old name will continue to be supported and is not deprecated. The change was merely done because m_getclr() sounds too much like "m_get a cluster." - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and systat(1) (see TODO below). - Fixed systat(1) to display number of "free mbufs" based on new per-CPU stat structures. - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported per-CPU stat structures. All infos are fetched via sysctl. TODO (in order of priority): - Re-enable mbtypes statistics in both netstat(1) and systat(1) after introducing an SMP friendly way to collect the mbtypes stats under the already introduced per-CPU locks (i.e. hopefully don't use atomic() - it seems too costly for a mere stat update, especially when other locks are already present). - Optionally have systat(1) display not only "total free mbufs" but also "total free mbufs per CPU pool." - Fix minor length-fetching issues in netstat(1) related to recently re-enabled option to read mbuf stats from a core file. - Move reference counters at least for mbuf clusters into an unused portion of the cluster itself, to save space and need to allocate a counter. - Look into introducing resource freeing possibly from a kproc. Reviewed by (in parts): jlemon, jake, silby, terry Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha) Preliminary performance measurements: jlemon (and me, obviously) URL: http://people.freebsd.org/~bmilekic/mb_alloc/	2001-06-22 06:35:32 +00:00
John Baldwin	ad6c5bbede	Don't lock around swap_pager_swap_init() that is only called once during the pagedaemon's startup code since it calls malloc which results in lock order reversals.	2001-06-20 23:34:06 +00:00
John Baldwin	69a78d4666	Put the scheduler, vmdaemon, and pagedaemon kthreads back under Giant for now. The proc locking isn't actually safe yet and won't be until the proc locking is finished.	2001-06-20 00:48:20 +00:00
Matthew Dillon	ef6a93ef81	Cleanup the tabbing	2001-06-11 19:17:05 +00:00
Matthew Dillon	ff2b5645b5	Two fixes to the out-of-swap process termination code. First, start killing processes a little earlier to avoid a deadlock. Second, when calculating the 'largest process' do not just count RSS. Instead count the RSS + SWAP used by the process. Without this the code tended to kill small inconsequential processes like, oh, sshd, rather than one of the many 'eatmem 200MB' I run on a whim :-). This fix has been extensively tested on -stable and somewhat tested on -current and will be MFCd in a few days. Shamed into fixing this by: ps	2001-06-09 18:06:58 +00:00
Thomas Moestl	5c5c8fa826	Change the way information about swap devices is exported to be more canonical: define a versioned struct xswdev, and add a sysctl node handler that allows the user to get this structure for a certain device index by specifying this index as last element of the MIB. This new node handler, vm.swap_info, replaces the old vm.nswapdev and vm.swapdevX.* (where X was the index) sysctls.	2001-06-01 22:53:10 +00:00
Thomas Moestl	d279178df7	Clean up the code exporting interrupt statistics via sysctl a bit: - move the sysctl code to kern_intr.c - do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine the length of the intrcnt array - move the declarations of intrnames, eintrnames, intrcnt and eintrcnt from machine-dependent include files to sys/interrupt.h - remove the hw.nintr sysctl, it is not needed. - fix various style bugs Requested by: bde Reviewed by: bde (some time ago)	2001-06-01 13:23:28 +00:00
John Baldwin	342a1480aa	Don't hold the VM lock across VOP's and other things that can sleep.	2001-05-29 16:58:25 +00:00
John Baldwin	190609dd48	Stick VM syscalls back under Giant if the BLEED option is not defined.	2001-05-24 18:04:29 +00:00
Matthew Dillon	ac8f990bde	This patch implements O_DIRECT about 80% of the way. It takes a patchset Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece. Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency. I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet. Submitted by: tegge, dillon	2001-05-24 07:22:27 +00:00
John Baldwin	e6b961ffbd	- Assert Giant is held in the vnode pager methods. - Lock the VM while walking down a vm_object's backing_object list in vnode_pager_lock().	2001-05-23 22:51:23 +00:00
John Baldwin	3614c6fcbb	- Add in several asserts of vm_mtx. - Assert Giant in vm_pageout_scan() for the vnode hacking that it does. - Don't hold vm_mtx around vget() or vput(). - Lock Giant when calling vm_pageout_scan() from the pagedaemon. Also, lock curproc while setting the P_BUFEXHAUST flag. - For now we still hold Giant for all of the vm_daemon. When process limits are locked we will only need Giant for swapout_procs().	2001-05-23 22:48:28 +00:00
John Baldwin	60517fd1f7	- Assert that the vm lock is held for all of _vm_object_allocate(). - Restore the previous order of setting up a new vm_object. The previous had a small bug where we zero'd out the flags after we set the OBJ_ONEMAPPING flag. - Add several asserts of vm_mtx. - Assert Giant is held rather than locking and unlocking it in a few places. - Add in some #ifdef objlocks code to lock individual vm objects when vm objects each have their own lock someday. - Don't bother acquiring the allproc lock for a ddb command. If DDB blocked on the lock, that would be worse than having an inconsistent allproc list.	2001-05-23 22:42:10 +00:00
John Baldwin	21c641b2a9	- Add lots of vm_mtx assertions. - Add a few KTR tracepoints to track the addition and removal of vm_map_entry's and the creation and free'ing of vmspace's. - Adjust a few portions of code so that we update the process' vmspace pointer to its new vmspace before freeing the old vmspace.	2001-05-23 22:38:00 +00:00
John Baldwin	3a2189d451	- Lock the VM around the pmap_swapin_proc() call in faultin(). - Don't lock Giant in the scheduler() function except for when calling faultin(). - In swapout_procs(), lock the VM before the process to avoid a lock order violation. - In swapout_procs(), release the allproc lock before calling swapout(). We restart the process scan after swapping out a process. - In swapout_procs(), un #if 0 the code to bump the vmspace reference count and lock the process' vm structures. This bug was introduced by me and could result in the vmspace being free'd out from under a running process. - Fix an old bug where the vmspace reference count was not free'd if we failed the swap_idle_threshold2 test.	2001-05-23 22:35:45 +00:00
John Baldwin	b608320d4a	- Fix the sw_alloc_interlock to actually lock itself when the lock is acquired. - Assert Giant is held in the strategy, getpages, and putpages methods and the getchainbuf, flushchainbuf, and waitchainbuf functions. - Always call flushchainbuf() w/o the VM lock.	2001-05-23 22:31:15 +00:00
John Baldwin	6d556da5c2	Assert Giant is held for the device pager alloc and getpages methods since we call the mmap method of the cdevsw of the device we are mmap'ing.	2001-05-23 22:27:52 +00:00
John Baldwin	e4ca250d4b	- Obtain Giant in mmap() syscall while messing with file descriptors and vnodes. - Fix an old bug that would leak a reference to a fd if the vnode being mmap'd wasn't of type VREG or VCHR. - Lock Giant in vm_mmap() around calls into the VM that can call into pager routines that need Giant or into other VM routines that need Giant. - Replace code that used a goto to jump around the else branch of a test to use an else branch instead.	2001-05-23 22:17:43 +00:00
John Baldwin	bb10bb4978	Acquire Giant around vm_map_remove() inside of the obreak() syscall for vm_object_terminate().	2001-05-23 22:13:10 +00:00
John Baldwin	576f0c5fa4	Take a more conservative approach and still lock Giant around VM faults for now.	2001-05-23 22:09:18 +00:00
John Baldwin	c52f090cfb	Set the phys_pager_alloc_lock to 1 when it is acquired so that it is actually locked.	2001-05-23 19:52:23 +00:00
Alfred Perlstein	c5e62505ad	acquire Giant when playing with the buffercache and doing IO. use msleep against the vm mutex while waiting for a page IO to complete.	2001-05-23 10:28:11 +00:00
Alfred Perlstein	240e0fdd93	acquire vm mutex in swp_pager_async_iodone. Don't call swp_pager_async_iodone with the mutex held.	2001-05-22 19:01:26 +00:00
John Baldwin	86e92ee7e1	Remove duplicate include and sort includes.	2001-05-22 07:21:46 +00:00
John Baldwin	7d4ad42de5	Sort includes.	2001-05-22 07:01:11 +00:00
John Baldwin	12635f9c89	Unlock the VM lock at the end of munlock() instead of locking it again.	2001-05-22 06:07:36 +00:00
John Baldwin	874468957d	Sort includes from previous commit.	2001-05-22 05:35:45 +00:00
John Baldwin	4edf4a58e6	Sort includes.	2001-05-22 00:56:25 +00:00
Alfred Perlstein	2395531439	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb	2001-05-19 01:28:09 +00:00
John Baldwin	ea7549540f	- Use a timeout for the tsleep in scheduler() instead of having vmmeter() wakeup proc0 by hand to enforce the timeout. - When swapping out a process, keep the process locked via the proc lock from the first checks up until we clear PS_INMEM and set PS_SWAPPING in swapout(). The swapout() function now must be called with the proc lock held and releases it before returning. - Comment out the code to attempt to lock a process' VM structures before swapping out. It is broken in that it releases the lock after obtaining it. If it does grab the lock, it needs to hand it off to swapout() instead of releasing it. This can be revisited when the VM is locked as this is a valid test to perform. It also causes a lock order reversal for the time being, which is the immediate cause for temporarily disabling it.	2001-05-18 00:08:38 +00:00
John Baldwin	1c58e4e550	During the code to pick a process to kill when memory is exhausted, keep the process in question locked as soon as we find it and determine it to be eligible until we actually kill it. To avoid deadlock, we don't block on the process lock but skip any process that is already locked during our search.	2001-05-17 22:49:03 +00:00
John Baldwin	c96d52a913	- Use PROC_LOCK_ASSERT instead of a direct mtx_assert. - Don't hold Giant in the swapper daemon while we walk the list of processes looking for a process to swap back in. - Don't bother grabbing the sched_lock while checking a process' sleep time in swapout_procs() to ensure that a process has been idle for at least swap_idle_threshold2 before swapping it out. If we lose the race we just let a process stay in memory until the next call of swapout_procs(). - Remove some unneeded spl's, sched_lock does all the locking needed in this case.	2001-05-15 22:20:44 +00:00
Poul-Henning Kamp	a468031ce8	Actually biofinish(struct bio , struct devstat , int error) is more general than the bioerror(). Most of this patch is generated by scripts.	2001-05-06 20:00:03 +00:00
Mark Murray	559034b748	Putting sys/lockmgr.h in here allows us to depollute userland includes a bit. OK'ed by: bde	2001-05-03 11:33:51 +00:00
Mark Murray	fb919e4d5a	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h from kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)	2001-05-01 08:13:21 +00:00
Greg Lehey	60fb0ce365	Revert consequences of changes to mount.h, part 2. Requested by: bde	2001-04-29 02:45:39 +00:00
Alfred Perlstein	93c7ba9f09	Address a number of problems with sysctl_vm_zone(). The zone allocator's locks should be leaflocks, meaning that they should never be held when entering into another subsystem, however the sysctl grabs the zone global mutex and individual zone mutexes while holding the lock it calls SYSCTL_OUT which recurses into the VM subsystem in order to wire user memory to do a safe copy. This can block and cause lock order reversals. To fix this: lock zone global. get a count of the number of zones. unlock global. allocate temporary storage. format and SYSCTL_OUT the banner. lock global. traverse list. make sure we haven't looped more than the initial count taken to avoid overflowing the allocated buffer. lock each node. read values and format into buffer. unlock individual node. unlock global. format and SYSCTL_OUT the rest of the data. free storage. return. Other problems included not checking for errors when doing sysctl out of the column header. Fixed. Inconsistent termination of the copied string. Fixed. Objected to by: des (for not using sbuf) Since the output is not variable length and I'm actually over allocating significantly and I'd like to get this fixed now, I'll work on the sbuf conversion at a later date. I would not object to someone else taking it upon themselves to convert it to sbuf. I hold no MAINTAINER rights to this code (for now).	2001-04-27 22:24:45 +00:00
Greg Lehey	d98dc34f52	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
Alfred Perlstein	d8d5fa8805	vnode_pager_freepage() is really vm_page_free() in disguise, nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()	2001-04-19 06:18:23 +00:00
Alfred Perlstein	a9fa2c05fc	Protect pager object creation with sx locks. Protect pager object list manipulation with a mutex. It doesn't look possible to combine them under a single sx lock because creation may block and we can't have the object list manipulation block on anything other than a mutex because of interrupt requests.	2001-04-18 20:24:16 +00:00
Alfred Perlstein	305dd591ee	Fix the botched rev 1.59 where I made it such that without INVARIANTS the map is never locked. Submitted by: tegge	2001-04-18 05:30:24 +00:00
Poul-Henning Kamp	f84e29a06c	This patch removes the VOP_BWRITE() vector. VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing. This patch takes a more general approach and adds a bp->b_op vector where more methods can be added. The success of this patch depends on bp->b_op being initialized all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.	2001-04-17 08:56:39 +00:00
Alfred Perlstein	cc64b484dd	use TAILQ_FOREACH, fix a comment's location	2001-04-15 10:22:04 +00:00
Alfred Perlstein	971dd34298	if/panic -> KASSERT	2001-04-13 11:15:40 +00:00
Alfred Perlstein	2a758ebe58	protect pbufs and associated counts with a mutex	2001-04-13 10:23:32 +00:00
Alfred Perlstein	493607117e	use %p for pointer printf, include sys/systm.h for printf proto	2001-04-13 10:22:14 +00:00
Alfred Perlstein	7d26b6a450	Use a macro wrapper over printf along with KASSERT to reduce the amount of code here.	2001-04-13 08:07:37 +00:00
Alfred Perlstein	b28cb1ca07	remove truncated part from comment	2001-04-12 21:50:03 +00:00
John Baldwin	1005a129e5	Convert the allproc and proctree locks from lockmgr locks to sx locks.	2001-03-28 11:52:56 +00:00
John Baldwin	f34fa851e0	Catch up to header include changes: - <sys/mutex.h> now requires <sys/systm.h> - <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>	2001-03-28 09:17:56 +00:00
Thomas Moestl	368d2edce4	Export intrnames and intrcnt as sysctls (hw.nintr, hw.intrnames and hw.intrcnt). Approved by: rwatson	2001-03-23 03:45:17 +00:00
Matthew Dillon	b823bbd6be	Fix a lock reversal problem in the VM subsystem related to threaded programs. There is a case during a fork() which can cause a deadlock. From Tor - The workaround that consists of setting a flag in the vm map that indicates that a fork is in progress and using that mark in the page fault handling to force a revalidation failure. That change will only affect (pessimize) page fault handling during fork for threaded (linuxthreads style) applications and applications using aio_*(). Submitted by: tegge	2001-03-14 06:48:53 +00:00
Matthew Dillon	1a484d28dd	Temporarily remove the vm_map_simplify() call from vm_map_insert(). The call is correct, but it interferes with the massive hack called vm_map_growstack(). The call will be returned after our stack handling code is fixed. Reported by: tegge	2001-03-14 06:09:42 +00:00
Ian Dowse	d30344bdfa	When creating a shadow vm_object in vmspace_fork(), only one reference count was transferred to the new object, but both the new and the old map entries had pointers to the new object. Correct this by transferring the second reference. This fixes a panic that can occur when mmap(2) is used with the MAP_INHERIT flag. PR: i386/25603 Reviewed by: dillon, alc	2001-03-09 18:25:54 +00:00
John Baldwin	136d8f42b9	Unrevert the pmap_map() changes. They weren't broken on x86. Sense beaten into me by: peter	2001-03-07 05:29:21 +00:00
John Baldwin	4a01ebd482	Back out the pmap_map() change for now, it isn't completely stable on the i386.	2001-03-07 01:04:17 +00:00
John Baldwin	968950e5d1	- Rework pmap_map() to take advantage of direct-mapped segments on supported architectures such as the alpha. This allows us to save on kernel virtual address space, TLB entries, and (on the ia64) VHPT entries. pmap_map() now modifies the passed in virtual address on architectures that do not support direct-mapped segments to point to the next available virtual address. It also returns the actual address that the request was mapped to. - On the IA64 don't use a special zone of PV entries needed for early calls to pmap_kenter() during pmap_init(). This gets us in trouble because we end up trying to use the zone allocator before it is initialized. Instead, with the pmap_map() change, the number of needed PV entries is small enough that we can get by with a static pool that is used until pmap_init() is complete. Submitted by: dfr Debugging help: peter Tested by: me	2001-03-06 06:06:42 +00:00
Alfred Perlstein	8125b1e66e	Simplify vm_object_deallocate(), by decrementing the refcount first. This allows some of the conditionals to be combined.	2001-03-04 20:25:23 +00:00
Andrew Gallatin	c909b97167	Allocate vm_page_array and vm_page_buckets from the end of the biggest chunk of memory, rather than from the start. This fixes problems allocating bouncebuffers on alphas where there is only 1 chunk of memory (unlike PCs where there is generally at least one small chunk and a large chunk). Having 1 chunk had been fatal, because these structures take over 13MB on a machine with 1GB of ram. This doesn't leave much room for other structures and bounce buffers if they're at the front. Reviewed by: dfr, anderson@cs.duke.edu, silence on -arch Tested by: Yoriaki FUJIMORI <fujimori@grafin.fujimori.cache.waseda.ac.jp>	2001-03-01 19:21:24 +00:00
Matthew Dillon	5bf53acb74	If we intend to make the page writable without requiring another fault, make sure that PG_NOSYNC is properly set. Previously we only set it for a write-fault, but this can occur on a read-fault too. (will be MFCd prior to 4.3 freeze)	2001-02-28 04:26:43 +00:00
Robert Watson	edfa785a8e	Introduce per-swap area accounting in the VM system, and export this information via the vm.nswapdev sysctl (number of swap areas) and vm.swapdevX nodes (where X is the device), which contain the MIBs dev, blocks, used, and flags. These changes are required to allow top and other userland swap-monitoring utilities to run without setgid kmem. Submitted by: Thomas Moestl <tmoestl@gmx.net> Reviewed by: freebsd-audit	2001-02-23 18:46:21 +00:00
Dag-Erling Smørgrav	2f9564de0f	Fix formatting bugs introduced in sysctl_vm_zone() by the previous commit. Also, if SYSCTL_OUT() returns a non-zero value, stop at once.	2001-02-22 14:44:39 +00:00

1508 Commits