freebsd-dev

Author	SHA1	Message	Date
John Baldwin	e806d352d2	Fix several places to ignore processes that are not yet fully constructed. MFC after: 1 week	2011-04-06 17:47:22 +00:00
Edward Tomasz Napierala	1ba5ad4210	Add accounting for most of the memory-related resources. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-05 20:23:59 +00:00
Konstantin Belousov	50a57dfbec	Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h. Update the outdated comments describing MAXSLP and the process selection algorithm for swap out. Comments wording and reviewed by: alc	2011-01-09 12:50:44 +00:00
Alan Cox	eb00b276ab	Eliminate page queues locking around most calls to vm_page_free().	2010-05-06 18:58:32 +00:00
Alan Cox	e3ef0d2fcf	Push down the acquisition of the page queues lock into vm_page_unwire(). Update the comment describing which lock should be held on entry to vm_page_wire(). Reviewed by: kib	2010-05-05 03:45:46 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Konstantin Belousov	6fb8c0c117	When doing kstack swapin, read as much pages in one run as possible. Suggested and reviewed by: alc (previous version) Tested by: pho MFC after: 2 weeks	2010-04-29 09:59:16 +00:00
Alan Cox	7b7d5b6c58	vm_thread_swapout() can safely dirty the page before rather than after acquiring the page queues lock.	2010-04-19 00:18:14 +00:00
Juli Mallett	ca596a25f0	o) Add a VM find-space option, VMFS_TLB_ALIGNED_SPACE, which searches the address space for an address as aligned by the new pmap_align_tlb() function, which is for constraints imposed by the TLB. [1] o) Add a kmem_alloc_nofault_space() function, which acts like kmem_alloc_nofault() but allows the caller to specify which find-space option to use. [1] o) Use kmem_alloc_nofault_space() with VMFS_TLB_ALIGNED_SPACE to allocate the kernel stack address on MIPS. [1] o) Make pmap_align_tlb() on MIPS align addresses so that they do not start on an odd boundary within the TLB, so that they are suitable for insertion as wired entries and do not have to share a TLB entry with another mapping, assuming they are appropriately-sized. o) Eliminate md_realstack now that the kstack will be appropriately-aligned on MIPS. o) Increase the number of guard pages to 2 so that we retain the proper alignment of the kstack address. Reviewed by: [1] alc X-MFC-after: Making sure alc has not come up with a better interface.	2010-04-18 22:32:07 +00:00
Alan Cox	6c68c971cb	Simplify vm_thread_swapin().	2010-04-13 06:48:37 +00:00
Alan Cox	ac45ee97c9	Initialize the virtual memory-related resource limits in a single place. Previously, one of these limits was initialized in two places to a different value in each place. Moreover, because an unsigned int was used to represent the amount of pageable physical memory, some of these limits were incorrectly initialized on 64-bit architectures. (Currently, this error is masked by login.conf's default settings.) Make vm_thread_swapin() and vm_thread_swapout() static. Submitted by: bde (an earlier version) Reviewed by: kib	2010-04-11 16:26:07 +00:00
Marcel Moolenaar	1a4fcaebe3	o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache(), that translates the vm_map_t argumument to pmap_t. o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked). The key property of this change is that the I-cache is made coherent after writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent before any writes happen. The difference is key when the I-cache prefetches.	2009-10-21 18:38:02 +00:00
Konstantin Belousov	8a945d109c	Reintroduce the r196640, after fixing the problem with my testing. Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equial to the actual, and reallocate the stack if sizes differ [1]. This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size. Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, system with reasonable amount of the threads get lower latency of thread creation, while still not exhausting significant portion of KVA for unused kstacks. Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho (and retested according to new test scenarious) MFC after: 1 week	2009-09-01 11:41:51 +00:00
Konstantin Belousov	f25fa6abb2	Reverse r196640 and r196644 for now.	2009-08-29 21:53:08 +00:00
Konstantin Belousov	c3cf0b476f	Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equial to the actual, and reallocate the stack if sizes differ [1]. This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size. Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, system with reasonable amount of the threads get lower latency of thread creation, while still not exhausting significant portion of KVA for unused kstacks. Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho MFC after: 1 week	2009-08-29 13:28:02 +00:00
Alan Cox	0a2e596a93	Eliminate unnecessary obfuscation when testing a page's valid bits.	2009-06-07 19:38:26 +00:00
Alan Cox	d1a6e42ddd	If vm_pager_get_pages() returns VM_PAGER_OK, then there is no need to check the page's valid bits. The page is guaranteed to be fully valid. (For the record, this is documented in vm/vm_pager.h's comments.)	2009-06-06 20:13:14 +00:00
Alan Cox	7a122777c9	vm_thread_swapin() needn't validate any pages. The pages are already validated by vm_pager_get_pages().	2009-06-05 17:06:20 +00:00
John Baldwin	da7bbd2c08	If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in. Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()). With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock. Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been locked. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal(). Discussed with: jeff Glanced at by: sam Tested by: Jurgen Weber jurgen - ish com au MFC after: 2 weeks	2008-08-05 20:02:31 +00:00
Jeff Roberson	8df78c41d6	- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc. Sponsored by: Nokia	2008-04-17 04:20:10 +00:00
Jeff Roberson	374ae2a393	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
Jeff Roberson	6617724c5f	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.	2008-03-12 10:12:01 +00:00
Jeff Roberson	c5aa6b581d	- Pass the priority argument from sleep() into sleepq and down into sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost. - Change the priority arguments to cv_ to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now. - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now. - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation. - While we're modifying sleepq, change signal and broadcast to both return with the lock held as the lock was held on enter. Reviewed by: jhb, peter	2008-03-12 06:31:06 +00:00
Konstantin Belousov	89b57fcf01	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb	2007-11-05 11:36:16 +00:00
Jeff Roberson	258853ab1c	- Redefine p_swtime and td_slptime as p_swtick and td_slptick. This changes the units from seconds to the value of 'ticks' when swapped in/out. ULE does not have a periodic timer that scans all threads in the system and as such maintaining a per-second counter is difficult. - Change computations requiring the unit in seconds to subtract ticks and divide by hz. This does make the wraparound condition hz times more frequent but this is still in the range of several months to years and the adverse effects are minimal. Approved by: re	2007-09-21 05:07:07 +00:00
Jeff Roberson	b61ce5b0e6	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)	2007-09-17 05:31:39 +00:00
Jeff Roberson	982d11f836	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
Jeff Roberson	f0393f063a	- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched. Discussed with: julian - Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers. Suggested by: jhb Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.	2007-01-23 08:46:51 +00:00
Julian Elischer	ad1e7d285a	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
John Birrell	8460a577a4	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
Alan Cox	66bdd5d619	The page queues lock is no longer required by vm_page_wakeup().	2006-10-23 05:27:31 +00:00
Tor Egge	57051fdc4b	Close race between vmspace_exitfree() and exit1() and races between vmspace_exitfree() and vmspace_free() which could result in the same vmspace being freed twice. Factor out part of exit1() into new function vmspace_exit(). Attach to vmspace0 to allow old vmspace to be freed earlier. Add new function, vmspace_acquire_ref(), for obtaining a vmspace reference for a vmspace belonging to another process. Avoid changing vmspace refcount from 0 to 1 since that could also lead to the same vmspace being freed twice. Change vmtotal() and swapout_procs() to use vmspace_acquire_ref(). Reviewed by: alc	2006-05-29 21:28:56 +00:00
Alan Cox	da61b9a69e	Use sf_buf_alloc() instead of vm_map_find() on exec_map to create the ephemeral mappings that are used as the source for three copy operations from kernel space to user space. There are two reasons for making this change: (1) Under heavy load exec_map can fill up causing vm_map_find() to fail. When it fails, the nascent process is aborted (SIGABRT). Whereas, this reimplementation using sf_buf_alloc() sleeps. (2) Although it is possible to sleep on vm_map_find()'s failure until address space becomes available (see kmem_alloc_wait()), using sf_buf_alloc() is faster. Furthermore, the reimplementation uses a CPU private mapping, avoiding a TLB shootdown on multiprocessors. Problem uncovered by: kris@ Reviewed by: tegge@ MFC after: 3 weeks	2005-12-16 18:34:14 +00:00
Stephan Uphoff	d13ec71369	Use low level constructs borrowed from interrupt threads to wait for work in proc0. Remove the TDP_WAKEPROC0 workaround.	2005-05-23 23:01:53 +00:00
Alan Cox	10c447fac2	Swap in can occur safely without Giant. Release Giant on entry to scheduler().	2005-05-22 21:06:07 +00:00
Alan Cox	35cf2323f8	Remove GIANT_REQUIRED from swapout_procs().	2005-05-22 00:30:50 +00:00
Alan Cox	75337a5677	Guard against address wrap in kernacc(). Otherwise, a program accessing a bad address range through /dev/kmem can panic the machine. Submitted by: Mark W. Krentel Reported by: Kris Kennaway MFC after: 1 week	2005-01-22 19:21:29 +00:00
Warner Losh	60727d8b86	/* -> /*- for license, minor formatting changes	2005-01-07 02:29:27 +00:00
David Schultz	6004362e66	Don't include sys/user.h merely for its side-effect of recursively including other headers.	2004-11-27 06:51:39 +00:00
David Schultz	9799b417d5	Disable U area swapping and remove the routines that create, destroy, copy, and swap U areas. Reviewed by: arch@	2004-11-20 02:29:00 +00:00
Alan Cox	d19ef81437	The synchronization provided by vm object locking has eliminated the need for most calls to vm_page_busy(). Specifically, most calls to vm_page_busy() occur immediately prior to a call to vm_page_remove(). In such cases, the containing vm object is locked across both calls. Consequently, the setting of the vm page's PG_BUSY flag is not even visible to other threads that are following the synchronization protocol. This change (1) eliminates the calls to vm_page_busy() that immediately precede a call to vm_page_remove() or functions, such as vm_page_free() and vm_page_rename(), that call it and (2) relaxes the requirement in vm_page_remove() that the vm page's PG_BUSY flag is set. Now, the vm page's PG_BUSY flag is set only when the vm object lock is released while the vm page is still in transition. Typically, this is when it is undergoing I/O.	2004-11-03 20:17:31 +00:00
Alan Cox	ddf4bb37c8	Use VM_ALLOC_NOBUSY instead of calling vm_page_wakeup().	2004-10-24 18:46:32 +00:00
David Schultz	8daa8c602a	The zone from which proc structures are allocated is marked UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should never be called. Move an assertion from proc_fini() to proc_dtor() and garbage-collect the rest of the unreachable code. I have retained vm_proc_dispose(), since I consider its disuse a bug.	2004-09-19 18:34:17 +00:00
Alan Cox	94ddc7076d	Push Giant deep into vm_forkproc(), acquiring it only if the process has mapped System V shared memory segments (see shmfork_myhook()) or requires the allocation of an ldt (see vm_fault_wire()).	2004-09-03 05:11:32 +00:00
Alan Cox	9be60284a6	Giant is no longer required by vm_waitproc() and vmspace_exitfree(). Eliminate it acquisition and release around vm_waitproc() in kern_wait().	2004-07-30 20:31:02 +00:00
Alan Cox	1a276a3f91	- Use atomic ops for updating the vmspace's refcnt and exitingcnt. - Push down Giant into shmexit(). (Giant is acquired only if the vmspace contains shm segments.) - Eliminate the acquisition of Giant from proc_rwmem(). - Reduce the scope of Giant in exit1(), uncovering the destruction of the address space.	2004-07-27 03:53:41 +00:00
John Baldwin	d202e0cccc	- Don't use a variable to point to the user area that we only use once. Just use p2->p_uarea directly instead. - Remove an old and mostly bogus assertion regarding p2->p_sigacts. - Use RANGEOF macro ala fork1() to clean up bzero/bcopy of p_stats.	2004-07-02 03:45:07 +00:00
David Schultz	17d9d0d049	Update a stale comment. The heuristic to swap processes out based on the number of pages already paged out was broken in rev 1.10 and removed in rev 1.11.	2004-06-27 01:58:12 +00:00
Julian Elischer	fa88511615	Nice, is a property of a process as a whole.. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.	2004-06-16 00:26:31 +00:00
Brian Feldman	d9b2500eef	In r1.190, vslock() and vsunlock() were bogusly made to do a "user wire" and a "system unwire." Make this a "system wire" and "system unwire." Reviewed by: alc	2004-05-07 11:43:24 +00:00
Warner Losh	05eb3785e7	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-06 20:15:37 +00:00
Don Lewis	bb734798af	Make overflow/wraparound checking more robust and unbreak len=0 in vslock(), mlock(), and munlock(). Reviewed by: bde	2004-03-15 09:11:23 +00:00
Don Lewis	f0ea4612ef	Style(9) changes. Pointed out by: bde	2004-03-15 06:43:51 +00:00
Don Lewis	ce8660e395	Revert to the original vslock() and vsunlock() API with the following exceptions: Retain the recently added vslock() error return. The type of the len argument should be size_t, not u_int. Suggested by: bde	2004-03-15 06:42:40 +00:00
Alan Cox	fcffa790e9	Retire pmap_pinit2(). Alpha was the last platform that used it. However, ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps, there has been no reason for having this function on Alpha. Briefly, when pmap_growkernel() relied upon the list of all processes to find and update the various pmaps to reflect a growth in the kernel's valid address space, pmap_init2() served to avoid a race between pmap initialization and pmap_growkernel(). Specifically, pmap_pinit2() was responsible for initializing the kernel portions of the pmap and pmap_pinit2() was called after the process structure contained a pointer to the new pmap for use by pmap_growkernel(). Thus, an update to the kernel's address space might be applied to the new pmap unnecessarily, but an update would never be lost.	2004-03-07 21:06:48 +00:00
Don Lewis	169299398a	Undo the merger of mlock()/vslock and munlock()/vsunlock() and the introduction of kern_mlock() and kern_munlock() in src/sys/kern/kern_sysctl.c 1.150 src/sys/vm/vm_extern.h 1.69 src/sys/vm/vm_glue.c 1.190 src/sys/vm/vm_mmap.c 1.179 because different resource limits are appropriate for transient and "permanent" page wiring requests. Retain the kern_mlock() and kern_munlock() API in the revived vslock() and vsunlock() functions. Combine the best parts of each of the original sets of implementations with further code cleanup. Make the mclock() and vslock() implementations as similar as possible. Retain the RLIMIT_MEMLOCK check in mlock(). Move the most strigent test, which can return EAGAIN, last so that requests that have no hope of ever being satisfied will not be retried unnecessarily. Disable the test that can return EAGAIN in the vslock() implementation because it will cause the sysctl code to wedge. Tested by: Cy Schubert <Cy.Schubert AT komquats.com>	2004-03-05 22:03:11 +00:00
Don Lewis	47934cef8f	Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms	2004-02-26 00:27:04 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
Bruce Evans	9a44a82b61	Fixed breakage of scheduling in rev.1.29 of subr_4bsd.c. The "scheduler" here has very little to do with scheduling. It is actually the swapper, and it really must be the last SYSINIT'ed item like its comment says, since proc0 metamorphoses into swapper by calling scheduler() last in mi_start(), and scheduler() never returns.. Rev.1.29 of subr_4bsd.c broke this by adding another SI_ORDER_FIRST item (kproc_start() for schedcpu_thread() onto the SI_SUB_RUN_SCHEDULER_LIST. The sorting of SYSINITs with identical orders (at all levels) is apparently nondeterministic, so this resulted in schedule() sometimes being called second last and schedcpu_thread() not being called at all. This quick fix just changes the code to almost match the comment (SI_ORDER_FIRST -> SI_ORDER_ANY). "LAST" is misspelled "ANY", and there is no way to ensure that there is only 1 very lst SYSINIT. A more complete fix would remove the SYSINIT obfuscation.	2004-01-29 12:35:11 +00:00
Alan Cox	d88346020b	- The Open Group Base Specifications Issue 6 specifies that an munmap(2) must return EINVAL if size is zero. Submitted by: tegge - In order to avoid a race condition in multithreaded applications, the check and removal operations by munmap(2) must be in the same critical section. To accomodate this, vm_map_check_protection() is modified to require its caller to obtain at least a read lock on the map.	2003-11-10 01:37:40 +00:00
Bruce M Simpson	5d264f84f3	Revert previous commit. Come back vslock(), all is forgiven. Pointy hat to: bms	2003-10-05 12:41:08 +00:00
Bruce M Simpson	aac7652ecd	Retire vslock() and vsunlock() with extreme prejudice. Discussed with: pete	2003-10-05 09:47:54 +00:00
Alan Cox	ef13663bb6	Three unrelated changes to vm_proc_new(): (1) add vm object locking on the U pages object; (2) reorganize such that the U pages object is created and filled in one block; and (3) remove an unnecessary clearing of PG_ZERO.	2003-08-18 01:31:43 +00:00
Marcel Moolenaar	710338e94f	In vm_thread_swap{in\|out}(), remove the alpha specific conditional compilation and replace it with a call to cpu_thread_swap{in\|out}(). This allows us to add similar code on ia64 without cluttering the code even more.	2003-08-16 23:15:15 +00:00
Bruce M Simpson	abd498aa71	Add the mlockall() and munlockall() system calls. - All those diffs to syscalls.master for each architecture are necessary. This needed clarification; the stub code generation for mlockall() was disabled, which would prevent applications from linking to this API (suggested by mux) - Giant has been quoshed. It is no longer held by the code, as the required locking has been pushed down within vm_map.c. - Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES to express their intention explicitly. - Inspected at the vmstat, top and vm pager sysctl stats level. Paging-in activity is occurring correctly, using a test harness. - The RES size for a process may appear to be greater than its SIZE. This is believed to be due to mappings of the same shared library page being wired twice. Further exploration is needed. - Believed to back out of allocations and locks correctly (tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC). PR: kern/43426, standards/54223 Reviewed by: jake, alc Approved by: jake (mentor) MFC after: 2 weeks	2003-08-11 07:14:08 +00:00
Poul-Henning Kamp	8f60c087e6	Change the layout policy of the swap_pager from a hardcoded width striping to a per device round-robin algorithm. Because of the policy of not attempting to retain previous swap allocation on page-out, this means that a newly added swap device almost instantly takes its 1/N share of the I/O load but it takes somewhat longer for it to assume it's 1/N share of the pages if there is plenty of space on the other devices. Change the 8G total swapspace limitation to 8G per device instead by using a per device blist rather than one global blist. This reduces the memory footprint by 75% (typically a couple hundred kilobytes) for the common case with one swapdevice but NSWAPDEV=4. Remove the compile time constant limit of number of swap devices, there is no limit now. Instead of a fixed size array, store the per swapdev structure in a TAILQ. Total swap space is still addressed by a 32 bit page number and therefore the upper limit is now 2^42 bytes = 16TB (for i386). We still do not allocate the first page of each device in order to give some amount of protection to any bsdlabel at the start of the device. A new device is appended after the existing devices in the swap space, no attempt is made to fill in holes left behind by swapoff (this can trivially be changed should it ever become a problem). The sysctl vm.nswapdev now reflects the number of currently configured swap devices. Rename vm_swap_size to swap_pager_avail for consistency with other exported names. Change argument type for vm_proc_swapin_all() and swap_pager_isswapped() to be a struct swdevt pointer rather than an index. Not changed: we are still using blists to manage the free space, but since the swapspace is no longer fragmented by the striping different resource managers might fare better.	2003-08-03 13:35:31 +00:00
Peter Wemm	15a7ad60fb	Add #include "opt_kstack_pages.h" and "opt_kstack_max_pages.h" to remain in sync with the backend machdep code. When cpu_thread_init() does not have the same idea of KSTACK_PAGES as the thing that created the kstack, all hell breaks loose. Bad alc! no cookie! :-)	2003-07-31 01:25:05 +00:00
Alan Cox	a04a7f2242	Use #ifdef __alpha__, not __alpha.	2003-06-15 00:12:42 +00:00
Alan Cox	49a2507bd1	Migrate the thread stack management functions from the machine-dependent to the machine-independent parts of the VM. At the same time, this introduces vm object locking for the non-i386 platforms. Two details: 1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The different machine-dependent implementations used various combinations of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set KSTACK_GUARD_PAGES to 0. 2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In 5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed to vm_page_alloc() or vm_page_grab().	2003-06-14 23:23:55 +00:00
Alan Cox	89f4fca265	Move the _new_altkstack() and _dispose_altkstack() functions out of the various pmap implementations into the machine-independent vm. They were all identical.	2003-06-14 06:20:25 +00:00
Alan Cox	8630c1173e	Add vm object locking to various pagers' "get pages" methods, i386 stack management functions, and a u area management function.	2003-06-13 03:02:28 +00:00
David E. O'Brien	874651b13c	Use __FBSDID().	2003-06-11 23:50:51 +00:00
Peter Wemm	77e2a274d0	GC unused cpu_wait() function	2003-06-11 05:20:33 +00:00
Poul-Henning Kamp	0b074f6c93	Remove unused variables Found by: FlexeLint	2003-05-31 19:51:05 +00:00
John Baldwin	90af4afacb	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)	2003-05-13 20:36:02 +00:00
Alexander Kabaev	104a9b7e3e	Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h> Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>	2003-04-29 13:36:06 +00:00
Alan Cox	17cd3642fe	- Lock the vm_object when performing swap_pager_isswapped(). - Assert that the vm_object is locked in swap_pager_isswapped().	2003-04-28 17:13:53 +00:00
John Baldwin	8f88740381	- Don't bother using the proc lock to test just P_SYSTEM as that is set in fork1() and never changes. - The proc lock is enough to cover reading p_state, so push down sched_lock into the PRS_NORMAL case of the switch on p_state.	2003-04-25 20:06:30 +00:00
Alan Cox	6a07e90d63	- Lock the vm_object when iterating over its list of resident pages.	2003-04-25 16:30:02 +00:00
John Baldwin	11edc1e0d7	Fix compiling in the NO_SWAPPING case. Submitted by: bde (partially)	2003-04-23 18:21:41 +00:00
John Baldwin	664f718ba1	- Always call faultin() in _PHOLD() if PS_INMEM is clear. This closes a race where a thread could assume that a process was swapped in by PHOLD() when it actually wasn't fully swapped in yet. - In faultin(), always msleep() if PS_SWAPPINGIN is set instead of doing this check after bumping p_lock in the PS_INMEM == 0 case. Also, sched_lock is only needed for setting and clearning swapping PS_* flags and the swap thread inhibitor. - Don't set and clear the thread swap inhibitor in the same loops as the pmap_swapin/out_thread() since we have to do it under sched_lock. Instead, mimic the treatment of the PS_INMEM flag and use separate loops to set the inhibitors when clearing PS_INMEM and clear the inhibitors when setting PS_INMEM. - swapout() now returns with the proc lock held as it holds the lock while adjusting the swapping-related PS_* flags so that the proc lock can be used to test those flags. - Only use the proc lock to check the swapping-related PS_* flags in several places. - faultin() no longer requires sched_lock to be held by callers. - Rename PS_SWAPPING to PS_SWAPPINGOUT to be less ambiguous now that we have PS_SWAPPINGIN.	2003-04-22 20:00:26 +00:00
Tom Rhodes	9faaf3b3c8	Add some tunable descriptions. Submitted by: hmp Discussed with: bde	2003-04-17 15:44:22 +00:00
Tom Rhodes	2a3eeaa240	Pre-content whitespace commit. Discussed with: bde	2003-04-17 15:39:12 +00:00
Alfred Perlstein	c3dfdfd132	use 'void *' instead of 'caddr_t' for useracc, kernacc, vslock and vsunlock.	2003-01-21 11:34:57 +00:00
Matthew Dillon	2d5c7e4506	Close the remaining user address mapping races for physical I/O, CAM, and AIO. Still TODO: streamline useracc() checks. Reviewed by: alc, tegge MFC after: 7 days	2003-01-20 17:46:48 +00:00
Alan Cox	dc907f6632	- Hold the page queues lock around vm_page_wakeup().	2002-12-24 04:24:58 +00:00
Matthew Dillon	92da00bb24	This is David Schultz's swapoff code which I am finally able to commit. This should be considered highly experimental for the moment. Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU> MFC after: 3 weeks	2002-12-15 19:17:57 +00:00
John Baldwin	1c865ac70e	- Check that a process isn't a new process (p_state == PRS_NEW) before trying to acquire it's proc lock since the proc lock may not have been constructed yet. - Split up the one big comment at the top of the loop and put the pieces in the right order above the various checks. Reported by: kris (1)	2002-10-22 14:31:32 +00:00
Julian Elischer	d524d69b16	Remove old useless debugging code	2002-10-14 20:31:54 +00:00
Poul-Henning Kamp	37c841831f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
Jake Burkholder	05ba50f522	Use the fields in the sysentvec and in the vm map header in place of the constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS. This is mainly so that they can be variable even for the native abi, based on different machine types. Get stack protections from the sysentvec too. This makes it trivial to map the stack non-executable for certain abis, on machines that support it.	2002-09-21 22:07:17 +00:00
Julian Elischer	71fad9fdee	Completely redo thread states. Reviewed by: davidxu@freebsd.org	2002-09-11 08:13:56 +00:00
Seigo Tanimura	b1f99ebe2b	- Do not swap out a process if it is in creation. The process may have no address space yet. - Check whether a process is a system process prior to dereferencing its p_vmspace. Aio assumes that only the curthread switches the address space of a system process.	2002-09-09 09:05:06 +00:00
Julian Elischer	1faf202ea9	Use UMA as a complex object allocator. The process allocator now caches and hands out complete process structures including substructures . i.e. it get's the process structure with the first thread (and soon KSE) already allocated and attached, all in one hit. For the average non threaded program (non KSE that is) the allocated thread and its stack remain attached to the process, even when the process is unused and in the process cache. This saves having to allocate and attach it later, effectively bringing us (hopefully) close to the efficiency of pre-KSE systems where these were a single structure. Reviewed by: davidxu@freebsd.org, peter@freebsd.org	2002-09-06 07:00:37 +00:00
David Xu	1279572a92	s/SGNL/SIG/ s/SNGL/SINGLE/ s/SNGLE/SINGLE/ Fix abbreviation for P_STOPPED_* etc flags, in original code they were inconsistent and difficult to distinguish between them. Approved by: julian (mentor)	2002-09-05 07:30:18 +00:00
Alan Cox	239b5b9707	o Setting PG_MAPPED and PG_WRITEABLE on pages that are mapped and unmapped by pmap_qenter() and pmap_qremove() is pointless. In fact, it probably leads to unnecessary pmap_page_protect() calls if one of these pages is paged out after unwiring. Note: setting PG_MAPPED asserts that the page's pv list may be non-empty. Since checking the status of the page's pv list isn't any harder than checking this flag, the flag should probably be eliminated. Alternatively, PG_MAPPED could be set by pmap_enter() exclusively rather than various places throughout the kernel.	2002-07-31 18:46:47 +00:00

1 2 3 4 5 ...

300 Commits