freebsd-skq

Author	SHA1	Message	Date
jeff	a7a8bac81f	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
attilio	7dd8ed88a9	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
kib	f13486a222	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00
attilio	d5fdf88def	Add functions sx_xlock_sig() and sx_slock_sig(). These functions are intended to do the same actions of sx_xlock() and sx_slock() but with the difference to perform an interruptible sleep, so that sleep can be interrupted by external events. In order to support these new featueres, some code renstruction is needed, but external API won't be affected at all. Note: use "void" cast for "int" returning functions in order to avoid tools like Coverity prevents to whine. Requested by: rwatson Tested by: rwatson Reviewed by: jhb Approved by: jeff (mentor)	2007-05-31 09:14:48 +00:00
alc	08b2128056	Eliminate the reactivation of cached pages in vm_fault_prefault() and vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED). With the exception of calls to vm_map_pmap_enter() from madvise(MADV_WILLNEED), vm_fault_prefault() and vm_map_pmap_enter() are both used to create speculative mappings. Thus, always reactivating cached pages is a mistake. In principle, cached pages should only be reactivated by an actual access. Otherwise, the following misbehavior can occur. On a hard fault for a text page the clustering algorithm fetches not only the required page but also several of the adjacent pages. Now, suppose that one or more of the adjacent pages are never accessed. Ultimately, these unused pages become cached pages through the efforts of the page daemon. However, the next activation of the executable reactivates and maps these unused pages. Consequently, they are never replaced. In effect, they become pinned in memory.	2007-05-22 04:45:59 +00:00
jeff	953418f0d5	- rename VMCNT_DEC to VMCNT_SUB to reflect the count argument. Suggested by: julian@ Contributed by: attilio@	2007-05-20 22:33:42 +00:00
jeff	e1996cb960	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
rwatson	e8ccc2cbbd	Update stale comment on protecting UMA per-CPU caches: we now use critical sections rather than mutexes.	2007-05-09 22:53:34 +00:00
alc	b34f6f7ab1	Define every architecture as either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE depending on whether the physical address space is densely or sparsely populated with memory. The effect of this definition is to determine which of two implementations of vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by defining VM_PHYSSEG_DENSE, and a new implementation that trades off time for space is obtained by defining VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety of my Itanium 2's memory to be used. Previously, only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based systems to boot without crashing. This change is a combination of Nathan Whitehorn's patch and my own work in perforce. Discussed with: kmacy, marius, Nathan Whitehorn PR: 112194	2007-05-05 19:50:28 +00:00
alc	c32653f226	Remove some code from vmspace_fork() that became redundant after revision 1.334 modified _vm_map_init() to initialize the new vm map's flags to zero.	2007-04-26 05:48:17 +00:00
rwatson	bd3cfaaef8	Audit pathnames looked up in swapon(2) and swapoff(2). MFC after: 2 weeks Obtained from: TrustedBSD Project	2007-04-23 14:41:34 +00:00
alc	a10280e050	Correct contigmalloc2()'s implementation of M_ZERO. Specifically, contigmalloc2() was always testing the first physical page for PG_ZERO, not the current page of interest. Submitted by: Michael Plass PR: 81301 MFC after: 1 week	2007-04-19 05:39:54 +00:00
alc	6e14b3e802	Correct two comments. Submitted by: Michael Plass	2007-04-19 04:52:47 +00:00
keramida	f572f4ce6a	Minor typo fix, noticed while I was going through *_pager.c files.	2007-04-10 12:34:51 +00:00
pjd	a4513e9da8	When KVA is exhausted, try the vm_lowmem event for the last time before panicing. This helps a lot in ZFS stability.	2007-04-05 20:52:51 +00:00
pjd	b6f1c3fccc	Fix a problem for file systems that don't implement VOP_BMAP() operation. The problem is this: vm_fault_additional_pages() calls vm_pager_has_page(), which calls vnode_pager_haspage(). Now when VOP_BMAP() returns an error (eg. EOPNOTSUPP), vnode_pager_haspage() returns TRUE without initializing 'before' and 'after' arguments, so we have some accidental values there. This bascially was causing this condition to be meet: if ((rahead + rbehind) > ((cnt.v_free_count + cnt.v_cache_count) - cnt.v_free_reserved)) { pagedaemon_wakeup(); [...] } (we have some random values in rahead and rbehind variables) I'm not entirely sure this is the right fix, maybe we should just return FALSE in vnode_pager_haspage() when VOP_BMAP() fails? alc@ knows about this problem, maybe he will be able to come up with a better fix if this is not the right one.	2007-04-05 20:49:46 +00:00
alc	ecbefa2cc5	Prevent a race between vm_object_collapse() and vm_object_split() from causing a crash. Suppose that we have two objects, obj and backing_obj, where backing_obj is obj's backing object. Further, suppose that backing_obj has a reference count of two. One being the reference held by obj and the other by a map entry. Now, suppose that the map entry is deallocated and its reference removed by vm_object_deallocate(). vm_object_deallocate() recognizes that the only remaining reference is from a shadow object, obj, and calls vm_object_collapse() on obj. vm_object_collapse() executes if (backing_object->ref_count == 1) { /* * If there is exactly one reference to the backing * object, we can collapse it into the parent. */ vm_object_backing_scan(object, OBSC_COLLAPSE_WAIT); vm_object_backing_scan(OBSC_COLLAPSE_WAIT) executes if (op & OBSC_COLLAPSE_WAIT) { vm_object_set_flag(backing_object, OBJ_DEAD); } Finally, suppose that either vm_object_backing_scan() or vm_object_collapse() sleeps releasing its locks. At this instant, another thread executes vm_object_split(). It crashes in vm_object_reference_locked() on the assertion that the object is not dead. If, however, assertions are not enabled, it crashes much later, after the object has been recycled, in vm_object_deallocate() because the shadow count and shadow list are inconsistent. Reviewed by: tegge Reported by: jhb MFC after: 1 week	2007-03-27 08:55:17 +00:00
alc	989c09dfd5	Two small changes to vm_map_pmap_enter(): 1) Eliminate an unnecessary check for fictitious pages. Specifically, only device-backed objects contain fictitious pages and the object is not device-backed. 2) Change the types of "psize" and "tmpidx" to vm_pindex_t in order to prevent possible wrap around with extremely large maps and objects, respectively. Observed by: tegge (last summer)	2007-03-25 19:33:40 +00:00
alc	584a970755	vm_page_busy() no longer requires the page queues lock to be held. Reduce the scope of the page queues lock in vm_fault() accordingly.	2007-03-23 06:11:25 +00:00
alc	efc8daaecb	Change the order of lock reacquisition in vm_object_split() in order to simplify the code slightly. Add a comment concerning lock ordering.	2007-03-22 07:02:43 +00:00
alc	2210798637	Use PCPU_LAZY_INC() to update page fault statistics.	2007-03-05 18:55:14 +00:00
jhb	54e4ea54b6	Use pause() in vm_object_deallocate() to yield the CPU to the lock holder rather than a tsleep() on &proc0. The only wakeup on &proc0 is intended to awaken the swapper, not random threads blocked in vm_object_deallocate().	2007-02-27 19:40:26 +00:00
jhb	9081d44243	Use pause() rather than tsleep() on stack variables and function pointers.	2007-02-27 17:23:29 +00:00
alc	573a964db6	Change the way that unmanaged pages are created. Specifically, immediately flag any page that is allocated to a OBJT_PHYS object as unmanaged in vm_page_alloc() rather than waiting for a later call to vm_page_unmanage(). This allows for the elimination of some uses of the page queues lock. Change the type of the kernel and kmem objects from OBJT_DEFAULT to OBJT_PHYS. This allows us to take advantage of the above change to simplify the allocation of unmanaged pages in kmem_alloc() and kmem_malloc(). Remove vm_page_unmanage(). It is no longer used.	2007-02-25 06:14:58 +00:00
alc	e4e74de1c2	Change the page's CLEANCHK flag from being a page queue mutex synchronized flag to a vm object mutex synchronized flag.	2007-02-22 06:15:52 +00:00
alc	c0ed1b65cf	Enable vm_page_free() and vm_page_free_zero() to be called on some pages without the page queues lock being held, specifically, pages that are not contained in a vm object and not a member of a page queue.	2007-02-18 05:54:42 +00:00
alc	7913035c29	Remove a stale comment. Add punctuation to a nearby comment.	2007-02-17 19:37:00 +00:00
alc	3e7d1b7ebd	Relax the page queue lock assertions in vm_page_remove() and vm_page_free_toq() to account for recent changes that allow vm_page_free_toq() to be called on some pages without the page queues lock being held, specifically, pages that are not contained in a vm object and not a member of a page queue. (Examples of such pages include page table pages, pv entry pages, and uma small alloc pages.)	2007-02-15 05:43:38 +00:00
alc	0250171bf6	Avoid the unnecessary acquisition of the free page queues lock when a page is actually being added to the hold queue, not the free queue. At the same time, avoid unnecessary tests to wake up threads waiting for free memory and the idle thread that zeroes free pages. (These tests will be performed later when the page finally moves from the hold queue to the free queue.)	2007-02-14 07:05:55 +00:00
rwatson	b9f6b60a84	Add uma_set_align() interface, which will be called at most once during boot by MD code to indicated detected alignment preference. Rather than cache alignment being encoded in UMA consumers by defining a global alignment value of (16 - 1) in UMA_ALIGN_CACHE, UMA_ALIGN_CACHE is now a special value (-1) that causes UMA to look at registered alignment. If no preferred alignment has been selected by MD code, a default alignment of (16 - 1) will be used. Currently, no hardware platforms specify alignment; architecture maintainers will need to modify MD startup code to specify an alignment if desired. This must occur before initialization of UMA so that all UMA zones pick up the requested alignment. Reviewed by: jeff, alc Submitted by: attilio	2007-02-11 20:13:52 +00:00
alc	909177e227	Use the free page queue mutex instead of the page queue mutex to synchronize sleeping and waking of the zero idle thread.	2007-02-11 05:18:40 +00:00
jhb	9c764c7fc3	- Move 'struct swdevt' back into swap_pager.h and expose it to userland. - Restore support for fetching swap information from crash dumps via kvm_get_swapinfo(3) to fix pstat -T/-s on crash dumps. Reviewed by: arch@, phk MFC after: 1 week	2007-02-07 17:43:11 +00:00
alc	2eb15b506b	Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the vm page queue free mutex instead of the vm page queue mutex.	2007-02-07 06:37:30 +00:00
alc	4881bd38e2	Change the free page queue lock from a spin mutex to a default (blocking) mutex. With the demise of Alpha support, there is no longer a reason for it to be a spin mutex.	2007-02-05 06:02:55 +00:00
mohans	83064ec323	Fix for problems that occur when all mbuf clusters migrate to the mbuf packet zone. Cluster allocations fail when this happens. Also processes that may have blocked on cluster allocations will never be woken up. Thanks to rwatson for an overview of the issue and pointers to the mbuma paper and his tool to dump out UMA zones. Reviewed by: andre@	2007-01-25 01:05:23 +00:00
mohans	9799fcf93c	Fix for a bug where only one process (of multiple) blocked on maxpages on a zone is woken up, with the rest never being woken up as a result of the ZFLAG_FULL flag being cleared. Wakeup all such blocked procsses instead. This change introduces a thundering herd, but since this should be relatively infrequent, optimizing this (by introducing a count of blocked processes, for example) may be premature. Reviewd by: ups@	2007-01-24 22:49:11 +00:00
jeff	474b917526	- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched. Discussed with: julian - Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers. Suggested by: jhb Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.	2007-01-23 08:46:51 +00:00
delphij	9856d14ea1	Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.	2007-01-17 15:05:52 +00:00
rwatson	d6f9a991c6	Remove uma_zalloc_arg() hack, which coerced M_WAITOK to M_NOWAIT when allocations were made using improper flags in interrupt context. Replace with a simple WITNESS warning call. This restores the invariant that M_WAITOK allocations will always succeed or die horribly trying, which is relied on by many UMA consumers. MFC after: 3 weeks Discussed with: jhb	2007-01-10 21:04:43 +00:00
alc	57699b6f9c	Declare the map entry created by kmem_init() for the range from VM_MIN_KERNEL_ADDRESS to the end of the kernel's bootstrap data as MAP_NOFAULT.	2007-01-07 07:32:04 +00:00
jhb	db17c1d997	- Add a new function uma_zone_exhausted() to see if a zone is full. - Add a printf in swp_pager_meta_build() to warn if the swapzone becomes exhausted so that there's at least a warning before a box that runs out of swapzone space before running out of swap space deadlocks. MFC after: 1 week Reviwed by: alc	2007-01-05 19:09:01 +00:00
alc	143ffef93b	Optimize vm_object_split(). Specifically, make the number of iterations equal to the number of physical pages that are renamed to the new object rather than the new object's virtual size.	2006-12-17 20:14:43 +00:00
alc	71d9f62bca	Simplify the computation of the new object's size in vm_object_split().	2006-12-16 08:17:07 +00:00
kmacy	d0a44b7b7c	Remove the requirement that phys_avail be sorted in ascending order by explicitly finding the lowest and highest addresses when calculating the size of the vm_pages array Reviewed by :alc	2006-12-08 08:44:47 +00:00
julian	396ed947f6	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
ru	01fad2a3f6	The clean_map has been made local to vm_init.c long ago.	2006-11-20 16:23:34 +00:00
ru	825e064fde	Remove a redundant pointer-type variable.	2006-11-20 08:33:55 +00:00
ru	e58221ee6f	When counting vm totals, skip unreferenced objects, including vnodes representing mounted file systems. Reviewed by: alc MFC after: 3 days	2006-11-20 00:16:00 +00:00
alc	ce5848ab99	There is no point in setting PG_REFERENCED on kmem_object pages because they are "unmanaged", i.e., non-pageable, pages. Remove a stale comment.	2006-11-13 00:27:02 +00:00
alc	6093953d36	Make pmap_enter() responsible for setting PG_WRITEABLE instead of its caller. (As a beneficial side-effect, a high-contention acquisition of the page queues lock in vm_fault() is eliminated.)	2006-11-12 21:48:34 +00:00

1 2 3 4 5 ...

2331 Commits