Commit Graph

2772 Commits

Author SHA1 Message Date
jhb
c17f46e472 Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by:	bde
2011-01-11 13:59:06 +00:00
kib
4f8260e700 Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h.
Update the outdated comments describing MAXSLP and the process
selection algorithm for swap out.

Comments wording and reviewed by:	alc
2011-01-09 12:50:44 +00:00
alc
2ff68e8630 Eliminate a redundant alignment directive on the page locks array. 2011-01-09 04:34:02 +00:00
alc
a3f4c0274d Eliminate the counting of vm_page_pa_tryrelock calls. We really don't
need it anymore.  Moreover, its implementation had a type mismatch, a
long is not necessarily an uint64_t.  (This mismatch was hidden by
casting.)  Move the remaining two counters up a level in the sysctl
hierarchy.  There is no reason for them to be under the vm.pmap node.

Reviewed by:	kib
2011-01-08 22:45:22 +00:00
alc
8cd48d17c8 Release the page lock early in vm_pageout_clean(). There is no reason to
hold this lock until the end of the function.

With the aforementioned change to vm_pageout_clean(), page locks don't need
to support recursive (MTX_RECURSE) or duplicate (MTX_DUPOK) acquisitions.

Reviewed by:	kib
2011-01-03 00:41:56 +00:00
alc
7fb616e10a Make a couple refinements to r216799 and r216810. In particular, revise
a comment and move it to its proper place.

Reviewed by:	kib
2011-01-01 17:39:38 +00:00
brucec
688b83a0a6 There can be more than 0x20000000 swap meta blocks allocated if a swap-backed
md(4) device is used. Don't panic when deallocating such a device if swap
has been used.

PR:	kern/133170
Discussed with:	kib
MFC after:	3 days
2011-01-01 16:59:05 +00:00
kib
f112887cab Remove OBJ_CLEANING flag. The vfs_setdirty_locked_object() is the only
consumer of the flag, and it used the flag because OBJ_MIGHTBEDIRTY
was cleared early in vm_object_page_clean, before the cleaning pass
was done. This is no longer true after r216799.

 Moreover, since OBJ_CLEANING is a flag, and not the counter, it could
be reset too prematurely when parallel vm_object_page_clean() are
performed.

Reviewed by:	alc (as a part of the bigger patch)
MFC after:	1 month (after r216799 is merged)
2010-12-29 22:26:49 +00:00
alc
cc9b2308bc There is no point in vm_contig_launder{,_page}() flushing held pages,
instead skip over them.  As long as a page is held, it can't be reclaimed by
contigmalloc(M_WAITOK).  Moreover, a held page may be undergoing
modification, e.g., vmapbuf(), so even if the hold were released before the
completion of contigmalloc(), the page might have to be flushed again.

MFC after:	3 weeks
2010-12-29 20:35:36 +00:00
kib
1a1716f8a9 Move the increment of vm object generation count into
vm_object_set_writeable_dirty().

Fix an issue where restart of the scan in vm_object_page_clean() did
not removed write permissions for newly added pages or, if the mapping
for some already scanned page changed to writeable due to fault.
Merge the two loops in vm_object_page_clean(), doing the remove of
write permission and cleaning in the same loop. The restart of the
loop then correctly downgrade writeable mappings.

Fix an issue where a second caller to msync() might actually return
before the first caller had actually completed flushing the
pages. Clear the OBJ_MIGHTBEDIRTY flag after the cleaning loop, not
before.

Calls to pmap_is_modified() are not needed after pmap_remove_write()
there.

Proposed, reviewed and tested by:	alc
MFC after:	1 week
2010-12-29 12:53:53 +00:00
alc
99000f1878 Correct a typo in vm_fault_quick_hold_pages().
Reported by:	Bartosz Stec
2010-12-28 20:02:30 +00:00
alc
8e22952d45 Move vm_object_print()'s prototype to the expected place. 2010-12-27 07:12:22 +00:00
alc
d835374cc6 Retire vm_fault_quick(). It's no longer used.
Reviewed by:	kib@
2010-12-25 23:54:50 +00:00
alc
971b02b7bc Introduce and use a new VM interface for temporarily pinning pages. This
new interface replaces the combined use of vm_fault_quick() and
pmap_extract_and_hold() throughout the kernel.

In collaboration with:	kib@
2010-12-25 21:26:56 +00:00
alc
be5201b0d1 Introduce vm_fault_hold() and use it to (1) eliminate a long-standing race
condition in proc_rwmem() and to (2) simplify the implementation of the
cxgb driver's vm_fault_hold_user_pages().  Specifically, in proc_rwmem()
the requested read or write could fail because the targeted page could be
reclaimed between the calls to vm_fault() and vm_page_hold().

In collaboration with:	kib@
MFC after:	6 weeks
2010-12-20 22:49:31 +00:00
alc
303f816df2 Implement and use a single optimized function for unholding a set of pages.
Reviewed by:	kib@
2010-12-17 22:41:22 +00:00
alc
fa0943bb62 Change memguard_fudge() so that it can handle km_max being zero. Not
every platform defines VM_KMEM_SIZE_MAX, and on those platforms km_max
will be zero.

Reviewed by:	mdf
Tested by:	marius
2010-12-14 05:47:35 +00:00
mlaier
da2dde653e Fix a long standing (from the original 4.4BSD lite sources) race between
vmspace_fork and vm_map_wire that would lead to "vm_fault_copy_wired: page
missing" panics.  While faulting in pages for a map entry that is being
wired down, mark the containing map as busy.  In vmspace_fork wait until the
map is unbusy, before we try to copy the entries.

Reviewed by:	kib
MFC after:	5 days
Sponsored by:	Isilon Systems, Inc.
2010-12-09 21:02:22 +00:00
jchandra
bfd4abe2ac Revert the vm/vm_page.c change in r216317.
This adds back changes in r216141, which was reverted by the above
check in.
2010-12-09 07:39:06 +00:00
jchandra
012e7effe8 swi_vm() for mips. 2010-12-09 06:54:06 +00:00
trasz
b2b2bfee2e Fix comment intentation. 2010-12-04 17:41:58 +00:00
imp
d804a05c38 To make minidumps work properly on mips for memory that's direct
mapped and entered via vm_page_setup, keep track of it like we do
for amd64.

# A separate commit will be made to move this to a capability-based ifdef
# rather than arch-based ifdef.

Submitted by:	alc@
MFC after:	1 week
2010-12-03 04:39:48 +00:00
trasz
e5fb69509c Replace pointer to "struct uidinfo" with pointer to "struct ucred"
in "struct vm_object".  This is required to make it possible to account
for per-jail swap usage.

Reviewed by:	kib@
Tested by:	pho@
Sponsored by:	FreeBSD Foundation
2010-12-02 17:37:16 +00:00
alc
9ebc09a72f Correct an error in the allocation of the vm_page_dump array in
vm_page_startup().  Specifically, the dump_avail array should be used
instead of the phys_avail array to calculate the size of vm_page_dump.  For
example, the pages for the message buffer are allocated prior to
vm_page_startup() by subtracting them from the last entry in the phys_avail
array, but the first thing that vm_page_startup() does after creating the
vm_page_dump array is to set the bits corresponding to the message buffer
pages in that array.  However, these bits might not actually exist in the
array, because the size of the array is determined by the current value in
the last entry of the phys_avail array.  In general, the only reason why
this doesn't always result in an out-of-bounds array access is that the size
of the vm_page_dump array is rounded up to the next page boundary.  This
change eliminates that dependence on rounding (and luck).

MFC after:	6 weeks
2010-12-01 03:35:19 +00:00
jchandra
c106d4cc70 Fix issue noted by alc while reviewing r215938:
The current implementation of vm_page_alloc_freelist() does not handle
order > 0 correctly. Remove order parameter to the function and use it
only for order 0 pages.

Submitted by:	alc
2010-11-28 05:51:31 +00:00
kib
26f88f4e38 After the sleep caused by encountering a busy page, relookup the page.
Submitted and reviewed by:	alc
Reprted and tested by:	pho
MFC after:	5 days
2010-11-24 12:25:17 +00:00
kib
e6b1bc24dd Eliminate the mab, maf arrays and related variables.
The change also fixes off-by-one error in the calculation of mreq.

Suggested and reviewed by:	alc
Tested by:	pho
MFC after:	5 days
2010-11-21 10:18:28 +00:00
alc
5bd8badd5e Optimize vm_object_terminate().
Reviewed by:	kib
MFC after:	1 week
2010-11-20 22:30:09 +00:00
kib
0116c3c4cf The runlen returned from vm_pageout_flush() might be zero legitimately,
when mreq page has status VM_PAGER_AGAIN.

MFC after:	5 days
2010-11-20 17:27:38 +00:00
alc
40263c2648 Reduce the amount of detail printed by vm_page_free_toq() when it panics.
Reviewed by:	kib
2010-11-19 17:49:08 +00:00
mlaier
67f52f02f5 Off by one page in vm_reserv_reclaim_contig(): Also reclaim reservations
with only a single free page if that satisfies the requested size.

MFC after:	3 days
Reviewed by:	alc
2010-11-19 04:30:33 +00:00
kib
3851a62f83 vm_pageout_flush() might cache the pages that finished write to the
backing storage. Such pages might be then reused, racing with the
assert in vm_object_page_collect_flush() that verified that dirty
pages from the run (most likely, pages with VM_PAGER_AGAIN status) are
write-protected still. In fact, the page indexes for the pages that
were removed from the object page list should be ignored by
vm_object_page_clean().

Return the length of successfully written run from vm_pageout_flush(),
that is, the count of pages between requested page and first page
after requested with status VM_PAGER_AGAIN. Supply the requested page
index in the array to vm_pageout_flush(). Use the returned run length
to forward the index of next page to clean in vm_object_page_clean().

Reported by:	avg
Reviewed by:	alc
MFC after:	1 week
2010-11-18 21:09:02 +00:00
kib
743ac64197 Only increment object generation count when inserting the page into
object page list.  The only use of object generation count now is a
restart of the scan in vm_object_page_clean(), which makes sense to do
on the page addition. Page removals do not affect the dirtiness of the
object, as well as manipulations with the shadow chain.

Suggested and reviewed by:	alc
MFC after:    1 week
2010-11-18 20:46:28 +00:00
kib
f8badff8ef Do not use __FreeBSD_version prefix for the special osrel version.
The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot
grok several constants with the prefix.

Reported and tested by:	    swell.k gmail com
MFC after:   1 week
2010-11-14 21:59:11 +00:00
kib
336fd1996d Use symbolic names instead of hardcoding values for magic p_osrel constants.
MFC after:   1 week
2010-11-14 18:24:12 +00:00
kib
81eb2c446b Implement a (soft) stack guard page for auto-growing stack mappings.
The unmapped page separates the tip of the stack and possible adjanced
segment, making some uses of stack overflow harder.  The stack growing
code refuses to expand the segment to the last page of the reseved
region when sysctl security.bsd.stack_guard_page is set to 1. The
default value for sysctl and accompanying tunable is 0.

Please note that mmap(MAP_FIXED) still can place a mapping right up to
the stack, making continuous region.

Reviewed by:	alc
MFC after:	1 week
2010-11-14 17:53:52 +00:00
alc
c275a07931 Enable reservation-based physical memory allocation. Even without the
creation of large page mappings in the pmap, it can provide modest
performance benefits.  In particular, for a "buildworld" on a 2x 1GHz
Ultrasparc IIIi it reduced the wall clock time by 2.2% and the system
time by 12.6%.

Tested by:	marius@
2010-11-10 17:57:34 +00:00
alc
bac85b90be In case the stack size reaches its limit and its growth must be restricted,
ensure that grow_amount is a multiple of the page size.  Otherwise, the
kernel may crash in swap_reserve_by_uid() on HEAD and FreeBSD 8.x, and
produce a core file with a missing stack on FreeBSD 7.x.

Diagnosed and reported by: jilles
Reviewed by:	kib
MFC after:	1 week
2010-11-07 21:40:34 +00:00
gonzo
4a291f4ab0 - Add minidump support for FreeBSD/mips 2010-11-07 03:09:02 +00:00
jhb
5497beb840 Update startup_alloc() to support multi-page allocations and allow internal
zones whose objects are larger than a page to use startup_alloc().  This
allows allocation of zone objects during early boot on machines with a large
number of CPUs since the resulting zone objects are larger than a page.

Submitted by:	trema
Reviewed by:	attilio
MFC after:	1 week
2010-11-04 15:33:50 +00:00
alc
80380396eb Correct some format strings used by sysctls.
MFC after:	1 week
2010-10-30 18:00:53 +00:00
jhb
2d6ab8853a - Make 'vm_refcnt' volatile so that compilers won't be tempted to treat
its value as a loop invariant.  Currently this is a no-op because
  'atomic_cmpset_int()' clobbers all memory on current architectures.
- Use atomic_fetchadd_int() instead of an atomic_cmpset_int() loop to drop
  a reference in vmspace_free().

Reviewed by:	alc
MFC after:	1 month
2010-10-21 17:29:32 +00:00
avg
9f9aed22bf PG_BUSY -> VPO_BUSY, PG_WANTED -> VPO_WANTED in manual pages and comments
Reviewed by:	alc
MFC after:	4 days
2010-10-20 05:17:23 +00:00
mdf
3f66b92677 uma_zfree(zone, NULL) should do nothing, to match free(9).
Noticed by:	Ron Steinke <rsteinke at isilon dot com>
MFC after:	3 days
2010-10-19 16:06:00 +00:00
lstewart
4171305db0 Change uma_zone_set_max to return the effective value of "nitems" after
rounding. The same value can also be obtained with uma_zone_get_max, but this
change avoids a caller having to make two back-to-back calls.

Sponsored by:	FreeBSD Foundation
Reviewed by:	gnn, jhb
2010-10-16 04:41:45 +00:00
lstewart
2770ad22da - Simplify implementation of uma_zone_get_max.
- Add uma_zone_get_cur which returns the current approximate occupancy of
  a zone. This is useful for providing stats via sysctl amongst other things.

Sponsored by:	FreeBSD Foundation
Reviewed by:	gnn, jhb
MFC after:	2 weeks
2010-10-16 04:14:45 +00:00
alc
b0606e2f16 If vm_map_find() is asked to allocate a superpage-aligned region of virtual
addresses that is greater than a superpage in size but not a multiple of
the superpage size, then vm_map_find() is not always expanding the kernel
pmap to support the last few small pages being allocated.  These failures
are not commonplace, so this was first noticed by someone porting FreeBSD
to a new architecture.  Previously, we grew the kernel page table in
vm_map_findspace() when we found the first available virtual address.
This works most of the time because we always grow the kernel pmap or page
table by an amount that is a multiple of the superpage size.  Now, instead,
we defer the call to pmap_growkernel() until we are committed to a range
of virtual addresses in vm_map_insert().  In general, there is another
reason to prefer calling pmap_growkernel() in vm_map_insert().  It makes
it possible for someone to do the equivalent of an mmap(MAP_FIXED) on the
kernel map.

Reported by:	Svatopluk Kraus
Reviewed by:	kib@
MFC after:	3 weeks
2010-10-04 16:49:40 +00:00
mdf
41d564973e Replace an XXX comment with the appropriate code.
Submitted by:	alc
2010-09-20 20:41:59 +00:00
alc
ec0b99e7f0 Allow a POSIX shared memory object that is opened for read but not for
write to nonetheless be mapped PROT_WRITE and MAP_PRIVATE, i.e.,
copy-on-write.

(This is a regression in the new implementation of POSIX shared memory
objects that is used by HEAD and RELENG_8.  This bug does not exist in
RELENG_7's user-level, file-based implementation.)

PR:		150260
MFC after:	3 weeks
2010-09-19 19:42:04 +00:00
alc
f3dba8dd74 Make refinements to r212824. In particular, don't make
vm_map_unlock_nodefer() part of the synchronization interface for maps.

Add comments to vm_map_unlock_and_wait() and vm_map_wakeup() describing
how they should be used.  In particular, describe the deferred deallocations
issue with vm_map_unlock_and_wait().

Redo the implementation of vm_map_unlock_and_wait() so that it passes
along the caller's file and line information, just like the other map
locking primitives.

Reviewed by:	kib
X-MFC after:	r212824
2010-09-19 17:43:22 +00:00