Commit Graph

2848 Commits

Author SHA1 Message Date
alc
57e8705396 Eliminate vm_phys_bootstrap_alloc(). It was a failed attempt at
eliminating duplicated code in the various pmap implementations.

Micro-optimize vm_phys_free_pages().

Introduce vm_phys_free_contig().  It is a fast routine for freeing an
arbitrary number of physically contiguous pages.  In particular, it
doesn't require the number of pages to be a power of two.

Use "u_long" instead of "unsigned long".

Bruce Evans (bde@) has convinced me that the "boundary" parameters
to kmem_alloc_contig(), vm_phys_alloc_contig(), and
vm_reserv_reclaim_contig() should be of type "vm_paddr_t" and not
"u_long".  Make this change.
2011-10-30 05:06:14 +00:00
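
Illustration for 57e8705396 (not part of the commit): a minimal user-space sketch of how a run of pages whose length is not a power of two could be split into naturally aligned power-of-two chunks before being handed to a buddy-style free routine. The names MAX_ORDER, struct chunk, chunk_order(), and split_contig() are hypothetical, and the real vm_phys_free_contig() may be organized differently.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 10 /* hypothetical cap: largest chunk is 2^10 pages */

struct chunk {
    unsigned long pfn; /* first page frame number of the chunk */
    int order;         /* chunk covers 2^order pages */
};

/*
 * Largest order such that pfn is aligned to 2^order pages and
 * 2^order does not exceed npages, capped at MAX_ORDER.
 */
int
chunk_order(unsigned long pfn, unsigned long npages)
{
    int order = 0;

    while (order < MAX_ORDER &&
        (pfn & ((1UL << (order + 1)) - 1)) == 0 &&
        (1UL << (order + 1)) <= npages)
        order++;
    return (order);
}

/*
 * Split [pfn, pfn + npages) into power-of-two chunks.  Writes at most
 * "max" entries to "out" and returns the number written.
 */
size_t
split_contig(unsigned long pfn, unsigned long npages, struct chunk *out,
    size_t max)
{
    size_t n = 0;

    while (npages > 0 && n < max) {
        int order = chunk_order(pfn, npages);

        out[n].pfn = pfn;
        out[n].order = order;
        n++;
        pfn += 1UL << order;
        npages -= 1UL << order;
    }
    return (n);
}
```

Under these assumptions, a run of 7 pages starting at page frame 3 is split into chunks of 1, 4, and 2 pages, each aligned to its own size.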
alc
15734d833a Use "u_long" instead of "unsigned long". 2011-10-28 22:36:15 +00:00
alc
f553bda045 Tidy up the comment at the head of vm_page_alloc, and mention that the
returned page has the flag VPO_BUSY set.
2011-10-27 17:29:19 +00:00
alc
841afea2d9 Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to
vm_page_alloc().  While I'm here, for the sake of consistency, always
specify the allocation class, such as VM_ALLOC_NORMAL, as the first of
the flags.
2011-10-27 16:39:17 +00:00
alc
0a7d6450d6 contigmalloc(9) and contigfree(9) are now implemented in terms of other
more general VM system interfaces.  So, their implementation can now
reside in kern_malloc.c alongside the other functions that are declared
in malloc.h.
2011-10-27 02:52:24 +00:00
alc
955d2b5af8 Speed up vm_page_cache() and vm_page_remove() by checking for a few
common cases that can be handled in constant time.  The insight being
that a page's parent in the vm object's tree is very often its
predecessor or successor in the vm object's ordered memq.

Tested by:	jhb
MFC after:	10 days
2011-10-25 16:35:08 +00:00
attilio
846436b8c8 VM_NRESERVLEVEL is used in this file but opt_vm.h is not included,
so the stub switch won't be handled correctly.
Include opt_vm.h.

Submitted by:	jeff
MFC after:	3 days
2011-10-22 22:00:35 +00:00
kib
8e118d38cf Control the execution permission of the readable segments for
i386 binaries on the amd64 and ia64 with the sysctl, instead of
unconditionally enabling it.

Reviewed by:	marcel
2011-10-15 12:35:18 +00:00
jhb
0d19c767ae Fix a typo in a comment. 2011-10-14 11:48:32 +00:00
marcel
d5c0a67c82 In sys_obreak() and when compiling for amd64 or ia64, when the process
is ILP32 (i.e. i386) grant execute permissions by default. The JDK 1.4.x
depends on being able to execute from the heap on i386.
2011-10-13 16:20:10 +00:00
glebius
2522c42334 Make memguard(9) capable of guarding uma(9) allocations. 2011-10-12 18:08:28 +00:00
kib
eca9de9f4b Style nit.
Submitted by:	jhb
MFC after:	2 weeks
2011-09-29 00:44:34 +00:00
kib
2e7081472b Fix grammar.
Submitted by:	bf
MFC after:	2 weeks
2011-09-28 16:12:15 +00:00
kib
e84b0ecd81 Use the trick of performing the atomic operation on the contained aligned
word to handle the dirty mask updates in vm_page_clear_dirty_mask().
Remove the vm page queue lock around the vm_page_dirty() call in vm_fault_hold(),
the sole purpose of which was to protect the dirty field on architectures that
do not provide short or byte-wide atomics.

Reviewed by:	alc, attilio
Tested by:	flo (sparc64)
MFC after:	2 weeks
2011-09-28 14:57:50 +00:00
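
Illustration for e84b0ecd81 (not part of the commit): a hedged C11 sketch of the "atomic operation on the containing aligned word" idea. Instead of a byte-wide atomic, the bits to clear are shifted into position and cleared with a word-wide atomic AND. The function names are hypothetical, and byte lanes are numbered by significance here; real code that starts from a byte address must also account for the machine's byte order when computing the shift.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Atomically clear "bits" in byte lane "lane" of *word, where lane 0 is
 * the least significant byte.  The other three lanes are left untouched
 * because their mask bits are all ones.
 */
void
clear_lane_bits(_Atomic uint32_t *word, unsigned lane, uint8_t bits)
{
    uint32_t mask;

    assert(lane < 4);
    mask = (uint32_t)bits << (lane * 8);
    atomic_fetch_and(word, ~mask);
}

/* Read back one byte lane of the word. */
uint8_t
read_lane(_Atomic uint32_t *word, unsigned lane)
{
    assert(lane < 4);
    return ((uint8_t)(atomic_load(word) >> (lane * 8)));
}
```

Because the update is a single read-modify-write on the word, concurrent updates to neighbouring lanes cannot be lost, which is what lets a lock be dropped in this kind of scheme.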
kib
3383aff65a Use the explicitly-sized types for the dirty and valid masks.
Requested by:	attilio
Reviewed by:	alc
MFC after:	2 weeks
2011-09-28 14:51:28 +00:00
kmacy
99851f359e In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal to kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
kib
a9d505a22a Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify aflags.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by:    alc, attilio
Tested by:      marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by:	re (bz)
2011-09-06 10:30:11 +00:00
kib
97e23a4cb5 Update some comments in swap_pager.c.
Reviewed and most wording by:	alc
MFC after:	1 week
Approved by:	re (bz)
2011-08-22 20:44:18 +00:00
kib
1d94389f8f Apply the limit to avoid the overflows in the radix tree subr_blist.c
after the conversion of the swap device size to the page size units,
not before. This lifts the limit on the usable swap partition size
from 32GB to 256GB, which is less depressing for modern systems.

Submitted by:   Alexander V. Chernikov <melifaro ipfw ru>
Reviewed by:    alc
Approved by:	re (bz)
MFC after:      2 weeks
2011-08-22 11:18:47 +00:00
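
Illustration for 1d94389f8f (not part of the commit): a worked-arithmetic sketch of why the unit in which a fixed block-count cap is applied matters. The cap value below (0x40000000 / 16 units) and the 512-byte and 4096-byte unit sizes are assumptions chosen because they reproduce the 32GB and 256GB figures in the message; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cap on the number of units the radix tree can track. */
#define MAX_TRACKED_UNITS (0x40000000u / 16u)

#define DISK_BLOCK_BYTES 512u  /* assumed device block size */
#define PAGE_BYTES       4096u /* assumed page size */

/* Bytes of swap usable when the cap is counted in units of unit_size. */
uint64_t
usable_swap_bytes(uint64_t unit_size)
{
    return ((uint64_t)MAX_TRACKED_UNITS * unit_size);
}
```

With these assumed values, capping in 512-byte blocks allows 2^26 * 2^9 = 2^35 bytes (32GB), while capping after conversion to 4096-byte pages allows 2^26 * 2^12 = 2^38 bytes (256GB).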
rwatson
4af919b491 Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
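
Illustration for 4af919b491 (not part of the commit): a minimal sketch of the two checks the message describes, namely testing a descriptor's rights against a required mask and narrowing a mapping's maximum protection to what those rights permit. All names and bit values here are hypothetical stand-ins, not the kernel's CAP_* or VM_PROT_* definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rights bits held by a descriptor. */
#define RIGHT_READ    0x1u
#define RIGHT_WRITE   0x2u
#define RIGHT_MAPEXEC 0x4u

/* Hypothetical protection bits for a mapping. */
#define P_READ  0x1u
#define P_WRITE 0x2u
#define P_EXEC  0x4u

/* Nonzero if every right in "needed" is present in "held". */
int
has_rights(uint32_t held, uint32_t needed)
{
    return ((held & needed) == needed);
}

/*
 * Narrow a mapping's maximum protection so that a later protection
 * change cannot exceed what the descriptor's rights allowed at the
 * time of mapping.
 */
uint32_t
narrow_maxprot(uint32_t maxprot, uint32_t held)
{
    uint32_t allowed = 0;

    if (held & RIGHT_READ)
        allowed |= P_READ;
    if (held & RIGHT_WRITE)
        allowed |= P_WRITE;
    if (held & RIGHT_MAPEXEC)
        allowed |= P_EXEC;
    return (maxprot & allowed);
}
```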
kib
f408aa11a3 - Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
  to VPO_UNMANAGED (and also making the flag protected by the vm object
  lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
  VPO_UNMANAGED. As a consequence, pmap code now can use just
  VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by:	alc
Tested by:	pho (x86, previous version), marius (sparc64),
    marcel (arm, ia64, powerpc), ray (mips)
Sponsored by:	The FreeBSD Foundation
Approved by:	re (bz)
2011-08-09 21:01:36 +00:00
alc
24eb641020 Fix an error in kmem_alloc_attr(). Unless "tries" is updated,
kmem_alloc_attr() could get stuck in a loop.

Approved by:	re (kib)
MFC after:	3 days
2011-08-07 00:11:39 +00:00
kib
96a4fe50dc Implement the linprocfs swaps file, providing information about the
configured swap devices in the Linux-compatible format.

Based on the submission by:	Robert Millan <rmh debian org>
PR:	kern/159281
Reviewed by:	bde
Approved by:	re (kensmith)
MFC after:	2 weeks
2011-08-01 19:12:15 +00:00
kib
645f499b5d Fix a race in the device pager allocation. If another thread won and
allocated the device pager for the given handle, then the object
fictitious pages list and the object membership in the global object
list still need to be initialized. Otherwise, dev_pager_dealloc() will
traverse uninitialized pointers.

Reported and tested by: pho
Reviewed by:    jhb
Approved by:	re (kensmith)
MFC after:      1 week
2011-07-30 14:13:57 +00:00
kib
e45048555a Extract the code to translate VM error into errno, into an exported
function vm_mmap_to_errno(). It is useful for the drivers that implement
mmap(2)-like functionality, to be able to return error codes consistent
with mmap(2).

Sponsored by:	The FreeBSD Foundation
No objections from:	alc
MFC after:	1 week
2011-07-10 20:49:13 +00:00
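
Illustration for e45048555a (not part of the commit): a user-space sketch of a status-to-errno translation helper of the kind the message describes. The enum and its values are hypothetical stand-ins for the kernel's VM return codes, and the particular mapping shown is an assumption modeled on the errors mmap(2) documents, not a copy of vm_mmap_to_errno().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for VM-layer status codes. */
enum vm_status {
    VM_OK = 0,
    VM_INVALID_ADDRESS,
    VM_PROTECTION_FAILURE,
    VM_NO_SPACE,
    VM_FAILURE
};

/* Translate a VM-layer status into an errno value (assumed mapping). */
int
vm_status_to_errno(enum vm_status rv)
{
    switch (rv) {
    case VM_OK:
        return (0);
    case VM_INVALID_ADDRESS:
    case VM_NO_SPACE:
        return (ENOMEM);
    case VM_PROTECTION_FAILURE:
        return (EACCES);
    default:
        return (EINVAL);
    }
}
```

Centralizing the translation in one function means a driver with mmap(2)-like entry points reports the same errno for the same failure as the system call does.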
kib
315e379ec2 Style.
MFC after:	3 days
2011-07-10 20:45:13 +00:00
kib
61e3fec296 Add a facility to disable processing page faults. When activated,
uiomove generates EFAULT if any accessed address is not mapped, as
opposed to handling the fault.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc (previous version)
2011-07-09 15:21:10 +00:00
trasz
4a17b24427 All the racct_*() calls need to happen with the proc locked. Fixing this
won't happen before 9.0.  This commit adds "#ifdef RACCT" around all the
"PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order
to avoid useless locking/unlocking in kernels built without "options RACCT".
2011-07-06 20:06:44 +00:00
attilio
9be9b5e188 Handle a race between device_pager and devsw in a more graceful manner:
return an error code rather than panic the kernel.

Sponsored by:	Sandvine Incorporated
Reviewed by:	kib
Tested by:	pho
MFC after:	2 weeks
2011-07-06 15:09:52 +00:00
alc
dd0c3b188c Initialize marker pages as held rather than fictitious/wired. Marking the
page as held is more useful as a safety precaution in case someone forgets
to check for PG_MARKER.

Reviewed by:	kib
2011-07-02 23:34:47 +00:00
alc
21902be08c Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings.  Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write().  It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by:	kib
2011-06-29 16:40:41 +00:00
alc
e7ea911039 Revert to using the page queues lock in vm_page_clear_dirty_mask() on
MIPS.  (At present, although atomic_clear_char() is defined by atomic.h
on MIPS, it is not actually implemented by support.S.)
2011-06-23 05:23:59 +00:00
alc
95eeb54f18 Precisely document the synchronization rules for the page's dirty field.
(Saying that the lock on the object that the page belongs to must be held
only represents one aspect of the rules.)

Eliminate the use of the page queues lock for atomically performing read-
modify-write operations on the dirty field when the underlying architecture
supports atomic operations on char and short types.

Document the fact that 32KB pages aren't really supported.

Reviewed by:	attilio, kib
2011-06-19 19:13:24 +00:00
kib
6b9465356d Assert that the page is VPO_BUSY or the page owner object is locked in
vm_page_undirty(). The assert is not precise because the VPO_BUSY owner is
not tracked, so the assertion does not catch the case when VPO_BUSY is
owned by another thread.

Reviewed by:	alc
2011-06-11 20:15:19 +00:00
kib
27bd440e10 Fix a bug in r222586. Lock the page owner object around the modification
of m->dirty.

Reported and tested by:	nwhitehorn
Reviewed by:	alc
2011-06-11 20:13:28 +00:00
kib
ad5bd06523 In the VOP_PUTPAGES() implementations, change the default error from
VM_PAGER_AGAIN to VM_PAGER_ERROR for the unwritten pages. Return
VM_PAGER_AGAIN for the partially written page. Always forward at least
one page in the loop of vm_object_page_clean().

VM_PAGER_ERROR causes the page reactivation and does not clear the
page dirty state, so the write is not lost.

The change fixes an infinite loop in vm_object_page_clean() when the
filesystem returns permanent errors for some page writes.

Reported and tested by:	gavin
Reviewed by:	alc, rmacklem
MFC after:	1 week
2011-06-01 21:00:28 +00:00
alc
2c928f6173 Correct an error in r222163. Unless UMA_MD_SMALL_ALLOC is defined,
startup_alloc() must be used until uma_startup2() is called.

Reported by:	jh
2011-05-22 17:46:16 +00:00
alc
a899800e2a 1. Prior to r214782, UMA did not support multipage allocations before
uma_startup2() was called.  Thus, setting the variable "booted" to true in
uma_startup() was ok on machines with UMA_MD_SMALL_ALLOC defined, because
any allocations made after uma_startup() but before uma_startup2() could be
satisfied by uma_small_alloc().  Now, however, some multipage allocations
are necessary before uma_startup2() just to allocate zone structures on
machines with a large number of processors.  Thus, a Boolean can no longer
effectively describe the state of the UMA allocator.  Instead, make "booted"
have three values to describe how far initialization has progressed.  This
allows multipage allocations to continue using startup_alloc() until
uma_startup2(), but single-page allocations may begin using
uma_small_alloc() after uma_startup().

2. With the aforementioned change, only a modest increase in boot pages is
necessary to boot UMA on a large number of processors.

3. Retire UMA_MD_SMALL_ALLOC_NEEDS_VM.  It has only been used between
r182028 and r204128.

Reviewed by:	attilio [1], nwhitehorn [3]
Tested by:	sbruno
2011-05-21 17:43:43 +00:00
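
Illustration for a899800e2a (not part of the commit): a sketch of replacing a Boolean "booted" flag with a three-valued state, and of how an allocation path could be chosen from it on a machine where a small (single-page) allocator is available. The enum names and choose_alloc() are hypothetical; only the behavior described in the message is modeled, and the path used once fully booted is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for how far allocator initialization has progressed. */
enum boot_state {
    BOOT_COLD,      /* before the first startup stage completes */
    BOOT_PAGEALLOC, /* first stage done, second stage not yet run */
    BOOT_FULL       /* second stage done */
};

/* Hypothetical labels for the allocation paths named in the message. */
enum alloc_path {
    PATH_STARTUP, /* boot-page allocator, e.g. startup_alloc() */
    PATH_SMALL,   /* single-page allocator, e.g. uma_small_alloc() */
    PATH_GENERAL  /* whatever the fully initialized system uses */
};

enum alloc_path
choose_alloc(enum boot_state booted, size_t npages)
{
    if (booted == BOOT_FULL)
        return (PATH_GENERAL);
    /* Single-page requests may leave the boot pages early. */
    if (booted == BOOT_PAGEALLOC && npages == 1)
        return (PATH_SMALL);
    /* Multipage requests keep using boot pages until the second stage. */
    return (PATH_STARTUP);
}
```

A two-valued flag cannot express the middle state, in which single-page and multipage requests must take different paths.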
alc
3c08fd7d05 Fix spelling errors. 2011-05-20 17:28:00 +00:00
alc
4037bb644a Eliminate a redundant #include. ("vm/vm_param.h" already includes
"machine/vmparam.h".)
2011-05-20 15:26:31 +00:00
mdf
3d3b036f95 Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk.  Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file.  (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by:	alc
MFC after:	1 week
MFC with:	r221853
2011-05-13 19:35:01 +00:00
mdf
9465c34001 Use a globally visible region of zeros for both /dev/zero and the md
device.  There are likely other kernel uses of "blob of zeros" that can
be converted.

Reviewed by:	alc
MFC after:	1 week
2011-05-13 18:48:00 +00:00
mlaier
39f7e10a26 Another long standing vm bug found at Isilon:
Fix a race between vm_object_collapse and vm_fault.

Reviewed by:	alc@
MFC after:	3 days
2011-05-09 20:27:49 +00:00
obrien
1b56a148b0 Reap old SPL comments.
Reviewed by:	alc
2011-04-26 22:18:53 +00:00
kib
d98bdded17 Fix two bugs in r218670.
Hold the vnode around the region where object lock is dropped, until
vnode lock is acquired.

Do not drop the vnode reference for a case when the object was
deallocated during unlock. Note that in this case, VV_TEXT is cleared
by vnode_pager_dealloc().

Reported and tested by:	pho
Reviewed by:	alc
MFC after:	3 days
2011-04-23 21:38:21 +00:00
jhb
96b1d8b6d7 Fix several places to ignore processes that are not yet fully constructed.
MFC after:	1 week
2011-04-06 17:47:22 +00:00
trasz
440cd5face In vm_daemon(), do not skip processes stopped with SIGSTOP. 2011-04-06 16:27:04 +00:00
trasz
71afa1f865 Add RACCT_RSS.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-04-06 16:24:24 +00:00
trasz
92bec9b84c Add accounting for most of the memory-related resources.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-04-05 20:23:59 +00:00
kib
eeb1ebf124 Handle the corner case in vm_fault_quick_hold_pages().
If supplied length is zero, and user address is invalid, function
might return -1, due to the truncation and rounding of the address.
The callers interpret the situation as EFAULT. Instead of handling
the zero length in caller, filter it in vm_fault_quick_hold_pages().

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
2011-03-25 16:38:10 +00:00
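
Illustration for eeb1ebf124 (not part of the commit): a user-space sketch of the corner case. With a zero length, truncating the start address down and rounding the end address up can still produce a nonempty page range, so an invalid address is reported as a failure even though nothing was requested. Filtering the zero length first avoids that. PAGE_SZ, USER_MAX, and count_held_pages() are hypothetical and do not reproduce vm_fault_quick_hold_pages().

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ  4096ULL
#define USER_MAX 0x100000ULL /* hypothetical end of valid user addresses */

/*
 * Return the number of pages spanned by [addr, addr + len), or -1 if
 * the page-aligned range falls outside the valid region.
 */
int
count_held_pages(uint64_t addr, uint64_t len)
{
    uint64_t start, end;

    /* Filter the zero-length case before any address arithmetic. */
    if (len == 0)
        return (0);
    start = addr & ~(PAGE_SZ - 1);                     /* truncate down */
    end = (addr + len + PAGE_SZ - 1) & ~(PAGE_SZ - 1); /* round up */
    if (end < start || end > USER_MAX)
        return (-1);
    return ((int)((end - start) / PAGE_SZ));
}
```

Without the early return, a call with an address just past USER_MAX and a length of zero would compute an end beyond the valid region and return -1, which callers would treat as EFAULT.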