Commit Graph

2416 Commits

Author SHA1 Message Date
Marcel Moolenaar
8775db6f50 Make the vm_pmap field of struct vmspace the last field in the
structure. This allows per-CPU variations of struct pmap on a
single architecture without affecting the machine-independent
fields. As such, the PMAP variations don't affect the ABI. They
become part of it.
2008-03-01 22:54:42 +00:00
Alan Cox
688559667f Correct a long-standing error in vm_object_page_remove(). Specifically,
pmap_remove_all() must not be called on fictitious pages.  To date,
fictitious pages have been allocated from zeroed memory, effectively
hiding this problem because the fictitious pages appear to have an empty
pv list.  Submitted by: Kostik Belousov

Rewrite the comments describing vm_object_page_remove() to better
describe what it does.  Add an assertion.  Reviewed by: Kostik Belousov

MFC after: 1 week
2008-02-26 17:16:48 +00:00
Alan Cox
4c8e0452e0 Correct a long-standing error in vm_object_deallocate(). Specifically,
only anonymous default (OBJT_DEFAULT) and swap (OBJT_SWAP) objects should
ever have OBJ_ONEMAPPING set.  However, vm_object_deallocate() was
setting it on device (OBJT_DEVICE) objects.  As a result,
vm_object_page_remove() could be called on a device object and if that
occurred pmap_remove_all() would be called on the device object's pages.
However, a device object's pages are fictitious, and fictitious pages do
not have an initialized pv list (struct md_page).

To date, fictitious pages have been allocated from zeroed memory,
effectively hiding this problem.  Now, however, the conversion of rotting
diagnostics to invariants in the amd64 and i386 pmaps has revealed the
problem.  Specifically, assertion failures have occurred during the
initialization phase of the X server on some hardware.

MFC after: 1 week
Discussed with: Kostik Belousov
Reported by: Michiel Boland
2008-02-24 18:03:56 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Pawel Jakub Dawidek
79c2840d1d When one tries to allocate memory with the M_WAITOK flag and we are short in
address space in kmem map call vm_lowmem event in a loop and wait a bit for
subsystems to reclaim some memory which in turn will reclaim address space as
well.

Note, this is a work-around.

Reviewed by:	alc
Approved by:	alc
MFC after:	3 days
2008-01-10 08:36:38 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
John Baldwin
8e38aeff17 Add a new file descriptor type for IPC shared memory objects and use it to
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
  object which provides the backing store.  Each descriptor starts off with
  a size of zero, but the size can be altered via ftruncate(2).  The shared
  memory file descriptors also support fstat(2).  read(2), write(2),
  ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
  memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
  manage shared memory file descriptors.  The virtual namespace that maps
  pathnames to shared memory file descriptors is implemented as a hash
  table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
  of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
  path argument to shm_open(2).  In this case, an unnamed shared memory
  file descriptor will be created similar to the IPC_PRIVATE key for
  shmget(2).  Note that the shared memory object can still be shared among
  processes by sharing the file descriptor via fork(2) or sendmsg(2), but
  it is unnamed.  This effectively serves to implement the getmemfd() idea
  bandied about the lists several times over the years.
- The backing store for shared memory file descriptors are garbage
  collected when they are not referenced by any open file descriptors or
  the shm_open(2) virtual namespace.

Submitted by:	dillon, peter (previous versions)
Submitted by:	rwatson (I based this on his version)
Reviewed by:	alc (suggested converting getmemfd() to shm_open())
2008-01-08 21:58:16 +00:00
Christian S.J. Peron
35918c55e5 When MAC is enabled in the kernel, fix a panic triggered by a locking
assertion hit in swapoff_one() when we un-mount a swap partition.  We
should be using curthread where we used thread0 before.  This change
also replaces the thread argument with a credential argument, as the
MAC framework only requires the cred.

It should be noted that this allows the machine to be rebooted without
panicing with "cannot differ from curthread or NULL" when MAC is enabled.

Submitted by:	rwatson
Reviewed by:	attilio
MFC after:	2 weeks
2008-01-08 14:58:41 +00:00
Konstantin Belousov
77bc7900bc In the vm_map_stack(), check for the specified stack region wraparound.
Reported and tested by:	Peter Holm
Reviewed by:	alc
MFC after:	3 days
2008-01-04 04:33:13 +00:00
Alan Cox
eb2a051720 Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.
2008-01-03 07:34:34 +00:00
Alan Cox
273bf93c8d Defer setting either PG_CACHED or PG_FREE until after the free page
queues lock is acquired.  Otherwise, the state of a reservation's
pages' flags and its population count can be inconsistent.  That could
result in a page being freed twice.

Reported by:	kris
2008-01-02 04:43:47 +00:00
Alan Cox
af6ce1660a Correct a style error that was introduced in revision 1.77. 2008-01-01 20:36:04 +00:00
Alan Cox
f8a47341fe Add the superpage reservation system. This is "part 2 of 2" of the
machine-independent support for superpages.  (The earlier part was
the rewrite of the physical memory allocator.)  The remainder of the
code required for superpages support is machine-dependent and will
be added to the various pmap implementations at a later date.

Initially, I am only supporting one large page size per architecture.
Moreover, I am only enabling the reservation system on amd64.  (In
an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0
in amd64/include/vmparam.h or your kernel configuration file.)
2007-12-29 19:53:04 +00:00
Alan Cox
3df92083af Add a list of reservations to the vm object structure.
Recycle the vm object's "pg_color" field to represent the color of the
first virtual page address at which the object is mapped instead of the
color of the object's first physical page.  Since an object may not be
mapped, introduce a flag "OBJ_COLORED" that indicates whether "pg_color"
is valid.
2007-12-27 17:56:35 +00:00
Alan Cox
ae0fee95e1 Add the superpage reservation type. 2007-12-27 17:08:11 +00:00
Alan Cox
9742373a92 Update the comment describing vm_phys_unfree_page(). 2007-12-21 02:44:31 +00:00
Alan Cox
e35395ce21 Modify vm_phys_unfree_page() so that it no longer requires the given
page to be in the free lists.  Instead, it now returns TRUE if it
removed the page from the free lists and FALSE if the page was not
in the free lists.

This change is required to support superpage reservations.  Specifically,
once reservations are introduced, a cached page can either be in the
free lists or a reservation.
2007-12-20 22:45:54 +00:00
Alan Cox
bc8794a12a Correct one half of a loop continuation condition in vm_phys_unfree_page().
At present, this error is inconsequential; the other half of the loop
continuation condition is sufficient to achieve correct execution.
2007-12-19 23:09:45 +00:00
Alan Cox
0349775790 Eliminate redundant code from vm_page_startup(). 2007-12-19 05:47:50 +00:00
Alan Cox
21e10ad46a Simplify vm_page_free_toq(). 2007-12-11 21:20:34 +00:00
Alan Cox
b640825647 Correct a comment. 2007-12-02 07:43:42 +00:00
Robert Watson
9ccca7d1b1 Modify stack(9) stack_print() and stack_sbuf_print() routines to use new
linker interfaces for looking up function names and offsets from
instruction pointers.  Create two variants of each call: one that is
"DDB-safe" and avoids locking in the linker, and one that is safe for
use in live kernels, by virtue of observing locking, and in particular
safe when kernel modules are being loaded and unloaded simultaneous to
their use.  This will allow them to be used outside of debugging
contexts.

Modify two of three current stack(9) consumers to use the DDB-safe
interfaces, as they run in low-level debugging contexts, such as inside
lockmgr(9) and the kernel memory allocator.

Update man page.
2007-12-01 22:04:16 +00:00
Alan Cox
da31e3aa04 Make contigmalloc(9)'s page laundering more robust. Specifically, use
vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better
handle a lock-ordering problem.  Consequently, trylock's failure on the
page's containing object no longer implies that the page cannot be
laundered.

MFC after: 6 weeks
2007-11-25 20:37:29 +00:00
Alan Cox
9c5ce94257 Tidy up: Add comments. Eliminate the pointless
malloc_type_allocated(..., 0) calls that occur when contigmalloc() has
failed.  Eliminate the acquisition and release of the page queues lock
from vm_page_release_contig().  Rename contigmalloc2() to
contigmapping(), reflecting what it does.
2007-11-25 07:42:34 +00:00
Alan Cox
5dfc28704d Add a read/write sysctl for reconfiguring the maximum number of physical
pages that can be wired.

Submitted by:	Eugene Grosbein
PR:		114654
MFC after:	6 weeks
2007-11-23 00:30:19 +00:00
Alan Cox
82cfdd5adc Remove an unnecessary call to pmap_remove_all() and the associated "XXX"
comments from vnode_pager_setsize().  This call was introduced in
revision 1.140 to address a problem that no longer exists.
Specifically, pmap_zero_page_area() has replaced a (possibly)
problematic implementation of page zeroing that was based on
vm_pager_map(), bzero(), and vm_pager_unmap().
2007-11-22 20:01:38 +00:00
Alan Cox
ddd6e7d2ab When reactivating a cached page, reset the page's pool to the default
pool.  (Not doing this before was a performance pessimization but not
a cause for panic.)
2007-11-21 23:22:10 +00:00
Alan Cox
59677d3c0e Prevent the leakage of wired pages in the following circumstances:
First, a file is mmap(2)ed and then mlock(2)ed.  Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed.  However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count.  This can be a mistake.  Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings.  Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed.  Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired.  The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed.  To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().

Reviewed by: tegge
MFC after: 6 weeks
2007-11-17 22:52:29 +00:00
Pawel Jakub Dawidek
8ce2d00a04 Change unused 'user_wait' argument to 'timo' argument, which will be
used to specify timeout for msleep(9).

Discussed with:	alc
Reviewed by:	alc
2007-11-07 21:56:58 +00:00
Konstantin Belousov
89b57fcf01 Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with:	Peter Holm
Reviewed by:	jhb
2007-11-05 11:36:16 +00:00
Konstantin Belousov
aefac17759 The intent of the freeing the (zeroed) page in vm_page_cache() for
default object rather than cache it was to have
vm_pager_has_page(object, pindex, ...) == FALSE to imply that there is
no cached page in object at pindex. This allows to avoid explicit
checks for cached pages in vm_object_backing_scan().

For now, we need the same bandaid for the swap object, otherwise both
the vm_page_lookup() and the pager can report that there is no page at
offset, while page is stored in the cache. Also, this fixes another
instance of the KASSERT("object type is incompatible") failure in the
vm_page_cache_transfer().

Reported and tested by:	Peter Holm
Reviewed by:	alc
MFC after:	3 days
2007-11-05 10:25:12 +00:00
Maxim Konovalov
7036145b25 o Fix panic message: it's swap_pager_putpages() not swap_pager_getpages().
Submitted by:	Mark Tinguely
2007-11-02 20:48:10 +00:00
Remko Lodder
248a0568e7 Correct a copy and paste'o in phys_pager.c, we are talking about phys here
and not about devices.

PR:		93755
Approved by:	imp (mentor, implicit when re-assigning the ticket to me).
2007-10-30 14:48:13 +00:00
Alan Cox
21f7958604 Change vm_page_cache_transfer() such that it does not transfer pages
that would have an offset beyond the end of the target object.  Such
pages should remain in the source object.

MFC after:	3 days
Diagnosed and reviewed by:	Kostik Belousov
Reported and tested by:		Peter Holm
2007-10-27 00:09:30 +00:00
Robert Watson
30d239bc4c Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
Alan Cox
0ab3c7a594 Correct an error of omission in the reimplementation of the page
cache: vnode_pager_setsize() must handle the case where a file is
truncated to a non-page-size-aligned boundary and there is a cached
page underlying the new end of file.

Reported by:	kris, tegge
Tested by:	kris
MFC after:	3 days
2007-10-22 06:23:46 +00:00
Alan Cox
7b0e72d184 Correct an error in vm_map_sync(), nee vm_map_clean(), that has existed
since revision 1.1.  Specifically, neither traversal of the vm map checks
whether the end of the vm map has been reached.  Consequently, the first
traversal can wrap around and bogusly return an error.

This error has gone unnoticed for so long because no one had ever before
tried msync(2)ing a region above the stack.

Reported by:	peter
MFC after:	1 week
2007-10-22 05:21:05 +00:00
Julian Elischer
3745c395ec Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0  so that we can eventually MFC the
new kthread_xxx() calls.
2007-10-20 23:23:23 +00:00
Alan Cox
2573269111 The previous revision, updating vm_object_page_remove() for the new page
cache, did not account for the case where the vm object has nothing but
cached pages.

Reported by:	kris, tegge
Reviewed by:	tegge
MFC after:	3 days
2007-10-18 23:02:18 +00:00
Peter Wemm
c899450b21 Fix cosmetic bug in stale copy of msync_args. 'len' is size_t, not int. 2007-10-18 22:47:39 +00:00
Ruslan Ermilov
8229241a90 Fix CTL_VM_NAMES. 2007-10-16 11:32:57 +00:00
John Baldwin
71eb44c7b1 Allow recursion on the 'zones' internal UMA zone.
Submitted by:	thompsa
MFC after:	1 week
Approved by:	re (kensmith)
Discussed with:	jeff
2007-10-11 20:11:27 +00:00
Konstantin Belousov
4ab8ab9285 Do not dereference NULL pointer.
Reported by:	Peter Holm
Reviewed by:	alc
Approved by:	re (kensmith)
2007-10-08 20:09:53 +00:00
Alan Cox
b8c5048025 In the rare case that vm_page_cache() actually frees the given page,
it must first ensure that the page is no longer mapped.  This is
trivially accomplished by calling pmap_remove_all() a little earlier
in vm_page_cache().  While I'm in the neighborbood, make a related
panic message a little more useful.

Approved by:	re (kensmith)
Reported by:	Peter Holm and Konstantin Belousov
Reviewed by:	Konstantin Belousov
2007-10-08 18:01:38 +00:00
Alan Cox
dc9250f55c Correct a lock assertion failure in sparc64's pmap_page_is_mapped() that is
a consequence of sparc64/sparc64/vm_machdep.c revision 1.76.  It occurs
when uma_small_free() frees a page.  The solution has two parts: (1) Mark
pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED.  (2) Defer the lock
assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested.
This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable
flags, i.e., they do not change state between the time that a page is
allocated and freed.

Approved by:	re (kensmith)
PR:		116794
2007-10-07 18:03:03 +00:00
Alan Cox
c944491426 Correct an error of omission in the reimplementation of the page
cache: vm_object_page_remove() should convert any cached pages that
fall with the specified range to free pages.  Otherwise, there could
be a problem if a file is first truncated and then regrown.
Specifically, some old data from prior to the truncation might reappear.

Generalize vm_page_cache_free() to support the conversion of either a
subset or the entirety of an object's cached pages.

Reported by: tegge
Reviewed by: tegge
Approved by: re (kensmith)
2007-09-27 04:21:59 +00:00
Alan Cox
f3a2ed4bd9 Correct an error in the previous revision, specifically,
vm_object_madvise() should request that the reactivated, cached page
not be busied.

Reported by: Rink Springer
Approved by: re (kensmith)
2007-09-25 21:01:10 +00:00
Alan Cox
7bfda801a8 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
Jeff Roberson
258853ab1c - Redefine p_swtime and td_slptime as p_swtick and td_slptick. This
changes the units from seconds to the value of 'ticks' when swapped
   in/out.  ULE does not have a periodic timer that scans all threads in
   the system and as such maintaining a per-second counter is difficult.
 - Change computations requiring the unit in seconds to subtract ticks
   and divide by hz.  This does make the wraparound condition hz times
   more frequent but this is still in the range of several months to
   years and the adverse effects are minimal.

Approved by:    re
2007-09-21 05:07:07 +00:00
Jeff Roberson
b61ce5b0e6 - Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
   previously the sched_lock.  These bugs have existed for some time.
 - Allow swapout to try each thread in a process individually and then
   swapin the whole process if any of these fail.  This allows us to move
   most scheduler related swap flags into td_flags.
 - Keep ki_sflag for backwards compat but change all in source tools to
   use the new and more correct location of P_INMEM.

Reported by:	pho
Reviewed by:	attilio, kib
Approved by:	re (kensmith)
2007-09-17 05:31:39 +00:00