invalidation code cannot wait for paging to complete while holding a
vnode lock, so we don't wait. Instead we simply allow the lower level
code to simply block on any busy pages it encounters. I think Yahoo
may be the only entity in the entire world that actually uses this
msync feature :-).
Bug reported by: Paul Saab <paul@mu.org>
madvise().
This feature prevents the update daemon from gratuitously flushing
dirty pages associated with a mapped file-backed region of memory. The
system pager will still page the memory as necessary and the VM system
will still be fully coherent with the filesystem. Modifications made
by other means to the same area of memory, for example by write(), are
unaffected. The feature works on a page-granularity basis.
MAP_NOSYNC allows one to use mmap() to share memory between processes
without incuring any significant filesystem overhead, putting it in
the same performance category as SysV Shared memory and anonymous memory.
Reviewed by: julian, alc, dg
from vm_map_pageable(). At the point they called, vm_map_pageable()
holds a read (or shared) lock on the map. The purpose
of vm_map_{clear,set}_recursive() is to disable/enable repeated
write (or exclusive) lock requests by the same process.
vm_map always failed because vm_map_lookup() looked at
"vm_map_entry->wired_count" instead of "(vm_map_entry->eflags &
MAP_ENTRY_USER_WIRED)". The effect was that many page
wiring operations by sysctl were (silently) failing.
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.
This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
A complete rewrite by dillon and myself to separate
the implementation of behaviors that effect the vm_map_entry
from those that effect the vm_object.
A result of this change is that madvise(..., MADV_FREE);
is much cheaper.
Now that behaviors are stored in the vm_map_entry rather than
the vm_object, it's no longer necessary to instantiate a vm_object
just to hold the behavior.
Reviewed by: dillon
When creating new processes (or performing exec), the new page
directory is initialized too early. The kernel might grow before
p_vmspace is initialized for the new process. Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.
The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.
PR: kern/12378
Submitted by: tegge
Reviewed by: dfr
vm_map.c:
Don't set OBJ_ONEMAPPING on arbitrary vm objects. Only default
and swap type vm objects should have it set. vm_object_deallocate
already handles these cases.
vm_object.c:
If OBJ_ONEMAPPING isn't already clear in vm_object_shadow,
we are in trouble. Instead of clearing it, make it
an assertion that it is already clear.
creating a new entry. vm_map_stack and vm_map_growstack can panic when
a new entry isn't created. Fixed vm_map_stack and vm_map_growstack.
Also, when extending the stack, always set the protection to VM_PROT_ALL.
Remove a useless argument from vm_map_madvise's interface (vm_map.c,
vm_map.h, and vm_mmap.c).
Remove a redundant test in vm_uiomove (vm_map.c).
Make two changes to vm_object_coalesce:
1. Determine whether the new range of pages actually overlaps
the existing object's range of pages before calling vm_object_page_remove.
(Prior to this change almost 90% of the calls to vm_object_page_remove
were to remove pages that were beyond the end of the object.)
2. Free any swap space allocated to removed pages.
It never makes sense to specify MAP_COPY_NEEDED without also specifying
MAP_COPY_ON_WRITE, and vice versa. Thus, MAP_COPY_ON_WRITE suffices.
Reviewed by: David Greenman <dg@root.com>
1. Don't bother checking object->ref_count == 1 in order to set
OBJ_ONEMAPPING. It's a waste of time. If object->ref_count == 1,
vm_map_entry_delete will "run-down" the object and its pages.
2. If object->ref_count == 1, ignore OBJ_ONEMAPPING. Wait for
vm_map_entry_delete to "run-down" the object and its pages.
Otherwise, we're calling two different procedures to delete
the object's pages.
Note: "vmstat -s" will once again show a non-zero value
for "pages freed by exiting processes".
Remove more (redundant) map timestamp increments from properly
synchronized routines. (Changed: vm_map_entry_link, vm_map_entry_unlink,
and vm_map_pageable.)
Micro-optimize vm_map_entry_link and vm_map_entry_unlink, eliminating
unnecessary dereferences. At the same time, converted them from macros
to inline functions.
In general, vm_map_simplify_entry should be performed INSIDE
the loop that traverses the map, not outside. (Changed:
vm_map_inherit, vm_map_pageable.)
vm_fault_unwire doesn't acquire the map lock (or block holding
it). Thus, vm_map_set/clear_recursive shouldn't be called.
(Changed: vm_map_user_pageable, vm_map_pageable.)
lock) until it actually needs to modify the vm_map.
Note: it is legal to modify vm_map::hint without holding a write lock.
Submitted by: "Richard Seaman, Jr." <dick@tar.com> with minor changes
by myself.
Fix bug where an object's OBJ_WRITEABLE/OBJ_MIGHTBEDIRTY flags do
not get set under certain circumstances ( page rename case ).
Reviewed by: Alan Cox <alc@cs.rice.edu>, John Dyson
is the preparation step for moving pmap storage out of vmspace proper.
Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillion <dillon@apollo.backplane.com>
OBJ_ONEMAPPING in the case where an object is extended by an
additional vm_map_entry must be allocated.
In vm_object_madvise(), remove calll to vm_page_cache() in MADV_FREE
case in order to avoid a page fault on page reuse. However, we still
mark the page as clean and destroy any swap backing store.
Submitted by: Alan Cox <alc@cs.rice.edu>
attempt to optimize forks but were essentially given-up on due to
problems and replaced with an explicit dup of the vm_map_entry structure.
Prior to the removal, they were entirely unused.
The vm_map_insert()/vm_object_coalesce() optimization has been extended
to include OBJT_SWAP objects as well as OBJT_DEFAULT objects. This is
possible because it costs nothing to extend an OBJT_SWAP object with
the new swapper. We can't do this with the old swapper. The old swapper
used a linear array that would have had to have been reallocated, costing
time as well as a potential low-memory deadlock.
in vm_map_simplify_entry. Basically, once you've verified that
the objects in the adjacent vm_map_entry's are the same, either
NULL or the same vm_object, there's no point in checking that the
objects have the same behavior.
Obtained from: Alan Cox <alc@cs.rice.edu>
Checked by: "Richard Seaman, Jr." <dick@tar.com>
Fix the following problem:
As the code stands now, growing any stack, and not just the process's
main stack, modifies vm->vm_ssize. This is inconsistent with the code
earlier in the same procedure.
This changes the definitions of a few items so that structures are the
same whether or not the option itself is enabled. This allows
people to enable and disable the option without recompilng the world.
As the author says:
|I ran into a problem pulling out the VM_STACK option. I was aware of this
|when I first did the work, but then forgot about it. The VM_STACK stuff
|has some code changes in the i386 branch. There need to be corresponding
|changes in the alpha branch before it can come out completely.
what is done:
|
|1) Pull the VM_STACK option out of the header files it appears in. This
|really shouldn't affect anything that executes with or without the rest
|of the VM_STACK patches. The vm_map_entry will then always have one
|extra element (avail_ssize). It just won't be used if the VM_STACK
|option is not turned on.
|
|I've also pulled the option out of vm_map.c. This shouldn't harm anything,
|since the routines that are enabled as a result are not called unless
|the VM_STACK option is enabled elsewhere.
|
|2) Add what appears to be appropriate code the the alpha branch, still
|protected behind the VM_STACK switch. I don't have an alpha machine,
|so we would need to get some testers with alpha machines to try it out.
|
|Once there is some testing, we can consider making the change permanent
|for both i386 and alpha.
|
[..]
|
|Once the alpha code is adequately tested, we can pull VM_STACK out
|everywhere.
|
Submitted by: "Richard Seaman, Jr." <dick@tar.com>
about conversions of objects to OBJT_SWAP, it is done automatically
now.
Replaced manually inserted code with inline calls for busy waiting on
pages, which also incidently fixes a potential PG_BUSY race due to
the code not running at splvm().
vm_objects no longer have a paging_offset field ( see vm/vm_object.c )
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.
The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.
Submitted by: Richard Seaman <dick@tar.com>
1) The vnode pager wasn't properly tracking the file size due to
"size" being page rounded in some cases and not in others.
This sometimes resulted in corrupted files. First noticed by
Terry Lambert.
Fixed by changing the "size" pager_alloc parameter to be a 64bit
byte value (as opposed to a 32bit page index) and changing the
pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
caused some 64bit offsets and sizes to be scrambled. Removing
the cast required adding casts at a few dozen callers.
There may be problems with other bogus casts in close-by
macros. A quick check seemed to indicate that those were okay,
however.
expected. This bug caused builds of Modula-3 to fail in mysterious
ways on SMP kernels. More precisely, such builds failed on systems
with kern.fast_vfork equal to 0, the default and only supported
value for SMP kernels.
PR: kern/7468
Submitted by: tegge (Tor Egge)