Commit Graph

2584 Commits

Author SHA1 Message Date
alc
b6a248b75a Correct an error in vm_fault_copy_entry() that has existed since the first
version of this file.  When a process forks, any wired pages are immediately
copied because copy-on-write is not supported for wired pages.  In other
words, the child process is given its own private copy of each wired page
from its parent's address space.  Unfortunately, to date, these copied pages
have been mapped into the child's address space with the wrong permissions,
typically VM_PROT_ALL.  This change corrects the permissions.

Reviewed by:	kib
2009-10-31 17:39:56 +00:00
kib
feb999713b When protection of wired read-only mapping is changed to read-write,
install new shadow object behind the map entry and copy the pages
from the underlying objects to it. This makes the mprotect(2) call to
actually perform the requested operation instead of silently do nothing
and return success, that causes SIGSEGV on later write access to the
mapping.

Reuse vm_fault_copy_entry() to do the copying, modifying it to behave
correctly when src_entry == dst_entry.

Reviewed by:	alc
MFC after:	3 weeks
2009-10-27 10:15:58 +00:00
alc
d4f827eb7a Simplify the inner loop of vm_fault_copy_entry().
Reviewed by:	kib
2009-10-26 00:01:52 +00:00
alc
9911b79277 Eliminate an unnecessary check from vm_fault_prefault(). 2009-10-25 17:30:50 +00:00
marcel
51bb720939 o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
    vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
    that translates the vm_map_t argumument to pmap_t.
o   Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
    it replaces the pmap_page_executable() function, added to solve
    the I-cache problem in uiomove_fromphys().
o   In proc_rwmem() call vm_sync_icache() when writing to a page that
    has execute permissions. This assures that when breakpoints are
    written, the I-cache will be coherent and the process will actually
    hit the breakpoint.
o   This also fixes the Book-E PMAP implementation that was missing
    necessary locking while trying to deal with the I-cache coherency
    in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
2009-10-21 18:38:02 +00:00
kib
04ed7ad878 Remove spurious call to priv_check(PRIV_VM_SWAP_NOQUOTA).
Call priv_check(PRIV_VM_SWAP_NORLIMIT) only when per-uid limit is
actually exceed.

Both changes aim at calling priv_check(9) only for the cases when
privilege is actually exercised by the process.

Reported and tested by:	rwatson
Reviewed by:	alc
MFC after:	3 days
2009-10-18 12:55:39 +00:00
alc
dce82c729a Align and pad the page queue and free page queue locks so that the linker
can't possibly place them together within the same cache line.

MFC after:	3 weeks
2009-10-04 18:53:10 +00:00
bz
3b88ee8187 Back out the functional parts from r197537. After r197711, affecting all
user mappings, mmap no longer needs special treatment.
2009-10-02 17:51:46 +00:00
kib
75708a4b21 Move the annotation for vm_map_startup() immediately before the function.
MFC after:	3 days
2009-10-01 12:48:35 +00:00
simon
a0b7c793b4 Do not allow mmap with the MAP_FIXED argument to map at address zero.
This is done to make it harder to exploit kernel NULL pointer security
vulnerabilities.  While this of course does not fix vulnerabilities,
it does mitigate their impact.

Note that this may break some applications, most likely emulators or
similar, which for one reason or another require mapping memory at
zero.

This restriction can be disabled with the security.bsd.mmap_zero
sysctl variable.

Discussed with:	rwatson, bz
Tested by:	bz (Wine), simon (VirtualBox)
Submitted by:	jhb
2009-09-27 14:49:51 +00:00
kib
c04bcd3033 Old (a.out) rtld attempts to mmap zero-length region, e.g. when bss
of the linked object is zero-length. More old code assumes that mmap
of zero length returns success.

For a.out and pre-8 ELF binaries, allow the mmap of zero length.

Reported by:	tegge
Reviewed by:	tegge, alc, jhb
MFC after:	3 days
2009-09-20 12:40:56 +00:00
kib
bae5df8cbb Reintroduce the r196640, after fixing the problem with my testing.
Remove the altkstacks, instead instantiate threads with kernel stack
allocated with the right size from the start. For the thread that has
kernel stack cached, verify that requested stack size is equial to the
actual, and reallocate the stack if sizes differ [1].

This fixes the bug introduced by r173361 that was committed several days
after r173004 and consisted of kthread_add(9) ignoring the non-default
kernel stack size.

Also, r173361 removed the caching of the kernel stacks for a non-first
thread in the process. Introduce separate kernel stack cache that keeps
some limited amount of preallocated kernel stacks to lower the latency
of thread allocation. Add vm_lowmem handler to prune the cache on
low memory condition. This way, system with reasonable amount of the
threads get lower latency of thread creation, while still not exhausting
significant portion of KVA for unused kstacks.

Submitted by:	peter [1]
Discussed with:	jhb, julian, peter
Reviewed by:	jhb
Tested by:	pho (and retested according to new test scenarious)
MFC after:	1 week
2009-09-01 11:41:51 +00:00
kib
d105721a22 Reverse r196640 and r196644 for now. 2009-08-29 21:53:08 +00:00
kib
9e8ade6852 Remove the altkstacks, instead instantiate threads with kernel stack
allocated with the right size from the start. For the thread that has
kernel stack cached, verify that requested stack size is equial to the
actual, and reallocate the stack if sizes differ [1].

This fixes the bug introduced by r173361 that was committed several days
after r173004 and consisted of kthread_add(9) ignoring the non-default
kernel stack size.

Also, r173361 removed the caching of the kernel stacks for a non-first
thread in the process. Introduce separate kernel stack cache that keeps
some limited amount of preallocated kernel stacks to lower the latency
of thread allocation. Add vm_lowmem handler to prune the cache on
low memory condition. This way, system with reasonable amount of the
threads get lower latency of thread creation, while still not exhausting
significant portion of KVA for unused kstacks.

Submitted by:	peter [1]
Discussed with:	jhb, julian, peter
Reviewed by:	jhb
Tested by:	pho
MFC after:	1 week
2009-08-29 13:28:02 +00:00
jhb
73fabff57d Mark the fake pages constructed by the OBJT_SG pager valid. This was
accidentally lost at one point during the PAT development.  Without this
fix vm_pager_get_pages() was zeroing each of the pages.

Submitted by:	czander @ NVidia
MFC after:	3 days
2009-08-29 02:17:40 +00:00
jhb
7b069e86c6 Extend the device pager to support different memory attributes on different
pages in an object.
- Add a new variant of d_mmap() currently called d_mmap2() which accepts
  an additional in/out parameter that is the memory attribute to use for
  the requested page.
- A driver either uses d_mmap() or d_mmap2() for all requests but not both.
  The current implementation uses a flag in the cdevsw (D_MMAP2) to indicate
  that the driver provides a d_mmap2() handler instead of d_mmap().  This
  is done to make the change ABI compatible with existing drivers and
  MFC'able to 7 and 8.

Submitted by:	alc
MFC after:	1 month
2009-08-28 14:06:55 +00:00
jhb
b6b550b7e4 Remove debugging that crept in with previous commit.
Reported by:	nwhitehorn
Approved by:	re (kib)
2009-07-24 15:06:49 +00:00
jhb
44220d7e1e Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
alc
6e8105e16a Change the handling of fictitious pages by pmap_page_set_memattr() on
amd64 and i386.  Essentially, fictitious pages provide a mechanism for
creating aliases for either normal or device-backed pages.  Therefore,
pmap_page_set_memattr() on a fictitious page needn't update the direct
map or flush the cache.  Such actions are the responsibility of the
"primary" instance of the page or the device driver that "owns" the
physical address.  For example, these actions are already performed by
pmap_mapdev().

The device pager needn't restore the memory attributes on a fictitious
page before releasing it.  It's now pointless.

Add pmap_page_set_memattr() to the Xen pmap.

Approved by:	re (kib)
2009-07-19 21:40:19 +00:00
alc
40432bac3b An addendum to r195649, "Add support to the virtual memory system for
configuring machine-dependent memory attributes...":

Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc().  It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.

Eliminate pointless code from pmap_cache_bits() on amd64.

Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.

Approved by:	re (kib)
2009-07-18 01:50:05 +00:00
jhb
d81f73fcb5 - Change mmap() to fail requests with EINVAL that pass a length of 0. This
behavior is mandated by POSIX.
- Do not fail requests that pass a length greater than SSIZE_MAX
  (such as > 2GB on 32-bit platforms).  The 'len' parameter is actually
  an unsigned 'size_t' so negative values don't really make sense.

Submitted by:	Alexander Best  alexbestms at math.uni-muenster.de
Reviewed by:	alc
Approved by:	re (kib)
MFC after:	1 week
2009-07-14 19:45:36 +00:00
alc
ea60573817 Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
kib
871f788d79 When VM_MAP_WIRE_HOLESOK is not specified and vm_map_wire(9) encounters
non-readable and non-executable map entry, the entry is skipped from
wiring and loop is aborted. But, since MAP_ENTRY_WIRE_SKIPPED was not
set for the map entry, its wired_count is later erronously decremented.
vm_map_delete(9) for such map entry stuck in "vmmaps".

Properly set MAP_ENTRY_WIRE_SKIPPED when aborting the loop.

Reported by:	John Marshall <john.marshall riverwillow com au>
Approved by:	re (kensmith)
2009-07-12 12:37:38 +00:00
kib
af8ce5a988 When forking a vm space that has wired map entries, do not forget to
charge the objects created by vm_fault_copy_entry. The object charge
was set, but reserve not incremented.

Reported by:	Greg Rivers <gcr+freebsd-current tharned org>
Reviewed by:	alc (previous version)
Approved by:	re (kensmith)
2009-07-03 22:17:37 +00:00
kib
25ffa6178c Eliminiate code duplication by calling vm_object_destroy()
from vm_object_collapse().

Requested and reviewed by:	alc
Approved by:	re (kensmith)
2009-06-28 08:42:17 +00:00
alc
91cafd48b1 This change is the next step in implementing the cache control functionality
required by video card drivers.  Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures.  In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig().  These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with:	jhb
2009-06-26 04:47:43 +00:00
kib
a7a5954511 Change the type of uio_resid member of struct uio from int to ssize_t.
Note that this does not actually enable full-range i/o requests for
64 architectures, and is done now to update KBI only.

Tested by:	pho
Reviewed by:	jhb, bde (as part of the review of the bigger patch)
2009-06-25 18:46:30 +00:00
kib
3dfa8ddd10 Initialize the uip to silence gcc warning that seems to sneak in in some
build environments.

Reported by:	alc, bf1783 at googlemail com
2009-06-24 09:26:33 +00:00
alc
323707be16 The bits set in a page's dirty mask are a subset of the bits set in its
valid mask.  Consequently, there is no need to perform a bit-wise and of
the page's dirty and valid masks in order to determine which parts of a
page are dirty and valid.

Eliminate an unnecessary #include.
2009-06-24 04:45:03 +00:00
kib
fa686c638e Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with:	pho
Reviewed by:	alc
Approved by:	re (kensmith)
2009-06-23 20:45:22 +00:00
alc
73f600f681 Validate the page in one place, dev_pager_getpages(), rather than doing it
in two places, dev_pager_getfake() and dev_pager_updatefake().

Compare a pointer to "NULL" rather than "0".
2009-06-22 19:09:48 +00:00
alc
d092e8e13d Implement a mechanism within vm_phys_alloc_contig() to defer all necessary
calls to vdrop() until after the free page queues lock is released.  This
eliminates repeatedly releasing and reacquiring the free page queues lock
each time the last cached page is reclaimed from a vnode-backed object.
2009-06-21 20:29:14 +00:00
alc
7b05ffed76 Strive for greater consistency among the places that implement real,
fictious, and contiguous page allocation.  Eliminate unnecessary
reinitialization of a page's fields.
2009-06-21 00:21:33 +00:00
thompsa
f3a1b951fc Track the kernel mapping of a physical page by a new entry in vm_page
structure. When the page is shared, the kernel mapping becomes a special
type of managed page to force the cache off the page mappings. This is
needed to avoid stale entries on all ARM VIVT caches, and VIPT caches
with cache color issue.

Submitted by:	Mark Tinguely
Reviewed by:	alc
Tested by:	Grzegorz Bernacki, thompsa
2009-06-18 20:42:37 +00:00
alc
d26e62824b Add support for UMA_SLAB_KERNEL to page_free(). (While I'm here remove an
unnecessary newline character from the end of two panic messages.)
2009-06-18 07:27:11 +00:00
alc
31dc96eed0 Eliminate unnecessary forward declarations. 2009-06-17 20:12:23 +00:00
alc
b4d6ffe1f5 Refactor contigmalloc() into two functions: a simple front-end that deals
with the malloc tag and calls a new back-end, kmem_alloc_contig(), that
allocates the pages and maps them.

The motivations for this change are two-fold: (1) A cache mode parameter
will be added to kmem_alloc_contig().  In other words, kmem_alloc_contig()
will be extended to support the allocation of memory with caller-specified
caching. (2) The UMA allocation function that is used by the two jumbo
frames zones can use kmem_alloc_contig() in place of contigmalloc() and
thereby avoid having free jumbo frames held by the zone counted as live
malloc()ed memory.
2009-06-17 17:19:48 +00:00
alc
bb2f6a2f79 Pass the size of the mapping to contigmapping() as a "vm_size_t" rather
than a "vm_pindex_t".  A "vm_size_t" is more convenient for it to use.
2009-06-17 07:11:38 +00:00
alc
cb610fa25d Make the maintenance of a page's valid bits by contigmalloc() more like
kmem_alloc() and kmem_malloc().  Specifically, defer the setting of the
page's valid bits until contigmapping() when the mapping is known to be
successful.
2009-06-17 04:57:32 +00:00
alc
07cfd3813e Long, long ago in r27464 special case code for mapping device-backed
memory with 4MB pages was added to pmap_object_init_pt().  This code
assumes that the pages of a OBJT_DEVICE object are always physically
contiguous.  Unfortunately, this is not always the case.  For example,
jhb@ informs me that the recently introduced /dev/ksyms driver creates
a OBJT_DEVICE object that violates this assumption.  Thus, this
revision modifies pmap_object_init_pt() to abort the mapping if the
OBJT_DEVICE object's pages are not physically contiguous.  This
revision also changes some inconsistent if not buggy behavior.  For
example, the i386 version aborts if the first 4MB virtual page that
would be mapped is already valid.  However, it incorrectly replaces
any subsequent 4MB virtual page mappings that it encounters,
potentially leaking a page table page.  The amd64 version has a bug of
my own creation.  It potentially busies the wrong page and always an
insufficent number of pages if it blocks allocating a page table page.

To my knowledge, there have been no reports of these bugs, hence,
their persistance.  I suspect that the existing restrictions that
pmap_object_init_pt() placed on the OBJT_DEVICE objects that it would
choose to map, for example, that the first page must be aligned on a 2
or 4MB physical boundary and that the size of the mapping must be a
multiple of the large page size, were enough to avoid triggering the
bug for drivers like ksyms.  However, one side effect of testing the
OBJT_DEVICE object's pages for physical contiguity is that a dubious
difference between pmap_object_init_pt() and the standard path for
mapping devices pages, i.e., vm_fault(), has been eliminated.
Previously, pmap_object_init_pt() would only instantiate the first
PG_FICTITOUS page being mapped because it never examined the rest.
Now, however, pmap_object_init_pt() uses the new function
vm_object_populate() to instantiate them all (in order to support
testing their physical contiguity).  These pages need to be
instantiated for the mechanism that I have prototyped for
automatically maintaining the consistency of the PAT settings across
multiple mappings, particularly, amd64's direct mapping, to work.
(Translation: This change is also being made to support jhb@'s work on
the Nvidia feature requests.)

Discussed with:	jhb@
2009-06-14 19:51:43 +00:00
alc
595149bf90 Eliminate an unnecessary clearing of a page's dirty bits in
phys_pager_getpages().
2009-06-13 20:58:12 +00:00
alc
dd8ed7c8db Eliminate an unnecessary restriction on the vm object type from
vm_map_pmap_enter().  The immediate effect of this change is that automatic
prefaulting by mmap() for small mappings is performed on POSIX shared memory
objects just the same as it is on ordinary files.
2009-06-09 17:04:39 +00:00
alc
919e3cbf28 Eliminate unnecessary obfuscation when testing a page's valid bits. 2009-06-07 19:38:26 +00:00
alc
d18a094f47 Eliminate an unneeded forward declaration. (This should have been removed
in revision 1.42.)
2009-06-06 21:23:29 +00:00
alc
569ccdf52b If vm_pager_get_pages() returns VM_PAGER_OK, then there is no need to check
the page's valid bits.  The page is guaranteed to be fully valid.  (For the
record, this is documented in vm/vm_pager.h's comments.)
2009-06-06 20:13:14 +00:00
alc
24bb8c9e98 vm_thread_swapin() needn't validate any pages. The pages are already
validated by vm_pager_get_pages().
2009-06-05 17:06:20 +00:00
alc
d419d0f3dc Simplify contigfree(). 2009-06-05 16:55:10 +00:00
rwatson
f4934662e5 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
alc
4a00409486 Correct a boundary case error in the management of a page's dirty bits by
shm_dotruncate() and vnode_pager_setsize().  Specifically, if the length of
a shared memory object or a file is truncated such that the length modulo
the page size is between 1 and 511, then all of the page's dirty bits were
cleared.  Now, a dirty bit is cleared only if the corresponding block is
truncated in its entirety.
2009-06-02 08:02:27 +00:00
jhb
fea04a3fd1 Add an extension to the character device interface that allows character
device drivers to use arbitrary VM objects to satisfy individual mmap()
requests.
- A new d_mmap_single(cdev, &foff, objsize, &object, prot) callback is
  added to cdevsw.  This function is called for each mmap() request.
  If it returns ENODEV, then the mmap() request will fall back to using
  the device's device pager object and d_mmap().  Otherwise, the method
  can return a VM object to satisfy this entire mmap() request via
  *object.  It can also modify the starting offset into this object via
  *foff.  This allows device drivers to use the file offset as a cookie
  to identify specific VM objects.
- vm_mmap_vnode() has been changed to call vm_mmap_cdev() directly when
  mapping V_CHR vnodes.  This avoids duplicating all the cdev mmap
  handling code and simplifies some of vm_mmap_vnode().
- D_VERSION has been bumped to D_VERSION_02.  Older device drivers
  using D_VERSION_01 are still supported.

MFC after:	1 month
2009-06-01 21:32:52 +00:00