Commit Graph

2966 Commits

Author SHA1 Message Date
Maksim Yevmenkin
251386b4b2 Tweak condition for disabling allocation from per-CPU buckets in
low memory situation. I've observed a situation where per-CPU
allocations were disabled while there were enough free cached pages.
Basically, cnt.v_free_count was sitting stable at a value lower
than cnt.v_free_min and that caused massive performance drop.

Reviewed by:	alc
MFC after:	1 week
2012-05-23 18:56:29 +00:00
Konstantin Belousov
4d34e019c4 Calculate the count of per-process cow faults. Export the count to
userspace using the obscure spare int field in struct kinfo_proc.

Submitted by:	Andrey Zonov <andrey zonov org>
MFC after:	1 week
2012-05-23 18:10:54 +00:00
Andriy Gapon
b6062382be vm_pager_object_lookup: small performance optimization
do not needlessly lock an object if its handle doesn't match

Reviewed by:	kib, alc
MFC after:	1 week
2012-05-23 12:51:49 +00:00
Andrew Turner
c415e17250 Fix booting on ARM.
In PHYS_TO_VM_PAGE() when VM_PHYSSEG_DENSE is set the check if we are past
the end of vm_page_array was incorrect causing it to return NULL. This
value is then used in vm_phys_add_page causing a data abort.

Reviewed by:	alc, kib, imp
Tested by:	stas
2012-05-22 07:04:23 +00:00
Nathan Whitehorn
ccc4a5c761 Replace the list of PVOs owned by each PMAP with an RB tree. This simplifies
range operations like pmap_remove() and pmap_protect() as well as allowing
simple operations like pmap_extract() not to involve any global state.
This substantially reduces lock coverages for the global table lock and
improves concurrency.
2012-05-20 14:33:28 +00:00
Konstantin Belousov
df2f557df6 Do not double-reference the found vm object in cdev_pager_lookup().
vm_pager_object_lookup() already referenced the object.

Note that there is no in-tree consumers of cdev_pager_lookup(). The
only known user of the function is i915 gem driver, which is not yet
imported. This should make the KPI change minor.

Submitted by:	avg
MFC after:	1 week
2012-05-18 10:23:47 +00:00
Konstantin Belousov
b7ac5a8571 Add new pager type, OBJT_MGTDEVICE. It provides the device pager
which carries fictitous managed pages. In particular, the consumers of
the new object type can remove all mappings of the device page with
pmap_remove_all().

The range of physical addresses used for fake page allocation shall be
registered with vm_phys_fictitious_reg_range() interface to allow the
PHYS_TO_VM_PAGE() to work in pmap.

Most likely, only i386 and amd64 pmaps can handle fictitious managed
pages right now.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:49:58 +00:00
Konstantin Belousov
b6de32bd9b Add a facility to register a range of physical addresses to be used
for allocation of fictitious pages, for which PHYS_TO_VM_PAGE()
returns proper fictitious vm_page_t. The range should be de-registered
after consumer stopped using it.

De-inline the PHYS_TO_VM_PAGE() since it now carries code to iterate
over registered ranges.

A hash container might be developed instead of range registration
interface, and fake pages could be put automatically into the hash,
were PHYS_TO_VM_PAGE() could look them up later. This should be
considered before the MFC of the commit is done.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:42:56 +00:00
Konstantin Belousov
e461aae747 Split the code from vm_page_getfake() to initialize the fake page struct
vm_page into new interface vm_page_initfake(). Handle the case of fake
page re-initialization with changed memattr.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:34:22 +00:00
Konstantin Belousov
116c213502 Assert that the page passed to vm_page_putfake() is unmanaged.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:27:51 +00:00
Konstantin Belousov
7900f95d88 Assert that fictitious or unmanaged pages do not appear on
active/inactive lists.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:24:46 +00:00
Konstantin Belousov
13a0b7bcc4 Commit the change forgotten in r235356.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:10:18 +00:00
Konstantin Belousov
0c26bb71f6 Make the vm_page_array_size long. Remove redundand zero initialization
for vm_page_array_size and nearby variablees.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2012-05-12 20:03:06 +00:00
Alan Cox
13458803f4 Give vm_fault()'s sequential access optimization a makeover.
There are two aspects to the sequential access optimization: (1) read ahead
of pages that are expected to be accessed in the near future and (2) unmap
and cache behind of pages that are not expected to be accessed again.  This
revision changes both aspects.

The read ahead optimization is now more effective.  It starts with the same
initial read window as before, but arithmetically grows the window on
sequential page faults.  This can yield increased read bandwidth.  For
example, on one of my machines, a program using mmap() to read a file that
is several times larger than the machine's physical memory takes about 17%
less time to complete.

The unmap and cache behind optimization is now more selectively applied.
The read ahead window must grow to its maximum size before unmap and cache
behind is performed.  This significantly reduces the number of times that
pages are unmapped and cached only to be reactivated a short time later.

The unmap and cache behind optimization now clears each page's referenced
flag.  Previously, in the case of dirty pages, if the containing file was
still mapped at the time that the page daemon examined the dirty pages,
they would be reactivated.

From a stylistic standpoint, this revision also cleanly separates the
implementation of the read ahead and unmap/cache behind optimizations.

Glanced at:	kib
MFC after:	2 weeks
2012-05-10 15:16:42 +00:00
Nathan Whitehorn
0b852c03eb Avoid a lock order reversal in pmap_extract_and_hold() from relocking
the page. This PMAP requires an additional lock besides the PMAP lock
in pmap_extract_and_hold(), which vm_page_pa_tryrelock() did not release.

Suggested by:	kib
MFC after:	4 days
2012-04-22 17:58:30 +00:00
Konstantin Belousov
1472f4f4b9 When MAP_STACK mapping is created, the map entry is created only to
cover the initial stack size. For MCL_WIREFUTURE maps, the subsequent
call to vm_map_wire() to wire the whole stack region fails due to
VM_MAP_WIRE_NOHOLES flag.

Use the VM_MAP_WIRE_HOLESOK to only wire mapped part of the stack.

Reported and tested by:	Sushanth Rai <sushanth_rai yahoo com>
Reviewed by:	alc
MFC after:	1 week
2012-04-21 18:36:53 +00:00
Alan Cox
2aa163dc57 As documented in vm_page.h, updates to the vm_page's flags no longer
require the page queues lock.

MFC after:	1 week
2012-04-21 18:26:16 +00:00
Attilio Rao
a0f2c37b6f - Introduce a cache-miss optimization for consistency with other
accesses of the cache member of vm_object objects.
- Use novel vm_page_is_cached() for checks outside of the vm subsystem.

Reviewed by:	alc
MFC after:	2 weeks
X-MFC:		r234039
2012-04-09 17:05:18 +00:00
Alan Cox
1c8279e4e7 Fix mincore(2) so that it reports PG_CACHED pages as resident.
MFC after:	2 weeks
2012-04-08 18:25:12 +00:00
Alan Cox
908e3da10e If a page belonging a reservation is cached, then mark the reservation so
that it will be freed to the cache pool rather than the default pool.
Otherwise, the cached pages within the reservation may be recycled sooner
than necessary.

Reported by:	Andrey Zonov
2012-04-08 17:00:46 +00:00
Attilio Rao
d1aa86e151 Staticize vm_page_cache_remove().
Reviewed by:	alc
2012-04-06 20:34:00 +00:00
Nathan Whitehorn
57bd5cce62 Reduce the frequency that the PowerPC/AIM pmaps invalidate instruction
caches, by invalidating kernel icaches only when needed and not flushing
user caches for shared pages.

Suggested by:	kib
MFC after:	2 weeks
2012-04-06 16:03:38 +00:00
John Baldwin
35818d2e94 Add new ktrace records for the start and end of VM faults. This gives
a pair of records similar to syscall entry and return that a user can
use to determine how long page faults take.  The new ktrace records are
enabled via the 'p' trace type, and are enabled in the default set of
trace points.

Reviewed by:	kib
MFC after:	2 weeks
2012-04-05 17:13:14 +00:00
Kirk McKusick
1faacf5d09 Keep track of the mount point associated with a special device
to enable the collection of counts of synchronous and asynchronous
reads and writes for its associated filesystem. The counts are
displayed using `mount -v'.

Ensure that buffers used for paging indicate the vnode from
which they are operating so that counts of paging I/O operations
from the filesystem are collected.

This checkin only adds the setting of the mount point for the
UFS/FFS filesystem, but it would be trivial to add the setting
and clearing of the mount point at filesystem mount/unmount
time for other filesystems too.

Reviewed by: kib
2012-03-28 20:49:11 +00:00
Alan Cox
5730afc9b6 Handle spurious page faults that may occur in no-fault sections of the
kernel.

When access restrictions are added to a page table entry, we flush the
corresponding virtual address mapping from the TLB.  In contrast, when
access restrictions are removed from a page table entry, we do not
flush the virtual address mapping from the TLB.  This is exactly as
recommended in AMD's documentation.  In effect, when access
restrictions are removed from a page table entry, AMD's MMUs will
transparently refresh a stale TLB entry.  In short, this saves us from
having to perform potentially costly TLB flushes.  In contrast,
Intel's MMUs are allowed to generate a spurious page fault based upon
the stale TLB entry.  Usually, such spurious page faults are handled
by vm_fault() without incident.  However, when we are executing
no-fault sections of the kernel, we are not allowed to execute
vm_fault().  This change introduces special-case handling for spurious
page faults that occur in no-fault sections of the kernel.

In collaboration with:	kib
Tested by:		gibbs (an earlier version)

I would also like to acknowledge Hiroki Sato's assistance in
diagnosing this problem.

MFC after:	1 week
2012-03-22 04:52:51 +00:00
John Baldwin
d6e9b97b3f Bah, just revert my earlier change entirely. (Missed alc's request to do
this earlier.)

Requested by:	alc
2012-03-19 19:06:40 +00:00
John Baldwin
92a5994685 Fix madvise(MADV_WILLNEED) to properly handle individual mappings larger
than 4GB.  Specifically, the inlined version of 'ptoa' of the the 'int'
count of pages overflowed on 64-bit platforms.  While here, change
vm_object_madvise() to accept two vm_pindex_t parameters (start and end)
rather than a (start, count) tuple to match other VM APIs as suggested
by alc@.
2012-03-19 18:47:34 +00:00
John Baldwin
8407f69657 Alter the previous commit to use vm_size_t instead of vm_pindex_t.
vm_pindex_t is not a count of pages per se, it is more like vm_ooffset_t,
but a page index instead of a byte offset.
2012-03-19 18:43:44 +00:00
Konstantin Belousov
126d60823a In vm_object_page_clean(), do not clean OBJ_MIGHTBEDIRTY object flag
if the filesystem performed short write and we are skipping the page
due to this.

Propogate write error from the pager back to the callers of
vm_pageout_flush().  Report the failure to write a page from the
requested range as the FALSE return value from vm_object_page_clean(),
and propagate it back to msync(2) to return EIO to usermode.

While there, convert the clearobjflags variable in the
vm_object_page_clean() and arguments of the helper functions to
boolean.

PR:	kern/165927
Reviewed by:	alc
MFC after:	2 weeks
2012-03-17 23:00:32 +00:00
John Baldwin
df96bc9713 Pedantic nit: use vm_pindex_t instead of long for a count of pages. 2012-03-14 20:57:48 +00:00
John Baldwin
b47f624183 Add KTR_VFS traces to track modifications to a vnode's writecount. 2012-03-08 20:27:20 +00:00
Alan Cox
83cbe16ff4 Eliminate stale incorrect ARGSUSED comments.
Submitted by:	bde
2012-03-02 17:33:51 +00:00
Alan Cox
9ed54e79b5 Simplify kmem_alloc() by eliminating code that existed on account of
external pagers in Mach.  FreeBSD doesn't implement external pagers.
Moreover, it don't pageout the kernel object.  So, the reasons for
having code don't hold.

Reviewed by:	kib
MFC after:	6 weeks
2012-02-29 05:41:29 +00:00
Alan Cox
f9230ad6b8 Simplify vm_mmap()'s control flow.
Add a comment describing what vm_mmap_to_errno() does.

Reviewed by:	kib
MFC after:	3 weeks
X-MFC after:	r232071
2012-02-25 21:06:39 +00:00
Alan Cox
79e538388f Simplify vmspace_fork()'s control flow by copying immutable data before
the vm map locks are acquired.  Also, eliminate redundant initialization
of the new vm map's timestamp.

Reviewed by:	kib
MFC after:	3 weeks
2012-02-25 17:49:59 +00:00
Konstantin Belousov
9d22083da8 Place the if() at the right location, to activate the v_writecount
accounting for shared writeable mappings for all filesystems, not only
for the bypass layers.

Submitted by:	alc
Pointy hat to:	kib
MFC after:	20 days
2012-02-24 10:41:58 +00:00
Konstantin Belousov
84110e7e0b Account the writeable shared mappings backed by file in the vnode
v_writecount.  Keep the amount of the virtual address space used by
the mappings in the new vm_object un_pager.vnp.writemappings
counter. The vnode v_writecount is incremented when writemappings gets
non-zero value, and decremented when writemappings is returned to
zero.

Writeable shared vnode-backed mappings are accounted for in vm_mmap(),
and vm_map_insert() is instructed to set MAP_ENTRY_VN_WRITECNT flag on
the created map entry.  During deferred map entry deallocation,
vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECOUNT and
decrements writemappings for the vm object.

Now, the writeable mount cannot be demoted to read-only while
writeable shared mappings of the vnodes from the mount point
exist. Also, execve(2) fails for such files with ETXTBUSY, as it
should be.

Noted by:	tegge
Reviewed by:	tegge (long time ago, early version), alc
Tested by:	pho
MFC after:	3 weeks
2012-02-23 21:07:16 +00:00
Konstantin Belousov
501f538675 Remove wrong comment.
Discussed with:	alc
MFC after:	3 days
2012-02-22 20:01:38 +00:00
Alan Cox
a649296959 When vm_mmap() is used to map a vm object into a kernel vm_map, it
makes no sense to check the size of the kernel vm_map against the
user-level resource limits for the calling process.

Reviewed by:	kib
2012-02-16 06:45:51 +00:00
Konstantin Belousov
8211bd45bc Close a race due to dropping of the map lock between creating map entry
for a shared mapping and marking the entry for inheritance.
Other thread might execute vmspace_fork() in between (e.g. by fork(2)),
resulting in the mapping becoming private.

Noted and reviewed by:	alc
MFC after:	1 week
2012-02-11 17:29:07 +00:00
Ed Schouten
7870adb640 Remove direct access to si_name.
Code should just use the devtoname() function to obtain the name of a
character device. Also add const keywords to pieces of code that need it
to build properly.

MFC after:	2 weeks
2012-02-10 12:35:57 +00:00
Alexander Motin
8f12d83ad9 Fix NULL dereference panic on attempt to turn off (on system shutdown)
disconnected swap device.

This is quick and imperfect solution, as swap device will still be opened
and GEOM will not be able to destroy it. Proper solution would be to
automatically turn off and close disconnected swap device, but with existing
code it will cause panic if there is at least one page on device, even if
it is unimportant page of the user-level process. It needs some work.

Reviewed by:	kib@
MFC after:	1 week
2012-02-01 20:12:44 +00:00
Kip Macy
263811f724 exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64
excluding other allocations including UMA now entails the addition of
a single flag to kmem_alloc or uma zone create

Reviewed by:	alc, avg
MFC after:	2 weeks
2012-01-27 20:18:31 +00:00
Nathan Whitehorn
8d01a3b281 Revert r212360 now that PowerPC can handle large sparse arguments to
pmap_remove() (changed in r228412).

MFC after:	2 weeks
2012-01-17 00:31:09 +00:00
Konstantin Belousov
5dda2db9c8 Change the type of the paging_in_progress refcounter from u_short to
u_int. With the auto-sized buffer cache on the modern machines, UFS
metadata can generate more the 65535 pages belonging to the buffers
undergoing i/o, overflowing the counter.

Reported and tested by:	jimharris
Reviewed by:	alc
MFC after:	1 week
2012-01-10 18:05:44 +00:00
Konstantin Belousov
e65919f9fc Do not restart the scan in vm_object_page_clean() on the object
generation change if requested mode is async. The object generation is
only changed when the object is marked as OBJ_MIGHTBEDIRTY. For async
mode it is enough to write each dirty page, not to make a guarantee that
all pages are cleared after the vm_object_page_clean() returned.

Diagnosed by:	truckman
Tested by:	flo
Reviewed by:	alc, truckman
MFC after:	2 weeks
2012-01-04 16:04:20 +00:00
Alan Cox
b5f359b7c3 Optimize vm_object_split()'s handling of reservations. 2011-12-28 20:27:18 +00:00
Konstantin Belousov
75ff604a78 Optimize the common case of msyncing the whole file mapping with
MS_SYNC flag. The system must guarantee that all writes are finished
before syscalls returned. Schedule the writes in async mode, which is
much faster and allows the clustering to occur. Wait for writes using
VOP_FSYNC(), since we are syncing the whole file mapping.

Potentially, the restriction to only apply the optimization can be
relaxed by not requiring that the mapping cover whole file, as it is
done by other OSes.

Reported and tested by:	 az
Reviewed by: alc
MFC after:   2 weeks
2011-12-23 09:09:42 +00:00
Konstantin Belousov
e878d99718 Move kstack_cache_entry into the private header, and make the
stack cache list header accessible outside vm_glue.c.

MFC after:	1 week
2011-12-16 10:56:16 +00:00
Eitan Adler
33fd7c5628 - The previous commit (r228449) accidentally moved the vm.stats.vm.* sysctls
to vm.stats.sys.  Move them back.

Noticed by:		pho
Reviewed by:	bde (earlier version)
Approved by:	bz
MFC after:		1 week
Pointy hat to:	me
2011-12-14 13:25:00 +00:00
Eitan Adler
3eb9ab5255 Document a large number of currently undocumented sysctls. While here
fix some style(9) issues and reduce redundancy.

PR:		kern/155491
PR:		kern/155490
PR:		kern/155489
Submitted by:	Galimov Albert <wtfcrap@mail.ru>
Approved by:	bde
Reviewed by:	jhb
MFC after:	1 week
2011-12-13 00:38:50 +00:00
Konstantin Belousov
134465d732 Fix printf.
Submitted by:	az
MFC after:	1 week
2011-12-12 10:04:04 +00:00
Alan Cox
c68c35372e Introduce vm_reserv_alloc_contig() and teach vm_page_alloc_contig() how to
use superpage reservations.  So, for the first time, kernel virtual memory
that is allocated by contigmalloc(), kmem_alloc_attr(), and
kmem_alloc_contig() can be promoted to superpages.  In fact, even a series
of small contigmalloc() allocations may collectively result in a promoted
superpage.

Eliminate some duplication of code in vm_reserv_alloc_page().

Change the type of vm_reserv_reclaim_contig()'s first parameter in order
that it be consistent with other vm_*_contig() functions.

Tested by:	marius (sparc64)
2011-12-05 18:29:25 +00:00
Konstantin Belousov
dc874f9881 Rename vm_page_set_valid() to vm_page_set_valid_range().
The vm_page_set_valid() is the most reasonable name for the m->valid
accessor.

Reviewed by:	attilio, alc
2011-11-30 17:39:00 +00:00
Konstantin Belousov
cf1911a9ad Hide the internals of vm_page_lock(9) from the loadable modules.
Since the address of vm_page lock mutex depends on the kernel options,
it is easy for module to get out of sync with the kernel.

No vm_page_lockptr() accessor is provided for modules. It can be added
later if needed, unless proper KPI is developed to serve the needs.

Reviewed by:	attilio, alc
MFC after:	3 weeks
2011-11-29 13:07:32 +00:00
Attilio Rao
9fde98bba3 Introduce the same mutex-wise fix in r227758 for sx locks.
The functions that offer file and line specifications are:
- sx_assert_
- sx_downgrade_
- sx_slock_
- sx_slock_sig_
- sx_sunlock_
- sx_try_slock_
- sx_try_xlock_
- sx_try_upgrade_
- sx_unlock_
- sx_xlock_
- sx_xlock_sig_
- sx_xunlock_

Now vm_map locking is fully converted and can avoid to know specifics
about locking procedures.
Reviewed by:	kib
MFC after:	1 month
2011-11-21 12:59:52 +00:00
Attilio Rao
ccdf233323 Introduce macro stubs in the mutex implementation that will be always
defined and will allow consumers, willing to provide options, file and
line to locking requests, to not worry about options redefining the
interfaces.
This is typically useful when there is the need to build another
locking interface on top of the mutex one.

The introduced functions that consumers can use are:
- mtx_lock_flags_
- mtx_unlock_flags_
- mtx_lock_spin_flags_
- mtx_unlock_spin_flags_
- mtx_assert_
- thread_lock_flags_

Spare notes:
- Likely we can get rid of all the 'INVARIANTS' specification in the
  ppbus code by using the same macro as done in this patch (but this is
  left to the ppbus maintainer)
- all the other locking interfaces may require a similar cleanup, where
  the most notable case is sx which will allow a further cleanup of
  vm_map locking facilities
- The patch should be fully compatible with older branches, thus a MFC
  is previewed (infact it uses all the underlying mechanisms already
  present).

Comments review by:	eadler, Ben Kaduk
Discussed with:		kib, jhb
MFC after:	1 month
2011-11-20 16:33:09 +00:00
Alan Cox
5ff276b7f4 Eliminate end-of-line white space. 2011-11-17 06:54:49 +00:00
Alan Cox
fbd80bd047 Refactor the code that performs physically contiguous memory allocation,
yielding a new public interface, vm_page_alloc_contig().  This new function
addresses some of the limitations of the current interfaces, contigmalloc()
and kmem_alloc_contig().  For example, the physically contiguous memory that
is allocated with those interfaces can only be allocated to the kernel vm
object and must be mapped into the kernel virtual address space.  It also
provides functionality that vm_phys_alloc_contig() doesn't, such as wiring
the returned pages.  Moreover, unlike that function, it respects the low
water marks on the paging queues and wakes up the page daemon when
necessary.  That said, at present, this new function can't be applied to all
types of vm objects.  However, that restriction will be eliminated in the
coming weeks.

From a design standpoint, this change also addresses an inconsistency
between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions.
Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other
functions in vm/vm_phys.c didn't.  Moreover, vm_phys_alloc_contig() knew
about vnodes and reservations.  Now, vm_page_alloc_contig() is responsible
for these things.

Reviewed by:	kib
Discussed with:	jhb
2011-11-16 16:46:09 +00:00
Konstantin Belousov
286790a7dd Update the device pager interface, while keeping the compatibility
layer for old KPI and KBI.  New interface should be used together with
d_mmap_single cdevsw method.

Device pager can be allocated with the cdev_pager_allocate(9)
function, which takes struct cdev_pager_ops, containing
constructor/destructor and page fault handler methods supplied by
driver.

Constructor and destructor, called at the pager allocation and
deallocation time, allow the driver to handle per-object private data.

The pager handler is called to handle page fault on the vm map entry
backed by the driver pager. Driver shall return either the vm_page_t
which should be mapped, or error code (which does not cause kernel
panic anymore). The page handler interface has a placeholder to
specify the access mode causing the fault, but currently PROT_READ is
always passed there.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc
MFC after:	1 month
2011-11-15 14:40:00 +00:00
Konstantin Belousov
bf277cf450 Remove the condition that is always true.
Submitted by:	alc
MFC after:	1 week
2011-11-15 14:09:53 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Alan Cox
c835bd16a8 Wake up the page daemon in vm_page_alloc_freelist() if it couldn't
allocate the requested page because too few pages are cached or free.

Document the VM_ALLOC_COUNT() option to vm_page_alloc() and
vm_page_alloc_freelist().

Make style changes to vm_page_alloc() and vm_page_alloc_freelist(),
such as using a variable name that more closely corresponds to the
comments.
2011-11-06 02:03:27 +00:00
Konstantin Belousov
7845becfe8 Remove redundand definitions. The chunk was missed from r227102.
MFC after:	2 weeks
2011-11-05 09:03:18 +00:00
Konstantin Belousov
561cc9fcb5 Provide typedefs for the type of bit mask for the page bits.
Use the defined types instead of int when manipulating masks.
Supposedly, it could fix support for 32KB page size in the
machine-independend VM layer.

Reviewed by:	alc
MFC after:	2 weeks
2011-11-05 08:20:32 +00:00
Alan Cox
2614c5c47c Simplify the implementation of the failure case in kmem_alloc_attr(). 2011-11-04 04:41:58 +00:00
John Baldwin
936c09ac0f Add the posix_fadvise(2) system call. It is somewhat similar to
madvise(2) except that it operates on a file descriptor instead of a
memory region.  It is currently only supported on regular files.

Just as with madvise(2), the advice given to posix_fadvise(2) can be
divided into two types.  The first type provide hints about data access
patterns and are used in the file read and write routines to modify the
I/O flags passed down to VOP_READ() and VOP_WRITE().  These modes are
thus filesystem independent.  Note that to ease implementation (and
since this API is only advisory anyway), only a single non-normal
range is allowed per file descriptor.

The second type of hints are used to hint to the OS that data will or
will not be used.  These hints are implemented via a new VOP_ADVISE().
A default implementation is provided which does nothing for the WILLNEED
request and attempts to move any clean pages to the cache page queue for
the DONTNEED request.  This latter case required two other changes.
First, a new V_CLEANONLY flag was added to vinvalbuf().  This requests
vinvalbuf() to only flush clean buffers for the vnode from the buffer
cache and to not remove any backing pages from the vnode.  This is
used to ensure clean pages are not wired into the buffer cache before
attempting to move them to the cache page queue.  The second change adds
a new vm_object_page_cache() method.  This method is somewhat similar to
vm_object_page_remove() except that instead of freeing each page in the
specified range, it attempts to move clean pages to the cache queue if
possible.

To preserve the ABI of struct file, the f_cdevpriv pointer is now reused
in a union to point to the currently active advice region if one is
present for regular files.

Reviewed by:	jilles, kib, arch@
Approved by:	re (kib)
MFC after:	1 month
2011-11-04 04:02:50 +00:00
Alan Cox
8393768074 Add support for VM_ALLOC_WIRED and VM_ALLOC_ZERO to vm_page_alloc_freelist()
and use these new options in the mips pmap.

Wake up the page daemon in vm_page_alloc_freelist() if the number of free
and cached pages becomes too low.

Tidy up vm_page_alloc_init().  In particular, add a comment about an
important restriction on its use.

Tested by:	jchandra@
2011-11-02 05:42:51 +00:00
Alan Cox
5c1f2cc4c2 Eliminate vm_phys_bootstrap_alloc(). It was a failed attempt at
eliminating duplicated code in the various pmap implementations.

Micro-optimize vm_phys_free_pages().

Introduce vm_phys_free_contig().  It is fast routine for freeing an
arbitrary number of physically contiguous pages.  In particular, it
doesn't require the number of pages to be a power of two.

Use "u_long" instead of "unsigned long".

Bruce Evans (bde@) has convinced me that the "boundary" parameters
to kmem_alloc_contig(), vm_phys_alloc_contig(), and
vm_reserv_reclaim_contig() should be of type "vm_paddr_t" and not
"u_long".  Make this change.
2011-10-30 05:06:14 +00:00
Alan Cox
1933a67cf4 Use "u_long" instead of "unsigned long". 2011-10-28 22:36:15 +00:00
Alan Cox
125b695b6e Tidy up the comment at the head of vm_page_alloc, and mention that the
returned page has the flag VPO_BUSY set.
2011-10-27 17:29:19 +00:00
Alan Cox
703dec68bf Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to
vm_page_alloc().  While I'm here, for the sake of consistency, always
specify the allocation class, such as VM_ALLOC_NORMAL, as the first of
the flags.
2011-10-27 16:39:17 +00:00
Alan Cox
f346986b76 contigmalloc(9) and contigfree(9) are now implemented in terms of other
more general VM system interfaces.  So, their implementation can now
reside in kern_malloc.c alongside the other functions that are declared
in malloc.h.
2011-10-27 02:52:24 +00:00
Alan Cox
9c60ca3238 Speed up vm_page_cache() and vm_page_remove() by checking for a few
common cases that can be handled in constant time.  The insight being
that a page's parent in the vm object's tree is very often its
predecessor or successor in the vm object's ordered memq.

Tested by:	jhb
MFC after:	10 days
2011-10-25 16:35:08 +00:00
Attilio Rao
2d5106600e VN_NRESERVLEVEL is used in this file but opt_vm is not included
thus the stub switch won't be correctly handled.
Include opt_vm.h.

Submitted by:	jeff
MFC after:	3 days
2011-10-22 22:00:35 +00:00
Konstantin Belousov
126b36a21e Control the execution permission of the readable segments for
i386 binaries on the amd64 and ia64 with the sysctl, instead of
unconditionally enabling it.

Reviewed by:	marcel
2011-10-15 12:35:18 +00:00
John Baldwin
9860134635 Fix a typo in a comment. 2011-10-14 11:48:32 +00:00
Marcel Moolenaar
5f81660285 In sys_obreak() and when compiling for amd64 or ia64, when the process
is ILP32 (i.e. i386) grant execute permissions by default. The JDK 1.4.x
depends on being able to execute from the heap on i386.
2011-10-13 16:20:10 +00:00
Gleb Smirnoff
8d689e042f Make memguard(9) capable to guard uma(9) allocations. 2011-10-12 18:08:28 +00:00
Konstantin Belousov
17514c1bd9 Style nit.
Submitted by:	jhb
MFC after:	2 weeks
2011-09-29 00:44:34 +00:00
Konstantin Belousov
2042bb377a Fix grammar.
Submitted by:	bf
MFC after:	2 weeks
2011-09-28 16:12:15 +00:00
Konstantin Belousov
abb9b935ca Use the trick of performing the atomic operation on the contained aligned
word to handle the dirty mask updates in vm_page_clear_dirty_mask().
Remove the vm page queue lock around vm_page_dirty() call in vm_fault_hold()
the sole purpose of which was to protect dirty on architectures which
does not provide short or byte-wide atomics.

Reviewed by:	alc, attilio
Tested by:	flo (sparc64)
MFC after:	2 weeks
2011-09-28 14:57:50 +00:00
Konstantin Belousov
005f609130 Use the explicitly-sized types for the dirty and valid masks.
Requested by:	attilio
Reviewed by:	alc
MFC after:	2 weeks
2011-09-28 14:51:28 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Konstantin Belousov
3407fefef6 Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by:    alc, attilio
Tested by:      marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by:	re (bz)
2011-09-06 10:30:11 +00:00
Konstantin Belousov
15523cf799 Update some comments in swap_pager.c.
Reviewed and most wording by:	alc
MFC after:	1 week
Approved by:	re (bz)
2011-08-22 20:44:18 +00:00
Konstantin Belousov
6e903bd0d6 Apply the limit to avoid the overflows in the radix tree subr_blist.c
after the conversion of the swap device size to the page size units,
not before. That lifts the limit on the usable swap partition size
from 32GB to 256GB, that is less depressing for the modern systems.

Submitted by:   Alexander V. Chernikov <melifaro ipfw ru>
Reviewed by:    alc
Approved by:	re (bz)
MFC after:      2 weeks
2011-08-22 11:18:47 +00:00
Robert Watson
a9d2f8d84f Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
Konstantin Belousov
d98d0ce27a - Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
to VPO_UNMANAGED (and also making the flag protected by the vm object
  lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
  VPO_UNMANAGED. As a consequence, pmap code now can use use just
  VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by:	alc
Tested by:	pho (x86, previous version), marius (sparc64),
    marcel (arm, ia64, powerpc), ray (mips)
Sponsored by:	The FreeBSD Foundation
Approved by:	re (bz)
2011-08-09 21:01:36 +00:00
Alan Cox
12f4b65fa6 Fix an error in kmem_alloc_attr(). Unless "tries" is updated,
kmem_alloc_attr() could get stuck in a loop.

Approved by:	re (kib)
MFC after:	3 days
2011-08-07 00:11:39 +00:00
Konstantin Belousov
dda4f96087 Implement the linprocfs swaps file, providing information about the
configured swap devices in the Linux-compatible format.

Based on the submission by:	Robert Millan <rmh debian org>
PR:	kern/159281
Reviewed by:	bde
Approved by:	re (kensmith)
MFC after:	2 weeks
2011-08-01 19:12:15 +00:00
Konstantin Belousov
339772b003 Fix a race in the device pager allocation. If another thread won and
allocated the device pager for the given handle, then the object
fictitious pages list and the object membership in the global object
list still need to be initialized. Otherwise, dev_pager_dealloc() will
traverse uninitialized pointers.

Reported and tested by: pho
Reviewed by:    jhb
Approved by:	re (kensmith)
MFC after:      1 week
2011-07-30 14:13:57 +00:00
Konstantin Belousov
2e32165ce0 Extract the code to translate VM error into errno, into an exported
function vm_mmap_to_errno(). It is useful for the drivers that implement
mmap(2)-like functionality, to be able to return error codes consistent
with mmap(2).

Sponsored by:	The FreeBSD Foundation
No objections from:	alc
MFC after:	1 week
2011-07-10 20:49:13 +00:00
Konstantin Belousov
3103730c82 Style.
MFC after:	3 days
2011-07-10 20:45:13 +00:00
Konstantin Belousov
2801687d56 Add a facility to disable processing page faults. When activated,
uiomove generates EFAULT if any accessed address is not mapped, as
opposed to handling the fault.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	alc (previous version)
2011-07-09 15:21:10 +00:00
Edward Tomasz Napierala
afcc55f318 All the racct_*() calls need to happen with the proc locked. Fixing this
won't happen before 9.0.  This commit adds "#ifdef RACCT" around all the
"PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order
to avoid useless locking/unlocking in kernels built without "options RACCT".
2011-07-06 20:06:44 +00:00
Attilio Rao
91a1929f07 Handle a race between device_pager and devsw in a more graceful manner:
return an error code rather than panic the kernel.

Sponsored by:	Sandvine Incorporated
Reviewed by:	kib
Tested by:	pho
MFC after:	2 weeks
2011-07-06 15:09:52 +00:00
Alan Cox
a8229fa37c Initialize marker pages as held rather than fictitious/wired. Marking the
page as held is more useful as a safety precaution in case someone forgets
to check for PG_MARKER.

Reviewed by:	kib
2011-07-02 23:34:47 +00:00
Alan Cox
6bbee8e28a Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings.  Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write().  It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by:	kib
2011-06-29 16:40:41 +00:00
Alan Cox
1bfec3dfb6 Revert to using the page queues lock in vm_page_clear_dirty_mask() on
MIPS.  (At present, although atomic_clear_char() is defined by atomic.h
on MIPS, it is not actually implemented by support.S.)
2011-06-23 05:23:59 +00:00