Commit Graph

3375 Commits

Author SHA1 Message Date
alc
8a01505f5e The M_ZERO can be eliminated from the uma_zalloc() call in
vm_radix_node_get() with a small change to vm_radix_reclaim_allnodes_int().
This change further reduced the average number of cycles per
vm_page_insert() call from 532 to 519.

Reviewed by:	attilio
Sponsored by:	EMC / Isilon Storage Division
2013-03-17 16:49:37 +00:00
alc
9e48bd7ba9 Most allocation of pages to objects proceeds from lower to higher
indices.  Consequentially, vm_page_insert() should use
vm_radix_lookup_le() instead of vm_radix_lookup_ge().  Here's why.  In
the expected case, vm_radix_lookup_le() will quickly find a page less
than the specified key at the same radix node.  In contrast,
vm_radix_lookup_ge() is expected to return NULL, but to do that it must
examine every slot in the radix tree that is greater than the key.

Prior to this change, the average cost of a vm_page_insert() call on my
test machine was 992 cycles.  After this change, the average cost is only
532 cycles, a reduction of 46%.

Reviewed by:	attilio
Sponsored by:	EMC / Isilon Storage Division
2013-03-17 16:23:19 +00:00
alc
b346e448af Simplify the interface to vm_radix_insert() by eliminating the parameter
"index".  The content of a radix tree leaf, or at least its "key", is not
opaque to the other radix tree operations.  Specifically, they know how to
extract the "key" from a leaf.  So, eliminating the parameter "index" isn't
breaking the abstraction.  Moreover, eliminating the parameter "index"
effectively prevents the caller from passing an inconsistent "index" and
leaf to vm_radix_insert().

Reviewed by:	attilio
Sponsored by:	EMC / Isilon Storage Division
2013-03-17 16:06:03 +00:00
attilio
a2e67affe3 Expand ambiguous comments some more.
Requested by:	alc
2013-03-17 15:27:26 +00:00
kib
7ca94eca24 Some style fixes.
Sponsored by:	The FreeBSD Foundation
2013-03-14 20:31:39 +00:00
kib
63efc821c3 Add pmap function pmap_copy_pages(), which copies the content of the
pages around, taking array of vm_page_t both for source and
destination.  Starting offsets and total transfer size are specified.

The function implements optimal algorithm for copying using the
platform-specific optimizations.  For instance, on the architectures
were the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used.  The code was
typically borrowed from the pmap_copy_page() for the same
architecture.

Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.

For sparc64, the existing code has known issues and a stab is added
instead, to allow the kernel linking.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after:	2 weeks
2013-03-14 20:18:12 +00:00
kib
51407f194b Remove excessive and inconsistent initializers for the various kernel
maps and submaps.

MFC after:	2 weeks
2013-03-14 19:50:09 +00:00
attilio
07b5846fc9 Fix compilation.
Sponsored by:	EMC / Isilon storage division
2013-03-13 01:38:32 +00:00
attilio
3b0a5f0419 Use the _KERNEL protectors.
Sponsored by:	EMC / Isilon storage division
Requested by:	alc
2013-03-13 01:02:11 +00:00
attilio
02cf10e6db Add a further safety belt to prevent inconsistencies.
Sponsored by:	EMC / Isilon storage division
Submitted by:	alc
2013-03-13 01:00:34 +00:00
attilio
ba43ac477b For uniformity, use the user provided index.
Sponsored by:	EMC / Isilon storage division
Reviewed and reported by:	alc
2013-03-13 00:41:37 +00:00
attilio
3c52979cb4 MFC 2013-03-12 13:26:12 +00:00
attilio
45af7dd4e7 Simplify vm_page_is_valid().
Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
2013-03-12 12:20:49 +00:00
alc
07fc599921 When transferring the page from one object to another, don't insert the
page into its new object until the page's pindex has been updated.
Otherwise, one code path within vm_radix_insert() may use the wrong
pindex value.

Sponsored by:	EMC / Isilon Storage Division
2013-03-12 06:14:31 +00:00
attilio
32a3275e77 MFC 2013-03-11 10:49:02 +00:00
alc
2c9c761886 Introduce vm_radix_is_empty(), and use it in place of
vm_object_cache_is_empty() where the caller is aware of the page cache's
implementation as a radix trie.

Sponsored by:	EMC / Isilon Storage Division
2013-03-10 17:30:57 +00:00
alc
854d4fd5e6 Update a comment: The object lock is no longer a mutex. 2013-03-09 21:32:24 +00:00
attilio
76954ad68a Merge from vmcontention. 2013-03-09 03:19:53 +00:00
attilio
16a80466e5 MFC 2013-03-09 02:51:51 +00:00
attilio
72f7f3e528 Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
  - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
  - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
  - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
  - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
    (in order to avoid visibility of implementation details)
  - The read-mode operations are added:
    VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
    VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
  sys/mutex.h in consumers directly to cater its inlining functions
  using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
  consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
  the compat layer because the name clash between FreeBSD and solaris
  versions must be avoided.
  At this purpose zfs redefines the vm_object locking functions
  directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit.  Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff
Reviewed by:	pjd (ZFS specific review)
Discussed with:	alc
Tested by:	pho
2013-03-09 02:32:23 +00:00
attilio
754f3790b8 Merge from vmc-playground:
Introduce a new KPI that verifies if the page cache is empty for a
specified vm_object.  This KPI does not make assumptions about the
locking in order to be used also for building assertions at init and
destroy time.
It is mostly used to hide implementation details of the page cache.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff
Reviewed by:	alc (vm_radix based version)
Tested by:	flo, pho, jhb, davide
2013-03-09 02:05:29 +00:00
attilio
7fd2627275 MFC 2013-03-09 01:39:42 +00:00
andre
adea04bda7 Move the callout subsystem initialization to its own SYSINIT()
from being indirectly called via cpu_startup()+vm_ksubmap_init().
The boot order position remains the same at SI_SUB_CPU.

Allocation of the callout array is changed to stardard kernel malloc
from a slightly obscure direct kernel_map allocation.

kern_timeout_callwheel_alloc() is renamed to callout_callwheel_init()
to better describe its purpose.
kern_timeout_callwheel_init() is removed simplifying the per-cpu
initialization.

Reviewed by:	davide
2013-03-08 10:37:17 +00:00
attilio
bf1dc90446 MFC 2013-03-08 00:03:07 +00:00
attilio
82aa86d64f Improve comments.
Sponsored by:	EMC / Isilon storage division
Submitted by:	mdf
2013-03-07 23:37:10 +00:00
attilio
1be810ec73 MFC 2013-03-04 13:14:59 +00:00
attilio
e5bdd2f06e Merge from vmcontention:
As vm objects are type-stable there is no need to initialize the
resident splay tree pointer and the cache splay tree pointer in
_vm_object_allocate() but this could be done in the init UMA zone
handler.

The destructor UMA zone handler, will further check if the condition is
retained at every destruction and catch for bugs.

Sponsored by:	EMC / Isilon storage division
Submitted by:	alc
2013-03-04 13:10:59 +00:00
attilio
709ad55889 Evaluations on the likelyhood of empty object cache cannot be made in
general way but must be evaluated case by case.
Embedd the decision in the caller themselves rather than in a
general purpose KPI.

Sponsored by:	EMC / Isilon storage division
Reported by:	alc
Reviewed by:	alc
2013-03-04 12:33:40 +00:00
alc
a8671df14b Fix a typo.
Sponsored by:	EMC / Isilon Storage Division
2013-03-04 07:25:11 +00:00
alc
a855741cf1 A Boolean is more appropriate than an int here. Use what I think is a
slightly better variable name.

Sponsored by:	EMC / Isilon Storage Division
2013-03-04 07:20:59 +00:00
alc
c3be5353b8 Make a pass over most of the comments. 2013-03-04 07:11:10 +00:00
alc
475367da61 Simplify Boolean expressions.
Sponsored by:	EMC / Isilon Storage Division
2013-03-04 06:26:25 +00:00
alc
5094368613 Fix spelling.
Sponsored by:	EMC / Isilon Storage Division
2013-03-04 06:13:26 +00:00
attilio
60e39c95b8 Remove the boot-time cache support and rely on UMA boot-time slab cache
for allocating the nodes before to have the possibility to carve
directly from the UMA subsystem.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
2013-03-04 00:07:23 +00:00
alc
f661d6e522 We don't need to reinitialize the root of the page cache trie on every
vm object allocation.  We can, instead, rely on the type stability of
the vm object zone.  (Note that we already assert that the page cache
trie is empty in the vm object zone destructor.)

Sponsored by:	EMC / Isilon Storage Division
2013-03-03 20:37:27 +00:00
alc
317a9584fb Two out of three times that vm_page_find_least() is called, it's going to
return the vm object's first page.  In those cases, there is no need to
traverse the trie.

Sponsored by:	EMC / Isilon Storage Division
2013-03-03 01:36:31 +00:00
attilio
a345907061 Merge from vmcontention 2013-03-03 01:10:49 +00:00
attilio
c53a782d3a MFC 2013-03-03 01:06:24 +00:00
alc
2322e91e7c Revert white space change in the previous commit.
Requested by:	attilio
2013-03-02 18:27:51 +00:00
alc
c5b028cc14 Assert that the trie is empty when a vm object is destroyed.
Since vm objects are allocated from type-stable memory, we don't need to
initialize the trie's root in _vm_object_allocate() on every vm object
allocation.  We can instead do it once in vm_object_zinit().

We don't need to call vm_radix_reclaim_allnodes() in vm_object_terminate()
unless the resident page count is non-zero.

Reviewed by:	attilio
Sponsored by:	EMC / Isilon Storage Division
2013-03-02 18:18:30 +00:00
alc
90d4aeb975 The value held by the vm object's field pg_color is only considered
valid if the flag OBJ_COLORED is set.  Since _vm_object_allocate()
doesn't set this flag, it needn't initialize pg_color.

Sponsored by:	EMC / Isilon Storage Division
2013-03-02 18:07:29 +00:00
attilio
e98f58faf6 MFC 2013-03-02 14:48:41 +00:00
attilio
89979cd218 Merge from vmcontention 2013-03-02 14:35:15 +00:00
attilio
17028bb6ae MFC 2013-03-02 14:28:31 +00:00
pjd
f07ebb8888 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
attilio
31dffb7b33 Merge from vmcontention 2013-02-27 18:25:57 +00:00
attilio
6ff1954532 MFC 2013-02-27 18:23:12 +00:00
attilio
8d28f94790 Merge from vmobj-rwlock:
VM_OBJECT_LOCKED() macro is only used to implement a custom version
of lock assertions right now (which likely spread out thanks to
copy and paste).
Remove it and implement actual assertions.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2013-02-27 18:12:13 +00:00
attilio
c74a3afc6a Fix compiling. 2013-02-26 23:54:17 +00:00
attilio
74f58faa15 MFC 2013-02-26 23:52:23 +00:00
attilio
cd86838830 Merge from vmcontention 2013-02-26 23:46:19 +00:00
attilio
cbe7c0e167 MFC 2013-02-26 23:43:28 +00:00
attilio
cc89d0bd92 Merge from vmc-playground branch:
Replace the sub-optimal uma_zone_set_obj() primitive with more modern
uma_zone_reserve_kva().  The new primitive reserves before hand
the necessary KVA space to cater the zone allocations and allocates pages
with ALLOC_NOOBJ.  More specifically:
- uma_zone_reserve_kva() does not need an object to cater the backend
  allocator.
- uma_zone_reserve_kva() can cater M_WAITOK requests, in order to
  serve zones which need to do uma_prealloc() too.
- When possible, uma_zone_reserve_kva() uses directly the direct-mapping
  by uma_small_alloc() rather than relying on the KVA / offset
  combination.

The removal of the object attribute allows 2 further changes:
1) _vm_object_allocate() becomes static within vm_object.c
2) VM_OBJECT_LOCK_INIT() is removed.  This function is replaced by
   direct calls to mtx_init() as there is no need to export it anymore
   and the calls aren't either homogeneous anymore: there are now small
   differences between arguments passed to mtx_init().

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc (which also offered almost all the comments)
Tested by:	pho, jhb, davide
2013-02-26 23:35:27 +00:00
attilio
726aa55a61 Merge from vmcontention 2013-02-26 21:17:38 +00:00
attilio
9d00dd1afe MFC 2013-02-26 21:13:09 +00:00
attilio
820ab571ec MFC 2013-02-26 21:09:35 +00:00
attilio
5a60eaa26c Remove white spaces.
Sponsored by:	EMC / Isilon storage division
2013-02-26 20:35:40 +00:00
attilio
43aa55b4cd Revert the moving of vm_object objects initialization:
the objects zone ensures type-stability and thus we want to execute
actual lock initialization only when the objects are brought into the
zone otherwise there could be races between lock threads doing
re-initilization and other threads that want to acquire the lock
without a reference.

Sponsored by:	EMC / Isilon storage division
Reported by:	alc
2013-02-26 20:18:25 +00:00
attilio
210a93e7f7 Merge from vmcontention 2013-02-26 18:18:39 +00:00
attilio
134623836d MFC 2013-02-26 18:11:43 +00:00
attilio
afe5ce0c13 MFC 2013-02-26 17:33:18 +00:00
attilio
49f99b7251 Wrap the sleeps synchronized by the vm_object lock into the specific
macro VM_OBJECT_SLEEP().
This hides some implementation details like the usage of the msleep()
primitive and the necessity to access to the lock address directly.
For this reason VM_OBJECT_MTX() macro is now retired.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2013-02-26 17:22:08 +00:00
alc
c5315c03fb Update a comment: noobj_alloc() has replaced obj_alloc(), but it doesn't
really make sense for this comment to name specific backend allocators,
instead simply refer to backend allocators.

Sponsored by:	EMC / Isilon Storage Division
2013-02-26 06:38:00 +00:00
attilio
e590e8091e As VM_OBJECT_SLEEP() is a vm_object_t specific function, make
the passed object as the first argument of the function for consistency.

Sponsored by:	EMC / Isilon storage revision
2013-02-26 01:38:12 +00:00
attilio
fc0ecac2f8 Revert wrongly added asserts: lookup and remove from the collection
of cached pages doesn't require the object lock to be held.

Sponsored by:	EMC / Isilon storage division
2013-02-26 00:34:52 +00:00
alc
26f238c055 Revise the comment describing uma_zone_reserve_kva().
Sponsored by:	EMC / Isilon Storage Division
Reviewed by:	attilio
2013-02-26 00:18:50 +00:00
attilio
343c9f6f19 Missing semicolon.
Sponsored by:	EMC / Isilon storage division
Submitted by:	alc
Pointy hat to:	me
2013-02-24 19:10:16 +00:00
attilio
1a753217f3 Simplify return logic.
Sponsored by:	EMC / Isilon storage division
Submitted by:	alc
2013-02-24 19:05:11 +00:00
attilio
69d25b60d5 Merge from vmcontention 2013-02-24 17:11:10 +00:00
attilio
cff31deb1a MFC 2013-02-24 16:50:53 +00:00
attilio
12289fcebc Retire the old UMA primitive uma_zone_set_obj() and replace it with the
more modern uma_zone_reserve_kva(). The difference is that it doesn't
rely anymore on an obj to allocate pages and the slab allocator doesn't
use any more any specific locking but atomic operations to complete
the operation.
Where possible, the uma_small_alloc() is instead used and the uk_kva
member becomes unused.

The subsequent cleanups also brings along the removal of
VM_OBJECT_LOCK_INIT() macro which is not used anymore as the code
can be easilly cleaned up to perform a single mtx_init(), private
to vm_object.c.
For the same reason, _vm_object_allocate() becomes private as well.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
2013-02-24 16:41:36 +00:00
attilio
6b1291b4d1 Do not call vm_radix_lookup_ge() in the reservation system unless
it is absolutely necessary.

Sponsored by:	EMC / Isilon storage division
Submitted by:	alc
2013-02-24 16:10:43 +00:00
attilio
f6d331e804 Fix an inverted check that was reporting indexes wrongly detected
as wrapped.

Sponsored by:	EMC / Isilon storage divison
Reported by:	alc
2013-02-24 16:08:37 +00:00
alc
96feae12e9 Correctly assert that no page already exists at the offset within the
object that is currently being allocated.

Sponsored by:	EMC / Isilon Storage Division
2013-02-23 19:28:31 +00:00
attilio
8702b26c68 Complete the asserts by definining also assertions for
RA_RLOCKED and RA_LOCKED cases.

Sponsored by:	EMC / Isilon storage division
Requested by:	alc
2013-02-21 21:56:51 +00:00
attilio
905e648d42 Hide the details for the assertion for VM_OBJECT_LOCK operations.
Rename current VM_OBJECT_LOCK_ASSERT(foo, RA_WLOCKED) into
VM_OBJECT_ASSERT_WLOCKED(foo)

Sponsored by:	EMC / Isilon storage division
Requested by:	alc
2013-02-21 21:54:53 +00:00
attilio
b2afca4987 Add read mode operations to VM_OBJECT_LOCK* class of functions.
Sponsored by:	EMC / Isilon storage division
2013-02-20 12:06:33 +00:00
attilio
15bf891afe Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to
their "write" versions.

Sponsored by:	EMC / Isilon storage division
2013-02-20 12:03:20 +00:00
attilio
1f1e13ca03 There is no need to use VM_OBJECT_LOCKED() as the assertion won't
make the check available in any case if INVARIANTS is switched off.
Remove VM_OBJECT_LOCKED().
2013-02-20 10:51:34 +00:00
attilio
6a2e2ce522 Remove unused VM_OBJECT_LOCKPTR().
Sponsored by:	EMC / Isilon storage division
2013-02-20 10:40:27 +00:00
attilio
658534ed5a Switch vm_object lock to be a rwlock.
* VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
* VM_OBJECT_SLEEP() is introduced as a general purpose primitve to
  get a sleep operation using a VM_OBJECT_LOCK() as protection
* The approach must bear with vm_pager.h namespace pollution so many
  files require including directly rwlock.h
2013-02-20 10:38:34 +00:00
alc
88b6705ed6 On arm, like sparc64, the end of the kernel map varies from one type of
machine to another.  Therefore, VM_MAX_KERNEL_ADDRESS can't be a constant.
Instead, #define it to be a variable, vm_max_kernel_address, just like we
do on sparc64.

Reviewed by:	kib
Tested by:	ian
2013-02-18 01:02:48 +00:00
attilio
47d4527f8f Remove an unuseful check as looking up into an empty trie should be
as fast as checking a NULL ptr.

Sponsored by:	EMC / Isilon storage division
2013-02-15 17:22:57 +00:00
attilio
ea8f6ef283 Remove whitespace. 2013-02-15 17:21:41 +00:00
attilio
ebcaab64ea Merge from vmcontention 2013-02-15 16:11:30 +00:00
attilio
b4e24f9126 MFC 2013-02-15 16:08:08 +00:00
attilio
1cf2f4550b On arches with VM_PHYSSEG_DENSE the vm_page_array is larger than
the actual number of vm_page_t that will be derived, so v_page_count
should be used appropriately.

Besides that, add a panic condition in case UMA fails to properly
restrict the area in a way to keep all the desired objects.

Sponsored by:	EMC / Isilon storage division
Reported by:	alc
2013-02-15 16:05:18 +00:00
attilio
757b950804 Remove unused headers. 2013-02-15 15:34:19 +00:00
attilio
daa1f2caab Fix comment. 2013-02-15 14:54:09 +00:00
attilio
fa12391493 Move the radix node zone destructor definition closer to
vm_radix_init() definition.

Sponsored by:	EMC / Isilon storage division
2013-02-15 14:53:42 +00:00
attilio
be627ca24c - When panicing for "too small boot cache" reason, print the actual
cache size value
- Add a way to specify the size of the boot cache at compile time

Sponsored by:	EMC / Isilon storage division
2013-02-15 14:50:36 +00:00
attilio
47ecbcf556 Improve dynamic branch prediction and i-cache utilization:
- Use predict_false() to tag boot-time cache decisions
- Compact boot-time cache allocation into a separate, non-inline,
  function that won't be called most of the times.

Sponsored by:	EMC / Isilon storage division
2013-02-15 14:48:06 +00:00
attilio
af55a9fb46 - Fix style in vm_page_lookup(): there is no whiteline between
assertions and other code in this file.
- Reinsert some comments that were lost during the work but which are
  actual yet, reducing differences with HEAD.

Sponsoed by:	EMC / Isilon storage division
2013-02-15 12:35:16 +00:00
jhb
ca6ecf3ea5 Make VM_NDOMAIN a kernel option so that it can be enabled from a kernel
config file.

Requested by:	phk (ages ago)
MFC after:	1 month
2013-02-14 19:38:04 +00:00
attilio
642ba82fdf Remove an unuseful check on resident_page_count.
vm_radix_lookup_ge() of an empty trie is as fast as checking a
NULL pointer.
2013-02-14 15:25:31 +00:00
attilio
908e129569 Fix style. 2013-02-14 15:24:13 +00:00
attilio
eafe26c8a6 The radix preallocation pages can overfow the biggestone segment, so
use a different scheme for preallocation: reserve few KB of nodes to be
used to cater page allocations before the memory can be efficiently
pre-allocated by UMA.

This at all effects remove boot_pages further carving and along with
this modifies to the boot_pages allocation system and necessity to
initialize the UMA zone before pmap_init().

Reported by:	pho, jhb
2013-02-14 15:23:00 +00:00
attilio
3db337c2ea Grammar.
Sponsored by:	EMC / Isilon storage division
2013-02-13 02:04:49 +00:00
attilio
53f78d1a7d Implement a new algorithm for managing the radix trie which also
includes path-compression. This greatly helps with sparsely populated
tries, where an uncompressed trie may end up by having a lot of
intermediate nodes for very little leaves.

The new algorithm introduces 2 main concepts: the node level and the
node owner.  Every node represents a branch point where the leaves share
the key up to the level specified in the node-level (current level
excluded, of course).  Such key partly shared is the one contained in
the owner.  Of course, the root branch is exempted to keep a valid
owner, because theoretically all the keys are contained in the space
designed by the root branch node.  The search algorithm seems very
intuitive and that is where one should start reading to understand the
full approach.

In the end, the algorithm ends up by demanding only one node per insert
and this is not necessary in all the cases.  To stay safe, we basically
preallocate as many nodes as the number of physical pages are in the
system, using uma_preallocate().  However, this raises 2 concerns:
* As pmap_init() needs to kmem_alloc(), the nodes must be pre-allocated
  when vm_radix_init() is currently called, which is much before UMA
  is fully initialized.  This means that uma_prealloc() will dig into the
  UMA_BOOT_PAGES pool of pages, which is often not enough to keep track
  of such large allocations.
  In order to fix this, change a bit the concept of UMA_BOOT_PAGES and
  vm.boot_pages. More specifically make the UMA_BOOT_PAGES an initial "value"
  as long as vm.boot_pages and extend the boot_pages physical area by as
  many bytes as needed with the information returned by
  vm_radix_allocphys_size().
* A small amount of pages will be held in per-cpu buckets and won't be
  accessible from curcpu, so the vm_radix_node_get() could really panic
  when the pre-allocation pool is close to be exhausted.
  In theory we could pre-allocate more pages than the number of physical
  frames to satisfy such request, but as many insert would happen without
  a node allocation anyway, I think it is safe to assume that the
  over-allocation is already compensating for such problem.
  On the field testing can stand me correct, of course.  This could be
  further helped by the case where we allow a single-page insert to not
  require a complete root node.

The use of pre-allocation gets rid all the non-direct mapping trickery
and introduced lock recursion allowance for vm_page_free_queue.

The nodes children are reduced in number from 32 -> 16 and from 16 -> 8
(for respectively 64 bits and 32 bits architectures).
This would make the children to fit into cacheline for amd64 case,
for example, and in general spawn less cacheline, which may be
helpful in lookup_ge() case.
Also, path-compression cames to help in cases where there are many levels,
making the fallouts of such change less hurting.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff (partially)
Tested by:	flo
2013-02-13 01:19:31 +00:00
attilio
87d8d2eec7 Fix style. 2013-02-10 16:00:14 +00:00
attilio
8743c2878e Fix wrong object reference.
Sponsored by:	EMC / Isilon Storage Division
2013-02-10 01:30:13 +00:00
attilio
25a17068be Remove implementation specific comments from a public interface. 2013-02-07 15:13:35 +00:00
attilio
702feea4c3 Correctly complete r246474. 2013-02-07 15:08:35 +00:00
attilio
5ab232ef16 Strengten checks. 2013-02-07 15:06:45 +00:00
attilio
59f669fb82 Style. 2013-02-07 15:06:04 +00:00
attilio
12b7890c27 Reduce differences with HEAD. 2013-02-07 11:36:34 +00:00
attilio
950536d7a4 Reformat comments to follow original version and re-add correct
locking flags.
2013-02-06 23:48:04 +00:00
attilio
86cff934d5 Do not assume the lock to be held so that this can be used also in
safe cases as a short-cut.
2013-02-06 19:03:48 +00:00
attilio
9066f231e3 Tweak comment to remove splay tree references. 2013-02-06 19:02:46 +00:00
attilio
1bab15985e Make vm_object_cache_is_empty() inline. 2013-02-06 18:59:34 +00:00
attilio
4c22b4bafe Cleanup vm_radix KPI:
- Avoid the return value for vm_radix_insert()
- Name the functions argument per-style(9)
- Avoid to get and return opaque objects but use vm_page_t as vm_radix is
  thought to not really be general code but to cater specifically page
  cache and resident cache.
2013-02-06 18:37:46 +00:00
attilio
3bbca3bce2 Fixup r246423 by adding vm_radix.h includes where it is not present
currently.
2013-02-06 18:33:32 +00:00
attilio
c02c27a33d Avoid a namespace pollution in vm_object.h by defining separately the
structure for vm_radix implementation.
2013-02-06 18:04:28 +00:00
attilio
d3fb98bfb4 Enrich comments on newly added assertions. 2013-02-06 17:47:24 +00:00
attilio
abbe2a9b91 - Move the vm_object_cache_is_empty() prototype to be sorted
alphabetically.
- Change the return type to be boolean_t in order to match what
  vm_page_is_cached() does.
2013-02-06 17:27:41 +00:00
attilio
cde4f0caa2 Fix mismerge. 2013-02-06 17:22:16 +00:00
attilio
22b0de04b3 Reduce diffs against HEAD. 2013-02-06 17:17:11 +00:00
attilio
9d88c5279c Now that vm_page_cache_free() and vm_page_cache_transfer() are
reimplemented as ranged operations, sync vm_page_is_cached() semantic
with HEAD.
2013-02-06 14:50:34 +00:00
attilio
44f85cd1a5 Reduce diffs against HEAD:
Reimplement vm_page_cache_free() as a range operation.
2013-02-06 14:29:05 +00:00
attilio
439c0b8cf1 Reduce diffs against HEAD:
- Reimplement vm_page_cache_transfer() properly
- Remove vm_page_cache_rename() as a subsequent change
2013-02-05 00:09:33 +00:00
attilio
62f53da2e7 Merge from vmcontention 2013-02-04 22:15:36 +00:00
attilio
d3b7ec3a08 MFC 2013-02-04 22:10:01 +00:00
attilio
d61cd60feb Reduce differences with HEAD. 2013-02-04 22:05:22 +00:00
attilio
b972b67ed7 Merge from vmcontention 2013-02-04 15:44:42 +00:00
marius
790d2fce4f Try to improve r242655 take III: move these SYSCTLs describing the kernel
map, which is defined and initialized in vm/vm_kern.c, to the latter.

Submitted by:	alc
2013-02-04 09:35:48 +00:00
attilio
b134f527dc Detect address wrapup without defining the right boundary. 2013-02-04 08:53:51 +00:00
attilio
0d3b58aee0 MFC 2013-02-03 20:13:33 +00:00
glebius
e3e319a0b6 Fix typo in debug printf. 2013-01-29 19:06:16 +00:00
zont
b5edc96a84 - Add system wide page faults requiring I/O counter.
Reviewed by:	alc
MFC after:	2 weeks
2013-01-28 12:54:53 +00:00
zont
875b69507c - Add sysctls to show number of stats scans.
MFC after:	2 weeks
2013-01-28 12:20:20 +00:00
zont
b3905d7835 - Style.
MFC after:	2 weeks
2013-01-28 12:08:29 +00:00
zont
ee65990ea4 - Get rid of unused function vmspace_wired_count().
Reviewed by:	alc
Approved by:	kib (mentor)
MFC after:	1 week
2013-01-14 12:12:56 +00:00
zont
3b71bce613 - Improve readability of sys_obreak().
Suggested by:	alc
Reviewed by:	alc
Approved by:	kib (mentor)
MFC after:	1 week
2013-01-11 09:58:35 +00:00
zont
d2863e4c68 - Reduce kernel size by removing unnecessary pointer indirections.
GENERIC kernel size reduced in 16 bytes and RACCT kernel in 336 bytes.

Suggested by:	alc
Reviewed by:	alc
Approved by:	kib (mentor)
MFC after:	1 week
2013-01-10 12:43:58 +00:00
attilio
f458bac614 Remove vm_radix_lookupn() and its usage in the kernel. 2013-01-10 12:30:58 +00:00
ken
1abc90f894 Fix a bug in the device pager code that can trigger an assertion
in devfs if a particular race condition is hit in the device pager
code.

This was a side effect of change 227530 which changed the device
pager interface to call a new destructor routine for the cdev.
That destructor routine, old_dev_pager_dtor(), takes a VM object
handle.

The object handle is cast to a struct cdev *, and passed into
dev_rel().

That works in most cases, except the case in cdev_pager_allocate()
where there is a race condition between two threads allocating an
object backed by the same device.  The loser of the race
deallocates its object at the end of the function.

The problem is that before inserting the object into the
dev_pager_object_list, the object's handle is changed from the
struct cdev pointer to the object's own address.  This is to avoid
conflicts with the winner of the race, which already inserted an
object in the list with a handle that is a pointer to the same cdev
structure.

The object is then passed to vm_object_deallocate(), and eventually
makes its way down to old_dev_pager_dtor().  That function passes
the handle pointer (which is actually a VM object, not a struct
cdev as usual) into dev_rel().  dev_rel() decrements the reference
count in the assumed struct cdev (which happens to be 0), and
that triggers the assertion in dev_rel() that the reference count
is greater than or equal to 0.

The fix is to add a cdev pointer to the VM object, and use that
pointer when calling the cdev_pg_dtor() routine.

vm_object.h:	Add a struct cdev pointer to the VM object
		structure.

device_pager.c:	In cdev_pager_allocate(), populate the new cdev
		pointer.

		In dev_pager_dealloc(), use the new cdev pointer
		when calling the object's cdev_pg_dtor() routine.

Reviewed by:	kib
Sponsored by:	Spectra Logic Corporation
MFC after:	1 week
2013-01-09 16:48:38 +00:00
attilio
fcadd67d75 MFC 2012-12-26 08:20:27 +00:00
glebius
1292747048 Comment fix: there is no ub_ptr, instead explain meaning of uz_count
field verbally.
2012-12-21 10:09:45 +00:00
zont
15b694913e - Fix locked memory accounting for maps with MAP_WIREFUTURE flag.
- Add sysctl vm.old_mlock which may turn such accounting off.

Reviewed by:	avg, trasz
Approved by:	kib (mentor)
MFC after:	1 week
2012-12-18 07:35:01 +00:00
attilio
be719e9167 MFC 2012-12-11 00:07:19 +00:00
alc
02094caa2c In the past four years, we've added two new vm object types. Each time,
similar changes had to be made in various places throughout the machine-
independent virtual memory layer to support the new vm object type.
However, in most of these places, it's actually not the type of the vm
object that matters to us but instead certain attributes of its pages.
For example, OBJT_DEVICE, OBJT_MGTDEVICE, and OBJT_SG objects contain
fictitious pages.  In other words, in most of these places, we were
testing the vm object's type to determine if it contained fictitious (or
unmanaged) pages.

To both simplify the code in these places and make the addition of future
vm object types easier, this change introduces two new vm object flags
that describe attributes of the vm object's pages, specifically, whether
they are fictitious or unmanaged.

Reviewed and tested by:	kib
2012-12-09 00:32:38 +00:00
pjd
fc89492084 White-space cleanups. 2012-12-08 09:23:05 +00:00
pjd
a585ca9ec8 Implemented uma_zone_set_warning(9) function that sets a warning, which
will be printed once the given zone becomes full and cannot allocate an
item. The warning will not be printed more often than every five minutes.

All UMA warnings can be globally turned off by setting sysctl/tunable
vm.zone_warnings to 0.

Discussed on:	arch
Obtained from:	WHEEL Systems
MFC after:	2 weeks
2012-12-07 22:27:13 +00:00
alc
793b12af79 Add support for the (relatively) new object type OBJT_MGTDEVICE to
vm_object_set_memattr().  Also, add a "safety belt" so that
vm_object_set_memattr() doesn't silently modify undefined object types.

Reviewed by:	kib
MFC after:	10 days
2012-11-28 18:29:34 +00:00
alc
e44badfb9f Make a few small changes to vm_map_pmap_enter():
Add detail to the comment describing this function.  In particular,
describe what MAP_PREFAULT_PARTIAL does.

Eliminate the abrupt change in behavior when the specified address range
grows from MAX_INIT_PT pages to MAX_INIT_PT plus one pages.  Instead of
doing nothing, i.e., preloading no mappings whatsoever, map any resident
pages that fall within the start of the specified address range, i.e.,
[addr, addr + ulmin(size, ptoa(MAX_INIT_PT))).

Long ago, the vm object's list of resident pages was not ordered, so
this function had to choose between probing the global hash table of
all resident pages and iterating over the vm object's unordered list of
resident pages.  Now, the list is ordered, so there is no reason for
MAP_PREFAULT_PARTIAL to be concerned with the vm object's count of
resident changes.

MFC after:	14 days
2012-11-25 19:42:36 +00:00
alc
e12b2ad698 Correct an error in r230623. When both VM_ALLOC_NODUMP and VM_ALLOC_ZERO
were specified to vm_page_alloc(), PG_NODUMP wasn't being set on the
allocated page when it happened to be pre-zeroed.

MFC after:	5 days
2012-11-21 06:26:18 +00:00
jh
25dd09b996 - Don't pass geom and provider names as format strings.
- Add __printflike() attributes.
- Remove an extra argument for the g_new_geomf() call in swapongeom_ev().

Reviewed by:	pjd
2012-11-20 12:32:18 +00:00
alc
1284a0383d Update a comment to reflect the elimination of the hold queue in r242300. 2012-11-17 04:00:19 +00:00
kib
bc5bfde14d Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h
to vm/vm_phys.h, where it belongs.

Requested and reviewed by:	alc
MFC after:	2 weeks
2012-11-16 05:55:56 +00:00
kib
75f2aa672f Explicitely state that M_USE_RESERVE requires M_NOWAIT, using assertion.
Reviewed by:	alc
MFC after:	2 weeks
2012-11-16 05:49:56 +00:00
kib
e8ae50d444 Flip the semantic of M_NOWAIT to only require the allocation to not
sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.

Allow the caller of malloc* to request the 'deep drain' semantic by
providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT
class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM
allocation class.

Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().

Discussion started by:	"Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by:	alc, mdf (previous version)
Tested by:	pho (previous version)
MFC after:	2 weeks
2012-11-14 20:01:40 +00:00
alc
ff7333d33f Replace the single, global page queues lock with per-queue locks on the
active and inactive paging queues.

Reviewed by:	kib
2012-11-13 02:50:39 +00:00
attilio
7efc7fb950 Fix DDB command "show map XXX":
- Check that an argument is always available, otherwise current map
  printing before to recurse is garbage.
- Spit out a message if an argument is not provided.
- Remove unread nlines variable.
- Use an explicit recursive function, disassociated from the
  DB_SHOW_COMMAND() body, in order to make clear prototype and recursion
  of the above mentioned function.  The code results now much less
  obscure.

Submitted by:	gianni
2012-11-12 00:30:40 +00:00
kib
f16ea99007 The r241025 fixed the case when a binary, executed from nullfs mount,
was still possible to open for write from the lower filesystem.  There
is a symmetric situation where the binary could already has file
descriptors opened for write, but it can be executed from the nullfs
overlay.

Handle the issue by passing one v_writecount reference to the lower
vnode if nullfs vnode has non-zero v_writecount.  Note that only one
write reference can be donated, since nullfs only keeps one use
reference on the lower vnode.  Always use the lower vnode v_writecount
for the checks.

Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is
currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT
to manipulate the v_writecount value, which manages a single bypass
reference to the lower vnode.  Caling the VOPs instead of directly
accessing v_writecount provide the fix described in the previous
paragraph.

Tested by:	pho
MFC after:	3 weeks
2012-11-02 13:56:36 +00:00
alc
60d5a532fb In general, we call pmap_remove_all() before calling vm_page_cache(). So,
the call to pmap_remove_all() within vm_page_cache() is usually redundant.
This change eliminates that call to pmap_remove_all() and introduces a
call to pmap_remove_all() before vm_page_cache() in the one place where
it didn't already exist.

When iterating over a paging queue, if the object containing the current
page has a zero reference count, then the page can't have any managed
mappings.  So, a call to pmap_remove_all() is pointless.

Change a panic() call in vm_page_cache() to a KASSERT().

MFC after:	6 weeks
2012-11-01 16:20:02 +00:00
attilio
d38d7bb245 Rework the known mutexes to benefit about staying on their own
cache line in order to avoid manual frobbing but using
struct mtx_padalign.

The sole exception being nvme and sxfge drivers, where the author
redefined CACHE_LINE_SIZE manually, so they need to be analyzed and
dealt with separately.

Reviwed by:	jimharris, alc
2012-10-31 18:07:18 +00:00
alc
77582e8298 Replace the page hold queue, PQ_HOLD, by a new page flag, PG_UNHOLDFREE,
because the queue itself serves no purpose.  When a held page is freed,
inserting the page into the hold queue has the side effect of setting the
page's "queue" field to PQ_HOLD.  Later, when the page is unheld, it will
be freed because the "queue" field is PQ_HOLD.  In other words, PQ_HOLD is
used as a flag, not a queue.  So, this change replaces it with a flag.

To accomodate the new page flag, make the page's "flags" field wider and
"oflags" field narrower.

Reviewed by:	kib
2012-10-29 06:15:04 +00:00
trasz
edd0d84112 Remove useless check; vm_pindex_t is unsigned on all architectures.
CID:		3701
Found with:	Coverity Prevent
2012-10-28 20:03:57 +00:00
mdf
1bc1b805d7 Const-ify the zone name argument to uma_zcreate(9).
MFC after:	3 days
2012-10-26 17:51:05 +00:00
andre
a8b2ff5af7 Move the corresponding MTX_SYSINIT() next to their struct mtx declaration
to make their relationship more obvious as done with the other such mutexs.
2012-10-26 17:31:35 +00:00
kib
ddaaa16d8b Commit the actual text provided by Alan, instead of the wrong update
in r242011.

MFC after:	1 week
2012-10-24 18:32:37 +00:00
kib
2a76567642 Dirty the newly copied anonymous pages after the wired region is
forked. Otherwise, pagedaemon might reclaim the page without saving
its content into the swap file, resulting in the valid content
replaced by zeroes.

Reported and tested by:	pho
Reviewed and comment update by:	alc
MFC after:	1 week
2012-10-24 18:21:59 +00:00
attilio
64eaf39fd7 MFC 2012-10-22 21:26:36 +00:00
kib
560aa751e0 Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
eadler
23c67e54ce Print flags as hex instead of an integer.
PR:		kern/168210
Submitted by:	linimon
Reviewed by:	alc
Approved by:	cperciva
MFC after:	3 days
2012-10-22 02:11:57 +00:00
alc
1df941a7f3 Move vm_page_requeue() to the only file that uses it.
MFC after:	3 weeks
2012-10-13 20:19:43 +00:00
alc
c84b1820ea Eliminate the conditional for releasing the page queues lock in
vm_page_sleep().  vm_page_sleep() is no longer called with this lock
held.

Eliminate assertions that the page queues lock is NOT held.  These
assertions won't translate well to having distinct locks on the active
and inactive page queues, and they really aren't that useful.

MFC after:	3 weeks
2012-10-13 18:46:46 +00:00
alc
271cefc5f6 Tidy up a bit:
Update some of the comments.  In particular, use "sleep" in preference to
"block" where appropriate.

Eliminate some unnecessary casts.

Make a few whitespace changes for consistency.

Reviewed by:	kib
MFC after:	3 days
2012-10-03 05:06:45 +00:00
kib
8f845e475e Fix the mis-handling of the VV_TEXT on the nullfs vnodes.
If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.

Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.

Tested by:	pho (previous version)
MFC after:	2 weeks
2012-09-28 11:25:02 +00:00
alc
a0349df30f Address a race condition that was introduced in r238212. Unless the page
queues lock is acquired before the page lock is released, there is no
guarantee that the page will still be in that same page queue when
vm_page_requeue() is called.

Reported by:		pho
In collaboration with:	kib
MFC after:	3 days
2012-09-23 17:42:39 +00:00
attilio
be930066f1 MFC 2012-09-21 03:07:34 +00:00
kib
667e5e154a Plug the accounting leak for the wired pages when msync(MS_INVALIDATE)
is performed on the vnode mapping which is wired in other address space.

While there, explicitely assert that the page is unwired and zero the
wire_count instead of substract. The condition is rechecked later in
vm_page_free(_toq) already.

Reported and tested by:	zont
Reviewed by:	alc (previous version)
MFC after:	1 week
2012-09-20 09:52:57 +00:00
glebius
e4b6b754eb If caller specifies UMA_ZONE_OFFPAGE explicitly, then do not waste memory
in an allocation for a slab.

Reviewed by:	jeff
2012-09-18 20:28:55 +00:00
eadler
8600cbb5b6 Correct double "the the"
Approved by:	cperciva
MFC after:	3 days
2012-09-14 21:28:56 +00:00
zont
2b9b209471 - Simplify VM code by using vmspace_wired_count() for counting wired
memory of a process.

Reviewed by:	avg
Approved by:	kib (mentor)
MFC after:	2 weeks
2012-09-05 18:19:54 +00:00
des
71dbd73468 Whitespace cleanup. 2012-09-05 12:24:50 +00:00
des
3ed9c078db No memory barrier is required. This was pointed out by kib@ a while ago,
but I got distracted by other matters.

(for real this time)
2012-09-04 22:19:33 +00:00
des
dec17a5bb5 Revert previous commit, which was performed in the wrong tree. 2012-09-04 21:06:53 +00:00
des
627c3f1a6e No memory barrier is required. This was pointed out by kib@ a while ago,
but I got distracted by other matters.
2012-09-04 19:04:02 +00:00
zont
f93fc1d719 - After r240026 sgrowsiz should be used in a safer maner.
Approved by:	kib (mentor)
MCF after:	1 week
2012-09-03 09:34:46 +00:00
zont
2f4305a824 - Remove accounting of locked memory from vsunlock(9) that I missed in r239818.
Approved by:	kib (mentor)
2012-08-30 08:03:33 +00:00
zont
85dfc3b8b7 - Don't take an account of locked memory for current process in vslock(9).
There are two consumers of vslock(9): sysctl code and drm driver.  These
consumers are using locked memory as transient memory, it doesn't belong
to a process's memory.

Suggested by:	avg
Reviewed by:	alc
Approved by:	kib (mentor)
MFC after:	2 weeks
2012-08-29 11:23:59 +00:00
attilio
d3c5a80b69 MFC 2012-08-27 11:59:04 +00:00
pluknet
91ac59768e Typo in previous change: print half the theoretical maximum as maximum
recommended amount.

Reported by:	<site freebsd at orientalsensation com>
Reviewed by:	des
2012-08-27 10:59:49 +00:00
glebius
b1ab314c3f Fix function name in keg_cachespread_init() assert. 2012-08-26 09:54:11 +00:00
des
5e88649166 - When running out of swzone, instead of spewing an error message every
tick until the situation is resolved (if ever), just print a single
  message when running out and another when space becomes available.

- When adding more swap, warn if the total amount exceeds half the
  theoretical maximum we can handle.
2012-08-16 08:29:49 +00:00
kib
ce7012daf6 For old mmap syscall, when executing on amd64 or ia64, enforce the
PROT_EXEC if prot is non-zero, process is 32bit and
kern.elf32.i386_read_exec syscal is enabled. This workaround is needed
for old i386 a.out binaries, where dynamic linker did not specified
PROT_EXEC for mapping of the text.

The kern.elf32.i386_read_exec MIB name looks weird for a.out binaries,
but I reused the existing knob which already has the needed semantic.

MFC after:	1 week
2012-08-14 12:11:48 +00:00
kib
0d46b47153 Adjust the r205536, by allowing a non-zero offset for anonymous
mappings for a.out binaries. Apparently, a.out ld.so from FreeBSD
1.1.5.1 can issue such requests.

Reported and tested by:	Dan Plassche <dplassche@gmail.com>
MFC after:	1 week
2012-08-14 11:47:07 +00:00
kib
a3d0fb0175 Do not leave invalid pages in the object after the short read for a
network file systems (not only NFS proper). Short reads cause pages
other then the requested one, which were not filled by read response,
to stay invalid.

Change the vm_page_readahead_finish() interface to not take the error
code, but instead to make a decision to free or to (de)activate the
page only by its validity. As result, not requested invalid pages are
freed even if the read RPC indicated success.

Noted and reviewed by:	alc
MFC after:	1 week
2012-08-14 11:45:47 +00:00
alc
cd8266338a Never sleep on busy pages in vm_pageout_launder(), always skip them. Long
ago, sleeping on busy pages in vm_pageout_launder() made sense.  The call
to vm_pageout_flush() specified asynchronous I/O and sleeping on busy pages
blocked vm_pageout_launder() until the flush had completed.  However, in
CVS revision 1.35 of vm/vm_contig.c, the call to vm_pageout_flush() was
changed to request synchronous I/O, but the sleep on busy pages was not
removed.
2012-08-07 04:48:14 +00:00
kib
cac2fe116f After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed.  Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.

Suggested and reviewed by:	alc
MFC after:    2 weeks
2012-08-05 14:11:42 +00:00
kib
4259905d31 Reduce code duplication and exposure of direct access to struct
vm_page oflags by providing helper function
vm_page_readahead_finish(), which handles completed reads for pages
with indexes other then the requested one, for VOP_GETPAGES().

Reviewed by:	alc
MFC after:	1 week
2012-08-04 18:16:43 +00:00
attilio
c52a057b19 MFC 2012-08-03 15:58:05 +00:00
alc
5b4712b5a1 Inline vm_page_aflags_clear() and vm_page_aflags_set().
Add comments stating that neither these functions nor the flags that they
are used to manipulate are part of the KBI.
2012-08-03 01:48:15 +00:00
alc
ceefb8bf17 Eliminate an unneeded declaration. (I should have removed this as part
of r227568.)
2012-07-30 20:38:37 +00:00
kib
4f8212948b Do not requeue held page or page for which locking failed, just leave
them alone.

Process the act_count updates for the held pages in the vm_pageout
loop over the inactive queue, instead of refusing to do anything with
such page.

Clarify the intent of the addl_page_shortage counter and change its
use for pages which are not processed in the loop according to the
description.

Reviewed by:	alc
MFC after:	2 weeks
2012-07-26 09:06:48 +00:00
alc
26fd7fb588 Addendum to r238604. If the inactive queue scan isn't restarted, then
the variable "addl_page_shortage_init" isn't needed.

X-MFC after:	r238604
2012-07-24 02:35:30 +00:00
kib
80c3756a6f Do not restart scan of the inactive queue when non-inactive page is
found. Rather, we shall not find such pages on inactive queue at all.

Requested and reviewed by:	alc
MFC after:    2 weeks
2012-07-18 21:47:50 +00:00
alc
e5949174d4 Move what remains of vm/vm_contig.c into vm/vm_pageout.c, where similar
code resides.  Rename vm_contig_grow_cache() to vm_pageout_grow_cache().

Reviewed by:	kib
2012-07-18 05:21:34 +00:00
alc
ad2692aed9 Correct vm_page_alloc_contig()'s implementation of VM_ALLOC_NODUMP. 2012-07-17 02:36:59 +00:00