2998 Commits

Author SHA1 Message Date
zont
ee65990ea4 - Get rid of unused function vmspace_wired_count().
Reviewed by:	alc
Approved by:	kib (mentor)
MFC after:	1 week
2013-01-14 12:12:56 +00:00
zont
3b71bce613 - Improve readability of sys_obreak().
Suggested by:	alc
Reviewed by:	alc
Approved by:	kib (mentor)
MFC after:	1 week
2013-01-11 09:58:35 +00:00
zont
d2863e4c68 - Reduce kernel size by removing unnecessary pointer indirections.
GENERIC kernel size reduced in 16 bytes and RACCT kernel in 336 bytes.

Suggested by:	alc
Reviewed by:	alc
Approved by:	kib (mentor)
MFC after:	1 week
2013-01-10 12:43:58 +00:00
ken
1abc90f894 Fix a bug in the device pager code that can trigger an assertion
in devfs if a particular race condition is hit in the device pager
code.

This was a side effect of change 227530 which changed the device
pager interface to call a new destructor routine for the cdev.
That destructor routine, old_dev_pager_dtor(), takes a VM object
handle.

The object handle is cast to a struct cdev *, and passed into
dev_rel().

That works in most cases, except the case in cdev_pager_allocate()
where there is a race condition between two threads allocating an
object backed by the same device.  The loser of the race
deallocates its object at the end of the function.

The problem is that before inserting the object into the
dev_pager_object_list, the object's handle is changed from the
struct cdev pointer to the object's own address.  This is to avoid
conflicts with the winner of the race, which already inserted an
object in the list with a handle that is a pointer to the same cdev
structure.

The object is then passed to vm_object_deallocate(), and eventually
makes its way down to old_dev_pager_dtor().  That function passes
the handle pointer (which is actually a VM object, not a struct
cdev as usual) into dev_rel().  dev_rel() decrements the reference
count in the assumed struct cdev (which happens to be 0), and
that triggers the assertion in dev_rel() that the reference count
is greater than or equal to 0.

The fix is to add a cdev pointer to the VM object, and use that
pointer when calling the cdev_pg_dtor() routine.

vm_object.h:	Add a struct cdev pointer to the VM object
		structure.

device_pager.c:	In cdev_pager_allocate(), populate the new cdev
		pointer.

		In dev_pager_dealloc(), use the new cdev pointer
		when calling the object's cdev_pg_dtor() routine.

Reviewed by:	kib
Sponsored by:	Spectra Logic Corporation
MFC after:	1 week
2013-01-09 16:48:38 +00:00
glebius
1292747048 Comment fix: there is no ub_ptr, instead explain meaning of uz_count
field verbally.
2012-12-21 10:09:45 +00:00
zont
15b694913e - Fix locked memory accounting for maps with MAP_WIREFUTURE flag.
- Add sysctl vm.old_mlock which may turn such accounting off.

Reviewed by:	avg, trasz
Approved by:	kib (mentor)
MFC after:	1 week
2012-12-18 07:35:01 +00:00
alc
02094caa2c In the past four years, we've added two new vm object types. Each time,
similar changes had to be made in various places throughout the machine-
independent virtual memory layer to support the new vm object type.
However, in most of these places, it's actually not the type of the vm
object that matters to us but instead certain attributes of its pages.
For example, OBJT_DEVICE, OBJT_MGTDEVICE, and OBJT_SG objects contain
fictitious pages.  In other words, in most of these places, we were
testing the vm object's type to determine if it contained fictitious (or
unmanaged) pages.

To both simplify the code in these places and make the addition of future
vm object types easier, this change introduces two new vm object flags
that describe attributes of the vm object's pages, specifically, whether
they are fictitious or unmanaged.

Reviewed and tested by:	kib
2012-12-09 00:32:38 +00:00
pjd
fc89492084 White-space cleanups. 2012-12-08 09:23:05 +00:00
pjd
a585ca9ec8 Implemented uma_zone_set_warning(9) function that sets a warning, which
will be printed once the given zone becomes full and cannot allocate an
item. The warning will not be printed more often than every five minutes.

All UMA warnings can be globally turned off by setting sysctl/tunable
vm.zone_warnings to 0.

Discussed on:	arch
Obtained from:	WHEEL Systems
MFC after:	2 weeks
2012-12-07 22:27:13 +00:00
alc
793b12af79 Add support for the (relatively) new object type OBJT_MGTDEVICE to
vm_object_set_memattr().  Also, add a "safety belt" so that
vm_object_set_memattr() doesn't silently modify undefined object types.

Reviewed by:	kib
MFC after:	10 days
2012-11-28 18:29:34 +00:00
alc
e44badfb9f Make a few small changes to vm_map_pmap_enter():
Add detail to the comment describing this function.  In particular,
describe what MAP_PREFAULT_PARTIAL does.

Eliminate the abrupt change in behavior when the specified address range
grows from MAX_INIT_PT pages to MAX_INIT_PT plus one pages.  Instead of
doing nothing, i.e., preloading no mappings whatsoever, map any resident
pages that fall within the start of the specified address range, i.e.,
[addr, addr + ulmin(size, ptoa(MAX_INIT_PT))).

Long ago, the vm object's list of resident pages was not ordered, so
this function had to choose between probing the global hash table of
all resident pages and iterating over the vm object's unordered list of
resident pages.  Now, the list is ordered, so there is no reason for
MAP_PREFAULT_PARTIAL to be concerned with the vm object's count of
resident changes.

MFC after:	14 days
2012-11-25 19:42:36 +00:00
alc
e12b2ad698 Correct an error in r230623. When both VM_ALLOC_NODUMP and VM_ALLOC_ZERO
were specified to vm_page_alloc(), PG_NODUMP wasn't being set on the
allocated page when it happened to be pre-zeroed.

MFC after:	5 days
2012-11-21 06:26:18 +00:00
jh
25dd09b996 - Don't pass geom and provider names as format strings.
- Add __printflike() attributes.
- Remove an extra argument for the g_new_geomf() call in swapongeom_ev().

Reviewed by:	pjd
2012-11-20 12:32:18 +00:00
alc
1284a0383d Update a comment to reflect the elimination of the hold queue in r242300. 2012-11-17 04:00:19 +00:00
kib
bc5bfde14d Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h
to vm/vm_phys.h, where it belongs.

Requested and reviewed by:	alc
MFC after:	2 weeks
2012-11-16 05:55:56 +00:00
kib
75f2aa672f Explicitely state that M_USE_RESERVE requires M_NOWAIT, using assertion.
Reviewed by:	alc
MFC after:	2 weeks
2012-11-16 05:49:56 +00:00
kib
e8ae50d444 Flip the semantic of M_NOWAIT to only require the allocation to not
sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.

Allow the caller of malloc* to request the 'deep drain' semantic by
providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT
class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM
allocation class.

Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().

Discussion started by:	"Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by:	alc, mdf (previous version)
Tested by:	pho (previous version)
MFC after:	2 weeks
2012-11-14 20:01:40 +00:00
alc
ff7333d33f Replace the single, global page queues lock with per-queue locks on the
active and inactive paging queues.

Reviewed by:	kib
2012-11-13 02:50:39 +00:00
attilio
7efc7fb950 Fix DDB command "show map XXX":
- Check that an argument is always available, otherwise current map
  printing before to recurse is garbage.
- Spit out a message if an argument is not provided.
- Remove unread nlines variable.
- Use an explicit recursive function, disassociated from the
  DB_SHOW_COMMAND() body, in order to make clear prototype and recursion
  of the above mentioned function.  The code results now much less
  obscure.

Submitted by:	gianni
2012-11-12 00:30:40 +00:00
kib
f16ea99007 The r241025 fixed the case when a binary, executed from nullfs mount,
was still possible to open for write from the lower filesystem.  There
is a symmetric situation where the binary could already has file
descriptors opened for write, but it can be executed from the nullfs
overlay.

Handle the issue by passing one v_writecount reference to the lower
vnode if nullfs vnode has non-zero v_writecount.  Note that only one
write reference can be donated, since nullfs only keeps one use
reference on the lower vnode.  Always use the lower vnode v_writecount
for the checks.

Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is
currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT
to manipulate the v_writecount value, which manages a single bypass
reference to the lower vnode.  Caling the VOPs instead of directly
accessing v_writecount provide the fix described in the previous
paragraph.

Tested by:	pho
MFC after:	3 weeks
2012-11-02 13:56:36 +00:00
alc
60d5a532fb In general, we call pmap_remove_all() before calling vm_page_cache(). So,
the call to pmap_remove_all() within vm_page_cache() is usually redundant.
This change eliminates that call to pmap_remove_all() and introduces a
call to pmap_remove_all() before vm_page_cache() in the one place where
it didn't already exist.

When iterating over a paging queue, if the object containing the current
page has a zero reference count, then the page can't have any managed
mappings.  So, a call to pmap_remove_all() is pointless.

Change a panic() call in vm_page_cache() to a KASSERT().

MFC after:	6 weeks
2012-11-01 16:20:02 +00:00
attilio
d38d7bb245 Rework the known mutexes to benefit about staying on their own
cache line in order to avoid manual frobbing but using
struct mtx_padalign.

The sole exception being nvme and sxfge drivers, where the author
redefined CACHE_LINE_SIZE manually, so they need to be analyzed and
dealt with separately.

Reviwed by:	jimharris, alc
2012-10-31 18:07:18 +00:00
alc
77582e8298 Replace the page hold queue, PQ_HOLD, by a new page flag, PG_UNHOLDFREE,
because the queue itself serves no purpose.  When a held page is freed,
inserting the page into the hold queue has the side effect of setting the
page's "queue" field to PQ_HOLD.  Later, when the page is unheld, it will
be freed because the "queue" field is PQ_HOLD.  In other words, PQ_HOLD is
used as a flag, not a queue.  So, this change replaces it with a flag.

To accomodate the new page flag, make the page's "flags" field wider and
"oflags" field narrower.

Reviewed by:	kib
2012-10-29 06:15:04 +00:00
trasz
edd0d84112 Remove useless check; vm_pindex_t is unsigned on all architectures.
CID:		3701
Found with:	Coverity Prevent
2012-10-28 20:03:57 +00:00
mdf
1bc1b805d7 Const-ify the zone name argument to uma_zcreate(9).
MFC after:	3 days
2012-10-26 17:51:05 +00:00
andre
a8b2ff5af7 Move the corresponding MTX_SYSINIT() next to their struct mtx declaration
to make their relationship more obvious as done with the other such mutexs.
2012-10-26 17:31:35 +00:00
kib
ddaaa16d8b Commit the actual text provided by Alan, instead of the wrong update
in r242011.

MFC after:	1 week
2012-10-24 18:32:37 +00:00
kib
2a76567642 Dirty the newly copied anonymous pages after the wired region is
forked. Otherwise, pagedaemon might reclaim the page without saving
its content into the swap file, resulting in the valid content
replaced by zeroes.

Reported and tested by:	pho
Reviewed and comment update by:	alc
MFC after:	1 week
2012-10-24 18:21:59 +00:00
kib
560aa751e0 Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
eadler
23c67e54ce Print flags as hex instead of an integer.
PR:		kern/168210
Submitted by:	linimon
Reviewed by:	alc
Approved by:	cperciva
MFC after:	3 days
2012-10-22 02:11:57 +00:00
alc
1df941a7f3 Move vm_page_requeue() to the only file that uses it.
MFC after:	3 weeks
2012-10-13 20:19:43 +00:00
alc
c84b1820ea Eliminate the conditional for releasing the page queues lock in
vm_page_sleep().  vm_page_sleep() is no longer called with this lock
held.

Eliminate assertions that the page queues lock is NOT held.  These
assertions won't translate well to having distinct locks on the active
and inactive page queues, and they really aren't that useful.

MFC after:	3 weeks
2012-10-13 18:46:46 +00:00
alc
271cefc5f6 Tidy up a bit:
Update some of the comments.  In particular, use "sleep" in preference to
"block" where appropriate.

Eliminate some unnecessary casts.

Make a few whitespace changes for consistency.

Reviewed by:	kib
MFC after:	3 days
2012-10-03 05:06:45 +00:00
kib
8f845e475e Fix the mis-handling of the VV_TEXT on the nullfs vnodes.
If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.

Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.

Tested by:	pho (previous version)
MFC after:	2 weeks
2012-09-28 11:25:02 +00:00
alc
a0349df30f Address a race condition that was introduced in r238212. Unless the page
queues lock is acquired before the page lock is released, there is no
guarantee that the page will still be in that same page queue when
vm_page_requeue() is called.

Reported by:		pho
In collaboration with:	kib
MFC after:	3 days
2012-09-23 17:42:39 +00:00
kib
667e5e154a Plug the accounting leak for the wired pages when msync(MS_INVALIDATE)
is performed on the vnode mapping which is wired in other address space.

While there, explicitely assert that the page is unwired and zero the
wire_count instead of substract. The condition is rechecked later in
vm_page_free(_toq) already.

Reported and tested by:	zont
Reviewed by:	alc (previous version)
MFC after:	1 week
2012-09-20 09:52:57 +00:00
glebius
e4b6b754eb If caller specifies UMA_ZONE_OFFPAGE explicitly, then do not waste memory
in an allocation for a slab.

Reviewed by:	jeff
2012-09-18 20:28:55 +00:00
eadler
8600cbb5b6 Correct double "the the"
Approved by:	cperciva
MFC after:	3 days
2012-09-14 21:28:56 +00:00
zont
2b9b209471 - Simplify VM code by using vmspace_wired_count() for counting wired
memory of a process.

Reviewed by:	avg
Approved by:	kib (mentor)
MFC after:	2 weeks
2012-09-05 18:19:54 +00:00
des
71dbd73468 Whitespace cleanup. 2012-09-05 12:24:50 +00:00
des
3ed9c078db No memory barrier is required. This was pointed out by kib@ a while ago,
but I got distracted by other matters.

(for real this time)
2012-09-04 22:19:33 +00:00
des
dec17a5bb5 Revert previous commit, which was performed in the wrong tree. 2012-09-04 21:06:53 +00:00
des
627c3f1a6e No memory barrier is required. This was pointed out by kib@ a while ago,
but I got distracted by other matters.
2012-09-04 19:04:02 +00:00
zont
f93fc1d719 - After r240026 sgrowsiz should be used in a safer maner.
Approved by:	kib (mentor)
MCF after:	1 week
2012-09-03 09:34:46 +00:00
zont
2f4305a824 - Remove accounting of locked memory from vsunlock(9) that I missed in r239818.
Approved by:	kib (mentor)
2012-08-30 08:03:33 +00:00
zont
85dfc3b8b7 - Don't take an account of locked memory for current process in vslock(9).
There are two consumers of vslock(9): sysctl code and drm driver.  These
consumers are using locked memory as transient memory, it doesn't belong
to a process's memory.

Suggested by:	avg
Reviewed by:	alc
Approved by:	kib (mentor)
MFC after:	2 weeks
2012-08-29 11:23:59 +00:00
pluknet
91ac59768e Typo in previous change: print half the theoretical maximum as maximum
recommended amount.

Reported by:	<site freebsd at orientalsensation com>
Reviewed by:	des
2012-08-27 10:59:49 +00:00
glebius
b1ab314c3f Fix function name in keg_cachespread_init() assert. 2012-08-26 09:54:11 +00:00
des
5e88649166 - When running out of swzone, instead of spewing an error message every
tick until the situation is resolved (if ever), just print a single
  message when running out and another when space becomes available.

- When adding more swap, warn if the total amount exceeds half the
  theoretical maximum we can handle.
2012-08-16 08:29:49 +00:00
kib
ce7012daf6 For old mmap syscall, when executing on amd64 or ia64, enforce the
PROT_EXEC if prot is non-zero, process is 32bit and
kern.elf32.i386_read_exec syscal is enabled. This workaround is needed
for old i386 a.out binaries, where dynamic linker did not specified
PROT_EXEC for mapping of the text.

The kern.elf32.i386_read_exec MIB name looks weird for a.out binaries,
but I reused the existing knob which already has the needed semantic.

MFC after:	1 week
2012-08-14 12:11:48 +00:00