Commit Graph

389 Commits

Author SHA1 Message Date
markj
6549c0692e Respect the caller's hints when performing swap readahead.
The pager getpages interface allows the caller to bound the number of
readahead and readbehind pages, and vm_fault_hold() makes use of this
feature. These bounds were ignored after r305056, causing the swap pager
to potentially page in more than the specified number of pages.

Reported and reviewed by:	alc
X-MFC with:	r305056
2016-09-04 00:25:49 +00:00
kib
b09a396eb7 Make swapoff reliable.
The swap_pager_swapoff() function uses trylock for the object lock
before pagein, which means that either i/o to md(4) over swap, or
intensive page faults over swap pager objects might prevent swapoff()
from making any progress. Then the retry < 100 check fails and machine
panics.

If trylock fails, acquire the object lock in the blockable way and
restart the hash bucket walk.  Keep retries logic for now.

Reported and tested by:	pho
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D7688
2016-08-31 14:49:58 +00:00
markj
fa1e90d6b0 Restore swap pager readahead after r292373.
The removal of vm_fault_additional_pages() meant that a hard fault on
a swap-backed page would result in only that page being read in. This
change implements readahead and readbehind for the swap pager in
swap_pager_getpages(). swap_pager_haspage() is modified to return the
largest contiguous non-resident range of pages containing the requested
range.

Reviewed by:	alc, kib
Tested by:	pho
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D7677
2016-08-30 05:56:21 +00:00
kib
eaa044710b Explain why swapgeom_close_ev() is delegated.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-03 07:11:19 +00:00
kib
ec00115bbc Do not delegate a work to geom event thread which can be done inline.
In particular, swapongeom_ev() needed event thread context when swap
pager configuration was performed under Giant and geom asserted that
Giant is not owned.  Now both of the reason went away.

On the other hand, note that swpageom_release() is called from the
bio_done context, and possible close cannot be performed inline.

Also fix some minor issues.  The swapgeom() function does not use the
td argument, remove it.  Recheck that the vnode passed is still VCHR
and not reclaimed after the lock.

Reviewed by:	mav
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-07-28 15:57:01 +00:00
kib
c732b6bc6f Fix style and typo.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-07-28 15:49:51 +00:00
kib
528a6a2f82 Fix inconsistent locking of the swap pager named objects list.
Right now, all modifications of the list are locked by sw_alloc_mtx.
But initial lookup of the object by the handle in swap_pager_alloc()
is not protected by sw_alloc_mtx, which means that
vm_pager_object_lookup() could follow freed pointer.

Create a new named swap object with the OBJT_SWAP type, instead
of OBJT_DEFAULT.  With this change, swp_pager_meta_build() never need
to upgrade named OBJT_DEFAULT to OBJT_SWAP (in the other place, we do
not forbid for client code to create named OBJT_DEFAULT objects at
all).

That change allows to remove sw_alloc_mtx and make the list locked by
sw_alloc_sx lock.  Update swap_pager_copy() to new locking mode.

Create helper swap_pager_alloc_init() to consolidate named and
anonymous swap objects creation, while a caller ensures that the
neccesary locks are held around the helper.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (hrs)
2016-06-13 03:42:46 +00:00
kib
501a76d137 Explicitely initialize sw_alloc_sx. Currently it is not initialized
but works due to zeroed out bss on startup.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (hrs)
2016-06-13 03:39:16 +00:00
kib
2832c7177d Remove Giant around allocation of the swap pager with non-NULL handle.
Existing issue of not protecting pager_object_list iteration in
vm_pager_object_lookup() by sw_alloc_mtx is not affected by Giant
removal.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2016-05-24 10:16:03 +00:00
kib
359a939783 Mark swap-related proc sysctls as not requiring Giant.
Reviewed by:	alc (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
2016-05-22 23:28:23 +00:00
kib
ea08516c97 Replace hand-made exclusive lock, protecting against parallel
swapon/swapoff invocations, with sx.

Reviewed by:	alc (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
2016-05-22 23:25:01 +00:00
pfg
8327dbfe4d sys/vm: minor spelling fixes in comments.
No functional change.
2016-05-02 20:16:29 +00:00
glebius
63cd1c131a A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().
o With new KPI consumers can request contiguous ranges of pages, and
  unlike before, all pages will be kept busied on return, like it was
  done before with the 'reqpage' only. Now the reqpage goes away. With
  new interface it is easier to implement code protected from race
  conditions.

  Such arrayed requests for now should be preceeded by a call to
  vm_pager_haspage() to make sure that request is possible. This
  could be improved later, making vm_pager_haspage() obsolete.

  Strenghtening the promises on the business of the array of pages
  allows us to remove such hacks as swp_pager_free_nrpage() and
  vm_pager_free_nonreq().

o New KPI accepts two integer pointers that may optionally point at
  values for read ahead and read behind, that a pager may do, if it
  can. These pages are completely owned by pager, and not controlled
  by the caller.

  This shifts the UFS-specific readahead logic from vm_fault.c, which
  should be file system agnostic, into vnode_pager.c. It also removes
  one VOP_BMAP() request per hard fault.

Discussed with:	kib, alc, jeff, scottl
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-12-16 21:30:45 +00:00
imp
2a2f9e7745 Mark swap_pager_putpages static at its definition. It was already
static at its declaration. Remove needless swapdev_strategy forward
declaration.

MFC After: 3 days
2015-10-05 21:29:17 +00:00
imp
f1f20a2907 The swap pager is compatible with direct dispatch. It does its own
locking and doesn't sleep. Flag the consumer we create as such. In
addition, decrement the in flight index when we have an out of memory
error after having incremented it previously. This would have
prevented swapoff from working if the swap pager ever hit a resource
shortage trying to swap out something (the swap in path always waits
for a bio, so won't have this issue). Simplify the close logic by
abandoning the use of private and initializing the index to 1 and
dropping that reference when we previously set private.

Also, set sw_id only while sw_dev_mtx is held. This should only affect
swapping to a vnode, as opposed to a geom whose close always sets it to
NULL with sw_dev_mtx held.

Differential Review: https://reviews.freebsd.org/D3547
2015-09-08 17:47:56 +00:00
alc
499ef76819 Eliminate pointless assignments to rtvals[] in swap_pager_putpages().
Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
2015-08-21 17:00:39 +00:00
jeff
3fb666cfae Refactor unmapped buffer address handling.
- Use pointer assignment rather than a combination of pointers and
   flags to switch buffers between unmapped and mapped.  This eliminates
   multiple flags and generally simplifies the logic.
 - Eliminate b_saveaddr since it is only used with pager bufs which have
   their b_data re-initialized on each allocation.
 - Gather up some convenience routines in the buffer cache for
   manipulating buf space and buf malloc space.
 - Add an inline, buf_mapped(), to standardize checks around unmapped
   buffers.

In collaboration with: mlaier
Reviewed by:	kib
Tested by:	pho (many small revisions ago)
Sponsored by:	EMC / Isilon Storage Division
2015-07-23 19:13:41 +00:00
glebius
5b81a20433 o Un-inline vm_pager_get_pages(), vm_pager_get_pages_async().
o Provide an extensive set of assertions for input array of pages.
o Remove now duplicate assertions from different pagers.

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-06-17 22:44:27 +00:00
mjg
d7bc9285a6 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
jhb
0bf260e595 Place VM objects on the object list when created and never remove them.
This is ok since objects come from a NOFREE zone and allows objects to
be locked while traversing the object list without triggering a LOR.

Ensure that objects on the list are marked DEAD while free or stillborn,
and that they have a refcount of zero.  This required updating most of
the pagers to explicitly mark an object as dead when deallocating it.
(Only the vnode pager did this previously.)

Differential Revision:	https://reviews.freebsd.org/D2423
Reviewed by:	alc, kib (earlier version)
MFC after:	2 weeks
Sponsored by:	Norse Corp, Inc.
2015-05-08 19:43:37 +00:00
glebius
0a56d25a94 Instead of reading, validating and adjusting value of the vm.swap_async_max
in the main swapper work cycle, do it in the sysctl handler.  This removes
extra mutex acquisition from the main cycle and makes the sysctl knob return
error on an invalid value, instead of accepting and fixing it.

Reviewed by:	kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-05-02 20:27:37 +00:00
trasz
802017a04b Add kern.racct.enable tunable and RACCT_DISABLED config option.
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).

Differential Revision:	https://reviews.freebsd.org/D2369
Reviewed by:	kib@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-04-29 10:23:02 +00:00
mav
2e38078077 Remove sleeps from geom_up thread on device destruction.
MFC after:	3 days.
2015-04-09 13:09:05 +00:00
mav
ee2fe1ad5c Make swapper release orphaned (lost) GEOM provider.
Swap device is still reported as enabled, and system still may crash later
if some swapped-out kernel pages were lost with the device, but at least
GEOM and CAM can now release the lost disk, allowing it to be reconnected.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-03-26 17:21:12 +00:00
glebius
3f7fdcc87f \n at end of panicstr is redundant.
Submitted by:	alc
2014-11-23 18:32:21 +00:00
glebius
b4ef8e602d Merge from projects/sendfile:
o Provide a new VOP_GETPAGES_ASYNC(), which works like VOP_GETPAGES(), but
  doesn't sleep. It returns immediately, and will execute the I/O done handler
  function that must be supplied as argument.
o Provide VOP_GETPAGES_ASYNC() for the FFS, which uses vnode_pager.
o Extend pagertab to support pgo_getpages_async method, and implement this
  method for vnode_pager.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-23 12:01:52 +00:00
kib
938d2505e3 Fix mis-spelling of bits and types names in the
default_pager_putpages() and swap_pager_putpages().
It is the same fix as was done for vnode_pager_putpages()
in r271586.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-04 19:56:04 +00:00
des
65493094e9 Add sysctl OIDs showing the actual size and capacity of the swap zone.
MFC after:	1 week
2014-04-26 12:18:17 +00:00
bdrewery
6fcf6199a4 Rename global cnt to vm_cnt to avoid shadowing.
To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from:	arch@
Sponsored by:	EMC / Isilon Storage Division
2014-03-22 10:26:09 +00:00
attilio
f19bbde667 vm_page_grab() and vm_pager_get_pages() can drop the vm_object lock,
then threads can sleep on the pip condition.
Avoid to deadlock such threads by correctly awakening the sleeping ones
after the pip is finished.
swapoff side of the bug can likely result in shutdown deadlocks.

Sponsored by:	EMC / Isilon Storage Division
Reported by:	pho, pluknet
Tested by:	pho
2014-03-19 01:13:42 +00:00
kib
ba12eedccd Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9).
The flag was mandatory since r209792, where vm_page_grab(9) was
changed to only support the alloc retry semantic.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-08-22 07:39:53 +00:00
attilio
16c7563cf4 The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
  and vm_page_grab are being executed.  This will be very helpful
  once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by:	EMC / Isilon storage division
Discussed with:	alc
Reviewed by:	jeff, kib
Tested by:	gavin, bapt (older version)
Tested by:	pho, scottl
2013-08-09 11:11:11 +00:00
kib
ff1a2e73b1 When swap pager allocates metadata in the pagedaemon context, allow it
to drain the reserve.  This was broken in r243040, causing deadlock.
Note that VM_WAIT call in case of uma_zalloc() failure from pagedaemon
would only wait for the v_pageout_free_min anyway.

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-07-11 20:33:57 +00:00
kib
70e5ada2a0 Fix typo in comment.
MFC after:	3 days
2013-07-09 13:22:30 +00:00
attilio
3b60ec551b Complete r251452:
Avoid to busy/unbusy a page in cases where there is no need to drop the
vm_obj lock, more nominally when the page is full valid after
vm_page_grab().

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
2013-06-06 18:19:26 +00:00
attilio
4f8aa13b4a o Change the locking scheme for swp_bcount.
It can now be accessed with a write lock on the object containing it OR
  with a read lock on the object containing it along with the swhash_mtx.
o Remove some duplicate assertions for swap_pager_freespace() and
  swap_pager_unswapped() but keep the object locking references for
  documentation.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
2013-05-28 22:07:23 +00:00
kib
2ace051956 Do not map the swap i/o pbufs if the geom provider for the swap
partition accepts unmapped requests.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:39:27 +00:00
attilio
74f58faa15 MFC 2013-02-26 23:52:23 +00:00
attilio
e590e8091e As VM_OBJECT_SLEEP() is a vm_object_t specific function, make
the passed object as the first argument of the function for consistency.

Sponsored by:	EMC / Isilon storage revision
2013-02-26 01:38:12 +00:00
attilio
905e648d42 Hide the details for the assertion for VM_OBJECT_LOCK operations.
Rename current VM_OBJECT_LOCK_ASSERT(foo, RA_WLOCKED) into
VM_OBJECT_ASSERT_WLOCKED(foo)

Sponsored by:	EMC / Isilon storage division
Requested by:	alc
2013-02-21 21:54:53 +00:00
attilio
15bf891afe Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to
their "write" versions.

Sponsored by:	EMC / Isilon storage division
2013-02-20 12:03:20 +00:00
attilio
658534ed5a Switch vm_object lock to be a rwlock.
* VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
* VM_OBJECT_SLEEP() is introduced as a general purpose primitve to
  get a sleep operation using a VM_OBJECT_LOCK() as protection
* The approach must bear with vm_pager.h namespace pollution so many
  files require including directly rwlock.h
2013-02-20 10:38:34 +00:00
jh
25dd09b996 - Don't pass geom and provider names as format strings.
- Add __printflike() attributes.
- Remove an extra argument for the g_new_geomf() call in swapongeom_ev().

Reviewed by:	pjd
2012-11-20 12:32:18 +00:00
des
71dbd73468 Whitespace cleanup. 2012-09-05 12:24:50 +00:00
des
3ed9c078db No memory barrier is required. This was pointed out by kib@ a while ago,
but I got distracted by other matters.

(for real this time)
2012-09-04 22:19:33 +00:00
des
dec17a5bb5 Revert previous commit, which was performed in the wrong tree. 2012-09-04 21:06:53 +00:00
des
627c3f1a6e No memory barrier is required. This was pointed out by kib@ a while ago,
but I got distracted by other matters.
2012-09-04 19:04:02 +00:00
pluknet
91ac59768e Typo in previous change: print half the theoretical maximum as maximum
recommended amount.

Reported by:	<site freebsd at orientalsensation com>
Reviewed by:	des
2012-08-27 10:59:49 +00:00
des
5e88649166 - When running out of swzone, instead of spewing an error message every
tick until the situation is resolved (if ever), just print a single
  message when running out and another when space becomes available.

- When adding more swap, warn if the total amount exceeds half the
  theoretical maximum we can handle.
2012-08-16 08:29:49 +00:00
alc
6eeaee04e4 The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap
layer, but it is read directly by the MI VM layer.  This change introduces
pmap_page_is_write_mapped() in order to completely encapsulate all direct
access to PGA_WRITEABLE in the pmap layer.

Aesthetics aside, I am making this change because amd64 will likely begin
using an alternative method to track write mappings, and having
pmap_page_is_write_mapped() in place allows me to make such a change
without further modification to the MI VM layer.

As an added bonus, tidy up some nearby comments concerning page flags.

Reviewed by:	kib
MFC after:	6 weeks
2012-06-16 18:56:19 +00:00