415 Commits

Author SHA1 Message Date
Konstantin Belousov
53faf5a7d4 Evaluate the real size of the sblk_zone.
Submitted by:	ota@j.email.ne.jp
PR:	221356
Reviewed by:	alc, markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D12660
2017-10-13 16:23:05 +00:00
Alan Cox
37244a84fd Replace an unnecessary call to vm_page_activate() by an assertion that
the page is already wired or queued.  Prior to the elimination of PG_CACHED
pages, vm_page_grab() might have returned a valid, previously PG_CACHED
page, in which case enqueueing the page was necessary.  Now, that can't
happen.  Moreover, activating the page is a dubious choice, since the page
is not being accessed.

Reviewed by:	kib
MFC after:	1 week
2017-10-08 16:54:42 +00:00
Alan Cox
41e5a22698 When an I/O error occurs on page out, there is no need to dirty the page,
because it is already dirty.  Instead, assert that the page is dirty.

Reviewed by:	kib, markj
MFC after:	1 week
2017-10-01 17:04:26 +00:00
Alan Cox
d027ed2e7a To analyze the allocation of swap blocks by blist functions, add a method
for analyzing the radix tree structures and reporting on the number, and
sizes, of maximal intervals of free blocks.  The report includes the number
of maximal intervals, and also the number of them in each of several size
ranges, from small (size 1, or 3 to 4) to large (28657 to 46367) with size
boundaries defined by Fibonacci numbers.  The report is written in the test
tool with the 's' command, or in a running kernel by sysctl.
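
As an illustration of the size ranges described above, the following is a minimal sketch, not the code from the commit. It assumes each range runs from one Fibonacci number up to one less than the next (so 3 to 4, and 28657 to 46367, as in the examples given), and that size 2 forms a range of its own. The names fib_range_low() and fib_range_high() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: lower bound of the Fibonacci-delimited range that
 * contains `size`, i.e. the largest Fibonacci number <= size.
 * Assumed ranges: [1,1], [2,2], [3,4], [5,7], [8,12], ...
 * Sizes close to UINT64_MAX would overflow `hi`; this sketch ignores that.
 */
static uint64_t
fib_range_low(uint64_t size)
{
	uint64_t lo = 1, hi = 2;	/* consecutive Fibonacci numbers */

	assert(size >= 1);
	while (hi <= size) {
		uint64_t next = lo + hi;

		lo = hi;
		hi = next;
	}
	return (lo);
}

/* Upper bound of the same range: one less than the next Fibonacci number. */
static uint64_t
fib_range_high(uint64_t size)
{
	uint64_t lo = 1, hi = 2;

	assert(size >= 1);
	while (hi <= size) {
		uint64_t next = lo + hi;

		lo = hi;
		hi = next;
	}
	return (hi - 1);
}
```

With these assumptions, a free interval of 30000 blocks would be counted in the range 28657 to 46367.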

The analysis of the radix tree frequently computes the position of the lone
bit set in a u_daddr_t, a computation that also appears in leaf allocation.
That computation has been moved into a function of its own, and optimized
for cases where an inlined machine instruction can replace the usual binary
search.
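
The lone-bit computation can be sketched as follows. This is an illustrative user-space version under stated assumptions, not the function added by the commit: uint64_t stands in for u_daddr_t, the names bitpos() and bitpos_search() are hypothetical, and the GCC/Clang builtin __builtin_ctzll() stands in for the inlined machine instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Binary search for the index of the single set bit in mask. */
static int
bitpos_search(uint64_t mask)
{
	int lo = 0, hi = 64;

	/* Exactly one bit must be set. */
	assert(mask != 0 && (mask & (mask - 1)) == 0);
	while (lo + 1 < hi) {
		int mid = (lo + hi) / 2;

		if ((mask >> mid) != 0)
			lo = mid;	/* the bit is at position mid or above */
		else
			hi = mid;	/* the bit is below position mid */
	}
	return (lo);
}

/* Use a count-trailing-zeros builtin where the compiler provides one. */
static int
bitpos(uint64_t mask)
{
	assert(mask != 0 && (mask & (mask - 1)) == 0);
#if defined(__GNUC__) || defined(__clang__)
	return (__builtin_ctzll(mask));
#else
	return (bitpos_search(mask));
#endif
}
```

Both paths return the same index for any mask with a single bit set; the builtin merely avoids the loop on compilers that support it.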

Submitted by:	Doug Moore <dougm@rice.edu>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11906
2017-09-10 17:46:03 +00:00
Konstantin Belousov
85d88d8799 Do not leak empty swblk.
In swp_pager_meta_build(), if the requested operation results in
freeing the last swap pointer in the swblk, free the trie node.  Other
swap pager code does not expect to find completely empty swblk.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-06 16:18:53 +00:00
Konstantin Belousov
eed99cb81b In swp_pager_meta_build(), handle a race with other thread allocating
swapblk for our index while we dropped the object lock.

Noted by:	jeff
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-06 16:16:11 +00:00
Konstantin Belousov
35872e79b7 Adjust interface of swapon_check_swzone() to its actual usage.
The function return value is not used.  Its argument is always
swap_total/PAGE_SIZE, so make it not take any arguments.

Submitted by:	ota@j.email.ne.jp
PR:	221356
MFC after:	1 week
2017-08-30 10:17:00 +00:00
Konstantin Belousov
f08b30995a Make the swap_pager_full variable static.
r290920 removed the use of the variable from vm/vm_pageout.c.

Submitted by:	ota@j.email.ne.jp
PR:	221356
MFC after:	1 week
2017-08-30 09:44:05 +00:00
Alan Cox
ee620ea47d Update a couple vm_object lock assertions in the swap pager to reflect the
new use of the vm_object's lock to synchronize updates to a radix trie
mapping per-vm object page indices to on-disk swap blocks.

Fix a typo in a nearby comment.

Reviewed by:	kib, markj
X-MFC with:	r322913
Differential Revision:	https://reviews.freebsd.org/D12134
2017-08-28 17:02:25 +00:00
Konstantin Belousov
f425ab8e50 Replace global swhash in swap pager with per-object trie to track swap
blocks assigned to the object pages.

- The global swhash_mtx is removed, trie is synchronized by the
  corresponding object lock.
- The swp_pager_meta_free_all() function used during object
  termination is optimized by only looking at the trie instead of
  having to search whole hash for the swap blocks owned by the object.
- On swap_pager_swapoff(), instead of iterating over the swhash, the
  global object list has to be inspected. There, we have to ensure
  that we do see valid trie content if we see that the object type is
  swap.
Sizing of the swblk zone is the same as for the swblock zone; each swblk
maps SWAP_META_PAGES pages.

Proposed by:	alc
Reviewed by:	alc, markj (previous version)
Tested by:	alc, pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D11435
2017-08-25 23:13:21 +00:00
Konstantin Belousov
9680bb9877 Remove unused function swap_pager_isswapped().
Noted by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-07-19 17:28:46 +00:00
Alan Cox
e22415906d Increase the pageout cluster size to 32 pages.
Decouple the pageout cluster size from the size of the hash table entry
used by the swap pager for mapping (object, pindex) to a block on the
swap device(s), and keep the size of a hash table entry at its current
size.

Eliminate a pointless macro.

Reviewed by:	kib, markj (an earlier version)
MFC after:	4 weeks
Differential Revision:	https://reviews.freebsd.org/D11305
2017-06-24 17:10:33 +00:00
Alan Cox
3a5d839ebc Eliminate an unused macro.
MFC after:	3 days
2017-06-21 03:55:45 +00:00
Alan Cox
87b0ab69a9 Pages that are passed to swap_pager_putpages() should already be fully
dirty.  Assert that they are fully dirty rather than redundantly calling
vm_page_dirty() on them.

Reviewed by:	kib, markj
MFC after:	1 week
X-MFC after:	r319932
2017-06-17 03:05:25 +00:00
Alan Cox
761097c85e Starting in r118390, swaponsomething() began to reserve the blocks at the
beginning of a swap area for a disk label.  However, neither r118390 nor
r118544, which increased the reservation from one to two blocks, correctly
accounted for these blocks when updating the variable "swap_pager_avail".
This change corrects that error.

Reviewed by:	kib
MFC after:	5 days
2017-06-06 16:52:07 +00:00
Alan Cox
03bdd65f18 When the function blist_fill() was added to the kernel in r107913, the swap
pager used a different scheme for striping the allocation of swap space
across multiple devices.  And, although blist_fill() was intended to support
fill operations with large counts, the old striping scheme never performed a
fill larger than the stripe size.  Consequently, the misplacement of a
sanity check in blst_meta_fill() went undetected.  Now, moving forward in
time to r118390, a new scheme for striping was introduced that maintained a
blist allocator per device, but as noted in r318995, swapoff_one() was not
fully and correctly converted to the new scheme.  This change completes what
was started in r318995 by fixing the underlying bug in blst_meta_fill() that
stops swapoff_one() from simply performing a single blist_fill() operation.

Reviewed by:	kib
MFC after:	5 days
Differential Revision:	https://reviews.freebsd.org/D11043
2017-06-06 03:32:17 +00:00
Alan Cox
064650c180 Halve the memory being internally allocated by the blist allocator. In
short, half of the memory that is allocated to implement the radix tree is
wasted because we did not change "u_daddr_t" to be a 64-bit unsigned int
when we changed "daddr_t" to be a 64-bit (signed) int.  (See r96849 and
r96851.)

Reviewed by:	kib, markj
Tested by:	pho
MFC after:	5 days
Differential Revision:	https://reviews.freebsd.org/D11028
2017-06-05 17:14:16 +00:00
Alan Cox
07c348ea7b After r118390, the variable "dmmax" was neither the correct stripe size
nor the correct maximum block size.  Moreover, after r318995, it serves
no purpose except to provide information to user space through a
read-only sysctl.

This change eliminates the variable "dmmax" but retains the sysctl.  It
also corrects the value returned by the sysctl.

Reviewed by:	kib, markj
MFC after:	3 days
2017-05-27 21:46:00 +00:00
Alan Cox
fe71561af2 In r118390, the swap pager's approach to striping swap allocation over
multiple devices was changed.  However, swapoff_one() was not fully and
correctly converted.  In particular, with r118390's introduction of a per-
device blist, the maximum swap block size, "dmmax", became irrelevant to
swapoff_one()'s operation.  Moreover, swapoff_one() was performing out-of-
range operations on the per-device blist that were silently ignored by
blist_fill().

This change corrects both of these problems with swapoff_one(), which will
allow us to potentially increase MAX_PAGEOUT_CLUSTER.  Previously,
swapoff_one() would panic inside of blist_fill() if you increased
MAX_PAGEOUT_CLUSTER.

Reviewed by:	kib, markj
MFC after:	3 days
2017-05-27 16:40:00 +00:00
Konstantin Belousov
6992112349 Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439
2017-05-23 09:29:05 +00:00
Gleb Smirnoff
83c9dea1ba - Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
  in place.  To do per-cpu stats, convert all fields that previously were
  maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
  before we have set up UMA and we can do counter_u64_alloc(), provide an
  early counter mechanism:
  o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
  o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
    so that at early stages of boot, before counters are allocated we already
    point to a counter that can be safely written to.
  o For sparc64 that required a whole dummy pcpu[MAXCPU] array.
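
The early counter idea above can be sketched in user space as follows. This is a simplified illustration, not the kernel code: counter_t is a hypothetical stand-in for a counter(9) handle, early_dummy_counter plays the role of pc_early_dummy_counter, calloc() stands in for counter_u64_alloc(), and the per-CPU aspect is ignored. In this sketch, updates made before allocation stay in the dummy and are not carried over.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t *counter_t;	/* hypothetical stand-in for a counter handle */

/* Spare slot that is always safe to write, even before allocation works. */
static uint64_t early_dummy_counter;

/* The statistic initially points at the dummy slot. */
static counter_t v_swappgsin = &early_dummy_counter;

static void
counter_add(counter_t c, uint64_t n)
{
	*c += n;
}

/* Once the allocator is available, switch to a real counter. */
static int
counters_startup(void)
{
	uint64_t *real = calloc(1, sizeof(*real));

	if (real == NULL)
		return (-1);
	v_swappgsin = real;
	return (0);
}
```

Callers use counter_add(v_swappgsin, 1) at any stage; only the target of the pointer changes once counters_startup() has run.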

Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
  to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.

This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

Reviewed by:	kib, gallatin, marius, lidl
Differential Revision:	https://reviews.freebsd.org/D10156
2017-04-17 17:34:47 +00:00
Mark Johnston
b1fd102ee7 Add a page queue for holding dirty anonymous unswappable pages.
On systems without a configured swap device, an attempt to launder pages
from a swap object will always fail and result in the page being
reactivated. This means that the page daemon will continuously scan pages
that can never be evicted. With this change, anonymous pages are instead
moved to PQ_UNSWAPPABLE after a failed laundering attempt when no swap
devices are configured. PQ_UNSWAPPABLE is not scanned unless a swap device
is configured, so unreferenced unswappable pages are excluded from the page
daemon's workload.

Reviewed by:	alc
2017-01-03 00:05:44 +00:00
Konstantin Belousov
2e56b64fa4 Fix argument type and microoptimize swp_pager_meta_free().
The natural type of the count argument is vm_pindex_t, but due to the
loop organization, it has to be a signed type to detect the termination
condition.  Replace this logic by using a separate counter for the
processed pages, and terminate the loop when the counter exceeds the
argument.
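
A minimal sketch of the loop shape described above, under assumptions: pindex_t is a stand-in for the unsigned vm_pindex_t, META_PAGES is a hypothetical block size, and visit_blocks() is an illustrative function that only counts the blocks touched. Decrementing an unsigned count by a whole block could wrap past zero, so the sketch advances a separate counter of processed pages instead.

```c
#include <stdint.h>

#define META_PAGES 32u		/* hypothetical number of pages per block */

typedef uint64_t pindex_t;	/* stand-in for the unsigned vm_pindex_t */

/*
 * Return the number of blocks visited when processing `count` pages
 * starting at page `index`.
 */
static unsigned
visit_blocks(pindex_t index, pindex_t count)
{
	unsigned blocks = 0;
	pindex_t done;		/* pages processed so far */

	for (done = 0; done < count; ) {
		/* Pages from index up to the end of its block. */
		pindex_t n = META_PAGES - (index % META_PAGES);

		blocks++;
		index += n;
		done += n;	/* may overshoot count; the loop test still ends */
	}
	return (blocks);
}
```

For example, four pages starting at index 30 span two blocks: pages 30 and 31 in the first, 32 and 33 in the second.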

Completely process one swblock for all relevant indexes instead of
doing relookup in hash when incrementing page index on the loop step.

Do not drop hash mutex around iterations.

Noted and reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-12-24 09:57:31 +00:00
Konstantin Belousov
77d6fd97ef Improve vm_object_scan_all_shadowed() to also check swap backing objects.
As noted in the removed comment, it is possible and not prohibitively
costly to look up the swap blocks for the given page index.  Implement
a swap_pager_find_least() function to do that, and use it to iterate
simultaneously over both backing object page queue and swap
allocations when looking for shadowed pages.

Testing shows that the number of new successful scans, enabled by this
addition, is small but non-zero.  When worked out, the change both
further reduces the depth of the shadow object chain, and frees unused
but allocated swap and memory.

Suggested and reviewed by:	alc
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-12-18 20:56:14 +00:00
Konstantin Belousov
71057cd207 In swp_pager_meta_free_all(), fix type of the index variable. Style.
Noted and reviewed by:	alc (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-12-16 23:33:37 +00:00
Alan Cox
bba39b9ae3 Remove PG_CACHED-related fields from struct vmmeter, because they are no
longer used.  More precisely, they are always zero because the code that
decremented and incremented them no longer exists.

Bump __FreeBSD_version to mark this change.

Reviewed by:	kib, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8583
2016-11-22 18:13:46 +00:00
Alan Cox
7667839a7e Remove most of the code for implementing PG_CACHED pages. (This change does
not remove user-space visible fields from vm_cnt or all of the references to
cached pages from comments.  Those changes will come later.)

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8497
2016-11-15 18:22:50 +00:00
Alan Cox
ebcddc7217 Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
pages, specifically, dirty pages that have passed once through the inactive
queue.  A new, dedicated thread is responsible for both deciding when to
launder pages and actually laundering them.  The new policy uses the
relative sizes of the inactive and laundry queues to determine whether to
launder pages at a given point in time.  In general, this leads to more
intelligent swapping behavior, since the laundry thread will avoid pageouts
when the marginal benefit of doing so is low.  Previously, without a
dedicated queue for dirty pages, the page daemon didn't have the information
to determine whether pageout provides any benefit to the system.  Thus, the
previous policy often resulted in small but steadily increasing amounts of
swap usage when the system is under memory pressure, even when the inactive
queue consisted mostly of clean pages.  This change addresses that issue,
and also paves the way for some future virtual memory system improvements by
removing the last source of object-cached clean pages, i.e., PG_CACHED pages.

The new laundry thread sleeps while waiting for a request from the page
daemon thread(s).  A request is raised by setting the variable
vm_laundry_request and waking the laundry thread.  We request launderings
for two reasons: to try and balance the inactive and laundry queue sizes
("background laundering"), and to quickly make up for a shortage of free
pages and clean inactive pages ("shortfall laundering").  When background
laundering is requested, the laundry thread computes the number of page
daemon wakeups that have taken place since the last laundering.  If this
number is large enough relative to the ratio of the laundry and (global)
inactive queue sizes, we will launder vm_background_launder_target pages at
vm_background_launder_rate KB/s.  Otherwise, the laundry thread goes back
to sleep without doing any work.  When scanning the laundry queue during
background laundering, reactivated pages are counted towards the laundry
thread's target.

In contrast, shortfall laundering is requested when an inactive queue scan
fails to meet its target.  In this case, the laundry thread attempts to
launder enough pages to meet v_free_target within 0.5s, which is the
inactive queue scan period.

A laundry request can be latched while another is currently being
serviced.  In particular, a shortfall request will immediately preempt a
background laundering.

This change also redefines the meaning of vm_cnt.v_reactivated and removes
the functions vm_page_cache() and vm_page_try_to_cache().  The new meaning
of vm_cnt.v_reactivated now better reflects its name.  It represents the
number of inactive or laundry pages that are returned to the active queue
on account of a reference.

In collaboration with:	markj
Reviewed by:	kib
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8302
2016-11-09 18:48:37 +00:00
Mark Johnston
dd9cb6da0b Respect the caller's hints when performing swap readahead.
The pager getpages interface allows the caller to bound the number of
readahead and readbehind pages, and vm_fault_hold() makes use of this
feature. These bounds were ignored after r305056, causing the swap pager
to potentially page in more than the specified number of pages.

Reported and reviewed by:	alc
X-MFC with:	r305056
2016-09-04 00:25:49 +00:00
Konstantin Belousov
9815066425 Make swapoff reliable.
The swap_pager_swapoff() function uses trylock for the object lock
before pagein, which means that either i/o to md(4) over swap, or
intensive page faults over swap pager objects might prevent swapoff()
from making any progress. Then the retry < 100 check fails and machine
panics.

If trylock fails, acquire the object lock in the blockable way and
restart the hash bucket walk.  Keep retries logic for now.

Reported and tested by:	pho
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D7688
2016-08-31 14:49:58 +00:00
Mark Johnston
915d1b71cd Restore swap pager readahead after r292373.
The removal of vm_fault_additional_pages() meant that a hard fault on
a swap-backed page would result in only that page being read in. This
change implements readahead and readbehind for the swap pager in
swap_pager_getpages(). swap_pager_haspage() is modified to return the
largest contiguous non-resident range of pages containing the requested
range.

Reviewed by:	alc, kib
Tested by:	pho
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D7677
2016-08-30 05:56:21 +00:00
Konstantin Belousov
0c657d22eb Explain why swapgeom_close_ev() is delegated.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-03 07:11:19 +00:00
Konstantin Belousov
88ad2d7b47 Do not delegate work to the geom event thread which can be done inline.
In particular, swapongeom_ev() needed event thread context when swap
pager configuration was performed under Giant and geom asserted that
Giant is not owned.  Now both of the reasons went away.

On the other hand, note that swapgeom_release() is called from the
bio_done context, and a possible close cannot be performed inline.

Also fix some minor issues.  The swapgeom() function does not use the
td argument, remove it.  Recheck that the vnode passed is still VCHR
and not reclaimed after the lock.

Reviewed by:	mav
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-07-28 15:57:01 +00:00
Konstantin Belousov
2174a0c607 Fix style and typo.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-07-28 15:49:51 +00:00
Konstantin Belousov
eb4d6a1b3b Fix inconsistent locking of the swap pager named objects list.
Right now, all modifications of the list are locked by sw_alloc_mtx.
But the initial lookup of the object by the handle in swap_pager_alloc()
is not protected by sw_alloc_mtx, which means that
vm_pager_object_lookup() could follow a freed pointer.

Create a new named swap object with the OBJT_SWAP type, instead
of OBJT_DEFAULT.  With this change, swp_pager_meta_build() never needs
to upgrade named OBJT_DEFAULT to OBJT_SWAP (in the other place, we do
not forbid client code from creating named OBJT_DEFAULT objects at
all).

That change allows removing sw_alloc_mtx and making the list locked by
the sw_alloc_sx lock.  Update swap_pager_copy() to the new locking mode.

Create helper swap_pager_alloc_init() to consolidate named and
anonymous swap object creation, while a caller ensures that the
necessary locks are held around the helper.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (hrs)
2016-06-13 03:42:46 +00:00
Konstantin Belousov
1571927369 Explicitly initialize sw_alloc_sx. Currently it is not initialized
but works due to zeroed out bss on startup.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (hrs)
2016-06-13 03:39:16 +00:00
Konstantin Belousov
9a2047083f Remove Giant around allocation of the swap pager with non-NULL handle.
Existing issue of not protecting pager_object_list iteration in
vm_pager_object_lookup() by sw_alloc_mtx is not affected by Giant
removal.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2016-05-24 10:16:03 +00:00
Konstantin Belousov
4c36e917b2 Mark swap-related proc sysctls as not requiring Giant.
Reviewed by:	alc (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
2016-05-22 23:28:23 +00:00
Konstantin Belousov
04533e1ef7 Replace hand-made exclusive lock, protecting against parallel
swapon/swapoff invocations, with sx.

Reviewed by:	alc (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
2016-05-22 23:25:01 +00:00
Pedro F. Giffuni
763df3ec55 sys/vm: minor spelling fixes in comments.
No functional change.
2016-05-02 20:16:29 +00:00
Gleb Smirnoff
b0cd20172d A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().
o With new KPI consumers can request contiguous ranges of pages, and
  unlike before, all pages will be kept busied on return, like it was
  done before with the 'reqpage' only. Now the reqpage goes away. With
  new interface it is easier to implement code protected from race
  conditions.

  Such arrayed requests for now should be preceded by a call to
  vm_pager_haspage() to make sure that request is possible. This
  could be improved later, making vm_pager_haspage() obsolete.

  Strengthening the promises on the busy state of the array of pages
  allows us to remove such hacks as swp_pager_free_nrpage() and
  vm_pager_free_nonreq().

o New KPI accepts two integer pointers that may optionally point at
  values for read ahead and read behind, that a pager may do, if it
  can. These pages are completely owned by pager, and not controlled
  by the caller.

  This shifts the UFS-specific readahead logic from vm_fault.c, which
  should be file system agnostic, into vnode_pager.c. It also removes
  one VOP_BMAP() request per hard fault.

Discussed with:	kib, alc, jeff, scottl
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-12-16 21:30:45 +00:00
Warner Losh
d635a37ffa Mark swap_pager_putpages static at its definition. It was already
static at its declaration. Remove needless swapdev_strategy forward
declaration.

MFC After: 3 days
2015-10-05 21:29:17 +00:00
Warner Losh
9e3e3fe5b3 The swap pager is compatible with direct dispatch. It does its own
locking and doesn't sleep. Flag the consumer we create as such. In
addition, decrement the in flight index when we have an out of memory
error after having incremented it previously. This would have
prevented swapoff from working if the swap pager ever hit a resource
shortage trying to swap out something (the swap in path always waits
for a bio, so won't have this issue). Simplify the close logic by
abandoning the use of private and initializing the index to 1 and
dropping that reference when we previously set private.

Also, set sw_id only while sw_dev_mtx is held. This should only affect
swapping to a vnode, as opposed to a geom whose close always sets it to
NULL with sw_dev_mtx held.

Differential Review: https://reviews.freebsd.org/D3547
2015-09-08 17:47:56 +00:00
Alan Cox
77923df2c1 Eliminate pointless assignments to rtvals[] in swap_pager_putpages().
Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
2015-08-21 17:00:39 +00:00
Jeff Roberson
fade8dd714 Refactor unmapped buffer address handling.
- Use pointer assignment rather than a combination of pointers and
  flags to switch buffers between unmapped and mapped.  This eliminates
  multiple flags and generally simplifies the logic.
- Eliminate b_saveaddr since it is only used with pager bufs which have
  their b_data re-initialized on each allocation.
- Gather up some convenience routines in the buffer cache for
  manipulating buf space and buf malloc space.
- Add an inline, buf_mapped(), to standardize checks around unmapped
  buffers.

In collaboration with: mlaier
Reviewed by:	kib
Tested by:	pho (many small revisions ago)
Sponsored by:	EMC / Isilon Storage Division
2015-07-23 19:13:41 +00:00
Gleb Smirnoff
093ebe1d28 o Un-inline vm_pager_get_pages(), vm_pager_get_pages_async().
o Provide an extensive set of assertions for input array of pages.
o Remove now duplicate assertions from different pagers.

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-06-17 22:44:27 +00:00
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thread's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
John Baldwin
e735691b61 Place VM objects on the object list when created and never remove them.
This is ok since objects come from a NOFREE zone and allows objects to
be locked while traversing the object list without triggering a LOR.

Ensure that objects on the list are marked DEAD while free or stillborn,
and that they have a refcount of zero.  This required updating most of
the pagers to explicitly mark an object as dead when deallocating it.
(Only the vnode pager did this previously.)

Differential Revision:	https://reviews.freebsd.org/D2423
Reviewed by:	alc, kib (earlier version)
MFC after:	2 weeks
Sponsored by:	Norse Corp, Inc.
2015-05-08 19:43:37 +00:00
Gleb Smirnoff
89c241d1a6 Instead of reading, validating and adjusting value of the vm.swap_async_max
in the main swapper work cycle, do it in the sysctl handler.  This removes
extra mutex acquisition from the main cycle and makes the sysctl knob return
error on an invalid value, instead of accepting and fixing it.
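
The behavioral change can be illustrated with a small user-space sketch. It is not the kernel handler: set_swap_async_max() is a hypothetical stand-in for the sysctl handler, and the bounds (1 to ASYNC_MAX_LIMIT) and the initial value are assumptions made for illustration. The point is only that an invalid value is rejected with an error at set time instead of being accepted and corrected later in the work cycle.

```c
#include <errno.h>

#define ASYNC_MAX_LIMIT 64	/* hypothetical upper bound */

static int swap_async_max = 4;	/* hypothetical initial value */

/*
 * Hypothetical stand-in for the sysctl handler: validate once, when the
 * value is set, so that the main work cycle can use it without rechecking.
 */
static int
set_swap_async_max(int new_value)
{
	if (new_value < 1 || new_value > ASYNC_MAX_LIMIT)
		return (EINVAL);	/* reject; do not silently adjust */
	swap_async_max = new_value;
	return (0);
}
```

A rejected request leaves the previous value untouched, so readers of swap_async_max never observe an out-of-range setting.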

Reviewed by:	kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-05-02 20:27:37 +00:00
Edward Tomasz Napierala
4b5c9cf62f Add kern.racct.enable tunable and RACCT_DISABLED config option.
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).

Differential Revision:	https://reviews.freebsd.org/D2369
Reviewed by:	kib@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-04-29 10:23:02 +00:00