Commit Graph

519 Commits

Author SHA1 Message Date
Dimitry Andric
f74be55e30 vm: fix a number of functions to match the expected prototypes
Noticed while attempting to make boolean_t unsigned: some vm-related
function declarations and definitions were using boolean_t where they
should have used int, and vice versa.

MFC after:	1 week
Reviewed by:	jhb
Differential Revision: https://reviews.freebsd.org/D39753
2023-04-25 19:58:18 +02:00
Konstantin Belousov
645510e62e Provide consistent prototype for swp_pager_meta_free()
This should fix 32bit build breakage.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2022-12-09 17:23:09 +02:00
Konstantin Belousov
baa1ccceef Make swap_pager_freespace() global
also make it return the count of the swap pages freed, which are not
simultaneously resident in the object.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37097
2022-12-09 14:15:37 +02:00
Konstantin Belousov
5bd45b2ba3 swap_pager_find_least(): assert that the function is called on the right object type
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37024
2022-10-19 20:24:07 +03:00
Konstantin Belousov
26eed2aa06 swap_pager: style, wrap long lines
Reviewed by:	brooks, imp (previous version)
Discussed with:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36540
2022-09-16 23:23:13 +03:00
Konstantin Belousov
ccdaa1ab1c vm_overcommit: put into __read_mostly section
Suggested by:	mjg
Reviewed by:	brooks, imp (previous version)
Discussed with:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36540
2022-09-16 23:23:04 +03:00
Konstantin Belousov
a6cc4c6e98 vm: make vm.overcommit available externally
Reviewed by:	brooks, imp (previous version)
Discussed with:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36540
2022-09-16 23:22:49 +03:00
Alan Cox
54291f7d65 swap_pager: Reduce the scope of the object lock in putpages
We don't need to hold the object lock while allocating swap space, so
don't.

Reviewed by:	dougm, kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35839
2022-07-18 22:35:49 -05:00
Mark Johnston
fffc1c594a vm_object: Release object swap charge in the swap pager destructor
With the removal of OBJT_DEFAULT, we can simply handle this in
swap_pager_dealloc().  No functional change intended.

Suggested by:	alc
Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35787
2022-07-17 07:09:48 -04:00
Mark Johnston
cb6757c0a6 swap_pager: Removing handling for objects with OBJ_SWAP clear
With the removal of OBJT_DEFAULT, we can assume that pager operations
provide an object with OBJ_SWAP set.  Also, we do not need to convert
objects from type OBJT_DEFAULT.  Thus, remove checks for OBJ_SWAP and
remove code which modifies the object type.  In some places, replace the
check for OBJ_SWAP with a check for whether any swap blocks are
assigned.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35786
2022-07-17 07:09:48 -04:00
John Baldwin
b8ebd99aa5 vm: Use __diagused for variables only used in KASSERT().
2022-04-13 16:08:20 -07:00
Enji Cooper
567378cc07 Fix OID format for vm.swap_reserved and vm.swap_total
The correct OID format for CTLTYPE_U64 is `QU` (`uquad_t`), not `A`
(text expressed via `char *`).

This issue was noticed while doing a sysctl tree walk using a
sysctl(9) consumer that relies on the OID format to intuit what the
type should be for a given sysctl.

MFC after:	1 month
Sponsored by:	DellEMC Isilon
Differential Revision: https://reviews.freebsd.org/D34877
2022-04-10 18:17:09 -07:00
Mateusz Guzik
bb92cd7bcd vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)
2022-03-24 10:20:51 +00:00
Mark Johnston
43b3b8e52d swap_pager: uma_zcreate() doesn't fail
Remove always-false checks for UMA zone creation failure.  No functional
change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33809
2022-01-11 09:27:45 -05:00
Konstantin Belousov
5346570276 swapoff: add one more variant of the syscall
Requested and reviewed by:	brooks
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33343
2021-12-09 02:48:46 +02:00
Konstantin Belousov
e8dc2ba29c swapoff(2): add a SWAPOFF_FORCE flag
The flag requests skipping the heuristic which tries to avoid leaving
the system with more allocated memory than is available from RAM and
remaining swap.

Reviewed by:	markj
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33165
2021-12-05 00:20:58 +02:00
Konstantin Belousov
a4e4132fa3 swapoff(2): replace special device name argument with a structure
For compatibility, add a placeholder pointer to the start of the
added struct swapoff_new_args, and use it to distinguish old vs. new
style of syscall invocation.

Reviewed by:	markj
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33165
2021-12-05 00:20:58 +02:00
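The compatibility scheme described in a4e4132fa3 can be sketched roughly as follows. This is only an illustration of the idea of a leading placeholder pointer: the structure and field names below are hypothetical, and the rule "non-NULL first pointer means old-style call" is an assumption about how the distinction is made, not a copy of the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument layout; field names are illustrative only. */
struct swapoff_args_sketch {
	const char *old_name;	/* placeholder: assumed set only by old-style callers */
	const char *name;	/* new-style device name */
	unsigned int flags;	/* new-style flags */
};

enum call_style { STYLE_OLD, STYLE_NEW };

/*
 * Assumption: an old-style caller passes the device name as the first
 * argument, so a non-NULL placeholder marks the old calling convention.
 */
static enum call_style
classify(const struct swapoff_args_sketch *a)
{
	assert(a != NULL);
	return (a->old_name != NULL ? STYLE_OLD : STYLE_NEW);
}
```

Under this assumption an existing binary that passes only a device name keeps working, while new callers leave the first slot NULL and fill in the name and flags.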
Konstantin Belousov
6df359449f swap_pager.c: Remove MPSAFE and ARGSUSED annotations
Reviewed by:	markj
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33165
2021-12-05 00:20:58 +02:00
Konstantin Belousov
0190c38b9d swapoff_one(): only check free pages count when manually turning swap off
When swap is turned off due to system shutdown or reboot, ignore the
check.  The problem is that the check is not accurate by any means: the
free page count can legitimately be low while the system is still able
to page in everything from the swap.  Then, we turn swap off if swapping
on real file or some non-standard geom provider, and typically panic
when the system appears to actually need an unavailable page.

For syscall, it is better to be safe than sorry.

Reported and tested by:	peterj
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33147
2021-11-29 18:38:02 +02:00
Mateusz Guzik
7e1d3eefd4 vfs: remove the unused thread argument from NDINIT*
See b4a58fbf64 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.
2021-11-25 22:50:42 +00:00
Konstantin Belousov
b19740f4ce swap_pager: lock vnode in swapdev_strategy()
VOP_STRATEGY() requires locked vnode.  Note that we lock the swap vnode
while pages are busy, but this would only cause real LoR if pages belong
to the swap vnode, which must not be the case for correct use.

Reported and tested by:	peterj
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33119
2021-11-25 21:34:50 +02:00
Konstantin Belousov
6ddf41faa6 swapon: extend the region where the swap vnode is locked
to cover VOP_GETATTR() call in sys_swapon().  Move locking from inside
swapongeom() and swaponvp() into sys_swapon().

Reported by and tested by:	peterj
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33119
2021-11-25 21:34:44 +02:00
Konstantin Belousov
a6d04f34a4 swap pager: lock vnode around VOP_CLOSE()
Reported and tested by:	peterj
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33119
2021-11-25 21:34:39 +02:00
Gleb Smirnoff
183f8e1e57 Externalize nsw_cluster_max and initialize it early.
GEOM_ELI needs to know the value, because it will soon have special
memory handling for IO operations associated with swap.

Move initialization to swap_pager_init(), which is executed at
SI_SUB_VM, unlike swap_pager_swap_init(), which would be executed
only when a swap is configured. GEOM_ELI might need the value at
SI_SUB_DRIVERS, when disks are tasted by GEOM.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D24400
2021-09-28 11:23:52 -07:00
Gleb Smirnoff
c6213beff4 Add flag BIO_SWAP to mark IOs that are associated with swap.
Submitted by:		jtl
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D24400
2021-09-28 11:23:51 -07:00
Mark Johnston
686aa9287c swap_pager: Handle large swap_pager_reserve() requests
This interface is used solely by md(4) when the MD_RESERVE flag is
specified, as in `mdconfig -a -t swap -s 1G -o reserve`.  It
pre-allocates swap blocks for the entire object.

The number of blocks to be reserved is specified as a vm_size_t, but
swp_pager_getswapspace() can allocate at most INT_MAX blocks.  vm_size_t
also seems like the incorrect type to use here, since it refers only to the
size of the VM object, not the size of a mapping.  So:
- change the type of "size" in swap_pager_reserve() to vm_pindex_t, and
- clamp the requested number of blocks for a single
  swp_pager_getswapspace() call to INT_MAX.

Reported by:	syzkaller
Reviewed by:	dougm, alc, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31875
2021-09-07 14:04:50 -04:00
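The clamping described in 686aa9287c can be sketched as a loop that never asks the allocator for more than INT_MAX blocks at once. This is a simplified userland model, not the kernel code: `pindex_t`, `reserve_chunk()` and `reserve_calls()` are hypothetical names, and the sketch assumes every request is satisfied in full, which a real allocator does not guarantee.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's page-index type. */
typedef uint64_t pindex_t;

/*
 * Hypothetical helper: given the number of blocks still to reserve,
 * return how many to request in one call, never more than INT_MAX.
 */
static int
reserve_chunk(pindex_t remaining)
{
	return (remaining > (pindex_t)INT_MAX ? INT_MAX : (int)remaining);
}

/*
 * Count how many allocator calls a request of `size` blocks needs when
 * each call is clamped by reserve_chunk() and fully satisfied.
 */
static uint64_t
reserve_calls(pindex_t size)
{
	uint64_t calls = 0;

	while (size > 0) {
		int n = reserve_chunk(size);

		assert(n > 0);
		size -= (pindex_t)n;
		calls++;
	}
	return (calls);
}
```

A request of exactly INT_MAX blocks takes one call in this model, and one block more takes two.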
Konstantin Belousov
28bc23ab92 tmpfs: dynamically register tmpfs pager
Remove OBJT_SWAP_TMPFS. Move tmpfs-specific swap pager bits into
tmpfs_subr.c.

There is no longer any code to directly support tmpfs in sys/vm, most
tmpfs knowledge is shared by non-anon swap object type implementation.
The tmpfs-specific methods are provided by registered tmpfs pager, which
inherits from the swap pager.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:13:34 +03:00
Konstantin Belousov
00a3fe968b vm_object_kvme_type(): reimplement by embedding kvme_type into pagerops
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:10:35 +03:00
Mark Johnston
06d1fd9f42 swap_pager: Zero swap info before exporting to userspace
Otherwise padding bytes are leaked.

Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-05-12 12:52:05 -04:00
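The pattern behind 06d1fd9f42 is to clear a structure before filling it, so that compiler-inserted padding does not carry stale memory out to the consumer. A minimal sketch, using a hypothetical `struct swap_info` rather than the real kernel structure:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record; compilers typically insert padding after `flag`. */
struct swap_info {
	uint8_t flag;
	uint64_t used;
};

/*
 * Clear the whole object first, then assign the fields, so that no
 * byte of the structure is left holding whatever was there before.
 */
static void
fill_swap_info(struct swap_info *out, uint8_t flag, uint64_t used)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));	/* covers padding as well as fields */
	out->flag = flag;
	out->used = used;
}
```

Assigning the fields one by one without the memset() would leave the padding bytes with their previous contents, which is what a tool such as KMSAN flags when the structure is copied out.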
Konstantin Belousov
d474440ab3 Constify vm_pager-related virtual tables.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov
4b8365d752 Add OBJT_SWAP_TMPFS pager
This is OBJT_SWAP pager, specialized for tmpfs.  Right now, both swap pager
and generic vm code have to explicitly handle swap objects which are tmpfs
vnode v_object, in the special ways.  Replace (almost) all such places with
proper methods.

Since VM still needs a notion of the 'swap object', regardless of its
use, add yet another type-classification flag OBJ_SWAP. Set it in
vm_object_allocate() where other type-class flags are set.

This change almost completely eliminates the knowledge of tmpfs from VM,
and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov
a7c198a24b Implement vm_object_vnode() using vm_pager_getvp()
Allow vp_heldp argument to be NULL, in which case the returned vnode
is not held for tmpfs swap objects.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov
1390a5cbeb Add pgo_freespace method
Makes the code in vm_object collapse/page_remove cleaner

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov
192112b74f Add pgo_getvp method
This eliminates the staircase of conditions in vm_map_entry_set_vnode_text().

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov
c23c555bc1 Add pgo_mightbedirty method
Used to implement vm_object_mightbedirty()

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov
180bcaa46c vm_pager: add pgo_set_writeable_dirty method
specialized for swap and vnode pagers, and used to implement
vm_object_set_writeable_dirty().

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov
a0850dd057 swappagerops: slightly more style-compliant formatting
Remove excess spaces from comments.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:02 +03:00
Konstantin Belousov
ecfbddf0cd sysctl vm.objects: report backing object and swap use
For anonymous objects, provide a handle kvo_me naming the object,
and report the handle of the backing object.  This allows userspace
to deconstruct the shadow chain.  Right now the handle is the address
of the object in KVA, but this is not guaranteed.

For the same anonymous objects, report the swap space used for actually
swapped out pages, in kvo_swapped field.  I do not believe that it is
useful to report full 64bit counter there, so only uint32_t value is
returned, clamped to the max.

For kinfo_vmentry, report anonymous object handle backing the entry,
so that the shadow chain for the specific mapping can be deconstructed.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29771
2021-04-19 21:32:01 +03:00
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allows several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initializes maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embedded structures,
get private MAXPHYS-like constant; their conversion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, were either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
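The "+1" in the pbuf sizing from cd85379104 follows from simple page arithmetic: a buffer of maxphys bytes that does not start on a page boundary touches one more page than an aligned one. A small sketch of that calculation, assuming a 4096-byte page and using hypothetical names (`EX_PAGE_SIZE`, `pages_spanned()`):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	EX_PAGE_SIZE	4096u	/* assumed page size for the example */

/* Number of pages touched by the byte range [addr, addr + len), len > 0. */
static size_t
pages_spanned(uintptr_t addr, size_t len)
{
	uintptr_t first, last;

	assert(len > 0);
	first = addr / EX_PAGE_SIZE;		/* page holding the first byte */
	last = (addr + len - 1) / EX_PAGE_SIZE;	/* page holding the last byte */
	return ((size_t)(last - first) + 1);
}
```

With a 1M buffer, an aligned start spans 256 pages while a start at offset 512 spans 257, which is why an array sized to exactly atop(maxphys) entries would be one short for unaligned userspace buffers.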
Mateusz Guzik
c3aa3bf97c vm: clean up empty lines in .c and .h files
2020-09-01 21:20:45 +00:00
Mateusz Guzik
7ad2a82da2 vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error
Most consumers pass NULL.
2020-08-19 02:51:17 +00:00
Doug Moore
00fd73d2da Fix an overflow bug in the blist allocator that needlessly capped max
swap size by dividing a value, which was always a multiple of 64, by
64.  Remove the code that reduced max swap size down to that cap.

Eliminate the distinction between BLIST_BMAP_RADIX and
BLIST_META_RADIX.  Call them both BLIST_RADIX.

Make improvements to the blist self-test code to silence compiler
warnings and to test larger blists.

Reported by:	jmallett
Reviewed by:	alc
Discussed with:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D25736
2020-07-25 18:29:10 +00:00
Mateusz Guzik
ee74412269 vm: fix swap reservation leak and clean up surrounding code
The code did not subtract from the global counter if per-uid reservation
failed.

Cleanup highlights:
- load overcommit once
- move per-uid manipulation to dedicated routines
- don't fetch wire count if requested size is below the limit
- convert return type from int to bool
- ifdef the routines with _KERNEL to keep vm.h compilable by userspace

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D25787
2020-07-24 13:23:32 +00:00
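The leak fixed in ee74412269 has the shape of a two-level reservation where the outer charge must be undone when the inner one fails. The following is a simplified single-threaded model with hypothetical names (`struct reserve_state`, `reserve_uid()`, `reserve()`); it ignores overcommit policy, locking, atomics and integer overflow, all of which the real code has to deal with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters modelling a global and a per-uid reservation. */
struct reserve_state {
	uint64_t global_used;
	uint64_t uid_used;
	uint64_t uid_limit;
};

/* Charge the per-uid counter, failing if the limit would be exceeded. */
static bool
reserve_uid(struct reserve_state *st, uint64_t incr)
{
	if (st->uid_used + incr > st->uid_limit)
		return (false);
	st->uid_used += incr;
	return (true);
}

/* Charge the global counter first, then the per-uid counter. */
static bool
reserve(struct reserve_state *st, uint64_t incr)
{
	st->global_used += incr;
	if (!reserve_uid(st, incr)) {
		/* Undo the global charge; omitting this step is the leak. */
		assert(st->global_used >= incr);
		st->global_used -= incr;
		return (false);
	}
	return (true);
}
```

After a failed per-uid reservation both counters are back at their previous values; without the rollback the global counter would keep growing on every rejected request.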
Mateusz Guzik
126a2470b9 vm: annotate swap_reserved with __exclusive_cache_line
The counter keeps being updated all the time and variables read afterwards
share the cacheline. Note this still fundamentally does not scale and needs
to be replaced, in the meantime gets a bandaid.

brk1_processes -t 52 ops/s:
before: 8598298
after:  9098080
2020-07-23 08:42:16 +00:00
Mateusz Guzik
7ce3a31286 vm: rework swap_pager_status to execute in constant time
The lock-protected iteration is trivially avoidable.

This removes a serialisation point from Linux binaries (which end up calling
here from the sysinfo syscall).
2020-06-09 14:16:18 +00:00
Mark Johnston
d869a17e62 Use COUNTER_U64_DEFINE_EARLY() in places where it simplifies things.
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23978
2020-03-06 19:10:00 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Doug Moore
36b01270d1 The last argument to swp_pager_getswapspace is always 1. Remove that argument.
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23810
2020-02-24 04:01:09 +00:00
Mark Johnston
7ca5539285 Allow swap_pager_putpages() to allocate one block at a time.
The minimum allocation size of 4 blocks is an old policy that came with
the "new" swap pager in r42957.  Since then the blist allocator has
gotten better at reducing fragmentation; for example, with r349777 it
can return a range that spans multiple leaves.  When swap space is close
to being exhausted, the minimum of 4 blocks most likely exacerbates
memory pressure, so reduce it to 1.

Reported by:	alc
Tested by:	pho
Reviewed by:	alc, dougm, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23763
2020-02-23 17:59:51 +00:00
Jeff Roberson
6c5f36ff30 Eliminate some unnecessary uses of UMA_ZONE_VM. Only zones involved in
virtual address or physical page allocation need to be marked with this
flag.

Reviewed by:	markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23712
2020-02-19 08:17:27 +00:00