freebsd-dev/sys/kern
Mark Johnston fee2a2fa39 Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
..
bus_if.m Add necessary bits for Linux KPI to work correctly on powerpc 2019-08-04 19:28:10 +00:00
capabilities.conf Add sysctlbyname system call 2019-09-03 04:16:30 +00:00
clock_if.m
cpufreq_if.m
device_if.m
genassym.sh
genoffset.c With epoch not inlined, there is no point in using _lite KPI. While here, 2018-11-13 23:45:38 +00:00
genoffset.sh expose thread_lite definition to tied modules 2018-07-03 02:50:07 +00:00
imgact_aout.c Switch to use shared vnode locks for text files during image activation. 2019-05-05 11:20:43 +00:00
imgact_binmisc.c Add helper functions to copy strings into struct image_args. 2018-11-29 21:00:56 +00:00
imgact_elf32.c
imgact_elf64.c
imgact_elf.c When loading ELF interpreter, initialize whole nested image_params with zero. 2019-09-07 16:03:26 +00:00
imgact_shell.c Add helper functions to copy strings into struct image_args. 2018-11-29 21:00:56 +00:00
init_main.c Update comment explaining create_init(). 2019-08-08 16:42:53 +00:00
init_sysent.c Add sysctlbyname system call 2019-09-03 04:16:30 +00:00
kern_acct.c
kern_alq.c
kern_clock.c Update td_runtime of running thread on each statclock(). 2019-06-14 01:09:10 +00:00
kern_clocksource.c Rename hardclock_cnt() to hardclock() and remove the old implementation. 2018-09-06 02:10:59 +00:00
kern_condvar.c
kern_conf.c Bump SPECNAMELEN to MAXNAMLEN. 2019-01-27 00:46:06 +00:00
kern_cons.c Replace ttyprintf with sbuf_printf and tty drain routine 2018-10-20 18:31:36 +00:00
kern_context.c Remove superfluous bzero in getcontext/swapcontext/sendsig 2018-11-26 20:56:05 +00:00
kern_cpu.c kern_cpu: When adding abs frequency allow for unordered insertion 2018-07-19 11:28:14 +00:00
kern_cpuset.c Bump up the low range of cpuset numbers to account for the kernel cpuset. 2019-09-05 17:48:39 +00:00
kern_ctf.c Convert DDB_CTF to use newer version of ZLIB. 2019-08-08 07:27:49 +00:00
kern_descrip.c Check and avoid overflow when incrementing fp->f_count in 2019-07-21 15:07:12 +00:00
kern_dtrace.c
kern_dump.c Move phys_avail definition into MI code. It is consumed in the MI layer and 2019-08-16 00:45:14 +00:00
kern_environment.c Fix zapping of static hints and env in init_static_kenv(). Environments 2019-02-05 15:34:55 +00:00
kern_et.c
kern_event.c Avoid relying on header pollution from sys/refcount.h. 2019-07-29 20:26:01 +00:00
kern_exec.c Change synchonization rules for vm_page reference counting. 2019-09-09 21:32:42 +00:00
kern_exit.c proc: clear pid bitmap entry after dropping proctree lock 2019-09-02 12:46:43 +00:00
kern_fail.c
kern_ffclock.c
kern_fork.c Add procctl(PROC_STACKGAP_CTL) 2019-09-03 18:56:25 +00:00
kern_hhook.c
kern_idle.c
kern_intr.c Disable intr_storm_threshold mechanism by default 2019-05-24 22:33:14 +00:00
kern_jail.c Replace hand-rolled unrefs if > 1 with refcount_release_if_not_last 2018-12-07 16:11:45 +00:00
kern_kcov.c Change synchonization rules for vm_page reference counting. 2019-09-09 21:32:42 +00:00
kern_khelp.c
kern_kthread.c proc: always store parent pid in p_oppid 2018-11-16 17:07:54 +00:00
kern_ktr.c Drop "All rights reserved" from my copyright statements. 2019-03-06 22:11:45 +00:00
kern_ktrace.c
kern_linker.c Add flags variants to linker_files / stack(9) symbol resolution 2018-10-20 18:08:43 +00:00
kern_lock.c Add lockmgr(9) probes to the lockstat DTrace provider. 2019-08-21 23:43:58 +00:00
kern_lockf.c lockf: annotate LOCKF_DEBUG only var 2018-05-19 05:04:38 +00:00
kern_lockstat.c Add lockmgr(9) probes to the lockstat DTrace provider. 2019-08-21 23:43:58 +00:00
kern_loginclass.c Replace hand-rolled unrefs if > 1 with refcount_release_if_not_last 2018-12-07 16:11:45 +00:00
kern_malloc.c Handle overflow in calculating max kmem size. 2019-01-14 07:31:19 +00:00
kern_mbuf.c Extend uma_reclaim() to permit different reclamation targets. 2019-09-01 22:22:43 +00:00
kern_mib.c Add missing include of sys/boot.h 2019-06-24 20:52:21 +00:00
kern_module.c Use NULL for SYSINIT's last arg, which is a pointer type 2018-05-18 17:58:09 +00:00
kern_mtxpool.c Increase MTX_POOL_SLEEP_SIZE from 128 to 1024. 2018-12-24 23:52:35 +00:00
kern_mutex.c INVARIANTS: treat LA_LOCKED as the same of LA_XLOCKED in mtx_assert. 2019-08-23 06:39:40 +00:00
kern_ntptime.c Clear pad bytes in the struct exported by kern.ntp_pll.gettime. 2018-11-20 20:32:10 +00:00
kern_osd.c
kern_physio.c Allocate pager bufs from UMA instead of 80-ish mutex protected linked list. 2019-01-15 01:02:16 +00:00
kern_pmc.c Add malloc_domainset(9) and _domainset variants to other allocator KPIs. 2018-10-30 18:26:34 +00:00
kern_poll.c
kern_priv.c Check for probes enabled in priv_check_cred before evaluting the error. 2018-12-19 23:28:29 +00:00
kern_proc.c proc: eliminate the zombproc list 2019-08-28 16:18:23 +00:00
kern_procctl.c Add procctl(PROC_STACKGAP_CTL) 2019-09-03 18:56:25 +00:00
kern_prot.c Only enable COMPAT_43 changes for syscalls ABI for a.out processes. 2019-08-11 19:16:07 +00:00
kern_racct.c proc: eliminate the zombproc list 2019-08-28 16:18:23 +00:00
kern_rangelock.c Add non-blocking trylock variants for the rangelock functions. 2019-06-27 23:10:40 +00:00
kern_rctl.c
kern_resource.c Fix a typo introduced in r344133 2019-03-18 12:41:42 +00:00
kern_rmlock.c Make no assertions about lock state when the scheduler is stopped. 2018-11-13 20:48:05 +00:00
kern_rwlock.c Drop "All rights reserved" from my copyright statements. 2019-03-06 22:11:45 +00:00
kern_sdt.c
kern_sema.c
kern_sendfile.c Change synchonization rules for vm_page reference counting. 2019-09-09 21:32:42 +00:00
kern_sharedpage.c
kern_shutdown.c shutdown_halt: make sure that watchdog timer is stopped 2019-09-04 13:26:59 +00:00
kern_sig.c Only enable COMPAT_43 changes for syscalls ABI for a.out processes. 2019-08-11 19:16:07 +00:00
kern_switch.c make critical_{enter, exit} inline 2018-07-03 01:55:09 +00:00
kern_sx.c sx: retire SX_NOADAPTIVE 2018-12-05 16:43:03 +00:00
kern_synch.c Add a blocking wait bit to refcount. This allows refs to be used as a simple 2019-08-18 11:43:58 +00:00
kern_syscalls.c fix a typo resulting in a wrong variable in kern_syscall_deregister 2018-08-02 09:41:55 +00:00
kern_sysctl.c Add sysctlbyname system call 2019-09-03 04:16:30 +00:00
kern_tc.c Initialize timehands linkage much earlier. 2019-09-09 12:42:48 +00:00
kern_thr.c Don't acquire evclass_lock with a spinlock held 2018-07-11 19:38:42 +00:00
kern_thread.c Always set td_errno to the error value of a system call. 2019-07-15 21:16:01 +00:00
kern_time.c Disallow excessively small times of day in clock_settime(2). 2019-05-03 21:26:44 +00:00
kern_timeout.c Save the last callout function executed on each CPU 2019-07-03 19:22:44 +00:00
kern_tslog.c
kern_ubsan.c Teach the kernel KUBSAN runtime about alignment_assumption 2019-05-28 09:12:15 +00:00
kern_umtx.c Make umtxq_check_susp() to correctly handle thread exit requests. 2019-08-01 14:34:27 +00:00
kern_uuid.c Use sbuf_cat() in GEOM confxml generation. 2019-06-19 15:36:02 +00:00
kern_xxx.c Normalize COMPAT_43 syscall declarations. 2018-12-04 16:48:47 +00:00
ksched.c
link_elf_obj.c Convert DDB_CTF to use newer version of ZLIB. 2019-08-08 07:27:49 +00:00
link_elf.c Convert DDB_CTF to use newer version of ZLIB. 2019-08-08 07:27:49 +00:00
linker_if.m
Make.tags.inc Remove a couple of harmless stray references to nandfs. 2019-06-25 16:39:25 +00:00
Makefile
makesyscalls.sh Reorganise conditionals to reduce duplication. 2019-08-22 10:21:07 +00:00
md4c.c
md5c.c
msi_if.m
p1003_1b.c
pic_if.m
posix4_mib.c Use NULL for SYSINIT's last arg, which is a pointer type 2018-05-18 17:58:09 +00:00
sched_4bsd.c Reduce umtx-related work on exec and exit 2019-05-08 16:30:38 +00:00
sched_ule.c Move scheduler state into the per-cpu area where it can be allocated on the 2019-08-13 04:54:02 +00:00
serdev_if.m
stack_protector.c Revert r346292 (permit_nonrandom_stackcookies) 2019-05-13 23:37:44 +00:00
subr_acl_nfs4.c Remove unused argument to priv_check_cred. 2018-12-11 19:32:16 +00:00
subr_acl_posix1e.c Remove unused argument to priv_check_cred. 2018-12-11 19:32:16 +00:00
subr_autoconf.c
subr_blist.c Address problems in blist_alloc introduced in r349777. The swap block allocator could become corrupted 2019-07-11 20:52:39 +00:00
subr_boot.c When parsing command line stuff, treat tabs and spaces the same. 2019-04-18 22:52:12 +00:00
subr_bufring.c
subr_bus_dma.c Add an external mbuf buffer type that holds multiple unmapped pages. 2019-06-29 00:48:33 +00:00
subr_bus.c Add necessary bits for Linux KPI to work correctly on powerpc 2019-08-04 19:28:10 +00:00
subr_busdma_bufalloc.c Add malloc_domainset(9) and _domainset variants to other allocator KPIs. 2018-10-30 18:26:34 +00:00
subr_capability.c kern_sendit: use pre-initialized rights 2018-05-23 01:48:09 +00:00
subr_clock.c Kill tz_minuteswest and tz_dsttime. 2019-03-12 04:49:47 +00:00
subr_compressor.c GZIO: Update to use zlib 1.2.11. 2019-08-25 07:50:44 +00:00
subr_counter.c Fix pre-SI_SUB_CPU initialization of per-CPU counters. 2018-07-10 00:18:12 +00:00
subr_coverage.c Extract the coverage sanitizer KPI to a new file. 2019-01-29 11:04:17 +00:00
subr_devmap.c Raise the size of L3 table for early devmap on arm64 2018-07-19 21:58:06 +00:00
subr_devstat.c devstat(9): Constify function parameters that can be const 2018-08-23 01:42:45 +00:00
subr_disk.c Fix botched merge with 355066 2019-03-12 05:10:41 +00:00
subr_dummy_vdso_tc.c
subr_early.c Add a file missed in r339321 2018-10-12 00:32:45 +00:00
subr_epoch.c Fix the turnstile_lock() KPI. 2019-07-24 23:04:59 +00:00
subr_eventhandler.c Include ktr.h in more compilation units 2019-05-21 20:38:48 +00:00
subr_fattime.c
subr_firmware.c
subr_gtaskqueue.c Make taskqgroup_attach{,_cpu}(9) work across architectures 2019-02-12 21:23:59 +00:00
subr_hash.c
subr_hints.c res_find: Fix fallback logic 2018-08-18 19:45:56 +00:00
subr_intr.c Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
subr_kdb.c Always stop the scheduler when entering kdb 2018-10-30 14:54:15 +00:00
subr_kobj.c Prevent some kobj memory allocation failures from panicking the system. 2019-01-31 22:27:39 +00:00
subr_lock.c Drop "All rights reserved" from my copyright statements. 2019-03-06 22:11:45 +00:00
subr_log.c
subr_mchain.c
subr_module.c Have preload_delete_name() free pages backing preloaded data. 2018-07-19 20:00:28 +00:00
subr_msgbuf.c msgbuf: Light detailing (const'ify and bool'itize) 2018-08-09 17:42:27 +00:00
subr_param.c riscv: restore default HZ=1000, keep QEMU at HZ=100 2019-09-07 05:13:31 +00:00
subr_pcpu.c Remove the obsolete pcpu_zone_ptr zone. 2019-08-24 00:01:19 +00:00
subr_pctrie.c kern/subr_pctrie: Fix mismatched signedness in assertion comparison 2019-04-06 21:56:24 +00:00
subr_pidctrl.c When pidctrl_daemon() is called multiple times within an interval, it 2018-06-07 07:48:50 +00:00
subr_power.c Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
subr_prf.c device_printf: Use sbuf for more coherent prints on SMP 2019-05-07 17:47:20 +00:00
subr_prof.c ANSIfy sys/kern 2018-06-01 13:26:45 +00:00
subr_rangeset.c Implement rangesets. 2019-02-20 09:38:19 +00:00
subr_rman.c add support for marking interrupt handlers as suspended 2018-12-17 17:11:00 +00:00
subr_rtc.c Improve error messages from clock_if.m method failures. 2018-09-02 20:17:51 +00:00
subr_sbuf.c sbuf(9): Add sbuf_nl_terminate() API 2019-08-07 19:27:14 +00:00
subr_scanf.c Add support for 'j', 't' and 'z' flags to kernel sscanf(). 2019-08-16 19:46:22 +00:00
subr_sfbuf.c
subr_sglist.c Add an external mbuf buffer type that holds multiple unmapped pages. 2019-06-29 00:48:33 +00:00
subr_sleepqueue.c Add wakeup_any(), cheaper wakeup_one() for taskqueue(9). 2019-06-20 01:15:33 +00:00
subr_smp.c x86: Implement MWAIT support for stopping a CPU 2019-05-04 20:34:26 +00:00
subr_stack.c stack(9): Drop unused API mode and comment that referenced it 2019-03-15 22:39:55 +00:00
subr_syscall.c Don't pass error from syscallenter() to syscallret(). 2019-07-15 21:25:16 +00:00
subr_taskqueue.c Add wakeup_any(), cheaper wakeup_one() for taskqueue(9). 2019-06-20 01:15:33 +00:00
subr_terminal.c teken, vt(4): New callbacks to lock the terminal once 2018-05-16 09:01:02 +00:00
subr_trap.c assert that td_lk_slocks is not leaked upon return from kernel 2019-08-19 11:18:36 +00:00
subr_turnstile.c Fix the turnstile_lock() KPI. 2019-07-24 23:04:59 +00:00
subr_uio.c simplify control flow so that gcc knows we never pass save to curthread_pflags_restore 2018-05-19 04:04:44 +00:00
subr_unit.c Implement unr64 2018-11-20 14:58:41 +00:00
subr_vmem.c Extend uma_reclaim() to permit different reclamation targets. 2019-09-01 22:22:43 +00:00
subr_witness.c Only check the blessings table for known LORs. 2019-08-02 18:01:47 +00:00
sys_capability.c Let kern.trap_enotcap be set as a tunable. 2018-12-06 17:29:37 +00:00
sys_generic.c Check and avoid overflow when incrementing fp->f_count in 2019-07-21 15:07:12 +00:00
sys_getrandom.c
sys_pipe.c Modify pipe_poll() to properly check for pending direct writes. 2019-08-21 19:35:04 +00:00
sys_procdesc.c procdesc: fix the function name 2019-08-05 20:31:17 +00:00
sys_process.c Change synchonization rules for vm_page reference counting. 2019-09-09 21:32:42 +00:00
sys_socket.c
syscalls.c Add sysctlbyname system call 2019-09-03 04:16:30 +00:00
syscalls.master Add sysctlbyname system call 2019-09-03 04:16:30 +00:00
systrace_args.c Add sysctlbyname system call 2019-09-03 04:16:30 +00:00
sysv_ipc.c sysv: get rid of fork/exit hooks if the code is compiled in 2019-05-04 19:05:30 +00:00
sysv_msg.c ANSIfy sys/kern 2018-06-01 13:26:45 +00:00
sysv_sem.c
sysv_shm.c sysv: get rid of fork/exit hooks if the code is compiled in 2019-05-04 19:05:30 +00:00
tty_compat.c
tty_info.c Avoid fixing the tty_info() buffer size in tty.h. 2018-11-06 23:41:44 +00:00
tty_inq.c tty: use __unused annotation instead to silence warnings 2018-05-19 04:48:26 +00:00
tty_outq.c tty: use __unused annotation instead to silence warnings 2018-05-19 04:48:26 +00:00
tty_pts.c Move 32-bit compat support for FIODGNAME to the right place. 2018-10-26 17:59:25 +00:00
tty_tty.c Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
tty_ttydisc.c Replace ttyprintf with sbuf_printf and tty drain routine 2018-10-20 18:31:36 +00:00
tty.c Defer funsetown() calls for a TTY to tty_rel_free(). 2019-07-04 15:42:02 +00:00
uipc_accf.c
uipc_debug.c Load balance sockets with new SO_REUSEPORT_LB option. 2018-06-06 15:45:57 +00:00
uipc_domain.c
uipc_ktls.c Add kernel-side support for in-kernel TLS. 2019-08-27 00:01:56 +00:00
uipc_mbuf2.c In m_pulldown(), before trying to prepend bytes to the subsequent mbuf, 2019-08-09 05:18:59 +00:00
uipc_mbuf.c Change synchonization rules for vm_page reference counting. 2019-09-09 21:32:42 +00:00
uipc_mbufhash.c
uipc_mqueue.c mqueuefs: fix compat32 struct file leak 2019-08-20 17:44:03 +00:00
uipc_sem.c Remove unused argument to priv_check_cred. 2018-12-11 19:32:16 +00:00
uipc_shm.c Change synchonization rules for vm_page reference counting. 2019-09-09 21:32:42 +00:00
uipc_sockbuf.c Add kernel-side support for in-kernel TLS. 2019-08-27 00:01:56 +00:00
uipc_socket.c Add kernel-side support for in-kernel TLS. 2019-08-27 00:01:56 +00:00
uipc_syscalls.c Only enable COMPAT_43 changes for syscalls ABI for a.out processes. 2019-08-11 19:16:07 +00:00
uipc_usrreq.c Check and avoid overflow when incrementing fp->f_count in 2019-07-21 15:07:12 +00:00
vfs_acl.c
vfs_aio.c Allocate pager bufs from UMA instead of 80-ish mutex protected linked list. 2019-01-15 01:02:16 +00:00
vfs_bio.c Remove long-dead BUF_ASSERT_{,UN}HELD assertions 2019-09-05 21:43:33 +00:00
vfs_cache.c vfs: implement usecount implying holdcnt 2019-09-03 15:42:11 +00:00
vfs_cluster.c Use an atomic reference count for paging in progress so that callers do not 2019-08-19 23:09:38 +00:00
vfs_default.c vfs: restore mp null check in vop_stdgetwritemount 2019-09-02 15:24:25 +00:00
vfs_export.c Ensure that directory entry padding bytes are zeroed. 2018-11-23 22:24:59 +00:00
vfs_extattr.c extattr_list_vp: Narrow locked section somewhat 2019-02-05 04:47:21 +00:00
vfs_hash.c vfs: implement usecount implying holdcnt 2019-09-03 15:42:11 +00:00
vfs_init.c Only call sigdeferstop() for NFS. 2018-10-23 21:43:41 +00:00
vfs_lookup.c NDFREE(): Fix unlocking for LOCKPARENT|LOCKLEAF and ndp->ni_dvp == ndp->ni_vp. 2019-05-21 15:12:13 +00:00
vfs_mount.c De-commision the MNTK_NOINSMNTQ kernel mount flag. 2019-08-23 19:40:10 +00:00
vfs_mountroot.c Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
vfs_subr.c vfs: temporarily revert r351825 2019-09-05 18:19:51 +00:00
vfs_syscalls.c vfs: stop always overwriting ->mnt_stat in VFS_STATFS 2019-08-18 18:40:12 +00:00
vfs_vnops.c vm pager: writemapping accounting for OBJT_SWAP 2019-09-03 20:31:48 +00:00
vnode_if.src vfs: add VOP_NEED_INACTIVE 2019-08-28 20:34:24 +00:00