Commit Graph

2799 Commits

Author SHA1 Message Date
Andriy Gapon
337f6465a9 document taskqueue_start_threads_in_proc
While here, fix taskqueue_start_threads_cpuset that was documented under
old name of taskqueue_start_threads_pinned.

MFC after:	4 weeks
2019-10-17 06:58:07 +00:00
Andriy Gapon
c812bea351 add superio.4 and superio.9 manual pages
This adds basic documentation on what the superio driver is and how
other drivers can interact with it.  I decided to also document
superio's ivar accessors.

Reviewed by:	bcr, brueffer (both manual contents only)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D21958
2019-10-11 11:13:47 +00:00
Andriy Gapon
93d9a79816 remove unrelated files accidentally committed in r353381 2019-10-10 07:41:42 +00:00
Andriy Gapon
d0c0856f63 emulate illumos membar_producer with atomic_thread_fence_rel
membar_producer is supposed to be a store-store barrier.
Also, in the code that FreeBSD has ported from illumos membar_producer
is used only with regular stores to regular memory (with respect to
caching).

We do not have an MI primitive for the store-store barrier, so
atomic_thread_fence_rel is the closest we have as it provides
(load | store) -> store barrier.

Previously, membar_producer was an empty function call on all 32-bit
arm-s, 32-bit powerpc, riscv and all mips variants.  I think that it was
inadequate.
On other platforms, such as amd64, arm64, i386, powerpc64, sparc64,
membar_producer was implemented using stronger primitives than required
for a store-store barrier with respect to regular memory access.
For example, it used sfence on amd64 and lock-ed nop in i386 (despite TSO).
On powerpc64 we now use recommended lwsync instead of eieio.
On sparc64 FreeBSD uses TSO mode.
On arm64/aarch64 we now use dmb sy instead of dmb ish.  Not sure if this
is an improvement, actually.

After this change we can drop opensolaris_atomic.S for aarch64, amd64,
powerpc64 and sparc64 as all required atomic operations have either
direct or light-weight mapping to FreeBSD native atomic operations.

Discussed with:	kib
MFC after:	4 weeks
2019-10-10 07:39:41 +00:00
Mark Johnston
38a20ba16f Fix a couple of nits in r352110.
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.

Reported by:	alc
Reviewed by:	alc, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21639
2019-09-16 15:06:19 +00:00
Mark Johnston
e8bcf6966b Revert r352406, which contained changes I didn't intend to commit. 2019-09-16 15:04:45 +00:00
Mark Johnston
41fd4b9422 Fix a couple of nits in r352110.
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.

Reported by:	alc
Reviewed by:	alc, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21639
2019-09-16 15:03:12 +00:00
Yuri Pankov
e3e469850c sbuf(9): fix sbuf_drain_func typedef markup
Reviewed by:	0mp (previous version)
Differential Revision:	https://reviews.freebsd.org/D21569
2019-09-16 13:10:03 +00:00
Mark Johnston
fee2a2fa39 Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
Mark Johnston
e46cfc2542 Revert a portion of r351628 that I did not mean to commit.
Reported by:	mjg
MFC with:	r351628
2019-09-03 14:39:36 +00:00
Mark Johnston
08cfa56ea3 Extend uma_reclaim() to permit different reclamation targets.
The page daemon periodically invokes uma_reclaim() to reclaim cached
items from each zone when the system is under memory pressure.  This
is important since the size of these caches is unbounded by default.
However it also results in bursts of high latency when allocating from
heavily used zones as threads miss in the per-CPU caches and must
access the keg in order to allocate new items.

With r340405 we maintain an estimate of each zone's usage of its
(per-NUMA domain) cache of full buckets.  Start making use of this
estimate to avoid reclaiming the entire cache when under memory
pressure.  In particular, introduce TRIM, DRAIN and DRAIN_CPU
verbs for uma_reclaim() and uma_zone_reclaim().  When trimming, only
items in excess of the estimate are reclaimed.  Draining a zone
reclaims all of the cached full buckets (the previous behaviour of
uma_reclaim()), and may further drain the per-CPU caches in extreme
cases.

Now, when under memory pressure, the page daemon will trim zones
rather than draining them.  As a result, heavily used zones do not incur
bursts of bucket cache misses following reclamation, but large, unused
caches will be reclaimed as before.

Reviewed by:	jeff
Tested by:	pho (an earlier version)
MFC after:	2 months
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D16667
2019-09-01 22:22:43 +00:00
Mark Johnston
d794b3a3c2 Update and clean up the UMA man page.
- Fix warnings from igor and mandoc.
- Provide a brief description of the separation between zones and their
  backend slab allocators.
- Document cache zones and secondary zones.
- Document the kernel config options added in r350659.
- Document the uma_zalloc_pcpu() and uma_zfree_pcpu() wrappers.
- Document uma_zone_reserve(), uma_zone_reserve_kva() and
  uma_zone_prealloc().
- Document uma_zone_alloc() and uma_zone_freef().
- Add some missing MLINKs and Xrefs.

MFC after:	2 weeks
2019-08-30 19:35:44 +00:00
John Baldwin
93a0379392 Trim a spurious blank line I added in r348969.
I did not bump .Dd since there is no content change.

MFC after:	3 days
2019-08-19 17:28:12 +00:00
Konstantin Belousov
3a91d1062a i386: Implement atomic_load_64(9) and atomic_store_64(9).
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-08-18 15:58:44 +00:00
Li-Wen Hsu
26a6feda9c Follow r350693 to add a link for sbuf_nl_terminate(9)
Sponsored by:	The FreeBSD Foundation
2019-08-08 00:51:17 +00:00
Conrad Meyer
ac03832ef3 GEOM: Reduce unnecessary log interleaving with sbufs
Similar to what was done for device_printfs in r347229.

Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an
sbuf; documented in g_bio.9.

Reviewed by:	markj
Discussed with:	rlibby
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21165
2019-08-07 19:28:35 +00:00
Conrad Meyer
76cb1112da sbuf(9): Add sbuf_nl_terminate() API
The API is used to gracefully terminate text line(s) with a single \n.  If
the formatted buffer was empty or already ended in \n, it is unmodified.
Otherwise, a newline character is appended to it.  The API, like other
sbuf-modifying routines, is only valid while the sbuf is not FINISHED.

Reviewed by:	rlibby
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21030
2019-08-07 19:27:14 +00:00
Conrad Meyer
71db411eb6 sbuf(9): Add NOWAIT dynamic buffer extension mode
The goal is to avoid some kinds of low-memory deadlock when formatting
heap-allocated buffers.

Reviewed by:	vangyzen
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21015
2019-08-07 19:23:07 +00:00
Mariusz Zaborski
7244507616 seqc: add man page
Reviewed by:	markj
Earlier version reviewed by:	emaste, mjg, bcr, 0mp
Differential Revision:	https://reviews.freebsd.org/D16744
2019-07-29 21:53:02 +00:00
Benjamin Kaduk
1f0a85545e Fix grammar nit in copy_file_range docs
Bytes are countable, so we have fewer of them, not less of them.
2019-07-25 15:43:15 +00:00
Rick Macklem
4b5b98d22d Create a man page for VOP_COPY_FILE_RANGE(9).
r350315 created a Linux compatible copy_file_range(2) syscall.
It uses a VOP method called VOP_COPY_FILE_RANGE so that file systems,
such as the NFSv4.2 client can do file system specific copying.
For NFSv4.2, this allows the copying to be done locally on the NFS server,
avoiding transferring the data across the wire twice.

This is a new man page (content changed).

Reviewed by:	kib, asomers
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D20584
2019-07-25 06:20:00 +00:00
Alan Somers
147b8ba829 VOP_FSYNC.9: update copyright after r345677
MFC after:	2 weeks
MFC-With:	r345677
Sponsored by:	The FreeBSD Foundation
2019-07-23 23:14:57 +00:00
Konstantin Belousov
43806bc5b2 Update refcount(9).
Describe missed functions.
Give some hint about refcount_release(9) memory ordering guarantees.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21020
2019-07-23 16:11:38 +00:00
Alan Somers
fd6659d50b VOP_PATHCONF.9: correct the type of the retval argument
It was changed from int to register_t in r22521 and from register_t to long
in r328099, but the man page wasn't updated either time.

MFC after:	2 weeks
2019-07-22 04:14:53 +00:00
Konstantin Belousov
13ff4eb1fa Switch the rest of the refcount(9) functions to bool return type.
There are some explicit comparisions of refcount_release(9) result
with 0/1, which are fine.

Reviewed by:	markj, mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21014
2019-07-21 20:16:48 +00:00
Emmanuel Vadot
1d6d0a43ce pkgbase: move man pages from runtime-manual to runtime
We don't split the other man pages in their own package so do the same for runtime.

Reviewed by:	bapt, gjb
Differential Revision:	https://reviews.freebsd.org/D20962
2019-07-19 15:12:20 +00:00
Konstantin Belousov
30b3018d48 Provide protection against starvation of the ll/sc loops when accessing userpace.
Casueword(9) on ll/sc architectures must be prepared for userspace
constantly modifying the same cache line as containing the CAS word,
and not loop infinitely.  Otherwise, rogue userspace livelocks the
kernel.

To fix the issue, change casueword(9) interface to return new value 1
indicating that either comparision or store failed, instead of relying
on the oldval == *oldvalp comparison.  The primitive no longer retries
the operation if it failed spuriously.  Modify callers of
casueword(9), all in kern_umtx.c, to handle retries, and react to
stops and requests to terminate between retries.

On x86, despite cmpxchg should not return spurious failures, we can
take advantage of the new interface and just return PSL.ZF.

Reviewed by:	andrew (arm64, previous version), markj
Tested by:	pho
Reported by:	https://xenbits.xen.org/xsa/advisory-295.txt
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D20772
2019-07-12 18:43:24 +00:00
Konstantin Belousov
e1be41acf7 Style: avoid long lines by using .Fo instead of .Fn.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-07-12 18:39:41 +00:00
Mark Johnston
eeacb3b02f Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics.  The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead.  Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended.  __FreeBSD_version is bumped.

Reviewed by:	alc, kib
Discussed with:	jeff
Discussed with:	jhb, np (cxgbe)
Tested by:	pho (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19247
2019-07-08 19:46:20 +00:00
Li-Wen Hsu
9725e74c7b Fix VOP_PUTPAGES(9) in regards to the use of VM_PAGER_CLUSTER_OK
Submitted by:	Ka Ho Ng <khng300 at gmail.com>
Reviewed by:	mckusick
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20695
2019-06-29 14:55:53 +00:00
John Baldwin
82334850ea Add an external mbuf buffer type that holds multiple unmapped pages.
Unmapped mbufs allow sendfile to carry multiple pages of data in a
single mbuf, without mapping those pages.  It is a requirement for
Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web
serving workloads when used by sendfile, due to effectively
compressing socket buffers by an order of magnitude, and hence
reducing cache misses.

For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer
now points to a struct mbuf_ext_pgs structure instead of a data
buffer.  This structure contains an array of physical addresses (this
reduces cache misses compared to an earlier version that stored an
array of vm_page_t pointers).  It also stores additional fields needed
for in-kernel TLS such as the TLS header and trailer data that are
currently unused.  To more easily detect these mbufs, the M_NOMAP flag
is set in m_flags in addition to M_EXT.

Various functions like m_copydata() have been updated to safely access
packet contents (using uiomove_fromphys()), to make things like BPF
safe.

NIC drivers advertise support for unmapped mbufs on transmit via a new
IFCAP_NOMAP capability.  This capability can be toggled via the new
'nomap' and '-nomap' ifconfig(8) commands.  For NIC drivers that only
transmit packet contents via DMA and use bus_dma, adding the
capability to if_capabilities and if_capenable should be all that is
required.

If a NIC does not support unmapped mbufs, they are converted to a
chain of mapped mbufs (using sf_bufs to provide the mapping) in
ip_output or ip6_output.  If an unmapped mbuf requires software
checksums, it is also converted to a chain of mapped mbufs before
computing the checksum.

Submitted by:	gallatin (earlier version)
Reviewed by:	gallatin, hselasky, rrs
Discussed with:	ae, kp (firewalls)
Relnotes:	yes
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20616
2019-06-29 00:48:33 +00:00
John Baldwin
773123546b Sync mbuf flags, types, and external buffer types with <sys/mbuf.h>.
Sponsored by:	Netflix
2019-06-28 19:49:47 +00:00
John Baldwin
7b29ab7e25 Use a tab after #define for EXT_* constants.
This matches other #define's in this manpage as well as <sys/mbuf.h>.

Sponsored by:	Netflix
2019-06-28 19:37:48 +00:00
Hans Petter Selasky
131b2b7658 Implement API for draining EPOCH(9) callbacks.
The epoch_drain_callbacks() function is used to drain all pending
callbacks which have been invoked by prior epoch_call() function calls
on the same epoch. This function is useful when there are shared
memory structure(s) referred to by the epoch callback(s) which are not
refcounted and are rarely freed. The typical place for calling this
function is right before freeing or invalidating the shared
resource(s) used by the epoch callback(s). This function can sleep and
is not optimized for performance.

Differential Revision: https://reviews.freebsd.org/D20109
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-28 10:38:56 +00:00
Alan Somers
b13625b573 [skip ci] VOP_BMAP.9: fix diction in copyright header
MFC after:	2 weeks
MFC-With:	r349230
Sponsored by:	The FreeBSD Foundation
2019-06-27 23:37:09 +00:00
Doug Moore
e70bb5c1a1 Document the KERN_PROTECTION_FAILURE return value from vm_map_protect().
Reviewed by: alc (earlier version)
Approved by: kib, markj (mentors)
Differential Revision: https://reviews.freebsd.org/D20751
2019-06-25 17:27:37 +00:00
Ian Lepore
1d90bbbb24 Do some general cleanup and light wordsmithing.
Sort methods alphabetically.  Wrap long lines.  Start sentences on a new
line.  Remove contractions (not because it's a good idea, just to silence
igor).  Add some explanation of the units for the period and duty arguments
and the convention for channel numbers.
2019-06-21 15:12:17 +00:00
Ian Lepore
7202a38099 Catch up with recent changes in pwmbus(9). The pwm(9) and pwmbus(9)
interfaces were unified into pwmbus(9), and the PWMBUS_CHANNEL_MAX method
was renamed PWMBUS_CHANNEL_COUNT.  The pwmbus_attach_bus() function just
went away completely.  Also, fix a few typos such as s/is/if/.
2019-06-21 14:46:43 +00:00
Kevin Lo
a3cea1de1a Correct function names. 2019-06-21 02:49:36 +00:00
Ed Maste
cb53797473 Clarify vm_map_protect max_protection downgrade
As reported in review D20709 by brooks calling vm_map_protect to set a
new max_protection value downgrades existing mappings if necessary (as
opposed to returning an error).

Reported by:	brooks
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-06-20 18:30:19 +00:00
Ed Maste
0cf197862d Clarify that vm_map_protect cannot upgrade max_protection
It's implied by the man page's RETURN VALUES section, but be explicit in
the description that vm_map_protect can not set new protection bits that
are already in each entry's max_protection.

Reviewed by:	brooks
MFC After:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20709
2019-06-20 18:19:09 +00:00
Alan Somers
67056e3d08 VOP_REVOKE(9): update locking requirements per r143495
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20524
2019-06-20 16:36:20 +00:00
Alan Somers
8bd416a216 VOP_BMAP(9): fix typo in the copyright header
Reported by:	rgrimes
MFC after:	2 weeks
MFC-With:	349230
Sponsored by:	The FreeBSD Foundation
2019-06-20 14:40:36 +00:00
Alan Somers
d01752c703 Add a VOP_BMAP(9) man page
Reviewed by:	mckusick
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20704
2019-06-20 13:59:46 +00:00
Alexander Motin
f91aa773be Add wakeup_any(), cheaper wakeup_one() for taskqueue(9).
wakeup_one() and underlying sleepq_signal() spend additional time trying
to be fair, waking thread with highest priority, sleeping longest time.
But in case of taskqueue there are many absolutely identical threads, and
any fairness between them is quite pointless.  It makes even worse, since
round-robin wakeups not only make previous CPU affinity in scheduler quite
useless, but also hide from user chance to see CPU bottlenecks, when
sequential workload with one request at a time looks evenly distributed
between multiple threads.

This change adds new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wakeup
thread that went to sleep last, but no longer in context switch (to avoid
immediate spinning on the thread lock).  On top of that new wakeup_any()
function is added, equivalent to wakeup_one(), but setting the flag.
On top of that taskqueue(9) is switchied to wakeup_any() to wakeup its
threads.

As result, on 72-core Xeon v4 machine sequential ZFS write to 12 ZVOLs
with 16KB block size spend 34% less time in wakeup_any() and descendants
then it was spending in wakeup_one(), and total write throughput increased
by ~10% with the same as before CPU usage.

Reviewed by:	markj, mmacy
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D20669
2019-06-20 01:15:33 +00:00
John Baldwin
0f70218343 Make the warning intervals for deprecated crypto algorithms tunable.
New sysctl/tunables can now set the interval (in seconds) between
rate-limited crypto warnings.  The new sysctls are:
- kern.cryptodev_warn_interval for /dev/crypto
- net.inet.ipsec.crypto_warn_interval for IPsec
- kern.kgssapi_warn_interval for KGSSAPI

Reviewed by:	cem
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20555
2019-06-11 23:00:55 +00:00
John Baldwin
574b98cbd9 Document sysctl nodes that translate their values.
This documents the behavior of sysctl_msec_to_ticks and
SYSCTL_{ADD,}_SBINTIME_[UM]SEC.

Reviewed by:	cem
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20596
2019-06-11 22:57:25 +00:00
Ryan Libby
3cf556f05e Allow fail points to have separate declarations, definitions, and evals
Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20546
2019-06-07 04:09:12 +00:00
Conrad Meyer
208be1617c style.9: Codify tolerance for eliding blank lines
Consensus seems to be that eliding blank lines for functions with no local
variables is acceptable.  Codify that explicitly in the style document.

Reported by:	jhb
Reviewed by:	delphij, imp, vangyzen (earlier version); rgrimes
With feedback from:	kib
Differential Revision:	https://reviews.freebsd.org/D20448
2019-06-03 23:57:29 +00:00
Conrad Meyer
fd55875350 style.9: Correct usage's definition to match its declaration
Suggested by:	emaste
Reviewed by:	delphij, imp, rgrimes, vangyzen (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	(part of D20448)
2019-05-28 20:44:23 +00:00