2784 Commits

Author SHA1 Message Date
lwhsu
0990bc3053 Follow r350693 to add a link for sbuf_nl_terminate(9)
Sponsored by:	The FreeBSD Foundation
2019-08-08 00:51:17 +00:00
cem
10d53fcce8 GEOM: Reduce unnecessary log interleaving with sbufs
Similar to what was done for device_printfs in r347229.

Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an
sbuf; documented in g_bio.9.

Reviewed by:	markj
Discussed with:	rlibby
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21165
2019-08-07 19:28:35 +00:00
cem
efd8ed9206 sbuf(9): Add sbuf_nl_terminate() API
The API is used to gracefully terminate text line(s) with a single \n.  If
the formatted buffer was empty or already ended in \n, it is unmodified.
Otherwise, a newline character is appended to it.  The API, like other
sbuf-modifying routines, is only valid while the sbuf is not FINISHED.

Reviewed by:	rlibby
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21030
2019-08-07 19:27:14 +00:00
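The behavior described for sbuf_nl_terminate() can be modeled in userspace as a rough sketch. The helper name nl_terminate() and the fixed-capacity char buffer are assumptions made for illustration; the kernel routine operates on a struct sbuf and is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of the semantics described for sbuf_nl_terminate():
 * an empty or already newline-terminated buffer is left alone,
 * otherwise a single '\n' is appended.  nl_terminate() is a
 * hypothetical helper, not the kernel function.
 * Returns 0 on success, -1 if there is no room for the newline.
 */
static int
nl_terminate(char *buf, size_t cap)
{
	size_t len;

	assert(buf != NULL);
	len = strlen(buf);
	if (len == 0 || buf[len - 1] == '\n')
		return (0);		/* nothing to do */
	if (len + 2 > cap)
		return (-1);		/* need room for '\n' and the NUL */
	buf[len] = '\n';
	buf[len + 1] = '\0';
	return (0);
}
```

Calling it twice on the same buffer has the same effect as calling it once, which is the point of the "single \n" wording.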
cem
ada2b1cd07 sbuf(9): Add NOWAIT dynamic buffer extension mode
The goal is to avoid some kinds of low-memory deadlock when formatting
heap-allocated buffers.

Reviewed by:	vangyzen
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21015
2019-08-07 19:23:07 +00:00
oshogbo
b082515a3d seqc: add man page
Reviewed by:	markj
Earlier version reviewed by:	emaste, mjg, bcr, 0mp
Differential Revision:	https://reviews.freebsd.org/D16744
2019-07-29 21:53:02 +00:00
bjk
c1b91fd7f9 Fix grammar nit in copy_file_range docs
Bytes are countable, so we have fewer of them, not less of them.
2019-07-25 15:43:15 +00:00
rmacklem
0873981e64 Create a man page for VOP_COPY_FILE_RANGE(9).
r350315 created a Linux compatible copy_file_range(2) syscall.
It uses a VOP method called VOP_COPY_FILE_RANGE so that file systems,
such as the NFSv4.2 client can do file system specific copying.
For NFSv4.2, this allows the copying to be done locally on the NFS server,
avoiding transferring the data across the wire twice.

This is a new man page (content changed).

Reviewed by:	kib, asomers
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D20584
2019-07-25 06:20:00 +00:00
asomers
1ffa303f46 VOP_FSYNC.9: update copyright after r345677
MFC after:	2 weeks
MFC-With:	r345677
Sponsored by:	The FreeBSD Foundation
2019-07-23 23:14:57 +00:00
kib
714ee6bfd2 Update refcount(9).
Describe missed functions.
Give some hint about refcount_release(9) memory ordering guarantees.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21020
2019-07-23 16:11:38 +00:00
asomers
133ea4fff4 VOP_PATHCONF.9: correct the type of the retval argument
It was changed from int to register_t in r22521 and from register_t to long
in r328099, but the man page wasn't updated either time.

MFC after:	2 weeks
2019-07-22 04:14:53 +00:00
kib
c8ac9961b7 Switch the rest of the refcount(9) functions to bool return type.
There are some explicit comparisons of the refcount_release(9) result
with 0/1, which are fine.

Reviewed by:	markj, mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21014
2019-07-21 20:16:48 +00:00
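As a hedged illustration of why comparisons of a bool result with 0/1 remain fine, here is a small userspace counter built on C11 atomics. The names ref_acquire() and ref_release() and the memory orderings chosen are assumptions for this sketch; it is not the kernel's refcount(9) implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace counter; not the kernel's refcount(9) code. */
static void
ref_acquire(atomic_uint *count)
{
	atomic_fetch_add_explicit(count, 1, memory_order_relaxed);
}

/*
 * Drop one reference.  Returns true only when the caller released the
 * last reference and is responsible for freeing the object.
 */
static bool
ref_release(atomic_uint *count)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(count, 1, memory_order_acq_rel);
	assert(old > 0);	/* releasing an unreferenced object is a bug */
	return (old == 1);
}
```

Because a C bool converts to exactly 0 or 1, an existing test such as `ref_release(&c) == 1` keeps its meaning after the return type changes.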
manu
800c51243e pkgbase: move man pages from runtime-manual to runtime
We don't split the other man pages into their own packages, so do the same for runtime.

Reviewed by:	bapt, gjb
Differential Revision:	https://reviews.freebsd.org/D20962
2019-07-19 15:12:20 +00:00
kib
ea314818c6 Provide protection against starvation of the ll/sc loops when accessing userspace.
Casueword(9) on ll/sc architectures must be prepared for userspace
constantly modifying the cache line that contains the CAS word,
and must not loop infinitely.  Otherwise, rogue userspace livelocks the
kernel.

To fix the issue, change the casueword(9) interface to return a new value, 1,
indicating that either the comparison or the store failed, instead of relying
on the oldval == *oldvalp comparison.  The primitive no longer retries
the operation if it failed spuriously.  Modify callers of
casueword(9), all in kern_umtx.c, to handle retries, and react to
stops and requests to terminate between retries.

On x86, although cmpxchg should not return spurious failures, we can
take advantage of the new interface and just return PSL.ZF.

Reviewed by:	andrew (arm64, previous version), markj
Tested by:	pho
Reported by:	https://xenbits.xen.org/xsa/advisory-295.txt
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D20772
2019-07-12 18:43:24 +00:00
kib
911dbf2f91 Style: avoid long lines by using .Fo instead of .Fn.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-07-12 18:39:41 +00:00
markj
039f74039e Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics.  The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead.  Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended.  __FreeBSD_version is bumped.

Reviewed by:	alc, kib
Discussed with:	jeff
Discussed with:	jhb, np (cxgbe)
Tested by:	pho (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19247
2019-07-08 19:46:20 +00:00
lwhsu
c552a58a7e Fix VOP_PUTPAGES(9) in regards to the use of VM_PAGER_CLUSTER_OK
Submitted by:	Ka Ho Ng <khng300 at gmail.com>
Reviewed by:	mckusick
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20695
2019-06-29 14:55:53 +00:00
jhb
520aafe3ec Add an external mbuf buffer type that holds multiple unmapped pages.
Unmapped mbufs allow sendfile to carry multiple pages of data in a
single mbuf, without mapping those pages.  It is a requirement for
Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web
serving workloads when used by sendfile, due to effectively
compressing socket buffers by an order of magnitude, and hence
reducing cache misses.

For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer
now points to a struct mbuf_ext_pgs structure instead of a data
buffer.  This structure contains an array of physical addresses (this
reduces cache misses compared to an earlier version that stored an
array of vm_page_t pointers).  It also stores additional fields needed
for in-kernel TLS such as the TLS header and trailer data that are
currently unused.  To more easily detect these mbufs, the M_NOMAP flag
is set in m_flags in addition to M_EXT.

Various functions like m_copydata() have been updated to safely access
packet contents (using uiomove_fromphys()), to make things like BPF
safe.

NIC drivers advertise support for unmapped mbufs on transmit via a new
IFCAP_NOMAP capability.  This capability can be toggled via the new
'nomap' and '-nomap' ifconfig(8) commands.  For NIC drivers that only
transmit packet contents via DMA and use bus_dma, adding the
capability to if_capabilities and if_capenable should be all that is
required.

If a NIC does not support unmapped mbufs, they are converted to a
chain of mapped mbufs (using sf_bufs to provide the mapping) in
ip_output or ip6_output.  If an unmapped mbuf requires software
checksums, it is also converted to a chain of mapped mbufs before
computing the checksum.

Submitted by:	gallatin (earlier version)
Reviewed by:	gallatin, hselasky, rrs
Discussed with:	ae, kp (firewalls)
Relnotes:	yes
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20616
2019-06-29 00:48:33 +00:00
jhb
040fbbc230 Sync mbuf flags, types, and external buffer types with <sys/mbuf.h>.
Sponsored by:	Netflix
2019-06-28 19:49:47 +00:00
jhb
cb5976d165 Use a tab after #define for EXT_* constants.
This matches other #define's in this manpage as well as <sys/mbuf.h>.

Sponsored by:	Netflix
2019-06-28 19:37:48 +00:00
hselasky
9586d860bb Implement API for draining EPOCH(9) callbacks.
The epoch_drain_callbacks() function is used to drain all pending
callbacks which have been invoked by prior epoch_call() function calls
on the same epoch. This function is useful when there are shared
memory structure(s) referred to by the epoch callback(s) which are not
refcounted and are rarely freed. The typical place for calling this
function is right before freeing or invalidating the shared
resource(s) used by the epoch callback(s). This function can sleep and
is not optimized for performance.

Differential Revision: https://reviews.freebsd.org/D20109
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-28 10:38:56 +00:00
asomers
aeb32005f6 [skip ci] VOP_BMAP.9: fix diction in copyright header
MFC after:	2 weeks
MFC-With:	r349230
Sponsored by:	The FreeBSD Foundation
2019-06-27 23:37:09 +00:00
dougm
bde6cfd229 Document the KERN_PROTECTION_FAILURE return value from vm_map_protect().
Reviewed by: alc (earlier version)
Approved by: kib, markj (mentors)
Differential Revision: https://reviews.freebsd.org/D20751
2019-06-25 17:27:37 +00:00
ian
679b7ef604 Do some general cleanup and light wordsmithing.
Sort methods alphabetically.  Wrap long lines.  Start sentences on a new
line.  Remove contractions (not because it's a good idea, just to silence
igor).  Add some explanation of the units for the period and duty arguments
and the convention for channel numbers.
2019-06-21 15:12:17 +00:00
ian
2a1f073196 Catch up with recent changes in pwmbus(9). The pwm(9) and pwmbus(9)
interfaces were unified into pwmbus(9), and the PWMBUS_CHANNEL_MAX method
was renamed PWMBUS_CHANNEL_COUNT.  The pwmbus_attach_bus() function just
went away completely.  Also, fix a few typos such as s/is/if/.
2019-06-21 14:46:43 +00:00
kevlo
76d6e9061c Correct function names.
2019-06-21 02:49:36 +00:00
emaste
5460546183 Clarify vm_map_protect max_protection downgrade
As reported in review D20709 by brooks, calling vm_map_protect to set a
new max_protection value downgrades existing mappings if necessary (as
opposed to returning an error).

Reported by:	brooks
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-06-20 18:30:19 +00:00
emaste
517712e7a8 Clarify that vm_map_protect cannot upgrade max_protection
It's implied by the man page's RETURN VALUES section, but be explicit in
the description that vm_map_protect cannot set new protection bits that
are not already in each entry's max_protection.

Reviewed by:	brooks
MFC After:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20709
2019-06-20 18:19:09 +00:00
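The two vm_map_protect clarifications (no upgrade beyond max_protection, and downgrade of existing protection when max_protection is lowered) can be pictured with a simplified single-entry model. Everything named sk_* or SK_* below is hypothetical, including the constant values; the real function walks a range of map entries and has further checks not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants; names and values are assumptions. */
#define	SK_READ			0x1
#define	SK_WRITE		0x2
#define	SK_EXEC			0x4
#define	SK_SUCCESS		0
#define	SK_PROTECTION_FAILURE	1

struct sk_entry {
	int	prot;		/* current protection */
	int	max_prot;	/* upper bound on protection */
};

/*
 * Simplified model: a request may never contain bits missing from
 * max_prot.  When set_max is nonzero the bound itself is lowered and
 * the current protection is downgraded to fit under it.
 */
static int
sk_protect(struct sk_entry *e, int new_prot, int set_max)
{
	assert(e != NULL);
	if ((new_prot & e->max_prot) != new_prot)
		return (SK_PROTECTION_FAILURE);
	if (set_max) {
		e->max_prot = new_prot;
		e->prot &= new_prot;	/* downgrade instead of failing */
	} else
		e->prot = new_prot;
	return (SK_SUCCESS);
}
```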
asomers
ec929b8c9b VOP_REVOKE(9): update locking requirements per r143495
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20524
2019-06-20 16:36:20 +00:00
asomers
11d2dfaa2f VOP_BMAP(9): fix typo in the copyright header
Reported by:	rgrimes
MFC after:	2 weeks
MFC-With:	349230
Sponsored by:	The FreeBSD Foundation
2019-06-20 14:40:36 +00:00
asomers
2a58791758 Add a VOP_BMAP(9) man page
Reviewed by:	mckusick
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20704
2019-06-20 13:59:46 +00:00
mav
da627eba88 Add wakeup_any(), cheaper wakeup_one() for taskqueue(9).
wakeup_one() and the underlying sleepq_signal() spend additional time trying
to be fair, waking the thread with the highest priority that has slept longest.
But in the case of taskqueue there are many absolutely identical threads, and
any fairness between them is quite pointless.  It makes things even worse, since
round-robin wakeups not only make previous CPU affinity in the scheduler quite
useless, but also hide from the user the chance to see CPU bottlenecks, when a
sequential workload with one request at a time looks evenly distributed
between multiple threads.

This change adds a new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wake up
the thread that went to sleep last, but is no longer in context switch (to avoid
immediate spinning on the thread lock).  On top of that, a new wakeup_any()
function is added, equivalent to wakeup_one(), but setting the flag.
Finally, taskqueue(9) is switched to wakeup_any() to wake up its
threads.

As a result, on a 72-core Xeon v4 machine, sequential ZFS writes to 12 ZVOLs
with 16KB block size spend 34% less time in wakeup_any() and descendants
than they were spending in wakeup_one(), and total write throughput increased
by ~10% with the same CPU usage as before.

Reviewed by:	markj, mmacy
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D20669
2019-06-20 01:15:33 +00:00
jhb
7f6f4a4c07 Make the warning intervals for deprecated crypto algorithms tunable.
New sysctl/tunables can now set the interval (in seconds) between
rate-limited crypto warnings.  The new sysctls are:
- kern.cryptodev_warn_interval for /dev/crypto
- net.inet.ipsec.crypto_warn_interval for IPsec
- kern.kgssapi_warn_interval for KGSSAPI

Reviewed by:	cem
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20555
2019-06-11 23:00:55 +00:00
jhb
88e325b257 Document sysctl nodes that translate their values.
This documents the behavior of sysctl_msec_to_ticks and
SYSCTL_{ADD,}_SBINTIME_[UM]SEC.

Reviewed by:	cem
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20596
2019-06-11 22:57:25 +00:00
rlibby
b592fee0b4 Allow fail points to have separate declarations, definitions, and evals
Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20546
2019-06-07 04:09:12 +00:00
cem
286888d78c style.9: Codify tolerance for eliding blank lines
Consensus seems to be that eliding blank lines for functions with no local
variables is acceptable.  Codify that explicitly in the style document.

Reported by:	jhb
Reviewed by:	delphij, imp, vangyzen (earlier version); rgrimes
With feedback from:	kib
Differential Revision:	https://reviews.freebsd.org/D20448
2019-06-03 23:57:29 +00:00
cem
6f8fcac302 style.9: Correct usage's definition to match its declaration
Suggested by:	emaste
Reviewed by:	delphij, imp, rgrimes, vangyzen (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	(part of D20448)
2019-05-28 20:44:23 +00:00
asomers
7a839cf576 VOP_ADVLOCK.9: fix description of flags
* F_RDLCK, F_UNLCK, and F_WRLCK aren't flags.  They're stored in the
  fl.l_type field.
* Add F_REMOTE, added in r177633
* Add F_NOINTR, added in r180025

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-05-27 23:25:19 +00:00
imp
f18d106f99 Add warning that the PNP info has to follow the module declaration.
Due to how the linker.hints file is laid out, we'll associate the pnp
info with the wrong module if the module declaration comes after the
pnp info. Until that limitation is removed, we need to have this
ordering. Ideally, we'd also enforce the ordering somehow, but I've
come up with no way to do that yet...
2019-05-23 15:53:41 +00:00
asomers
ee76a5cce4 Update VFS_FHTOVP(9) with the flags argument
Revision 222167 added a new argument to VFS_FHTOVP. This revision updates the
man page to match.

Reviewed by:	rmacklem
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20323
2019-05-22 16:24:39 +00:00
markj
cd58dbb0bc Hook DEFINE_IFUNC.9 up to the build.
Reported by:	pluknet
MFC with:	r348003
2019-05-20 21:23:33 +00:00
markj
56e7fdbb29 Add a man page for DEFINE_IFUNC.
Reviewed by:	kib
Discussed with:	emaste
MFC after:	2 weeks
Event:		Waterloo Hackathon 2019
Differential Revision:	https://reviews.freebsd.org/D20310
2019-05-20 19:12:29 +00:00
markj
535e3bf427 Typo.
MFC after:	3 days
2019-05-20 19:08:55 +00:00
markj
7d39a491bf Implement the M_NEXTFIT allocation strategy for vmem(9).
This is described in the vmem paper: "directs vmem to use the next free
segment after the one previously allocated."  The implementation adds a
new boundary tag type, M_CURSOR, which is linked into the segment list
and precedes the segment following the previous M_NEXTFIT allocation.
The cursor is used to locate the next free segment satisfying the
allocation constraints.

This implementation isn't O(1) since busy tags aren't coalesced, and we
may potentially scan the entire segment list during an M_NEXTFIT
allocation.

Reviewed by:	alc
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D17226
2019-05-18 01:46:38 +00:00
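The next-fit idea, where the search resumes after the previous allocation instead of starting from the beginning, can be sketched over a toy arena of equal-sized segments. The struct arena layout, the fixed NSEG, and the wraparound at the end of the array are assumptions for illustration; vmem(9) uses boundary tags on a segment list and handles sizes and constraints that this model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	NSEG	8

/* Toy arena of equal-sized segments; not the vmem(9) data structures. */
struct arena {
	bool	busy[NSEG];
	size_t	cursor;		/* index following the last allocation */
};

/*
 * Next-fit: start scanning at the cursor rather than at segment 0.
 * In the worst case the whole arena is scanned, so this is not O(1).
 * Returns the allocated index, or -1 if every segment is busy.
 */
static int
nextfit_alloc(struct arena *a)
{
	size_t i, idx;

	assert(a != NULL);
	for (i = 0; i < NSEG; i++) {
		idx = (a->cursor + i) % NSEG;
		if (!a->busy[idx]) {
			a->busy[idx] = true;
			a->cursor = (idx + 1) % NSEG;
			return ((int)idx);
		}
	}
	return (-1);
}

static void
seg_free(struct arena *a, int idx)
{
	assert(a != NULL && idx >= 0 && idx < NSEG);
	a->busy[idx] = false;
}
```

Freeing an early segment does not make it the next one handed out; the allocator keeps moving forward until it wraps.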
cem
38fa0aaa51 device_printf: Use sbuf for more coherent prints on SMP
device_printf makes multiple calls to printf, allowing other console messages to
be inserted between the device name and the rest of the message.  This change
uses sbuf to compose the two into a single buffer, and prints it all at once.

It exposes an sbuf drain function (drain-to-printf) for common use.

Update documentation to match; some unit tests included.

Submitted by:	jmg
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D16690
2019-05-07 17:47:20 +00:00
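A userspace approximation of the "compose first, print once" approach is sketched below. The helper compose_line() and its use of snprintf() are assumptions made for illustration; the kernel change uses sbuf(9) with a drain function rather than a fixed char buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<name><unit>: <message>" into one buffer so that it can be
 * emitted with a single output call.  compose_line() is a hypothetical
 * helper.  Output that does not fit is truncated.
 * Returns the length of the composed string.
 */
static size_t
compose_line(char *buf, size_t cap, const char *name, int unit,
    const char *fmt, ...)
{
	va_list ap;
	size_t len;

	assert(buf != NULL && cap > 0);
	snprintf(buf, cap, "%s%d: ", name, unit);	/* device prefix */
	len = strlen(buf);
	va_start(ap, fmt);
	vsnprintf(buf + len, cap - len, fmt, ap);	/* message body */
	va_end(ap);
	return (strlen(buf));
}
```

A caller would then hand the finished buffer to one output call, for example `fputs(line, stdout)`, so that no other writer can slip text between the prefix and the message.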
gallatin
accdb3810d Track device's NUMA domain in ifnet & alloc ifnet from NUMA local memory
This commit adds new if_alloc_domain() and if_alloc_dev() methods to
allocate ifnets.  When called with a domain on a NUMA machine,
if_alloc_domain() will record the NUMA domain in the ifnet, and it will
allocate the ifnet struct from memory which is local to that NUMA
node.  Similarly, if_alloc_dev() is a wrapper for if_alloc_domain()
which uses a driver supplied device_t to call if_alloc_domain() with
the appropriate domain.

Note that the new if_numa_domain field fits in an alignment pad in
struct ifnet, and so does not alter the size of the structure.

Reviewed by:	glebius, kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19930
2019-04-22 19:24:21 +00:00
cem
8cd2fbf3e0 Revert r346410 and r346411
libkern in .PATH has too many filename conflicts with libc and my -DNO_CLEAN
tinderbox didn't catch that ahead of time.  Mea culpa.
2019-04-19 22:08:17 +00:00
cem
316c180eb7 libkern: Bring in arc4random_uniform(9) from libc
It is a useful arc4random wrapper in the kernel for much the same reasons as
in userspace.  Move the source to libkern (because kernel build is
restricted to sys/, but userspace can include any file it likes) and build
kernel and libc versions from the same source file.

Copy the documentation from arc4random_uniform(3) to the section 9 page.

While here, add missing arc4random_buf(9) symlink.

Sponsored by:	Dell EMC Isilon
2019-04-19 20:05:47 +00:00
manu
c3ff664282 ofw_graph: Add functions for graph bindings
Those functions are helpers to work on graph bindings.
Graphs are mostly used with video-related devices.
See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/graph.txt?id=4436a3711e3249840e0679e92d3c951bcaf25515

MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D19877
2019-04-17 20:09:01 +00:00
emaste
4495ee7460 iflibtxrx.9: update function descriptions to match implementation
isc_rxd_refill, isc_rxd_flush return nothing, not void *.

isc_txd_credits_update, isc_rxd_available return int, not int *.

isc_txd_credits_update has a bool as final argument, not a uint32_t.
Prior to r315217 it took four arguments; the final two were
uint32_t, bool.

Reported by:	Gerald Aryeetey <aryeeteygerald_rogers.com>
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-04-16 20:41:04 +00:00
cem
dec0f8b592 random(4): Add is_random_seeded(9) KPI
The imagined use is for early boot consumers of random to be able to make
decisions based on whether random is available yet or not.  One such
consumer seems to be __stack_chk_init(), which runs immediately after random
is initialized.  A follow-up patch will attempt to address that.

Reported by:	many
Reviewed by:	delphij (except man page)
Approved by:	secteam(delphij)
Differential Revision:	https://reviews.freebsd.org/D19926
2019-04-16 17:12:17 +00:00