Commit Graph

11327 Commits

Author SHA1 Message Date
Konstantin Belousov
30b3018d48 Provide protection against starvation of the ll/sc loops when accessing userpace.
Casueword(9) on ll/sc architectures must be prepared for userspace
constantly modifying the same cache line as containing the CAS word,
and not loop infinitely.  Otherwise, rogue userspace livelocks the
kernel.

To fix the issue, change casueword(9) interface to return new value 1
indicating that either comparision or store failed, instead of relying
on the oldval == *oldvalp comparison.  The primitive no longer retries
the operation if it failed spuriously.  Modify callers of
casueword(9), all in kern_umtx.c, to handle retries, and react to
stops and requests to terminate between retries.

On x86, despite cmpxchg should not return spurious failures, we can
take advantage of the new interface and just return PSL.ZF.

Reviewed by:	andrew (arm64, previous version), markj
Tested by:	pho
Reported by:	https://xenbits.xen.org/xsa/advisory-295.txt
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D20772
2019-07-12 18:43:24 +00:00
Konstantin Belousov
e1be41acf7 Style: avoid long lines by using .Fo instead of .Fn.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-07-12 18:39:41 +00:00
Hiroki Sato
2625e51956 Add support for RTL8156, 2.5GbE USB network controller, to if_cdce(4).
This chip can be found in Planex USB-LAN2500R.
2019-07-10 05:45:50 +00:00
Mark Johnston
eeacb3b02f Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics.  The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead.  Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended.  __FreeBSD_version is bumped.

Reviewed by:	alc, kib
Discussed with:	jeff
Discussed with:	jhb, np (cxgbe)
Tested by:	pho (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19247
2019-07-08 19:46:20 +00:00
Niclas Zeising
f1bb9412c2 pci(4): Use plural configuration registers
Change to use registers instead of register, as it is customary to use
plural when talking about PCI registers.

This was missed in r349150.

MFC after:	3 days
2019-07-02 17:48:27 +00:00
Li-Wen Hsu
9725e74c7b Fix VOP_PUTPAGES(9) in regards to the use of VM_PAGER_CLUSTER_OK
Submitted by:	Ka Ho Ng <khng300 at gmail.com>
Reviewed by:	mckusick
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20695
2019-06-29 14:55:53 +00:00
John Baldwin
82334850ea Add an external mbuf buffer type that holds multiple unmapped pages.
Unmapped mbufs allow sendfile to carry multiple pages of data in a
single mbuf, without mapping those pages.  It is a requirement for
Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web
serving workloads when used by sendfile, due to effectively
compressing socket buffers by an order of magnitude, and hence
reducing cache misses.

For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer
now points to a struct mbuf_ext_pgs structure instead of a data
buffer.  This structure contains an array of physical addresses (this
reduces cache misses compared to an earlier version that stored an
array of vm_page_t pointers).  It also stores additional fields needed
for in-kernel TLS such as the TLS header and trailer data that are
currently unused.  To more easily detect these mbufs, the M_NOMAP flag
is set in m_flags in addition to M_EXT.

Various functions like m_copydata() have been updated to safely access
packet contents (using uiomove_fromphys()), to make things like BPF
safe.

NIC drivers advertise support for unmapped mbufs on transmit via a new
IFCAP_NOMAP capability.  This capability can be toggled via the new
'nomap' and '-nomap' ifconfig(8) commands.  For NIC drivers that only
transmit packet contents via DMA and use bus_dma, adding the
capability to if_capabilities and if_capenable should be all that is
required.

If a NIC does not support unmapped mbufs, they are converted to a
chain of mapped mbufs (using sf_bufs to provide the mapping) in
ip_output or ip6_output.  If an unmapped mbuf requires software
checksums, it is also converted to a chain of mapped mbufs before
computing the checksum.

Submitted by:	gallatin (earlier version)
Reviewed by:	gallatin, hselasky, rrs
Discussed with:	ae, kp (firewalls)
Relnotes:	yes
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20616
2019-06-29 00:48:33 +00:00
John Baldwin
773123546b Sync mbuf flags, types, and external buffer types with <sys/mbuf.h>.
Sponsored by:	Netflix
2019-06-28 19:49:47 +00:00
John Baldwin
7b29ab7e25 Use a tab after #define for EXT_* constants.
This matches other #define's in this manpage as well as <sys/mbuf.h>.

Sponsored by:	Netflix
2019-06-28 19:37:48 +00:00
Hans Petter Selasky
131b2b7658 Implement API for draining EPOCH(9) callbacks.
The epoch_drain_callbacks() function is used to drain all pending
callbacks which have been invoked by prior epoch_call() function calls
on the same epoch. This function is useful when there are shared
memory structure(s) referred to by the epoch callback(s) which are not
refcounted and are rarely freed. The typical place for calling this
function is right before freeing or invalidating the shared
resource(s) used by the epoch callback(s). This function can sleep and
is not optimized for performance.

Differential Revision: https://reviews.freebsd.org/D20109
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-28 10:38:56 +00:00
Alan Somers
b13625b573 [skip ci] VOP_BMAP.9: fix diction in copyright header
MFC after:	2 weeks
MFC-With:	r349230
Sponsored by:	The FreeBSD Foundation
2019-06-27 23:37:09 +00:00
Andriy Gapon
061b38cdcc gpiobus: provide a new hint, pin_list
"pin_list" allows to specify child pins as a list of pin numbers.
Existing hint "pins" serves the same purpose but with a 32-bit wide bit
mask.  One problem with that is that a controller can have more than 32
pins.  One example is amdgpio.  Also, a list of numbers is a little bit
more human friendly than a matching bit mask.  As a side note, it seems
that in FDT pins are typically specified by their numbers as well.

This commit also adds accessors for instance variables (IVARs) that
define the child pins.  My primary goal is to allow a child to be
configured programmatically rather than via hints (assuming that FDT is
not supported on a platform).  Also, while a child should not care about
specific pin numbers that are allocated to it, it could be interested in
how many were actually assigned to it.

While there, I removed "flags" instance variable.  It was unused.

Reviewed by:	mizhka
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D20459
2019-06-27 15:46:06 +00:00
Andriy Gapon
59c94acee9 gpio.4: document device hints common to all devices on gpiobus
"at" keyword is documented in device.hints(5) for all buses, but it does
hurt to add another reference to it.
"pins" keyword is specific to gpiobus.
At least these two hints should be configured for any gpiobus device on
a hints based system.

MFC after:	10 days
2019-06-26 07:38:31 +00:00
Andriy Gapon
50b4788ba1 fix up r349406, add missing .El
MFC after:	1 week
2019-06-26 07:08:51 +00:00
Andriy Gapon
8352171a58 owc.4: document how to set up the 1-wire bus on a device.hints system
MFC after:	1 week
2019-06-26 06:40:30 +00:00
Doug Moore
e70bb5c1a1 Document the KERN_PROTECTION_FAILURE return value from vm_map_protect().
Reviewed by: alc (earlier version)
Approved by: kib, markj (mentors)
Differential Revision: https://reviews.freebsd.org/D20751
2019-06-25 17:27:37 +00:00
Warner Losh
f5a95d9a07 Remove NAND and NANDFS support
NANDFS has been broken for years. Remove it. The NAND drivers that
remain are for ancient parts that are no longer relevant. They are
polled, have terrible performance and just for ancient arm
hardware. NAND parts have evolved significantly from this early work
and little to none of it would be relevant should someone need to
update to support raw nand. This code has been off by default for
years and has violated the vnode protocol leading to panics since it
was committed.

Numerous posts to arch@ and other locations have found no actual users
for this software.

Relnotes:	Yes
No Objection From: arch@
Differential Revision: https://reviews.freebsd.org/D20745
2019-06-25 04:50:09 +00:00
Doug Moore
a616b25342 Modify swapon(8) to invoke BIO_DELETE to trim swap devices, either if
'-E' appears on the swapon command line, or if "trimonce" appears as
an fstab option.

Discussed at: BSDCAN
Tested by: markj
Reviewed by: markj
Approved by: markj (mentor)
Differential Revision:https://reviews.freebsd.org/D20599
2019-06-22 03:16:01 +00:00
Ian Lepore
1d90bbbb24 Do some general cleanup and light wordsmithing.
Sort methods alphabetically.  Wrap long lines.  Start sentences on a new
line.  Remove contractions (not because it's a good idea, just to silence
igor).  Add some explanation of the units for the period and duty arguments
and the convention for channel numbers.
2019-06-21 15:12:17 +00:00
Ian Lepore
7202a38099 Catch up with recent changes in pwmbus(9). The pwm(9) and pwmbus(9)
interfaces were unified into pwmbus(9), and the PWMBUS_CHANNEL_MAX method
was renamed PWMBUS_CHANNEL_COUNT.  The pwmbus_attach_bus() function just
went away completely.  Also, fix a few typos such as s/is/if/.
2019-06-21 14:46:43 +00:00
Kevin Lo
a3cea1de1a Correct function names. 2019-06-21 02:49:36 +00:00
Conrad Meyer
c363b16c63 sys: Remove DEV_RANDOM device option
Remove 'device random' from kernel configurations that reference it (most).
Replace perhaps mistaken 'nodevice random' in two MIPS configs with 'options
RANDOM_LOADABLE' instead.  Document removal in UPDATING; update NOTES and
random.4.

Reviewed by:	delphij, markm (previous version)
Approved by:	secteam(delphij)
Differential Revision:	https://reviews.freebsd.org/D19918
2019-06-21 00:16:30 +00:00
Ed Maste
cb53797473 Clarify vm_map_protect max_protection downgrade
As reported in review D20709 by brooks calling vm_map_protect to set a
new max_protection value downgrades existing mappings if necessary (as
opposed to returning an error).

Reported by:	brooks
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-06-20 18:30:19 +00:00
Ed Maste
0cf197862d Clarify that vm_map_protect cannot upgrade max_protection
It's implied by the man page's RETURN VALUES section, but be explicit in
the description that vm_map_protect can not set new protection bits that
are already in each entry's max_protection.

Reviewed by:	brooks
MFC After:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20709
2019-06-20 18:19:09 +00:00
Alan Somers
67056e3d08 VOP_REVOKE(9): update locking requirements per r143495
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20524
2019-06-20 16:36:20 +00:00
Alan Somers
8bd416a216 VOP_BMAP(9): fix typo in the copyright header
Reported by:	rgrimes
MFC after:	2 weeks
MFC-With:	349230
Sponsored by:	The FreeBSD Foundation
2019-06-20 14:40:36 +00:00
Alan Somers
d01752c703 Add a VOP_BMAP(9) man page
Reviewed by:	mckusick
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20704
2019-06-20 13:59:46 +00:00
Alexander Motin
f91aa773be Add wakeup_any(), cheaper wakeup_one() for taskqueue(9).
wakeup_one() and underlying sleepq_signal() spend additional time trying
to be fair, waking thread with highest priority, sleeping longest time.
But in case of taskqueue there are many absolutely identical threads, and
any fairness between them is quite pointless.  It makes even worse, since
round-robin wakeups not only make previous CPU affinity in scheduler quite
useless, but also hide from user chance to see CPU bottlenecks, when
sequential workload with one request at a time looks evenly distributed
between multiple threads.

This change adds new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wakeup
thread that went to sleep last, but no longer in context switch (to avoid
immediate spinning on the thread lock).  On top of that new wakeup_any()
function is added, equivalent to wakeup_one(), but setting the flag.
On top of that taskqueue(9) is switchied to wakeup_any() to wakeup its
threads.

As result, on 72-core Xeon v4 machine sequential ZFS write to 12 ZVOLs
with 16KB block size spend 34% less time in wakeup_any() and descendants
then it was spending in wakeup_one(), and total write throughput increased
by ~10% with the same as before CPU usage.

Reviewed by:	markj, mmacy
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D20669
2019-06-20 01:15:33 +00:00
Ian Lepore
a161bab854 Add a pwmc(4) manpage. 2019-06-18 04:32:19 +00:00
Niclas Zeising
2d4a74d7a0 pci.4: Use plural configuration registers
It is customary to use plural when talking about PCI configure registers.

Reported by:	scottl
MFC after:	2 weeks
X-MFC-with:	r349133
2019-06-17 17:35:55 +00:00
Mark Johnston
970a1ed345 Add some missing MLINKs for tree(3).
MFC after:	3 days
2019-06-17 16:57:44 +00:00
Niclas Zeising
999546e169 pci.4: wordsmith and add missing words
Add missing words after PCI in the description of the PCIOCWRITE and
PCIOCATTACHED ioctls.
Use singular in PCIOCREAD, we only read one register at the time.

Reviewed by:	bcr, bjk, rgrimes, cem
MFC after:	2 weeks
X-MFC-with:	r349133
Differential Revision:	https://reviews.freebsd.org/D20671
2019-06-17 16:54:51 +00:00
Niclas Zeising
65d328a3eb pci(4): Document PCIOCATTACHED
Document the PCIOCATTACHED ioctl(2) in the pci(4) manual.
PCIOCATTACHED is used to query if a driver has attached to a PCI.

Reviewed by:	bcr, imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20652
2019-06-17 05:41:47 +00:00
Bryan Drewery
15a04cec78 symlinkat(2) is not covered. 2019-06-16 05:12:17 +00:00
Mateusz Piotrowski
966b76b496 netmap.4: Fix a typo as FreeBSD Linux is not a thing
Approved by:	src (emaste)
Event:		Berlin Devsummit 2019
2019-06-15 12:09:22 +00:00
Stephen Hurd
705aad98c6 Some devices take undesired actions when RTS and DTR are
asserted. Some development boards for example will reset on DTR,
and some radio interfaces will transmit on RTS.

This patch allows "stty -f /dev/ttyu9.init -rtsdtr" to prevent
RTS and DTR from being asserted on open(), allowing these devices
to be used without problems.

Reviewed by:    imp
Differential Revision:  https://reviews.freebsd.org/D20031
2019-06-12 18:07:04 +00:00
John Baldwin
0f70218343 Make the warning intervals for deprecated crypto algorithms tunable.
New sysctl/tunables can now set the interval (in seconds) between
rate-limited crypto warnings.  The new sysctls are:
- kern.cryptodev_warn_interval for /dev/crypto
- net.inet.ipsec.crypto_warn_interval for IPsec
- kern.kgssapi_warn_interval for KGSSAPI

Reviewed by:	cem
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20555
2019-06-11 23:00:55 +00:00
John Baldwin
574b98cbd9 Document sysctl nodes that translate their values.
This documents the behavior of sysctl_msec_to_ticks and
SYSCTL_{ADD,}_SBINTIME_[UM]SEC.

Reviewed by:	cem
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20596
2019-06-11 22:57:25 +00:00
Ryan Libby
3cf556f05e Allow fail points to have separate declarations, definitions, and evals
Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20546
2019-06-07 04:09:12 +00:00
Conrad Meyer
208be1617c style.9: Codify tolerance for eliding blank lines
Consensus seems to be that eliding blank lines for functions with no local
variables is acceptable.  Codify that explicitly in the style document.

Reported by:	jhb
Reviewed by:	delphij, imp, vangyzen (earlier version); rgrimes
With feedback from:	kib
Differential Revision:	https://reviews.freebsd.org/D20448
2019-06-03 23:57:29 +00:00
John Baldwin
37e98e0dda Add 'device cxgbe' explicitly in the synopsis.
ccr depends on symbols exported by the cxgbe driver as well as having
a runtime dependency.  While the runtime depenency was noted in the
manpage already, the compile-time dependency wasn't as clear.

PR:		238265
MFC after:	3 days
Sponsored by:	Chelsio Communications
2019-06-03 15:41:54 +00:00
Edward Tomasz Napierala
1cedef8f7f Make tests(7) point people at the atf(7) man page.
Other frameworks, such as googletest, should be added there as well,
once they become viable.  For now let's keep it simple.

Discussed with: ngie, emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20124
2019-06-02 16:33:24 +00:00
Alexander Motin
08b96108ae Document max_chains bump to 16384 at r330049.
MFC after:	1 week
2019-06-01 16:04:20 +00:00
Dmitry Chagin
c5afec6e89 Complete LOCAL_PEERCRED support. Cache pid of the remote process in the
struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed.

PR:		215202
Reviewed by:	tijl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20415
2019-05-30 14:24:26 +00:00
Marcin Wojtas
e44f5c81b8 Fix ENA manual issues
The issues were pointed in community review:
https://reviews.freebsd.org/D10427#inline-67587

Also, fix other issues found by the igor tool.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:50:45 +00:00
Conrad Meyer
fd55875350 style.9: Correct usage's definition to match its declaration
Suggested by:	emaste
Reviewed by:	delphij, imp, rgrimes, vangyzen (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	(part of D20448)
2019-05-28 20:44:23 +00:00
Alan Somers
2b3899c0ee VOP_ADVLOCK.9: fix description of flags
* F_RDLCK, F_UNLCK, and F_WRLCK aren't flags.  They're stored in the
  fl.l_type field.
* Add F_REMOTE, added in r177633
* Add F_NOINTR, added in r180025

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-05-27 23:25:19 +00:00
Conrad Meyer
b1adb000cd virtio.4: Add missing devices and Xr
This page could probably use further improvement.
2019-05-27 00:51:27 +00:00
Mateusz Piotrowski
7005cea324 ipheth.4: Explain how to manually configure USB tethering on Apple devices
Reviewed by:	danfe, hselasky
Approved by:	src (hselasky)
Differential Revision:	https://reviews.freebsd.org/D20353
2019-05-26 14:15:54 +00:00
Edward Tomasz Napierala
0407410ba0 We don't really need two entries to describe how to deal with
optical drives in devfs.conf(5).

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-05-25 17:37:28 +00:00