Commit Graph

242515 Commits

Author SHA1 Message Date
Alan Somers
a1c9f4ad0d fusefs: implement VOP_BMAP
If the fuse daemon supports FUSE_BMAP, then use that for the block mapping.
Otherwise, use the same technique used by vop_stdbmap.  Report large values
for runp and runb in order to maximize read clustering and minimize upcalls,
even if we don't know the true layout.

The major result of this change is that sequential reads to FUSE files will
now usually happen 128KB at a time instead of 64KB.

Sponsored by:	The FreeBSD Foundation
2019-06-20 17:08:21 +00:00
Alan Somers
e532a99901 MFHead @349234
Sponsored by:	The FreeBSD Foundation
2019-06-20 15:56:08 +00:00
Alan Somers
8bd416a216 VOP_BMAP(9): fix typo in the copyright header
Reported by:	rgrimes
MFC after:	2 weeks
MFC-With:	349230
Sponsored by:	The FreeBSD Foundation
2019-06-20 14:40:36 +00:00
Alan Somers
9de14ff010 #include <sys/types.h> from sys/filio.h
This fixes world build after r349231

Reported by:	Jenkins
MFC after:	2 weeks
MFC-With:	349231
Sponsored by:	The FreeBSD Foundation
2019-06-20 14:35:28 +00:00
Alan Somers
d49b446bfb Add FIOBMAP2 ioctl
This ioctl exposes VOP_BMAP information to userland. It can be used by
programs like fragmentation analyzers and optimized cp implementations. But
I'm using it to test fusefs's VOP_BMAP implementation. The "2" in the name
distinguishes it from the similar but incompatible FIBMAP ioctls in NetBSD
and Linux.  FIOBMAP2 differs from FIBMAP in that it uses a 64-bit block
number instead of 32-bit, and it also returns runp and runb.

Reviewed by:	mckusick
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20705
2019-06-20 14:13:10 +00:00
Alan Somers
d01752c703 Add a VOP_BMAP(9) man page
Reviewed by:	mckusick
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20704
2019-06-20 13:59:46 +00:00
Antoine Brodin
1321298390 Add head(1) to native-xtools so that it can be used in qemu-user jails 2019-06-20 13:24:58 +00:00
Michael Tuexen
592bbe54d9 The variable names in the description of the port number usage is
inconsistent. This patch fixes that and improves the precision of
the description.
Thanks to Tom Marcoen for reporting the issue and providing an
initial patch, on which this change is based.

PR:			237723
Reviewed by:		bcr@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D20708
2019-06-20 12:38:41 +00:00
Li-Wen Hsu
99b6ccd8c9 Finsh readding Big5 in r317204, which was reverting r315568. This commit
reverts r315569.

Reported by:	Ting-Wei Lan <lantw44 gmail com>
Discussed with:	kevlo
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-06-20 07:17:16 +00:00
Alexander Motin
f91aa773be Add wakeup_any(), cheaper wakeup_one() for taskqueue(9).
wakeup_one() and underlying sleepq_signal() spend additional time trying
to be fair, waking thread with highest priority, sleeping longest time.
But in case of taskqueue there are many absolutely identical threads, and
any fairness between them is quite pointless.  It makes even worse, since
round-robin wakeups not only make previous CPU affinity in scheduler quite
useless, but also hide from user chance to see CPU bottlenecks, when
sequential workload with one request at a time looks evenly distributed
between multiple threads.

This change adds new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wakeup
thread that went to sleep last, but no longer in context switch (to avoid
immediate spinning on the thread lock).  On top of that new wakeup_any()
function is added, equivalent to wakeup_one(), but setting the flag.
On top of that taskqueue(9) is switchied to wakeup_any() to wakeup its
threads.

As result, on 72-core Xeon v4 machine sequential ZFS write to 12 ZVOLs
with 16KB block size spend 34% less time in wakeup_any() and descendants
then it was spending in wakeup_one(), and total write throughput increased
by ~10% with the same as before CPU usage.

Reviewed by:	markj, mmacy
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D20669
2019-06-20 01:15:33 +00:00
Mark Johnston
ee1f168540 Group vm_page_activate()'s definition with other related functions.
No functional change intended.

MFC after:	3 days
2019-06-19 21:36:00 +00:00
Matt Macy
6459a61ea7 Tell loader to ignore newer features enabled on the root pool.
There are many new features in ZoF. Most, if not all, do not effect read only usage.
Encryption in particular is enabled at the pool level but used at the dataset level.
The loader obviously will not be able to boot if the boot dataset is encrypted, but
should not care if some other dataset in the root pool is encrypted.

Reviewed by:	allanjude
MFC after:	1 week
2019-06-19 21:10:13 +00:00
Bryan Drewery
8e0c373903 Follow-up r349065: Fix .TARGET flag ambiguity with PROGS which broke MK_TESTS.
X-MFC-With:	r349065
Sponsored by:	DellEMC
2019-06-19 19:19:37 +00:00
Rebecca Cran
3109cebc22 efinet: Defer exclusively opening the network handles
Don't commit to exclusive access to the network device handle by
efinet until the loader has decided to load something through the
network. This allows for the possibility of other users of the
network device.

Submitted by:	scottph
Reviewed by:	tsoome, emaste
Tested by: 	tsoome, bcran
Differential Revision:	https://reviews.freebsd.org/D20642
2019-06-19 18:47:44 +00:00
Mark Johnston
ab877e64d0 Make zlib encoding messages idempotent.
Otherwise duplicate messages can trigger a reinitialization of the
compression stream while the update thread is running.  Also ensure
that the stream is initialized before the update thread may attempt
to use it.

PR:		238333
Reviewed by:	cem, rgrimes
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20673
2019-06-19 16:09:20 +00:00
Alexander Motin
49ee0fcea5 Use sbuf_cat() in GEOM confxml generation.
When it comes to megabytes of text, difference between sbuf_printf() and
sbuf_cat() becomes substantial.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-06-19 15:36:02 +00:00
Jonathan T. Looney
5e02b277a4 Add the ability to limit how much the code will fragment the RACK send map
in response to SACKs. The default behavior is unchanged; however, the limit
can be activated by changing the new net.inet.tcp.rack.split_limit sysctl.

Submitted by:	Peter Lei <peterlei@netflix.com>
Reported by:	jtl
Reviewed by:	lstewart (earlier version)
Security:	CVE-2019-5599
2019-06-19 13:55:00 +00:00
Alexander Motin
eeea0fcf0f Fix typo in r349178.
Reported by:	ae
MFC after:	1 week
2019-06-19 13:30:50 +00:00
Leandro Lupori
68ed5ad2d5 [PPC] Fix loader input with newer QEMU versions
At least since version 4.0.0, QEMU became bug-compatible with PowerVM's
vty, by inserting a \0 after every \r. As this confuses loader's
interpreter and as a \0 coming from the console doesn't seem reasonable,
it's now being filtered at OFW console input.

Reviewed by:	jhibbits
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20676
2019-06-19 11:37:43 +00:00
Sevan Janiyan
87278c17c9 Whitespace 2019-06-19 11:22:09 +00:00
Marko Zec
188adcb7e4 V_ip6_forwarding and V_ipforwarding have been defined in ip6_var.h /
ip_var.h since at least 2008, so make use of those definitions here.

MFC after:	3 days
2019-06-19 08:49:24 +00:00
Marko Zec
6aee0bfa85 Evaluating htons() at compile time is more efficient than doing ntohs()
at runtime.  This change removes a dependency on a barrel shifter pass
before branch resolution, while reducing the instruction stream size
by 9 bytes on amd64.

MFC after:	3 days
2019-06-19 08:39:19 +00:00
Scott Long
da761f3b1f Implement VT-d capability detection on chipsets that have multiple
translation units with differing capabilities

From the author via Bugzilla:
---
When an attempt is made to passthrough a PCI device to a bhyve VM
(causing initialisation of IOMMU) on certain Intel chipsets using
VT-d the PCI bus stops working entirely. This issue occurs on the
E3-1275 v5 processor on C236 chipset and has also been encountered
by others on the forums with different hardware in the Skylake
series.

The chipset has two VT-d translation units. The issue is caused by
an attempt to use the VT-d device-IOTLB capability that is
supported by only the first unit for devices attached to the
second unit which lacks that capability. Only the capabilities of
the first unit are checked and are assumed to be the same for all
units.

Attached is a patch to rectify this issue by determining which
unit is responsible for the device being added to a domain and
then checking that unit's device-IOTLB capability. In addition to
this a few fixes have been made to other instances where the first
unit's capabilities are assumed for all units for domains they
share. In these cases a mutual set of capabilities is determined.
The patch should hopefully fix any bugs for current/future
hardware with multiple translation units supporting different
capabilities.

A description is on the forums at
https://forums.freebsd.org/threads/pci-passthrough-bhyve-usb-xhci.65235
The thread includes observations by other users of the bug
occurring, and description as well as confirmation of the fix.
I'd also like to thank Ordoban for their help.

---
Personally tested on a Skylake laptop, Skylake Xeon server, and
a Xeon-D-1541, passing through XHCI and NVMe functions.  Passthru
is hit-or-miss to the point of being unusable without this
patch.

PR: 229852
Submitted by: callum@aitchison.org
MFC after: 1 week
2019-06-19 06:41:07 +00:00
Alan Cox
0b3c5f1bc9 Correct an error in r349122. pmap_unwire() should update the pmap's wired
count, not its resident count.

X-MFC with:	r349122
2019-06-19 03:33:00 +00:00
Bryan Drewery
46dea205c6 Rework r349061: Don't apply guessed dependencies if there is a custom target.
This is still targeting bin/sh cyclic dependency issues.  Only apply
guessed dependencies that are explicitly set for an object (which
gnu/lib/cc/cc_tools needs) and if no custom target exists with its
own dependencies.

This was manifesting as a missing yacc.h in usr.bin/mkesdb_static when
built without -j (or -B). No actual yacc.h dependency ordering was
defined but with -j it got lucky and built fine.

Before r349061 the behavior was different for META_MODE but that logic
difference isn't needed.

X-MFC-With:	r349061
Sponsored by:	DellEMC
2019-06-18 22:00:38 +00:00
Alexander Motin
5c32e9fcb2 Optimize kern.geom.conf* sysctls.
On large systems those sysctls may generate megabytes of output.  Before
this change sbuf(9) code was resizing buffer by 4KB each time many times,
generating tons of TLB shootdowns.  Unfortunately in this case existing
sbuf_new_for_sysctl() mechanism, supposed to help with this issue, is not
applicable, since all the sbuf writes are done in different kernel thread.

This change improves situation in two ways:
 - on first sysctl call, not providing any output buffer, it sets special
sbuf drain function, just counting the data and so not needing big buffer;
 - on second sysctl call it uses as initial buffer size value saved on
previous call, so that in most cases there will be no reallocation, unless
GEOM topology changed significantly.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-06-18 21:05:10 +00:00
Sevan Janiyan
cfdcc8c7fa Mark NetBSD branch points
NetBSD 7.0 was a separate branch, subsequent 8.x releases did not emerge from
this branch.
Clean up minor visual nits, centre OpenBSD listing on the B, DragonFly
listings on the y.
2019-06-18 21:02:40 +00:00
Conrad Meyer
22eedc9722 random(4): Fix a regression in short AES mode reads
In r349154, random device reads of size < 16 bytes (AES block size) were
accidentally broken to loop forever.  Correct the loop condition for small
reads.

Reported by:	pho
Reviewed by:	delphij
Approved by:	secteam(delphij)
Differential Revision:	https://reviews.freebsd.org/D20686
2019-06-18 18:50:58 +00:00
Vincenzo Maffione
5c2b348a54 bhyve: vtnet: fix locking on receive
The vsc_rx_ready and the RX virtqueue is protected by the rx_mtx lock.
However, pci_vtnet_ping_rxq() (currently called only once after each
device reset) accesses those without acquiring the lock.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20609
2019-06-18 17:51:30 +00:00
Ian Lepore
edd96b9fb9 Handle labels specified with hints even on FDT systems. Hints are the
easiest thing for a user to control (via loader.conf or kenv+kldload), so
handle them in addition to any label specified via the FDT data.
2019-06-18 17:05:05 +00:00
Ed Maste
05918954d8 Remove sys/capability.h for the third time
In all supported (and most unsupported) FreeBSD versions the appropriate
header for Capsicum is sys/capsicum.h.  Software including sys/capability.h
is most likely looking for Linux capabilities based on the withdrawn
POSIX.1e draft.

This header was previously removed in r334929 and r340156, but reverted
each time due to ports failures.  These issues have now (broadly) been
addressed.

PR:		228878 [exp-run]
Submitted by:	eadler (r334929)
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
2019-06-18 14:13:52 +00:00
Ian Lepore
a161bab854 Add a pwmc(4) manpage. 2019-06-18 04:32:19 +00:00
Ian Lepore
0309916a88 Oops, it seems I left out the word 'cycle', fix it.
Reported by:	rpokala@
2019-06-18 02:27:30 +00:00
Ian Lepore
26f3ca615d Rearrange the argument checking and processing so that enable and disable
can be combined with configuring the period and duty cycle (the same ioctl
sets all 3 values at once, so there's no reason to require the user to run
the program twice to get all 3 things set).
2019-06-18 01:15:00 +00:00
Ian Lepore
123570fb92 Explain the relationship between PWM hardware channels being controlled and
pwmc(4) device filenames.  Also, use uppercase PWM when the term is being
used as an acronym, and expand the acronym where it's first used.
2019-06-18 00:17:10 +00:00
Ian Lepore
780c3de886 Remove everything related to channels from the pwmc public interface, now
that there is a pwmc(4) instance per channel and the channel number is
maintained as a driver ivar rather than being passed in from userland.
2019-06-18 00:11:00 +00:00
Alan Somers
84879e46c2 fusefs: multiple fixes related to the write cache
* Don't always write the last page synchronously.  That's not actually
  required.  It was probably just masking another bug that I fixed later,
  possibly in r349021.

* Enable the NotifyWriteback tests now that Writeback cache is working.

* Add a test to ensure that the write cache isn't flushed synchronously when
  in writeback mode.

Sponsored by:	The FreeBSD Foundation
2019-06-17 23:34:11 +00:00
Takanori Watanabe
e68fcc8875 Add ACPI support for USB driver.
This adds ACPI device path on devinfo(8) output and
show  value of _UPC(usb port capabilities), _PLD (physical location of device)
when hw.usb.debug >= 1 .

Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D20630
2019-06-17 23:03:30 +00:00
Glen Barber
c1f6499260 Fix passing ${CONF_FILES} (which contains MAKE_CONF and
SRC_CONF, __MAKE_CONF and SRCCONF, respectively) through
to arm_install_base() and chroot_arm_build_release().
This prevents failures when the target image is intended
to be build with make.conf(5) and src.conf(5) overrides,
which are correctly handled for non-embedded image builds.

Reported and tested by:	Daniel Engberg
PR:		238615
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-06-17 22:53:39 +00:00
Alan Somers
0482ec3e3f fusefs: run the Io tests with various combinations of mount options
Sponsored by:	The FreeBSD Foundation
2019-06-17 22:13:59 +00:00
Alan Somers
402b609c80 fusefs: use cluster_read for more readahead
fusefs will now use cluster_read.  This allows readahead of more than one
cache block.  However, it won't yet actually cluster the reads because that
requires VOP_BMAP, which fusefs does not yet implement.

Sponsored by:	The FreeBSD Foundation
2019-06-17 22:01:23 +00:00
Sevan Janiyan
b6bb4d1e19 Add NetBSD 8.1 & DragonFly BSD 5.6 2019-06-17 21:46:13 +00:00
Sevan Janiyan
d5646db99b Fix tab 2019-06-17 21:38:33 +00:00
Conrad Meyer
179f62805c random(4): Fortuna: allow increased concurrency
Add experimental feature to increase concurrency in Fortuna.  As this
diverges slightly from canonical Fortuna, and due to the security
sensitivity of random(4), it is off by default.  To enable it, set the
tunable kern.random.fortuna.concurrent_read="1".  The rest of this commit
message describes the behavior when enabled.

Readers continue to update shared Fortuna state under global mutex, as they
do in the status quo implementation of the algorithm, but shift the actual
PRF generation out from under the global lock.  This massively reduces the
CPU time readers spend holding the global lock, allowing for increased
concurrency on SMP systems and less bullying of the harvestq kthread.

It is somewhat of a deviation from FS&K.  I think the primary difference is
that the specific sequence of AES keys will differ if READ_RANDOM_UIO is
accessed concurrently (as the 2nd thread to take the mutex will no longer
receive a key derived from rekeying the first thread).  However, I believe
the goals of rekeying AES are maintained: trivially, we continue to rekey
every 1MB for the statistical property; and each consumer gets a
forward-secret, independent AES key for their PRF.

Since Chacha doesn't need to rekey for sequences of any length, this change
makes no difference to the sequence of Chacha keys and PRF generated when
Chacha is used in place of AES.

On a GENERIC 4-thread VM (so, INVARIANTS/WITNESS, numbers not necessarily
representative), 3x concurrent AES performance jumped from ~55 MiB/s per
thread to ~197 MB/s per thread.  Concurrent Chacha20 at 3 threads went from
roughly ~113 MB/s per thread to ~430 MB/s per thread.

Prior to this change, the system was extremely unresponsive with 3-4
concurrent random readers; each thread had high variance in latency and
throughput, depending on who got lucky and won the lock.  "rand_harvestq"
thread CPU use was high (double digits), seemingly due to spinning on the
global lock.

After the change, concurrent random readers and the system in general are
much more responsive, and rand_harvestq CPU use dropped to basically zero.

Tests are added to the devrandom suite to ensure the uint128_add64 primitive
utilized by unlocked read functions to specification.

Reviewed by:	markm
Approved by:	secteam(delphij)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D20313
2019-06-17 20:29:13 +00:00
Cy Schubert
2b951e9c6d Allow the hostapd program to be specified. This allows users to use
hostapd from ports instead of the one in base. The default is the hostapd
in base.

PR:		238571
MFC after:	1 week
2019-06-17 20:11:02 +00:00
Cy Schubert
b8358917db Make ipf_objbytes a constant. ipf_objbytes is a table of internal data
structures that are saved across reboots by ipfs(8). The table is not
changed at runtime.

MFC after:	3 days
2019-06-17 20:10:55 +00:00
Xin LI
f89d207279 Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR:		229763
Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision:	https://reviews.freebsd.org/D20193
2019-06-17 19:49:08 +00:00
Niclas Zeising
2d4a74d7a0 pci.4: Use plural configuration registers
It is customary to use plural when talking about PCI configure registers.

Reported by:	scottl
MFC after:	2 weeks
X-MFC-with:	r349133
2019-06-17 17:35:55 +00:00
Alan Somers
6fa772a88e fusefs: skip the Write.mmap test when mmap is not available
fusefs doesn't not allow mmap when data caching is disabled.

Sponsored by:	The FreeBSD Foundation
2019-06-17 17:17:01 +00:00
Mark Johnston
970a1ed345 Add some missing MLINKs for tree(3).
MFC after:	3 days
2019-06-17 16:57:44 +00:00