Commit Graph

127697 Commits

Author SHA1 Message Date
Hans Petter Selasky
43a9329e1b Free all allocated unit IDs in cuse(3) after the client character
devices have been destroyed to avoid creating character devices with
identical name.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-25 11:46:01 +00:00
Hans Petter Selasky
c7ffaed92e Fix for deadlock situation in cuse(3)
The final server unref should be done by the server thread to prevent
deadlock in the client cdevpriv destructor, which cannot destroy
itself.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-25 11:42:53 +00:00
Andrey V. Elsukov
019c8c9330 Follow the RFC 3128 and drop short TCP fragments with offset = 1.
Reported by:	emaste
MFC after:	1 week
2019-06-25 11:40:37 +00:00
Andrey V. Elsukov
7d4b2d5244 Mark default rule with IPFW_RULE_NOOPT flag, so it can be showed in
compact form.

MFC after:	1 week
2019-06-25 09:11:22 +00:00
Doug Moore
18cd8bb800 vm_map_protect may return an INVALID_ARGUMENT or PROTECTION_FAILURE
error response after clipping the first map entry in the region to be
reserved. This creates a pair of matching entries that should have
been "simplified" back into one, or never created. This change defers
the clipping of that entry until those two vm_map_protect failure
cases have been ruled out.

Reviewed by: alc
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20711
2019-06-25 07:44:37 +00:00
Cy Schubert
c964c98793 The definition of icmptypes in ip_compt.h is dead code as it already
use the icmptypes in ip_icmp.h.

MFC after:	1 week
2019-06-25 07:04:47 +00:00
Warner Losh
a9154c1c83 Replay r349342 by imp accidentally reverted by r349352
Use the cam_ed copy of ata_params rather than malloc and freeing
memory for it. This reaches into internal bits of xpt a little, and
I'll clean that up later.
2019-06-25 06:14:31 +00:00
Warner Losh
296218d4cf Replay r349340 by imp accidentally reverted by r349352
Create ata_param_fixup

Create a common fixup routine to do the canonical fixup of the
ata_param fixup. Call it from both the ATA and the ATA over SCSI
paths.
2019-06-25 06:14:21 +00:00
Warner Losh
76769dc108 Replay r349339 by imp accidentally reverted by r349352
Go ahead and completely fix the ata_params before calling the veto
function. This breaks nothing that uses it in the tree since
ata_params is ignored in storvsc_ada_probe_veto which is the only
in-tree consumer.
2019-06-25 06:14:16 +00:00
Warner Losh
e5500f1efa Replay r349334 by markj accidentally reverted by r349352
Remove a lingering use of splbio().

The buffer must be locked by the caller.  No functional change
intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-06-25 06:14:00 +00:00
Warner Losh
f5a95d9a07 Remove NAND and NANDFS support
NANDFS has been broken for years. Remove it. The NAND drivers that
remain are for ancient parts that are no longer relevant. They are
polled, have terrible performance and just for ancient arm
hardware. NAND parts have evolved significantly from this early work
and little to none of it would be relevant should someone need to
update to support raw nand. This code has been off by default for
years and has violated the vnode protocol leading to panics since it
was committed.

Numerous posts to arch@ and other locations have found no actual users
for this software.

Relnotes:	Yes
No Objection From: arch@
Differential Revision: https://reviews.freebsd.org/D20745
2019-06-25 04:50:09 +00:00
Justin Hibbits
f62da49b2f powerpc: Transition to Secure-PLT, like most other OSs
Summary:
PowerPC has two PLT models: BSS-PLT and Secure-PLT.  BSS-PLT uses runtime
code generation to generate the PLT stubs.  Secure-PLT was introduced with
GCC 4.1 and Binutils 2.17 (base has GCC 4.2.1 and Binutils 2.17), and is a
more secure PLT format, using a read-only linkage table, with the dynamic
linker populating a non-executable index table.

This is the libc, rtld, and kernel support only.  The toolchain and build
parts will be updated separately.

Reviewed By: nwhitehorn, bdragon, pfg
Differential Revision: https://reviews.freebsd.org/D20598
MFC after:	1 month
2019-06-25 00:40:44 +00:00
Jayachandran C.
b0f79e328e arm64 acpi_iort: add some error handling
Print warnings for some bad kernel configurations (like NUMA disabled
with multiple domains). Check and report some firmware errors (like
incorrect proximity domain entries).

Differential Revision:	https://reviews.freebsd.org/D20416
2019-06-24 21:24:55 +00:00
Jayachandran C.
c66524f07d arm64 gicv3_its: enable all ITS blocks for a CPU
We now support multiple ITS blocks raising interrupts to a CPU.
Add all available CPUs to the ITS when no NUMA information is
available.

This reverts the check added in r340602, at that tim we did not
suppport multiple ITS blocks for a CPU.

Differential Revision:	https://reviews.freebsd.org/D20417
2019-06-24 21:13:45 +00:00
Jayachandran C.
893caf588a arm64 gic: Drop unused GICV3_IVAR_REDIST_VADDR
Now that GICV3_IVAR_REDIST is available, GICV3_IVAR_REDIST_VADDR
is unused and can be removed. Drop the define and add a comment.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D20454
2019-06-24 21:00:28 +00:00
Warner Losh
af9727f618 Add missing include of sys/boot.h
This change was dropped out in a rebase and I didn't catch that before
I committed.
2019-06-24 20:52:21 +00:00
Warner Losh
ec9abc1843 Move to using a common kernel path between the boot / laoder bits and
the kernel.
2019-06-24 20:34:53 +00:00
Warner Losh
97ad52ca4c Use the cam_ed copy of ata_params rather than malloc and freeing
memory for it. This reaches into internal bits of xpt a little, and
I'll clean that up later.
2019-06-24 20:23:19 +00:00
Warner Losh
2afaed2d0f Create ata_param_fixup
Create a common fixup routine to do the canonical fixup of the
ata_param fixup. Call it from both the ATA and the ATA over SCSI
paths.
2019-06-24 20:18:58 +00:00
Warner Losh
161d2a1796 Go ahead and completely fix the ata_params before calling the veto
function. This breaks nothing that uses it in the tree since
ata_params is ignored in storvsc_ada_probe_veto which is the only
in-tree consumer.
2019-06-24 20:18:49 +00:00
Mark Johnston
673c1c2944 Remove a lingering use of splbio().
The buffer must be locked by the caller.  No functional change
intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-06-24 19:19:37 +00:00
Cy Schubert
51a7230a18 Clean out duplicate definitions of TCP macros also found in netinet/tcp.h.
MFC after:	1 week
2019-06-24 02:58:02 +00:00
Ian Lepore
0bab2b6e6f Add pwm devices to NOTES. 2019-06-24 02:39:56 +00:00
Ian Lepore
6e36309d83 Add gpio(4) and related drivers to NOTES. 2019-06-24 02:30:05 +00:00
Ian Lepore
2973d38a49 The gpiopps(4) driver currently has probe and attach code only for FDT based
systems, so conditionalize it accordingly in conf/files.
2019-06-24 02:27:17 +00:00
Ian Lepore
5364951d98 Build an armv7 LINT kernel in addition to armv5 LINT. You might think this
had been done years ago.  I did.  All this time we've only compiled a LINT
kernel for TARGET_ARCH=arm.  Now separate LINT-V5 and LINT-V7 configs are
generated and built.

There are two new files in arm/conf, NOTES.armv5 and NOTES.armv7, containing
some of what used to be in the arm NOTES file.  That file now contains only
the bits that are common to v5 and v7.

The makeLINT.mk file now creates the LINT-V5 and LINT-V7 files by concatening
sys/conf/NOTES, arm/conf/NOTES, and arm/conf/NOTES.armv{5,7} in that order.
2019-06-24 01:42:09 +00:00
Konstantin Belousov
d8ddb98a5e amd64 pmap: block on turnstile for lock-less DI.
Port the code to block on turnstile instead of yielding, to lock-less
delayed invalidation. The yield might cause tight loop due to priority
inversion.

Since it is impossible to avoid race between block and wake-up, arm
1-tick callout to wakeup when thread blocks itself.

Reported and tested by:	mjg
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
Differential revision:	https://reviews.freebsd.org/D20636
2019-06-23 21:21:11 +00:00
Ian Lepore
7a3a48426e Allow compiling ukbdmap.h on arm, since it appears to work fine. 2019-06-23 21:17:41 +00:00
Konstantin Belousov
89f2ab0608 Switch to check for effective user id in r349320, and disable dumping
into existing files for sugid processes.

Despite using real user id pronounces the intent, it actually breaks
suid coredumps, while not making any difference for non-sugid
processes.  The reason for the breakage is that non-existent core file
is created with the effective uid (unless weird hacks like SUIDDIR are
configured).

Then, if user enabled kern.sugid_coredump, core dumping should not
overwrite core files owned by effective uid, but we cannot pretend to
use real uid for dumping.

PR:	68905
admbugs:	358
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-23 21:15:31 +00:00
Alan Cox
22c7bcb842 pmap_enter_quick_locked() never replaces a valid mapping, so it need not
perform a TLB invalidation.  A barrier suffices.  (See r343876.)

Add a comment to pmap_enter_quick_locked() in order to highlight the
fact that it does not replace valid mappings.

Correct a typo in one of pmap_enter()'s comments.

MFC after:	1 week
2019-06-23 21:06:56 +00:00
Alexander Motin
53f5ac1310 Improve AHCI Enclosure Management and SES interoperation.
Since SES specs do not define mechanism to map enclosure slots to SATA
disks, AHCI EM code I written many years ago appeared quite useless,
that always bugged me.  I was thinking whether it was a good idea, but
if LSI HBAs do that, why I shouldn't?

This change introduces simple non-standard mechanism for the mapping
into both AHCI EM and SES code, that makes AHCI EM on capable controllers
(most of Intel's) a first-class SES citizen, allowing it to report disk
physical path to GEOM, show devices inserted into each enclosure slot in
`sesutil map` and `getencstat`, control locate and fault LEDs for specific
devices with `sesutil locate adaX on` and `sesutil fault adaX on`, etc.

I've successfully tested this on Supermicro X10DRH-i motherboard connected
with sideband cable of its S-SATA Mini-SAS connector to SAS815TQ backplane.
It can indicate with LEDs Locate, Fault and Rebuild/Remap SES statuses for
each disk identical to real SES of Supermicro SAS2 backplanes.

MFC after:	2 weeks
2019-06-23 19:05:01 +00:00
Konstantin Belousov
7a29e0bf96 coredump: avoid writing to core files not owned by the real user.
Reported by: blake frantz <trew@hick.org>
PR:	68905
admbugs:	358
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-23 18:35:11 +00:00
Ian Lepore
ac6a9e474f Add some i2c slave-device drivers that were missing from NOTES. 2019-06-23 17:39:13 +00:00
Ian Lepore
9026f4b86d The sy8106a and syr827 drviers require FDT and the ext_resources subsystem. 2019-06-23 17:38:30 +00:00
Ian Lepore
48fedd0960 Add the rtc8583 driver to conf/files. Also, move sy8106a from
file.allwinner to conf/files... it's not allwinner-specific, some day
other platforms could use the same regulator chip.
2019-06-23 17:23:56 +00:00
Ian Lepore
5cafc16207 Remove some unused header files from the ad7418 driver. 2019-06-23 17:20:39 +00:00
Alexander Motin
6d4d657360 Decouple enc/ses verbosity from bootverbose.
I don't want to be regularly notified that my enclosure violates standards
until there is some real problem I want to debug.

MFC after:	2 weeks
2019-06-22 19:09:10 +00:00
Alan Cox
36c5a4cb3f Introduce pmap_remove_l3_range() and use it in two places:
(1) pmap_remove(), where it eliminates redundant TLB invalidations by
pmap_remove() and pmap_remove_l3(), and (2) pmap_enter_l2(), where it may
optimize the TLB invalidations by batching them.

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12725
2019-06-22 16:26:38 +00:00
Ryan Libby
a6d2a24c3e ddb show proc typo 2019-06-22 05:35:23 +00:00
Alexander Motin
b8038d7827 Remove ancient SCSI-2/3 mentioning.
MFC after:	2 weeks
2019-06-22 03:50:43 +00:00
Eric van Gyzen
df8406543f VirtIO SCSI: validate seg_max on attach
Until r349278, bhyve presented a seg_max to the guest that was too large.
Detect this case and clamp it to the virtqueue size.  Otherwise, we would
fail the "too many segments to enqueue" assertion in virtqueue_enqueue().

I hit this by running a guest with a MAXPHYS of 256 KB.

Reviewed by:	bryanv cem
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20703
2019-06-22 01:20:45 +00:00
Alexander Motin
6805c9b74d Make ELEMENT INDEX validation more strict.
SES specifications tell: "The Additional Element Status descriptors shall
be in the same order as the status elements in the Enclosure Status
diagnostic page".  It allows us to question ELEMENT INDEX that is lower
then values we already processed.  There are many SAS2 enclosures with
this kind of problem.

While there, add more specific error messages for cases when ELEMENT INDEX
is obviously wrong.  Also skip elements with INVALID bit set.

MFC after:	2 weeks
2019-06-22 01:06:41 +00:00
Scott Long
0feb46b0c6 Refactor xpt_getattr() to make it more readable. No outwardly
visible functional changes, though code flow was modified a bit
internally to lessen the need for goto jumps and chained if
conditionals.
2019-06-21 23:40:26 +00:00
Alexander Motin
7318fcb51d Fix individual_element_index when some type has 0 elements.
When some type has 0 elements, saved_individual_element_index was set
to -1 on second type bump, since individual_element_index was not
restored after the first.  To me it looks easier just to increment
saved_individual_element_index separately than think when to save it.

MFC after:	2 weeks
2019-06-21 23:29:16 +00:00
Alan Somers
1bb957296b Reduce namespace pollution from r349233
Define __daddr_t in _types.h and use it in filio.h

Reported by:	ian, bde
Reviewed by:	ian, imp, cem
MFC after:	2 weeks
MFC-With:	349233
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20715
2019-06-21 21:50:14 +00:00
Johannes Lundberg
6425fed7e6 LinuxKPI: Additions to rcu list.
- Add rcu list functions.
- Make rcu hlist's foreach macro use rcu calls instead of the non-rcu macro.
- Bump FreeBSD version so we have a checkpoint for the vboxvideo drm driver.

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
Differential Revision:	D20719
2019-06-21 18:48:07 +00:00
Johannes Lundberg
62260f68b4 LinuxKPI: Add atomic_long_sub macro.
Reviewed by:	imp (mentor), hps
Approved by:	imp (mentor), hps
MFC after:	1 week
Differential Revision:	D20718
2019-06-21 16:43:16 +00:00
Ian Lepore
83b319101f Add pwm to the armv7 GENERIC kernel, it's now used by TI and Allwinner. 2019-06-21 15:44:58 +00:00
Ian Lepore
ef558a1078 Add support for the PWM(9) API. This allows configuring the pwm output using
pwm(9), but also maintains the historical sysctl config interface for
compatiblity with existing apps.  The two config systems are not compatible
with each other; if you use both interfaces to change configurations you're
likely to end up with incorrect output or none at all.
2019-06-21 14:24:33 +00:00
Ian Lepore
3103a7eef7 Some mundane tweaks and cleanups to help de-clutter the diffs of some
upcoming functional changes.

Add an ofw_compat_data table for probing compat strings, and use it to add
PNP data.  Remove some stray semicolons at the end of macro definitions,
and add a PWM_LOCK_ASSERT macro to round out the usual suite.  Move the
device_t and driver_methods structs to the end of the file.  Tweak comments.
2019-06-21 14:01:02 +00:00
Ed Maste
b72236b407 nandsim: correct test to avoid out-of-bounds access
Previously nandsim_chip_status returned EINVAL iff both of user-provided
chip->ctrl_num and chip->num were out of bounds.  If only one failed the
bounds check arbitrary memory would be read and returned.

The NAND framework is not built by default, nandsim is not intended for
production use (it is a simulator), and the nandsim device has root-only
permissions.

admbugs:	827
Reported by:	Daniel Hodson of elttam
MFC after:	3 days
Security:	kernel information leak or DoS
Sponsored by:	The FreeBSD Foundation
2019-06-21 13:42:40 +00:00
Andrey V. Elsukov
978f2d1728 Add "tcpmss" opcode to match the TCP MSS value.
With this opcode it is possible to match TCP packets with specified
MSS option, whose value corresponds to configured in opcode value.
It is allowed to specify single value, range of values, or array of
specific values or ranges. E.g.

 # ipfw add deny log tcp from any to any tcpmss 0-500

Reviewed by:	melifaro,bcr
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-06-21 10:54:51 +00:00
Kristof Provost
05fc9d78d7 ip_output: pass PFIL_FWD in the slow path
If we take the slow path for forwarding we should still tell our
firewalls (hooked through pfil(9)) that we're forwarding. Pass the
ip_output() flags to ip_output_pfil() so it can set the PFIL_FWD flag
when we're forwarding.

MFC after:	1 week
Sponsored by:	Axiado
2019-06-21 07:58:08 +00:00
Conrad Meyer
c363b16c63 sys: Remove DEV_RANDOM device option
Remove 'device random' from kernel configurations that reference it (most).
Replace perhaps mistaken 'nodevice random' in two MIPS configs with 'options
RANDOM_LOADABLE' instead.  Document removal in UPDATING; update NOTES and
random.4.

Reviewed by:	delphij, markm (previous version)
Approved by:	secteam(delphij)
Differential Revision:	https://reviews.freebsd.org/D19918
2019-06-21 00:16:30 +00:00
Takanori Watanabe
a809abd44a Fix the case where no root hub object while host controller object exist in ACPI namespace.
Also you can disable ACPI support for USB by setting
debug.acpi.disabled="usb"

PR:	238711
2019-06-20 23:52:33 +00:00
Alan Somers
38b06f8ac4 fcntl: fix overflow when setting F_READAHEAD
VOP_READ and VOP_WRITE take the seqcount in blocks in a 16-bit field.
However, fcntl allows you to set the seqcount in bytes to any nonnegative
31-bit value. The result can be a 16-bit overflow, which will be
sign-extended in functions like ffs_read. Fix this by sanitizing the
argument in kern_fcntl. As a matter of policy, limit to IO_SEQMAX rather
than INT16_MAX.

Also, fifos have overloaded the f_seqcount field for a completely different
purpose ever since r238936.  Formalize that by using a union type.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20710
2019-06-20 23:07:20 +00:00
Alexander Motin
68035f6381 SPC-3 and up require some UAs to be returned as fixed.
MFC after:	2 weeks
2019-06-20 22:20:30 +00:00
Alexander Motin
35a9ffc350 Optimize xpt_getattr().
Do not allocate temporary buffer for attributes we are going to return
as-is, just make sure to NUL-terminate them.  Do not zero temporary 64KB
buffer for CDAI_TYPE_SCSI_DEVID, XPT tells us how much data it filled
and there are also length fields inside the returned data also.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-06-20 20:29:42 +00:00
Navdeep Parhar
17795d8234 cxgbe/t4_tom: DDP_DEAD is a ddp flag and not a toepcb flag.
The driver was in effect setting TPF_ABORT_SHUTDOWN on the toepcb
instead of what was intended.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-06-20 20:06:19 +00:00
Brooks Davis
74a1b66cf4 Extend mmap/mprotect API to specify the max page protections.
A new macro PROT_MAX() alters a protection value so it can be OR'd with
a regular protection value to specify the maximum permissions.  If
present, these flags specify the maximum permissions.

While these flags are non-portable, they can be used in portable code
with simple ifdefs to expand PROT_MAX() to 0.

This change allows (e.g.) a region that must be writable during run-time
linking or JIT code generation to be made permanently read+execute after
writes are complete.  This complements W^X protections allowing more
precise control by the programmer.

This change alters mprotect argument checking and returns an error when
unhandled protection flags are set.  This differs from POSIX (in that
POSIX only specifies an error), but is the documented behavior on Linux
and more closely matches historical mmap behavior.

In addition to explicit setting of the maximum permissions, an
experimental sysctl vm.imply_prot_max causes mmap to assume that the
initial permissions requested should be the maximum when the sysctl is
set to 1.  PROT_NONE mappings are excluded from this for compatibility
with rtld and other consumers that use such mappings to reserve
address space before mapping contents into part of the reservation.  A
final version this is expected to provide per-binary and per-process
opt-in/out options and this sysctl will go away in its current form.
As such it is undocumented.

Reviewed by:	emaste, kib (prior version), markj
Additional suggestions from:	alc
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18880
2019-06-20 18:24:16 +00:00
Alan Somers
9de14ff010 #include <sys/types.h> from sys/filio.h
This fixes world build after r349231

Reported by:	Jenkins
MFC after:	2 weeks
MFC-With:	349231
Sponsored by:	The FreeBSD Foundation
2019-06-20 14:35:28 +00:00
Alan Somers
d49b446bfb Add FIOBMAP2 ioctl
This ioctl exposes VOP_BMAP information to userland. It can be used by
programs like fragmentation analyzers and optimized cp implementations. But
I'm using it to test fusefs's VOP_BMAP implementation. The "2" in the name
distinguishes it from the similar but incompatible FIBMAP ioctls in NetBSD
and Linux.  FIOBMAP2 differs from FIBMAP in that it uses a 64-bit block
number instead of 32-bit, and it also returns runp and runb.

Reviewed by:	mckusick
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20705
2019-06-20 14:13:10 +00:00
Alan Somers
d01752c703 Add a VOP_BMAP(9) man page
Reviewed by:	mckusick
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20704
2019-06-20 13:59:46 +00:00
Alexander Motin
f91aa773be Add wakeup_any(), cheaper wakeup_one() for taskqueue(9).
wakeup_one() and underlying sleepq_signal() spend additional time trying
to be fair, waking thread with highest priority, sleeping longest time.
But in case of taskqueue there are many absolutely identical threads, and
any fairness between them is quite pointless.  It makes even worse, since
round-robin wakeups not only make previous CPU affinity in scheduler quite
useless, but also hide from user chance to see CPU bottlenecks, when
sequential workload with one request at a time looks evenly distributed
between multiple threads.

This change adds new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wakeup
thread that went to sleep last, but no longer in context switch (to avoid
immediate spinning on the thread lock).  On top of that new wakeup_any()
function is added, equivalent to wakeup_one(), but setting the flag.
On top of that taskqueue(9) is switchied to wakeup_any() to wakeup its
threads.

As result, on 72-core Xeon v4 machine sequential ZFS write to 12 ZVOLs
with 16KB block size spend 34% less time in wakeup_any() and descendants
then it was spending in wakeup_one(), and total write throughput increased
by ~10% with the same as before CPU usage.

Reviewed by:	markj, mmacy
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D20669
2019-06-20 01:15:33 +00:00
Mark Johnston
ee1f168540 Group vm_page_activate()'s definition with other related functions.
No functional change intended.

MFC after:	3 days
2019-06-19 21:36:00 +00:00
Alexander Motin
49ee0fcea5 Use sbuf_cat() in GEOM confxml generation.
When it comes to megabytes of text, difference between sbuf_printf() and
sbuf_cat() becomes substantial.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-06-19 15:36:02 +00:00
Jonathan T. Looney
5e02b277a4 Add the ability to limit how much the code will fragment the RACK send map
in response to SACKs. The default behavior is unchanged; however, the limit
can be activated by changing the new net.inet.tcp.rack.split_limit sysctl.

Submitted by:	Peter Lei <peterlei@netflix.com>
Reported by:	jtl
Reviewed by:	lstewart (earlier version)
Security:	CVE-2019-5599
2019-06-19 13:55:00 +00:00
Alexander Motin
eeea0fcf0f Fix typo in r349178.
Reported by:	ae
MFC after:	1 week
2019-06-19 13:30:50 +00:00
Marko Zec
188adcb7e4 V_ip6_forwarding and V_ipforwarding have been defined in ip6_var.h /
ip_var.h since at least 2008, so make use of those definitions here.

MFC after:	3 days
2019-06-19 08:49:24 +00:00
Marko Zec
6aee0bfa85 Evaluating htons() at compile time is more efficient than doing ntohs()
at runtime.  This change removes a dependency on a barrel shifter pass
before branch resolution, while reducing the instruction stream size
by 9 bytes on amd64.

MFC after:	3 days
2019-06-19 08:39:19 +00:00
Scott Long
da761f3b1f Implement VT-d capability detection on chipsets that have multiple
translation units with differing capabilities

From the author via Bugzilla:
---
When an attempt is made to passthrough a PCI device to a bhyve VM
(causing initialisation of IOMMU) on certain Intel chipsets using
VT-d the PCI bus stops working entirely. This issue occurs on the
E3-1275 v5 processor on C236 chipset and has also been encountered
by others on the forums with different hardware in the Skylake
series.

The chipset has two VT-d translation units. The issue is caused by
an attempt to use the VT-d device-IOTLB capability that is
supported by only the first unit for devices attached to the
second unit which lacks that capability. Only the capabilities of
the first unit are checked and are assumed to be the same for all
units.

Attached is a patch to rectify this issue by determining which
unit is responsible for the device being added to a domain and
then checking that unit's device-IOTLB capability. In addition to
this a few fixes have been made to other instances where the first
unit's capabilities are assumed for all units for domains they
share. In these cases a mutual set of capabilities is determined.
The patch should hopefully fix any bugs for current/future
hardware with multiple translation units supporting different
capabilities.

A description is on the forums at
https://forums.freebsd.org/threads/pci-passthrough-bhyve-usb-xhci.65235
The thread includes observations by other users of the bug
occurring, and description as well as confirmation of the fix.
I'd also like to thank Ordoban for their help.

---
Personally tested on a Skylake laptop, Skylake Xeon server, and
a Xeon-D-1541, passing through XHCI and NVMe functions.  Passthru
is hit-or-miss to the point of being unusable without this
patch.

PR: 229852
Submitted by: callum@aitchison.org
MFC after: 1 week
2019-06-19 06:41:07 +00:00
Alan Cox
0b3c5f1bc9 Correct an error in r349122. pmap_unwire() should update the pmap's wired
count, not its resident count.

X-MFC with:	r349122
2019-06-19 03:33:00 +00:00
Alexander Motin
5c32e9fcb2 Optimize kern.geom.conf* sysctls.
On large systems those sysctls may generate megabytes of output.  Before
this change sbuf(9) code was resizing buffer by 4KB each time many times,
generating tons of TLB shootdowns.  Unfortunately in this case existing
sbuf_new_for_sysctl() mechanism, supposed to help with this issue, is not
applicable, since all the sbuf writes are done in different kernel thread.

This change improves situation in two ways:
 - on first sysctl call, not providing any output buffer, it sets special
sbuf drain function, just counting the data and so not needing big buffer;
 - on second sysctl call it uses as initial buffer size value saved on
previous call, so that in most cases there will be no reallocation, unless
GEOM topology changed significantly.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-06-18 21:05:10 +00:00
Conrad Meyer
22eedc9722 random(4): Fix a regression in short AES mode reads
In r349154, random device reads of size < 16 bytes (AES block size) were
accidentally broken to loop forever.  Correct the loop condition for small
reads.

Reported by:	pho
Reviewed by:	delphij
Approved by:	secteam(delphij)
Differential Revision:	https://reviews.freebsd.org/D20686
2019-06-18 18:50:58 +00:00
Ian Lepore
edd96b9fb9 Handle labels specified with hints even on FDT systems. Hints are the
easiest thing for a user to control (via loader.conf or kenv+kldload), so
handle them in addition to any label specified via the FDT data.
2019-06-18 17:05:05 +00:00
Ed Maste
05918954d8 Remove sys/capability.h for the third time
In all supported (and most unsupported) FreeBSD versions the appropriate
header for Capsicum is sys/capsicum.h.  Software including sys/capability.h
is most likely looking for Linux capabilities based on the withdrawn
POSIX.1e draft.

This header was previously removed in r334929 and r340156, but reverted
each time due to ports failures.  These issues have now (broadly) been
addressed.

PR:		228878 [exp-run]
Submitted by:	eadler (r334929)
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
2019-06-18 14:13:52 +00:00
Ian Lepore
780c3de886 Remove everything related to channels from the pwmc public interface, now
that there is a pwmc(4) instance per channel and the channel number is
maintained as a driver ivar rather than being passed in from userland.
2019-06-18 00:11:00 +00:00
Takanori Watanabe
e68fcc8875 Add ACPI support for USB driver.
This adds ACPI device path on devinfo(8) output and
show  value of _UPC(usb port capabilities), _PLD (physical location of device)
when hw.usb.debug >= 1 .

Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D20630
2019-06-17 23:03:30 +00:00
Conrad Meyer
179f62805c random(4): Fortuna: allow increased concurrency
Add experimental feature to increase concurrency in Fortuna.  As this
diverges slightly from canonical Fortuna, and due to the security
sensitivity of random(4), it is off by default.  To enable it, set the
tunable kern.random.fortuna.concurrent_read="1".  The rest of this commit
message describes the behavior when enabled.

Readers continue to update shared Fortuna state under global mutex, as they
do in the status quo implementation of the algorithm, but shift the actual
PRF generation out from under the global lock.  This massively reduces the
CPU time readers spend holding the global lock, allowing for increased
concurrency on SMP systems and less bullying of the harvestq kthread.

It is somewhat of a deviation from FS&K.  I think the primary difference is
that the specific sequence of AES keys will differ if READ_RANDOM_UIO is
accessed concurrently (as the 2nd thread to take the mutex will no longer
receive a key derived from rekeying the first thread).  However, I believe
the goals of rekeying AES are maintained: trivially, we continue to rekey
every 1MB for the statistical property; and each consumer gets a
forward-secret, independent AES key for their PRF.

Since Chacha doesn't need to rekey for sequences of any length, this change
makes no difference to the sequence of Chacha keys and PRF generated when
Chacha is used in place of AES.

On a GENERIC 4-thread VM (so, INVARIANTS/WITNESS, numbers not necessarily
representative), 3x concurrent AES performance jumped from ~55 MiB/s per
thread to ~197 MB/s per thread.  Concurrent Chacha20 at 3 threads went from
roughly ~113 MB/s per thread to ~430 MB/s per thread.

Prior to this change, the system was extremely unresponsive with 3-4
concurrent random readers; each thread had high variance in latency and
throughput, depending on who got lucky and won the lock.  "rand_harvestq"
thread CPU use was high (double digits), seemingly due to spinning on the
global lock.

After the change, concurrent random readers and the system in general are
much more responsive, and rand_harvestq CPU use dropped to basically zero.

Tests are added to the devrandom suite to ensure the uint128_add64 primitive
utilized by unlocked read functions to specification.

Reviewed by:	markm
Approved by:	secteam(delphij)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D20313
2019-06-17 20:29:13 +00:00
Cy Schubert
b8358917db Make ipf_objbytes a constant. ipf_objbytes is a table of internal data
structures that are saved across reboots by ipfs(8). The table is not
changed at runtime.

MFC after:	3 days
2019-06-17 20:10:55 +00:00
Xin LI
f89d207279 Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR:		229763
Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision:	https://reviews.freebsd.org/D20193
2019-06-17 19:49:08 +00:00
Ian Lepore
b5d67730ee Put the pwmc cdev filenames under the pwm directory along with any label
names.  I.e., everything related to pwm now goes in /dev/pwm.  This will
make it easier for userland tools to turn an unqualified name into a fully
qualified pathname, whether it's the base pwmcX.Y name or a label name.
2019-06-17 16:26:43 +00:00
Conrad Meyer
d0d71d818c random(4): Generalize algorithm-independent APIs
At a basic level, remove assumptions about the underlying algorithm (such as
output block size and reseeding requirements) from the algorithm-independent
logic in randomdev.c.  Chacha20 does not have many of the restrictions that
AES-ICM does as a PRF (Pseudo-Random Function), because it has a cipher
block size of 512 bits.  The motivation is that by generalizing the API,
Chacha is not penalized by the limitations of AES.

In READ_RANDOM_UIO, first attempt to NOWAIT allocate a large enough buffer
for the entire user request, or the maximal input we'll accept between
signal checking, whichever is smaller.  The idea is that the implementation
of any randomdev algorithm is then free to divide up large requests in
whatever fashion it sees fit.

As part of this, two responsibilities from the "algorithm-generic" randomdev
code are pushed down into the Fortuna ra_read implementation (and any other
future or out-of-tree ra_read implementations):

  1. If an algorithm needs to rekey every N bytes, it is responsible for
  handling that in ra_read(). (I.e., Fortuna's 1MB rekey interval for AES
  block generation.)

  2. If an algorithm uses a block cipher that doesn't tolerate partial-block
  requests (again, e.g., AES), it is also responsible for handling that in
  ra_read().

Several APIs are changed from u_int buffer length to the more canonical
size_t.  Several APIs are changed from taking a blockcount to a bytecount,
to permit PRFs like Chacha20 to directly generate quantities of output that
are not multiples of RANDOM_BLOCKSIZE (AES block size).

The Fortuna algorithm is changed to NOT rekey every 1MiB when in Chacha20
mode (kern.random.use_chacha20_cipher="1").  This is explicitly supported by
the math in FS&K §9.4 (Ferguson, Schneier, and Kohno; "Cryptography
Engineering"), as well as by their conclusion: "If we had a block cipher
with a 256-bit [or greater] block size, then the collisions would not
have been an issue at all."

For now, continue to break up reads into PAGE_SIZE chunks, as they were
before.  So, no functional change, mostly.

Reviewed by:	markm
Approved by:	secteam(delphij)
Differential Revision:	https://reviews.freebsd.org/D20312
2019-06-17 15:09:12 +00:00
Conrad Meyer
403c041316 random(4): Add regression tests for uint128 implementation, Chacha CTR
Add some basic regression tests to verify behavior of both uint128
implementations at typical boundary conditions, to run on all architectures.

Test uint128 increment behavior of Chacha in keystream mode, as used by
'kern.random.use_chacha20_cipher=1' (r344913) to verify assumptions at edge
cases.  These assumptions are critical to the safety of using Chacha as a
PRF in Fortuna (as implemented).

(Chacha's use in arc4random is safe regardless of these tests, as it is
limited to far less than 4 billion blocks of output in that API.)

Reviewed by:	markm
Approved by:	secteam(gordon)
Differential Revision:	https://reviews.freebsd.org/D20392
2019-06-17 14:59:45 +00:00
Ian Lepore
2c6c030ce2 Add back a const qualifier I somehow fumbled away between test-building
and committing recent changes.
2019-06-17 03:48:44 +00:00
Ian Lepore
0a06f11da3 Implement the ofw_bus_get_node method in aw_pwm(4) so that ofw_pwmbus can
find its metadata for instantiating children.
2019-06-17 03:40:00 +00:00
Ian Lepore
b43e2c8b56 Add ofw_pwmbus to enumerate pwmbus devices on systems configured with fdt
data.  Also, add fdt support to pwmc.
2019-06-17 03:32:05 +00:00
Alan Cox
29c25bd781 Eliminate a redundant call to pmap_invalidate_page() from
pmap_ts_referenced().

MFC after:	14 days
Differential Revision:	https://reviews.freebsd.org/D12725
2019-06-17 01:58:25 +00:00
Alan Cox
c8f59059d8 Three changes to arm64's pmap_unwire():
Implement wiring changes on superpage mappings.  Previously, a superpage
mapping was unconditionally demoted by pmap_unwire(), even if the wiring
change applied to the entire superpage mapping.

Rewrite a comment to use the arm64 names for bits in a page table entry.
Previously, the bits were referred to by their x86 names.

Use atomic_"op"_64() instead of atomic_"op"_long() to update a page table
entry in order to match the prevailing style in this file.

MFC after:	10 days
2019-06-16 22:13:27 +00:00
Nathan Whitehorn
4d210a60c3 Fix bug on newbus device deletion: we should delete the child's devinfo
on deletion, not the parent's.

MFC after:	3 weeks
2019-06-16 21:56:45 +00:00
Ian Lepore
0af7a9a451 Rework pwmbus and pwmc so that each child will handle a single PWM channel.
Previously, there was a pwmc instance for each instance of pwm hardware
regardless of how many pwm channels that hardware supported.  Now there
will be a pwmc instance for each channel when the hardware supports
multiple channels.  With a separate instance for each channel, we can have
"named channels" in userland by making devfs alias entries in /dev/pwm.

These changes add support for ivars to pwmbus, and use an ivar to track the
channel number for each child.  It also adds support for hinted children.

In pwmc, the driver checks for a label hint, and if present, it's used to
create an alias for the cdev in /dev/pwm.  It's not anticipated that hints
will be heavily used, but it's easy to do and allows quick ad-hoc creation
of named channels from userland by using kenv to create hint.pwmc.N.label=
hints.  Upcoming changes will add FDT support, and most labels will
probably be specified that way.
2019-06-16 19:44:42 +00:00
Alan Cox
bf13f9d279 Three enhancements to arm64's pmap_protect():
Implement protection changes on superpage mappings.  Previously, a superpage
mapping was unconditionally demoted by pmap_protect(), even if the
protection change applied to the entire superpage mapping.

Precompute the bit mask describing the protection changes rather than
recomputing it for every page table entry that is changed.

Skip page table entries that already have the requested protection changes
in place.

Reviewed by:	andrew, kib
MFC after:	10 days
Differential Revision:	https://reviews.freebsd.org/D20657
2019-06-16 16:45:01 +00:00
Ian Lepore
b71764df96 In detach(), call bus_generic_detach() before deleting the iicbus child.
This gives the bus and its children the chance to return EBUSY to abort
the detach if they're in the middle of doing some IO.
2019-06-16 16:02:50 +00:00
Ian Lepore
b93539730b Rename pwmbus.h to ofw_pwm.h, because after all the recent changes, there
is nothing left in the file that related to pwmbus at all.  It just contains
prototypes for the functions implemented in dev/pwm.ofw_pwm.c, so name it
accordingly and fix the include protect wrappers to match.

A new pwmbus.h will be coming along in a future commit.
2019-06-16 15:56:59 +00:00
Philip Paeps
5a037b1197 Add macOS-like three finger drag trackpad gesture to psm(4)
Submitted by:	Yan Ka Chiu <nyan@myuji.xyz>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20648
2019-06-16 03:06:05 +00:00
Ian Lepore
6e14f601aa Build SoC-specific modules with GENERIC for the SoCs that have them. 2019-06-16 01:23:45 +00:00
Ian Lepore
b9f654b163 Add module makefiles for Texas Instruments ARM SoCs.
The natural place to look for them based on how other SoCs are organized
would be sys/modules/ti, but that's already taken.  Drop a clue into
modules/ti/Makefile directing people to modules/arm_ti if they're looking
for ARM modules.
2019-06-16 01:22:44 +00:00
Ian Lepore
5935e64693 Split the dtb MODULES_EXTRA line to a series of += lines, making it easier
to maintain and keep in alphabetical order, and paving the way for adding
some other modules that aren't dtb-related.
2019-06-16 01:05:53 +00:00
Ian Lepore
e108c3df04 Add module makefiles for pwm. 2019-06-16 00:53:09 +00:00
Ian Lepore
e3384e8c44 This code no longer uses fdt/ofw stuff, no need to include ofw headers. 2019-06-16 00:43:05 +00:00
Ian Lepore
09ebe549ae Make channel number unsigned, and spell unsigned int u_int. This should
have been part of r349088.
2019-06-16 00:32:19 +00:00
Ian Lepore
aee0e20139 The pwm interface was replaced with pwmbus, include the right header file. 2019-06-16 00:27:11 +00:00
Ian Lepore
6cdbe2bf20 Make pwm channel numbers unsigned. 2019-06-15 23:02:09 +00:00
Ian Lepore
f8f8d87cd9 Restructure the pwm device hirearchy and interfaces.
The pwm and pwmbus interfaces were nearly identical, this merges them into a
single pwmbus interface.  The pwmbus driver now implements the pwmbus
interface by simply passing all calls through to its parent (the hardware
driver).  The channel_count method moves from pwm to pwmbus, and the
get_bus method is deleted (just no longer needed).

The net effect is that the interface for doing pwm stuff is now the same
regardless of whether you're a child of pwmbus, or some random driver
elsewhere in the hierarchy that is bypassing the pwmbus layer and is talking
directly to the hardware driver via cross-hierarchy connections established
using fdt data.

The pwmc driver is now a child of pwmbus, instead of being its sibling
(that's why the get_bus method is no longer needed; pwmc now gets the
device_t of the bus using device_get_parent()).
2019-06-15 22:25:39 +00:00
Ian Lepore
6bb8042535 Destroy the cdev on device detach. Also, make the driver and devclass
static, because nothing outside this file needs them.
2019-06-15 21:51:55 +00:00
Ian Lepore
9878710395 Rename the channel_max method to channel_count, because that's what it's
returning.  (If the channel count is 2, then the max channel number is 1.)
2019-06-15 21:36:14 +00:00
Ian Lepore
47e17a1b1a Give the aw_pwm driver a module version. 2019-06-15 21:31:04 +00:00
Ian Lepore
59d8a61ca7 Spell unsigned int as u_int and channel as chan; eliminates the need to wrap
some long lines.
2019-06-15 21:19:23 +00:00
Ian Lepore
cd6e47c168 Unwrap prototype lines so that return type and function name are on the
same line.  No functional changes.
2019-06-15 20:54:33 +00:00
Ian Lepore
968e5efcca Make pwmbus driver and devclass vars static; they're not mentioned in any
header file, so they can't be used outside this file anyway.
2019-06-15 20:53:26 +00:00
Ian Lepore
2d5913e4f4 Add a missing #include. I suspect this used to get included via some header
pollution that was cleaned up recently, and this file got missed in the
cleanup because it's not attached to the build unless you specifically
request this device in a custom kernel config.
2019-06-15 20:20:36 +00:00
Ian Lepore
1e76aee880 Use device_delete_children() instead of a locally-rolled copy of it that
leaks the device-list memory.
2019-06-15 20:17:00 +00:00
Ian Lepore
3cee44bc88 Remove pwmbus_attach_bus(), it no longer has any callers. Also remove a
couple prototypes for functions that never existed (and never will).
2019-06-15 20:13:42 +00:00
Ian Lepore
71fb373934 Move/rename the sys/pwm.h header file to dev/pwm/pwmc.h. The file contains
ioctl definitions and related datatypes that allow userland control of pwm
hardware via the pwmc device.  The new name and location better reflects its
assocation with a single device driver.
2019-06-15 19:46:59 +00:00
Ian Lepore
f9d8090ea8 Do not include pwm.h here, it is purely a userland interface file containing
ioctl defintions for the pwmc driver. It is not part of the pwmbus interface.
2019-06-15 19:43:33 +00:00
Alan Cox
3b97569c31 Previously, when pmap_remove_pages() destroyed a dirty superpage mapping,
it only called vm_page_dirty() on the first of the superpage's constituent
4KB pages.  This revision corrects that error, calling vm_page_dirty() on
all of superpage's constituent 4KB pages.

MFC after:	3 days
2019-06-15 17:26:42 +00:00
Ian Lepore
4d7ef8a2be Handle failure to enable the clock or obtain its frequency. 2019-06-15 16:59:03 +00:00
Ian Lepore
1bf4afc527 Don't call pwmbus_attach_bus(), because it may not be present if this
driver is compiled into the kernel but pwmbus will be loaded as a module
when needed (and because of that, pwmbus_attach_bus() is going away in
the near future).  Instead, just directly do what that function did:
register the fdt xfef handle, and attach the pwmbus.
2019-06-15 16:56:00 +00:00
Ian Lepore
5ef9c1079e In detach(), check for failure of bus_generic_detach(), only release
resources if they got allocated (because detach() gets called from attach()
to handle various failures), and delete the pwmbus child if it got created.
2019-06-15 16:36:29 +00:00
Ian Lepore
dd47326c82 Allow pwm(9) components to be selected individually, while 'device pwm'
still includes it all.
2019-06-15 16:16:29 +00:00
Marius Strobl
d49e83eac3 - Replace unused and only ever written to members of public iflib(9)
structs with placeholders (in the latter case, IFLIB_MAX_TX_BYTES
  etc. are also only ever used for these write-only members if at all,
  so both these macros and members can just go). Using these spares
  may render it possible to merge certain iflib(9) fixes to stable/12.
  Otherwise, changes extending struct if_irq or struct if_shared_ctx
  in any way would break KBI as instances of these are allocated by
  the driver front-ends (by contrast, struct if_pkt_info as well as
  struct if_softc_ctx instances are provided by iflib(9) and, thus,
  may grow at least at the end without breaking KBI).
- Make the pvi_name in struct pci_vendor_info const char * as device
  identifiers in hardware lookup tables aren't to be expected to ever
  change at runtime.
- Similarly, make the pci_vendor_info_t of struct if_shared_ctx which
  is used to point to the struct pci_vendor_info arrays provided by
  the driver front-ends const.
- Remove the ETH_ADDR_LEN macro from iflib.h; this was duplicating
  ETHER_ADDR_LEN of <net/ethernet.h> with iflib(9) actually only
  consuming the latter macro.
- Make the name argument of iflib_io_tqg_attach(9) const, matching
  the taskqgroup_attach_cpu(9) this function wraps as well as e. g.
  iflib_config_gtask_init(9).
- Remove the orphaned iflib_qset_lock_get() prototype.
- Remove some extraneous empty lines.
2019-06-15 11:07:41 +00:00
Doug Moore
4766eba1df Critical comments were lost in r349203. This patch seeks to restore
the lost information in new comments.

Reported by: alc
Reviewed by: alc
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20632
2019-06-15 04:30:13 +00:00
Julian Elischer
c749d68596 Lightly hide the 'var' inside the macros to read the arm special registers.
I just happenned to have 3rd party code using 'var' as the output variable
which drew my attention to this. variables defined inside macros should be
prefixed to avoid getting shadowed varable wanrings from clang.
2019-06-15 00:47:39 +00:00
Alan Cox
26066aef10 Batch the TLB invalidations that are performed by pmap_protect() rather
than performing them one at a time.

MFC after:	10 days
2019-06-14 22:06:43 +00:00
Alexander Motin
3bae917061 Minimize aggsum_compare(&arc_size, arc_c) calls.
For busy ARC situation when arc_size close to arc_c is desired.  But
then it is quite likely that aggsum_compare(&arc_size, arc_c) will need
to flush per-CPU buckets to find exact comparison result.  Doing that
often in a hot path penalizes whole idea of aggsum usage there, since it
replaces few simple atomic additions with dozens of lock acquisitions.

Replacing aggsum_compare() with aggsum_upper_bound() in code increasing
arc_p when ARC is growing (arc_size < arc_c) according to PMC profiles
allows to save ~5% of CPU time in aggsum code during sequential write
to 12 ZVOLs with 16KB block size on large dual-socket system.

I suppose there some minor arc_p behavior change due to lower precision
of the new code, but I don't think it is a big deal, since it should
affect only very small window in time (aggsum buckets are flushed every
second) and in ARC size (buckets are limited to 10 average ARC blocks
per CPU).

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-06-14 20:04:28 +00:00
Alexander Motin
b3b3aa2e29 Alike to ZoL disable metaslab allocation tracing code.
It is too generous to collect in production debug traces that can only
be read with kernel debugger.  Illumos includes special code in their
mdb debugger to read it, we don't.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-06-14 19:57:32 +00:00
Alexander Motin
284e53a401 Properly align struct multilist_sublist to cache line.
Manual Illumos alignment does not fit us due to different kmutex_t size.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-06-14 17:09:39 +00:00
Alan Cox
aa4a1d3a55 Change the arm64 pmap so that updates to the global count of wired pages are
not performed directly by the pmap.  Instead, they are performed by
vm_page_free_pages_toq().  (This is the same approach that we use on x86.)

Reviewed by:	kib, markj
MFC after:	10 days
Differential Revision:	https://reviews.freebsd.org/D20627
2019-06-14 04:01:08 +00:00
Doug Moore
771315283b Avoid using the prev field of vm_map_entry_t in two functions that
iterate over consecutive vm_map entries, and that can easily just
'remember' the prev value instead of looking it up.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20628
2019-06-14 03:15:54 +00:00
Alexander Motin
c80038a0a7 Update td_runtime of running thread on each statclock().
Normally td_runtime is updated on context switch, but there are some kernel
threads that due to high absolute priority may run for many seconds without
context switches (yes, that is bad, but that is true), which means their
td_runtime was not updated all that time, that made them invisible for top
other then as some general CPU usage.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-06-14 01:09:10 +00:00
Warner Losh
1682a3ab4b Add opt_cam.h so we can build this outside of a kernel build. 2019-06-13 22:03:53 +00:00
Doug Moore
af1d6d6a11 Create a function for creating objects to back map entries, and one
for giving cred to a map entry backed by an object, and use them
instead of the code duplicated inline now.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20370
2019-06-13 20:09:07 +00:00
Warner Losh
2e3d6d0f5f Don't print the request we may be aborting in ciss_notify_abort as
part of ciss_detach. It's a left-over debug that isn't needed and also
discloses a kernel address. Only root could provoke as part of a
devctl or kldunload.

Submitted by: Fuqian Huang
MFC After: 1 week
2019-06-13 05:19:42 +00:00
Alexander Motin
913095dc56 Move write aggregation memory copy out of vq_lock.
Memory copy is too heavy operation to do under the congested lock.
Moving it out reduces congestion by many times to almost invisible.
Since the original zio removed from the queue, and the child zio is
not executed yet, I don't see why would the copy need protection.
My guess it just remained like this from the time when lock was not
dropped here, which was added later to fix lock ordering issue.

Multi-threaded sequential write tests with both HDD and SSD pools
with ZVOL block sizes of 4KB, 16KB, 64KB and 128KB all show major
reduction of lock congestion, saving from 15% to 35% of CPU time
and increasing throughput from 10% to 40%.

Reviewed by:	ahrens, behlendorf, ryao
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-06-13 01:21:32 +00:00
Bryan Drewery
61b54f34a3 Don't delete .depend files outside of cleandepend.
Sponsored by:	DellEMC
2019-06-12 23:09:10 +00:00
Alan Cox
a04cd5cdee Change pmap_demote_l2_locked() so that it removes the superpage mapping on a
demotion failure.  Otherwise, some callers to pmap_demote_l2_locked(), such
as pmap_protect(), may leave an incorrect mapping in place on a demotion
failure.

Change pmap_demote_l2_locked() so that it handles addresses that are not
superpage aligned.  Some callers to pmap_demote_l2_locked(), such as
pmap_protect(), may not pass a superpage aligned address.

Change pmap_enter_l2() so that it correctly calls vm_page_free_pages_toq().
The arm64 pmap is updating the count of wired pages when freeing page table
pages, so pmap_enter_l2() should pass false to vm_page_free_pages_toq().

Optimize TLB invalidation in pmap_remove_l2().

Reviewed by:	kib, markj (an earlier version)
Discussed with:	andrew
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D20585
2019-06-12 20:38:49 +00:00
Mariusz Zaborski
a802439365 geli: style nits 2019-06-12 19:29:48 +00:00
Mariusz Zaborski
e7630efbe6 geli: partially revert r348709
Let's change the unsigned arguments to the signed one, but let's don't
change pointers to the array notation.

Requested by:	pjd
2019-06-12 19:29:12 +00:00
Stephen Hurd
705aad98c6 Some devices take undesired actions when RTS and DTR are
asserted. Some development boards for example will reset on DTR,
and some radio interfaces will transmit on RTS.

This patch allows "stty -f /dev/ttyu9.init -rtsdtr" to prevent
RTS and DTR from being asserted on open(), allowing these devices
to be used without problems.

Reviewed by:    imp
Differential Revision:  https://reviews.freebsd.org/D20031
2019-06-12 18:07:04 +00:00
Jonathan T. Looney
1524298754 The current IPMI KCS code is waiting 100us for all transitions (roughly
between each byte either sent or received). However, most transitions
actually complete in 2-3 microseconds.

By polling the status register with a delay of 4us with exponential
backoff, the performance of most IPMI operations is significantly
improved:
  - A BMC update on a Supermicro x9 or x11 motherboard goes from ~1 hour
    to ~6-8 minutes.
  - An ipmitool sensor list time improves by a factor of 4.

Testing showed no significant improvements on a modern server by using
a lower delay.

The changes should also generally reduce the total amount of CPU or
I/O bandwidth used for a given IPMI operation.

Submitted by:	Loic Prylli <lprylli@netflix.com>
Reviewed by:	jhb
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20527
2019-06-12 16:06:31 +00:00
Ian Lepore
3a3ab50916 Don't attempt to include hwpmc support for armv6, we're missing some of the
necessary support functions in cpu-v6.h, and it may be that the only armv6
platform we support (RPi, the bcm2835 SOC) is incapable of supporting hwpmc.

Reported by:	dim@
2019-06-12 16:05:20 +00:00
Brandon Bergren
664104b4af Fix PPC970 boot after r348783
r348783 changed the behavior of the kernel mappings and broke booting on G5.

- Split the kernel mapping logic out so that the case where we are
running from the wrong memory space is handled using identity
mappings, and the case where we are not using a DMAP is handled by
forcibly mapping the kernel into the dmap range as intended by
r348783.

Reported by:	Mikael Urankar
Reviewed by:	luporl
Approved by:	jhibbits (mentor)
Differential Revision:	https://reviews.freebsd.org/D20608
2019-06-12 15:58:11 +00:00
Cy Schubert
6ee97dd9a3 Whitespace adjustments replacing spaces with tabs.
MFC after:	1 month
X-MFC with:	r348987
2019-06-12 11:18:11 +00:00
Cy Schubert
394fa2b515 Resolve IPv6 checksum errors with stateful inspection. According to
PR/203585 this appears to have been broken by r235959, which predates
the ipfilter 5.1.2 import into FreeBSD.

The IPv6 checksum calculation is incorrect. To resolve this we call
in6_cksum() to do the the heavy lifting for us, through a new function
ipf_pcksum6(). Should we need to revisit this area again, a DTrace probe
is added to aid with future debugging.

PR:		203275, 203585
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D20583
2019-06-12 11:06:58 +00:00
Cy Schubert
6000630b72 Register pfil hooks when VNET != vnet0. r302298, which virtualized ipf,
assumed the pfil hook registration performed in ipf_modload() would take
care of this. However ipf_modload() is only called when the ipl kld is
loaded or when ipfilter is first called when it is statically linked
into the kernel at build time.

Prior to this, even though r302298 has been in the tree for a while, it
has never been used. So, r302298 in reality begins now.

PR:		212000
Reported by:	ahsanb@
MFC after:	1 month
2019-06-12 11:06:54 +00:00
Cy Schubert
61208bb681 Enclose a long multi-line single conditional statement in braces to
improve legibility and aesthetics.

MFC after:	1 week
2019-06-12 11:06:51 +00:00
Bryan Drewery
f68801df10 Stop using .OODATE for extracting firmware.
This fixes META_MODE rebuilding since it assumes that it this is
a non-consistent build command. These are always unencoded consistently
though and do not need to use the .OODATE/$? mechanism.

MFC after:	2 weeks
Reported by:	npn
Sponsored by:	DellEMC
2019-06-12 00:03:00 +00:00
Bryan Drewery
c40ecff845 Add missing DPSRCS entry for assym.inc.
This brings in various CLEANFILES/DEPENDOBJS handling.

MFC after:	2 weeks
Sponsored by:	DellEMC
2019-06-11 23:35:49 +00:00
Bryan Drewery
9d2b4c4fa3 Restore genassym.o to CLEANFILES.
This was lost in r335910 for some reason.

This also fixes a META_MODE rebuild issue in some modules [1].

MFC after:	2 weeks
Reported by:	npn [1]
Sponsored by:	DellEMC
2019-06-11 23:35:34 +00:00
John Baldwin
a0c4047d4d Move declaration of warninterval out from under COMPAT_FREEBSD32.
This fixes builds of kernels without COMPAT_FREEBSD32.

Reported by:	tinderbox
MFC after:	1 month
2019-06-11 23:28:07 +00:00
John Baldwin
0f70218343 Make the warning intervals for deprecated crypto algorithms tunable.
New sysctl/tunables can now set the interval (in seconds) between
rate-limited crypto warnings.  The new sysctls are:
- kern.cryptodev_warn_interval for /dev/crypto
- net.inet.ipsec.crypto_warn_interval for IPsec
- kern.kgssapi_warn_interval for KGSSAPI

Reviewed by:	cem
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20555
2019-06-11 23:00:55 +00:00
Doug Moore
e65d58a0fe To test to see if a free space is big enough compare the required
length to the difference of the two offsets that define the gap, to
avoid overflow, rather that adding the length to an offset and
comparing that to another offset.

This addresses an overflow issue reported by Peter Holm on i386.

Reported by: pho
Tested by: pho
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20594
2019-06-11 22:41:39 +00:00
John Baldwin
77a0144145 Sort opt_foo.h #includes and add a missing blank line in ip_output(). 2019-06-11 22:07:39 +00:00
John Baldwin
dbdfb1fd19 Add M_NOFREE to M_FLAG_BITS. 2019-06-11 22:06:31 +00:00
John Baldwin
19c40c76e1 Trim an extra space. 2019-06-11 22:06:05 +00:00
Warner Losh
15865ae73d Minor white space changes.
Remove trailing white space that's crept into this file.
2019-06-11 20:48:19 +00:00
Leandro Lupori
49f10b5181 [PPC] Fix build error when POWERNV is disabled
When building a kernel supporting PSERIES but not POWERNV,
the compiler would complain about an error variable being
possibly used before being initialized.

In practice, however, this should never happen. In any case, it
is now initialized to an error value.
2019-06-11 11:23:15 +00:00
Leandro Lupori
5d8c3a0657 [PPC64] Fix ofw_initrd
Before this change, OFW initrd (as md) handling code was simulating an ofwbus
device. But as there isn't really a Device Tree (DT) node representing OFW
initrd (it is specified in 2 properties under /chosen), its driver was in fact
stealing other driver's DT node.  This was noticed after MD_ROOT_MEM became
default and QEMU's USB keyboard stopped working under VNC.

This change consists in simplifying the process of detection and mapping of
initrd memory, turning it into a simple startup step, instead of trying to
simulate a device.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D20553
2019-06-11 11:16:41 +00:00
Mitchell Horne
ffedb98b3e RISC-V: expose extension bits in AT_HWCAP
AT_HWCAP is a field in the elf auxiliary vector meant to describe
cpu-specific hardware features. For RISC-V we want to use this to
indicate the presence of any standard extensions supported by the CPU.
This allows userland applications to query the system for supported
extensions using elf_aux_info(3).

Support for an extension is indicated by the presence of its
corresponding bit in AT_HWCAP -- e.g. systems supporting the 'c'
extension (compressed instructions) will have the second bit set.

Extensions advertised through AT_HWCAP are only those that are supported
by all harts in the system.

Reviewed by:	jhb, markj
Approved by:	markj (mentor)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20493
2019-06-11 00:55:54 +00:00
Bjoern A. Zeeb
8c9ca28b31 A bit of code hygiene (no functional changes).
Hide unused code under #ifdef notyet (in one case the only caller is under
that same ifdef), or if it is arm (not arm64) specific code under the
__arm__ ifdef to not yield -Wunused-function warnings during the arm64
kernel compile.

MFC after:	2 weeks
2019-06-10 23:25:40 +00:00
Doug Moore
f8c8b2e8a0 r348879 introduced a wrong-way comparison that broke mmap.
This change rights that comparison.

Reported by: pho
Approved by: markj (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D20595
2019-06-10 22:06:40 +00:00
Luiz Otavio O Souza
1251a82da6 Add support for the GPIO SD Card VCC regulator/switch and the GPIO SD Card
detection pins to the Marvell Xenon SDHCI controller.

These features are enable by 'vqmmc-supply' and 'cd-gpios' properties in the
DTS.

This fixes the SD Card detection on espressobin.

Sponsored by:	Rubicon Communications, LLC (Netgate)
2019-06-10 21:50:07 +00:00
Doug Moore
5a0879da80 The computations of vm_map_splay_split and vm_map_splay_merge touch both
children of every entry on the search path as part of updating values of
the max_free field. By comparing the max_free values of an entry and its
child on the search path, the code can avoid accessing the child off the
path in cases where the max_free value decreases along the path.

Specifically, this patch changes splay_split so that the max_free field
of every entry on the search path is replaced, temporarily, by the
max_free field from its child not on the search path or, if the child
in that direction is NULL, then a difference between start and end
values of two pointers already available in the split code, without
following any next or prev pointers. However, to find that max_free
value does not require looking toward that other child if either the
child on the search path has a lower max_free value, or the current max_free
value is zero, because in either case we know that the value of max_free for
the other child is the value we already have. So, the changes to
vm_entry_splay_split make sure that we know all the off-search-path entries
we will need to complete the splay, without looking at all of them. There is
an exception at the bottom of the search path where we cannot rely on the
max_free value in the direction of the NULL pointer that ends the search,
because of the behavior of entry-clipping code.

The corresponding change to vm_splay_entry_merge makes it simpler, since it's
just reversing pointers and updating running maxima.

In a test intended to exercise vigorously the vm_map implementation, the
effect of this change was to reduce the data cache miss rate by 10-14% and
the running time by 5-7%.

Tested by: pho
Reviewed by: alc
Approved by: kib (mentor)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D19826
2019-06-10 21:34:07 +00:00
Luiz Otavio O Souza
8c62ce83cc Add the GPIO driver for the North/South bridge in Marvell Armada 37x0.
The A3700 has a different GPIO controller and thus, do not use the old (and
shared) code for Marvell.

The pinctrl driver, also part of the controller, is not supported yet (but
the implementation should be straightforward).

Sponsored by:	Rubicon Communications, LLC (Netgate)
2019-06-10 21:27:21 +00:00
Doug Moore
77555b849d Change the check for 'size' wrapping around to zero in kern_mmap to account
for both the lower and upper bound modifications. Change the error returned
to ENOMEM. Rename the parameter size to len and make size a local variable
that stores the value of len after it has been modified.

This addresses concerns expressed by Bruce Evans after r348843.

Reported by: brde@optusnet.com.au
Reviewed by: kib, markj (mentors)
MFC after: 3 days
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20592
2019-06-10 21:26:14 +00:00
Bjoern A. Zeeb
0422393286 Add a bus_add_child device method to bcm2835_sdhci.
This allows SDIO (through CAM) to attach to an upstream, e.g.,
      ..
      sdhci_bcm0 pnpinfo name=mmc@7e300000 compat=brcm,bcm2835-mmc
        sdiob0
          ..

Without this, upon trying to load sdio, we would panic with
"bus_add_child is not implemented".

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-06-10 21:24:38 +00:00
John Baldwin
5e35041990 Add warnings to /dev/crypto for deprecated algorithms.
These algorithms are deprecated algorithms that will have no in-kernel
consumers in FreeBSD 13.  Specifically, deprecate the following
algorithms:
- ARC4
- Blowfish
- CAST128
- DES
- 3DES
- MD5-HMAC
- Skipjack

MFC after:	1 month
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20554
2019-06-10 19:26:57 +00:00
John Baldwin
db4709c579 Add warnings for Kerberos GSS algorithms deprecated in RFCs 6649 and 8429.
All of these algorithms are explicitly marked SHOULD NOT in one of these
RFCs.

Specifically, RFC 6649 deprecates all algorithms using DES as well as
the "export-friendly" variant of RC4.  RFC 8429 deprecates Triple DES
and the remaining RC4 algorithms.

Reviewed by:	cem
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20343
2019-06-10 19:22:36 +00:00
John Baldwin
0b96ca3310 Remove an overly-aggressive assertion.
While it is true that the new vmspace passed to vmspace_switch_aio
will always have a valid reference due to the AIO job or the extra
reference on the original vmspace in the worker thread, it is not true
that the old vmspace being switched away from will have more than one
reference.

Specifically, when a process with queued AIO jobs exits, the exit hook
in aio_proc_rundown will only ensure that all of the AIO jobs have
completed or been cancelled.  However, the last AIO job might have
completed and woken up the exiting process before the worker thread
servicing that job has switched back to its original vmspace.  In that
case, the process might finish exiting dropping its reference to the
vmspace before the worker thread resulting in the worker thread
dropping the last reference.

Reported by:	np
Reviewed by:	alc, markj, np, imp
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20542
2019-06-10 19:01:54 +00:00
Niclas Zeising
2dd9a967d2 psm(4): Enable touchpads and trackpads by default
Enable synaptics and elantech touchpads, as well as IBM/Lenovo TrackPoints
by default, instead of having users find and toggle a loader tunable.
This makes things like two finger scroll and other modern features work out
of the box with X.  By enabling these settings by default, we get a better
desktop experience in X, since xserver and evdev can make use of the more
advanced synaptics and elantech features.

Reviewed by:	imp, wulf, 0mp
Approved by:	imp
Sponsored by:	B3 Init (zeising)
Differential Revision:	https://reviews.freebsd.org/D20507
2019-06-10 18:19:49 +00:00
Bjoern A. Zeeb
cd02c6b10d Enhance the comment ieee80211_add_channel() to avoid a
misunderstanding that the function does not work additive
when repeatedly called for diffferent bands.

Reviewed by:	avos (a few months ago)
MFC after:	2 weeks
2019-06-10 14:31:18 +00:00
Bjoern A. Zeeb
440565dad1 allwinner mmc: move variable assignment into block
"blksz is only used in one of the two blocks, so assign it inside
that block rather than outside.

MFC after:	2 weeks
2019-06-10 13:46:36 +00:00
Dmitry Chagin
a5ec4a9dba Use C11 anonymous unions.
PR:		215202
Reported by:	glebius
MFC after:	2 weeks
2019-06-10 05:28:03 +00:00
Justin Hibbits
fdb916d53e powernv: Port HMI handler to use the message framework
When an HMI occurs a message event also gets created with the details of the
exception.  Hook into the messaging framework to retrieve the HMI message.
Nothing is done with it yet, except to panic on unhandled exception.
2019-06-10 03:24:38 +00:00
Justin Hibbits
f433dab2de powerpc/powernv: Reduce the scope of the sensor guarding mutex
vmem_xalloc() cannot be called while holding a nonblocking mutex, warned
by WITNESS.  The lock may not be necessary in general, but it avoids
superfluous concurrent OPAL calls for the same sensor.

Reported by:	pkubaj
2019-06-10 03:16:55 +00:00
Doug Moore
97220a279f There are times when a len==0 parameter to mmap is okay. But on a
32-bit machine, a len parameter just a few bytes short of 4G, rounded
up to a page boundary and hitting zero then, is not okay. Return
failure in that case.

Reported by: pho
Reviewed by: alc, kib (mentor)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D20580
2019-06-10 03:07:10 +00:00
Mitchell Horne
ff0e7938f2 Remove unused mcall_trap() function
The mcall_trap() dummy function is unused, and should be removed as we
are unlikely to support M-mode traps any time soon.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D20494
2019-06-09 15:52:26 +00:00
Mitchell Horne
c04c594daa RISC-V: Clean up some GENERIC options
Some of the config options that are disabled by default seem to be only
for historical reasons. Enable those that appear to no longer be
problematic. This includes WITH_CTF, STACK, GEOM_RAID, and re-enabling
blacklisted kernel modules.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20495
2019-06-09 15:50:35 +00:00
Mitchell Horne
bffa317ff2 RISC-V: Announce real and available memory at boot
Most architectures print their total (real) and available memory during
boot. Properly initialize the realmem global and print these messages.
Also print the physical memory chunks (behind a bootverbose flag).

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D20496
2019-06-09 15:48:36 +00:00
Mitchell Horne
93ca8057c5 Add TSLOG events to initriscv()
Add the enter and exit events, similar to what's found in
hammer_time() on amd64. We must use TSRAW as the pcpu isn't yet
initialized.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D20497
2019-06-09 15:45:48 +00:00
Mitchell Horne
6ae48dd870 Fix global pointer relaxations in the RISC-V kernel
The gp register is intended to used by the linker as another means of
performing relaxations, and should point to the small data section (.sdata).

Currently gp is being used as the pcpu pointer within the kernel, but the more
appropriate choice for this is the tp register, which is unused.

Swap existing usage of gp with tp within the kernel, and set up gp properly
at boot with the value of __global_pointer$ for all harts.

Additionally, remove some cases of accessing tp from the PCB, as it is not
part of the per-thread state. The user's tp and gp should be tracked only
through the trapframe.

Reviewed by:	markj, jhb
Approved by:	markj (mentor)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19893
2019-06-09 15:43:38 +00:00
Mitchell Horne
fc261c16bd Remove block of dead code
Approved by:	markj (mentor)
2019-06-09 15:36:51 +00:00
Alan Cox
1fe65d054b Correct a new KASSERT() in r348828.
X-MFC with:	r348828
2019-06-09 05:55:58 +00:00
Alan Cox
fd2dae0a30 Implement an alternative solution to the amd64 and i386 pmap problem that we
previously addressed in r348246.

This pmap problem also exists on arm64 and riscv.  However, the original
solution developed for amd64 and i386 cannot be used on arm64 and riscv.  In
particular, arm64 and riscv do not define a PG_PROMOTED flag in their level
2 PTEs.  (A PG_PROMOTED flag makes no sense on arm64, where unlike x86 or
riscv we are required to break the old 4KB mappings before making the 2MB
mapping; and on riscv there are no unused bits in the PTE to define a
PG_PROMOTED flag.)

This commit implements an alternative solution that can be used on all four
architectures.  Moreover, this solution has two other advantages.  First, on
older AMD processors that required the Erratum 383 workaround, it is less
costly.  Specifically, it avoids unnecessary calls to pmap_fill_ptp() on a
superpage demotion.  Second, it enables the elimination of some calls to
pagezero() in pmap_kernel_remove_{l2,pde}().

In addition, remove a related stale comment from pmap_enter_{l2,pde}().

Reviewed by:	kib, markj (an earlier version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20538
2019-06-09 03:36:10 +00:00
Vladimir Kondratyev
6c53fea7d6 psm(4): Add extra sanity checks to Elantech trackpoint packet parser.
Add strict checks for unused bit states in Elantech trackpoint packet
parser to filter out spurious events produces by some hardware which
are detected as trackpoint packets. See comment on r328191 for example.

Tested by:	Andrey Kosachenko <andrey.kosachenko@gmail.com>
2019-06-08 21:36:22 +00:00
Vladimir Kondratyev
8fa4620039 psm(4): Fix Elantech trackpoint support.
Sign bits for X and Y motion data were taken from wrong places.

PR:		238291
Reported by:	Andrey Kosachenko <andrey.kosachenko@gmail.com>
Tested by:	Andrey Kosachenko <andrey.kosachenko@gmail.com>
MFC after:	2 weeks
2019-06-08 21:33:34 +00:00
Konstantin Belousov
452a2db863 Style MAP_ENTRY_ and MAP_ definitions.
Spell all bits in the hex constants.
Since all lines are modified, consistently use <tab> after #define.

Reviewed by:	alc (previous version), dougm
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D20560
2019-06-08 20:28:04 +00:00
Konstantin Belousov
2f73a6e9c4 Correct definition for PGEX_SGX.
At the moment it is only used for page fault error code textual
representation.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-06-08 20:26:04 +00:00
Konstantin Belousov
8b49f4dd80 Make trap_msg array constant as well.
Suggested by:	tijl
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-08 19:50:57 +00:00
Jonathan T. Looney
ca8929d2a3 Currently, MCA entries remain on an every-growing linked list. This means
that it becomes increasingly expensive to process a steady stream of
correctable errors. Additionally, the memory used by the MCA entries can
grow without bound.

Change the code to maintain two separate lists: a list of entries which
still need to be logged, and a list of entries which have already been
logged. Additionally, allow a user-configurable limit on the number of
entries which will be saved after they are logged. (The limit defaults
to -1 [unlimited], which is the current behavior.)

Reviewed by:	imp, jhb
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20482
2019-06-08 18:26:48 +00:00
Doug Moore
7c022327ab Simple code refactoring originally in D13484.
Extract swp_pager_force_dirty() and swp_pager_force_launder() out of
swp_pager_force_pagein().

Extract swap_pager_swapoff_object() out of swap_pager_swapoff().

Submitted by: ota_j.email.ne.jp
Reviewed by: alc, dougm
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20545
2019-06-08 17:49:17 +00:00
Bjoern A. Zeeb
4c62bffef5 Fix dpcpu and vnet panics with complex types at the end of the section.
Apply a linker script when linking i386 kernel modules to apply padding
to a set_pcpu or set_vnet section.  The padding value is kind-of random
and is used to catch modules not compiled with the linker-script, so
possibly still having problems leading to kernel panics.

This is needed as the code generated on certain architectures for
non-simple-types, e.g., an array can generate an absolute relocation
on the edge (just outside) the section and thus will not be properly
relocated. Adding the padding to the end of the section will ensure
that even absolute relocations of complex types will be inside the
section, if they are the last object in there and hence relocation will
work properly and avoid panics such as observed with carp.ko or ipsec.ko.

There is a rather lengthy discussion of various options to apply in
the mentioned PRs and their depends/blocks, and the review.
There seems no best solution working across multiple toolchains and
multiple version of them, so I took the liberty of taking one,
as currently our users (and our CI system) are hitting this on
just i386 and we need some solution.  I wish we would have a proper
fix rather than another "hack".

Also backout r340009 which manually, temporarily fixed CARP before 12.0-R
"by chance" after a lead-up of various other link-elf.c and related fixes.

PR:			230857,238012
With suggestions from:	arichardson (originally last year)
Tested by:		lwhsu
Event:			Waterloo Hackathon 2019
Reported by:		lwhsu, olivier
MFC after:		6 weeks
Differential Revision:	https://reviews.freebsd.org/D17512
2019-06-08 17:44:42 +00:00
Bjoern A. Zeeb
6e33e7e0f9 Remove extra stray + from a diff from the beginning of the lines after
r348805 to fix the build.  Please do not ask how 3 more local builds
succeeded without barfing.

Pointyhat to:		bz
MFC after:		6 weeks
X-MFC with:		r348805
2019-06-08 17:38:27 +00:00
Bjoern A. Zeeb
67ca7330cf Add SDIO support.
Add a CAM-Newbus SDIO support module.  This works provides a newbus
infrastructure for device drivers wanting to use SDIO.  On the lower end
while it is connected by newbus to SDHCI, it talks CAM using the MMCCAM
framework to get to it.

This also duplicates the usbdevs framework to equally create sdiodev
header files with #defines for "vendors" and "products".

Submitted by:	kibab (initial work, see https://reviews.freebsd.org/D12467)
Reviewed by:	kibab, imp (comments on earlier version)
MFC after:	6 weeks
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19749
2019-06-08 16:26:56 +00:00
Bjoern A. Zeeb
9c907eb913 bcm2835_sdhci.c: exit DMA if not enough data left to avoid timeout errors
In the DMA case, given we disable the data interrupts, we never seem
to get DATA_END.  Given we are relying on DMA interrupts we are not
using the SDHCI state machine and hence only call into
sdhci_platform_will_handle() for the first check of data.
We do not call "will handle" for any following round trips of the same
transaction if block size * count > BCM_DMA_BLOCK_SIZE.
Manually check "left" in the DMA interrupt handler to see if we have at
least another full BCM_DMA_BLOCK_SIZE to handle.
Without this change we would DMA that and then even start a DMA with
left == 0 which would lead to a timeout and error.
Now we re-enable data interrupts and return and let the SDHCI generic
interrupt handler and state machine pick the SPACE_AVAIL up and then
find that it should punt to the pio_handler for the remaining bytes
or finish the data transaction.

With this change block mode seems to work beyond 7 * 64byte blocks,
which worked as it was below BCM_DMA_BLOCK_SIZE.

MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D20199
2019-06-08 16:15:00 +00:00
Bjoern A. Zeeb
901491d025 bcm2835_sdhci.c: save block registers to avoid controller bug
Extending what the initial revision, r273264, r276985, r277346 have
started for the transfer mode and command registers, another pair of
16bit registers written in sequence are block size and block count,
which fall together onto the same 32bit line and hence the same
register(s) would be written twice in sequence for those as well.

Use a similar approach to transfer mode and command and save the writes
to either of the block regiters and then only execute a write once.
We can do this as with transfer mode their values are meaningless until
a command is issued so we can use that write to command as a trigger
to also write out the block registers.
Compared to transfer mode and command the value of block count can
change, so we need to keep state and actually read the block registers
back the first time after a write.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20197
2019-06-08 16:05:43 +00:00
Konstantin Belousov
99b81dcb9e Remove lazy FPU switch support from amd64.
It is incompatible with some future features.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-08 16:03:34 +00:00
Bjoern A. Zeeb
27d72fe14a Improve sdhci slot_printf() debug printing.
Currently slot_printf() uses two printf() calls to print the
device-slot name, and actual message. When other printf()s are
ongoing in parallel this can lead to interleaved message on the console,
which is especially unhelpful for debugging or error messages.

Take a hit on the stack and vsnprintf() the message to the buffer.
This way it can be printed along with the device-slot name in one go
avoiding console gibberish.

Reviewed by:	marius
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19747
2019-06-08 15:24:03 +00:00
Bjoern A. Zeeb
6e40542a4e Introduce sim_dev and cam_sim_alloc_dev().
Add cam_sim_alloc_dev() as a wrapper to cam_sim_alloc() which takes
a device_t instead of the unit_number (which we can derive from the
dev again).

Add device_t sim_dev to struct cam_sim. It will be used to pass through
the bus for cases when both sides of CAM speak newbus already and we want
to link them (yet make the calls through CAM for now).

SDIO will be the first consumer of this. For that make use of
cam_sim_alloc_dev() in sdhci under MMCCAM.

This will also allow people to start iterating more on the idea
to newbus-ify CAM without changing 50+ device drivers from the start.
Also to be clear there are callers to cam_sim_alloc() which do not
have a device_t (e.g., XPT) or provide their own unit number so we cannot
simply switch the KPI entirely.

Submitted by:	kibab (original idea, see https://reviews.freebsd.org/D12467)
Reviewed by:	imp, chuck
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19746
2019-06-08 15:19:50 +00:00
Konstantin Belousov
c46d985629 i386 trap.c: Remove unused MAX_TRAP_MSG define.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-08 13:41:39 +00:00
Konstantin Belousov
c7228026a8 amd64 trap.c: Modernize syntax around trap_msg[].
Convert the array to use C99 initializers.
Make it constant.
Replace MAX_TRAP_MSG with nitems().

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-08 13:40:57 +00:00
Justin Hibbits
988d63af1c powerpc/pmap: Move the SLB spill handlers to a better place
The SLB spill handlers are AIM-specific, and belong better with the rest of
the SLB code anyway.  No functional change.
2019-06-08 03:07:08 +00:00
Justin Hibbits
b7918b86b3 powerpc/aim: Use nitems() for calculating size of phys_avail in AIM pmaps
Same thing was already done in r347164 for Book-E pmap.
2019-06-08 02:36:07 +00:00
John Baldwin
5f37b74d5d Fix debug trace after removal of pdu_overhead.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-06-07 21:30:11 +00:00
Alexander Motin
35251e9c28 Fix comparison signedness in arc_is_overflowing().
When ARC size is very small, aggsum_lower_bound(&arc_size) may return
negative values, that due to unsigned comparison caused delays, waiting
for arc_adjust() to "fix" it by calling aggsum_value(&arc_size).  Use
of signed comparison there fixes the problem.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-06-07 20:59:24 +00:00
Alexander Motin
61586dd647 Explicitly start ARC adjustment on limits change.
While formally it is not necessary, but the sooner it start, the sooner it
finish, and supposedly less disturbing for workload it will be.

MFC after:	2 weeks
2019-06-07 19:03:17 +00:00
Chuck Tuffli
b1f1471064 Fix nda(4) PCIe link status output
Differentiate between PCI Express Endpoint devices and Root Complex
Integrated Endpoints in the nda driver. The Link Status and Capability
registers are not valid for Integrated Endpoints and should not be
displayed. The bhyve emulated NVMe device will advertise as being an
Integrated Endpoint.

Reviewed by:	imp
Approved byL	imp (mentor)
Differential Revision: https://reviews.freebsd.org/D20282
2019-06-07 18:34:48 +00:00
Mark Johnston
88ea538a98 Replace uses of vm_page_unwire(m, PQ_NONE) with vm_page_unwire_noq(m).
These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone.  But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place.  No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20470
2019-06-07 18:23:29 +00:00
Leandro Lupori
b934fc7468 [PPC64] Support QEMU/KVM pseries without hugepages
This set of changes make it possible to run FreeBSD for PowerPC64/pseries,
under QEMU/KVM, without requiring the host to make hugepages available to the
guest.

While there was already this possibility, by means of setting hw_direct_map to
0, on PowerPC64 there were a couple of issues/wrong assumptions that prevented
this from working, before this changelist.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D20522
2019-06-07 17:58:59 +00:00
Christian S.J. Peron
ca3075599a Teach readelf about some OpenBSD ELF program headers
- Add constants for OpenBSD wxneeded, bootdata and randomize to the
  FreeBSD elf_common.h file. This is the file that gets used by the
  elftoolchain library.
- Update readelf and elfdump utilities to decode these program headers
  if they are encountered.

Note: FreeBSD has it's own version of elfdump(1), which will be updated
in a subsequent commit. I am adding it here anyway because this diff is
going to be submitted upstream.

Discussed with:	emaste
Reviewed by:	imp
MFC afer:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20548

M    contrib/elftoolchain/elfdump/elfdump.c
M    contrib/elftoolchain/readelf/readelf.c
M    sys/sys/elf_common.h
2019-06-07 14:51:55 +00:00
Andrey V. Elsukov
cb1ee1bf9b Use underscores for internal variable name to avoid conflicts.
MFC after:	1 week
2019-06-07 08:30:35 +00:00
Andriy Gapon
d8b12a2162 Restore ARC MFU/MRU pressure
Before r305323 (MFV r302991: 6950 ARC should cache compressed data)
arc_read() code did this for access to a ghost buffer:
 arc_adapt() (from arc_get_data_buf())
 arc_access(hdr, hash_lock)
I.e., we first checked access to the MFU ghost/MRU ghost buffer and
adapt MFU/MRU sizes (in arc_adapt()) and next move buffer from the ghost
state to regular.

After r305323 the sequence is different:
 arc_access(hdr, hash_lock);
 arc_hdr_alloc_pabd(hdr);
I.e., we first move the buffer from the ghost state in arc_access() and
then we check access to buffer in ghost state (in arc_hdr_alloc_pabd()
-> arc_get_data_abd() -> arc_get_data_impl() -> arc_adapt()).  This is
incorrect: arc_adapt() never see access to the ghost buffer because
arc_access() already migrated the buffer from the ghost state to
regular.

So, the fix is to restore a call to arc_adapt() before arc_access() and
to suppress the call to arc_adapt() after arc_access().

Submitted by:	Slawa Olhovchenkov <slw@zxy.spb.ru>
MFC after:	2 weeks
Sponsored by:	Integros [integros.com]
Differential Revision: https://reviews.freebsd.org/D19094
2019-06-07 06:35:42 +00:00
Navdeep Parhar
27c3a85d07 cxgbe(4): Rename the DDP sysctl to rx_zcopy to match the tx_zcopy sysctl
and update its description.  The old name continues to work for now.

Sponsored by:	Chelsio Communications
2019-06-07 05:03:03 +00:00
Ryan Libby
3cf556f05e Allow fail points to have separate declarations, definitions, and evals
Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20546
2019-06-07 04:09:12 +00:00
Alexander Motin
3b2f2cb8e9 Allow UMA hash tables to expand faster then 2x in 20 seconds.
ZFS ABD allocates tons of 4KB chunks via UMA, requiring huge hash tables.
With initial hash table size of only 32 elements it takes ~20 expansions
or ~400 seconds to adapt to handling 220GB ZFS ARC.  During that time not
only the hash table is highly inefficient, but also each of those expan-
sions takes significant time with the lock held, blocking operation.

On my test system with 256GB of RAM and ZFS pool of 28 HDDs this change
reduces time needed to first time read 240GB from ~300-400s, during which
system is quite busy and unresponsive, to only ~150s with light CPU load
and just 5 sub-second CPU spikes to expand the hash table.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-06-06 23:57:28 +00:00
Luiz Otavio O Souza
5429f5f309 Do not overwrite the RGMII bits in the CPU port register of Switch.
Fixes the network on Espressobin.

The GENERIC kernel now boots over NFS.

Sponsored by:	Rubicon Communications, LLC (Netgate)
2019-06-06 21:25:46 +00:00
Luiz Otavio O Souza
e5b6bcc7d2 Zero the GPIO regulator pins memory.
This fixes a panic in Espressobin when gpioregulator fails to allocate the
GPIO pin (the GPIO controller is not there).

Sponsored by:	Rubicon Communications, LLC (Netgate)
2019-06-06 20:54:09 +00:00
D Scott Phillips
806adc6c00 nvdimm: Provide nvdimm location information
Provide the acpi handle path as the location string for the nvdimm
children of the nvdimm_root device.

Reviewed by:	kib
Approved by:	jhb (mentor)
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D20528
2019-06-06 20:12:04 +00:00
Mark Johnston
ce7fb386d8 Restore the comment removed in r348745.
LAGG_RLOCK() enters an epoch section, so the comment wasn't stale.

Reported by:	jhb
MFC with:	r348745
2019-06-06 17:20:35 +00:00
Doug Moore
f96e8a0bab The means of finding ranges of free pages was changed for
vm_reserv_break in r348484, and there was found to improve performance
minutely and reduce code size. This change applies a similar change to
vm_reserv_reclaim_config, expecting similar benefits. This change also
allows quick rejection of page ranges that are unsuitable on account
of alignment or boundary issues, where those issues are processed a
page at a time in the current implementation.  For contrived test
cases, this can make finding a reservation satisfying a major
alignment requirement around 30 times faster.

Tested by: pho
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20274
2019-06-06 16:28:34 +00:00
Mark Johnston
fbd9585915 Add sysctls for uma_kmem_{limit,total}.
Reviewed by:	alc, dougm, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20514
2019-06-06 16:26:58 +00:00
Mark Johnston
058f0f7464 Remove the volatile qualifer from uma_kmem_total.
No functional change intended.

Reviewed by:	alc, dougm, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20514
2019-06-06 16:23:44 +00:00
Mark Johnston
9995dfd364 Conditionalize an in_epoch() call on INVARIANTS.
Its result is only used to determine whether to perform further
INVARIANTS-only checks.  Remove a stale comment while here.

Submitted by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after:	1 week
2019-06-06 16:22:29 +00:00
Mark Johnston
1ef5e651fd Make the linuxkpi's alloc_pages() consistently return wired pages.
Previously it did this only on platforms without a direct map.  This
also more closely matches Linux's semantics.

Since some DRM v5.0 code assumes the old behaviour, use a
LINUXKPI_VERSION guard to preserve that until the out-of-tree module
is updated.

Reviewed by:	hselasky, kib (earlier versions), johalun
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20502
2019-06-06 16:09:19 +00:00
Mark Johnston
c080655467 Fix a race between fasttrap and the user breakpoint handler.
When disabling the last enabled userspace probe, fasttrap clears the
function pointers which hook in to the breakpoint handler.  If a traced
thread hit a fasttrap breakpoint before it was removed, we must ensure
that it is able to call the hook; otherwise fasttrap will not consume
the trap and SIGTRAP will be delievered to the thread.  Synchronize
with such threads by ensuring that they load the hook pointer with
interrupts disabled, and by completing an SMP rendezvous after removing
breakpoints and before clearing the pointers.

Reported by:	Alexander Alexeev <Alexander.Alexeev@dell.com>
Tested by:	Alexander Alexeev (earlier version)
Reviewed by:	cem, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20526
2019-06-06 16:03:25 +00:00
Ian Lepore
3aad8ca854 For armv6 and armv7, build hwpmc_armv7.c as well as the base hwpmc_arm.c.
Submitted by:	Arnaud YSMAL <arnaud.ysmal@stormshield.eu>
2019-06-06 15:21:36 +00:00
Ian Lepore
fbc27301ba Don't refer to the cpu variable in a KASSERT before initializing it. 2019-06-06 15:18:23 +00:00
Alan Somers
46f8169aea Add a testing facility to manually reclaim a vnode
Add the debug.try_reclaim_vnode sysctl. When a pathname is written to it, it
will be reclaimed, as long as it isn't already or doomed. The purpose is to
gain test coverage for vnode reclamation, which is otherwise hard to
achieve.

Add the debug.ftry_reclaim_vnode sysctl.  It does the same thing, except
that its argument is a file descriptor instead of a pathname.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20519
2019-06-06 15:04:50 +00:00
Michael Tuexen
d1156b0505 r347382 added receiver side DSACK support for the TCP base stack.
The corresponding changes for the RACK stack where missed and are added
by this commit.

Reviewed by:		Richard Scheffenegger, rrs@
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D20372
2019-06-06 07:49:03 +00:00
Cy Schubert
121d6c186b Whitespace adjustment.
MFC after:	3 days
2019-06-06 03:02:25 +00:00
Mariusz Zaborski
1808673cc4 geli: build warning fixes
Submitted by:	Aaron Prieger <aprieger@llnw.com>
Reviewed by:	sbruno
Differential Revision:	https://reviews.freebsd.org/D11068
2019-06-05 22:46:18 +00:00
Mariusz Zaborski
8da024d941 dtrace: 64-bits registers support
The registers in ilumos and FreeBSD have a different number.
In the illumos, last 32-bits register defined is SS an in FreeBSD is GS.
This off-by-one caused the uregs array to returns the wrong 64-bits register
on amd64.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20363
2019-06-05 22:29:05 +00:00
Konstantin Belousov
32d2014dde In vm_map_entry_set_vnode_text(), tolerate tmpfs mappings for which
vnode is no longer resident.

Mapping of tmpfs file does not bump use count on the vnode, because
backing object has swap type.  As result, even during normal
operations, and of course on forced unmount, we might end up with text
mapping from tmpfs node which has no vnode in memory.  In this case,
there is no v_writecount to clear (this was done during reclaim), and
no reason to assert that the vnode is present.

Restructure the code to silently ignore OBJ_SWAP objects with
OBJ_TMPFS_NODE flag set, but OBJ_TMPFS flag clear.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-05 20:21:17 +00:00
Konstantin Belousov
3c93d22758 Manually clear text references on reclaim for nullfs and tmpfs.
Both filesystems do no use vnode_pager_dealloc() which would handle
this case otherwise.  Nullfs because vnode vm_object handle never
points to nullfs vnode.  Tmpfs because its vm_object is never vnode
object at all.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-05 20:16:25 +00:00
John Baldwin
0d1fd6e541 Support MSI-X for passthrough devices with a separate PBA BAR.
pci_alloc_msix() requires both the table and PBA BARs to be allocated
by the driver.  ppt was only allocating the table BAR so would fail
for devices with the PBA in a separate BAR.  Fix this by allocating
the PBA BAR before pci_alloc_msix() if it is stored in a separate BAR.

While here, release BARs after calling pci_release_msi() instead of
before.  Also, don't call bus_teardown_intr() in error handling code
if bus_setup_intr() has just failed.

Reported by:	gallatin
Tested by:	gallatin
Reviewed by:	rgrimes, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20525
2019-06-05 19:30:32 +00:00
Andriy Gapon
2b1d0ab2f6 first step towards enforcing must-succeed semantics for bus accessors
Unlike BUS_READ_IVAR / BUS_WRITE_IVAR, bus accessors do not have a
return code.  It is assumed that there is a tight coupling between a bus
driver and a driver for a device on the bus with respect to instance
variables that the bus defines for its children.  So, the driver is
supposed to have only valid accesses to the variables and, thus, the
accessors must always succeed.

Of course, programming errors sometimes happen.  At present, such errors
go completely unnoticed.  The idea of this change is to start catching
them.  As a first step, there will be a warning about a failed accessor
call.  This is to give developers a heads-up.  I plan to replace the
printf with a KASSERT a week later, so that the warning is harder to
ignore.

Reviewed by:	cem, imp, ian
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D20458
2019-06-05 13:18:00 +00:00
Tycho Nightingale
56db4ebd34 another occurrence where a very large dma mapping can cause integer overflow
Submitted by:	rlibby
Sponsored by:	Dell EMC Isilon
2019-06-05 13:08:21 +00:00
Andrey V. Elsukov
efdadaa2d8 Initialize V_nat64out methods explicitly.
It looks like initialization of static variable doesn't work for
VIMAGE and this leads to panic.

Reported by:	olivier
MFC after:	1 week
2019-06-05 09:25:40 +00:00
Colin Percival
e0235fd34a Only respond to the PCIe Attention Button if a device is already plugged in.
Prior to this commit, if PCIEM_SLOT_STA_ABP and PCIEM_SLOT_STA_PDC are
asserted simultaneously, FreeBSD sets a 5 second "hardware going away" timer
and then processes the "presence detect" change. In the (physically
challenging) case that someone presses the "attention button" and inserts
a new PCIe device at exactly the same moment, this results in FreeBSD
recognizing that the device is present, attaching it, and then detaching it
5 seconds later.

On EC2 "bare metal" hardware this is the precise sequence of events which
takes place when a new EBS volume is attached; virtual machines have no
difficulty effecting physically implausible simultaneity.

This patch changes the handling of PCIEM_SLOT_STA_ABP to only detach a
device if the presence of a device was detected *before* the interrupt
which reports the Attention Button push.

Reported by:	Matt Wilson
Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D20499
2019-06-05 04:58:42 +00:00
Cy Schubert
37dbd136c3 While working on a PR, more are discovered.
Remove more #ifdefs missed in r343701.

MFC after:	1 week
2019-06-04 19:37:51 +00:00
Cy Schubert
e5492b8bc4 Clean up #ifdefs from old unsupported releases of FreeBSD.
MFC after:	1 week
2019-06-04 19:25:32 +00:00
Mark Johnston
2d2748710a Remove an outdated header comment for vm_page.c.
The listed rules were incomplete and outdated.  There is a much more
comprehensive comment in vm_page.h.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20503
2019-06-04 18:38:27 +00:00
Hans Petter Selasky
9ccaf2215a In usb(4) fix a lost completion event issue towards libusb(3). It may happen
if a USB transfer is cancelled that we need to fake a completion event.
Implement missing support in ugen_fs_copy_out() to handle this.

This fixes issues with webcamd(8) and firefox.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-06-04 16:40:18 +00:00
Alan Cox
6f1412f800 The changes to pmap_demote_pde_locked()'s control flow in r348476 resulted
in the loss of a KASSERT that guarded against the invalidation a wired
mapping.  Restore this KASSERT.

Remove an unnecessary KASSERT from pmap_demote_pde_locked().  It guards
against a state that was already handled at the start of the function.

Reviewed by:	kib
X-MFC with:	r348476
2019-06-04 16:21:14 +00:00
Ed Maste
b734222edd elf_common: add GNU note types and NT_GNU_PROPERTY_TYPE_0 bits
To support Intel CET IBT/Shadow Stack.

Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-06-04 15:44:31 +00:00
Ed Maste
004caac2d8 style(9) / tidying for r348611
MFC with:	r348611
Event:		Waterloo Hackathon 2019
2019-06-04 13:45:30 +00:00
Ed Maste
74cd06b42e Expose the kernel's build-ID through sysctl
After our migration (of certain architectures) to lld the kernel is built
with a unique build-ID.  Make it available via a sysctl and uname(1) to
allow the user to identify their running kernel.

Submitted by:	Ali Mashtizadeh <ali_mashtizadeh.com>
MFC after:	2 weeks
Relnotes:	Yes
Event:		Waterloo Hackathon 2019
Differential Revision:	https://reviews.freebsd.org/D20326
2019-06-04 13:07:10 +00:00
Hans Petter Selasky
253c93f26b In xhci(4) there is no stream ID in the completion TRB.
Instead iterate all the stream IDs in stream mode to find
the matching USB transfer.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-06-04 09:01:02 +00:00
Hans Petter Selasky
76a3555808 Make sure the DMA tags get freed in mlx5en(4).
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-06-04 08:06:51 +00:00
Slava Shwartsman
bb43866c38 Fix prio vs. nonprio tagged traffic in RDMACM
In current RDMACM implementation RDMACM server will not find a GID
index when the request was prio-tagged and the sever is non
prio-tagged and vise-versa.
According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should
be considered as untagged. Treat RDMACM request the same.

Reviewed by:    hselasky, kib
MFC after:      3 Days
Sponsored by:   Mellanox Technologies
2019-06-04 06:21:31 +00:00