Commit Graph

128079 Commits

Author SHA1 Message Date
Alexander Motin
ed3bf01599 Add support for Long LBA mode parameter block descriptor.
It is formally required for SBC Base 2016 feature set.

MFC after:	2 weeks
2019-07-26 19:14:12 +00:00
Alan Cox
b3d6ea5bc2 Implement pmap_advise(). (Without a working pmap_advise() implementation
madvise(MADV_DONTNEED) and madvise(MADV_FREE) are NOPs.)

Reviewed by:	markj
X-MFC after:	r350004
Differential Revision:	https://reviews.freebsd.org/D21062
2019-07-26 05:07:09 +00:00
Alexander Motin
ae8828bad1 Add device temperature reporting into CTL.
The values to report can be set via LUN options.  It can be useful for
testing, and also required for Drive Maintenance 2016 feature set.

MFC after:	2 weeks
2019-07-26 03:49:16 +00:00
Alexander Motin
0ea67e7019 Add reporting of SCSI Feature Sets VPD page from SPC-5.
CTL implements all defined feature sets except Drive Maintenance 2016,
which is not very applicable to such a virtual device, and implemented
only partially now.  But may be it could be fixed later at least for
completeness.

MFC after:	2 weeks
2019-07-26 01:49:28 +00:00
Kyle Evans
0dbac71f19 if_tuntap(4): Add TUNGIFNAME
This effectively just moves TAPGIFNAME into common ioctl territory.

MFC after:	3 days
2019-07-25 22:23:34 +00:00
Alan Cox
5d18382b72 Simplify the handling of superpages in pmap_clear_modify(). Specifically,
if a demotion succeeds, then all of the 4KB page mappings within the
superpage-sized region must be valid, so there is no point in testing the
validity of the 4KB page mapping that is going to be write protected.

Deindent the nearby code.

Reviewed by:	kib, markj
Tested by:	pho (amd64, i386)
X-MFC after:	r350004 (this change depends on arm64 dirty bit emulation)
Differential Revision:	https://reviews.freebsd.org/D21027
2019-07-25 22:02:55 +00:00
Warner Losh
08a607e0f3 Widen the type for to.
The timeout field in the CAPS register is defined to be 8 bits, so its type was
uint8_t. We recently started adding 1 to it to cope with rogue devices that
listed 0 timeout time (which is impossible). However, in so doing, other devices
that list 0xff (for a 2 minute timeout) were broken when adding 1
overflowed. Widen the type to be uint32_t like its source register to avoid the
issue.

Reported by: bapt@
2019-07-25 20:26:21 +00:00
Alexander Motin
c15a591cbd Make camcontrol sanitize support also ATA devices.
ATA sanitize is functionally identical to SCSI, just uses different
initiation commands and status reporting mechanism.

While there, make kernel better handle sanitize commands and statuses.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-07-25 18:48:31 +00:00
Leandro Lupori
6962841780 powerpc: Improve pvo allocation code
Added allocation retry loop in alloc_pvo_entry(), to wait for
memory to become available if the caller specifies the M_WAITOK flag.

Also, the loop in moa64_enter() was removed, as moea64_pvo_enter()
never returns ENOMEM. It is alloc_pvo_entry() memory allocation that
can fail and must be retried.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D21035
2019-07-25 15:27:05 +00:00
Rick Macklem
013e5361d6 r350320 committed the wrong version of generated syscall.mk.
This commit is for the correct version. (The incorrect one had the order
of the last two entries reversed, due to my testing with copy_file_range
at 568 instead of 569.) This misordering should not have been a problem,
but is now fixed.
2019-07-25 06:48:30 +00:00
Rick Macklem
4c50e495ff Update the generated syscall.mk for copy_file_range(2).
I missed this file for commit r350316.
2019-07-25 06:35:21 +00:00
Rick Macklem
bf499e87f5 Update the generated syscall files for copy_file_range(2) added by
r350315.
2019-07-25 05:55:55 +00:00
Rick Macklem
bbbbeca3e9 Add kernel support for a Linux compatible copy_file_range(2) syscall.
This patch adds support to the kernel for a Linux compatible
copy_file_range(2) syscall and the related VOP_COPY_FILE_RANGE(9).
This syscall/VOP can be used by the NFSv4.2 client to implement the
Copy operation against an NFSv4.2 server to do file copies locally on
the server.
The vn_generic_copy_file_range() function in this patch can be used
by the NFSv4.2 server to implement the Copy operation.
Fuse may also me able to use the VOP_COPY_FILE_RANGE() method.

vn_generic_copy_file_range() attempts to maintain holes in the output
file in the range to be copied, but may fail to do so if the input and
output files are on different file systems with different _PC_MIN_HOLE_SIZE
values.

Separate commits will be done for the generated syscall files and userland
changes. A commit for a compat32 syscall will be done later.

Reviewed by:	kib, asomers (plus comments by brooks, jilles)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D20584
2019-07-25 05:46:16 +00:00
Justin Hibbits
7c382eea30 powerpc/pmap64: Make moea64 statistics optional
Summary:
It turns out statistics accounting is very expensive in the pmap driver,
and doesn't seem necessary in the common case.  Make this optional
behind a MOEA64_STATS #define, which one can set if they really need
statistics.

This saves ~7-8% on buildworld time on a POWER9.

Found by bdragon.

Reviewed by:	luporl
Differential Revision: https://reviews.freebsd.org/D20903
2019-07-25 03:47:27 +00:00
Mark Johnston
2fb62b1a46 Fix the turnstile_lock() KPI.
turnstile_{lock,unlock}() were added for use in epoch.  turnstile_lock()
returned NULL to indicate that the calling thread had lost a race and
the turnstile was no longer associated with the given lock, or the lock
owner.  However, reader-writer locks may not have a designated owner,
in which case turnstile_lock() would return NULL and
epoch_block_handler_preempt() would leak spinlocks as a result.

Apply a minimal fix: return the lock owner as a separate return value.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21048
2019-07-24 23:04:59 +00:00
Mark Johnston
a76f78dc3f Remove cap_random(3).
Now that we have a way to obtain entropy in capability mode
(getrandom(2)), libcap_random is obsolete.  Remove it.

Bump __FreeBSD_version in case anything happens to use it, though I've
found no consumers.

Reviewed by:	delphij, emaste, oshogbo
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21033
2019-07-24 22:50:43 +00:00
Eric Joyner
7f3f6aad3e iflib: fix dangling device softc pointer
Commit text by Jake:
If a driver's IFDI_ATTACH_PRE function fails, the iflib_device_register
function will free the ctx pointer. However, it does not reset the
device softc pointer to NULL.

This will result in memory corruption as a future access to the now
invalid pointer will corrupt memory that is later allocated on top of
the same memory location.

The iflib_device_deregister function correctly resets the softc pointer
by using device_set_softc().

This clears up the invalid dangling pointer and prevents memory
corruption that could lead to a panic or undefined behavior if the
device's driver failed to attach.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21003
2019-07-24 21:43:41 +00:00
Ed Maste
5e3ccbd9ac enable ig4_acpi on aarch64
The already-listed APMC0D0F ID belongs to the Ampere eMAG aarch64
platform, but ACPI support was not even built on aarch64.

Submitted by:	Greg V <greg_unrelenting.technology>
Differential Revision:	https://reviews.freebsd.org/D21059
2019-07-24 21:26:17 +00:00
Ed Maste
532bc58628 pf: zero output buffer in pfioctl
Avoid potential structure padding leak.

Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	kp
MFC after:	3 days
Security:	Potential kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-07-24 16:51:14 +00:00
Kirill Ponomarev
b7592822d5 Allow set MTU more than 1500 bytes.
Submitted by:	Alexandr Fedorov <aleksandr.fedorov_itglobal_dot_com>
Approved by:	jhb, rgrimes
Sponsored by:	ITGlobal.com
Differential Revision:	https://reviews.freebsd.org/D19422
2019-07-24 16:10:20 +00:00
Mark Johnston
e020a35f5b Remove a redundant offset computation in elf_load_section().
With r344705 the offset is always zero.

Submitted by:	Wuyang Chung <wuyang.chung1@gmail.com>
2019-07-24 15:18:05 +00:00
Michael Tuexen
d21036e033 Add a sysctl variable ts_offset_per_conn to change the computation
of the TCP TS offset from taking the IP addresses and the TCP port
numbers into account to a version just taking only the IP addresses
into account. This works around broken middleboxes or endpoints.
The default is to keep the behaviour, which is also the behaviour
recommended in RFC 7323.

Reported by:		devgs@ukr.net
Reviewed by:		rrs@
MFC after:		2 weeks
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D20980
2019-07-23 21:28:20 +00:00
Ed Maste
051e692a99 mqueuefs: fix struct file leak
In some error cases we previously leaked a stuct file.

Submitted by:	mjg, markj
2019-07-23 20:59:36 +00:00
Michael Tuexen
9a4f1a2492 Don't hold a mutex while calling sbwait. This was found by syzkaller.
Submitted by:		rrs@
Reported by:		markj@
MFC after:		1 week
2019-07-23 18:31:07 +00:00
Eric Joyner
2dc2d58035 ixgbe(4): Fix enabling/disabling and reconfiguration of queues
- Wrong order of casting and bit shift caused that enabling and disabling
  queues didn't work properly for queues number larger than 32. Use literals
  with right suffix instead.

- TX ring tail address was not updated during reinitiailzation of TX
  structures. It could block sending traffic.

- Also remove unused variables 'eims' and 'active_queues'.

Submitted by:	Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by:	erj@
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D20826
2019-07-23 18:14:32 +00:00
Michael Tuexen
1a91d6987c Fix a LOR in SCTP which was found by running syzkaller.
Submitted by:		rrs@
Reported by:		markj@
MFC after:		1 week
2019-07-23 18:07:36 +00:00
Andrew Turner
ac4e582795 As with r350241 use the new UL macro on the main register mask.
MFC after:	1 week
Sponsored by:	DARPA, AFRL
2019-07-23 14:52:46 +00:00
Andrew Turner
f31c5955e2 Ensure the arm64 ID register fields are 64 bit types.
Previously only some of the ID register fields were 64 bit. To allow
for a script to generate these mark them all 64 bit. To allow for their
use in assembly we need to use the UINT64_C macro via a new UL macro
to stop the lines from being too long.

MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D20977
2019-07-23 14:40:37 +00:00
Andrey V. Elsukov
455d2ecb71 Eliminate rmlock from ipfw's BPF code.
After r343631 pfil hooks are invoked in net_epoch_preempt section,
this allows to avoid extra locking. Add NET_EPOCH_ASSER() assertion
to each ipfw_bpf_*tap*() call to require to be called from inside
epoch section.

Use NET_EPOCH_WAIT() in ipfw_clone_destroy() to wait until it becomes
safe to free() ifnet. And use on-stack ifnet pointer in each
ipfw_bpf_*tap*() call to avoid NULL pointer dereference in case when
V_*log_if global variable will become NULL during ipfw_bpf_*tap*() call.

Sponsored by:	Yandex LLC
2019-07-23 12:52:36 +00:00
Alexander Motin
76d843dab2 Make CAM ATA stack handle disk resizes.
While for ATA disks resize is even more rare situation than for SCSI, it
may happen in case of HPA or AMA being used.  Make ATA XPT report minor
IDENTIFY DATA change to upper layers with AC_GETDEV_CHANGED, and ada(4)
periph driver handle that event, recalculating all the disk properties and
signalling resize to GEOM.  Since ATA has no mechanism of UNIT ATTENTIONs,
like SCSI, it has no way to detect that something has changed.  That is why
this functionality depends on explicit reprobe via XPT_REPROBE_LUN call.

MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
2019-07-23 02:11:14 +00:00
Justin Hibbits
b75bd60a04 powerpc: Unbreak 64-bit pmap from 350206
oldpvo is never explicitly NULL'd by moea64_pvo_enter(), so don't check for
NULL to do anything, only check error.

PR:		239372
Reported by:	Francis Little
2019-07-22 22:59:50 +00:00
Ian Lepore
39a289c3d5 Correct spelling, partion -> partition. 2019-07-22 22:41:44 +00:00
Emmanuel Vadot
b9305a86e1 arm: ti: Add a driver for ti,sysc bus
ti,sysc is a simple-bus like driver.
Add a driver for it so child nodes can attach.
2019-07-22 21:55:33 +00:00
Emmanuel Vadot
04132e0157 arm: ti: Get the hwmods property from the parent node
Since the Linux 5.0 dts the ti,hwmods property is on the parent
ti.sysc node.
2019-07-22 21:53:58 +00:00
Brooks Davis
c7bacdcc32 ata_xpt: Use the correct union member when accessing valid.
In principle this should not matter as it's a union and they point to
the same memory location but based on the code above we should be
accessing .sata and not .ata.

Submitted by:	arichardson
Reviewed by:	scottl, imp
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D21002
2019-07-22 21:07:58 +00:00
Alan Somers
caaa7cee09 [skip ci] Fix the comment for cache_purge(9)
This is a merge of r348738 from projects/fuse2

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-07-22 21:03:52 +00:00
Michael Tuexen
f1903dc055 Wakeup the application when doing PD-API for unordered DATA chunks.
Work done with rrs@.

MFC after:		1 week
2019-07-22 18:11:35 +00:00
Ruslan Bukin
e8e90fef03 Remove unused header.
Sponsored by:	DARPA, AFRL
2019-07-22 16:50:37 +00:00
Ruslan Bukin
951e058411 o Add support for BERI IOMMU device
o Add an experimental IOMMU support to xDMA framework

The BERI IOMMU device is the part of CHERI device-model project [1]. It
translates memory addresses for various BERI peripherals modelled in
software. It accepts FreeBSD/mips64 page directories format and manages
BERI TLB.

1. https://github.com/CTSRD-CHERI/device-model

Sponsored by:	DARPA, AFRL
2019-07-22 16:01:20 +00:00
Justin Hibbits
5db86748b5 powerpc64/mmu: Make moea64_pvo_enter() return if an entry already exists
Summary:
Instead of searching for a PVO entry before adding, take advantage of
the fact that RB_INSERT() returns NULL if it inserts, and the existing entry if
an entry exists, without inserting a new entry.  This saves an extra tree
traversal in the cases where the PVO does not exist.

Reviewed by:	luporl
Differential Revision: https://reviews.freebsd.org/D20944
2019-07-22 03:11:54 +00:00
Konstantin Belousov
13ff4eb1fa Switch the rest of the refcount(9) functions to bool return type.
There are some explicit comparisions of refcount_release(9) result
with 0/1, which are fine.

Reviewed by:	markj, mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21014
2019-07-21 20:16:48 +00:00
Ian Lepore
d4828bcfc7 Add support for setting the aging/frequency-offset register via sysctl.
The 2127 and 2129 chips support a frequency tuning value in the range of
-7 through +8 PPM; add a sysctl handler to read and set the value.
2019-07-21 17:14:39 +00:00
Alan Cox
f606d835de With the introduction of software dirty bit emulation for managed mappings,
we should test ATTR_SW_DBM, not ATTR_AP_RW, to determine whether to set
PGA_WRITEABLE.  In effect, we are currently setting PGA_WRITEABLE based on
whether the dirty bit is preset, not whether the mapping is writeable.
Correct this mistake.

Reviewed by:	markj
X-MFC with:	r350004
Differential Revision:	https://reviews.freebsd.org/D21013
2019-07-21 17:00:19 +00:00
Konstantin Belousov
1004039858 Fix userspace build after r350199.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-07-21 16:24:40 +00:00
Konstantin Belousov
f1cf2b9dcb Check and avoid overflow when incrementing fp->f_count in
fget_unlocked() and fhold().

On sufficiently large machine, f_count can be legitimately very large,
e.g. malicious code can dup same fd up to the per-process
filedescriptors limit, and then fork as much as it can.
On some smaller machine, I see
	kern.maxfilesperproc: 939132
	kern.maxprocperuid: 34203
which already overflows u_int.  More, the malicious code can create
transient references by sending fds over unix sockets.

I realized that this check is missed after reading
https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html

Reviewed by:	markj (previous version), mjg
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D20947
2019-07-21 15:07:12 +00:00
Alan Cox
1a4cb969d1 Introduce pmap_store(), and use it to replace pmap_load_store() in places
where the page table entry was previously invalid.  (Note that I did not
replace pmap_load_store() when it was followed by a TLB invalidation, even
if we are not using the return value from pmap_load_store().)

Correct an error in pmap_enter().  A test for determining when to set
PGA_WRITEABLE was always true, even if the mapping was read only.

In pmap_enter_l2(), when replacing an empty kernel page table page by a
superpage mapping, clear the old l2 entry and issue a TLB invalidation.  My
reading of the ARM architecture manual leads me to believe that the TLB
could hold an intermediate entry referencing the empty kernel page table
page even though it contains no valid mappings.

Replace a couple direct uses of atomic_clear_64() by the new
pmap_clear_bits().

In a couple comments, replace the term "paging-structure caches", which is
an Intel-specific term for the caches that hold intermediate entries in the
page table, with wording that is more consistent with the ARM architecture
manual.

Reviewed by:	markj
X-MFC after:	r350004
Differential Revision:	https://reviews.freebsd.org/D20998
2019-07-21 03:26:26 +00:00
Justin Hibbits
b982c7ee20 powerpc: Remove an unnecessary #ifdef guard from slb.c
slb.c is only compiled for powerpc64, so no need for the #ifdef in this block.
2019-07-21 03:19:54 +00:00
Ian Lepore
2444018f7d Rewrite the nxprtc chip init to extend battery life by using power-saving
features offered by the chips.

For 2127 and 2129 chips, fix the detection of when chip-init is needed.  The
chip config needs to be reset whenever power was lost, but the logic was
wrong for 212x chips (it only worked for 8523).  Now the "oscillator
stopped" bit rather than the power manager mode is used to detect startup
after powerfail.

For all chips, disable the clock output pin.

For chips that have a timestamp/tamper-monitor feature, turn off monitoring
of the timestamp trigger pin.

The 8523, 2127, and 2129 chips have a "power manager" feature that offers
several options.  We've been using the default mode which enables
everything.  Now the code sets the power manager options to

 - direct-switch (when Vdd < Vbat, without extra threshold check)
 - no battery monitor
 - no external powerfail monitor

This reduces the current draw while running on battery from 1930nA to 880nA,
which should roughly double the lifespan of the battery under load.

Because battery checking is a nice thing to have, the code now does a check
at startup, and then once a day after that, instead of checking continuously
(but only actually reporting at startup).  The battery check is now done by
setting the power manager back to default mode, sleeping briefly while it
makes a voltage measurement, then switching back to power-saving mode.
2019-07-20 21:10:27 +00:00
Mark Johnston
b16e57a6c9 Rename vm_page_{import,release}() to vm_page_zone_{import,release}().
I would like to use the name vm_page_release() for a different purpose,
and vm_page_{import,release}() are local to vm_page.c.

Reviewed by:	kib
MFC after:	1 week
2019-07-20 18:25:41 +00:00
Justin Hibbits
cafceaebea powerpc/SPE: Enable SPV bit for EFSCFD instruction emulation
EFSCFD (floating point single convert from double) emulation requires saving
the high word of the register, which uses SPE instructions.  Enable the SPE
to avoid an SPV Unavailable exception.

MFC after:	1 week
2019-07-20 18:22:01 +00:00
Emmanuel Vadot
ad6521c24f dtso: allwinner: Add an overlay for H3 i2c0
Most of the H3 boards don't enable i2c as it is unused.
Add an overlay so it's easier for user to use i2c device.
2019-07-20 17:42:46 +00:00
John Baldwin
87c39157c6 Improve the precision of bhyve's vPIT.
Use 'struct bintime' instead of 'sbintime_t' to manage times in vPIT
to postpone rounding to final results rather than intermediate
results.  In tests performed by Joyent, this reduced the error measured
by Linux guests by 59 ppm.

Smart OS bug:	https://smartos.org/bugview/OS-6923
Submitted by:	Patrick Mooney
Reviewed by:	rgrimes
Obtained from:	SmartOS / Joyent
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20335
2019-07-20 15:59:49 +00:00
Emmanuel Vadot
d5fdfa2c8a arm64: Implement HWCAP
Add HWCAP support for arm64.
defines are the same as in Linux and a userland program can use
elf_aux_info to get the data.
We only save the common denominator for all cores in case the
big and little cluster have different support (this is known to
exists even if we don't support those SoCs in FreeBSD)

Differential Revision:	https://reviews.freebsd.org/D17137
2019-07-20 14:29:11 +00:00
Ganbold Tsagaankhuu
b24594e544 Add emmc support for Rockchip RK3399 SoC.
Tested on NanoPC-T4 board.

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D20156
2019-07-20 02:53:06 +00:00
Ganbold Tsagaankhuu
ea01660f1c Add driver for Rockchip RK3399 eMMC PHY.
Tested on NanoPC-T4 board.

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D20840
2019-07-20 02:03:31 +00:00
Konstantin Belousov
47c3450e50 Fix leak of memory and file refs with sendmsg(2) over unix domain sockets.
When sendmsg(2) sucessfully internalized one SCM_RIGHTS control
message, but failed to process some other control message later, both
file references and filedescent memory needs to be freed. This was not
done, only mbuf chain was freed.

Noted, test case written, reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21000
2019-07-19 20:51:39 +00:00
Doug Moore
312df2c1dd Define vm_map_entry_in_transition to handle an in-transition map
entry, combining code currently in vm_map_unwire and
vm_map_wire_locked into a single function, called by each of them for
entries in transition.

Discussed with: kib, markj
Reviewed by: alc
Approved by: kib, markj (mentors, implicit)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D20833
2019-07-19 20:47:35 +00:00
Alexander Motin
89b35a5274 Add Accessible Max Address Configuration support to camcontrol.
AMA replaced HPA in ACS-3 specification.  It allows to limit size of the
disk alike to HPA, but declares inaccessible data as indeterminate.  One
of its practical use cases is to under-provision SATA SSDs for better
reliability and performance.

While there, fix HPA Security detection/reporting.

MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
2019-07-19 19:15:08 +00:00
Warner Losh
5e83c2ffaa Keep track of the number of commands that exhaust their retry limit.
While we print failure messages on the console, sometimes logs are lost or
overwhelmed. Keeping a count of how many times we've failed retriable commands
helps get a magnitude of the problem.
2019-07-19 18:39:24 +00:00
Warner Losh
c37fc318c4 Keep track of the number of retried commands.
Retried commands can indicate a performance degredation of an nvme drive. Keep
track of the number of retries and report it out via sysctl, just like number of
commands an interrupts.
2019-07-19 18:39:18 +00:00
Warner Losh
710becdd96 Remove pre-FreeBSD 7.0 compatibility. 2019-07-19 18:38:47 +00:00
Alan Somers
fca79580be sendfile: don't panic when VOP_GETPAGES_ASYNC returns an error
PR:		236466
Sponsored by:	The FreeBSD Foundation
2019-07-19 18:03:30 +00:00
Warner Losh
eec0e91e05 Add comments about KERN_OPT here. 2019-07-19 17:48:29 +00:00
Warner Losh
1071b50a65 Use sysctl + CTLRWTUN for hw.nvme.verbose_cmd_dump.
Also convert it to a bool. While the rest of the driver isn't yet bool clean,
this will help.

Reviewed by: cem@
Differential Revision: https://reviews.freebsd.org/D20988
2019-07-19 00:32:56 +00:00
Warner Losh
c75bdc044d Provide new tunable hw.nvme.verbose_cmd_dump
The nvme drive dumps only the most relevant details about a command when it
fails. However, there are times this is not sufficient (such as debugging weird
issues for a new drive with a vendor). Setting hw.nvme.verbose_cmd_dump=1
in loader.conf will enable more complete debugging information about each
command that fails.

Reviewed by: rpokala
Sponsored by: Netflix
Differential Version: https://reviews.freebsd.org/D20988
2019-07-18 21:58:51 +00:00
Alan Somers
ed74f781c9 fusefs: add a intr/nointr mount option
FUSE file systems can optionally support interrupting outstanding
operations.  However, the file system does not identify to the kernel at
mount time whether it's capable of doing that.  Instead it signals its
noncapability by returning ENOSYS to the first FUSE_INTERRUPT operation it
receives.  That's a problem for reliable signal delivery, because the kernel
must choose which thread should get a signal before it knows whether the
FUSE server can handle interrupts.  The problem is even worse because the
FUSE protocol allows a file system to simply ignore all FUSE_INTERRUPT
operations.

Fix the signal delivery logic by making interruptibility an opt-in mount
option.  This will require a corresponding change to libfuse, but not to
most file systems that link to libfuse.

Bump __FreeBSD_version due to the new mount option.

Sponsored by:	The FreeBSD Foundation
2019-07-18 17:55:13 +00:00
Warner Losh
62d2cf1847 Provide macros to extract the sub-fields of the CAP_LO and CAP_HI registers.
These macros make places where we extract these easier to read. The shift and
mask stuff is also a bit tedious and error prone. Start with the CAP_LO and
CAP_HI registers since their scope is somewhat constrained. This is style
chagne only, no functional changes.

Reviewed by: chuck
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20979
2019-07-18 15:41:10 +00:00
Alan Somers
f05962453e fusefs: fix another semi-infinite loop bug regarding signal handling
fticket_wait_answer would spin if it received an unhandled signal whose
default disposition is to terminate.  The reason is because msleep(9) would
return EINTR even for a masked signal.  One reason is when the thread is
stopped, which happens for example during sigexit().  Fix this bug by
returning immediately if fticket_wait_answer ever gets interrupted a second
time, for any reason.

Sponsored by:	The FreeBSD Foundation
2019-07-18 15:30:00 +00:00
Andrew Turner
f1fbf9c3b1 Rename arm64 macros in preperation for a script to generate them.
I have a script to generate most of the ID_AA64* macros from the Arm
XML source [1]. In preperation for using this we need to clean up the
macros to be in line with what the script will generate. This is the
first step, rename the macros to follow the names in said XML.

[1] https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools

MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D20976
2019-07-18 13:58:04 +00:00
Ian Lepore
18cd8a2df8 Fix a paste-o, set is212x = false for other chip types. Doh! 2019-07-18 01:37:00 +00:00
Ian Lepore
634a2d26fd Handle the PCF2127 RTC chip the same as PCF2129 when init'ing the chip.
This affects the detection of 24-hour vs AM/PM mode... the ampm bit is in a
different location on 2127 and 2129 chips compared to other nxp rtc chips.
I noticed the 2127 case wasn't being handled correctly when I accidentally
misconfiged my system by claiming my PCF2129 was a 2127.
2019-07-18 01:30:56 +00:00
Alan Somers
d26d63a4af fusefs: multiple interruptility improvements
1) Don't explicitly not mask SIGKILL.  kern_sigprocmask won't allow it to be
   masked, anyway.

2) Fix an infinite loop bug.  If a process received both a maskable signal
   lower than 9 (like SIGINT) and then received SIGKILL,
   fticket_wait_answer would spin.  msleep would immediately return EINTR,
   but cursig would return SIGINT, so the sleep would get retried.  Fix it
   by explicitly checking whether SIGKILL has been received.

3) Abandon the sig_isfatal optimization introduced by r346357.  That
   optimization would cause fticket_wait_answer to return immediately,
   without waiting for a response from the server, if the process were going
   to exit anyway.  However, it's vulnerable to a race:

   1) fatal signal is received while fticket_wait_answer is sleeping.
   2) fticket_wait_answer sends the FUSE_INTERRUPT operation.
   3) fticket_wait_answer determines that the signal was fatal and returns
      without waiting for a response.
   4) Another thread changes the signal to non-fatal.
   5) The first thread returns to userspace.  Instead of exiting, the
      process continues.
   6) The application receives EINTR, wrongly believes that the operation
      was successfully interrupted, and restarts it.  This could cause
      problems for non-idempotent operations like FUSE_RENAME.

Reported by:    kib (the race part)
Sponsored by:   The FreeBSD Foundation
2019-07-17 22:45:43 +00:00
Kirk McKusick
fdf34aa3a5 The error reported in FS-14-UFS-3 can only happen on UFS/FFS
filesystems that have block pointers that are out-of-range for their
filesystem. These out-of-range block pointers are corrected by
fsck(8) so are only encountered when an unchecked filesystem is
mounted.

A new "untrusted" flag has been added to the generic mount interface
that can be set when mounting media of unknown provenance or integrity.
For example, a daemon that automounts a filesystem on a flash drive
when it is plugged into a system.

This commit adds a test to UFS/FFS that validates all block numbers
before using them. Because checking for out-of-range blocks adds
unnecessary overhead to normal operation, the tests are only done
when the filesystem is mounted as an "untrusted" filesystem.

Reported by:  Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as:  FS-14-UFS-3: Out of bounds read in write-2 (ffs_alloccg)
Reviewed by:  kib
Sponsored by: Netflix
2019-07-17 22:07:43 +00:00
Kristof Provost
cd7795a5a4 riscv: Return vm_paddr_t in pmap_early_vtophys()
We can't use a u_int to compute the physical address in
pmap_early_vtophys(). Our int is 32-bit, but the physical address is
64-bit. This works fine if everything lives in below 0x100000000, but as
soon as it doesn't this breaks.

MFC after:	1 week
Sponsored by:	Axiado
2019-07-17 21:25:26 +00:00
Warner Losh
204498d7c2 Remove now-obsolete comment. 2019-07-17 20:43:14 +00:00
Alan Somers
6cb46f390d Revert r346608
That change was intended to be cosmetic, but it inadvertenly caused
vnode_pager_setsize to discard cached indirect blocks and extended
attributes on UFS during truncation.  The reason is because those blocks
have negative LBNs, which get sign-cast to positive VM indexes.

Reported by:	kib
Sponsored by:	The FreeBSD Foundation
2019-07-17 19:39:08 +00:00
Alan Somers
0122532ee0 F_READAHEAD: Fix r349248's overflow protection, broken by r349391
I accidentally broke the main point of r349248 when making stylistic changes
in r349391.  Restore the original behavior, and also fix an additional
overflow that was possible when uio->uio_resid was nearly SSIZE_MAX.

Reported by:	cem
Reviewed by:	bde
MFC after:	2 weeks
MFC-With:	349248
Sponsored by:	The FreeBSD Foundation
2019-07-17 17:01:07 +00:00
Mark Johnston
61f2f0bae6 Fix FASTTRAPIOC_GETINSTR.
This ioctl is used when a breakpoint is encountered while disassembling
a symbol in the target process.  Since only one DTrace consumer can
toggle or enumerate fasttrap probes from a given process at time, this
ioctl does not appear to be used in practice.
2019-07-17 16:38:29 +00:00
Sean Bruno
fceeeec75f I add the ability to accept the default pin widget configuration to help
with various laptops using hdaa(4) sound devices.  We don't seem to know
the "correct" configurations for these devices and the defaults are far
superiour, e.g. they work if you don't nuke the default configs.

PR:	200526
Differential Revision:	https://reviews.freebsd.org/D17772
2019-07-17 04:13:46 +00:00
Kirk McKusick
ba554157a3 Style.
No change intended.
2019-07-16 23:39:39 +00:00
Kirk McKusick
1fd136ec5e When a process attempts to allocate space on a full filesystem, a
filesystem full message is sent to the offending process or the
kernel log if the offending process cannot be identified.

To prevent an explotion of messages, the kernel ppsratecheck()
function is used to limit the messages to one per second. This
revision changes the variable that tracks the rate of these messages
from a systemwide limit to a per-filesystem limit by moving it from
a global variable to a variable in the ufsmount structure.

Suggested by: kib
Reviewed by:  kib
Sponsored by: Netflix
2019-07-16 23:12:27 +00:00
Warner Losh
dc9df3a59d Assume that the timeout value from the capacity is 1-based
Neither the 1.3 or 1.4 standards say this number is 1's based, but adding 1
costs little and copes with those NVMe drives that report '0' in this field
cheaply. This is consistent with what the Linux driver does as well.
2019-07-16 22:55:30 +00:00
Cy Schubert
caddc9e343 As of upstream fil.c CVS r1.53 (March 1, 2009), prior to the import of
ipfilter 5.1.2 into FreeBSD-10, the fix for, 2580062 from/to targets
should be able to use any interface name, moved frentry.fr_cksum to
prior to frentry.fr_func thereby making this code redundant. After
investigating whether this fix to move fr_cksum was correct and if it
broke anything, it has been determined that the fix is correct and this
code is redundant. We remove it here.

MFC after:	2 weeks
2019-07-16 19:00:42 +00:00
Cy Schubert
a422d59f7b Refactor, removing one compare.
This changes the return code however the caller only tests for 0 and != 0.
One might ask then, why multiple return codes when the caller only tests
for 0 and != 0? From what I can tell, Darren probably passed various
return codes for sake of debugging. The debugging code is long gone
however we can still use the different return codes using DTrace FBT
traces. We can still determine why the compare failed by examining the
differences between the fr1 and fr2 frentry structs, which is a simple
test in DTrace. This allows reducing the number of tests, improving the
code while not affecting our ability to capture information for
diagnostic purposes.

MFC after:	1 week
2019-07-16 19:00:38 +00:00
Michael Tuexen
e4a5561e01 Fix compilation on platforms using gcc.
When compiling RACK on platforms using gcc, a warning that tcp_outflags
is defined but not used is issued and terminates compilation on PPC64,
for example. So don't indicate that tcp_outflags is used.

Reviewed by:		rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D20971
2019-07-16 17:54:20 +00:00
Eric van Gyzen
9d3ecb7e62 Adds signal number format to kern.corefile
Add format capability to core file names to include signal
that generated the core. This can help various validation workflows
where all cores should not be considered equally (SIGQUIT is often
intentional and not an error unlike SIGSEGV or SIGBUS)

Submitted by:	David Leimbach (leimy2k@gmail.com)
Reviewed by:	markj
MFC after:	1 week
Relnotes:	sysctl kern.corefile can now include the signal number
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20970
2019-07-16 15:51:09 +00:00
Mark Johnston
7af2abed6a Always use the software DBM bit for now.
r350004 added most of the machinery needed to support hardware DBM
management, but it did not intend to actually enable use of the hardware
DBM bit.

Reviewed by:	andrew
MFC with:	r350004
Sponsored by:	The FreeBSD Foundation
2019-07-16 15:41:09 +00:00
Mark Johnston
32e09b04ce Fix the arm64 page table entry attribute mask.
It did not include the DBM or contiguous bits.

Reported by:	andrew
Reviewed by:	andrew
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-07-16 15:38:01 +00:00
Mark Johnston
9da9cb48e9 Propagate attribute changes during demotion.
After r349117 and r349122, some mapping attribute changes do not trigger
superpage demotion. However, pmap_demote_l2() was not updated to ensure
that the replacement L3 entries carry any attribute changes that
occurred since promotion.

Reported and tested by:	manu
Reviewed by:	alc
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20965
2019-07-16 14:40:49 +00:00
Andriy Gapon
a70e114dc6 bge: check that the bus is a pci bus before using it as such
This fixes the following panic on powerpc:
  pci_get_vendor failed for pcib1 on bus ofwbus0, error = 2

PR:		238730
Reported by:	Dennis Clarke <dclarke@blastwave.org>
Tested by:	Dennis Clarke <dclarke@blastwave.org>
MFC after:	2 weeks
2019-07-16 08:36:49 +00:00
Justin Hibbits
8d00d89228 powerpc: Fix casueword(9) post-r349951
'=' asm constraint marks a variable as write-only.  Because of this, gcc
throws away the initialization of 'res', causing garbage to be returned if
the CAS was successful.  Use '+' to mark res as read/write, so that the
initialization stays in the generated asm.  Also, fix the reservation
clearing stwcx store index register in casueword32, and only do the dummy
store when needed, skip it if the real store has already succeeded.
2019-07-16 03:55:27 +00:00
Alan Cox
43184d8e5c Revert r349973. Upon further reflection, I realized that the comment
deleted by r349973 is still valid on i386.  Restore it.

Discussed with:	   markj
2019-07-16 03:09:03 +00:00
John Baldwin
32451fb9fc Add ptrace op PT_GET_SC_RET.
This ptrace operation returns a structure containing the error and
return values from the current system call.  It is only valid when a
thread is stopped during a system call exit (PL_FLAG_SCX is set).

The sr_error member holds the error value from the system call.  Note
that this error value is the native FreeBSD error value that has _not_
been translated to an ABI-specific error value similar to the values
logged to ktrace.

If sr_error is zero, then the return values of the system call will be
set in sr_retval[0] and sr_retval[1].

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D20901
2019-07-15 21:48:02 +00:00
Ian Lepore
0cc1098a1a In nxprtc(4), use the countdown timer for better timekeeping resolution
on PCx2129 chips too.

The datasheet for the PCx2129 chips says that there is only a watchdog
timer, no countdown timer.  It turns out the countdown timer hardware is
there and works just the same as it does on a PCx2127 chip, except that you
can't use it to trigger an interrupt or toggle an output pin.  We don't need
interrupts or output pins, we only need to read the timer register to get
sub-second resolution.  So start treating the 2129 chips the same as 2127.
2019-07-15 21:47:40 +00:00
Ian Lepore
a134d96ef1 Fix nxprtc(4) on systems that support i2c repeat-start correctly.
An obscure footnote in the datasheets for the PCx2127, PCx2129, and
PCF8523 rtc chips states that the chips do not support i2c repeat-start
operations.  When the driver was originally written and tested, the i2c
bus on that system also didn't support repeat-start and just quietly
turned repeat-start operations into a stop-then-start, making it appear
that the nxprtc driver was working properly.

The repeat-start situation only comes up on reads, so instead of using
the standard iicdev_readfrom(), use a local nxprtc_readfrom(), which is
just a cut-and-pasted copy of iicdev_readfrom(), modified to send two
separate start-data-stop sequences instead of using repeat-start.
2019-07-15 21:40:58 +00:00
John Baldwin
c18ca74916 Don't pass error from syscallenter() to syscallret().
syscallret() doesn't use error anymore.  Fix a few other places to permit
removing the return value from syscallenter() entirely.
- Remove a duplicated assertion from arm's syscall().
- Use td_errno for amd64_syscall_ret_flush_l1d.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D2090
2019-07-15 21:25:16 +00:00
John Baldwin
1af9474b26 Always set td_errno to the error value of a system call.
Early errors prior to a system call did not set td_errno.  This commit
sets td_errno for all errors during syscallenter().  As a result,
syscallret() can now always use td_errno without checking TDP_NERRNO.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D20898
2019-07-15 21:16:01 +00:00
Michael Tuexen
8a3cfbff92 Don't free read control entries, which are still on the stream queue when
adding them the the read queue fails

MFC after:		1 week
2019-07-15 20:45:01 +00:00
Konstantin Belousov
9f7b7da5bf In do_sem2_wait(), balance umtx_key_get() with umtx_key_release() on retry.
Reported by:	ler
Bisected and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	12 days
2019-07-15 19:18:25 +00:00
Mark Johnston
ca2cae0b4d Implement software access and dirty bit management for arm64.
Previously the arm64 pmap did no reference or modification tracking;
all mappings were treated as referenced and all read-write mappings
were treated as dirty.  This change implements software management
of these attributes.

Dirty bit management is implemented to emulate ARMv8.1's optional
hardware dirty bit modifier management, following a suggestion from alc.
In particular, a mapping with ATTR_SW_DBM set is logically writeable and
is dirty if the ATTR_AP_RW_BIT bit is clear.  Mappings with
ATTR_AP_RW_BIT set are write-protected, and a write access will trigger
a permission fault.  pmap_fault() handles permission faults for such
mappings and marks the page dirty by clearing ATTR_AP_RW_BIT, thus
mapping the page read-write.

Reviewed by:	alc
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20907
2019-07-15 17:13:32 +00:00
Mark Johnston
1fd21cb005 pmap_clear_modify() needs to clear PTE_W.
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-07-15 15:45:33 +00:00
Mark Johnston
24074d28ec Fix reference counting in pmap_ts_referenced() on RISC-V.
pmap_ts_referenced() does not necessarily clear the access bit from
all accessed mappings of a given page.  Thus, if a scan of the mappings
needs to be restarted, we should be careful to avoid double-counting
accessed mappings whose access bits were not cleared in a previous
attempt.

Reported by:	alc
Reviewed by:	alc
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20926
2019-07-15 15:43:15 +00:00
Emmanuel Vadot
e52acf6a46 Remove duplicated device firmware entry in generic arm kernel config added in r333191
Submitted by:	Daniel Engberg (daniel.engberg.lists@pyret.net)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D20680
2019-07-15 15:07:55 +00:00
Michael Tuexen
248bd1b80f Add support for MSG_EOR and MSG_EOF in sendmsg() for SCTP.
This is an FreeBSD extension, not covered by Posix.

This issue was found by running syzkaller.

MFC after:		1 week
2019-07-15 14:54:04 +00:00
Michael Tuexen
25fa310a5f Fix socket state handling when freeing an SCTP endpoint.
This issue was found by runing syzkaller.

MFC after:		1 week
2019-07-15 14:52:52 +00:00
Konstantin Belousov
ad038195bd In do_lock_pi(), do not return prematurely.
If umtxq_check_susp() indicates an exit, we should clean the resources
before returning.  Do it by breaking out of the loop and relying on
post-loop cleanup.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	12 days
Differential revision:	https://reviews.freebsd.org/D20949
2019-07-15 08:39:52 +00:00
Konstantin Belousov
40bd868ba7 Correctly check for casueword(9) success in do_set_ceiling().
After r349951, the return code must be checked instead of old == new
comparision.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	12 days
Differential revision:	https://reviews.freebsd.org/D20949
2019-07-15 08:38:01 +00:00
Michael Tuexen
a85b7f125b Improve the input validation for l_linger.
When using the SOL_SOCKET level socket option SO_LINGER, the structure
struct linger is used as the option value. The component l_linger is of
type int, but internally copied to the field so_linger of the structure
struct socket. The type of so_linger is short, but it is assumed to be
non-negative and the value is used to compute ticks to be stored in a
variable of type int.

Therefore, perform input validation on l_linger similar to the one
performed by NetBSD and OpenBSD.

Thanks to syzkaller for making me aware of this issue.

Thanks to markj@ for pointing out that a similar check should be added
to so_linger_set().

Reviewed by:		markj@
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D20948
2019-07-14 21:44:18 +00:00
Konstantin Belousov
b7b6b7a9c5 PR: 239143
Reported and tested by:	Wes Maag <jwmaag@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-07-14 21:08:54 +00:00
Randall Stewart
e5926fd368 This is the second in a number of patches needed to
get BBRv1 into the tree. This fixes the DSACK bug but
is also needed by BBR. We have yet to go two more
one will be for the pacing code (tcp_ratelimit.c) and
the second will be for the new updated LRO code that
allows a transport to know the arrival times of packets
and (tcp_lro.c). After that we should finally be able
to get BBRv1 into head.

Sponsored by:	Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D20908
2019-07-14 16:05:47 +00:00
Michael Tuexen
8a956abe12 When calling sctp_initialize_auth_params(), the inp must have at
least a read lock. To avoid more complex locking dances, just
call it in sctp_aloc_assoc() when the write lock is still held.

Reported by:		syzbot+08a486f7e6966f1c3cfb@syzkaller.appspotmail.com
MFC after:		1 week
2019-07-14 12:04:39 +00:00
Chuck Tuffli
94c15665a5 Fix a typo in r349969
OUI_FRREBSD_NVME_HIGH should have been OUI_FREEBSD_NVME_HIGH

Caught by:	Gary Jennejohn
2019-07-14 03:49:48 +00:00
Cy Schubert
d096fc9ccd Calculate the offset of the interface name using FR_NAME rather than
calclulating it "by hand". This improves consistency with the rest of
the code and is in line with planned fixes and other work.

MFC after:	1 week
2019-07-14 02:46:34 +00:00
Cy Schubert
49a28fbdd2 Recycle the unused FR_CMPSIZ macro which became orphaned in ipfilter 5
prior to its import into FreeBSD. This macro calculates the size to be
compared within the frentry structure. The ipfilter 4 version of the
macro calculated the compare size based upon the static size of the
frentry struct. Today it uses the ipfilter 5 method of calculating the
size based upon the new to ipfilter 5 fr_size value found in the
frentry struct itself.

No effective change in code is intended.

MFC after:	1 week
2019-07-14 02:46:30 +00:00
Cy Schubert
d4af744b6a style(9)
MFC after:	3 days
2019-07-14 02:46:26 +00:00
Alan Somers
07e86257e6 fusefs: fix the build with some NODEBUG kernels
systm.h needs to be included before counter.h

Sponsored by:	The FreeBSD Foundation
2019-07-13 21:41:12 +00:00
Alan Cox
f884dbbcfc Revert r349442, which was a workaround for bus errors caused by an errant
TLB entry.  Specifically, at the start of pmap_enter_quick_locked(), we
would sometimes have a TLB entry for an invalid PTE, and we would need to
issue a TLB invalidation before exiting pmap_enter_quick_locked().  However,
we should never have a TLB entry for an invalid PTE.  r349905 has addressed
the root cause of the problem, and so we no longer need this workaround.

X-MFC after:	r349905
2019-07-13 16:32:19 +00:00
Alan Cox
f138406359 Remove a stale comment.
Reported by:	markj
MFC after:	1 week
2019-07-13 15:53:28 +00:00
Ian Lepore
805eb13a60 Add arm_sync_icache() and arm_drain_writebuf() sysarch syscall wrappers.
NetBSD and OpenBSD have libc wrapper functions for the ARM_SYNC_ICACHE and
ARM_DRAIN_WRITEBUF sysarch operations. This change adds compatible functions
to our library. This should make it easier for various upstream sources to
support *BSD operating systems with a single variation of cache maintence
code in tools like interpreters and JIT compilers.

I consider the argument types passed to arm_sync_icache() to be especially
unfortunate, but this is intended to match the other BSDs.

Differential Revision:	https://reviews.freebsd.org/D20906
2019-07-13 15:34:29 +00:00
Alan Somers
97b0512b23 projects/fuse2: build fixes
* Fix the kernel build with gcc by removing a redundant extern declaration
* In the tests, fix a printf format specifier that assumed LP64

Sponsored by:	The FreeBSD Foundation
2019-07-13 14:42:09 +00:00
Chuck Tuffli
409a80e5a4 bhyve: Create EUI64 for NVMe namespaces
Accept an IEEE Extended Unique Identifier (EUI-64) from the command
line for each NVMe namespace. If one isn't provided, it will create one
based on the CRC16 of:
 - the FreeBSD IEEE OUI
 - PCI bus, device/slot, function values
 - Namespace ID

Reviewed by:	imp, araujo, jhb, rgrimes
Approved by:	imp (mentor), jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D19905
2019-07-13 12:48:28 +00:00
Michael Tuexen
9e44bc22d8 r348494 fixes a race in udp_output(). The same race exists in
udp_output6(), therefore apply a similar patch to IPv6.

Reported by:		syzbot+c5ffbc8f14294c7b0e54@syzkaller.appspotmail.com
Reviewed by:		bz@, markj@
MFC after:		2 weeks
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D20936
2019-07-13 12:45:08 +00:00
Vincenzo Maffione
d7143780ce netmap: fix bug introduced by r349752
r349752 introduced a NULL pointer reference bug
in the emulated netmap code.

Reported by:	lwhsu
MFC after:	3 days
2019-07-13 08:08:25 +00:00
Justin Hibbits
07b57507c9 powerpc64/pmap: No need for moea64_pvo_remove_from_page_locked() wrapper
The only consumer of moea64_pvo_remove_from_page_locked() already has the
page in hand, so there is no need to search for the page while holding the
lock.  Drop the wrapper, and rename _moea64_pvo_remove_from_page_locked().

Reported by:	alc
2019-07-13 03:39:46 +00:00
Justin Hibbits
a7e6ec601a powerpc64/pmap: Reduce scope of PV_LOCK in remove path
Summary:
Since the 'page pv' lock is one of the most highly contended locks, we
need to try to do as much work outside of the lock as we can.  The
moea64_pvo_remove_from_page() path is a low hanging fruit, where we can
do some heavy work (PHYS_TO_VM_PAGE()) outside of the lock if needed.
In one path, moea64_remove_all(), the PV lock is already held and can't
be swizzled, so we provide two ways to perform the locked operation, one
that can call PHYS_TO_VM_PAGE outside the lock, and one that calls with
the lock already held.

Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D20694
2019-07-13 03:02:11 +00:00
Justin Hibbits
21e47be25e Set pcpu curpmap for powerpc64
Summary:
If an illegal instruction is encountered on a process running on a
powerpc64 kernel it would attempt to sync the cache before retrying the
instruction "just in case".  However, since curpmap is not set, when
moea64_sync_icache() attempts to lock the pmap, it's locking on a NULL pointer,
triggering a panic.  Fix this by adding a (assumed unnecessary) fallback to
curthread's pmap in moea64_sync_icache().

Reported by:	alfredo.junior_eldorado.org.br
Reviewed by:	luporl, alfredo.junior_eldorado.org.br
Differential Revision: https://reviews.freebsd.org/D20911
2019-07-13 00:19:57 +00:00
Navdeep Parhar
6620004df5 cxgbe(4): Completely ignore all top level interrupts that are not enabled.
The driver used to log any non-zero cause and when running with a single
line interrupt it would spam the console/logs with reports of interrupts
that are of no interest to anyone.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-07-12 20:59:10 +00:00
Konstantin Belousov
026e450262 Fix syntax.
Nod from:	jhb
Sponsored by:	The FreeBSD Foundation
2019-07-12 19:14:52 +00:00
Konstantin Belousov
30b3018d48 Provide protection against starvation of the ll/sc loops when accessing userpace.
Casueword(9) on ll/sc architectures must be prepared for userspace
constantly modifying the same cache line as containing the CAS word,
and not loop infinitely.  Otherwise, rogue userspace livelocks the
kernel.

To fix the issue, change casueword(9) interface to return new value 1
indicating that either comparision or store failed, instead of relying
on the oldval == *oldvalp comparison.  The primitive no longer retries
the operation if it failed spuriously.  Modify callers of
casueword(9), all in kern_umtx.c, to handle retries, and react to
stops and requests to terminate between retries.

On x86, despite cmpxchg should not return spurious failures, we can
take advantage of the new interface and just return PSL.ZF.

Reviewed by:	andrew (arm64, previous version), markj
Tested by:	pho
Reported by:	https://xenbits.xen.org/xsa/advisory-295.txt
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D20772
2019-07-12 18:43:24 +00:00
Scott Long
422a8a4d3a Tie the name limit of a VM to SPECNAMELEN from devfs instead of a
hard-coded value. Don't allocate space for it from the kernel stack.
Account for prefix, suffix, and separator space in the name. This
takes the effective length up to 229 bytes on 13-current, and 37 bytes
on 12-stable. 37 bytes is enough to hold a full GUID string.

PR:		234134
MFC after:	1 week
Differential Revision:	http://reviews.freebsd.org/D20924
2019-07-12 18:37:56 +00:00
Mark Johnston
194a6e146d Apply some light cleanup to uses of pmap_pte_dirty().
- Check for ATTR_SW_MANAGED before anything else.
- Use pmap_pte_dirty() in pmap_remove_pages().

No functional change intended.

Reviewed by:	alc
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-07-12 15:24:25 +00:00
Randall Stewart
55f795883f add back the comment around the pending DSACK fixes. 2019-07-12 11:45:42 +00:00
Andrey V. Elsukov
2dab0de635 Do not modify cmd pointer if it is already last opcode in the rule.
MFC after:	1 week
2019-07-12 09:59:21 +00:00
Andrey V. Elsukov
4ee2f4c180 Correctly truncate the rule in case when it has several action opcodes.
It is possible, that opcode at the ACTION_PTR() location is not real
action, but action modificator like "log", "tag" etc. In this case we
need to check for each opcode in the loop to find O_EXTERNAL_ACTION.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-07-12 09:48:42 +00:00
Poul-Henning Kamp
ccbb355988 Support multiple serial ports per device.
Enable this for the NovAtel OEMv2 GPS receiver.

Not fixed:  The receiver shows up as "<Interface 0>" in the device
tree, because that is literally what the descriptor-string is.

Reviewed by:	hselasky@
2019-07-12 09:02:12 +00:00
Cy Schubert
75118b47fc Move the new ipf_pcksum6() function from ip_fil_freebsd.c to fil.c.
The reason for this is that ipftest(8), which still works on FreeBSD-11,
fails to link to it, breaking stable/11 builds.

ipftest(8) was broken (segfault) sometime during the FreeBSD-12 cycle.
glebius@ suggested we disable building it until I can get around to
fixing it. Hence this was not caught in -current.

The intention is to fix ipftest(8) as it is used by the netbsd-tests
(imported by ngie@ many moons ago) for regression testing.

MFC after:	immediately
2019-07-12 01:59:08 +00:00
Doug Moore
3f3f7c056f Address problems in blist_alloc introduced in r349777. The swap block allocator could become corrupted
if a retry to allocate swap space, after a larger allocation attempt failed, allocated a smaller set of free blocks
that ended on a 32- or 64-block boundary.

Add tests to detect this kind of failure-to-extend-at-boundary and prevent the associated accounting screwup.

Reported by: pho
Tested by: pho
Reviewed by: alc
Approved by: markj (mentor)
Discussed with: kib
Differential Revision: https://reviews.freebsd.org/D20893
2019-07-11 20:52:39 +00:00
Cy Schubert
c5dddb272d Remove a tautological test for adding a rule in the block that
adds rules.

MFC after:	1 week
2019-07-11 19:36:18 +00:00
Cy Schubert
3133f9c2a3 Correct r349898. The default is add a rule.
MFC after:	1 week
X-MFC with:	r349898
2019-07-11 19:36:14 +00:00
Konstantin Belousov
e2e0470dfa Ensure that mds_handler always points to a valid method.
Depending on system configuration, version, and architecture,
mds_handler might be dereferenced from doreti before
hw_mds_recalculate_boot() initialized it.  Statically assign void
method to cover all cases.

Reported by:	"Schuendehuette, Matthias (LDA IT PLM)" <matthias.schuendehuette@siemens.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-07-11 16:22:49 +00:00
Mark Johnston
a9da8477af Fix some ISS bit definitions for data aborts.
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-07-11 15:36:59 +00:00
Warner Losh
4b1ac5c2d8 More fully implement the state machine.
When a command is finished running, we must transition it from INQUEUE
to busy state. We were failing to do that, so we hit a panic when the
commands were freed. This only affects mpr, mps already did simmilar
things. Now both the polling and interrupt paths properly set BUSY as
appropriate.
2019-07-11 06:22:15 +00:00
Randall Stewart
1cf999a5f3 Update to jhb's other suggestion, use #error when
we are missing  HPTS.
2019-07-11 04:40:58 +00:00
Randall Stewart
9cf3c235c0 Update copyright per JBH's suggestions.. thanks. 2019-07-11 04:38:33 +00:00
Justin Hibbits
9ac516a6f1 powerpc: Only worry about the lower 32 bits of SP in a 32-bit process
Summary:
Running a 32-bit process on a 64-bit POWER CPU may still use all 64-bits
in calculations, while ignoring the upper 32 bits for addressing
storage.  It so happens that some processes end up with r1 (SP) having
bit 31 set in some cases (33-bit address).  Writing out to this 33-bit
address obviosly fails.  Since the CPU ignores the upper bits, we should
as well.

sendsig() and cpu_fetch_syscall_args() appear to be the only functions
that actually rely on userspace register values for copy in/out, and
cpu_fetch_syscall_args() doesn't seem to be bitten in practice yet.

Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D20896
2019-07-11 03:29:25 +00:00
Alan Cox
46a7f2ebd4 According to Section D5.10.3 "Maintenance requirements on changing System
register values" of the architecture manual, an isb instruction should be
executed after updating ttbr0_el1 and before invalidating the TLB.  The
lack of this instruction in pmap_activate() appears to be the reason why
andrew@ and I have observed an unexpected TLB entry for an invalid PTE on
entry to pmap_enter_quick_locked().  Thus, we should now be able to revert
the workaround committed in r349442.

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20904
2019-07-11 02:43:23 +00:00
Cy Schubert
d37052fc86 ipfilter commands, in this case ipf(8), passes its operations and rules
via an ioctl interface. Rules can be added or removed and stats and
counters can be zeroed out. As the ipfilter interprets these
instructions or operations they are stored in an integer called
addrem (add/remove). 1 is add, 2 is remove, and 3 is clear stats and
counters. Much of this is not documented. This commit documents these
operations by replacing simple integers with a self documenting
enum along with a few basic comments.

MFC after:	1 week
2019-07-11 00:08:46 +00:00
Mark Johnston
f84a04c8bc Rename pmap_page_dirty() to pmap_pte_dirty().
This is a precursor to implementing dirty bit management.

Discussed with:	alc
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-07-10 22:52:26 +00:00
Warner Losh
f6ccd325fc Enforce a 4GB DMA boundary on isci(4)
This device cannot cross a 4GB boundary with DMA.  Removing the
boundary in r346386 resulted in low frequency memory corruption on
machines with isci(4) controllers.

Submitted by: gallatin@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20910
2019-07-10 22:23:59 +00:00
Randall Stewart
3b0b41e613 This commit updates rack to what is basically being used at NF as
well as sets in some of the groundwork for committing BBR. The
hpts system is updated as well as some other needed utilities
for the entrance of BBR. This is actually part 1 of 3 more
needed commits which will finally complete with BBRv1 being
added as a new tcp stack.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D20834
2019-07-10 20:40:39 +00:00