Commit Graph

128079 Commits

Author SHA1 Message Date
Konstantin Belousov
4dd892181d freebsd32 shims for copy_file_range(2).
Reviewed by:	brooks, rmacklem (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21092
2019-07-31 19:20:05 +00:00
Konstantin Belousov
fd336e2ac0 Fix handling of transient casueword(9) failures in do_sem_wait().
In particular, restart should be only done when the failure is
transient.  For this, recheck the count1 value after the operation.

Note that do_sem_wait() is older usem interface.

Reported and tested by:	bdrewery
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-07-31 19:16:49 +00:00
Mariusz Zaborski
4d7486c30f gnop: style nits 2019-07-31 17:51:06 +00:00
Mariusz Zaborski
4f80c85519 gnop: Introduce requests delay.
This allows to simulated disk that is responding slowly to the IO requests.

Reviewed by:	markj, bcr, pjd (previous version)
Differential Revision:	https://reviews.freebsd.org/D21052
2019-07-31 17:47:12 +00:00
Ed Maste
c54ee572e5 pf: zero (another) output buffer in pfioctl
Avoid potential structure padding leak.  r350294 identified a leak via
static analysis; although there's no report of a leak with the
DIOCGETSRCNODES ioctl it's a good practice to zero the memory.

Suggested by:	kp
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-07-31 16:58:09 +00:00
Kyle Evans
b5a7ac997f kern_shm_open: push O_CLOEXEC into caller control
The motivation for this change is to allow wrappers around shm to be written
that don't set CLOEXEC. kern_shm_open currently accepts O_CLOEXEC but sets
it unconditionally. kern_shm_open is used by the shm_open(2) syscall, which
is mandated by POSIX to set CLOEXEC, and CloudABI's sys_fd_create1().
Presumably O_CLOEXEC is intended in the latter caller, but it's unclear from
the context.

sys_shm_open() now unconditionally sets O_CLOEXEC to meet POSIX
requirements, and a comment has been dropped in to kern_fd_open() to explain
the situation and add a pointer to where O_CLOEXEC setting is maintained for
shm_open(2) correctness. CloudABI's sys_fd_create1() also unconditionally
sets O_CLOEXEC to match previous behavior.

This also has the side-effect of making flags correctly reflect the
O_CLOEXEC status on this fd for the rest of kern_shm_open(), but a
glance-over leads me to believe that it didn't really matter.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21119
2019-07-31 15:16:51 +00:00
Alan Cox
43ded0a321 In pmap_advise(), when we encounter a superpage mapping, we first demote the
mapping and then destroy one of the 4 KB page mappings so that there is a
potential trigger for repromotion.  Currently, we destroy the first 4 KB
page mapping that falls within the (current) superpage mapping or the
virtual address range [sva, eva).  However, I have found empirically that
destroying the last 4 KB mapping produces slightly better results,
specifically, more promotions and fewer failed promotion attempts.
Accordingly, this revision changes pmap_advise() to destroy the last 4 KB
page mapping.  It also replaces some nearby uses of boolean_t with bool.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D21115
2019-07-31 05:38:39 +00:00
Mark Johnston
520482f4aa Use VNASSERT() in checked VOP wrappers.
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21120
2019-07-30 22:41:25 +00:00
Ed Maste
305b9efefc linuxulator: rename linux_locore.s to .asm
It is assembled using "${CC} -x assembler-with-cpp", which by convention
(bsd.suffixes.mk) uses the .asm extension.

This is a portion of the review referenced below (D18344).  That review
also renamed linux_support.s to .S, but that is a functional change
(using the compiler's integrated assembler instead of as) and will be
revisited separately.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18344
2019-07-30 17:18:31 +00:00
Mark Johnston
49c3e8c8d1 Enable witness(4) blessings.
witness has long had a facility to "bless" designated lock pairs.  Lock
order reversals between a pair of blessed locks are not reported upon.
We have a number of long-standing false positive LOR reports; start
marking well-understood LORs as blessed.

This change hides reports about UFS vnode locks and the UFS dirhash
lock, and UFS vnode locks and buffer locks, since those are the two that
I observe most often.  In the long term it would be preferable to be
able to limit blessings to a specific site where a lock is acquired,
and/or extend witness to understand why some lock order reversals are
valid (for example, if code paths with conflicting lock orders are
serialized by a third lock), but in the meantime the false positives
frequently confuse users and generate bug reports.

Reviewed by:	cem, kib, mckusick
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21039
2019-07-30 17:09:58 +00:00
Mark Johnston
ed13ff4549 Regenerate after r350447. 2019-07-30 16:01:16 +00:00
Mark Johnston
f30f7b9870 Enable copy_file_range(2) in capability mode.
copy_file_range() operates on a pair of file descriptors; it requires
CAP_READ for the source descriptor and CAP_WRITE for the destination
descriptor.

Reviewed by:	kevans, oshogbo
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21113
2019-07-30 15:59:44 +00:00
Mark Johnston
0b21d89499 Handle refcount(9) wraparound.
Attempt to mitigate the security risks around refcount overflows by
introducing a "saturated" state for the counter.  Once a counter reaches
INT_MAX+1, subsequent acquire and release operations will blindly set
the counter value to INT_MAX + INT_MAX/2, ensuring that the protected
resource will not be freed; instead, it will merely be leaked.

The approach introduces a small race: if a refcount value reaches
INT_MAX+1, a subsequent release will cause the releasing thread to set
the counter to the saturation value after performing the decrement.  If
in the intervening window INT_MAX refcount releases are performed by a
different thread, a use-after-free is possible.  This is very difficult
to trigger in practice, and any situation where it could be triggered
would likely be vulnerable to reference count wraparound problems
to begin with.  An alternative would be to use atomic_cmpset to acquire
and release references, but this would introduce a larger performance
penalty, particularly when the counter is contended.

Note that refcount_acquire_checked(9) maintains its previous behaviour;
code which must accurately track references should use it instead of
refcount_acquire(9).

Reviewed by:	kib, mjg
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21089
2019-07-30 15:57:31 +00:00
Ruslan Bukin
250cbedd1e Fix MMCCAM kernel build.
Sponsored by:	DARPA, AFRL
2019-07-30 14:21:00 +00:00
Ruslan Bukin
f808f2ce3e Add support for the SD/MMC controller found in Terasic DE10-Pro
(an Intel Stratix 10 GX/SX FPGA Development Kit).

Set the bus speed manually due to lack of clock management support.

Sponsored by:	DARPA, AFRL
2019-07-30 12:51:14 +00:00
Xin LI
a2f17b9dce Bump __FreeBSD_version after removal of gzip'ed a.out support. 2019-07-30 05:14:28 +00:00
Xin LI
d4565741c6 Remove gzip'ed a.out support.
The current implementation of gzipped a.out support was based
on a very old version of InfoZIP which ships with an ancient
modified version of zlib, and was removed from the GENERIC
kernel in 1999 when we moved to an ELF world.

PR:		205822
Reviewed by:	imp, kib, emaste, Yoshihiro Ota <ota at j.email.ne.jp>
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D21099
2019-07-30 05:13:16 +00:00
Marcelo Araujo
145b1792a2 Fix sound on headset jack for ALC255 and ALC256 codec.
PR:		219350 [1], [2]
Submitted by:	Masachika ISHIZUKA (ish_at_amail.plala.or.jp) [1]
		Neel Chauhan (neel_at_neelc.org) [2]
		uri Momotyuk (yurkis_at_gmail.com) [3]
Reported by:	miwi
Reviewed by:	mav
Obtained from:	https://github.com/trueos/trueos/pull/279 [3]
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19017
2019-07-30 02:01:49 +00:00
Mark Johnston
98549e2dc6 Centralize the logic in vfs_vmio_unwire() and sendfile_free_page().
Both of these functions atomically unwire a page, optionally attempt
to free the page, and enqueue or requeue the page.  Add functions
vm_page_release() and vm_page_release_locked() to perform the same task.
The latter must be called with the page's object lock held.

As a side effect of this refactoring, the buffer cache will no longer
attempt to free mapped pages when completing direct I/O.  This is
consistent with the handling of pages by sendfile(SF_NOCACHE).

Reviewed by:	alc, kib
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20986
2019-07-29 22:01:28 +00:00
Mariusz Zaborski
7244507616 seqc: add man page
Reviewed by:	markj
Earlier version reviewed by:	emaste, mjg, bcr, 0mp
Differential Revision:	https://reviews.freebsd.org/D16744
2019-07-29 21:53:02 +00:00
Mariusz Zaborski
9db97ca0bd proc: make clear_orphan an public API
This will be useful for other patches with process descriptors.
Change its name as well.

Reviewed by:	markj, kib
2019-07-29 21:42:57 +00:00
Mark Johnston
1587dc37fa Have arm64's pmap_fault() handle WnR faults on dirty PTEs.
If we take a WnR permission fault on a managed, writeable and dirty
PTE, simply return success without calling the main fault handler.  This
situation can occur if multiple threads simultaneously access a clean
writeable mapping and trigger WnR faults; losers of the race to mark the
PTE dirty would end up calling the main fault handler, which had no work
to do.

Reported by:	alc
Reviewed by:	alc
MFC with:	r350004
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21097
2019-07-29 21:21:53 +00:00
Alan Somers
0367bca479 sendfile: don't panic when VOP_GETPAGES_ASYNC returns an error
This is a partial merge of 350144 from projects/fuse2

PR:		236466
Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21095
2019-07-29 20:50:26 +00:00
Mark Johnston
96e90e5f14 Remove an unneeded trunc_page() in pmap_fault().
Reported by:	alc
MFC with:	r350004
Sponsored by:	The FreeBSD Foundation
2019-07-29 20:31:28 +00:00
Mark Johnston
918988576c Avoid relying on header pollution from sys/refcount.h.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-07-29 20:26:01 +00:00
Ruslan Bukin
4be6714234 Add glue driver for Altera SOCFPGA Ethernet MAC (EMAC) found in
Terasic DE10-Pro (an Intel Stratix 10 GX/SX FPGA Development Kit).

The Altera EMAC is an instance of Synopsys DesignWare Gigabit MAC.

This driver sets correct clock range for MDIO interface on Intel Stratix 10
platform.

This is required due to lack of support for clock manager device for
this platform that could tell us the clock frequency value for ethernet
clock domain.

Sponsored by:	DARPA, AFRL
2019-07-29 16:32:23 +00:00
Andrey V. Elsukov
e758846c09 dd ipfw_get_action() function to get the pointer to action opcode.
ACTION_PTR() returns pointer to the start of rule action section,
but rule can keep several rule modifiers like O_LOG, O_TAG and O_ALTQ,
and only then real action opcode is stored.

ipfw_get_action() function inspects the rule action section, skips
all modifiers and returns action opcode.

Use this function in ipfw_reset_eaction() and flush_nat_ptrs().

MFC after:	1 week
Sponsored by:	Yandex LLC
2019-07-29 15:09:12 +00:00
Kristof Provost
52bb6100e9 riscv: Fix copyin/copyout
r343275 introduced a performance optimisation to the copyin/copyout
routines by attempting to copy word-per-word rather than byte-per-byte
where possible.

This optimisation failed to account for cases where the buffer is longer
than XLEN_BYTES, but due to misalignment does not not allow for any
word-sized copies. E.g. a 9 byte buffer (with XLEN_BYTES == 8) which is
misaligned by 2 bytes. The code nevertheless did a single full-word
copy, which meant we copied too much data. This potentially clobbered
other data.

This is most easily demonstrated by a simple `sysctl -a`.

Fix it by not assuming that we'll always have at least one full-word
copy to do, but instead checking the remaining length first.

Reviewed by:	markj@, mhorne@, br@ (previous version)
MFC after:	1 week
Sponsored by:	Axiado
Differential Revision:	https://reviews.freebsd.org/D21100
2019-07-29 14:59:14 +00:00
Ruslan Bukin
af77cd7584 Find the correct node of PHY chip using "phy-handle" property of
ethernet MAC node.

This fixes operation on Terasic DE10-Pro (Intel Stratix 10 GX/SX
FPGA Development Kit).

Sponsored by:	DARPA, AFRL
2019-07-29 14:58:29 +00:00
Kristof Provost
f287767d4f pf: Remove partial RFC2675 support
Remove our (very partial) support for RFC2675 Jumbograms. They're not
used, not actually supported and not a good idea.

Reviewed by:	thj@
Differential Revision:	https://reviews.freebsd.org/D21086
2019-07-29 13:21:31 +00:00
Andrey V. Elsukov
d4e6a52959 Avoid possible lock leaking.
After r343619 ipfw uses own locking for packets flow. PULLUP_LEN() macro
is used in ipfw_chk() to make m_pullup(). When m_pullup() fails, it just
returns via `goto pullup_failed`. There are two places where PULLUP_LEN()
is called with IPFW_PF_RLOCK() held.

Add PULLUP_LEN_LOCKED() macro to use in these places to be able release
the lock, when m_pullup() fails.

Sponsored by:	Yandex LLC
2019-07-29 12:55:48 +00:00
Emmanuel Vadot
0ce33c042e arm: ti: cpsw: Check the new slave node address
Since DTS from >= Linux 5.0 the slave address are relative to the parent
node address and aren't the full ones.
Check both so the cpsw driver can find the phy id.
2019-07-29 10:42:15 +00:00
Emmanuel Vadot
7be976a88e arm: ti: Get the hwmods property either from the node or the parent
r350229 changed the code to lookup the ti,hwmods property in the parent
as it's now like that in the DTS from >= Linux 5.0, allow the property
to be also in the node itself so we can boot with an older DTB.

Reported by:	"Dr. Rolf Jansen" <rj@obsigna.com>
2019-07-29 10:40:51 +00:00
Michael Tuexen
bb63f59b22 When performing after_idle() or post_recovery(), don't disable the
DCTCP specific methods. Also fallthrough NewReno for non ECN capable
TCP connections and improve the integer arithmetic.

Obtained from:		Richard Scheffenegger
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D20550
2019-07-29 09:19:48 +00:00
Michael Tuexen
333ba164d6 * Improve input validation of sysctl parameters for DCTPC.
* Initialize the alpha parameter to a conservative value (like Linux)
* Improve handling of arithmetic.
* Improve man-page

Obtained from:		Richard Scheffenegger
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D20549
2019-07-29 08:50:35 +00:00
Alexander Motin
8de2d8c009 Add some new fields and bits from NVMe 1.4.
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-07-29 03:28:46 +00:00
Hans Petter Selasky
f9e3413a3c Add support for tethering with Nokia 7 plus and the alike.
PR:		239495
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-07-28 21:47:04 +00:00
Alexander Motin
d7c1da6153 Decode some more IDENTIFY DEVICE bits.
MFC after:	2 weeks
2019-07-28 20:17:40 +00:00
Doug Moore
23612f0df3 In swap_pager_putpages, move the initialization of a free-blocks
counter, and the final freeing of freed swap blocks, outside the
region where an object lock is held.  Correct some style(9) and
spelling errors.  Change a panic() to a KASSERT().  Change a boolean_t
to a bool.

Suggested by: alc
Reviewed by: alc
Approved by: kib, markj (mentors)
Differential Revision: https://reviews.freebsd.org/D21093
2019-07-28 19:32:23 +00:00
Alan Somers
e7d8ebc8ca Better comments for vlrureclaim
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-07-28 16:07:27 +00:00
Alan Somers
2240d8c465 Add v_inval_buf_range, like vtruncbuf but for a range of a file
v_inval_buf_range invalidates all buffers within a certain LBA range of a
file. It will be used by fusefs(5). This commit is a partial merge of
r346162, r346606, and r346756 from projects/fuse2.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21032
2019-07-28 00:48:28 +00:00
Alexander Motin
1173e5a721 Reenable UNMAP support on ramdisks by default.
For some reason, I guess just mechanical editing, it was disable in r333446.

MFC after:	2 weeks
2019-07-27 18:07:46 +00:00
Alexander Motin
4b9fba0cc5 Allow WRITE SAME handle more then 2^^32 blocks.
If not limited by write_same_max_lba option, split operation into several
2^^31 blocks chunks in a loop.  For large disks it may take a while, so
setting write_same_max_lba may be useful to avoid timeouts.

While there, fix build with CAM_CTL_DEBUG.

MFC after:	2 weeks
2019-07-27 17:27:26 +00:00
Warner Losh
0d212ef0cd Remove support for kernel.tramp and kernel.tramp.gz
Nothing uses these anymore. They were for super small armv4 boards without
uboot. We removed armv4 support before 13.0, but neglected to garbage collect
this at the same time. Today, both flavors of armv5 kernels (mv and ralink) boot
via uboot which has its own compression scheme for boards that need it.

Note: OLDFILES has not been updated beacuse installkernel will move the whole
directory out of the way before installing the new kernel.

Differential Revision: https://reviews.freebsd.org/D21072
2019-07-27 17:24:19 +00:00
Emmanuel Vadot
1f3392ed10 arm: Fix TEGRA124 kernel
Since r350162 device syscon is needed for sdhci driver.
Add it to the config file.

Reported by:	dim
2019-07-27 15:04:10 +00:00
Andrew Rybchenko
7daf1fed80 sfxge(4): unify power of 2 alignment check macro
Substitute driver-defined IS_P2ALIGNED() with EFX_IS_P2ALIGNED()
defined in libefx.

Add type argument and cast value and alignment to one specified type.

Reported by:    Andrea Valsania <andrea.valsania at answervad.it>
Reviewed by:    philip
Sponsored by:   Solarflare Communications, Inc.
MFC after:      2 days
Differential Revision:  https://reviews.freebsd.org/D21076
2019-07-27 09:36:45 +00:00
Andrew Rybchenko
e561c5fe44 sfxge(4): fix align to power of 2 when align has smaller type
Substitute driver-defined P2ALIGN() with EFX_P2ALIGN() defined in
libefx.

Cast value and alignment to one specified type to guarantee result
correctness.

Reported by:    Andrea Valsania <andrea.valsania at answervad.it>
Reviewed by:	philip
Sponsored by:   Solarflare Communications, Inc.
MFC after:      2 days
Differential Revision:  https://reviews.freebsd.org/D21075
2019-07-27 09:36:36 +00:00
Andrew Rybchenko
ec30f0bec6 sfxge(4): fix power of 2 round up when align has smaller type
Substitute driver-defined P2ROUNDUP() h with EFX_P2ROUNDUP()
defined in libefx.

Cast value and alignment to one specified type to guarantee result
correctness.

Reported by:	Andrea Valsania <andrea.valsania at answervad.it>
Reviewed by:    philip
Sponsored by:   Solarflare Communications, Inc.
MFC after:      2 days
Differential Revision:  https://reviews.freebsd.org/D21074
2019-07-27 09:36:27 +00:00
Rick Macklem
b4c9955e41 Lock the vnode before calling ufs_bmap_seekdata().
r346932 replaced a call to vn_bmap_seekhole() with a call to
ufs_bmap_seekdata(). Although vn_bmap_seekhole() locks the vnode,
ufs_bmap_seekdata() assumes it is already locked.
This patch adds locking of the vnode before the ufs_bmap_seekdata() call.
If the vn_lock() call fails, it returns EBADF since that is the normal
error returned when a file system is forced dismounted and is already
listed as an error return in the lseek(2) man page.

Discussed with:	markj
Reviewed by:	kib
2019-07-27 01:52:34 +00:00
Kristof Provost
776d3d5924 virtio: Fix running on machines with memory above 0xffffffff
We want to allocate a contiguous memory block anywhere in memory, but
expressed this as having to be between 0 and 0xffffffff. This limits us
on 64-bit machines, and outright breaks on machines where memory is
mapped above that address range.

Allow the full address range to be used for this allocation.

Sponsored by:	Axiado
2019-07-26 19:16:02 +00:00
Alexander Motin
ed3bf01599 Add support for Long LBA mode parameter block descriptor.
It is formally required for SBC Base 2016 feature set.

MFC after:	2 weeks
2019-07-26 19:14:12 +00:00
Alan Cox
b3d6ea5bc2 Implement pmap_advise(). (Without a working pmap_advise() implementation
madvise(MADV_DONTNEED) and madvise(MADV_FREE) are NOPs.)

Reviewed by:	markj
X-MFC after:	r350004
Differential Revision:	https://reviews.freebsd.org/D21062
2019-07-26 05:07:09 +00:00
Alexander Motin
ae8828bad1 Add device temperature reporting into CTL.
The values to report can be set via LUN options.  It can be useful for
testing, and also required for Drive Maintenance 2016 feature set.

MFC after:	2 weeks
2019-07-26 03:49:16 +00:00
Alexander Motin
0ea67e7019 Add reporting of SCSI Feature Sets VPD page from SPC-5.
CTL implements all defined feature sets except Drive Maintenance 2016,
which is not very applicable to such a virtual device, and implemented
only partially now.  But may be it could be fixed later at least for
completeness.

MFC after:	2 weeks
2019-07-26 01:49:28 +00:00
Kyle Evans
0dbac71f19 if_tuntap(4): Add TUNGIFNAME
This effectively just moves TAPGIFNAME into common ioctl territory.

MFC after:	3 days
2019-07-25 22:23:34 +00:00
Alan Cox
5d18382b72 Simplify the handling of superpages in pmap_clear_modify(). Specifically,
if a demotion succeeds, then all of the 4KB page mappings within the
superpage-sized region must be valid, so there is no point in testing the
validity of the 4KB page mapping that is going to be write protected.

Deindent the nearby code.

Reviewed by:	kib, markj
Tested by:	pho (amd64, i386)
X-MFC after:	r350004 (this change depends on arm64 dirty bit emulation)
Differential Revision:	https://reviews.freebsd.org/D21027
2019-07-25 22:02:55 +00:00
Warner Losh
08a607e0f3 Widen the type for to.
The timeout field in the CAPS register is defined to be 8 bits, so its type was
uint8_t. We recently started adding 1 to it to cope with rogue devices that
listed 0 timeout time (which is impossible). However, in so doing, other devices
that list 0xff (for a 2 minute timeout) were broken when adding 1
overflowed. Widen the type to be uint32_t like its source register to avoid the
issue.

Reported by: bapt@
2019-07-25 20:26:21 +00:00
Alexander Motin
c15a591cbd Make camcontrol sanitize support also ATA devices.
ATA sanitize is functionally identical to SCSI, just uses different
initiation commands and status reporting mechanism.

While there, make kernel better handle sanitize commands and statuses.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-07-25 18:48:31 +00:00
Leandro Lupori
6962841780 powerpc: Improve pvo allocation code
Added allocation retry loop in alloc_pvo_entry(), to wait for
memory to become available if the caller specifies the M_WAITOK flag.

Also, the loop in moa64_enter() was removed, as moea64_pvo_enter()
never returns ENOMEM. It is alloc_pvo_entry() memory allocation that
can fail and must be retried.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D21035
2019-07-25 15:27:05 +00:00
Rick Macklem
013e5361d6 r350320 committed the wrong version of generated syscall.mk.
This commit is for the correct version. (The incorrect one had the order
of the last two entries reversed, due to my testing with copy_file_range
at 568 instead of 569.) This misordering should not have been a problem,
but is now fixed.
2019-07-25 06:48:30 +00:00
Rick Macklem
4c50e495ff Update the generated syscall.mk for copy_file_range(2).
I missed this file for commit r350316.
2019-07-25 06:35:21 +00:00
Rick Macklem
bf499e87f5 Update the generated syscall files for copy_file_range(2) added by
r350315.
2019-07-25 05:55:55 +00:00
Rick Macklem
bbbbeca3e9 Add kernel support for a Linux compatible copy_file_range(2) syscall.
This patch adds support to the kernel for a Linux compatible
copy_file_range(2) syscall and the related VOP_COPY_FILE_RANGE(9).
This syscall/VOP can be used by the NFSv4.2 client to implement the
Copy operation against an NFSv4.2 server to do file copies locally on
the server.
The vn_generic_copy_file_range() function in this patch can be used
by the NFSv4.2 server to implement the Copy operation.
Fuse may also me able to use the VOP_COPY_FILE_RANGE() method.

vn_generic_copy_file_range() attempts to maintain holes in the output
file in the range to be copied, but may fail to do so if the input and
output files are on different file systems with different _PC_MIN_HOLE_SIZE
values.

Separate commits will be done for the generated syscall files and userland
changes. A commit for a compat32 syscall will be done later.

Reviewed by:	kib, asomers (plus comments by brooks, jilles)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D20584
2019-07-25 05:46:16 +00:00
Justin Hibbits
7c382eea30 powerpc/pmap64: Make moea64 statistics optional
Summary:
It turns out statistics accounting is very expensive in the pmap driver,
and doesn't seem necessary in the common case.  Make this optional
behind a MOEA64_STATS #define, which one can set if they really need
statistics.

This saves ~7-8% on buildworld time on a POWER9.

Found by bdragon.

Reviewed by:	luporl
Differential Revision: https://reviews.freebsd.org/D20903
2019-07-25 03:47:27 +00:00
Mark Johnston
2fb62b1a46 Fix the turnstile_lock() KPI.
turnstile_{lock,unlock}() were added for use in epoch.  turnstile_lock()
returned NULL to indicate that the calling thread had lost a race and
the turnstile was no longer associated with the given lock, or the lock
owner.  However, reader-writer locks may not have a designated owner,
in which case turnstile_lock() would return NULL and
epoch_block_handler_preempt() would leak spinlocks as a result.

Apply a minimal fix: return the lock owner as a separate return value.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21048
2019-07-24 23:04:59 +00:00
Mark Johnston
a76f78dc3f Remove cap_random(3).
Now that we have a way to obtain entropy in capability mode
(getrandom(2)), libcap_random is obsolete.  Remove it.

Bump __FreeBSD_version in case anything happens to use it, though I've
found no consumers.

Reviewed by:	delphij, emaste, oshogbo
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21033
2019-07-24 22:50:43 +00:00
Eric Joyner
7f3f6aad3e iflib: fix dangling device softc pointer
Commit text by Jake:
If a driver's IFDI_ATTACH_PRE function fails, the iflib_device_register
function will free the ctx pointer. However, it does not reset the
device softc pointer to NULL.

This will result in memory corruption as a future access to the now
invalid pointer will corrupt memory that is later allocated on top of
the same memory location.

The iflib_device_deregister function correctly resets the softc pointer
by using device_set_softc().

This clears up the invalid dangling pointer and prevents memory
corruption that could lead to a panic or undefined behavior if the
device's driver failed to attach.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21003
2019-07-24 21:43:41 +00:00
Ed Maste
5e3ccbd9ac enable ig4_acpi on aarch64
The already-listed APMC0D0F ID belongs to the Ampere eMAG aarch64
platform, but ACPI support was not even built on aarch64.

Submitted by:	Greg V <greg_unrelenting.technology>
Differential Revision:	https://reviews.freebsd.org/D21059
2019-07-24 21:26:17 +00:00
Ed Maste
532bc58628 pf: zero output buffer in pfioctl
Avoid potential structure padding leak.

Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	kp
MFC after:	3 days
Security:	Potential kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-07-24 16:51:14 +00:00
Kirill Ponomarev
b7592822d5 Allow set MTU more than 1500 bytes.
Submitted by:	Alexandr Fedorov <aleksandr.fedorov_itglobal_dot_com>
Approved by:	jhb, rgrimes
Sponsored by:	ITGlobal.com
Differential Revision:	https://reviews.freebsd.org/D19422
2019-07-24 16:10:20 +00:00
Mark Johnston
e020a35f5b Remove a redundant offset computation in elf_load_section().
With r344705 the offset is always zero.

Submitted by:	Wuyang Chung <wuyang.chung1@gmail.com>
2019-07-24 15:18:05 +00:00
Michael Tuexen
d21036e033 Add a sysctl variable ts_offset_per_conn to change the computation
of the TCP TS offset from taking the IP addresses and the TCP port
numbers into account to a version just taking only the IP addresses
into account. This works around broken middleboxes or endpoints.
The default is to keep the behaviour, which is also the behaviour
recommended in RFC 7323.

Reported by:		devgs@ukr.net
Reviewed by:		rrs@
MFC after:		2 weeks
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D20980
2019-07-23 21:28:20 +00:00
Ed Maste
051e692a99 mqueuefs: fix struct file leak
In some error cases we previously leaked a stuct file.

Submitted by:	mjg, markj
2019-07-23 20:59:36 +00:00
Michael Tuexen
9a4f1a2492 Don't hold a mutex while calling sbwait. This was found by syzkaller.
Submitted by:		rrs@
Reported by:		markj@
MFC after:		1 week
2019-07-23 18:31:07 +00:00
Eric Joyner
2dc2d58035 ixgbe(4): Fix enabling/disabling and reconfiguration of queues
- Wrong order of casting and bit shift caused that enabling and disabling
  queues didn't work properly for queues number larger than 32. Use literals
  with right suffix instead.

- TX ring tail address was not updated during reinitiailzation of TX
  structures. It could block sending traffic.

- Also remove unused variables 'eims' and 'active_queues'.

Submitted by:	Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by:	erj@
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D20826
2019-07-23 18:14:32 +00:00
Michael Tuexen
1a91d6987c Fix a LOR in SCTP which was found by running syzkaller.
Submitted by:		rrs@
Reported by:		markj@
MFC after:		1 week
2019-07-23 18:07:36 +00:00
Andrew Turner
ac4e582795 As with r350241 use the new UL macro on the main register mask.
MFC after:	1 week
Sponsored by:	DARPA, AFRL
2019-07-23 14:52:46 +00:00
Andrew Turner
f31c5955e2 Ensure the arm64 ID register fields are 64 bit types.
Previously only some of the ID register fields were 64 bit. To allow
for a script to generate these mark them all 64 bit. To allow for their
use in assembly we need to use the UINT64_C macro via a new UL macro
to stop the lines from being too long.

MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D20977
2019-07-23 14:40:37 +00:00
Andrey V. Elsukov
455d2ecb71 Eliminate rmlock from ipfw's BPF code.
After r343631 pfil hooks are invoked in net_epoch_preempt section,
this allows to avoid extra locking. Add NET_EPOCH_ASSER() assertion
to each ipfw_bpf_*tap*() call to require to be called from inside
epoch section.

Use NET_EPOCH_WAIT() in ipfw_clone_destroy() to wait until it becomes
safe to free() ifnet. And use on-stack ifnet pointer in each
ipfw_bpf_*tap*() call to avoid NULL pointer dereference in case when
V_*log_if global variable will become NULL during ipfw_bpf_*tap*() call.

Sponsored by:	Yandex LLC
2019-07-23 12:52:36 +00:00
Alexander Motin
76d843dab2 Make CAM ATA stack handle disk resizes.
While for ATA disks resize is even more rare situation than for SCSI, it
may happen in case of HPA or AMA being used.  Make ATA XPT report minor
IDENTIFY DATA change to upper layers with AC_GETDEV_CHANGED, and ada(4)
periph driver handle that event, recalculating all the disk properties and
signalling resize to GEOM.  Since ATA has no mechanism of UNIT ATTENTIONs,
like SCSI, it has no way to detect that something has changed.  That is why
this functionality depends on explicit reprobe via XPT_REPROBE_LUN call.

MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
2019-07-23 02:11:14 +00:00
Justin Hibbits
b75bd60a04 powerpc: Unbreak 64-bit pmap from 350206
oldpvo is never explicitly NULL'd by moea64_pvo_enter(), so don't check for
NULL to do anything, only check error.

PR:		239372
Reported by:	Francis Little
2019-07-22 22:59:50 +00:00
Ian Lepore
39a289c3d5 Correct spelling, partion -> partition. 2019-07-22 22:41:44 +00:00
Emmanuel Vadot
b9305a86e1 arm: ti: Add a driver for ti,sysc bus
ti,sysc is a simple-bus like driver.
Add a driver for it so child nodes can attach.
2019-07-22 21:55:33 +00:00
Emmanuel Vadot
04132e0157 arm: ti: Get the hwmods property from the parent node
Since the Linux 5.0 dts the ti,hwmods property is on the parent
ti.sysc node.
2019-07-22 21:53:58 +00:00
Brooks Davis
c7bacdcc32 ata_xpt: Use the correct union member when accessing valid.
In principle this should not matter as it's a union and they point to
the same memory location but based on the code above we should be
accessing .sata and not .ata.

Submitted by:	arichardson
Reviewed by:	scottl, imp
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D21002
2019-07-22 21:07:58 +00:00
Alan Somers
caaa7cee09 [skip ci] Fix the comment for cache_purge(9)
This is a merge of r348738 from projects/fuse2

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-07-22 21:03:52 +00:00
Michael Tuexen
f1903dc055 Wakeup the application when doing PD-API for unordered DATA chunks.
Work done with rrs@.

MFC after:		1 week
2019-07-22 18:11:35 +00:00
Ruslan Bukin
e8e90fef03 Remove unused header.
Sponsored by:	DARPA, AFRL
2019-07-22 16:50:37 +00:00
Ruslan Bukin
951e058411 o Add support for BERI IOMMU device
o Add an experimental IOMMU support to xDMA framework

The BERI IOMMU device is the part of CHERI device-model project [1]. It
translates memory addresses for various BERI peripherals modelled in
software. It accepts FreeBSD/mips64 page directories format and manages
BERI TLB.

1. https://github.com/CTSRD-CHERI/device-model

Sponsored by:	DARPA, AFRL
2019-07-22 16:01:20 +00:00
Justin Hibbits
5db86748b5 powerpc64/mmu: Make moea64_pvo_enter() return if an entry already exists
Summary:
Instead of searching for a PVO entry before adding, take advantage of
the fact that RB_INSERT() returns NULL if it inserts, and the existing entry if
an entry exists, without inserting a new entry.  This saves an extra tree
traversal in the cases where the PVO does not exist.

Reviewed by:	luporl
Differential Revision: https://reviews.freebsd.org/D20944
2019-07-22 03:11:54 +00:00
Konstantin Belousov
13ff4eb1fa Switch the rest of the refcount(9) functions to bool return type.
There are some explicit comparisions of refcount_release(9) result
with 0/1, which are fine.

Reviewed by:	markj, mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21014
2019-07-21 20:16:48 +00:00
Ian Lepore
d4828bcfc7 Add support for setting the aging/frequency-offset register via sysctl.
The 2127 and 2129 chips support a frequency tuning value in the range of
-7 through +8 PPM; add a sysctl handler to read and set the value.
2019-07-21 17:14:39 +00:00
Alan Cox
f606d835de With the introduction of software dirty bit emulation for managed mappings,
we should test ATTR_SW_DBM, not ATTR_AP_RW, to determine whether to set
PGA_WRITEABLE.  In effect, we are currently setting PGA_WRITEABLE based on
whether the dirty bit is preset, not whether the mapping is writeable.
Correct this mistake.

Reviewed by:	markj
X-MFC with:	r350004
Differential Revision:	https://reviews.freebsd.org/D21013
2019-07-21 17:00:19 +00:00
Konstantin Belousov
1004039858 Fix userspace build after r350199.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-07-21 16:24:40 +00:00
Konstantin Belousov
f1cf2b9dcb Check and avoid overflow when incrementing fp->f_count in
fget_unlocked() and fhold().

On sufficiently large machine, f_count can be legitimately very large,
e.g. malicious code can dup same fd up to the per-process
filedescriptors limit, and then fork as much as it can.
On some smaller machine, I see
	kern.maxfilesperproc: 939132
	kern.maxprocperuid: 34203
which already overflows u_int.  More, the malicious code can create
transient references by sending fds over unix sockets.

I realized that this check is missed after reading
https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html

Reviewed by:	markj (previous version), mjg
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D20947
2019-07-21 15:07:12 +00:00
Alan Cox
1a4cb969d1 Introduce pmap_store(), and use it to replace pmap_load_store() in places
where the page table entry was previously invalid.  (Note that I did not
replace pmap_load_store() when it was followed by a TLB invalidation, even
if we are not using the return value from pmap_load_store().)

Correct an error in pmap_enter().  A test for determining when to set
PGA_WRITEABLE was always true, even if the mapping was read only.

In pmap_enter_l2(), when replacing an empty kernel page table page by a
superpage mapping, clear the old l2 entry and issue a TLB invalidation.  My
reading of the ARM architecture manual leads me to believe that the TLB
could hold an intermediate entry referencing the empty kernel page table
page even though it contains no valid mappings.

Replace a couple direct uses of atomic_clear_64() by the new
pmap_clear_bits().

In a couple comments, replace the term "paging-structure caches", which is
an Intel-specific term for the caches that hold intermediate entries in the
page table, with wording that is more consistent with the ARM architecture
manual.

Reviewed by:	markj
X-MFC after:	r350004
Differential Revision:	https://reviews.freebsd.org/D20998
2019-07-21 03:26:26 +00:00
Justin Hibbits
b982c7ee20 powerpc: Remove an unnecessary #ifdef guard from slb.c
slb.c is only compiled for powerpc64, so no need for the #ifdef in this block.
2019-07-21 03:19:54 +00:00
Ian Lepore
2444018f7d Rewrite the nxprtc chip init to extend battery life by using power-saving
features offered by the chips.

For 2127 and 2129 chips, fix the detection of when chip-init is needed.  The
chip config needs to be reset whenever power was lost, but the logic was
wrong for 212x chips (it only worked for 8523).  Now the "oscillator
stopped" bit rather than the power manager mode is used to detect startup
after powerfail.

For all chips, disable the clock output pin.

For chips that have a timestamp/tamper-monitor feature, turn off monitoring
of the timestamp trigger pin.

The 8523, 2127, and 2129 chips have a "power manager" feature that offers
several options.  We've been using the default mode which enables
everything.  Now the code sets the power manager options to

 - direct-switch (when Vdd < Vbat, without extra threshold check)
 - no battery monitor
 - no external powerfail monitor

This reduces the current draw while running on battery from 1930nA to 880nA,
which should roughly double the lifespan of the battery under load.

Because battery checking is a nice thing to have, the code now does a check
at startup, and then once a day after that, instead of checking continuously
(but only actually reporting at startup).  The battery check is now done by
setting the power manager back to default mode, sleeping briefly while it
makes a voltage measurement, then switching back to power-saving mode.
2019-07-20 21:10:27 +00:00
Mark Johnston
b16e57a6c9 Rename vm_page_{import,release}() to vm_page_zone_{import,release}().
I would like to use the name vm_page_release() for a different purpose,
and vm_page_{import,release}() are local to vm_page.c.

Reviewed by:	kib
MFC after:	1 week
2019-07-20 18:25:41 +00:00
Justin Hibbits
cafceaebea powerpc/SPE: Enable SPV bit for EFSCFD instruction emulation
EFSCFD (floating point single convert from double) emulation requires saving
the high word of the register, which uses SPE instructions.  Enable the SPE
to avoid an SPV Unavailable exception.

MFC after:	1 week
2019-07-20 18:22:01 +00:00