137889 Commits

Author SHA1 Message Date
emaste
673399d81f Update comments and ordering in linux*_dummy.c
- sort alphabetically
- getcpu arrived in Linux 2.6.19
- fanotify_* arrived in 2.6.36
2019-09-11 17:56:48 +00:00
emaste
22e36b3874 linuxulator: add stub arm64 linux_genassym.c
This will be fleshed out in the future but allows us to build the arm64
linuxulator using the same infrastructure as x86.
2019-09-11 17:29:44 +00:00
emaste
bc0ee42672 linuxulator: memfd_create first appeared in Linux 3.17
Reference: http://man7.org/linux/man-pages/man2/memfd_create.2.html
2019-09-11 17:05:49 +00:00
emaste
142a3d6513 linuxulator: seccomp syscall first appeared in Linux 3.17
Reference: http://man7.org/linux/man-pages/man2/seccomp.2.html
2019-09-11 17:04:13 +00:00
kp
68aace5a8d riscv: Small fix to CPU compatibility identification
fdt_is_compatible_strict() inspects the first compatible property.
We need to inspect the following properties for 'riscv'.
ofw_bus_node_is_compatible() does a recursive search.

This patch fixes "Can't find CPU" error message when bootverbose = true.

Submitted by:	Nicholas O'Brien (nickisobrien_gmail.com)
Reviewed by:	philip, kp
Sponsored by:	Axiado
Differential Revision:	https://reviews.freebsd.org/D21576
2019-09-11 16:16:53 +00:00
rrs
364ba42474 With the recent commit of ktls, we no longer have a
sb_tls_flags, its just the sb_flags. Also the ratelimit
code, now that the defintion is in sockbuf.h, does not
need the ktls.h file (or its predecessor).

Sponsored by:	Netflix Inc
2019-09-11 15:41:36 +00:00
emaste
341e632828 fw_stub.awk: use @generated tag in generated files
Multiple tools use @generated to identify generated files (for example,
in a review Phabricator will by default hide diffs in enerated files).
Use the @generated tag in makesyscalls.sh as we've done for other
generated files.
2019-09-11 13:35:22 +00:00
emaste
c1fe73ee39 linux: add trivial renameat2 implementation
Just return EINVAL if flags != 0.  The Linux man page documents one
case of EINVAL as "The filesystem does not support one of the flags in
flags."

After r351723 userland binaries will try using new system calls.

Reported by:	mjg
Reviewed by:	mjg, trasz
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21590
2019-09-11 13:01:59 +00:00
emaste
89195b1af4 regen linuxulator sysent after r352208 2019-09-11 12:58:53 +00:00
emaste
d50f11cc69 make linux_renameat2 args consistent with linux_renameat
Use 'dfd' consistently for a directory fd.
2019-09-11 12:58:06 +00:00
hselasky
39803421e2 Use true and false when dealing with bool type in the LinuxKPI.
No functional change.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-09-11 08:24:47 +00:00
hselasky
f2bc2e1d05 Fix synchronous work drain issue in the LinuxKPI.
A work callback may restart itself. Loop in the drain function to see if the
work has been rescheduled and stop the subsequent reschedules, if any.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-09-11 08:20:13 +00:00
hselasky
9bf06d7d6b Fix broken DECLARE_TASKLET() macro after r347852.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-09-11 07:53:49 +00:00
mav
3157c6b68c Fix assumptions of only one device per SES slot.
It is typical to have one, but no longer true for multi-actuator HDDs
with separate LUN for each actuator.

MFC after:	4 days
Sponsored by:	iXsystems, Inc.
2019-09-11 03:25:30 +00:00
ian
0d5f7d7285 In am335x_dmtpps, use a spin mutex to interlock between PPS capture and PPS
ioctl(2) handling.  This allows doing the pps_event() work in the polling
routine, instead of using a taskqueue task to do that work.

Also, add PNPINFO, and switch to using make_dev_s() to create the cdev.

Using a spin mutex and calling pps_event() from the polling function works
around the situation which requires more than 2 sets of timecounter
timehands in a single-core system to get reliable PPS capture.  That problem
would happen when a single-core system is idle in cpu_idle() then gets woken
up with an event timer event which was scheduled to handle a hardclock tick.
That processing path would end up calling tc_windup 3 or 4 times between
when the tc polling function was called and when the taskqueue task would
eventually run, and with only two sets of timehands, the th_generation count
would always be too old to allow the captured PPS data to be used.
2019-09-10 22:08:34 +00:00
mjg
e1f31f93d7 cache: avoid excessive relocking on entry removal during lookup
Due to lock ordering issues (bucket lock held, vnode locks wanted) the code
starts with trylocking which in face of contention often fails. Prior to
the change it would loop back with a possible yield.

Instead note we know what locks are needed and can take them in the right
order, avoiding retries. Then we can safely re-lookup and see if the entry
we are looking for is still there.

On a 104-way box poudriere would result in constant retries during an 11h
run as seen in the vfs.cache.zap_and_exit_bucket_fail counter.

before: 408866592
after :         0

However, a new stat reports:
vfs.cache.zap_and_exit_bucket_relock_success: 32638

Note this is only a bandaid over current design issues.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2019-09-10 20:19:29 +00:00
mjg
5924940444 cache: change the formula for calculating lock array sizes
It used to be mp_ncpus * 64, but this gives unnecessarily big values for small
machines and at the same time constraints bigger ones. In particular this helps
on a 104-way box for which the count is now doubled.

While here make cache_purgevfs less likely. Currently it is not efficient in
face of contention due to lock ordering issues. These are fixable but not worth
it at the moment.

Sponsored by:	The FreeBSD Foundation
2019-09-10 20:11:00 +00:00
mjg
3a41dd1946 cache: assorted cleanups
Sponsored by:	The FreeBSD Foundation
2019-09-10 20:08:24 +00:00
jeff
90a6cf3d6d Replace redundant code with a few new vm_page_grab facilities:
- VM_ALLOC_NOCREAT will grab without creating a page.
 - vm_page_grab_valid() will grab and page in if necessary.
 - vm_page_busy_acquire() automates some busy acquire loops.

Discussed with:	alc, kib, markj
Tested by:	pho (part of larger branch)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21546
2019-09-10 19:08:01 +00:00
jeff
fa230897c4 Use the sleepq lock rather than the page lock to protect against wakeup
races with page busy state.  The object lock is still used as an interlock
to ensure that the identity stays valid.  Most callers should use
vm_page_sleep_if_busy() to handle the locking particulars.

Reviewed by:	alc, kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21255
2019-09-10 18:27:45 +00:00
luporl
009ed2340b Add R_PPC_IRELATIVE relocation
Pre-requisite for most ifunc related changes.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D21587
2019-09-10 16:16:05 +00:00
hselasky
f907290a97 Callout drain does not have to be followed by a callout stop call.
Fix bogus code.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-09-10 14:33:07 +00:00
lwhsu
5be64602ac Fix build for the platforms where db_expr_t is not long
Sponsored by:	The FreeBSD Foundation
2019-09-10 08:51:11 +00:00
cem
08e679900d Appease Clang false-positive Werrors in r352112
Reported by:	bcran
2019-09-10 01:56:47 +00:00
cem
468799be93 ddb(4): Add 'show route <dest>' and 'show routetable [<af>]'
These commands show the route resolved for a specified destination, or
print out the entire routing table for a given address family (or all
families, if none is explicitly provided).

Discussed with:	emaste
Differential Revision:	https://reviews.freebsd.org/D21510
2019-09-09 22:54:27 +00:00
markj
ccbfa8304f Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
emaste
fc5f7b4c45 msdosfsmount.h: fix ifdef comment 2019-09-09 18:35:17 +00:00
cem
17adfbe017 ddb(4): Add some support for lexing IPv6 addresses
Allow commands to specify that (hex) numbers may start with A-F, by adding
the DRT_HEX flag for db_read_token_flags().  As before, numbers containing
invalid digits for the current radix are rejected.

Also, lex ':' and '::' tokens as tCOLON and tCOLONCOLON respectively.

There is a mild conflict here with lexed "identifiers" (tIDENT): ddb
identifiers may contain arbitrary colons, and the ddb lexer is greedy.  So
the identifier lex will swallow any colons it finds inside identifiers, and
consumers are still unable to expect the token sequence 'tIDENT tCOLON'.
That limitation does not matter for IPv6 addresses, because the lexer always
attempts to lex numbers before identifiers.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D21509
2019-09-09 16:32:23 +00:00
cem
718e6295e7 ddb(4): Enhance lexer functionality for specialized commands
Add a db_read_token_flags() variant of db_read_token() with configurable
parameters.

Allow specifying an explicit radix for tNUMBER lexing.  It overrides the
default inference and db_radix setting.

Also provide the option of yielding any lexed whitespace (tWSPACE) (instead
of ignoring it).  This is useful for whitespace-sensitive CS_OWN commands.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D21459
2019-09-09 16:31:14 +00:00
tuexen
f0fb0ad52b Only update SACK/DSACK lists when a non-empty segment was received.
This fixes hitting a KASSERT with a valid packet exchange.

Reviewed by:		rrs@, Richard Scheffenegger
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D21567
2019-09-09 16:07:47 +00:00
kp
af6c5b7e8c riscv: Ensure that BSS is 8-byte aligned
This makes clearing it (from locore.S) work without misaligned accesses
(which can trap to machine mode, and be slow).

Reviewed by:	br
Sponsored by:	Axiado
Differential Revision:	https://reviews.freebsd.org/D21538
2019-09-09 15:57:24 +00:00
kib
9fa66d3539 Initialize timehands linkage much earlier.
Reported and tested by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-09-09 12:42:48 +00:00
kib
a8b7b27cb7 Make timehands count selectable at boottime.
Tested by:	O'Connor, Daniel <darius@dons.net.au>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21563
2019-09-09 11:29:58 +00:00
kib
479eaba412 Remove some unneeded vfs_busy() calls in SU code.
When softdep_fsync() is running, a caller must already started write
for the mount point.  Since unmount or remount to ro suspends mount
point, it cannot run in parallel with softdep_fsync(), which makes
vfs_busy() call there not needed.

Doing blocking vfs_busy() there effectively causes lock order reversal
between vn_start_write() and setting MNTK_UNMOUNT, because
vfs_busy(mp, 0) sleeps waiting for MNTK_UNMOUNT becoming clear, while
unmount sets the flag and starts the suspension.

Note that all other uses of vfs_busy() in SU code are non-blocking.

Reported by:	chs by mckusick
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-09-09 11:22:38 +00:00
pfg
752098d2d5 ral(4): Use unsigned to avoid undefined behavior.
Found by NetBSD's kUBSan

Obtained from:	NetBSD (github 5b153f1)
2019-09-09 03:31:46 +00:00
cem
7527999a72 ddb(4): Move an extern variable declaration to a header
Trivial cleanup, no functional change.
2019-09-09 01:33:45 +00:00
cem
1430355eaa gdb(4): Root a sysctl tree at 'debug.gdb.'
Like debug.ddb and debug.kdb.  Rename 'debug.gdbcons' to 'debug.gdb.cons,'
but leave the old name as a compatibility alias.
2019-09-08 22:52:47 +00:00
mhorne
199d0f1085 Fix compilation of locore.S with clang
The branch from _start to mpentry has to cross a large section of data;
an offset larger than can be specified with a 12-bit branch immediate.
Fix this by converting the branch to an unconditional jump. The gcc
assembler does this conversion silently but it is not done automatically
by clang.

Reported by:	Jeremy Bennett <jeremy.bennett@embecosm.com>
Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21437
2019-09-08 19:53:11 +00:00
mhorne
09e90944bc Remove a duplicate KTR entry
Reviewed by:	markj
Approved by:	markj (mentor)
Differential Revision:	https://reviews.freebsd.org/D21438
2019-09-08 19:46:34 +00:00
mhorne
1c6307c769 RISC-V: fix kernel CFLAGS with clang
Use the -march and -mabi flags for both gcc and clang as they are
compatible. Specify the "medium" code model separately as it goes by the
name "medany" under gcc, although they are equivalent.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D21436
2019-09-08 19:44:21 +00:00
gonzo
00598670ae [rpi] Inherit framebuffer BPP value from the VideoCore firmware
Instead of using hardcoded bpp of 24, obtain current/configured value
from VideoCore. This solves certain problems with Xorg/Qt apps that
require bpp of 32 to work properly. The mode can be forced by setting
framebuffer_depth value in config.txt

PR:		235363
Submitted by:	Steve Peurifoy <ssw01@mathistry.net>
2019-09-08 09:47:21 +00:00
kib
df3d8d3b51 In do_execve(), use shared text vnode lock consistently.
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21560
2019-09-07 16:10:57 +00:00
kib
6e4e3447f1 In do_execve(), clear imgp->textset when restarting for interpreter.
Otherwise, we might left the boolean set, which would affect cleanup
after an error on interpreter activation.

Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21560
2019-09-07 16:05:17 +00:00
kib
8d669d15e5 When loading ELF interpreter, initialize whole nested image_params with zero.
Otherwise we could mishandle imgp->textset.

Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21560
2019-09-07 16:03:26 +00:00
kib
beb54c01ec vm_object_deallocate(): Remove no longer needed code.
We track text mappings explicitly, there is no removal of the text
refs on the object deallocate any more, so tmpfs objects should not be
treated specially. Doing so causes excess deref.

Reported and tested by:	gallatin
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21560
2019-09-07 16:01:45 +00:00
kib
bb0c984706 vm_object_coalesce(): avoid extending any nosplit objects, not only
that which back tmpfs nodes.

Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21560
2019-09-07 15:58:48 +00:00
kib
e09c8a69d2 Properly check for writers when fetching quotas for writeable vnodes
in UFS quotaon().

Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21560
2019-09-07 15:57:23 +00:00
mav
f4dc60c4d5 Supply SAT layer with valid transfer sizes.
This is a rework of r344701, that noticed that number of bytes passes to
8 bit sector count field gets truncated.  First decision was to not pass
anything, since ATA specs define the field as N/A.  But it appeared to be a
problem for some SAT devices, that require information about data transfer
to operate properly.  Some additional investigation shown that it is quite
a common practice to set unused fields of ATA commands (fortunately ATA
specs formally allow it) to supply the information to SAT layer.  I have
found SAS-SATA interposer that does not allow pass-through without it.

As side effect, reduce code duplication by removing ata_do_28bit_cmd()
function, replacing it with more universal ata_do_cmd().

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-09-07 15:56:00 +00:00
philip
d04bb84782 riscv: restore default HZ=1000, keep QEMU at HZ=100
This reverts r351918 and r351919.

Discussed with:	br, ian, imp
2019-09-07 05:13:31 +00:00
imp
a3a09b7094 Some newer HID devices have descriptors that are larger than 1k. Bump
this to 2k to prevent them from being truncated and ignored. It
appears to be a sanity check only, but bumping it to 2k allows both of
my iic hid devices to be parsed and the second one to work...
2019-09-07 03:51:26 +00:00