15573 Commits

Author SHA1 Message Date
alc
6b281b98b3 The *_meta_* functions include a radix parameter, a blk parameter, and
another parameter that identifies a starting point in the memory address
block.  Radix is a power of two, blk is a multiple of radix, and the
starting point is in the range [blk, blk+radix), so that blk can always be
computed from the other two.  This change drops the blk parameter from the
meta functions and computes it instead.  (On amd64, for example, this
change reduces subr_blist.o's text size by 7%.)

It also makes the radix parameters unsigned to address concerns that the
calculation of '-radix' might overflow without the -fwrapv option.  (See
https://reviews.freebsd.org/D11819.)
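
The computation described above can be sketched as follows.  This is a
minimal illustration, not the code from subr_blist.c: the helper name
blk_from_start and the choice of uint64_t are assumptions made for the
example.  It relies only on the stated invariants (radix is a power of two,
blk is a multiple of radix, and the starting point lies in
[blk, blk+radix)).

```c
#include <stdint.h>

/*
 * Hypothetical helper: recover blk from a starting point in
 * [blk, blk + radix).  Assumes radix is a power of two and blk is a
 * multiple of radix.  Because radix is unsigned, -radix is well-defined
 * modular arithmetic and equals ~(radix - 1), a mask that clears the
 * low log2(radix) bits.
 */
static uint64_t
blk_from_start(uint64_t start, uint64_t radix)
{
	return (start & -radix);
}
```

For example, with radix 0x100 a starting point of 0x1234 maps back to
blk 0x1200.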

Submitted by:	Doug Moore <dougm@rice.edu>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11964
2017-08-13 16:39:49 +00:00
markj
bce4478d7d Have sendfile_swapin() use vm_page_grab_pages().
Reviewed by:	alc, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11942
2017-08-11 16:32:24 +00:00
markj
6f4724899b Modify vm_page_grab_pages() to handle VM_ALLOC_NOWAIT.
This will allow its use in sendfile_swapin().

Reviewed by:	alc, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11942
2017-08-11 16:29:22 +00:00
alc
c85a3e68de An invalid page can't be dirty.
Reviewed by:	kib
MFC after:	1 week
2017-08-11 16:27:54 +00:00
andrew
78ce51bf4f Only return the current cpu if it's in the cpumask. When we restrict the
cpumask it probably means we are unable to send interrupts to CPUs outside
the map. As such only return the current CPU when it's within the mask
otherwise return the first valid CPU.

This is needed on ThunderX as, in a dual socket configuration, we are
unable to send MSI/MSI-X interrupts between sockets.
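
The selection rule can be sketched in userland C as below.  The 64-bit
mask and the name pick_cpu are stand-ins chosen for the example; the
kernel uses its own CPU set type and interfaces, which are not reproduced
here.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a CPU set: bit n set means CPU n may receive
 * the interrupt.  Returns the current CPU when it is in the mask,
 * otherwise the lowest-numbered CPU in the mask, or -1 if the mask is
 * empty.
 */
static int
pick_cpu(uint64_t cpumask, int curcpu)
{
	int cpu;

	if (curcpu >= 0 && curcpu < 64 &&
	    (cpumask & (UINT64_C(1) << curcpu)) != 0)
		return (curcpu);
	for (cpu = 0; cpu < 64; cpu++) {
		if ((cpumask & (UINT64_C(1) << cpu)) != 0)
			return (cpu);
	}
	return (-1);
}
```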

Reviewed by:	mmel
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11957
2017-08-11 12:45:58 +00:00
glebius
3eeca31b85 Plug uninitialized stack variable leak in sendfile(2).
Reported by:	Ilja Van Sprundel <ivansprundel ioactive.com>
Submitted by:	Domagoj Stolfa <domagoj.stolfa gmail.com>
MFC after:	1 week
Security:	uninitialized stack variable leak
2017-08-09 17:48:38 +00:00
alc
318304a5b7 Introduce vm_page_grab_pages(), which is intended to replace loops calling
vm_page_grab() on consecutive page indices.  Besides simplifying the code
in the caller, vm_page_grab_pages() allows for batching optimizations.
For example, the current implementation replaces calls to vm_page_lookup()
on consecutive page indices by cheaper calls to vm_page_next().

Reviewed by:	kib, markj
Tested by:	pho (an earlier version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11926
2017-08-09 04:23:04 +00:00
asomers
94c85a3167 Make p1003_1b.aio_listio_max a tunable
p1003_1b.aio_listio_max is now a tunable. Its value is reflected in the
sysctl of the same name, and the sysconf(3) variable _SC_AIO_LISTIO_MAX.
Its value will be bounded from below by the compile-time constant
AIO_LISTIO_MAX and from above by the compile-time constant
MAX_AIO_QUEUE_PER_PROC and the tunable vfs.aio.max_aio_queue.
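
A rough sketch of the bounding described above.  The function name, the
parameter names, and the order in which the bounds are applied are
assumptions for illustration only; the actual limits come from the
kernel's constants and tunables.

```c
/*
 * Hypothetical clamp: floor_val plays the role of the compile-time
 * minimum, while per_proc_cap and queue_cap play the roles of the two
 * upper limits.  Assumption: the upper bound is applied last.
 */
static int
clamp_listio_max(int requested, int floor_val, int per_proc_cap,
    int queue_cap)
{
	int ceiling;

	ceiling = per_proc_cap < queue_cap ? per_proc_cap : queue_cap;
	if (requested < floor_val)
		requested = floor_val;
	if (requested > ceiling)
		requested = ceiling;
	return (requested);
}
```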

Reviewed by:	jhb, kib
MFC after:	3 weeks
Relnotes:	yes
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D11601
2017-08-08 16:14:31 +00:00
br
3364e8aea9 o Replace __riscv__ with __riscv
o Replace __riscv64 with (__riscv && __riscv_xlen == 64)

This is required to support the new GCC 7.1 compiler.
This is compatible with the current GCC 6.1 compiler.

RISC-V is an extensible ISA, and the idea here is to have a built-in define
for each extension, so together with __riscv we will have some subset
of these as well (depending on the -march string passed to the compiler):

__riscv_compressed
__riscv_atomic
__riscv_mul
__riscv_div
__riscv_muldiv
__riscv_fdiv
__riscv_fsqrt
__riscv_float_abi_soft
__riscv_float_abi_single
__riscv_float_abi_double
__riscv_cmodel_medlow
__riscv_cmodel_medany
__riscv_cmodel_pic
__riscv_xlen
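
A small sketch of how a source file might test the new macros.  ARCH_NAME
and arch_name() are hypothetical names used only for this example; the
preprocessor conditions are the ones given in the message above.

```c
/*
 * Illustrative only: ARCH_NAME is a made-up macro, not from the tree.
 * When __riscv is undefined, __riscv_xlen evaluates to 0 inside #if,
 * so the first condition is simply false on other architectures.
 */
#if defined(__riscv) && __riscv_xlen == 64
#define	ARCH_NAME	"riscv64"
#elif defined(__riscv)
#define	ARCH_NAME	"riscv (not 64-bit)"
#else
#define	ARCH_NAME	"not riscv"
#endif

static const char *
arch_name(void)
{
	return (ARCH_NAME);
}
```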

Reviewed by:	ngie
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11901
2017-08-07 14:09:57 +00:00
alc
590490023c In case readers are misled by expressions that combine multiplication and
division, add parentheses to make the precedence explicit.

Submitted by:	Doug Moore <dougm@rice.edu>
Requested by:	imp
Reviewed by:	imp
MFC after:	1 week
X-MFC after:	r321840
Differential Revision:	https://reviews.freebsd.org/D11815
2017-08-04 04:23:23 +00:00
markj
2350ab0562 Amend r321884 to check the refcount and update the class with w_mtx held.
Reviewed by:	jhb
X-MFC with:	r321884
2017-08-01 23:14:38 +00:00
markj
c11beada10 Fix a witness assertion that fires when a lock type's class changes.
When all instances of a lock type are destroyed (for example, after a
module unload), the corresponding witness entry remains associated with
that lock type. In this case, we shouldn't panic if a new instance of the
lock type is created and its lock class does not match that recorded in the
witness entry.

Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11788
2017-08-01 17:50:28 +00:00
alc
48fd0ecbb7 The blist_meta_* routines that process a subtree take arguments 'radix' and
'skip', which denote, respectively, the largest number of blocks that can be
managed by a subtree of that height, and one less than the number of nodes
in a subtree of that height.  This change removes the 'skip' argument from
those functions because 'skip' can be trivially computed from 'radix'.
This change also redefines 'skip' so that it denotes the number of nodes in
the subtree, and so changes loop upper bound tests from '<= skip' to '<
skip' to account for the change.

The 'skip' field is also removed from the blist struct.

The self-test program is changed so that the print command includes the
cursor value in the output.

Submitted by:	Doug Moore <dougm@rice.edu>
MFC after:	1 week
2017-08-01 03:51:26 +00:00
dchagin
b084ee9dfc Implement proper Linux /dev/fd and /proc/self/fd behavior by adding
Linux specific things to the native fdescfs file system.

Unlike FreeBSD, the Linux fdescfs is a directory containing symbolic
links to the actual files that the process has open.
A readlink(2) call on such a link returns a full path in the case of a
regular file, or a string in a special format (type:[inode],
anon_inode:<file-type>, etc.).
As in FreeBSD, opening a file in the Linux fdescfs directory is
equivalent to duplicating the corresponding file descriptor.

Here we have mutually exclusive requirements:
- in case of a readlink(2) call, the fdescfs lookup() method should return a
VLNK vnode, otherwise our kern_readlink() fails with an EINVAL error;
- in all other calls, the fdescfs lookup() method should return a non-VLNK
vnode.

To that end, a new vnode v_flag, VV_READLINK, was added; it is set if fdescfs
has been mounted with the linrdlnk option, and kern_readlinkat() was modified
to handle it properly.

For now, for Linux ABI compatibility, mount the fdescfs volume with the
linrdlnk option:

    mount -t fdescfs -o linrdlnk null /compat/linux/dev/fd

Reviewed by:	kib@
MFC after:	1 week
Relnotes:	yes
2017-08-01 03:40:19 +00:00
markj
2c9a28c567 Batch v_wire_count decrements in vm_hold_free_pages().
Atomic updates to v_wire_count are a significant source of contention, so
combine multiple updates into one in this easy case. Also remove an old
printf that gets executed if the page is shared-busied, which is a case
that will lead to a panic anyway.
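
The batching idea can be illustrated in userland with C11 atomics.  This
is a sketch under stated assumptions: wire_count is a hypothetical counter
standing in for v_wire_count, and the loop bodies are placeholders for the
real per-page work.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical global counter standing in for v_wire_count. */
static atomic_uint wire_count;

/* Per-page form: one atomic read-modify-write per page. */
static void
unwire_each(size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		atomic_fetch_sub(&wire_count, 1);
}

/* Batched form: count locally, then issue a single atomic update. */
static void
unwire_batched(size_t npages)
{
	unsigned int released = 0;

	for (size_t i = 0; i < npages; i++)
		released++;	/* per-page teardown would go here */
	atomic_fetch_sub(&wire_count, released);
}
```

Both forms leave the counter with the same value; the batched form touches
the shared cache line once per call instead of once per page.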

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11791
2017-07-31 18:48:58 +00:00
ian
4b1f61b073 Add clock_schedule(), a feature that allows realtime clock drivers to
request that their clock_settime() methods be called at a given offset
from top-of-second.  This adds a timeout_task to the rtc_instance so that
each clock can be separately added to taskqueue_thread with the scheduling
it prefers, instead of looping through all the clocks at once with a
single task on taskqueue_thread.  If a driver doesn't call clock_schedule()
the default is the old behavior: clock_settime() is queued immediately.

The motivation behind this is that I was on the path of adding identical
code to a third RTC driver to figure out a delta to top-of-second and
sleep for that amount of time because writing the RTC registers resets
the hardware's concept of top-of-second.  (Sometimes it's not top-of-second,
some RTC clocks tick over a half second after you set their time registers.)
Worst-case would be to sleep for almost a full second, which is a rude thing
to do on a shared task queue thread.
2017-07-31 01:18:21 +00:00
markj
e0ef97d135 Correct the predicates on which lockstat:::{thread,spin}-spin fire.
In particular, they should fire only if the lock was owned by another
thread when we first attempted to acquire that lock.

MFC after:	1 week
2017-07-31 00:59:28 +00:00
ian
91a53253d4 Add taskqueue_enqueue_timeout_sbt(), because sometimes you want more control
over the scheduling precision than 'ticks' can offer, and because sometimes
you're already working with sbintime_t units and it's dumb to convert them
to ticks just so they can get converted back to sbintime_t under the hood.
2017-07-31 00:54:50 +00:00
cem
526221c070 kldstat: Use sizeof in place of named constants for sizing
No functional change.

This is handy for FreeBSD derivatives that want to modify the value of
MAXPATHLEN, but not the kld_file_stat ABI.
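
A sketch of why sizing by sizeof helps in that situation.  The struct,
the field size of 1024, and the macro SKETCH_MAXPATHLEN are all
hypothetical; they only model a derivative whose path-length constant no
longer matches a fixed ABI field.

```c
#include <stdio.h>

/* Hypothetical: a derivative has raised its path-length constant... */
#define	SKETCH_MAXPATHLEN	4096

/* ...but this ABI structure keeps its original field size. */
struct file_stat_sketch {
	char pathname[1024];
};

static void
set_name(struct file_stat_sketch *st, const char *src)
{
	/*
	 * sizeof(st->pathname) follows the field itself, so the copy stays
	 * in bounds even though SKETCH_MAXPATHLEN is larger than the field.
	 */
	snprintf(st->pathname, sizeof(st->pathname), "%s", src);
}
```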

Submitted by:	Siddhant Agarwal <sagarwal AT isilon.com>
Sponsored by:	Dell EMC Isilon
2017-07-29 23:31:21 +00:00
kib
380834e269 Make it possible to request nosys logging to console.
New kern.lognosys values are
1 - log to ctty
2 - log to console
3 - log to both.

Inspired by:	eugen
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-07-27 20:45:41 +00:00
kib
d55176ddf0 Make the number of children for pctrie node available outside subr_pctrie.c.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D11435
2017-07-27 16:40:14 +00:00
alc
e0d1fd2306 Change the interactions of the interface functions with the "meta" and
"leaf" functions for alloc, free, and fill.  After the change, the interface
functions call "meta" unconditionally, and the "meta" functions recur
unconditionally in looping over their descendants. The "meta" functions
start with a validity test, and then a test for the "leaf" case, before
falling into the general recursive case.  This simplifies and shrinks the
code, and, for "free" and "fill" moves panic tests that check the same meta
node repeatedly in a loop to a place that will have each node tested once.

Remove irrelevant null checks from blist_free and blist_fill.

Make the code that initializes a meta node the same in blist_meta_alloc and
blist_meta_fill.

Parenthesize return expressions in blst_meta_fill.

Submitted by:	Doug Moore <dougm@rice.edu>
MFC after:	1 week
2017-07-24 17:23:53 +00:00
ian
074825d383 Add common code to support realtime clocks that store year without century.
Most realtime clocks store the year as 2 BCD digits.  Some add a century bit
to extend the range another hundred years.  Every clock driver has its own
code to determine the century and pass a full year value to clock_ct_to_ts().
Now clock drivers can just convert BCD to bin and store the result in the
clocktime struct and let the common code figure out the century.  Clocks
with a century bit can just add 100 to year if the century bit is on.
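
One way such common code could infer the century is sketched below.  The
pivot on 70, the treatment of values up to 199, and the function names are
assumptions made for the example rather than a statement of what the
committed code does.

```c
/* Convert one BCD byte (two decimal digits) to binary. */
static int
bcd_to_bin(unsigned char bcd)
{
	return ((bcd >> 4) * 10 + (bcd & 0x0f));
}

/*
 * Hypothetical century inference: values below 70 are taken as 20xx,
 * values from 70 through 199 as an offset from 1900 (which covers a
 * driver that added 100 for a century bit), and anything larger is
 * assumed to be a full year already.
 */
static int
full_year(int year)
{
	if (year < 70)
		return (year + 2000);
	if (year < 200)
		return (year + 1900);
	return (year);
}
```

With these assumptions, a clock holding BCD 0x17 yields 2017 whether or
not the driver added 100 for a century bit.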
2017-07-23 21:28:00 +00:00
tuexen
b0936e7876 Fix getsockopt() for listening sockets when using SO_SNDBUF, SO_RCVBUF,
SO_SNDLOWAT, SO_RCVLOWAT. Since r31972 it only worked for non-listening
sockets.

Sponsored by:	Netflix, Inc.
2017-07-21 07:44:43 +00:00
ngie
609a66450f Fix whitespace regression accidentally checked in via ^/head@r280149
MFC after:	now
2017-07-18 06:51:27 +00:00
alc
0b8b38898f Tidy up before making another round of functional changes: Remove end-
of-line whitespace, remove excessive whitespace and blank lines, remove
dead code, follow our standard style for function definitions, and
correct grammatical and factual errors in some of the comments.

Submitted by:	Doug Moore <dougm@rice.edu>
MFC after:	1 week
2017-07-17 23:16:33 +00:00
jhb
a9d5e2394e Set the current vnet pointer in the socket buffer AIO handler.
This fixes panics when using AIO under VIMAGE.

Reported by:	kp
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-07-17 16:59:22 +00:00
ian
2f85e4f446 Minor optimization: instead of converting between days and years using loops
that start in 1970, assume most conversions are going to be for recent dates
and use a precomputed number of days through the end of 2016.

This is a do-over of r320997, hopefully this time with 100% more workiness.

The first attempt had an off-by-one error, but instead of just adding
another mysterious +1 adjustment, this rearranges the relationship between
recent_base_year and recent_base_days so that the latter is the number of
days that occurred before the start of the associated year (instead of the
count thru the end of that year).  This makes the recent_base stuff work
more like the original loop logic that didn't need any +1 adjustments.
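
The relationship described above can be sketched as follows.  The macro
and function names and the base year 2017 are chosen for illustration;
the day count 17167 is simply the number of days from 1970-01-01 to
2017-01-01.  The sketch assumes year >= 1970.

```c
#define	POSIX_BASE_YEAR		1970
#define	RECENT_BASE_YEAR	2017	/* illustrative choice */
#define	RECENT_BASE_DAYS	17167	/* days before 2017-01-01 */

static int
is_leap(int year)
{
	return ((year % 4 == 0 && year % 100 != 0) || year % 400 == 0);
}

/* Days elapsed between 1970-01-01 and Jan 1 of 'year'. */
static int
days_before_year(int year)
{
	int days, y;

	if (year >= RECENT_BASE_YEAR) {
		/* Start from the precomputed base; no +1 adjustment. */
		y = RECENT_BASE_YEAR;
		days = RECENT_BASE_DAYS;
	} else {
		y = POSIX_BASE_YEAR;
		days = 0;
	}
	for (; y < year; y++)
		days += is_leap(y) ? 366 : 365;
	return (days);
}
```

Because the base count is "days before the start of the base year", the
recent path and the loop from 1970 share the same loop body.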
2017-07-16 16:54:03 +00:00
markj
f60764cc69 Revert r320918 and have mkdumpheader() handle version string truncation.
Reported by:	jhb
MFC after:	1 week
2017-07-15 20:53:08 +00:00
ian
c9dbb32c87 Revert r320997. There are reports of it getting the wrong results, so
clearly my testing was insufficient, and it's best to just revert it
until I get it straightened out.
2017-07-15 00:45:22 +00:00
brooks
5339426e33 Add 32-bit compat for kinfo_proc's ki_tdaddr.
This appears to have been an oversight in r213536.

Reviewed by:	markj
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11521
2017-07-14 21:13:05 +00:00
ian
fab5d327e8 Minor optimization: instead of converting between days and years using
loops that start in 1970, assume most conversions are going to be for recent
dates and use a precomputed number of days through the end of 2016.
2017-07-14 18:36:15 +00:00
ian
0574231ce5 Allow setting debug.clocktime as a tunable. Print 64-bit time_t correctly
on 32-bit systems.
2017-07-14 18:13:54 +00:00
imp
4859d0d3e7 This adds CAM pass(4) support for NVMe IO's. Applications indicate
the IO type (Admin or NVM) using XPT op-codes XPT_NVME_ADMIN or
XPT_NVME_IO.

Submitted by:   Chuck Tuffli <chuck@tuffli.net>
Differential Revision:  https://reviews.freebsd.org/D10247
2017-07-14 14:52:20 +00:00
kib
2f63b6248f Correct sysent flags for dynamically loaded syscalls.
Using the https://github.com/google/capsicum-test/ suite, the
PosixMqueue.CapModeForked test was failing due to an ECAPMODE after
calling kmq_notify(). On further inspection, the dynamically
loaded syscall entry was initialized with sy_flags zeroed out, since
SYSCALL_INIT_HELPER() left sysent.sy_flags with the default value.

Add a new helper SYSCALL{,32}_INIT_HELPER_F() which takes an
additional argument to specify the sy_flags value.

Submitted by:	Siva Mahadevan <smahadevan@freebsdfoundation.org>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D11576
2017-07-14 09:34:44 +00:00
rlibby
fda9c1d9c4 kvprintf %b enhancements
Make the %b formatter accept number formatting flags. It will now accept
alternate form, precision, and length modifiers. It also now partially
supports field width (but forces left justification).

Reviewed by:	markj
Approved by:	markj (mentor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11284
2017-07-12 07:30:14 +00:00
ian
91f22a6f6a Support multiple realtime clocks, and remove locking/sleeping restrictions
on clock drivers.

This tracks multiple concurrent realtime clock drivers in a list sorted by
clock resolution.  When system time changes (and periodically) the
clock_settime() methods of all registered clocks are invoked.

To initialize system time, each driver is tried in turn from best to worst
resolution, until one successfully returns a valid time.

The code no longer holds a mutex while calling the clock_settime() and
clock_gettime() methods of the registered clocks. This allows clock drivers
to do whatever kind of locking or sleeping is necessary (this is especially
important for i2c clock chips since i2c drivers often need to sleep).

A new clock_register_flags() function allows the clock driver to pass
flags. The flags currently defined help support drivers that use their own
techniques to avoid roundoff errors (prevents the 4/5 rounding done by the
subr_rtc code). A driver which may need to wait for resources (such as bus
ownership) may pass a flag to indicate that it will obtain system time for
itself after waiting for resources; this is merely an optimization to avoid
the common code retrieving a timespec that will never get used.

Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D11484
2017-07-12 02:53:54 +00:00
gallatin
4b09bc993e Simplify UIO_SYSSPACE and UIO_NOCOPY paths in uiomove
Uiomove can only block when the segflag is UIO_USERSPACE,
otherwise we end up just doing a bcopy (or nothing) and
moving cursors. So only emit witness warnings and
set deadlock thread flags in the UIO_USERSPACE case.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D11489
2017-07-06 15:03:54 +00:00
hselasky
2f17b2a4e9 After r319722 two fields were left uninitialized when transforming a
socket structure into a listening socket. This resulted in an invalid
instruction fault for all 32-bit platforms.

When INVARIANTS is set the union where the two uninitialized fields
reside gets properly zeroed. This patch ensures the two uninitialized
fields are zeroed when INVARIANTS is undefined.

For 64-bit platforms this issue was not visible because so->sol_upcall
which is uninitialized overlaps with so->so_rcv.sb_state which is
already zero during soalloc().

For 32-bit platforms this issue was visible and resulted in an invalid
instruction fault, because so->sol_upcall overlaps with
so->so_rcv.sb_sel which is always initialized to a valid data pointer
during soalloc().

Verifying the offset locations mentioned above are identical is left
as an exercise to the reader.

PR: 220452
PR: 220358
Reviewed by:	ae (network), gallatin
Differential Revision:	https://reviews.freebsd.org/D11475
Sponsored by:	Mellanox Technologies
2017-07-04 18:23:17 +00:00
kib
03059da737 Resolve confusion between different error code spaces.
The vm_map_fixed() and vm_map_stack() VM functions return Mach error
codes.  Convert them into errno values before returning result from
exec_new_vmspace().
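
A sketch of the kind of translation involved.  The SK_ constants and the
particular mapping below are placeholders for illustration; they model the
idea of keeping the two error spaces apart and are not copied from the
kernel headers.

```c
#include <errno.h>

/* Hypothetical stand-ins for Mach-style VM return codes. */
enum sk_kern_return {
	SK_KERN_SUCCESS = 0,
	SK_KERN_INVALID_ADDRESS,
	SK_KERN_PROTECTION_FAILURE,
	SK_KERN_NO_SPACE
};

/* Translate a VM-layer status into an errno value (assumed mapping). */
static int
sk_to_errno(int rv)
{
	switch (rv) {
	case SK_KERN_SUCCESS:
		return (0);
	case SK_KERN_INVALID_ADDRESS:
	case SK_KERN_NO_SPACE:
		return (ENOMEM);
	case SK_KERN_PROTECTION_FAILURE:
		return (EACCES);
	default:
		return (EINVAL);
	}
}
```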

While there, modernize the comment and do minor style adjustments.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-07-03 20:44:01 +00:00
mjg
ba463d64a7 rwlock: perform the typically false td_rw_rlocks check later
Check if the lock is available first instead.

MFC after:	1 week
2017-07-02 01:05:16 +00:00
alc
31bb357d50 Change blst_leaf_alloc() to handle a cursor argument, and to improve
performance.

To find in the leaf bitmap all ranges of sufficient length, use a doubling
strategy with shift-and-and until each bit still set represents a bit
sequence of length 'count', or until the bitmask is zero.  In the latter
case, update the hint based on the first bit sequence length not found to
be available.  For example, seeking an interval of length 12, the set bits
of the bitmap would represent intervals of length 1, then 2, then 3, then
6, then 12.  If no bits are set at the point when each bit represents an
interval of length 6, then the hint can be updated to 5 and the search
terminated.

If long-enough intervals are found, discard those before the cursor.  If
any remain, use binary search to find the position of the first of them,
and allocate that interval.
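
The doubling strategy can be sketched in userland as below.  This is an
illustration written for this log, not the committed blst_leaf_alloc():
the name runs_of is hypothetical, a set bit is assumed to mean a free
block, and the hint update and cursor handling are omitted.  count must be
between 1 and 64.

```c
#include <stdint.h>

/*
 * Return a mask whose set bits mark every position i such that bits
 * [i, i + count - 1] are all set in 'bitmap'.
 */
static uint64_t
runs_of(uint64_t bitmap, unsigned int count)
{
	uint64_t mask = bitmap;
	unsigned int count1 = count - 1;
	unsigned int ext, range1 = 0, t;
	int shifts = 0;

	/* Number of significant bits in count - 1. */
	for (t = count1; t != 0; t >>= 1)
		shifts++;
	/*
	 * Invariant: if bit i is set in mask, then bits [i, i + range1]
	 * are set in bitmap, and range1 == count1 >> shifts.
	 */
	while (mask != 0 && shifts > 0) {
		shifts--;
		ext = range1 + ((count1 >> shifts) & 1);
		mask &= mask >> ext;	/* shift-and-and */
		range1 += ext;
	}
	return (mask);
}
```

For count 12 the loop shifts by 1, 1, 3 and 6, so each surviving bit
represents a run of length 2, then 3, then 6, then 12, matching the
sequence in the message.  The lowest set bit of the result is the first
position where such a run starts.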

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	kib, markj
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D11426
2017-07-01 05:27:40 +00:00
kib
8aea8b1631 Define ino64_trunc_error under same conditions as the code which uses
the variable.

Noted by:	bde
Sponsored by:	The FreeBSD Foundation
2017-06-30 16:10:21 +00:00
alc
0b28a56ef7 Clear the MAP_WIREFUTURE flag on the vm map in exec_new_vmspace() when it
recycles the current vm space.  Otherwise, an mlockall(MCL_FUTURE) could
still be in effect on the process after an execve(2), which violates the
specification for mlockall(2).

It's pointless for vm_map_stack() to check the MEMLOCK limit.  It will
never be asked to wire the stack.  Moreover, it doesn't even implement
wiring of the stack.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11421
2017-06-30 15:49:36 +00:00
jhb
131ea5a7f4 Store a 32-bit PT_LWPINFO struct for 32-bit process core dumps.
Process core notes for a 32-bit process running on a 64-bit host need to
use 32-bit structures so that the note layout matches the layout of notes
of a core dump of a 32-bit process under a 32-bit kernel.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D11407
2017-06-29 21:31:13 +00:00
np
f043fa948f Adjust sowakeup post-r319685 so that it continues to make upcalls but
still avoids calling soconnected during sodisconnected.

Discussed with:	glebius@
Sponsored by:	Chelsio Communications
2017-06-29 19:43:27 +00:00
kib
f855d50bc8 Do not cast struct kevent_args or struct freebsd11_kevent_args to
struct g_kevent_args.

On some architectures, e.g. PowerPC, there is additional padding in uap.

Reported and tested by:	andreast
Sponsored by:	The FreeBSD Foundation
2017-06-29 14:40:33 +00:00
kib
adc5ef3aae Do not ignore an error from vm_mmap_object().
Found and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-27 20:12:13 +00:00
alc
535efc897a Address the remaining integer overflow issues with the "skip" parameters
and "next_skip" variables.  The "skip" value in struct blist has long been
a 64-bit quantity but various functions have implicitly truncated this
value to 32 bits.  Now, all arithmetic involving the "skip" value is 64
bits wide.  (This should allow us to relax the size limit on a swap device
in the swap pager.)

Maintain the ability to test this allocator as a user-space application by
including <stdbool.h>.

Remove an unused variable from blst_radix_print().

Reviewed by:	kib, markj
MFC after:	4 weeks
Differential Revision:	https://reviews.freebsd.org/D11358
2017-06-27 17:45:26 +00:00
cem
3521ec05c1 Fix one more place uio_resid is truncated to int
A follow-up to r231949 and r194990.

Reported by:	pho@
Reviewed by:	kib@, markj@
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11373
2017-06-27 17:23:20 +00:00