Commit Graph

15312 Commits

Author SHA1 Message Date
Konstantin Belousov
987ff18184 Consistently handle negative or wrapping offsets in the mmap(2) syscalls.
For regular files and posix shared memory, POSIX requires that
[offset, offset + size) range is legitimate.  At the maping time,
check that offset is not negative.  Allowing negative offsets might
expose the data that filesystem put into vm_object for internal use,
esp. due to OFF_TO_IDX() signess treatment.  Fault handler verifies
that the mapped range is valid, assuming that mmap(2) checked that
arithmetic gives no undefined results.

For device mappings, leave the semantic of negative offsets to the
driver.  Correct object page index calculation to not erronously
propagate sign.

In either case, disallow overflow of offset + size.

Update mmap(2) man page to explain the requirement of the range
validity, and behaviour when the range becomes invalid after mapping.

Reported and tested by:	royger (previous version)
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-02-12 21:05:44 +00:00
Konstantin Belousov
f2277b64ec Switch copyout_map() to use vm_mmap_object() instead of vm_mmap().
This is both a microoptimization and a move of the consumer to more
commonly used vm function.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-02-12 20:54:31 +00:00
Mateusz Guzik
c4a48867f1 lockmgr: implement fast path
The main lockmgr routine takes 8 arguments which makes it impossible to
tail-call it by the intermediate vop_stdlock/unlock routines.

The routine itself starts with an if-forest and reads from the lock itself
several times.

This slows things down both single- and multi-threaded. With the patch
single-threaded fstats go 4% up and multithreaded up to ~27%.

Note that there is still a lot of room for improvement.

Reviewed by:	kib
Tested by:	pho
2017-02-12 09:49:44 +00:00
John Baldwin
bb9b710477 Regenerate all the system call tables to drop "created from" lines.
One of the ibcs2 files contains some actual changes (new headers) as
it hasn't been regenerated after older changes to makesyscalls.sh.
2017-02-10 19:45:02 +00:00
John Baldwin
807a7231f2 Drop the "created from" line from files generated by makesyscalls.sh.
This information is less useful when the generated files are included in
source control along with the source.  If needed it can be reconstructed
from the $FreeBSD$ tag in the generated file.  Removing this information
from the generated output permits committing the generated files along
with the change to the system call master list without having inconsistent
metadata in the generated files.

Reviewed by:	emaste, kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D9497
2017-02-10 19:25:52 +00:00
Konstantin Belousov
e83a71c656 Fix r313495.
The file type DTYPE_VNODE can be assigned as a fallback if VOP_OPEN()
did not initialized file type.  This is a typical code path used by
normal file systems.

Also, change error returned for inappropriate file type used for
O_EXLOCK to EOPNOTSUPP, as declared in the open(2) man page.

Reported by:	cy, dhw, Iblis Lin <iblis@hs.ntnu.edu.tw>
Tested by:	dhw
Sponsored by:	The FreeBSD Foundation
MFC after:	13 days
2017-02-10 14:49:04 +00:00
Konstantin Belousov
e628e1b919 Increase a chance of devfs_close() calling d_close cdevsw method.
If a file opened over a vnode has an advisory lock set at close,
vn_closefile() acquires additional vnode use reference to prevent
freeing the vnode in vn_close().  Side effect is that for device
vnodes, devfs_close() sees that vnode reference count is greater than
one and refuses to call d_close().  Create internal version of
vn_close() which can avoid dropping the vnode reference if needed, and
use this to execute VOP_CLOSE() without acquiring a new reference.

Note that any parallel reference to the vnode would still prevent
d_close call, if the reference is not from an opened file, e.g. due to
stat(2).

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-02-09 23:36:50 +00:00
Konstantin Belousov
7903b00087 Do not establish advisory locks when doing open(O_EXLOCK) or open(O_SHLOCK)
for files which do not have DTYPE_VNODE type.

Both flock(2) and fcntl(2) syscalls refuse to acquire advisory lock on
a file which type is not DTYPE_VNODE.  Do the same when lock is
requested from open(2).

Restructure the block in vn_open_vnode() which handles O_EXLOCK and
O_SHLOCK open flags to make it easier to quit its execution earlier
with an error.

Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-02-09 23:35:57 +00:00
Mateusz Guzik
8eaaf58a5f rwlock: fix r313454
The runlock slow path would update wrong variable before restarting the
loop, in effect corrupting the state.

Reported by:	pho
2017-02-09 13:32:19 +00:00
Mateusz Guzik
3b3cf014fc locks: tidy up unlock fallback paths
Update comments to note these functions are reachable if lockstat is
enabled.

Check if the lock has any bits set before attempting unlock, which saves
an unnecessary atomic operation.
2017-02-09 08:19:30 +00:00
Mateusz Guzik
834f70f32f sx: implement slock/sunlock fast path
See r313454.
2017-02-08 19:29:34 +00:00
Mateusz Guzik
b0a61642d4 rwlock: implemenet rlock/runlock fast path
This improves singlethreaded throughput on my test machine from ~247 mln
ops/s to ~328 mln.

It is mostly about avoiding the setup cost of lockstat.

Reviewed by:	jhb (previous version)
2017-02-08 19:28:46 +00:00
John Baldwin
885f13dc96 Copy the e_machine and e_flags fields from the binary into an ELF core dump.
In the kernel, cache the machine and flags fields from ELF header to use in
the ELF header of a core dump. For gcore, the copy these fields over from
the ELF header in the binary.

This matters for platforms which encode ABI information in the flags field
(such as o32 vs n32 on MIPS).

Reviewed by:	kib
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D9392
2017-02-07 20:34:03 +00:00
Emmanuel Vadot
beaa6e1e64 subr_sfbus.c need sys/proc.h for struct thread definition.
This fixes kernel build for armv6.

Discussed with: kib
2017-02-07 17:31:24 +00:00
Mateusz Guzik
dbccc8105c rwlock: implement RW_LOCK_WRITER_RECURSED bit
This moves recursion handling out of the inlined wunlock path and in
particular saves a read and a branch.

Discussed with:
2017-02-07 17:04:31 +00:00
Mateusz Guzik
f743ea9638 Bump struct thread alignment to 32.
This gives additional bits to use in locking primitives which store
the lock thread pointer in the lock value.

Discussed with:	kib
2017-02-07 17:03:22 +00:00
Mateusz Guzik
3c798b2b1f locks: follow up r313386
Unfinished diff was committed by accident. The loop in lock_delay
was changed to decrement, but the loop iterator was still incrementing.
2017-02-07 16:01:07 +00:00
Mateusz Guzik
8e5a3e9a9d locks: change backoff to exponential
Previous implementation would use a random factor to spread readers and
reduce chances of starvation. This visibly reduces effectiveness of the
mechanism.

Switch to the more traditional exponential variant. Try to limit starvation
by imposing an upper limit of spins after which spinning is half of what
other threads get. Note the mechanism is turned off by default.

Reviewed by:	kib (previous version)
2017-02-07 14:49:36 +00:00
Edward Tomasz Napierala
1110d0029a Make root_mount_hold() work after boot. This is important for two
reasons. First is rerooting into USB-mounted device that happens
to be not yet enumerated. The second is when mounting with (non-root)
filesystem on USB device on a hub that's enumerated later than the root
mount: the rc scripts explicitly mount for the root mount holds to be
released, but each USB bus takes the hold asynchronously, and if that
happens after root mount, it would just get ignored.

Reviewed by:	marcel
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9388
2017-02-06 20:44:34 +00:00
Edward Tomasz Napierala
4f9d7bad48 In r290196 the root mount hold mechanism was changed to make it not wait
for mount hold release if the root device already exists.  So, unless your
rootdev is not on USB - ie in the usual case - the root mount won't wait
for USB.  However, the old behaviour was sometimes used as "wait until USB
is fully enumerated", and r290196 broke that.

This commit adds vfs.root_mount_always_wait tunable, to force the kernel
to always wait for root mount holds, even if the root is already there.

Reviewed by:	kib
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9387
2017-02-06 20:36:59 +00:00
Andrew Turner
c0d5237034 Only allow the pic type to be either a PIC or MSI type. All interrupt
controller drivers handle either MSI/MSI-X interrupts, or regular
interrupts, as such enforce this in the interrupt handling framework.
If a later driver was to handle both it would need to create one of each.

This will allow future changes to allow the xref space to overlap, but
refer to different drivers.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
X-Differential Revision:	https://reviews.freebsd.org/D8616
2017-02-06 13:08:48 +00:00
Mateusz Guzik
c1aaf63cb5 locks: fix recursion support after recent changes
When a relevant lockstat probe is enabled the fallback primitive is called with
a constant signifying a free lock. This works fine for typical cases but breaks
with recursion, since it checks if the passed value is that of the executing
thread.

Read the value if necessary.
2017-02-06 09:40:14 +00:00
Mateusz Guzik
993ddec44d rwlock: move lockstat handling out of inline primitives
See r313275 for details.

One difference here is that recursion handling was removed from the fallback
routine. As it is it was never supposed to see a recursed lock in the first
place. Future changes will move it out of inline variants, but right now
there is no easy to way to test if the lock is recursed without reading
additional words.
2017-02-05 13:37:23 +00:00
Edward Tomasz Napierala
96ee43103d Add kern_cpuset_getaffinity() and kern_cpuset_getaffinity(),
and use it in compats instead of their sys_*() counterparts.

Reviewed by:	kib, jhb, dchagin
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9383
2017-02-05 13:24:54 +00:00
Mateusz Guzik
6ebb77b6a6 sx: move lockstat handling out of inline primitives
See r313275 for details.
2017-02-05 09:54:16 +00:00
Mateusz Guzik
dc0896512c mtx: fixup r313278, the assignemnt was supposed to go inside the loop 2017-02-05 09:53:13 +00:00
Mateusz Guzik
cae4ab7f37 mtx: fix up _mtx_obtain_lock_fetch usage in thread lock
Since _mtx_obtain_lock_fetch no longer sets the argument to MTX_UNOWNED,
callers have to do it on their own.
2017-02-05 09:35:17 +00:00
Mateusz Guzik
08da267775 mtx: move lockstat handling out of inline primitives
Lockstat requires checking if it is enabled and if so, calling a 6 argument
function. Further, determining whether to call it on unlock requires
pre-reading the lock value.

This is problematic in at least 3 ways:
- more branches in the hot path than necessary
- additional cacheline ping pong under contention
- bigger code

Instead, check first if lockstat handling is necessary and if so, just fall
back to regular locking routines. For this purpose a new macro is introduced
(LOCKSTAT_PROFILE_ENABLED).

LOCK_PROFILING uninlines all primitives. Fold in the current inline lock
variant into the _mtx_lock_flags to retain the support. With this change
the inline variants are not used when LOCK_PROFILING is defined and thus
can ignore its existence.

This results in:
   text	   data	    bss	    dec	    hex	filename
22259667	1303208	4994976	28557851	1b3c21b	kernel.orig
21797315	1303208	4994976	28095499	1acb40b	kernel.patched

i.e. about 3% reduction in text size.

A remaining action is to remove spurious arguments for internal kernel
consumers.
2017-02-05 08:04:11 +00:00
Mateusz Guzik
3ae56ce958 sx: add witness support missed in r313272 2017-02-05 06:51:45 +00:00
Mateusz Guzik
9d2e4290ff sx: uninline slock/sunlock
Shared locking routines explicitly read the value and test it. If the
change attempt fails, they fall back to a regular function which would
retry in a loop.

The problem is that with many concurrent readers the risk of failure is pretty
high and even the value returned by fcmpset is very likely going to be stale
by the time the loop in the fallback routine is reached.

Uninline said primitives. It gives a throughput increase when doing concurrent
slocks/sunlocks with 80 hardware threads from ~50 mln/s to ~56 mln/s.

Interestingly, rwlock primitives are already not inlined.
2017-02-05 05:20:29 +00:00
Mateusz Guzik
fa47404353 sx: switch to fcmpset
Discussed with:	jhb
Tested by:	pho (previous version)
2017-02-05 04:54:20 +00:00
Mateusz Guzik
c84f347985 rwlock: switch to fcmpset
Discussed with:	jhb
Tested by:	pho
2017-02-05 04:53:13 +00:00
Mateusz Guzik
90836c3270 mtx: switch to fcmpset
The found value is passed to locking routines in order to reduce cacheline
accesses.

mtx_unlock grows an explicit check for regular unlock. On ll/sc architectures
the routine can fail even if the lock could have been handled by the inline
primitive.

Discussed with:	jhb
Tested by:	pho (previous version)
2017-02-05 03:26:34 +00:00
Mateusz Guzik
2d78a5531e vfs: use atomic_fcmpset in vfs_refcount_* 2017-02-05 03:23:16 +00:00
Mark Johnston
69d2418faa Make witness_warn() always print to the console.
witness_warn() either breaks into the debugger or panics the system, so its
output should go to the console regardless of the witness(4) output channel
configuration.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-02-05 02:27:04 +00:00
Mateusz Guzik
3a2f282532 fd: switch fget_unlocked to atomic_fcmpset 2017-02-05 01:40:27 +00:00
Jason A. Harmening
ad62ba6e96 Revert r313037
The switch to get_pcpu() in MI code seems to cause hangs on MIPS.
Back out until we can get a better idea of what's happening there.

Reported by:	kan, lidl
2017-02-04 06:24:49 +00:00
Hartmut Brandt
4b481ba0ed Merge filt_soread and filt_solisten and decide what to do when checking
for EVFILT_READ at the point of the check not when the event is registers.
This fixes a problem with asio when accepting a connection.

Reviewed by:	kib@, Scott Mitchell
2017-02-01 13:12:07 +00:00
Jason A. Harmening
65ed483615 Implement get_pcpu() for the remaining architectures and use it to
replace pcpu_find(curcpu) in MI code.
2017-02-01 03:32:49 +00:00
Edward Tomasz Napierala
b38b22b0b2 Add kern_pread() and kern_pwrite(), and use it in compats instead
of their sys_*() counterparts. The svr4 is left unchanged.

Reviewed by:	kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9379
2017-01-31 15:35:18 +00:00
Edward Tomasz Napierala
fc8bde8ffe Replace calls to sys_truncate() with kern_truncate().
Reviewed by:	kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9371
2017-01-31 15:19:44 +00:00
Edward Tomasz Napierala
ea2ebdc19e Add kern_cpuset_getid() and kern_cpuset_setid(), and use them
in compat32 instead of their sub_*() counterparts.

Reviewed by:	jhb@, kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9382
2017-01-31 15:11:23 +00:00
Andriy Gapon
826b3d3187 put very expensive sanity checks of advisory locks under DIAGNOSTIC
The checks have quadratic complexity over a number of advisory locks
active for a file and that could be a lot.  What's the worse is that the
checks are done while holding ls_lock.  That could lead to a long a very
long backlog and performance degradation even if all requested locks are
compatible (e.g. all shared locks).

The checks used to be under INVARIANTS.

Discussed with:	kib
MFC after:	2 weeks
Sponsored by:	Panzura
2017-01-30 15:20:13 +00:00
Edward Tomasz Napierala
d293f35c09 Add kern_listen(), kern_shutdown(), and kern_socket(), and use them
instead of their sys_*() counterparts in various compats. The svr4
is left untouched, because there's no point.

Reviewed by:	ed@, kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9367
2017-01-30 12:57:22 +00:00
Edward Tomasz Napierala
f67d6b5f12 Add kern_lseek() and use it instead of sys_lseek() in various compats.
I didn't touch svr4/, there's no point.

Reviewed by:	ed@, kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9366
2017-01-30 12:24:47 +00:00
Edward Tomasz Napierala
ae6b6ef6cb Replace sys_ftruncate() with kern_ftruncate() in various compats.
Reviewed by:	kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9368
2017-01-30 11:50:54 +00:00
Mateusz Guzik
dfecf51dd0 cache: use vrefact for '.' lookups and refing the rdir in fullpath 2017-01-30 03:20:05 +00:00
Mateusz Guzik
3071469d57 fd: sprinkle __read_mostly and __exclusive_cache_line 2017-01-30 03:07:32 +00:00
Baptiste Daroussin
b4b4b5304b Revert crap accidentally committed 2017-01-28 16:31:23 +00:00
Baptiste Daroussin
814aaaa7da Revert r312923 a better approach will be taken later 2017-01-28 16:30:14 +00:00