16434 Commits

Author SHA1 Message Date
Konstantin Belousov
eba8ab0e3e Remove special case handling for getfhat(fd, NULL, handle).
There is no reason for it to behave differently from openat(fd, NULL).
Also, the handling did not work because the substituted path was from
the system address space, causing EFAULT.

Submitted by:	Jack Halford <jack@gandi.net>
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18501
2018-12-11 02:48:49 +00:00
John Baldwin
c5786670ac Don't report stale signal information for non-signal events in ptrace_lwpinfo.
Once a signal's siginfo was copied to 'td_si' as part of the signal
exchange in issignal(), it was never cleared.  This caused future
thread events that are reported as SIGTRAP events without signal
information to report the stale siginfo in 'td_si'.  For example, if a
debugger created a new process and used SIGSTOP to stop it after
PT_ATTACH, future system call entry / exit events would set PL_FLAG_SI
with the SIGSTOP siginfo in pl_siginfo.  This broke 'catch syscall' in
current versions of gdb as it assumed PL_FLAG_SI with SIGTRAP
indicates a breakpoint or single step trap.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D18487
2018-12-10 19:39:24 +00:00
Alan Cox
2905d1ceaf blst_leaf_alloc updates bighint for a leaf when an allocation is successful
and includes the last block represented by the leaf.  The reasoning is that,
if the last block is included, then there must be no solution before that
one in the leaf, so the leaf cannot provide an allocation that big again;
indeed, the leaf cannot provide a solution bigger than range1.

Which is all correct, except that if the value of blk passed in did not
represent the first block of the leaf, because the cursor was pointing to
the middle of the leaf, then a possible solution before the cursor may have
been ignored, and bighint cannot be updated.

Consider the sequence allocate 63 (returning address 0), free 0,63 (freeing
that same block), and allocate 1 (returning 63).  The result is that one
block is allocated from the first leaf, and the value of bighint is 0, so
that nothing can be allocated from that leaf until the only block allocated
from that leaf is freed.  This change detects that skipped-over solution,
and when there is one it makes sure that the value of bighint is not changed
when the last block is allocated.

Submitted by:	Doug Moore <dougm@rice.edu>
Tested by:	pho
X-MFC with:	r340402
Differential Revision:	https://reviews.freebsd.org/D18474
2018-12-09 17:55:10 +00:00
Mateusz Guzik
6017827676 umtx: avoid umtxshm locking on object termination if possible
Sample build world result on tmpfs:
kern.ipc.umtx_terminate_notempty: 0
kern.ipc.umtx_terminate_empty: 2891815

Sponsored by:	The FreeBSD Foundation
2018-12-08 14:04:57 +00:00
Mateusz Guzik
b0b246b0ba Remove proctree acquire from note_procstat_proc
It is not needed since r340482 ("proc: always store parent pid in p_oppid")

Sponsored by:	The FreeBSD Foundation
2018-12-08 11:38:39 +00:00
Mateusz Guzik
eab2132ad9 Fix a corner case in ID bitmap management.
If all IDs from trypid to pid_max were used as pids, the code would enter
a loop which would be infinite if none of the IDs could become free (e.g.
they all belong to processes which have not transitioned to zombie).

Fixes:	r341684 ("Manage process-related IDs with bitmaps")

Sponsored by:	The FreeBSD Foundation
2018-12-08 10:22:12 +00:00
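
The corner case fixed in eab2132ad9 can be illustrated with a minimal sketch of a wraparound bitmap search whose scan is bounded. This is not the kernel's code: id_alloc(), ID_MAX, and struct id_bitmap are hypothetical names chosen for the example, and the real implementation may differ substantially. The point is only that the search gives up after visiting every ID once instead of rescanning while nothing can become free.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define ID_MAX		64	/* hypothetical, small ID space for illustration */
#define WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define NWORDS		((ID_MAX + WORD_BITS - 1) / WORD_BITS)

struct id_bitmap {
	unsigned long words[NWORDS];	/* bit set => ID in use */
};

static bool
id_isset(const struct id_bitmap *bm, int id)
{
	return ((bm->words[id / WORD_BITS] >> (id % WORD_BITS)) & 1UL);
}

static void
id_set(struct id_bitmap *bm, int id)
{
	bm->words[id / WORD_BITS] |= 1UL << (id % WORD_BITS);
}

/*
 * Allocate the first free ID at or after 'start', wrapping around once.
 * Returns -1 when every ID is in use, rather than looping forever.
 */
static int
id_alloc(struct id_bitmap *bm, int start)
{
	int i, id;

	for (i = 0; i < ID_MAX; i++) {
		id = (start + i) % ID_MAX;
		if (!id_isset(bm, id)) {
			id_set(bm, id);
			return (id);
		}
	}
	return (-1);
}
```

A caller would treat the -1 return as "no ID available right now" and fail or retry later, which is a design choice of this sketch rather than something stated in the commit.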
Mateusz Guzik
e52327e3c5 proc: postpone proc unlock until after reporting with kqueue
kqueue would always relock immediately afterwards.

While here drop the NULL check for list itself. The list is
always allocated.

Sponsored by:	The FreeBSD Foundation
2018-12-08 06:34:12 +00:00
Mateusz Guzik
eadb1dcb71 proc: handle sdt exit probe before taking the proc lock
Sponsored by:	The FreeBSD Foundation
2018-12-08 06:31:43 +00:00
Mateusz Guzik
13a45e4b14 Provide SDT_PROBES_ENABLED macro.
Sponsored by:	The FreeBSD Foundation
2018-12-08 06:30:41 +00:00
Konstantin Belousov
18519f1583 Simplify kern_readlink_vp().
When we detect that the vnode is not a symlink, return immediately.
This moves the readlink code out of else branch and unindents it.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-12-07 23:07:51 +00:00
Konstantin Belousov
978f879483 Fix expression evaluation.
Braces were put in the wrong place, causing a failing EAGAIN check to
return a zero result.  Remove the problematic assignment from the
conditional expression altogether.

While there, remove the used-once variable vp, and wrap a too-long line.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-12-07 23:05:12 +00:00
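
The class of bug described in 978f879483 can be sketched as follows, assuming "braces" refers to the parentheses around an assignment inside a loop condition. The functions retry_buggy(), retry_fixed(), and fail_enoent() are hypothetical and are not taken from the commit. Because `==` binds tighter than `=`, the buggy form stores the result of the comparison (0 or 1) in error, so a failure other than EAGAIN is reported as 0.

```c
#include <errno.h>

/* Hypothetical operation that always fails with ENOENT. */
static int
fail_enoent(void)
{
	return (ENOENT);
}

/* Buggy: parsed as error = (op() == EAGAIN), so error is 0 or 1. */
static int
retry_buggy(int (*op)(void))
{
	int error;

	do {
	} while ((error = op() == EAGAIN));
	return (error);
}

/* Fixed: the assignment is moved out of the conditional expression. */
static int
retry_fixed(int (*op)(void))
{
	int error;

	do {
		error = op();
	} while (error == EAGAIN);
	return (error);
}
```

With fail_enoent() as the operation, retry_buggy() returns 0 while retry_fixed() returns ENOENT.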
Mateusz Guzik
08d005e6a3 fd: use racct_set_unlocked
Sponsored by:	The FreeBSD Foundation
2018-12-07 16:51:38 +00:00
Mateusz Guzik
448db4f761 racct: add RACCT_ENABLED macro and racct_set_unlocked
This allows removing PROC_LOCK/UNLOCK pairs spread throughout the kernel
that are only used to appease racct_set.

Sponsored by:	The FreeBSD Foundation
2018-12-07 16:47:34 +00:00
Mateusz Guzik
82f4b82634 fd: try to do less work with the lock in dup
Sponsored by:	The FreeBSD Foundation
2018-12-07 16:44:52 +00:00
Mateusz Guzik
6ff4688b09 Replace hand-rolled unrefs if > 1 with refcount_release_if_not_last
Sponsored by:	The FreeBSD Foundation
2018-12-07 16:11:45 +00:00
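
The pattern named in 6ff4688b09 ("unref if > 1") can be sketched in portable C11 atomics. This is an illustration only, not the kernel's refcount(9) code: release_if_not_last() is a hypothetical name, and the memory orderings are an assumption of the sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Drop one reference unless it is the last one.  Returns true if a
 * reference was dropped, false if the count was 1 and was left alone
 * (the caller must then take the slow path for the final release).
 */
static bool
release_if_not_last(atomic_uint *count)
{
	unsigned int old;

	old = atomic_load_explicit(count, memory_order_relaxed);
	for (;;) {
		if (old == 1)
			return (false);
		/* On failure, 'old' is updated with the current value. */
		if (atomic_compare_exchange_weak_explicit(count, &old,
		    old - 1, memory_order_release, memory_order_relaxed))
			return (true);
	}
}
```

Centralizing the loop in one helper avoids each caller open-coding its own compare-and-swap retry.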
Konstantin Belousov
fd52edaf70 Regen.
2018-12-07 15:19:00 +00:00
Konstantin Belousov
d1fd400a80 Add new file handle system calls.
Namely, getfhat(2), fhlink(2), fhlinkat(2), fhreadlink(2).  The
syscalls are provided for an NFS userspace server (nfs-ganesha).

Submitted by:	Jack Halford <jack@gandi.net>
Sponsored by:	Gandi.net
Tested by:	pho
Feedback from:	brooks, markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18359
2018-12-07 15:17:29 +00:00
Mateusz Guzik
b1fbffe73c proc: when exiting move to zombproc before taking proctree
The kernel was already doing this prior to r329615. It was changed
to reduce contention on allproc. However, introduction of pidhash
locks and removal of proctree -> allproc ordering from fork thanks
to bitmaps fixed things enough to make this change pessimal.

waitpid takes proctree on each call and this change (now) causes
avoidable stalls if allproc is held.

Sponsored by:	The FreeBSD Foundation
2018-12-07 12:32:25 +00:00
Mateusz Guzik
34ebdceac0 Manage process-related IDs with bitmaps
Currently unique pid allocation on fork often requires a full walk of
process, group, session lists to make sure it is not used by anything.
This has a side effect of requiring proctree to be held along with allproc,
which adds more contention in poudriere -j 128.

The patch below implements trivial bitmaps which get rid of the problem.
A dedicated lock is introduced to manage IDs.

While here a bug was discovered: all processes would inherit reap id from
the first process spawned by init. This had a side effect of keeping the
ID used and when allocation rolls over to the beginning it keeps being
skipped.

The patch is loosely based on initial work by mjoras@.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
2018-12-07 12:22:32 +00:00
Mateusz Guzik
6e8c1ccbe2 Annotate Giant drop/pickup macros with __predict_false
They are used in important places of the kernel with the lock not being held
the majority of the time.

Sponsored by:	The FreeBSD Foundation
2018-12-07 12:06:03 +00:00
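
The kind of annotation described in 6e8c1ccbe2 can be sketched with the GCC/Clang builtin that branch-prediction macros are commonly built on. The macro predict_false(), struct lock_state, and drop_if_held() are hypothetical stand-ins for this example; they are not the Giant macros themselves.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a "this is rarely true" hint. */
#define predict_false(exp)	__builtin_expect((exp), 0)

struct lock_state {
	bool held;
	int drops;
};

/*
 * Drop the lock if it is held.  The hint tells the compiler to lay out
 * the code so that the common case (lock not held) falls straight through.
 */
static int
drop_if_held(struct lock_state *ls)
{
	if (predict_false(ls->held)) {
		ls->held = false;
		ls->drops++;
		return (1);
	}
	return (0);
}
```

The hint changes code layout only; the function behaves the same whether or not the prediction is right.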
Mark Johnston
afde86eba3 Let kern.trap_enotcap be set as a tunable.
This is handy for testing programs that are run by rc.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-12-06 17:29:37 +00:00
Brooks Davis
827c3852fe Further simplify arguments to init.
With the removal of BOOTCDROM and fastboot support, this code always
passed "-s" or "--". The latter simply terminates getopt(3) processing
in init so we only need to pass "-s" in the single user case, or nothing
in other cases.

The passing of "--" seems to have been done to ensure that the number of
arguments passed to init was always the same and thus that argc was the
same.

Also GC the write-only variable pathlen (not in reviewed version).

Reviewed by:	kib, jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18441
2018-12-05 19:18:16 +00:00
Alan Cox
749cdf6f3b Terminate a blist_alloc search when a blst_meta_alloc call fails with
cursor == 0.

Every call to blst_meta_alloc but the one at the root is made only when the
meta-node is known to include a free block, so that either the allocation
will succeed, the node hint will be updated, or the last block of the meta-
node range is, and remains, free.  But the call at the root is made without
checking that there is a free block, so in the case that every block is
allocated, there is no hint update to prevent the current code from looping
forever.

Submitted by:	Doug Moore <dougm@rice.edu>
Reported by:	pho
Reviewed by:	pho
Tested by:	pho
X-MFC with:	r340402
Differential Revision:	https://reviews.freebsd.org/D17999
2018-12-05 18:26:40 +00:00
Brooks Davis
68ea829fe7 Remove never enabled support for "fastboot".
This has been ifdef notyet since the import of BSD 4.4 Lite Kernel
Sources in r1541.

Sponsored by:	DARPA, AFRL
2018-12-05 17:35:15 +00:00
Brooks Davis
7a5db3a770 Remove ifdef BOOTCDROM option to start init.
When BOOTCDROM is defined (via CFLAGS as there is no config option)
it causes -C to be passed to init, but our init and the version of
sysinstall I glanced at in 6.x don't support -C. The last plausibly
related support was removed from the tree in 1995.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18431
2018-12-05 17:29:14 +00:00
Mateusz Guzik
f26db6948d sx: retire SX_NOADAPTIVE
The flag has not been used by anything for years and supporting it requires an
explicit read from the lock when entering slow path.

Flag value is left unused on purpose.

Sponsored by:	The FreeBSD Foundation
2018-12-05 16:43:03 +00:00
Brooks Davis
41f7b25317 Remove NOARGS from oaccept.
This was in the original patch, but was lost in a rebase.

Reported by:	andrew
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15816
2018-12-04 21:56:45 +00:00
Brooks Davis
63de13cfee Regen after r341474: Normalize COMPAT_43 syscall declarations.
2018-12-04 16:49:14 +00:00
Brooks Davis
d48719bd96 Normalize COMPAT_43 syscall declarations.
Have ogetkerninfo, ogetpagesize, ogethostname, osethostname, and oaccept
declare o<foo>_args structs rather than non-compat ones. Due to a
failure to use NOARGS in most cases this adds only one new declaration.

No changes required in freebsd32 as only ogetpagesize() is implemented
and it has a 32-bit specific implementation.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15816
2018-12-04 16:48:47 +00:00
Brooks Davis
3a325dec32 Remove a needlessly clever hack to start init with sys_exec().
Construct a struct image_args with the help of new exec_args_*() helper
functions and call kern_execve().

The previous code mapped a page in userspace, copied arguments out
to it one at a time, and then constructed a struct execve_args all so
that sys_execve() can call exec_copyin_args() to copy the data back in
to a struct image_args.

Opencode the part of pre_execve()/post_execve() that releases a
reference to the initial vmspace. We don't need to stop threads like
they do.

Reviewed by:	kib, jhb (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15469
2018-12-04 00:15:47 +00:00
Mark Johnston
02164d3603 Add a missing definition for the !COMPAT_FREEBSD32 case.
Reported by:	jenkins
MFC with:	r341442
Sponsored by:	The FreeBSD Foundation
2018-12-03 21:07:10 +00:00
Mark Johnston
352aaa5122 Plug memory disclosures via ptrace(2).
On some architectures, the structures returned by PT_GET*REGS were not
fully populated and could contain uninitialized stack memory.  The same
issue existed with the register files in procfs.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	kib
MFC after:	3 days
Security:	kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18421
2018-12-03 20:54:17 +00:00
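
The general defense against the kind of disclosure described in 352aaa5122 is to clear an output structure before filling it in, so that fields the fill routine never writes do not carry leftover stack contents. The sketch below shows that technique only; struct regs_out and fill_regs() are hypothetical and do not reflect the actual patch or any real register layout.

```c
#include <string.h>

struct regs_out {
	long pc;
	long sp;
	long spare[4];	/* never written by the fill routine */
};

static void
fill_regs(struct regs_out *out, long pc, long sp)
{
	/* Clear everything first so unwritten fields and padding are zero. */
	memset(out, 0, sizeof(*out));
	out->pc = pc;
	out->sp = sp;
}
```

Without the memset(), a caller that copies the whole structure out to userspace would also copy whatever happened to be in spare[].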
Konstantin Belousov
200bf72793 Correct accuracy of the barrier writes accounting.
Discussed with:	mckusick
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-12-02 12:53:39 +00:00
Eric van Gyzen
5e38e3f5eb Include path for tmpfs objects in vm.objects sysctl
This applies the fix in r283924 to the vm.objects sysctl
added by r283624 so the output will include the vnode
information (i.e. path) for tmpfs objects.

Reviewed by:	kib, dab
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D2724
2018-11-30 04:59:43 +00:00
Brooks Davis
f373437a01 Add helper functions to copy strings into struct image_args.
Given a zeroed struct image_args with an allocated buf member,
exec_args_add_fname() must be called to install a file name (or NULL).
Then zero or more calls to exec_args_add_env() followed by zero or
more calls to exec_args_add_env(). exec_args_adjust_args() may be
called after args and/or env to allow an interpreter to be prepended to
the argument list.

To allow code reuse when adding arg and env variables, begin_envv
should be accessed with the accessor exec_args_get_begin_envv()
which handles the case when no environment entries have been added.

Use these functions to simplify exec_copyin_args() and
freebsd32_exec_copyin_args().

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15468
2018-11-29 21:00:56 +00:00
Konstantin Belousov
7d2b0bd7d7 If BENEATH is specified, always latch the topping directory vnode.
It is possible that we started with a relative path but during the
lookup, found an absolute symlink.  In this case, BENEATH handling
code needs the latch, but it is too late to calculate it.

While there, somewhat improve the assertions.  Clear the NI_LCF_LATCH
flag when the latch vnode is released, so that asserts know the state.
Assert that there is a latch if we entered beneath+abs path mode,
after the starting point is processed.

Reported by:	wulf
With more input from:	pho
Sponsored by:	The FreeBSD Foundation
2018-11-29 19:13:10 +00:00
Mateusz Guzik
1f6ad48c76 vfs: fix i386 build after r341220
2018-11-29 09:54:27 +00:00
Mateusz Guzik
22443809ff cache: retire cache_enter compat shim
It was added over 6 years ago for binary compat. cache_enter macro remains
as it expands to cache_enter_time.

Sponsored by:	The FreeBSD Foundation
2018-11-29 09:32:59 +00:00
Mateusz Guzik
712775843f vfs: drop spurious memcpy in stat
Sponsored by:	The FreeBSD Foundation
2018-11-29 09:04:10 +00:00
Mateusz Guzik
d47f3fdb0a fd: unify fd range check across the routines
While here annotate out of range as unlikely.

Sponsored by:	The FreeBSD Foundation
2018-11-29 08:53:39 +00:00
Mateusz Guzik
eec8d0a378 Convert racct_enable to bool and annotate as __read_frequently
Sponsored by:	The FreeBSD Foundation
2018-11-29 05:17:16 +00:00
Mateusz Guzik
64cf6a62d4 Deinline racct throttling out of syscall exit path.
racct is not enabled by default and even when it is enabled processes are
typically not throttled. The order of checks is left unchanged since
racct_enable will be annotated as __read_frequently, while checking for the
flag in the processes would probably require an extra fetch.

Sponsored by:	The FreeBSD Foundation
2018-11-29 05:08:46 +00:00
Mateusz Guzik
e272bf479b Annotate td_cowgen check as unlikely.
Sponsored by:	The FreeBSD Foundation
2018-11-29 04:48:22 +00:00
Mateusz Guzik
3277792bde Tidy up hardclock.
- use fcmpset for updating ticks
- move (rarely used) itimer handling to a dedicated function

Sponsored by:	The FreeBSD Foundation
2018-11-29 03:44:02 +00:00
Mateusz Guzik
1e9a1bf589 proc: create a dedicated lock for zombproc to lighten the load on allproc_lock
waitpid always takes proctree to evaluate the list, but only takes allproc
if it can reap. With this patch allproc is no longer taken, which helps during
poudriere -j 128.

Discussed with: kib
Sponsored by:	The FreeBSD Foundation
2018-11-29 02:52:08 +00:00
Konstantin Belousov
affd918514 Improve sigonstack().
Avoid relying on unsigned overflow for the test.
Simplify expressions to avoid duplicate check for the range.
Style.
Add herald comment.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18361
2018-11-27 19:50:58 +00:00
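
The first item in affd918514 can be illustrated by comparing two ways of testing whether an address lies in [base, base + size). The function names below are hypothetical, and the sketch is not the kernel's sigonstack(). The explicit form assumes that base + size does not wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Relies on unsigned wraparound: sp < base yields a huge difference. */
static bool
on_stack_wrapping(uintptr_t sp, uintptr_t base, uintptr_t size)
{
	return (sp - base < size);
}

/* States both bounds explicitly; assumes base + size does not wrap. */
static bool
on_stack_explicit(uintptr_t sp, uintptr_t base, uintptr_t size)
{
	return (sp >= base && sp < base + size);
}
```

Both forms agree for ordinary inputs; the second is easier to read and does not depend on the reader knowing the wraparound trick.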
Jamie Gritton
b307954481 In hardened systems, where the security.bsd.unprivileged_proc_debug sysctl
node is set, allow setting security.bsd.unprivileged_proc_debug per-jail.
In part, this is needed to create jails in which the Address Sanitizer
(ASAN) fully works as ASAN utilizes libkvm to inspect the virtual address
space. Instead of having to allow unprivileged process debugging for the
entire system, allow setting it on a per-jail basis.

The sysctl node is still security.bsd.unprivileged_proc_debug and the
jail(8) param is allow.unprivileged_proc_debug. The sysctl code is now a
sysctl proc rather than a sysctl int. This allows us to set the flag
for the corresponding jail (or prison0).

As part of the change, the dynamic allow.* API needed to be modified to
take into account pr_allow flags which may now be disabled in prison0.
This prevents conflicts with new pr_allow flags (like that of vmm(4)) that
are added (and removed) dynamically.

Also teach the jail creation KPI to allow differences for certain pr_allow
flags between the parent and child jail. This can happen when unprivileged
process debugging is disabled in the parent prison, but enabled in the
child.

Submitted by:	Shawn Webb <lattera at gmail.com>
Obtained from:	HardenedBSD (45b3625edba0f73b3e3890b1ec3d0d1e95fd47e1, deba0b5078cef0faae43cbdafed3035b16587afc, ab21eeb3b4c72f2500987c96ff603ccf3b6e7de8)
Relnotes:	yes
Sponsored by:	HardenedBSD and G2, Inc
Differential Revision:	https://reviews.freebsd.org/D18319
2018-11-27 17:51:50 +00:00
Eric van Gyzen
607a0eb2f1 Remove superfluous bzero in getcontext/swapcontext/sendsig
We zero the whole structure; we don't need to zero the __spare__ field again.

Remove trailing whitespace.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2018-11-26 20:56:05 +00:00
Alan Somers
72bce9fff6 vfs_aio.c: rename "physio" symbols to "bio".
aio has two paths: an asynchronous "physio" path and a synchronous path.
Confusingly, physio(9) isn't actually used by the "physio" path, and never
has been.  In fact, it may even be called by the synchronous path!  Rename
the "physio" path to the "bio" path to reflect what it actually does:
directly compose BIOs and send them to character devices.

MFC after:	2 weeks
2018-11-26 18:31:00 +00:00
Alan Cox
ee73fef96e blist_meta_alloc assumes that mask=scan->bm_bitmap is nonzero. But if the
cursor lies in the middle of the space that the meta node represents, then
blanking the low bits of mask may make it zero, and break later code that
expects a nonzero value.  Add a test that returns failure if the mask has
been cleared.

Submitted by:	Doug Moore <dougm@rice.edu>
Reported by:	pho
Tested by:	pho
X-MFC with:	r340402
Differential Revision:	https://reviews.freebsd.org/D18058
2018-11-24 21:52:10 +00:00
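
The failure mode described in ee73fef96e can be sketched in isolation: once the bits below a cursor are cleared from a bitmap, the result may be zero, and any later bit scan that assumes a set bit exists would misbehave. The function first_set_from() is a hypothetical helper written for this example and is not the blist code.

```c
#include <stdint.h>

/*
 * Return the index of the lowest set bit of 'bitmap' at or above 'skip',
 * or -1 if clearing the low bits leaves nothing.  'skip' must be < 64.
 */
static int
first_set_from(uint64_t bitmap, unsigned int skip)
{
	uint64_t mask;
	int i;

	mask = bitmap & ~((UINT64_C(1) << skip) - 1);
	if (mask == 0)
		return (-1);	/* nothing at or beyond the cursor */
	/* Safe only because mask is known to be nonzero here. */
	for (i = 0; (mask & 1) == 0; i++)
		mask >>= 1;
	return (i);
}
```

In this sketch, omitting the zero test would make the scan loop run forever on an empty mask, which is why the test comes before the scan.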