Commit Graph

16401 Commits

Author SHA1 Message Date
mjg
2dadc8e0dd Revert "fork: fix use-after-free with vfork"
This unreliably breaks libc handling of vfork where forking succeded,
but execve did not.

vfork code in libc performs waitpid with WNOHANG in case of failed exec.
With the fix exit codepath was waking up the parent before the child
fully transitioned to a zombie. Woken up parent would waitpid, which
could find a not-yet-zombie child and fail to reap it due to the WNOHANG
flag.

While removing the flag fixes the problem, it is not an option due to older
releases which would still suffer from the kernel change.

Revert the fix until a solution can be worked out.

Note that while use-after-free which gets back due to the revert is a real
bug, it's side-effects are limited due to the fact that struct proc memory
is never released by UMA.
2018-11-23 04:38:50 +00:00
mjg
42ea18a714 Annotate TDP_RFPPWAIT as unlikely.
The flag is only set on vfork, but is tested for *all* syscalls.
On amd64 this shortens common-case (not vfork) code.
2018-11-22 21:38:24 +00:00
mjg
09e7b78f66 fork: remove avoidable proc lock/unlock pair
We don't have to access the process after making it runnable, so there
is no need to hold it either.

Sponsored by:	The FreeBSD Foundation
2018-11-22 21:29:36 +00:00
mjg
75deef51a7 fork: fix use-after-free with vfork
The pointer to the child is stored without any reference held. Then it is
blindly used to wait until P_PPWAIT is cleared. However, if the child is
autoreaped it could have exited and get freed before the parent started
waiting.

Use the existing hold mechanism to mitigate the problem. Most common case
of doing exec remains unchanged. The corner case of doing exit performs
wake up before waiting for holds to clear.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18295
2018-11-22 21:08:37 +00:00
markj
5c563658ea Plug some networking sysctl leaks.
Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland.  Fix a number of such handlers.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	tuexen
MFC after:	3 days
Security:	kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18301
2018-11-22 20:49:41 +00:00
mjg
ffdee46ab5 uipc_usrreq: fix inode number assignment
The code was incrementing a global variable in an unsafe manner.
Two different threads stating two different sockets could have resulted
in the same inode numbers assigned to both.

Creation is protected with a global lock, move the assigment there.
Since inode numbers are 64-bit now drop the check for overflows.

Sponsored by:	The FreeBSD Foundation
2018-11-21 22:25:05 +00:00
mjg
b51585a153 proc: update list manipulation comment on process exit
Processes stay in the hash until they get reaped.

This code does not unlink the child from the parent, so remove
the claim that it does.

Sponsored by:	The FreeBSD Foundation
2018-11-21 22:16:10 +00:00
mjg
6fd8f10bb4 uipc_shm: use unr64 for inode numbers
Sponsored by:	The FreeBSD Foundation
2018-11-21 22:01:06 +00:00
mjg
b06fa0f93a proc: convert pfind & friends to use pidhash locks and other cleanup
pfind_locked is retired as it relied on allproc which unnecessarily
restricts locking of the hash.

Sponsored by:	The FreeBSD Foundation
2018-11-21 20:15:56 +00:00
mjg
71aabf21a1 proc: implement pid hash locks and an iterator
forks, exits and waits are frequently stalled during poudriere -j 128 runs
due to killpg and process list exports performed for each package.

Both uses take the allproc lock. The latter case can be modified to iterate
over the hash with finer grained locking instead.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17817
2018-11-21 18:56:15 +00:00
markj
2e57d41f44 Avoid unsynchronized updates to kn_status.
kn_status is protected by the kqueue's lock, but we were updating it
without the kqueue lock held.  For EVFILT_TIMER knotes, there is no
knlist lock, so the knote activation could occur during the kn_status
update and result in KN_QUEUED being lost, in which case we'd enqueue
an already-enqueued knote, corrupting the queue.

Fix the problem by setting or clearing KN_DISABLED before dropping the
kqueue lock to call into the filter.  KN_DISABLED is used only by the
core kevent code, so there is no side effect from setting it earlier.

Reported and tested by:	Sylvain GALLIANO <sg@efficientip.com>
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18060
2018-11-21 17:32:09 +00:00
markj
6994e92830 Remove KN_HASKQLOCK.
It is a write-only flag whose last use was removed in r302235.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18059
2018-11-21 17:28:10 +00:00
markj
0e3d68b2b4 Add a taskqueue_quiesce(9) KPI.
This is similar to taskqueue_drain_all(9) but will wait for the queue
to become idle before returning instead of only waiting for
already-enqueued tasks to finish.  This will be used in the opensolaris
compat layer.

PR:		227784
Reviewed by:	cem
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17975
2018-11-21 17:18:27 +00:00
markj
62e656950f Clear pad bytes in the struct exported by kern.ntp_pll.gettime.
Reported by:	Thomas Barabosch, Fraunhofer FKIE
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-11-20 20:32:10 +00:00
mjg
b359994628 pipe: use unr64
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18054
2018-11-20 14:59:27 +00:00
mjg
b46ad9fa56 Implement unr64
Important users of unr like tmpfs or pipes can get away with just
ever-increasing counters, making the overhead of managing the state
for 32 bit counters a pessimization.

Change it to an atomic variable. This can be further sped up by making
the counts variable "allocate" ranges and store them per-cpu.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18054
2018-11-20 14:58:41 +00:00
bwidawsk
9f689ee7c3 acpi: fix acpi_ec_probe to only check EC devices
This patch utilizes the fixed_devclass attribute in order to make sure
other acpi devices with params don't get confused for an EC device.

The existing code assumes that acpi_ec_probe is only ever called with a
dereferencable acpi param. Aside from being incorrect because other
devices of ACPI_TYPE_DEVICE may be probed here which aren't ec devices,
(and they may have set acpi private data), it is even more nefarious if
another ACPI driver uses private data which is not dereferancable. This
will result in a pointer deref during boot and therefore boot failure.

On X86, as it stands today, no other devices actually do this (acpi_cpu
checks for PROCESSOR type devices) and so there is no issue. I ran into
this because I am adding such a device which gets probed before
acpi_ec_probe and sets private data. If ARM ever has an EC, I think
they'd run into this issue as well.

There have been several iterations of this patch. Earlier
iterations had ECDT enumerated ECs not call into the probe/attach
functions of this driver. This change was Suggested by: jhb@.

Reviewed by:    jhb
Approved by:	emaste (mentor)
Differential Revision:  https://reviews.freebsd.org/D16635
2018-11-19 18:29:03 +00:00
hselasky
aa82228844 Minor code factoring. No functional change.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-11-19 09:36:09 +00:00
hselasky
af549cba43 Be more verbose when a sysctl fails to unregister.
Print name of sysctl in question.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-11-19 09:35:16 +00:00
kbowling
59c7844347 Retire sbsndptr() KPI
As of r340465 all consumers use sbsndptr_adv and sbsndptr_noadv

Reviewed by:	gallatin
Approved by:	krion (mentor)
Differential Revision:	https://reviews.freebsd.org/D17998
2018-11-19 00:54:31 +00:00
mjg
4493b1d3a8 proc: always store parent pid in p_oppid
Doing so removes the dependency on proctree lock from sysctl process list
export which further reduces contention during poudriere -j 128 runs.

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17825
2018-11-16 17:07:54 +00:00
markj
f4acf54f49 Remove mostly-useless proc provider probes.
For some reason the proc UMA zone's ctor, dtor and init functions are
instrumented, but these functions are always available through FBT.
Moreover, the probes are not part of the original Solaris proc
provider, aren't documented, have no uses (e.g., in dwatch(8)) and
have no clear use to begin with.  Therefore, remove them.

Reviewed by:	rpaulo
Differential Revision:	https://reviews.freebsd.org/D2169
2018-11-15 23:02:59 +00:00
imp
27905292be Do proper conversion to/from sbt.
Doh! sbttoX and Xtosbt were backwards. While they ran, they produced
bogus results.

Pointy hat to: imp@
2018-11-15 16:02:24 +00:00
glebius
45d396c83a Initialize compatibility epoch tracker for thread0. Fixes
panics for drivers that call if_maddr_lock() during startup.

Reported by:	cy
2018-11-14 19:10:35 +00:00
brooks
d696b58dd0 Use the main capabilities.conf for freebsd32.
Allow the location of capabilities.conf to be configured.

Also allow a per-abi syscall prefix to be configured with the
abi_func_prefix syscalls.conf variable and check syscalls against
entries in capabilities.conf with and without the prefix amended.

Take advantage of these two features to allow use shared capabilities.conf
between the default syscall vector and the freebsd32 compatability
layer.  We've been inconsistent about keeping the two in sync as
evidenced by the bugs fixed in r340294.  This eliminates that problem
going forward.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17932
2018-11-14 00:46:02 +00:00
glebius
90bd0c6d35 Fix build on some architectures after r340413. On amd64 epoch.h
appeared to be included implicitly.
2018-11-14 00:33:03 +00:00
mmacy
e1aba317c2 epoch(9) revert r340097 - no longer a need for multiple sections per cpu
I spoke with Samy Bahra and recent changes to CK to make ck_epoch_call and
ck_epoch_poll not modify the record have eliminated the need for this.
2018-11-14 00:12:04 +00:00
glebius
9d29c246e4 style(9), mostly adjusting overly long lines. 2018-11-13 23:57:34 +00:00
glebius
64b7bc90bf With epoch not inlined, there is no point in using _lite KPI. While here,
remove some unnecessary casts.
2018-11-13 23:45:38 +00:00
glebius
db99058fbf The dualism between epoch_tracker and epoch_thread is fragile and
unnecessary. So, expose CK types to kernel and use a single normal
structure for epoch_tracker.

Reviewed by:	jtl, gallatin
2018-11-13 23:20:55 +00:00
glebius
4508d14d92 For compatibility KPI functions like if_addr_rlock() that used to have
mutexes but now are converted to epoch(9) use thread-private epoch_tracker.
Embedding tracker into ifnet(9) or ifnet derived structures creates a non
reentrable function, that will fail miserably if called simultaneously from
two different contexts.
A thread private tracker will provide a single tracker that would allow to
call these functions safely. It doesn't allow nested call, but this is not
expected from compatibility KPIs.

Reviewed by:	markj
2018-11-13 22:58:38 +00:00
mjg
da67d6603f locks: plug warnings about unitialized variables
They only showed up after I redefined LOCKSTAT_ENABLED to 0.

doing_lockprof in mutex.c is a real (but harmless) bug. Should the
value be non-zero it will do checks for lock profiling which would
otherwise be skipped.

state in rwlock.c is a wart from the compiler, the value can't be
used if lock profiling is not enabled.

Sponsored by:	The FreeBSD Foundation
2018-11-13 21:29:56 +00:00
vangyzen
0b1c6c1fc3 Make no assertions about lock state when the scheduler is stopped.
Change the assert paths in rm, rw, and sx locks to match the lock
and unlock paths.  I did this for mutexes in r306346.

Reported by:	Travis Lane <tlane@isilon.com>
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2018-11-13 20:48:05 +00:00
glebius
fe830a037c Uninline epoch(9) entrance and exit. There is no proof that modern
processors would benefit from avoiding a function call, but bloating
code. In fact, clang created an uninlined real function for many
object files in the network stack.

- Move epoch_private.h into subr_epoch.c. Code copied exactly, avoiding
  any changes, including style(9).
- Remove private copies of critical_enter/exit.

Reviewed by:	kib, jtl
Differential Revision:	https://reviews.freebsd.org/D17879
2018-11-13 19:02:11 +00:00
markj
3ce6f385ad Allow allocations across meta boundaries.
Remove restrictions that prevent allocation requests to cross the
boundary between two meta nodes.

Replace the bmu_avail field in meta nodes with a bitmap that identifies
which subtrees have some free memory, and iterate over the nonempty
subtrees only in blst_meta_alloc.  If free memory is scarce, this should
make searching for it faster.

Put the code for handling the next-leaf allocation in a separate
function.  When taking blocks from the next leaf empties the leaf, be
sure to clear the appropriate bit in its parent, and so on, up to the
least-common ancestor of this leaf and the next.

Eliminate special terminator nodes, and rely instead on the fact that
there is a 0-bit at the end of the bitmask at the root of the tree that
will stop a meta_alloc search, or a next-leaf search, before the search
falls off the end of the tree. Make sure that the tree is big enough to
have space for that 0-bit.

Eliminate special all-free indicators.  Lazy initialization of subtrees
stands in the way of having an allocation span a meta-node boundary, so
a subtree of all free blocks is not treated specially.  Subtrees of
all-allocated blocks are still recognized by looking at the bitmask at
the root and finding 0.

Don't print all-allocated subtrees.  Do print the bitmasks for meta
nodes, when tree-printing.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	alc
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12635
2018-11-13 18:40:01 +00:00
kevans
175621e50b Add dynamic_kenv assertion to init_static_kenv
Both to formally document the requirement that this not be called after the
dynamic kenv is setup, and to perhaps help static analyzers figure out
what's going on. While calling init_static_kenv this late isn't fatal, there
are some caveats that the caller should be aware of:

- Late calls are effectively a no-op, as far as default FreeBSD is
concerned, as everything will switch to searching the dynamic kenv once it's
available.

- Each of the kern_getenv calls will leak memory, as it's assumed that
these are searching static environment and allocations will not be made.

As such, this usage is not sensible and should be detected.
2018-11-13 04:34:30 +00:00
kib
a79b3edff0 Allow set ether/vlan PCP operation from the VNET jails.
The vlan interfaces can be created from vnet jails, it seems, so it
sounds logical to allow pcp configuration as well.

Reviewed by:	bz, hselasky (previous version)
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17777
2018-11-12 15:59:32 +00:00
cem
4d82f7cb14 netdump: Fix netdumping with INVARIANTS kernels
Correct boneheaded assertion I added in r339501.  Mea culpa.

The intent is to notice when an M_WAITOK zone allocation would fail during
netdump, not to prevent all use of mbufs during netdump.

Reviewed by:	markj
X-MFC-With:	r339501
Differential Revision:	https://reviews.freebsd.org/D17957
2018-11-12 05:24:20 +00:00
kib
ce738ff2b3 Remove one-use variable.
This also removes a lot of #ifdefs and cleans up a warning when the
AUDIT kernel option is defined, but neither KDTRACE_HOOKS nor MAC are.

Reported and tested by:	danger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-11-11 00:21:28 +00:00
kib
b609d75f3a Allow absolute paths for O_BENEATH.
The path must have a tail which does not escape starting/topping
directory.  The documentation will come shortly, see the man pages
commit message for the reason of separate commit.

Reviewed by:	jilles (previous version)
Discussed with:	emaste
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17714
2018-11-11 00:04:36 +00:00
brooks
3f1281ac33 Fix freebsd32 mknod(at).
As dev_t is now a 64-bit integer, it requires special handling as a
system call argument.  64-bit arguments are split between two 64-bit
integers due to the way arguments are promoted to allow reuse of most
system call implementations.  They must be reassembled before use.
Further, 64-bit arguments at an odd offset (counting from zero) are
padded and slid to the next slot on powerpc and mips.  Fix the
non-COMPAT11 system call by adding a freebsd32_mknodat() and
appropriately padded declerations.

The COMPAT11 system calls are fully compatible with the 64-bit
implementations so remove the freebsd32_ versions.

Use uint32_t consistently as the type of the old dev_t.  This matches
the old definition.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17928
2018-11-09 21:01:16 +00:00
brooks
c1262215a1 Make freebsd32_umtx_op follow the freebsd32_foo convention.
Sponsored by:	DARPA, AFRL
2018-11-09 00:46:10 +00:00
jhb
db3b3f63c4 Enable non-executable stacks by default on RISC-V.
Reviewed by:	markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17878
2018-11-07 18:32:02 +00:00
brooks
aece9729f0 Regen after r340221: allow pointer return types.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17873
2018-11-07 16:56:07 +00:00
brooks
b47f3a5888 makesyscalls.sh: allow pointer return types.
The previous code required that the return type be a single word.  This
allows it to be a pointer without using a typedef.

Update the return types of break, mmap, and shmat to be void * as
declared.  This only effects systrace output in-tree, but can aid in
generating system call wrappers from syscalls.master.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17873
2018-11-07 16:55:04 +00:00
markj
8ffe4c50ab Avoid fixing the tty_info() buffer size in tty.h.
Different compilation units may otherwise get a different view of the
layout of struct tty depending on whether they include opt_printf.h.
This caused a blowup in the number of types defined in the kernel's
CTF file after r339468; thanks to dim@ for bisecting down to that
revision.

PR:		232675
Reported by:	dim
Reviewed by:	cem (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17877
2018-11-06 23:41:44 +00:00
markj
c259f44e9a Avoid specifying VM_PROT_EXECUTE in mappings from pipe_map and exec_map.
These submaps are used for mapping pipe buffers and execv() argument
strings respectively, so there's no need for such mappings to have
execute permissions.

Reported by:	jhb
Reviewed by:	alc, jhb, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17827
2018-11-06 21:57:03 +00:00
markj
aed0dab28a We need opt_stack.h after r339605.
Reviewed by:	cem
Sponsored by:	The FreeBSD Foundation
2018-11-06 21:47:22 +00:00
brooks
cd9ef76e47 Update some comments made obsolete by recent commits. 2018-11-06 20:45:15 +00:00
brooks
c2135dd9fd Regen after r340199: Use declared types for caddr_t arguments.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17852
2018-11-06 18:47:29 +00:00