16189 Commits

Author SHA1 Message Date
Mark Johnston
9295517ac9 Add a FALLTHROUGH comment to kvprintf().
Submitted by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after:	3 days
2018-07-17 14:56:54 +00:00
Mariusz Zaborski
f1fe1e020f Extend amount of possible coredumps from 10 to 100000 when using index format.
The number of digits in the corefile name is assigned dynamically.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D16118
2018-07-15 17:10:12 +00:00
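
As a rough illustration of sizing the index field dynamically, the sketch below counts the decimal digits needed for the largest index. The helper name demo_index_width() is hypothetical and not taken from the commit; the kernel may compute the width differently.

```c
/*
 * Hypothetical helper: number of decimal digits needed to print the
 * largest index, ncores - 1.  Returns 1 when ncores is 0 or 1.
 */
static int
demo_index_width(unsigned int ncores)
{
	unsigned int max_index = (ncores > 0) ? ncores - 1 : 0;
	int width = 1;

	while (max_index >= 10) {
		max_index /= 10;
		width++;
	}
	return (width);
}
```

With a limit of 10 the largest index is 9 and one digit suffices; with a limit of 100000 the largest index is 99999 and five digits are needed.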
Mateusz Guzik
95ab076d6e lockmgr: tidy up slock/sunlock similar to other locks 2018-07-13 22:40:14 +00:00
Warner Losh
25bc561e68 There are two files in the sys tree named inflate.c, in addition
to it being a common name elsewhere. Rename the old kzip one
to subr_inflate.c.

This actually fixes the build issues on sparc64 that my inclusion of
.PATH ${SYSDIR}/kern created in r336244, so also revert the broken
workaround I committed in r336249.

This slipped past me because apparently, I never did a clean build.
2018-07-13 17:41:28 +00:00
Warner Losh
52379d36a9 Create helper functions for parsing boot args.
boot_parse_arg		to parse a single arg
boot_parse_cmdline	to parse a command line string
boot_parse_args		to parse all the args in a vector
boot_howto_to_env	Convert howto bits to env vars
boot_env_to_howto	Return howto mask based on what's set in the environment.

All these routines return an int that's the bitmask of the args
translated to RB_* flags. As a special case, the 'S' flag sets the
comconsole_speed env var. Any arg that looks like a=b will set the env
key 'a' to value 'b'. If =b is omitted, 'a' is set to '1'.  This
should help us reduce the number of redundant copies of these routines
in the tree.  It should also give a more uniform experience between
platforms.

Also, invent a new flag RB_PROBE that's set when 'P' is parsed.  On
x86 + BIOS, this means 'probe for the keyboard, and if it's not there
set both RB_MULTIPLE and RB_SERIAL' (which means show the output on
both video and serial consoles, but make serial primary).  On other
platforms it may be some similar concept of probing, but what exactly
it means is loader dependent.

These routines are suitable for /boot/loader and/or the kernel,
though they may not be suitable for the tightly hand-rolled-for-space
environments like boot2.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16205
2018-07-13 16:43:05 +00:00
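
The argument handling described above can be sketched in userspace roughly as follows. Everything here is an assumption made for illustration: the DEMO_* bit values, the one-slot "environment", the leading '-' convention for flag arguments, and the function name demo_parse_arg() are not the real RB_* flags or the real boot_parse_arg() interface.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative flag bits only; the real RB_* values are defined elsewhere. */
#define DEMO_SINGLE	0x1	/* 's' */
#define DEMO_VERBOSE	0x2	/* 'v' */
#define DEMO_PROBE	0x4	/* 'P' */

/* One-slot stand-in for the environment, enough to show a=b handling. */
static char demo_key[32], demo_val[32];

/*
 * Hypothetical stand-in for a single-argument parser: "-sv" style
 * arguments are folded into a bitmask, "a=b" sets a to b, and a bare
 * "a" sets a to "1".  The return value is the bitmask for this argument.
 */
static int
demo_parse_arg(const char *arg)
{
	const char *eq;
	int howto = 0;

	if (arg[0] == '-') {
		for (arg++; *arg != '\0'; arg++) {
			switch (*arg) {
			case 's': howto |= DEMO_SINGLE; break;
			case 'v': howto |= DEMO_VERBOSE; break;
			case 'P': howto |= DEMO_PROBE; break;
			default: break;	/* unknown flags are ignored here */
			}
		}
		return (howto);
	}
	eq = strchr(arg, '=');
	if (eq != NULL) {
		snprintf(demo_key, sizeof(demo_key), "%.*s",
		    (int)(eq - arg), arg);
		snprintf(demo_val, sizeof(demo_val), "%s", eq + 1);
	} else {
		snprintf(demo_key, sizeof(demo_key), "%s", arg);
		snprintf(demo_val, sizeof(demo_val), "1");
	}
	return (howto);
}
```

Parsing a whole vector of arguments would then amount to OR-ing the per-argument results together.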
Brooks Davis
d92da75941 Round down the location of execpathp to slightly improve copyout speed.
In practice, this moves the padding from below the canary to above
execpathp and has no impact on stack consumption.

Submitted by:		Wuyang-Chung (via github pull request #159)
MFC after:	1 week
2018-07-13 11:32:27 +00:00
Mateusz Guzik
bcbc8d35eb fd: stop passing M_ZERO to uma_zalloc
The optimisation seen with malloc cannot be used here as zone sizes are
not known at compilation. Thus bzero by hand to get the optimisation
instead.
2018-07-12 22:48:18 +00:00
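
A userspace sketch of the "zero by hand" pattern, with malloc(3) and memset(3) standing in for uma_zalloc() and bzero(). The struct and function names are hypothetical. The point is only that the call site passes sizeof(*p), a compile-time constant, to the zeroing call, which gives the compiler a chance to expand it inline.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object type; not the real struct used by the commit. */
struct demo_filedesc {
	int	 flags;
	void	*data;
	long	 refs;
};

/* Allocate without asking the allocator to zero, then zero by hand. */
static struct demo_filedesc *
demo_alloc_zeroed(void)
{
	struct demo_filedesc *p;

	p = malloc(sizeof(*p));
	if (p != NULL)
		memset(p, 0, sizeof(*p));	/* constant size at this call site */
	return (p);
}
```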
Kyle Evans
44314c3509 kern_environment: Give the static environment a chance to disable MD env
This variable has been given the name "loader_env.disabled" as it's the
primary way most people will have an MD environment. This restores the
previously-default behavior of ignoring the loader(8) environment, which may
be useful for vendor distributions or other scenarios where inheriting the
loader environment may be considered a security issue or potentially
breaking of a more locked-down environment.

As the change to config(5) indicates, disabling the loader environment
should not be a choice made lightly since it may provide ACPI hints and
other useful things that the system can rely on to boot.

An UPDATING entry has been added to mention an upgrade path for those that
may have relied on the previous behavior.

Discussed with:	bde
Relnotes:	yes (maybe)
2018-07-12 02:51:50 +00:00
Alan Somers
8a894c1aa1 Don't acquire evclass_lock with a spinlock held
When the "pc" audit class is enabled and auditd is running, witness will
panic during thread exit because au_event_class tries to lock an rwlock
while holding a spinlock acquired upstack by thread_exit.

To fix this, move AUDIT_SYSCALL_EXIT further upstack, before the spinlock is
acquired. Of thread_exit's 16 callers, it's only necessary to call
AUDIT_SYSCALL_EXIT from two, exit1 (for exiting processes) and kern_thr_exit
(for exiting threads). The other callers are all kernel threads, which
needn't call AUDIT_SYSCALL_EXIT because, since they can't make syscalls,
there will be nothing to audit.  And exit1 already does call
AUDIT_SYSCALL_EXIT, making the second call in thread_exit redundant for that
case.

PR:		228444
Reported by:	aniketp
Reviewed by:	aniketp, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D16210
2018-07-11 19:38:42 +00:00
Brooks Davis
942ae5c8b8 Regen after r336171. 2018-07-10 14:04:52 +00:00
Brooks Davis
7cc923f8a8 Get rid of netbsd_lchown and netbsd_msync syscall entries.
No valid FreeBSD binary ever called them (they would call lchown and
msync directly) and we haven't supported NetBSD binaries in ages.

This is a respin of r335983 with a workaround for the ancient BFD linker
in the libc stubs.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16193
2018-07-10 13:32:04 +00:00
Brooks Davis
3a20f06a1c Use uintptr_t alone when assigning to kvaddr_t variables.
Suggested by:	jhb
2018-07-10 13:03:06 +00:00
Kyle Evans
c7a82b9c6c kern_environment: bool'itize dynamic_kenv; fix small style(9) nit 2018-07-10 02:43:22 +00:00
Kyle Evans
0f46005e4b subr_hints: Skip static_env and static_hints if they don't contain hints
This is possible because, well, they're static. Both the dynamic environment
and the MD-environment (generally loader(8) environment) can potentially
have room for new variables to be set, and thus do not receive this
treatment.
2018-07-10 00:36:37 +00:00
Kyle Evans
dc4446df5f subr_hints: Convert some bool-like ints to bools 2018-07-10 00:34:19 +00:00
Kyle Evans
5768da6c21 subr_hints: Use goto/label instead of series of conditionals 2018-07-10 00:33:31 +00:00
Mark Johnston
013072f04c Fix pre-SI_SUB_CPU initialization of per-CPU counters.
r336020 introduced pcpu_page_alloc(), replacing page_alloc() as the
backend allocator for PCPU UMA zones.  Unlike page_alloc(), it does
not honour malloc(9) flags such as M_ZERO or M_NODUMP, so fix that.

r336020 also changed counter(9) to initialize each counter using a
CPU_FOREACH() loop instead of an SMP rendezvous.  Before SI_SUB_CPU,
smp_rendezvous() will only execute the callback on the current CPU
(i.e., CPU 0), so only one counter gets zeroed.  The rest are zeroed
by virtue of the fact that UMA gratuitously zeroes slabs when importing
them into a zone.

Prior to SI_SUB_CPU, all_cpus is clear, so with r336020 we weren't
zeroing vm_cnt counters during boot: the CPU_FOREACH() loop had no
effect, and pcpu_page_alloc() didn't honour M_ZERO.  Fix this by
iterating over the full range of CPU IDs when zeroing counters,
ignoring whether the corresponding bits in all_cpus are set.

Reported and tested by:	pho (previous version)
Reviewed by:		kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D16190
2018-07-10 00:18:12 +00:00
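
The difference between iterating over the CPUs marked present and iterating over the full range of CPU IDs can be sketched as below. This is a userspace model under stated assumptions: DEMO_MAXCPU, the plain bitmask standing in for all_cpus, and both function names are hypothetical.

```c
#include <stdint.h>

#define DEMO_MAXCPU	8	/* assumed upper bound on CPU IDs */

/* Zero only the slots whose bit is set in the mask (CPU_FOREACH-like). */
static void
demo_zero_present(uint64_t *slots, unsigned int present_mask)
{
	for (int cpu = 0; cpu < DEMO_MAXCPU; cpu++) {
		if (present_mask & (1u << cpu))
			slots[cpu] = 0;
	}
}

/* Zero every slot, ignoring which CPUs are marked present. */
static void
demo_zero_all(uint64_t *slots)
{
	for (int cpu = 0; cpu < DEMO_MAXCPU; cpu++)
		slots[cpu] = 0;
}
```

With an empty mask, as early in boot, the first loop touches nothing and stale values survive; the second loop does not depend on the mask at all.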
Jamie Gritton
0a1724045e Change prison_add_vfs() to the more generic prison_add_allow(), which
can add any dynamic allow.* or allow.*.* parameter.  Also keep
prison_add_vfs() as a wrapper.

Differential Revision:	D16146
2018-07-06 18:50:22 +00:00
Kyle Evans
cae22dd904 kern_environment: Fix SYSINIT ordering
The dynamic environment was being initialized at SI_SUB_KMEM, SI_ORDER_ANY.
I added the hint-merging at SI_SUB_KMEM, SI_ORDER_ANY as well in r335998 -
this can only work by coincidence.

Re-do both to operate at SI_SUB_KMEM + 1, SI_ORDER_FIRST and SI_ORDER_SECOND
respectively to be safe. It's sufficiently obfuscated away as to when in
SI_SUB_KMEM malloc will be available, and the dynamic environment cannot be
relied upon there anyways since it's initialized at SI_ORDER_ANY.

Reported by:	bde
Discussed with:	bde
X-MFC-With: r335998
2018-07-06 16:51:35 +00:00
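
Why two initializers registered with the same subsystem and order only work "by coincidence" can be seen from a sketch of sorting by a (subsystem, order) key. The struct, the numeric values used by callers, and the function names are hypothetical; this is not the kernel's SYSINIT implementation.

```c
#include <stdlib.h>

/* Hypothetical record for one startup hook. */
struct demo_sysinit {
	unsigned int	 subsystem;
	unsigned int	 order;
	const char	*name;
};

/* Compare by subsystem first, then by order within the subsystem. */
static int
demo_sysinit_cmp(const void *a, const void *b)
{
	const struct demo_sysinit *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem < y->subsystem ? -1 : 1);
	if (x->order != y->order)
		return (x->order < y->order ? -1 : 1);
	return (0);	/* equal keys: relative order is unspecified */
}

static void
demo_sysinit_sort(struct demo_sysinit *v, size_t n)
{
	qsort(v, n, sizeof(*v), demo_sysinit_cmp);
}
```

Giving the two hooks distinct order values within the same subsystem makes their relative order a property of the keys rather than of the sort.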
Brooks Davis
7524b4c14b Correct breakage on 32-bit platforms from r335979. 2018-07-06 10:03:33 +00:00
Matt Macy
822e50e3f6 epoch(9): simplify initialization
replace manual NUMA aware allocation with a pcpu zone
2018-07-06 06:20:03 +00:00
Matt Macy
ab3059a8e7 Back pcpu zone with domain correct pages
- Change pcpu zone consumers to use a stride size of PAGE_SIZE.
  (defined as UMA_PCPU_ALLOC_SIZE to make future identification easier)

- Allocate page from the correct domain for a given cpu.

- Don't initialize pc_domain to non-zero value if NUMA is not defined
  There are some misconceptions surrounding this field. It is the
  _VM_ NUMA domain and should only ever correspond to valid domain
  values as understood by the VM.

The former slab size of sizeof(struct pcpu) was somewhat arbitrary.
The new value is PAGE_SIZE because that's the smallest granularity
which the VM can allocate a slab for a given domain. If you have
fewer than PAGE_SIZE/8 counters on your system there will be some
memory wasted, but this is obviously something where you want the
cache line to be coming from the correct domain.

Reviewed by: jeff
Sponsored by: Limelight Networks
Differential Revision:  https://reviews.freebsd.org/D15933
2018-07-06 02:06:03 +00:00
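
The PAGE_SIZE/8 figure above is simple arithmetic, sketched here under the assumption of a 4096-byte page and 8-byte counters. DEMO_PAGE_SIZE and the function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE	4096u	/* assumed page size */

/* How many 8-byte counters fit in one per-CPU page. */
static size_t
demo_counters_per_page(void)
{
	return (DEMO_PAGE_SIZE / sizeof(uint64_t));
}

/* Bytes left unused per CPU when ncounters counters are page-backed. */
static size_t
demo_wasted_bytes(size_t ncounters)
{
	size_t used = ncounters * sizeof(uint64_t);
	size_t pages = (used + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;

	return (pages * DEMO_PAGE_SIZE - used);
}
```

Under these assumptions one page holds 512 counters, so a system with 100 counters leaves 3296 bytes of each CPU's page unused.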
Andrew Turner
2bf9501287 Create a new macro for static DPCPU data.
On arm64 (and possibly other architectures) we are unable to use static
DPCPU data in kernel modules. This is because the compiler will generate
PC-relative accesses, however the runtime-linker expects to be able to
relocate these.

In preparation to fix this create two macros depending on if the data is
global or static.

Reviewed by:	bz, emaste, markj
Sponsored by:	ABT Systems Ltd
Differential Revision:	https://reviews.freebsd.org/D16140
2018-07-05 17:13:37 +00:00
Bjoern A. Zeeb
1534cd19b5 Split up deadlkres() to make it more readable in anticipation of
further changes adding another level of indentation.

Some of the logic got simplified with the break out functions.
There should be no functional changes.

Reviewed by:	kib
Sponsored by:	iXsystems, Inc.
Differential Revision:		https://reviews.freebsd.org/D15914
2018-07-05 17:06:54 +00:00
Kyle Evans
39d44f7f15 kern_environment: use any provided environments, evict hintmode/envmode
At the moment, hintmode and envmode are used to indicate whether static
hints or static env have been provided in the kernel config(5) and the
static versions are mutually exclusive with loader(8)-provided environment.
hintmode *can* be reconfigured later to pull from the dynamic environment,
thus taking advantage of the loader(8) or post-kmem environment setting.

This changeset fixes both problems at once to move us from a semi-confusing
state to a consistent state: if an environment file, hints file, or
loader(8) environment are provided, we use them in a well-known order of
precedence:

- loader(8) environment
- static environment
- static hints file

Once the dynamic environment is set up this becomes a moot point. The
loader(8) and static environments are merged (respecting the above order of
precedence), and the static hints are merged in on an as-needed basis after
the dynamic environment has been set up.

Hint lookups are changed to respect all of the above. Before the dynamic
environment is set up, lookups use the above-mentioned order and fall back to
the next environment if a matching hint is not found. Once the dynamic
environment is set up, that is used on its own since it captures all of the
above information plus any dynamic kenv settings that came up later in boot.

The following tangentially related changes were made to res_find:

- A hintp cookie is now passed in so that related searches continue using
  the chain of environments (or dynamic environment) without relying on
  global state
- All three environments will be searched if they actually have valid hints
  to use, rather than just choosing the first environment that actually had
  a hint and rolling with that only

The hintmode sysctl has been ripped out. static_{env,hints}.disabled are
still honored and will disable their respective environments from being used
for hint lookups and from being merged into the dynamic environment, as
expected.

MFC after:	1 month (maybe)
Differential Revision:	https://reviews.freebsd.org/D15953
2018-07-05 16:30:32 +00:00
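
The pre-dynamic lookup order described in this commit message (loader(8) environment, then static environment, then static hints) can be sketched as a chained search. The types and function names below are hypothetical and greatly simplified; the kernel's res_find() and its hintp cookie are not modelled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value environment. */
struct demo_kv {
	const char *key;
	const char *val;
};

struct demo_env {
	const struct demo_kv	*vars;
	size_t			 nvars;
};

static const char *
demo_env_get(const struct demo_env *env, const char *key)
{
	for (size_t i = 0; i < env->nvars; i++) {
		if (strcmp(env->vars[i].key, key) == 0)
			return (env->vars[i].val);
	}
	return (NULL);
}

/*
 * Search the environments in precedence order.  A NULL entry models an
 * environment that was not provided or has been disabled.
 */
static const char *
demo_lookup(const struct demo_env *const *envs, size_t nenvs,
    const char *key)
{
	for (size_t i = 0; i < nenvs; i++) {
		const char *val;

		if (envs[i] == NULL)
			continue;
		val = demo_env_get(envs[i], key);
		if (val != NULL)
			return (val);	/* first match wins */
	}
	return (NULL);
}
```

A caller would pass the environments as an array ordered loader, static env, static hints, so that a key present in several of them resolves to the highest-precedence value.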
Kyle Evans
e28687347f Revert r335995 due to accidental changes snuck in 2018-07-05 16:28:43 +00:00
Kyle Evans
8ef5886303 kern_environment: use any provided environments, evict hintmode/envmode
At the moment, hintmode and envmode are used to indicate whether static
hints or static env have been provided in the kernel config(5) and the
static versions are mutually exclusive with loader(8)-provided environment.
hintmode *can* be reconfigured later to pull from the dynamic environment,
thus taking advantage of the loader(8) or post-kmem environment setting.

This changeset fixes both problems at once to move us from a semi-confusing
state to a consistent state: if an environment file, hints file, or
loader(8) environment are provided, we use them in a well-known order of
precedence:

- loader(8) environment
- static environment
- static hints file

Once the dynamic environment is set up this becomes a moot point. The
loader(8) and static environments are merged (respecting the above order of
precedence), and the static hints are merged in on an as-needed basis after
the dynamic environment has been set up.

Hint lookups are changed to respect all of the above. Before the dynamic
environment is set up, lookups use the above-mentioned order and fall back to
the next environment if a matching hint is not found. Once the dynamic
environment is set up, that is used on its own since it captures all of the
above information plus any dynamic kenv settings that came up later in boot.

The following tangentially related changes were made to res_find:

- A hintp cookie is now passed in so that related searches continue using
  the chain of environments (or dynamic environment) without relying on
  global state
- All three environments will be searched if they actually have valid hints
  to use, rather than just choosing the first environment that actually had
  a hint and rolling with that only

The hintmode sysctl has been ripped out. static_{env,hints}.disabled are
still honored and will disable their respective environments from being used
for hint lookups and from being merged into the dynamic environment, as
expected.

MFC after:	1 month (maybe)
Differential Revision:	https://reviews.freebsd.org/D15953
2018-07-05 16:25:48 +00:00
Bjoern A. Zeeb
0fb9f29bae With the introduction of reapers and reaplists in r275800,
proc0 and init are set up as a circular dependency.

create_init() calls fork1() which calls do_fork(). There the
newproc (initproc) is set up with a reaper of proc0 whose reaper
points to itself. The newproc (initproc) is then put on its
reaper's (proc0) p_reaplist (initproc is a descendant of proc0
for proc0 to reap). Upon return to create_init(), proc0 is
added to initproc's p_reaplist (which would mean proc0 is a
descendant of init, for init to reap). This creates a
circular dependency which eventually leads to LIST corruptions
when trying to kill init and a proc0.

For the base system we never really hit this case during reboot.
The problem only became visible after adding more virtual process
spaces which could go away cleanly (work existing in an experimental
branch).

Reviewed by:	kib
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D15924
2018-07-05 16:16:28 +00:00
Brooks Davis
714c03c81e Revert r335983.
The bfd linker in tree doesn't support multiple names for the same
symbol (at least with current flags).
2018-07-05 16:03:03 +00:00
Brooks Davis
5b04a71dae Get rid of netbsd_lchown and netbsd_msync syscall entries.
No valid FreeBSD binary ever called them (they would call lchown and
msync directly) and we haven't supported NetBSD binaries in ages.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15814
2018-07-05 14:12:56 +00:00
Konstantin Belousov
dbadb01591 Silence warnings about unused variables when RACCT is defined but RCTL
is not.

Reported by:	Dries Michiels <driesm.michiels@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-07-05 13:37:31 +00:00
Brooks Davis
f38b68ae8a Make struct xinpcb and friends word-size independent.
Replace size_t members with ksize_t (uint64_t) and pointer members
(never used as pointers in userspace, but instead as unique
identifiers) with kvaddr_t (uint64_t). This makes the structs
identical between 32-bit and 64-bit ABIs.

On 64-bit systems, the ABI is maintained. On 32-bit systems,
this is an ABI breaking change. The ABI of most of these structs
was previously broken in r315662.  This also imposes a small API
change on userspace consumers who must handle kernel pointers
becoming virtual addresses.

PR:		228301 (exp-run by antoine)
Reviewed by:	jtl, kib, rwatson (various versions)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15386
2018-07-05 13:13:48 +00:00
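
A minimal sketch of a word-size independent export structure, assuming fixed-width fields throughout. The demo_* typedefs and struct are hypothetical stand-ins, not the real ksize_t, kvaddr_t, or xinpcb definitions; the layout check only covers this made-up struct.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t demo_ksize_t;	/* stands in for ksize_t */
typedef uint64_t demo_kvaddr_t;	/* stands in for kvaddr_t */

/* Every member has a fixed width, so 32-bit and 64-bit ABIs agree. */
struct demo_xrecord {
	demo_ksize_t	len;	/* was a size_t */
	demo_kvaddr_t	id;	/* kernel pointer exported as an opaque ID */
	uint32_t	flags;
	uint32_t	spare;	/* explicit padding instead of implicit */
};

_Static_assert(sizeof(struct demo_xrecord) == 24,
    "layout must not depend on the word size");
_Static_assert(offsetof(struct demo_xrecord, id) == 8,
    "id must sit at a fixed offset");

/* Convert a pointer to the exported ID type via uintptr_t alone. */
static demo_kvaddr_t
demo_export_ptr(const void *p)
{
	return ((demo_kvaddr_t)(uintptr_t)p);
}
```

Userspace consumers of such a struct compare or print the id field as a number; they never dereference it.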
Matt Macy
10b8cd7f55 epoch(9): make nesting assert in epoch_wait_preempt more specific
Reported by:	markj
2018-07-04 21:34:08 +00:00
Mariusz Zaborski
6cad1a5d14 Add description to debug.ncores sysctl.
Reviewed by:	bcr
Differential Revision:	https://reviews.freebsd.org/D16123
2018-07-04 17:06:51 +00:00
Konstantin Belousov
d6eff0832c Add a way for the process to request cleanup of the kernel cache of
the process arguments.  A new arguments length of zero causes the pargs to be
dropped instead of allocating a useless zero-length buffer.

Submitted by:	Thomas Munro
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16111
2018-07-04 13:22:48 +00:00
Andriy Gapon
b0af06052c remove unneeded inclusion of sys/interrupt.h from several files
It's likely that the header was needed in the past for swi(9).
But now that code does not use swi(9) or any other interfaces defined
in sys/interrupt.h.

MFC after:	1 week
2018-07-04 09:07:18 +00:00
Matt Macy
6573d7580b epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
  there's no longer any benefit to dropping the pcbinfo lock
  and trying to do so just adds an error prone branchfest to
  these functions
- Remove cases of same function recursion on the epoch as
  recursing is no longer free.
- Remove the TAILQ_ENTRY and epoch_section from struct
  thread as the tracker field is now stack or heap allocated
  as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
2018-07-04 02:47:16 +00:00
Matt Macy
8bedbb4d42 expose thread_lite definition to tied modules 2018-07-03 02:50:07 +00:00
Matt Macy
6443773dab make critical_{enter, exit} inline
Avoid pulling in all of the <sys/proc.h> dependencies by
automatically generating a stripped down thread_lite exporting
only the fields of interest. The field declarations are type checked
against the original and the offsets of the generated result are
automatically checked.

kib has expressed disagreement and would have preferred to simply
use genassym style offsets (which loses type check enforcement).
jhb has expressed dislike of it due to header pollution and a
duplicate structure. He would have preferred to just have defined
thread in _thread.h. Nonetheless, he admits that this is the only
viable solution at the moment.

The impetus for this came from mjg's D15331:
"Inline critical_enter/exit for amd64"

Reviewed by: jeff
Differential Revision: https://reviews.freebsd.org/D16078
2018-07-03 01:55:09 +00:00
Mariusz Zaborski
0dea6e3c98 core(5): overwrite the oldest core dump
The '%I' format in the kern.corefile sysctl limits the number of
core files that a process can generate to the number stored in the
debug.ncores sysctl. The '%I' format is replaced by the single digit
index. Previously, if all indexes were taken, the kernel would overwrite
only the core file with the highest index in its filename.
Now the system will create a new core file if there is a free
index or, if all slots are taken, overwrite the oldest one.

Reviewed by:	kib(code), bcr (updating)
Differential Revision:	https://reviews.freebsd.org/D15991
Differential Revision:	https://reviews.freebsd.org/D16084
2018-07-01 17:28:46 +00:00
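
The slot selection policy described above (use a free index if one exists, otherwise overwrite the oldest file) can be sketched as follows. The struct, the use of a modification time to decide which file is oldest, and the function name are assumptions for illustration; the kernel works on real files rather than an in-memory table.

```c
#include <time.h>

/* Hypothetical view of one core file slot. */
struct demo_slot {
	int	in_use;	/* nonzero if a core file with this index exists */
	time_t	mtime;	/* modification time of that file */
};

/*
 * Return the index to write to: the first free slot, or the slot with
 * the oldest mtime when every slot is taken.  Returns -1 if ncores <= 0.
 */
static int
demo_pick_index(const struct demo_slot *slots, int ncores)
{
	int oldest = -1;

	for (int i = 0; i < ncores; i++) {
		if (!slots[i].in_use)
			return (i);
		if (oldest == -1 || slots[i].mtime < slots[oldest].mtime)
			oldest = i;
	}
	return (oldest);
}
```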
Gleb Smirnoff
95dce07dea Correct r335242. Use unsigned cast instead of abs(). Using abs() gives
an incorrect result when ticks has already wrapped and is about to reach
the cr_ticks value (cr_ticks - ticks < hz).

Submitted by:	bde
2018-06-27 22:00:50 +00:00
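
The difference between the two comparisons can be shown with a small userspace sketch. The function names are hypothetical and the code is not the kernel's; it only contrasts abs() of a signed difference with an unsigned difference when ticks sits just below cr_ticks after wrapping.

```c
#include <stdlib.h>

/* abs()-based check: "has less than hz elapsed since cr_ticks?" */
static int
demo_recent_abs(int ticks, int cr_ticks, int hz)
{
	return (abs(ticks - cr_ticks) < hz);
}

/*
 * Unsigned check: the subtraction is done modulo 2^N, so a ticks value
 * that lies just below cr_ticks yields a huge elapsed count instead of
 * a small one.
 */
static int
demo_recent_unsigned(int ticks, int cr_ticks, int hz)
{
	return ((unsigned int)ticks - (unsigned int)cr_ticks <
	    (unsigned int)hz);
}
```

For ticks = 50, cr_ticks = 100 and hz = 1000, the abs() variant reports the timestamp as recent, while the unsigned variant does not; for ticks = 150 both agree that it is recent.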
Warner Losh
bc6cb3f6b4 Remove devctl_safe_quote since it's now unused.
Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D16026
2018-06-27 04:11:19 +00:00
Warner Losh
349fcda430 Fix devctl generation for core files.
We have a problem with vn_fullpath_global when the file exists. Work
around it by printing the full path if the core file name starts with /,
or current working directory followed by the filename if not.

Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D16026
2018-06-27 04:11:09 +00:00
Warner Losh
ab531b8825 Create new devctl_safe_quote_sb to copy a source string into a struct
sbuf to make it safe. Callers are expected to add the " " around it,
if needed.

Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D16026
2018-06-27 04:10:48 +00:00
Matt Macy
74333b3dee fix assert and conditionally allow mutexes to be held across epoch_wait_preempt 2018-06-24 18:57:06 +00:00
Matt Macy
0bcfb47363 epoch(9): Don't trigger taskq enqueue before the grouptaskqs are setup
If EARLY_AP_STARTUP is not defined it is possible for an epoch to be
allocated prior to it being possible to call epoch_call without
issue.

Based on patch by andrew@

PR:		229014
Reported by:	andrew
2018-06-23 07:14:08 +00:00
Colin Percival
7e8db78116 Improve the accuracy of the POSIX "process CPU-time" clocks by adding the
used portion of the current thread's time slice if the current thread
belongs to the process being queried (i.e., if clock_gettime is invoked
with a clock ID of CLOCK_PROCESS_CPUTIME_ID or the value provided by
passing getpid(2) to clock_getcpuclockid(3)).

The CLOCK_VIRTUAL and CLOCK_PROF timers already make this adjustment via
long-standing code in calcru(), but since those timers are not specified
by POSIX it seems useful to add it here so that the higher accuracy is
available to code which aims to be portable.

PR:		228669
Reported by:	Graham Percival
Reviewed by:	kib
MFC after:	1 week
2018-06-22 10:23:32 +00:00
Matt Macy
ae25f40b72 epoch(9): make non-preemptible variant work early boot 2018-06-22 00:47:18 +00:00
Kyle Evans
03d7aee8a7 subr_hints: Fix acpi unit hinting (at the very least)
The refactoring in r335479 overlooked the fact that the dynamic kenv can
also be switched to if hintmode == 0. This is problematic because the
checkmethod bits are only ever run once, but it worked previously because
use_kenv was global state and the first lookup would enable it if
occurring after the dynamic environment had been set up.

Extending our local definition of use_kenv to include all non-STATIC
hintmodes as long as the dynamic_kenv is setup fixes this. We still have
potential issues if the dynamic kenv comes up while we're doing an anchored
search through the environment, but this is not much of a concern right now
because:

1.) The dynamic environment comes up super early in boot, just after kmem

2.) This is going to get rewritten to provide a safer mechanism for the
anchored searches, ensuring that we continue using the same environment
chain (dynamic env or static fallback) for all anchored search invocations

Reported by:	mmamcy
X-MFC-With: r335479
2018-06-21 21:50:00 +00:00
Konstantin Belousov
6e22bbf66e fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.
An RFSTOPPED thread can't clear TDB_STOPATFORK, which is done in
fork_return() in its context, so the parent is stuck forever.  Triggered
when trying to ptrace a Linux process.  Instead of waiting for the new
thread to clear TDB_STOPATFORK, tag it as traced and reparent to the
debugger in do_fork(), and let it only notify the debugger when run.

Submitted by:	Yanko Yankulov <yanko.yankulov@gmail.com>
Reviewed by:	jhb
MFC after:	1 week
X-MFC-Note:	keep p_dbgwait placeholder intact
Differential revision:	https://reviews.freebsd.org/D15857
2018-06-21 21:12:49 +00:00