Commit Graph

16195 Commits

Author SHA1 Message Date
mmacy
ff20311f27 Back pcpu zone with domain correct pages
- Change pcpu zone consumers to use a stride size of PAGE_SIZE.
  (defined as UMA_PCPU_ALLOC_SIZE to make future identification easier)

- Allocate page from the correct domain for a given cpu.

- Don't initialize pc_domain to non-zero value if NUMA is not defined
  There are some misconceptions surrounding this field. It is the
  _VM_ NUMA domain and should only ever correspond to valid domain
  values as understood by the VM.

The former slab size of sizeof(struct pcpu) was somewhat arbitrary.
The new value is PAGE_SIZE because that's the smallest granularity
which the VM can allocate a slab for a given domain. If you have
fewer than PAGE_SIZE/8 counters on your system there will be some
memory wasted, but this is obviously something where you want the
cache line to be coming from the correct domain.

Reviewed by: jeff
Sponsored by: Limelight Networks
Differential Revision:  https://reviews.freebsd.org/D15933
2018-07-06 02:06:03 +00:00
andrew
ae591a440e Create a new macro for static DPCPU data.
On arm64 (and possible other architectures) we are unable to use static
DPCPU data in kernel modules. This is because the compiler will generate
PC-relative accesses, however the runtime-linker expects to be able to
relocate these.

In preparation to fix this create two macros depending on if the data is
global or static.

Reviewed by:	bz, emaste, markj
Sponsored by:	ABT Systems Ltd
Differential Revision:	https://reviews.freebsd.org/D16140
2018-07-05 17:13:37 +00:00
bz
784ef2f335 Split up deadlkres() to make it more readable in anticipation of
further changes adding another level of indentation.

Some of the logic got simplified with the break out functions.
There should be no functional changes.

Reviewed by:	kib
Sponsored by:	iXsystems, Inc.
Differential Revision:		https://reviews.freebsd.org/D15914
2018-07-05 17:06:54 +00:00
kevans
42651382a2 kern_environment: use any provided environments, evict hintmode/envmode
At the moment, hintmode and envmode are used to indicate whether static
hints or static env have been provided in the kernel config(5) and the
static versions are mutually exclusive with loader(8)-provided environment.
hintmode *can* be reconfigured later to pull from the dynamic environment,
thus taking advantage of the loader(8) or post-kmem environment setting.

This changeset fixes both problems at once to move us from a semi-confusing
state to a consistent state: if an environment file, hints file, or
loader(8) environment are provided, we use them in a well-known order of
precedence:

- loader(8) environment
- static environment
- static hints file

Once the dynamic environment is setup this becomes a moot point. The
loader(8) and static environments are merged (respecting the above order of
precedence), and the static hints are merged in on an as-needed basis after
the dynamic environment has been setup.

Hints lookup are changed to respect all of the above. Before the dynamic
environment is setup, lookups use the above-mentioned order and fallback to
the next environment if a matching hint is not found. Once the dynamic
environment is setup, that is used on its own since it captures all of the
above information plus any dynamic kenv settings that came up later in boot.

The following tangentially related changes were made to res_find:

- A hintp cookie is now passed in so that related searches continue using
  the chain of environments (or dynamic environment) without relying on
  global state
- All three environments will be searched if they actually have valid hints
  to use, rather than just choosing the first environment that actually had
  a hint and rolling with that only

The hintmode sysctl has been ripped out. static_{env,hints}.disabled are
still honored and will disable their respective environments from being used
for hint lookups and from being merged into the dynamic environment, as
expected.

MFC after:	1 month (maybe)
Differential Revision:	https://reviews.freebsd.org/D15953
2018-07-05 16:30:32 +00:00
kevans
c9fbaf1f26 Revert r335995 due to accidental changes snuck in 2018-07-05 16:28:43 +00:00
kevans
7852d84da8 kern_environment: use any provided environments, evict hintmode/envmode
At the moment, hintmode and envmode are used to indicate whether static
hints or static env have been provided in the kernel config(5) and the
static versions are mutually exclusive with loader(8)-provided environment.
hintmode *can* be reconfigured later to pull from the dynamic environment,
thus taking advantage of the loader(8) or post-kmem environment setting.

This changeset fixes both problems at once to move us from a semi-confusing
state to a consistent state: if an environment file, hints file, or
loader(8) environment are provided, we use them in a well-known order of
precedence:

- loader(8) environment
- static environment
- static hints file

Once the dynamic environment is setup this becomes a moot point. The
loader(8) and static environments are merged (respecting the above order of
precedence), and the static hints are merged in on an as-needed basis after
the dynamic environment has been setup.

Hints lookup are changed to respect all of the above. Before the dynamic
environment is setup, lookups use the above-mentioned order and fallback to
the next environment if a matching hint is not found. Once the dynamic
environment is setup, that is used on its own since it captures all of the
above information plus any dynamic kenv settings that came up later in boot.

The following tangentially related changes were made to res_find:

- A hintp cookie is now passed in so that related searches continue using
  the chain of environments (or dynamic environment) without relying on
  global state
- All three environments will be searched if they actually have valid hints
  to use, rather than just choosing the first environment that actually had
  a hint and rolling with that only

The hintmode sysctl has been ripped out. static_{env,hints}.disabled are
still honored and will disable their respective environments from being used
for hint lookups and from being merged into the dynamic environment, as
expected.

MFC after:	1 month (maybe)
Differential Revision:	https://reviews.freebsd.org/D15953
2018-07-05 16:25:48 +00:00
bz
2ec27fa964 With the introduction of reapers and reaplists in r275800,
proc0 and init are setup as a circular dependency.

create_init() calls fork1() which calls do_fork(). There the
newproc (initproc) is setup with a reaper of proc0 who's reaper
points to itself. The newproc (initproc) is then put on its
reaper's (proc0) p_reaplist (initproc is a descendants of proc0
for proc0 to reap). Upon return to create_init(), proc0 is
added to initproc's p_reaplist (which would mean proc0 is a
descendant of init, for init to reap). This creates a
circular dependency which eventually leads to LIST corruptions
when trying to kill init and a proc0.

For the base system we never really hit this case during reboot.
The problem only became visible after adding more virtual process
spaces which could go away cleanly (work existing in an experimental
branch).

Reviewed by:	kib
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D15924
2018-07-05 16:16:28 +00:00
brooks
b11bbbe3ee Revert r335983.
The bfd linker in tree doesn't support multiple names for the same
symbol (at least with current flags).
2018-07-05 16:03:03 +00:00
brooks
3585a5abe3 Get rid of netbsd_lchown and netbsd_msync syscall entries.
No valid FreeBSD binary ever called them (they would call lchown and
msync directly) and we haven't supported NetBSD binaries in ages.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15814
2018-07-05 14:12:56 +00:00
kib
bf21f0b257 Silence warnings about unused variables when RACCT is defined but RCTL
is not.

Reported by:	Dries Michiels <driesm.michiels@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-07-05 13:37:31 +00:00
brooks
6615ed4c61 Make struct xinpcb and friends word-size independent.
Replace size_t members with ksize_t (uint64_t) and pointer members
(never used as pointers in userspace, but instead as unique
idenitifiers) with kvaddr_t (uint64_t). This makes the structs
identical between 32-bit and 64-bit ABIs.

On 64-bit bit systems, the ABI is maintained. On 32-bit systems,
this is an ABI breaking change. The ABI of most of these structs
was previously broken in r315662.  This also imposes a small API
change on userspace consumers who must handle kernel pointers
becoming virtual addresses.

PR:		228301 (exp-run by antoine)
Reviewed by:	jtl, kib, rwatson (various versions)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15386
2018-07-05 13:13:48 +00:00
mmacy
7015e5246c epoch(9): make nesting assert in epoch_wait_preempt more specific
Reported by:	markj
2018-07-04 21:34:08 +00:00
oshogbo
661ae84b0f Add description to debug.ncores sysctl.
Reviewed by:	bcr
Differential Revision:	https://reviews.freebsd.org/D16123
2018-07-04 17:06:51 +00:00
kib
304dcfc1f8 Add a way for the process to request cleanup of the kernel cache of
the process arguments.  New arguments length zero causes the drop of
the pargs instead of allocation of useless zero-length buffer.

Submitted by:	Thomas Munro
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16111
2018-07-04 13:22:48 +00:00
avg
da4bc8f61e remove unneeded inclusion of sys/interrupt.h from several files
It's likely that the header was needed in the past for swi(9).
But now that code does not use swi(9) or any other interfaces defined
in sys/interrupt.h.

MFC after:	1 week
2018-07-04 09:07:18 +00:00
mmacy
14de8a2820 epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
  there's no longer any benefit to dropping the pcbinfo lock
  and trying to do so just adds an error prone branchfest to
  these functions
- Remove cases of same function recursion on the epoch as
  recursing is no longer free.
- Remove the the TAILQ_ENTRY and epoch_section from struct
  thread as the tracker field is now stack or heap allocated
  as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
2018-07-04 02:47:16 +00:00
mmacy
71bf69db2d expose thread_lite definition to tied modules 2018-07-03 02:50:07 +00:00
mmacy
9a526826f9 make critical_{enter, exit} inline
Avoid pulling in all of the <sys/proc.h> dependencies by
automatically generating a stripped down thread_lite exporting
only the fields of interest. The field declarations are type checked
against the original and the offsets of the generated result is
automatically checked.

kib has expressed disagreement and would have preferred to simply
use genassym style offsets (which loses type check enforcement).
jhb has expressed dislike of it due to header pollution and a
duplicate structure. He would have preferred to just have defined
thread in _thread.h. Nonetheless, he admits that this is the only
viable solution at the moment.

The impetus for this came from mjg's D15331:
"Inline critical_enter/exit for amd64"

Reviewed by: jeff
Differential Revision: https://reviews.freebsd.org/D16078
2018-07-03 01:55:09 +00:00
oshogbo
6627c6e55a core(5): overwrite the oldest core dump
The '%I' format in the kern.corefile sysctl limits the number of
core files that a process can generate to the number stored in the
debug.ncores sysctl. The '%I' format is replaced by the single digit
index. Previously, if all indexes were taken the kernel would overwrite
only a core file with the highest index in a filename.
Currently the system will create a new core file if there is a free
index or if all slots are taken it will overwrite the oldest one.

Reviewed by:	kib(code), bcr (updating)
Differential Revision:	https://reviews.freebsd.org/D15991
Differential Revision:	https://reviews.freebsd.org/D16084
2018-07-01 17:28:46 +00:00
glebius
9154bee2bc Correct r335242. Use unsigned cast instead of abs(). Using abs() gives
incorrect result when ticks has already wrapped, and are about to reach
the cr_ticks value (cr_ticks - ticks < hz).

Submitted by:	bde
2018-06-27 22:00:50 +00:00
imp
6a3794ff6f Remove devctl_safe_quote since it's now unused.
Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D16026
2018-06-27 04:11:19 +00:00
imp
dc94188c95 Fix devctl generation for core files.
We have a problem with vn_fullpath_global when the file exists. Work
around it by printing the full path if the core file name starts with /,
or current working directory followed by the filename if not.

Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D16026
2018-06-27 04:11:09 +00:00
imp
494f812e46 Create new devctl_safe_quote_sb to copy a source string into a struct
sbuf to make it safe. Callers are expected to add the " " around it,
if needed.

Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D16026
2018-06-27 04:10:48 +00:00
mmacy
a3ff028a04 fix assert and conditionally allow mutexes to be held across epoch_wait_preempt 2018-06-24 18:57:06 +00:00
mmacy
6fdf166b3c epoch(9): Don't trigger taskq enqueue before the grouptaskqs are setup
If EARLY_AP_STARTUP is not defined it is possible for an epoch to be
allocated prior to it being possible to call epoch_call without
issue.

Based on patch by andrew@

PR:		229014
Reported by:	andrew
2018-06-23 07:14:08 +00:00
cperciva
cd7d354090 Improve the accuracy of the POSIX "process CPU-time" clocks by adding the
used portion of the current thread's time slice if the current thread
belongs to the process being queried (i.e., if clock_gettime is invoked
with a clock ID of CLOCK_PROCESS_CPUTIME_ID or the value provided by
passing getpid(2) to clock_getcpuclockid(3)).

The CLOCK_VIRTUAL and CLOCK_PROF timers already make this adjustment via
long-standing code in calcru(), but since those timers are not specified
by POSIX it seems useful to add it here so that the higher accuracy is
available to code which aims to be portable.

PR:		228669
Reported by:	Graham Percival
Reviewed by:	kib
MFC after:	1 week
2018-06-22 10:23:32 +00:00
mmacy
2c67c970b2 epoch(9): make non-preemptible variant work early boot 2018-06-22 00:47:18 +00:00
kevans
b549736777 subr_hints: Fix acpi unit hinting (at the very least)
The refactoring in r335479 overlooked the fact that the dynamic kenv can
also be switched to if hintmode == 0. This is problematic because the
checkmethod bits are only ever ran once, but it worked previously because
the use_kenv was a global state and the first lookup would enable it if
occurring after the dynamic environment has been setup.

Extending our local definition of use_kenv to include all non-STATIC
hintmodes as long as the dynamic_kenv is setup fixes this. We still have
potential issues if the dynamic kenv comes up while we're doing an anchored
search through the environment, but this is not much of a concern right now
because:

1.) The dynamic environment comes up super early in boot, just after kmem

2.) This is going to get rewritten to provide a safer mechanism for the
anchored searches, ensuring that we continue using the same environment
chain (dynamic env or static fallback) for all anchored search invocations

Reported by:	mmamcy
X-MFC-With: r335479
2018-06-21 21:50:00 +00:00
kib
03a776d6c5 fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.
An RFSTOPPED thread can't clean TDB_STOPATFORK, which is done in the
fork_return() in its context, so parent is stuck forever.  Triggered
when trying to ptrace linux process.  Instead of waiting for the new
thread to clear TDB_STOPATFORK, tag it as traced and reparent to the
debugger in do_fork(), and let it only notify the debugger when run.

Submitted by:	Yanko Yankulov <yanko.yankulov@gmail.com>
Reviewed by:	jhb
MFC after:	1 week
X-MFC-Note:	keep p_dbgwait placeholder intact
Differential revision:	https://reviews.freebsd.org/D15857
2018-06-21 21:12:49 +00:00
kib
e4658b2431 Update proc->p_ptevents annotation to reflect the actual locking.
Submitted by:	Yanko Yankulov <yanko.yankulov@gmail.com>
Reviewed by:	jhb
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D15954
2018-06-21 21:07:25 +00:00
jhibbits
a26a1b2cb6 Introduce PMCR-based cpufreq(4) driver, for IBM POWER8 and POWER9 systems
Summary: POWER8 and POWER9 use a single CPU register, per core, to change clock
speed.  Everything else is handled by the on-chip controller.  This change
necessitates a change to the cpufreq global kernel driver to bump supported
levels, as the device tree for these systems can have theoretically 256
different options.  On my POWER9 Talos, the list consists of 100 items.  At
16.67MHz intervals, that allows for a change of roughly 1.67GHz between lowest
and highest.

This has only been tested on the POWER9.  However, since they're similar, this
should work on POWER8 as well.

Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15932
2018-06-21 14:26:43 +00:00
kevans
acdfa11d8f subr_hints: simplify a little bit
Some complexity exists in these bits that isn't needed. The sysctl handler,
upon change to '2', runs through the current set of hints and sets them in
the kenv.

However, this isn't at all necessary if we're pulling hints from the kenv,
static or dynamic, as the former will get added to the latter in
init_dynamic_kenv (see: kern_environment.c). We can reduce this
configuration to just adding static_hints to the kenv if we were previously
using them.

The changes in res_find are minimal and based on the observation that once
use_kenv gets set to '1' it will never be reset to '0', and it gets set to
'1' as soon as we hit fallback mode. Later work will refactor res_find a
little bit and eliminate this now-local, because it's become clear that
there's some funkiness revolving around use_kenv=1 and it being used to
imply that we're certainly looking at the dynamic_kenv.

Reviewed by:	ray
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D15940
2018-06-21 14:04:02 +00:00
hselasky
86f50c1a6b Permit the kernel environment to set an array of numeric values for a single
sysctl(9) node.

Reviewed by:		kib@, imp@, jhb@
Differential Revision:	https://reviews.freebsd.org/D15802
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-06-20 20:04:20 +00:00
kevans
dd71f380da Add debug.verbose_sysinit tunable for VERBOSE_SYSINIT
VERBOSE_SYSINIT is currently an all-or-nothing option. debug.verbose_sysinit
adds an option to have the code compiled in but quiet by default so that
getting this information from a device in the field doesn't necessarily
require distributing a recompiled kernel.

Its default is VERBOSE_SYSINIT's value as defined in the kernconf. As such,
the default behavior for simply omitting or including this option is
unchanged.

MFC after:	1 week
2018-06-20 19:23:56 +00:00
manu
b5d43b277c Add pmap_mapdev_attr for arm64
This is needed for efifb.
arm and ricv pmap (the two arch with arm64 that uses subr_devmap) have very
different implementation so for now only add this for arm64.

Tested with efifb on Pine64 with a few other patches.

Reviewed by:	cognet
Differential Revision:	https://reviews.freebsd.org/D15294
2018-06-20 16:07:35 +00:00
bz
299899c6fd Instead of using hand-rolled loops where not needed switch them
to FOREACH_PROC_IN_SYSTEM() to have a single pattern to look for.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D15916
2018-06-20 11:42:06 +00:00
bz
49043c2660 Sometimes it is helpful to get the path for a vnode.
Implement a ddb function walking the namecache to do this.

Reviewed by:		jhb, mjg
Inspired by:		gdb macro from jhb (old version)
Sponsored by:		iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D14898
2018-06-20 08:34:29 +00:00
mmacy
79793784f7 convert inpcbinfo hash and info rwlocks to epoch + mutex
- Convert inpcbinfo info & hash locks to epoch for read and mutex for write
- Garbage collect code that handled INP_INFO_TRY_RLOCK failures as
  INP_INFO_RLOCK which can no longer fail

When running 64 netperfs sending minimal sized packets on a 2x8x2 reduces
unhalted core cycles samples in rwlock rlock/runlock in udp_send from 51% to
3%.

Overall packet throughput rate limited by CPU affinity and NIC driver design
choices.

On the receiver unhalted core cycles samples in in_pcblookup_hash went from
13% to to 1.6%

Tested by LLNW and pho@

Reviewed by: jtl
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15686
2018-06-19 01:54:00 +00:00
ae
a58623ba71 Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9).
Using of rwlock with multiqueue NICs for IP forwarding on high pps
produces high lock contention and inefficient. Rmlock fits better for
such workloads.

Reviewed by:	melifaro, olivier
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15789
2018-06-16 08:26:23 +00:00
glebius
dfbc255945 Since 'ticks' is an int, it may wrap around and cr_ticks at a certain
counter_rate will be greater than ticks, resulting in counter_ratecheck()
failure. To fix this take an absolute value of the difference between
ticks and cr_ticks.

Reported by:	jtl
Sponsored by:	Netflix
2018-06-15 21:36:16 +00:00
bdrewery
16a3710079 proc0_post: Fix some locking issues
- Filter out PRS_NEW procs as rufetch() tries taking the thread lock
  which may not yet be initialized.
- Hold PROC_LOCK to ensure stability of iterating the threads.
- p_rux fields are protected by the process statlock as well.

MFC after:	2 weeks
Reviewed by:	kib
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D15809
2018-06-15 00:36:41 +00:00
cognet
79d2fc0f98 Use M_EXEC when calling malloc() to allocate the memory to store the module,
as it'll contain executable code.
2018-06-14 23:10:10 +00:00
brooks
8e419faaf8 Regen after 335177 (rename sys_obreak to sys_break). 2018-06-14 21:29:31 +00:00
brooks
ad6bae500f Name the implementation of brk and sbrk sys_break().
The break() system call was renamed (several times) starting in v3
AT&T UNIX when C was invented and break was a language keyword. The
last vestage of a need for it to be called something else (eg obreak)
was removed in r225617 which consistantly prefixed all syscall
implementations.

Reviewed by:	emaste, kib (older version)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15638
2018-06-14 21:27:25 +00:00
jtl
8222f5cb7c Make UMA and malloc(9) return non-executable memory in most cases.
Most kernel memory that is allocated after boot does not need to be
executable.  There are a few exceptions.  For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9).  The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.

(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory.  This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory.  This change makes the behavior consistent.)

This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified.  After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.

Allocations that do need executable memory have various choices.  They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interfact to obtain executable pages.

Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.

PR:		228927
Reviewed by:	alc, kib, markj, jhb (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D15691
2018-06-13 17:04:41 +00:00
imp
28723d54b4 Implement a 'car limit' for bioq.
Allow one to implement a 'car limit' for
bioq_disksort. debug.bioq_batchsize sets the size of car limit. Every
time we queue that many requests, we start over so that we limit the
latency for requests when the software queue depths are large. A value
of '0', the default, means to revert to the old behavior.

Sponsored by: Netflix
2018-06-13 16:48:07 +00:00
bde
8bff5fd956 Fix the encoding of major and minor numbers in 64-bit dev_t by restoring
the old encodings for the lower 16 and 32 bits and only using the
higher 32 bits for unusually large major and minor numbers.  This
change breaks compatibility with the previous encoding (which was only
used in -current).

Fix truncation to (essentially) 16-bit dev_t in newnfs v3.

Any encoding of device numbers gives an ABI, so it can't be changed
without translations for compatibility.  Extra bits give the much
larger complication that the translations need to compress into fewer
bits.  Fortunately, more than 32 bits are rarely needed, so
compression is rarely needed except for 16-bit linux dev_t where it
was always needed but never done.

The previous encoding moved the major number into the top 32 bits.
Almost no translation code handled this, so the major number was blindly
truncated away in most 32-bit encodings.  E.g., for ffs, mknod(8) with
major = 1 and minor = 2 gave dev_t = 0x10000002; ffs cannot represent
this and blindly truncated it to 2.  But if this mknod was run on any
released version of FreeBSD, it gives dev_t = 0x102.  ffs can represent
this, but in the previous encoding it was not decoded, giving major = 0,
minor = 0x102.

The presence of bugs was most obvious for exporting dev_t's from an
old system to -current, since bugs in newnfs augment them.  I fixed
oldnfs to support 32-bit dev_t in 1996 (r16634), but this regressed
to 16-bit dev_t in newnfs, first to the old 16-bit encoding and then
further in -current.  E.g., old ad0 with major = 234, minor = 0x10002
had the correct (major, minor) number on the wire, but newnfs truncated
this to (234, 2) and then the previous encoding shifted the major
number into oblivion as seen by ffs or old applications.

I first tried to fix this by translating on every ABI/API boundary, but
there are too many boundaries and too many sloppy translations by blind
truncation.  So use the old encoding for the low 32 bits so that sloppy
translations work no worse than before provided the high 32 bits are
not set.  Add some error checking for when bits are lost.  Keep not
doing any error checking for translations for almost everything in
compat/linux.

compat/freebsd32/freebsd32_misc.c:
Optionally check for losing bits after possibly-truncating assignments as
before.

compat/linux/linux_stats.c:
Depend on the representation being compatible with Linux's (or just with
itself for local use) and spell some of the translations as assignments in
a macro that hides the details.

fs/nfsclient/nfs_clcomsubs.c:
Essentially the same fix as in 1996, except there is now no possible
truncation in makedev() itself.  Also fix nearby style bugs.

kern/vfs_syscalls.c:
As for freebsd32.  Also update the sysctl description to include file
numbers, and change it to describe device ids as device numbers.

sys/types.h:
Use inline functions (wrapped by macros) since the expressions are now
a bit too complicated for plain macros.  Describe the encoding and
some of the reasons for it.  16-bit compatibility didn't leave many
reasonable choices for the 32-bit encoding, and 32-bit compatibility
doesn't leave many reasonable choices for the 64-bit encoding.  My
choice is to put the 8 new minor bits in the low 8 bits of the top 32
bits.  This minimizes discontiguities.

Reviewed by:	kib (except for rewrite of the comment in linux_stats.c)
2018-06-13 12:22:00 +00:00
bde
216f1ebfa0 Fix some bugs found while fixing the representation and translation
of 64-bit dev_t's (but not ones involving dev_t's).

st_size was supposed to be clamped in cvtstat() and linux's copy_stat(),
but the clamping code wasn't aware that st_size is signed, and also had
an obfuscated off-by-1 value for the unsigned limit, so its effect was
to produce a bizarre negative size instead of clamping.

Change freebsd32's copy_ostat() to be no worse than cvtstat().  It was
missing clamping and bzero()ing of padding.

Reviewed by:	kib (except a final fix of the clamp to the signed maximum)
2018-06-13 08:50:43 +00:00
emaste
e452e96446 makesyscalls: simplify capenabled pipeline
Replace cat + 2x grep with one grep.

Sponsored by:	Turing Robotic Industries
2018-06-11 18:57:40 +00:00
mmacy
b55c8854cf limit change to fixing controlp handling pending review 2018-06-11 17:10:19 +00:00