imgact_binmisc matches magic/mask from imgp->image_header, which is only a
single page in size mapped from the first page of an image. One can specify
an interpreter that matches on, e.g., --offset 4096 --size 256 to read up to
256 bytes past the mapped first page.
The limitation is that we cannot specify a magic string that exceeds a
single page, and we can't allow offset + size to exceed a single page
either. A static assert has been added in case someone finds it useful to
try and expand the size, but it does seem a little unlikely.
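The limit described above can be sketched as a small userland model. This is not the kernel code: `HDR_PAGE_SIZE`, `MAX_MAGIC_SIZE` and `magic_range_ok()` are hypothetical names chosen for illustration, and the 4096/256 values are assumptions taken from the example in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants: size of the mapped header page and magic cap. */
#define	HDR_PAGE_SIZE	4096
#define	MAX_MAGIC_SIZE	256

/* Compile-time guard: the magic can never be larger than the mapped page. */
static_assert(MAX_MAGIC_SIZE <= HDR_PAGE_SIZE,
    "magic size may not exceed the mapped header page");

/*
 * Return true when [offset, offset + size) lies entirely within the
 * single mapped page.  Written so that offset + size cannot overflow.
 */
static bool
magic_range_ok(size_t offset, size_t size)
{
	if (size == 0 || size > MAX_MAGIC_SIZE)
		return (false);
	if (offset >= HDR_PAGE_SIZE)
		return (false);
	return (size <= HDR_PAGE_SIZE - offset);
}
```

With these assumed values, an entry using --offset 4096 --size 256 is rejected, while --offset 3840 --size 256 still fits inside the page.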
While this looks kind of exploitable at a sideways squinty-glance, there are
a couple of mitigating factors:
1.) imgact_binmisc is not enabled by default,
2.) entries may only be added by the superuser,
3.) trying to exploit this information to read what's mapped past the end
would be worse than a root canal or some other relatably painful
experience, and
4.) there's no way one could pull this off without it being completely
obvious.
The first page is mapped out of an sf_buf, the implementation of which (or
lack thereof) depends on your platform.
MFC after: 1 week
access the socket send or receive buffer. This is not possible for
listening sockets since r319722.
Because send()/recv() calls fail on listening sockets, make the ioctl()
fail as well, returning EINVAL.
PR: 250366
Reported by: Yong-Hao Zou
Reviewed by: glebius, rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D26897
The offset we need to account for in the interpreter string comes in two
variants:
1. Fixed - macros other than #a that will not vary from invocation to
invocation
2. Variable - #a, which is substituted with the argv0 that we're replacing
Note that we don't have a mechanism to modify an existing entry. By
recording both of these offset requirements when the interpreter is added,
we can avoid some unnecessary calculations in the exec path.
Most importantly, we can know up-front whether we need to
calculate/grab the filename for this interpreter. We also get to avoid
walking the string a first time looking for macros. For most invocations,
it's a swift exit as they won't have any, but there's no point entering a
loop and searching for the macro indicator if we already know there will not
be one.
While we're here, go ahead and only calculate the argv0 name length once per
invocation. While it's unlikely that we'll have more than one #a, there's no
reason to recalculate it every time we encounter an #a when it will not
change.
I have not bothered trying to benchmark this at all, because it's arguably a
minor and straightforward/obvious improvement.
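A rough sketch of the idea, as a standalone model rather than the committed code. The struct and function names are hypothetical, and the macro set is assumed to be just "##" (a literal '#') and "#a" (the old argv0).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-entry record, computed once when the entry is added. */
struct interp_offsets {
	long	fixed;		/* length delta from macros that never vary */
	size_t	argv0_cnt;	/* number of #a occurrences */
};

/*
 * Walk the interpreter string once, at add time.  Returns 0 on success,
 * -1 on an unknown or dangling macro.
 */
static int
interp_scan(const char *interp, struct interp_offsets *out)
{
	out->fixed = 0;
	out->argv0_cnt = 0;
	for (const char *p = interp; *p != '\0'; p++) {
		if (*p != '#')
			continue;
		p++;
		switch (*p) {
		case '#':
			out->fixed -= 1;	/* "##" collapses to "#" */
			break;
		case 'a':
			out->fixed -= 2;	/* "#a" itself disappears */
			out->argv0_cnt++;
			break;
		default:
			return (-1);
		}
	}
	return (0);
}

/* Exec-time length: no string walk, and strlen(argv0) is taken once. */
static size_t
interp_expanded_len(const char *interp, const struct interp_offsets *off,
    const char *argv0)
{
	long len = (long)strlen(interp) + off->fixed;

	if (off->argv0_cnt != 0)
		len += (long)(off->argv0_cnt * strlen(argv0));
	return ((size_t)len);
}
```

An entry with no macros ends up with fixed == 0 and argv0_cnt == 0, so the exec path can skip both the filename work and the macro search.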
MFC after: 1 week
This adds a dedicated counter updated with atomics when INVARIANTS
is used. As a side effect one can reliably determine the lock is held
for reading by at least one thread, but it's still not possible to
find out whether curthread has the lock in said mode.
This should be good enough in practice.
Problem spotted by avg.
This doesn't change anything at the moment since the out-of-order elements
were a pair of uint32_t, but future additions may have caused unnecessary
padding by following the existing precedent.
MFC after: 1 week
When using the ALT+CTRL+ESC sequence to break into kdb, the keyboard is
completely borked when you return. watch(8) shows that it's working, but
it's inserting escape sequences.
Further investigation revealed that VT_ALT_TO_ESC_HACK is the default and
directly conflicts with this sequence, so upon return from the debugger
ALKED is set.
If they triggered the break to debugger, it's safe to assume they didn't
mean to use VT_ALT_TO_ESC_HACK, so just unset it to reduce the surprise when
the keyboard seems non-functional upon return.
Reviewed by: tsoome
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27109
If we hadn't been traced in the first place when syscallenter()
started executing, we can ignore TDB_USERWR. TDB_USERWR can get set,
sure, but if it does, it's because the debugger raced with the syscall,
and it cannot depend on winning that race.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D26585
This module handles relatively few execs (initial qemu-user-static, then
qemu-user-static handles exec'ing itself for binaries it's already running),
but all execs pay the price of at least taking the relatively expensive
sx/slock to check for a match when this module is loaded. Future work will
almost certainly swap this out for another lock, perhaps an rmslock.
The RLOCK/WLOCK phrasing was chosen based on what the callers are really
wanting, rather than using the verbiage typically appropriate for an sx.
MFC after: 1 week
We may want to reserve bits in the future for kernel-only use, so start
rejecting any that aren't the two that we're currently expecting from
userland.
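The check amounts to masking off the accepted bits and refusing anything left over. A minimal sketch, with placeholder flag names and values (`EX_FLAG_*` are not the real definitions):

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder names for the two flags accepted from userland. */
#define	EX_FLAG_ENABLED		0x0001u
#define	EX_FLAG_USE_MASK	0x0002u
#define	EX_USER_FLAGS		(EX_FLAG_ENABLED | EX_FLAG_USE_MASK)

/* Reject any bit outside the userland-visible set. */
static int
check_user_flags(uint32_t flags)
{
	if ((flags & ~EX_USER_FLAGS) != 0)
		return (EINVAL);
	return (0);
}
```

Any bit later reserved for kernel-only use would then be impossible to set from userland without touching this mask.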
MFC after: 1 week
Previously, non-preemptible epochs could not check; in_epoch() would always
fail, usually because non-preemptible epochs don't imply THREAD_NO_SLEEPING.
For default epochs, it's easy enough to verify that we're in the given
epoch: if we're in a critical section and our record for the given epoch
is active, then we're in it.
This patch also adds some additional INVARIANTS bookkeeping. Notably, we set
and check the recorded thread in epoch_enter/epoch_exit to try and catch
some edge-cases for the caller. It also checks upon freeing that none of the
records had a thread in the epoch, which may make it a little easier to
diagnose some improper use if epoch_free() took place while some other
thread was inside.
This version differs slightly from what was just previously reviewed by the
below-listed, in that in_epoch() will assert that no CPU has this thread
recorded even if it *is* currently in a critical section. This is intended
to catch cases where the caller might have somehow messed up critical
section nesting: we can catch both the case where they exited the critical
section and the case where they exited, migrated, then re-entered (on the
wrong CPU).
Reviewed by: kib, markj (both previous version)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27098
Notably, streamline error paths through the existing 'done' label, making it
easier to quickly verify correct cleanup.
Future work might add a kernel-only flag to indicate that an interpreter uses
#a. Currently, all executions via imgact_binmisc pay the penalty of
constructing sname/fname, even if they will not use it. qemu-user-static
doesn't need it, the stock rc script for qemu-user-static certainly doesn't
use it, and I suspect these are the vast majority of (if not the only)
current users.
MFC after: 1 week
Improve the output of a debug message that has been hit frequently of late,
in order to gather further data.
PR: 237666
Reviewed by: hselasky
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D27108
Move dtrace SDT definitions into linux_common module code. Also, build
linux_dummy.c into the linux_common kld -- we don't need separate
versions of these stubs for 32- and 64-bit emulation.
Reported by: several
PR: 250897
Discussed with: emaste, trasz
Tested by: John Kennedy, Yasuhiro KIMURA, Oleg Sidorkin
X-MFC-With: r367395
Differential Revision: https://reviews.freebsd.org/D27124
According to code comments the original motivation was to allow for
malloc_type_internal changes without ABI breakage. This can be trivially
accomplished by providing spare fields and versioning the struct, as
implemented in the patch below.
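As an illustration of the general technique only, the layout below embeds the internal part, carries a version field, and reserves spare slots. All `ex_` names and the field selection are hypothetical and do not reproduce the actual struct definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	EX_TYPE_VERSION	1

/* Internal part: spare slots leave room for growth at a fixed size. */
struct ex_type_internal {
	uint32_t	 mti_probes[2];
	void		*mti_stats;
	void		*mti_spare[3];	/* reserved for future fields */
};

/* Public part: the internal struct is embedded, not pointed to. */
struct ex_type {
	struct ex_type		*ks_next;
	uint32_t		 ks_version;
	const char		*ks_shortdesc;
	struct ex_type_internal	 ks_mti;	/* no extra pointer chase */
};

/* A consumer built against another layout is refused via the version. */
static bool
ex_type_compatible(const struct ex_type *t)
{
	return (t->ks_version == EX_TYPE_VERSION);
}
```

A later change can consume one of the spare slots without changing the struct size, and bump the version only when the layout really changes.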
The upshots are one less memory indirection on each alloc and disappearance
of mt_zone.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27104
vt_generate_cons_palette() takes the maximum values of the RGB colour
components, not masks. We also need to set info->fb_cmsize, or vt_fb_init()
will re-initialize info->fb_cmap.
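The difference between a mask and a maximum value can be shown with two hypothetical helpers. These are illustrative only and are not part of vt(4); the scaling formula is an assumption made for this sketch.

```c
#include <stdint.h>

/*
 * Derive a channel's maximum value from its positioned mask,
 * e.g. mask 0x00ff0000 at offset 16 has max 0xff.
 */
static uint32_t
channel_max(uint32_t mask, unsigned int offset)
{
	return (mask >> offset);
}

/* Scale an 8-bit component to the channel's range, then position it. */
static uint32_t
pack_channel(uint8_t v8, uint32_t max, unsigned int offset)
{
	return (((uint32_t)v8 * max / 255) << offset);
}
```

Passing the positioned mask where the maximum is expected would scale against a far larger number and shift the result out of the channel's bits.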
This brings its 'struct syscall_args' in sync with other architectures.
Reviewed by: bdragon, jhibbits
MFC after: 2 weeks
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D26605
While here, use MAXARGS. This brings its 'struct syscall_args' in sync
with most other architectures.
Reviewed by: arichardson, brooks
MFC after: 2 weeks
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D26619
This fixes a potential crash in firmware 1.25.0.0 on the passive open
side during TOE operation.
Obtained from: Chelsio Communications
MFC after: 1 week
Sponsored by: Chelsio Communications
Fix build errors introduced by r367417 and r367390:
- Guard label reached only by powerpc64
- Guard vm_reserv_level_iffullpop call, that is not defined on powerpc
variants that don't support superpages
- Add missing hwpmc file, for when hwpmc is built into kernel
- Rename cse*() to cse_*() to more closely match other local APIs in
this file.
- Merge the old csecreate() into cryptodev_create_session() and rename
the new function to cse_create().
Reviewed by: markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27070
My script to convert git commits to svn patch does not handle binary
files correctly, and r367387 committed a set of empty files as a result.
MFC with: r367387
Sponsored by: Rubicon Communications, LLC (Netgate)
This change adds support for transparent superpages for PowerPC64
systems using Hashed Page Tables (HPT). All pmap operations are
supported.
The changes were inspired by the RISC-V implementation of superpages,
by @markj (r344106), but heavily adapted to fit the PPC64 HPT architecture
and the existing MMU OEA64 code.
Until these changes are better tested, superpage support is disabled by
default. To enable it, use vm.pmap.superpages_enabled=1.
In this initial implementation, when superpages are disabled, system
performance stays at the same level as without these changes. When
superpages are enabled, buildworld time increases a bit (~2%). However,
for workloads that put a heavy pressure on the TLB the performance boost
is much bigger (see HPC Challenge and pgbench on D25237).
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D25237
Add support for Floating-Point Exception traps on 32 and 64 bit platforms.
Also make sure to clean FPSCR on EXEC and thread exit.
Author of initial version: Renato Riolino <renato.riolino@eldorad.org.br>
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23623
This simplifies cryptof_ioctl as it is now a wrapper around functions that
contain the bulk of the per-ioctl logic.
Reviewed by: markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27068
This is consistent with cryptodevkey_cb being defined before it is used
and removes a prototype in the middle of the file.
Reviewed by: markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27067
OCF drivers in general should perform as many session parameter checks
as possible during probesession rather than when creating a new
session. I got this wrong for aesni(4) in r359374. In addition,
aesni(4) was performing the check for digest-only requests and failing
to create digest-only sessions as a result.
Reported by: jkim
Tested by: jkim
Sponsored by: Chelsio Communications
This breaks the case where the original pointer was NULL but an
in-line IV was used.
Reviewed by: markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27064
Both the size (128 bytes) and ephemeral nature of allocations make it a great
fit for malloc.
A dedicated zone unnecessarily avoids sharing buckets with 128-byte objects.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D27103
This ensures that no writes are pending in memory, either metadata or
user data, but not including dirty pages not yet converted to fs writes.
Only filesystems declared local are suspended.
Note that this does not guarantee the absence of metadata errors or
leaks if resume is not done: for instance, on UFS, unlinked but still open
inodes are leaked and require fsck to gc.
Reviewed by: markj
Discussed with: imp
Tested by: imp (previous version), pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D27054
in sync with (most) other architectures. No functional changes.
Reviewed by: manu
Tested by: mmel
MFC after: 2 weeks
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D26604
This change adds support for POWER8 and POWER9 PMCs (bare metal and
pseries).
All PowerISA 2.07B non-random events are supported.
Implementation was based on that of PPC970.
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26110
Sample usage: kernel modules can decide whether to stick to malloc or
create their own zone.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27097
This provides an OpenCrypto driver for Intel QuickAssist devices. The
driver was initially ported from NetBSD and comes with a few
improvements:
- support for GMAC/AES-GCM, AES-CTR and AES-XTS, and support for
SHA/HMAC-authenticated encryption
- support for detaching the driver
- various bug fixes
- DH895X support
Discussed with: jhb
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision: https://reviews.freebsd.org/D26963
The 2 provided zones had inconsistent naming between each other
("int" and "64") and other allocator zones (which use bytes).
Follow malloc by naming them "pcpu-" + size in bytes.
This is a step towards replacing ad-hoc per-cpu zones with
general slabs.
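The naming rule is simple enough to sketch in a few lines. `pcpu_zone_name()` is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build a zone name as "pcpu-" followed by the item size in bytes.
 * Returns 0 on success, -1 if the name did not fit in the buffer.
 */
static int
pcpu_zone_name(char *buf, size_t buflen, size_t size)
{
	int n;

	n = snprintf(buf, buflen, "pcpu-%zu", size);
	return ((n >= 0 && (size_t)n < buflen) ? 0 : -1);
}
```

Assuming the two zones hold 4-byte and 8-byte items, they would be named "pcpu-4" and "pcpu-8".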
And add a _74XX suffix to 74XX SPRs.
This is in preparation for adding support for POWER8/9 PMCs, most of whose
SPRs match the 970 ones.
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26532
On a sample box vmstat -z shows:
ITEM SIZE LIMIT USED FREE REQ
64: 64, 0, 1043784, 4367538,3698187229
selfd: 64, 0, 1520, 13726,182729008
But at the same time:
vm.uma.selfd.keg.domain.1.pages: 121
vm.uma.selfd.keg.domain.0.pages: 121
Thus 242 pages got pulled even though the malloc zone would likely accommodate
the load without using extra memory.
Memory allocated by bus_dmamem_alloc will take into account any alignment
requirements of the CPU it's running on. Stop trying to bounce in this case
as there is no bounce zone allocated.
Reported by: manu, tuexen
Tested by: manu
Sponsored by: Innovate UK
Add a pseudofs node flag 'PFS_AUTODRAIN', which automatically emits sbuf
contents to the caller when the sbuf buffer fills. This is only
permissible if the corresponding PFS node fill function can sleep
whenever it appends to the sbuf.
linprocfs' /proc/self/maps node happens to meet this requirement.
Streaming out the file as it is composed avoids truncating the output
and also avoids preallocating a very large buffer.
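The streaming behaviour can be modelled in userland with a small fixed buffer that hands its contents to a callback whenever it fills. This is a standalone sketch, not the sbuf(9) or pseudofs code; every name in it is hypothetical, and the 16-byte buffer is deliberately tiny so that drains happen early.

```c
#include <stddef.h>
#include <string.h>

typedef int (*drain_fn)(void *arg, const char *data, size_t len);

struct drainbuf {
	char		 buf[16];	/* small on purpose, to force drains */
	size_t		 len;
	drain_fn	 drain;
	void		*arg;
};

/* Example consumer: collects drained bytes and counts the drain calls. */
struct sink {
	char	out[128];
	size_t	len;
	size_t	calls;
};

static int
sink_drain(void *arg, const char *data, size_t len)
{
	struct sink *s = arg;

	if (len > sizeof(s->out) - s->len)
		return (-1);
	memcpy(s->out + s->len, data, len);
	s->len += len;
	s->calls++;
	return (0);
}

static void
drainbuf_init(struct drainbuf *db, drain_fn fn, void *arg)
{
	db->len = 0;
	db->drain = fn;
	db->arg = arg;
}

/* Hand any buffered bytes to the consumer. */
static int
drainbuf_flush(struct drainbuf *db)
{
	int error = 0;

	if (db->len != 0) {
		error = db->drain(db->arg, db->buf, db->len);
		db->len = 0;
	}
	return (error);
}

/* Append a string, draining whenever the buffer is full. */
static int
drainbuf_cat(struct drainbuf *db, const char *s)
{
	size_t left = strlen(s);

	while (left != 0) {
		size_t room = sizeof(db->buf) - db->len;

		if (room == 0) {
			int error = drainbuf_flush(db);

			if (error != 0)
				return (error);
			continue;
		}
		size_t n = left < room ? left : room;
		memcpy(db->buf + db->len, s, n);
		db->len += n;
		s += n;
		left -= n;
	}
	return (0);
}
```

The output is never truncated however long it gets, and the buffer stays small. The cost is that the drain callback may run in the middle of any append, which is why the fill function has to tolerate sleeping at those points.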
Reviewed by: markj; earlier version: emaste, kib, trasz
Differential Revision: https://reviews.freebsd.org/D27047
- Removed a bunch of redundant headers
- Don't explicitly initialize to 0
- The !error check prior to setting imgp->interpreter_name is redundant, all
error paths should and do return or go to 'done'. We have larger problems
otherwise.