- Use ia32_freebsd4_* instead of ia32_*4.
- Use ia32_o* instead of ia32_*3.
Reviewed by: brooks, imp, kib
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33882
Missed issues in truss on at least armv7 and powerpcspe need to be
resolved before recommit.
This reverts commit 3889fb8af0.
This reverts commit 1544e0f5d1.
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessiary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).
Obtained from: CheriBSD
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D33780
Suspend/Resume of Win10 leads that CPU0 is busy on handling interrupts.
Win10 does not use LAPIC timer to often and in most cases, and I see it
is disabled by writing 0 to Initial Count Register (for Timer).
During resume, restart timer only for enabled LAPIC and enabled timer
for that LAPIC.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33448
This removes one or two atomic update of the pte on access. Also it
makes the pte constant during its lifecycle.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33778
Pre-calculate masks and page table/page directory bases, for LA48 and
LA57 cases, trading a conditional for the memory accesses.
This shaves 672 bytes of .text, by the cost of 32 .data bytes. Note
that eliminated code contained one conditional, and several costly
movabs instructions, which are traded by two memory fetches per
vtop{t,d}e() inlines.
Reviewed by: markj
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33776
Simplify control flow around handling of the execpath length and signal
trampoline. Cache the sysentvec pointer in a local variable.
No functional change intended.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33703
The introduction of <sched.h> improved compatibility with some 3rd
party software, but caused the configure scripts of some ports to
assume that they were run in a GLIBC compatible environment.
Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being
added to ports, but there still were compatibility issues due to
invalid assumptions made in autoconfigure scripts.
The differences between the FreeBSD version of macros like CPU_AND,
CPU_OR, etc. and the GLIBC versions was in the number of arguments:
FreeBSD used a 2-address scheme (one source argument is also used as
the destination of the operation), while GLIBC uses a 3-adderess
scheme (2 source operands and a separately passed destination).
The GLIBC scheme provides a super-set of the functionality of the
FreeBSD macros, since it does not prevent passing the same variable
as source and destination arguments. In code that wanted to preserve
both source arguments, the FreeBSD macros required a temporary copy of
one of the source arguments.
This patch set allows to unconditionally provide functions and macros
expected by 3rd party software written for GLIBC based systems, but
breaks builds of externally maintained sources that use any of the
following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.
One contributed driver (contrib/ofed/libmlx5) has been patched to
support both the old and the new CPU_OR signatures. If this commit
is merged to -STABLE, the version test will have to be extended to
cover more ranges.
Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT do
no longer require that option.
The FreeBSD version has been bumped to 1400046 to reflect this
incompatible change.
Reviewed by: kib
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D33451
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.
Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.
One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.
Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447
The current implementations never correctly return TRUE. In all cases,
when they currently return TRUE, they should have returned FALSE. And,
in some cases, when they currently return FALSE, they should have
returned TRUE. Except for its effects on performance, specifically,
additional page faults and pointless calls to pmap_enter_quick() that
abort, this error is harmless. That is why it has gone unnoticed.
Add a comment to the amd64, arm64, and riscv implementations
describing how their return values are computed.
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33659
Notably, the current compat_options only makes sense for native and
freebsd32 ABIs. For the others, it just adds cruft. Switch to having
sets of compat options, and default to the native set. Setup the other
ABIs where it doesn't make sense to opt-out of the native set.
This removes some redundant COMPAT_FREEBSD* stuff from Linuxolator bits.
line_expr in makesyscalls.lua is fixed to allow empty strings to be
specified, since they're harmless.
Reviewed by: brooks, kib (both earlier version)
Differential Revision: https://reviews.freebsd.org/D33356
Suggested by: Michael Pratt <mpratt@google.com>
Reviewed by: Michael Pratt <mpratt@google.com>, emaste
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D33423
The header exports the following:
- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)
Note that TLS_TP_OFFSET does not account for if the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).
Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr. libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).
A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.
Reviewed by: kib (older version of x86/tls.h), jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33351
After a round of cleanups in late 2020, all definitions are
functionally identical.
This removes a rotted __aligned(8) on arm. It was added in
b7112ead32 and was intended to align the
args member so that 64-bit types (off_t, etc) could be safely read on
armeb compiled with clang. With the removal of armev, this is no
longer needed (armv7 requires that 32-bit aligned reads of 64-bit
values be supported and we enable such support on armv6). As further
evidence this is unnecessary, cleanups to struct syscall_args have
resulted in args being 32-bit aligned on 32-bit systems. The sole
effect is to bloat the struct by 4 bytes.
Reviewed by: kib, jhb, imp
Differential Revision: https://reviews.freebsd.org/D33308
The headers were mostly identical on amd64 and i386.
No functional change intended.
Reviewed by: cperciva, mav, imp, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33205
When a superpage mapping is destroyed and the original page table page
containing 4KB mappings that was being held in reserve is deallocated,
the recently introduced user page table page count was not being
decremented. Consequentially, the count was wrong and would grow over
time. For example, after multiple iterations of "buildworld", I was
seeing implausible counts, like the following:
vm.pmap.kernel_pt_page_count: 2184
vm.pmap.user_pt_page_count: 2280849
vm.pmap.pv_page_count: 106
With this change, I now see:
vm.pmap.kernel_pt_page_count: 2183
vm.pmap.user_pt_page_count: 344
vm.pmap.pv_page_count: 105
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D33276
It stopped being used in 3c256f5395, when trap() was reorganized to
have separate switch statements for user and kernel traps. Remove the
two leftover references and the flag itself.
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D33253
The vga and splash devices are part of the sc(4) system console. vt(4)
uses the vt_vga driver instead, and has some limited splash support
directly in vt_core.c. Leave the sc(4) options in GENERIC/MINIMAL (for
now) but group them together under an sc(4) comment.
Sponsored by: The FreeBSD Foundation
Followup to b8cf1c5c30, remove from MINIMAL in addition to GENERIC.
options VESA / vesa.ko provides VESA Bios Extensions (VBE) support for
the legacy sc(4) console. It is not used by the default console, vt(4).
PR: 253733
Fixes: b8cf1c5c30 ("Remove options VESA from x86 GENERIC")
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
options VESA / vesa.ko provides VESA Bios Extensions (VBE) support for
the legacy sc(4) console. It is not used by the default console, vt(4).
There is a report[1] of an incompatibility between VESA and the Nvidia
driver breaking suspend/resume. Since VESA is not used by the default
configuration anyway, just remove options VESA from GENERIC. The kernel
module is still available and may be loaded by sc(4) users who want to
select a VBE mode.
(Note that vt(4) does not support selecting a VBE mode. The loader can
set a VBE mode and vt(4) will use it via the vt_vbefb driver.)
[1] https://lists.freebsd.org/archives/freebsd-hackers/2021-November/000469.html
PR: 253733
Reported by: Stefan Blachmann [1]
Reviewed by: imp, manu, tsoome
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33141
Belatedly remove twa(4). It was supposed to go before 13.0, but was
overlooked.
Sponsored by: Netflix
Relnotes: yes
Reviewed by: scottl
Differential Revision: https://reviews.freebsd.org/D33114
Belatedly remove esp(4). It was tagged as gone in 13, but was overlooked
until now.
Sponsored by: Netflix
Reviewed by: scottl
Differential Revision: https://reviews.freebsd.org/D33115
Belatedly remove amr(4). It was slated to depart before 13.0 but was
overlooked until now.
Sponsored by: Netflix
Relnotes: yes
Reviewed by: scottl
Differential Revision: https://reviews.freebsd.org/D33113
Belatedly remove iir(4). It was slated to go before 13, but was
overlooked.
Sponsored by: Netflix
Relnotes: yes
Reviewed by: scottl
Differential Revision: https://reviews.freebsd.org/D33112
We'd said this was going away in 13, but was overlooked. Belatedly
remove.
Sponsored by: Netflix
Relnotes: yes
Reviewed by: scottl
Differential Revision: https://reviews.freebsd.org/D33111
in_cksum() and related routines are implemented separately for each
platform, but only i386 and arm have optimized versions. Other
platforms' copies of in_cksum.c are identical except for style
differences and support for big-endian CPUs.
Deduplicate the implementations for the rest of the platforms. This
will make it easier to implement in_cksum() for unmapped mbufs. On arm
and i386, define HAVE_MD_IN_CKSUM to mean that the MI implementation is
not to be compiled.
No functional change intended.
Reviewed by: kp, glebius
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33095
It was never implemented on powerpc or riscv and appears to have been
unused since it was added in 1998. No functional change intended.
Reviewed by: kp, glebius, cy
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33093
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.
To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.
Reviewed by: kib, markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31992
Don't assume we are dumping the global message buffer, but use the one
provided by the state argument. While here, drop superfluous
cast to char *.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31991
During a live dump, we may race with updates to the kernel page tables.
This is generally okay; we accept that the state of the system while
dumping may be somewhat inconsistent with its state when the dump was
invoked. However, when walking the kernel page tables, it is important
that we load each PDE/PTE only once while operating on it. Otherwise, it
is possible to have the relevant PTE change underneath us. For example,
after checking the valid bit, but before reading the physical address.
Convert the loads to atomics, and add some validation around the
physical addresses, to ensure that we do not try to dump a non-existent
or non-canonical physical address.
Similarly, don't read kernel_vm_end more than once, on the off chance
that pmap_growkernel() is called between the two page table walks.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31990
It is useful for quickly checking an address against the DMAP region.
These definitions exist already on arm64 and riscv.
Reviewed by: kib, markj
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D32962
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Reviewed by: kib, markj, jhb
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31989
pipe requires no special handling.
ofreebsd32_sigpending did differ from osigpending in that it acted
on the siglist rather than the sigqueue, but this appears to be an
oversight in 3fbdb3c215.
ogetpagesize could theoretically have ABI-dependent results, but in
practice does not. If it does it would be easy handle in the central
implementation and be the least of the problems in changing the value of
PAGE_SIZE.
Reviewed by: kevans
We use pmap_invalidate_cpu_mask() to get the set of active CPUs. This
(32-byte) set is copied by value through multiple frames until we get to
smp_targeted_tlb_shootdown(), where it is copied yet again.
Avoid this copying by having smp_targeted_tlb_shootdown() make a local
copy of the active CPUs for the pmap, and drop the cpuset parameter,
simplifying callers. Also leverage the use of the non-destructive
CPU_FOREACH_ISSET to avoid unneeded copying within
smp_targeted_tlb_shootdown().
Reviewed by: alc, kib
Tested by: pho
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32792
This is in preference to simply filling the cpuset, and allows the
conditional in pmap_invalidate_cpu_mask() to be elided.
Also export pmap_invalidate_cpu_mask() outside of pmap.c for use in a
subsequent commit.
Suggested by: kib
Reviewed by: alc, kib
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32792
Define CC_NEWRENO in all the appropriate DEFAULTS and std.* config
files. It's the default congestion control algorithm. Add code to cc.c
so that CC_DEFAULT is "newreno" if it's not overriden in the config
file.
Sponsored by: Netflix
Fixes: b8d60729de ("tcp: Congestion control cleanup.")
Revired by: manu, hselasky, jhb, glebius, tuexen
Differential Revision: https://reviews.freebsd.org/D32964
The MINIMAL configs were overlooked. They are compiled as part of
universe, so this broke universe builds. Add the same defafults as for
GENERIC.
Sponsored by: Netflix
NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!
This patch does a bit of cleanup on TCP congestion control modules. There were some rather
interesting surprises that one could get i.e. where you use a socket option to change
from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get
a memory failure and end up on cc_newreno. This is not what one would expect. The
new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK
and pass in to the init function preallocated memory. The CC init is expected in this
case *not* to fail but if it does and a module does break the
"no fail with memory given" contract we do fall back to the CC that was in place at the time.
This also fixes up a set of common newreno utilities that can be shared amongst other
CC modules instead of the other CC modules reaching into newreno and executing
what they think is a "common and understood" function. Lets put these functions in
cc.c and that way we have a common place that is easily findable by future developers or
bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE
and HYSTART++ without having to dance through hoops for other CC modules, instead
both newreno and the other modules just call into the common functions if they desire
that behavior or roll there own if that makes more sense.
Note: This commit changes the kernel configuration!! If you are not using GENERIC in
some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC,
CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined
as well if you desire. Note that if you create a kernel configuration that does not
define a congestion control module and includes INET or INET6 the kernel compile will
break. Also you need to define a default, generic adds 'options CC_DEFAULT=\"newreno\"
but you can specify any string that represents the name of the CC module (same names
that show up in the CC module list under net.inet.tcp.cc). If you fail to add the
options CC_DEFAULT in your kernel configuration the kernel build will also break.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
RELNOTES:YES
Differential Revision: https://reviews.freebsd.org/D32693
This moves linux_ptrace.c from sys/amd64/linux/ to sys/compat/linux/,
making it possible to use it on architectures other than amd64.
It also enables Linux ptrace(2) on arm64.
Relnotes: yes
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32868
When working on the ports these functions were slightly different, but
now there's no reason for them to be separate.
No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Make sys/amd64/linux/linux_ptrace.c machine-independent,
in preparation for moving it into sys/compat/linux/.
No functional changes.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32756
Note that this is largely untested at this point, as was
the previous version; I'm committing this mostly to get
rid of `struct linux_pt_reg`.
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32735
Previously it returned a shorter struct. I can't find any
modern software that uses it, but tests/ptrace from strace(1)
repo complained.
Differential Revision: https://reviews.freebsd.org/D32601
Translate ERESTART into Linux "internal" errno ERESTARTSYS.
This fixes the erestartsys.gen.test from strace(1).
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32623
to match the added accounting of the top-level page table pages.
Reviewed by: markj
Tested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32569
both for kernel and user page tables, the later exist in the PTI case.
Reviewed by: markj
Tested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32569
This fixes panic when trying to run strace(8) from Focal.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32355
The virtual LAPIC driver uses callouts to implement the LAPIC timer.
Callouts are armed using callout_reset_sbt(), which currently puts
everything on CPU 0. On systems running many bhyve VMs this results in
a large amount of contention for CPU 0's callout lock.
Modify vlapic to schedule callouts on the local CPU instead. This
allows timer interrupts to be scheduled more evenly among CPUs where
bhyve is running.
Reviewed by: grehan, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32559
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.
Similarly, convert vm_page_alloc_domain() callers.
Note that callers are now responsible for assigning the pindex.
Reviewed by: alc, hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31986
This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well. No functional change
intended.
This is a re-application of commit
9068f6ea69.
Reviewed by: cem, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32029
The root page is not zeroed at allocation time since with 4-level tables
each entry is copied from a template. However, with 5-level tables only
a single entry is filled, so the rest need to be cleared.
Reported by: alc
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32525
Remove the option from NOTES/LINT, and add to NOTES for powerpc and
riscv.
PR: 259036
Requested by: John Hay <john@sanren.ac.za>
Discussed with: ian, imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
We actually do not know is it safe or not to flush cache for random
BAR/register page existing in the system. It is well-known that for
instance LAPICs cannot tolerate cache flush. As report indicates,
there are more such devices.
This issue typically affects AMD machines which do not report self-snoop,
causing real CLFLUSH invocation on the mapped pages. Intels do self-snoop,
so this change should be nop for them, and unsafe devices, if any, are
already ignored.
Reported and tested by: manu
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32318
Similar to pmap_page_set_memattr() by setting MD page cache attribute
to the argument. Unlike pmap_page_set_memattr(), does not flush cache
for the direct mapping of the page.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32318
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31885
The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().
Reviewed by: kib, markj, imp (earlier version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31884
Drop fpstate only after copying out xfpustate from the thread usermode
save area. Otherwise a context switch between get_fpcontext(), which now
returns the pointer directly into user save area, and copyout, would
cause reinit of the save area, loosing user registers.
Reported, reviewed, and tested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D32159
It no longer serves any purpose as thread0's td_frame field is now
initialized during fpuinitstate(). No functional change intended.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32057
When creating a new thread, we unconditionally copy td_frame from the
creating thread. For threads which never return to user mode, this is
unnecessary since td_frame just points to the base of the stack or a
random interrupt frame.
If KASAN is configured this copying may also trigger false positives
since the td_frame region may contain poisoned stack regions. It was
not noticed before since thread0 used a dummy proc0_tf trapframe, and
kernel procs are generally created by thread0. Since commit
df8dd6025a, though, we call
cpu_thread_alloc(&thread0) when initializing FPU state, which
reinitializes thread0.td_frame.
Work around the problem by not copying the frame unless the copying
thread came from user mode. While here, de-duplicate the copying and
remove redundant re(initialization) of td_frame.
Reported by: syzbot+2ec89312bffbf38d9aec@syzkaller.appspotmail.com
Reviewed by: kib
Fixes: df8dd6025a
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32057
This reverts commit 0f6829488e.
Also it changes the type of md_usr_fpu_save struct mdthread member
to void *, which is what uncovered this trouble. Now the save area
is untyped, but since it is hidden behind accessors, it is not too
significant. Since apparently there are consumers affected outside
the tree, this hack is better than one from the reverted revision.
PR: 258678
Reported by: cy
Reviewed by: cy, kevans, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32060
According to https://github.com/NuxiNL/cloudlibc:
CloudABI is no longer being maintained. It was an awesome experiment,
but it never got enough traction to be sustainable.
There is no reason to keep it in FreeBSD.
Approved by: ed (private mail)
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31923
For signal send, copyout from the user FPU save area directly.
For sigreturn, we are in sleepable context and can do temporal
allocation of the transient save area. We cannot copying from userspace
directly to user save area because XSAVE state needs to be validated,
also partial copyins can corrupt it.
Requested by: jhb
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954
Instead do one more allocation at the thread creation time. This frees
a lot of space on the stack.
Also do not use alloca() for temporal storage in signal delivery sendsig()
function and signal return syscall sys_sigreturn(). This saves equal
amount of space, again by the cost of one more allocation at the thread
creation time.
A useful experiment now would be to reduce KSTACK_PAGES.
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954
from machdep.c which is too large pile of unrelated things.
Some ptrace functions are moved from machdep.c to ptrace_machdep.c.
Now machdep.c contains code mostly related to the low level initialization
and regular low level operation of the architecture, while signal MD code
and registers handling is placed in exec_machdep.c.
Reviewed by: jhb, markj
Discussed with: jrtc27
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954
This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well. No functional change
intended.
Reviewed by: cem, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32029
Reimplement bdf0f24bb1 by checking for the caller' ABI in
the implementation of PT_GET_SC_ARGS, and copying out everything if
it is Linuxolator.
Also fix a minor information leak: if PT_GET_SC_ARGS_ALL is done on the
thread reused after other process, it allows to read some number of that
thread last syscall arguments. Clear td_sa.args in thread_alloc().
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31968
This is one of the pieces required to make modern (ie Focal)
strace(1) work.
Reviewed By: jhb (earlier version)
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D28212
There is no need to restrict trampoline page table to low 1M, it
should work with any pages below 4G. Only wakeup code itself should
be below 1M.
Do not waste level 5 page when LA48 mode is used.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31931
The file as is is the maze of #ifdef passages, all slightly different.
Divorcing i386 and amd64 version actually makes changing the code
easier, also no changes for i386 are planned.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31931
- Re-implement pcib interface to use standard pci bus driver on top of
vmd(4) instead of custom one.
- Re-implement memory/bus resource allocation to properly handle even
complicated configurations.
- Re-implement interrupt handling to evenly distribute children's MSI/
MSI-X interrupts between available vmd(4) MSI-X vectors and setup them
to be handled by standard OS mechanisms with minimal overhead, except
sharing when unavoidable.
Successfully tested on Dell XPS 13 laptop with Core i7-1185G7 CPU (VMD
device ID 0x9a0b) and single NVMe SSD, dual-booting with Windows 10.
Successfully tested on Supermicro X11DPI-NT motherboard with Xeon(R)
Gold 6242R CPUs (VMD device ID 0x201d), simultaneously handling NVMe
SSD on one PCIe port and PLX bridge with 3 NVMe and 1 AHCI SSDs on
another. Handles SSD hot-plug (except Optane 905p for some reason,
which are not detected until manual bus rescan) and enabled IOMMU
(directly connected SSDs work, but ones connected to the PLX fail
without errors from IOMMU).
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential revision: https://reviews.freebsd.org/D31762
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
Add a credential to the cdev object in sysctl_vmm_create(), then check
that we have the correct credentials in sysctl_vmm_destroy(). This
prevents a process in one jail from opening or destroying the /dev/vmm
file corresponding to a VM in a sibling jail.
Add regression tests.
Reviewed by: jhb, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31156
Add support for the KVM paravirtual clock device.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29733
Add a variant of 'rdtsc()' that performs the ordered version of 'rdtsc'
appropriate for the invoking x86 variant.
Also, expose the 'lfence'-ed and 'mfence'-ed 'rdtsc()' variants needed
by 'rdtsc_ordered()' for general use.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31416
Add a variant of 'rdtscp()' that retains and returns the 'IA32_TSC_AUX'
value read by 'rdtscp'.
Sponsored By: Juniper Networks, Inc.
Sponsored By: Klara, Inc.
Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D31415
In preparation for clone3 system call add struct clone_args and use it in
clone implementation.
Move all of clone related bits to the newly created linux_fork.h header.
Differential revision: https://reviews.freebsd.org/D31474
MFC after: 2 weeks
At least Linux x86 ABI's does not use carry bit and expects that the dx register
is preserved. For this add a new sv_set_fork_retval hook and call it from cpu_fork().
Add a short comment about touching dx in x86_set_fork_retval(), for more details
see phab comments from kib@ and imp@.
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D31472
MFC after: 2 weeks
The correct condition is to check the number of ivhd entries fit into
the array.
Reported by: bz
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D31514
Interrupt and exception handlers must call kmsan_intr_enter() prior to
calling any C code. This is because the KMSAN runtime maintains some
TLS in order to track initialization state of function parameters and
return values across function calls. Then, to ensure that this state is
kept consistent in the face of asynchronous kernel-mode excpeptions, the
runtime uses a stack of TLS blocks, and kmsan_intr_enter() and
kmsan_intr_leave() push and pop that stack, respectively.
Use these functions in amd64 interrupt and exception handlers. Note
that handlers for user->kernel transitions need not be annotated.
Also ensure that trap frames pushed by the CPU and by handlers are
marked as initialized before they are used.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31467
- During boot, allocate PDP pages for the shadow maps. The region above
KERNBASE is currently not shadowed.
- Create a dummy shadow for the vm page array. For now, this array is
not protected by the shadow map to help reduce kernel memory usage.
- Grow shadows when growing the kernel map.
- Increase the default kernel stack size when KMSAN is enabled. As with
KASAN, sanitizer instrumentation appears to create stack frames large
enough that the default value is not sufficient.
- Disable UMA's use of the direct map when KMSAN is configured. KMSAN
cannot validate the direct map.
- Disable unmapped I/O when KMSAN configured.
- Lower the limit on paging buffers when KMSAN is configured. Each
buffer has a static MAXPHYS-sized allocation of KVA, which in turn
eats 2*MAXPHYS of space in the shadow map.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31295
KMSAN enables the use of LLVM's MemorySanitizer in the kernel. This
enables precise detection of uses of uninitialized memory. As with
KASAN, this feature has substantial runtime overhead and is intended to
be used as part of some automated testing regime.
The runtime maintains a pair of shadow maps. One is used to track the
state of memory in the kernel map at bit-granularity: a bit in the
kernel map is initialized when the corresponding shadow bit is clear,
and is uninitialized otherwise. The second shadow map stores
information about the origin of uninitialized regions of the kernel map,
simplifying debugging.
KMSAN relies on being able to intercept certain functions which cannot
be instrumented by the compiler. KMSAN thus implements interceptors
which manually update shadow state and in some cases explicitly check
for uninitialized bytes. For instance, all calls to copyout() are
subject to such checks.
The runtime exports several functions which can be used to verify the
shadow map for a given buffer. Helpers provide the same functionality
for a few structures commonly used for I/O, such as CAM CCBs, BIOs and
mbufs. These are handy when debugging a KMSAN report whose
proximate and root causes are far away from each other.
Obtained from: NetBSD
Sponsored by: The FreeBSD Foundation
KMSAN requires two shadow maps, each one-to-one with the kernel map.
Allocate regions of the kernels PML4 page for them. Add functions to
create mappings in the shadow map regions, these will be used by the
KMSAN runtime.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31295
Also remove a redundant assertion in pmap_kasan_enter().
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31295
While here, use designated initializers and rename some AMD iommu method
implementations to match the corresponding op names. No functional
change intended.
Reviewed by: grehan
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31462
This does not appear to affect code generation, at least with the
default toolchain.
Noticed because incorrect output specifications lead to false positives
from KMSAN, as the instrumentation uses them to update shadow state for
output operands.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31466
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).
Sponsored by: The FreeBSD Foundation
In hw.vmm.create sysctl handler the maximum length of vm name is
VM_MAX_NAMELEN. However in vm_create() the maximum length allowed is
only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to
allow the length of VM_MAX_NAMELEN for vm name.
MFC after: 3 days
Reviewed by: grehan
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31372
On amd64 XENHVM depends on the xentimer device for PVH early startup,
so both should be added or removed together (like the current
dependency with xenpci). Fix this by adding xentimer to NOTES and
updating the comments on the config files. Note that on i386 there's
no such dependency between xentimer and XENHVM, since there's no PVH
support.
While there also fix the MINIMAL i386 build to include the xentimer,
so it keeps the same functionality as before xentimer was split from
XENHVM.
Reported by: lwhsu
PR: 257549
Fixes: ae59812748 ('xen/timer: make xen timer optional')
Current expression checks that vm_page_alloc(9) never returns a page
belonging to the preload area. This is not true if something was freed
from there, for instance a preloaded module was unloaded, or ucode update
freed.
Only check that we never allow to allocate a page belonging to the kernel
proper, check against _end.
Reported and tested by: dhw
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
which is the place to put MD asserts about allocated pages.
On amd64, verify that allocated page does not belong to the kernel
(text, data) or early allocated pages.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
Allow any 2M aligned contiguous location below 4G for the staging
area location. It should still be mapped by loader at KERNBASE.
The assumption kernel makes about loader->kernel handoff with regard to
the MMU programming are explicitly listed at the beginning of hammer_time(),
where kernphys is calculated. Now kernphys is the variable instead of
symbol designating the physical address.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
KASAN and KCSAN implement interceptors for various primitive operations
that are not instrumented by the compiler. KMSAN requires them as well.
Rather than adding new cases for each sanitizer which requires
interceptors, implement the following protocol:
- When interceptor definitions are required, define
SAN_NEEDS_INTERCEPTORS and SANITIZER_INTERCEPTOR_PREFIX.
- In headers that declare functions which need to be intercepted by a
sanitizer runtime, use SANITIZER_INTERCEPTOR_PREFIX to provide
declarations.
- When SAN_RUNTIME is defined, do not redefine the names of intercepted
functions. This is typically the case in files which implement
sanitizer runtimes but is also needed in, for example, files which
define ifunc selectors for intercepted operations.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
There is no reason now why do we need to allocate trampoline page very
early in the boot process. The only requirement for the page is that
it is below 1M to be usable by the real mode during init. This can be
handled by vm_alloc_contig() when we do the startup.
Also assert that startup trampoline fits into single page. In principle
we can do multi-page allocation if needed, but it is not.
Move the alloc_ap_trampoline() function and the boot_address variable to
i386/mp_machdep.c. Keep existing mechanism of early alloc on i386.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D31343
KMSAN instrumentation requires thread-local storage to track
initialization state for function parameters and return values. This
buffer is accessed as part of each function prologue. It is provided by
the KMSAN runtime, which looks up a pointer in the current thread's
structure.
When KMSAN is configured, init_secondary() is instrumented, but this
means that GS.base must be initialized first, otherwise the runtime
cannot safely access curthread. Work around this by loading GS.base
before calling init_secondary(), so that the runtime can at least check
curthread == NULL and return a pointer to some dummy storage. Note that
init_secondary() still must reload GS.base after calling lgdt(), which
loads a selector into %gs, which in turn clears the base register.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31336
There is no reason to initialize it to anything else, and this matches
initialization of the BSP. No functional change intended.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31336
gcc failed as it didn't inlined the builtins and generates calls to
the libgcc, ld can't find libgcc as cross-toolchain libgcc is not installed.
To avoid this add internal vDSO ffs functions without optimized builtins.
Reported by: jhb
MFC after: 2 weeks
The timer is not used on ARM.
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29041
Stop using temporal page table with 1:1 mapping of low 1G populated over
the whole VA. Use 1:1 mapping of low 4G temporarily installed in the
normal kernel page table.
The features are:
- now there is one less step for startup asm to perform
- the startup code still needs to be at lower 1G because CPU starts in
real mode. But everything else can be located anywhere in low 4G
because it is accessed by non-paged 32bit protected mode. Note that
kernel page table root page is at low 4G, as well as the kernel itself.
- the page table pages can be allocated by normal allocator, there is
no need to carve them from the phys_avail segments at very early time.
The allocation of the page for startup code still requires some magic.
Pages are freed after APs are ignited.
- la57 startup for APs is less tricky, we directly load the final page
table and do not need to tweak the paging mode.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
To determine whether to use alternate signal stack or not,
we need to use the native signal number, not the one translated
with bsd_to_linux_signal().
In practical terms, this fixes golang.
Reviewed By: dchagin
Fixes: 135dd0cab5
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D31298
When a cmpset for removing the PG_RW bit in pmap_promote_pde() fails,
there is no need to repeat the alignment, PG_A, and PG_V tests just to
reload the PTE's value. The only bit that we need be concerned with at
this point is PG_M. Use fcmpset instead.
MFC after: 1 week
Old expression happens to provide the correct answer, but assumes that
kernel is loaded at physical address zero, with 2M gap. Do not use
kernphys to calculate KVA of kernel text start, just explicitly write
out KERNBASE and the hole size.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
The call to pmap_allow_2m_x_page() in pmap_enter_object() is redundant.
Specifically, even without the call to pmap_allow_2m_x_page() in
pmap_enter_object(), pmap_allow_2m_x_page() is eventually called by
pmap_enter_pde(), so the outcome will be the same. Essentially,
calling pmap_allow_2m_x_page() in pmap_enter_object() amounts to
"optimizing" for the unexpected case.
Reviewed by: kib
MFC after: 1 week
Initial patch from submitter was adapted by me to prevent unconditional
FUTEX_REQUEUE use.
PR: 255947
Submitted by: Philippe Michaud-Boudreault
Differential Revision: https://reviews.freebsd.org/D30332
The vDSO initialisation order should be as follows:
- native abi init via exec_sysvec_init();
- vDSO symbols queued to the linux_vdso_syms list;
- linux_vdso_install();
- linux_exec_sysvec_init();
As the exec_sysvec_init() called with SI_ORDER_ANY (last) at SI_SUB_EXEC
order, move linux_vdso_install() and linux_exec_sysvec_init() to the
SI_SUB_EXEC+1 order.
Reviewed by: trasz
Differential Revision: https://reviews.freebsd.org/D30902
MFC after 2 weeks
In order to reduce diff between arches constify vdso install/deinstall
functions like arm64.
Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D30901
MFC after: 2 weeks
The vDSO (virtual dynamic shared object) is a small shared library that the
kernel maps R/O into the address space of all Linux processes on image
activation. The vDSO is a fully formed ELF image, shared by all processes
with the same ABI, has no process private data.
The primary purpose of the vDSO:
- non-executable stack, signal trampolines not copied to the stack;
- signal trampolines unwind, mandatory for the NPTL;
- to avoid contex-switch overhead frequently used system calls can be
implemented in the vDSO: for now gettimeofday, clock_gettime.
The first two have been implemented, so add the implementation of system
calls.
System calls implemenation based on a native timekeeping code with some
limitations:
- ifunc can't be used, as vDSO r/o mapped to the process VA and rtld
can't relocate symbols;
- reading HPET memory is not implemented for now (TODO).
In case on any error vDSO system calls fallback to the kernel system
calls. For unimplemented vDSO system calls added prototypes which call
corresponding kernel system call.
Tested by: trasz (arm64)
Differential revision: https://reviews.freebsd.org/D30900
MFC after: 2 weeks
Temporary add stubs to the Linux emulation layer which calls the existing hook.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D30911
MFC after: 2 weeks
In preparation for vDSO code revision get rid of incomplete vDSO methods
from locore, but leave .note.Linux section commented out.
.note.Linux section is used by glibc rtld to get the kernel version, that
saves one system call call. I'll try to implement it later, if figure out
how to use it with jails.
MFC after: 2 weeks
The underlying types for both are the same so arguably this doesn't
really matter, but using the wrong type is still confusing and
technically incorrect.
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
This reapplies 3a522ba1bc with a fix for
the static assertion failure on i386.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
pmap_copy() is used to speculatively create mappings, so those mappings
should not have their access bit preset.
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31162
Reduce the live ranges for three variables so that they do not span the
call to PHYS_TO_VM_PAGE(). This enables the compiler to generate
slightly smaller machine code.
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31161
The ACPI parsing code around rid range was wrong on assuming there is
only one pair of start/end device id range. Besides, ivhd_dev_parse()
never work as supposed. The start/end rid info was always zero.
Restructure the code to build dynamic-sized tables for each IOMMU softc
holding device entries. The device entries are enumerated to find a
suitable IOMMU unit. Operations on devices not governed (e.g. the IOMMU
unit itself) are no-op from now on. There are also a minor fix on wrong
%b formatting string usage.
Tested on my EPYC 7282.
Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D30827
This controller supports 2.5G/1G/100MB/10MB speeds, and allows
tx/rx checksum offload, TSO, LRO, and multi-queue operation.
The driver was derived from code contributed by Intel, and modified
by Netgate to fit into the iflib framework.
Thanks to Mike Karels for testing and feedback on the driver.
Reviewed by: bcr (manpages), kbowling, scottl, erj
MFC after: 1 month
Relnotes: yes
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30668
Remove deugging stuff, since it's arguably not needed in a minimal
setup. Also vlan, tuntap and gif since they can be loaded.
imp didn't include the part of the patch that removed xen guest support.
Xen guest is relatively small and has no way of being loaded.
Reviewed by: imp
PR: 229564
MFC After: 3 days
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
Otherwise KASAN may generate false positives if the trapframe was
written into a poisoned region of the stack.
Reported by: pho
Sponsored by: The FreeBSD Foundation
Use sysentvec hooks to only call umtx_thread_exit/umtx_exec, which handle
robust mutexes, for native FreeBSD ABI. Similarly, there is no sense
in calling sigfastblock_clear() for non-native ABIs.
Requested by: dchagin
Reviewed by: dchagin, markj (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30987
In a few places, on a failed compare-and-set, both the amd64 pmap and
the arm64 pmap repeat tests on bits that won't change state while the
pmap is locked. Eliminate some of these unnecessary tests.
Reviewed by: andrew, kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31014
Implement dumping core for Linux binaries on amd64, for both
32- and 64-bit executables. Some bits are still missing.
This is based on a prototype by chuck@.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30019
Eliminate some unnecessary unlocking and relocking when we have to retry
the operation to avoid deadlock. (All of the other pmap functions that
iterate over a PV list already implemented retries without these same
unlocking and relocking operations.)
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30951
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values. It also updates all
of the ABI definitions to preserve current behaviour.
This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication. It will be used
for Linux coredumps.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30921
Hide the vDSO defines to the linux32_sysvec as they are not intended to
be used outside of it. Fix LINUX32_PS_STRINGS, use the size of
struct linux32_ps_strings instead of a numeric constant.
MFC after: 2 weeks
Assuming we can't run on i486, i586 class cpu, retire linux_kplatform var
and use hardcoded 'machine' value in linux_newuname().
I have added linux_kplatform for consistency with linux_platform which is
placed in to vdso to avoid excess copyout it on stack for AT_PLATFORM at
exec time.
This is the first stage of Linuxulator's vdso revision.
Reviewed by: trasz, imp
Differential Revision: https://reviews.freebsd.org/D30774
MFC after: 2 weeks
Stop confusing people, retire COMPAT_LINUX and COMPAT_LINUX32 kernel
build options. Since we have 32 and 64 bit Linux emulators, we can't build both
emulators together into the kernel. I don't think it matters, Linux emulation
depends on loadable modules (via rc).
Cut LINPROCFS and LINSYSFS for consistency.
PR: 215061
Reviewed by: bcr (manpages), trasz
Differential Revision: https://reviews.freebsd.org/D30751
MFC after: 2 weeks
This makes it easier to compare the two. This involves moving
the mutex slightly lower down, but there should be no functional
changes.
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30541